Release Date: January 16, 2024
This release includes quality improvements and bug fixes, and it sets up foundations for future optimizations.
Improved Data Cardinality. With our new preprocessing methods, you'll now see improved data quality for parent/child cardinality in multi-table datasets. For example, if 40% of the parent rows in the real data have 1 child while 60% have 2 or more children, the same will be true in the synthetic data.
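To check this kind of cardinality yourself, you can compare the child-count distribution of the real and synthetic tables. The sketch below is a stand-alone helper, not part of the SDV API; the function name and the 0 / 1 / 2+ bucketing are assumptions chosen to mirror the example above.

```python
from collections import Counter


def child_count_distribution(parent_ids, child_foreign_keys):
    """Fraction of parents with 0, 1, or 2+ children (hypothetical helper)."""
    if not parent_ids:
        return {"0": 0.0, "1": 0.0, "2+": 0.0}
    # Count how many child rows point at each parent key.
    children_per_parent = Counter(child_foreign_keys)
    buckets = Counter()
    for pid in parent_ids:
        n = children_per_parent.get(pid, 0)
        buckets["2+" if n >= 2 else str(n)] += 1
    total = len(parent_ids)
    return {k: buckets[k] / total for k in ("0", "1", "2+")}


# Parents 1 and 4 have one child each; parents 2, 3 and 5 have two or more.
parents = [1, 2, 3, 4, 5]
children_fk = [1, 2, 2, 3, 3, 3, 4, 5, 5]
print(child_count_distribution(parents, children_fk))
```

Running the same helper on the real and the synthetic parent/child tables and comparing the two dictionaries gives a quick, library-independent view of whether the cardinality was preserved.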
Specify Column Relationships. Do you have multiple columns that encode the same concept? Now, you can specify these column relationships directly in the metadata. In this release, you can annotate when multiple columns together represent a single physical mailing address. More concepts are coming soon!
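As an illustration, an address relationship might appear in metadata roughly as below. The dictionary layout (a `column_relationships` list whose entries carry a `type` and `column_names`) and the sdtype names are assumptions for this sketch; check the official metadata specification for the exact format. The hypothetical `check_relationships` helper only verifies that every column named in a relationship is actually declared.

```python
# Assumed metadata shape; consult the SDV metadata docs for the exact spec.
metadata = {
    "columns": {
        "street": {"sdtype": "street_address"},
        "city": {"sdtype": "city"},
        "state": {"sdtype": "administrative_unit"},
        "zipcode": {"sdtype": "postcode"},
        "amount": {"sdtype": "numerical"},
    },
    "column_relationships": [
        # Four columns that together describe one physical mailing address.
        {"type": "address", "column_names": ["street", "city", "state", "zipcode"]},
    ],
}


def check_relationships(metadata):
    """Return error messages for relationships that reference unknown columns (hypothetical helper)."""
    known = set(metadata.get("columns", {}))
    errors = []
    for rel in metadata.get("column_relationships", []):
        missing = [c for c in rel["column_names"] if c not in known]
        if missing:
            errors.append(f"{rel['type']} relationship references unknown columns: {missing}")
    return errors


print(check_relationships(metadata))
```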
Foundational Codebase Changes. Behind the scenes, we've made a few changes to the way our codebase is compiled. There are no changes to the Python API; these changes only lay the groundwork for future optimizations. Please continue using our existing features and let us know if you notice any issues!
Additional updates
- You can now opt to create random, anonymized data using any of the options in the Faker library, including complex concepts such as currency.
- The Inequality constraint now works correctly for datetime columns, especially with complex date formats.
- The CTGANSynthesizer now works with the FixedCombinations constraint.
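The Inequality fix matters because datetimes stored as formatted strings cannot be compared as plain text: with a format such as `16 Jan 2024`, string order and chronological order disagree. The sketch below is a stand-alone check, not SDV's implementation; the function name, the `strict` flag, and the sample columns are assumptions. It parses both columns with an explicit format before testing that the high column is not earlier than the low column.

```python
from datetime import datetime


def inequality_violations(rows, low_col, high_col, fmt, strict=False):
    """Indexes of rows where high_col is earlier than low_col (hypothetical helper).

    With strict=True, equal timestamps also count as violations.
    """
    bad = []
    for i, row in enumerate(rows):
        # Parse with the explicit format instead of comparing raw strings.
        low = datetime.strptime(row[low_col], fmt)
        high = datetime.strptime(row[high_col], fmt)
        ok = high > low if strict else high >= low
        if not ok:
            bad.append(i)
    return bad


rows = [
    # Valid chronologically, although "03 Feb" < "16 Jan" as strings.
    {"checkin": "16 Jan 2024 09:00", "checkout": "03 Feb 2024 11:30"},
    # Invalid: checkout happens before checkin.
    {"checkin": "03 Feb 2024 14:00", "checkout": "01 Feb 2024 10:00"},
]
print(inequality_violations(rows, "checkin", "checkout", "%d %b %Y %H:%M"))
```

A naive string comparison would flag the first row as well; parsing first reports only the second row.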