Starting from the latest SDV Enterprise release, you can specify composite keys directly in your SDV metadata. Then, you can supply the metadata (containing composite keys) to a synthesizer, and use it to create realistic and valid synthetic composite keys. Before this change, composite keys had to be specified using a constraint. This is no longer needed.
This change makes it easier to integrate a complex schema that has composite keys. It also represents an important separation between the types of features that belong in the metadata versus constraints.
The complexities of composite keys are all contained in the metadata
A composite key is a set of columns that together form a primary key or a foreign key reference. Within each individual column, the values are allowed to repeat, and each column can represent a different type of data. But together, the columns uniquely identify a row (composite primary key) or reference a row in another table (composite foreign key).
Composite keys can add complexity to your dataset. Consider an example in a medical setting. The Patient Visits table has a composite primary key of Patient ID and Date. Together, these two columns uniquely identify each visit. The individual Date column contains statistical information while the individual Patient ID column is itself a foreign key into another Patient table.
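To make the definition concrete, here is a minimal sketch in plain Python (no SDV involved). The rows and the helper `is_unique_key` are made up for illustration. They show that each column repeats on its own while the pair of columns stays unique.

```python
# Hypothetical Patient Visits rows, invented for illustration only.
visits = [
    {"Patient ID": "P-00001", "Date": "2024-01-05"},
    {"Patient ID": "P-00001", "Date": "2024-02-10"},
    {"Patient ID": "P-00002", "Date": "2024-01-05"},
]


def is_unique_key(rows, key_columns):
    """Return True if the combination of key_columns never repeats across rows."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            return False
        seen.add(key)
    return True


patient_id_alone = is_unique_key(visits, ["Patient ID"])  # P-00001 appears twice
date_alone = is_unique_key(visits, ["Date"])              # 2024-01-05 appears twice
pair_together = is_unique_key(visits, ["Patient ID", "Date"])

print(patient_id_alone, date_alone, pair_together)  # prints: False False True
```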
You can now specify composite primary/foreign key connections directly within the SDV metadata. For the schema we described, that would look something like this:
{
  "tables": {
    "Patient visits": {
      "primary_key": ["Patient ID", "Date"],
      "columns": {
        "Patient ID": { "sdtype": "id", "regex_format": "P-[0-9]{5}" },
        "Date": { "sdtype": "datetime", "datetime_format": "%Y-%m-%d" },
        ...
      }
    },
    "Patients": {
      "primary_key": "Patient ID",
      "columns": {
        "Patient ID": { "sdtype": "id", "regex_format": "P-[0-9]{5}" },
        ...
      }
    }
  },
  "relationships": [{
    "parent_table_name": "Patients",
    "parent_primary_key": "Patient ID",
    "child_table_name": "Patient visits",
    "child_foreign_key": "Patient ID"
  }]
}
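As a rough illustration of how this structure can be read, the sketch below rebuilds the same metadata as a plain Python dictionary (with the elided columns left out) and lists which tables use a composite primary key. It relies only on the convention visible in the JSON above, where a composite key is a list of column names and a single-column key is a string. The helper `composite_primary_keys` is hypothetical and is not part of SDV.

```python
# The metadata from above as a plain dictionary; elided columns are omitted.
metadata_dict = {
    "tables": {
        "Patient visits": {
            "primary_key": ["Patient ID", "Date"],
            "columns": {
                "Patient ID": {"sdtype": "id", "regex_format": "P-[0-9]{5}"},
                "Date": {"sdtype": "datetime", "datetime_format": "%Y-%m-%d"},
            },
        },
        "Patients": {
            "primary_key": "Patient ID",
            "columns": {
                "Patient ID": {"sdtype": "id", "regex_format": "P-[0-9]{5}"},
            },
        },
    },
    "relationships": [{
        "parent_table_name": "Patients",
        "parent_primary_key": "Patient ID",
        "child_table_name": "Patient visits",
        "child_foreign_key": "Patient ID",
    }],
}


def composite_primary_keys(metadata):
    """Map each table name to its primary key columns when the key spans several columns."""
    result = {}
    for table_name, table in metadata["tables"].items():
        primary_key = table.get("primary_key")
        # Assumption taken from the JSON above: a list means a composite key.
        if isinstance(primary_key, list) and len(primary_key) > 1:
            result[table_name] = primary_key
    return result


composite = composite_primary_keys(metadata_dict)
print(composite)  # prints: {'Patient visits': ['Patient ID', 'Date']}
```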
Because all the composite keys and relationships are specified in the metadata, you can easily visualize the dataset (using our metadata visualization tool).
The rest of the modeling, sampling, and evaluation process is now simpler
Now that the metadata contains composite keys, the rest of the synthetic data creation process is simpler. SDV Enterprise users can pass the metadata into any synthesizer for modeling. There is no need to add extra constraints or tune any other settings.
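from sdv.multi_table import HSASynthesizer

# composite_key_metadata: the metadata described above, including the composite keys
# data: the real tables, as a dictionary mapping each table name to a pandas DataFrame
synthesizer = HSASynthesizer(composite_key_metadata)
synthesizer.fit(data)
synthetic_data = synthesizer.sample()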
We’ve also updated the diagnostic and quality reports to factor in composite keys. Now, when you run these reports with the metadata, they apply the correct metrics based on each column's type and whether the column is part of a composite key. For example, when the diagnostic report checks for primary key uniqueness, it now checks that the combination of Patient ID and Date is unique, because together they form a composite primary key.
from sdv.evaluation.multi_table import run_diagnostic

diagnostic_report = run_diagnostic(
    real_data=data,
    synthetic_data=synthetic_data,
    metadata=composite_key_metadata
)
SDV guarantees a score of 100% for the diagnostic report, meaning:
- Primary composite keys will always be unique (considering all the columns)
- Foreign composite keys will always reference a primary composite key (considering all the columns, no broken links!)
- Any column in the composite key will contain realistic values (Regex formats for IDs, min/max bounds for datetimes, etc.)
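The three properties in this list can be pictured with a small check written in plain Python. This is only a sketch of the idea, not how SDV's diagnostic report computes its score. The rows, the `prescriptions` table, and the helper functions are all hypothetical, and the second prescription is deliberately broken to show what such a check would catch: each of its values exists somewhere in the parent table, but the combination does not.

```python
import re

# Hypothetical rows, invented for illustration only.
visits = [
    {"Patient ID": "P-00001", "Date": "2024-01-05"},
    {"Patient ID": "P-00001", "Date": "2024-02-10"},
    {"Patient ID": "P-00002", "Date": "2024-01-05"},
]
# Hypothetical child table: (Patient ID, Visit Date) is a composite foreign key
# pointing at the composite primary key of the visits table.
prescriptions = [
    {"Patient ID": "P-00001", "Visit Date": "2024-01-05"},
    {"Patient ID": "P-00002", "Visit Date": "2024-02-10"},  # deliberately broken link
]


def key_tuples(rows, columns):
    """Collect the values of the key columns from each row as a tuple."""
    return [tuple(row[column] for column in columns) for row in rows]


def duplicated_keys(rows, columns):
    """Return composite key values that appear more than once."""
    seen, duplicates = set(), []
    for key in key_tuples(rows, columns):
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


def broken_links(child_rows, child_columns, parent_rows, parent_columns):
    """Return child key values with no matching parent key (all columns considered)."""
    parent_keys = set(key_tuples(parent_rows, parent_columns))
    return [key for key in key_tuples(child_rows, child_columns) if key not in parent_keys]


def badly_formatted(rows, column, pattern):
    """Return values of one column that do not fully match the regex pattern."""
    return [row[column] for row in rows if re.fullmatch(pattern, row[column]) is None]


duplicates = duplicated_keys(visits, ["Patient ID", "Date"])
broken = broken_links(prescriptions, ["Patient ID", "Visit Date"], visits, ["Patient ID", "Date"])
bad_ids = badly_formatted(visits, "Patient ID", r"P-[0-9]{5}")

print(duplicates, broken, bad_ids)  # prints: [] [('P-00002', '2024-02-10')] []
```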
Schema descriptions belong in the metadata, not as constraints
Previously, composite keys could only be added into your synthesizer as constraints (if you had purchased the CAG bundle). Now, all SDV Enterprise users can specify and model composite keys using the metadata.
We made this change after careful deliberation about which features should be specified in SDV’s metadata and which ones belong in constraints. We decided to draw the line between structural schema representation and business logic.
- SDV Metadata contains a description of your database schema, including the tables, columns, types of data in each column, and information about how the tables are related. If a typical relational database offers a particular feature, then it should probably be described in SDV Metadata. This is why primary keys, foreign keys, and composite keys all belong in the metadata.
- Constraints, on the other hand, describe business logic that is layered on top of the enforcement that a typical relational database can provide. This means that the logic is maintained independently, typically by the application that modifies the database. This is why something like an Inequality (where column A is always greater than column B) is a constraint; this rule is not something that a database typically enforces for you.
Based on these definitions, a composite key belongs in SDV Metadata, not in constraints. We’ve also identified two additional features that belong in SDV Metadata instead of constraints: bridge tables and primary-to-primary key connections. These also used to be constraints, but it is now possible to add them to your metadata.
Our goal: If a standard relational database can specify it, then SDV should be able to model it!
Our goal is to make SDV the most comprehensive platform for creating synthetic multi-table data. We aim to be compatible with any feature that a standard relational database (such as a SQL database) can support. This includes:
- Primary keys (uniqueness)
- Foreign keys
- Composite keys
- Referential integrity
- Null values
- And any combination of the above!
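For readers who want to see these features side by side in an actual relational database, here is a minimal sketch using Python's built-in `sqlite3` module. It does not involve SDV. The table and column names (including the `prescriptions` table) are hypothetical and chosen to echo the medical example from earlier. The schema combines a single-column primary key, a composite primary key, a composite foreign key, and a nullable column, and the two rejected inserts show the database enforcing uniqueness and referential integrity.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
connection.execute("PRAGMA foreign_keys = ON")
connection.executescript("""
    CREATE TABLE patients (
        patient_id TEXT PRIMARY KEY
    );
    CREATE TABLE patient_visits (
        patient_id TEXT NOT NULL REFERENCES patients (patient_id),
        visit_date TEXT NOT NULL,
        notes TEXT,  -- nullable column
        PRIMARY KEY (patient_id, visit_date)  -- composite primary key
    );
    CREATE TABLE prescriptions (
        prescription_id INTEGER PRIMARY KEY,
        patient_id TEXT NOT NULL,
        visit_date TEXT NOT NULL,
        FOREIGN KEY (patient_id, visit_date)  -- composite foreign key
            REFERENCES patient_visits (patient_id, visit_date)
    );
""")

# Valid rows: a patient, one visit with a null note, and a prescription for that visit.
connection.execute("INSERT INTO patients VALUES ('P-00001')")
connection.execute("INSERT INTO patient_visits VALUES ('P-00001', '2024-01-05', NULL)")
connection.execute("INSERT INTO prescriptions VALUES (1, 'P-00001', '2024-01-05')")


def rejected(statement):
    """Return True if the database refuses the statement with an integrity error."""
    try:
        connection.execute(statement)
    except sqlite3.IntegrityError:
        return True
    return False


# Same (patient_id, visit_date) pair again: violates the composite primary key.
duplicate_visit_rejected = rejected(
    "INSERT INTO patient_visits VALUES ('P-00001', '2024-01-05', NULL)"
)
# No visit exists for this pair: violates the composite foreign key.
broken_link_rejected = rejected(
    "INSERT INTO prescriptions VALUES (2, 'P-00001', '2024-03-01')"
)

print(duplicate_visit_rejected, broken_link_rejected)  # prints: True True
```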
With the addition of the composite keys feature, SDV is able to handle a large majority of complex, interconnected database schemas such as the one shown below.
Is there any relational database feature that is missing from SDV that you would like to see included? Comment below!

