Including a code table for reference only

This question was originally filed here by @rizwan. I’m separating it out into a new thread so that we can discuss it specifically.

Which software are you using? SDV Enterprise

Software Details SDV 0.44, Python 3.13

Description

How can I include a code table for reference only and still define foreign keys? For example, the Address table has a column Country, which is a foreign key to the Countries table. The Countries table has two columns: CountryID and CountryName. How can I avoid synthesizing the Countries table, include it only as a reference for the Address table, and keep the CountryID values in the Countries table intact?

Follow-up Question

I see that the Constraint API has a ReferenceTable constraint and that it is currently in Beta. When is it expected to be released? Is there a workaround?

Hi @rizwan,

I believe the scenario you are describing perfectly fits the use case for the ReferenceTable constraint. The Countries table would be a reference table, which tells SDV to reuse the same information as the real data (i.e. not synthesize any new countries). You would still be able to create synthetic data in all of the other tables, and those tables would be able to reference the Countries table (which would have the same country information as the original).

Code Snippet: Using the ReferenceTable constraint

Code-wise, it would look something like this:

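from sdv.cag import ReferenceTable
from sdv.multi_table import HSASynthesizer

# Mark Countries as a reference table so its rows are reused as-is, not synthesized
country_reference_constraint = ReferenceTable(reference_table_names=['Countries'])

# metadata describes all tables and the Address -> Countries relationship
synthesizer = HSASynthesizer(metadata)
synthesizer.add_constraints([country_reference_constraint])

# data is a dictionary that maps each table name to its pandas DataFrame
synthesizer.fit(data)
synthetic_data = synthesizer.sample()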
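
If you want to sanity-check the result after sampling, something like the following could work. This is only a rough sketch and not part of SDV: the helper name check_reference_table is made up for illustration, the toy rows stand in for your real and synthetic tables, and it assumes each table has been converted to a list of row dictionaries (for a pandas DataFrame, to_dict('records') produces that shape). It checks that the reference table contains the same rows as the original and that every foreign key in the child table points to an existing CountryID.

```python
def check_reference_table(real_rows, synthetic_rows, child_rows, key, foreign_key):
    """Hypothetical helper: returns (unchanged, orphans).

    unchanged is True when the synthetic reference table has exactly the
    same rows as the real one (row order is ignored).
    orphans lists child rows whose foreign key matches no reference row.
    """
    # Sort both tables by primary key so row order does not matter
    real_sorted = sorted(real_rows, key=lambda row: row[key])
    synthetic_sorted = sorted(synthetic_rows, key=lambda row: row[key])
    unchanged = real_sorted == synthetic_sorted

    # Collect the valid primary keys, then look for dangling references
    valid_keys = {row[key] for row in synthetic_rows}
    orphans = [row for row in child_rows if row[foreign_key] not in valid_keys]
    return unchanged, orphans


# Toy stand-ins for the real and synthetic tables (illustrative values only)
real_countries = [
    {"CountryID": 1, "CountryName": "Canada"},
    {"CountryID": 2, "CountryName": "Japan"},
]
synthetic_countries = [
    {"CountryID": 2, "CountryName": "Japan"},
    {"CountryID": 1, "CountryName": "Canada"},
]
synthetic_address = [
    {"AddressID": 10, "Country": 2},
    {"AddressID": 11, "Country": 1},
]

unchanged, orphans = check_reference_table(
    real_countries, synthetic_countries, synthetic_address,
    key="CountryID", foreign_key="Country",
)
print(unchanged, orphans)  # True []
```

With real output you would pass in the rows of data['Countries'], synthetic_data['Countries'] and synthetic_data['Address'] instead of the toy lists. An empty orphans list means every synthetic address references a country that exists in the Countries table.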

Beta Designation

I assume you are referring to this page where there is a yellow box?

The “in Beta” box was left over in the docs by accident. I can confirm that the ReferenceTable constraint is not in Beta anymore and is ready to use as part of the CAG bundle. We have removed the box from the documentation. Thanks for letting us know!

Functionality is available in the CAG bundle

Note that in order to use the ReferenceTable constraint, you would need access to the CAG bundle, which is an optional add-on to SDV Enterprise. Is this of interest to you?

Thank you for the reply and the sample code. I see that the CAG bundle is an optional add-on, which explains why it was giving an error.

No problem! For future reference, you can use the SDV Installer list-packages command to see what you currently have access to.