Maintaining Cardinality Between Tables

giovanni.circo · August 23, 2024, 7:15pm

We have a database of claims, procedures, and diagnoses, along with a header table. The header table uniquely defines records by MBR_ID, CLM_ID, and CLM_HASH_KEY. The expected output is that every claim in the clm_details table will have multiple CLM_HASH_KEY values per CLM_ID.

We define the metadata such that claims, procedures, and diagnoses are children and linked to the header, based on the CLM_HASH_KEY:

Here is the (expected) distribution of values (number of unique hash keys, per claim id):

n
1      1584
2       531
3       139
4       121
5        70
6        57
7        57
8        24
9        12
10       22

and here is what the model generates:

n
1    9762

So clearly something is wrong with the model such that the relationship is not being captured properly. I have done a few tests and am not certain the issue here. I’m fairly certain it is probably a way I am specifying the metadata. Any advice on how to diagnose this would be appreciated.

Thanks very much sdv team for the support so far!

andrew · August 27, 2024, 12:50am

Hello @giovanni.circo ,

I believe that since CLM_HASH_KEY is being labeled as an id but not a foreign key, the column is just being generated as a random id. To confirm, after fitting your synthesizer, you can run

synthesizer.get_transformers('clm_details')['CLM_HASH_KEY']

If you see something like

AnonymizedFaker(function_name='bothify', function_kwargs={'text': 'sdv-id-??????'}

then it is just generating ids independently of the other tables for that column.

A work around might be to add CLM_HASH_KEY as a second foreign key for clm_details, clm_diag and clm_proc. Please let me know if that works!

giovanni.circo · August 27, 2024, 8:41pm

Hi @andrew

Yes, going through some tests it looks like the primary issue is that the newly synthesized header table doesn’t maintain the correct cardinality (e.g. 1 MBR_ID should have 2-5 CLM_IDs, which should have 5-6 CLM_HASH_KEYS). However it does appear that splitting the tables up and adding HASH_KEYS as a new foreign key does work (albeit with a bit of extra work).

Thanks.

andrew · August 27, 2024, 9:13pm

Hi @giovanni.circo,

Glad you were able to get it to work! For more reference, the reason I suggested that is because the model will not learn cardinality for columns that are labeled with an sdtype of id unless they are a primary or foreign keys. This is because by default, id columns are dropped before modeling.

Can I ask what extra work you needed to do?
Thanks!

Topic		Replies	Views
Generating complex relationships in multi-table data Synthetic Data Creation multi-table	1	28	June 21, 2024
Tips for maintaining many-to-one relationships across multiple tables Synthetic Data Creation multi-table	1	42	January 26, 2026
[Resolved] Metadata detect Synthetic Data Creation metadata	29	106	April 29, 2024
Specify Composite Keys in the Metadata Inside the Vault data-integration , metadata , multi-table	0	22	March 30, 2026
SDV Enterprise 0.41 (Feb 18, 2026) Release Notes	0	19	February 18, 2026

Maintaining Cardinality Between Tables

Related topics