Generating complex relationships in multi-table data

giovanni.circo · June 21, 2024, 4:31pm

The multi-table synthesizer does a good job out-of-the-box creating simple flat tables that are linked. However, most of our use cases involve complex many-to-one relationships. For example: A claims header table has a single claim id, but links to a diagnosis table that has between 1 to 20 diagnoses (1 per row).

Maintaining the proper cardinality seems difficult without doing some post-processing after generation.

I think this problem is related to a similar feature request here. I am not sure if you have other suggestions.

neha · June 21, 2024, 4:55pm

Hi @giovanni.circo, thanks for starting this topic. I’d love to understand a bit more about the scenario you’re mentioning so we don’t miss anything.

Maintaining the proper cardinality seems difficult without doing some post-processing after generation.

Is this something you have already observed in the synthetic data you’re creating?

Our multi-table synthesizers are designed to learn the min/max cardinality as well as the most and least frequent cardinalities. In your example, a claim (parent table) may be connected to many diagnoses (child table). The synthesizer should learn:

Each claim must be connected to at least 1 diagnosis (min cardinality))
Each claim must be connected to at most 20 diagnoses (max cardinality)
The typical # of diagnoses for each claim (eg. 50% of claims are connected 1 diagnosis each, 10% have 2 diagnoses, etc,)

The synthetic data should then follow these patterns. Of course, I am assuming that the claims table has 1 row per claim and the diagnosis table has 1 row per diagnosis (linked to a claim).

Is there something I’m missing? Or did you try it out and observe issues with the synthetic data? Perhaps an example may help.

Topic		Replies	Views
Tips for maintaining many-to-one relationships across multiple tables Synthetic Data Creation multi-table	1	56	January 26, 2026
Maintaining Cardinality Between Tables Evaluation and Benchmarking quality , multi-table	3	56	August 27, 2024
[Resolved] Cross-table constraints Synthetic Data Creation constraints , multi-table	3	57	January 26, 2026
Multi-table models should support sequential data Synthetic Data Creation multi-table , sequential , feature-request	0	31	June 6, 2024
Fix combinations between tables Synthetic Data Creation feature-request	0	22	January 26, 2026

Generating complex relationships in multi-table data

Related topics