Do fitted multi-table synthesizer models retain original data properties (PHI \ PII data concerns)?

ecastaner · July 2, 2024, 12:29am

@giovanni.circo could you please add a short description of the question?
Thanks!

giovanni.circo · July 11, 2024, 5:21pm

The question is in reference to the fitted MultiTableSynthesizer object. Specifically, if I fit a model to a dataset that contains observed PII\PHI data, does the fitted model contain identifiable references to the original PII\PHI data? What we would like to do is fit a model in a higher environment where PHI data is available, and then pass this model to other developers in a lower environment to use for their own synthesis.

neha · July 15, 2024, 7:04pm

Hi @giovanni.circo, if you have PII/PHI data, I would recommend you carefully inspect the metadata and ensure that all the affected columns are marked as pii: True. All SDV synthesizers use the metadata as the source of truth to determine what is PII.
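
As a rough illustration of that inspection step, here is a minimal sketch that checks a metadata dictionary for sensitive columns that are not explicitly marked pii: True. It is not part of SDV: the helper name find_unflagged_columns, the table and column names, and the sensitive list are all hypothetical, and the dictionary layout (tables, then columns, then sdtype/pii) is an assumption about how the metadata looks once exported to a plain dictionary. Please adjust it to your own metadata.

```python
# Hypothetical audit helper (not an SDV function).
# Assumed layout: {"tables": {table: {"columns": {column: {"sdtype": ..., "pii": ...}}}}}

def find_unflagged_columns(metadata, sensitive_columns):
    """Return the (table, column) pairs that are not explicitly marked pii: True."""
    unflagged = []
    for table_name, column_name in sensitive_columns:
        table = metadata.get("tables", {}).get(table_name, {})
        column = table.get("columns", {}).get(column_name)
        # A missing column and a missing or false pii flag both count as unflagged.
        if column is None or column.get("pii") is not True:
            unflagged.append((table_name, column_name))
    return unflagged


# Illustrative metadata; every name below is made up for this example.
metadata = {
    "tables": {
        "patients": {
            "columns": {
                "patient_id": {"sdtype": "id"},
                "ssn": {"sdtype": "ssn", "pii": True},
                "email": {"sdtype": "email"},
            }
        },
        "visits": {
            "columns": {
                "visit_id": {"sdtype": "id"},
                "physician_name": {"sdtype": "name", "pii": True},
            }
        },
    }
}

# Columns that you know hold PII/PHI in the real data.
sensitive = [
    ("patients", "ssn"),
    ("patients", "email"),
    ("visits", "physician_name"),
]

unflagged = find_unflagged_columns(metadata, sensitive)
print(unflagged)
```

Any pair the helper returns is a column to review and update in the metadata before fitting the synthesizer.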

Provided that you mark these columns as PII in the metadata, the software will ensure that no identifying information is learned or saved in the model. This means that if someone has access to the fitted synthesizer (pkl file), they should not be able to recover any of the original, private values.

Note that SDV Enterprise may learn some non-identifying information in order to ensure that the data is realistic. For example, it may learn the format of phone numbers, the total number of unique values, etc. But you can turn this off if you want absolutely 0 information to be learned. Let us know if you prefer this and I can walk you through setting it up.
