@giovanni.circo could you please add a short description of the question?
Thanks!
The question concerns the fitted MultiTableSynthesizer object. Specifically, if I fit a model to a dataset that contains real PII/PHI data, does the fitted model contain identifiable references to the original PII/PHI data? What we would like to do is fit a model in a higher environment where PHI data is available, and then pass this model to developers in a lower environment to use for their own synthesis.
Hi @giovanni.circo, if you have PII/PHI data, I would recommend that you carefully inspect the metadata and ensure that all the affected columns are marked as pii: True. All SDV synthesizers use the metadata as the source of truth to determine what is PII.
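As a rough illustration of that inspection step, here is a minimal sketch in plain Python that checks a metadata dictionary for sensitive columns that are not flagged with pii: True. It assumes the metadata has been exported to a dict shaped as tables -> columns -> properties; the function name `columns_not_marked_pii`, the `patients` table, and its columns are hypothetical and only for illustration, so adapt them to your own metadata.

```python
from typing import Dict, List, Tuple


def columns_not_marked_pii(
    metadata: dict, sensitive: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Return (table, column) pairs from `sensitive` that are missing
    from the metadata or do not have pii set to True."""
    missing = []
    tables = metadata.get("tables", {})
    for table, columns in sensitive.items():
        table_columns = tables.get(table, {}).get("columns", {})
        for column in columns:
            # Treat anything other than an explicit True as "not marked".
            if table_columns.get(column, {}).get("pii") is not True:
                missing.append((table, column))
    return missing


# Hypothetical metadata dict (assumed layout, not taken from a real project).
example_metadata = {
    "tables": {
        "patients": {
            "columns": {
                "patient_id": {"sdtype": "id"},
                "ssn": {"sdtype": "ssn", "pii": True},
                "phone": {"sdtype": "phone_number"},
                "age": {"sdtype": "numerical"},
            }
        }
    }
}

# Columns you know to be PII/PHI in the real data.
unmarked = columns_not_marked_pii(example_metadata, {"patients": ["ssn", "phone"]})
print(unmarked)  # [('patients', 'phone')]
```

Any pair that this prints is a column you consider sensitive but that the metadata does not yet mark as PII, so it is worth fixing before you fit the synthesizer.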
Provided that you mark these columns as PII in the metadata, the software will ensure that no identifying information is learned or saved in the model. This means that if someone has access to the fitted synthesizer (pkl file), they should not be able to recover any of the original, private values.
Note that SDV Enterprise may learn some non-identifying information in order to ensure that the data is realistic. For example, it may learn the format of phone numbers, the total number of unique values, etc. But you can turn this off if you want absolutely no information to be learned. Let us know if you prefer this and I can walk you through setting it up.