[Resolved] HSASynthesizer FutureWarning

yuntien.lee · May 8, 2024, 3:32am

Hi Support,
With HMASynthesizer our data ran through without any messages, but the messages below appear when we use HSASynthesizer. Can you help? Thanks.

C:\Users\YunTien.Lee\Anaconda3\envs\python312\Lib\site-packages\sdv\multi_table\base.py:375: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use df.loc[row_indexer, "col"] = values instead, to perform the assignment in a single step and ensure this keeps updating the original df.

See the caveats in the documentation: Indexing and selecting data — pandas 3.0.0 documentation

augmented_data = self._augment_tables(processed_data)

neha · May 8, 2024, 3:10pm

Hi @yuntien.lee, appreciate you providing the full details.

Based on the output you shared, this message is only a warning, not an error. The difference matters: a warning does not stop execution, so you should be able to continue using SDV without any issue. I recommend you continue sampling synthetic data from the HSASynthesizer.
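To illustrate the difference, here is a minimal sketch that uses only Python's standard `warnings` module. The function `noisy_step` is a hypothetical stand-in for a library call and is not SDV code. It shows that a `FutureWarning` is reported while the call still completes, and that the warning can be filtered if the output is too noisy.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a library call that emits a FutureWarning."""
    warnings.warn("behaviour will change in a future release", FutureWarning)
    return "finished"


# A warning does not interrupt the program: the function still returns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = noisy_step()

# Optionally ignore FutureWarning inside a limited scope.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore", category=FutureWarning)
    quiet_result = noisy_step()
```

Scoping the filter with `warnings.catch_warnings()` keeps other warnings visible in the rest of the program.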

What is this warning telling us?

As the message says, pandas is warning that some functionality SDV uses internally will not be compatible with future versions of pandas. Note that pandas 3.0 is still in development and has not yet been released, so the current SDV code is OK.
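As a rough illustration of the pattern the warning text describes, the sketch below uses a small made-up DataFrame. It is not the SDV code at base.py:375, and the column names are placeholders. The chained form is left as a comment because its effect depends on the pandas version, and only the single-step `.loc` form recommended by the warning is executed.

```python
import pandas as pd

# Illustrative data only; not taken from the thread.
df = pd.DataFrame({"col": [1, 2, 3], "other": [10, 20, 30]})
row_indexer = df["other"] > 15

# Chained assignment, the pattern the warning is about:
#   df["col"][row_indexer] = 0
# Under Copy-on-Write this would no longer update df.

# Single-step assignment, as the warning text recommends.
df.loc[row_indexer, "col"] = 0
```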

Rest assured that I have surfaced this to the team, and we will make sure to update SDV by the time pandas 3.0 is out.

Let us know if you have any questions.

yuntien.lee · May 8, 2024, 3:31pm

Great. Thanks.

neha · May 13, 2024, 5:24pm

Hi @yuntien.lee, the good news is that the team is actively looking into this now and hopes to have it fixed soon.

To help us be confident that the fix addresses what you’re seeing, would you be able to share some code that could help us replicate the issue? For example, are you using HSA with any customizations, constraints, etc.?

And also, could you confirm which version of SDV you are using? (Both public and enterprise)

import sdv

print('SDV Enterprise version', sdv.version.enterprise)
print('SDV Public version', sdv.version.public)

yuntien.lee · May 13, 2024, 6:14pm

metadata
{
    "tables": {
        "mem": {
            "columns": {
                "Member_ID": {
                    "sdtype": "id"
                },
                "DOB": {
                    "sdtype": "datetime"
                },
                "Gender": {
                    "sdtype": "categorical"
                },
                "Exposure_Months": {
                    "sdtype": "numerical"
                }
            },
            "primary_key": "Member_ID"
        },
        "med": {
            "columns": {
                "Member_ID": {
                    "sdtype": "id"
                },
                "ClaimID": {
                    "sdtype": "unknown",
                    "pii": true
                },
                "FromDate": {
                    "sdtype": "datetime"
                },
                "ToDate": {
                    "sdtype": "datetime"
                },
                "PaidDate": {
                    "sdtype": "datetime"
                },
                "ICDDiag01": {
                    "sdtype": "categorical"
                },
                "ICDDiag02": {
                    "sdtype": "categorical"
                },
                "ICDDiag03": {
                    "sdtype": "categorical"
                },
                "ICDDiag04": {
                    "sdtype": "categorical"
                },
                "ICDDiag05": {
                    "sdtype": "categorical"
                },
                "ICDDiag06": {
                    "sdtype": "categorical"
                },
                "ICDDiag07": {
                    "sdtype": "categorical"
                },
                "ICDDiag08": {
                    "sdtype": "categorical"
                },
                "ICDDiag09": {
                    "sdtype": "categorical"
                },
                "ICDDiag10": {
                    "sdtype": "categorical"
                },
                "ICDDiag11": {
                    "sdtype": "categorical"
                },
                "ICDDiag12": {
                    "sdtype": "categorical"
                },
                "ICDDiag13": {
                    "sdtype": "categorical"
                },
                "ICDDiag14": {
                    "sdtype": "categorical"
                },
                "ICDDiag15": {
                    "sdtype": "categorical"
                },
                "ICDDiag16": {
                    "sdtype": "categorical"
                },
                "ICDDiag17": {
                    "sdtype": "categorical"
                },
                "ICDDiag18": {
                    "sdtype": "categorical"
                },
                "ICDDiag19": {
                    "sdtype": "categorical"
                },
                "ICDDiag20": {
                    "sdtype": "categorical"
                },
                "ICDDiag21": {
                    "sdtype": "categorical"
                },
                "ICDDiag22": {
                    "sdtype": "categorical"
                },
                "ICDDiag23": {
                    "sdtype": "categorical"
                },
                "ICDDiag24": {
                    "sdtype": "categorical"
                },
                "ICDDiag25": {
                    "sdtype": "categorical"
                },
                "ICDDiag26": {
                    "sdtype": "categorical"
                },
                "ICDDiag27": {
                    "sdtype": "categorical"
                },
                "ICDDiag28": {
                    "sdtype": "categorical"
                },
                "ICDDiag29": {
                    "sdtype": "categorical"
                },
                "ICDDiag30": {
                    "sdtype": "categorical"
                },
                "ProcCode": {
                    "sdtype": "categorical"
                },
                "POS": {
                    "sdtype": "categorical"
                },
                "MR_Allowed": {
                    "sdtype": "numerical"
                },
                "MR_Paid": {
                    "sdtype": "numerical"
                }
            }
        },
        "pharm": {
            "columns": {
                "Member_ID": {
                    "sdtype": "id"
                },
                "NDC": {
                    "sdtype": "categorical"
                },
                "ClaimID": {
                    "sdtype": "unknown",
                    "pii": true
                },
                "FillDate": {
                    "sdtype": "datetime"
                },
                "ProviderID": {
                    "sdtype": "categorical"
                },
                "MR_Allowed": {
                    "sdtype": "numerical"
                },
                "MR_Paid": {
                    "sdtype": "numerical"
                },
                "Days_Supplied": {
                    "sdtype": "numerical"
                },
                "Qty_Dispensed": {
                    "sdtype": "numerical"
                }
            }
        }
    },
    "relationships": [
        {
            "parent_table_name": "mem",
            "child_table_name": "pharm",
            "parent_primary_key": "Member_ID",
            "child_foreign_key": "Member_ID"
        },
        {
            "parent_table_name": "mem",
            "child_table_name": "med",
            "parent_primary_key": "Member_ID",
            "child_foreign_key": "Member_ID"
        }
    ],
    "METADATA_SPEC_VERSION": "MULTI_TABLE_V1"
}

code snippet
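# Imports are assumed; they were not shown in the original post.
import pickle

from sdv.metadata import MultiTableMetadata
from sdv.multi_table import HSASynthesizer  # provided by SDV Enterprise
from sdv.utils import poc

# all_data: dict mapping table name -> pandas DataFrame (loaded elsewhere)
metadata = MultiTableMetadata()
metadata.detect_from_dataframes(data=all_data)
all_data = poc.drop_unknown_references(metadata, all_data)

synthesizer = HSASynthesizer(metadata)

# Use a Gaussian copula with normal marginals for every table.
for table_name in all_data.keys():
    synthesizer.set_table_parameters(
        table_name=table_name,
        table_synthesizer='GaussianCopulaSynthesizer',
        table_parameters={
            # 'enforce_min_max_values': True,
            # 'default_distribution': 'truncnorm'})
            'default_distribution': 'norm'})

synthesizer.fit(all_data)

# Save the fitted synthesizer for later sampling.
with open('hsa_syn.pickle', 'wb') as f:
    pickle.dump(synthesizer, f)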

yuntien.lee · May 13, 2024, 6:15pm

SDV Enterprise version 0.12.1
SDV Public version 1.12.1

neha · May 14, 2024, 8:35pm

Thanks @yuntien.lee. Unfortunately, the eng team is telling me that they are unable to trigger that exact warning message but we will keep trying.

One possibility is that a different version of pandas is installed on your side. pandas is a dependency of SDV, so it is worth checking which version you’re on. Could you let us know what’s printed out if you run the code below?

import pandas as pd

print('pandas version:', pd.__version__)

yuntien.lee · May 15, 2024, 3:19pm

pandas version: 2.2.1

neha · May 24, 2024, 7:17pm

Hi @yuntien.lee, appreciate all the info you were able to share.

Our team was able to replicate this, and the fix is now available in the latest SDV Enterprise version (0.13.0), released earlier this week on May 21. If you upgrade your SDV version, the warning messages should no longer appear. Try it out and let us know if you’re still having problems.
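After upgrading, one way to confirm that an installed version includes the fix is to compare version strings. The sketch below is only an illustration: `version_tuple` and `has_fix` are hypothetical helpers, not part of SDV, and they assume plain numeric versions such as `0.12.1` (no `rc` or similar suffixes). The string passed in would be the value printed for the SDV Enterprise version earlier in this thread.

```python
def version_tuple(version):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".")[:3])


def has_fix(enterprise_version, fixed_in="0.13.0"):
    """Return True if enterprise_version is at or above the fixed release."""
    return version_tuple(enterprise_version) >= version_tuple(fixed_in)


# Versions mentioned in this thread.
before_upgrade = has_fix("0.12.1")
after_upgrade = has_fix("0.13.0")
```

Comparing tuples of integers avoids the pitfall of comparing the raw strings, where `"0.9.0"` would sort after `"0.13.0"`.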
