Hello,
I get the error below during HSASynthesizer.fit():
Traceback (most recent call last):
  File "C:\Users\anna.popovychenko\project\sdv_trial\gen.py", line 74, in
    synthesizer.fit(real_data)
  File "C:\Users\anna.popovychenko.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\multi_table\base.py", line 364, in fit
    processed_data = self.preprocess(data)
  File "packaging\sdv_enterprise\sdv\multi_table\hsa\hsa.pyx", line 75, in sdv_enterprise.sdv.multi_table.hsa.hsa.expirable.wrapper
  File "packaging\sdv_enterprise\sdv\multi_table\hsa\hsa.pyx", line 161, in sdv_enterprise.sdv.multi_table.hsa.hsa.HSASynthesizer.preprocess
  File "C:\Users\anna.popovychenko.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\multi_table\base.py", line 319, in preprocess
    self._assign_table_transformers(synthesizer, table_name, table_data)
  File "C:\Users\anna.popovychenko.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\multi_table\base.py", line 233, in _assign_table_transformers
    synthesizer.auto_assign_transformers(table_data)
  File "C:\Users\anna.popovychenko.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\single_table\base.py", line 277, in auto_assign_transformers
    self._data_processor.prepare_for_fitting(data)
  File "C:\Users\anna.popovychenko.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\data_processing\data_processor.py", line 721, in prepare_for_fitting
    self._fit_formatters(data)
  File "C:\Users\anna.popovychenko.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\data_processing\data_processor.py", line 696, in _fit_formatters
    self.formatters[column_name].learn_format(data[column_name])
  File "C:\Users\anna.popovychenko.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\data_processing\numerical_formatter.py", line 95, in learn_format
    self._rounding_digits = self._learn_rounding_digits(column)
  File "C:\Users\anna.popovychenko.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\data_processing\numerical_formatter.py", line 59, in _learn_rounding_digits
    roundable_data = data[~(np.isinf(data) | pd.isna(data))]
TypeError: ufunc 'isinf' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I added debug logs and found the name of the column that triggers it. The column type is int and it has no null values. What could be the reason?
Hello @anna.popovychenko, could you make sure that the column has dtype float or int? The column may contain an element that is not numerical, in which case the numerical check in the traceback (np.isinf) cannot cast it.
I would suggest casting it to float or int before fitting the model and checking whether that raises any errors, for example:
data[column_name] = data[column_name].astype(float)
or
data[column_name] = data[column_name].astype(int)
If neither of those works, please feel free to provide the pandas dtype and the sdtype of the column so we can assist you.
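If the cast fails, it can help to see which values are responsible. The following is only a minimal sketch, not part of SDV: the helper name find_non_numeric and the sample column "amount" are hypothetical, and it assumes the problem is a stray non-numeric entry in an otherwise numeric column.

```python
import pandas as pd


def find_non_numeric(column: pd.Series) -> pd.Series:
    """Return the entries of `column` that cannot be parsed as numbers."""
    # errors="coerce" turns anything unparseable into NaN instead of raising
    coerced = pd.to_numeric(column, errors="coerce")
    # Keep values that were present originally but became NaN after coercion
    return column[coerced.isna() & column.notna()]


# Hypothetical data: one stray string makes the whole column dtype "object"
data = pd.DataFrame({"amount": [1, 2, "n/a", 4]})
print(data["amount"].dtype)
print(find_non_numeric(data["amount"]))
```

An empty result means every value parses as a number, so the column's dtype itself (rather than a bad value) is the more likely cause.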
I would recommend removing the second line (the convert_dtypes command), and perhaps loading multiple CSVs at once for convenience:
from sdv.datasets.local import load_csvs
# load multiple CSV files from within a folder
real_data = load_csvs(folder_name='my_folder/')
# the data is ready for training
# no modifications should be needed
Just an update to everyone on this issue: The team is actively looking into dtypes to expand support.
Until then, we recommend keeping the original numpy dtypes and not applying any other optimization. SDV is designed to work best after simply reading your data from its original source (CSV, etc.).
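To check whether a table still uses plain numpy dtypes, something along these lines may be useful. This is a rough sketch under assumptions: the helper extension_dtype_columns is hypothetical (it is not an SDV function), and it simply reports columns backed by pandas extension dtypes such as the ones convert_dtypes produces.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def extension_dtype_columns(table: pd.DataFrame) -> dict:
    """Map column name -> dtype name for columns not backed by a numpy dtype."""
    return {
        name: str(dtype)
        for name, dtype in table.dtypes.items()
        if isinstance(dtype, ExtensionDtype)
    }


# Hypothetical table as it would look right after reading from a CSV
raw = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]})
converted = raw.convert_dtypes()  # the step recommended against above

print(extension_dtype_columns(raw))        # plain numpy dtypes: nothing reported
print(extension_dtype_columns(converted))  # "id" is reported as Int64
```

Running the check on each table before calling fit gives a quick list of columns to leave in, or revert to, their original dtype.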
Closing this issue out since it has been resolved. But please feel free to start a new topic (perhaps a feature request) for supporting additional sdtypes specifically.