Which software are you using? (SDV Community or SDV Enterprise?): SDV Community
Software Details (What is your SDV version? Python version?): SDV latest, Python 3.12
Description
Hi team,
I have wrapped the SDV Python library as an API. When I call this API, I encounter the following error for some database tables:
  File "/usr/app/.venv/lib/python3.12/site-packages/src/api/service/sdv_service.py", line 97, in generate_synthetic_data
    model.fit(real_data)
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/single_table/base.py", line 698, in fit
    processed_data = self.preprocess(data)
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/single_table/base.py", line 634, in preprocess
    preprocess_data = self._preprocess(data)
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/single_table/base.py", line 437, in _preprocess
    self._data_processor.fit(data)
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/data_processing/data_processor.py", line 878, in fit
    self._fit_hyper_transformer(data)
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/data_processing/data_processor.py", line 812, in _fit_hyper_transformer
    self._hyper_transformer.fit(data)
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/hyper_transformer.py", line 800, in fit
    data = self._fit_field_transformer(data, field, self.field_transformers[field])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/hyper_transformer.py", line 726, in _fit_field_transformer
    data = transformer.transform(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/base.py", line 57, in wrapper
    return function(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/base.py", line 424, in transform
    transformed_data = self._transform(columns_data)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/categorical.py", line 190, in _transform
    return data_with_none.map(map_labels).astype(float)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/pandas/core/series.py", line 4675, in map
    new_values = self._map_values(func, na_action=na_action)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/pandas/core/base.py", line 1022, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/pandas/core/algorithms.py", line 1710, in map_array
    return lib.map_infer(values, mapper)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/lib.pyx", line 3071, in pandas._libs.lib.map_infer
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/categorical.py", line 188, in map_labels
    return np.random.uniform(self.intervals[label][0], self.intervals[label][1])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "numpy/random/mtrand.pyx", line 1179, in numpy.random.mtrand.RandomState.uniform
OverflowError: Range exceeds valid bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
The full traceback is attached above.
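As far as I can tell, the last frame is NumPy's legacy `np.random.uniform`, which raises an OverflowError when `high - low` is not a finite number. Below is a minimal sketch that triggers the same kind of error outside SDV. The bound values are made up for illustration and are not taken from my data, `try_uniform` is just a hypothetical helper name, and the exact message text may vary between NumPy versions.

```python
import numpy as np


def try_uniform(low, high):
    """Return the OverflowError message from np.random.uniform, or None if it succeeds."""
    try:
        np.random.uniform(low, high)
    except OverflowError as exc:
        return str(exc)
    return None


# Ordinary finite bounds: no error expected.
print(try_uniform(0.0, 1.0))

# An infinite bound makes high - low infinite.
print(try_uniform(0.0, float("inf")))

# A NaN bound makes high - low NaN, which is also not finite.
print(try_uniform(float("nan"), 1.0))

# Both bounds are finite, but their difference overflows float64.
print(try_uniform(-1e308, 1e308))
```

If this reading is right, the interval stored for at least one category label in the QA run would have to contain an inf, a NaN, or an extremely large value, but I have not confirmed that.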
What’s puzzling:
The same code and dependency versions work perfectly in my local environment, both as an API and as a direct Python script.
In the QA environment, the error occurs for certain tables, even though the dependency versions match between local and QA.
What I’ve tried:
Ensured all environments use the same dependency versions.
Tested with the same tables and data locally and in QA.
The error only appears in QA, never locally.
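To check whether the DataFrame passed to `model.fit` really is identical in both environments, I could run something like the sketch below in local and QA and compare the output. `summarize_frame` is a hypothetical helper of my own, not part of SDV, and the example DataFrame is invented purely to show the output shape.

```python
import numpy as np
import pandas as pd


def summarize_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, null count, and count of non-finite numeric values."""
    rows = []
    for name in df.columns:
        col = df[name]
        non_finite = 0
        is_numeric = pd.api.types.is_numeric_dtype(col)
        is_bool = pd.api.types.is_bool_dtype(col)
        if is_numeric and not is_bool:
            # NaN counts as null; what remains can still hold +/- inf.
            values = col.dropna().to_numpy(dtype="float64")
            non_finite = int((~np.isfinite(values)).sum())
        rows.append(
            {
                "column": name,
                "dtype": str(col.dtype),
                "nulls": int(col.isna().sum()),
                "non_finite": non_finite,
            }
        )
    return pd.DataFrame(rows).set_index("column")


# Invented example data, only to show what the summary looks like.
example = pd.DataFrame(
    {"amount": [1.5, np.inf, None], "label": ["a", "b", None]}
)
summary = summarize_frame(example)
print(summary)
```

A column that shows up as a float dtype in one environment and as object in the other (for example, if NUMERIC values arrive as Decimal objects) would be one difference worth looking at, though that is only a guess at this point.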
Questions:
What could cause this OverflowError in SDV?
Are there known issues with environment-specific type inference (e.g., pandas or SQLAlchemy handling of PostgreSQL NUMERIC columns)?
What else should I check to resolve this discrepancy between local and QA?
Any insights or suggestions would be greatly appreciated!
Thank you.