OverflowError: Range exceeds valid bounds

Which software are you using? (SDV Community or SDV Enterprise?): SDV Community

Software Details (What is your SDV version? Python version?): SDV (latest), Python 3.12

Description

Hi team,

I have wrapped the SDV Python library as an API. When I call this API, I encounter the following error for some database tables:

  File "/usr/app/.venv/lib/python3.12/site-packages/src/api/service/sdv_service.py", line 97, in generate_synthetic_data
    model.fit(real_data)
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/single_table/base.py", line 698, in fit
    processed_data = self.preprocess(data)
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/single_table/base.py", line 634, in preprocess
    preprocess_data = self._preprocess(data)
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/single_table/base.py", line 437, in _preprocess
    self._data_processor.fit(data)
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/data_processing/data_processor.py", line 878, in fit
    self._fit_hyper_transformer(data)
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/data_processing/data_processor.py", line 812, in _fit_hyper_transformer
    self._hyper_transformer.fit(data)
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/hyper_transformer.py", line 800, in fit
    data = self._fit_field_transformer(data, field, self.field_transformers[field])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/hyper_transformer.py", line 726, in _fit_field_transformer
    data = transformer.transform(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/base.py", line 57, in wrapper
    return function(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/base.py", line 424, in transform
    transformed_data = self._transform(columns_data)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/categorical.py", line 190, in _transform
    return data_with_none.map(map_labels).astype(float)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/pandas/core/series.py", line 4675, in map
    new_values = self._map_values(func, na_action=na_action)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/pandas/core/base.py", line 1022, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/pandas/core/algorithms.py", line 1710, in map_array
    return lib.map_infer(values, mapper)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/lib.pyx", line 3071, in pandas._libs.lib.map_infer
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/categorical.py", line 188, in map_labels
    return np.random.uniform(self.intervals[label][0], self.intervals[label][1])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "numpy/random/mtrand.pyx", line 1179, in numpy.random.mtrand.RandomState.uniform
OverflowError: Range exceeds valid bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
The full traceback is attached above.

What’s puzzling:

The same code and dependency versions work perfectly in my local environment, both as an API and as a direct Python script.
In the QA environment, the error occurs for certain tables, even though the dependency versions match between local and QA.

What I’ve tried:

Ensured all environments use the same dependency versions.
Tested with the same tables and data locally and in QA.
The error only appears in QA, never locally.

Questions:

What could cause this OverflowError in SDV?
Are there known issues with environment-specific type inference (e.g., pandas or SQLAlchemy handling of PostgreSQL NUMERIC columns)?
What else should I check to resolve this discrepancy between local and QA?

Any insights or suggestions would be greatly appreciated!

Thank you.


Hi @Mariam, nice to meet you.

I have not come across such an error before. I wonder whether the data in your QA environment has slightly different properties than the data in your local environment. For example, maybe the data being read into Python somehow ends up with a different type? Are you reading from a database using SQLAlchemy?
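For context on the message itself: going by the last frames of the traceback, the exception is raised inside `np.random.uniform`, which rejects a range when `high - low` is not finite. The minimal sketch below only demonstrates that NumPy behavior in isolation. The helper name `uniform_overflows` is hypothetical, and the sketch does not show how RDT's intervals could end up with such bounds for your tables; that part remains an open question.

```python
import numpy as np


def uniform_overflows(low, high):
    """Return True if np.random.uniform rejects the (low, high) range."""
    try:
        np.random.uniform(low, high)
    except OverflowError:
        # NumPy raises OverflowError when high - low is not finite.
        return True
    return False


# An ordinary finite range is accepted.
print(uniform_overflows(0.0, 1.0))
# An infinite upper bound makes high - low infinite.
print(uniform_overflows(0.0, float("inf")))
# Two finite bounds whose difference overflows a float64.
print(uniform_overflows(-1.7e308, 1.7e308))
```

If this is what is happening, logging the interval bounds for the failing column in QA (for example, by catching the exception around `model.fit` and inspecting the column's values) might help narrow down which values differ from local.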

SDV will make the correct data-type inferences if you use the SDV-provided AI Connectors bundle when importing data from a database. If you’re reading in the data yourself, it is best to check the data types: we recommend that every column be represented as object, int64, or float64. You can use <dataframe_name>.dtypes to print out a list of your data types.
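As a rough sketch of that check, the snippet below lists columns whose dtype falls outside object, int64, and float64, and flags object columns that hold `decimal.Decimal` values (which some database drivers return for NUMERIC columns). The helper names `unexpected_dtypes` and `decimal_columns` and the sample DataFrame are hypothetical and only for illustration; they are not part of SDV.

```python
from decimal import Decimal

import pandas as pd

RECOMMENDED_DTYPES = {"object", "int64", "float64"}


def unexpected_dtypes(df):
    """Map column name -> dtype name for columns outside the recommended set."""
    return {
        col: str(dtype)
        for col, dtype in df.dtypes.items()
        if str(dtype) not in RECOMMENDED_DTYPES
    }


def decimal_columns(df):
    """List object-dtype columns containing at least one Decimal value."""
    found = []
    for col in df.columns:
        if df[col].dtype != object:
            continue
        non_null = df[col].dropna()
        if non_null.map(lambda value: isinstance(value, Decimal)).any():
            found.append(col)
    return found


# Hypothetical sample standing in for a table read from the database.
sample = pd.DataFrame({
    "id": pd.Series([1, 2], dtype="int64"),
    "score": pd.Series([1.0, 2.0], dtype="float32"),
    "amount": [Decimal("1.5"), Decimal("2.5")],
})

print(sample.dtypes)
print(unexpected_dtypes(sample))
print(decimal_columns(sample))
```

Running the same check in both local and QA on the same table should show whether the two environments really hand SDV identical types. Columns flagged here could then be converted with `astype("float64")` before calling `fit`, assuming the loss of decimal precision is acceptable for your data.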