OverflowError: Range exceeds valid bounds

Which software are you using? (SDV Community or SDV Enterprise?): SDV Community

Software Details (What is your SDV version? Python version?): SDV latest, Python 3.12

Description

Hi team,

I have wrapped the SDV Python library as an API. When I call this API, I encounter the following error for some database tables:

  File "/usr/app/.venv/lib/python3.12/site-packages/src/api/service/sdv_service.py", line 97, in generate_synthetic_data
    model.fit(real_data)
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/single_table/base.py", line 698, in fit
    processed_data = self.preprocess(data)
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/single_table/base.py", line 634, in preprocess
    preprocess_data = self._preprocess(data)
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/single_table/base.py", line 437, in _preprocess
    self._data_processor.fit(data)
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/data_processing/data_processor.py", line 878, in fit
    self._fit_hyper_transformer(data)
  File "/usr/app/.venv/lib/python3.12/site-packages/sdv/data_processing/data_processor.py", line 812, in _fit_hyper_transformer
    self._hyper_transformer.fit(data)
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/hyper_transformer.py", line 800, in fit
    data = self._fit_field_transformer(data, field, self.field_transformers[field])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/hyper_transformer.py", line 726, in _fit_field_transformer
    data = transformer.transform(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/base.py", line 57, in wrapper
    return function(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/base.py", line 424, in transform
    transformed_data = self._transform(columns_data)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/categorical.py", line 190, in _transform
    return data_with_none.map(map_labels).astype(float)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/pandas/core/series.py", line 4675, in map
    new_values = self._map_values(func, na_action=na_action)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/pandas/core/base.py", line 1022, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/app/.venv/lib/python3.12/site-packages/pandas/core/algorithms.py", line 1710, in map_array
    return lib.map_infer(values, mapper)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/lib.pyx", line 3071, in pandas._libs.lib.map_infer
  File "/usr/app/.venv/lib/python3.12/site-packages/rdt/transformers/categorical.py", line 188, in map_labels
    return np.random.uniform(self.intervals[label][0], self.intervals[label][1])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "numpy/random/mtrand.pyx", line 1179, in numpy.random.mtrand.RandomState.uniform
OverflowError: Range exceeds valid bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
The full traceback is attached above.

What’s puzzling:

The same code and dependency versions work perfectly in my local environment, both as an API and as a direct Python script.
In the QA environment, the error occurs for certain tables, even though the dependency versions match between local and QA.

What I’ve tried:

Ensured all environments use the same dependency versions.
Tested with the same tables and data locally and in QA.
The error only appears in QA, never locally.

Questions:

What could cause this OverflowError in SDV?
Are there known issues with environment-specific type inference (e.g., pandas or SQLAlchemy handling of PostgreSQL NUMERIC columns)?
What else should I check to resolve this discrepancy between local and QA?

Any insights or suggestions would be greatly appreciated!

Thank you.


Hi @Mariam, nice to meet you.

I have not come across such an error before. I wonder whether the data in your QA environment has slightly different properties than your local environment. For example, maybe the data that is being read into Python is somehow a different type? Are you reading from a database using SQLAlchemy?

SDV will make the correct data-type inferences if you use the SDV-provided AI Connectors bundle when importing data from a database. If you’re reading in the data yourself, it might be best to check the data types – we recommend every column should be represented as either object, int64, or float64. You can use <dataframe_name>.dtypes to print out a list of your data types.
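
A minimal sketch of the dtype check suggested above, assuming the table has already been loaded into a pandas DataFrame. The helper name `unexpected_dtypes` and the example columns are hypothetical; only `.dtypes` and the three recommended dtypes come from the reply.

```python
import pandas as pd

# The three dtypes recommended in the reply above.
RECOMMENDED_DTYPES = {"object", "int64", "float64"}


def unexpected_dtypes(df: pd.DataFrame) -> dict:
    """Return {column: dtype} for every column whose dtype is not recommended."""
    return {
        column: str(dtype)
        for column, dtype in df.dtypes.items()
        if str(dtype) not in RECOMMENDED_DTYPES
    }


# Hypothetical stand-in for a table read from the database.
example = pd.DataFrame({
    "id": pd.Series([1, 2, 3], dtype="int32"),
    "amount": pd.Series([1.5, 2.5, None], dtype="float64"),
    "label": pd.Series(["a", "b", None], dtype="object"),
})

print(example.dtypes)              # full list of dtypes
print(unexpected_dtypes(example))  # only the columns worth a closer look
```

Running the same check on the same table in both local and QA, and comparing the two outputs, is one way to see whether the environments load the data differently.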

This one is resolved. I don’t know how, since I haven’t made any changes, but I wonder why I got it in the first place; it was there for a few days. Could it be related to how we read the data? I mean, when something like this happens, is it because of a data type issue?
When I tried to understand it, I found that the error comes from the numpy library, which has a check on the difference between the interval bounds. Could the issue be caused by null values? I don’t understand the reason, but it is working now.

To answer the question you asked: yes, I am using SQLAlchemy to read from the database.
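
For reference, the numpy check mentioned above can be reproduced in isolation. This is only a sketch of when `np.random.uniform` raises this error (when `high - low` is not a finite number); it does not show how such bounds would end up in the transformer's intervals in QA. The helper name `uniform_overflows` is hypothetical.

```python
import numpy as np


def uniform_overflows(low: float, high: float) -> bool:
    """Return True if np.random.uniform(low, high) raises OverflowError."""
    try:
        np.random.uniform(low, high)
    except OverflowError:
        return True
    return False


# Ordinary finite interval: no error.
print(uniform_overflows(0.0, 1.0))
# Infinite bound: the interval width is inf.
print(uniform_overflows(0.0, np.inf))
# Both bounds finite, but their difference overflows float64 to inf.
print(uniform_overflows(-1.7e308, 1.7e308))
# NaN bound: the interval width is NaN, which is also not finite.
print(uniform_overflows(0.0, np.nan))
```

So the error points at an interval bound that is infinite, NaN, or extremely large, rather than at null values in the column as such.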

Hi @Mariam,

SDV Community is designed to be able to handle null values, even in edge cases where a column is mostly or completely null, so I don’t think that should be the issue.

I’m glad it is resolved! If you’d still like to continue investigating:

Is it possible that your QA environment is set up slightly differently than your other environments? For example, maybe the version of the database is different, you have a different driver installed for SQLAlchemy, or perhaps there are subtle differences in the database types (e.g. INTEGER vs BIGINT)?

One other thing to check is SDV Metadata. Were you auto-detecting it every time (e.g. once from the QA environment, and another time in the local environment)? Differences in the metadata may be related to why it was only happening in QA. I would recommend detecting the metadata just once, manually verifying it, and saving it. That way, you always have one consistent metadata to use for any environment.

If you’re able to share your metadata and the dtypes of the loaded data, I can dig into this more. (We will never ask to see your actual data! Just metadata/dtypes.)