Hello,
I’m trying to apply the constraint below to two datetime columns:
```python
subtr_close_date_ineq_constr = {
    'constraint_class': 'Inequality',
    'table_name': 'SubtrajectData',
    'constraint_parameters': {
        'low_column_name': 'openingsdatum',  # opening_date
        'high_column_name': 'afsluitdatum',  # close_date
        'strict_boundaries': False
    }
}
```
Some of the values in the “afsluitdatum” column are missing (shown as <NA>).
Here is a sample of those two columns:
```
   openingsdatum afsluitdatum
0     2022-08-30   2022-09-09
1     2020-07-28   2021-02-16
2     2022-03-05   2022-04-26
3     2020-06-16   2020-06-16
4     2021-06-05   2021-06-07
5     2020-07-26   2020-08-10
6     2023-01-24         <NA>
7     2021-07-28   2021-08-09
8     2020-05-13   2020-07-13
9     2021-09-28         <NA>
10    2020-01-27   2021-02-21
11    2022-12-19   2023-01-16
12    2019-07-24   2019-10-09
13    2020-06-27         <NA>
14    2020-11-04   2021-01-15
15    2022-06-08   2022-06-21
16    2022-11-28   2022-11-28
17    2019-10-20   2022-02-14
18    2022-04-20   2022-05-05
19    2022-12-24   2023-01-23
```
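Side note on the printout: pandas shows `<NA>` for its nullable missing value `pd.NA` (used by dtypes such as `string`), while a genuine `datetime64` column shows missing values as `NaT`. So it is possible, though this is only an assumption, that these columns are not stored as datetimes at all. A small sketch to check which marker a column uses (`describe_missing` is a hypothetical helper, not part of SDV):

```python
import pandas as pd


def describe_missing(series: pd.Series) -> str:
    """Report a column's dtype and which missing-value marker it uses (hypothetical helper)."""
    missing = series[series.isna()]
    if missing.empty:
        return f"{series.dtype}: no missing values"
    # pd.NA is a singleton, so an identity check is reliable
    marker = "pd.NA" if missing.iloc[0] is pd.NA else repr(missing.iloc[0])
    return f"{series.dtype}: missing values are {marker}"


# Assumed example: the same close dates stored two different ways
nullable = pd.Series(["2022-09-09", None], dtype="string")
parsed = pd.Series(pd.to_datetime(["2022-09-09", None]))
print(describe_missing(nullable))
print(describe_missing(parsed))
```

Running `describe_missing(real_data["SubtrajectData"]["afsluitdatum"])` on the real table would show which case applies.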
The relevant part of the metadata:
```
"columns": {
    "openingsdatum": {
        "sdtype": "datetime",
        "datetime_format": "%Y-%m-%d"
    },
    "afsluitdatum": {
        "sdtype": "datetime",
        "datetime_format": "%Y-%m-%d"
    },
```
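If the columns do turn out to hold strings with `pd.NA`, one possible workaround (an assumption on my side, not something the SDV docs prescribe) is to parse them into real datetimes using the same `datetime_format` as the metadata before calling `fit`, so that missing values become `NaT`. A minimal sketch, with the hypothetical helper name `to_datetime_columns`:

```python
import pandas as pd

DATE_COLUMNS = ["openingsdatum", "afsluitdatum"]  # columns declared as datetime in the metadata
DATETIME_FORMAT = "%Y-%m-%d"                       # same as "datetime_format" in the metadata


def to_datetime_columns(table: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the table with the date columns parsed to datetime64 (hypothetical helper)."""
    converted = table.copy()
    for column in DATE_COLUMNS:
        # errors="coerce" turns missing values (pd.NA / None) into NaT.
        # Caveat: it would also silently turn malformed date strings into NaT.
        converted[column] = pd.to_datetime(
            converted[column], format=DATETIME_FORMAT, errors="coerce"
        )
    return converted
```

In the script this would be applied as `real_data["SubtrajectData"] = to_datetime_columns(real_data["SubtrajectData"])` right before `synthesizer.fit(real_data)`. I haven't verified that this alone satisfies SDV's validation.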
As a result, I get the following error:
```
Step 2: Train synthesizer ...
Traceback (most recent call last):
  File "C:\Users\anna.popovychenko\project\sdv_trial\fit.py", line 159, in <module>
    synthesizer.fit(real_data)
  File "C:\Users\anna.popovychenko\.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\multi_table\base.py", line 364, in fit
    processed_data = self.preprocess(data)
  File "packaging\\sdv_enterprise\\sdv\\multi_table\\hsa\\hsa.pyx", line 75, in sdv_enterprise.sdv.multi_table.hsa.hsa.expirable.wrapper
  File "packaging\\sdv_enterprise\\sdv\\multi_table\\hsa\\hsa.pyx", line 161, in sdv_enterprise.sdv.multi_table.hsa.hsa.HSASynthesizer.preprocess
  File "C:\Users\anna.popovychenko\.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\multi_table\base.py", line 308, in preprocess
    self.validate(data)
  File "C:\Users\anna.popovychenko\.pyenv\pyenv-win\versions\3.9.13\lib\site-packages\sdv\multi_table\base.py", line 222, in validate
    raise InvalidDataError(errors)
sdv.errors.InvalidDataError: The provided data does not match the metadata:
boolean value of NA is ambiguous
```
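For reference, the last line is pandas' own message for `bool(pd.NA)`. The snippet below does not show what SDV does internally. It only sketches one way pandas produces exactly this message, assuming the dates are stored with the nullable `string` dtype: comparing a present date with a missing one yields `<NA>`, and any code that then needs a plain True/False fails.

```python
import pandas as pd


def reproduce_message() -> str:
    """Reproduce the pandas TypeError seen at the end of the traceback (assumed data)."""
    # Mirrors row 6 of the sample: opening date present, close date missing
    low = pd.Series(["2023-01-24"], dtype="string")
    high = pd.Series([None], dtype="string")

    comparison = low <= high  # nullable boolean result; this row stays <NA>
    try:
        bool(comparison.iloc[0])  # forcing <NA> into True/False raises TypeError
    except TypeError as exc:
        return str(exc)
    return "no error"


print(reproduce_message())  # boolean value of NA is ambiguous
```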
But the docs say that the Inequality constraint ignores missing values (Inequality - Synthetic Data Vault (sdv.dev)).
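My understanding of “ignores missing values” is that a row only has to satisfy the inequality when both dates are present. A minimal sketch of such a row-wise check (the name `inequality_holds` is hypothetical, not SDV's API; `strict` mirrors the `strict_boundaries` parameter):

```python
import pandas as pd


def inequality_holds(table: pd.DataFrame, low: str, high: str, strict: bool = False) -> pd.Series:
    """Row-wise low < high (strict) or low <= high, treating rows with a missing date as valid."""
    # Parse to datetime64 so missing values become NaT instead of pd.NA
    low_values = pd.to_datetime(table[low], errors="coerce")
    high_values = pd.to_datetime(table[high], errors="coerce")
    # Rows where either date is missing are ignored (counted as valid)
    missing = low_values.isna() | high_values.isna()
    # With datetime64 columns, comparisons involving NaT give False rather than raising
    satisfied = low_values < high_values if strict else low_values <= high_values
    return missing | satisfied


# Rows 3 and 6 from the sample above
sample = pd.DataFrame({
    "openingsdatum": ["2020-06-16", "2023-01-24"],
    "afsluitdatum": ["2020-06-16", None],
})
print(inequality_holds(sample, "openingsdatum", "afsluitdatum").tolist())               # [True, True]
print(inequality_holds(sample, "openingsdatum", "afsluitdatum", strict=True).tolist())  # [False, True]
```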
Could you help with this? Or is this expected behavior?
Thanks