[Resolved] New way to add constraints

Hey, I’d like to ask a quick question about the new way to add constraints.

I used to add constraints using a dictionary, and I saw the note saying that I have to use sdv.cag (which I do now), but some of the constraints are no longer supported: Positive, Negative, …

I usually work in a Jupyter Notebook and often re-run cells multiple times. When adding a constraint for the first time it works great, but when I add the same constraint again, it says that some columns are missing.

For example:
I have a “patient” table with “height” and “weight”. I wanted to add the Inequality constraint, so I followed the tutorial:

from sdv.cag import Inequality

my_constraint = Inequality(
    table_name="patient",
    low_column_name="weight",
    high_column_name="height",
    strict_boundaries=True,
)

synthesizer.add_constraints([my_constraint])

When I run it a second time I get this error:

Table ‘patient’ is missing columns ‘weight’, ‘height’.

My question is: am I doing something wrong? Is there no room for duplication, then?

Cheers,
Charles

Hi @epicvu, thanks for reaching out.

some of the constraints are no longer supported: Positive, Negative, …

Yes, the Positive, Negative, ScalarRange, and ScalarInequality constraints are no longer supported in the CAG framework because the same functionality can be achieved using other SDV features. For more information about the recommended usage, click here.

If there is anything specific you were hoping to achieve with these particular constraints, please let us know. We’d be happy to provide you with the recommended approach. (I would recommend starting a new thread so as to keep it separate from your other Q.)

Regarding the following Q:

When adding a constraint for the first time it works great, but when I add the same constraint again, it says that some columns are missing.

Thanks for the code example. I was able to replicate this error on multi-table synthesizers. I will check in with our engineering team and get back to you!

There is no room for duplication then ?

In the meantime, I’d love to understand the use case more. I understand that you may be re-running a cell in the notebook, and that this error can be a usability concern when doing so. However, adding the same constraint multiple times isn’t really recommended because (a) it will ultimately have the same effect on the synthetic data as just adding the constraint once, and (b) it might actually hurt the performance of your synthesizer since it’s now trying to adhere to multiple constraints.

So I’m curious what you mean by “room for duplication”? Would you be able to expand on that?

Hello!
It’s not really about duplication. I wanted to know whether adding the same constraint twice is considered to be 2 unique constraints, or whether I should have some sort of checkpoint before adding a constraint: call get_constraints and compare the constraint I want to add against the list it returns?

Hi @epicvu, I have confirmed that what you are seeing is, indeed, a bug. Even in the case of duplication, the synthesizer shouldn’t crash (it may just get slower). The root cause has been filed and we are hoping to have the fix in an upcoming release.

That being said, I would still recommend having some sort of checkpoint before adding the constraint, as it may help with performance and avoid unnecessary computations.
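One way to implement such a checkpoint is to fingerprint each constraint by its class and parameters, and only forward constraints whose fingerprint hasn’t been seen before. This is a minimal, hedged sketch: `constraint_key` and `ConstraintCheckpoint` are hypothetical helper names, not part of the SDV API, and the wrapper simply forwards to whatever add_constraints method your synthesizer exposes.

```python
# Hypothetical "checkpoint" pattern: skip constraints that were already added.
# None of these names come from SDV; they only illustrate the bookkeeping.

def constraint_key(constraint):
    """Hashable fingerprint: class name plus sorted (attribute, value) pairs."""
    return (type(constraint).__name__,
            tuple(sorted(vars(constraint).items())))

class ConstraintCheckpoint:
    """Remembers which constraint fingerprints were already registered."""

    def __init__(self, synthesizer):
        self.synthesizer = synthesizer
        self.seen = set()

    def add_constraints(self, constraints):
        # Keep only constraints whose fingerprint we haven't seen yet.
        fresh = [c for c in constraints
                 if constraint_key(c) not in self.seen]
        self.seen.update(constraint_key(c) for c in fresh)
        if fresh:
            self.synthesizer.add_constraints(fresh)
        return fresh  # what was actually forwarded
```

With this wrapper, re-running a notebook cell that builds and adds the same Inequality constraint is a no-op the second time, because the new object has the same fingerprint as the one already registered.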

Other notes:

  • I think you’re technically adding the logic as 2 separate constraints, as each time you re-run your Jupyter notebook cell, it creates a new constraint object and adds it to the synthesizer. The synthesizer will separately consider every constraint object you give it – even if the logic is the same.
  • Adding the same constraint object would mean creating the constraint only once and then passing it to add_constraints multiple times. This is something SDV could proactively check for and warn you against – I will file this feedback.
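To make the first point above concrete: two objects built from identical parameters are still distinct Python objects, which is why re-running a cell registers a second constraint. The `Inequality` class below is a plain stand-in defined locally for illustration, not the real sdv.cag import.

```python
# Stand-in for sdv.cag.Inequality, just to illustrate object identity.
class Inequality:
    def __init__(self, table_name, low_column_name, high_column_name):
        self.table_name = table_name
        self.low_column_name = low_column_name
        self.high_column_name = high_column_name

first = Inequality("patient", "weight", "height")
second = Inequality("patient", "weight", "height")  # re-running the cell

print(first is second)               # False: two distinct objects
print(vars(first) == vars(second))  # True: identical parameters
```

Because the synthesizer tracks constraint objects, not their parameters, `first` and `second` count as two constraints even though their logic is the same.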

Hi @neha!
Thank you for the answer I’ll make sure to make some sort of checkpoint before adding the constraints.


Confirming that this bug was fixed in Version 0.31.0. I’m marking this thread as resolved!