[Resolved] Add a constraint for null values (that occur together)

Let’s use this topic to discuss this Q from @yuntien.lee:

Can you also advise how to set constraints e.g. if AdmitType is null then AdmitDate is null?

Hi @yuntien.lee I started this thread to specifically advise you on this custom constraint. To make sure we provide the right guidance, I would like to clarify one case:

You mention that if AdmitType is null then AdmitDate must also be null. Does this inverse logic also apply – if AdmitType is NOT null then AdmitDate must also be NOT null?

I am also curious if this type of logic is only present for AdmitType and AdmitDate columns or whether you have other instances of very similar logic (all values must be null together, or not at all) for different groups of columns in your dataset?

No, the inverse does not apply, ie if AdmitType is NOT null then AdmitDate can sometimes be NULL due to data quality, but in the source data there are certain rows with AdmitType null but AdmitDate NOT null so we cleaned the data for that.
Also the same rule applies for, if DischargeStatus is null then DischDate will be null.

Got it. So to generalize this a bit, would it be fair to say: Whatever combinations of null values appear in the real data should be the only combinations that appear in the synthetic data?

So for the AdmitType, AdmitDate columns, only 3 of the 4 possible combinations are allowed:

AdmitType AdmitDate
Not Null Not Null
Not Null Null
Null Null

Yes that should be correct according to our understanding.

Following up on this thread: We added the FixedNullCombinations constraint in version 0.28.0. This constraint is able to accomplish the logic, so I’m marking this thread as resolved.