This question was originally filed here by @rizwan. I’m separating it out into a new thread so that we can discuss it specifically.
Which software are you using? SDV Enterprise
Software Details SDV 0.44, Python 3.13
Description
How can I combine several columns to generate the value of another column when data is synthesized?
For example, the Employee table has Name, EmployeeID, and SalaryGrid columns. There is another column, EmployeeKey (the primary key), which is a combination of these three columns in the form ‘Name EmployeeID ! SalaryGrid’, for example ‘John 151 ! 5’. How do I define this relationship in the metadata?
If I’m understanding correctly, the EmployeeKey value can be described by a formula that combines the Name, EmployeeID, and SalaryGrid columns. Is that correct? Something like <Name> <EmployeeID> ! <SalaryGrid>?
This generally isn’t something you’d specify in the metadata. The recommended approach is to program your own constraint that (a) removes the EmployeeKey column before modeling, since it does not need to be learned separately when it can be created programmatically from the other columns, and (b) re-creates the column after the synthetic data is generated. In fact, the “Program your own constraint” tutorial does something very similar.
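As a rough illustration of steps (a) and (b), here is a minimal sketch in plain pandas. It does not use the SDV constraint API: the helper names `drop_key` and `add_key` are hypothetical, the sample rows are made up for the example, and the synthesizer step is replaced by a stand-in copy of the training data. It also assumes EmployeeID and SalaryGrid are stored as integers, so that converting them to strings gives `151` rather than `151.0`.

```python
import pandas as pd

KEY_COLUMN = "EmployeeKey"


def drop_key(real_data: pd.DataFrame) -> pd.DataFrame:
    """Step (a): remove the derived key so it is not modeled separately."""
    return real_data.drop(columns=[KEY_COLUMN])


def add_key(synthetic_data: pd.DataFrame) -> pd.DataFrame:
    """Step (b): rebuild EmployeeKey as '<Name> <EmployeeID> ! <SalaryGrid>'."""
    result = synthetic_data.copy()
    result[KEY_COLUMN] = (
        result["Name"].astype(str)
        + " "
        + result["EmployeeID"].astype(str)
        + " ! "
        + result["SalaryGrid"].astype(str)
    )
    return result


# Made-up rows that follow the format described in the question.
employees = pd.DataFrame({
    "Name": ["John", "Maria"],
    "EmployeeID": [151, 152],
    "SalaryGrid": [5, 3],
    "EmployeeKey": ["John 151 ! 5", "Maria 152 ! 3"],
})

training_data = drop_key(employees)

# A synthesizer would be fit on training_data and sampled here.
# The copy below is only a stand-in for the sampled rows.
synthetic_data = training_data.copy()

rebuilt = add_key(synthetic_data)
print(rebuilt["EmployeeKey"].tolist())
```

Inside a programmed constraint, the same two helpers would map onto the "before modeling" and "after sampling" hooks, but the exact hook names depend on the SDV version, so check them against the tutorial.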
However, this might be trickier because you mention that the EmployeeKey column is a primary key. Does this mean that other tables refer to the EmployeeKey (using foreign keys)?
The DataCebo team will be able to help you out with the programmable constraint. To better understand this case, I am curious: Is the EmployeeID also unique throughout the table? (i.e. is it similar to a primary key in that it uniquely identifies each record)
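If it helps, one quick way to check this on the real table is sketched below in pandas. The function name `is_unique_identifier` and the sample rows are hypothetical. The idea is that the key embeds the EmployeeID, so in the ordinary case (integer IDs, no ‘ ! ’ inside names) a unique, non-missing EmployeeID should also make the rebuilt EmployeeKey unique.

```python
import pandas as pd


def is_unique_identifier(table: pd.DataFrame, column: str) -> bool:
    """Return True if the column has no missing values and no duplicates."""
    values = table[column]
    return bool(values.notna().all() and values.is_unique)


# Made-up rows: the second table repeats an EmployeeID on purpose.
unique_ids = pd.DataFrame({"EmployeeID": [151, 152, 153]})
repeated_ids = pd.DataFrame({"EmployeeID": [151, 151, 153]})

print(is_unique_identifier(unique_ids, "EmployeeID"))    # True
print(is_unique_identifier(repeated_ids, "EmployeeID"))  # False
```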
Hi @neha, thank you for the answer and for your suggestion to apply a formula after the data is synthesized. To answer your question: yes, the EmployeeID is also unique.