Hello! As mentioned on Slack, I am currently working with DayZSynthesizer. I set locales = fr_FR in the synthesizer settings for 2 columns (country code and address). The addresses seem to have French names, but the country code is not always FR. Even when I add the address relationship, it does not seem to register. I am wondering whether the “locales” setting has any effect on the country code sdtype.
Bests,
Charles VU
Hi @epicvu, thanks for filing the issue here.
As the DayZSynthesizer is meant to create random training data, we don’t yet support column relationships such as addresses. Our team can add this feature request. May I ask if you have a use case for this? Or are you mostly exploring SDV Enterprise capabilities at the moment?
If you have a more involved use case that requires column relationships, constraints, etc., I would recommend getting some training data and using HSASynthesizer.
Hi @neha, I was just exploring SDV Enterprise’s capabilities for creating fake but realistic data. But for future use cases, let’s say I have a patient information table with one column for the address and another for the country code. I wanted to test whether I could create “realistic” data (I don’t have real data for training) and see if I could add this “logic” to my data generation.
Hi @epicvu, got it, thanks for the context. The recommended way to supply an address is to add it as a column_relationship in your metadata. See Python API. The resulting metadata JSON will look something like this:
"column_relationships": [{
"type": "address",
"column_names": ["country_code", "street_address"]
}]
Given this information, other synthesizers should be able to create valid rows where the address and country code align with the given locale. Unfortunately, this is not yet available for DayZSynthesizer, so we will add it as a feature request.
Since you are only considering locale fr_FR, the only possible country code value should be "FR". For this case, a simple workaround would be to override the synthetic data column to be "FR":
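# this example assumes you're using single table DayZSynthesizer
# and that `metadata` is already defined (import path assumed from the SDV Enterprise single-table API)
from sdv.single_table import DayZSynthesizer
synthesizer = DayZSynthesizer(metadata, locales=['fr_FR'])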
synthetic_data = synthesizer.sample(num_rows=100)
# override the column
synthetic_data['country_code'] = 'FR'
You can use this while the feature is still being developed.
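If more than one locale ever comes into play, the override value could be derived from the locale string instead of being hard-coded. Here is a minimal sketch, assuming locales follow the `language_TERRITORY` pattern (as in `fr_FR`) and that the country code column holds two-letter codes. `country_code_from_locale` is a hypothetical helper, not part of SDV:

```python
def country_code_from_locale(locale: str) -> str:
    """Return the territory part of a locale such as 'fr_FR'.

    Hypothetical helper for illustration; not an SDV function.
    """
    language, sep, territory = locale.partition('_')
    if not sep or not territory:
        raise ValueError(f"expected a locale like 'fr_FR', got {locale!r}")
    return territory.upper()


# Usage with the workaround above (synthetic_data is the sampled DataFrame):
# synthetic_data['country_code'] = country_code_from_locale('fr_FR')
```

This only makes sense when a single locale is passed to the synthesizer. With several locales there is no way to know which one produced a given address, so a per-row override would need the planned feature.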
Hi @neha, yeah, I felt like doing that as the workaround since it’s all French addresses. Thanks for the confirmation and for the help!
Hi @epicvu following up on this: We’ve created a feature request internally to track this and hope to include the following behavior in a future release:
Behavior: It should be possible for a DayZSynthesizer to create realistic data when a column relationship is specified in the metadata. The relationship should follow the locale. E.g. if you add an address relationship between a street address and a country code, the combinations of these will be based on the locale: fr_FR will always yield addresses and countries inside France.
I’m updating the title of this topic to match.
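In the meantime, a quick sanity check on sampled output can confirm whether the country codes follow the locale. This is a rough sketch under the same assumption that the locale looks like `fr_FR` and the column holds two-letter codes. `country_codes_match_locale` and the sample rows are made up for illustration:

```python
def country_codes_match_locale(rows, locale, column='country_code'):
    """Return True if every row's country code equals the locale's territory.

    `rows` is an iterable of dicts; for a pandas DataFrame,
    synthetic_data.to_dict('records') yields that shape.
    Hypothetical helper for illustration; not an SDV function.
    """
    expected = locale.partition('_')[2].upper()
    return all(row[column] == expected for row in rows)


# Made-up rows, only to show the call shape.
sample_rows = [
    {'country_code': 'FR', 'street_address': '12 rue de la Paix'},
    {'country_code': 'DE', 'street_address': '3 avenue Victor Hugo'},
]
all_french = country_codes_match_locale(sample_rows, 'fr_FR')
```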
Alright! That’s nice to hear. Thank you for all the information!