Hi!
I have a small issue setting the number of rows per table with the multi-table DayZSynthesizer. The default settings work just fine, but I get an error whenever I try to force a specific number of rows for all the tables.
{
"name": "AttributeError",
"message": "'int' object has no attribute 'get'",
“stack”: "---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[27], line 1
----> 1 synthetic_data = synthesizer.sample(num_rows_per_table=1000)
File packaging\\sdv_enterprise\\sdv\\multi_table\\dayz\\day_zero.pyx:75, in sdv_enterprise.sdv.multi_table.dayz.day_zero.expirable.wrapper()
File packaging\\sdv_enterprise\\sdv\\multi_table\\dayz\\day_zero.pyx:223, in sdv_enterprise.sdv.multi_table.dayz.day_zero.DayZSynthesizer.sample()
AttributeError: 'int' object has no attribute 'get'"
}
I wanted to know if I did something wrong.
For context, I wrote this piece of code
Hey Charles, it looks like you stumbled into a bug! That function and parameter combo is supposed to work with an integer to drive the sampling across all of your tables. I opened an issue internally to have the team investigate and fix it.
For now as a workaround, I recommend using num_rows instead:
After some more digging, it looks like our documentation here is a bit confusing and the error message that’s returned isn’t that helpful!
The num_rows_per_table parameter expects you to pass in a dictionary of values, while num_rows will accept an integer value.
So the library is working as intended, but I thought I’d clarify some things further to help you better understand the mental model here.
There are 2 ways of specifying how much data you want synthesized using DayZSynthesizer.sample(), and you can use these parameters together if you want:
the num_rows parameter lets you define the number of rows (as an integer) you want as the default amount for all tables
the num_rows_per_table parameter lets you specify the number of rows you want specifically at the table level
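To make the mental model concrete, here is a minimal sketch of how the two parameters could combine. The `resolve_row_counts` helper is hypothetical and not part of SDV; it only mimics the rule described above, where a per-table value wins and `num_rows` is the fallback. It also assumes the library does a dictionary-style lookup on `num_rows_per_table`, which is a guess, but it would explain the error you saw when passing an integer.

```python
def resolve_row_counts(table_names, num_rows=None, num_rows_per_table=None):
    """Hypothetical helper (not part of SDV): the row count each table ends up with."""
    overrides = {} if num_rows_per_table is None else num_rows_per_table
    # A per-table value wins; otherwise fall back to num_rows.
    # None means "nothing requested", i.e. whatever the library does by default.
    return {name: overrides.get(name, num_rows) for name in table_names}


tables = ['hotels', 'guests']

print(resolve_row_counts(tables, num_rows=1000))
# {'hotels': 1000, 'guests': 1000}

print(resolve_row_counts(tables, num_rows=1000, num_rows_per_table={'guests': 10}))
# {'hotels': 1000, 'guests': 10}

# Passing an integer where a dictionary is expected fails on the lookup.
try:
    resolve_row_counts(tables, num_rows_per_table=1000)
except AttributeError as err:
    print(err)
# 'int' object has no attribute 'get'
```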
So there are a few different ways to use these:
Specify a uniform # of rows for all tables
# 1000 rows from every table
synthetic_data = synthesizer.sample(num_rows=1000)
Synthesize a default # of rows, but provide specific guidance for some tables
The following will generate 1000 rows like the hotels table but only 10 rows like the guests table. If you had more tables than these 2, 1000 rows would be synthesized for each of those too, since that’s the default value!
# 1000 rows for every table (as a default), except 10 rows for guests
sampling_dict = {'guests': 10}
synthetic_data = synthesizer.sample(num_rows=1000, num_rows_per_table=sampling_dict)
Choose specific row counts for every table
This will synthesize 15 rows that resemble the guests table and 15,000 rows that resemble the hotels table.
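As a sketch of this last case, the dictionary simply names every table. The `sample_per_table` wrapper below is hypothetical (not part of SDV); it only adds a type check so that an integer produces a clearer message than the AttributeError above. It assumes that `sample()` accepts `num_rows_per_table` on its own, as described in this thread.

```python
def sample_per_table(synthesizer, row_counts):
    """Hypothetical wrapper (not part of SDV) that checks the argument type first."""
    if not isinstance(row_counts, dict):
        raise TypeError(
            "num_rows_per_table expects a dict like {'guests': 15}, "
            f"got {type(row_counts).__name__}; use num_rows for a single integer"
        )
    # Assumption: sample() accepts num_rows_per_table without num_rows.
    return synthesizer.sample(num_rows_per_table=row_counts)


# 15 rows like guests, 15,000 rows like hotels
sampling_dict = {'guests': 15, 'hotels': 15000}
```

With the fitted synthesizer from earlier in the thread, the call would be `synthetic_data = sample_per_table(synthesizer, sampling_dict)`.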
Hi, thanks for the clarifications! I was following the documentation, and at the top it shows a sample call with num_rows_per_table=1000. Probably a typo.
Once again, thanks for the help!