[Resolved] num_rows_per_table=1000 DayZSynthesizer Multitable

Hi!
I have a small issue setting the number of rows per table with the multi-table DayZSynthesizer. The default settings work just fine, but I get an error whenever I try to force a specific number of rows for all the tables.

{
"name": "AttributeError",
"message": "'int' object has no attribute 'get'",
“stack”: "---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[27], line 1
----> 1 synthetic_data = synthesizer.sample(num_rows_per_table=1000)

File packaging\\sdv_enterprise\\sdv\\multi_table\\dayz\\day_zero.pyx:75, in sdv_enterprise.sdv.multi_table.dayz.day_zero.expirable.wrapper()

File packaging\\sdv_enterprise\\sdv\\multi_table\\dayz\\day_zero.pyx:223, in sdv_enterprise.sdv.multi_table.dayz.day_zero.DayZSynthesizer.sample()

AttributeError: 'int' object has no attribute 'get'"
}
I get this message and wanted to know if I did something wrong.
For context, this is the code I ran:

synthetic_data = synthesizer.sample(num_rows_per_table=1000)

You can reproduce this with any metadata.
Thanks!

Hey Charles, it looks like you stumbled into a bug! That function and parameter combo is supposed to work with an integer to drive the sampling across all of your tables. I opened an issue internally to have the team investigate and fix it.

For now as a workaround, I recommend using num_rows instead:

from sdv.multi_table import DayZSynthesizer
from sdv.datasets.demo import download_demo

data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)

guests_table = data['guests']
hotels_table = data['hotels']

synthesizer = DayZSynthesizer(metadata)
synthetic_data = synthesizer.sample(num_rows=1000)

This will synthesize 1000 rows for each table while conforming to the metadata.

Thanks a lot, I’ll use the recommended parameters :slight_smile: !

After some more digging, it looks like our documentation here is a bit confusing and the error message that’s returned isn’t that helpful!

The num_rows_per_table parameter expects you to pass in a dictionary of values, while num_rows will accept an integer value.
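To make the error message a bit less mysterious, here is a minimal sketch of how an integer can end up producing it. It assumes the sampling code looks up each table's count with a dictionary-style .get() call, which is what the traceback suggests; lookup_row_count is a hypothetical helper for illustration, not SDV's actual implementation.

```python
def lookup_row_count(num_rows_per_table, table_name, default):
    """Hypothetical helper: fetch a per-table row count, falling back to a default."""
    # Works for a dict; fails for an int because ints have no .get() method.
    return num_rows_per_table.get(table_name, default)


# Passing a dictionary works as expected.
print(lookup_row_count({'guests': 10}, 'guests', 1000))

# Passing an integer reproduces the message from the traceback.
message = None
try:
    lookup_row_count(1000, 'guests', 1000)
except AttributeError as error:
    message = str(error)
    print(message)
```

So the AttributeError is simply Python's way of saying that a dictionary was expected where an integer was given.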

So the library is working as intended but I thought I’d clarify some things further to help you better understand the mental model here.

There are 2 ways to specify how much data you want synthesized using DayZSynthesizer.sample(), and you can use these parameters together if you want:

  • the num_rows parameter lets you define the number of rows (as an integer) you want as the default amount for all tables
  • the num_rows_per_table parameter lets you specify the number of rows you want for individual tables (as a dictionary mapping table names to row counts)

So there are a few different ways to use these:

  1. Specify a uniform # of rows for all tables
# 1000 rows from every table
synthetic_data = synthesizer.sample(num_rows=1000)
  2. Synthesize a default # of rows, but provide specific guidance for some tables

The following will generate 1000 rows like the hotels table but only 10 rows like the guests table. If you had more tables than these 2, 1000 rows would be synthesized for each of those too, since that’s the default value!

# 1000 rows from every table (as a default), except 10 rows for guests
sampling_dict = {'guests': 10}
synthetic_data = synthesizer.sample(num_rows=1000, num_rows_per_table=sampling_dict)
  3. Choose specific row counts for every table

This will synthesize 15 rows that resemble the guests table and 15,000 rows that resemble the hotels table.

sampling_dict = {'guests': 15, 'hotels': 15_000}
synthetic_data = synthesizer.sample(num_rows_per_table=sampling_dict)
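If it helps with the mental model, the way the two parameters combine can be sketched in plain Python. This is only an illustration of the behavior described above: resolve_row_counts is a hypothetical helper, not part of SDV, and the None value for tables that neither parameter covers is my own placeholder for "whatever the library does by default".

```python
def resolve_row_counts(table_names, num_rows=None, num_rows_per_table=None):
    """Hypothetical helper (not part of SDV) modelling how the two parameters combine.

    Returns a dict mapping each table name to its requested row count, or to
    None when neither parameter covers that table.
    """
    if num_rows_per_table is None:
        num_rows_per_table = {}
    if not isinstance(num_rows_per_table, dict):
        # A clearer message than "'int' object has no attribute 'get'".
        raise TypeError(
            "num_rows_per_table must be a dict like {'guests': 10}; "
            "pass an integer as num_rows instead."
        )
    # Per-table values win; num_rows is the fallback for every other table.
    return {name: num_rows_per_table.get(name, num_rows) for name in table_names}


tables = ['guests', 'hotels']
uniform = resolve_row_counts(tables, num_rows=1000)
mixed = resolve_row_counts(tables, num_rows=1000, num_rows_per_table={'guests': 10})
explicit = resolve_row_counts(tables, num_rows_per_table={'guests': 15, 'hotels': 15_000})
print(uniform)
print(mixed)
print(explicit)
```

The three calls mirror the three usage patterns: a uniform count, a default with per-table overrides, and explicit counts for every table.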

Hi, thanks for the clarifications! I was following the documentation, and the sample at the top uses num_rows_per_table=1000. Probably a typo :slight_smile:
Once again thanks for the help !

I updated the first code snippet in the documentation to reflect this (you may need to clear your cache). Thanks for the feedback!
