[Resolved] InvalidDataError: All references in table are unknown and must be dropped

Hi @ashok.kumar.muthimen and @kalyan, I am starting this new topic specifically to discuss the error you see when using drop_unknown_references. Here is the link to the original thread.

Traceback (most recent call last):
  File "", line 1, in
  File "/sasdata/python3.8/lib/python3.8/site-packages/sdv/utils/poc.py", line 65, in drop_unknown_references
    raise InvalidDataError([
sdv.errors.InvalidDataError: The provided data does not match the metadata:
All references in table 'person' are unknown and must be dropped.Try providing different data for this table.

Hi @ashok.kumar.muthimen, regarding what you pointed out: this error does not indicate unexpected behavior. It is expected behavior, as the message suggests:

All references in table ‘person’ are unknown and must be dropped.Try providing different data for this table.

It seems like the data you picked up for Master has no references in the Person table for SDV to work with.

That is, there is data in both df_master and df_person, but they don’t share any CONT_ID values, so there is nothing to connect.
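One quick way to confirm this before calling drop_unknown_references is to compare the key values directly with pandas. The snippet below is only a sketch: the helper name `shared_key_report` and the toy rows are made up for illustration, and it assumes CONT_ID is the column that links the two tables (with Master as the parent), which should be checked against your metadata.

```python
import pandas as pd


def shared_key_report(parent, child, key):
    """Summarize how many key values the two tables have in common.

    Hypothetical helper for illustration; not part of SDV.
    """
    parent_keys = set(parent[key].dropna())
    child_keys = set(child[key].dropna())
    return {
        "parent_keys": len(parent_keys),
        "child_keys": len(child_keys),
        "shared_keys": len(parent_keys & child_keys),
        # Child rows whose key value also exists in the parent table.
        "child_rows_with_known_parent": int(child[key].isin(parent_keys).sum()),
    }


# Toy stand-ins for df_master and df_person (assumed shapes, not real data).
df_master = pd.DataFrame({"CONT_ID": [1, 2, 3]})
df_person = pd.DataFrame({"CONT_ID": [7, 8, 9], "name": ["a", "b", "c"]})

report = shared_key_report(df_master, df_person, "CONT_ID")
print(report)
```

If `shared_keys` comes back as 0 on your real DataFrames, that matches the situation the error message describes: there are no references left to keep.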

  • How much data is in the source database for Master and Person?
  • How much of each did you pull to model with SDV?

SDV will be able to model this if you do get the data that can be connected. Right now you have got data that is not connected.
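As a rough illustration of pulling connected data, one option is to sample the parent rows first and then keep only the child rows that point at them, instead of sampling each table independently. This is a sketch under assumptions: `sample_connected` is a hypothetical helper, the toy tables stand in for the real ones, and it assumes Master is the parent and Person references it through CONT_ID. In practice the same filter would more likely be applied in the query against the source database.

```python
import pandas as pd


def sample_connected(parent, child, key, n_parents, seed=0):
    """Sample parent rows, then keep only child rows that reference them.

    Hypothetical helper for illustration; not part of SDV.
    """
    n = min(n_parents, len(parent))
    parent_sample = parent.sample(n=n, random_state=seed)
    child_sample = child[child[key].isin(parent_sample[key])]
    return parent_sample, child_sample


# Toy data: 100 parent rows, 3 child rows per parent (assumed shapes).
master = pd.DataFrame({"CONT_ID": range(100)})
person = pd.DataFrame({"CONT_ID": [i % 100 for i in range(300)]})

master_sample, person_sample = sample_connected(master, person, "CONT_ID", n_parents=10)
print(len(master_sample), len(person_sample))
```

Because the child rows are selected by the sampled parent keys, every reference in the child sample resolves to a row in the parent sample.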

Also, as @Wim points out, it does not make sense to model a 1-1 relationship. We recommend that you get a larger dataset with more tables whose relationships can be synthesized.

Hi @ashok.kumar.muthimen, could you please also provide us with the metadata JSON so that we can verify your schema? Thanks.

A post was split to a new topic: KeyError when fitting HSASynthesizer

Hi @ashok.kumar.muthimen, thank you for providing the metadata. As you are still learning SDV, the metadata will help us point you in the right direction. Once we have completed the initial POC and education about SDV, it will not be necessary to send us the metadata for future projects.

The original error in this thread is InvalidDataError: All references in table are unknown and must be dropped (see title). From your latest response, it seems like you have made progress and gotten past this. Can you confirm whether the original InvalidDataError was resolved? Did you resolve it by getting new data?

As for the latest KeyError, I will split it off into a new topic. We like to keep one thread per issue; otherwise it can become confusing to follow what’s going on.