SDV foreign key detection in community version

Hi SDV Team, I’m using the community version of SDV, and I’ve noticed that the foreign key detection relies on the column_name_match heuristic. In our database, every table uses ‘id’ as its primary key. As a result, SDV is inferring foreign key relationships between all tables on the ‘id’ column, even though these are just the primary keys for each table and not actual foreign keys.

Is there a recommended way to prevent this behavior, or is this the expected outcome with the current detection logic? Any guidance or best practices for handling this scenario would be greatly appreciated.

Hi @Mariam,

In SDV Community, foreign key inference is based only on matching column names. That is creating too many connections in your case because every table has a column named id. For this case, I would recommend simply turning off the foreign key detection and then supplying the connections yourself.

from sdv.metadata import Metadata

# auto-detect metadata without any relationships
# (my_dictionary maps each table name to its pandas DataFrame)
metadata = Metadata.detect_from_dataframes(
    data=my_dictionary,
    infer_keys='primary_only',  # do not infer foreign keys
)

# add each real foreign key relationship yourself
# (the table and column names below are placeholders)
metadata.add_relationship(
    parent_table_name='my_parent_table',
    child_table_name='my_child_table',
    parent_primary_key='id',
    child_foreign_key='parent_id'
)
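
If you have several real relationships to declare, you can repeat the same call from a list instead of writing each one out. This is only a minimal sketch: the helper add_relationships and the table and column names in MY_RELATIONSHIPS are hypothetical, and it assumes every parent table uses 'id' as its primary key, as you described.

```python
def add_relationships(metadata, relationships):
    """Declare each (parent table, child table, foreign key) triple.

    `metadata` is expected to be the sdv.metadata.Metadata object detected
    with infer_keys='primary_only'. Every parent table is assumed to use
    'id' as its primary key.
    """
    for parent_table, child_table, foreign_key in relationships:
        metadata.add_relationship(
            parent_table_name=parent_table,
            child_table_name=child_table,
            parent_primary_key='id',
            child_foreign_key=foreign_key,
        )
    return metadata


# Hypothetical names for illustration; replace them with your own tables
# and foreign key columns.
MY_RELATIONSHIPS = [
    ('users', 'orders', 'user_id'),
    ('orders', 'order_items', 'order_id'),
]
```

After the detection step, you would call add_relationships(metadata, MY_RELATIONSHIPS). Keeping the relationships in one list also makes it easier to review exactly which connections you have declared.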

For more information, refer to the full Metadata API docs.

(Of course, if you ever decide to switch to SDV Enterprise, you would no longer have to do this because the metadata inference for SDV Enterprise is actually based on the data, and not the column names.)

Hope that helps!