Synthesizing a subset of a table

neha · April 13, 2026, 3:17pm

This question was originally filed here by @rizwan. I’m separating it out into a new thread so that we can discuss it specifically.

Which software are you using? SDV Enterprise

Software Details SDV 0.44, Python 3.13

Description

How do I synthesize only a subset of a table? For example, an Employee table has the columns EmployeeID, FirstName, Surname, Address, and StartDate. How can I synthesize only FirstName and Surname and merge the synthetic data back into the table?

neha · April 13, 2026, 3:48pm

Hi @rizwan,

The general premise of SDV is to create brand-new synthetic data for each of the tables in your database. The synthetic rows that you receive are completely new entities that do not correspond to any one original entity.

For example, if you are synthesizing an Employees table, each row of the synthetic Employees table corresponds to a brand-new employee that doesn't map to any one original employee. This is what allows SDV to scale up the synthetic data, for example creating 100x or even 1000x the number of original employees.
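To make the "brand-new entities" idea concrete, here is a toy sketch in plain Python. `toy_synthesize` and the sample rows are hypothetical, and the function only resamples each column independently (ignoring correlations and reusing observed values verbatim), so it is not how SDV models data. It just shows that the number of synthetic rows is chosen freely and that rows are not tied one-to-one to originals.

```python
import random


def toy_synthesize(rows: list[dict], num_rows: int, seed: int = 0) -> list[dict]:
    """Toy stand-in for a synthesizer (NOT SDV's model): samples each column
    independently from the values observed in the original rows."""
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    # Collect the observed values for each column.
    observed = {col: [row[col] for row in rows] for col in columns}
    # Each synthetic row is assembled column by column, so it is not a copy
    # of any particular original row (although by chance it may match one).
    return [
        {col: rng.choice(observed[col]) for col in columns}
        for _ in range(num_rows)
    ]


employees = [
    {"FirstName": "Ana", "Surname": "Silva", "StartDate": "2021-03-01"},
    {"FirstName": "Ben", "Surname": "Okafor", "StartDate": "2019-07-15"},
    {"FirstName": "Chen", "Surname": "Li", "StartDate": "2023-01-09"},
]

# Ask for 100x the original number of employees.
synthetic = toy_synthesize(employees, num_rows=100 * len(employees))
print(len(synthetic))  # 300
```

A real synthesizer learns a model of the table (including relationships between columns) and generates values from it, but the decoupling of output size from input size works the same way.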

Clarifying your use case

Since you want to synthesize only the FirstName and Surname columns of the table, I'm wondering whether your goal is really to create brand-new employees, or just to anonymize existing information.

The latter can be achieved through some other functionality that SDV provides (like RDTs or targeted sampling). To help you better, could you clarify your use case? Is the desire to keep the original employees exactly as-is and just create new names for them? What about other tables that may be connected to the Employees table?
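For the "keep the original employees and just create new names" case, here is a minimal sketch in plain Python, assuming the table is a list of dicts. The name pools and `anonymize_names` are hypothetical illustrations, not the RDT API; in SDV itself this would go through its transformers, so check the RDT docs for the exact transformer in your version.

```python
import random

# Hypothetical pools of replacement names; in practice these would come from
# a Faker-style generator or a much larger name list.
FAKE_FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan"]
FAKE_SURNAMES = ["Smith", "Garcia", "Nguyen", "Kowalski", "Haddad"]


def anonymize_names(rows: list[dict], seed: int = 0) -> list[dict]:
    """Return copies of the rows with FirstName and Surname replaced.

    Every other column (EmployeeID, Address, StartDate, ...) is kept exactly
    as-is, so each output row still corresponds to one real employee.
    """
    rng = random.Random(seed)
    anonymized = []
    for row in rows:
        new_row = dict(row)  # shallow copy; leaves the input rows untouched
        new_row["FirstName"] = rng.choice(FAKE_FIRST_NAMES)
        new_row["Surname"] = rng.choice(FAKE_SURNAMES)
        anonymized.append(new_row)
    return anonymized


employees = [
    {"EmployeeID": 1, "FirstName": "Ana", "Surname": "Silva",
     "Address": "12 Elm St", "StartDate": "2021-03-01"},
    {"EmployeeID": 2, "FirstName": "Ben", "Surname": "Okafor",
     "Address": "8 Oak Ave", "StartDate": "2019-07-15"},
]

result = anonymize_names(employees)
```

Note that Address and StartDate stay real here, which is why this is partial anonymization rather than synthetic data.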

(Related to the concept of synthesizing brand-new entities, I'd recommend this blog post.)

rizwan · April 13, 2026, 5:54pm

Hi @neha, thanks for the reply. As you mentioned, anonymizing just the names won't create synthetic data. I guess I would need to synthesize the entire table.

neha · April 14, 2026, 2:33pm

Right, I think in most cases you'd want to synthesize entire tables (unless they are reference tables that need to remain static).

That said, even with synthetic data you can "fix" a few values by using the conditional sampling feature. These are usually values that control the overall statistics of the data, and the synthetic data is then created with them in mind: for example, creating exactly 50 male and 50 female employees.
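The idea behind conditional sampling can be sketched with rejection sampling: keep drawing rows from a model and keep only those that match the fixed values, until each condition has its requested number of rows. `toy_sample_row` and `sample_with_conditions` are hypothetical stand-ins, not SDV code. In the open-source SDV 1.x API the analogous pieces are `sdv.sampling.Condition(num_rows=..., column_values=...)` and `synthesizer.sample_from_conditions(conditions=[...])`; I'd check the Enterprise documentation for the equivalent in your version.

```python
import random
from collections import Counter


def toy_sample_row(rng: random.Random) -> dict:
    """Hypothetical stand-in for drawing one row from a fitted synthesizer."""
    return {
        "FirstName": rng.choice(["Alex", "Sam", "Jordan"]),
        "Gender": rng.choice(["Male", "Female"]),
        "StartYear": rng.randint(2015, 2025),
    }


def sample_with_conditions(conditions: list[tuple[dict, int]],
                           seed: int = 0, max_tries: int = 100_000) -> list[dict]:
    """Rejection sampling: for each (fixed_values, num_rows) pair, keep drawing
    rows and keep only those whose columns match fixed_values exactly."""
    rng = random.Random(seed)
    rows = []
    for fixed_values, num_rows in conditions:
        matched = 0
        for _ in range(max_tries):
            if matched == num_rows:
                break
            row = toy_sample_row(rng)
            if all(row[col] == value for col, value in fixed_values.items()):
                rows.append(row)
                matched += 1
        if matched < num_rows:
            raise RuntimeError(f"could not satisfy condition {fixed_values}")
    return rows


rows = sample_with_conditions([({"Gender": "Male"}, 50), ({"Gender": "Female"}, 50)])
counts = Counter(row["Gender"] for row in rows)
print(counts["Male"], counts["Female"])  # 50 50
```

Rejecting non-matching rows, rather than overwriting the fixed column after sampling, keeps the remaining columns consistent with what the model would generate for that value.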

Let me know if there are any follow-ups or if you have a use case that requires partial anonymization.

