[Resolved] Specifying regex format for primary key and not having them in sequence

kalyan · April 22, 2024, 5:11am

If a primary key has a specific regex format, the user would like to be able to specify it, and would also like the generated keys not to be in sequence.

For example, if the primary key has 5 digits, SDV currently produces

whereas the user would like it to be

Solution

The regex format can be specified in the same way as for any other ID column.
SDV will natively incorporate scrambling, though it is not yet clear whether sequential IDs actually hurt downstream testing.
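
As a rough illustration of the first point, the regex format sits next to the column's sdtype in the metadata. The sketch below does not call SDV at all: it builds a plain dictionary shaped like single-table metadata and checks sample values against the pattern with Python's `re` module. The column name `user_id`, the 5-digit pattern, and the helper `matches_format` are assumptions made for this example, and the exact metadata layout may differ between SDV versions.

```python
import re

# Hypothetical metadata fragment: the column name and pattern are illustrative.
metadata_dict = {
    "columns": {
        "user_id": {"sdtype": "id", "regex_format": "[0-9]{5}"},
    },
    "primary_key": "user_id",
}


def matches_format(value, metadata, column):
    """Return True if the whole value matches the column's regex_format."""
    pattern = metadata["columns"][column]["regex_format"]
    return re.fullmatch(pattern, value) is not None


print(matches_format("00042", metadata_dict, "user_id"))  # True
print(matches_format("4210", metadata_dict, "user_id"))   # False (only 4 digits)
```

A check like this can be useful for confirming that a pattern describes the real keys before it is put into the metadata.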

@neha @ashok.kumar.muthimen

neha · April 29, 2024, 10:14pm

Hi @ashok.kumar.muthimen, I have marked this topic as a Feature Request.

To ensure that we can provide you with the best possible solution, it would be helpful if you could describe why your downstream application requires IDs in a random order.

What we know from other customers: Many of our customers use synthetic data for software testing. They have mentioned that for a software testing suite, the order of the IDs (sequential, random, etc.) typically does not matter, because the suite only uses those IDs as references and does not check anything about their order.

Additional information requested: I am curious how/where your use case differs. If the synthetic IDs generated from the regex are in sequential order, would that cause a problem for your software testing framework? Or is there another concern you have about sequential IDs?

neha · January 26, 2026, 8:38pm

I’m marking this feature request as resolved, as this feature is supported in SDV Enterprise starting from version 0.13.0. By default, SDV Enterprise users should now see that Regexes are created in a random order.

Please note that for the specific case where you have a primary key and you desire a random Regex order, sampling may be more performance intensive. Primary keys need to be unique, and if SDV needs to create them in a random order, then it needs to remember which ones have already been created. This can increase the sampling time and the memory usage of the synthesizer.

If you are running into these issues, we recommend falling back to sequentially creating the Regex. Please start a new discussion if you are running into this problem, and we’d be happy to assist!
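
The trade-off described above can be sketched in plain Python. This is not how SDV implements it; it is a minimal illustration, assuming 5-digit numeric keys, of why random unique keys need extra bookkeeping while sequential ones do not. The function names `sequential_ids` and `random_unique_ids` are hypothetical.

```python
import random


def sequential_ids(n, width=5):
    """Sequential keys are unique by construction, so nothing is remembered."""
    return [str(i).zfill(width) for i in range(n)]


def random_unique_ids(n, width=5, seed=None):
    """Random keys need a record of issued values so duplicates can be rejected."""
    space = 10 ** width
    if n > space:
        raise ValueError("cannot draw more unique keys than the format allows")
    rng = random.Random(seed)
    seen = set()  # grows with every key issued: this is the extra memory cost
    ids = []
    while len(ids) < n:
        candidate = str(rng.randrange(space)).zfill(width)
        if candidate not in seen:  # collisions mean retries: the extra time cost
            seen.add(candidate)
            ids.append(candidate)
    return ids


print(sequential_ids(3))  # ['00000', '00001', '00002']
print(random_unique_ids(3, seed=0))
```

In this sketch the `seen` set grows with the number of rows, and retries become more frequent as the requested number of keys approaches the number of values the format allows.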
