[Resolved] Specifying a regex format for primary keys and generating them out of sequence

If a primary key has a specific regex format, the user would like to be able to specify it, and would also like the generated keys not to be in sequential order.

  • For example, if the primary key is 5 digits long, SDV currently produces:
1
2
3
4
...
10000

whereas the user would like it to be:

8983
1239
2938
...
0023
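To make the difference concrete, here is a minimal sketch in plain Python (not SDV code) that contrasts the two orders for keys matching a 5-digit pattern. The helper names `sequential_ids` and `scrambled_ids` and the fixed 5-digit width are hypothetical, chosen only for illustration.

```python
import random

DIGITS = 5              # hypothetical: keys match the regex [0-9]{5}
SPACE = 10 ** DIGITS    # number of distinct 5-digit strings: 00000 .. 99999

def sequential_ids(n):
    """Hypothetical helper: n IDs in ascending order, zero-padded to DIGITS characters."""
    return [f"{i:0{DIGITS}d}" for i in range(n)]

def scrambled_ids(n, seed=None):
    """Hypothetical helper: n distinct IDs from the same space, in random order."""
    rng = random.Random(seed)
    # sample() draws without replacement, so every ID is unique
    return [f"{i:0{DIGITS}d}" for i in rng.sample(range(SPACE), n)]

print(sequential_ids(5))          # ['00000', '00001', '00002', '00003', '00004']
print(scrambled_ids(5, seed=0))   # five distinct 5-digit strings in no particular order
```

Both lists satisfy the same regex and contain only unique values. The only difference is the order in which the values are issued.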

Solution

  • The regex format can be specified in the same way as for any other ID column.

  • SDV will natively incorporate scrambling, although it is not yet clear whether the current sequential order actually hurts downstream testing.
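For reference, specifying the regex for an ID column looks roughly like the sketch below. It assumes the SDV 1.x single-table metadata dictionary format (`sdtype`, `regex_format`, `primary_key`), which in SDV 1.x could be loaded with `SingleTableMetadata.load_from_dict`. The table, the column names `user_id` and `age`, and the helper `matches_key_format` are hypothetical. Please check the metadata docs for your SDV version, since the format has changed between releases.

```python
import re

# Hypothetical table whose primary key "user_id" must be exactly 5 digits.
# Key names follow the SDV 1.x single-table metadata format (assumption;
# verify against the docs for your SDV version).
metadata_dict = {
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "primary_key": "user_id",
    "columns": {
        "user_id": {"sdtype": "id", "regex_format": "[0-9]{5}"},
        "age": {"sdtype": "numerical"},
    },
}

def matches_key_format(value, metadata=metadata_dict):
    """Hypothetical helper: does a key value fully match the primary key's regex_format?"""
    pk = metadata["primary_key"]
    pattern = metadata["columns"][pk]["regex_format"]
    return re.fullmatch(pattern, value) is not None

print(matches_key_format("08983"))  # True
print(matches_key_format("8983"))   # False: only 4 digits
```

Using `re.fullmatch` rather than `re.search` matters here. A 6-digit value would otherwise pass, because it contains a 5-digit substring.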

@neha @ashok.kumar.muthimen

Hi @ashok.kumar.muthimen, I have marked this topic as a Feature Request.

To ensure that we can provide you with the best possible solution, it would be helpful if you could describe why your downstream application requires IDs in a random order.

What we know from other customers: Many of our customers use synthetic data for software testing. They have mentioned that, for a software testing suite, it typically does not matter which order the IDs are in (sequential, random, etc.), because the suite typically uses those IDs only as references and does not check anything about their order.

Additional information requested: I am curious how or where your use case differs. If the synthetic IDs generated from the regex are in sequential order, would that cause a problem for your software testing framework? Or is there another concern you have about sequential IDs?

I’m marking this feature request as resolved: it is supported in SDV Enterprise starting from version 0.13.0. By default, SDV Enterprise users should now see regex-based values created in a random order.

Please note that in the specific case where you have a primary key and want a random regex order, sampling may be more resource intensive. Primary keys need to be unique, and if SDV creates them in a random order, it needs to remember which ones have already been created. This can slow down sampling and increase the memory usage of the synthesizer.
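To illustrate why uniqueness plus random order has a cost, here is a minimal sketch using simple rejection sampling. This is not SDV's implementation. It only assumes that a generator remembers issued keys in a set and redraws on collisions. The function name `random_unique_ids` is hypothetical.

```python
import random

def random_unique_ids(n, digits=5, seed=None):
    """Hypothetical helper: draw n unique zero-padded IDs in random order.

    A set of already-issued IDs is kept so duplicates can be rejected.
    The set grows with n (memory cost), and retries become more frequent
    as the ID space fills up (time cost).
    """
    space = 10 ** digits
    if n > space:
        raise ValueError(f"only {space} distinct {digits}-digit IDs exist")
    rng = random.Random(seed)
    seen = set()       # remembers every ID issued so far
    ids = []
    attempts = 0
    while len(ids) < n:
        attempts += 1
        candidate = f"{rng.randrange(space):0{digits}d}"
        if candidate in seen:   # collision: already used, draw again
            continue
        seen.add(candidate)
        ids.append(candidate)
    return ids, attempts

ids, attempts = random_unique_ids(90_000, seed=1)
print(len(ids), attempts)  # attempts is far above 90,000 (on the order of 230,000 on average)
```

A sequential generator only needs a counter. Its memory stays constant and it never retries, which is why it is the cheaper option when keys fill most of the space the regex allows.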

If you are running into these issues, we recommend falling back to generating the regex values sequentially. Please start a new discussion if you are running into this problem, and we’d be happy to assist!