Connecting to MSSQL

Which software are you using? SDV Enterprise

Software Details (What is your SDV version? Python version?): 0.37, 8.10

Description

I have followed the instructions on docs.sdv.dev for MSSQL connections. I can connect and load the data. Can I read just a few columns from a table instead of the whole table? Also, when exporting the synthesized data back to the database, how do I write to a new table or a new database?

I am using the AdventureWorks2019 dataset for testing on these three tables: Person.Person, Person.PersonPhone, and Person.PhoneNumberType. I am getting this error message:

sqlalchemy.exc.CompileError: in table PhoneNumberType, column Name. Can't generate DDL for NullType(). Did you forget to specify a type on this column?

I tried to set the sdtype of the column like this:

metadata.update_column(
    column_name='Name',
    sdtype='categorical',
    table_name='PhoneNumberType'
)

but that did not resolve the issue. Then I removed the table from the metadata like:

metadata.remove_table('PhoneNumberType')

but I am still getting the same error.

How do I define the ‘Name’ column type in the DDL, that is, when the connector creates a new table to insert the synthesized data?

My export code is:

synthetic_data_custom = {
    f"{table_name}_synthetic": df
    for table_name, df in synthetic_data.items()
}

connector.export(
    data=synthetic_data_custom,
    mode='write',
    verbose=True
)

Additional Context


For additional resources see the API Docs.

Hi @rizwan,

To make sure I’m understanding right: It seems that you were able to successfully use the MSSQL Connector to (a) authenticate and make a connection to the database, (b) create metadata and load data from the database, and (c) create a synthesizer with that metadata/data, and (d) sample synthetic data from that synthesizer. The problem is that when you try to export the synthetic data, you are receiving an error.

Intended usage: The intended usage is to use the same connector instance for import and export. During import, the connector instance saves the SQL schema from the original database. Then during export, the connector creates a new database schema (using the saved SQL) and populates it with the synthetic data.

Looking at your code, it seems that you are changing the names of the tables to be <table_name>_synthetic, which isn’t supported.

Here’s my recommendation:

  1. After sampling synthetic data, do not modify it in any way. For example, don’t change the names of the tables in the dictionary, and don’t add or remove any columns. We designed the connector to export exactly the synthetic data that the synthesizer samples. If the synthetic data has been modified, there may be a mismatch between the SQL schema and the synthetic data, which can lead to an error.
  2. Set an export config for a new database schema. The connector is looking to create a new schema (not reuse the real one) when exporting the synthetic data.
    E.g.:
connector.set_export_config(
    schema_name='synthetic_adventureworks', # provide the name of a new schema to create
    auth={
        'username': <username>,
        'password': <password>,
        'database': 'my_database', # you may export back to the same database, as long as the schema is different
        'server': ...
    }
)

connector.export(
    data=synthetic_data,
    mode='write',
    verbose=True
)
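To illustrate why renamed tables break the export, here is a toy sketch in plain Python (not the real MSSQLConnector API, just a model of the behavior described above): the connector records the schema at import time, keyed by the original table names, so a dictionary keyed by `<table_name>_synthetic` has no matching saved schema at export time.

```python
class ToyConnector:
    """Toy stand-in for an import/export connector (NOT the real SDV API).

    At import time it records each table's column layout; at export time it
    only accepts data whose table names match that saved schema.
    """

    def __init__(self):
        self._schema = {}

    def import_data(self, real_data):
        # Save the column layout per table, keyed by the original table name.
        self._schema = {name: list(table) for name, table in real_data.items()}
        return real_data

    def export(self, data):
        unknown = set(data) - set(self._schema)
        if unknown:
            raise ValueError(f"no saved schema for tables: {sorted(unknown)}")
        return dict(data)


# Tables are modeled as plain dicts of column -> values for simplicity.
real = {"PhoneNumberType": {"Name": ["Cell", "Home"]}}
conn = ToyConnector()
conn.import_data(real)

conn.export(real)  # works: table names match the saved schema
try:
    conn.export({"PhoneNumberType_synthetic": real["PhoneNumberType"]})
except ValueError as err:
    print(err)  # the renamed table has no saved schema
```

The same mismatch logic is why the recommendation above is to pass the sampled `synthetic_data` dictionary to `export` unchanged.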

Hope that’s helpful. Let me know if I’m misunderstanding something!

Hi @neha,

Thank you for your reply. Yes, all steps are completed successfully except the last step which is to write the synthetic data back to database. I removed the code where I was modifying the table names but still getting the error when writing back to database for PhoneNumberType.

Hi @rizwan, thanks for confirming.

It appears that the creator of the AdventureWorks dataset has defined some new data types in addition to the usual predefined SQL types (e.g. INT, VARCHAR, DATE). These additional user-defined types may be causing problems with the connector’s export functionality.

Our engineering team is looking into this now, and we will provide you with an update shortly!

Hi @neha ,

I looked at the data types and yes, they are different from the standard ones, like Name: nvarchar and NameStyle: bit. These data types are also new to me. Thank you for looking into this.

Hi @neha ,

I was able to write the synthetic data back to the MSSQL database by creating a new test database with a few tables. Thanks for your help.

Thanks for confirming @rizwan. Was this a new dataset or are you still using AdventureWorks?

One workaround (for now) would be to create an entirely new instance of the connector and just use it for the export. The new connector instance would export to a new schema based on the synthetic data itself, and it will default to using the native MSSQL data types (int, varchar, etc.). So the exact schema types won’t match the original, but you’d still be able to export the data. Hope that helps!

# create a new connector instance for export-only
export_connector = MSSQLConnector()

export_connector.set_export_config(
    schema_name='synthetic_adventureworks', # provide the name of a new schema to create
    auth={
        'username': <username>,
        'password': <password>,
        'database': 'my_database', # you may export back to the same database, as long as the schema is different
        'server': ...
    }
)

export_connector.export(
    data=synthetic_data,
    mode='write',
    verbose=True
)

I have escalated the issue of non-standard data types to the engineering team, and they are working on a solution. Out of curiosity, do your actual databases use these kinds of data types?

Hi @neha, I created a test database and sample records.

Thank you for suggesting that I create a new connector for exporting data. It worked with the AdventureWorks dataset.

Our dataset won’t have these kinds of data types.

Thanks again for your help and suggestions!


Just a quick update here (for tracking purposes): Our engineering team is performing a comprehensive audit of advanced database features (such as user-defined data types) to make sure we can handle them across the board. I should be able to provide an update with clearer timelines around early March.

For any additional feedback or questions about these advanced database features, please feel free to reply to this thread. (We will only mark it as “resolved” after the fix is in.)