Data Profiles of SQL Server Information Services stored in tables

The task Data Profile of SQL Server Information Services stores the results of profiling in an XML document that can be examined with the Data Profile Viewer. Article Dataprofiling with SQL Server 2008 explains how to use this new Task in SSIS.

Although this method is very simple, sometimes may not be sufficient. Addressing a data quality project may involve, for example, storing a history of profiles to assess how data quality of processed data has been improving.

The best way to work with historical data is using a database and storing the data in tables, where you can make queries, reports and comparisons. To achieve that all you would need to do is moving the metadata that the profiling task has been storing in the XML file to database tables.

Well, someone has already prepared an easy way to do it. Thomas Frisendal from the website Information Quality Solutions explains how to create an XSLT file for each type of profiling that is used to extract the XML generated by the Data Profile Task SSIS into one or more XML files with a format that can be directly imported to tables .  

 

Package SSIS para pasar Data Profiles a Tablas

Thus, you can store profileing data in tables by creating a process that applies a XSLT for every kind of profiling, and then loading each resulting XML file into a table.

Note: All files include a field that reports the name of the source table. Then, with one table for each type of profile and you can store the profiles of all the tables that are processed.

Data Flow Task de SSIS ProfileToSQL

 

I have also added to the table a date field that stores the date on which the process is done, and for now the result has been good. 

In the graph we can see the distribution of different via types for the Spanish street according to INE data. We realize, for example, that identifiers are not too well defined, since we can find things like more of a handle to the same type of street ('STREET', 'C', 'C/'), or enough values with a point ('.') as an identifier.

 

Tabla de SSIS Data Profile de valores diferentes para tipo de via

 

In Free tool for automation of SQL Server the author explains in detail how this solution works and how to obtain the style sheets. In Usage recommandations for the ProfileToSQL stylesheets he explains in detail how to use XSLT, and includes a disclaimer clarifying that this software is a test version.