White Rabbit is a database profiling tool that was built primarily to assist in the preparation for ETLs (Extraction, Transformation, Loading) of healthcare databases but can be used to profile a wide variety of databases. The tool can scan a data source and output a Scan Report which provides a data profile on the tables and columns without displaying any personally identifiable information. Full details and documentation can be found at this link. Within the Metadata Exchange, this Scan Report can be used to create a new data model, add profiling data to an existing data model or update the profiling data of a data model.
Instructions
1. Installation of White Rabbit: please see instructions and downloads.
2. Once White Rabbit is installed and opened, the first step is to specify the location of data:
- Working folder: specify the folder where the Scan Report will be exported into.
- Data Type: select the database format. White Rabbit can work with a variety of different source file types (delimited text files, MySQL, Oracle, SQL Server, and PostgreSQL) but in this example we will be using a PostgreSQL database. The instructions for each source type differ slightly so full instructions can be found here.
- Server location: for PostgreSQL, the server location should contain the host IP and the name of the database (<host>/<database>) e.g. ‘127.0.0.1/worlddb’.
- User name: enter the name of the user connecting to the server.
- Password: enter the password for the user connecting to the server.
- Database name: enter the name of the schema containing the tables (default is often ‘public’).
3. Once these fields have been filled, select Test Connection. If successful, the following message should be displayed.
4. After navigating over to the tab named “Scan”, the next step is to select which tables you would like to include in the scan. To add in all tables in the database automatically, click “Add all in DB’. To manually select individual tables, click on “Add” and select the tables. To remove one or more tables, simply select the table/s and click “Remove”.
5. Once you have chosen which tables to include in the report, there are some additional settings that can be configured:
Scan field values: checking this box directs White Rabbit to review the contents of each column/ the raw data items within each table selected and include a frequency count in the Scan Report.
- “Min cell count” can be adjusted as needed. In this example, the min cell count that has been selected is 5, which means that values that appear less than 5 times will be not be included in the report.
- “Max Distinct Values” – adjusting this will set the maximum number of distinct values per field to be reported on.
- “Rows per table” – this refers to the number of rows that will be scanned per table.
If “Scan field values” is unchecked, White Rabbit will not report on the raw data values within the tables.
6. When the settings have been adjusted to your needs, click on “Scan tables”. The Scan Report will be exported to the working folder selected previously.
Related articles
Add Comment