Introduction
When you use the importer to import structural metadata in a custom excel spreadsheet format, you will need to provide an XML mapping file alongside your excel file. This XML file tells the importer how to map the data in each column to the various catalogue elements that are being created from your excel file.
The importer can handle the following catalogue elements:
Data Classes
Data Elements (*Mandatory)
Data Types
Measurement Units
Business Rules
As well as the following properties:
Metadata (key value pairs)
Multiplicity (mandatory or optional)
Each row in your Excel spreadsheet defines a single Data Element which may be part of a group of data element, called a Data Class. As well as the id, name and description, a Data Element may have metadata (also known as extensions or key-value pairs) and Business Rules. It may also have its multiplicity (whether mandatory or not) specified.
Mapping Catalogue Elements to XML Elements
The following table explains each of the XML elements and how they map to the catalogue elements:
Catalogue Element | XML Element Name | XML Attributes | Description | Notes |
---|---|---|---|---|
Data Class | dataClassID | The model catalogue id of this data class | ||
Data Class | dataClassName* | separator | The data class name. If separator is specified, it is a path name allowing nested classes | Recommended |
Data Class | dataClassDescription | The data class description | Recommended | |
Data Element | dataElementID | The model catalogue id of this data element | ||
Data Element | dataElementName* | The data element name | Mandatory - must be included in your spreadsheet | |
Data Element | dataElementDescription | The data element description | Recommended | |
Data Element | dataElementMultiplicity | The data element multiplicity (not yet implemented). | ||
Data Type | dataTypeID | The model catalogue id of this data type | ||
Data Type | dataTypeName* | The data type name | Recommended | |
Data Type | dataTypeDescription | The data type description | Optional | |
Data Type | dataTypeRule | The data type rule (usually a Groovy expression like this: x ==~ /[a-zA-Z0-9=]*/ | Optional | |
Data Type | dataTypeEnums | Not used at present | Optional | |
Measurement Unit | measurementUnitID | The model catalogue id of this measurement unit | ||
Measurement Unit | measurementUnitName* | The measurement unit name | Recommended (if available/relevant) | |
Measurement Unit | measurementUnitSymbol | The measurement unit symbol | Optional | |
Business Rule | validationRules | separator | This sets the description and name for each of a list of business rules (the names are truncated to 255 characters) | Optional |
Data Element | mandatoryRule | Specifies if this data element is mandatory or optional (any value means it is mandatory) | Optional | |
Data Element | metadata | separator keyValueSeparator | See section below: Multiple Metadata and Enumeration Definitions | Recommended (if available/relevant) |
Enumerated Type | enumerations | separator keyValueSeparator | See section below: Multiple Metadata and Enumeration Definitions | Recommended (if available/relevant) |
headerRow | Specifies the row with the header names that are specified in the XML file (first row is 1). If not specified, the default is 1 | |||
dataStartRow | Specifies the first row with a data element definition. If not specified, the default is 2 |
Data Element name is the only mandatory catalogue element.
Other recommended catalogue elements have also been noted in the table.
Generally ID fields are also left blank as these will be generated by the system.
Each XML element can only appear once with the sole exception being metadata which can appear in multiple columns (as long as each column name is unique).
Advanced: Specifying Multiple Metadata and Enumeration Definitions in a Single Cell
It is possible to specify multiple key-value pairs for metadata and enumerations in a single spreadsheet cell by using the separator
and keyValueSeparator
attributes. For example, this entry in the XML mapping:
<enumerations separator="\n" keyValueSeparator="|">Enumerations</enumerations>
would allow a definition like this:
0|Not known |
Similarly for metadata. It also allows arbitrary metadata keys whereas normally each key must be specified in the XML mapping file. Enumerations can just have keys in which case there is no keyValueSeparator
:
<enumerations separator="\n">Enumerations</enumerations>
would allow this sort of definition:
yes |
Metadata must always have both a key and value so must have both separators (or neither).
If a metadata entry has no separators, it (the column name) is used as the metadata key and the value is the value in the cell.
Steps
Firstly, prepare your excel file for import. Ensure the Excel file you wish to import has a header row at the top with each column with a unique header name that is used to identify that column. It’s also important to check that any elements which are reused throughout the spreadsheet i.e. data type names are spelled the same to avoid creating duplicate catalogue elements.
Checklist:
- Do I have a header row with unique names?
- Have I included a column of Data Element names?
- Have I checked the catalogue element names for correct spelling?
- Have I checked the separators / key value separators for correct formatting? (if used)
Next, open up your XML mapping file. A blank template XML file has been created here:
Alternatively it can be generated from the Metadata XML Schema found on the importer pages:It is now time to map the header row to each of the catalogue elements/properties. Simply enter the column header names into the correct sections. In this example file they have been labelled as
EnterColumnHeaderHere
. Simply replace this text with the name of your column. For example :<dataClassID>EnterColumnHeaderHere</dataClassID>
→<dataClassID>Data_Class</dataClassID>
Remove any sections that are not present in your spreadsheet. For example, if you have no additional metadata associated with each data element, you can delete the following line.
<metadata>EnterColumnHeaderHere</metadata>
Ensure your spreadsheet columns follow the order of the XML file e.g. Data Class, Data Class Description Data Element, Data Element Description etc.
On the “Import a new data model with Structural Metadata (Excel)” page , upload your Excel and XML files and then click Submit.
Worked Examples
Example 1 (Simple)
This simplest possible spreadsheet to import is a single column of data elements:
Data Element |
---|
NHS Number |
Date of Test |
Morphology |
The corresponding XML mapping file is as follows:
<?xml version="1.0" encoding="UTF-8"?> <headersMap xmlns="https://schemas.metadata.works/headersMap" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://schemas.metadata.works/headersMap https://schemas.metadata.works/headersMap.xsd"> <dataElementName>Data Element</dataElementName> </headersMap>
This would create a data model with 3 data elements and nothing else.
Files:
Example 2 (Complex)
Most models are more complex - for example the default structure used for most structural metadata uploads in the onboarding wizard is this:
Table Name | Table Description | Column Name | Column Description | Data Type | Sensitive |
---|---|---|---|---|---|
Summary | The summary | Identifier | The identifier | Integer | FALSE |
Summary | The summary | Title | The title | String | FALSE |
Summary | The summary | Description | The description | String | TRUE |
Required | The required elements | Release Date | The release date | Date | FALSE |
Required | The required elements | Data Controller | The controller | String | FALSE |
Required | The required elements | Data Processor | The data processor | String | TRUE |
Required | The required elements | License | The license | String | FALSE |
The corresponding XML mapping file is as follows:
<?xml version="1.0" encoding="UTF-8"?> <headersMap xmlns="https://schemas.metadata.works/headersMap" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://schemas.metadata.works/headersMap https://schemas.metadata.works/headersMap.xsd"> <dataClassName>Table Name</dataClassName> <dataClassDescription>Table Description</dataClassDescription> <dataElementName>Column Name</dataElementName> <dataElementDescription>Column Description</dataElementDescription> <dataTypeName>Data Type</dataTypeName> <metadata>Sensitive</metadata> </headersMap>
This would create a data model with two classes (Summary and Required) with three and four data elements respectively. It would also create three data types (Integer, String and Date) and each data element would have a single metadata item with key Sensitive and a value of TRUE or FALSE.
Files:
Example 3 (Advanced)
Here is an example of a more advanced set of structural metadata with nested classes, rules and enumerations:
Class | Element | Type | Rule | Enums |
---|---|---|---|---|
Registration/Patient | Laterality | LateralityType | L|Left | |
Registration/Patient | Multifocal | yesnonank | N|No | |
Registration/Sample | SampleId | String | x ==~ /\d{9}/ | |
Registration/Sample | RackWell | rackWell | x ==~ /[A-H][1-9]|[A-H][1][0-2]/ |
This XML mapping file was generated to import the following spreadsheet:
<?xml version="1.0" encoding="UTF-8"?> <headersMap xmlns="https://schemas.metadata.works/headersMap" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://schemas.metadata.works/headersMap https://schemas.metadata.works/headersMap.xsd"> <dataClassName separator="/">Class</dataClassName> <dataElementName>Element</dataElementName> <dataTypeName>Type</dataTypeName> <dataTypeRule>Rule</dataTypeRule> <enumerations separator="\n" keyValueSeparator="|">Enums</enumerations> </headersMap>
Files:
0 Comments