How to Create an XML Mapping FIle

Introduction

When you use the importer to import structural metadata in a custom excel spreadsheet format, you will need to provide an XML mapping file alongside your excel file. This XML file tells the importer how to map the data in each column to the various catalogue elements that are being created from your excel file.

The importer can handle the following catalogue elements:

  • Data Classes

  • Data Elements (*Mandatory)

  • Data Types

  • Measurement Units

  • Business Rules

As well as the following properties:

  • Metadata (key value pairs)

  • Multiplicity (mandatory or optional)

Each row in your Excel spreadsheet defines a single Data Element which may be part of a group of data element, called a Data Class. As well as the id, name and description, a Data Element may have metadata (also known as extensions or key-value pairs) and Business Rules. It may also have its multiplicity (whether mandatory or not) specified.

Mapping Catalogue Elements to XML Elements

The following table explains each of the XML elements and how they map to the catalogue elements:

Catalogue Element

XML Element Name

XML Attributes

Description

Notes

Catalogue Element

XML Element Name

XML Attributes

Description

Notes

Data Class

dataClassID

 

The model catalogue id of this data class

 

Data Class

dataClassName*

separator

The data class name. If separator is specified, it is a path name allowing nested classes

Recommended

Data Class

dataClassDescription

 

The data class description

Recommended

Data Element

dataElementID

 

The model catalogue id of this data element

 

Data Element

dataElementName*

 

The data element name

Mandatory - must be included in your spreadsheet

Data Element

dataElementDescription

 

The data element description

Recommended

Data Element

dataElementMultiplicity

 

The data element multiplicity (not yet implemented).

 

Data Type

dataTypeID

 

The model catalogue id of this data type

 

Data Type

dataTypeName*

 

The data type name

Recommended

Data Type

dataTypeDescription

 

The data type description

Optional

Data Type

dataTypeRule

 

The data type rule (usually a Groovy expression like this:

x ==~ /[a-zA-Z0-9=]*/

Optional

Data Type

dataTypeEnums

 

Not used at present

Optional

Measurement Unit

measurementUnitID

 

The model catalogue id of this measurement unit

 

Measurement Unit

measurementUnitName*

 

The measurement unit name

Recommended (if available/relevant)

Measurement Unit

measurementUnitSymbol

 

The measurement unit symbol

Optional

Business Rule

validationRules

separator

This sets the description and name for each of a list of business rules (the names are truncated to 255 characters)

Optional

Data Element

mandatoryRule

 

Specifies if this data element is mandatory or optional (any value means it is mandatory)

Optional

Data Element

metadata

separator

keyValueSeparator

See section below: Multiple Metadata and Enumeration Definitions

Recommended (if available/relevant)

Enumerated Type

enumerations

separator

keyValueSeparator

See section below: Multiple Metadata and Enumeration Definitions

Recommended (if available/relevant)

 

headerRow

 

Specifies the row with the header names that are specified in the XML file (first row is 1). If not specified, the default is 1

 

 

dataStartRow

 

Specifies the first row with a data element definition. If not specified, the default is 2

 

  • Data Element name is the only mandatory catalogue element.

  • Other recommended catalogue elements have also been noted in the table.

  • Generally ID fields are also left blank as these will be generated by the system.

  • Each XML element can only appear once with the sole exception being metadata which can appear in multiple columns (as long as each column name is unique).


Steps

  1. Firstly, prepare your excel file for import. Ensure the Excel file you wish to import has a header row at the top with each column with a unique header name that is used to identify that column. It’s also important to check that any elements which are reused throughout the spreadsheet i.e. data type names are spelled the same to avoid creating duplicate catalogue elements.

Checklist:

Do I have a header row with unique names?
Have I included a column of Data Element names?
Have I checked the catalogue element names for correct spelling?
Have I checked the separators / key value separators for correct formatting? (if used)
  1. Next, open up your XML mapping file. A blank template XML file has been created here:
    (Alternatively it can be generated from the Metadata XML Schema found on the importer pages: )

  2. It is now time to map the header row to each of the catalogue elements/properties. Simply enter the column header names into the correct sections. In this example file they have been labelled as EnterColumnHeaderHere. Simply replace this text with the name of your column. For example : <dataClassID>EnterColumnHeaderHere</dataClassID> → <dataClassID>Data_Class</dataClassID>

  3. If you have used enumerations or multiple metadata definitions, remember to also specify the separator you have used. See section below for further details: How to Create an XML Mapping FIle | Advanced: Specifying Multiple Metadata and Enumeration Definitions in a Single ...

  4. Remove any sections that are not present in your spreadsheet. For example, if you have no additional metadata associated with each data element, you can delete the following line.

<metadata>EnterColumnHeaderHere</metadata>
  1. Ensure your excel spreadsheet columns follow the order of the XML file e.g. Data Class, Data Class Description Data Element, Data Element Description etc.

  2. On the Import a new data model with Structural Metadata (Excel) page , upload your Excel and XML files and then click Submit.


Advanced: Specifying Multiple Metadata and Enumeration Definitions in a Single Cell

It is possible to specify multiple key-value pairs for metadata and enumerations in a single spreadsheet cell by using the separator and keyValueSeparatorattributes. For example, this entry in the XML mapping:

<enumerations separator="\n" keyValueSeparator="|">Enumerations</enumerations>

would allow a definition like this:

0|Not known
1|Male
2|Female
9|Not specified

Similarly for metadata. It also allows arbitrary metadata keys whereas normally each key must be specified in the XML mapping file. Enumerations can just have keys in which case there is no keyValueSeparator:

<enumerations separator="\n">Enumerations</enumerations>

would allow this sort of definition:

yes
no
unknown

Metadata must always have both a key and value so must have both separators (or neither).

If a metadata entry has no separators, it (the column name) is used as the metadata key and the value is the value in the cell.

The template XML file contains each variation of enumeration (with a separator or with a key value separator) and metadata (without a separator, with a separator, with a keyvalue separator) of XML mapping , please only fill out the relevant lines.


Worked Examples

Example 1 (Simple)

This simplest possible spreadsheet to import is a single column of data elements:

Data Element

Data Element

NHS Number

Date of Test

Morphology

The corresponding XML mapping file is as follows:

<?xml version="1.0" encoding="UTF-8"?> <headersMap xmlns="https://schemas.metadata.works/headersMap" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://schemas.metadata.works/headersMap https://schemas.metadata.works/headersMap.xsd"> <dataElementName>Data Element</dataElementName> </headersMap>

This would create a data model with 3 data elements and nothing else.

Files:


Example 2 (Complex)

Most models are more complex - for example the default structure used for most structural metadata uploads in the onboarding wizard is this:

Table Name

Table Description

Column Name

Column Description

Data Type

Sensitive

Table Name

Table Description

Column Name

Column Description

Data Type

Sensitive

Summary

The summary

Identifier

The identifier

Integer

FALSE

Summary

The summary

Title

The title

String

FALSE

Summary

The summary

Description

The description

String

TRUE

Required

The required elements

Release Date

The release date

Date

FALSE

Required

The required elements

Data Controller

The controller

String

FALSE

Required

The required elements

Data Processor

The data processor

String

TRUE

Required

The required elements

License

The license

String

FALSE

The corresponding XML mapping file is as follows:

<?xml version="1.0" encoding="UTF-8"?> <headersMap xmlns="https://schemas.metadata.works/headersMap" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://schemas.metadata.works/headersMap https://schemas.metadata.works/headersMap.xsd"> <dataClassName>Table Name</dataClassName> <dataClassDescription>Table Description</dataClassDescription> <dataElementName>Column Name</dataElementName> <dataElementDescription>Column Description</dataElementDescription> <dataTypeName>Data Type</dataTypeName> <metadata>Sensitive</metadata> </headersMap>

This would create a data model with two classes (Summary and Required) with three and four data elements respectively. It would also create three data types (Integer, String and Date) and each data element would have a single metadata item with key Sensitive and a value of TRUE or FALSE.

Files:


Example 3 (Advanced)

Here is an example of a more advanced set of structural metadata with nested classes, rules and enumerations:

Class

Element

Type

Rule

Enums

Class

Element

Type

Rule

Enums

Registration/Patient

Laterality

LateralityType

 

L|Left
R|Right
M|Midline
B|Bilateral
8|Not applicable
9|Not known

Registration/Patient

Multifocal

yesnonank

 

N|No
Y|Yes
8|Not applicable
9|Not known

Registration/Sample

SampleId

String

x ==~ /\d{9}/

 

Registration/Sample

RackWell

rackWell

x ==~ /[A-H][1-9]|[A-H][1][0-2]/

 

This XML mapping file was generated to import the following spreadsheet:

<?xml version="1.0" encoding="UTF-8"?> <headersMap xmlns="https://schemas.metadata.works/headersMap" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://schemas.metadata.works/headersMap https://schemas.metadata.works/headersMap.xsd"> <dataClassName separator="/">Class</dataClassName> <dataElementName>Element</dataElementName> <dataTypeName>Type</dataTypeName> <dataTypeRule>Rule</dataTypeRule> <enumerations separator="\n" keyValueSeparator="|">Enums</enumerations> </headersMap>

 

Files: