Data integration is key to enabling applications in different departments to interoperate. One of the first problems is that different systems often represent the same concept in different ways, so transferring data about that concept from system A to system B can run into difficulties.

Consider the example below, where two systems (in this case Oncology and Haematology) both have data elements for a patient's sex and for whether that patient is subject to a multidisciplinary team discussion (MDTD).

The Oncology people record the patient's sex as a string containing either 'MALE' or 'FEMALE'. The Haematology people use the characters 'M', 'F' or 'U'.

The Oncology people record the multidisciplinary team discussion as a string, which can be 'yes' or 'no'.

However, when just these two small data elements are moved to the Haematology department, a problem appears, because the Haematology people record the same concepts differently.

This means that the data being moved across has to be transformed: the string 'MALE' or 'FEMALE' has to become 'M' or 'F', and the MDTD indicator string needs to be converted to an integer with a value of 1 or 0.
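For two elements, the conversion can be written as a pair of lookup tables. The following is a minimal Python sketch, not either department's actual code: the record keys `sex` and `mdtd` and the function name `oncology_to_haematology` are hypothetical, and the value sets are taken from the example above.

```python
# Hypothetical mappings based on the example: Oncology value -> Haematology value.
ONCOLOGY_TO_HAEMATOLOGY_SEX = {"MALE": "M", "FEMALE": "F"}
ONCOLOGY_TO_HAEMATOLOGY_MDTD = {"yes": 1, "no": 0}


def oncology_to_haematology(record: dict) -> dict:
    """Convert one Oncology record (hypothetical keys) to the Haematology form.

    Raises KeyError if a value falls outside the expected code set,
    rather than silently guessing a translation.
    """
    return {
        "sex": ONCOLOGY_TO_HAEMATOLOGY_SEX[record["sex"]],
        "mdtd": ONCOLOGY_TO_HAEMATOLOGY_MDTD[record["mdtd"]],
    }


print(oncology_to_haematology({"sex": "FEMALE", "mdtd": "yes"}))
# {'sex': 'F', 'mdtd': 1}
```

Failing loudly on an unexpected value is a design choice here; a real transfer program would more likely log and report such records.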

In this case the problem is small, as we are only talking about two data elements; we can see the mismatch, and a good programmer could code a solution in a few minutes.

But what happens if we have 300 data elements in one system and 6,500 data elements in the other?

It may be that some data elements overlap, while others lose information when they are transferred.

If we transfer the information about patient sex from Haematology to Oncology, what happens to the patients who have been classified as 'U'? Do they appear as 'MALE' or 'FEMALE' in Oncology?
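The reverse direction shows where information is lost. This minimal sketch, again using hypothetical names, translates Haematology sex codes into Oncology values and returns `None` for any code with no Oncology equivalent, listing those positions so they can be reviewed rather than guessed.

```python
from typing import Optional

# Hypothetical mapping: Haematology code -> Oncology value.
# 'U' deliberately has no entry, because Oncology only allows 'MALE' or 'FEMALE'.
HAEMATOLOGY_TO_ONCOLOGY_SEX = {"M": "MALE", "F": "FEMALE"}


def translate_sex_codes(codes: list) -> tuple:
    """Translate Haematology codes, keeping None where no mapping exists.

    Returns the translated list (same order as the input) and the
    positions of the codes that could not be mapped.
    """
    translated: list[Optional[str]] = [
        HAEMATOLOGY_TO_ONCOLOGY_SEX.get(code) for code in codes
    ]
    unmapped = [i for i, value in enumerate(translated) if value is None]
    return translated, unmapped


values, problems = translate_sex_codes(["M", "U", "F", "U"])
print(values)    # ['MALE', None, 'FEMALE', None]
print(problems)  # [1, 3]
```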

It can take months to write a program to transfer the data from one system to another.

And then the people who produced the Oncology department's system update their program and change 50 data elements...

One solution is to use a metadata registry, which stores metadata about all the data elements in each system.

Different versions of datasets can be registered and data elements can then be linked and compared easily across differing systems.
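To make the registry idea more concrete, here is a toy in-memory sketch in Python. It is an illustration under stated assumptions, not a description of any particular registry product. Each data element records its local name, the shared concept it is linked to, its datatype and its permitted values. The `compare` method reports concepts whose representation differs between two registered datasets. The class names, element names and version numbers ('1.0', '2.3') are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataElement:
    """Metadata about one data element in one version of a dataset."""
    name: str
    concept: str              # shared concept the element is linked to
    datatype: str
    permitted_values: tuple = ()

    def representation(self) -> tuple:
        # How the concept is physically recorded, ignoring the local name.
        return (self.datatype, self.permitted_values)


@dataclass
class MetadataRegistry:
    """Toy in-memory registry keyed by (dataset name, version)."""
    datasets: dict = field(default_factory=dict)

    def register(self, dataset: str, version: str, elements: list) -> None:
        self.datasets[(dataset, version)] = {e.concept: e for e in elements}

    def compare(self, left: tuple, right: tuple) -> dict:
        """Report, per concept, how two registered datasets differ."""
        a, b = self.datasets[left], self.datasets[right]
        report = {}
        for concept in sorted(a.keys() | b.keys()):
            if concept not in a or concept not in b:
                report[concept] = "only in one dataset"
            elif a[concept].representation() != b[concept].representation():
                report[concept] = (a[concept].representation(),
                                   b[concept].representation())
        return report


# Hypothetical dataset versions and element names, for illustration only.
registry = MetadataRegistry()
registry.register("Oncology", "1.0", [
    DataElement("PatientSex", "patient sex", "string", ("MALE", "FEMALE")),
    DataElement("MDTDiscussed", "MDTD indicator", "string", ("yes", "no")),
])
registry.register("Haematology", "2.3", [
    DataElement("Sex", "patient sex", "string", ("M", "F", "U")),
    DataElement("MDTD", "MDTD indicator", "integer", (1, 0)),
])
report = registry.compare(("Oncology", "1.0"), ("Haematology", "2.3"))
for concept, difference in report.items():
    print(concept, difference)
```

Linking elements through a shared concept, rather than matching on local names, is what lets 'PatientSex' and 'Sex' be recognised as the same thing.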

Differences between datasets can then be identified easily, and decisions made about how to solve the data integration problem.
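The same metadata also makes version changes visible, for example when the Oncology supplier alters data elements in an update. This sketch assumes each version of a dataset is stored as a mapping from element name to a (datatype, permitted values) pair. The function name `diff_versions` is hypothetical, and the second Oncology version with its new `MDTDate` element is invented for illustration.

```python
def diff_versions(old: dict, new: dict) -> dict:
    """Compare two versions of one dataset.

    Each argument maps a data element name to its definition, here a
    (datatype, permitted values) tuple.
    """
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in old.keys() & new.keys()
                          if old[name] != new[name]),
    }


# Hypothetical versions of the Oncology dataset.
oncology_v1 = {
    "PatientSex": ("string", ("MALE", "FEMALE")),
    "MDTDiscussed": ("string", ("yes", "no")),
}
oncology_v2 = {
    "PatientSex": ("string", ("MALE", "FEMALE", "UNKNOWN")),
    "MDTDiscussed": ("string", ("yes", "no")),
    "MDTDate": ("date", ()),
}
print(diff_versions(oncology_v1, oncology_v2))
# {'added': ['MDTDate'], 'removed': [], 'changed': ['PatientSex']}
```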

As a result, it becomes relatively easy to make decisions about the differing datasets being used in different systems.