Genomics England

Cancer Database Discovery

Genomics England embarked on a discovery phase to provide researchers with a new database for interrogating and exploring data collected across the 100,000 Genomes Project and beyond.


Problem:

Genomics England, like all organisations processing heterogeneous data, faced the challenge of ensuring that data was consistent and standardised while also providing linked, anonymised data for research and analysis. They engaged us to evaluate a number of technologies and common data models that could be used to store data from the 100,000 Genomes Project and beyond.

The users include 3,500 researchers who want access to the anonymised data, and a technical team of database administrators and developers who curate it. The curators need to link and extract data for analysis and management information. They also provide core data for the research environment, which means anonymising, extracting, uploading and processing data for use by analysts.


The high-level business drivers included:

  • Consistently understand, exchange and share data from multiple sources
  • Reduce the barriers to reuse and support collaborative working
  • Facilitate information integration
  • Leverage standardised clinical terminology
  • Enable alignment with related external data modelling initiatives


Solution:

We worked with Genomics England and a third-party contractor to evaluate a number of technologies and models, and to complete a discovery phase that migrated data from various repositories into the OMOP Common Data Model maintained by OHDSI.

In the discovery phase we:

  • Interviewed users and stakeholders
  • Created wireframes, designed the ETL and built populated databases
  • Used open-source tools to meet the agreed requirements

To address these problems, we deployed our open-source metadata registry, the Metadata Exchange, which allowed us to evaluate the candidate target data models against the following high-level criteria:

  • Data Element Completeness: all data elements within the scope of the project are accommodated
  • Domain Completeness: all project domains are accommodated
  • Integrity: the extent to which associations match the specific project needs
  • Flexibility: ease of adapting to changes
  • Simplicity: ease of querying for cohort identification
  • Integration: the extent to which ontologies and controlled terminologies are supported
  • Efficacy: the extent to which specified use cases are supported


Outcome:

The result was a robust set of user requirements, a populated database and a complete evaluation of the discovery phase based on a real end product. The project was completed under budget, and Genomics England is now moving forward to the next phase.


Why Parity and MetadataWorks

  • Experience with PHE NCRAS data as part of the Genomics England dataset
  • Recently completed cancer database discovery project
  • World-leading metadata management tools
  • [parity]


Quote:

“When we started our work on the 100,000 Genomes Project, the complexity of managing data across so many different sites made it clear that our traditional methods for managing data definitions and processes were not going to work. Metadata Consulting have helped to ensure consistent and standardised data quality and processes for the 100,000 Genomes Project, supporting genomic analysis for participants.”

Amanda O’Neill,

Director of Clinical Data, Genomics England