Quality of the source data
Understanding source data is crucial when selecting the appropriate end goals.
To check if the source data is fit for purpose, we investigate whether the source tables comply with the input format of the ETL.
Data quality assessment is an integral part of data conversion that ensures smooth collaborations in the life sciences, whether for the purpose of in-house or federated data analytics. The Hyve can help you verify the quality of data after each instance of data conversion into the OMOP Common Data Model (CDM) because of our extensive knowledge of and experience with the OHDSI data quality assessment tools. By following this approach, we can guarantee the highest quality of analysis-ready CDM-harmonized data.
Carefully designing the transformation process, including syntactic and semantic mapping and Extract, Transform, Load (ETL) pipeline development, is the next essential step toward achieving high data quality in the target CDM. We develop end-to-end tests to check whether the transformation rules have been implemented correctly in the ETL pipelines.
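As an illustration of what such an end-to-end test can look like, here is a minimal Python sketch. It is not The Hyve's actual test framework: the source columns (patient_id, sex, birth_date) and the functions transform_patient and run_etl are hypothetical. Only the target field names and the gender concept IDs (8507 for male, 8532 for female, 0 for unmapped) follow the OMOP CDM.

```python
# Hypothetical mapping rule: source sex codes to OMOP standard gender concepts.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def transform_patient(row):
    """Map one (hypothetical) source patient row to an OMOP person record."""
    sex = row["sex"].strip().upper()
    return {
        "person_id": int(row["patient_id"]),
        # Unmapped codes fall back to concept 0, the OMOP convention for "no match".
        "gender_concept_id": GENDER_CONCEPTS.get(sex, 0),
        "year_of_birth": int(row["birth_date"][:4]),
        "gender_source_value": row["sex"],
    }


def run_etl(source_rows):
    """Stand-in for the full pipeline: transform every source row."""
    return [transform_patient(row) for row in source_rows]


def test_person_mapping():
    """Feed known source rows through the ETL and compare with expected output."""
    source = [
        {"patient_id": "1", "sex": "M", "birth_date": "1980-05-17"},
        {"patient_id": "2", "sex": "f", "birth_date": "1992-11-02"},
        {"patient_id": "3", "sex": "X", "birth_date": "2001-01-30"},
    ]
    person = run_etl(source)
    assert len(person) == len(source)  # no rows lost or duplicated
    assert [p["gender_concept_id"] for p in person] == [8507, 8532, 0]
    assert [p["year_of_birth"] for p in person] == [1980, 1992, 2001]
    assert person[1]["gender_source_value"] == "f"  # source value kept verbatim


if __name__ == "__main__":
    test_person_mapping()
    print("all mapping tests passed")
```

The idea is that each transformation rule gets at least one small, hand-checked input whose expected output is written down in advance, so a regression in the pipeline shows up as a failed assertion rather than as a surprise in the converted data.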
After the data have been converted to the OMOP CDM and the ETL has been tested, we proceed to check the quality of the harmonized data. We leverage OHDSI data quality tools, such as the Data Quality Dashboard (DQD) and Achilles, to assess the quality of mapped data.
We apply a multitude of checks to compare the source and mapped data. These can be simple sanity checks, for example whether the number of patients in the mapped data matches that of the source data, or more elaborate inquiries, such as checking whether extreme cases have been detected.
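The two kinds of checks mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source table name source_patients and its patient_id column are hypothetical, the plausible birth-year range is an arbitrary choice, and an in-memory SQLite database stands in for the real source and CDM databases. The person table with person_id and year_of_birth follows the OMOP CDM.

```python
import sqlite3


def count_distinct(conn, table, id_column):
    """Count distinct identifiers in a table (names come from trusted config)."""
    query = f"SELECT COUNT(DISTINCT {id_column}) FROM {table}"
    return conn.execute(query).fetchone()[0]


def check_patient_counts(conn, source_table="source_patients", source_id="patient_id"):
    """Sanity check: does the mapped person count match the source patient count?"""
    source_n = count_distinct(conn, source_table, source_id)
    mapped_n = count_distinct(conn, "person", "person_id")
    return {"source": source_n, "mapped": mapped_n, "match": source_n == mapped_n}


def find_implausible_birth_years(conn, min_year=1900, max_year=2025):
    """Extreme-case check: list persons whose year of birth is out of range."""
    query = (
        "SELECT person_id, year_of_birth FROM person "
        "WHERE year_of_birth < ? OR year_of_birth > ? ORDER BY person_id"
    )
    return conn.execute(query, (min_year, max_year)).fetchall()


def build_demo_db():
    """Create a tiny in-memory database with a source table and a person table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_patients (patient_id INTEGER, birth_date TEXT)")
    conn.execute("CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)")
    conn.executemany(
        "INSERT INTO source_patients VALUES (?, ?)",
        [(1, "1980-05-17"), (2, "1992-11-02"), (3, "1850-01-30")],
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(1, 1980), (2, 1992), (3, 1850)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(check_patient_counts(demo))
    print(find_implausible_birth_years(demo))
```

In this demo the counts match, while the person born in 1850 is flagged for review. A flagged row is not necessarily a mapping error; it may equally reflect an implausible value already present in the source.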
DARWIN EU® is a federated network of high-quality healthcare databases across Europe. The network provides expertise and services to support regulatory decision-making throughout the lifecycle of a pharmaceutical product. The Hyve, as a subcontractor of the DARWIN EU® Coordination Centre, is responsible for data quality assurance. Our OMOP experts support data partners joining the network and ensure their data is of high quality by reviewing the outputs of the Data Quality Dashboard (DQD) and the CDM onboarding report provided by the data partner candidates.
For harmonized data to be usable for a united effort, all involved stakeholders need to understand and trust the quality of the data. The concern is that poor data quality (garbage in) will inevitably result in poor research results (garbage out). Therefore, it is advisable to check the quality of your data prior to running a study.
Essentially, assessing data quality always requires understanding the use case or research question, in order to judge whether the data is fit for purpose: that is, whether it can answer your research questions given your expectations of coverage, validity and reliability.
We use the Data Quality Dashboard (DQD) and Achilles. Source data cannot be accessed through these tools, so sensitive patient information remains anonymous.
Additionally, we use Rabbit in a Hat (RiaH), an OHDSI tool maintained by The Hyve, which provides a clear interface for working with the mappings between the source and target tables.
The security of your data is of the highest priority for us. As an ISO 27001 certified company, The Hyve will ensure your sensitive data remains safe.