The challenge
Many companies and research institutions struggle to integrate their research data. Data lives in siloed systems and vendor solutions and is scattered across departments. In addition, experimental results are communicated via email, SharePoint, or PowerPoint presentations, and experimental data often does not conform to any shared standard. This is a missed opportunity: annotating and integrating data enables scientists to answer broader research questions and therefore increases the value of their data.
The Hyve helped solve this challenge for a global, top-10 pharmaceutical company with drug discovery and development programs in several therapeutic areas. Specifically, we built a semantic model and knowledge graph for their Immuno-Oncology & Cell Therapy (IOCT) domain.
How we solved it
We tackled the data-integration issue by first creating a semantic model that captures the IOCT research and business domain. This then served as a foundation for generating a knowledge graph, using data from different systems within the company.
We collaborated with the customer’s key stakeholders to make an inventory of the data sources that needed to be mapped to the semantic model. The model uses both public domain ontologies (such as OBI and BFO) and customer-specific ontologies. This phase of the project involved investigating relevant use cases, systems, and data. We also held regular feedback sessions.
After creating the semantic model, we mapped the data from the different sources to entities in the model to create the knowledge graph. The Hyve evaluated several strategies and tools for the extract, transform, and load (ETL) process that populates the knowledge graph. The graph was then deployed on the client’s internal infrastructure, where it can be browsed, searched, and queried.
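The mapping step can be illustrated with a minimal Python sketch. It is not the tooling used in the project, which is not described here. The namespace, source systems, column names, and model properties (EX, lims, eln, measuredBy, derivedFrom) are all hypothetical. The sketch only shows the general idea: each source gets a mapping from its own columns to entities and properties in the shared model, and the resulting facts are merged into one graph.

```python
# Hypothetical namespace; the real semantic model is not shown in this text.
EX = "https://example.org/ioct/"

# Illustrative records from two made-up source systems with different column names.
lims_rows = [{"sample_id": "S1", "assay": "flow cytometry"}]
eln_rows = [{"SampleRef": "S1", "Donor": "D7"}]

# One mapping per source: which column identifies the entity, and which
# model property each remaining column corresponds to (all names assumed).
MAPPINGS = {
    "lims": {"id_column": "sample_id", "entity_class": "Sample",
             "properties": {"assay": "measuredBy"}},
    "eln": {"id_column": "SampleRef", "entity_class": "Sample",
            "properties": {"Donor": "derivedFrom"}},
}


def rows_to_triples(source, rows):
    """Transform source rows into (subject, predicate, object) triples."""
    mapping = MAPPINGS[source]
    triples = set()
    for row in rows:
        subject = EX + mapping["entity_class"] + "/" + row[mapping["id_column"]]
        triples.add((subject, "rdf:type", EX + mapping["entity_class"]))
        for column, prop in mapping["properties"].items():
            value = row.get(column)
            if value is not None:  # skip missing values instead of emitting empty facts
                triples.add((subject, EX + prop, value))
    return triples


# "Load" step: merge the triples from every source into one graph (a set here).
graph = rows_to_triples("lims", lims_rows) | rows_to_triples("eln", eln_rows)
```

Because both sources resolve sample S1 to the same identifier, facts that were recorded in separate systems end up attached to a single entity in the graph.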
The outcome
In this project, we delivered a semantic model that builds on public domain ontologies and aligns with other semantic models that had previously been developed for the client. The model gives our client a stable representation of the entities and procedures used, because it does not depend on structures dictated by vendor applications. Building the semantic model also gave the client a clear picture of what information was missing from the different systems, information that is needed to understand and integrate their research data more thoroughly. The model now serves as a reference for modelling newly generated research data and is used to integrate research data assets into the company’s enterprise knowledge graph.
Once the semantic model was created and the knowledge graph populated with data, queries based on the proposed use cases could be run. These queries demonstrated that end-to-end use cases can be answered using the knowledge graph. The knowledge graph transcends research domains and departments and is therefore a major step forward in the customer’s ongoing efforts to unlock research data from siloed systems.
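To make the idea of an end-to-end query concrete, here is a small Python sketch over a toy graph. The data, the namespace, and the property names (derivedFrom, measuredBy) are assumptions for illustration only. A production knowledge graph would more likely be queried with a graph query language such as SPARQL, and the text does not say which was used. The example answers one question that spans two kinds of facts: which assays were run on samples from a given donor?

```python
EX = "https://example.org/ioct/"  # hypothetical namespace

# Toy graph of (subject, predicate, object) triples; all values are made up.
graph = {
    (EX + "Sample/S1", EX + "derivedFrom", "D7"),
    (EX + "Sample/S1", EX + "measuredBy", "flow cytometry"),
    (EX + "Sample/S2", EX + "derivedFrom", "D9"),
    (EX + "Sample/S2", EX + "measuredBy", "ELISA"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that match the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def assays_for_donor(graph, donor):
    """Join two patterns: donor -> samples, then samples -> assays."""
    samples = {s for s, _, _ in match(graph, p=EX + "derivedFrom", o=donor)}
    return sorted(o for s in samples
                  for _, _, o in match(graph, s=s, p=EX + "measuredBy"))


print(assays_for_donor(graph, "D7"))
```

The join works only because donor and assay facts refer to the same sample identifier, which is what integrating the sources into one graph provides.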