Building a knowledge graph unlocking immuno-oncology and cell therapy data

04-08-2021

The challenge

Many companies and research institutions struggle to integrate their research data: it lives in siloed systems and vendor solutions, and is scattered across departments. In addition, experimental results are communicated via email, SharePoint, or PowerPoint presentations, and experimental data often does not conform to a shared standard. This is a missed opportunity, because annotating and integrating data enables scientists to answer broader research questions and therefore increases the value of their data.

The Hyve has helped to solve this challenge for a global, top-10 pharmaceutical company with drug discovery and development programs in several therapeutic areas. Specifically, we built a semantic model and knowledge graph for their Immuno-Oncology & Cell Therapy (IOCT) domain.

How we solved it

We tackled the data-integration issue by first creating a semantic model that captures the IOCT research and business domain. This then served as a foundation for generating a knowledge graph, using data from different systems within the company.

Figure 1. Schematic representation of the process to create a knowledge graph

We collaborated with the customer’s key stakeholders to make an inventory of the data sources that needed to be mapped to the semantic model. The model uses both public domain ontologies (such as OBI and BFO) and customer-specific ontologies. This phase of the project involved investigating relevant use cases, systems, and data, with regular feedback sessions throughout.

Figure 2. Example of the semantic representation (left) of a CAR (chimeric antigen receptor) synthesis (right).

After creating the semantic model, we mapped the data from different sources to entities in the model to create the knowledge graph. The Hyve evaluated several strategies and tools for the extract, transform and load (ETL) process that populates this knowledge graph. The graph was then deployed on the client’s internal infrastructure, where it can be browsed, searched, and queried.
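As a rough illustration of what mapping source data to model entities can look like, the minimal sketch below turns rows from two separate systems into subject-predicate-object triples in a single graph. Everything in it is an assumption made for illustration: the two source systems, the field names, and the `ex:` vocabulary are hypothetical and are not the client’s model or the tooling used in the project.

```python
# Illustrative only: the source systems, field names, and the "ex:" vocabulary
# below are hypothetical and are not taken from the client project.

# Rows as they might be exported from two separate (hypothetical) systems.
ELN_ROWS = [
    {"construct_id": "CAR-001", "target_antigen": "CD19"},
    {"construct_id": "CAR-002", "target_antigen": "BCMA"},
]
ASSAY_ROWS = [
    {"sample": "CAR-001", "assay": "cytotoxicity", "result": 0.82},
    {"sample": "CAR-002", "assay": "cytotoxicity", "result": 0.64},
]


def map_construct(row):
    """Map one construct row to triples describing a CAR construct."""
    subject = f"ex:construct/{row['construct_id']}"
    return {
        (subject, "rdf:type", "ex:CARConstruct"),
        (subject, "ex:targetsAntigen", f"ex:antigen/{row['target_antigen']}"),
    }


def map_assay(index, row):
    """Map one assay row to triples, linking it to the construct it measured."""
    subject = f"ex:assay/{index}"
    return {
        (subject, "rdf:type", "ex:Assay"),
        (subject, "ex:assayType", row["assay"]),
        (subject, "ex:hasInput", f"ex:construct/{row['sample']}"),
        (subject, "ex:hasResult", row["result"]),
    }


def build_graph(eln_rows, assay_rows):
    """Extract rows, transform them to triples, and load them into one set."""
    graph = set()
    for row in eln_rows:
        graph |= map_construct(row)
    for index, row in enumerate(assay_rows):
        graph |= map_assay(index, row)
    return graph


graph = build_graph(ELN_ROWS, ASSAY_ROWS)
```

The point of the sketch is that both systems end up using the same identifier for a construct, so records that were previously only related by convention become explicitly linked in the graph.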

Figure 3. Application centric view (left) versus domain centric view (right) of the data

The outcome

In this project, we delivered a semantic model that builds on public domain ontologies and aligns with other semantic models previously developed for the client. The model gives our client a stable representation of the entities and procedures used, because it does not depend on structures dictated by vendor applications. Building the semantic model also gave the client a clear picture of what information was missing from the different systems and would be needed to understand and integrate their research data more thoroughly. The model now serves as a reference for modelling newly generated research data and is used to integrate research data assets into the company’s enterprise knowledge graph.

Once the semantic model was created and the knowledge graph populated with data, queries based on the proposed use cases could be run. This demonstrated that end-to-end use cases can be answered using the knowledge graph. The knowledge graph transcends research domains and departments and is therefore a major step forward in the customer’s ongoing efforts to unlock research data from siloed systems.
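To give a feel for what an end-to-end query means here, the sketch below answers a question that spans two kinds of records by following links in a small set of triples. The triples, the `ex:` vocabulary, and the example question are hypothetical and only illustrate the idea. A production knowledge graph would normally be queried with a graph query language rather than hand-written Python.

```python
# Illustrative only: the triples and the "ex:" vocabulary are hypothetical.
TRIPLES = {
    ("ex:construct/CAR-001", "ex:targetsAntigen", "ex:antigen/CD19"),
    ("ex:construct/CAR-002", "ex:targetsAntigen", "ex:antigen/BCMA"),
    ("ex:assay/0", "ex:hasInput", "ex:construct/CAR-001"),
    ("ex:assay/0", "ex:hasResult", 0.82),
    ("ex:assay/1", "ex:hasInput", "ex:construct/CAR-002"),
    ("ex:assay/1", "ex:hasResult", 0.64),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def antigens_with_result_above(triples, threshold):
    """Follow assay -> construct -> antigen for assays above a threshold."""
    antigens = set()
    for assay, _, result in match(triples, p="ex:hasResult"):
        if result <= threshold:
            continue
        for _, _, construct in match(triples, s=assay, p="ex:hasInput"):
            for _, _, antigen in match(triples, s=construct, p="ex:targetsAntigen"):
                antigens.add(antigen)
    return antigens


# Hypothetical question: which antigens are targeted by constructs whose
# assay result exceeded 0.7?
hits = antigens_with_result_above(TRIPLES, 0.7)
```

Answering the same question from siloed systems would require exporting and joining the assay and construct records by hand. In the graph, it is a traversal over relationships that are already explicit.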

Knowledge Graphs

Having all data in a structured, queryable model stimulates internal and external collaboration and speeds up research and development processes. It also enables maximum (re-)use of all available data.

With a knowledge graph, data integration is simplified because meaning has been standardized. Less reconciliation is needed, so more processes can be automated. Data teams get analytical flexibility and the ability to ask ‘what if’ questions of the data. Data stewards can manage data more efficiently because both the data points and the relationships between data elements are captured.

Download Semantic Modeling and Knowledge Graphs infographic

Are you, as a data steward, data scientist or researcher, constantly wasting time cleaning and structuring siloed data from different data sources and datasets? And does the way the data is stored and annotated hinder you in making biomedical discoveries?

Perhaps you have just started looking into semantic models and knowledge graphs as a solution for these time- and resource-consuming efforts. This infographic shows the benefits of semantic models and knowledge graphs and how The Hyve, in collaboration with a top pharma company, solved their data integration challenge.
