In today's data-driven world, organizations face a pressing need for efficient data management and integration. Customers often express concerns about consolidating data from multiple sources and exposing it to various applications. In pharma and life sciences R&D in particular, integrating and linking data generated at multiple stages of the research process and the drug discovery value chain is a common challenge. This can include data from laboratory experiments, clinical trials, or external scientific literature that is needed to answer research questions and support decision-making, and that can also be made accessible for secondary use. The need for an integrated view over these disparate data sources and datasets, one that is readily consumed by users and applications alike, has therefore become increasingly evident. This is where a semantic layer comes into play. A semantic layer acts as an abstraction that bridges the gap between raw data storage and user-friendly interfaces, enabling integration, organization, and exploration of information. In this article, we explore the concept of the semantic layer (SL), how it addresses these challenges, and the different approaches to implementing an enterprise semantic layer.
Understanding Semantic Layers
A semantic layer is critical to modern data ecosystems, providing a unified view and interpretation of complex data structures. It enables users to interact with the available data in a meaningful and intuitive manner, without the need to know all the technical details of underlying data sources, interchange formats, or storage technologies involved.
A semantic layer acts as a semantic abstraction that translates data as it lives in source storage systems into a structured and understandable representation. It can generally be defined as a ‘glue’ between the storage and consumption of data by client applications, offering a consolidated view of the data across an organization. The data is presented in familiar business terms, and in a format that facilitates efficient querying, analysis, and visualization.
At the core of the semantic layer is the semantic data model. Here, the entities (e.g. “study”, “compound”, “gene”), the relations between these entities, their attribute properties, and the transformation rules of the semantic layer are defined. Since a (good) semantic data model is a reflection of the business domain, it is less likely to change over time than application-specific database models, which vary with specific application requirements. In this way, changes in the underlying storage structures can be made without adversely impacting the applications that rely on the structure of the semantic layer.
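To make the parts of a semantic data model concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the class names, the `lims` source system, and the column names `cmpd_nm` and `compound_label` are all hypothetical. It shows entities, relations, attributes, and transformation rules, and how a change in the source structure is absorbed by the mapping rather than by the consuming application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityType:
    """A business-domain entity such as "study", "compound", or "gene"."""
    name: str
    attributes: tuple[str, ...] = ()


@dataclass(frozen=True)
class Relation:
    """A named, directed relation between two entity types."""
    name: str
    source: str
    target: str


@dataclass
class SemanticModel:
    entities: dict[str, EntityType] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)
    # Transformation rules: (source system, source column) -> (entity, attribute)
    mappings: dict[tuple[str, str], tuple[str, str]] = field(default_factory=dict)

    def add_entity(self, entity: EntityType) -> None:
        self.entities[entity.name] = entity

    def add_relation(self, relation: Relation) -> None:
        # A relation may only connect entity types the model already knows.
        for end in (relation.source, relation.target):
            if end not in self.entities:
                raise ValueError(f"unknown entity type: {end}")
        self.relations.append(relation)

    def map_column(self, system: str, column: str, entity: str, attribute: str) -> None:
        if entity not in self.entities or attribute not in self.entities[entity].attributes:
            raise ValueError(f"unknown target: {entity}.{attribute}")
        self.mappings[(system, column)] = (entity, attribute)

    def translate(self, system: str, record: dict) -> dict:
        """Re-express a source record in the terms of the semantic model."""
        translated: dict = {}
        for column, value in record.items():
            target = self.mappings.get((system, column))
            if target is None:
                continue  # unmapped source columns are not exposed
            entity, attribute = target
            translated.setdefault(entity, {})[attribute] = value
        return translated


model = SemanticModel()
model.add_entity(EntityType("study", ("identifier",)))
model.add_entity(EntityType("compound", ("name",)))
model.add_entity(EntityType("gene", ("symbol",)))
model.add_relation(Relation("investigates", "study", "compound"))
model.add_relation(Relation("targets", "compound", "gene"))

# "lims" and "cmpd_nm" are hypothetical source-system and column names.
model.map_column("lims", "cmpd_nm", "compound", "name")
before = model.translate("lims", {"cmpd_nm": "compound A", "row_version": 7})

# The source system renames its column; only the mapping rule changes.
model.map_column("lims", "compound_label", "compound", "name")
after = model.translate("lims", {"compound_label": "compound A", "row_version": 8})
```

In this sketch, `before` and `after` are identical, so an application reading `compound.name` would not notice that the source column was renamed.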
What are the advantages of a semantic layer?
Accelerated Data Integration for Cross-Disciplinary Research
In a typical R&D setting, researchers often work with a variety of data sources. A semantic layer simplifies integrated exploration and querying of data from these disparate systems, bringing together experimental data, clinical trial outcomes, and real-world evidence into a unified, research-friendly view. For example, a semantic layer could connect high-throughput screening results from an internal lab system with patient demographics and treatment response data from another internal or public database, enabling more comprehensive analyses for drug target validation.
Standardized Definitions for Open Science Collaborations
In a complex R&D environment, where researchers across multiple teams (e.g., biology, chemistry, and clinical) or organizations work together, it is essential that everyone uses the same standards. A semantic layer ensures consistent terminology across all collaborators, so researchers in different departments or organizations interpret the data the same way. For instance, when working on multi-omics studies across partners, a semantic layer ensures that terms like "mutation" or "variant" are consistently defined, improving data quality and making results more reproducible across collaborative efforts.
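One simple way to picture consistent terminology is a shared vocabulary that maps the labels each partner uses onto a single agreed term. The sketch below is illustrative only: the `ex:` identifier and the synonym list are assumptions made up for this example and are not taken from any real ontology.

```python
from typing import Optional

# Hypothetical shared vocabulary: one agreed identifier per concept,
# plus the labels that different partners use for it.
SHARED_VOCABULARY = {
    "ex:SequenceVariant": {
        "label": "sequence variant",
        "synonyms": {"variant", "mutation", "sequence alteration"},
    },
}


def build_index(vocabulary: dict) -> dict:
    """Map every known label (lower-cased) to its agreed identifier."""
    index: dict = {}
    for identifier, entry in vocabulary.items():
        for label in {entry["label"], *entry["synonyms"]}:
            key = label.lower()
            if key in index and index[key] != identifier:
                raise ValueError(f"ambiguous label: {label}")
            index[key] = identifier
    return index


LABEL_INDEX = build_index(SHARED_VOCABULARY)


def normalize(label: str) -> Optional[str]:
    """Return the agreed identifier for a label, or None if it is unknown."""
    return LABEL_INDEX.get(label.strip().lower())


partner_a_term = normalize("Mutation")
partner_b_term = normalize("variant")
unknown_term = normalize("polymorphism")  # not in this toy vocabulary
```

Both partner labels resolve to the same identifier, while an unknown label returns `None` so that it can be reviewed rather than silently guessed.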
Simplified Data Access for Research Insights
Researchers often need to query large, complex datasets without deep knowledge of data structures or underlying systems. A semantic layer makes it easy for them to access relevant data using familiar scientific terms, rather than requiring expertise in database management. For example, a scientist working on drug repurposing could pull relevant clinical trial data, patient records, and molecular data using intuitive queries, speeding up the research process without needing extensive technical support.
Two Approaches to Semantic Layers: Relational-type versus Knowledge Graph-type
When deciding to implement a semantic layer, there are broadly two options: the relational-type or the knowledge graph-type. These approaches differ in technology and in the data model paradigm used to represent entities and the relationships between them.
Relational-type semantic layers are typically expressed using a relational data model with tables/views, columns, and rows. While effective in handling data aggregates for fast analytics and easier for business intelligence (BI) tools to connect to, relational-type semantic layers have limitations when dealing with complex and diverse data types and relationships.
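As a rough illustration of the relational-type approach, the sketch below uses Python's built-in `sqlite3` module to put a view with business-friendly column names on top of two source tables. The table names, column names, and values are invented for this example; a production semantic layer would typically sit on a data warehouse rather than an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical source tables with application-specific naming.
    CREATE TABLE t_cmpd (cmpd_id INTEGER PRIMARY KEY, cmpd_nm TEXT);
    CREATE TABLE t_assay_res (res_id INTEGER PRIMARY KEY, cmpd_id INTEGER, ic50_nm REAL);

    INSERT INTO t_cmpd VALUES (1, 'compound A'), (2, 'compound B');
    INSERT INTO t_assay_res VALUES (10, 1, 12.5), (11, 1, 17.5), (12, 2, 300.0);

    -- The semantic-layer view: business terms and a pre-computed aggregate.
    CREATE VIEW compound_potency AS
    SELECT c.cmpd_nm       AS compound_name,
           COUNT(r.res_id) AS measurement_count,
           AVG(r.ic50_nm)  AS mean_ic50_nanomolar
    FROM t_cmpd AS c
    JOIN t_assay_res AS r ON r.cmpd_id = c.cmpd_id
    GROUP BY c.cmpd_id, c.cmpd_nm;
    """
)

# A BI tool or analyst queries the view, not the source tables.
rows = conn.execute(
    "SELECT compound_name, measurement_count, mean_ic50_nanomolar "
    "FROM compound_potency ORDER BY compound_name"
).fetchall()
```

The view exposes an aggregate per compound, which is the kind of tabular result BI tools connect to easily. Adding a new kind of relationship, however, means adding tables and joins, which is where the limitations mentioned above start to show.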
On the other hand, knowledge graph-based semantic layers leverage semantic data models, organizing entities and their interrelations as a network of information that expresses the meaning of data in a way that is understandable by both humans and machines. Knowledge graphs, built on the Resource Description Framework (RDF) and Linked Data principles, provide a more flexible approach by representing context-rich data. While relational-type semantic layers are effective for structured data and large-scale processing, knowledge graphs resemble the way humans tend to build their own models of the world. Graphs also allow additional types of analytics, such as network analysis, and the use of RDF eases the adoption of the FAIR principles: making data Findable, Accessible, Interoperable, and Reusable.
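The graph idea can be sketched without any dedicated tooling. The example below stores subject-predicate-object statements as plain Python tuples, in the spirit of RDF triples, and runs a simple pattern match and a breadth-first path search as a stand-in for network analysis. It is a simplification: the `ex:` identifiers are hypothetical, and a real knowledge graph would use an RDF store and a query language such as SPARQL instead of these helper functions.

```python
from collections import deque
from typing import Optional

# Each statement is a (subject, predicate, object) triple.
triples = {
    ("ex:study1", "rdf:type", "ex:Study"),
    ("ex:study1", "ex:investigates", "ex:compoundA"),
    ("ex:compoundA", "rdf:type", "ex:Compound"),
    ("ex:compoundA", "ex:targets", "ex:geneX"),
    ("ex:geneX", "rdf:type", "ex:Gene"),
    ("ex:geneX", "ex:associatedWith", "ex:diseaseY"),
}


def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None) -> list:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def shortest_path(start: str, goal: str) -> Optional[list]:
    """Breadth-first search along subject -> object links, skipping type statements."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _, predicate, neighbour in match(s=path[-1]):
            if predicate == "rdf:type" or neighbour in visited:
                continue
            visited.add(neighbour)
            queue.append(path + [neighbour])
    return None


targets = match(p="ex:targets")
path = shortest_path("ex:study1", "ex:diseaseY")
```

Adding a new kind of relationship here only means adding more triples. No schema change is needed, which is the flexibility the graph approach is valued for.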
The semantic layer and the FAIR Principles
The decision to build a semantic layer is driven by the vision to harness the potential of organized data and knowledge, which can be further achieved by adopting a robust FAIR data strategy. Adhering to the fundamental principles of FAIR ensures maximum impact and value of the data ecosystem.
Knowledge graph-based semantic layers naturally align with the FAIR principles, as they provide standardized data representation using open standards such as RDF, facilitate data discovery, enable interoperability by reusing terms and identifiers of common vocabularies, and support data reuse across multiple applications and domains. The knowledge graph approach enhances the findability of data by allowing the representation of complex relationships via a web of interconnected data. Also, the metadata is stored alongside the data, facilitating easy retrieval of relevant information. By connecting data through a knowledge graph, users can access data directly via queries across data silos to answer complex questions. Interoperability is also improved, as knowledge graphs can integrate data from various sources and support the use of ontologies and controlled vocabularies that allow for the use of a ‘common language’ with external organizations. Furthermore, knowledge graphs promote data reusability by capturing comprehensive metadata (e.g. on data provenance) that can help the user trust the data.
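The role of provenance metadata can be illustrated with a small sketch in which every statement carries the source it came from, similar in spirit to named graphs in RDF. All identifiers and source names below are hypothetical. The query joins statements from two different sources through a shared gene identifier and returns the provenance together with the answer.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: where this statement came from


statements = [
    Statement("ex:compoundA", "ex:targets", "ex:geneX", "ex:internalScreeningDb"),
    Statement("ex:compoundB", "ex:targets", "ex:geneZ", "ex:internalScreeningDb"),
    Statement("ex:geneX", "ex:associatedWith", "ex:diseaseY", "ex:publicLiteratureSet"),
]


def compounds_linked_to(disease: str) -> list:
    """Find compounds whose target gene is associated with the given disease.

    Returns (compound, sources) pairs, where sources lists every data source
    that contributed a statement to the answer.
    """
    results = []
    for gene_link in statements:
        if gene_link.predicate != "ex:associatedWith" or gene_link.obj != disease:
            continue
        for target_link in statements:
            if target_link.predicate == "ex:targets" and target_link.obj == gene_link.subject:
                sources = sorted({target_link.source, gene_link.source})
                results.append((target_link.subject, sources))
    return results


answer = compounds_linked_to("ex:diseaseY")
```

Because the two sources reuse the same identifier for the gene, the question can be answered across both of them, and the returned source list lets the user judge how far to trust the result.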
Knowledge graphs and the FAIR principles enable organizations to create a connected and collaborative data ecosystem. By implementing a semantic layer using a knowledge graph, organizations can ensure that their data is more discoverable, accessible, and interoperable, ultimately driving innovation and fostering data-driven decision-making.
Conclusion
A semantic layer is crucial for unlocking the full potential of data by providing an integrated view of that data. While relational-type semantic layers are built upon a well-known modeling approach, knowledge graph-based semantic layers offer a more flexible, scalable, and interoperable solution. By harnessing semantic web technologies, organizations can fully utilize the potential of their data assets in this data-driven era. The Hyve can support an organization in identifying the best setup of a semantic layer and in implementing the steps to create a knowledge graph-based layer.