In this use case, you will learn how Fairspace helped a top 10 pharmaceutical company access, in a unified manner, the metadata of data assets with substantially different data models. Fairspace is an open-source solution developed by The Hyve to help research organizations achieve more flexible and FAIR research data management: securely storing and organizing their research data sets, sharing this data with collaborators, and easily searching for metadata and data.
The Challenge: Fragmented Data Silos
One of the world’s top 10 pharmaceutical companies, renowned for its long history of developing, producing, and marketing healthcare products, set out to implement a tool that enables researchers to discover and leverage cross-divisional R&D data collections for enhanced scientific analytics.
The researchers faced significant challenges in managing and integrating Clinical data and Omics data because of the fundamentally different structures and scales of these datasets. Clinical data, including Clinical Imaging data, is typically unstructured and massive: it contains high-resolution medical images with metadata on imaging modality and patient demographics. Omics data, by contrast, is highly structured and captures complex biological information such as gene expression and protein interactions. These datasets are often stored in fragmented silos, making cross-domain analysis difficult, and inconsistent metadata standards further complicate integration.
Researchers needed to harmonize these diverse data models, handle large-scale data efficiently, and ensure compliance with regulations such as HIPAA and GDPR, all while supporting interdisciplinary collaboration to unlock deeper insights from both data types.
Fairspace’s unified metadata interface was crucial to overcoming these barriers, enabling seamless browsing, search, and analysis across diverse datasets.
The New Fairspace Capability: Multi-Model Search
Fairspace, an open-source data management platform, became the tool of choice for this use case. Fairspace was originally built around a single, unified data model, enabling the integration of diverse datasets that fit that model. This core feature allows researchers to link, unify, and search through all metadata in a centralized way, through an interface that bridges data silos and enhances discoverability.
The platform could already handle individual domains, such as a clinical data model or an Omics data model, on their own. The technical challenge arose when trying to integrate both clinical data and Omics data into a single interface: the two domains follow fundamentally different data models, with distinct entities, structures, and metadata standards, which makes it difficult to unify them in a common framework.
To tackle this, the multi-model search capability was introduced, allowing Fairspace to simultaneously manage and display datasets with completely different metadata models, while preserving the integrity of each dataset’s structure.
This approach works by integrating multiple Fairspace instances, each potentially using a different metadata model, into a single, unified interface. Let’s break down how this is achieved.
Multi-Model Search: How It Works
Fairspace enables the integration of multiple Fairspace instances, each potentially using a different metadata model, into a single, unified interface. In this setup, a main Fairspace instance is selected as the core platform, and it is configured to integrate with other external Fairspace instances through an API gateway. This setup allows users to browse and explore metadata from different models, such as Clinical data and Omics data, within the same interface. Each Fairspace instance may have its own unique metadata model, but the unified interface brings these disparate datasets together while preserving their distinct structures.
All instances are connected to the same identity provider, ensuring seamless authentication across them. Users log in with the same credentials and can view metadata from both internal and external instances without switching between platforms. Cross-instance metadata search is not yet supported (it is planned for a future release), but users can already explore external metadata views within the main interface, with a browsing experience similar to that of internal views (Figure 1).
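To make the gateway idea more concrete, here is a minimal sketch of how a gateway in the spirit of Pluto might map incoming request paths to Fairspace instances and pass the user’s token through unchanged. Everything in it is an assumption for illustration: the path prefixes, instance names, and `example.org` URLs are hypothetical, and this is not Pluto’s actual configuration or code. Forwarding the same bearer token to every instance only makes sense because, as described above, all instances trust the same identity provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical route: paths starting with `prefix` are sent to `target`."""
    prefix: str
    target: str  # base URL of a Fairspace instance (illustrative only)


# Hypothetical route table: one main instance plus two external instances.
ROUTES = [
    Route("/api/", "https://main.fairspace.example.org/api/"),
    Route("/api/external/omics/", "https://omics.fairspace.example.org/api/"),
    Route("/api/external/clinical/", "https://clinical.fairspace.example.org/api/"),
]


def resolve(path: str) -> str:
    """Return the upstream URL for an incoming request path.

    Longer prefixes are tried first, so an external route wins over the
    catch-all "/api/" route of the main instance.
    """
    for route in sorted(ROUTES, key=lambda r: len(r.prefix), reverse=True):
        if path.startswith(route.prefix):
            return route.target + path[len(route.prefix):]
    raise LookupError(f"no route for {path}")


def forward_headers(incoming: dict[str, str]) -> dict[str, str]:
    """Copy the user's bearer token to the upstream request unchanged.

    Assumes every instance validates tokens from the same identity
    provider, so no per-instance login is required.
    """
    headers = {"Accept": "application/json"}
    if "Authorization" in incoming:
        headers["Authorization"] = incoming["Authorization"]
    return headers
```

With this table, a request for `/api/external/omics/views` would be sent to the hypothetical Omics instance, while `/api/views` stays on the main instance. Choosing the longest matching prefix keeps the routing independent of the order in which routes are declared.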
This integration is powered by an architecture built on three key components, shown in Figure 2:
Mercury (Front End): The user interface that aggregates different metadata views into a single, navigable platform.
Pluto (API Gateway): This gateway enables communication between various Fairspace instances, routing data requests and responses across different configurations.
Saturn (Back End): Handles the logic and database management to dynamically create the appropriate views and columns for external metadata, ensuring that both internal and external data are displayed accurately.
Together, these components allow Fairspace to unify diverse datasets without compromising their unique metadata structures, making the platform a powerful tool for organizations dealing with multiple data models.
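One way to picture Saturn’s role in this list is as merging the main instance’s view definitions with those fetched from external instances, tagging each view with its origin instead of remapping its columns onto a shared model. The sketch below is hypothetical: the `View` structure, its field names, and the instance and entity names are assumptions for illustration, not Fairspace’s actual view configuration format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class View:
    """Hypothetical metadata view: an entity type with its own columns."""
    name: str
    columns: tuple[str, ...]
    source: str = "main"  # which Fairspace instance the view comes from


def merge_views(internal: list[View], external: dict[str, list[View]]) -> list[View]:
    """Combine internal and external views into one navigable list.

    External views keep their own columns untouched (no mapping onto a
    common model); each is only tagged with its source instance, so the
    front end can label it and send its data requests to that instance.
    """
    merged = list(internal)
    for instance, views in external.items():
        for view in views:
            merged.append(replace(view, source=instance))
    return merged


if __name__ == "__main__":
    clinical = [View("Study", ("id", "title")), View("Image", ("id", "modality"))]
    omics = {"omics": [View("Sample", ("id", "tissue")), View("Assay", ("id", "platform"))]}
    for view in merge_views(clinical, omics):
        print(view.source, view.name, view.columns)
```

Because each view carries its source, two instances may even define entities with the same name without their structures being merged or overwritten.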
Why This Matters: Solving Fragmentation for R&D
The pharmaceutical company was facing an age-old problem: disparate data models that couldn’t interact with each other, making comprehensive search across a variety of datasets difficult, if not impossible. By implementing Fairspace’s multi-model metadata view integration, they were able to overcome this challenge.
From a system design point of view, rather than forcing different data models into a single model, Fairspace allows each dataset to retain its unique structure while still presenting the data in a unified interface. This flexibility ensures that researchers can access the data they need, no matter how specialized or complex it is, without navigating multiple systems.
Additionally, the platform acts as a single point of truth. Even though the underlying data resides in different sources, internal or external to the organization, users interact with all of it through a single, consolidated interface. This eliminates the time-consuming process of switching between different systems, allowing researchers to focus on generating insights rather than managing data.
Conclusion: Fairspace as the R&D Metadata Hub
Fairspace, with its multi-model search, serves as a comprehensive data catalog and a single point of truth for R&D. By unifying diverse datasets such as Clinical Imaging and Omics data under one platform, it simplifies data discovery and accelerates research.
By accessing all relevant metadata in one place, researchers can avoid navigating multiple systems, thereby reducing the time spent managing data and streamlining workflows. This enables better data-driven decisions. Fairspace’s flexible architecture is essential for managing the complexity of modern scientific research, ensuring seamless access to critical R&D assets.
To learn more about the latest features and capabilities developed in 2024, click here. If your organization is facing similar data management challenges, reach out to us at marketing@thehyve.nl. Our experts can discuss with you how best to tackle these issues within your R&D departments.