A use case of Fairspace for Institut Curie

 

On this page you will learn about the common data challenges faced by large organizations and how Fairspace, an open source solution developed by The Hyve in collaboration with Institut Curie, can help address them. Fairspace helps organizations quickly stand up a FAIR-compliant metadata, data governance and auditing layer over existing storage and applications, and in doing so aims to contribute to the growing open source ecosystem around FAIR data.

 

                                           

 

Common challenges

  • Collaboration around data, inside and outside the organization, is ad hoc
  • Cross-group communication is difficult, as is linking clinical and research data
  • Research teams work in silos: data is produced and stays on hard drives, and researchers have no access to updated clinical annotations
  • The same analyses are run twice on the same sample, while some data of interest gets lost
  • Each research team works with its own tools, most of which are outdated
  • Data protection regulations are becoming more stringent, and Good Research Practice must be followed

Proposed solutions

  • Need to make data independent assets by assigning persistent identifiers (PIDs) and proper metadata
  • Need a better overview and map of the organization's data
  • Need to integrate multiple data repositories
  • Need to spend effort efficiently
  • Need to make use of new, efficient tools
  • Need a tool for tracking data usage and a GDPR-compliant environment for hosting research data

 

 

Project objectives

The objective of the project for Institut Curie is to create a FAIR Virtual Research Environment with a transparent layer on top of both data sources and applications. This layer enables and manages FAIR data governance processes and records data access activity for regulatory compliance purposes. At its heart is a semantic metadata store that holds metadata about all linked data sources in the workspace, as well as context information such as metadata on subjects, samples and data consents.
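To make the idea of a semantic metadata store more concrete, here is a minimal sketch in plain Python. It is not Fairspace's actual implementation: it simply holds metadata as subject-predicate-object triples and follows links between them. The `ex:` prefix, the identifiers and the predicate names (`ex:derivedFrom`, `ex:hasConsent`, `ex:aboutSample`) are all hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def files_for_subject(store: TripleStore, subject: str) -> List[str]:
    """Follow subject -> sample -> file links and return the file identifiers."""
    files = []
    for sample, _, _ in store.match(p="ex:derivedFrom", o=subject):
        for file_id, _, _ in store.match(p="ex:aboutSample", o=sample):
            files.append(file_id)
    return sorted(files)


# Hypothetical context metadata: a subject, its consent, a sample and two files.
store = TripleStore()
store.add("ex:Sample/S1", "ex:derivedFrom", "ex:Subject/P001")
store.add("ex:Subject/P001", "ex:hasConsent", "ex:Consent/research-use")
store.add("ex:File/run1.bam", "ex:aboutSample", "ex:Sample/S1")
store.add("ex:File/run1.vcf", "ex:aboutSample", "ex:Sample/S1")

print(files_for_subject(store, "ex:Subject/P001"))
```

Because files, samples, subjects and consents all live in the same graph, a question such as "which files relate to this subject, and under which consent?" becomes a matter of following links rather than searching separate systems.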

 

 

Support the Data Management Life Cycle with Fairspace

 

  • Add your data to Fairspace

All your data can be stored on the secure Fairspace storage, where it is indexed and some metadata is recorded automatically. One of the core principles is that Fairspace can also connect to your existing storage systems without copying the data, while still allowing for data sharing and metadata annotation.

  • Annotate your data with FAIR metadata

We provide sensible defaults to get you started. Ultimately though, you are the experts in your domain, and Fairspace helps you create an appropriate metadata model for your data. We make sure you are able to follow the FAIR principles for your metadata while doing so.

  • Share and Publish your data

Once added, your data is searchable via the Fairspace GUI and discoverable by machines, and you can share it with others directly from within Fairspace. The RDF metadata can be used to populate data catalogs and semantic search interfaces, and to federate with them.

  • Automate & integrate it!

Fairspace does not try to be a solution for all use cases and is built to integrate with other tooling. We integrate with Jupyter, tranSMART, and cBioPortal for analytics use cases. We provide a rich metadata API to automatically publish information generated in sources such as scientific instruments or workflow systems.
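As a rough illustration of how an instrument or workflow system might push metadata over HTTP, the sketch below builds a request object without sending it. The host name, the `/api/metadata/` path, the bearer-token header and the JSON-LD field names are all assumptions made for this example; the Fairspace API documentation is the authority on the real endpoints and payload format.

```python
import json
from urllib.request import Request


def build_metadata_request(base_url: str, token: str, entity: dict) -> Request:
    """Build (but do not send) a POST request carrying JSON-LD metadata."""
    body = json.dumps(entity).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/api/metadata/",  # hypothetical path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/ld+json",
            "Authorization": "Bearer " + token,  # assumed auth scheme
        },
    )


# Hypothetical metadata emitted by a sequencing workflow.
entity = {
    "@id": "https://example.org/samples/S1",
    "@type": "https://example.org/ontology#Sample",
    "https://example.org/ontology#sequencedOn": "2020-01-15",
}

request = build_metadata_request("https://fairspace.example.org", "TOKEN", entity)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a server.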

 

How does it work?

  • Collaborative metadata catalog on live research data collections
    • Search, browse and share files and collections
  • Flexible metadata layer to put data stewards in control
    • Have your experts define the metadata model appropriate for your research
  • Integrate with existing source data systems such as iRODS
  • Enable third party tooling for all stages of research

 

 

This figure details the main user flows in Fairspace:

(1) Users with the role of data steward are able to create and maintain the metadata model relevant to the organisation's research data assets. The main building blocks of this semantic model are Classes, Properties, and Relations.

(2) Any user of the system will then be able to use the configured metadata model to describe the metadata entities relevant for their research. Advanced users can create automated processes to populate the metadata via the Fairspace API.

(3) The metadata can be linked to files and collections that have been indexed automatically by Fairspace, or that have been added manually by users.

(4) Files can be shared with others directly and used in integrated applications for analytics and visualisations, e.g. JupyterLab and cBioPortal.
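The first three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Fairspace's own data model: the `MetadataClass`, `Entity` and `validate` names, and the example classes `Subject`, `Sample` and `File`, are hypothetical. It shows a model built from classes, properties and relations, and a check that entities described by users conform to it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataClass:
    """A class in the model, with literal properties and relations to other classes."""
    name: str
    properties: Dict[str, type] = field(default_factory=dict)  # property -> literal type
    relations: Dict[str, str] = field(default_factory=dict)    # relation -> target class


@dataclass
class Entity:
    """A metadata entity described by a user."""
    id: str
    class_name: str
    values: Dict[str, object] = field(default_factory=dict)
    links: Dict[str, str] = field(default_factory=dict)         # relation -> target entity id


def validate(entity: Entity, model: Dict[str, MetadataClass],
             entities: Dict[str, Entity]) -> List[str]:
    """Return a list of problems; an empty list means the entity fits the model."""
    cls = model.get(entity.class_name)
    if cls is None:
        return [f"unknown class: {entity.class_name}"]
    errors = []
    for prop, value in entity.values.items():
        expected = cls.properties.get(prop)
        if expected is None:
            errors.append(f"unknown property: {prop}")
        elif not isinstance(value, expected):
            errors.append(f"{prop} should be {expected.__name__}")
    for rel, target_id in entity.links.items():
        target_class = cls.relations.get(rel)
        target = entities.get(target_id)
        if target_class is None:
            errors.append(f"unknown relation: {rel}")
        elif target is None or target.class_name != target_class:
            errors.append(f"{rel} must point to a {target_class}")
    return errors


# (1) A data steward defines the model.
model = {
    "Subject": MetadataClass("Subject", properties={"species": str}),
    "Sample": MetadataClass("Sample", properties={"tissue": str},
                            relations={"derivedFrom": "Subject"}),
    "File": MetadataClass("File", properties={"path": str},
                          relations={"aboutSample": "Sample"}),
}

# (2) A user describes entities with that model.
entities = {
    "subject-1": Entity("subject-1", "Subject", {"species": "Homo sapiens"}),
    "sample-1": Entity("sample-1", "Sample", {"tissue": "breast"},
                       {"derivedFrom": "subject-1"}),
}

# (3) The metadata is linked to an indexed file.
entities["file-1"] = Entity("file-1", "File", {"path": "/collections/run1/reads.bam"},
                            {"aboutSample": "sample-1"})

for e in entities.values():
    print(e.id, validate(e, model, entities))
```

Keeping the model as data rather than code reflects the idea that data stewards, not developers, decide which classes, properties and relations are valid for their research domain.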

 

If you want to discuss similar problems your organization is facing, or if you want to know how The Hyve can help you tackle them, please drop us a message.