The FAIRplus project aims to develop tools and guidelines for making scientific research data Findable, Accessible, Interoperable, and Reusable. This IMI2 project started in January 2019 and will run until June 2022; The Hyve is one of the 22 partners. This blog gives an overview of the FAIRplus goals and the current state of affairs.
What is FAIR?
This question has been answered at length by The Hyve’s colleagues in previous blogs, but let me briefly recap the incentive for FAIR. Researchers in the life sciences nowadays create large volumes of data in a wide variety of data types. After experiencing data management difficulties, a group of researchers realized that they needed to make sure their data was Findable, Accessible, Interoperable and Reusable (FAIR). A number of European governments, including that of the Netherlands, are now actively promoting the generation of FAIR data and the reuse of research data within the scientific community.
What is FAIRplus?
Over the past years, the FAIR data principles have been developed. The question often remains: How to put those principles into practice? Where to start when making your research data FAIR? It’s a process that gets complicated very quickly and you want to avoid ending up in a situation where you can’t see the forest for the trees.
The FAIRplus project aims to boost the implementation of FAIR in several ways. First, by FAIRifying existing research datasets from 20 Innovative Medicines Initiative (IMI) projects. Second, by developing a process description and guidelines on how to FAIRify data, based on the experience gained from tackling the IMI data. A cookbook will be compiled to give scientists and organizations practical advice on how to make their own datasets more FAIR. This cookbook will be shared publicly; an early version is already available here. Furthermore, FAIRplus will run a comprehensive training programme in FAIR data management for researchers, data scientists and IT experts working with life science data.
In total, 22 partners participate in the FAIRplus project: 12 from academia, 7 members of the European Federation of Pharmaceutical Industries and Associations (EFPIA), and 3 SMEs (small and medium-sized enterprises), of which The Hyve is one.
Work in progress
In the past year, datasets from 4 IMI projects have been FAIRified. Until December, I was (on behalf of The Hyve) part of a squad tackling one of those, the RESOLUTE project. Together with the researchers involved in the project, we made an initial inventory of the FAIRness of the data, decided which aspects of the transcriptomics data could be made FAIR, and documented the whole process. The lessons learned were turned into a FAIR cookbook recipe. Since the beginning of this year, I have been promoted to squad lead and am now overseeing the work of various FAIRplus teams.
How to achieve the FAIRplus goals?
FAIR scoring system
Another ambitious goal of the FAIRplus project is the development of a Capability Maturity Model (CMM). This tool will help organizations develop, assess and refine their FAIR data transformation program. Aspects covered by the CMM include:
- determining the FAIRness level of the dataset at the beginning of FAIRification
- creating a roadmap to FAIR implementation
- providing a set of indicators to measure the FAIRness of data assets
- defining an effective FAIRification process
- identifying capabilities needed for FAIR processes at different maturity levels
There are already a number of scoring systems available to determine how FAIR a dataset is (e.g. the RDA initiative). The FAIRplus project will incorporate aspects of these systems, such as scoring per letter in combination with yes/no questions. To give you an example, under F (Findability) you may be asked: can computers automatically find your metadata? Besides the scoring system, the FAIRplus project wants to define levels of FAIRness. The CMM would also contain practical advice on the actions required if, for instance, you want to upgrade a dataset from level 1 to level 2.
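To make "scoring per letter with yes/no questions" a bit more concrete, here is a minimal Python sketch. It is not the FAIRplus or RDA scoring system: only the first question comes from the text above, and the other questions, the `QUESTIONS` list and the `score_per_letter` function are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical yes/no questions, grouped by FAIR letter. Only the first
# question comes from the blog text; the rest are illustrative placeholders.
QUESTIONS = [
    ("F", "Can computers automatically find your metadata?"),
    ("F", "Does the dataset have a persistent identifier?"),
    ("A", "Can the data be retrieved through a standard protocol?"),
    ("I", "Does the metadata use a community ontology?"),
    ("R", "Is a license attached to the dataset?"),
]


def score_per_letter(answers):
    """Return the fraction of 'yes' answers for each FAIR letter.

    `answers` maps question text to True (yes) or False (no);
    unanswered questions count as 'no'.
    """
    yes = defaultdict(int)
    total = defaultdict(int)
    for letter, question in QUESTIONS:
        total[letter] += 1
        if answers.get(question, False):
            yes[letter] += 1
    return {letter: yes[letter] / total[letter] for letter in total}


answers = {
    "Can computers automatically find your metadata?": True,
    "Does the dataset have a persistent identifier?": False,
    "Is a license attached to the dataset?": True,
}
scores = score_per_letter(answers)
print(scores)  # {'F': 0.5, 'A': 0.0, 'I': 0.0, 'R': 1.0}
```

A per-letter breakdown like this shows where a dataset falls short, which is more useful for planning the next FAIRification step than a single overall number.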
FAIRification Roadmap
As mentioned before, the lessons learned by the FAIRplus squads will be turned into cookbook recipes and made available to scientists who want to FAIRify their own research data. In the past year, a roadmap was composed with a step-by-step description of the FAIRification process (see illustration below). In the future, each step will be linked to recipes relevant to that stage of the FAIRification process. These recipes will describe how to solve practical issues, such as how to determine the license for your dataset.
Step 2 on the roadmap (competency questions) is one of the most important steps in the whole FAIRification process. At this step, you’re supposed to answer the question: What goals do I want to achieve? Since the process gets complex very quickly, it is extremely important to set clear targets. Otherwise, the whole process can swallow up a huge amount of time and resources. Defining specific targets not only helps you stay on track, it also tells you when the goal has been reached.
It is also important to keep in mind that the aim should never be to reach a “100 percent FAIR” score. FAIRify only relevant data – data that really has to be made Findable, Accessible, Interoperable, Reusable. Don’t spend time and money on data that is outdated or for another reason unsuitable for reuse.
IMI catalogue and ontologies
With the roadmap in place, the FAIRplus squads now focus on steps 7 and 8: metadata strategies and evaluation against standards. Metadata are essential if you want to search your dataset, whether you want to find all trials on a particular asthma drug, studies with a specific cell line or all variants of a particular gene. So a good metadata catalogue is high on the wishlist of all IMI partners, scientists and pharmaceutical companies. Within this project an IMI catalogue will be built, and the specifications for building your own catalogue will also be captured in a recipe.
As Kees described recently in his blog, when adding metadata it’s important to keep in mind that there are various ‘zoom levels’. On a general level, the metadata provides information on data ownership, the type of disease or medical treatment. At a deeper level the metadata could describe which cell line was used or which machine was used for DNA sequencing.
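The two zoom levels and the kind of search they enable can be sketched in a few lines of Python. This is only an illustration, not the IMI catalogue: the `CATALOGUE` entries, the field names and the `search` function are all hypothetical, and the values are placeholders.

```python
# Hypothetical catalogue entries; all field names and values are placeholders.
# "general" holds high-level metadata, "detail" holds the deeper zoom level.
CATALOGUE = [
    {
        "dataset": "study-001",
        "general": {"owner": "Lab A", "disease": "asthma"},
        "detail": {"cell_line": "line-X", "sequencer": "machine-1"},
    },
    {
        "dataset": "study-002",
        "general": {"owner": "Lab B", "disease": "diabetes"},
        "detail": {"cell_line": "line-X", "sequencer": "machine-2"},
    },
    {
        "dataset": "study-003",
        "general": {"owner": "Lab B", "disease": "asthma"},
        "detail": {"cell_line": "line-Y", "sequencer": "machine-1"},
    },
]


def search(catalogue, level, **criteria):
    """Return the names of datasets whose metadata at `level` match all criteria."""
    return [
        entry["dataset"]
        for entry in catalogue
        if all(entry[level].get(key) == value for key, value in criteria.items())
    ]


# General level: all asthma studies.
asthma_studies = search(CATALOGUE, "general", disease="asthma")
print(asthma_studies)  # ['study-001', 'study-003']

# Deeper level: all studies that used a specific cell line.
line_x_studies = search(CATALOGUE, "detail", cell_line="line-X")
print(line_x_studies)  # ['study-001', 'study-002']
```

Queries like these only work if every dataset fills in the same fields with the same terms, which is where the ontologies discussed next come in.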
Uniformity
To ensure uniform use of metadata, the project teams currently focus on building an IMI data catalogue and ontologies, a kind of metadata dictionary. Every academic discipline or field creates ontologies (jargon) to limit complexity and organize data into information and knowledge. Of course, the FAIRplus project will make good use of these existing catalogues and ontologies. The transcriptomics community, for example, has already defined a minimum metadata set (the MINSEQE). Uniform adoption of such standards would make reuse of this type of research data much easier. The squad quickly learned, though, that in practice these community standards are often ignored, with scientists developing their own templates instead. So one future recipe will describe technical solutions for mapping such templates to the community standard. This project cannot cover all biomedical fields, and not all fields have existing catalogues and ontologies, so another recipe could give instructions on how to make your own template.
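As a rough idea of what mapping a home-grown template to a community standard might involve, here is a minimal Python sketch. It is not a FAIRplus recipe: `COLUMN_MAP`, `REQUIRED` and `to_standard` are hypothetical, and the field names are placeholders rather than the actual MINSEQE elements.

```python
# Hypothetical mapping from one lab's template columns to standard field names.
# These names are placeholders, not the actual MINSEQE elements.
COLUMN_MAP = {
    "Sample": "sample_id",
    "Cells": "cell_line",
    "Seq machine": "instrument",
}

# Fields the (hypothetical) standard requires for every record.
REQUIRED = {"sample_id", "cell_line", "instrument", "protocol"}


def to_standard(record):
    """Rename known columns and report what could not be mapped or is missing.

    Returns a tuple (mapped, unmapped, missing):
    - mapped: the record with standard field names
    - unmapped: template columns without a known standard equivalent
    - missing: required standard fields absent from the record
    """
    mapped = {COLUMN_MAP[col]: value for col, value in record.items() if col in COLUMN_MAP}
    unmapped = sorted(col for col in record if col not in COLUMN_MAP)
    missing = sorted(REQUIRED - set(mapped))
    return mapped, unmapped, missing


record = {"Sample": "S1", "Cells": "line-X", "Notes": "pilot run"}
mapped, unmapped, missing = to_standard(record)
print(mapped)    # {'sample_id': 'S1', 'cell_line': 'line-X'}
print(unmapped)  # ['Notes']
print(missing)   # ['instrument', 'protocol']
```

Reporting unmapped and missing fields separately is a deliberate choice in this sketch: the first list points at gaps in the mapping, the second at gaps in the data itself.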
Networking events for companies and SMEs
Over the course of the project, FAIRplus will organize three networking events for SMEs. One of these events is aimed at biotech companies owning data potentially in need of FAIRification, one at technology companies who are planning to provide FAIRification services, and one at academic groups interested in FAIRification for their data collections.
The first FAIRplus Innovation and SME Forum was held on 29 January 2020 at The Wellcome Genome Campus in Hinxton, Cambridge (UK) (check this video). Participants from biotech companies were clearly impressed by the work the FAIRplus squads had managed in the first year of the project. Their feedback also convinced the FAIRplus members that they should continue their efforts on a clear step-by-step description of the FAIRification process, because it confirmed the need for sound advice and practical solutions for making data FAIR.
Towards a FAIR future
As mentioned before, FAIR should never be a goal in itself. A 100 percent FAIR-score would require far too much time and money. Therefore, one of the pillars of the FAIRplus project is a pragmatic approach to FAIR – define a clear and achievable goal.
Another thing to keep in mind is that the FAIR score doesn’t tell you anything about the scientific quality of the data. In theory, a dataset can be 100 percent FAIR and at the same time contain unreliable measurements or incomplete data.
Over the next few months, the FAIRplus partners and squads will continue their work on FAIRifying IMI datasets, setting up the IMI data catalogue, composing the FAIR cookbook and recipes, and developing the Capability Maturity Model. So stay tuned for further updates. After all, FAIR is the way forward in (biomedical) research.