Rich metadata in cBioPortal: a step towards FAIRness

The cBioPortal community and The Hyve have always championed Open Source and Open Data. Now that the FAIR (Findable, Accessible, Interoperable, Reusable) principles for data stewardship are gaining momentum, we cannot lag behind! As a small step towards FAIR compliance, rich metadata functionality has been added in cBioPortal 2.1.0.

Development of this feature was sponsored by one of our pharma clients to keep track of the data provenance for their clinical trials. A structured but flexible approach ensures it can be used in any context. The feature brings cBioPortal one step closer to enabling FAIR compliance by addressing FAIR principle F2, which states that data are described with rich metadata.

Use case

Many of the questions posted to the cBioPortal community’s discussion group concern how the publicly available data in the public cBioPortal was preprocessed before being loaded into the portal. Before this feature was released, the only supported metadata fields were title, description and a reference to the accompanying publication, if any, which left no structured place to record such preprocessing details.

Rich Metadata

cBioPortal is used in many different contexts, with both public and private data coming from a variety of sources. Clinical trials run by pharma companies need different metadata than data coming from public sources. To support all use cases, we needed to accept metadata in any structure, so we decided that cBioPortal itself would impose no constraints. Users can still implement their own constraints on the metadata they associate with their data, to keep their study metadata uniform.
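As an illustration of such a user-defined constraint, here is a minimal sketch in Python. It is not part of cBioPortal: the required keys, the sample metadata and the missing_keys helper are all hypothetical, and simply show how a team might check its own metadata before loading a study.

```python
import json

# Hypothetical house rules: keys one team requires in every study's metadata.
# cBioPortal itself does not enforce any of these.
REQUIRED_KEYS = {"data_provider", "processing_pipeline", "contact"}


def missing_keys(metadata: dict, required: set = REQUIRED_KEYS) -> list:
    """Return the required top-level keys that are absent, sorted by name."""
    return sorted(required - set(metadata))


# Made-up example metadata, as it might arrive from an automated pipeline.
raw = """
{
  "data_provider": "Example Hospital",
  "processing_pipeline": {"aligner": "bwa", "version": "0.7.17"}
}
"""
study_metadata = json.loads(raw)

problems = missing_keys(study_metadata)
if problems:
    print("Metadata is missing required keys:", ", ".join(problems))
```

A check like this could run as a separate step before import, so that the portal stays free-form while each organisation keeps its own metadata consistent.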

How does it work?

It is quite simple actually. Metadata can be included when loading the study and will be shown in the cBioPortal front end. The supported formats for importing the metadata are JSON (for metadata that is automatically generated) and YAML (for metadata that is created by human beings).

The image on the left side shows the metadata that is imported, the image on the right shows the result rendered in the front end.
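To give a feel for free-form metadata and how it can be presented as nested key-value pairs, here is a small Python sketch. It is an illustration only, not cBioPortal's import or rendering code: the field names, the example values and the render function are assumptions made for this example.

```python
import json


def render(node, indent=0):
    """Yield one text line per key or list item, indenting nested levels."""
    pad = "  " * indent
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                yield f"{pad}{key}:"
                yield from render(value, indent + 1)
            else:
                yield f"{pad}{key}: {value}"
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, (dict, list)):
                yield f"{pad}-"
                yield from render(item, indent + 1)
            else:
                yield f"{pad}- {item}"
    else:
        yield f"{pad}{node}"


# Hypothetical metadata; any nesting of objects, lists and values is accepted.
metadata = json.loads("""
{
  "provenance": {
    "source": "Example Trial 001",
    "steps": ["variant calling", "annotation"]
  },
  "release": 2
}
""")

lines = list(render(metadata))
print("\n".join(lines))
```

Because the function only walks dictionaries, lists and plain values, it works for any structure, which mirrors the decision not to constrain the metadata.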

Next step

The next step would be to create metadata for all the public studies that are in the public portal. This would be a task for the data curators, as they know the data and metadata inside out and can retrace the steps taken before loading the data into cBioPortal. This way the data provenance can be recorded in a structured fashion.

cBioPortal

As a globally recognised leader in cBioPortal installations, The Hyve offers unparalleled expertise in managing and utilising large-scale biomedical data. Since joining the cBioPortal open-source community in 2015, The Hyve has actively contributed to the platform's development, overseeing the most active cBioPortal installations worldwide.

The Hyve's cBioPortal services are tailored to meet the needs of a diverse clientele, including pharmaceutical companies, hospitals, data providers, and research organisations. Each solution enhances the analysis and visualisation of cancer genomics datasets, supporting research and clinical decision-making.

Written by

Sjoerd van Hagen

Need help with cBioPortal feature development?

The Hyve provides services to develop and improve features in cBioPortal. Implemented features are released to the community via the cBioPortal repository on GitHub. For inquiries on cBioPortal feature development projects or other services around cBioPortal, feel free to contact us via this form.
