Rich metadata in cBioPortal: a step towards FAIRness

The cBioPortal community and The Hyve have always championed Open Source and Open Data. Now that the FAIR principles for data stewardship are gaining momentum, we cannot stay behind! To take a small step towards FAIR compliance, rich metadata functionality has been added in cBioPortal 2.1.0.

Development of this feature was sponsored by one of our pharma clients, who wanted to keep track of the data provenance for their clinical trials. A structured but flexible approach ensures it can be used in any context. This takes cBioPortal one step closer to FAIR compliance by addressing FAIR principle F2: data are described with rich metadata.

Use case

Many of the questions posted to the cBioPortal community’s discussion group concern how the publicly available data in the public cBioPortal was preprocessed before being loaded into the portal. Before this feature was released, the only supported metadata fields were a title, a description and a reference to the accompanying publication, if any.

Rich metadata

cBioPortal is used in many different contexts, with both public and private data coming from a variety of sources. Clinical trials run by pharma companies call for different metadata than data coming from public sources. To support all use cases, we needed to accept metadata in any structure, so we decided not to impose any constraints. Users can still define their own constraints on the metadata they associate with their data, to keep their study metadata uniform.
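To illustrate what such user-defined constraints could look like, here is a minimal sketch in Python. It is not part of cBioPortal: the field names (`data_source`, `processing_steps`, `curator`) and the `check_metadata` helper are hypothetical, standing in for whatever house rules a team agrees on before loading a study.

```python
from typing import Any

# Hypothetical house rules (an assumption, not a cBioPortal schema):
# required top-level keys and the Python type each value should have.
REQUIRED_FIELDS: dict[str, type] = {
    "data_source": str,
    "processing_steps": list,
    "curator": str,
}


def check_metadata(metadata: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the metadata passes."""
    problems = []
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in metadata:
            problems.append(f"missing required field: {key}")
        elif not isinstance(metadata[key], expected_type):
            problems.append(
                f"field {key} should be of type {expected_type.__name__}"
            )
    return problems
```

A check like this could run in a team's own pipeline before a study is loaded, so that every study carries the same minimum set of provenance fields while cBioPortal itself stays unconstrained.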

How does it work?

It is quite simple actually. Metadata can be included when loading the study and will be shown in the cBioPortal front end. The supported formats for importing the metadata are JSON (for metadata that is automatically generated) and YAML (for metadata that is created by human beings).

The image on the left shows the metadata that is imported; the image on the right shows the result rendered in the front end.
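For the automatically generated case, a pipeline could emit the JSON itself. The sketch below uses only Python's standard `json` module. The structure and field names are assumptions chosen for illustration, and how the resulting file is attached to a study is not covered here.

```python
import json


def build_provenance_metadata(source: str, steps: list[str]) -> str:
    """Serialize a hypothetical provenance record to a JSON string."""
    metadata = {
        # Field names are illustrative; any structure is allowed.
        "data_source": source,
        "processing_steps": [
            {"order": i, "description": step}
            for i, step in enumerate(steps, start=1)
        ],
    }
    # indent keeps the file readable; sort_keys makes the output stable
    # across runs, which helps when the file is kept under version control.
    return json.dumps(metadata, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(build_provenance_metadata(
        "example clinical trial export",
        ["normalized expression values", "filtered low-quality samples"],
    ))
```

Because the metadata is free-form, nested lists and objects like the numbered processing steps above are as valid as flat key-value pairs.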

Next step

The next step would be to create metadata for all the public studies that are in the public portal. This would be a task for the data curators, as they know the data and metadata inside out and can retrace the steps taken before loading the data into cBioPortal. This way the data provenance can be recorded in a structured fashion.

cBioPortal

The Hyve manages the largest number of active cBioPortal installations in the world, for a wide variety of clients, including pharma companies, hospitals, research institutes, data providers and research collaborations. Our contributions to the open-source code base can be found in our articles.


Need help with cBioPortal feature development?

The Hyve provides services to develop and improve features in cBioPortal. Implemented features are released to the community via the cBioPortal repository on GitHub. For inquiries on cBioPortal feature development projects or other services around cBioPortal, feel free to contact us via this form.
