The cBioPortal community and The Hyve have always championed Open Source and Open Data. Now that the FAIR principles for data stewardship are gaining momentum, we cannot stay behind! To take a small step towards FAIR compliance, rich metadata functionality has been added in cBioPortal 2.1.0.
Development of this feature was sponsored by one of our pharma clients, who needed to keep track of data provenance for their clinical trials. Its structured but flexible approach means it can be used in any context. It also brings cBioPortal closer to FAIR compliance by addressing FAIR principle F2: data are described with rich metadata.
Use case
Many of the questions posted to the cBioPortal community's discussion group ask how the data in the public cBioPortal was preprocessed before being loaded into the portal. Before this feature was released, the only supported metadata fields were title, description and a reference to the accompanying publication, if any.
Rich Metadata
cBioPortal is used in many different contexts, with both public and private data coming from a variety of sources. Clinical trials run by pharma companies require different metadata than data coming from public sources. To support all use cases, we needed to accept metadata in any structure, so cBioPortal itself does not impose any constraints. Users can still define their own constraints on the metadata they associate with their data, to keep their study metadata uniform.
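To make the idea of user-defined constraints concrete, here is a minimal sketch in Python of how a team might check its own study metadata before loading. The field names (data_source, processing_steps) and the function check_study_metadata are hypothetical examples, not something cBioPortal defines or ships.

```python
from typing import Any, Dict, List

# Hypothetical house rules: these field names are illustrative only
# and are not prescribed by cBioPortal.
REQUIRED_FIELDS = {"data_source": str, "processing_steps": list}


def check_study_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the metadata passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in metadata:
            problems.append(f"missing field: {field}")
        elif not isinstance(metadata[field], expected_type):
            problems.append(f"{field} should be of type {expected_type.__name__}")
    return problems


example = {
    "data_source": "clinical trial (illustrative value)",
    "processing_steps": ["normalization", "filtering"],
    # Extra fields are allowed; only the required ones are checked.
    "free_text_note": "anything else the team wants to record",
}
print(check_study_metadata(example))  # prints []
```

Because only the required fields are checked, each team can still add whatever extra structure its own context needs.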
How does it work?
It is quite simple actually. Metadata can be included when loading the study and will be shown in the cBioPortal front end. The supported formats for importing the metadata are JSON (for metadata that is automatically generated) and YAML (for metadata that is created by human beings).
The image on the left shows the metadata that is imported; the image on the right shows the result rendered in the front end.
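For the automatically generated case, a pipeline could write out its provenance as JSON along these lines. This is only a sketch under assumptions: the keys and the build_metadata helper are made up for illustration, and it says nothing about how cBioPortal expects the metadata file to be named or where it should be placed in the study folder.

```python
import json
from typing import List


def build_metadata(source: str, steps: List[str]) -> str:
    """Serialize free-form provenance metadata to a JSON string.

    The keys used here are hypothetical; any structure is allowed.
    """
    metadata = {
        "data_source": source,
        "processing_steps": steps,
    }
    return json.dumps(metadata, indent=2, sort_keys=True)


metadata_json = build_metadata(
    "sequencing pipeline (illustrative value)",
    ["alignment", "variant calling"],
)
print(metadata_json)
```

The same content could be written by hand as YAML when a curator, rather than a script, records the provenance.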
Next step
The next step would be to create metadata for all the public studies that are in the public portal. This would be a task for the data curators, as they know the data and metadata inside out and can retrace the steps taken before loading the data into cBioPortal. This way the data provenance can be recorded in a structured fashion.