Visualizing and analyzing large-scale cancer genomic datasets
cBioPortal is an interactive, open-source platform for visualizing and analyzing genomic data in a user-friendly way. It gives researchers and clinicians insight into large-scale cancer genomics datasets, helping them develop and select better treatments for patients.
We support a variety of organizations that use cBioPortal, including Dana-Farber Cancer Institute, the National Cancer Institute, Berlin Institute of Health, the Netherlands Cancer Institute and a number of major pharma organizations. Our services include deployment, data loading, development, consulting and training.
Key reasons to use cBioPortal
Local instance and internal data
By using an internal instance of cBioPortal, it is possible to use the full functionality of the public version with private data, in addition to the public datasets from cBioPortal Datahub. Many customization options are available, such as having custom gene sets and patient/case sets on the Query Page and using custom driver mutation annotations.
The OncoPrint visualization displays an overview of the genetic alterations in a study for a selection of genes. Distinctive color coding highlights the type of genetic alteration per sample. Clinical variables can be added to correlate them with the molecular data, and various sorting options are available to reorder the patients.
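As a rough illustration of the data behind such a view, the sketch below builds a gene-by-sample alteration matrix and orders the samples so that those altered in the first gene come first. It is a simplified model and not cBioPortal's own implementation; the function names, the (sample, gene, alteration) tuples and the alteration labels are assumptions made for this example.

```python
def build_alteration_matrix(events, genes, samples):
    """Return {gene: {sample: [alteration types]}} for the selected genes.

    `events` is an iterable of (sample, gene, alteration) tuples.
    This input layout is hypothetical and chosen for the example only.
    """
    matrix = {gene: {sample: [] for sample in samples} for gene in genes}
    for sample, gene, alteration in events:
        # Ignore events outside the selected genes or samples.
        if gene in matrix and sample in matrix[gene]:
            if alteration not in matrix[gene][sample]:
                matrix[gene][sample].append(alteration)
    return matrix


def sort_samples(matrix, genes, samples):
    """Order samples: altered in the first gene first, ties broken by the next gene."""
    def sort_key(sample):
        # 0 sorts before 1, so altered samples move to the front.
        return tuple(0 if matrix[gene][sample] else 1 for gene in genes)
    return sorted(samples, key=sort_key)


def alteration_frequency(matrix, gene):
    """Fraction of samples with at least one alteration in `gene`."""
    per_sample = matrix[gene]
    altered = sum(1 for alterations in per_sample.values() if alterations)
    return altered / len(per_sample)


if __name__ == "__main__":
    genes = ["TP53", "KRAS"]
    samples = ["S1", "S2", "S3", "S4"]
    events = [
        ("S1", "TP53", "MUT"),
        ("S3", "TP53", "AMP"),
        ("S2", "KRAS", "MUT"),
        ("S3", "KRAS", "MUT"),
    ]
    matrix = build_alteration_matrix(events, genes, samples)
    print(sort_samples(matrix, genes, samples))
    print(alteration_frequency(matrix, "TP53"))
```

Sorting by a tuple of per-gene flags keeps the ordering rule in one place, so adding or reordering genes changes the layout without further code.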
Integration with external databases
cBioPortal is integrated with external databases such as OncoKB, CIViC and Cancer Hotspots. These resources provide valuable knowledge on biological function, clinical implications, prognosis and possible treatments of genetic alterations. Our team has implemented support to include custom annotations from internal data.
The Hyve specializes in deploying custom installations for production environments that require a specific authentication and authorization setup. Using Keycloak, we provide solutions to control access to the local cBioPortal instance or to particular studies. Keycloak also makes it possible to integrate with existing LDAP services and with identity providers that use standard protocols (SAML, OAuth and OpenID Connect).
cBioPortal provides an overview per study which can be used to select patients based on clinical variables, mutated genes or survival status. This includes a list of the most frequently altered genes, Kaplan-Meier survival curves and plots to describe any categorical or numerical clinical feature.
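For readers unfamiliar with Kaplan-Meier curves, the following is a minimal sketch of the product-limit estimator that such curves are based on. It only illustrates the calculation and is not taken from cBioPortal; the function name and the input layout (follow-up times plus an event flag, where False means censored) are assumptions made for this example.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] for each time with at least one event.

    durations: follow-up time per patient (for example, months).
    events: True if the event (for example, death) was observed,
            False if the patient was censored at that time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    observations = sorted(zip(durations, events))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        time = observations[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(observations) and observations[i][0] == time:
            if observations[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            # Patients censored at this time still count as at risk here.
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    months = [5, 8, 8, 12, 15]
    observed = [True, True, False, True, False]
    for time, probability in kaplan_meier(months, observed):
        print(time, round(probability, 3))
```

Censored patients lower the number at risk for later time points without causing a drop in the curve, which is why they are counted in `leaving` but not in `deaths`.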
Multiple samples are often taken from the same patient at different timepoints. The Patient overview in cBioPortal can visualize the order of events on a timeline. This can show the sampling dates along with any other recorded events, such as treatments, imaging dates or the date of surgery.
cBioPortal is particularly useful for sharing research data with co-workers and collaborators. Using sharable session IDs, it is possible to bookmark and share specific findings. The various download options make it easy to download a result as a figure for publication in a journal.
Data extraction, transformation and loading
With the explosion of biomedical data came a growth in the number of data formats. The Hyve provides solutions to automate the conversion of local data to a cBioPortal-friendly format. Our team includes a number of bioinformaticians who have experience with data from different genomic sources, such as targeted gene panels, fusion assays and RNA sequencing.
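To give an idea of what such a conversion involves, the sketch below turns local patient records into a tab-separated clinical data file. It assumes the layout described in the cBioPortal file format documentation: four metadata rows starting with "#" (display name, description, datatype, priority), followed by a row of attribute IDs and then the data. That layout, the use of "NA" for missing values, and all names in the code (the `source` keys, the local column names) are assumptions for this example and should be checked against the cBioPortal version in use.

```python
# Order of the '#'-prefixed metadata rows (assumed from the cBioPortal docs).
METADATA_ROWS = ("display_name", "description", "datatype", "priority")


def clean(value):
    """Convert a local value to a cell; empty values become 'NA' (assumption)."""
    if value is None or str(value).strip() == "":
        return "NA"
    # Tabs and newlines would break the tab-separated layout.
    return str(value).replace("\t", " ").replace("\n", " ").strip()


def to_clinical_file(records, attributes):
    """Render records as clinical data text.

    records: list of dicts using local column names.
    attributes: list of dicts with keys 'id', 'source', 'display_name',
                'description', 'datatype' and 'priority'. The 'source' key
                (hypothetical) names the local column to read from.
    """
    lines = []
    for row_name in METADATA_ROWS:
        cells = [str(attribute[row_name]) for attribute in attributes]
        lines.append("#" + "\t".join(cells))
    lines.append("\t".join(attribute["id"] for attribute in attributes))
    for record in records:
        cells = [clean(record.get(attribute["source"])) for attribute in attributes]
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    attributes = [
        {"id": "PATIENT_ID", "source": "patient", "display_name": "Patient Identifier",
         "description": "Patient identifier", "datatype": "STRING", "priority": 1},
        {"id": "AGE", "source": "age", "display_name": "Age",
         "description": "Age at diagnosis", "datatype": "NUMBER", "priority": 1},
    ]
    records = [{"patient": "P1", "age": 61}, {"patient": "P2", "age": None}]
    print(to_clinical_file(records, attributes), end="")
```

Keeping the mapping from local columns to attribute IDs in a plain data structure means a new source format only needs a new attribute list, not new conversion code.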
Features developed by The Hyve
As an open-source software company, we contribute the features we develop to the community. These are some of the features that were developed in collaboration with our clients and can be found in the public version of cBioPortal.
Custom driver mutation annotations
The OncoPrint view in cBioPortal is useful for visualizing the prevalence of driver mutations in a cohort. OncoKB and Cancer Hotspots are the default options for defining driver mutations. A client asked us to add the option to use internal annotations loaded with the data instead of external resources. The Hyve implemented both a Driver/Passenger option and a tier system, which can be used to indicate the actionability of specific mutations.
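As an illustration of how internal annotations could be attached to mutation records before loading, the sketch below adds driver and tier columns from an in-house lookup table. The column names (cbp_driver, cbp_driver_annotation, cbp_driver_tiers, cbp_driver_tiers_annotation) and the values Putative_Driver and Unknown follow the cBioPortal mutation file format documentation as far as we are aware and should be verified against the version in use. The function name, the lookup structure and the tier labels are hypothetical.

```python
def annotate_drivers(mutations, internal_annotations):
    """Return copies of the mutation rows with driver and tier columns added.

    mutations: list of dicts with at least 'Hugo_Symbol' and 'HGVSp_Short'.
    internal_annotations: dict keyed by (gene, protein change) with optional
        keys 'driver', 'driver_note', 'tier' and 'tier_note' (hypothetical).
    """
    annotated = []
    for mutation in mutations:
        key = (mutation["Hugo_Symbol"], mutation["HGVSp_Short"])
        info = internal_annotations.get(key, {})
        row = dict(mutation)  # Leave the input rows untouched.
        # Mutations without an internal annotation are marked as Unknown.
        row["cbp_driver"] = info.get("driver", "Unknown")
        row["cbp_driver_annotation"] = info.get("driver_note", "")
        row["cbp_driver_tiers"] = info.get("tier", "")
        row["cbp_driver_tiers_annotation"] = info.get("tier_note", "")
        annotated.append(row)
    return annotated


if __name__ == "__main__":
    mutations = [
        {"Hugo_Symbol": "BRAF", "HGVSp_Short": "p.V600E"},
        {"Hugo_Symbol": "TTN", "HGVSp_Short": "p.A100T"},
    ]
    internal_annotations = {
        ("BRAF", "p.V600E"): {
            "driver": "Putative_Driver",
            "driver_note": "Internal curation",
            "tier": "Tier 1",
            "tier_note": "Actionable",
        },
    }
    for row in annotate_drivers(mutations, internal_annotations):
        print(row["Hugo_Symbol"], row["cbp_driver"], row["cbp_driver_tiers"])
```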
cBioPortal for mouse
We implemented support for the mouse reference genome in the codebase and created a mouse gene database for cBioPortal. The resulting database is now publicly available on cBioPortal Datahub.
For the Berlin Institute of Health we developed an integration with the Clinical Interpretations of Variants in Cancer (CIViC) database. CIViC is an open-source, open-access, community-driven web resource where clinicians can find information on genetic alterations and use it in decision making for precision medicine. Our feature adds CIViC annotations to the various mutation overviews of cBioPortal.
Gene set enrichment visualizations
Gene sets, also known as gene signatures, can be important biomarkers for specific tumor types. For a pharmaceutical customer The Hyve implemented the ability to select and query gene sets on the Query page, to visualize gene set scores (GSVA and ssGSEA) as a heatmap and plot gene set scores in the Plots tab. A refactored version based on cBioPortal’s new UI will be released soon.
Download data from the OncoPrint
In previous versions of cBioPortal, the OncoPrint view contained a download function that was limited to figures. At a customer’s request, The Hyve added an option to download the visualized data in tabular format.
Other reasons to use cBioPortal
cBioPortal was made open-source in 2015 under the AGPL 3 license. Together with other community members, The Hyve develops cBioPortal features that are publicly available on GitHub. Cross-institute code review helps ensure high-quality code. Regular releases make new features available to the community as quickly as possible, while frequent bug-fix releases keep cBioPortal stable.
The public version of cBioPortal is visited by more than 30,000 users per month. Local installations are used by numerous academic cancer centers and pharmaceutical companies around the world. Known institutes that use cBioPortal are Memorial Sloan Kettering Cancer Center, Dana-Farber Cancer Institute, Princess Margaret Cancer Centre, Children’s Hospital of Philadelphia, Weill Cornell, Columbia, New York Genome Center, Cancer Center UK, EMBL and Berlin Institute of Health.
The FAIR principles (Findable, Accessible, Interoperable and Reusable) are important guidelines when working with large amounts of data. Using cBioPortal can help accomplish the goals of these guidelines. For more information, take a look at our section about FAIR guidelines and our blog post on archiving in a FAIR way.
The community around cBioPortal is one of the best examples of a young yet well-functioning open-source community. The Hyve joined the community soon after cBioPortal became open-source in 2015. Since then the community has been growing, and we currently collaborate with Memorial Sloan Kettering Cancer Center, Dana-Farber Cancer Institute, MD Anderson Cancer Center, Princess Margaret Cancer Centre and many more.