Enabling data-driven research: Solving language barriers between humans and machines

As a business developer at The Hyve, attending conferences is one of the key elements of my job. At conferences you hear about the latest trends, get inspired and expand your network. At the start of the coronavirus/COVID-19 pandemic, many in-person conferences were cancelled or postponed. However, now that we are a few months into the pandemic, several conferences are starting to take place as virtual events. This is a huge shift for both organizers and attendees, and it comes with new challenges and opportunities. Last week, I attended my first virtual conference: the PharmaTec Series. In this blog post, I want to share my experience.

PharmaTec Series 2020

The PharmaTec Series featured three different programmes: the 18th Annual Pharma IT & Data Congress, the 4th Annual AI in Drug Development Congress and the 2nd Annual SmartLabs & Laboratory Informatics Congress. For me, these three topics merged very well, and the overarching theme was how to make better use of biomedical data to improve health outcomes.

Honestly, I was sceptical in the beginning about following all the talks from home. Would I stay engaged during the talks with work-related emails coming in and other distractions? Would I be able to interact with other participants? Surprisingly, both conference days flew by! Because all talks were under 30 minutes, it was very easy to stay focussed. The programme also included many breakout sessions in which smaller groups came together and actively participated in the discussion.

The biggest reason for my enthusiasm is that the topics discussed were very exciting. Unsurprisingly, a lot of attention went to the potential of AI in drug discovery and clinical research. To me, it is always inspiring to discuss this topic because it can really disrupt health care. We are starting to see real-world applications such as AI-designed drugs, finding the right patient population for a particular target and smart image analysis software.

Garbage-in, garbage-out principle

But why do we not see more AI implementations? Is it over-hyped? These questions were addressed in several talks and roundtable discussions. It was very interesting to notice that all attendees strongly agreed: there is still a lot to gain from structuring data (the “garbage-in, garbage-out” principle). Every talk mentioned the FAIR principles, which state that your data should be Findable, Accessible, Interoperable and Reusable. Some speakers used an alternative reading of the abbreviation: Federated AI-ready. According to a recent report from the European Commission, which was mentioned by multiple speakers throughout the conference, the biopharmaceutical industry is losing 10 billion euros per year by not addressing the FAIR principles.

So there was a general consensus on the need for FAIR data. How can this be achieved in practice? At The Hyve, we have been helping many different organizations adopt a FAIR data strategy. This challenging process was also addressed in several breakout sessions. Hans Constandt (the founder of Ontoforce) pointed out that this is often a business decision, because where do you begin? Do you select important datasets for FAIRification or convert systems to produce data that is FAIR by design? And does it make sense to update your legacy systems? In conclusion, a FAIR data strategy can look very different for each organization. Our FAIR experts at The Hyve can attest to that: our FAIR projects and approaches differ greatly from one organization to the next. We can be involved in assessing the current state of FAIRness, creating an information landscape of all the data-generating systems, or creating a semantic model and knowledge graph to describe the data assets.

So adopting a FAIR data management strategy requires effort and technical implementation, but the biggest challenge is the required culture change. Often, the researchers who create the data are not the ones who have an incentive to make it FAIR. Furthermore, secondary use of a dataset may not be clear at the time of data collection. Moreover, it remains a challenge to integrate datasets from fundamental and clinical research because they use different terminologies. Fane Mensah, founder and head of the Computer-Aided Biology community, put it very nicely: it is all a language barrier, so we have to make data understandable between humans from all backgrounds and between humans and machines.

All in all, this virtual meeting exceeded my expectations. It was very exciting to discuss with so many like-minded people how data can be leveraged to improve health outcomes. Interestingly, a general trend seems to be that the pandemic has helped to convince senior management (“never waste a good crisis”) of the importance of good data management, so we can expect to see more organizations adopting a FAIR data strategy, and doing so more rapidly. We at The Hyve are very passionate about this process and will continue to support organizations on this journey. So feel free to reach out and discuss how best to reach your FAIR goals.

Written by

Tess Korthout