The Opportunity of Generating RWE from RWD
In the life sciences sector, real-world data (RWD) has emerged as a cornerstone for generating real-world evidence (RWE), offering insights that transcend the controlled environments of clinical trials. RWD encompasses data collected from diverse sources such as electronic health records (EHR), insurance claims, patient registries, and a growing range of wearable devices and health/outcome-related questionnaires, reflecting the complexities of healthcare delivery and patient experiences in everyday settings.
By leveraging RWD, researchers and decision-makers can produce RWE that informs regulatory decisions, supports drug development, enhances patient outcomes, and drives personalized medicine. This evidence is particularly valuable for assessing long-term safety, effectiveness, and healthcare resource utilization, providing a more comprehensive understanding of interventions as they are applied in diverse, real-world populations.
The Challenge in Accessing and Processing RWD
However, using RWD to generate RWE brings significant challenges. They stem from the diversity and quality of the data and from the governance, regulatory and privacy constraints that surround it. RWD is often collected from disparate sources, which leads to variability in data formats, completeness, and accuracy. Integrating these heterogeneous datasets requires robust standardization and interoperability, which remain costly, resource-intensive and sometimes technically difficult to achieve. Additionally, missing or incomplete data and the biases inherent in observational data can compromise the reliability and validity of RWE. Regulatory and privacy constraints further complicate access to and use of RWD, as data must be anonymized or de-identified and handled in compliance with regional laws such as GDPR or HIPAA.
Addressing these challenges requires advanced analytics, secure data-sharing frameworks, and collaboration between stakeholders across the healthcare ecosystem. Until they are addressed, accessing and using RWD involves complicated, lengthy and expensive processes that slow down research, often delay patients' access to innovative therapies, and increase the cost of delivering good-quality healthcare.
The solution moving forward will be a federated one
Federated approaches have emerged as transformative tools that help solve many of the challenges we face today in accessing and analyzing RWD. Federated Analysis and Federated Learning are two related yet distinct techniques enabling distributed data processing without centralizing sensitive information.
Federated Analysis (FA) is the process of performing statistical analyses or queries across distributed datasets without moving the data to a central repository. The data always remains at its source (e.g., a hospital, business, or device), where it is managed by a “data custodian” (e.g., the hospital), and only aggregated results or insights are shared. This significantly reduces the risk of breaches that expose personally identifiable information (PII), which can be very damaging and costly for both the organizations and the patients involved. A key feature of federated analysis is that only anonymized insights or aggregate statistics leave the data custodian's environment: raw data is never exposed, and PII is never made accessible or included in any shared results.
An example of Federated Analysis in the life sciences would be multiple hospitals collaborating as a federated network to identify trends in disease progression for epidemiological studies. Each hospital computes summary statistics (e.g., the average age of patients with a given condition) locally and contributes only those statistics to a central point for combined analysis, so patient-level data never leaves the hospital and patient privacy is upheld.
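The combination step hides a subtlety worth spelling out: the central point cannot simply average the hospitals' averages, because a hospital with more patients should carry more weight. Below is a minimal sketch, assuming each site shares only a patient count and a sum of ages. As an extra safeguard it also assumes a hypothetical suppression threshold (`MIN_CELL_SIZE`, set to 5 purely for illustration) below which a site withholds its aggregate. All names and numbers are illustrative, not part of any specific federated platform.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical disclosure-control threshold (an assumption, not a standard):
# a site withholds its aggregate if fewer than this many patients match.
MIN_CELL_SIZE = 5


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate statistics a site shares; contains no patient-level rows."""
    count: int
    age_sum: float


def local_summary(ages: Sequence[float]) -> Optional[LocalSummary]:
    """Runs inside a hospital: reduce patient ages to a count and a sum."""
    if len(ages) < MIN_CELL_SIZE:
        return None  # suppress small cells instead of sharing them
    return LocalSummary(count=len(ages), age_sum=float(sum(ages)))


def pooled_mean_age(summaries: Sequence[Optional[LocalSummary]]) -> float:
    """Runs at the central point: combine site aggregates into one mean."""
    shared = [s for s in summaries if s is not None]
    total_count = sum(s.count for s in shared)
    if total_count == 0:
        raise ValueError("no site shared enough data to compute a mean")
    # Weight each site by its patient count: total age / total patients.
    return sum(s.age_sum for s in shared) / total_count


if __name__ == "__main__":
    # Toy ages for three hypothetical hospitals; the third is below the threshold.
    hospital_a = [50, 55, 60, 65, 70]
    hospital_b = [70, 71, 72, 73, 74, 76, 77, 78, 79, 80]
    hospital_c = [45, 50]
    summaries = [local_summary(h) for h in (hospital_a, hospital_b, hospital_c)]
    print(pooled_mean_age(summaries))  # 70.0
```

With these toy numbers the pooled mean is 70.0 (1050 years over 15 patients), whereas naively averaging the two site means (60 and 75) would give 67.5.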
Federated Learning (FL) is a machine learning technique in which a model is trained collaboratively across multiple devices or servers holding decentralized data. Instead of transferring raw data, each participant (a data node in the federated network) trains the model locally on its private data and computes model updates (e.g., gradients or weights). These updates are sent to a central server, which aggregates them to improve the global model. The improved global model is then shared back with the participants for further training rounds and can ultimately be used in applications that leverage it.
Federated learning is commonly applied in drug discovery and personalized medicine. In drug discovery, multiple pharmaceutical companies and research institutions can collaborate to develop machine learning models while keeping their proprietary data private. In personalized medicine, federated learning can be used to train a diagnostic AI model across the hospital systems in a federated network, supporting early disease detection and/or the prescription of personalized treatments.
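To make the train-locally, aggregate-centrally loop concrete, here is a minimal sketch of federated averaging (FedAvg), one widely used aggregation scheme. It is not necessarily the scheme any particular network uses. The sketch assumes a toy one-feature linear model trained by full-batch gradient descent. The function names, learning rate, round counts and the synthetic-data helper `make_site` are all hypothetical. Only model weights and sample counts cross the site boundary.

```python
import random
from typing import List, Sequence, Tuple

Weights = Tuple[float, float]        # (slope, intercept) of a toy linear model
Dataset = List[Tuple[float, float]]  # (feature, label) pairs held privately by one site


def local_update(global_w: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Tuple[Weights, int]:
    """Runs at a site: start from the global model, train on local data only."""
    w, b = global_w
    n = len(data)
    for _ in range(epochs):
        # Full-batch gradient of the mean squared error on this site's data.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n  # only weights and the sample count leave the site


def federated_average(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Runs at the server: average site weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(wts[0] * n for wts, n in updates) / total
    b = sum(wts[1] * n for wts, n in updates) / total
    return (w, b)


def train(sites: Sequence[Dataset], rounds: int = 100) -> Weights:
    """Repeat rounds: broadcast the global model, train locally, aggregate."""
    global_w: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates)
    return global_w


def make_site(rng: random.Random, n: int) -> Dataset:
    """Hypothetical synthetic data for one site, following y = 3x + 1 plus noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 2)
        data.append((x, 3 * x + 1 + rng.gauss(0, 0.1)))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [make_site(rng, n) for n in (40, 25, 60)]  # three hypothetical sites
    print(train(sites))  # should land close to (3, 1)
```

Weighting by sample count lets larger sites influence the global model proportionally. Real deployments typically add secure aggregation or other protections on the updates themselves, which this sketch omits.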
In conclusion, federated networks support two distinct approaches: Federated Analysis and Federated Learning. Both address the life sciences community's growing need to use RWD to generate RWE in a secure, privacy-preserving manner. Federated Analysis focuses on deriving statistical insights, while Federated Learning enables collaborative model training, and both can run within a federated network without raw data ever leaving its source location. Together, these approaches represent the future of secure, distributed data processing.
If you would like to learn more about how The Hyve can help your organization implement a future-proof RWD strategy through a federated approach, feel free to contact us at marketing@thehyve.nl or via our website contact form.