One of the solutions that can greatly help you in your research on observational data is OHDSI, an open source platform for Observational Health Data Sciences and Informatics (see this page for an introduction to OHDSI and the concept of observational data). In this blogpost we will approach OHDSI from a scientific angle to show you which types of questions you can answer with its help.
Observational data is becoming more and more popular nowadays:
Researchers try to involve observational data in their science and in their funding applications.
Pharmaceutical companies mention in silico clinical trials.
Epidemiologists use it in population health research.
Even medical doctors are starting to use observational data when looking for the right treatment for their patients.
It has proven to be more than just a hype. On the contrary: professionals involved in biomedical research are only just starting to uncover the potential of observational data.
The power of observational data
Observational data is medical and healthcare data observed for a particular patient. The main power of observational data lies in its quantity. All of us go to a GP every now and then and have insurance, and some of us even participate in scientific and clinical studies. This observational data is already available to use, unlike data from randomized clinical trials, where patients need to be recruited and followed up for a period of time (which can be expensive for biomedical companies and invasive for the patients). If we store all this observational data in one common format, we will be able to leverage millions and millions of observations. However, large quantities of data come with two pitfalls:
the sheer size of the data requires new technologies to process it efficiently
there are many (unknown) confounding factors that need to be taken into account (e.g., age or gender biases in compared groups).
OHDSI can help you avoid these pitfalls.
The OHDSI community looks at observational data as a medical chart: it reflects a patient’s journey. Relative to the start of a treatment X, that journey can be divided into three parts:
The time point of starting treatment X, which is considered the reference point.
The time period prior to the start of treatment X (including observed conditions, procedures the patient went through, and drugs they have taken), or baseline time.
The follow-up time, meaning the period after the start of treatment X, in which the patient can expect a certain outcome.
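To make the three parts of the timeline concrete, here is a minimal Python sketch. It is not OHDSI code: the function name `split_timeline`, the 365-day windows, and the patient record are all assumptions made up for illustration.

```python
from datetime import date, timedelta


def split_timeline(events, index_date, baseline_days=365, followup_days=365):
    """Split (event_date, description) pairs into baseline and follow-up.

    The window lengths are illustrative assumptions, not OHDSI defaults.
    Events on the index date itself belong to neither window.
    """
    baseline_start = index_date - timedelta(days=baseline_days)
    followup_end = index_date + timedelta(days=followup_days)
    baseline, followup = [], []
    for event_date, description in events:
        if baseline_start <= event_date < index_date:
            baseline.append(description)
        elif index_date < event_date <= followup_end:
            followup.append(description)
    return baseline, followup


# Hypothetical patient record, invented for this example.
events = [
    (date(2019, 11, 5), "hypertension diagnosis"),
    (date(2020, 1, 10), "metformin prescription"),
    (date(2020, 3, 1), "start of treatment X"),
    (date(2020, 6, 15), "hospital admission"),
    (date(2022, 1, 1), "visit outside the follow-up window"),
]
index_date = date(2020, 3, 1)  # the reference point: start of treatment X

baseline, followup = split_timeline(events, index_date)
print("baseline:", baseline)
print("follow-up:", followup)
```

Everything before the reference point lands in the baseline list, everything after it in the follow-up list, and events outside both windows are ignored.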
Types of questions OHDSI can answer
Using timelines in patients’ observational data, you can ask three major types of research questions:
1) Questions about observations made in the past for other patients, or clinical characterization types of questions.
These are the questions that look at the data from the baseline time. In other words, these are questions about what happened to the patients:
Which treatment paths did patients with this disease follow after diagnosis?
How many patients experienced a certain outcome after treatment?
Is a combination of condition X (e.g., elevated blood pressure or hypertension) and disease Y (e.g., type II diabetes) mostly observed in male or female patients?
Comparing patient populations from three different countries, in which country is this treatment more often prescribed as the first line of treatment?
What are the most prescribed drugs in my population?
And many more.
The nature of these questions is purely observational or descriptive. You do not need to apply any statistical tests to answer these questions. You are just grouping variables and observations based on certain criteria.
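As a rough illustration of how little machinery such descriptive questions need, the sketch below counts first-line drugs in a tiny, invented population using only the Python standard library. The records and field layout are assumptions for this example and do not reflect how OHDSI stores data.

```python
from collections import Counter

# Hypothetical records: (patient_id, sex, first-line drug).
patients = [
    (1, "F", "metformin"),
    (2, "M", "metformin"),
    (3, "F", "insulin"),
    (4, "M", "metformin"),
    (5, "F", "sulfonylurea"),
]

# "What are the most prescribed drugs in my population?"
drug_counts = Counter(drug for _, _, drug in patients)

# The same count, broken down by sex.
drug_counts_by_sex = Counter((sex, drug) for _, sex, drug in patients)

top_drug, top_count = drug_counts.most_common(1)[0]
print(top_drug, top_count)
print(drug_counts_by_sex[("M", "metformin")])
```

No model is fitted here: the answer comes purely from grouping and counting.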
2) Questions about the future that a patient can expect based on the chosen treatment, or patient-level prediction questions.
These are the questions that look at the data from other patients with a similar condition / disease / treatment and make a prediction for a particular patient. After building a prediction model on existing data, we can answer questions like:
What is the probability that this patient will develop the disease?
What is the probability that this patient will experience the outcome?
What treatment option is best for this patient, given their history?
This type of question (given what we have seen happen, can we predict what will happen next?) is not based purely on observations. It includes statistical inference. With these questions you can start building and testing hypotheses. However, they do require a level of statistical modeling.
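To give a feel for what "building a prediction model on existing data" can mean, here is a small sketch of logistic regression fitted by gradient descent in plain Python. It is only one possible modeling choice, not the method used by OHDSI's own tooling. The features, the training data, and the function names are assumptions invented for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    Returns (weights, bias). The learning rate and epoch count are
    arbitrary illustrative values.
    """
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the outcome for one patient."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical baseline features: [age in decades, has hypertension (0/1)].
X = [[4.0, 0], [5.0, 0], [5.5, 1], [6.0, 1], [7.0, 1],
     [4.5, 0], [6.5, 0], [7.5, 1], [5.0, 1], [7.0, 0]]
# Hypothetical outcome observed during follow-up (1 = outcome occurred).
y = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0]

w, b = fit_logistic(X, y)
p_low = predict_proba(w, b, [4.0, 0])   # younger patient, no hypertension
p_high = predict_proba(w, b, [7.5, 1])  # older patient with hypertension
print(p_low, p_high)
```

The model is trained on what was observed for other patients, then applied to the baseline characteristics of a new patient to estimate their individual risk. A real study would also need held-out data to check how well the model performs.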
3) Questions about the reason for a certain observation or outcome, or population-level effect estimations.
With this type of question, you are not just predicting what is going to happen, but also why:
Does treatment X cause this adverse effect more often than treatment Y?
Is treatment X or treatment Y more effective for this disease?
Are males in my population more likely to be affected by outcome Z than females?
Are persons with condition X more likely to also have condition Z? (compared to a cohort of persons without condition X)
These questions ask you to draw conclusions about causal connections from observational data. Causal inference requires different statistical modeling methods and testing. It is a powerful tool to understand the relationship between different observations.
Because we are dealing with observational data, we need to be careful and consider confounding factors. The statistical packages developed within OHDSI include tools to correct for this (e.g. propensity score matching).
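The idea behind propensity score matching can be sketched in a few lines of Python. This is a simplified illustration, not the implementation in the OHDSI packages. It assumes the propensity scores (each person's estimated probability of receiving treatment X given their baseline characteristics) have already been computed. The cohorts, the `caliper` value, and the function names are made up.

```python
def match_on_propensity(treated, comparators, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Each person is a dict with a precomputed 'ps' (propensity score).
    A pair is kept only if the scores differ by at most `caliper`.
    """
    available = list(comparators)
    pairs = []
    for person in sorted(treated, key=lambda p: p["ps"]):
        if not available:
            break
        best = min(available, key=lambda c: abs(c["ps"] - person["ps"]))
        if abs(best["ps"] - person["ps"]) <= caliper:
            pairs.append((person, best))
            available.remove(best)
    return pairs


def outcome_rate(people):
    """Fraction of people who experienced the outcome."""
    return sum(p["outcome"] for p in people) / len(people)


# Hypothetical cohorts with invented propensity scores and outcomes.
treated = [
    {"id": "T1", "ps": 0.80, "outcome": 1},
    {"id": "T2", "ps": 0.60, "outcome": 0},
    {"id": "T3", "ps": 0.30, "outcome": 0},
]
comparators = [
    {"id": "C1", "ps": 0.78, "outcome": 1},
    {"id": "C2", "ps": 0.62, "outcome": 1},
    {"id": "C3", "ps": 0.10, "outcome": 0},
    {"id": "C4", "ps": 0.55, "outcome": 0},
]

pairs = match_on_propensity(treated, comparators)
matched_ids = [(t["id"], c["id"]) for t, c in pairs]
rate_treated = outcome_rate([t for t, _ in pairs])
rate_comparator = outcome_rate([c for _, c in pairs])
print(matched_ids, rate_treated, rate_comparator)
```

Only people with a comparable counterpart in the other group are kept, so the outcome rates are compared between groups that look similar at baseline. Patients without a close match (T3 in this toy data) are left out of the comparison.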
To conclude:
We hope that this information gives you a better understanding of the kinds of questions you might answer with OHDSI. Hopefully you can make a more informed decision on whether you want to try out OHDSI for yourself.
Please contact us if you want to know more about OHDSI.