Using health data to predict the next wave of COVID-19
By definition, a pandemic is a new disease outbreak. How do you predict its path without any historical data to rely on?
Paul Nielsen | May 28, 2020
Many teams from across academia, government and the health industry have joined the race to predict the next wave of COVID-19. And when it comes to disease forecasting, this kind of friendly competition is quite healthy. That’s because it takes large amounts of data from a variety of sources and locations to build the most accurate predictions.
But how do you even begin to build a forecast for a pandemic?
When it comes to artificial intelligence (AI), today’s cutting-edge forecasting tools use machine learning, a type of AI that relies on historical data to make predictions for the future.
But unlike epidemics — which come and go, leaving useful information in their wake — pandemics offer little historical data. By definition, a pandemic is the worldwide spread of a new disease. That means that, at least in the beginning, there is no data with which to build and train a model.
With so many unknowns, how can you predict a pandemic’s future? And what can you do with the information once you have it?
Past data become ingredients for prediction
When COVID-19 emerged, we turned to our Optum Flu Forecast for answers. The team here set out to turn this epidemic predictor, born out of a relationship with the Delphi research group at Carnegie Mellon University (CMU), into an important early warning system for COVID-19.
Roni Rosenfeld, PhD, co-leads the Delphi group at CMU and likens disease forecasting to weather forecasting. Like a weather forecast, Dr. Rosenfeld says, a disease forecast tells you the likelihood of an event.
But, he says, there’s one huge difference.
“In weather forecasting, you know the current situation exactly. Temperature and other constant streams of measurement come from all over the planet, and forecasters are free to concentrate on the future.”
When it comes to disease, says Dr. Rosenfeld, there’s less accuracy.
“Not only do we not know current levels of a disease,” says Dr. Rosenfeld, “we don’t even know past levels with great accuracy.”
Over time, though, as disease surveillance reports are submitted and amended, that uncertainty shrinks.
To build a COVID-19 dashboard in such a way that it could make accurate predictions, our team first had to understand the current burden of disease in a given location at a given time. This is called “nowcasting,” and the data sources used to build a nowcast are what ultimately train computer models to forecast.
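To make the idea of a nowcast concrete, here is a minimal sketch of one common way to blend several noisy signals into a single estimate of the current disease level: inverse-variance weighting. This is an illustration only, not the method used in the Optum dashboard. The function name, the signal labels and every number below are hypothetical.

```python
def nowcast(signals):
    """Blend (estimate, variance) pairs into one inverse-variance weighted estimate.

    Signals with lower variance (less noise) get more weight.
    Returns the blended estimate and its variance.
    """
    if not signals:
        raise ValueError("at least one signal is required")
    weights = [1.0 / variance for _, variance in signals]
    total_weight = sum(weights)
    estimate = sum(w * est for w, (est, _) in zip(weights, signals)) / total_weight
    return estimate, 1.0 / total_weight


# Hypothetical readings of the share of visits with COVID-like illness.
# Each pair is (estimate, variance); none of these values come from real data.
signals = [
    (0.040, 0.0004),  # hypothetical EHR-based signal
    (0.050, 0.0001),  # hypothetical search-trend signal
    (0.030, 0.0004),  # hypothetical survey signal
]

estimate, variance = nowcast(signals)
print(f"nowcast: {estimate:.3f} (variance {variance:.6f})")
```

The blended estimate leans toward the least noisy signal, and its variance is smaller than that of any single input, which is the practical reason for combining sources.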
Thanks to a global effort, work that normally would take six months was accomplished in just four weeks.
The data science behind COVID-19 predictions
“Flu is well studied,” says Danita Kiser, PhD, Optum director of product research and emerging technologies. “We have a ton of data, available publicly and within Optum data stores, that’s used to build signals for our Flu Forecast.”
“We knew that COVID-19 had similar coding practices as the flu. So, we were able to look back at these data — starting ‘pre-war’ against COVID-19 to current day — to find patterns in coding practices and build new signals.”
These signals formed the foundation for the Optum COVID-19 dashboard, which launched at the end of March. This AI-powered tool uses anonymized private and public data to predict the timing and location of COVID-19 outbreaks at the state and county level.
In addition to de-identified data from electronic health records (EHRs), pharmacy claims and medical claims, the dashboard pulls in data streams from CMU’s COVIDcast, which displays real-time information derived from multiple partners. Soon, COVID-19 testing data will be added.
All these streams are ingested by machine learning models that look for patterns. The models then make inferences and predictions based on the patterns they see.
To make sure we are on the right track, our teams review forecasts and, when prediction timelines have passed, look back to be sure data correlate with what occurred.
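As a rough sketch of what that look-back can involve, the snippet below scores past forecasts against what was later observed, grouped by how far ahead each forecast was made. Mean absolute error is an assumed metric chosen for simplicity. The record layout and all values are hypothetical, not taken from the dashboard.

```python
def mean_absolute_error(forecasts, observed):
    """Average absolute gap between forecast values and observed values."""
    if not forecasts or len(forecasts) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - o) for f, o in zip(forecasts, observed)) / len(forecasts)


def score_by_horizon(records):
    """Group (horizon_days, forecast, observed) records and score each horizon."""
    grouped = {}
    for horizon, forecast, observed in records:
        grouped.setdefault(horizon, []).append((forecast, observed))
    return {
        horizon: mean_absolute_error(
            [f for f, _ in pairs], [o for _, o in pairs]
        )
        for horizon, pairs in sorted(grouped.items())
    }


# Hypothetical case counts: (days ahead, forecast, what actually occurred).
records = [
    (2, 100, 104),
    (2, 120, 114),
    (14, 150, 170),
    (14, 90, 120),
]

scores = score_by_horizon(records)
for horizon, error in scores.items():
    print(f"{horizon}-day horizon: mean absolute error {error}")
```

Scoring each horizon separately shows how quickly accuracy falls off as the forecast looks further ahead.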
What makes data reliable?
Precision and low latency are critical to reliable forecasting. Claims, by their nature, arrive with a time lag. That’s where Optum EHR data come into play. Accurate and timely EHR records (in some cases, current down to the minute) paint a clear picture of hospital admissions, diagnoses and several other metrics.
But EHR or claims data alone will not create a sufficiently robust COVID-19 forecast. The Optum COVID-19 dashboard also incorporates signals from CMU’s COVIDcast data streams, which include self-reported data from online surveys and trends in search queries made available to the Delphi Group by Google.
Because each of these sources has inherent bias, Dr. Rosenfeld and the team cleanse the data.
For some sources, like search queries, they study error patterns, then construct a matrix that cancels out those errors. For other sources — like surveys where respondents are asked to self-report symptoms — they look only for changes over time.
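A small sketch can show why looking only at changes over time helps with self-reported data. It assumes, purely for illustration, that respondents over-report by a constant factor. Under that assumption the bias cancels out of the week-over-week relative change, even though the reported levels are wrong. The rates and the factor of 2 are made up.

```python
def relative_changes(series):
    """Period-over-period relative change for a series of positive values."""
    if any(value <= 0 for value in series[:-1]):
        raise ValueError("earlier values must be positive to compute a relative change")
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


# Hypothetical true weekly symptom rates, and the same rates as reported
# by respondents who over-report by an assumed constant factor of 2.
true_rates = [0.02, 0.03, 0.045]
reported_rates = [2.0 * rate for rate in true_rates]

true_trend = relative_changes(true_rates)
reported_trend = relative_changes(reported_rates)

print("true trend:    ", [round(change, 3) for change in true_trend])
print("reported trend:", [round(change, 3) for change in reported_trend])
```

The reported levels are twice the true levels, but both series show the same relative growth from week to week. A bias that varies over time would not cancel this way.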
The benefits of an early warning system
A lot of work goes into creating a system like the Optum COVID-19 dashboard. But it’s an essential part of our toolkit for helping the individuals and health care professionals we serve prepare.
With access to COVID-19 predictions, supply chains, clinics and pharmacies can staff up and stock up. Care managers and nurses can proactively reach out to targeted, at-risk individuals to help ensure they get the support they need. And leaders and policy makers can make better informed decisions.
We’ve seen firsthand the value of getting forward-looking insights into professionals’ hands more quickly. While the tool offers a short forecast now — just two weeks compared to the six-week forecast available for flu — that lead time will continue to grow as the machine learning models ingest and learn from more and more data.
That two-week window is already a major step up: we began with a two-day view, reached a five-day forecast by the end of April and expanded to a 14-day view just two weeks later.
Along with insights from traditional epidemiological models, machine learning-driven predictions will become important data points for leaders to consider as they debate public health restrictions and guidelines, and as they respond if new waves of COVID-19 develop.
AI as the way forward
There are many unknowns about COVID-19, but one thing seems certain: it’s likely not going away anytime soon.
The use of AI in health care had already begun to trend higher. The 2019 OptumIQ™ Annual Survey on AI in Health Care found that 62% of respondents had implemented an AI strategy — an increase of nearly 88% from 2018.
It’s possible that COVID-19 has brought new urgency and appreciation for what AI can bring to health care. As bad as the pandemic is, my hope is that one of its enduring legacies will be how it has changed the pace at which our health system embraces new technologies. Insights and efficiencies gained from responsible and smart applications can truly enable professionals across our industry to do their best work.
Additional stories around the industry response to COVID-19 and our efforts to confront current challenges can be found on Optum Community Circle. You can also find more perspectives on enabling health care innovation on our data, analytics and technology blog.
Paul Nielsen
Vice President, Advanced Technology Collaborative, Optum
A global executive with more than 25 years of technology and business experience, Paul Nielsen leads strategic programs in Optum technology and directs the Optum College of Artificial Intelligence.
His diverse experience includes designing and developing software and hardware solutions for the telecommunications, IT service provider, health care and CE industries, and building professional relationships with CEOs, CTOs and managing partners at venture capital investment firms.
Nielsen earned a Bachelor of Science degree in science and computer science from the University of Massachusetts at Amherst.