Health officials failed to safeguard trust in Pa.’s coronavirus data, sowing confusion

Spotlight PA is an independent, nonpartisan newsroom powered by The Philadelphia Inquirer in partnership with the Pittsburgh Post-Gazette and PennLive/Patriot-News.

HARRISBURG — Data is difficult.

At its best, it’s a tool for sound policy. Data can cut through noise to help spot trends, like new clusters of the coronavirus. It can prove which communities face the most risk and where supplies are most needed.

At its worst, though, data brings trouble. Without context, it can seem to tell a story that might not be true. A county’s case count on its own is misleading without information about tests. A death toll might be declining, but it won’t account for reporting delays.

Since March, data about the coronavirus has played a profound role in Pennsylvania, underpinning sweeping decisions to close schools, restrict businesses, and separate loved ones. Now, in the early days of the state’s reopening, accurate data is critical to detecting a resurgence.

But over the past three months, the state health department has repeatedly failed to safeguard the public’s trust in its data. While there is no evidence of intentional manipulation, the state’s blunders have created openings for confusion.

A Spotlight PA review of the state’s COVID-19 data practices found:

From the start, Pennsylvania officials failed to acknowledge the full limitations of COVID-19 data. The state has not always clearly explained to the public what numbers mean, what they don’t, and when and why they might change.
On several occasions, the health department has published data without careful vetting. When mistakes have been made, the state has quietly edited information without clearly documenting and communicating the changes.
The state has been opaque about the methods and sources it uses to compile the data, hindering the public’s ability to scrutinize the numbers.

Taken together, these shortcomings make it difficult for researchers, policymakers, and the public to get an accurate sense of what’s happening.

“The whole picture isn’t being painted,” said Raeven Faye Chandler, director of the Pennsylvania Population Network, a research center housed at Penn State, “and it’s possible that we could derive results that are more optimistic than they actually are.”

In a statement, the state health department defended its handling of COVID-19 data, saying it has faced immense challenges and has tried to be transparent.

“Funding for public health in Pennsylvania is among the lowest in the country, and some of the work being done … is work that up until this response, had not been possible,” the department said. “We are providing data on a daily and weekly basis that previously took months and sometimes more than a year to compile.”

The statement added that “when there are items that may confuse the public, we are doing our best to explain what happened.”

Chandler, whose work focuses on social demographic and health research, uses the state’s numbers to build a data-driven analysis of COVID-19’s effect on high-risk populations. She said she understands the challenges the health department is up against, “given the unprecedented nature” of the pandemic.

But the problem, she said, is that there has been “no discussion presented to provide clarity” when the state’s methods with data have changed.

“With the lack of information, it’s hard for researchers to understand what’s happening and what the risks are and how that may vary across populations,” Chandler said.

Be clear and transparent

Since the start of the pandemic, Pennsylvania officials have said decisions would be driven by data. They have asked the public to trust the process, saying policies would be fact-based and fair.

Along the way, however, the state has repeatedly struggled to explain how data is being collected and used. Instead, officials have had to play defense to dispel confusion and baseless conspiracy theories after a number or metric has unexpectedly changed.

In April, Gov. Tom Wolf announced a reopening benchmark with clear numbers, easy for the public to track. If a county had fewer than 50 new cases per 100,000 people over the past 14 days, Wolf said, it would be a sign that it’s safe to start reopening.

“We’re going to be applying the metrics that I mentioned — the number of cases per 100,000 people — to make sure that we’re doing this in a data-driven, evidence-based way,” Wolf said at the time.

Local officials and the public latched on to the metric, only to see the state reopen counties that failed to meet it. Wolf later explained the benchmark had always been one of many, and had shifted as the state’s understanding of COVID-19 evolved.
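For readers who want to track the benchmark themselves, the arithmetic behind it can be sketched in a few lines of Python. This is only an illustration of the metric as Wolf described it (fewer than 50 new cases per 100,000 residents over the past 14 days), not the state's actual code. The function names and the example county figures are hypothetical.

```python
def cases_per_100k(new_cases_14d: int, population: int) -> float:
    """New cases over the past 14 days, scaled to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return new_cases_14d * 100_000 / population


def meets_benchmark(new_cases_14d: int, population: int,
                    threshold: float = 50.0) -> bool:
    """True if the 14-day rate is strictly below the threshold.

    The default of 50 comes from the benchmark announced in April;
    everything else here is an assumption made for illustration.
    """
    return cases_per_100k(new_cases_14d, population) < threshold


# Hypothetical county of 250,000 residents (not real data).
rate = cases_per_100k(100, 250_000)
print(f"{rate:.1f} per 100,000; meets benchmark: {meets_benchmark(100, 250_000)}")
```

Because the rate depends only on reported cases and population, it says nothing about how many tests were run, which is one reason a single number like this can mislead without context.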

State officials have likewise struggled to explain how they are collecting death data.

As early as March, county coroners raised concerns about the health department’s process, and the lack of a uniform death reporting system led to discrepancies between state and county counts.

At daily press conferences throughout much of April, Health Secretary Rachel Levine discussed the department’s efforts to “reconcile” death data — to merge and vet numbers being collected from multiple systems. That effort is still ongoing, leading to delays between the date deaths occur and the date they are reported publicly.

Even the state’s process of defining COVID-19 deaths became a major point of confusion.

Questions about whether the health department’s count included deaths probably caused by the coronavirus have circulated for months. Confusion peaked in April, when the department announced it was adding probable deaths to the count, then, two days later, said it had removed more than 200 of them.

“There wasn’t any clarity, aside from minimal asterisks,” Chandler said.

Currently, the state’s death count does include some probable deaths, according to a department spokesperson. But a member of the public wouldn’t know that. The state releases a daily breakdown of confirmed and probable cases, but does not do the same for deaths.

The state has also faltered in the collection of data about race.

For much of the crisis, it has been mandatory for Pennsylvania health-care providers to report the race and ethnicity of people with COVID-19, but the state still lacks the data for more than half of its almost 80,000 cases.

“It’s a really big deal,” said Chandler. “We need this data to be able to understand how COVID-19 may be disproportionately impacting individuals, particularly those who may be more vulnerable.”

Even when data about race and ethnicity is included, the state doesn’t disclose how it was obtained.

That’s a crucial missing piece of the puzzle, said Bob Gradeck, manager of the Western Pennsylvania Regional Data Center at the University of Pittsburgh’s Center for Social and Urban Research.

As the state tries to understand issues with equity, Gradeck said, “it’s important to understand the context” — where numbers about race and ethnicity come from and whether the collection process has been consistent across the state.

Vet data before it goes public

On June 8, the health department debuted a new data dashboard. At its launch, the dashboard’s number of tests appeared impossibly high. The trendline showed a jump overnight from about 12,000 tests administered one day to 26,100 the next.

But the data was wrong.

After Spotlight PA inquired, Nate Wardle, a spokesperson for the department, acknowledged the error, saying the second day’s number should in fact be 9,410. Hours passed before it was corrected. The situation is just one of many.

In mid-May, when the health department published a long-awaited list of nursing homes with outbreaks of COVID-19, the numbers were immediately contested. Without disclosure or acknowledgement, the department began quietly correcting issues. Days later, it admitted to some problems.

Complexities and shifts “should be expected,” said Gradeck, of the Western Pennsylvania Regional Data Center. “It’s not surprising that the numbers change.”

But if you’re clear about the data’s limitations from the start, he said, you avoid “setting yourself up for a gotcha moment.”

And with data constantly revised, it’s important to provide historical numbers, said Coral Sheldon-Hess, a professor of computer information technology and data analytics at the Community College of Allegheny County.

People analyzing Pennsylvania’s COVID-19 data need to know when to “correct any past numbers, to help make predictions better going forward,” Sheldon-Hess said.

But that hasn’t happened in every case.

Since March, the health department has kept an archive page of coronavirus data, publishing daily tallies. But the archive doesn’t disclose when numbers were later corrected, nor does it explain why the department changed its methodology.

What’s more, the health department said June 8 that with the launch of the new data dashboard, it would no longer be posting updates to the archive page. That wouldn’t be necessary, Wardle said, given that the dashboard contained a “graphical depiction” of when COVID-19 cases and deaths occurred.

A day later, after hearing that the dashboard was difficult for some people to use, the department resumed posting to the archive page.

Make data easy to scrutinize

In 2016, the Wolf administration pledged to make government data available and usable to the public.

“One of our most valuable and underutilized resources in state government is data,” Wolf said at the time.

The initiative centered around OpenDataPA, an online portal for data that’s both free for anyone to use and structured in a way that’s easy for computers to process. Think: Excel spreadsheets or CSV files, not PDF files or tables posted on web pages.

The format of data is important, because it sets the stage for what the public can do with it.

“If I have three hours to work on a dataset and I spend two hours just getting that data, my time to explore and understand the data is limited,” said Jacob Kaplan, a postdoc fellow at the University of Pennsylvania, who’s been studying the spread of the coronavirus in prisons.

In the OpenDataPA portal, the catalog has a listing for data about the coronavirus. But the page doesn’t actually contain data.

Instead, it just links to the health department’s COVID-19 website, where data is structured in a way that makes it cumbersome to work with and difficult to analyze.

If Pennsylvania had made its source data easily accessible, it could have helped quash concerns last month, when the state said its total count of COVID-19 tests included negative antibody tests, then backtracked on the statement a day later.

The situation raised red flags among epidemiologists, as antibody tests show past infections, not current ones, and, if included, would distort the state’s capacity to detect infections in real time.

But as it stands, Pennsylvania is touting total testing numbers impossible for the public to vet. County-level data currently shows only the number of people receiving COVID-19 tests, without disclosing how many times those people are tested.

Those numbers — exactly how many people are being tested more than once — are “reported internally,” Wardle, the spokesperson, said.

Data visualization by Daniel Simmons-Ritchie of Spotlight PA.
