An American scientist has incited a new skirmish over the origin of the coronavirus, reporting that he has retrieved potentially significant genetic data about SARS-CoV-2 that had been stored and later deleted from a digital archive at the National Institutes of Health.
Jesse Bloom, a computational biologist at the Fred Hutchinson Cancer Research Center in Seattle, posted his findings on the preprint server bioRxiv, where papers that have not yet been peer-reviewed or published in a journal have been landing by the thousands since the start of the pandemic.
The scientific significance of Bloom’s research remained unclear Wednesday, but it stirred instant online reaction, favorable and unfavorable alike, among scientists who have been debating the flurry of theories about the initial coronavirus outbreak.
Bloom, who retrieved the data through Google Cloud, does not claim that it advances one theory or another, but he contends it bolsters evidence that the virus was circulating in Wuhan, China, before the December 2019 outbreak of COVID-19, the illness caused by the virus, that was linked to a market selling live animals.
What is not in dispute is that the data was deleted from a database at NIH. The data was included in a preprint paper posted in March 2020 and published that June in the journal Small.
NIH released a statement Wednesday saying that a researcher who originally published the genetic sequences asked for them to be removed from the NIH database so that they could be included in a different database. NIH said it is standard practice to remove data when requested to do so. The NIH statement did not identify the scientist who requested that the material be excised from the agency’s Sequence Read Archive, known as the SRA.
“These SARS-CoV-2 sequences were submitted for posting in SRA in March 2020 and subsequently requested to be withdrawn by the submitting investigator in June 2020. The requester indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues,” NIH said.
The statement said NIH “can’t speculate on motive beyond a submitter’s stated intentions.”
Bloom said in an email to The Washington Post that he was not accusing NIH of wrongdoing. But Bloom’s online paper suggests the deletion of data violates scientific norms and the code of trust essential to science. On Twitter, Bloom said the data was also taken down from a Chinese database.
“[T]he current study suggests that at least in one case, the trusting structures of science have been abused to obscure sequences relevant to the early spread of SARS-CoV-2 in Wuhan,” Bloom wrote.
Efforts by The Post to reach the senior author of the sequencing paper have been unsuccessful.
Robert Garry, a Tulane University virologist who co-authored an influential March 2020 paper saying SARS-CoV-2 was a natural and not engineered virus, took issue with the new Bloom paper.
“Jesse Bloom found exactly nothing new that is not already part of the scientific literature,” Garry wrote in an email. He called the Bloom paper “inflammatory.”
Bloom is no stranger to the virus-origins debate. He was the lead author of a letter to the journal Science, signed by 17 other prominent scientists, that last month criticized a World Health Organization probe into the origins of the virus. The letter called for a deeper investigation of the “lab leak” hypothesis, which holds that the coronavirus, accidentally or by design, may have slipped out of a laboratory in Wuhan.
Stanford University microbiologist David Relman, another organizer of that letter, said of Bloom’s findings: “It shows how critical it is that early data be sought, preserved, and shared in trying to infer virus evolutionary paths and origins, since early data are always sparse to begin with, and since analyses are therefore so sensitive to specific data that happen to be available.”
In his paper, Bloom does not claim that the data he retrieved advances the argument for a lab leak or a natural zoonosis.
“This study provides no evidence either way,” Bloom said in an email. “But it does indicate that we probably have not exhausted all relevant data.”
He added, “I think as scientists we really need to focus on the following two questions: How can we get more data? How can we better analyze the data we have?”
Bloom said the deleted sequences he recovered reinforce a notion supported by previous analyses, including a conclusion from the WHO-convened investigation into the virus’ origins conducted earlier this year: The virus probably infected people before the outbreak at the Huanan Seafood Market in December 2019. That spreading event, though large, was not necessarily the first instance of SARS-CoV-2 in humans.
W. Ian Lipkin, a Columbia University epidemiologist, said by email that Bloom’s paper offers “evidence of what many of us speculated — that the virus was circulating before the market outbreak. The retraction of sequence data is unprecedented and must be addressed.”
University of California, San Diego evolutionary biologist Joel Wertheim, who has studied the emergence of the virus in Hubei province, said, “I actually don’t think this study adds much to the origins debate.”
The sequences Bloom analyzed are more similar to related coronaviruses found in bats than is the virus that infected many people at the seafood market. But researchers were already aware of two genetic lineages of the coronavirus that spread in Wuhan in January and February 2020, Wertheim said, and “these genome fragments further demonstrate this point.”
Speculation emerged on Twitter on Wednesday that Bloom’s findings could alter the timeline of the virus emergence, but Wertheim said that’s doubtful: “I’m not convinced that this paper makes a strong case for altering our molecular clock estimates, since similar — more complete — data were included in previous studies.”
President Joe Biden has ordered the intelligence agencies to conduct a review of information that could shed light on the origins of the virus. In an interview with Yahoo News published Tuesday, Director of National Intelligence Avril Haines said the ultimate answer might never be found.
“We’re hoping to find a smoking gun,” Haines said, but “it’s challenging to do that,” adding that “it might happen, but it might not.”
Haines said that teams were seeking to collect new intelligence, in addition to taking a fresh look at information that was already gathered.
— — —
The Washington Post’s Shane Harris contributed to this report.