Could SARS-CoV-2 have escaped from a laboratory?
There are precedents for laboratory incidents leading to isolated infections and transient transmission chains, including SARS-CoV [22]. Aside from the 1977 A/H1N1 influenza pandemic that likely originated from a large-scale vaccine challenge trial [23], there are no documented examples of human epidemics or pandemics resulting from research activity.
No previous epidemic has been caused by the escape of a novel virus, and there are no data to suggest that the WIV, or any other laboratory, was working on SARS-CoV-2, or on any virus close enough to be its progenitor, prior to the COVID-19 pandemic. Viral genomic sequencing without cell culture, which was routinely performed at the WIV, represents a negligible risk because viruses are inactivated during RNA extraction [28], and no case of laboratory escape has been documented following the sequencing of viral samples.
Epidemiological modeling suggests that the number of hypothetical cases needed to result in multiple hospitalized COVID-19 patients prior to December 2019 is incompatible with observed clinical, genomic, and epidemiological data [20].
Gain-of-function research would be expected to use an established SARSr-CoV genomic backbone, or at a minimum a virus previously identified via sequencing. However, past experimental research using recombinant coronaviruses at the WIV has used a genetic backbone (WIV1) unrelated to SARS-CoV-2 [32], and SARS-CoV-2 carries none of the genetic markers one might expect from laboratory experiments [40].
There is no rational experimental reason why a new genetic system would be developed using an unknown and unpublished virus. There is no evidence or mention of a SARS-CoV-2-like virus in any prior publication or study from the WIV [32,41,42], no evidence that the WIV sequenced a virus closer to SARS-CoV-2 than RaTG13, and no reason to hide research on a SARS-CoV-2-like virus prior to the COVID-19 pandemic.
Under any laboratory escape scenario, SARS-CoV-2 would have to have been present in a laboratory prior to the pandemic, yet no evidence supports such a notion and no sequence has been identified that could have served as a precursor.
A specific laboratory escape scenario involves accidental infection in the course of serial passage of a SARSr-CoV in common laboratory animals such as mice. However, early SARS-CoV-2 isolates were unable to infect wild-type mice [43]. While murine models are useful for studying infection in vivo and testing vaccines, they often result in mild or atypical disease [44–48]. These findings are inconsistent with a virus selected for increased pathogenicity and transmissibility through serial passage in rodents.
Although SARS-CoV-2 has since been engineered [49] and adapted by serial passage [50–52], specific mutations in the spike protein, including N501Y, are necessary for such adaptation in mice [51,52]. Notably, N501Y has arisen convergently in multiple SARS-CoV-2 variants of concern in the human population, presumably because it increases ACE2 binding affinity [53–56]. If SARS-CoV-2 had resulted from attempts to adapt a SARSr-CoV for study in animal models, it would likely have acquired mutations like N501Y for efficient replication in that model, yet there is no evidence that such mutations existed early in the pandemic. Both the low pathogenicity in commonly used laboratory animals and the absence of genomic markers associated with rodent adaptation indicate that SARS-CoV-2 is highly unlikely to have been acquired by laboratory workers in the course of viral pathogenesis or gain-of-function experiments.
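The argument above rests on asking whether a given spike substitution, such as N501Y, is present in a sample of sequences. The following Python sketch shows that check under simplifying assumptions: protein sequences are ungapped and numbered in the same coordinates as the reference spike, and the helper names (`parse_substitution`, `carries_substitution`, `fraction_carrying`) are hypothetical rather than taken from any published pipeline.

```python
import re

# Single-substitution shorthand such as "N501Y": reference residue,
# 1-based position in the protein, alternative residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split "N501Y" into ("N", 501, "Y")."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"unrecognised substitution: {code!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def carries_substitution(protein: str, code: str) -> bool:
    """True if `protein` has the alternative residue at the stated position.

    Assumes `protein` is ungapped and numbered like the reference spike.
    """
    _ref, pos, alt = parse_substitution(code)
    if pos > len(protein):
        return False
    return protein[pos - 1] == alt


def fraction_carrying(proteins: list[str], code: str) -> float:
    """Fraction of sequences in a sample (e.g. early genomes) carrying `code`."""
    if not proteins:
        return 0.0
    hits = sum(carries_substitution(p, code) for p in proteins)
    return hits / len(proteins)


if __name__ == "__main__":
    # Toy sequences only: 500 placeholder residues, then residue 501.
    ancestral = "A" * 500 + "N" + "A" * 10
    mutant = "A" * 500 + "Y" + "A" * 10
    print(fraction_carrying([ancestral, ancestral, mutant, ancestral], "N501Y"))  # 0.25
```

Real surveillance analyses work from multiple sequence alignments and must handle gaps, ambiguous residues, and sequencing errors, all of which this sketch ignores.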
Evidence from genomic structure and ongoing evolution of SARS-CoV-2
Considerable attention has been devoted to claims that SARS-CoV-2 was genetically engineered or adapted in cell culture or “humanized” animal models to promote human transmission [57]. Yet, since its emergence, SARS-CoV-2 has experienced repeated sweeps of mutations that have increased viral fitness [58,59]. The first clear adaptive mutation, the D614G substitution in the spike protein, occurred early in the pandemic [60,61]. Recurring mutations in the receptor binding domain of the spike protein, including N501Y, K417N/T, L452R, and E484K/Q (constituent mutations of the variants of concern), similarly enhance viral infectivity [54,55,62] and ACE2 binding [53,63], refuting claims that the SARS-CoV-2 spike protein was already optimized for binding to human ACE2 upon its emergence [56].
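The substitution shorthand used above packs alternatives into one label: K417N/T, for example, means that lysine 417 is replaced by either asparagine or threonine. As an illustration only, this Python sketch expands such labels and reports which alternatives appear in a query sequence; the function names are hypothetical, and it assumes ungapped sequences numbered like the reference spike.

```python
import re

# Compound shorthand such as "K417N/T": one reference residue, a 1-based
# position, and one or more alternative residues separated by "/".
_COMPOUND = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])*)$")


def expand(code: str) -> list[str]:
    """Expand "K417N/T" into ["K417N", "K417T"]; "N501Y" stays as-is."""
    match = _COMPOUND.match(code)
    if match is None:
        raise ValueError(f"unrecognised notation: {code!r}")
    ref, pos, alts = match.groups()
    return [f"{ref}{pos}{alt}" for alt in alts.split("/")]


def observed(protein: str, codes: list[str]) -> list[str]:
    """List the expanded substitutions whose alternative residue is present.

    Assumes `protein` is ungapped and numbered like the reference spike.
    """
    hits = []
    for code in codes:
        for single in expand(code):
            pos, alt = int(single[1:-1]), single[-1]
            if pos <= len(protein) and protein[pos - 1] == alt:
                hits.append(single)
    return hits


if __name__ == "__main__":
    rbd_markers = ["N501Y", "K417N/T", "L452R", "E484K/Q"]
    # ['N501Y', 'K417N', 'K417T', 'L452R', 'E484K', 'E484Q']
    print([s for code in rbd_markers for s in expand(code)])
```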
Further, some pangolin-derived coronaviruses have receptor binding domains that are near-identical to SARS-CoV-2 at the amino acid level [40,64] and bind to human ACE2 even more strongly than SARS-CoV-2, showing that there is capacity for further human adaptation [65]. SARS-CoV-2 is also notable for being a host-generalist virus [66], capable of efficient transmission in multiple mammalian species, including mink, tigers, cats, gorillas, dogs, raccoon dogs, and ferrets; large outbreaks have been documented in mink, with spill-back to humans [67] and to other animals [68]. Combined, these findings show that no specific human “pre-adaptation” was required for the emergence or early spread of SARS-CoV-2, and the claim that the virus was already highly adapted to the human host [57], or somehow optimized for binding to human ACE2, is without validity.
The genesis of the polybasic (furin) cleavage site in the spike protein of SARS-CoV-2 has been the subject of recurrent speculation.
Although the furin cleavage site is absent from the closest known relatives of SARS-CoV-2 [40], this is unsurprising as the lineage leading to this virus is poorly sampled and the closest bat viruses have divergent spike proteins due to recombination [15,16,18]. Furin cleavage sites are commonplace in other coronavirus spike proteins, including some feline alphacoronaviruses, MERS-CoV, most but not all strains of mouse hepatitis virus, as well as in endemic human betacoronaviruses such as HCoV-OC43 and HCoV-HKU1 [69–71]. A near-identical nucleotide sequence is found in the spike gene of the bat coronavirus HKU9-1 [72], and both SARS-CoV-2 and HKU9-1 contain short palindromic sequences immediately upstream of this sequence that are indicative of natural recombination break-points via template switching [72].
Hence, simple evolutionary mechanisms can readily explain the evolution of an out-of-frame insertion of a furin cleavage site in SARS-CoV-2 (Fig. 2).
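To illustrate what “palindromic” can mean at the nucleotide level, the following Python sketch lists windows of a DNA sequence that read the same as their own reverse complement. This reverse-complement sense of the term is an assumption: the cited analysis may use a different definition (for example, allowing mismatches or a spacer), and the function names and default window sizes here are hypothetical.

```python
# Complement table for the four unambiguous DNA bases; other characters
# (e.g. N) are left unchanged, so inputs are assumed to contain only A/C/G/T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def palindromes(seq: str, min_len: int = 6, max_len: int = 12) -> list[tuple[int, str]]:
    """Return (0-based start, window) for each even-length window between
    min_len and max_len that equals its own reverse complement."""
    seq = seq.upper()
    hits = []
    for k in range(min_len, max_len + 1):
        if k % 2:
            # No A/C/G/T base is its own complement, so odd lengths cannot match.
            continue
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if window == reverse_complement(window):
                hits.append((i, window))
    return hits


if __name__ == "__main__":
    # Toy input; GAATTC is a textbook reverse-complement palindrome.
    # [(2, 'GAATTC'), (1, 'TGAATTCA'), (0, 'TTGAATTCAA')]
    print(palindromes("ttgaattcaa", min_len=6, max_len=10))
```

Nested palindromes are reported at every length that fits, which is why the toy input yields three overlapping hits.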
The SARS-CoV-2 furin cleavage site (containing the amino acid motif RRAR) does not match its canonical form (R-X-R/K-R), is suboptimal compared to those of HCoV-HKU1 and HCoV-OC43, lacks either a P1 or P2 arginine (depending on the alignment), and was caused by an out-of-frame insertion (Fig. 2).
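The motif comparison above can be made concrete with a regular expression. This Python sketch encodes the canonical R-X-R/K-R pattern quoted in the text, together with the looser R-X-X-R form for contrast; the pattern names and the function are illustrative assumptions, and real furin cleavage efficiency also depends on flanking residues that are not modeled here.

```python
import re

# Motif patterns written P4-P3-P2-P1, with "." for any residue.
# CANONICAL follows the R-X-R/K-R form quoted in the text; MINIMAL is the
# looser R-X-X-R form, included only for contrast.
CANONICAL = re.compile(r"R.[RK]R")
MINIMAL = re.compile(r"R..R")


def furin_motif_matches(site: str) -> dict[str, bool]:
    """Report whether an amino-acid string contains each motif form anywhere."""
    site = site.upper()
    return {
        "canonical": CANONICAL.search(site) is not None,
        "minimal": MINIMAL.search(site) is not None,
    }


if __name__ == "__main__":
    print(furin_motif_matches("RRAR"))  # {'canonical': False, 'minimal': True}
    # RRKR is a hypothetical comparison string that fits the canonical form.
    print(furin_motif_matches("RRKR"))  # {'canonical': True, 'minimal': True}
```

Under these assumptions, RRAR fails the canonical pattern because its P2 residue is alanine rather than arginine or lysine, and satisfies only the looser form.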
The RRAR and RRSR S1/S2 cleavage sites in feline coronaviruses (FCoV) and cell-culture-adapted HCoV-OC43, respectively, are not cleaved by furin [69]. There is no logical reason why an engineered virus would use such a poor furin cleavage site, which would entail an unusual and needlessly complex feat of genetic engineering. The only previous studies of artificial insertion of a furin cleavage site at the S1/S2 boundary in the SARS-CoV spike protein used an optimal ‘RRSRR’ sequence in pseudotype systems [73,74].
Further, there is no evidence of prior research at the WIV involving the artificial insertion of complete furin cleavage sites into coronaviruses.
The recurring P681H/R substitution of the proline (P) residue preceding the SARS-CoV-2 furin cleavage site improves cleavage of the spike protein and is another signature of ongoing human adaptation of the virus [75]. The SARS-CoV-2 furin site is also lost under standard cell culture conditions [34,76], as is true of HCoV-OC43 [73]. The presence of two CGG codons for arginines in the SARS-CoV-2 furin cleavage site is similarly not indicative of genetic engineering [77].
Although the CGG codon is rare in coronaviruses, it is observed in SARS-CoV, SARS-CoV-2, and other human coronaviruses at comparable frequencies [77]. Further, if low-fitness codons had been artificially inserted into the virus genome, they would have been quickly selected against during SARS-CoV-2 evolution, yet both CGG codons are more than 99.8% conserved among the >1,800,000 near-complete SARS-CoV-2 genomes sequenced to date, indicative of strong functional constraints (supplementary information, Table S1).
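The two quantities discussed above, the share of arginine codons that are CGG and the conservation of a codon position across many genomes, both reduce to counting. The Python sketch below illustrates them under stated assumptions: coding sequences are in frame, genomes are already aligned to common coordinates, and the function names are hypothetical. It does not reproduce the analysis behind Table S1.

```python
from collections import Counter

# Arginine codons under the standard genetic code.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def cgg_share_of_arginine(cds: str) -> float:
    """Fraction of arginine codons in an in-frame coding sequence that are CGG.

    Any trailing partial codon is ignored.
    """
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    counts = Counter(c for c in codons if c in ARG_CODONS)
    total = sum(counts.values())
    return counts["CGG"] / total if total else 0.0


def codon_conservation(genomes: list[str], start: int, codon: str = "CGG") -> float:
    """Fraction of aligned genomes whose three bases at 0-based `start` equal `codon`."""
    if not genomes:
        return 0.0
    matches = sum(g[start:start + 3].upper() == codon for g in genomes)
    return matches / len(genomes)


if __name__ == "__main__":
    # Toy data only: one of three arginine codons is CGG.
    print(cgg_share_of_arginine("CGGCGTAGAAAA"))
    # Three of four toy "genomes" keep CGG at position 2.
    print(codon_conservation(["AACGGT", "AACGGT", "AACGAT", "AACGGT"], 2))  # 0.75
```

A production analysis would also need to decide how to count genomes with gaps or ambiguous bases at the site, which changes the denominator.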
Conclusions
As for the vast majority of human viruses, the most parsimonious explanation for the origin of SARS-CoV-2 is a zoonotic event. The documented epidemiological history of the virus is comparable to previous animal market-associated outbreaks of coronaviruses with a simple route for human exposure. The contact tracing of SARS-CoV-2 to markets in Wuhan exhibits striking similarities to the early spread of SARS-CoV to markets in Guangdong, where humans infected early in the epidemic lived near or worked in animal markets. Zoonotic spillover by definition selects for viruses able to infect humans. The laboratory escapes documented to date have almost exclusively involved viruses brought into laboratories specifically because of their known human infectivity.
There is currently no evidence that SARS-CoV-2 has a laboratory origin. There is no evidence that any early cases had any connection to the WIV, in contrast to the clear epidemiological links to animal markets in Wuhan, nor evidence that the WIV possessed or worked on a progenitor of SARS-CoV-2 prior to the pandemic. The suspicion that SARS-CoV-2 might have a laboratory origin stems from the coincidence that it was first detected in a city that houses a major virological laboratory that studies coronaviruses. Wuhan is the largest city in central China, has multiple animal markets, and is a major hub for travel and commerce, well connected to other areas both within China and internationally. The link to Wuhan therefore more likely reflects the fact that pathogens often require heavily populated areas to become established [20].
We contend that there is a substantial body of scientific evidence supporting a zoonotic origin for SARS-CoV-2. While the possibility of a laboratory accident cannot be entirely dismissed, and may be near impossible to falsify, this conduit for emergence is highly unlikely relative to the numerous and repeated human-animal contacts that occur routinely in the wildlife trade.
Failure to comprehensively investigate the zoonotic origin through collaborative and carefully coordinated studies would leave the world vulnerable to future pandemics arising from the same human activities that have repeatedly put us on a collision course with novel viruses.