The case for the lab leak theory grows stronger by the day.
The news that the US Department of Energy has changed its mind about the probability of a lab leak as the source of covid-19 on the basis of new evidence has sent shock waves through the scientific community.
It’s fitting that this news broke two days before the 70th anniversary of the discovery of the double-helix structure of DNA, by Jim Watson and Francis Crick, which effectively marked the moment that humanity began to crack nature’s Enigma code and read its messages. More new evidence has been trickling out in the past few months, much of it from inside the codes of virus genomes. Without the ability to decode virus messages, we would still be much more in the dark about what happened in Wuhan in late 2019.
In every living creature, DNA’s messages, spelling out the recipe for making and running the organism, are written in a simple four-letter cipher, A, C, G and T. (Coronavirus messages are written in the almost identical language of RNA, but virologists use the DNA equivalent letters to avoid confusion.) And here’s a short burst of that text that is right at the heart of the evidence for a possible lab leak: cct cgg cgg gca. That code is the recipe for four amino acids in a particular region of the spike protein of the virus: proline, arginine, arginine, alanine, or PRRA.
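For readers who want to check the decoding themselves, here is a minimal sketch in Python. The lookup table covers only the three distinct codons in this snippet, with their assignments from the standard genetic code; the names `CODON_TO_AMINO_ACID` and `translate` are invented for this illustration and are not taken from any particular bioinformatics tool.

```python
# Standard genetic code entries for the three distinct codons in the snippet,
# written in the DNA alphabet as in the text. Each codon maps to
# (amino acid name, one-letter abbreviation).
CODON_TO_AMINO_ACID = {
    "cct": ("proline", "P"),
    "cgg": ("arginine", "R"),
    "gca": ("alanine", "A"),
}


def translate(dna_text):
    """Translate space-separated codons into (name, letter) pairs."""
    return [CODON_TO_AMINO_ACID[codon] for codon in dna_text.split()]


insert = "cct cgg cgg gca"
decoded = translate(insert)
names = [name for name, _ in decoded]
letters = "".join(letter for _, letter in decoded)

print(names)    # ['proline', 'arginine', 'arginine', 'alanine']
print(letters)  # PRRA
```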
It turns out that this message is unique to SARS-CoV-2. That is to say, if you look at every other sarbecovirus (SARS-like beta coronavirus) ever discovered – and there are hundreds of them – they all lack this little message in this place. You can line them up and show how the text matches almost perfectly up to that point and after that point, but the 12-letter text has been inserted into just the SARS-CoV-2 genome and into none of the others.
It’s not just any message. It transforms the virus’s ability to infect human cells and is the reason we’ve had a pandemic and not a minor outbreak. That is because a concentration of arginines in this part of the virus spike attracts the attention of a human enzyme, which cuts the spike protein at this point so that it opens up like a flower, priming the virus to infect other cells.
Such “furin cleavage sites” are found in other coronaviruses but never in sarbecoviruses. For reasons we don’t fully understand it is a disadvantage to a sarbecovirus to have such a feature when in its natural habitat, the gut of a horseshoe bat. When the pandemic began various scientists knew that, and they knew something else as well. They knew that virologists had been putting furin cleavage sites into coronaviruses in the lab, sometimes in harmless pseudoviruses but sometimes in real live viruses, sometimes in Wuhan as well as elsewhere, in order to study the effect they had on infectivity.
So when they first saw the genome of the SARS-CoV-2 virus at the end of January 2020, with its furin cleavage site, alarm bells rang and a handful of virologists arranged a secret video conference on 1st February to discuss it. Even more alarming was the fact that the paper from the Wuhan Institute of Virology describing the virus genome did not even mention this feature and truncated a diagram just short of where it would show up. That, says my co-author Alina Chan, is like describing a unicorn without mentioning the horn.
But the same scientists who attended that call then drafted a paper in which they dismissed the furin cleavage site as irrelevant by saying that a natural horseshoe bat sarbecovirus would soon be found with a furin cleavage site in it. Well, they were wrong about that. Three years later lots more sarbecoviruses have come to light in bats, and a couple in pangolins, and none has a furin cleavage site. Not one.
OK, they now say, maybe it got there through mutation. Trouble is, a mutation usually changes, adds or subtracts one letter in the code at a time, not 12. Maybe it got there by recombination: two different viruses infect one bat and their genes get muddled up. But it’s pretty unlikely this would happen in just the right spot to create a furin cleavage site and where would it get the text from if other sarbecoviruses don’t have it? Besides, if this part of the genome was a hot spot for recombination, then during the pandemic we would have seen a lot of recombination at this site in human cases, and we have not.
Let me try an analogy. In 1942 Enigma decoders are picking up messages from a hundred different U-boats reporting similar weather in the mid-Atlantic. Each is very slightly different but follows the same format, for example: “partly cloudy, pressure high, wind force five, temperature about 8 degrees”. One of the submarines sends a slightly different message: “sun shining, pressure high, great to have the Fuhrer on board, wind force four, temperature about 7 degrees”. Yet when the code breakers report these messages to their superiors, they omit to mention the bit about the Fuhrer.
There is a further peculiarity. In the genetic code used by all organisms, there are six different ways of spelling out the code for “arginine”: cgt, cgc, cgg, cga, aga, agg. Human bodies use cgg for arginine fairly often. Viruses rarely do: only 5% of the arginine codons in SARS-CoV-2 are cgg. So what are the chances of two cggs turning up next to each other in a brand new chunk of inserted text in a sarbecovirus? Low. But virologists playing with furin cleavage sites have been known to do what is called “codon optimisation” and use human-friendly versions of text.
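To put a rough number on "low", here is a back-of-envelope sketch in Python. It rests on two simplifying assumptions that are mine rather than established fact: that each arginine codon in a natural insertion would be drawn independently, and that it would be drawn at the 5% background rate quoted above. Real viral sequences need not behave this way, so treat the result as an illustration of the arithmetic, not a measured probability.

```python
# The six codons that spell arginine in the standard genetic code.
ARGININE_CODONS = ["cgt", "cgc", "cgg", "cga", "aga", "agg"]

# Share of SARS-CoV-2 arginine codons that are cgg, as quoted in the text.
CGG_SHARE = 0.05

# ASSUMPTION: the two adjacent arginine codons are chosen independently,
# each at the background rate above.
p_two_cgg = CGG_SHARE ** 2

print(f"Chance that both arginines are cgg: {p_two_cgg:.4f}")
print(f"Roughly 1 in {round(1 / p_two_cgg)}")
```

Under those assumptions the answer is 0.0025, or about 1 in 400. That figure covers only the choice of codon given that two arginines are present; it says nothing about how likely the insertion itself is.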
Last week, a paper appeared from six experts in bioinformatics and virology that has further stirred up some bats in the molecular belfry. It’s a fine example of Bletchley-like detective work and it may or may not shed light on what happened in Wuhan in 2019. The title of the paper, which has not yet been peer reviewed, reads: “Discovery of a novel merbecovirus DNA clone contaminating agricultural rice sequencing datasets from Wuhan, China”.
When scientists use gene sequencing machines, they sometimes fail to clean them properly. Then when they dump their data into public databases, contaminant sequences from previous use of the same machines show up. In this case some scientists at the Huazhong Agricultural University in Wuhan were sequencing rice genes in January 2020 but their data included fragments of DNA from one or more bat coronaviruses. These had probably been left there after previous use of the same machines for a different project.
Yuri Deigin and his colleagues were able to piece together a whole coronavirus genome from these fragments and it proved to be a novel virus, closely related to HKU4, a bat virus that is related to the one that causes the lethal disease called MERS, which caused a brief but frightening outbreak in 2012 in Saudi Arabia. These “merbecoviruses” are different from sarbecoviruses so this experiment could not itself have caused the pandemic, but it exposes some intriguing facts.
What was unusual about this new virus was that it was in the form of a “bacterial artificial chromosome” – that is to say, it had been put together in a lab in such a way as to be grown in a bacterium. Indeed, when they searched for evidence of bat genes in the samples they found none: this virus may have originated in a bat, but the sequence itself was put together in the lab in bacteria and in DNA form, not RNA form. Still more suspiciously, they found evidence that the spike gene of MERS itself had been inserted into this virus genome in some cases.
In other words, somebody had been doing an experiment with a brand new, unpublished, virus from bats in which they inserted the dangerous spike gene from a MERS virus. One of the arguments used by those defending the Wuhan laboratories has been that they would have published the genomes of any viruses they were working on. Well, just across town we now know that one of their partners was working on a novel, unpublished bat virus and inserting a MERS spike into it.
The Huazhong Agricultural University had already done an experiment with a pig coronavirus that involved inserting a furin cleavage site into a coronavirus. And Shi Zhengli at the Wuhan Institute of Virology – the person running the biggest bat sarbecovirus lab in the world – had in 2015 published a paper in which she and colleagues in Minnesota had engineered a new furin cleavage site into the HKU4 bat virus to enable it to enter human cells. Now here’s a neighbour and partner of the WIV engineering a MERS spike (which has a furin cleavage site anyway) into another HKU4-like virus in 2020.
Shi Zhengli insists that she never did a similar experiment on sarbecoviruses, though it was the obvious next thing to do. Which is surprising because in September 2021 a document was leaked in the US that strongly implied otherwise. It was a grant application, called the Defuse proposal, written in 2018, to which Dr Shi was a signatory, from her closest US funding partner, the EcoHealth Alliance, to the US Department of Defense. Between them they planned to add furin cleavage sites to novel sarbecoviruses in the coming years. It did not get funded, but there’s every chance the work went ahead anyway with the much larger funding stream coming from the Chinese Academy of Sciences.
Amazingly, the EcoHealth Alliance had never bothered to tell us about this document, one of many shocking omissions and obfuscations by them. The document is effectively a description of a plan to engineer a SARS-CoV-2-like virus for the first time.
To summarise. A bat coronavirus pandemic began in the city with the biggest bat coronavirus lab in the world, a long way from where those viruses are found naturally. It was caused by the first and so far only sarbecovirus with a furin cleavage site in it, a feature that had been inserted into other coronaviruses nearby, and that had been planned to be inserted into a sarbecovirus for the first time. And the lab in question has refused repeatedly to publish a list of all the viruses it possesses. Oh, and the other possible cause of the pandemic – an infected animal in a market – has still never shown up.