Genomics of COVID-19

April 3, 2020 By FM

Inside a lipid membrane covered in protein spikes, the RNA genome of SARS-CoV-2 sits coiled and waiting. Once the virus enters a host cell, this RNA takes over the cell's machinery to replicate, and the cell sheds new copies of the virus, often until it eventually dies. At about 30,000 nucleotides (bases) in length, the SARS-CoV-2 genome is roughly 2–3 times the size of a typical flu virus genome, and longer than those of other pandemic viruses such as H1N1, the influenza subtype that caused the Spanish flu. Scientists are only beginning to understand how SARS-CoV-2 arose and whether it is still changing. Continued monitoring of the viral genome will be necessary to develop vaccines and find effective treatments.

SARS-CoV-2: A chimera of two viruses? 

Since research teams from the Chinese Center for Disease Control and Prevention and the Chinese Academy of Medical Sciences published the first SARS-CoV-2 genome sequences at the start of the outbreak, scientists have been trying to trace how the virus came to infect humans. A Lancet publication comparing virus genomes from patients in Wuhan with a coronavirus found in bats reported 96% similarity between the sequences. However, bats were not sold at the market linked to the outbreak, so researchers suspected that another animal had acted as an intermediate host and passed the virus on to humans. Then, in February, researchers from South China Agricultural University in Guangzhou announced at a press conference, reported by Nature, that they had found 99% similarity between a virus infecting pangolins and SARS-CoV-2 in humans. Pangolins, scaly ant-eating mammals sold as exotic meat and prized in traditional Chinese medicine for their armored scales, became a prime candidate for the intermediate host. But later that month, Nature had to clarify that the 99% figure applied not to the entire genomes of the two viruses but only to one region, the receptor-binding domain. The similarity between the full genomes of the pangolin virus and human SARS-CoV-2 was only 85.5% to 92.4%. This is lower than the 96% similarity between the bat coronavirus and human SARS-CoV-2, adding to the confusion about the animal origin of the virus.
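The gap between the 99% figure and the 85.5% to 92.4% figures comes down to which stretch of sequence is compared. The minimal Python sketch below computes percent identity over a whole alignment and over a sub-region. It assumes the two sequences have already been aligned to equal length, and the sequences and region coordinates are made-up toy values, not real viral data.

```python
from typing import Optional


def percent_identity(seq_a: str, seq_b: str, start: int = 0, end: Optional[int] = None) -> float:
    """Return the percentage of positions in [start, end) where two aligned sequences match.

    Assumes seq_a and seq_b are already aligned (same length, gaps included);
    real analyses first align the genomes with a dedicated alignment tool.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    end = len(seq_a) if end is None else end
    window_a, window_b = seq_a[start:end], seq_b[start:end]
    if not window_a:
        raise ValueError("empty region")
    matches = sum(a == b for a, b in zip(window_a, window_b))
    return 100.0 * matches / len(window_a)


# Hypothetical toy sequences: positions 5-14 stand in for a "binding domain"
# that is identical in both, while the flanking positions differ.
HUMAN_TOY = "AAAAA" + "CCCCCGGGGG" + "TTTTT"
PANGOLIN_TOY = "AATTA" + "CCCCCGGGGG" + "TGGTT"

print(percent_identity(HUMAN_TOY, PANGOLIN_TOY))         # 80.0 (whole sequence)
print(percent_identity(HUMAN_TOY, PANGOLIN_TOY, 5, 15))  # 100.0 (region only)
```

A short region can be identical between two viruses while their full genomes differ considerably, which is the distinction Nature had to clarify.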

But a possible explanation for this lack of clarity about the virus's evolution has now emerged. SARS-CoV-2 may actually be a combination of two coronaviruses, one carried by bats and a second by pangolins, that merged to create a chimera. The receptor-binding domain is the part of the spike protein that attaches to angiotensin-converting enzyme 2 (ACE2), the receptor the virus uses to enter human cells. This domain is 99% similar between the pangolin virus and human SARS-CoV-2, which may enable the pangolin version to invade human cells, unlike the bat coronavirus. Because the overall genome of SARS-CoV-2 is closer to the bat virus, a recombination event may have occurred in an animal infected with both the pangolin and bat viruses before the virus spread to humans. In this scenario, SARS-CoV-2 is mostly similar to the bat coronavirus but acquired its receptor-binding domain from the pangolin virus, which allowed it to make the jump to infecting humans.
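One common way to look for the kind of recombination described above is a sliding-window similarity scan: move a window along an alignment and record which candidate parent the query sequence resembles most in each window. The Python sketch below is a simplified illustration of that idea, not the method used in any particular study. It assumes pre-aligned sequences, and the three toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of positions where two equal-length aligned strings match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_parent_by_window(
    query: str, parents: Dict[str, str], window: int, step: int
) -> List[Tuple[int, str]]:
    """For each window start, name the parent most similar to the query ("tie" if several)."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        scores = {name: identity(segment, seq[start:start + window]) for name, seq in parents.items()}
        best = max(scores.values())
        leaders = [name for name, score in scores.items() if score == best]
        results.append((start, leaders[0] if len(leaders) == 1 else "tie"))
    return results


# Hypothetical aligned toy sequences: the query matches "bat" at both ends
# and "pangolin" in the middle window.
PARENTS = {
    "bat": "AAAA" + "CCCC" + "GGGG",
    "pangolin": "TTTT" + "GGGG" + "CCCC",
}
QUERY = "AAAA" + "GGGG" + "GGGG"

print(closest_parent_by_window(QUERY, PARENTS, window=4, step=4))
# [(0, 'bat'), (4, 'pangolin'), (8, 'bat')]
for name, seq in PARENTS.items():
    print(name, round(identity(QUERY, seq), 2))  # bat 0.67, then pangolin 0.33
```

In this toy case the query is closer to the "bat" sequence overall yet matches the "pangolin" sequence in one region, the pattern a recombinant with a pangolin-derived binding domain would produce. Real analyses use full genomes, many windows and statistical tests before concluding that recombination occurred.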

S-type and L-type SARS-CoV-2 strains

Ongoing tracking of SARS-CoV-2 genomics is available on Nextstrain, a website that visually maps virus mutations based on sequences aggregated by the Global Initiative on Sharing All Influenza Data (GISAID). The Nextstrain team has found that the virus is accumulating about 24 mutations per year across its genome, a lower rate per nucleotide than seasonal flu. Nextstrain builds phylogenetic trees that show how virus strains change across regions. These trees can help reveal how a local outbreak began and can even show whether the virus has been spreading in a population undetected.
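The comparison with flu is made per nucleotide, so the yearly count has to be divided by genome length. The short Python calculation below uses the article's rounded figures (24 mutations per year, about 30,000 nucleotides). It only illustrates the arithmetic and is not the statistical method Nextstrain uses to estimate the rate; the article gives no flu rate, so none is assumed here.

```python
def per_site_rate(mutations_per_year: float, genome_length: int) -> float:
    """Convert a genome-wide mutation count per year into a per-nucleotide rate."""
    return mutations_per_year / genome_length


MUTATIONS_PER_YEAR = 24  # figure quoted in the article
GENOME_LENGTH = 30_000   # approximate genome length quoted in the article

rate = per_site_rate(MUTATIONS_PER_YEAR, GENOME_LENGTH)
per_month = MUTATIONS_PER_YEAR / 12

print(f"{rate:.1e} mutations per site per year")                      # 8.0e-04 mutations per site per year
print(f"about {per_month:.0f} mutations per month across the genome")  # about 2 mutations per month across the genome
```

The same division, applied to a sourced flu rate and flu genome length, gives the like-for-like comparison the article describes.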

Earlier in March, there was concern that one strain of SARS-CoV-2, called the L-type, was more aggressive than the S-type, based on an analysis showing that the L-type accounted for 70% of cases in a sample of 103 genomes collected in Singapore. However, a strain being more common is not sufficient evidence that it is more virulent. For that, the differences between the RNA sequences would need to change the proteins the virus makes. So far, this change in the S-type has been seen only in Singapore, and it appears to eliminate production of the ORF8 protein, whose function is unknown. The effect of this protein on the severity of the disease is unclear. As a possible bright spot, research by scientists in Germany indicated that the lack of the ORF8 protein actually weakened the growth of SARS in cells, and its loss could have the same effect on SARS-CoV-2.
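Whether a change in the RNA matters for a protein depends on what it does to a codon, the three-letter unit that specifies one amino acid. The Python sketch below classifies a single-base change as synonymous (same amino acid), missense (different amino acid) or nonsense (a premature stop codon, which is one way a mutation can stop a protein from being made). It uses the standard genetic code, written with T instead of U as sequence databases usually do; the example codons are hypothetical and are not the actual SARS-CoV-2 positions discussed above.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position
# (T stands in for U, as in most sequence databases; "*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Describe the effect of replacing the base at `position` (0-2) in `codon`."""
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return f"synonymous ({before} -> {after})"
    if after == "*":
        return f"nonsense ({before} -> stop)"
    return f"missense ({before} -> {after})"


# Hypothetical examples starting from the leucine codon TTA:
print(classify_point_mutation("TTA", 2, "G"))  # synonymous (L -> L)
print(classify_point_mutation("TTA", 1, "C"))  # missense (L -> S)
print(classify_point_mutation("TTA", 1, "A"))  # nonsense (L -> stop)
```

Only the missense and nonsense cases change the protein, and even then laboratory work is needed to show whether the change affects how severe the disease is.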

Another recent study, from the International Centre for Genetic Engineering and Biotechnology in New Delhi, identified a unique mutation in a strain of the virus found in India. The paper, which has yet to be peer-reviewed, used a computational approach to predict that this mutation would reduce the stability of the viral protein it affects. However, this prediction has not been tested in cell culture, so it cannot show conclusively that SARS-CoV-2 is growing weaker. Going forward, these divergences will need to be monitored in case further changes affect the development of vaccines or other treatments.

Potential for vaccines and treatment

The crown of proteins coating the virus explains its classification as a “coronavirus”. These spike proteins bind to receptors on the surface of the host cell, allowing the viral RNA to enter. The spike protein's gene and its structure had both been identified by February, offering a route to developing a vaccine. Producing copies of the spike protein without any viral RNA could be a way to teach the immune system to recognize the virus without causing an infection.

In terms of treatment, the similarity between SARS-CoV-2 and other human coronaviruses forms the basis for predicting which drugs will be effective. Based on how SARS and MERS responded to antivirals, scientists from the Wuhan Institute of Virology tested ribavirin, penciclovir, nitazoxanide, nafamostat and chloroquine on cells infected with SARS-CoV-2, along with remdesivir, which had been used against Ebola, and favipiravir, a Japanese flu medication. Of the compounds tested, chloroquine and remdesivir appeared the most promising candidates for clinical trials in patients.

Since chloroquine is already FDA-approved for other indications, clinical trials should move faster, provided researchers can obtain enough of the drug to conduct them. At the University of Minnesota, a small clinical trial of hydroxychloroquine, which has shown lower toxicity than chloroquine, is seeking to recruit healthcare workers, who are at higher risk of contracting the virus.

Tracking mutations 

As SARS-CoV-2 spreads from continent to continent, keeping track of its mutations will be an ongoing challenge. Most recently, the UK established a £20 million consortium, involving the NHS and university partners, to continue sequencing SARS-CoV-2 genomes. This adds to sequencing efforts already underway in Singapore at A*STAR’s Bioinformatics Institute. While the original SARS-CoV-2 sequences were produced by teams in China as part of initial efforts to tackle the disease, the problem is now global in scale and will require a response from multiple governments.




The author has a Master’s in Biotechnology from the University of Pennsylvania.

—Article first published on, republished with additional inputs from the author. For references and resources