Sars-CoV-2 Covid-19 Part 3, Spike Protein Mutation

Mutation is random; natural selection is the very opposite of random
~ Richard Dawkins

Mutations merely furnish random raw material for evolution, and rarely, if ever determine the course of the process.
~ Sewall Wright

A specific mutation in the Sars-CoV-2 genome has been in the news lately. This preprint by Korber et al. describes a mutation, D614G, in the Sars-CoV-2 spike protein. The D614G notation describes a change from the aspartic acid (D) found in the reference sequence to a glycine (G) at position 614 of the protein's amino acid sequence. This is a non-synonymous substitution caused by a single nucleotide change from an A to a G at position 23,403 in the genome. The position refers to the location in the reference sequence, NC_045512. The preprint makes some strong claims, including that this is a new strain and that the mutation makes the virus more transmissible and perhaps more deadly. The paper was subject to some debate on scientific Twitter and blogs, with a fair amount of pushback on the claims.
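To make the notation concrete, here is a minimal sketch of how a genome position maps to a codon in the spike protein. It assumes the spike coding sequence starts at nucleotide 21,563 of NC_045512 (taken from the public annotation, not from the preprint), and the function names are purely illustrative. It shows that position 23,403 falls on the second base of codon 614, and that an A to G change at the second base turns either aspartic acid codon into a glycine codon.

```python
# Assumption: the spike CDS starts at nucleotide 21563 (1-based) in NC_045512.
SPIKE_CDS_START = 21563

# Only the codons needed for this illustration, not a full codon table.
CODON_TO_AA = {
    "GAT": "D", "GAC": "D",  # aspartic acid
    "GGT": "G", "GGC": "G",  # glycine
}


def genome_to_codon(pos, cds_start=SPIKE_CDS_START):
    """Return (1-based codon number, 0-based offset within that codon)."""
    offset = pos - cds_start
    if offset < 0:
        raise ValueError("position lies before the start of the coding sequence")
    return offset // 3 + 1, offset % 3


def substitute(codon, offset, new_base):
    """Return the codon with the base at `offset` replaced by `new_base`."""
    return codon[:offset] + new_base + codon[offset + 1:]


codon_number, codon_offset = genome_to_codon(23403)
print(f"codon {codon_number}, base {codon_offset + 1} of 3")

# Both aspartic acid codons become glycine codons after the A -> G change.
for ref_codon in ("GAT", "GAC"):
    alt_codon = substitute(ref_codon, codon_offset, "G")
    print(f"{ref_codon} ({CODON_TO_AA[ref_codon]}) -> "
          f"{alt_codon} ({CODON_TO_AA[alt_codon]})")
```

Because both aspartic acid codons have A as their second base, the result does not depend on which of the two codons the reference actually uses.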

Spike protein mutations are intrinsically interesting because the spike protein is directly involved in binding to the host cell surface and is the target of many vaccines and antibody therapies under development. As might be expected, media, individuals, governments, and everyone's sister are scouring the preprint servers for news and research about COVID-19. The L.A. Times reported on the Korber et al. paper in an article titled "Scientists say a now-dominant strain of the coronavirus could be more contagious than original." Notice the "could" in the headline. That mild qualifier was often lost as news of the article spread on Twitter and Facebook.

I'm suspicious of the claim that the mutation makes the virus more transmissible than the original pre-mutation version. The preprint bases this claim on the observation that the mutated version of the virus appeared in Europe at the end of February and spread quickly through Europe, North America, and Australia. The authors even call it a new strain of the virus. The mutated virus dominates recent sequence data from the US, and there are some PCR results that may point to increased infectivity. One problem with this idea is that the spread may be due to a founder effect, caused by an individual with the mutated virus being an early spreader in Europe. Given the high degree of travel from Europe to North America, once established in Europe, the mutated virus would inevitably spread to the US and Canada. In the absence of cell culture and clinical data showing increased transmissibility, claims that this virus represents a new and more dangerous version are premature. Trevor Bedford's Twitter feed does a good job of explaining the pros and cons of the study. The Atlantic has done an excellent job of reporting on COVID-19 and provided a summary of the article and what other scientists are saying about it.

We had been looking at this mutation previously, trying to see what effect it had on the spike protein structure (more on that another time). To get a snapshot of the spread of the mutated virus, we downloaded 1623 complete Sars-CoV-2 genomes from NCBI Virus on May 7, 2020. After removing duplicate sequences, we were left with 1477 sequences. These sequences were aligned with MAFFT version 7.455. We processed the alignment to find positions that varied across the sequences. At the alignment position corresponding to 23,403, A (amino acid D) appears 637 times and G (amino acid G) appears 836 times. The distribution of the mutation over time is shown below.
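The counting step described above can be sketched in a few lines of plain Python. This is not the code from the download; it is a minimal illustration that assumes the alignment is in FASTA format with the reference as one of the records, and all function names and the toy alignment are made up for the example. Note that this sketch drops duplicates from the aligned records for brevity, whereas the analysis above removed duplicates before aligning.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_duplicates(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen, unique = set(), []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


def alignment_column(aligned_ref, ref_pos):
    """Map a 1-based ungapped reference position to a 0-based alignment column."""
    count = 0
    for col, char in enumerate(aligned_ref):
        if char != "-":
            count += 1
            if count == ref_pos:
                return col
    raise ValueError("reference position is beyond the end of the sequence")


def count_bases(records, col):
    """Tally the characters found in one alignment column."""
    return Counter(seq[col].upper() for _, seq in records)


# Toy alignment (hypothetical data): s2 has an insertion, s3 duplicates s1.
toy = """>ref
AC-GATG
>s1
AC-GGTG
>s2
ACTGGTG
>s3
AC-GGTG
""".splitlines()

unique = drop_duplicates(read_fasta(toy))
col = alignment_column(unique[0][1], 4)  # reference position 4 in the toy data
counts = count_bases(unique, col)
print(col, dict(counts))
```

Mapping through the aligned reference matters because any insertion relative to the reference shifts the alignment columns, so column 23,403 of the alignment is not guaranteed to be reference position 23,403.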


N indicates an indeterminate nucleotide call and R indicates either an A or a G. The G mutation first shows up in this data in a sample taken in Florida on Feb. 28. There is one sequence with a G labeled 2020-02-01, but this is an artifact of assigning an arbitrary day to a sequence with a collection date of 2020-02, i.e. no day was listed. From 2020-02-28 onward, the G mutation begins to dominate in the data. However, this portion of the data is dominated by USA sequences, which predominantly contain the G mutation.
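One way to keep month-only collection dates from showing up as spurious early samples is to record how precise each date is and filter on that before plotting. The sketch below is an assumption about how that could be done, not the code used for the figure; the function name and the sample values are hypothetical.

```python
from datetime import date


def parse_collection_date(text):
    """Parse 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD'.

    Returns (date, precision), where precision is 'year', 'month', or 'day'.
    Missing fields are filled with 1, and the precision flag records that.
    """
    parts = text.strip().split("-")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unrecognized collection date: {text!r}")
    numbers = [int(part) for part in parts]
    precision = ("year", "month", "day")[len(numbers) - 1]
    year = numbers[0]
    month = numbers[1] if len(numbers) > 1 else 1
    day = numbers[2] if len(numbers) > 2 else 1
    return date(year, month, day), precision


# Toy records (hypothetical values): (collection date, base at the position).
samples = [("2020-02-28", "G"), ("2020-02", "G"), ("2020-01-19", "A")]

# Keep only records whose collection date is known to the day.
dated = []
for text, base in samples:
    when, precision = parse_collection_date(text)
    if precision == "day":
        dated.append((when, base))

earliest_g = min(when for when, base in dated if base == "G")
print(earliest_g)
```

With the month-only record filtered out, the earliest G in the toy data is the fully dated sample rather than the placeholder first of the month.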

The data and code can be downloaded here.


