Can Python Help Fight Ebola?

Ebola is big news. Governors are ignoring public health officials and the White House. News media publish sensationalist headlines.

From CNN we have:  "Ebola in the air? A nightmare that could happen". Is that a possibility? Is it likely? We'll get to that in a moment. The question at hand is what, if anything, does Python have to do with Ebola? 

The university where I'm teaching has asked us to devote some time to discussing the Ebola epidemic, so I gave some thought to how programming can contribute to dealing with the problem. I won't go into details about the epidemic here. If you want to find facts, I suggest staying away from the news and going to NIH or NIAID. The best popular account I have encountered of how health officials and scientists are trying to deal with the epidemic is from the New Yorker. As part of the class I teach, I decided to take a brief look at whatever Ebola genomic data was available to see if I could gain any insight into how Ebola functions and if I could use it as an example of real-world Python use.

Ebola is caused by Ebola virus. The virus is a member of the Filoviridae family of viruses. Its proper name is Zaire ebolavirus. It has a negative-sense RNA genome that contains 7 genes coding for 7 proteins. The genome is only about 18K nucleotides, amazingly small for something that deadly.

Here is the layout of the genes from the UCSC Ebola browser.


Of those genes, VP35 is apparently the most likely vaccine target. It is involved in viral replication. The nucleocapsid is composed of a helical single-stranded RNA genome wrapped around proteins NP, VP35, VP30, and L. The capsid is studded with viral glycoprotein (GP) spikes, and viral proteins VP40 and VP24 sit between the nucleocapsid and the envelope.

To try to answer the question about whether Ebola could become an airborne virus like chickenpox, we need to examine how the virus is mutating. To gain some insight into that question, I downloaded 99 complete Ebola virus genome sequences collected from human hosts between 1/1/2014 and 10/25/2014 from the NCBI Virus Variation site. I downloaded the data in FASTA and GenBank formats.
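Before aligning anything, it is worth checking that a download like this contains the expected number of genomes and that their lengths look sane. Here is a minimal sketch of such a check using only the standard library. The `read_fasta` helper and the toy records are made up for illustration and are not the code used for this analysis; in practice Biopython's `SeqIO.parse(handle, "fasta")` does the same job.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


# Toy stand-in for the downloaded file; these record names are invented.
toy = io.StringIO(">genome_A\nACGT\nACGT\n>genome_B\nACGTAC\n")
records = list(read_fasta(toy))
lengths = [len(seq) for _, seq in records]
print(len(records), "records with lengths", lengths)
```

With the real file you would pass `open("some_file.fasta")` (a hypothetical filename) instead of the `StringIO` object and expect 99 records of roughly genome length.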

I used MUSCLE to align the full genomes. The initial alignment in Clustal format is here. Viewing the alignment with Jalview revealed that position 10218 in the alignment appeared to have high variability. Is this significant? To get at that question, I used Biopython to extract the gene sequences for each protein from the GenBank files and align the gene sequences. Next I calculated the entropy of each aligned column for each gene alignment in a manner similar to this. Entropy is a measure of variability: a column with high entropy has high variability. Most columns in the gene alignments have entropy = 0. Across the gene alignments, the maximum entropy was 0.44 at position 2519 in the L gene, or 10 differences in 99 sequences. In the L gene this was 9 T to C changes (actually U to C), or a CCU to CCC codon change. The codon change was a synonymous mutation, so it does not change the protein sequence (both codons encode proline).
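The per-column entropy calculation described above can be sketched roughly as follows. This is not the code used for the analysis. It assumes Shannon entropy in bits (base-2 logarithm) and treats a gap character as just another symbol, and all function names are made up for illustration. Under those assumptions, a column with 90 T's and 9 C's across 99 sequences comes out at about 0.44, in line with the L gene figure above. The second helper checks whether two codons encode the same amino acid under the standard genetic code.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # sum of p * log2(1/p); a column with a single symbol gives 0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropies(sequences):
    """Entropy of every column in a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def is_synonymous(codon_a, codon_b):
    """True if both codons translate to the same amino acid (U treated as T)."""
    def normalize(codon):
        return codon.upper().replace("U", "T")
    return CODON_TABLE[normalize(codon_a)] == CODON_TABLE[normalize(codon_b)]


# Toy alignment: only the last column varies.
toy_alignment = ["ACGT", "ACGT", "ACGC", "ACGT"]
entropies = alignment_entropies(toy_alignment)
print([round(h, 3) for h in entropies])

# A column shaped like the L gene site: 90 T and 9 C in 99 sequences.
l_gene_like = column_entropy("T" * 90 + "C" * 9)
print(round(l_gene_like, 2), is_synonymous("CCU", "CCC"))
```

Sorting the columns by entropy is then a quick way to list the most variable positions in each gene alignment.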

I haven't examined the other variable positions yet, but I don't expect to see many surprises. My guess is that Ebola 2014 is not mutating as rapidly as we saw with influenza in the 2009 pandemic. Of course, I'm not a virologist or even a real biologist, so take this with a grain of salt. YMMV (your mileage may vary).

All of the data and Python code for this analysis are available for download.

The bigger picture is available in this Science article. It looks like the consensus Ebola genome may not be mutating rapidly right now in 2014, based on 99 sequences, but it has changed significantly from earlier outbreaks, and there are a number of SNPs in the full set of sequencing data, giving the possibility of multiple alleles.

As to the question of whether Ebola can become airborne, the best response comes from Eric Lander: “That’s like asking the question ‘Can zebras become airborne,’ ” ... “That would be like saying that a virus that has evolved to have a certain life style, spreading through direct contact, can evolve all of a sudden to have a totally different life style, spreading in dried form through the air. A better question would be ‘Can zebras learn to run faster?’ ”
















