Posts

Showing posts from October, 2014

Can Python Help fight Ebola?

Image
Ebola is big news. Governors are ignoring public health officials and the White House . News media publish sensationalist headlines . From CNN we have:  "Ebola in the air? A nightmare that could happen". Is that a possibility? Is it likely? We'll get to that in a moment. The question at hand is what, if anything, does Python have to do with Ebola?  The university where I'm teaching has asked us to devote some time to discussing the Ebola epidemic, so I gave some thought about how programming can contribute to dealing with the problem.. I won't go into detailas about the epidemic here. If you want to find facts, I suggest staying away from the news and going to NIH or NIAD . The best popular account of how health officials and scientists are trying to deal with epidemic that I have encountered is from the New Yorker .  As part of the class I teach, I decided to take a brief look at whatever kind of Ebola genomic data was available and see if I could gain an...

Counting Pairs

One question of interest when dealing with paired-end reads is what was the length of the sequence that the paired mates originally came from. We mapped D. melanogaster DNA,  100 bp paired-end reads, to the genome with BWA . Among other information, BWA returns the observed template length (TLEN) for each read and its mate. With this information, producing a histogram of the fragment lengths is simply a matter of extracting the proper field from the SAM/BAM file. We used Pysam to read the BAM file. While reading we filtered read pairs and rejected any read that had a base call quality or mapping quality less than 30. We also rejected read pairs when one of the pair did not map. For the pairs that passed the filter, we simply count the number with each TLEN. We save the counts in a file for plotting with R. Here's the counting code: def ReadBamFile(bam_file): """ ReadBamFile - read a bam file and count nucleotides by context bam_file - bam file co...