The Analytic Garden

Posts

Showing posts from April, 2014

More Counting - This Time Reads in Exons and Introns

- April 11, 2014

One thing that is important to compare is the relative coverage of reads in exonic vs. intronic regions. We can count reads in introns and exons using the IntervalTree and Gene classes we used previously. We will derive a new class from the Gene class. The new class objects have properties for keeping track of read counts and methods for searching the exon tree and printing the counts. class GeneWithCounts(Gene): """ GeneWithCounts - extend Gene class to include counts of reads overlapping introns and exons, and search and print routines """ def __init__(self, gene_str): Gene.__init__(self, gene_str) self.read_count = 0 self.reads_in_exons = 0 self.reads_in_introns = 0 def SearchExons(self, read): """ SearchExons - search exon tree for read overlap with exon """ self.read_count += 1 exons = self.exons o...

Mmmm Pie

- April 09, 2014

We have a new algorithm for SNP prediction to be published soon. One thing we are interested in is what genomic features do the SNP predictions overlap and what is the distribution of the feature types. To analyze the SNPs we will use a process similar to the one we used to annotate genes with GO annotation . We will build an interval tree of gene, ncRNA , and EST genomic ranges. We will then search the tree to find the genomic object that contains the SNP location. If it the position is within a gene, we will determine if it lies in an exon, intron, a UTR , or within 1 kb of the 3' UTR. In order to deal with finding if a particular SNP lies with 1000 bases of the 3' UTR, we will extend the gene location by 1 kb at the 3' end. To describe the extended gene region, we will use a modification of the Gene class from the GO annotation program. The new class is derived from the Gene class and adds a couple of features to the basic class. A particular position may overla...

Counting Alignment Qualities

- April 03, 2014

We looked at eliminating reads with low quality previously. Now, I would like to look at alignment quality. The BWA alignment software reports a mapping quality for each read. The mapping quality is defined as \({Q_{mapping}} = - 10{\log _{10}}P(mapping\,position\,is\,wrong)\). Q is rounded to the nearest integer. We want to be able to plot a histogram of alignment qualities. To gather the data, we loop through all of the reads in bam alignment file and count the number of reads with each mapping quality. We will use PySam to read the bam file and a Numpy integer array to hold the counts. Here's the code: def ReadBamFile(chrom, bam_file): """ ReadBamFile - read a bam file and count mapping qualities chrom - chromosome name e.g. chr2L bam_file - input bam file Return: counts - a numpy array of integers, containing the count of alignment ...