We looked at eliminating reads with low quality previously. Now, I would like to look at alignment quality. The BWA alignment software reports a mapping quality for each read. The mapping quality is defined as \({Q_{mapping}} = - 10{\log _{10}}P(mapping\,position\,is\,wrong)\). Q is rounded to the nearest integer.
We want to be able to plot a histogram of alignment qualities. To gather the data, we loop through all of the reads in bam alignment file and count the number of reads with each mapping quality. We will use PySam to read the bam file and a Numpy integer array to hold the counts.
Here's the code:
def ReadBamFile(chrom, bam_file):
"""
ReadBamFile - read a bam file and count mapping qualities
chrom - chromosome name e.g. chr2L
bam_file - input bam file
Return:
counts - a numpy array of integers, containing the count of alignment
qualities. counts[i]=j means there were j reads with alignment
quality i.
"""
counts = np.zeros(61, dtype=int)
in_file = pysam.Samfile(bam_file, 'rb')
for read in in_file.fetch(chrom):
counts[read.mapq] += 1
return counts
Once we have the counts, we can print them to a CSV file for plotting with R or Matlab.
You can download the entire file.
No comments:
Post a Comment