Counting Alignment Qualities

We looked at eliminating reads with low quality previously. Now, I would like to look at alignment quality. The BWA alignment software reports a mapping quality for each read. The mapping quality is defined as \({Q_{mapping}} =  - 10{\log _{10}}P(mapping\,position\,is\,wrong)\). Q is rounded to the nearest integer.

We want to be able to plot a histogram of alignment qualities. To gather the data, we loop through all of the reads in bam alignment file and count the number of reads with each mapping quality. We will use PySam to read the bam file and a Numpy integer array to hold the counts.

Here's the code:

def ReadBamFile(chrom, bam_file):
    """
    ReadBamFile - read a bam file and count mapping qualities

    chrom - chromosome name e.g. chr2L
    bam_file - input bam file
    Return:
    counts - a numpy array of integers, containing the count of alignment
             qualities. counts[i]=j means there were j reads with alignment
             quality i.
    """

    counts = np.zeros(61, dtype=int)
    in_file = pysam.Samfile(bam_file, 'rb')
    for read in in_file.fetch(chrom):
        counts[read.mapq] += 1

    return counts


Once we have the counts, we can print them to a CSV file for plotting with R or Matlab.

You can download the entire file.


No comments:

Post a Comment