How to Count
Counting is hard. It seems like it shouldn't be; after all, we all learned to count as part of our first experiences in school, or maybe even before. In computational biology, we often have to count to determine basic properties of a collection of sequences or to estimate probabilities and expected values. The problem we frequently run into is deciding what to count and how to go about counting it.
Sequencing Errors
One of the questions we have been interested in answering is: what are the basic error rates in the base calls coming from the genome center's Illumina sequencing machine? A brief overview of the sequencing and alignment processes can be downloaded here. The video at https://www.youtube.com/watch?v=l99aKKHcxC4 shows how reads are built one nucleotide at a time. The base-calling process determines the nucleotide type by image processing of the emitted fluorescence. For each nucleotide, intensity values are calculated for each type: A, C, G, a...