GC content
The GC content of a sequence is the percentage of G's and C's in a fragment or an entire genome. Genes are often characterized by a higher GC content than the background. Variations in GC content can lead to bias in read coverage in ChIP-seq and RNA-seq experiments; see Benjamini and Speed.

Calculating GC content is straightforward. Select a segment of DNA or RNA sequence, count the A's, C's, G's and T's (U's in place of T's for RNA), and calculate

\[gc = \frac{{G + C}}{{A + C + G + T}}\]

This gives a fraction between 0 and 1; multiply by 100 to express it as a percentage. Often GC content is calculated along an entire chromosome by computing it in a series of either non-overlapping or sliding (overlapping) windows of fixed size.

Our problem is slightly more complicated. We want to calculate the number of reads mapped to various values of GC content ranging from 0 to 100%. The first step is to calculate the GC content for a chromosome using a series of non-overlapping, fixed-width windows. We can use Python string functions to do this.

def GC_pct(seq_rec, window): """ ...
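As a rough illustration of the windowed calculation, here is a minimal sketch that works on a plain Python string using only `str.count`. The names `gc_fraction` and `windowed_gc` are hypothetical and are not the `GC_pct` function introduced above. Two choices here are assumptions rather than anything the text specifies: bases other than A, C, G and T (such as N) are left out of the denominator, and a final window shorter than `window` is still reported.

```python
def gc_fraction(seq):
    """Return (G + C) / (A + C + G + T) for a sequence string.

    Returns None when the string contains no A, C, G or T at all
    (for example a run of N's), since the ratio is undefined there.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else None


def windowed_gc(seq, window):
    """Return (start, gc_fraction) pairs for consecutive non-overlapping windows.

    Assumption: the last window may be shorter than `window` if the
    sequence length is not a multiple of the window size.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq), window)
    ]


if __name__ == "__main__":
    # Toy sequence, made up for illustration only.
    for start, gc in windowed_gc("GGCCAATTGCGCNNNN", 4):
        print(start, gc)
```

Each pair holds the 0-based start position of a window and its GC fraction; multiplying the fraction by 100 gives the percentage used when binning reads by GC content.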