Sequence data is often stored in Fasta format. Fasta files are text files, so to access them, you can use standard I/O routines to read and write fasta files. However, libraries like BioPerl and BioPython provide a convenient way to manipulate these files.
Fasta files often contain multiple sequences. Here's a simple example of using BioPython to read a Fasta file and return a list of SeqRecord objects. SeqRecord objects contain sequence data, the sequence ID, possibly annotation and other information. It's a uniform object for storing sequence data.
def ReadFasta(file):
"""
ReadFasta - read a sequence from a fasta formatted file.
file - fasta file name
returns: a list of BioPython sequence record
"""
seq_list = []
handle = open(file, 'rU')
for record in SeqIO.parse(handle, 'fasta'):
seq_list.append(record)
handle.close()
return seq_list
That's it. SeqIO is an interface object for reading all kinds of sequence types. The parse routine is an iterator that retrieves a sequence from a file and returns a SeqRecord.
Here's a complete program for getting the lengths of a collection of sequences in a Fasta file. It prints the sequence ID and the sequence length.
download
import sys
import argparse
from Bio import SeqIO
def ReadFasta(file):
"""
ReadFasta - read a sequence from a fasta formatted file.
file - fasta file name
returns: a list of BioPython sequence record
"""
seq_list = []
handle = open(file, 'rU')
for record in SeqIO.parse(handle, 'fasta'):
seq_list.append(record)
handle.close()
return seq_list
def GetArgs():
"""
GetArgs - read the command line
returns - a bam file name, a genome file name, and an output file name
bam file is required
genome file is optional and has a default
output file is option, if not given, output is written to stdout
typing python phiX.count.py will show the options
"""
def ParseArgs(parser):
class Parser(argparse.ArgumentParser):
def error(self, message):
sys.stderr.write('error: %s\n' % message)
self.print_help()
sys.exit(2)
parser = Parser(description='Get the length of a fasta sequence.')
parser.add_argument('-f', '--fasta_file',
type=str,
required=True,
help='Fasta File (required).')
return parser.parse_args()
parser = argparse.ArgumentParser()
args = ParseArgs(parser)
return args.fasta_file
def Main():
fasta_file = GetArgs()
seqs = ReadFasta(fasta_file)
for seq in seqs:
print seq.id, len(seq)
if __name__ == '__main__':
Main()
No comments:
Post a Comment