12.2 BioPerl
The BioPerl project is an international association of developers of open-source Perl tools for bioinformatics, genomics and life-science research. Things you can do with BioPerl:
- Read and write sequence files in many formats, including FASTA, GenBank, EMBL, SwissProt and more
- Extract gene annotations from GenBank, EMBL and SwissProt files
- Read and analyse BLAST results
- Read and process phylogenetic trees and multiple sequence alignments
- Analyse SNP data
- And more…
12.3 BioPerl modules are named Bio::XXX (for example Bio::SeqIO). The best way to learn them is the BioPerl wiki, which has documentation and examples of how to use each module:
We recommend beginning with the "How-tos":
For a more in-depth look at the BioPerl modules:
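12.4 BioPerl: the SeqIO module
The Bio::SeqIO module reads sequences from files and writes them to files, in many formats. This script converts an EMBL file to Fasta:

    use strict;
    use warnings;
    use Bio::SeqIO;

    my $inputfilename = shift @ARGV;    # path of the EMBL input file
    my $in  = Bio::SeqIO->new("-file" => "<$inputfilename", "-format" => "EMBL");
    my $out = Bio::SeqIO->new("-file" => ">seq2.fasta", "-format" => "Fasta");
    while ( my $seqObj = $in->next_seq() ) {
        $out->write_seq($seqObj);    # each record read is written in the output format
    }

A list of all the sequence formats BioPerl can read is in: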
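To try the same conversion loop without files on disk, here is a minimal sketch that assumes a small, made-up two-record FASTA text held in a Perl string. Bio::SeqIO accepts an already-open filehandle through -fh, and Perl can open a filehandle on a string; the records are written in GenBank format into a second string. As in the script above, only the -format values decide the conversion.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical two-record FASTA input, held in a string instead of a file
my $fasta_text = ">seq1 first test sequence\nATGGCCAAGCTG\n"
               . ">seq2 second test sequence\nGGGTTTAAACCC\n";

# Perl can open filehandles on scalars; Bio::SeqIO takes them via -fh
my $genbank_text = '';
open my $in_fh,  '<', \$fasta_text   or die "cannot read string: $!";
open my $out_fh, '>', \$genbank_text or die "cannot write string: $!";

my $in  = Bio::SeqIO->new(-fh => $in_fh,  -format => 'fasta');
my $out = Bio::SeqIO->new(-fh => $out_fh, -format => 'genbank');

# Same read/write loop as in the file-based script
while (my $seqObj = $in->next_seq()) {
    $out->write_seq($seqObj);
}
$out->close();    # flush and close the output handle

print $genbank_text;
```

Each record in $genbank_text starts with a LOCUS line and ends with a // line.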
12.5 BioPerl: the Seq module

    use strict;
    use warnings;
    use Bio::SeqIO;

    my $in = Bio::SeqIO->new("-file" => "<seq.fasta", "-format" => "Fasta");
    while ( my $seqObj = $in->next_seq() ) {
        print "ID:".$seqObj->id()."\n";            # 1st word in header
        print "Desc:".$seqObj->desc()."\n";        # rest of header
        print "Length:".$seqObj->length()."\n";    # seq length
        print "Sequence: ".$seqObj->seq()."\n";    # seq string
    }

The Bio::SeqIO function next_seq returns a Bio::Seq object. This module provides functions such as id() (the first word of the header line, before the first space), desc() (the rest of the header line), length() (the sequence length) and seq() (the sequence string). You can read more about it in:
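The same accessors work on a Bio::Seq built directly in memory. This minimal sketch assumes a hypothetical 12-base sequence and also tries three more Bio::Seq methods: subseq() (1-based, inclusive coordinates), revcom() and translate(). The last two return new sequence objects, so .seq() is called on their result.

```perl
use strict;
use warnings;
use Bio::Seq;

# A hypothetical sequence built in memory, so no input file is needed
my $seqObj = Bio::Seq->new(
    -display_id => 'seq1',
    -desc       => 'hypothetical test sequence',
    -seq        => 'ATGGCCAAGTAA',
    -alphabet   => 'dna',
);

print "ID:     ", $seqObj->id(),     "\n";    # seq1
print "Desc:   ", $seqObj->desc(),   "\n";    # hypothetical test sequence
print "Length: ", $seqObj->length(), "\n";    # 12
print "Seq:    ", $seqObj->seq(),    "\n";    # ATGGCCAAGTAA

# Sub-sequence, reverse complement and translation
my $codon2  = $seqObj->subseq(4, 6);          # GCC (second codon)
my $revcom  = $seqObj->revcom()->seq();       # TTACTTGGCCAT
my $protein = $seqObj->translate()->seq();    # MAK* ('*' marks the stop codon)
print "Codon 2: $codon2\nRevcom:  $revcom\nProtein: $protein\n";
```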
12.6 BioPerl: Parsing a GenBank file
Bio::SeqIO can read and parse the adenovirus genome file for us. For example, from GenBank features such as:

    gene
                    /gene="NDP"
                    /note="ND"
                    /db_xref="LocusID:4693"
                    /db_xref="MIM:310600"
    CDS
                    /gene="NDP"
                    /note="Norrie disease (norrin)"
                    /codon_start=1
                    /product="Norrie disease protein"
                    /protein_id="NP_ "
                    /db_xref="GI: "
                    /db_xref="LocusID:4693"
                    /db_xref="MIM:310600"
                    /translation="MRKHVLAASFSMLSLL
                    SHPLYKCSSKMVLLARCEGHCSQAS
                    PLVSFSTVLKQPFRSSCHCCRPQTS
                    LTATYRYILSCHCEEC"

BioPerl extracts each feature's primary tag and its tag/value pairs:

    primary tag: gene
      tag: gene
        value: NDP
      tag: note
        value: ND
      tag: db_xref
        value: LocusID:4693
        value: MIM:
    primary tag: CDS
      tag: gene
        value: NDP
      tag: note
        value: Norrie disease (norrin)
    ......
12.7 The following script reads the adenovirus genome file (GenBank format) with Bio::SeqIO and prints the primary tag, tags and values of every feature:

    use strict;
    use warnings;
    use Bio::SeqIO;

    my $inputfilename = shift @ARGV;    # path of the GenBank file
    my $in = Bio::SeqIO->new("-file" => $inputfilename, "-format" => "GenBank");
    my $seqObj = $in->next_seq();
    foreach my $featObj ($seqObj->get_SeqFeatures) {
        print "primary tag: ", $featObj->primary_tag, "\n";
        foreach my $tag ($featObj->get_all_tags) {
            print "  tag: ", $tag, "\n";
            foreach my $value ($featObj->get_tag_values($tag)) {
                print "    value: ", $value, "\n";
            }
        }
    }
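Usually only some features are of interest. This minimal sketch keeps only the CDS features and reads a tag, the location and the translated sequence of each. So that it runs without the adenovirus file, it assumes a hypothetical sequence carrying one hand-made Bio::SeqFeature::Generic feature (the gene name demoA and the coordinates are made up); with a real record, $seqObj would come from next_seq() as in the script above.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Hypothetical sequence and CDS feature, standing in for a parsed GenBank record
my $seqObj = Bio::Seq->new(-display_id => 'demo', -seq => 'CCATGGCCAAGTAACC');
my $cds = Bio::SeqFeature::Generic->new(
    -primary_tag => 'CDS',
    -start       => 3,
    -end         => 14,
    -strand      => 1,
    -tag         => { gene => 'demoA', note => 'hypothetical CDS' },
);
$seqObj->add_SeqFeature($cds);    # also attaches the sequence to the feature

# Keep only CDS features, then read a tag, the location and the protein
my @cds_feats = grep { $_->primary_tag eq 'CDS' } $seqObj->get_SeqFeatures;
foreach my $featObj (@cds_feats) {
    my ($gene) = $featObj->has_tag('gene') ? $featObj->get_tag_values('gene') : ('?');
    my $protein = $featObj->seq->translate->seq;    # feature sequence, translated
    printf "%s %d..%d strand %d: %s\n",
        $gene, $featObj->start, $featObj->end, $featObj->strand, $protein;
}
```

With this input the loop prints one line: demoA 3..14 strand 1: MAK*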
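12.8 BioPerl: downloading files from the web
The Bio::DB::GenBank module allows us to download a specific record from the NCBI website (this needs an internet connection):

    use strict;
    use warnings;
    use Bio::DB::GenBank;

    my $gb = Bio::DB::GenBank->new();
    my $seqObj = $gb->get_Seq_by_acc("J00522");    # returns a Bio::Seq object
    # or... request the sequence in Fasta format
    $gb = Bio::DB::GenBank->new("-format" => "Fasta");
    $seqObj = $gb->get_Seq_by_acc("J00522");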
12.9 BioPerl: reading BLAST output
First we need the BLAST results in a text file that BioPerl can read. Here is one way to achieve this:
[Figure: BLAST results page, with the "Download" and "Text" options]
12.12 The Bio::SearchIO module can read and parse BLAST output:

    use strict;
    use warnings;
    use Bio::SearchIO;

    my $blast_report = Bio::SearchIO->new("-format" => "blast", "-file" => "mice.blast");
    while (my $result = $blast_report->next_result) {
        print "Checking query ", $result->query_name, "\n";
        while (my $hit = $result->next_hit()) {
            print "Checking hit ", $hit->name(), "\n";
            my $hsp = $hit->next_hsp();    # first HSP of this hit
            print "  hit range: ", $hsp->hit->start(), "-", $hsp->hit->end(), "\n";
        }
    }

(See the BLAST example in lesson 1.)
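Real analyses usually keep only significant hits. The sketch below loops over every HSP of every hit (not only the first) and keeps those at or below an E-value cut-off. To stay self-contained it swaps in BLAST's tabular output (SearchIO format blasttable, as produced by -outfmt 6 in BLAST+) with three made-up lines held in a string, and the 1e-5 threshold is an arbitrary assumption. For the text report above, use "-format" => "blast" and "-file" => "mice.blast" instead.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical tabular BLAST output; columns: query, subject, %identity,
# length, mismatches, gap opens, q.start, q.end, s.start, s.end, E-value, bit score
my $table = join '', map { join("\t", @{$_}) . "\n" } (
    [qw(query1 hitA 98.50 200  3 0  1 200  1 200 1e-100 370)],
    [qw(query1 hitB 45.00  80 44 0 10  89  5  84 0.5    30.0)],
    [qw(query2 hitC 90.00 150 15 0  1 150 20 169 3e-60  250)],
);

my $max_evalue = 1e-5;    # hypothetical significance cut-off
open my $fh, '<', \$table or die "cannot read string: $!";
my $blast_report = Bio::SearchIO->new(-fh => $fh, -format => 'blasttable');

my @significant;
while (my $result = $blast_report->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            next if $hsp->evalue > $max_evalue;    # skip weak HSPs
            push @significant, join ' ', $result->query_name, $hit->name,
                $hsp->evalue, $hsp->hit->start . '-' . $hsp->hit->end;
        }
    }
}
print "$_\n" for @significant;
```

Here hitB (E-value 0.5) is dropped, leaving one significant hit for each query.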