
11ex.1 Modules and BioPerl

11ex.2 Subroutine revision

A subroutine receives its arguments in the array @_ and may return a scalar or a list value:

    sub reverseComplement {
        my ($seq) = @_;
        $seq =~ tr/ACGT/TGCA/;    # complement each base
        $seq = reverse $seq;      # then reverse the string
        return $seq;
    }
    my $revSeq = reverseComplement("GCAGTG");    # CACTGC
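The same mechanism lets a subroutine hand back several values at once. A minimal sketch, using a hypothetical subroutine gcStats (not part of the course code) that returns a two-element list, received with a list assignment:

```perl
use strict;
use warnings;

# Hypothetical subroutine returning a two-element list: (GC count, length)
sub gcStats {
    my ($seq) = @_;
    my $gc = ($seq =~ tr/GCgc//);    # tr/// with an empty replacement only counts matches
    return ($gc, length $seq);
}

my ($gc, $len) = gcStats("GCAGTG");  # list assignment receives both values
print "GC bases: $gc of $len\n";
```

For "GCAGTG" this prints "GC bases: 4 of 6".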

11ex.3 Passing variables by reference

Arrays and hashes passed directly to a subroutine are flattened into the single list @_, so to pass an array or a hash intact we must pass a reference to it:

    my %gene = ("protein_id" => "E4a", "strand" => "-", "CDS" => [126, 523]);
    printGeneInfo(\%gene);

    sub printGeneInfo {
        my ($geneRef) = @_;
        print "Protein $geneRef->{'protein_id'}\n";
        print "Strand $geneRef->{'strand'}\n";
        print "From: $geneRef->{'CDS'}[0] ";
        print "to: $geneRef->{'CDS'}[1]\n";
    }
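To see why the reference matters, here is a minimal sketch with two hypothetical subroutines: countArgs receives two arrays passed directly, while lengthsOf receives references to the same two arrays.

```perl
use strict;
use warnings;

# Hypothetical: reports how many values arrived in @_
sub countArgs {
    return scalar @_;
}

# Hypothetical: receives two array references and reports each array's size
sub lengthsOf {
    my ($firstRef, $secondRef) = @_;
    return (scalar @$firstRef, scalar @$secondRef);
}

my @starts = (126, 900);
my @ends   = (523, 1200, 1500);

my $flat = countArgs(@starts, @ends);         # 5: both arrays merged into one list
my ($n1, $n2) = lengthsOf(\@starts, \@ends);  # 2 and 3: the arrays stay separate
print "Passed directly: $flat values; by reference: $n1 and $n2\n";
```

Once flattened, countArgs has no way to tell where @starts ends and @ends begins; the references preserve that boundary.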

11ex.4 Passing variables by reference

What if we wanted to invoke this subroutine on every gene in the hash of genes that we created in the previous exercise? That hash has the following structure:

    %genes = (NAME => { "protein_id" => PROTEIN_ID,
                        "strand"     => STRAND,
                        "CDS"        => [START, END] },
              ...);

Each value of %genes is already a hash reference, so we can pass it on directly:

    foreach my $geneRef (values(%genes)) {
        printGeneInfo($geneRef);
    }
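A runnable sketch of this loop, filled with two made-up genes in the %genes layout above (the names geneA and geneB and their values are hypothetical). Looping over sorted keys instead of values gives a reproducible order, since hash order is not fixed, and keeps each gene's name at hand:

```perl
use strict;
use warnings;

# Hypothetical example data in the %genes layout
my %genes = (
    geneA => { "protein_id" => "E4a", "strand" => "-", "CDS" => [126, 523] },
    geneB => { "protein_id" => "E1b", "strand" => "+", "CDS" => [600, 1400] },
);

sub printGeneInfo {
    my ($geneRef) = @_;
    print "Protein $geneRef->{'protein_id'}\n";
    print "Strand $geneRef->{'strand'}\n";
    print "From: $geneRef->{'CDS'}[0] to: $geneRef->{'CDS'}[1]\n";
}

# Sorted keys: stable output order, and the gene name is available too
foreach my $name (sort keys %genes) {
    print "Gene $name\n";
    printGeneInfo($genes{$name});    # $genes{$name} is a hash reference
}
```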

11ex.5 Returning variables by reference

Similarly, to return a hash, return a reference to it:

    sub getGeneInfo {
        my %geneInfo;
        # ... fill the hash with info ...
        return \%geneInfo;
    }
    my $geneRef = getGeneInfo(..);

In this case the hash continues to exist outside the scope of the subroutine, because the returned reference still points to it!
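A minimal sketch of a filled-in getGeneInfo. The argument list (protein ID, strand, start, end) is an assumption made for illustration; the slide leaves it open. It also shows that every call creates a fresh hash, because "my %geneInfo" runs again on each call:

```perl
use strict;
use warnings;

# Hypothetical version of getGeneInfo: builds the hash from its arguments
sub getGeneInfo {
    my ($proteinId, $strand, $start, $end) = @_;
    my %geneInfo = (
        "protein_id" => $proteinId,
        "strand"     => $strand,
        "CDS"        => [$start, $end],
    );
    return \%geneInfo;   # the hash outlives the sub because this reference survives
}

# Each call returns a reference to a distinct hash
my $first  = getGeneInfo("E4a", "-", 126, 523);
my $second = getGeneInfo("E1b", "+", 600, 1400);
print "$first->{'protein_id'}: $first->{'CDS'}[0]..$first->{'CDS'}[1]\n";
print "$second->{'protein_id'} is on strand $second->{'strand'}\n";
```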

11ex.6 Debugging subroutines

Step into a subroutine (F7) to debug the internal work of the sub.
Step over a subroutine (F8) to skip the whole operation of the sub.
Step out of a subroutine (Ctrl+F7) when inside a sub: run it all the way to its end and return to the main script.

11ex.7 Modules

11ex.8 What are modules

A module (or package) is a collection of subroutines, usually stored in a separate file with a “.pm” suffix (Perl Module). The subroutines of a module should deal with a well-defined task. For example, Fasta.pm may contain subroutines that read and write FASTA files: readFasta, writeFasta, getHeaders, getSeqNo.

11ex.9 Writing a module

A module is usually written in a separate file with a “.pm” suffix. The name of the module is defined by a “package” line at the beginning of the file:

    package Fasta;

    sub getHeaders { ... }
    sub getSeqNo { ... }

    1;

The last line of the module must be a true value, so usually we just add “1;”.
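A minimal runnable sketch of such a module. To keep it in a single file, the package is declared inline and followed by "package main;"; in a real Fasta.pm the package part would sit in its own file ending with "1;". The subroutine parseFasta and the exact behaviour of getHeaders are hypothetical, chosen for illustration, not the course's Fasta.pm:

```perl
use strict;
use warnings;

# Normally this package would live in its own file, Fasta.pm, ending with "1;".
package Fasta;

# Hypothetical helper: parse FASTA text into a list of [header, sequence] pairs
sub parseFasta {
    my ($text) = @_;
    my @records;
    foreach my $line (split /\n/, $text) {
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ""];       # header line: start a new record
        } elsif (@records) {
            $records[-1][1] .= $line;      # sequence line: append to current record
        }
    }
    return \@records;
}

# Hypothetical helper: list the header lines of parsed records
sub getHeaders {
    my ($recordsRef) = @_;
    return map { $_->[0] } @$recordsRef;
}

package main;   # back to the script's own namespace

my $records = Fasta::parseFasta(">seq1 first\nACGT\nGG\n>seq2 second\nTTA\n");
my @headers = Fasta::getHeaders($records);
print "$_\n" foreach @headers;
```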

11ex.10 Using modules

To write a script that uses a module, add a “use” line at the beginning of the script:

    use Fasta;

Note #1: for basic use of modules, put the module file in the same directory as your script (and run the script from that directory), otherwise Perl won’t find it! Since Perl 5.26 the current directory is no longer searched by default, so there you also need “use lib '.';”.
Note #2: You can “use” another module inside a module, and you can have as many “use” lines as you want.

* If you want to learn how to “use” a module from a different directory, read about “use lib”.
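One common way to load modules from the script's own directory, sketched here with the core modules FindBin and lib: FindBin finds the directory of the running script, and "use lib" adds it to @INC, the list of directories Perl searches for modules. The "use Fasta;" line is commented out because it needs a Fasta.pm next to the script:

```perl
use strict;
use warnings;
use FindBin;                 # core module: $FindBin::Bin is the script's directory
use lib $FindBin::Bin;       # search the script's own directory for modules first
# use Fasta;                 # would now be found next to the script (needs Fasta.pm there)

print "Modules are searched in:\n";
print "  $_\n" foreach @INC;
```

Unlike "use lib '.'", this does not depend on the directory the script is started from.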

11ex.11 Using modules: namespaces

    use Fasta;

Now we can invoke a subroutine from within the namespace of that package:

    PACKAGE::SUBROUTINE(...)

e.g.

    $seq = Fasta::getSeqNo(3);

Note that we cannot access it without specifying the namespace:

    $seq = getSeqNo(3);
    Undefined subroutine &main::getSeqNo called at...

Perl tells us that no subroutine by that name is defined in the “main” namespace (the default namespace of the script). There is a way to avoid this by using the “Exporter” module, which allows a package to export its subroutine names. You can read about it in the Exporter documentation.
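A minimal single-file sketch of exporting with the core Exporter module. The stored @sequences and the behaviour of getSeqNo (return the N-th sequence, counting from 1) are hypothetical. With a real Fasta.pm the script would simply write "use Fasta qw(getSeqNo);"; because the package is inline here, the script calls the import step that "use" would perform:

```perl
use strict;
use warnings;

# Hypothetical module contents; in practice this part would be Fasta.pm
package Fasta;
use Exporter 'import';            # borrow Exporter's import() method
our @EXPORT_OK = ('getSeqNo');    # names a script may ask to import

our @sequences = ("ACGT", "GGCC", "TTAA");   # hypothetical stored sequences

# Hypothetical: return the N-th stored sequence (counting from 1)
sub getSeqNo {
    my ($n) = @_;
    return $sequences[$n - 1];
}

package main;

# With a separate Fasta.pm this line would be: use Fasta qw(getSeqNo);
Fasta->import('getSeqNo');

my $seq = getSeqNo(3);            # no "Fasta::" prefix needed after importing
print "$seq\n";
```

Listing names in @EXPORT_OK (rather than @EXPORT) means nothing lands in the script's namespace unless the script asks for it.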

11ex.12 Using subroutines in Perl Express

11ex.13 Class exercise

1. Change the solution for class ex11.4 (the protein-lengths hash): move the two subroutines to a module named proteinLengths.pm, and make the necessary changes in the script.
2. (Home ex. 6.2) Create a module called readSeq.pm with the following functions:
   readFastaSeq: Reads sequences from a FASTA file. Returns a hash: the header lines are the keys and the sequences are the values.
   readGenbank: Reads a genome annotations file and extracts CDS information, as in class ex. 10 and in home ex. 4 question 5. Returns the complex data structure.
   Test with an appropriate script!
3.* Use the readSeq.pm module to solve home exercise 4 question 6.


11ex.15 BioPerl

The BioPerl project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research. Things you can do with BioPerl:
Read and write sequence files of different formats, including Fasta, GenBank, EMBL, SwissProt and more…
Extract gene annotation from GenBank, EMBL and SwissProt files.
Read and analyse BLAST results.
Read and process phylogenetic trees and multiple sequence alignments.
Analyse SNP data.
And more…

11ex.16 BioPerl

BioPerl modules are named Bio::XXX. The BioPerl wiki has documentation and examples for how to use them, and is the best way to learn BioPerl. We recommend beginning with the "How-tos". For a more hard-core inspection of BioPerl modules, see the BioPerl Module Documentation.

11ex.17 Installing modules from the internet

The best place to search for Perl modules that can make your life easier is CPAN. The easiest way to download and install a module is to use the Perl Package Manager (ppm, part of the ActivePerl installation):
1. Choose “View all packages”.
2. Enter the module name.
3. Choose the module.
4. Add it to the installation list.
5. Install!

Note: ppm installs the packages under the directory “site\lib\” in the ActivePerl directory. You can put packages there manually if you would like to download them yourself from the net, instead of using ppm.

11ex.18 Installing modules from the internet

Alternatively -
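After installing, a quick way to confirm that Perl can actually find a module is to try loading it. A minimal sketch with a hypothetical helper, isInstalled, which turns a name such as Bio::SeqIO into the file path Bio/SeqIO.pm and calls require on it inside eval, so a missing module returns false instead of stopping the script:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise
sub isInstalled {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? 1 : 0;
}

foreach my $module ("List::Util", "Bio::SeqIO") {
    printf "%-12s %s\n", $module, isInstalled($module) ? "installed" : "NOT installed";
}
```

List::Util ships with Perl, so it should always report "installed"; the Bio::SeqIO line depends on whether BioPerl was installed.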

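11ex.19 Object-oriented use of packages

Many packages are meant to be used as objects. In Perl, an object is a data structure that can use the subroutines associated with its package. To create an object from a certain package use “new”:

    my $obj = new PACKAGE;

e.g.

    my $in = new FileHandle;

“new” returns a reference to a data structure, which acts as a FileHandle object. “new” can also receive arguments:

    my $obj = new PACKAGE(ARGUMENTS);
    my $in = new FileHandle("<$inFile");    # open $inFile for reading

The same calls can also be written as PACKAGE->new(...), e.g. FileHandle->new("<$inFile"); this arrow form is the one recommended in modern Perl.

(Diagram: $obj holds a reference, e.g. 0x225d14, to an object whose package provides subroutines such as func() and anotherFunc().)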
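A runnable sketch of FileHandle objects, writing two lines to a scratch file and reading them back. The core module File::Temp is used only to obtain a temporary file name; it is not part of the slide:

```perl
use strict;
use warnings;
use FileHandle;                  # core module used on the slide
use File::Temp qw(tempfile);     # core module, used here only for a scratch file name

my (undef, $inFile) = tempfile(UNLINK => 1);   # file is deleted when the script ends

# ">" in the mode string opens the file for writing
my $out = FileHandle->new(">$inFile") or die "Cannot write $inFile: $!";
print $out ">seq1\nACGT\n";
$out->close();

# "<" opens it for reading; getline() returns one line, or undef at end of file
my $in = FileHandle->new("<$inFile") or die "Cannot read $inFile: $!";
my @lines;
while (defined(my $line = $in->getline())) {
    chomp $line;
    push @lines, $line;
}
$in->close();
print "Read: @lines\n";
```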

11ex.20 Object-oriented use of packages

To invoke a subroutine from the package for a specific object we use the “->” notation again:

    $line = $in->getline();

Note that this is different from accessing elements of a reference to an array or hash, because there are no brackets around “getline”:

    $length = $proteinLengths->{AP_000081};
    $grade = $gradesRef->[0];
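To show what "new" and "->" do behind the scenes, here is a minimal sketch of a hypothetical Gene class (not a BioPerl or course module). "new" builds a hash reference and blesses it into the package; a method call passes the object itself as the first argument:

```perl
use strict;
use warnings;

# Hypothetical minimal class
package Gene;

sub new {
    my ($class, %args) = @_;          # Gene->new(...) passes "Gene" as the first argument
    my $self = { id => $args{id}, start => $args{start}, end => $args{end} };
    return bless $self, $class;       # bless ties the hash reference to the package
}

sub cdsLength {
    my ($self) = @_;                  # $gene->cdsLength() passes the object itself first
    return $self->{end} - $self->{start} + 1;
}

package main;

my $gene = Gene->new(id => "E4a", start => 126, end => 523);
print "Gene $gene->{id} spans ", $gene->cdsLength(), " bases\n";
```

So the object is still an ordinary hash reference ($gene->{id} works), but "->cdsLength()" finds the subroutine through the package the reference was blessed into.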

11ex.21 BioPerl: the SeqIO module

The Bio::SeqIO module allows input/output of sequences from/to files, in many formats:

    use Bio::SeqIO;
    my $in  = new Bio::SeqIO("-file"   => "<seq1.embl",    # hypothetical input file name
                             "-format" => "EMBL");
    my $out = new Bio::SeqIO("-file"   => ">seq2.fasta",
                             "-format" => "Fasta");
    while ( my $seq = $in->next_seq() ) {
        $out->write_seq($seq);
    }

A list of all the formats BioPerl can handle can be found in the Bio::SeqIO documentation.

11ex.22 BioPerl: the Seq module

    use Bio::SeqIO;
    my $in = new Bio::SeqIO("-file" => "<seq.fasta", "-format" => "Fasta");
    while ( my $seqObj = $in->next_seq() ) {
        print "ID:".$seqObj->id()."\n";              # 1st word in header
        print "Desc:".$seqObj->desc()."\n";          # rest of header
        print "Length:".$seqObj->length()."\n";      # seq length
        print "Sequence: ".$seqObj->seq()."\n";      # seq string
    }

The Bio::SeqIO function “next_seq” returns an object of the Bio::Seq module. This module provides functions like id() (returns the first word in the header line, before the first space), desc() (the rest of the header line), length() (the sequence length) and seq() (the sequence string). You can read more about it in the Bio::Seq documentation.

11ex.23 Class exercise 14

1. Write a script that uses Bio::SeqIO to read a FASTA file (use the EHD nucleotide FASTA from the webpage) and print only sequences shorter than 3,000 bases to an output FASTA file.
2. Write a script that uses Bio::SeqIO to read a FASTA file, and print all header lines that contain the words "Mus musculus".
3. Write a script that uses Bio::SeqIO to read a GenPept file (use preProInsulin.gp from the webpage), and convert it to FASTA.
4.* Same as Q1, but print the reverse complement of each sequence to the FASTA file. (Do not use the reverse or tr// functions! BioPerl can do it for you: read the BioPerl documentation.)
5.** Same as Q4, but only for the first ten bases (again, use BioPerl rather than substr).