Lane Medical Library & Knowledge Management Center: Perl Programming for Biologists, SESSION 2: Tue Feb 10th 2009, Yannick Pouliot


1 Lane Medical Library & Knowledge Management Center http://lane.stanford.edu
Perl Programming for Biologists
SESSION 2: Tue Feb 10th 2009
Yannick Pouliot, PhD, Bioresearch Informationist, Lane Medical Library & Knowledge Management Center
© 2008 The Board of Trustees of The Leland Stanford Junior University

2 Prep
- Log into the WebEx session (stanford.webex.com/Meetings)
- Please download all class materials for the 2nd class from the FAQ at http://lane.stanford.edu/howto/index.html?id=_3824 into a directory
- Open a command window and cd to that directory
- Start Open Perl IDE or Mac equivalent

3 Reminder: Cautions
- All examples pertain to MS Office 2003
  - From MS Office 2007, save in 2003 format to use the Perl code described here.
- All contents pertain to Perl 5.x, not 6.x

4 Session #2 Focus
1. Understanding key Perl language elements
   - Scrutinizing several variant programs
2. Altering the contents of text files
And remember: Ask QUESTIONS

5 Recap from Session 1

6 Recap
Questions from last session? → Stomp the teacher!

7 Reviewing Simple1.pl
Understanding what each element does

    #!C:\Perl\bin
    # ---------------------------------------------------------------------------
    # Simple1
    # ---------------------------------------------------------------------------
    use strict;
    use warnings;

    # ---------------------------
    sub Multiply {
        my $f1 = shift;
        my $f2 = shift;
        return ($f1 * $f2);
    }

    # ---------------------------
    # main
    print "Let's test Perl \n";
    my $TempVar = 0;
    my @InputNumbers = @ARGV;
    print "The two numbers are: $InputNumbers[0] and $InputNumbers[1] \n";
    my $Result = Multiply($InputNumbers[0],$InputNumbers[1]);
    print "Here's the value of both numbers multiplied: $Result \n";
    print "I'm done! \n";
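To make the role of shift and @ARGV in Simple1.pl more concrete, here is a minimal sketch. The fallback to the numbers 3 and 4 is an assumption added for illustration and is not part of Simple1.pl; it only lets the sketch run when no arguments are typed on the command line.

```perl
use strict;
use warnings;

# Same shape as Multiply in Simple1.pl: each shift removes and returns
# the first element of @_, the list of arguments passed to the sub.
sub Multiply {
    my $f1 = shift;
    my $f2 = shift;
    return $f1 * $f2;
}

# Hypothetical guard (not in Simple1.pl): use 3 and 4 when fewer than
# two command-line arguments were supplied.
my @InputNumbers = @ARGV >= 2 ? @ARGV[0, 1] : (3, 4);

my $Result = Multiply(@InputNumbers);
print "Result: $Result\n";
```

Running the sketch with two numbers after the script name multiplies those numbers instead of the fallback values.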

8 Simple2.pl: Introducing New Language Elements
→ let’s look at it using Open Perl IDE and XXX

9 A Final Example: Biologically Useful Perl Program
What it does:
1. Reads input from an Excel worksheet containing public identifiers for DNA sequences associated with genes
2. Uses Entrez Utilities provided by NCBI to retrieve:
   - UniGene cluster ID
   - UniGene Gene symbol
   - NCBI Gene ID
3. Writes the result into another Excel worksheet
Features a mix of procedural and object programming
Relevant links:
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene&orig_db=unigene
- Entrez Utilities

10 What Excel3.pl does:

11 Let’s Run Excel3.pl
Type “perl -f Excel3.pl” in the directory where you installed the demonstration programs

12 Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous

13 Moving On: Altering file contents

14 Converting Data Stored in Flatfiles
Input: ConvertOuput.csv
- = renamed file generated by Excel3.pl, converted to csv format
Let’s look at and run Convert1.pl → Convert5.pl

15 Convert1.pl
- Structure of program
- Run program
- Exercise: what is chomp?
- Understanding file handles
- What is $_ ?
- Create an error: uncomment line 22 and run
- Introducing the escape character: “\”
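The ideas on this slide (file handles, $_, chomp and the escape character) can be sketched in a few lines. This is not Convert1.pl itself; the sample data and the variable names are assumptions made up for illustration, and an in-memory file handle is used so that the sketch runs without any file on disk.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the CSV file that Convert1.pl reads.
my $SampleData = "Accession,Cluster ID\nAA000001,Hs.1234\n";

# Opening a reference to a string gives an in-memory file handle.
# For a real file you would write something like:
#   open(my $InFile, '<', 'SomeFile.csv') or die "Cannot open file: $!";
open(my $InFile, '<', \$SampleData) or die "Cannot open in-memory file: $!";

my @Lines;
while (<$InFile>) {          # each line read from the handle lands in $_
    chomp;                   # with no argument, chomp strips the newline from $_
    push @Lines, $_;
    print "Line: \"$_\"\n";  # \" puts a literal double quote inside the string
}
close($InFile);
```

Without chomp, each stored line would still end in a newline, which is easy to miss until the output shows unexpected blank lines.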

16 Convert2.pl: Like Convert1.pl, but Prints Only First Item
- Using arrays to process contents of a line
  - Introducing split
- Changing directories
  - Useful to segregate data files
  - Need to change the path to make this work in your environment
- Note difference between Mac and Windows syntax for path names
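A minimal sketch of split and chdir, assuming a comma-separated line. The sample line and the directory paths in the comments are hypothetical, and this is not the code of Convert2.pl.

```perl
use strict;
use warnings;

# Hypothetical comma-separated line, as might appear in the input file.
my $Line = "AA000001,Hs.1234,CDH5";

# split cuts the line at every comma and returns the pieces as a list.
my @Fields = split(/,/, $Line);
print "First item: $Fields[0]\n";

# chdir changes the current directory so that data files can be opened
# by name alone. The paths below are hypothetical examples:
#   Windows: chdir('C:/PerlClass/Data')   (or 'C:\\PerlClass\\Data')
#   Mac:     chdir('/Users/you/PerlClass/Data')
# Here '.' (the current directory) is used so the sketch runs anywhere.
my $DataDir = '.';
chdir($DataDir) or die "Cannot change to directory $DataDir: $!";
```

Forward slashes are accepted by Perl on Windows as well, which avoids having to double every backslash inside a double-quoted string.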

17 Convert3.pl: Like Convert2.pl, but Prints Changed Order of Columns
- Run program
- Q: how would you avoid printing the title line in the input file?
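One possible answer to the question above, as a sketch rather than the actual Convert3.pl: read the first line once before the loop starts, so the title line never reaches the printing code. The sample data, column layout and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical three-column input with a title line.
my $SampleData = "Accession,Cluster ID,Symbol\n"
               . "AA000001,Hs.1234,CDH5\n"
               . "AA000002,Hs.5678,TP53\n";
open(my $InFile, '<', \$SampleData) or die "Cannot open in-memory file: $!";

my $TitleLine = <$InFile>;   # read the title line once and do nothing with it

my @Reordered;
while (my $Line = <$InFile>) {
    chomp $Line;
    my @Fields = split(/,/, $Line);
    # An array slice picks the columns in a new order: third, second, first.
    push @Reordered, join(',', @Fields[2, 1, 0]);
}
close($InFile);

print "$_\n" for @Reordered;
```

Another option would be to test each line against a pattern that only the title line matches and skip it with next.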

18 Convert4.pl: Like Convert3.pl, but Removes “.” in Cluster IDs
- Run program
- Introducing the match and substitute operators:
  - Matching: ‘/something/’
  - Substituting: ‘s/something1/something2/’
  - Used in regular expressions for text matching (more later)
- Introducing the tab escape sequence: “\t”
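A minimal sketch of matching, substituting and the tab character, assuming cluster IDs of the form Hs.1234. The IDs are hypothetical and the code is not taken from Convert4.pl.

```perl
use strict;
use warnings;

# Hypothetical cluster IDs; the last one has no dot to remove.
my @ClusterIDs = ('Hs.1234', 'Mm.42', 'NoDotHere');

my @Cleaned;
foreach my $ID (@ClusterIDs) {
    my $Copy = $ID;
    # Match: does the ID contain a literal dot? The backslash is needed
    # because an unescaped "." matches any character.
    if ($Copy =~ /\./) {
        # Substitute: replace the first dot with nothing.
        $Copy =~ s/\.//;
    }
    push @Cleaned, $Copy;
    print "$ID\t$Copy\n";   # \t puts a tab between the two columns
}
```

The if test is not strictly required, since a substitution that finds nothing leaves the string unchanged; it is kept here only to show both operators side by side.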

19 Convert5.pl: Like Convert3.pl, but with Smarts + Prints More Elements
- Run program
- Introducing “regular expressions”
- Q: how would you modify this code to print only when a “Gene: Gene Symbol” was found?
  → tip: use the matching operator: if (not($var =~ /something/)) { do something }
  → Try doing it: 10 min
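The tip above can be sketched as follows. This is only one way to approach the exercise and does not reproduce Convert5.pl: the list of values and the pattern that decides what counts as a gene symbol are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical values from a gene symbol column; some rows have no symbol.
my @Symbols = ('CDH5', '', 'TP53', 'n/a');

my @Kept;
foreach my $Symbol (@Symbols) {
    # Assumed pattern: a symbol consists only of capital letters and digits.
    # If the value does not match, skip to the next one without printing.
    if (not($Symbol =~ /^[A-Z0-9]+$/)) {
        next;
    }
    push @Kept, $Symbol;
    print "Gene symbol: $Symbol\n";
}
```

Perl also offers the !~ operator, so the test could be written as $Symbol !~ /^[A-Z0-9]+$/ with the same effect.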

20 More on Regular Expressions
- Very powerful
  - i.e., flexible, fast
- Complicated topic
  - Can require lots of trial and error to get it right
  - Quick reference card essential
  - Best comprehensive resource: Friedl, 2006 (covers more than Perl)

21 Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous

22 Part 2: Practical examples of programs that alter file contents using regular expressions

23 Regular Expressions: More Examples
The example we’ll use: Extracting clone IDs for CDH5 by…
1. Importing SOURCE results directly into Excel
2. Parsing the .csv version of that file (CDH5Clones.csv)

24 Processing EST IDs from SOURCE
Input: CDH5Clones.csv or CDH5Clones.xls

25 Clone1.pl: Filtering of Results
What it does:
- Reads .csv file of SOURCE results
- Finds all clones from PLACE library
- Returns list in single column form
Run the program
Why the error?
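The three steps listed for Clone1.pl can be sketched as below. This is not Clone1.pl: the layout of the .csv file, the sample rows and the assumption that PLACE clone IDs look like the word PLACE followed by digits are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rows standing in for CDH5Clones.csv; the real column
# layout of the SOURCE export is not shown in the slides.
my $SampleData = join("\n",
    'Clone ID,Accession',
    'IMAGE:2345678,AA000001',
    'PLACE7001999,AK000001',
    'PLACE7002345,AK000002',
) . "\n";
open(my $InFile, '<', \$SampleData) or die "Cannot open in-memory file: $!";

my @PlaceClones;
while (my $Line = <$InFile>) {
    chomp $Line;
    # Assumed ID format: "PLACE" followed by digits. The parentheses
    # capture the matched ID into $1.
    if ($Line =~ /(PLACE\d+)/) {
        push @PlaceClones, $1;
    }
}
close($InFile);

# Single-column output: one clone ID per line.
print "$_\n" for @PlaceClones;
```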

26 Clone2.pl: Numerical Filtering of Results
Problem: Suppose you only want clones with IDs >= 7002000 because you already have clones with ID < 7002000?
Solution: Check numerical value of clone ID and decide whether to retain it or not.
→ Run program!
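The solution described above can be sketched as follows, assuming clone IDs that consist of the word PLACE followed by a number. The IDs are hypothetical and the code is not taken from Clone2.pl.

```perl
use strict;
use warnings;

# Hypothetical clone IDs; the numeric part follows the library name.
my @CloneIDs = ('PLACE7001999', 'PLACE7002000', 'PLACE7002345');
my $Cutoff   = 7002000;

my @Retained;
foreach my $ID (@CloneIDs) {
    # The parentheses capture the digits into $1, which Perl then
    # treats as a number in the >= comparison.
    if ($ID =~ /^PLACE(\d+)$/ and $1 >= $Cutoff) {
        push @Retained, $ID;
    }
}

print "$_\n" for @Retained;
```

Using >= rather than ge matters here: ge compares strings character by character, while >= compares numeric values.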

