
Presentation transcript:

Perl II Part III: Motifs and Loops

Objectives: search for motifs in DNA or proteins; interact with users at the keyboard; write data to files; use loops; use basic regular expressions; respond to conditional tests; examine sequence data in detail.

Conditional Tests
if (1 == 1) { print "1 equals 1\n"; }
if (1) { print "What does this evaluate to?\n"; }
if (1 == 0) { print "1 equals 0\n"; }
if (0) { print "1 evaluates to true\n"; }

Conditional if/else
if (1 == 1) {
    print "1 equals 1\n\n";
} else {
    print "1 does not equal 1\n\n";
}
unless (1 == 0) {
    print "1 does not equal 0\n\n";
} else {
    print "1 equals 0?\n\n";
}
Conditionals also use the numeric comparisons ==, !=, >=, <=, > and <.
For text: the empty strings "" and '' evaluate to false; any other string except "0" evaluates to true.
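A quick way to see how Perl treats a value inside a conditional is to test a few values directly. The sketch below is only an illustration; the helper name truth and the sample values are not from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the string 'true' or 'false' for a value, using Perl's own truth test.
sub truth {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

# 0, '0' and the empty string are false.
# Other strings, including '0.0' and a single space, are true.
foreach my $value (0, '0', '', 1, '0.0', ' ', 'MNIDDKL') {
    printf "%-10s is %s\n", "'$value'", truth($value);
}
```

Note that the string '0.0' is true even though the number 0.0 is false, because the test on a string only treats "" and "0" as false.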

More conditionals ...
#!/usr/bin/perl -w
#if-elsif-else
$word = "MNIDDKL";
if ($word eq 'QSTLV') {
    print "QSTLV\n";
} elsif ($word eq 'MSRQQNKISDH') {
    print "MSRQQNKISDH\n";
} else {
    print "What is \"$word\"?\n";
    exit;
}

Using Loops to Open and Read Files
#!/usr/bin/perl -w
$proteinFilename = "NM_ pep";
#open the file and catch the error
unless (open(MUPPETFILE, $proteinFilename)) {
    print "Could not open file $proteinFilename!\n";
    exit;
}
#read data using a while loop, and print
while ($protein = <MUPPETFILE>) {
    print "##### Here is the next line of the file:\t";
    print $protein, "\n";
}
close MUPPETFILE;
exit;

Motif finding: something genuinely useful
Program flow: reads in a protein sequence from a file; puts all sequence data into one string for easy searching; looks for motifs that the user types at the keyboard.

#!/usr/bin/perl -w
#searching for motifs
#Ask the user for the filename of the data file
print "Please type the filename of the data file: ";
$proteinFilename = <STDIN>;
chomp $proteinFilename;
#Open the file or exit
open(PROTEINFILE, $proteinFilename) or die("Error: $!");
#Read file into an array and close the file
@protein = <PROTEINFILE>;
close PROTEINFILE;
#Put data into a single string to make it easier to search
$protein = join('', @protein);
$protein =~ s/[\s\t\f\r\n ]//g;

File open modes:
Reading: "<filename"
Writing: ">filename", discards the current contents if the file already exists
Append: ">>filename", opens or creates the file for writing at the end of the file
Update: "+<filename", opens a file for update (reading and writing)
New update: "+>filename", creates the file for update if it does not exist (and empties it if it does)
The <FILEHANDLE> operator reads data until it reaches the input record separator $/, which defaults to \n.
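The read-and-join step can also be written with three-argument open and a lexical filehandle, a style common in more recent Perl code. This is a minimal sketch, not the slide author's version; the subroutine name read_sequence, the file name and the sample data are made up so that the example runs on its own.

```perl
use strict;
use warnings;

# Read a whole file and return its contents as one string with all
# whitespace (including newlines) removed.
sub read_sequence {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    my @lines = <$fh>;          # list context: one element per line
    close $fh;
    my $sequence = join('', @lines);
    $sequence =~ s/\s//g;       # \s already covers space, \t, \f, \r and \n
    return $sequence;
}

# Create a small example file (hypothetical name and data).
my $filename = 'example_protein.txt';
open(my $out, '>', $filename) or die "Cannot write $filename: $!";
print $out "MNIDDKL\nQSTLV\n";
close $out;

print read_sequence($filename), "\n";    # prints MNIDDKLQSTLV
unlink $filename;
```

Keeping the mode ('<') separate from the file name means a name that happens to begin with > or < cannot change how the file is opened.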

#Ask the user for a motif, search for it, and report
#if it was found. Exit if no motif was entered.
do {
    print "Enter a motif to search for: ";
    $motif = <STDIN>;
    chomp $motif;
    if ($protein =~ m/$motif/) {
        print "I found it!\n\n";
    } else {
        print "I couldn't find it!\n\n";
    }
#exit on user prompt
} until ($motif =~ /^\s*$/);
exit;
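Beyond reporting whether a motif was found, the same match can report where it occurs. The sketch below is one possible way to do that, with a hypothetical subroutine name (motif_positions) and a made-up sequence; it is not part of the original program.

```perl
use strict;
use warnings;

# Return the 1-based start positions of every non-overlapping match of
# $motif in $sequence. The motif is interpolated as a regular expression,
# so input such as 'D[DE]K' works as a pattern.
sub motif_positions {
    my ($sequence, $motif) = @_;
    my @positions;
    return @positions if $motif eq '';   # skip empty input instead of matching it
    while ($sequence =~ /$motif/g) {
        push @positions, $-[0] + 1;      # $-[0] is the offset where the match began
    }
    return @positions;
}

my $protein = 'MNIDDKLQSTLVDEKL';        # made-up example sequence
print join(',', motif_positions($protein, 'D[DE]K')), "\n";   # prints 4,13
```

Because the motif is used as a pattern, characters such as . or [ typed by the user are interpreted as regular expression syntax, which is usually what is wanted for motifs.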

Regular Expressions
Very powerful methods for matching wildcards to strings.
Very cryptic.
Perl reads =~ /n/ as =~ m/n/.
The delimiter is flexible: with an explicit m, it accepts any nonalphanumeric, nonwhitespace character (e.g. # ( { [ , . ').

Metasymbols
Symbol      Meaning
\0          Null char (ASCII NUL)
\NNN        Char given in octal, up to \377
\1, \2 ...  Nth previously captured string
\a          Match the alarm character
\A          True at the beginning of a string
\b          Match the backspace char (inside a character class)
\b          True at a word boundary
\B          True when not at a word boundary
\cX         Match the control char
\d          Any digit
\D          Any nondigit
\e          Match the escape char
\E          End case (\L, \U) or quote (\Q) translation
\f          Match the formfeed char
\G          True at the position where the previous m//g match ended
\n          Match the newline char
\r          Match the return char
\s          Match any whitespace
\S          Match any nonwhitespace
\t          Match the tab (HT) char
\w          Match any "word" char (A-Z, a-z, 0-9 and _)
\W          Match any nonword char
\x{abcd}    Match the char given in hex
\z          True at end of string
\Z          True at end of string or before a final newline
( )         Pattern brackets (grouping and capture)
[ ]         Either/or pattern brackets (character class)
|           Or
.           Any char but a newline
*           Zero or more times
+           One or more occurrences
?           Zero or one occurrences
{n}         Match exactly n times
{n,m}       Match n to m times, inclusive
^           Beginning of string
$           Occurring at end of string
!~          Non-match operator
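A few of these symbols can be tried out on a short DNA string. The sketch below is illustrative only: the patterns, the descriptions and the subroutine name describe_sequence are assumptions chosen for the example, not material from the slides.

```perl
use strict;
use warnings;

# Pattern/description pairs; each pattern uses symbols from the table.
my @checks = (
    [ qr/^ATG/,         'starts with ATG (^ anchors at the beginning)' ],
    [ qr/T(AA|AG|GA)$/, 'ends with a stop codon (| alternation, $ anchors at the end)' ],
    [ qr/A{3,5}/,       'has a run of 3 to 5 As ({n,m} quantifier)' ],
    [ qr/[^ACGT]/,      'has a character other than A, C, G or T' ],
);

# Return the descriptions of all patterns that match $dna.
sub describe_sequence {
    my ($dna) = @_;
    return map { $_->[1] } grep { $dna =~ $_->[0] } @checks;
}

# The first three descriptions apply to this made-up sequence.
print join("\n", describe_sequence('ATGAAACCCTAG')), "\n";
```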

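Look-behind assertion
(?<=value1)value2 : look behind (value2 must be preceded by value1)
$string = "English goodly spoken here";
$string =~ s/(?<=English )goodly/well/;
value1(?=value2) : look ahead (value1 must be followed by value2)
value1(?!value2) : negative look ahead (value1 must not be followed by value2)
(?<!value1)value2 : negative look behind (value2 must not be preceded by value1)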
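The four assertions can be compared side by side by checking where the first match starts. This is a small sketch with made-up test strings; first_match_offset is a hypothetical helper, not a built-in.

```perl
use strict;
use warnings;

# Return the offset of the first match of $regex in $text, or -1 if none.
sub first_match_offset {
    my ($text, $regex) = @_;
    return $text =~ $regex ? $-[0] : -1;
}

my $string = 'English goodly spoken here';
$string =~ s/(?<=English )goodly/well/;      # look-behind: only after 'English '
print "$string\n";                           # prints English well spoken here

print first_match_offset('foobar foobaz', qr/foo(?=baz)/), "\n";    # look-ahead: 7
print first_match_offset('foobar foobaz', qr/foo(?!bar)/), "\n";    # negative look-ahead: 7
print first_match_offset('unhappy happy', qr/(?<!un)happy/), "\n";  # negative look-behind: 8
```

The text inside an assertion is checked but is not part of the match, which is why the substitution replaces only "goodly" and leaves "English " in place.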

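Backreferences
$string =~ /(\d\w)\s+(\d)\s+(\d)\s\3\1\1/;
This pattern matches a string such as "2y 3 4 42y2y".
Backreferences number the bracketed groups from left to right: \1 repeats what the first group matched, \3 what the third matched.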
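A backreference is handy for spotting an immediately repeated unit in a sequence. The sketch below assumes a hypothetical subroutine name (tandem_triplet) and made-up input, and also checks which groups the pattern above captures.

```perl
use strict;
use warnings;

# Return the first triplet that is immediately repeated (e.g. CAGCAG),
# or undef if there is none. \1 must match exactly what group 1 captured.
sub tandem_triplet {
    my ($dna) = @_;
    return $dna =~ /([ACGT]{3})\1/ ? $1 : undef;
}

print tandem_triplet('TTCAGCAGTT'), "\n";     # prints CAG

# Groups are numbered by their opening bracket, left to right.
if ('2y 3 4 42y2y' =~ /(\d\w)\s+(\d)\s+(\d)\s\3\1\1/) {
    print "group 1 = $1, group 2 = $2, group 3 = $3\n";   # 2y, 3 and 4
}
```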

#!/usr/bin/perl -w
#determining the frequency of nucleotides
#Ask the user for the filename of the data file
print "Please type the filename of the data file: ";
$dnaFilename = <STDIN>;
chomp $dnaFilename;
#Open the file or exit
open(DNA, $dnaFilename) or die("Error: $!");
#Read file into an array and close the file
@DNA = <DNA>;
close DNA;
#Put data into a single string to make it easier to search
$dna = join('', @DNA);
$dna =~ s/[\s\t\f\r\n ]//g;

#Explode the $dna string into an array where it will be
#easier to iterate through the bases and count them
@DNA = split('', $dna);
#Initialize the counts
$A_Number = 0;
$C_Number = 0;
$G_Number = 0;
$T_Number = 0;
$Errors = 0;

#Loop through the bases, examine each to determine what
#each nucleotide is and increment the appropriate number
foreach $base (@DNA) {
    if ($base eq 'A') { ++$A_Number; }
    elsif ($base eq 'C') { ++$C_Number; }
    elsif ($base eq 'G') { ++$G_Number; }
    elsif ($base eq 'T') { ++$T_Number; }
    else {
        print "Error: I don't recognize the base\n";
        ++$Errors;
    }
}
print "Base\tNumber\nA=\t$A_Number\nC=\t$C_Number\n";
print "G=\t$G_Number\nT=\t$T_Number\n\n";

#With a named loop variable
foreach $base (@DNA) {
    if ($base eq 'A') { ++$A_Number; }
    elsif ($base eq 'C') { ++$C_Number; }
    elsif ($base eq 'G') { ++$G_Number; }
    elsif ($base eq 'T') { ++$T_Number; }
    else {
        print "Error: I don't recognize the base\n";
        ++$Errors;
    }
}
#The same loop using the default variable $_
foreach (@DNA) {
    if (/A/) { ++$A_Number; }
    elsif (/C/) { ++$C_Number; }
    elsif (/G/) { ++$G_Number; }
    elsif (/T/) { ++$T_Number; }
    else {
        print "Error when reading base\n";
        ++$Errors;
    }
}

Tricky little ifs
if ($string =~ /\d{3,4}/) { print "the string contains 3 to 4 digits in a row\n"; }
is the same as
print "the string contains 3 to 4 digits in a row\n" if ($string =~ /\d{3,4}/);

Let's do the same thing but save on some memory by not creating an array of individual bases.
#!/usr/bin/perl -w
#determining the frequency of nucleotides
#Ask the user for the filename of the data file
print "Please type the filename of the data file: ";
$dnaFilename = <STDIN>;
chomp $dnaFilename;
#See if the file exists then open it
unless (-e $dnaFilename) {
    print "\"$dnaFilename\" does not exist";
    exit;
}
open(DNA, $dnaFilename) or die("File Error");
@DNA = <DNA>;
close DNA;
#Put data into a single string to make it easier to search
$dna = join('', @DNA);
$dna =~ s/[\s\t\f\r\n ]//g;

#Initialize the counts
$A_Number = 0;
$C_Number = 0;
$G_Number = 0;
$T_Number = 0;
$Errors = 0;

#Loop through the bases, examine each to determine what
#each nucleotide is and increment the appropriate number
for ($position = 0; $position < length $dna; ++$position) {
    $base = substr($dna, $position, 1);
    if ($base eq 'A') { ++$A_Number; }
    elsif ($base eq 'C') { ++$C_Number; }
    elsif ($base eq 'G') { ++$G_Number; }
    elsif ($base eq 'T') { ++$T_Number; }
    else {
        print "Error: I don't recognize the base\n";
        ++$Errors;
    }
}
print "Base\tNumber\nA=\t$A_Number\nC=\t$C_Number\n";
print "G=\t$G_Number\nT=\t$T_Number\n\n";
#Alternatively, count with global pattern matches on the whole string
while ($dna =~ /a/ig) { $a++ }
while ($dna =~ /c/ig) { $c++ }
while ($dna =~ /g/ig) { $g++ }
while ($dna =~ /t/ig) { $t++ }
while ($dna =~ /[^acgt]/ig) { $e++ }
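Another way to count bases without building an array is the tr/// operator, which returns the number of characters it matched. This is a sketch of an alternative technique, not the approach used on the slides; the subroutine name count_bases and the sample string are made up.

```perl
use strict;
use warnings;

# Count A, C, G and T (either case) in $dna and return a hash reference.
# With an empty replacement list and no modifiers, tr/// only counts
# and leaves the string unchanged.
sub count_bases {
    my ($dna) = @_;
    my %count;
    $count{A} = ($dna =~ tr/Aa//);
    $count{C} = ($dna =~ tr/Cc//);
    $count{G} = ($dna =~ tr/Gg//);
    $count{T} = ($dna =~ tr/Tt//);
    # Anything left over is not a recognised base.
    $count{other} = length($dna) - $count{A} - $count{C} - $count{G} - $count{T};
    return \%count;
}

my $counts = count_bases('AACGTTnx');    # made-up example sequence
print "$_=\t$counts->{$_}\n" foreach qw(A C G T other);
```

For the sample string this reports 2 As, 1 C, 1 G, 2 Ts and 2 unrecognised characters.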

Writing to files
#All text data can be written to files
$outputfile = "results.txt";
open(RESULTS, ">$outputfile") or die("Error: $!");
print RESULTS "These results are overwriting everything that existed in the file results.txt\n";
close RESULTS;
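The difference between overwriting and appending is easiest to see by writing to the same file twice and reading it back. The following is a minimal sketch using three-argument open; the subroutine names and the file name are hypothetical.

```perl
use strict;
use warnings;

# Write @lines to $filename using the given open mode:
# '>' overwrites the file, '>>' appends to it.
sub write_lines {
    my ($mode, $filename, @lines) = @_;
    open(my $fh, $mode, $filename) or die "Cannot open $filename: $!";
    print {$fh} map { "$_\n" } @lines;
    close($fh) or die "Cannot close $filename: $!";
    return;
}

# Read the file back as a list of lines without their newlines.
sub read_lines {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

my $file = 'results_example.txt';          # hypothetical file name
write_lines('>',  $file, 'A=4', 'C=2');    # creates or overwrites
write_lines('>>', $file, 'G=3', 'T=1');    # adds to the end
print join(' ', read_lines($file)), "\n";  # prints A=4 C=2 G=3 T=1
write_lines('>',  $file, 'fresh start');   # the earlier four lines are gone
print join(' ', read_lines($file)), "\n";  # prints fresh start
unlink $file;
```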

Command line arguments and subroutines
#!/usr/bin/perl -w
use strict;
#Arguments collected on the command line go into a special var @ARGV
# and the program name resides in the var $0
my($title) = "$0 DNA\n\n";
unless (@ARGV) {
    print $title;
    exit;
}
my($input) = $ARGV[0];
print $input, "\n\n";
exit;

Command line arguments and subroutines
#!/usr/bin/perl -w
use strict;
#Arguments collected on the command line go into a special var @ARGV
# and the program name resides in the var $0
my($title) = "$0 DNA\n\n";
unless (@ARGV) {
    print $title;
    exit;
}
my($input) = $ARGV[0];
my($subRoutineResults) = Find_Length($input);
print "the length of your input is $subRoutineResults\n";
exit;
sub Find_Length {
    my($tmp) = @_;
    my($results) = length($tmp);
    return $results;
}
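The same idea can be arranged so that the script prints a usage line instead of exiting silently when no argument is given. This is a sketch with hypothetical subroutine names (find_length, usage_message); run it as, for example, perl script.pl ACGT.

```perl
use strict;
use warnings;

# Return the length of the sequence passed as the first argument.
sub find_length {
    my ($sequence) = @_;
    return length($sequence);
}

# Build a usage line; $0 holds the name the program was run as.
sub usage_message {
    return "Usage: $0 DNA\n";
}

# @ARGV holds the command-line arguments; it is empty if none were given.
if (@ARGV) {
    my $input = $ARGV[0];
    print "The length of your input is ", find_length($input), "\n";
} else {
    print usage_message();
}
```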

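Passing by value vs reference
Simple routines pass everything by value. However, because all arguments arrive in the single list @_, the values of arrays, hashes and scalars get flattened into one list.
Ex.
@i = (1..10);
print_list(@i);    #print_list is a placeholder name
sub print_list {
    print "@_\n";  #@_ holds the ten values as one flat list
}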

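@i = (1..10);
@j = (1..23);
#arrays can be passed by reference, but changes made
#through a reference affect the global arrays
by_reference(\@i, \@j);    #by_reference is a placeholder name
print "@i\n";
sub by_reference {
    my ($i, $j) = @_;
    print $$j[2];
    push(@$i, '4');
}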
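The contrast between flattening and passing references can be checked directly. The sketch below uses made-up subroutine names (count_flattened, append_third) to show that two arrays passed by value arrive as one list, while references keep them separate and let the subroutine change the caller's array.

```perl
use strict;
use warnings;

# Passed by value: both arrays are flattened into @_, so the subroutine
# sees one list and cannot tell where the first array ended.
sub count_flattened {
    return scalar @_;
}

# Passed by reference: each argument is a reference to one array.
# Changes made through the reference affect the caller's array.
sub append_third {
    my ($first, $second) = @_;
    push @$first, $$second[2];    # $$second[2] is the third element
    return;
}

my @i = (1..10);
my @j = (1..23);

print count_flattened(@i, @j), "\n";   # prints 33
append_third(\@i, \@j);
print "@i\n";                          # prints 1 2 3 4 5 6 7 8 9 10 3
```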