Perl Practical Extraction and Reporting Language An Introduction by Shwen Ho.


1 Perl Practical Extraction and Reporting Language An Introduction by Shwen Ho

2 What is Perl good for? Designed for text manipulation. Very fast to implement. Allows many different ways to solve the same problem. Runs on many different platforms –Windows, Mac, Unix, Linux, DOS, etc.

3 Running Perl Perl scripts do not need to be compiled in a separate step. They are interpreted at the point of execution. They do not need a particular file extension, although the .pl file extension is commonly used.

4 Running Perl Executing it via the command line: command line> perl script.pl arg1 arg2... Or add the line "#!/usr/bin/perl" to the start of the script if you are using unix/linux –Remember to set the correct file execution permissions before running it: chmod +x perlscript.pl, then ./perlscript.pl

5 Beginning Perl Every statement ends with a semicolon ";". Comments start with a hash "#" and run to the end of the line. Variables are assigned a value using the character "=". Variables are not statically typed, i.e., you do not have to declare what kind of data you want to hold in them. Variables are declared the first time you initialise them and they can be anywhere in the program.

6 Scalar Variables Contains a single piece of data. The '$' character shows that a variable is scalar. Scalar variables can store either a number or a string. A string is a chunk of text surrounded by quotes. $name = "paul"; $year = 1980; print "$name is born in $year"; output: paul is born in 1980

7 Arrays Variables (List) Ordered list of data, separated by commas. The '@' character shows that a variable is an array. Array of numbers: @year = (1980, 1975, 1999); Array of strings: @name = ("Paul", "Jake", "Tom"); Array of both strings and numbers: @address = (14,"Cleveland St","NSW",2030);

8 Retrieving data from Arrays Printing an array: @name = ("Paul", "Jake", "Tom"); print "@name"; Accessing individual elements in an array: @name = ("Paul", "Jake", "Tom"); print "$name[1]"; Why has @name become $name? –To access individual elements use the syntax $array[index], because a single element is a scalar. Why did $name[1] print the second element? –Perl, like Java and C, uses index 0 to represent the first element.

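9 Interesting things you can do with arrays @name = ("Paul", "Jake", "Tom"); print "@name"; output: Paul Jake Tom. An array used in scalar context gives its number of elements: scalar(@name) = 3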
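
The array behaviour described on the slides above can be tried with a short sketch like the one below. It assumes the array is called @name, as in the indexing example, and it uses "use strict" with "my" declarations, which the slides themselves do not use.

```perl
use strict;
use warnings;

my @name = ("Paul", "Jake", "Tom");

my $second = $name[1];        # index 1 is the second element (indexing starts at 0)
my $joined = "@name";         # interpolating an array joins its elements with spaces
my $count  = scalar(@name);   # scalar context gives the number of elements
my $last   = $name[$#name];   # $#name is the index of the last element

print "$second\n";   # Jake
print "$joined\n";   # Paul Jake Tom
print "$count\n";    # 3
print "$last\n";     # Tom
```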

10 Basic Arithmetic Operators + addition, - subtraction, * multiplication, / division, ++ adding one to the variable, -- subtracting one from the variable, $a += 2 incrementing the variable by 2, $b *= 3 tripling the value of the variable

11 Relational Operators
Comparison               Numeric   String
Equals                   ==        eq
Not equal                !=        ne
Less than                <         lt
Greater than             >         gt
Less than or equal       <=        le
Greater than or equal    >=        ge
Comparison               <=>       cmp
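
Choosing between the numeric and string columns matters because the same two values can compare differently. A minimal sketch, with the values 10 and 9 picked only for illustration:

```perl
use strict;
use warnings;

my ($x, $y) = (10, 9);

my $numeric_gt = ($x > $y)  ? 1 : 0;   # numeric comparison: 10 is greater than 9
my $string_gt  = ($x gt $y) ? 1 : 0;   # string comparison: "10" sorts before "9"
my $num_order  = $x <=> $y;            # numeric three-way comparison
my $str_order  = $x cmp $y;            # string three-way comparison

print "numeric > : $numeric_gt\n";   # 1
print "string gt : $string_gt\n";    # 0
print "<=> : $num_order\n";          # 1
print "cmp : $str_order\n";          # -1
```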

12 Control Operators - If if ( expression 1) {... } elsif (expression 2) {... } else {... }

13 Iteration Structures while (CONDITION) { BLOCK } until (CONDITION) {BLOCK} do {BLOCK} while (CONDITION) for (INITIALIZATION ; CONDITION ; Re-INITIALIZATION) {BLOCK} for VAR (LIST) {BLOCK} foreach VAR (LIST) {BLOCK}

14 Iteration Structures $i = 1; while($i <= 5){ print "$i\n"; $i++; } for($x=1; $x <=5; $x++) { print "$x\n"; } @number = (1,2,3,4,5); foreach $number (@number) { print "$number\n"; }

15 String Operations Strings can be concatenated with the dot operator: $lastname = "Harrison"; $firstname = "Paul"; $name = $firstname . $lastname; or by interpolation: $name = "$firstname$lastname"; String comparison can be done with the string relational operators: $string1 = "hello"; $string2 = "hello"; if ($string1 eq $string2) { print "they are equal"; } else { print "they are different"; }

16 String comparison using patterns The =~ operator returns true if the pattern between the / delimiters is found in the string. $string1 = "HELLO"; $string2 = "Hi there"; # test if the string contains the pattern EL if ($string1 =~ /EL/) { print "This string contains the pattern"; } else { print "No pattern found"; }

17 Functions in Perl No strict variable type restrictions during a function call, unlike Java, where a declaration has the form variable_type function (variable_type variable_name), e.g. public int function1 (int var1, char var2) { … } Perl provides many useful built-in functions to get you started. –chop - remove the last character of a string –chomp - remove the trailing newline from the end of a string, if there is one –push - append one or more elements to an array –pop - remove the last element of an array and return it –shift - remove the first element of an array and return it –s/// - replace a pattern with a string
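
A short sketch of the built-ins listed above. The variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $line = "hello\n";
chomp($line);                # removes the trailing newline only

my $word = "hello";
chop($word);                 # removes the last character, whatever it is

my @queue = ("Paul", "Jake");
push(@queue, "Tom");         # append to the end
my $last  = pop(@queue);     # remove and return the last element
my $first = shift(@queue);   # remove and return the first element

my $text = "Perl is slow";
$text =~ s/slow/fast/;       # replace the first match of the pattern

print "$line\n";    # hello
print "$word\n";    # hell
print "$last\n";    # Tom
print "$first\n";   # Paul
print "@queue\n";   # Jake
print "$text\n";    # Perl is fast
```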

18 Functions in Perl The "split" function breaks a given string into individual segments given a delimiter. split( /pattern/, string) returns a list. @array = split (/\s/, $string); # breaks the sentence into words @array = split (//, $string); # breaks the sentence into single characters @array = split (/,/, $string); # breaks the sentence into chunks separated by a comma. join ( "delimiter", array) takes a plain string as the delimiter, not a pattern, and returns a string.
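
A minimal sketch of split and join working together; the sample strings are invented. It uses /\s+/ rather than /\s/ for words, since a single-whitespace pattern would produce an empty field wherever two spaces appear in a row.

```perl
use strict;
use warnings;

my @fields   = split(/,/, "Paul,Jake,Tom");            # split on commas
my @chars    = split(//, "abc");                        # empty pattern: single characters
my @words    = split(/\s+/, "the quick  brown fox");    # runs of whitespace
my $rejoined = join(" and ", @fields);                  # the delimiter is a string

print scalar(@fields), "\n";   # 3
print "@chars\n";              # a b c
print scalar(@words), "\n";    # 4
print "$rejoined\n";           # Paul and Jake and Tom
```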

19 Functions in Perl A simple perl function sub sayHello { print "Hello!!\n"; } sayHello();

20 Executing functions in Perl Function arguments are stored automatically in a temporary array @_. sub sayHelloto { $count = @_; foreach $person (@_) { print "Hello $person\n"; } return $count; } @name = ("Paul", "Jake", "Tom"); sayHelloto(@name); sayHelloto("Mary", "Jane", "Tylor", 1,2,3);
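
One consequence of passing arguments through @_ is that arrays and scalars are flattened into a single list. The sketch below reuses the function name from the slide; the body is a guess at a reasonable implementation, not the author's original code.

```perl
use strict;
use warnings;

# Greets every argument and returns how many arguments were received.
sub sayHelloto {
    my @people = @_;               # copy the argument list out of @_
    foreach my $person (@people) {
        print "Hello $person\n";
    }
    return scalar(@people);
}

my @name = ("Paul", "Jake", "Tom");

my $n1 = sayHelloto(@name);                                # 3 arguments
my $n2 = sayHelloto(@name, "Mary");                        # array and scalar flatten to 4
my $n3 = sayHelloto("Mary", "Jane", "Tylor", 1, 2, 3);     # 6 arguments

print "$n1 $n2 $n3\n";   # 3 4 6
```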

21 Input / Output Perl allows you to read any input that is sent to your program via standard input by using the <STDIN> handle. One way of handling input via <STDIN> is to use a loop to process every line of input.

22 Input / Output Count the number of lines from standard input and print the line number together with the 1st word of each line. $count = 1; foreach $line (<STDIN>) { @array = split(/\s/, $line); print "$count $array[0]\n"; $count++; } Other I/O topics include reading and writing to files, Standard Error (STDERR) and Standard Output (STDOUT).
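
The line-counting idea can be tested without typing input by reading from an in-memory filehandle. This is a sketch under that assumption; the sample text is invented, and replacing $in with STDIN gives the standard-input version. It uses a while loop, which reads one line at a time, whereas foreach over a filehandle reads all lines into a list first.

```perl
use strict;
use warnings;

my $input = "alpha beta\ngamma delta\n";    # stand-in for standard input

open(my $in, '<', \$input) or die "cannot open in-memory input: $!";

my @report;
my $count = 1;
while (my $line = <$in>) {
    chomp($line);                           # drop the trailing newline
    my @array = split(/\s+/, $line);        # break the line into words
    push(@report, "$count $array[0]");      # line number and first word
    $count++;
}
close($in);

print "$_\n" for @report;
# 1 alpha
# 2 gamma
```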

23 Regular Expression A regular expression is a set of characters that specifies a pattern. Used for locating a piece of text in a file. Regular expression syntax allows the user to do a "wildcard" type search without necessarily specifying the characters literally. Available across OS platforms and programming languages.

24 Simple Regular Expression A simple regular expression contains the exact string to match. $string = "aaaabbbbccc"; if($string =~ /bc/){ print "found pattern\n"; } output: found pattern

25 The variable $& is automatically set to the matched pattern. $string = "aaaabbbbccc"; if($string =~ /bc/){ print "found pattern : $&\n"; } output: found pattern : bc

26 Simple Regular Expression What happens when you want to match a generalised pattern like an "a" followed by some "b"s and a single "c"? A literal pattern is not enough: $string = "aaaabbbbccc"; if($string =~ /abbc/){ print "found pattern : $&\n"; } else {print "nothing found\n"; } output: nothing found

27 Regular Expression - Quantifiers We can specify the number of times we want to see a specific character in a regular expression by adding an operator after the character. * (asterisk) matches zero or more copies of a specific character. + (plus) matches one or more copies of a specific character.

28 Regular Expression - Quantifiers @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c"); foreach $string (@array) { if($string =~ /ab*c/){ print "$string "; } } output: ac abc abbc abbbc

29 Regular Expression - Quantifiers @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c");
Regular Exp    Matched pattern
abc            abc
ab*c           ac abc abbc abbbc
ab+c           abc abbc abbbc
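
The quantifier table can be checked with grep, which returns the elements of a list that match a pattern. A minimal sketch using the same test strings:

```perl
use strict;
use warnings;

my @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c");

my @star = grep { /ab*c/ } @array;   # "a", zero or more "b", then "c"
my @plus = grep { /ab+c/ } @array;   # "a", one or more "b", then "c"

print "@star\n";   # ac abc abbc abbbc
print "@plus\n";   # abc abbc abbbc
```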

30 Regular Expression - Anchors You can place anchors before and after the pattern to specify where in the string it must match. ^ restricts the match to the beginning of a line. $ restricts the match to the end of a line.

31 Regular Expression - Anchors @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c");
Regular Exp    Matched pattern
^bc            bcf
^b*c           bbc bcf c
^b*c$          bbc c
b*c$           ac abc abbc abbbc bbc c
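
The effect of each anchor can be verified the same way. A short sketch that runs the anchored patterns over the test strings from the slide:

```perl
use strict;
use warnings;

my @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c");

my @starts_bc = grep { /^bc/ }   @array;   # must begin with "bc"
my @starts    = grep { /^b*c/ }  @array;   # begins with optional "b"s then "c"
my @whole     = grep { /^b*c$/ } @array;   # the whole string is "b"s then "c"
my @ends      = grep { /b*c$/ }  @array;   # must end with "c"

print "@starts_bc\n";   # bcf
print "@starts\n";      # bbc bcf c
print "@whole\n";       # bbc c
print "@ends\n";        # ac abc abbc abbbc bbc c
```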

32 Regular Expression - Range […] is used to identify the exact characters you are searching for. [0123456789] will match a single numeric character. [0-9] will also match a single numeric character. [A-Za-z] will match a single letter of either case.

33 Regular Expression - Range Search for a word that –starts with the uppercase T –has a lowercase letter as its second letter –has a lowercase vowel as its third letter –is 3 letters long followed by a space. Regular expression : "^T[a-z][aeiou] " Note : [z-a] is backwards and does not work. Note : [A-z] does match upper and lowercase letters, but also the 6 additional characters between the upper and lowercase letters in the ASCII chart: [ \ ] ^ _ `
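
A sketch that tries the range pattern on a few invented phrases and shows the [A-z] pitfall with an underscore:

```perl
use strict;
use warnings;

my @phrases = ("The cat", "Tea time", "Toe nail", "the end", "THe end", "Try it");

# Uppercase T, any lowercase letter, a lowercase vowel, then a space.
my @matched = grep { /^T[a-z][aeiou] / } @phrases;

my $wide_range   = ("_" =~ /[A-z]/)    ? 1 : 0;   # underscore lies between Z and a
my $strict_range = ("_" =~ /[A-Za-z]/) ? 1 : 0;   # letters only

print join(", ", @matched), "\n";   # The cat, Tea time, Toe nail
print "$wide_range\n";              # 1
print "$strict_range\n";            # 0
```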

34 Regular Expression - Others Match a single character (non specific) with "." (dot): a.c matches any string with "a" followed by one character and then "c". Specify the number of repetitions with { and }: [a-z]{4,6} matches four, five or six lowercase letters. Remember patterns with ( ) and \1: regular expressions allow you to remember and recall patterns. Note : the backslashed forms \{ \} and \( \) are the basic regular expression syntax used by tools such as grep and sed; Perl uses the unescaped forms.
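
A minimal sketch of repetition counts and remembered patterns in Perl's own syntax; the test words are invented.

```perl
use strict;
use warnings;

# {4,6} is greedy, so it takes up to six letters when they are available.
my ($run) = "abcdefgh" =~ /([a-z]{4,6})/;

# Anchored, so a three-letter word is too short to match.
my $too_short = ("abc" =~ /^[a-z]{4,6}$/) ? 1 : 0;

# (.) remembers one character and \1 requires the same character again.
my ($doubled)  = "hello" =~ /(.)\1/;
my $no_doubles = ("world" =~ /(.)\1/) ? 1 : 0;

print "$run\n";          # abcdef
print "$too_short\n";    # 0
print "$doubled\n";      # l
print "$no_doubles\n";   # 0
```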

35 RegExp problem and strategies You tend to match more lines than desired. A.*B matches AAB as well as AAAAAAACCCAABBBBAABBB. Strategies: know what you want to match, know what you don't want to match, write a pattern out to describe what you want to match, and test the pattern. More info : type "man re_syntax" in a unix shell, or "perldoc perlre" for Perl's own regular expression syntax.
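
One way to see the over-matching problem is to capture what A.*B actually grabs. The sketch below also shows the non-greedy form .*?, which is not mentioned on the slide and is offered only as one possible way to narrow the match.

```perl
use strict;
use warnings;

my $text = "AAAAAAACCCAABBBBAABBB";

my ($greedy) = $text =~ /(A.*B)/;    # .* runs to the last B it can reach
my ($lazy)   = $text =~ /(A.*?B)/;   # .*? stops at the first B

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print length($greedy), " vs ", length($lazy), " characters\n";
```

Here the greedy match is the entire string, while the non-greedy match ends at the first B.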

36 Example problem - Background Biologists are interested in analysing proteins that are from a particular biochemical enzyme class: CDK1, CDK2 or CDK3. In addition, biologists would like to extract those protein sequences that contain the amino acid pattern (motif) that represents a particular virus binding site: Serine, Glutamic Acid, (multiple occurrence of) Alanine, Glycine. Serine = S, Glutamic Acid = E, Alanine = A, Glycine = G

37 Example Problem - Dataset The dataset was downloaded from an online phosphorylation protein database. Contains protein entries in one file. One entry per line, terminated by a carriage return character. Comma delimited entries –field1, field2, field3, field4, …..

38 Example Problem - Dataset fields 1. acc - unique database ID 2. sequence - amino acid sequence for the protein 3. position - position along sequence that is phosphorylated 4. code - amino acid that is phosphorylated 5. pmid - unique protein ID linked to an international protein database 6. kinase - enzyme class of this protein 7. source - where this protein was found 8. entry_date - date entered into the database

40 The task 1. Extract those entries that have the string CDK1, CDK2 or CDK3 in the enzyme column. 2. Within our extracted entries, search and match those sequences that contain the virus binding pattern. 3. Print out the database ID of the positively matched entries.

41 Problem: Divide and conquer 1. Select the enzyme class CDK1, CDK2 or CDK3. 2. Extract those proteins with the pattern Serine, Glutamic Acid, (multiple occurrence of) Alanine, Glycine. Serine = S, Glutamic Acid = E, Alanine = A, Glycine = G
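
One possible way to put the two steps together is sketched below. The records are invented for illustration and only follow the field order given on the dataset slide; a real script would read the downloaded file line by line instead. The motif is written as SEA+G on the assumption that "multiple occurrence of" Alanine means one or more; if it means at least two, the pattern would be SEA{2,}G.

```perl
use strict;
use warnings;

# Hypothetical entries: acc, sequence, position, code, pmid, kinase, source, entry_date
my @lines = (
    "ID001,MKSEAAGLT,3,S,1111,CDK1,human,2005-01-01",
    "ID002,MKSEGLT,3,S,2222,CDK2,mouse,2005-01-02",
    "ID003,TTSEAGPP,3,S,3333,PKA,human,2005-01-03",
    "ID004,QSEAAAGR,2,S,4444,CDK3,rat,2005-01-04",
);

my @hits;
foreach my $line (@lines) {
    # Only the first six fields are needed for this task.
    my ($acc, $sequence, $position, $code, $pmid, $kinase) = split(/,/, $line);

    next unless $kinase   =~ /CDK[123]/;   # step 1: enzyme class CDK1, CDK2 or CDK3
    next unless $sequence =~ /SEA+G/;      # step 2: S, E, one or more A, then G

    push(@hits, $acc);                     # step 3: keep the database ID
}

print "$_\n" for @hits;
# ID001
# ID004
```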

42 Interesting parts of Perl not covered in this lecture Hashes –A unique key that is linked to a value, which can be a string, a number, a list, or a list of further hashes: "Lecture 1002" ---> "Thur 3pm" "Lecture 1002" ---> 25 "Lecture 1002" ---> [name1, name2, … ] "Lecture 1002" ---> [{name1},{name2}.. ] {name1} ---> student ID {name2} ---> student ID
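
For the curious, the first and third mappings above might look like this in Perl. The hash names %time and %students are made up for this sketch.

```perl
use strict;
use warnings;

# A key linked to a simple string value.
my %time = ("Lecture 1002" => "Thur 3pm");

# A key linked to a list, stored as an array reference.
my %students = ("Lecture 1002" => ["name1", "name2"]);

my $when      = $time{"Lecture 1002"};
my $enrolled  = scalar(@{ $students{"Lecture 1002"} });
my $first_one = $students{"Lecture 1002"}[0];

print "$when\n";        # Thur 3pm
print "$enrolled\n";    # 2
print "$first_one\n";   # name1
```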

43 Interesting parts of Perl not covered in this lecture CGI (Common Gateway Interface) –Creation of dynamic web pages using perl –CGI, PHP, JavaScript, Java Applet, etc. Object Oriented Perl. Perl books & references to explore at your own curiosity –http://perldoc.perl.org/ –http://www.oreilly.com/pub/topic/perl –Book: O'Reilly - Perl Cookbook - This will save you someday –Book: O'Reilly - Mastering Regular Expressions

