Programming and Perl for Bioinformatics Part III.


1 Programming and Perl for Bioinformatics Part III

2 Basic Data Types
Perl has three basic data types:
  scalar
  array (list)
  associative array (hash)

3 Associative Arrays/Hashes
List of scalar values (like an array)
Elements are referred to by key, not by index number
Elements are stored as a list of key-value pairs
    %threeletter = ('A','ALA','V','VAL','L','LEU');
    #               key value key value key value
    print $threeletter{'A'};   # "ALA"
    print $threeletter{'L'};   # ?
exists checks whether a specific hash key exists
    if ($threeletter{'E'}) { print $threeletter{'E'}; }   # ?
    print "Exists\n"  if exists  $hash{$key};
    print "Defined\n" if defined $hash{$key};
    print "True\n"    if $hash{$key};
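The three tests on the slide (exists, defined, truth) give different answers when a key is stored with a false or undefined value. The following is a minimal sketch of that difference; the hash %status, its keys, and the helper describe() are hypothetical names made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical hash for illustration: one true value, one false value,
# one undefined value. The key 'missing' is never stored.
my %status = (
    present => 'ALA',
    zero    => 0,
    undef_v => undef,
);

# Report which of the three tests succeed for a given key.
sub describe {
    my ($key) = @_;
    my @passed;
    push @passed, 'exists'  if exists $status{$key};
    push @passed, 'defined' if defined $status{$key};
    push @passed, 'true'    if $status{$key};
    return join ',', @passed;
}

foreach my $key (qw(present zero undef_v missing)) {
    print "$key: ", describe($key), "\n";
}
```

A key can therefore exist without being defined, and be defined without being true, which is why a plain if ($hash{$key}) test is not a reliable way to ask whether a key is in the hash.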

4 Getting all keys and values in a hash
    %threeletter = ('A','ALA','V','VAL','L','LEU');
keys returns a list of all keys
values returns a list of all values
each returns one key-value pair each time it is called
    ($key, $val) = each %threeletter;
Unlike an array, a hash is not an ordered list (the order of key-value pairs is determined by the Perl interpreter)
    foreach $k ( keys %threeletter ) { print $k; }
    # Might visit the keys in the order A, L, V rather than A, V, L
    # (the keys are not sorted)
    foreach $v ( values %threeletter ) { print $v; }   # ?
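Because the key order is up to the interpreter, a common way to get repeatable output is to sort the keys and look the values up through them. A short sketch using the slide's %threeletter hash; the array names @sorted_keys and @sorted_values are introduced here for illustration.

```perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# keys returns the keys in no particular order; sort makes it repeatable.
my @sorted_keys = sort keys %threeletter;

# Look the values up through the sorted keys so the two lists line up.
my @sorted_values = map { $threeletter{$_} } @sorted_keys;

foreach my $k (@sorted_keys) {
    print "$k => $threeletter{$k}\n";
}
```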

5 Associative Arrays
Some common functions:
    keys(%hash)           # returns a list of all the keys
    values(%hash)         # returns a list of all the values
    each(%hash)           # each time this is called, it returns a
                          # 2-element list consisting of the next
                          # key/value pair in the associative array
    delete($hash{$key})   # removes the pair associated with $key
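As a rough illustration of each and delete working together, the sketch below walks the whole hash with each and then removes one pair. The variables %seen, $removed and $remaining are hypothetical names used only for this example.

```perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# each returns the next (key, value) pair, and an empty list when the
# hash has been fully traversed, which ends the loop.
my %seen;
while (my ($key, $val) = each %threeletter) {
    $seen{$key} = $val;
}

# delete removes the pair and returns the value that was stored.
my $removed   = delete $threeletter{'V'};
my $remaining = scalar keys %threeletter;
print "Removed $removed; $remaining pairs left\n";
```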

6 More on Perl
Subroutines and Functions
  A way to organize a program
  Wrap up a block of code
  Have a name
  Provide a way to pass values to the block and report back the results
Regular Expressions

7 Basics about Subroutines
    # define a subroutine
    sub myblock {
        my ($arg1, $arg2, $arg3, …, $argN) = @_;
        # @_ is a special variable containing the arguments
        print "Please enter something: ";
    }
    # function call
    myblock($arg1, $arg2, …, $argN);
Example
    sub add8A {
        my ($rna) = @_;
        $rna .= "AAAAAAAA";
        return $rna;
    }
    # the original rna
    $rna = "CGAAUCUAGGAU";
    $longer_rna = add8A($rna);
    print "I added 8 As to $rna to get $longer_rna.\n";

8 More examples
    sub denaturizing {
        my (@products) = @_;
        my @strands = ();
        foreach my $pairs (@products) {
            # each product holds two strands separated by whitespace
            my ($A, $B) = split /\s/, $pairs;
            push @strands, $A, $B;
        }
        return @strands;
    }
    # templates are in the form "A B". Ex. "ACGT TGCA"
    @Denatured = denaturizing(@PCRproducts);
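The slide never defines @PCRproducts, so here is a runnable sketch of the same subroutine with made-up input. The two template strings are invented for illustration, and split /\s+/ is used instead of /\s/ on the assumption that strands may be separated by more than one whitespace character.

```perl
use strict;
use warnings;

# Split each "A B" template into its two strands and return them all
# as one flat list.
sub denaturizing {
    my (@products) = @_;
    my @strands;
    foreach my $pair (@products) {
        my ($first, $second) = split /\s+/, $pair;
        push @strands, $first, $second;
    }
    return @strands;
}

# Hypothetical input data, invented for illustration.
my @PCRproducts = ('ACGT TGCA', 'GGCC CCGG');
my @Denatured   = denaturizing(@PCRproducts);
print "@Denatured\n";
```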

9 Variable Scope
A variable $a is used both in the subroutine and in the main part of the program.
    $a = 0;
    print "$a\n";
    sub changeA {
        $a = 1;
    }
    print "$a\n";
    changeA();
    print "$a\n";
The value of $a is printed three times. Can you guess what values are printed?
$a is a global variable.
With use strict and my:
    use strict;
    my $a = 0;
    print "$a\n";
    sub changeA {
        my $a = 1;
    }
    print "$a\n";
    changeA();
    print "$a\n";
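The contrast between the two versions can be sketched in one script. This is an illustration rather than the slide's own code: the names $global, $lexical, change_global and change_lexical are hypothetical, and the values are collected into arrays so they can be compared side by side.

```perl
use strict;
use warnings;

# Package (global) variable: the subroutine changes the same variable
# that the main program reads.
our $global = 0;
sub change_global { $global = 1; return; }

# Lexical variable: "my" inside the sub creates a new variable that
# exists only within the sub body, so the outer one is untouched.
my $lexical = 0;
sub change_lexical { my $lexical = 1; return; }

my @global_trace = ($global);
change_global();
push @global_trace, $global;

my @lexical_trace = ($lexical);
change_lexical();
push @lexical_trace, $lexical;

print "global:  @global_trace\n";    # global:  0 1
print "lexical: @lexical_trace\n";   # lexical: 0 0
```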

10 Ex: What would be the output?
    #!/usr/bin/perl -w
    $dna = 'AAAAA';
    $result = A_to_T($dna);
    print "I changed all the A's in $dna to T's and got $result\n\n";
    #############################################
    # Subroutines
    sub A_to_T {
        my($input) = @_;
        $dna = $input;
        $dna =~ s/A/T/g;
        return $dna;
    }
Output?
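In the exercise, the subroutine assigns to $dna without my, so it overwrites the global $dna and both values in the printed sentence come out as TTTTT. One possible fix, sketched below, is to do the substitution on a lexical copy; the variable name $copy is an assumption added for this example.

```perl
use strict;
use warnings;

# Same substitution as in the exercise, but performed on a lexical
# copy so the caller's variable is left alone.
sub A_to_T {
    my ($input) = @_;
    my $copy = $input;
    $copy =~ s/A/T/g;
    return $copy;
}

my $dna    = 'AAAAA';
my $result = A_to_T($dna);
print "I changed all the A's in $dna to T's and got $result\n";
```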

11 Regular Expressions
Regular expressions: a language for specifying text strings
Regular expressions are a mechanism for specifying character patterns
Useful for
  Finding files by name
  Finding text in a file
  Finding (or not finding) interesting text in a string
  Text-based search and replace
  Finding and extracting text

12 Pattern Finding
Problem: find an ORF in a nucleotide sequence
Look for start (ATG) and stop codons (TAA, TAG, TGA)
Pattern search operator: m// or //
$string =~ /pattern/ returns true if the pattern matches somewhere in $string, false otherwise
Example:
    $dna = "GATGCCATGACACTGTTCA";
    if ($dna =~ /ATG/) {
        print "starting codon is there\n";
    } else {
        print "no starting codon!\n";
    }
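Beyond a yes/no answer, the match operator can also report how many matches there are and where the first one starts. A small sketch on the slide's sequence; @starts and $first_offset are hypothetical names, and offsets are counted from 0.

```perl
use strict;
use warnings;

my $dna = "GATGCCATGACACTGTTCA";

# In list context with /g, the match operator returns every match.
my @starts = $dna =~ /ATG/g;

# After a successful match, $-[0] holds the offset where it began.
my $first_offset;
if ($dna =~ /ATG/) {
    $first_offset = $-[0];
}
print scalar(@starts), " start codon(s); first at offset $first_offset\n";
```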

13 Regular Expressions
Optional characters: ?, * and +
  ? (0 or 1)       /colou?r/ matches color or colour
  * (0 or more)    /oo*h!/ matches oh! or ooh! or ooooh!
  + (1 or more)    /o+h!/ matches oh! or ooh! or ooooh!
Wild card: .
  /beg.n/ matches begin or began or begun
The * and + operators are named after Stephen Cole Kleene.
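The quantifiers above can be checked directly in Perl. The helper whole_match() below is a hypothetical name introduced for this sketch; it anchors the pattern at both ends so that only complete words count as matches.

```perl
use strict;
use warnings;

# Return 1 if the whole word matches the pattern, 0 otherwise.
sub whole_match {
    my ($word, $pattern) = @_;
    return $word =~ /^(?:$pattern)$/ ? 1 : 0;
}

my @checks = (
    ['color',  'colou?r'],   # ? allows the u to be absent
    ['colour', 'colou?r'],
    ['oh!',    'oo*h!'],     # * allows zero extra o's
    ['h!',     'o+h!'],      # + requires at least one o
    ['begun',  'beg.n'],     # . stands for any one character
);
foreach my $check (@checks) {
    my ($word, $pattern) = @$check;
    print "$word vs /$pattern/: ", whole_match($word, $pattern), "\n";
}
```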

14 Common Regular Expressions
White-space characters: \t (tab), \n (newline), \r (return)
\s: match a whitespace character
x: match the character 'x'
.: any character except newline
^r: match r at the beginning of a line
r$: match r at the end of a line
r|s: match either r or s
(r): group characters (to be saved in $1, $2, etc.)
[xyz]: character class; in this case, matches either an 'x', a 'y', or a 'z'
[abj-oZ]: character class with a range in it; matches 'a', 'b', any letter from 'j' through 'o', or 'Z'
r*: zero or more r's, where r is any regular expression
r+: one or more r's
r?: zero or one r's (i.e., an optional r)
{name}: expansion of the "name" definition (lex/flex notation, not a Perl regex construct)
rs: RE r followed by RE s (i.e., concatenation)
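Several of the constructs in this list (anchors, grouping, alternation, character classes) are sketched together below. The input line "seq42\tACGT" and all variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input line: a name, a tab, and a run of bases.
my $line = "seq42\tACGT";

# ^ and $ anchor the match; (r) groups are saved in $1 and $2.
my ($name, $bases);
if ($line =~ /^(\w+)\s+([ACGT]+)$/) {
    ($name, $bases) = ($1, $2);
}

# r|s alternation inside a group, anchored at the end of the string.
my $ends_with_stop = "ATGCCCTAA" =~ /(TAA|TAG|TGA)$/ ? 1 : 0;

# [abj-oZ]: 'a', 'b', any letter from 'j' through 'o', or 'Z'.
my @hits = grep { /^[abj-oZ]$/ } qw(a c k o p Z);

print "$name $bases $ends_with_stop @hits\n";
```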

15 Exercise
Ex1:
    $dna = "AGGCTCGTACGACG";
    if ( $dna =~ /CT[CGT]ACG/ ) {
        print "I found the motif!!\n";   # ?
    }
Ex2: Find an ORF in a nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA))
    $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
    ?

16 Exercise
Ex2: Find an ORF in a nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA))
    $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
    if ($dna =~ m/(atg(...)*((tag)|(taa)|(tga)))/) {
        print $1, "\n";
    } else {
        print "does not exist!\n";
    }
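The (...)* in the pattern above is greedy, so when several in-frame stop codons follow the start codon it extends to the last one. A possible variant, sketched below with the sequence as it appears on slide 15, uses a non-greedy quantifier to stop at the nearest in-frame stop codon. The subroutine name first_orf, the /i flag, and the restriction to the letters a, c, g, t are assumptions added for this example.

```perl
use strict;
use warnings;

# Return the first ORF found: a start codon, the fewest possible
# whole codons, then an in-frame stop codon. Returns undef in scalar
# context if the sequence contains no such stretch.
sub first_orf {
    my ($dna) = @_;
    if ($dna =~ /(atg(?:[acgt]{3})*?(?:taa|tag|tga))/i) {
        return $1;
    }
    return;
}

my $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
my $orf = first_orf($dna);
print defined $orf ? "$orf\n" : "does not exist!\n";
```

Note that the tga inside cctgag is out of frame relative to the atg, so it is skipped and the match ends at the in-frame taa.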

