Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Introduction to Perl Part II. 2 Associative arrays or Hashes Like arrays, but instead of numbers as indices can use = (‘john’, ‘steve’,

Similar presentations


Presentation on theme: "1 Introduction to Perl Part II. 2 Associative arrays or Hashes Like arrays, but instead of numbers as indices can use = (‘john’, ‘steve’,"— Presentation transcript:

1 1 Introduction to Perl Part II

2 2 Associative arrays or Hashes Like arrays, but instead of numbers as indices can use strings @array = (‘john’, ‘steve’, ‘aaron’, ‘max’, ‘juan’, ‘sue’) %hash = ( ‘apple’ => 12, ‘pear’ => 3, ‘cherry’ =>30, ‘lemon’ => 2, ‘peach’ => 6, ‘kiwi’ => 3); 102345 ArrayHash pear apple cherry lemon peach kiwi ‘john’ ‘steve’ ‘aaron’ ‘max’ ‘juan’ ‘sue’ 12 3 30 2 6 3

3 3 Using hashes { } operator Set a value – $hash{‘cherry’} = 10; Access a value – print $hash{‘cherry’}, “\n”; Remove an entry – delete $hash{‘cherry’};

4 4 Get the Keys keys function will return a list of the hash keys @keys = keys %fruit; for my $key ( keys %fruit ) { print “$key => $hash{$key}\n”; } Would be ‘apple’, ‘pear’,... Order of keys is NOT guaranteed!

5 5 Get just the values @values = values %hash; for my $val ( @values ) { print “val is $val\n”; }

6 6 Iterate through a set Order is not guaranteed! while( my ($key,$value) = each %hash){ print “$key => $value\n”; }

7 7 Subroutines Set of code that can be reused Can also be referred to as procedures and functions

8 8 Defining a subroutine sub routine_name { } Calling the routine – routine_name; – &routine_name; (& is optional)

9 9 Passing data to a subroutine Pass in a list of data – &dosomething($var1,$var2); – sub dosomething { my ($v1,$v2) = @_; } sub do2 { my $v1 = shift @_; my $v2 = shift; }

10 10 Returning data from a subroutine The last line of the routine set the return value sub dothis { my $c = 10 + 20; } print dothis(), “\n”; Can also use return specify return value and/or leave routine early

11 sub is_stopcodon { my $val = shift @_; if( length($val) != 3 ) { return -1; } elsif( $val eq ‘TAA’ || $val eq ‘TAG’ || $val eq ‘TGA’ ) { return 1; } else { return 0; } Write subroutine which returns true if codon is a stop codon (for standard genetic code). -1 on error, 1 on true, 0 on false

12 12 Context array versus scalar context my $length = @lst; my ($first) = @lst; Want array used to report context subroutines are called in Can force scalar context with scalar my $len = scalar @lst;

13 13 subroutine context sub dostuff { if( wantarray ) { print “array/list context\n”; } else { print “scalar context\n”; } dostuff(); # scalar my @a = dostuff(); # array my %h = dostuff(); # array my $s = dostuff(); # scalar

14 14 Why do you care about context? sub dostuff { my @r = (10,20); return @r; } my @a = dostuff(); # array my %h = dostuff(); # array my $s = dostuff(); # scalar print “@a\n”; # 10 20 print join(“ “, keys %h),”\n”; # 10 print “$s\n”; # 2

15 15 References Are “pointers” the data object instead of object itsself Allows us to have a shorthand to refer to something and pass it around Must “dereference” something to get its actual value, the “reference” is just a location in memory

16 16 Reference Operators \ in front of variable to get its memory location – my $ptr = \@vals; [ ] for arrays, { } for hashes Can assign a pointer directly – my $ptr = [ (‘owlmonkey’, ‘lemur’)]; my $hashptr = { ‘chrom’ => ‘III’, ‘start’ => 23};

17 17 Dereferencing Need to cast reference back to datatype my @list = @$ptr; my %hash = %$hashref; Can also use ‘{ }’ to clarify – my @list = @{$ptr}; – my %hash = %{$hashref};

18 18 Really they are not so hard... my @list = (‘fugu’, ‘human’, ‘worm’, ‘fly’); my $list_ref = \@list; my $list_ref_copy = [@list]; for my $item ( @$list_ref ) { print “$item\n”; }

19 19 Why use references? Simplify argument passing to subroutines Allows updating of data without making multiple copies What if we wanted to pass in 2 arrays to a subroutine? sub func { my (@v1,@v2) = @_; } How do we know when one stops and another starts?

20 20 Why use references? Passing in two arrays to intermix. sub func { my ($v1,$v2) = @_; my @mixed; while( @$v1 || @$v2 ) { push @mixed, shift @$v1 if @$v1; push @mixed, shift @$v2 if @$v2; } return \@mixed; }

21 21 References also allow Arrays of Arrays my @lst; push @lst, [‘milk’, ‘butter’, ‘cheese’]; push @lst, [‘wine’, ‘sherry’, ‘port’]; push @lst, [‘bread’, ‘bagels’, ‘croissants’]; my @matrix = [ [1, 0, 0], [0, 1, 0], [0, 0, 1] ];

22 22 Hashes of arrays $hash{‘dogs’} = [‘beagle’, ‘shepherd’, ‘lab’]; $hash{‘cats’} = [‘calico’, ‘tabby’, ‘siamese’]; $hash{‘fish’} = [‘gold’,’beta’,’tuna’]; for my $key (keys %hash ) { print “$key => “, join(“\t”, @{$hash{$key}}), “\n”; }

23 23 More matrix use my @matrix; open(IN, $file) || die $!; # read in the matrix while( ) { push @matrix, [split]; } # data looks like # GENENAME EXPVALUE STATUS # sort by 2nd column for my $row ( sort { $a->[1] $b->[1] } @matrix ) { print join(“\t”, @$row), “\n”; }

24 24 Funny operators my @bases = qw(C A G T) my $msg = <<EOF This is the message I wanted to tell you about EOF ;

25 25 Part of “amazing power” of Perl Allow matching of patterns Syntax can be tricky Worth the effort to learn! Regular Expressions

26 26 if( $fruit eq ‘apple’ || $fruit eq ‘Apple’ || $fruit eq ‘pear’) { print “got a fruit $fruit\n”; } if( $fruit =~ /[Aa]pple|pear/ ){ print “matched fruit $fruit\n”; } A simple regexp

27 27 use the =~ operator to match if( $var =~ /pattern/ ) {} - scalar context my ($a,$b) = ( $var =~ /(\S+)\s+(\S+)/ ); if( $var !~ m// ) { } - true if pattern doesn’t m/REGEXPHERE/ - match s/REGEXP/REPLACE/ - substitute tr/VALUES/NEWVALUES/ - translate Regular Expression syntax

28 28 Search a string for a pattern match If no string is specified, will match $_ Pattern can contain variables which will be interpolated (and pattern recompiled) while( ) { if( /A$num/ ) { $num++ } } while( ) { if( /A$num/o ) { $num++ } } m// operator (match)

29 29 m// -if specify m, can replace / with anything e.g. m##, m[], m!! /i - case insensitive /g - global match (more than one) /x - extended regexps (allows comments and whitespace) /o - compile regexp once Pattern extras

30 30 \s - whitespace (tab,space,newline, etc) \S - NOT whitespace \d - numerics ([0-9]) \D - NOT numerics \t, \n - tab, newline. - anything Shortcuts

31 31 + - 1 -> many (match 1,2,3,4,... instances ) /a+/ will match ‘a’, ‘aa’, ‘aaaaa’ * - 0 -> many ? - 0 or 1 {N}, {M,N} - match exactly N, or M to N [], [^] - anything in the brackets, anything but what is in the brackets Regexp Operators

32 32 Things in parentheses can be retrieved via variables $1, $2, $3, etc for 1st,2nd,3rd matches if( /(\S+)\s+([\d\.\+\-]+)/) { print “$1 --> $2\n”; } my ($name,$score) = ($var =~ /(\S+)\s+([\d\.\+\-]+)/); Saving what you matched

33 33 Simple Regexp my $line = “aardvark”; if( $line =~ /aa/ ) { print “has double a\n” } if( $line =~ /(a{2})/ ) { print “has double a\n” } if( $line =~ /(a+)/ ) { print “has 1 or more a\n” }

34 34 Matching gene names # File contains lots of gene names # YFL001C YAR102W - yeast ORF names # let-1, unc-7 - worm names http://biosci.umn.edu/CGC/Nomenclature/nomenguid.htm # ENSG000000101 - human Ensembl gene names while( ) { if( /^(Y([A-P])(R|L)(\d{3})(W|C)(\-\w)?)/ ) { printf “yeast gene %s, chrom %d,%s arm, %d %s strand\n”, $1, (ord($2)-ord(‘A’))+1, $3, $4; } elsif( /^(ENSG\d+)/ ) { print “human gene $1\n” } elsif( /^(\w{3,4}\-\d+)/ ) { print “worm gene $1\n”; } }

35 35 A parser for output from a gene prediction program Putting it all together

36 GlimmerM (Version 3.0) Sequence name: BAC1Contig11 Sequence length: 31797 bp Predicted genes/exons Gene Exon Strand Exon Exon Range Exon # # Type Length 1 1 + Initial 13907 13985 79 1 2 + Internal 14117 14594 478 1 3 + Internal 14635 14665 31 1 4 + Internal 14746 15463 718 1 5 + Terminal 15497 15606 110 2 1 + Initial 20662 21143 482 2 2 + Internal 21190 21618 429 2 3 + Terminal 21624 21990 367 3 1 - Single 25351 25485 135 4 1 + Initial 27744 27804 61 4 2 + Internal 27858 27952 95 4 3 + Internal 28091 28576 486 4 4 + Internal 28636 28647 12 4 5 + Internal 28746 28792 47 4 6 + Terminal 28852 28954 103 5 3 - Terminal 29953 30037 85 5 2 - Internal 30152 30235 84 5 1 - Initial 30302 30318 17

37 37 while(<>) { if(/^(Glimmer\S*)\s+\((.+)\)/ { $method = $1; $version = $2; } elsif( /^(Predicted genes)|(Gene)|(\s+\#)/ || /^\s+$/ ) { next } elsif( # glimmer 3.0 output /^\s+(\d+)\s+ # gene num (\d+)\s+ # exon num ([\+\-])\s+ # strand (\S+)\s+ # exon type (\d+)\s+(\d+) # exon start, end \s+(\d+) # exon length /ox ) { my ($genenum,$exonnum,$strand,$type,$start,$end, $len) = ( $1,$2,$3,$4,$5,$6,$7); } } Putting it together

38 38 Same as m// but will allow you to substitute whatever is matched in first section with value in the second section $sport =~ s/soccer/football/ $addto =~ s/(Gene)/$1-$genenum/; s/// operator (substitute)

39 39 Match and replace what is in the first section, in order, with what is in the second. lowercase - tr/[A-Z]/[a-z]/ shift cipher - tr/[A-Z]/[B-ZA]/ revcom - $dna =~ tr/[ACGT]/[TGCA]/; $dna = reverse($dna); The tr/// operator (translate)

40 40 aMino - {A,C}, Keto - {G,T} puRines - {A,G}, prYmidines - {C,T} Strong - {G,C}, Weak - {A,T} H (Not G)- {ACT}, B (Not A), V (Not T), D(Not C) $str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/ tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/; (aside) DNA ambiguity chars


Download ppt "1 Introduction to Perl Part II. 2 Associative arrays or Hashes Like arrays, but instead of numbers as indices can use = (‘john’, ‘steve’,"

Similar presentations


Ads by Google