
8.1 Common Errors – Exercise #3 Assuming something on the variable part of the input file. When parsing a format file (genebank, fasta or any other format),


1 8.1 Common Errors – Exercise #3
Assuming something about the variable part of the input file. When parsing a formatted file (GenBank, FASTA or any other format), you should rely only on the format for parsing and not on the variable part of the input. Thus parsing by features such as these is wrong:
–Assuming each line in the title will start with a lowercase letter
–Assuming the title will be composed of only 2 lines
It is legitimate to rely on the presence of the words 'TITLE' and 'JOURNAL' for the parsing, as these are part of the format.
Reading the whole file at once (@all_lines = <IN>; where IN is the input filehandle). This is risky if the file is very large. When we do not need all the lines of the file at once, we use $line = <IN> in a 'while' loop.
Performing an action on a variable without checking that it is defined. This can generate errors in some cases.
Use of functions/features not taught in class.
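The advice on reading line by line and checking that the line is defined can be sketched as follows. This is a minimal illustration, not the course solution: the helper name count_keyword_lines and the sample records are made up, and the sample is read from an in-memory string so the sketch runs without an input file.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines that start with a given format keyword,
# reading one line at a time so the whole file is never held in memory.
sub count_keyword_lines {
    my ($fh, $keyword) = @_;
    my $count = 0;
    my $line = <$fh>;
    while (defined $line) {              # stop cleanly at end of file
        chomp $line;
        # Rely only on the format keyword, never on the variable text after it
        if (length($line) >= length($keyword)
            && substr($line, 0, length($keyword)) eq $keyword) {
            $count++;
        }
        $line = <$fh>;
    }
    return $count;
}

# Usage on an in-memory sample (made-up records, not real data)
my $sample = "LOCUS       AAA00001\nFEATURES\nLOCUS       AAA00002\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
print count_keyword_lines($fh, "LOCUS"), "\n";
close($fh);
```

Only one line is held in $line at any time, so memory use does not grow with the size of the file.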

2 8.2 Solution to HW3 Q#6
For each protein record, print the first line (the LOCUS line) followed by a sorted list of its reference TITLEs.
1. Read the file
2. If reached a LOCUS line, print it
3. If reached a TITLE line, start an inner loop until reaching the JOURNAL line (to take the full title)
4. Push the entire TITLE to the titles array
5. If reached a FEATURES line, print the title array and initialize...

3 8.3
# IN is the input filehandle (handle name assumed), already opened on the input file
my $title = "";
my @titleArray = ();
my $line = <IN>;
# read input lines
while (defined $line) {
    chomp($line);
    # if reached LOCUS line print it
    if (substr($line,0,5) eq "LOCUS") {
        print "\n$line\n";
    }
    # if reached TITLE start an inner loop until reaching the JOURNAL line
    if ( (length($line) > 7) && (substr($line,2,5) eq "TITLE") ) {
        while ((defined $line) && (substr($line,2,7) ne "JOURNAL")) {
            chomp $line;
            $title = $title.substr($line,12);   # concatenate the title line
            $line = <IN>;
        }
        push(@titleArray,$title);   # push entire title to title array
        $title = "";
    }
    # if reached FEATURES line - sort and print titles array
    # ($line may be undefined here if the file ended inside a title)
    if ((defined $line) && (substr($line,0,8) eq "FEATURES")) {
        @titleArray = sort(@titleArray);
        foreach $title (@titleArray) {
            print "$title\n";
        }
        @titleArray = ();   # empty title array
    }
    $line = <IN>;
}

4 8.4 Hashes (associative arrays)

5 8.5 Hash Motivation
Let's say we want to create a phone book...
Enter a name that will be added to the phone book: Dudi
Enter a phone number: 6409245
Enter a name that will be added to the phone book: Dudu
Enter a phone number: 6407693

6 8.6 Hash – an associative array
An associative array (or simply, a hash) is an unordered set of key-value pairs. Each key is associated with a value.
A hash variable name always starts with a "%": my %hash;
Initialization: %hash = ("a"=>5, "bob"=>"zzz", 50=>"John");
Accessing: you can access a value by its key: print $hash{50}; (prints John)
Tip: you can reset the hash (to an empty one) with %hash = ();
Note: a key in a hash is interpreted as a string. These are equivalent:
50=>"John" and "50"=>"John"
$hash{50} and $hash{"50"}
%hash
"a" => 5
"bob" => "zzz"
50 => "John"
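The note about keys being interpreted as strings can be checked with a short sketch. It reuses the hash from this slide; the variable $same is introduced here only for illustration.

```perl
use strict;
use warnings;

my %hash = ("a" => 5, "bob" => "zzz", 50 => "John");

# The number 50 and the string "50" address the same entry
print $hash{50}, "\n";      # John
print $hash{"50"}, "\n";    # John
my $same = ($hash{50} eq $hash{"50"}) ? 1 : 0;

# Resetting the hash leaves it with no keys
%hash = ();
print scalar(keys(%hash)), "\n";   # 0
```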

7 8.7 Hash – an associative array
Modifying: $hash{bob} = "aaa"; (modifying an existing value)
Adding: $hash{555} = "z"; (adding a new key-value pair)
You can ask whether a certain key exists in a hash: if (exists $hash{50} )...
You can delete a certain key-value pair in a hash: delete($hash{50});
%hash (initial): "a" => 5, "bob" => "zzz", 50 => "John"
%hash (after modifying): "a" => 5, "bob" => "aaa", 50 => "John"
%hash (after adding): "a" => 5, "bob" => "aaa", 50 => "John", 555 => "z"
%hash (after deleting): "a" => 5, "bob" => "aaa", 555 => "z"
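The modify, add, exists and delete operations can be run in sequence as below. This is a small sketch using the hash from the slide; the key "zero" and the variables @remaining, $keyExists and $valueIsTrue are added here only to show that exists tests the key rather than the value.

```perl
use strict;
use warnings;

my %hash = ("a" => 5, "bob" => "zzz", 50 => "John");

$hash{bob} = "aaa";      # modify an existing value
$hash{555} = "z";        # add a new key-value pair
delete($hash{50});       # remove a key-value pair

# Sorted only to make the printed order predictable
my @remaining = sort(keys(%hash));
print "@remaining\n";    # 555 a bob

# exists() asks about the key, not about the value:
# a key whose value is 0 still exists, although the value itself is false
$hash{zero} = 0;
my $keyExists   = exists($hash{zero}) ? "yes" : "no";   # yes
my $valueIsTrue = $hash{zero}         ? "yes" : "no";   # no
print "exists: $keyExists, true: $valueIsTrue\n";
```

Testing if ($hash{$key}) instead of if (exists $hash{$key}) would wrongly skip entries whose value is 0 or an empty string.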

8 8.8 Variable types in PERL
Scalar            Array        Hash
$number -3.54     @array       %hash
$string "hi\n"    $array[0]    $hash{key}

9 8.9 Hash – an associative array
An associative array of the phone book suggested in the first slide (we will see a more elaborate version later on):
Declare: my %phoneBook;
Updating: $phoneBook{"Dudi"} = 9245; $phoneBook{"Dudu"} = 7693;
Fetching: print $phoneBook{"Dudi"};
%phoneBook
"Dudi" => 9245
"Dudu" => 7693

10 8.10 Iterating over hash elements
It is possible to get a list of all the keys in %hash:
my @hashKeys = keys(%hash);
Similarly, you can get an array of the values in %hash:
my @hashVals = values(%hash);
%hash: "a" => 5, "bob" => "zzz", 50 => "John"
@hashKeys: "bob", "a", 50
@hashVals: "zzz", 5, "John"

11 8.11 Iterating over hash elements
To iterate over all the values in %hash:
my @hashVals = values(%hash);
foreach my $value (@hashVals)...
To iterate over the keys in %hash:
my @hashKeys = keys(%hash);
foreach my $key (@hashKeys)...

12 8.12 Iterating over hash elements
For example, iterating over the keys in %hash:
my @hashKeys = keys(%hash);
foreach my $key (@hashKeys) {
    print "The key is $key\n";
    print "The value is $hash{$key}\n";
}
Output:
The key is bob
The value is zzz
The key is a
The value is 5
The key is 50
The value is John

13 8.13 Iterating over hash elements
Notably: the elements are given in an arbitrary order, so if you want a certain order use sort:
my @hashKeys = keys(%hash);
my @sortedHashKeys = sort(@hashKeys);
foreach my $key (@sortedHashKeys) {
    print "The key is $key\n";
    print "The value is $hash{$key}\n";
}
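Note that sort without arguments compares its items as strings. The sketch below shows the difference from a numeric sort, and one way to order hash keys by their values. The sort block with <=> is a standard Perl feature that these slides do not cover; the names and lengths are the three sample proteins listed in the class exercise.

```perl
use strict;
use warnings;

# Default sort is a string comparison, so "100" comes before "9"
my @asStrings = sort(10, 9, 100);                 # 10 100 9
my @asNumbers = sort { $a <=> $b } (10, 9, 100);  # 9 10 100
print "@asStrings\n";
print "@asNumbers\n";

my %lengths = ("AP_000081" => 181, "AP_000174" => 104, "AP_000138" => 145);

# Keys in alphabetical order
my @byName = sort(keys(%lengths));

# Keys ordered by their numeric values (shortest protein first)
my @byLength = sort { $lengths{$a} <=> $lengths{$b} } keys(%lengths);

foreach my $name (@byLength) {
    print "$name\t$lengths{$name}\n";
}
```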

14 8.14 Example – phoneBook.pl #1
######################################
# Purpose: Store names and phone numbers in a hash,
#          and allow the user to ask for the number of a certain name.
# Input:   Enter name-number pairs, enter "END" as a name to stop,
#          then enter a name to get his/her number
#
use strict;
my %phoneNumbers = ();
my $number;

15 8.15 Example – phoneBook.pl #2
# Ask user for names and numbers and store in a hash
my $name = "";
while (1==1) {
    print "Enter a name that will be added to the phone book:\n";
    $name = <STDIN>;
    chomp $name;
    if ($name eq "END") {
        last;
    }
    print "Enter a phone number: \n";
    $number = <STDIN>;
    chomp $number;
    $phoneNumbers{$name} = $number;
}

16 8.16 Example – phoneBook.pl #3
# Ask for a name and print the corresponding number
$name = "";
while (1==1) {
    print "Enter a name to search for in the phone book:\n";
    $name = <STDIN>;
    chomp $name;
    if (exists($phoneNumbers{$name})) {
        print "The phone number of $name is: $phoneNumbers{$name}\n";
    }
    elsif ($name eq "END") {
        last;
    }
    else {
        print "Name not found in the book\n";
    }
}

17 8.17 Class exercise 8
1. Write a script that reads a file with a list of protein names and lengths (proteinLengths):
AP_000081 181
AP_000174 104
AP_000138 145
It stores the names of the sequences as hash keys, with the length of the sequence as the value. Print the keys of the hash.
2. Add to Q1: Read another file, and print the names that appeared in both files with the same length. Print a warning if the name is the same but the length is different.
3. Write a script that reads a GenPept file (you may use the preproinsulin record), finds all JOURNAL lines, and stores the journal name (as key) and year of publication (as value) in a hash:
a. Store only the first year (order of appearance in the file) value for each journal name
b*. Store all years for each journal name
Then print the names and years, sorted by the journal name (no need to sort the years for the same journal in b*, unless you really want to do so...)

