Part 4. Arrays: stacks, the foreach command. Regular expressions: string structure analysis, substring extraction and substitution. Command line arguments.


Slide 1. Part 4

    Arrays: stacks, the foreach command
    Regular expressions: string structure analysis, substring extraction and substitution
    Command line arguments: the @ARGV array
    Modules in Perl: how to use and share libraries of functions
    Functions/subroutines: repeated use of functional blocks
    Error messages: how to interrupt a program on a mistake with the die statement

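Slide 2. Arrays as "first-come, last-served" storage

An array of 5 numbers can be written in one statement:

    @a = (7,-1,2,4,5);

The same array can be filled one element at a time with push, and the most recently stored element taken back with pop:

    # empty array
    @a = ();
    # store numbers
    push @a, 7;
    push @a, -1;
    push @a, 2;
    push @a, 4;
    push @a, 5;
    $lastNumber = pop @a;
    print "last number stored in \@a was $lastNumber\n";

[Figure: a jar holding the 5 numbers; push puts 5 on top, pop takes it off again]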
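The last-in, first-out order can be checked with a short sketch. The helper name `drain_stack` is hypothetical and used only for illustration; it pops every element and returns the values in removal order.

```perl
use strict;
use warnings;

# Hypothetical helper: pop every element off a copy of the stack and
# return the values in the order they were removed.
sub drain_stack {
    my @stack = @_;                  # work on a copy of the caller's array
    my @removed;
    while (@stack) {
        push @removed, pop @stack;   # pop takes the most recently pushed element
    }
    return @removed;
}

my @a = ();
push @a, $_ for (7, -1, 2, 4, 5);

my @order = drain_stack(@a);
print "popped in order: @order\n";   # the reverse of the push order
```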

Slide 3. When are the push/pop commands useful?

    #!/usr/local/bin/perl
    # storing file data
    @fileLines = ();
    open (INP, "<data.txt");
    while ($line = <INP>) {
        chomp($line);
        push @fileLines, $line;
    }
    close(INP);
    # calculating number of lines in the file
    $nLines = $#fileLines + 1;
    print "There are $nLines lines in data.txt file\n";
    # printing out data.txt file content
    foreach $line (@fileLines) {
        print "$line\n";
    }

Example file contents (numbers): 1 18 23 2

Example file contents (text): Finding potential regulatory elements in noncoding regions of the human genome is a challenging problem. Analyzing novel sequences for the presence of known transcription factor binding sites or their weight matrices produces a huge number of

The foreach command visits every element of an array in turn:

    @a = (1..6);
    foreach $d (@a) {
        print "$d ";
    }
    print "\n";

Output:

    1 2 3 4 5 6
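The read-and-count pattern can be tried without a real file. The sketch below is one possible variant, not the slide's own code: it assumes a hypothetical helper `read_lines` and uses an in-memory string in place of data.txt, with a lexical filehandle and the three-argument form of open.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from an open filehandle into an array.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;                 # strip the trailing newline
        push @lines, $line;
    }
    return @lines;
}

# An in-memory "file" stands in for data.txt so the sketch runs anywhere.
my $text = "1\n18\n23\n2\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory data: $!\n";
my @fileLines = read_lines($fh);
close($fh);

my $nLines = scalar @fileLines;      # same value as $#fileLines + 1
print "There are $nLines lines\n";
foreach my $line (@fileLines) {
    print "$line\n";
}
```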

Slide 4. Command line arguments

printFile.pl: a program that prints out the contents of a file

    #!/usr/local/bin/perl
    # determine file name
    $fName = $ARGV[0];
    # open, read and print out file
    open (INP, "<$fName");
    while ($line = <INP>) {
        print $line;
    }
    close(INP);

Example files:

    numbers.txt: 1 18 23 2 -123
    words.txt: Finding potential regulatory elements in noncoding regions of the human genome is a challenging problem. Analyzing novel sequences for the presence of known transcription factor binding sites or their weight matrices produces a huge number of

Example runs:

    printFile.pl numbers.txt
    printFile.pl words.txt

@ARGV is the array of arguments that follow the program name. For the first run:

    @ARGV = ("numbers.txt");
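To see how arguments map onto array indices, a small sketch can list them. The helper `describe_args` is hypothetical, and @ARGV is assigned by hand here to simulate a run with two arguments.

```perl
use strict;
use warnings;

# Hypothetical helper: describe each command line argument with its index.
sub describe_args {
    my @args = @_;
    return map { sprintf('ARGV[%d] = %s', $_, $args[$_]) } 0 .. $#args;
}

# Simulate running: printFile.pl numbers.txt words.txt
@ARGV = ('numbers.txt', 'words.txt');

my $count = scalar @ARGV;            # number of arguments, equal to $#ARGV + 1
print "Got $count arguments\n";
print "$_\n" for describe_args(@ARGV);
```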

Slide 5. Example: print out the N-th line of a file

    #!/usr/local/bin/perl
    # determine file name and line index
    $fName = $ARGV[0];
    $lineNo = $ARGV[1];
    # open and read file
    open (INP, "<$fName");
    while ($line = <INP>) {
        push @fileLines, $line;
    }
    close(INP);
    # print out N-th line (array indices start at 0, line numbers at 1)
    print $fileLines[ $lineNo-1 ];

words.txt contains the text: Finding potential regulatory elements in noncoding regions of the human genome is a challenging problem. Analyzing novel sequences for the presence of known transcription factor binding sites or their weight matrices produces a huge number of

The command

    printFile.pl words.txt 3

prints the third line of the file:

    a challenging problem. Analyzing novel
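Storing the whole file is not strictly needed to find one line. As an alternative sketch (not the slide's method), a hypothetical helper `nth_line` counts lines while reading and stops at the requested one; the in-memory sample text is an assumption made so the example runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: return line number $n (1-based) from an open
# filehandle, or undef if the input has fewer than $n lines.
sub nth_line {
    my ($fh, $n) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
        return $line if $count == $n;   # stop reading as soon as it is found
    }
    return undef;
}

my $text = "first\nsecond\nthird\n";    # stand-in for a real file
open(my $fh, '<', \$text) or die "Cannot open in-memory data: $!\n";
my $third = nth_line($fh, 3);
close($fh);

print $third if defined $third;
```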

Slide 6. Error messages

How do you stop a program correctly and indicate what went wrong? The die statement prints out a message and stops the program.

Example problem: the program should be executed with 2 arguments,

    printFile.pl words.txt 3

but the user specifies only 1:

    printFile.pl 3

The program should stop and report the error:

    #!/usr/local/bin/perl
    # check whether we've got 2 arguments or not
    if ($#ARGV != 1) {
        die "Error. Incorrect number of arguments\n";
    }
    ...

Stop on an incorrect line number:

    ...
    if ($ARGV[1] <= 0) {
        die "Error. Incorrect line number: $ARGV[1]\n";
    }
    ...
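Both checks can be collected in one place and exercised without stopping the program by wrapping the call in eval, which catches the die and leaves its message in `$@`. This is a sketch under assumptions: the helper name `check_args` is hypothetical, and the digits-only test on the line number is an extra check that is not on the slide.

```perl
use strict;
use warnings;

# Hypothetical helper: die with a message unless the arguments look valid.
sub check_args {
    my @args = @_;
    die "Error. Incorrect number of arguments\n" if @args != 2;
    # Assumption: a line number must be a positive whole number.
    die "Error. Incorrect line number: $args[1]\n"
        unless $args[1] =~ /^\d+$/ && $args[1] > 0;
    return 1;
}

# eval traps the die, so the script itself keeps running.
my $ok = eval { check_args('words.txt', 3) };

my $failed  = eval { check_args('3') };
my $message = $@;                    # message from the die inside eval

print "first call passed\n" if $ok;
print "second call failed: $message" unless defined $failed;
```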

Slide 7. Defining new functions and commands

A function is a "mini computer" inside a program: it gets input data and produces output results.

Defining the min function, which returns the minimum of 2 numbers:

    $x = min(5,3);
    print "Smallest of 5 and 3 is: $x\n";

    # Function min
    sub min {
        # input parameters arrive in the @_ array
        my ($first, $second) = @_;
        my $small;
        if ($first < $second) {
            $small = $first;
        } else {
            $small = $second;
        }
        return $small;
    }

[Figure: a FUNCTION box that filters out numbers. INPUT: 2 Hello 3 4 7 Everybody 33 57. OUTPUT: Hello Everybody]
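The same idea extends to any number of inputs, because all arguments arrive in `@_`. The sketch below uses a hypothetical name, `min_of`, to avoid clashing with the slide's `min`.

```perl
use strict;
use warnings;

# Hypothetical generalisation of min: smallest value in a list of numbers.
sub min_of {
    my @numbers = @_;
    die "min_of needs at least one number\n" unless @numbers;
    my $small = shift @numbers;      # start with the first value
    foreach my $n (@numbers) {
        $small = $n if $n < $small;  # keep the smaller of the two
    }
    return $small;
}

my $x = min_of(5, 3);
my $y = min_of(7, -1, 2, 4, 5);
print "Smallest of 5 and 3 is: $x\n";
print "Smallest of 7, -1, 2, 4, 5 is: $y\n";
```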

Slide 8. Regular expressions

    $string1 = "Total: 576 genes, 2763 exons, some introns";
    $string2 = "human -G-ACT---TTGC------AA----A---A----";

How to extract the 2 numbers? How to extract just the DNA sequence?

Special symbols that stand for groups of characters of a common type (called patterns):

    \s   Match a whitespace character
    \S   Match a non-whitespace character
    \d   Match a digit character
    \D   Match a non-digit character
    ^    Match the beginning of the line
    .    Match any character (except newline)
    $    Match the end of the line
    \t   Tab character (HT, TAB)
    \n   Newline (LF, NL)
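One possible answer to the two questions above, as a sketch: a match with the /g modifier in list context returns every captured number, and a capture group after the leading label picks out the sequence field. The variable names `@numbers` and `$sequence` are chosen here for illustration.

```perl
use strict;
use warnings;

my $string1 = "Total: 576 genes, 2763 exons, some introns";
my $string2 = "human -G-ACT---TTGC------AA----A---A----";

# In list context, /g returns all substrings captured by (\d+).
my @numbers = $string1 =~ /(\d+)/g;

# Skip the label (\S+) and the whitespace after it, capture the rest.
my ($sequence) = $string2 =~ /^\S+\s+(\S+)$/;

print "numbers: @numbers\n";
print "sequence: $sequence\n";
```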

Slide 9. Grouping options

    *    Match 0 or more times
    +    Match 1 or more times
    []   Character class

Patterns management. Each substitution below is applied to a fresh copy of the same string:

    $string = "Total: 576 genes, 2763 exons, some introns";

    $string =~ s/\d+/some/g;   # --> "Total: some genes, some exons, some introns"
    $string =~ s/\s+/#/g;      # --> "Total:#576#genes,#2763#exons,#some#introns"
    $string =~ s/\D+/\*/g;     # --> "*576*2763*"
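The character class `[]` has no example on the slide, so here is a small sketch combining it with `+`. The sample string is a shortened piece of the aligned human sequence, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $aligned = "-G-ACT---TTGC------AA";

# [ACGT]+ : one or more characters from the class A, C, G, T
my @runs = $aligned =~ /([ACGT]+)/g;

# [^ACGT]+ : one or more characters NOT in the class (here, the gap dashes).
# The copy-then-substitute idiom leaves $aligned itself unchanged.
(my $ungapped = $aligned) =~ s/[^ACGT]+//g;

print "runs: @runs\n";
print "ungapped: $ungapped\n";
```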

Slide 10. Localizing substrings

alignment.blast (the position rulers and the match-marker lines between the two sequences are not reproduced here):

    human -G-ACT---TTGC------AA----A---A-----CG-----G-AT-------TGGG---
    mouse TGAACTCAAGTGCTATTTTAATTCCATTCATTCTCCGTGGCTGCATCAGGGCCTGGGGCT

    human ---------------C----GG------GA-------TG-AG--AGG-------------
    mouse CTACCTCCTGACAAACATTTGGTCTCTAGAAGGCTTCTGAAGTTAGGCAAGTCTGAAAAT

How to extract only the lines starting with 'mouse'?

    while ($line = <INP>) {
        if ($line =~ /^mouse/) {
            print $line;
        }
    }

Output:

    mouse TGAACTCAAGTGCTATTTTAATTCCATTCATTCTCCGTGGCTGCATCAGGGCCTGGGGCT
    mouse CTACCTCCTGACAAACATTTGGTCTCTAGAAGGCTTCTGAAGTTAGGCAAGTCTGAAAAT
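When the lines are already stored in an array, the same filter can be written with Perl's built-in grep. This is a sketch on made-up, shortened lines rather than the real alignment file.

```perl
use strict;
use warnings;

# Shortened stand-ins for lines of an alignment file.
my @lines = (
    "human -G-ACT",
    "       | |||",
    "mouse TGAACT",
    "human ---C--",
    "mouse CTACCT",
);

# grep keeps the elements for which the pattern matches;
# ^ anchors the match to the beginning of the line.
my @mouseLines = grep { /^mouse/ } @lines;

print "$_\n" for @mouseLines;
```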

Slide 11. Obtaining substrings after localization

How to extract the human and mouse sequences from the same alignment.blast file as on slide 10?

    $humanSeq = "";
    $mouseSeq = "";
    while ($line = <INP>) {
        if ($line =~ /^mouse\s+(\S+)$/) {
            $mouseSeq .= $1;
        } elsif ($line =~ /^human\s+(\S+)$/) {
            $humanSeq .= $1;
        }
    }
    print "Human sequence: $humanSeq\n";
    print "Mouse sequence: $mouseSeq\n";

/...(xxx)...(xxx)../ : substrings enclosed in parentheses are available after a successful match as the variables $1, $2, ...
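A pattern with two pairs of parentheses fills both `$1` and `$2`. The sketch below applies this to the gene-count string from the regular expressions slide; the variable names `$genes` and `$exons` are illustrative.

```perl
use strict;
use warnings;

my $string = "Total: 576 genes, 2763 exons, some introns";

my ($genes, $exons);
# Only read $1 and $2 after checking that the match succeeded.
if ($string =~ /(\d+) genes, (\d+) exons/) {
    ($genes, $exons) = ($1, $2);     # first and second parenthesised groups
    print "genes: $genes, exons: $exons\n";
}
```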

Slide 12. Modules

Perl does not have built-in functions for every case, but most of the missing functions have already been programmed by other people, who share their libraries of functions. These libraries are called modules.

The command use X; indicates that functions from module X should be used.

    Perl does not know how to create pictures:   use GD;            (now it does)
    How to communicate with databases?           use DBI;
    How to do DNA sequence analysis?             use Bio::Perl;     (from the BioPerl project)
    How to extract command line options?         use Getopt::Long;

http://cpan.org/ is the storage of Perl modules.
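As a quick illustration of use, the sketch below imports three functions from List::Util, a module that ships with standard Perl distributions. Its `min` would replace a hand-written one, so a script should define only one of the two.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);      # import only the functions that are needed

my @a = (7, -1, 2, 4, 5);

my $smallest = min(@a);
my $largest  = max(@a);
my $total    = sum(@a);

print "min: $smallest, max: $largest, sum: $total\n";
```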

