# Introduction to perl programming: the minimum to know! Bioinformatic and Comparative Genome Analysis Course HKU-Pasteur Research Centre - Hong Kong, China.


Introduction to Perl programming: the minimum to know!
Bioinformatic and Comparative Genome Analysis Course, HKU-Pasteur Research Centre, Hong Kong, China, August 17 - August 29, 2009.
Fredj Tekaia, Institut Pasteur, tekaia@pasteur.fr

perl: a basic program that prints a message
#!/bin/perl
print 'Hello world.'; # Print a message

Variables, Arrays
Scalar variables are prefixed by \$ and hold a number or a string:
\$val = 9;
\$val = "ABC transporter";
Names are case sensitive: \$val is different from \$Val.

Operations and Assignment
Perl uses the usual arithmetic operators:
\$a = 1 + 2; # Add 1 and 2 and store in \$a
\$a = 3 - 4; # Subtract 4 from 3 and store in \$a
\$a = 5 * 6; # Multiply 5 and 6
\$a = 7 / 8; # Divide 7 by 8 to give 0.875
\$a = 9 ** 10; # Nine to the power of 10
\$a = 5 % 2; # Remainder of 5 divided by 2
\$a++; # Return \$a and then increment it
\$a--; # Return \$a and then decrement it
For strings, Perl has among others:
\$a = \$b . \$c; # Concatenate \$b and \$c
\$a = \$b x \$c; # \$b repeated \$c times

To assign values, Perl includes:
\$a = \$b; # Assign \$b to \$a
\$a += \$b; # Add \$b to \$a
\$a -= \$b; # Subtract \$b from \$a
\$a .= \$b; # Append \$b onto \$a
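The operators above can be tried in a short script. This is only a minimal sketch: the variable names and values are illustrative, and `use strict`, `use warnings` and `my` are additions that do not appear on the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Arithmetic operators
my $sum       = 1 + 2;     # addition
my $quotient  = 7 / 8;     # division gives a floating-point result
my $power     = 2 ** 10;   # exponentiation
my $remainder = 5 % 2;     # remainder of the integer division

# String operators
my $prefix = "MG";
my $name   = $prefix . "001";   # concatenation
my $polyA  = "A" x 5;           # repetition

# Assignment operators
my $count = 10;
$count += 5;        # add 5 to $count
$count -= 3;        # subtract 3 from $count
$name  .= ".pep";   # append ".pep" onto $name

print "$sum $quotient $power $remainder\n";
print "$name $polyA $count\n";
```

The script prints `3 0.875 1024 1` followed by `MG001.pep AAAAA 12`.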

Array variables
An array variable is a list of scalars (i.e. numbers and/or strings). Array names are prefixed by @:
@SEQNAME = ("MG001", "MG002", "MG003");
\$SEQNAME[2] is "MG003". Attention: indices start at 0 (0, 1, 2, ...).
@num = (0,1,2,3);
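A small sketch of array indexing, using the @SEQNAME list from the slide. The `scalar()` call and the `my` declarations are additions for illustration and are not part of the slides.

```perl
use strict;
use warnings;

my @SEQNAME = ("MG001", "MG002", "MG003");

my $first      = $SEQNAME[0];        # indices start at 0
my $third      = $SEQNAME[2];        # third element
my $last_index = $#SEQNAME;          # index of the last element
my $size       = scalar(@SEQNAME);   # number of elements

print "first=$first third=$third last_index=$last_index size=$size\n";
```

Note that a single element is written with `$` (it is a scalar), while the whole array is written with `@`.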

@L_CODONS = ('TTT','TTC','TTA','TTG', 'CTT','CTC','CTA','CTG', 'ATT','ATC','ATA','ATG', 'GTT','GTC','GTA','GTG', 'TCT','TCC','TCA','TCG', 'CCT','CCC','CCA','CCG', 'ACT','ACC','ACA','ACG', 'GCT','GCC','GCA','GCG', 'TAT','TAC','TAA','TAG', 'CAT','CAC','CAA','CAG', 'AAT','AAC','AAA','AAG', 'GAT','GAC','GAA','GAG', 'TGT','TGC','TGA','TGG', 'CGT','CGC','CGA','CGG', 'AGT','AGC','AGA','AGG', 'GGT','GGC','GGA','GGG');

@AA = ('A','R','N','D','C','Q','E','G','H','I','L','K','M','F','P','S','T','W','Y','V','B');
@mm = ('a','r','n','d','c','q','e','g','h','i','l','k','m','f','p','s','t','w','y','v','b');

Associative arrays: hash tables
Ordinary list arrays allow us to access their elements by number. The first element of array @AA is \$AA[0], the second element is \$AA[1], and so on. But Perl also allows us to create arrays which are accessed by string. These are called associative arrays (hashes). The associative array itself is prefixed by a % sign.

%ages = ("Michael", 39, "Angie", 27, "Willy", "21 years", "The Queen Mother", 108);
\$ages{"Michael"}; # Returns 39
\$ages{"Angie"}; # Returns 27
\$ages{"Willy"}; # Returns "21 years"
\$ages{"The Queen Mother"}; # Returns 108
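The same %ages table can be explored as follows. This is a sketch only: the `=>` notation, `keys`, `sort` and `exists` are standard Perl but are not shown on the slides.

```perl
use strict;
use warnings;

# Same data as on the slide; "=>" is an alternative to the comma
my %ages = (
    "Michael"          => 39,
    "Angie"            => 27,
    "Willy"            => "21 years",
    "The Queen Mother" => 108,
);

my @names = sort keys %ages;                       # all keys, alphabetically
my $known = exists $ages{"Michael"} ? 1 : 0;       # is the key present?
my $other = exists $ages{"Nobody"}  ? 1 : 0;

foreach my $name (@names) {
    print "$name: $ages{$name}\n";                 # access a value by its key
}
```

Sorting the keys is needed for a reproducible order, because a hash does not keep its entries in insertion order.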

File handling
A script (cat.pl) equivalent to the UNIX cat:
#!/bin/perl
open(FILE,"GMG.pep");
while (\<FILE\>) { print \$_; }
close(FILE);
Use: chmod a+x cat.pl ; cat.pl
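A slightly more defensive variant of the same loop might look like the sketch below. To keep it self-contained it reads from an in-memory string instead of GMG.pep; the sample lines are invented, and the lexical filehandle `$fh` and the `or die` check are additions to the slide's version.

```perl
use strict;
use warnings;

# Invented stand-in for the contents of a file such as GMG.pep
my $text = ">MG001 example protein\nMKILVN\n>MG002 another protein\nMSTQ\n";

# Opening a reference to a string gives an in-memory filehandle.
# With a real file one would write: open(my $fh, '<', "GMG.pep")
open(my $fh, '<', \$text) or die "Cannot open: $!";

my $nlines = 0;
while (my $line = <$fh>) {   # read one line at a time
    print $line;             # same effect as the UNIX cat
    $nlines++;
}
close($fh);

print "$nlines lines read\n";
```

Checking the result of `open` avoids silently looping over nothing when the file name is wrong.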

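split
A very useful function in Perl: it splits up a string and places the pieces into an array. The script below prints the first whitespace-separated field of each line:
#!/bin/perl
open(FILE,"GMG.pep");
while (\<FILE\>) { @tab=split(/\s+/, \$_); print "\$tab[0]\n"; }
close(FILE);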

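split with a limit: @tab=split(/\s+/,\$_,n); returns at most n fields. With n = 2 each line is cut into its first field and the rest of the line, which can be stored in an associative array:
#!/bin/perl
open(FILE,"GMG.pep");
while (\<FILE\>) { @tab=split(/\s+/, \$_, 2); \$NOM{\$tab[0]} = \$tab[1]; print \$NOM{\$tab[0]}; }
close(FILE);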
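The effect of the limit argument can be seen on a single line. A minimal sketch, assuming an invented line made of a name followed by a description:

```perl
use strict;
use warnings;

my $header = "MG001 DNA polymerase III beta subunit";   # invented example line

my @all = split(/\s+/, $header);       # every whitespace-separated field
my @two = split(/\s+/, $header, 2);    # at most 2 fields: name, rest of line

my %NOM;
$NOM{$two[0]} = $two[1];               # name => description, as on the slide

print scalar(@all), " fields without limit\n";
print "$two[0] => $NOM{$two[0]}\n";
```

Without the limit the description is cut into separate words; with a limit of 2 it stays in one piece.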

Control structures: foreach
To go through each element of an array or other list-like structure (such as lines in a file) Perl uses the foreach structure. This has the form:
foreach \$nom (@SEQNAME) # Visit each item in turn and call it \$nom
{
print "\$nom\n"; # Print the item
}

foreach can also run over a range of indices; \$#AA is the index of the last element of @AA:
foreach \$j (0..2) # Visit each value in turn and call it \$j
{
print "\$SEQNAME[\$j]\n"; # Print the item
}
foreach \$j (0..\$#AA) # Visit each value in turn and call it \$j
{
print "\$AA[\$j]\n"; # Print the item
}
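Both forms of foreach, by value and by index, are sketched here on a shortened amino-acid list. The shortened list and the accumulator variables are illustrative only.

```perl
use strict;
use warnings;

my @AA = ('A', 'R', 'N', 'D');     # shortened list, for illustration only

# Visit each element directly
my $by_value = "";
foreach my $aa (@AA) {
    $by_value .= $aa;
}

# Visit each index from 0 to the last index $#AA
my $by_index = "";
foreach my $j (0 .. $#AA) {
    $by_index .= "$j:$AA[$j] ";
}

print "$by_value\n";
print "$by_index\n";
```

The index form is useful when the position itself is needed, for example to walk through two parallel arrays such as @AA and @mm.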

Testing
Here are some tests on numbers and strings:
\$a == \$b # Is \$a numerically equal to \$b? Beware: don't use the = operator.
\$a != \$b # Is \$a numerically unequal to \$b?
\$a eq \$b # Is \$a string-equal to \$b?
\$a ne \$b # Is \$a string-unequal to \$b?
You can also use logical and, or and not:
(\$a && \$b) # Are \$a and \$b both true?
(\$a || \$b) # Is either \$a or \$b true?
!(\$a) # Is \$a false?
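The difference between numeric and string comparison shows up when two values look different as text but are the same number. A minimal sketch with invented values:

```perl
use strict;
use warnings;

my $x = "1.0";
my $y = "1";

my $num_equal = ($x == $y) ? 1 : 0;   # compared as numbers
my $str_equal = ($x eq $y) ? 1 : 0;   # compared as strings

my $empty = "";
my $both   = ($x && $y)      ? 1 : 0;   # both values are true
my $either = ($empty || $y)  ? 1 : 0;   # at least one value is true
my $not    = (!$empty)       ? 1 : 0;   # the empty string is false

print "num_equal=$num_equal str_equal=$str_equal\n";
print "both=$both either=$either not=$not\n";
```

Here `==` reports equality while `eq` does not, which is why sequence names should be compared with `eq` and counts with `==`.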

for
for (initialise; test; inc) { first_action; second_action; etc.... }
for (\$i = 0; \$i < 10; ++\$i) # Start with \$i = 0; do it while \$i < 10; increment \$i before repeating
{
print "\$i\n";
}

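Conditionals
if (\$a) { print "The string is not empty\n"; } else { print "The string is empty\n"; }
(Note that the string "0" also tests false.)
A condition can also follow a statement. This script prints only the lines containing ">" (the header lines of a fasta file):
#!/bin/perl
open(FILE,"GMG.pep");
while (\<FILE\>) { print \$_ if ( m/>/ ); }
close(FILE);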

String matching
\$a eq \$b # Is \$a string-equal to \$b?
\$a ne \$b # Is \$a string-unequal to \$b?
Here are some special RE (regular expression) characters and their meaning:
- . matches any single character except a newline
- ^ matches the beginning of the line or string
- \$ matches the end of the line or string
- * matches zero or more of the last character
- + matches one or more of the last character
- ? matches zero or one of the last character

Some more special characters
- \n matches a newline
- \t matches a tab
- \w matches any alphanumeric (word) character; the same as [a-zA-Z0-9_]
- \W matches any non-word character; the same as [^a-zA-Z0-9_]
- \d matches any digit; the same as [0-9]
- \D matches any non-digit; the same as [^0-9]
- \s matches any whitespace character: space, tab, newline, etc.
- \S matches any non-whitespace character
- \b matches a word boundary (outside [] only)
- \B matches where there is no word boundary

Characters like \$, |, [, ), \, / and so on have a special meaning in regular expressions. If you want to match one of them literally, you have to precede it with a backslash (\). So:
- \| matches a vertical bar
- \[ matches an open square bracket
- \) matches a closing parenthesis
- \* matches an asterisk
- \^ matches a caret symbol
- \/ matches a slash
- \\ matches a backslash
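A few of these special characters are put to work in the sketch below. The header line and the identifier are invented, and capturing with parentheses (`$1`) is standard Perl that the slides do not cover.

```perl
use strict;
use warnings;

my $line = ">MG001 DNA polymerase III";   # invented fasta-style header

my $is_header = ($line =~ /^>/)  ? 1 : 0;   # '>' at the beginning of the line
my $has_digit = ($line =~ /\d+/) ? 1 : 0;   # one or more digits
my $ends_in_I = ($line =~ /I$/)  ? 1 : 0;   # 'I' at the end of the string

# Parentheses capture what matched: here the name right after '>'
my $name = "";
if ($line =~ /^>(\S+)\s+/) {
    $name = $1;
}

# A literal '|' has to be escaped, otherwise it means "or"
my $id    = "gi|12345|ref";               # invented identifier
my @parts = split(/\|/, $id);

print "header=$is_header digit=$has_digit endI=$ends_in_I name=$name\n";
print scalar(@parts), " parts: @parts\n";
```

Without the backslash, `split(/|/, $id)` would split between every character instead of at the vertical bars.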

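Substitution and translation
Substitution: s/london/London/ replaces the first occurrence of london in \$_ by London. To apply it to another variable, use =~:
\$sentence =~ s/london/London/;
Options: g for global substitution (all occurrences); i for "ignore case":
s/london/London/gi;
Translation: tr replaces characters one for one. Here a becomes e, b becomes d and c becomes f:
\$sentence =~ tr/abc/edf/;
tr/a-z/A-Z/; # converts \$_ to upper case
tr/A-Z/a-z/; # converts \$_ to lower case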
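The sketch below applies substitution and translation to copies of a string, so the original stays available for comparison. The sentence and the DNA string are invented; using the return value of `tr` to count characters is standard Perl but does not appear on the slides.

```perl
use strict;
use warnings;

my $sentence = "london is big; I like london and LONDON";

# Copy first, then substitute in the copy
(my $first_only = $sentence) =~ s/london/London/;      # first occurrence only
(my $all        = $sentence) =~ s/london/London/gi;    # all, ignoring case

my $dna = "atgcatgc";                                   # invented sequence
(my $upper = $dna) =~ tr/a-z/A-Z/;                      # convert to upper case

# With an empty replacement list tr changes nothing and returns
# the number of characters that matched
my $gc = ($upper =~ tr/GC//);

print "$first_only\n$all\n";
print "$upper has $gc G or C\n";
```

Counting with `tr` is a compact way to obtain a base composition, which is the first of the simple scripts listed in this course.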

Simple scripts
-given a nucleotide sequence: base composition
-given a protein sequence: amino-acid composition
-given a nucleic database (in fasta format): base composition
-given a protein database (in fasta format): amino-acid composition

-sequence size (bases or amino-acids)
-extract a portion of a sequence: (pos start; pos end)
-extract a sequence by name (from a database of sequences)
-gene sequence: codon count
-given allxxseqnew file: script to compute frequencies of multiple matches
See splitfasta.pl; splitdnafasta.pl
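Several of the simple scripts listed above fit in a few lines each. The following is a sketch under stated assumptions, not the course's own solution: the coding sequence is invented, and `length` and `substr` are standard Perl functions that the slides do not introduce.

```perl
use strict;
use warnings;

my $seq = "ATGGCCATTGTAATGGGCCGCTGA";   # invented coding sequence

# Sequence size
my $size = length($seq);

# Base composition: split into single characters and count each one
my %base;
foreach my $nt (split(//, $seq)) {
    $base{$nt}++;
}

# Codon count: read the sequence three bases at a time
my %codon;
for (my $i = 0; $i + 3 <= $size; $i += 3) {
    $codon{ substr($seq, $i, 3) }++;
}

# Extract a portion from position start to position end (counted from 1)
my ($start, $end) = (4, 9);
my $portion = substr($seq, $start - 1, $end - $start + 1);

print "size: $size\n";
foreach my $nt (sort keys %base)  { print "$nt\t$base{$nt}\n"; }
foreach my $c  (sort keys %codon) { print "$c\t$codon{$c}\n"; }
print "portion $start-$end: $portion\n";
```

The same hash-counting pattern gives an amino-acid composition when the input is a protein sequence.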

Data manipulation exercises:
- home directory, mkdir, cd, path, pwd, find;
- naming convention: DB.pep, DB.dna, seq.dna, seq.prt;
- use "tab" as the separator;
- using sed and grep;
- the fasta format for sequences;
- count the number of sequences in a fasta-format sequence database (grep ">" DB.pep | wc -l);
- replace one character with another:
- extract the sequences from a database (fasta-format file) (splitfasta.pl, splitdnafasta.pl);
- extract part of a sequence (the sequence is in fasta format);
- amino-acid frequencies of a protein sequence;
- base frequencies of a nucleotide sequence;
- size of a sequence;
- sizes of the sequences in a database;
- codon frequencies of a coding sequence;
- codon volatility: codon/amino-acid correspondence.
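Counting the sequences of a fasta database and listing their sizes could be sketched in Perl as follows. The tiny in-memory database is invented to keep the example self-contained; with a real file the `open` call would name DB.pep instead.

```perl
use strict;
use warnings;

# Invented fasta database; with a real file: open(my $fh, '<', "DB.pep")
my $db = ">seq1 first\nMKILVN\nAAG\n>seq2 second\nMSTQ\n";
open(my $fh, '<', \$db) or die "Cannot open: $!";

my (%size, @order, $name);
while (my $line = <$fh>) {
    chomp($line);                        # remove the trailing newline
    if ($line =~ /^>(\S+)/) {            # header line: a new sequence starts
        $name = $1;
        push(@order, $name);             # remember the order of appearance
        $size{$name} = 0;
    } elsif (defined $name) {
        $size{$name} += length($line);   # sequence line: add its length
    }
}
close($fh);

my $nseq = scalar(@order);               # same count as grep ">" DB.pep | wc -l
print "$nseq sequences\n";
foreach my $n (@order) {
    print "$n\t$size{$n}\n";             # tab-separated: name, size
}
```

The output uses a tab as separator, as suggested in the exercises, so it can be passed on to other UNIX tools.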
