# Lecture 6: More advanced Perl


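Substitute

Works like the s/// command in vi:

    # cut with EcoRI and chew back
    # (CAATTG is the pattern used on the slide; EcoRI's own recognition site is GAATTC)
    my $linker = "GGCCAATTGGAAT";
    $linker =~ s/CAATTG/CG/g;

Or, to get the number of sites as well:

    my $ecosites = ($linker =~ s/CAATTG/CG/g);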
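The return value of s///g can be checked directly. The following is a minimal sketch using the slide's linker and pattern; `$before` and `$sites` are names introduced here for illustration only.

```perl
use strict;
use warnings;

my $linker = "GGCCAATTGGAAT";
my $before = $linker;    # keep a copy so the change can be reported

# In scalar context s///g returns the number of substitutions made
# (or an empty string, which is false, when nothing matched).
my $sites = ($linker =~ s/CAATTG/CG/g);

print "sites replaced: $sites\n";
print "$before -> $linker\n";
```

Because the result is false when nothing matched, the same expression can also be used as the condition of an `if`.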

Reverse and Translate

    my $DNA = 'CCGTAA';
    $DNA =~ tr/ACGT/TGCA/;    # complement each base
    print "$DNA";
    $DNA = reverse $DNA;      # reverse the complemented string
    print " $DNA\n";

tr/// could have been made with DNA in mind…

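Also, a quick way to calculate GC%:

    my $DNA = 'CCGTAA';
    my $gc = ($DNA =~ tr/CG/GC/);    # tr returns the number of characters it translated
    my $at = ($DNA =~ tr/AT/TA/);    # the two swaps together also complement $DNA
    print (($gc/($gc+$at))*100);
    $DNA = reverse $DNA;             # $DNA is now the reverse complement
    print "\n$DNA\n";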
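If the sequence should stay untouched while it is counted, tr/// can be given an empty replacement list. This is a sketch under that assumption; `gc_percent` and `reverse_complement` are hypothetical helper names, not part of the lecture.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of G and C among the A/C/G/T bases.
sub gc_percent {
    my ($dna) = @_;
    my $gc = ($dna =~ tr/CGcg//);    # empty replacement list: count only, no change
    my $at = ($dna =~ tr/ATat//);
    return 0 unless $gc + $at;       # avoid dividing by zero on empty input
    return 100 * $gc / ($gc + $at);
}

# Hypothetical helper: reverse complement of a DNA string.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;           # scalar assignment, so the string is reversed
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # then each base is complemented
    return $rc;
}

my $DNA = 'CCGTAA';
print gc_percent($DNA), "\n";
print reverse_complement($DNA), "\n";
```

Both helpers work on a copy of their argument, so `$DNA` keeps its original value afterwards.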

push, shift, unshift and pop

    @DNA = ("A", "C", "G", "T");

Add "A" to the END of @DNA:

    push(@DNA, "A");

Remove "A" (or whatever is there) from the END of @DNA:

    $end = pop(@DNA);

Add "T" to the START of @DNA:

    unshift(@DNA, "T");

Remove "T" (or whatever is there) from the START of @DNA:

    $start = shift(@DNA);    # avoid $a here: sort reserves $a and $b
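The four operations can be traced in one short script. This sketch simply runs the slide's steps in order and notes the array contents after each one in the comments.

```perl
use strict;
use warnings;

my @DNA = ("A", "C", "G", "T");

push @DNA, "A";            # A C G T A
my $end = pop @DNA;        # $end is "A", array is A C G T
unshift @DNA, "T";         # T A C G T
my $start = shift @DNA;    # $start is "T", array is A C G T

print "end=$end start=$start array=@DNA\n";
```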

Arguments

Arguments are data given to a program or function in UNIX or Perl, e.g.

    [matt@mrmarsh]$ hmmpfam myprotein.fas Pfam.ls

You can get data into a Perl script with arguments:

    [matt@mrmarsh]$ myscript.pl myprotein.fas

Arguments, cont.

The arguments end up in a special array called @ARGV:

    my $file = shift @ARGV;
    open FILE, $file or die $!;

Arguments, cont.

You can put as many arguments as you like in @ARGV; the number can be arbitrary:

    my @data;
    foreach my $file (@ARGV) {
        open FILE, $file or die $!;
        while (<FILE>) {
            push @data, $_;    # push takes the array first, then the value
        }
        close FILE;
    }
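The same loop can be written with lexical filehandles and three-argument open, which is the style usually recommended in current Perl. This is a sketch only; `read_files` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line (newline removed) of every named file.
sub read_files {
    my @files = @_;
    my @data;
    foreach my $file (@files) {
        open(my $fh, '<', $file) or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            push @data, $line;
        }
        close $fh;
    }
    return @data;
}

my @data = read_files(@ARGV);
print scalar(@data), " lines read from ", scalar(@ARGV), " file(s)\n";
```

It would be run like the slide's example, for instance `myscript.pl myprotein.fas`, with any number of file names after the script name.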

Index

Returns the position of a match (counting from 0), or -1 if there is none. Takes three arguments: index(string, substring, offset).

    my $linker = "GGCCAATTGGAAT";
    my $pos = 0;
    while (($pos = index($linker, "CAATTG", $pos)) > -1) {
        print "EcoRI at $pos\n";
        $pos++;    # move past this match before searching again
    }
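The loop can be wrapped in a reusable subroutine that collects the positions instead of printing them. A minimal sketch: `find_sites` is a hypothetical name, and the test sequence is the slide's linker with a second copy of the pattern appended so that there are two hits.

```perl
use strict;
use warnings;

# Hypothetical helper: all 0-based positions of $site in $seq.
sub find_sites {
    my ($seq, $site) = @_;
    my @positions;
    my $pos = 0;
    while (($pos = index($seq, $site, $pos)) > -1) {
        push @positions, $pos;
        $pos++;    # step one base past this hit so overlapping sites are found too
    }
    return @positions;
}

my @hits = find_sites("GGCCAATTGGAATCAATTG", "CAATTG");
print "sites at: @hits\n";
```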

Bioperl

Bioperl is a HUGE set of ready-made Perl modules that do almost all the jobs you need for bioinformatics. Examples: recover a DNA sequence from a website, translate DNA to protein, read GenBank files, convert to FASTA, parse BLAST output files into spreadsheets…

Bioperl, cont.

Unfortunately, there is a downside. Bioperl is extremely complex and very difficult to use. Also, the code is only as good as the people who wrote it. Still, it can save you an awful lot of time. But in order to use it, you need to learn the Perl syntax for objects and references.

Object syntax

Perl can be used as an object-oriented programming language, although the language does not enforce this style (as Java does). Bioperl is an object-oriented set of modules. You pass Bioperl either a function call or a reference, and it will return an object. You need to know what to do with the object when you get it, or Bioperl isn't much use.

References and dereferences

Hashes, arrays and even scalar variables can be big, so you don't always want to duplicate them. Instead, you can pass a reference to these structures to another function. A reference is a variable that tells the code where to find a variable, hash or array without duplicating it.

    my $reference = \$DNA;          # take a reference to $DNA
    my $data = ${$reference};       # dereference it to get the value back

References (including objects) can also be dereferenced with the dereference operator: ->
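A short sketch of taking and following references to a scalar and to an array. The variable names `$ref`, `@bases` and `$aref` are introduced here for illustration.

```perl
use strict;
use warnings;

my $DNA = "CCGTAA";
my $ref = \$DNA;             # reference to a scalar
print "${$ref}\n";           # dereference: prints the value of $DNA

${$ref} .= "GG";             # writing through the reference changes $DNA itself
print "$DNA\n";

my @bases = qw(A C G T);
my $aref  = \@bases;         # reference to an array
# Two equivalent ways to reach an element, plus the whole array via @{...}
print "$$aref[0] $aref->[1] ", scalar(@{$aref}), "\n";
```

No copy of the string or the array is made at any point; `$ref` and `$aref` only record where the data lives.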

References and dereferences

Magically, because a reference is a scalar variable, you can fill a hash or an array with references. This is very useful for spreadsheet-type matrix data:

    my @data;    # just a normal array
    open FILE, $spreadsheet or die $!;
    while (<FILE>) {
        chomp;
        my @line = split "\t", $_;
        push @data, \@line;    # store a reference to this row's array
    }

References and dereferences

And then, to get data back, just dereference:

    foreach (@data) {
        foreach (@{$_}) {    # @{$_} is the row array this reference points to
            print "$_\t";
        }
        print "\n";
    }
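The array-of-arrays pattern can be tried without a spreadsheet file. In this sketch the rows are made-up in-memory strings standing in for lines read from a file, and individual cells are reached by row and column index.

```perl
use strict;
use warnings;

# Made-up tab-separated rows standing in for lines of a spreadsheet file
my @rows = ("gene1\t10\t20", "gene2\t30\t40");

my @data;
foreach my $row (@rows) {
    my @fields = split /\t/, $row;
    push @data, \@fields;    # 'my' gives a fresh @fields on every pass
}

print $data[1][2], "\n";      # row 1, column 2
print $data[0]->[0], "\n";    # same access with an explicit arrow

foreach my $line (@data) {
    print join("\t", @{$line}), "\n";
}
```

Declaring `@fields` with `my` inside the loop matters: each reference then points to its own array rather than to one array that is overwritten on every pass.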

References and dereferences

Also, you can make a hash of arrays:

    my %fileshash;
    foreach my $file (@filelist) {
        open FILE, $file or die $!;
        my @lines = <FILE>;    # read every line of the file at once
        close FILE;
        $fileshash{$file} = \@lines;
    }

References and dereferences

And then, you can get back any line of any file:

    foreach (keys %fileshash) {
        print @{$fileshash{$_}};    # each stored line still ends with its own newline
    }
    # or
    my $file = $ARGV[0];
    my $line = $ARGV[1];            # line index, counting from 0
    print ${$fileshash{$file}}[$line];

References and dereferences

And of course you can also make an array of hashes:

    my @hashrefarray;
    foreach my $file (@filelist) {
        open FILE, $file or die $!;
        my %quesandans;
        while (<FILE>) {
            chomp;
            # question, a tab, then the answer
            if (/([^\t]+)\t(.+)/) {
                $quesandans{$1} = $2;
            }
        }
        close FILE;
        push @hashrefarray, \%quesandans;
    }

References and dereferences

And get the data back in a similar way:

    foreach (@hashrefarray) {
        print "answer for $question is ";
        print ${$_}{$question}, "\n";    # same as $_->{$question}
    }
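The array-of-hashes round trip can be sketched without real files. Here each inner array of made-up lines stands in for one file, and `@files`, `$lines` and `$href` are names introduced for the example.

```perl
use strict;
use warnings;

# Each inner array stands in for the lines of one question/answer file (made-up data)
my @files = (
    [ "Q1\tyes", "Q2\tno" ],
    [ "Q1\tno",  "Q2\tmaybe" ],
);

my @hashrefarray;
foreach my $lines (@files) {
    my %quesandans;
    foreach my $line (@{$lines}) {
        if ($line =~ /^([^\t]+)\t(.+)$/) {
            $quesandans{$1} = $2;
        }
    }
    push @hashrefarray, \%quesandans;    # one hash reference per "file"
}

my $question = "Q2";
foreach my $href (@hashrefarray) {
    print "answer for $question is $href->{$question}\n";
}
```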

References and dereferences

This, of course, leads to a very flexible and powerful set of data structures, since you can go as deep as you like: hashes of hashes of arrays, arrays of hashes of hashes of hashes, etc. When they get this complicated, the dereference notation -> starts to get useful.
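As an illustration of going deeper, here is a sketch of a hash of hashes of arrays. The sequence names, enzyme names and positions are invented for the example.

```perl
use strict;
use warnings;

# sequence name -> enzyme name -> list of cut positions (all values made up)
my %sites = (
    seqA => { EcoRI => [3, 13], BamHI => [7] },
    seqB => { EcoRI => [42] },
);

# Chained subscripts: the arrows between adjacent subscripts are optional
my $second = $sites{seqA}{EcoRI}[1];
my $same   = ${ ${ $sites{seqA} }{EcoRI} }[1];    # the same element, block syntax

# Missing levels are created on demand when written to (autovivification)
push @{ $sites{seqB}{BamHI} }, 5;

foreach my $seq (sort keys %sites) {
    foreach my $enzyme (sort keys %{ $sites{$seq} }) {
        print "$seq $enzyme: @{ $sites{$seq}{$enzyme} }\n";
    }
}
```

The chained form on the `$second` line is far easier to read than the nested block form on the `$same` line, which is the practical argument for the arrow notation.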

Bioperl: Example

Open a FASTA format sequence file:

    use Bio::Perl;
    use strict;
    my $file = $ARGV[0];
    die "give me a sequence filename!\n" unless $file;
    my @seq_object_array = read_all_sequences($file, 'fasta');

The read_all_sequences function returns all the sequences in the FASTA file as an array of object references.

Bioperl: Example

Each sequence from the file is now held in an object. Its sequence string and name can be retrieved by calling methods on the object with ->

    foreach my $object (@seq_object_array) {
        my $sequence = uc($object->seq());
        my $name = $object->display_id;
        my $pos = 0;
        # no second 'my' here: a new $pos in the condition would restart every search at 0
        while (($pos = index($sequence, "CAATTG", $pos)) > -1) {
            print "EcoRI at $pos of $name\n";
            $pos++;
        }
    }

More Bioperl

I could spend a whole semester on Bioperl, but I won't. You are going to have to figure it out for yourselves if you need it. Start with:

    perldoc bioperl

I recommend going through the example script bptutorial.pl. I have downloaded it and put it in your home directories.

That's it for now

The more you know, the more there is to learn. The only way to really learn this stuff is to write programs. You need to get cracking with some programming projects in class!
