

Sup.1 Supplemental Material (NOT part of the material for the exam)


1 Sup.1 Supplemental Material (NOT part of the material for the exam)

2 Sup.2 More about Blast

3 Sup.3 Running programs from a script
You can run programs with the system function:
    $exitValue = system("blastall.exe...");
    if ($exitValue != 0) {die "blast failed!";}
This way the output of blast appears on the screen. Use '>' to redirect the output to a file:
    $exitValue = system("blastall.exe... > out.blast");
To capture the output, use back-ticks (left of the "1" key on your keyboard):
    @blastOutput = `blastall.exe...`;
In this case the output of blast is stored in the array, one line per element.
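The two mechanisms above can be tried without blast installed. The following is a minimal sketch, assuming an `echo` command is available in the shell; `echo` and the small perl one-liner are stand-ins chosen for illustration and are not part of the original slides. It also shows that the value returned by system is a wait status, from which the program's own exit code is obtained by shifting right by 8 bits.

```perl
use strict;
use warnings;

# Stand-in for the elided blastall.exe command line (an assumption for this sketch)
my $command = 'echo query1 hit1';

# Backticks in list context return one array element per line of output
my @blastOutput = `$command`;
chomp @blastOutput;    # strip the trailing newline from every line
print "Captured: $blastOutput[0]\n";

# system returns the wait status; the program's own exit code is in the high byte.
# $^X is the path of the running perl binary, used here as a harmless test program.
my $status   = system($^X, '-e', 'exit 3');
my $exitCode = $status >> 8;
print "Exit code: $exitCode\n";
```

A non-zero exit code is what the `if ($exitValue != 0)` check on the slide reacts to.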

4 Sup.4 Running a local blast
1. You can install blast on your computer from ftp.ncbi.nlm.nih.gov, in the directory: blast/executables/release/
The current version can be downloaded here: ftp://ftp.ncbi.nih.gov/blast/executables/release/2.2.21/blast-2.2.21-ia32-win32.exe
Some unofficial but useful help can be found at: http://www.people.vcu.edu/~elhaij/IntroBioinf/Links/RunLocalBlast.html
The official help is here: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/

5 Sup.5 Running a local blast
You can also work on the Unix servers of the bioinformatics unit, where a local blast is already installed. The Genbank databases installed there can be used for blast and for any other work, such as getting a sequence by its accession.

6 Sup.6 Advanced Sorting

7 Sup.7 Subroutine revision
A subroutine receives its arguments through @_ and may return a scalar or a list value:
    sub reverseComplement {
        my ($seq) = @_;
        $seq =~ tr/ACGT/TGCA/;
        $seq = reverse $seq;
        return $seq;
    }
    my $revSeq = reverseComplement("GCAGTG");    # CACTGC
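The slide's example returns a scalar. As a small sketch of the other case, the hypothetical subroutine below (`countBases` is a name invented for this illustration) returns a list of two values, which the caller unpacks into two variables.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of two values, the A/T count and the G/C count
sub countBases {
    my ($seq) = @_;
    my $at = ($seq =~ tr/AT//);    # tr with an empty replacement only counts matches
    my $gc = ($seq =~ tr/GC//);
    return ($at, $gc);
}

# A list return value is assigned to a list of variables
my ($atCount, $gcCount) = countBases("GCAGTG");
print "AT: $atCount, GC: $gcCount\n";    # AT: 2, GC: 4
```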

8 Sup.8 Passing variables by reference
If we want to pass arrays or hashes to a subroutine without flattening them into @_, we pass a reference:
    my %gene = ("protein_id" => "E4a",
                "strand"     => "-",
                "CDS"        => [126,523]);
    printGeneInfo(\%gene);

    sub printGeneInfo {
        my ($geneRef) = @_;
        print "Protein $geneRef->{'protein_id'}\n";
        print "Strand $geneRef->{'strand'}\n";
        print "From: $geneRef->{'CDS'}[0] ";
        print "to: $geneRef->{'CDS'}[1]\n";
    }
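The same idea applies to arrays. A minimal sketch, assuming the CDS coordinates are inclusive on both ends (an assumption made for this example, as is the helper name `cdsLength`):

```perl
use strict;
use warnings;

# Hypothetical helper: receives a reference to a [START, END] array
sub cdsLength {
    my ($cdsRef) = @_;
    my ($start, $end) = @{$cdsRef};    # dereference the whole array
    return $end - $start + 1;          # assumes both coordinates are inclusive
}

my @cds    = (126, 523);
my $length = cdsLength(\@cds);         # pass a reference, not the array itself
print "CDS length: $length\n";         # CDS length: 398
```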

9 Sup.9 Advanced sorting
We learned the default sort, which is lexicographic:
    print sort("Yossi","Bracha","Moshe");    # sorted order: Bracha Moshe Yossi
    print sort(8,3,45,8.5);                  # sorted order: 3 45 8 8.5
To sort by a different order rule we need to give a comparison subroutine, that is, a subroutine that compares two scalars and says which comes first:
    sort COMPARE_SUB (LIST);
(Note: no comma between COMPARE_SUB and the list.)

10 Sup.10 Sorting numbers
    sort COMPARE_SUB (LIST);
(Note: no comma between COMPARE_SUB and the list.)
COMPARE_SUB is a special subroutine that compares two scalars $a and $b, and says which comes first (by returning 1, 0 or -1). For example:
    sub compareNumber {
        if ($a > $b) {return 1;}
        elsif ($a == $b) {return 0;}
        else {return -1;}
    }
    print sort compareNumber (8,3,45,8.5);    # sorted order: 3 8 8.5 45
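The comparison rule does not have to be numeric value. As a sketch of a different order rule, the hypothetical subroutine `compareLength` below (not from the slides) orders strings by their length, using the same 1 / 0 / -1 convention.

```perl
use strict;
use warnings;

# Hypothetical comparison subroutine: shorter strings come first
sub compareLength {
    if    (length($a) > length($b))  { return 1; }
    elsif (length($a) == length($b)) { return 0; }
    else                             { return -1; }
}

my @sorted = sort compareLength ("Bracha", "Yossi", "Dan");
print "@sorted\n";    # Dan Yossi Bracha
```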

11 Sup.11 The <=> operator
The <=> operator does exactly that: it returns 1 for "greater than", 0 for "equal" and -1 for "less than":
    sub compareNumber {
        return $a <=> $b;
    }
    print sort compareNumber (8,3,45,8.5);
For easier use, you can write the comparison as an inline block on the same line:
    print sort {return $a <=> $b;} (8,3,45,8.5);
or just:
    print sort {$a <=> $b;} (8,3,45,8.5);
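Because the block only decides which of $a and $b comes first, swapping the two operands reverses the order. A short sketch using the same numbers as the slide:

```perl
use strict;
use warnings;

my @numbers = (8, 3, 45, 8.5);

# Ascending numeric order
my @ascending = sort { $a <=> $b } @numbers;

# Descending numeric order: the operands are swapped
my @descending = sort { $b <=> $a } @numbers;

print "@ascending\n";     # 3 8 8.5 45
print "@descending\n";    # 45 8.5 8 3
```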

12 Sup.12 Now we can also sort complex data:
    @sortedGenes = sort compareGenes @genes;
    sub compareGenes {
        return $a->{"CDS"}[0] <=> $b->{"CDS"}[0];
    }
Each element of @genes is a reference to a hash of the form:
    {protein_id => PROTEIN_ID, strand => STRAND, CDS => [START, END]}

13 Sup.13 Now we can also sort complex data, comparing by CDS start and breaking ties by CDS end:
    @sortedGenes = sort compareGenes @genes;
    sub compareGenes {
        if ($a->{"CDS"}[0] != $b->{"CDS"}[0]) {
            return $a->{"CDS"}[0] <=> $b->{"CDS"}[0];
        }
        else {
            return $a->{"CDS"}[1] <=> $b->{"CDS"}[1];
        }
    }
Each element of @genes is a reference to a hash of the form:
    {protein_id => PROTEIN_ID, strand => STRAND, CDS => [START, END]}
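A runnable sketch of the two-key sort follows. The gene records are made-up illustration data, and the `||` form is a common shorthand swapped in for the slide's if/else: since <=> returns 0 on a tie, `||` falls through to the second comparison only when the starts are equal.

```perl
use strict;
use warnings;

# Made-up records in the form shown on the slide (hypothetical IDs and coordinates)
my @genes = (
    { protein_id => "geneC", strand => "+", CDS => [500, 900] },
    { protein_id => "geneA", strand => "-", CDS => [126, 523] },
    { protein_id => "geneB", strand => "+", CDS => [126, 300] },
);

# Compare by CDS start; on a tie (<=> gives 0) fall through to CDS end
sub compareGenes {
    return $a->{CDS}[0] <=> $b->{CDS}[0]
        || $a->{CDS}[1] <=> $b->{CDS}[1];
}

my @sortedGenes = sort compareGenes @genes;
my @ids = map { $_->{protein_id} } @sortedGenes;
print "@ids\n";    # geneB geneA geneC
```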

14 Sup.14 The operator cmp
The <=> operator returns 1, 0 or -1 if the first value is numerically greater than, equal to or less than the second, respectively. The equivalent alphabetical operator is cmp: it returns 1, 0 or -1 if the first value is alphabetically greater than, equal to or less than the second.
    sort @arr;
is equivalent to:
    sort {$a cmp $b} @arr;
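One place where writing cmp explicitly helps is case-insensitive sorting. The default order compares character codes, so uppercase letters sort before lowercase ones; applying lc to both sides removes that effect. A minimal sketch with made-up input:

```perl
use strict;
use warnings;

my @names = ("yossi", "Moshe", "bracha");

# Default sort: same as { $a cmp $b }, uppercase letters come first
my @defaultOrder = sort @names;

# Case-insensitive sort: compare lowercased copies of both values
my @ignoreCase = sort { lc($a) cmp lc($b) } @names;

print "@defaultOrder\n";    # Moshe bracha yossi
print "@ignoreCase\n";      # bracha Moshe yossi
```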

15 Sup.15 Class exercise 12
Write scripts that read an input file with the following data, sort the data and print it in sorted order to the screen:
1. Sort a file of grades and names according to the grades (e.g. grades.txt from the course website).
2. Sort a file where each line is a date, e.g. 24/7/2003 (e.g. dates.txt).
3. Sort the proteins in the file from ex. 9.1 by their lengths (create an array of keys sorted by the protein lengths).
4.* From home exercise 4: sort the CDSs from the adeno genome file:
   - First by the number of exons
   - Then by the length of the CDS (without the introns!)
   e.g. E1B 55K (1 exon, 1449bp) comes before E1A (2 exons, 801bp), but after E1B 19K (1 exon, 492bp).
   Use an array of gene hashes as in class ex. 10, and an appropriate comparison subroutine. Print the sorted protein IDs with their number of exons and lengths of CDS.

