


1 4.1 More loops

2 4.2 Loops
Commands inside a loop are executed repeatedly (iteratively):

    my $num = 0;
    print "Guess a number.\n";
    while ($num != 31) {
        $num = <STDIN>;
    }
    print "correct!\n";

    my @names = <STDIN>;
    chomp(@names);
    my $name;
    foreach $name (@names) {
        print "Hello $name!\n";
    }

3 4.3 Loops: for
The for loop is controlled by three statements:
1. The 1st is executed before the first iteration
2. The 2nd is the stop condition: the loop keeps running only while it is true
3. The 3rd is executed before every re-iteration

    for (my $i=0; $i<10; $i++) {
        print "$i\n";
    }

    my $i=0;
    while ($i<10) {
        print "$i\n";
        $i++;
    }

These two loops are equivalent.
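The three control statements need not count upwards. As a minimal sketch (the sub name reverse_seq and the sample string are made up for this illustration), the loop below starts at the last position of a string and steps backwards:

```perl
use strict;
use warnings;

# Hypothetical helper: build the reverse of a string with a for loop.
sub reverse_seq {
    my ($seq) = @_;
    my $rev = "";
    # 1st statement: start at the last position
    # 2nd statement: keep going while the position is still valid
    # 3rd statement: step one position back before each re-iteration
    for (my $i = length($seq) - 1; $i >= 0; $i--) {
        $rev = $rev . substr($seq, $i, 1);
    }
    return $rev;
}

print reverse_seq("GATTACA"), "\n";   # prints ACATTAG
```

Going over a sequence from its end towards its start is the same pattern used later to look for the most downstream site.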

4 4.4 Breaking out of loops
next – skip to the next iteration
last – skip out of the loop

    my @lines = <STDIN>;
    foreach my $line (@lines) {
        if (substr($line,0,1) eq ">") {
            next;
        }
        if (substr($line,0,8) eq "**stop**") {
            last;
        }
        print $line;
    }
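To try next and last without typing input, the same loop can be run over a fixed list. This is only a sketch: the sub name filter_lines and the sample lines are assumptions, and the kept lines are collected with push instead of being printed inside the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that the loop above would print.
sub filter_lines {
    my @lines = @_;
    my @kept;
    foreach my $line (@lines) {
        if (substr($line, 0, 1) eq ">") {
            next;                      # header line: skip to the next line
        }
        if (substr($line, 0, 8) eq "**stop**") {
            last;                      # stop marker: leave the loop entirely
        }
        push @kept, $line;
    }
    return @kept;
}

my @out = filter_lines(">header\n", "ACGT\n", "**stop**\n", "TTTT\n");
print @out;   # prints ACGT
```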

5 4.5 Breaking out of loops
die – end the program and print an error message to the standard error

    if ($score < 0) {
        die "score must be positive";
    }

This prints:

    score must be positive at test.pl line 8.

Note: if you end the string with a "\n" then only your message will be printed.
* warn does the same thing as die without ending the program
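The difference between die and warn, and the effect of the trailing "\n", can be sketched as follows. The sub name check_score and the upper limit of 100 are hypothetical and serve only to give warn something to report.

```perl
use strict;
use warnings;

# Hypothetical helper: warn on a suspicious score, die on an invalid one.
sub check_score {
    my ($score) = @_;
    if ($score < 0) {
        # The message ends with "\n", so only the message itself is printed
        die "score must be positive\n";
    }
    if ($score > 100) {
        # No "\n" here, so Perl appends the file name and line number
        warn "score is unusually high";
    }
    return $score;
}

check_score(150);            # prints a warning to standard error and continues
print "still running\n";     # reached, because warn does not end the program
```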

6 4.6 The Programming Process

7 4.7 The programming process
It pays to plan ahead before writing a computer program:
1. Define the purpose of the program
2. Identify the required inputs
3. Decide how to present the outputs
4. Make an overall design of the program
5. Refine the design, specify more details
6. Write the code – one stage at a time and test each stage
7. Debug …

8 4.8 An example: SAGE libraries
SAGE (Serial Analysis of Gene Expression) is used to identify all transcripts that are expressed in a tissue:
1. Double-stranded cDNA is generated from cell extracts
2. The cDNA is cleaved with a restriction enzyme (NlaIII)
3. The 3'-most fragment of each cDNA is then collected by its poly-A tail
4. The fragments are ligated to linkers containing a recognition site for a type IIS restriction enzyme and a PCR primer site
5. This restriction enzyme cuts 15bp away from its recognition site
6. Ligation, PCR, cleavage, concatenation, cloning, sequencing … → a 10bp tag sequence from each mRNA
7. The 10bp sequences are searched in an mRNA database and the corresponding genes are identified
[Figure: panels labelled (1), (2&3) and (4&5), matching the steps above]

9 4.9 An example: SAGE libraries SAGE (Serial Analysis of Gene Expression) is used to identify all transcripts that are expressed in a tissue:

10 4.10 Predicting the SAGE tag of an mRNA
It would be useful to know what tag to expect for each mRNA in the database. So let's write a script:
1. Purpose: To predict the 10bp sequence of the SAGE tag of a given mRNA
2. Inputs: A list of mRNA sequences in FASTA format

    >gi|24646380|ref|NM_079608.2| Mus musculus EH-domain containing 4 (EHD4), mRNA
    GTGGTATTTCTTCGTTGTCTCTGGCGTGGTCACGTTGATTGGTCCGCTATCTGGACCGAAAAAAGTCGTA......
    GTCGACGGCGATGGGTTCCTGGACTCTGACGAGTTCGCGCTGGCCTTGCACTTAATCAACGTCAAGCTGG
    AAGGCTGCGAGCTGCCCACCGTGCTGCCGGAGCACTTAGTACCGCCGTCGAAGCGCTATGACTAGTGTCC
    TGTAGCATACGCATACGCACACTAGATCACACAGCCTCACAATTCCCAAAAAAAAAAAAAAAA
    >gi|71895640|ref|NM_001031040.1| Mus musculus EH-domain containing 3 (EHD3), mRNA
    GGTAGGGCGCTACCGCCTCCGCCCGCCTCTCGCGCTGTTCCTCCGCGGTATGCCCGCGCCGGCAGCCGGC......
    TATTATATAGAGAAATATATTGTGTATGTAGGATGTGCTTATTGCATTACATTTATCACTTGTCTTAACT
    AGAATGCATTAACCTTTTTTGTACCCTGGTCCTAAAACATTATTAAAAAGAAAGGCTAAAAAAAAAAAAA
    AAAA
    >gi|55742710|ref|NM_153068.2| Mus musculus EH-domain containing 2 (Ehd2), mRNA
    TGAGGGGGCCTGGGGCCCGCCCTGCTCGCCGCTCCTAGCGCACGCGGCCCCACCCGTCTCACTCCACTGC......

11 4.11
3. Decide how to present the results: simply print the header line of each mRNA and then its predicted 10bp tag, like so:

    >gi|24646380|ref|NM_079608.2| Mus musculus EH-domain containing 4 (EHD4), mRNA
    ATCACACAGC
    >gi|71895640|ref|NM_001031040.1| Mus musculus EH-domain containing 3 (EHD3), mRNA
    AATGCATTAA
    ...

12 4.12
4. Overall design:
   1. For each mRNA in the input:
      1. Read the sequence
      2. Find the most downstream recognition site of NlaIII (CTAG)
      3. Get the 10bp tag after that site
      4. Print it

13 4.13 Flow diagram:
[Start → Read sequence → Find most downstream CTAG → Get the 10bp tag → Print the tag → End of input? (No: back to Read sequence; otherwise End)]

14 4.14
5. Refine the design, specify more details:
   1. For each mRNA in the input (use a loop):
      1. Read the sequence
         1. Store its header line in one string variable
         2. Concatenate all lines of the sequence and store it in another string variable
      2. Find the most downstream recognition site of NlaIII (CTAG)
         1. Go over the sequence with a loop, starting from the 3' tail, and going back until the first CTAG is found
      3. Get the 10bp tag after that site
         1. Take a substr of length 10
      4. Print it
6. Write the code
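Steps 2 and 3 of the refined design can be sketched as a small sub. The name predict_tag and the test sequence are made up for illustration, and returning undef when no site is found is an assumption, since the design does not yet say what to do in that case. A site with fewer than 10 bases after it yields a shorter tag, because substr simply returns what is left.

```perl
use strict;
use warnings;

# Hypothetical helper: return the (up to) 10bp that follow the most
# downstream CTAG in $seq, or undef if the sequence contains no CTAG.
sub predict_tag {
    my ($seq) = @_;
    # Start at the last position where a 4-letter site can begin
    # and walk back towards the 5' end.
    for (my $pos = length($seq) - 4; $pos >= 0; $pos--) {
        if (substr($seq, $pos, 4) eq "CTAG") {
            # First hit from the 3' tail is the most downstream site
            return substr($seq, $pos + 4, 10);
        }
    }
    return undef;   # assumption: no site means no tag
}

# Two sites; the tag is taken after the second (most downstream) one
print predict_tag("GGCTAGAAAACTAGATCACACAGCCTCAAAA"), "\n";   # prints ATCACACAGC
```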

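15 4.15
[Flow diagram with "Read sequence" expanded: Start → Read line → Save header → Read line → Header? (No: Concatenate to sequence, Read line, check again; Yes: sequence complete) → Find most downstream CTAG → Get the 10bp tag → Print the tag → End of input? (No: repeat; otherwise End)]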

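16 4.16
[Flow diagram with "Find most downstream CTAG" expanded: Start pos. at end of sequence → Check pos. for "CTAG" → "CTAG" at pos? (Yes: go on to Get the 10bp tag; otherwise pos--, then Pos < 0?, and check again). The surrounding steps are unchanged: Read sequence → Find most downstream CTAG → Get the 10bp tag → Print the tag → End of input? (No: repeat; otherwise End)]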

17 4.17
[Flow diagram, "Find most downstream CTAG": Start pos. at end of sequence → Check pos. for "CTAG" → "CTAG" at pos? (Yes: site found; otherwise pos--) → Pos < 0? (Yes: Print "no tag"; otherwise check again)]

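18 4.18
[Flow diagram, "Find most downstream CTAG": Start pos. at end of sequence → Check pos. for "CTAG" → "CTAG" at pos? (Yes: leave the loop; otherwise pos--) → Pos < 0? (Yes: leave the loop; otherwise check again). After the loop: Pos < 0? (Yes: Print "no tag"; No: Print tag)]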
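The backwards search in the diagram can also be delegated to Perl's built-in rindex, which returns the position of the last occurrence of a substring, or -1 if there is none. This is a swapped-in shortcut, not the loop drawn on the slide; the sub name tag_line and the sample sequences are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the tag after the most downstream CTAG,
# or the string "no tag" when the sequence has no CTAG.
sub tag_line {
    my ($seq) = @_;
    my $pos = rindex($seq, "CTAG");   # -1 corresponds to "Pos < 0" in the diagram
    if ($pos < 0) {
        return "no tag";
    }
    return substr($seq, $pos + 4, 10);
}

print tag_line("GGCTAGATCACACAGCCTC"), "\n";   # prints ATCACACAGC
print tag_line("GGGGAAAA"), "\n";              # prints no tag
```

Writing the loop by hand, as in the diagram, is still a useful exercise because it makes the stop conditions explicit.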

19 4.19 FASTA: Analyzing complex input
Overall design:
1. Read the sequence
2. Do something
Let's see how it's done…
[Flow diagram: Start → Read line → Save header → Read line → Header? (No: Concatenate to sequence, Read line, check again; Yes: sequence complete) → Do something → End of input? (No: repeat; otherwise End)]

20 4.20

    $line = <STDIN>;
    my $endOfInput = 0;
    while ($endOfInput==0) {
        # 1.1. Read sequence name from FASTA header
        if (substr($line,0,1) eq ">") {
            $name = substr($line,1);
        }
        else...

        # 1.2. Read sequence until next FASTA header
        $seq = "";
        $line = <STDIN>;
        while (substr($line,0,1) ne ">") {
            $seq = $seq . $line;
            $line = <STDIN>;
            if (!defined($line)) {
                $endOfInput = 1;
                last;
            }
        }

        # 2. Do something...
    }

[Flow diagram repeated from slide 4.19]

21 4.21

    ###################################
    # 1. Foreach sequence in the input
    my (@lines, $line, $name, $seq);
    $line = <STDIN>;
    chomp $line;
    my $endOfInput = 0;
    while ($endOfInput==0) {
        ################################
        # 1.1. Read sequence name from FASTA header
        if (substr($line,0,1) eq ">") {
            $name = substr($line,1);
        }
        else {
            die "bad FASTA format";
        }

        # 1.2. Read sequence until next FASTA header
        $seq = "";
        $line = <STDIN>;
        chomp $line;
        # Read until next header or end of input
        while (substr($line,0,1) ne ">") {
            $seq = $seq . $line;
            $line = <STDIN>;
            if (!defined($line)) {
                $endOfInput = 1;
                last;
            }
            chomp $line;
        }

        ################################
        # 2. Do something...
    }

[Flow diagram repeated from slide 4.19]
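For comparison, here is a sketch of the same reading logic arranged around a single foreach loop over lines already held in an array, so that it runs without standard input. It is not the slide's structure. The sub name read_fasta, the sample records and the use of array references to pair each name with its sequence are all assumptions made for this illustration. The "Do something" step is filled in with the tag lookup via rindex.

```perl
use strict;
use warnings;

# Hypothetical helper: turn FASTA lines into a list of [name, sequence] pairs.
sub read_fasta {
    my @lines = @_;
    my @records;
    foreach my $line (@lines) {
        chomp $line;
        if (substr($line, 0, 1) eq ">") {
            # Header line: start a new record with an empty sequence
            push @records, [substr($line, 1), ""];
        } elsif (@records) {
            # Sequence line: concatenate to the current record
            $records[-1][1] = $records[-1][1] . $line;
        } else {
            # Sequence data before any header
            die "bad FASTA format\n";
        }
    }
    return @records;
}

# Made-up input: the site in seq1 is split across two lines on purpose
my @input = (">seq1 example\n", "GGCT\n", "AGATCACACAGCCTC\n",
             ">seq2 example\n", "GGGGAAAA\n");
foreach my $rec (read_fasta(@input)) {
    my ($name, $seq) = @$rec;
    my $pos = rindex($seq, "CTAG");
    my $tag = $pos < 0 ? "no tag" : substr($seq, $pos + 4, 10);
    print ">$name\n$tag\n";
}
```

Because the lines of each sequence are concatenated before searching, a recognition site that is split across two input lines is still found.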

22 4.22 FASTA: An alternative approach (which is more confusing and generally not recommended!)

    my @fasta = <STDIN>;
    my $oneline = join("", @fasta);   # Concatenate all lines
    my ($start, $end, $name);
    for (my $i=0; $i<length($oneline); $i++) {
        my $c = substr($oneline,$i,1);
        my $sub10 = substr($oneline,$i,10);
        if ($c eq ">") {
            # Save header start position
            $start = ($i+1);
        }
        if ($c eq "]") {
            # Save header end position
            $end = $i;
        }
        if(???) {
            # If we found what we were looking for...
            # Print last header
            $name = substr($oneline,$start,$end-$start+1);
        }
    }

