

4.1 More loops

4.2 Loops
Commands inside a loop are executed repeatedly (iteratively):

    my $num = 0;
    print "Guess a number.\n";
    while ($num != 31) {
        $num = <STDIN>;
    }
    print "correct!\n";

    my @names = <STDIN>;
    chomp(@names);    # remove the trailing newlines
    my $name;
    foreach $name (@names) {
        print "Hello $name!\n";
    }

4.3 Loops: for
The for loop is controlled by three statements:
  1st is executed once, before the first iteration
  2nd is the condition tested before every iteration; the loop stops when it is false
  3rd is executed at the end of every iteration, before the condition is tested again

    for (my $i=0; $i<10; $i++) {
        print "$i\n";
    }

    my $i=0;
    while ($i<10) {
        print "$i\n";
        $i++;
    }

These are equivalent.
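A minimal sketch of that equivalence. The values are collected in arrays instead of being printed one by one so that the two loops can be compared; the array names @from_for and @from_while are illustrative and not part of the slides.

```perl
use strict;
use warnings;

# Collect the loop values instead of printing them (illustrative names).
my (@from_for, @from_while);

for (my $i = 0; $i < 10; $i++) {
    push @from_for, $i;
}

my $j = 0;                  # 1st statement: runs once, before the loop
while ($j < 10) {           # 2nd statement: tested before every iteration
    push @from_while, $j;
    $j++;                   # 3rd statement: runs at the end of every iteration
}

print "for:   @from_for\n";
print "while: @from_while\n";
```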

4.4 Breaking out of loops
next: skip to the next iteration
last: skip out of the loop

    my @lines = <STDIN>;
    my $line;
    foreach $line (@lines) {
        if (substr($line,0,1) eq ">") { next; }
        if (substr($line,0,8) eq "**stop**") { last; }
        print $line;
    }
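To try next and last without typing input, the same loop can be run over a fixed list. This is only a sketch: the helper name lines_to_print and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines that the loop above would print.
sub lines_to_print {
    my @lines = @_;
    my @kept;
    foreach my $line (@lines) {
        if (substr($line, 0, 1) eq ">")        { next; }   # skip header lines
        if (substr($line, 0, 8) eq "**stop**") { last; }   # leave the loop at the marker
        push @kept, $line;
    }
    return @kept;
}

# Made-up input: two headers, two sequence lines, a stop marker and a line after it.
my @input = (">header 1\n", "ACGT\n", ">header 2\n", "TTGA\n", "**stop**\n", "CCCC\n");
print lines_to_print(@input);
```

Only the two sequence lines before the **stop** marker are kept; the line after the marker is never reached.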

4.5 Breaking out of loops
die: end the program and print an error message to the standard error

    if ($score < 0) {
        die "score must be positive";
    }

prints:

    score must be positive at test.pl line 8.

Note: if you end the string with a "\n" then only your message will be printed.
* warn prints the message in the same way as die, but without ending the program.
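The effect of the trailing "\n" can be checked with a small sketch that catches the error with eval, so the script keeps running and both messages can be inspected. The helper check_score is hypothetical; eval and $@ are standard Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: dies with the given message if the score is negative.
sub check_score {
    my ($score, $message) = @_;
    if ($score < 0) { die $message; }
    return 1;
}

# eval { ... } catches the die and leaves the error message in $@.
eval { check_score(-5, "score must be positive") };
my $with_location = $@;     # Perl appended " at <file> line <n>."

eval { check_score(-5, "score must be positive\n") };
my $message_only = $@;      # nothing appended, because of the "\n"

print "without newline: $with_location";
print "with newline:    $message_only";
```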

4.6 The Programming Process

4.7 The programming process
It pays to plan ahead before writing a computer program:
  1. Define the purpose of the program
  2. Identify the required inputs
  3. Decide how to present the outputs
  4. Make an overall design of the program
  5. Refine the design, specify more details
  6. Write the code: one stage at a time, and test each stage
  7. Debug …

4.8 An example: SAGE libraries
SAGE (Serial Analysis of Gene Expression) is used to identify all transcripts that are expressed in a tissue:
  1. Double-stranded cDNA is generated from cell extracts
  2. The cDNA is cleaved with a restriction enzyme (NlaIII)
  3. The 3'-most fragment of each cDNA is then collected by its poly-A tail
  4. The fragments are ligated to linkers containing a recognition site for a type IIS restriction enzyme and a PCR primer site
  5. This restriction enzyme cuts 15bp away from its recognition site
  6. Ligation, PCR, cleavage, concatenation, cloning, sequencing … → a 10bp tag sequence from each mRNA
  7. The 10bp sequences are searched in an mRNA database and the corresponding genes are identified

4.9 An example: SAGE libraries SAGE (Serial Analysis of Gene Expression) is used to identify all transcripts that are expressed in a tissue:

4.10 Predicting the SAGE tag of an mRNA
It would be useful to know what tag to expect for each mRNA in the database. So let's write a script:
  1. Purpose: To predict the 10bp sequence of the SAGE tag of a given mRNA
  2. Inputs: A list of mRNA sequences in FASTA format

>gi| |ref|NM_ | Mus musculus EH-domain containing 4 (EHD4), mRNA
GTGGTATTTCTTCGTTGTCTCTGGCGTGGTCACGTTGATTGGTCCGCTATCTGGACCGAAAAAAGTCGTA
GTCGACGGCGATGGGTTCCTGGACTCTGACGAGTTCGCGCTGGCCTTGCACTTAATCAACGTCAAGCTGG
AAGGCTGCGAGCTGCCCACCGTGCTGCCGGAGCACTTAGTACCGCCGTCGAAGCGCTATGACTAGTGTCC
TGTAGCATACGCATACGCACACTAGATCACACAGCCTCACAATTCCCAAAAAAAAAAAAAAAA
>gi| |ref|NM_ | Mus musculus EH-domain containing 3 (EHD3), mRNA
GGTAGGGCGCTACCGCCTCCGCCCGCCTCTCGCGCTGTTCCTCCGCGGTATGCCCGCGCCGGCAGCCGGC
TATTATATAGAGAAATATATTGTGTATGTAGGATGTGCTTATTGCATTACATTTATCACTTGTCTTAACT
AGAATGCATTAACCTTTTTTGTACCCTGGTCCTAAAACATTATTAAAAAGAAAGGCTAAAAAAAAAAAAA
AAAA
>gi| |ref|NM_ | Mus musculus EH-domain containing 2 (Ehd2), mRNA
TGAGGGGGCCTGGGGCCCGCCCTGCTCGCCGCTCCTAGCGCACGCGGCCCCACCCGTCTCACTCCACTGC......

4.11 Decide how to present the results
Simply print the header line of each mRNA and then its predicted 10bp tag, like so:

>gi| |ref|NM_ | Mus musculus EH-domain containing 4 (EHD4), mRNA
ATCACACAGC
>gi| |ref|NM_ | Mus musculus EH-domain containing 3 (EHD3), mRNA
AATGCATTAA
...

4.12 Overall design:
  1. For each mRNA in the input:
     1. Read the sequence
     2. Find the most downstream recognition site of NlaIII (CTAG)
     3. Get the 10bp tag after that site
     4. Print it
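Steps 2 to 4 of this design can be sketched for a single sequence that is already held in a string. This is a shortcut and not the slides' method: it uses Perl's built-in rindex, which returns the position of the last occurrence of a substring (or -1 if there is none). The helper name predict_tag is hypothetical. The slides give CTAG as the site; since the published NlaIII recognition sequence is CATG, the site is passed as a parameter rather than hard-coded. The test sequence is the end of the EHD4 example above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the tag that follows the last occurrence of
# $site in $seq, or undef if the site is absent or fewer than $tag_len bases
# follow it.
sub predict_tag {
    my ($seq, $site, $tag_len) = @_;
    my $pos = rindex($seq, $site);          # -1 when the site is absent
    return if $pos < 0;
    my $tag = substr($seq, $pos + length($site), $tag_len);
    return length($tag) == $tag_len ? $tag : undef;
}

# Tail of the EHD4 sequence from the input example.
my $seq = "GCATACGCACACTAGATCACACAGCCTCACAATTCCCAAAAAAAAAAAAAAAA";
my $tag = predict_tag($seq, "CTAG", 10);
print defined $tag ? "$tag\n" : "no tag\n";
```

For this sequence the sketch prints ATCACACAGC, the tag shown for EHD4 in the expected output.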

4.13 Flow diagram: Start → Read sequence → Find most downstream CTAG → Get the 10bp tag → Print the tag → End of input? (No: back to Read sequence; otherwise End)

4.14 Refine the design, specify more details:
  1. For each mRNA in the input (use a loop):
     1. Read the sequence
        1. Store its header line in one string variable
        2. Concatenate all lines of the sequence and store it in another string variable
     2. Find the most downstream recognition site of NlaIII (CTAG)
        1. Go over the sequence with a loop, starting from the 3' tail, and going back until the first CTAG is found
     3. Get the 10bp tag after that site
        1. Take a substr of length 10
     4. Print it
  6. Write the code

4.15 [Flow diagram: the "Read sequence" step expanded into: Read line → Save header → Read line → Header? (No: Concatenate to sequence, then Read line again; Yes: continue to Find most downstream CTAG → Get the 10bp tag → Print the tag → End of input?)]

4.16 [Flow diagram: the "Find most downstream CTAG" step expanded into: Start pos. at end of sequence → Check pos. for "CTAG" → "CTAG" at pos? (Yes: leave the loop; otherwise pos--, then Pos < 0?)]

4.17 [Flow diagram: as in 4.16, with the case Pos < 0? Yes → Print "no tag"]

4.18 [Flow diagram: as in 4.17, followed by a second test after the loop: Pos < 0? (Yes: Print "no tag"; No: Print tag)]
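The search loop in these diagrams might look like the sketch below. It is one possible reading of the flow chart, not the course's own code: the helper name find_last_site and the two test sequences are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper following the flow diagram: start at the end of the
# sequence and step backwards until the site is found or pos drops below 0.
sub find_last_site {
    my ($seq, $site) = @_;
    my $pos = length($seq) - length($site);    # last position where the site fits
    while ($pos >= 0) {
        if (substr($seq, $pos, length($site)) eq $site) { last; }   # "CTAG" at pos?
        $pos--;
    }
    return $pos;                                # a negative value means "not found"
}

my $site = "CTAG";
foreach my $seq ("TTCTAGAATGCATTAACCTTTTAAAA", "TTTTAAAA") {
    my $pos = find_last_site($seq, $site);
    if ($pos < 0) {
        print "no tag\n";
    } else {
        print substr($seq, $pos + length($site), 10), "\n";   # the 10bp tag
    }
}
```

The first sequence yields the tag AATGCATTAA; the second contains no site, so the sketch prints "no tag".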

4.19 FASTA: Analyzing complex input
Overall design:
  1. Read the sequence
  2. Do something
Let's see how it's done…
[Flow diagram: Start → Read line → Save header → Read line → Header? (No: Concatenate to sequence, then Read line again; Yes: Do something) → End of input? (No: repeat; otherwise End)]

4.20

    $line = <STDIN>;
    my $endOfInput = 0;
    while ($endOfInput==0) {
        # 1.1. Read sequence name from FASTA header
        if (substr($line,0,1) eq ">") {
            $name = substr($line,1);
        }
        else ...
        # 1.2. Read sequence until next FASTA header
        $seq = "";
        $line = <STDIN>;
        while (substr($line,0,1) ne ">") {
            $seq = $seq . $line;
            $line = <STDIN>;
            if (!defined($line)) {
                $endOfInput = 1;
                last;
            }
        }
        # 2. Do something
        ...
    }

4.21

    ###################################
    # 1. Foreach sequence in the input
    my ($line, $name, $seq);
    $line = <STDIN>;
    chomp $line;
    my $endOfInput = 0;
    while ($endOfInput==0) {
        ################################
        # 1.1. Read sequence name from FASTA header
        if (substr($line,0,1) eq ">") {
            $name = substr($line,1);
        }
        else {
            die "bad FASTA format";
        }
        # 1.2. Read sequence until next FASTA header
        $seq = "";
        $line = <STDIN>;
        chomp $line;
        # Read until next header or end of input
        while (substr($line,0,1) ne ">") {
            $seq = $seq . $line;
            $line = <STDIN>;
            if (!defined($line)) {
                $endOfInput = 1;
                last;
            }
            chomp $line;
        }
        ################################
        # 2. Do something
        ...
    }
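The same reading logic can be tried without typing input by wrapping it in a subroutine and reading from an in-memory filehandle. This is a sketch under assumptions: the helper name read_fasta, the returned [name, sequence] pairs and the two short records are all invented for illustration, and the end-of-input test is folded into the loop conditions instead of using a flag variable.

```perl
use strict;
use warnings;

# Hypothetical helper: reads FASTA records from a filehandle and returns
# a list of [name, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    my $line = <$fh>;
    while (defined $line) {
        chomp $line;
        # 1.1. Read sequence name from FASTA header
        if (substr($line, 0, 1) ne ">") { die "bad FASTA format\n"; }
        my $name = substr($line, 1);
        # 1.2. Read sequence until next FASTA header or end of input
        my $seq = "";
        while (defined($line = <$fh>) && substr($line, 0, 1) ne ">") {
            chomp $line;
            $seq .= $line;
        }
        # 2. "Do something": here the record is simply stored
        push @records, [$name, $seq];
    }
    return @records;
}

# An in-memory filehandle stands in for STDIN so the sketch is self-contained.
my $fasta = ">seq1 example\nACGT\nTTGA\n>seq2 example\nCCCC\n";
open(my $fh, "<", \$fasta) or die "cannot open in-memory file: $!";
foreach my $record (read_fasta($fh)) {
    print "$record->[0]: $record->[1]\n";
}
close($fh);
```

Replacing $fh with \*STDIN in the call would read from standard input, as the slide code does.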

4.22 FASTA: An alternative approach (which is more confusing and generally not recommended!)

    my @lines = <STDIN>;
    my $oneline = join("", @lines);    # Concatenate all lines
    my ($start, $end, $name);
    for (my $i=0; $i<length($oneline); $i++) {
        my $c = substr($oneline,$i,1);
        my $sub10 = substr($oneline,$i,10);
        if ($c eq ">") {
            # Save header start position
            $start = ($i+1);
        }
        if ($c eq "]") {
            # Save header end position
            $end = $i;
        }
        if(???) {
            # If we found what we were looking for...
            # Print last header
            $name = substr($oneline,$start,$end-$start+1);
        }
    }