BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment


1 BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment
#7 Still more DP BCB 444/544 9/5/07 Lecture 7 Still more: Dynamic Programming Global vs Local Alignment Scoring Matrices & Alignment Statistics BLAST #7_Sept5 BCB 444/544 F07 ISU Dobbs #7 - Still More DP BCB 444/544 Fall 07 Dobbs

2 Required Reading (before lecture)
√ Last week - for Lectures 4-7: Pairwise Sequence Alignment, Dynamic Programming, Global vs Local Alignment, Scoring Matrices, Statistics
  Xiong: Chp 3
  Eddy: What is Dynamic Programming? 2004 Nature Biotechnol 22:909
Wed Sept 5 - for Lecture 7 & Lab 3: Database Similarity Searching: BLAST (more DP!!)
  Chp 4 - pp 51-62
Fri Sept - for Lecture 8: BLAST variations; BLAST vs FASTA

3 Assignments & Announcements
√ Tues Sept 4 - Lab #2 Exercise Writeup due by 5 PM
  Send via to Pete Zaback (For now, no late penalty - just send ASAP)
√ Wed Sept 5 - Notes for Lecture 5 posted online
  - HW#2 posted online & sent via & handed out in class
Fri Sept HW#2 Due by 5 PM
Fri Sept Exam #1

4 Chp 3- Sequence Alignment
SECTION II SEQUENCE ALIGNMENT
Xiong: Chp 3 Pairwise Sequence Alignment
  √ Evolutionary Basis
  √ Sequence Homology versus Sequence Similarity
  √ Sequence Similarity versus Sequence Identity
  Methods - cont
  Scoring Matrices
  Statistical Significance of Sequence Alignment
Adapted from Brown and Caragea, 2007, with some slides from: Altman, Fernandez-Baca, Batzoglou, Craven, Hunter, Page.

5 Methods
√ Global and Local Alignment
√ Alignment Algorithms
  √ Dot Matrix Method
  Dynamic Programming Method - cont
    Gap penalties
    DP for Global Alignment
    DP for Local Alignment
Scoring Matrices
  Amino acid scoring matrices
  PAM
  BLOSUM
  Comparisons between PAM & BLOSUM
Statistical Significance of Sequence Alignment

6 Global vs Local Alignment
Global alignment
  Finds best possible alignment across entire length of 2 sequences
  Aligned sequences assumed to be generally similar over entire length
Local alignment
  Finds local regions with highest similarity between 2 sequences
  Aligns these without regard for rest of sequence
  Sequences are not assumed to be similar over entire length

7 Global vs Local Alignment - example
Sequence 1 = CTGTCGCTGCACG
Sequence 2 = TGCCGTG
Global alignment (two candidate placements of sequence 2):
  CTGTCGCTGCACG
  -TGCCG-T----G
  -TG-C-C-G--TG
Local alignment:
  CTGTCGCTGCACG
  -TGCCG-TG----
Which is better?

8 Global vs Local Alignment Which should be used when?
It is critical to choose correct method! Global Alignment vs Local Alignment? Shout out the answers!!
Which should we use for?
  Searching for conserved motifs in DNA or protein sequences?
  Aligning two closely related sequences with similar lengths?
  Aligning highly divergent sequences?
  Generating an extended alignment of closely related sequences?
  Generating an extended alignment of closely related sequences with very different lengths? Hmmm - we'll work on that
Excellent!

9 Global vs Local Alignment Which should be used when?
It is critical to choose correct method! Global Alignment vs Local Alignment? Shout out the answers!!
Which should we use for?
  Searching for conserved motifs in DNA or protein sequences? Local
  Aligning two closely related sequences with similar lengths? Global
  Aligning highly divergent sequences? Local (at least initially)
  Generating an extended alignment of closely related sequences? Global
  Generating an extended alignment of closely related sequences with very different lengths? Hmmm - we'll work on that

10 Alignment Algorithms
3 major methods for pairwise sequence alignment:
  Dot matrix analysis √ - practice in HW2
  Dynamic programming - more today & in HW2
  Word or k-tuple methods (later, in Chp 4)

11 Dynamic Programming
For Pairwise sequence alignment
Idea: Display one sequence above another with spaces inserted in both to reveal similarity
  C A T - T C A - C
  |   |     | |   |
  C - T C G C A G C
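The display idea above can be sketched in a few lines of Python. This is only an illustration: the function name `match_line` and the use of `-` as the space character are assumptions made for this example, not part of the lecture.

```python
def match_line(top: str, bottom: str, gap: str = "-") -> str:
    """Return a row with '|' under every column where both rows hold the same residue."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if a == b and a != gap else " "
        for a, b in zip(top, bottom)
    )


# The two rows from the slide, written without the separating blanks.
top = "CAT-TCA-C"
bottom = "C-TCGCAGC"
print(top)
print(match_line(top, bottom))
print(bottom)
```

Columns that contain a space in either row, or two different residues, get no bar, so only the five matching columns are marked.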

12 Global Alignment: Scoring
  CTGTCG-CTGCACG
  -TGC-CG-TG----
Reward for matches: $\alpha$
Mismatch penalty: $\beta$
Space/gap penalty: $\gamma$
Score = $\alpha w - \beta x - \gamma y$, where w = #matches, x = #mismatches, y = #spaces
Note: I changed symbols & colors on this slide!

13 Global Alignment: Scoring
Reward for matches: 10
Mismatch penalty: -2
Space/gap penalty: -5
  C T G T C G - C T G C
  - T G C - C G - T G -
Total = 11 (4 matches, 2 mismatches, 5 spaces: 40 - 4 - 25 = 11)
Note: I changed symbols & colors on this slide!
We could have done better!!
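A minimal sketch of this column-by-column scoring, assuming the slide's values (+10 match, -2 mismatch, -5 space) and `-` as the space character. The function name `score_alignment` is hypothetical.

```python
def score_alignment(top, bottom, match=10, mismatch=-2, gap=-5, gap_char="-"):
    """Sum the per-column scores of an already aligned pair of rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == gap_char or b == gap_char:
            total += gap          # one residue lined up with a space
        elif a == b:
            total += match        # identical residues
        else:
            total += mismatch     # two different residues
    return total


# The alignment shown on the slide.
print(score_alignment("CTGTCG-CTGC", "-TGC-CG-TG-"))  # 11
```

The penalties are passed as negative numbers and added, which matches the way the slide lists them (-2 and -5).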

14 Alignment Algorithms
Global: Needleman-Wunsch
Local: Smith-Waterman
Both NW and SW use dynamic programming
Variations:
  Gap penalty functions
  Scoring matrices
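The slide only names the two algorithms. As a rough sketch of how the local variant differs, the standard Smith-Waterman recurrence never lets a cell drop below 0 and reports the best cell anywhere in the matrix instead of the lower right corner. The function name and the scoring values (+10, -2, -5, borrowed from the global example in this lecture) are assumptions for illustration.

```python
def local_alignment_score(x, y, match=10, mismatch=-2, gap=-5):
    """Best local alignment score under a linear gap penalty (Smith-Waterman)."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a fresh local alignment here
                H[i - 1][j - 1] + sub,  # extend with a match or mismatch
                H[i - 1][j] + gap,      # extend with a space in y
                H[i][j - 1] + gap,      # extend with a space in x
            )
            best = max(best, H[i][j])
    return best


print(local_alignment_score("AAA", "TTAAATT"))  # 30: the AAA core, flanks ignored
```

Only the score is computed here; recovering the aligned region would need a traceback that starts at the best cell and stops at the first 0.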

15 Dynamic Programming - Key Idea:
The score of the best possible alignment that ends at a given pair of positions (i, j) is equal to:
  the score of best alignment ending just previous to those two positions (i.e., ending at i-1, j-1)
  PLUS
  the score for aligning xi and yj

16 Global Alignment: DP Problem Formulation & Notations
Given two sequences (strings)
  X = x1x2…xN of length N (example: x = AGC, N = 3)
  Y = y1y2…yM of length M (example: y = AAAC, M = 4)
Construct a matrix with (N+1) x (M+1) elements, where
  S(i,j) = Score of best alignment of x[1..i] = x1x2…xi with y[1..j] = y1y2…yj
Example: S(2,3) = score of best alignment of AG (x1x2) to AAA (y1y2y3)
Which means: S(i,j) = Score of best alignment of a prefix of X and a prefix of Y

17 Dynamic Programming - 4 Steps:
1. Define score of optimal alignment, using recursion
2. Initialize and fill in a DP matrix for storing optimal scores of subproblems, by solving smallest subproblems first (bottom-up approach)
3. Calculate score of optimal alignment(s)
4. Trace back through matrix to recover optimal alignment(s) that generated optimal score

18 1- Define Score of Optimal Alignment using Recursion
Define:
  $\alpha$ = Match Reward
  $\beta$ = Mismatch Penalty
  $\gamma$ = Gap penalty
  $\sigma(x_i, y_j) = \alpha$ if $x_i = y_j$ (match), $-\beta$ otherwise (mismatch)
Initial conditions:
  $S(0,0) = 0$, $S(i,0) = -i\gamma$, $S(0,j) = -j\gamma$
Recursive definition: For $1 \le i \le N$, $1 \le j \le M$:
  $S(i,j) = \max\{\, S(i-1,j-1) + \sigma(x_i, y_j),\ S(i-1,j) - \gamma,\ S(i,j-1) - \gamma \,\}$

19 2- Initialize & Fill in DP Matrix for Storing Optimal Scores of Subproblems
Construct sequence vs sequence matrix
Fill in from [0,0] to [N,M] (row by row), calculating best possible score for each alignment ending at residues at [i,j]
[Figure: DP matrix with positions 1..N along one axis and 1..M along the other, showing S(0,0)=0 in the upper left corner, a generic cell S(i,j), and S(N,M) in the lower right corner]

20 How do we calculate S(i,j)?
i.e., Score for alignment of x[1..i] to y[1..j]?
1 of 3 cases gives the optimal score for this subproblem:
  xi aligns to yj: x1 … xi-1 xi over y1 … yj-1 yj, score $S(i-1,j-1) + \sigma(x_i, y_j)$
  xi aligns to a gap: x1 … xi-1 xi over y1 … yj -, score $S(i-1,j) - \gamma$
  yj aligns to a gap: x1 … xi - over y1 … yj-1 yj, score $S(i,j-1) - \gamma$
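The three cases can be written as a small helper that scores one cell from its three neighbors. This is a sketch under assumptions: the name `cell_score`, the argument names, and the default scores (+10, -2, -5, as used in the lecture's worked example) are chosen here for illustration.

```python
def cell_score(s_diag, s_x_gap, s_y_gap, xi, yj, match=10, mismatch=-2, gap=-5):
    """Score one DP cell S(i,j) and report which of the three cases achieve it.

    s_diag  is S(i-1, j-1), s_x_gap is S(i-1, j), s_y_gap is S(i, j-1).
    """
    candidates = {
        "xi aligns to yj": s_diag + (match if xi == yj else mismatch),
        "xi aligns to a gap": s_x_gap + gap,
        "yj aligns to a gap": s_y_gap + gap,
    }
    best = max(candidates.values())
    # Keep every case that ties for the best score; ties mean more than one
    # optimal alignment passes through this cell.
    cases = [name for name, value in candidates.items() if value == best]
    return best, cases


# Neighbor scores 10, 13 and 5 around a G/T mismatch, as in the lecture's example matrix.
print(cell_score(10, 13, 5, "G", "T"))
```

Recording the winning case (or cases) for every cell is what the later slides call the pointer matrix.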

21 Specific Example:
Note: I changed sequences on this slide (to match the rest of DP example)
Scoring Consequence?
Case 1: Line up xi with yj (Match Bonus)
  x: C - T C G C A
  y: C A T - T C A
Case 2: Line up xi with space (Space Penalty)
  x: C - T C G C - A
  y: C A T - T C A -
Case 3: Line up yj with space (Space Penalty)
  x: C - T C G C A -
  y: C A T - T C - A

22 Ready? Fill in DP Matrix
Keep track of dependencies of scores (in a pointer matrix)
Initialization: S(0,0) = 0
Recursion: S(i,j) is the best of $S(i-1,j-1) + \sigma(x_i, y_j)$, $S(i-1,j) - \gamma$ and $S(i,j-1) - \gamma$
  $\alpha$ = Match Reward, $\beta$ = Mismatch Penalty, $\gamma$ = Gap penalty; $\sigma(x_i, y_j) = \alpha$ or $-\beta$
[Figure: DP matrix from S(0,0) to S(N,M) with arrows into S(i,j) from S(i-1,j-1), S(i-1,j) and S(i,j-1)]

23 Fill in the DP matrix !!
[Figure: partially filled DP matrix with top sequence λ C T C G C A G C and left sequence λ C A T T C A C; the border cells hold the gap scores -5, -10, -15, ... and the first interior cells hold 10 and 5]
We first compute S(i, j) for the smallest possible values of i and j, then for increasing values of i and j
Usually performed with a table of size (N + 1) x (M + 1)
+10 for match, -2 for mismatch, -5 for space

24 3- Calculate Score S(N,M) of Optimal Alignment - for Global Alignment
        λ    C    T    C    G    C    A    G    C
   λ    0   -5  -10  -15  -20  -25  -30  -35  -40
   C   -5   10    5    0   -5  -10  -15  -20  -25
   A  -10    5    8    3   -2   -7    0   -5  -10
   T  -15    0   15   10    5    0   -5   -2   -7
   T  -20   -5   10   13    8    3   -2   -7   -4
   C  -25  -10    5   20   15   18   13    8    3
   A  -30  -15    0   15   18   13   28   23   18
   C  -35  -20   -5   10   13   28   23   26   33
We first compute S(i, j) for the smallest possible values of i and j, then for increasing values of i and j
Usually performed with a table of size (N + 1) x (M + 1)
+10 for match, -2 for mismatch, -5 for space
Score of optimal global alignment: S(N,M) = 33 (lower right corner)
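The fill step for this example can be sketched as follows. The function name `nw_matrix` is hypothetical, the matrix is stored as a list of rows with the left sequence down the side and the top sequence across, and the gap score is passed as a negative number and added.

```python
def nw_matrix(top, left, match=10, mismatch=-2, gap=-5):
    """Fill the global-alignment DP matrix; rows follow `left`, columns follow `top`."""
    rows, cols = len(left) + 1, len(top) + 1
    S = [[0] * cols for _ in range(rows)]
    # Border cells: a prefix aligned entirely against spaces.
    for c in range(1, cols):
        S[0][c] = c * gap
    for r in range(1, rows):
        S[r][0] = r * gap
    # Interior cells, row by row, each from its three already filled neighbors.
    for r in range(1, rows):
        for c in range(1, cols):
            sub = match if top[c - 1] == left[r - 1] else mismatch
            S[r][c] = max(
                S[r - 1][c - 1] + sub,  # diagonal: residue against residue
                S[r - 1][c] + gap,      # vertical: space in the top sequence
                S[r][c - 1] + gap,      # horizontal: space in the left sequence
            )
    return S


S = nw_matrix("CTCGCAGC", "CATTCAC")
for row in S:
    print(" ".join(f"{value:4d}" for value in row))
print("optimal score:", S[-1][-1])  # 33
```

The score of the optimal global alignment is read from the lower right corner, `S[-1][-1]`.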

26 4- Trace back through matrix to recover optimal alignment(s) that generated the optimal score
How? "Repeat" alignment calculations in reverse order, starting from position with highest score and following path, position by position, back through matrix
Result? Optimal alignment(s) of sequences

27 Traceback - for Global Alignment
Start in lower right corner & trace back to upper left
Each arrow introduces one character at end of alignment:
  A horizontal move puts a gap in left sequence
  A vertical move puts a gap in top sequence
  A diagonal move uses one character from each sequence

28 Traceback to Recover Alignment
[Figure: DP matrix as on slide 24, with traceback arrows]
Can have >1 optimal alignment; this example has 2

29 Traceback to Recover Alignment
[Figure: DP matrix as on slide 24, with traceback arrows in red]
Where did red arrows come from?

30 Traceback to Recover Alignment
[Figure: DP matrix as on slide 24, with traceback arrows]
+10 for match, -2 for mismatch, -5 for space
Where did 33 come from? Match = 10, so 33-10 = 23. Must have come from diagonal.
Where did 23 come from? (Not a match) Left? 28-5 = 23; Diag? 13-2 = 11; Top? 8-5 = 3. Only the left cell gives 23, so it came from the left.

31 Traceback to Recover Alignment
[Figure: DP matrix as on slide 24, with traceback arrows]
+10 for match, -2 for mismatch, -5 for space
Where did 8 come from? Two possibilities: 13-5 = 8 (from the left) or 10-2 = 8 (from the diagonal)
Then, follow both paths

32 Traceback to Recover Alignment
[Figure: DP matrix as on slide 24, with traceback path #1 highlighted]
Path #1 pairs (top sequence with left sequence):
  C with C
  - with A
  T with T
  C with -
  G with T
  C with C
  A with A
  G with -
  C with C
Great - but what are the alignments?

33 Traceback to Recover Alignment
[Figure: DP matrix as on slide 24, with traceback path #2 highlighted]
Path #2 pairs (top sequence with left sequence):
  C with C
  - with A
  T with T
  C with T
  G with -
  C with C
  A with A
  G with -
  C with C
Great - but what are the alignments?

34 What are the 2 Global Alignments with Optimal Score = 33?
Top:  C T C G C A G C
Left: C A T T C A C
Alignment 1: ?
Alignment 2: ?
A horizontal move puts a gap in left sequence
A vertical move puts a gap in top sequence
A diagonal move uses one character from each sequence

35 What are the 2 Global Alignments with Optimal Score = 33?
Top:  C T C G C A G C
Left: C A T T C A C
Alignment 1:
  C - T C G C A G C
  C A T - T C A - C
Alignment 2:
  C - T C G C A G C
  C A T T - C A - C
A horizontal move puts a gap in left sequence
A vertical move puts a gap in top sequence
A diagonal move uses one character from each sequence
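The traceback rules can be sketched as a recursive walk from the lower right corner that follows every neighbor able to produce the current score, so tied cells yield more than one alignment. The function name `optimal_global_alignments` is hypothetical, and the matrix is refilled inside the function only to keep the example self-contained.

```python
def optimal_global_alignments(top, left, match=10, mismatch=-2, gap=-5):
    """Return (optimal score, list of all optimal (top_row, left_row) alignments)."""
    rows, cols = len(left) + 1, len(top) + 1
    S = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):
        S[0][c] = c * gap
    for r in range(1, rows):
        S[r][0] = r * gap
    for r in range(1, rows):
        for c in range(1, cols):
            sub = match if top[c - 1] == left[r - 1] else mismatch
            S[r][c] = max(S[r - 1][c - 1] + sub, S[r - 1][c] + gap, S[r][c - 1] + gap)

    alignments = []

    def walk(r, c, top_rev, left_rev):
        # Rows are built back to front, so reverse them at the upper left corner.
        if r == 0 and c == 0:
            alignments.append((top_rev[::-1], left_rev[::-1]))
            return
        if r > 0 and c > 0:
            sub = match if top[c - 1] == left[r - 1] else mismatch
            if S[r][c] == S[r - 1][c - 1] + sub:   # diagonal move
                walk(r - 1, c - 1, top_rev + top[c - 1], left_rev + left[r - 1])
        if c > 0 and S[r][c] == S[r][c - 1] + gap:  # horizontal: gap in left sequence
            walk(r, c - 1, top_rev + top[c - 1], left_rev + "-")
        if r > 0 and S[r][c] == S[r - 1][c] + gap:  # vertical: gap in top sequence
            walk(r - 1, c, top_rev + "-", left_rev + left[r - 1])

    walk(rows - 1, cols - 1, "", "")
    return S[-1][-1], alignments


score, found = optimal_global_alignments("CTCGCAGC", "CATTCAC")
print(score)
for top_row, left_row in found:
    print(top_row)
    print(left_row)
    print()
```

Enumerating every tied path is fine for short sequences like these; for long sequences the number of optimal alignments can grow quickly, so practical tools usually report just one.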

36 Check Traceback?
[Figure: DP matrix as on slide 24, with each step of the two optimal paths labeled d (diagonal), h (horizontal) or v (vertical)]
Path 1 (upper left to lower right): d v d h d d d h d
Path 2 (upper left to lower right): d v d d h d d h d
A horizontal move puts a gap in left sequence
A vertical move puts a gap in top sequence
A diagonal move uses one character from each sequence

