Pairwise Alignment: a superposition of two sequences that reveals a large number of common regions (matches)


1 Pairwise Alignment
A superposition of two sequences that reveals a large number of common regions (matches).
Possible alignments of ACATGCGATT and GAGATCTGA:

    -AC-ATGC-GATT
    GA-GAT-CTGA--    6 matches, 7 gaps, 0 mismatches

    -ACATGC-GATT
    GAGAT-CTGA--     6 matches, 5 gaps, 1 mismatch

    -ACATGCGATT
    GAGATCTGA--      5 matches, 3 gaps, 3 mismatches

2 Pairwise Alignment
An alignment is a hypothesis about the transformations that have converted one sequence into another:

    original     GATTACA
    mutation     GATTAGA
    deletion     GAT-ACA
    insertion    GATTTACA

The gaps represent insertions/deletions, also called indels.

3 Scoring Function
To evaluate the quality of an alignment, assign scores for matches (m), gaps (g), and mismatches (s):

    Score = #matches × m + #gaps × g + #mismatches × s

With m = 2, g = -2, s = -1:

    -AC-ATGC-GATT
    GA-GAT-CTGA--    Score = 6 × 2 + 7 × (-2) + 0 × (-1) = -2

    -ACATGC-GATT
    GAGAT-CTGA--     Score = 6 × 2 + 5 × (-2) + 1 × (-1) = 1

    -ACATGCGATT
    GAGATCTGA--      Score = 5 × 2 + 3 × (-2) + 3 × (-1) = 1
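The scoring function can be checked mechanically by counting columns. The sketch below is one possible way to do that in Python; the function name `score_alignment` and its return shape are illustrative assumptions, not part of the slides. It treats any column containing a gap character as a gap.

```python
def score_alignment(top, bottom, m=2, g=-2, s=-1):
    """Score two equal-length gapped strings column by column.

    Returns (score, (matches, gaps, mismatches)).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    matches = gaps = mismatches = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1            # a base paired with a gap
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    score = matches * m + gaps * g + mismatches * s
    return score, (matches, gaps, mismatches)


print(score_alignment("-ACATGC-GATT", "GAGAT-CTGA--"))  # (1, (6, 5, 1))
print(score_alignment("-ACATGCGATT", "GAGATCTGA--"))    # (1, (5, 3, 3))
```

Counting by column rather than by eye makes it harder to miss a gap when the two rows are long.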

4 Computing Alignment
There are different types of alignment, depending on the research question:

    Global Alignment: find the overall similarity
    Semiglobal Alignment: ignore leading and trailing gaps at the ends of the alignment
    Local Alignment: look for a maximal-scoring common fragment

All can be computed using variations of a dynamic programming (table-filling) algorithm.
Illustrative example: a tour of Manhattan.

5 Manhattan Tour
A sightseeing tour starts at 1st Street and 1st Avenue and ends at 7th Street and 9th Avenue.
The tourists are allowed to move only South and East.
Goal: see as many landmarks as possible.
[Figure: street grid with the start at (1, 1) and the end at (7, 9).]

6 Manhattan Tour Strategy
For each crossing, record the maximum number of sites that can be seen on the way there.
[Figure: street grid annotated with the running maximum at each crossing.]

7 Manhattan Tour Strategy
Let T(s, a) denote the maximum number of sites that can be seen on a tour from the origin to intersection (s, a).
The previous algorithm then uses the fact that

    T(s, a) = max of:
        T(s-1, a) + # of sites between streets s-1 and s
        T(s, a-1) + # of sites between avenues a-1 and a

In other words, to get to (s, a) we could have moved one block East, from (s, a-1), or one block South, from (s-1, a). If we know the maximum number of sites that could be seen up to (s, a-1) and up to (s-1, a), we just need to add the number of sites along each direction and pick the larger number.
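A minimal sketch of this recurrence in Python follows. The grid values are hypothetical (they are not the numbers from the slide), the names `manhattan_tour`, `south`, and `east` are assumptions made for illustration, and indices are 0-based, so the origin is (0, 0) instead of (1, 1).

```python
def manhattan_tour(south, east):
    """Return the table T, where T[s][a] is the most sites seen on the way to (s, a).

    south[s][a]: sites on the block walked when moving South into (s, a), for s >= 1
    east[s][a]:  sites on the block walked when moving East into (s, a), for a >= 1
    """
    rows, cols = len(south), len(south[0])
    T = [[0] * cols for _ in range(rows)]
    for s in range(1, rows):             # first avenue: only South moves are possible
        T[s][0] = T[s - 1][0] + south[s][0]
    for a in range(1, cols):             # first street: only East moves are possible
        T[0][a] = T[0][a - 1] + east[0][a]
    for s in range(1, rows):
        for a in range(1, cols):
            T[s][a] = max(T[s - 1][a] + south[s][a],   # arrived from the North
                          T[s][a - 1] + east[s][a])    # arrived from the West
    return T


# Hypothetical 3 x 3 grid of crossings.
south = [[0, 0, 0],
         [1, 0, 2],
         [4, 3, 1]]
east = [[0, 3, 2],
        [0, 3, 2],
        [0, 0, 5]]
T = manhattan_tour(south, east)
print(T[-1][-1])  # 12
```

Each cell is computed once from its upper and left neighbours, which is why the table can be filled row by row.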

8 Global Alignment
How is the Manhattan tour related to global sequence alignment?
Given strands A and B of lengths m and n, to align A[1 : m] and B[1 : n]:

    option 1: ignore the last base of A (pair it with a gap), then align A[1 : m-1] and B[1 : n]    (gap penalty)
    option 2: ignore the last base of B (pair it with a gap), then align A[1 : m] and B[1 : n-1]    (gap penalty)
    option 3: pair up the last bases of A and B, then align A[1 : m-1] and B[1 : n-1]               (match/mismatch score)

Pick the best option.

9 Computing Global Alignment
In other words, if Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

    Score(i, j) = max of:
        Score(i-1, j) + g          align A[i] with a gap
        Score(i, j-1) + g          align B[j] with a gap
        Score(i-1, j-1) + m        if A[i] == B[j]
        Score(i-1, j-1) + s        if A[i] <> B[j]

Just like the Manhattan tour, if we use a 2D table, the contents of cell (i, j) depend only on:

    the cell above: (i-1, j)
    the cell to the left: (i, j-1)
    the cell diagonally above: (i-1, j-1)

10 Computing Global Alignment
What do we do when one strand runs out of bases?

    Aligning the first i bases of A, A[1 : i], with the first 0 bases of B (empty):    Score(i, 0) = i * g
    Aligning the first 0 bases of A (empty) with the first j bases of B, B[1 : j]:     Score(0, j) = j * g
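The recurrence and its two base cases can be written almost verbatim as a memoized recursive function. This is only a sketch for checking small examples; the name `global_score` is an assumption, and the slides themselves use the table-filling formulation, not recursion.

```python
from functools import lru_cache


def global_score(A, B, m=2, g=-2, s=-1):
    """Best global alignment score of A and B, computed top-down."""

    @lru_cache(maxsize=None)
    def score(i, j):
        # Base cases: one strand has run out of bases.
        if j == 0:
            return i * g
        if i == 0:
            return j * g
        # A[i] and B[j] in the 1-based notation are A[i-1] and B[j-1] in Python.
        pair = m if A[i - 1] == B[j - 1] else s
        return max(score(i - 1, j) + g,          # align A[i] with a gap
                   score(i, j - 1) + g,          # align B[j] with a gap
                   score(i - 1, j - 1) + pair)   # pair A[i] with B[j]

    return score(len(A), len(B))


print(global_score("CACTAG", "GATTACA"))  # 1
print(global_score("GCTGC", "AGATC"))     # 1
```

The cache plays the role of the 2D table: each (i, j) is evaluated once. For long sequences the recursion depth becomes a practical limit, which is one reason to fill the table iteratively.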

11 Global Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2.

         -    G    A    T    T    A    C    A
    -
    C
    A
    C
    T
    A
    G

12 Global Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2.

         -    G    A    T    T    A    C    A
    -    0   -2   -4   -6   -8  -10  -12  -14
    C   -2
    A   -4
    C   -6
    T   -8
    A  -10
    G  -12

13 Global Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2.
[Table: row 0 and column 0 as on the previous slide, with the interior cells partially filled in.]

14 Global Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2.

         -    G    A    T    T    A    C    A
    -    0   -2   -4   -6   -8  -10  -12  -14
    C   -2   -1   -3   -5   -7   -9   -8  -10
    A   -4   -3    1   -1   -3   -5   -7   -6
    C   -6   -5   -1    0   -2   -4   -3   -5
    T   -8   -7   -3    1    2    0   -2   -4
    A  -10   -9   -5   -1    0    4    2    0
    G  -12   -8   -7   -3   -2    2    3    1

15 Global Alignment Example
Align GCTGC and AGATC using g = -2, s = -1, m = 2.

         -    A    G    A    T    C
    -
    G
    C
    T
    G
    C

16 Global Alignment Example
Align GCTGC and AGATC using g = -2, s = -1, m = 2.

         -    A    G    A    T    C
    -    0   -2   -4   -6   -8  -10
    G   -2   -1    0   -2   -4   -6
    C   -4   -3   -2   -1   -3   -2
    T   -6   -5   -4   -3    1   -1
    G   -8   -7   -3   -5   -1    0
    C  -10   -9   -5   -4   -3    1

Tracing back from the lower-right corner gives the alignment:

    GCTGC:  - G C T G C
    AGATC:  A G A T - C

17 Global Alignment Summary
If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

    Score(i, j) = max of:
        Score(i-1, j) + g          align A[i] with a gap
        Score(i, j-1) + g          align B[j] with a gap
        Score(i-1, j-1) + m        if A[i] == B[j]
        Score(i-1, j-1) + s        if A[i] <> B[j]

    Score(i, 0) = i * g
    Score(0, j) = j * g

Identifying the actual alignment is done by tracing back the pointers, starting at the lower-right corner.

18 Global Alignment Algorithm
To compute GLOBAL ALIGNMENT given two sequences:

    1. create a matrix whose numbers of rows and columns equal the lengths of the two sequences plus one, respectively
       # initialize the cells of row 0 and column 0 only
    2. for each column c, set cell(0, c) to c*gap
    3. for each row r, set cell(r, 0) to r*gap
    4. for each row in the matrix starting at 1:
    5.     for each col in the matrix starting at 1:
    6.         calculate option1, option2, option3
    7.         set the current cell to the largest value of option1, option2, option3
    8. return the matrix (or the score in the lower-right cell)
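The numbered steps translate fairly directly into Python. The sketch below is one possible rendering, not the presenter's code: the function names are assumptions, the matrix has one extra row and column for the empty prefix, and instead of storing pointers the traceback re-derives each move from the scores, preferring the diagonal, then the cell above, then the cell to the left when several options tie.

```python
def global_alignment_table(A, B, m=2, g=-2, s=-1):
    """Fill the (len(A)+1) x (len(B)+1) score matrix following steps 1-8."""
    rows, cols = len(A) + 1, len(B) + 1
    cell = [[0] * cols for _ in range(rows)]
    for c in range(cols):                  # step 2: row 0
        cell[0][c] = c * g
    for r in range(rows):                  # step 3: column 0
        cell[r][0] = r * g
    for r in range(1, rows):               # steps 4-7
        for c in range(1, cols):
            pair = m if A[r - 1] == B[c - 1] else s
            option1 = cell[r - 1][c] + g           # A[r] against a gap
            option2 = cell[r][c - 1] + g           # B[c] against a gap
            option3 = cell[r - 1][c - 1] + pair    # A[r] against B[c]
            cell[r][c] = max(option1, option2, option3)
    return cell                            # step 8


def traceback(A, B, cell, m=2, g=-2, s=-1):
    """Recover one optimal alignment, walking back from the lower-right corner."""
    r, c = len(A), len(B)
    top, bottom = [], []
    while r > 0 or c > 0:
        came_diag = (r > 0 and c > 0 and
                     cell[r][c] == cell[r - 1][c - 1] + (m if A[r - 1] == B[c - 1] else s))
        if came_diag:
            top.append(A[r - 1])
            bottom.append(B[c - 1])
            r, c = r - 1, c - 1
        elif r > 0 and cell[r][c] == cell[r - 1][c] + g:
            top.append(A[r - 1])
            bottom.append("-")
            r -= 1
        else:                              # only the move from the left remains
            top.append("-")
            bottom.append(B[c - 1])
            c -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


A, B = "GCTGC", "AGATC"
table = global_alignment_table(A, B)
print(table[-1][-1])                       # 1
print(*traceback(A, B, table), sep="\n")
# -GCTGC
# AGAT-C
```

Other tie-breaking orders can return a different alignment with the same score; the score in the lower-right cell does not depend on that choice.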

