1
Pairwise Alignment
A superposition of two sequences that reveals a large number of common regions (matches).
Possible alignments of ACATGCGATT and GAGATCTGA:

  -AC-ATGC-GATT
  GA-GAT-CTGA--    6 matches, 7 gaps, 0 mismatches

  -ACATGC-GATT
  GAGAT-CTGA--     6 matches, 5 gaps, 1 mismatch

  -ACATGCGATT
  GAGATCTGA--      5 matches, 3 gaps, 3 mismatches
2
Pairwise Alignment
An alignment is a hypothesis about the transformations that have converted one sequence into another.

               GATTACA
  mutations    GATTAGA
  deletions    GAT.ACA
  insertions   GATTTACA

(The gaps represent insertions/deletions, also called indels.)
3
Scoring Function
To evaluate the quality of an alignment, assign scores for matches (m), gaps (g), and mismatches (s):

  Score = #matches × m + #gaps × g + #mismatches × s

With m = 2, g = -2, s = -1:

  -AC-ATGC-GATT
  GA-GAT-CTGA--    Score = 6 × 2 + 7 × (-2) + 0 × (-1) = -2

  -ACATGC-GATT
  GAGAT-CTGA--     Score = 6 × 2 + 5 × (-2) + 1 × (-1) = 1

  -ACATGCGATT
  GAGATCTGA--      Score = 5 × 2 + 3 × (-2) + 3 × (-1) = 1
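As a rough illustration of the scoring function, the Python sketch below counts matches, gaps, and mismatches column by column and combines them with the slide's weights. The function name `score_alignment` is hypothetical, and the sketch assumes `-` marks a gap and that both rows have already been padded to the same length.

```python
def score_alignment(top, bottom, m=2, g=-2, s=-1):
    """Score a gapped alignment given as two equal-length rows.

    Hypothetical helper: '-' is assumed to be the gap symbol.
    Returns (matches, gaps, mismatches, score).
    """
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")

    matches = gaps = mismatches = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gaps += 1          # a base paired with a gap (an indel)
        elif x == y:
            matches += 1       # identical bases
        else:
            mismatches += 1    # two different bases

    score = matches * m + gaps * g + mismatches * s
    return matches, gaps, mismatches, score


# The second and third alignments of ACATGCGATT and GAGATCTGA
print(score_alignment("-ACATGC-GATT", "GAGAT-CTGA--"))  # (6, 5, 1, 1)
print(score_alignment("-ACATGCGATT", "GAGATCTGA--"))    # (5, 3, 3, 1)
```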
4
Computing Alignment
Different types of alignment, depending on the research question:
  Global Alignment: find the overall similarity
  Semiglobal Alignment: ignore trailing gaps at both ends of the alignment
  Local Alignment: look for a maximal scoring common fragment
All can be computed using a variation of the Dynamic Programming (table-filling) algorithm.
Illustrative example: a tour of Manhattan
5
Manhattan Tour
A sightseeing tour starts at 1st street, 1st avenue and runs to 7th street, 9th avenue.
The tourists are allowed to move only South and East.
Goal: see as many landmarks as possible.
[Figure: street grid from (1, 1) to (7, 9)]
6
Manhattan Tour Strategy
For each crossing, record the maximum number of sites that can be seen.
[Figure: street grid with the running maximum written at each crossing]
7
Manhattan Tour Strategy
Let T(s, a) denote the maximum number of sites that can be seen starting from the origin up to intersection (s, a). Then the previous algorithm uses the fact that

  T(s, a) = max { T(s-1, a) + # of sites between streets s-1 and s,
                  T(s, a-1) + # of sites between avenues a-1 and a }

In other words, to get to (s, a) we could have moved one block East, from (s, a-1), or one block South, from (s-1, a). If we know the max # of sites that could be seen up to (s, a-1) and up to (s-1, a), we just need to add the number of sites along each direction and pick the larger number.
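The recurrence for T(s, a) can be sketched in Python as a table fill. Everything named here is an assumption made for illustration: the function `manhattan_tour`, the 0-based indexing, the two input grids `south` and `east` (sites on each one-block segment), and the small 3 × 3 example, which is not the grid from the slides.

```python
def manhattan_tour(south, east):
    """Fill the table T for a grid of crossings (hypothetical sketch).

    south[s][a]: sites on the block from crossing (s, a) down to (s+1, a)
    east[s][a]:  sites on the block from crossing (s, a) right to (s, a+1)
    Crossings are 0-indexed here, so the origin is (0, 0).
    """
    streets = len(east)          # number of rows of crossings
    avenues = len(south[0])      # number of columns of crossings
    T = [[0] * avenues for _ in range(streets)]

    # Along the first avenue we can only have come from the North.
    for s in range(1, streets):
        T[s][0] = T[s - 1][0] + south[s - 1][0]
    # Along the first street we can only have come from the West.
    for a in range(1, avenues):
        T[0][a] = T[0][a - 1] + east[0][a - 1]

    # Everywhere else: take the better of arriving from North or West.
    for s in range(1, streets):
        for a in range(1, avenues):
            from_north = T[s - 1][a] + south[s - 1][a]
            from_west = T[s][a - 1] + east[s][a - 1]
            T[s][a] = max(from_north, from_west)
    return T


# Made-up 3 x 3 grid of crossings (not the example from the slides)
south = [[1, 0, 2],
         [4, 3, 1]]
east = [[3, 2],
        [0, 5],
        [1, 1]]
for row in manhattan_tour(south, east):
    print(row)
# [0, 3, 5]
# [1, 3, 8]
# [5, 6, 9]
```

The bottom-right entry (9 in this made-up grid) is the best total for the whole tour.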
8
Global Alignment
How is the Manhattan Tour related to global sequence alignment?
Given strands A and B of lengths m and n, align A[1 : m] and B[1 : n]:
  option 1: ignore the last base of A (pair it with a gap), then align A[1 : m-1] and B[1 : n]    (gap penalty)
  option 2: ignore the last base of B (pair it with a gap), then align A[1 : m] and B[1 : n-1]    (gap penalty)
  option 3: pair up the last two bases of A and B, then align A[1 : m-1] and B[1 : n-1]    (match/mismatch penalty)
Pick the best option.
9
Computing Global Alignment
In other words, if Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

  Score(i, j) = max { Score(i-1, j) + g        align A[i] with GAP
                      Score(i, j-1) + g        align B[j] with GAP
                      Score(i-1, j-1) + m      if A[i] == B[j]
                      Score(i-1, j-1) + s      if A[i] <> B[j] }

Just like the Manhattan tour, if we use a 2D table, the contents of cell (i, j) depend only on
  the cell above: (i-1, j)
  the cell to the left: (i, j-1)
  the cell diagonally above: (i-1, j-1)
10
Computing Global Alignment
What do we do when one strand runs out of bases?
  Aligning the first i bases of A, A[1 : i], with the first 0 bases of B (empty):  Score(i, 0) = i*g
  Aligning the first 0 bases of A (empty) with the first j bases of B, B[1 : j]:  Score(0, j) = j*g
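The recurrence and its two base cases can be written down almost literally as a memoized recursive function. This is only a top-down sketch for short strands, not the table-filling procedure the slides build up to; the name `global_score` is hypothetical, and Python strings are 0-indexed, so A[i] in the slides is `a[i - 1]` here.

```python
from functools import lru_cache


def global_score(a, b, m=2, g=-2, s=-1):
    """Best global alignment score of strands a and b (hypothetical sketch)."""

    @lru_cache(maxsize=None)
    def score(i, j):
        # Base cases: one strand has run out of bases.
        if i == 0:
            return j * g
        if j == 0:
            return i * g
        # Pairing A[i] with B[j] is either a match or a mismatch.
        pair = m if a[i - 1] == b[j - 1] else s
        return max(
            score(i - 1, j) + g,         # align A[i] with a gap
            score(i, j - 1) + g,         # align B[j] with a gap
            score(i - 1, j - 1) + pair,  # align A[i] with B[j]
        )

    return score(len(a), len(b))


print(global_score("CACTAG", "GATTACA"))  # 1
print(global_score("GCTGC", "AGATC"))     # 1
```

Memoization keeps each (i, j) pair from being recomputed, which is the same saving the 2D table provides.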
11
Global Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2

        -    G    A    T    T    A    C    A
   -
   C
   A
   C
   T
   A
   G
12
Global Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2

        -    G    A    T    T    A    C    A
   -    0   -2   -4   -6   -8  -10  -12  -14
   C   -2
   A   -4
   C   -6
   T   -8
   A  -10
   G  -12
13
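Global Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2

        -    G    A    T    T    A    C    A
   -    0   -2   -4   -6   -8  -10  -12  -14
   C   -2
   A   -4
   C   -6
   T   -8
   A  -10
   G  -12

[Figure: intermediate state of the table; the interior cell annotations are illegible in the source]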
14
Global Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2

        -    G    A    T    T    A    C    A
   -    0   -2   -4   -6   -8  -10  -12  -14
   C   -2   -1   -3   -5   -7   -9   -8  -10
   A   -4   -3    1   -1   -3   -5   -7   -6
   C   -6   -5   -1    0   -2   -4   -3   -5
   T   -8   -7   -3    1    2    0   -2   -4
   A  -10   -9   -5   -1    0    4    2    0
   G  -12   -8   -7   -3   -2    2    3    1
15
Global Alignment Example
Align GCTGC and AGATC using g = -2, s = -1, m = 2

        -    A    G    A    T    C
   -
   G
   C
   T
   G
   C
16
Global Alignment Example
Align GCTGC and AGATC using g = -2, s = -1, m = 2

        -    A    G    A    T    C
   -    0   -2   -4   -6   -8  -10
   G   -2   -1    0   -2   -4   -6
   C   -4   -3   -2   -1   -3   -2
   T   -6   -5   -4   -3    1   -1
   G   -8   -7   -3   -5   -1    0
   C  -10   -9   -5   -4   -3    1

  GCTGC:  -GCTGC
  AGATC:  AGAT-C
17
Global Alignment Summary
If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

  Score(i, j) = max { Score(i-1, j) + g        align A[i] with GAP
                      Score(i, j-1) + g        align B[j] with GAP
                      Score(i-1, j-1) + m      if A[i] == B[j]
                      Score(i-1, j-1) + s      if A[i] <> B[j] }

  Score(i, 0) = i * g
  Score(0, j) = j * g

Identifying the actual alignment is done by tracing back the pointers, starting at the lower-right corner.
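A minimal Python sketch of the traceback step follows. It departs from the slides in one respect, stated plainly: instead of stored pointers, it re-derives each move by checking which neighbouring cell could have produced the current score. The name `traceback`, the tie-breaking order (diagonal, then up, then left), and the hard-coded `TABLE` (recomputed from the recurrence for GCTGC against AGATC) are all assumptions made for illustration.

```python
def traceback(table, a, b, m=2, g=-2, s=-1):
    """Recover one optimal alignment from a filled score table (sketch).

    Rows of `table` correspond to prefixes of a, columns to prefixes of b.
    Ties are broken diagonal first, then up, then left (an assumption).
    """
    i, j = len(a), len(b)          # start at the lower-right corner
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = m if a[i - 1] == b[j - 1] else s
            if table[i][j] == table[i - 1][j - 1] + pair:
                top.append(a[i - 1])       # A[i] paired with B[j]
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and table[i][j] == table[i - 1][j] + g:
            top.append(a[i - 1])           # A[i] paired with a gap
            bottom.append("-")
            i -= 1
        else:
            top.append("-")                # B[j] paired with a gap
            bottom.append(b[j - 1])
            j -= 1
    # The rows were collected right to left, so reverse them.
    return "".join(reversed(top)), "".join(reversed(bottom))


# Score table for GCTGC (rows) against AGATC (columns), g=-2, s=-1, m=2
TABLE = [
    [0, -2, -4, -6, -8, -10],
    [-2, -1, 0, -2, -4, -6],
    [-4, -3, -2, -1, -3, -2],
    [-6, -5, -4, -3, 1, -1],
    [-8, -7, -3, -5, -1, 0],
    [-10, -9, -5, -4, -3, 1],
]
print(traceback(TABLE, "GCTGC", "AGATC"))  # ('-GCTGC', 'AGAT-C')
```

When several moves explain a cell's score, a different tie-breaking order can yield a different alignment with the same score.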
18
Global Alignment Algorithm
To compute GLOBAL ALIGNMENT given two sequences:
1. create a matrix with one more row and one more column than the lengths of the two sequences, respectively (row 0 and column 0 stand for the empty prefix)
   # initialize the cells of row 0 and column 0 only
2. for each column c, set cell(0, c) to c*gap
3. for each row r, set cell(r, 0) to r*gap
4. for each row in the matrix starting at 1:
5.     for each col in the matrix starting at 1:
6.         calculate option1, option2, option3
7.         set the current cell to the largest value of option1, option2, option3
8. return the matrix (or the score in its lower-right cell)
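The eight steps translate fairly directly into Python. The sketch below is one possible rendering, not a reference implementation: the function name `global_alignment_table` and the parameter names are hypothetical, and the first sequence is assumed to label the rows. The table gets one extra row and column so that row 0 and column 0 can hold the empty-prefix scores.

```python
def global_alignment_table(a, b, m=2, g=-2, s=-1):
    """Fill the global alignment score table for sequences a and b (sketch)."""
    # Step 1: one extra row and column for the empty prefixes.
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]

    # Steps 2-3: initialize row 0 and column 0 with gap penalties.
    for c in range(cols):
        table[0][c] = c * g
    for r in range(rows):
        table[r][0] = r * g

    # Steps 4-7: fill the remaining cells row by row.
    for r in range(1, rows):
        for c in range(1, cols):
            option1 = table[r - 1][c] + g   # base of a paired with a gap
            option2 = table[r][c - 1] + g   # base of b paired with a gap
            pair = m if a[r - 1] == b[c - 1] else s
            option3 = table[r - 1][c - 1] + pair
            table[r][c] = max(option1, option2, option3)

    # Step 8: the global score sits in the lower-right cell.
    return table


table = global_alignment_table("CACTAG", "GATTACA")
print(table[-1])      # [-12, -8, -7, -3, -2, 2, 3, 1]
print(table[-1][-1])  # 1
```

Both loops visit every cell once, so the running time and memory grow with the product of the two sequence lengths.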