Pairwise sequence Alignment.

Presentation transcript:

Pairwise sequence Alignment

Types of Alignment • Global alignment: aligning the sequences over their whole length • Appropriate when aligning two very closely related sequences • Local alignment: aligning only certain regions of the sequences • Appropriate for aligning multi-domain protein sequences • It is important to use the appropriate type. [Figure: distinction between global and local alignments of two sequences.]

How do we compute the best alignment?
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC
Too many possible alignments: >> 2^N (exercise)

Sequence Alignment
Unaligned:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Aligned:
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Definition: Given two strings x = x_1 x_2 ... x_M and y = y_1 y_2 ... y_N, an alignment is an assignment of gaps to positions 0, ..., M in x and 0, ..., N in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.

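Alignment is additive. Observation: the score of aligning x_1...x_M with y_1...y_N is additive. Say that x_1...x_i x_{i+1}...x_M aligns to y_1...y_j y_{j+1}...y_N, with the prefix x_1...x_i aligned to y_1...y_j and the suffix x_{i+1}...x_M aligned to y_{j+1}...y_N. The two scores add up: F(x[1:M], y[1:N]) = F(x[1:i], y[1:j]) + F(x[i+1:M], y[j+1:N])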
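The additivity observation can be checked numerically. The sketch below is only an illustration: the helper name `score_alignment` and the column scores (match = 1, mismatch = 0, gap = -1) are assumptions, not values from the slides. It scores the gapped alignment from the Sequence Alignment slide and confirms that every split into a prefix and a suffix gives two scores that sum to the total.

```python
def score_alignment(row_x, row_y, match=1, mismatch=0, gap=-1):
    """Sum the per-column scores of two gapped rows of equal length.

    The scoring values are hypothetical defaults chosen for illustration.
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total += gap          # a letter lined up with a gap
        elif a == b:
            total += match        # identical letters
        else:
            total += mismatch     # substitution
    return total


row_x = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
row_y = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"

whole = score_alignment(row_x, row_y)
# Cut the alignment after column k and score the two pieces separately.
splits_add_up = all(
    score_alignment(row_x[:k], row_y[:k]) + score_alignment(row_x[k:], row_y[k:]) == whole
    for k in range(len(row_x) + 1)
)
print(whole, splits_add_up)
```

This property is what lets dynamic programming build the best alignment from the best alignments of prefixes.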

Calculation of an alignment score

DP Algorithms for Pairwise Alignment. The number of all possible pairwise alignments (if gaps are allowed) is exponential in the length of the sequences. Therefore, the approach of "score every possible alignment and choose the best" is infeasible in practice. Efficient algorithms for pairwise alignment have been devised using dynamic programming (DP).
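To get a feel for the growth, the sketch below counts alignments under one common convention (an assumption, not something the slides specify): an alignment is a sequence of columns, each pairing a letter with a letter, a letter of x with a gap, or a gap with a letter of y. Under that convention the count follows a three-term recurrence. The function name `count_alignments` is made up for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m, n):
    """Number of gapped alignments of sequences of lengths m and n.

    The last column is either (letter, letter), (letter, gap) or
    (gap, letter), which gives the three terms of the recurrence.
    """
    if m == 0 or n == 0:
        return 1  # only one way: the remaining letters all face gaps
    return (
        count_alignments(m - 1, n - 1)
        + count_alignments(m - 1, n)
        + count_alignments(m, n - 1)
    )


for length in (2, 5, 10, 20):
    print(length, count_alignments(length, length), 2 ** length)
```

Even for short sequences the count outgrows 2^N quickly, which is why exhaustive scoring is not an option.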

Two kinds of sequence alignment: global and local. We will first consider the global alignment algorithm of Needleman and Wunsch (1970). We will then explore the local alignment algorithm of Smith and Waterman (1981). Finally, we will consider BLAST, a heuristic version of Smith-Waterman. We will cover BLAST in detail on Monday.

Global alignment with the algorithm of Needleman and Wunsch (1970) • Two sequences can be compared in a matrix along x- and y-axes. • If they are identical, a path along a diagonal can be drawn. • Find the optimal subpaths, and add them up to achieve the best score. This involves: adding gaps when needed; allowing for conservative substitutions; choosing a scoring system (simple or complicated). N-W is guaranteed to find optimal alignment(s).

Initial stage of filling in the DP matrix. For sequences of lengths m and n, the matrix has (m+1) x (n+1) cells; here it is 9 x 10. The sequences are written across the top and down the left side of the matrix, respectively. An extra row and column labeled "gap" are added to allow the alignment to begin with a gap of any length in either sequence. The gap row and column are filled with penalty scores for gaps of increasing lengths, as indicated. A zero is placed in the upper left box, corresponding to no gaps in either sequence.
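A minimal sketch of this initialization step, assuming a linear gap penalty of -8 per position (the value shown on the following slide) and the two protein sequences used later in the deck. The function name `init_matrix` and the choice of which sequence runs down the side are assumptions made for illustration.

```python
def init_matrix(x, y, gap=-8):
    """Build the (len(x)+1) x (len(y)+1) matrix with gap row and column filled."""
    rows, cols = len(x) + 1, len(y) + 1
    S = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        S[i][0] = i * gap   # x starts with a gap of length i opposite it
    for j in range(1, cols):
        S[0][j] = j * gap   # y starts with a gap of length j opposite it
    return S


S = init_matrix("MNALSDRT", "MGSDRTTET")
print(len(S), "x", len(S[0]))   # matrix dimensions
print(S[0])                     # gap row
```

The upper left cell stays at zero, and every other cell is left for the scoring step.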

Gap=-8 Gap=-4

Three steps to global alignment with the Needleman-Wunsch algorithm: [1] set up a matrix, [2] score the matrix, [3] identify the optimal alignment(s).

Four possible outcomes in aligning two sequences: [1] identity (stay along a diagonal) [2] mismatch (stay along a diagonal) [3] gap in one sequence (move vertically!) [4] gap in the other sequence (move horizontally!)

Necessary values in adjacent cells

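Let x = x_1 x_2 ... x_m and y = y_1 y_2 ... y_n. The matrix has (m+1) rows labeled 0 to m and (n+1) columns labeled 0 to n. The rows correspond to the residues of sequence x, and the columns correspond to the residues of sequence y. With gap penalty g = -8, the candidate values for the first interior cell S_{1,1} are:
S_{0,0} + s(x_1, y_1) = 0 + s(I,T) = 0 - 1 = -1
S_{1,0} + g = -8 - 8 = -16
S_{0,1} + g = -8 - 8 = -16
The maximum, -1, is entered as S_{1,1}.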
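The cell-by-cell rule can be sketched as follows. This is an illustrative implementation, not the lecturer's code: the function name `nw_fill` and the toy substitution function `simple` (match = 1, mismatch = -1) are assumptions. With g = -8 it reproduces the value -1 computed above for I against T, because the toy mismatch score happens to equal s(I,T) = -1.

```python
def nw_fill(x, y, s, gap):
    """Fill the Needleman-Wunsch matrix for x (rows) and y (columns).

    s(a, b) is a substitution score function; gap is a linear gap penalty
    (a negative number added once per gapped position).
    """
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # diagonal: align x_i with y_j
                S[i - 1][j] + gap,                        # vertical: x_i against a gap
                S[i][j - 1] + gap,                        # horizontal: y_j against a gap
            )
    return S


def simple(a, b):
    """Hypothetical substitution scores: +1 for identity, -1 otherwise."""
    return 1 if a == b else -1


print(nw_fill("I", "T", simple, -8)[1][1])
```

A real run on proteins would pass a lookup into a substitution matrix such as PAM250 in place of `simple`.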

s11 is the score for an a1-b1 match added to 0 in the upper left position. Trial values for s12 are calculated and the maximum score is chosen. Trial 1 is to add the score for the a1-b2 match to s11 and subtract a penalty for a gap of size 1. The other three trials shown by arrows include gap penalties and so likely cannot yield a higher score than trial 1.

Global alignment of two protein sequences by the Needleman-Wunsch algorithm with enhancements by Smith and Waterman. sequence 1 = MNALSDRT and sequence 2 = MGSDRTTET. Notice the subsequence SDRT in the two sequences which one might expect to be aligned if the sequences are aligned properly. JMB, 1970

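-12 is the penalty for opening a gap in the alignment, and -4 is the penalty for each additional sequence character in the gap. Substitution scores are taken from PAM250. Candidate values for S_{1,1} (M against M):
S_{0,0} + s(x_1, y_1) = 0 + s(M,M) = 0 + 6 = 6 (M aligned with M)
S_{1,0} + g = -12 - 12 = -24 (gap in one sequence)
S_{0,1} + g = -12 - 12 = -24 (gap in the other sequence)
So S_{1,1} = 6.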
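The stated gap rule (an opening penalty plus a smaller penalty for each additional character) can be written as a small function. This sketch assumes the usual reading in which a gap of length k costs the opening penalty once and the extension penalty k - 1 times; the function name `affine_gap_penalty` is made up for illustration. The worked cell values on the slide add -12 at each step, so the function illustrates only the stated rule, not those particular cells.

```python
def affine_gap_penalty(length, gap_open=-12, gap_extend=-4):
    """Penalty for one contiguous gap of the given length (length >= 1)."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    # Pay for opening once, then for each additional character in the gap.
    return gap_open + gap_extend * (length - 1)


for k in (1, 2, 3):
    print(k, affine_gap_penalty(k))
```

Under this rule one long gap is cheaper than several short gaps of the same total length.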

sequence 1   M   -   N   A   L   S   D   R   T
sequence 2   M   G   S   D   R   T   T   E   T
score        6 -12   1   0  -3   1   0  -1   3   = -5
sequence 1   M   N   A   -   L   S   D   R   T
score        6 -12   1   0  -3   1   0  -1   3   = -5

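Example 2. score(H,P) = -2, gap penalty = -8 (linear). The top row of the matrix (sequence H E A G A W G H E E) is initialized with 0, -8, -16, -24, -32, -40, -48, -56, -64, -72, -80, and the first column is filled the same way. The first interior cell, P against H, receives -2. [Figure: DP matrix after initialization and the first cell.]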

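Example contd. score(E,P) = 0, score(E,A) = -1, score(H,A) = -2. The next cells are filled in the same way: the cell for A against H receives -10 and the cell for A against E receives -3. [Figure: DP matrix with the first cells of rows P and A filled in.]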

Optimal alignment:
H E A G A W G H E - E
- P - - A W - H E A E
[Figure: completed DP matrix with the traceback path marked.] The value in the final cell is the best score for the alignment.

Alignments and Paths through Example 3

t a c g - c a a - -
- a c g t g a a t t

t - - a c g c a - - a
a c g t g - - a a t t

Example 4. Alignment:
- T G C A T - A -
A T - C - T G A T

Tracing back a solution (I)

Tracing back a solution (II) The algorithm is called with PRINT-LCS(b,V,n,m)
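A sketch of the traceback idea for the longest common subsequence. The slide's PRINT-LCS routine takes a table of stored backpointers b; the version below is a deliberate simplification that instead re-derives each step from the score table, so it is not the slide's procedure. The function name `lcs` is an assumption for this example.

```python
def lcs(v, w):
    """Return one longest common subsequence of strings v and w."""
    n, m = len(v), len(w)
    # s[i][j] = length of an LCS of v[:i] and w[:j]
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])

    # Trace back from the lower right corner, collecting matched letters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1]:
            out.append(v[i - 1])   # diagonal move: letter is part of the LCS
            i -= 1
            j -= 1
        elif s[i - 1][j] >= s[i][j - 1]:
            i -= 1                 # move up: skip a letter of v
        else:
            j -= 1                 # move left: skip a letter of w
    return "".join(reversed(out))


print(lcs("ATCTGAT", "TGCATA"))
```

The two strings are the ones from Example 4; their longest common subsequence has four letters, matching the four aligned columns in that alignment.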

Computing Distance. Only deletions/insertions are allowed:
d_{i,j} = \min \{ d_{i-1,j} + 1, \; d_{i,j-1} + 1, \; d_{i-1,j-1} \text{ if } v_i = w_j \}
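The recurrence translates directly into a table-filling loop. The sketch below assumes the standard boundary conditions d_{i,0} = i and d_{0,j} = j, which the slide does not state; the function name `indel_distance` is made up for illustration.

```python
def indel_distance(v, w):
    """Minimum number of insertions and deletions turning v into w."""
    n, m = len(v), len(w)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i   # delete all i letters of v
    for j in range(m + 1):
        d[0][j] = j   # insert all j letters of w
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(d[i - 1][j] + 1, d[i][j - 1] + 1)
            if v[i - 1] == w[j - 1]:
                best = min(best, d[i - 1][j - 1])  # free diagonal step on a match
            d[i][j] = best
    return d[n][m]


print(indel_distance("ATCTGAT", "TGCATA"))
```

For the Example 4 strings the distance equals the number of gapped columns in the alignment shown there.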

Needleman-Wunsch: dynamic programming. N-W is guaranteed to find optimal alignments, although the algorithm does not search all possible alignments. It is an example of a dynamic programming algorithm: an optimal path (alignment) is identified by incrementally extending optimal subpaths. Thus, a series of decisions is made at each step of the alignment to find the pair of residues with the best score.
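Putting the three steps together, a compact sketch of global alignment with traceback might look like this. The scoring values (match = 1, mismatch = -1, gap = -1), the function name `needleman_wunsch`, and the test sequences are all hypothetical; ties between equally good paths are broken in favor of the diagonal, so only one of possibly several optimal alignments is returned.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_x, aligned_y) for a global alignment."""
    def sub(a, b):
        return match if a == b else mismatch

    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Traceback: start in the lower right corner and work backwards.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return S[m][n], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = needleman_wunsch("GATTACA", "GCATGCU")
print(score)
print(ax)
print(ay)
```

The work is proportional to the number of cells, (m+1)(n+1), rather than to the exponential number of alignments.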

Local sequence alignment. Suppose we have a long DNA sequence (e.g., 4000 bp) and we want to compare it with the complete yeast genome (12.5 Mbp). What if only a portion of our query, say 200 bp long, has strong similarity to a gene in yeast? Can we find this 200 bp portion using (semi-)global alignment? Probably not: because we are trying to align the complete 4000 bp sequence, a random alignment may get a better score than the one that aligns the 200 bp portion to the similar gene in yeast.

Global alignment versus local alignment. Global alignment (Needleman-Wunsch) extends from one end of each sequence to the other. Local alignment finds optimally matching regions within two sequences ("subsequences"). Local alignment is almost always used for database searches such as BLAST. It is useful for finding domains (or limited regions of homology) within sequences. Smith and Waterman (1981) solved the problem of performing optimal local sequence alignment. Other methods (BLAST, FASTA) are faster but less thorough.

How the Smith-Waterman algorithm works. Set up a matrix between two proteins (size m+1 by n+1). No value in the scoring matrix can be negative: S ≥ 0. The score in each cell is the maximum of four values: [1] s(i-1, j-1) + the new score at [i,j] (a match or mismatch) [2] s(i,j-1) - gap penalty [3] s(i-1,j) - gap penalty [4] zero
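The four-way maximum can be sketched in a few lines. The default scores below (match = 4, mismatch = -2, gap = -1) are borrowed from the worked example later in the deck, and the function name `smith_waterman_score` is made up for illustration; gap penalties are stored as negative numbers and added.

```python
def smith_waterman_score(q, p, match=4, mismatch=-2, gap=-1):
    """Return the best local alignment score between q and p."""
    rows, cols = len(q) + 1, len(p) + 1
    S = [[0] * cols for _ in range(rows)]   # first row and column stay at zero
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = S[i - 1][j - 1] + (match if q[i - 1] == p[j - 1] else mismatch)
            S[i][j] = max(
                diag,              # [1] match or mismatch
                S[i][j - 1] + gap, # [2] gap, moving horizontally
                S[i - 1][j] + gap, # [3] gap, moving vertically
                0,                 # [4] start a fresh alignment here
            )
            best = max(best, S[i][j])
    return best


print(smith_waterman_score("EQLLKALEFKL", "KVLEFGY"))
```

Because of option [4], the best score can sit anywhere in the matrix, not only in the lower right corner.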

Local alignment. The major difference between this scoring matrix and the Needleman-Wunsch matrix is that there are no negative scores in the Smith-Waterman scoring matrix. The effect of this change is that an alignment can begin anywhere without receiving a negative penalty from a previously low-scoring alignment.
sequence 1   S  D  R  T
sequence 2   S  D  R  T
score        2  4  6  3   = 15

Example. Linear gap model: gap = -1, match = 4, mismatch = -2.
Q: E Q L L K A L E F K L
P: K V L E F G Y
[Figure: empty scoring matrix with Q and P labeling its axes.]

Example. Linear gap model: gap = -1, match = 4, mismatch = -2.
Q: E Q L L K A L E F K L
P: K V L E F G Y
[Figure: scoring matrix for Q and P filled in; the highest cell value is 14.]

Example Alignment (first traceback). For Q: E Q L L K A L E F K L and P: K V L E F G Y:
Q: K A - L E F
P: K - V L E F
[Figure: filled scoring matrix with this traceback path marked.]

Example Alignment (second traceback). For Q: E Q L L K A L E F K L and P: K V L E F G Y:
Q: K - A L E F
P: K V - L E F
[Figure: filled scoring matrix with this traceback path marked.]

Example Alignment (third traceback). For Q: E Q L L K A L E F K L and P: K V L E F G Y:
Q: K A L E F
P: K V L E F
[Figure: filled scoring matrix with this traceback path marked.]
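The tracebacks in this example can be reproduced with a short sketch. It uses the example's scores (gap = -1, match = 4, mismatch = -2), starts from the highest-scoring cell, and walks back until it reaches a zero. The function name `smith_waterman` and the tie-breaking order (diagonal first, then vertical, then horizontal) are assumptions, so the code returns just one of the equally scoring alignments.

```python
def smith_waterman(q, p, match=4, mismatch=-2, gap=-1):
    """Return (score, aligned_q, aligned_p) for one optimal local alignment."""
    def sub(a, b):
        return match if a == b else mismatch

    rows, cols = len(q) + 1, len(p) + 1
    S = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            S[i][j] = max(0,
                          S[i - 1][j - 1] + sub(q[i - 1], p[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    aq, ap = [], []
    i, j = best_pos
    while i > 0 and j > 0 and S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + sub(q[i - 1], p[j - 1]):
            aq.append(q[i - 1]); ap.append(p[j - 1]); i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + gap:
            aq.append(q[i - 1]); ap.append("-"); i -= 1
        else:
            aq.append("-"); ap.append(p[j - 1]); j -= 1
    return best, "".join(reversed(aq)), "".join(reversed(ap))


score, aq, ap = smith_waterman("EQLLKALEFKL", "KVLEFGY")
print(score)
print("Q:", aq)
print("P:", ap)
```

Changing the order of the checks in the traceback loop selects a different one of the tied alignments.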

Another Example. Linear gap model. Find the local alignment between Q and P with match = +5 and mismatch = -4.
Q: G C T G G A A G G C A T
P: G C A G A G C A C G
[Figure: empty scoring matrix for Q and P.]

Another Example (result).
Q's subsequence: G A A G - G C A
P's subsequence: G C A G A G C A
[Figure: filled scoring matrix for Q and P; the highest cell value is 22.]