Sequence comparison: Local alignment

Slides:



Advertisements
Similar presentations
Sequence Alignments with Indels Evolution produces insertions and deletions (indels) – In addition to substitutions Good example: MHHNALQRRTVWVNAY MHHALQRRTVWVNAY-
Advertisements

Sequence comparison: Dynamic programming Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
Pairwise Sequence Alignment
Definitions Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. May or may not be biologically.
Structural bioinformatics
 If Score(i, j) denotes best score to aligning A[1 : i] and B[1 : j] Score(i-1, j) + galign A[i] with GAP Score(i, j-1) + galign B[j] with GAP Score(i,
Sequence Similarity Searching Class 4 March 2010.
1-month Practical Course Genome Analysis (Integrative Bioinformatics & Genomics) Lecture 3: Pair-wise alignment Centre for Integrative Bioinformatics VU.
C T C G T A GTCTGTCT Find the Best Alignment For These Two Sequences Score: Match = 1 Mismatch = 0 Gap = -1.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Developing Sequence Alignment Algorithms in C++ Dr. Nancy Warter-Perez May 21, 2002.
Computational Biology, Part 2 Sequence Comparison with Dot Matrices Robert F. Murphy Copyright  1996, All rights reserved.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Sequence comparison: Significance of similarity scores Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Sequence comparison: Local alignment
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Developing Pairwise Sequence Alignment Algorithms
Presented by Liu Qi Pairwise Sequence Alignment. Presented By Liu Qi Why align sequences? Functional predictions based on identifying homologues. Assumes:
Bioiformatics I Fall Dynamic programming algorithm: pairwise comparisons.
Traceback and local alignment Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington.
Sequence Alignment Algorithms Morten Nielsen Department of systems biology, DTU.
Pairwise alignments Introduction Introduction Why do alignments? Why do alignments? Definitions Definitions Scoring alignments Scoring alignments Alignment.
Pairwise & Multiple sequence alignments
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Computational Biology, Part 3 Sequence Alignment Robert F. Murphy Copyright  1996, All rights reserved.
Content of the previous class Introduction The evolutionary basis of sequence alignment The Modular Nature of proteins.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
We want to calculate the score for the yellow box. The final score that we fill in the yellow box will be the SUM of two other scores, we’ll call them.
Applied Bioinformatics Week 3. Theory I Similarity Dot plot.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
Sequence Alignments with Indels Evolution produces insertions and deletions (indels) – In addition to substitutions Good example: MHHNALQRRTVWVNAY MHHALQRRTVWVNAY-
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
INTRODUCTION TO BIOINFORMATICS
Sequence comparison: Dynamic programming
Biology 162 Computational Genetics Todd Vision Fall Aug 2004
Sequence comparison: Significance of similarity scores
Sequence comparison: Traceback and local alignment
Global, local, repeated and overlaping
Sequence Alignment 11/24/2018.
BNFO 236 Smith Waterman alignment
Sequence comparison: Dynamic programming
Pairwise sequence Alignment.
Pairwise Sequence Alignment
Sequence comparison: Local alignment
Sequence comparison: Traceback
BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment
Find the Best Alignment For These Two Sequences
Pairwise Alignment Global & local alignment
Sequence Alignment Algorithms Morten Nielsen BioSys, DTU
Sequence comparison: Significance of similarity scores
Dynamic Programming Finds the Best Score and the Corresponding Alignment O Alignment: Start in lower right corner and work backwards:
Sequence alignment BI420 – Introduction to Bioinformatics
Basic Local Alignment Search Tool (BLAST)
Presentation transcript:

Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas http://faculty.washington.edu/jht/GS559_2012/

Review – global alignment C -4 -8 -12 -16 -20 -5 -9 -13 -6 5 1 -3 -7 11 7 2 6 -2 17 Fill DP matrix from upper left to lower right, traceback alignment from lower right corner.

Review - three legal moves A diagonal move aligns a character from each sequence. A vertical move aligns a gap in the sequence along the top edge. A horizontal move aligns a gap in the sequence along the left edge. The move you keep is the best scoring of the three.

Local alignment A single-domain protein may be similar only to one region within a multi-domain protein. A DNA query may align to a small part of a genome. An alignment that spans the complete length of both sequences may be undesirable.

BLAST does local alignments Typical search has a short query against long targets. The alignments returned show only the well-aligned match region of both query and target. targets (e.g. genome contigs) query matched regions returned in alignment

Review - global alignment DP Align sequence x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

Local alignment DP Align sequence x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty. (corresponds to start of alignment)

Local DP in equation form keep max of these four values

initialize the same way as for global alignment A simple example initialize the same way as for global alignment A C G T 2 -7 -5 A G C d = -5

A simple example A C G T 2 -7 -5 A G ? C d = -5

A simple example A C G T 2 -7 -5 A G ? C d = -5

A simple example A C G T 2 -7 -5 A G C d = -5 2 -5 -5

A simple example A A C G T 2 -7 -5 A G C d = -5 2

A simple example A C G T 2 -7 -5 A G ? C d = -5 2

A simple example A G C 2 A C G T 2 -7 -5 d = -5 C d = -5 2 (signify no preceding alignment with no arrow)

A simple example A G ? C 2 A C G T 2 -7 -5 d = -5 ? C d = -5 2 (signify no preceding alignment with no arrow)

A simple example A C G T 2 -7 -5 A G 2 C d = -5 2

A simple example A C G T 2 -7 -5 A G 2 ? C d = -5 2

A simple example A C G T 2 -7 -5 A G 2 4 C d = -5 2

Traceback AG A G 2 4 C A C G T 2 -7 -5 2 d = -5 Start traceback at highest score anywhere in matrix, follow arrows back until you reach 0

Multiple local alignments Traceback from highest score, setting each DP matrix score along traceback to zero. Now traceback from the remaining highest score, etc. The alignments may or may not include the same parts of the two sequences. 1 2

Local alignment Two differences from global alignment: If a DP score is negative, replace with 0. Traceback from the highest score in the matrix and continue until you reach 0. Global alignment algorithm: Needleman-Wunsch. Local alignment algorithm: Smith-Waterman.

(some) specific uses for alignments make a pairwise or multiple alignment (duh) test whether two sequences share a common ancestor (i.e. are significantly related) find matches to a sequence in a large database build a sequence tree (phylogenetic tree) make a genome assembly (find overlaps of sequence reads) repeat mask a genome sequence (find matches to a database of known repeats) map sequence reads to a reference genome

Another example Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5. A C G T 2 -7 -5 A G 2 4 6 C

Traceback A G 2 4 6 C AAG

You don’t actually need first row and column DP matrix Traceback matrix You don’t actually need first row and column A A G 2 4 6 (-10) -10 G A A G G C 0 = diagonal, -1 = gap left, +1 = gap top, -10 = no alignment

Problem – find the best GLOBAL alignment Find the optimal global alignment of AAG and GAAGGC. Use a gap penalty of d = -5. A C G T 2 -7 -5 A G -5 -10 -15 -20 -25 C -30 (contrast with the best local alignment)