Alignment II Dynamic Programming

Pair-wise sequence alignments
Idea: Display one sequence above another, with spaces inserted in both, to reveal similarity.

    A: C A T - T C A - C
       |   |     | |   |
    B: C - T C G C A G C

Two types of alignment
S = CTGTCGCTGCACG
T = TGCCGTG

Global alignment:
    CTGTCG-CTGCACG
    -TGC-CG-TG----

Local alignment:
    CTGTCGCTGCACG--
    -------TGC-CGTG

Global alignment: Scoring
    CTGTCG-CTGCACG
    -TGC-CG-TG----
Reward for matches: μ
Mismatch penalty: δ
Space penalty: σ
score(A) = μw - δx - σy, where w = #matches, x = #mismatches, y = #spaces

Global alignment: Scoring
Reward for matches: 10. Mismatch penalty: 2. Space penalty: 5.

     C   T   G   T   C   G   -   C   T   G   C
     -   T   G   C   -   C   G   -   T   G   -
    -5  10  10  -2  -5  -2  -5  -5  10  10  -5

Total = 11
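The column-by-column scoring above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `score_alignment` and its keyword parameters are chosen here for illustration, and `-` is assumed to be the space character.

```python
def score_alignment(top, bottom, match=10, mismatch=2, space=5):
    """Score a gapped alignment column by column.

    `mismatch` and `space` are penalties, so they are subtracted.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total -= space      # a character lined up with a space
        elif a == b:
            total += match      # identical characters
        else:
            total -= mismatch   # two different characters
    return total


# The 11-column alignment from the slide.
print(score_alignment("CTGTCG-CTGC", "-TGC-CG-TG-"))  # 11
```

The example has 4 matches, 2 mismatches and 5 spaces, so the score is 4*10 - 2*2 - 5*5 = 11.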

Optimum Alignment
The score of an alignment is a measure of its quality.
Optimum alignment problem: Given a pair of sequences X and Y, find an alignment (global or local) with maximum score.
The similarity between X and Y, denoted sim(X,Y), is the maximum score of an alignment of X and Y.

Alignment algorithms
Global: Needleman-Wunsch (NW)
Local: Smith-Waterman (SW)
NW and SW both use dynamic programming.
Variations: gap penalty functions, scoring matrices

Global Alignment: Algorithm

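Theorem. Let C(i,j) be the similarity of the prefixes S1 ... Si and T1 ... Tj, let w(Si,Tj) be the score for lining up Si with Tj, and let σ be the space penalty. C(i,j) satisfies the following relationships:
Initial conditions:
    C(0,0) = 0
    C(i,0) = -iσ for 1 ≤ i ≤ n
    C(0,j) = -jσ for 1 ≤ j ≤ m
Recurrence relation: For 1 ≤ i ≤ n, 1 ≤ j ≤ m:
    C(i,j) = max { C(i-1,j-1) + w(Si,Tj),  C(i-1,j) - σ,  C(i,j-1) - σ }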

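Justification
The last column of an optimum alignment of S1 ... Si and T1 ... Tj takes one of three forms (σ is the space penalty):

    S1 S2 . . . Si-1 Si
    T1 T2 . . . Tj-1 Tj        score: C(i-1,j-1) + w(Si,Tj)

    S1 S2 . . . Si-1 Si
    T1 T2 . . . Tj   -         score: C(i-1,j) - σ

    S1 S2 . . . Si   -
    T1 T2 . . . Tj-1 Tj        score: C(i,j-1) - σ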

Example
Case 1: Line up Si with Tj (positions i-1 and i of S sit above positions j-1 and j of T)
    S: C A T T C A C
    T: C - T T C A G

Case 2: Line up Si with a space (Tj is in the column before)
    S: C A T T C A - C
    T: C - T T C A G -

Case 3: Line up Tj with a space (Si is in the column before)
    S: C A T T C A C -
    T: C - T T C A - G

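Computation Procedure
Each entry C(i,j) is computed from its three neighbors C(i-1,j-1), C(i-1,j) and C(i,j-1). The last entry filled in, C(n,m), is the score of the optimum global alignment.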

          λ    C    T    C    G    C    A    G    C
     λ    0   -5  -10  -15  -20  -25  -30  -35  -40
     C   -5   10    5
     A  -10
     T  -15
     T  -20
     C  -25
     A  -30
     C  -35

We first compute C(i,j) for the smallest possible values of i and j, then for increasing values of i and j.
This is usually performed with a table of size (n + 1) x (m + 1).
Scoring: +10 for a match, -2 for a mismatch, -5 for a space.
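The fill order described here can be sketched in Python as follows. This is an illustrative sketch rather than the lecture's own code: the function name `global_alignment_table` is hypothetical, and penalties are passed as negative scores (`mismatch=-2`, `space=-5`) so that every term is added.

```python
def global_alignment_table(s, t, match=10, mismatch=-2, space=-5):
    """Fill the (n + 1) x (m + 1) table row by row, left to right."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    # Initial conditions: a prefix aligned against nothing costs one space per character.
    for i in range(1, n + 1):
        C[i][0] = i * space
    for j in range(1, m + 1):
        C[0][j] = j * space
    # Recurrence: best of diagonal (match/mismatch), up (space in t), left (space in s).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)
    return C


C = global_alignment_table("CATTCAC", "CTCGCAGC")
print(C[1][:3])   # [-5, 10, 5], the first computed entries of row C
print(C[-1][-1])  # 33, the optimum global score
```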

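          λ    C    T    C    G    C    A    G    C
     λ    0   -5  -10  -15  -20  -25  -30  -35  -40
     C   -5   10    5    0   -5  -10  -15  -20  -25
     A  -10    5    8    3   -2   -7    0   -5  -10
     T  -15    0   15   10    5    0   -5   -2   -7
     T  -20   -5   10   13    8    3   -2   -7   -4
     C  -25  -10    5   20   15   18   13    8    3
     A  -30  -15    0   15   18   13   28   23   18
     C  -35  -20   -5   10   13   28   23   26   33 *

The entry marked * is the optimum score. Traceback from it can yield both optimum alignments.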
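A traceback can be sketched as below. This is one possible implementation under stated assumptions, not the lecture's: the name `global_align` is hypothetical, penalties are passed as negative scores, and ties are broken in the order diagonal, up, left, so the function returns only one of the optimum alignments.

```python
def global_align(s, t, match=10, mismatch=-2, space=-5):
    """Return (score, aligned_s, aligned_t) for one optimum global alignment."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        C[i][0] = i * space
    for j in range(1, m + 1):
        C[0][j] = j * space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)

    # Walk back from C[n][m], re-deriving which case produced each entry.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = match if s[i - 1] == t[j - 1] else mismatch
            if C[i][j] == C[i - 1][j - 1] + w:      # Si lined up with Tj
                top.append(s[i - 1])
                bottom.append(t[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and C[i][j] == C[i - 1][j] + space:  # Si lined up with a space
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:                                         # Tj lined up with a space
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return C[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, a, b = global_align("CATTCAC", "CTCGCAGC")
print(score)  # 33
print(a)      # CAT-TCA-C
print(b)      # C-TCGCAGC
```

With a different tie-breaking order the traceback returns the other optimum alignment, CATT-CA-C over C-TCGCAGC, which has the same score.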

End-gap free alignment
Gaps at the start or end of the alignment are not penalized.
Match: +2. Mismatch and space: -1.
Best global alignment: Score = 1
Best end-gap free alignment: Score = 9

Motivation: Shotgun assembly
Shotgun assembly produces a large set of partially overlapping subsequences from many copies of one unknown DNA sequence.
Problem: Use the overlapping sections to "paste" the subsequences together.
Overlapping pairs will have a low global alignment score, but a high end-gap free score because of the overlap.

Algorithm
Same as global alignment, except:
- Initialize with zeros (free gaps at start)
- Locate the maximum in the last row/column (free gaps at end)

          λ    C    T    C    G    C    A    G    C
     λ    0    0    0    0    0    0    0    0    0
     C    0   10    5   10    5   10    5    0   10
     A    0    5    8    5    8    5   20   15   10
     T    0    0   15   10    5    6   15   18   13
     T    0   -2   10   13    8    3   10   13   16
     C    0   10    5   20   15   18   13    8   23
     A    0    5    8   15   18   13   28   23   18
     G    0    0    3   10   25   20   23   38   33

We first compute C(i,j) for the smallest possible values of i and j, then for increasing values of i and j.
This is usually performed with a table of size (n + 1) x (m + 1).
Scoring: +10 for a match, -2 for a mismatch, -5 for a gap.
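The two changes relative to global alignment (zero initialization, maximum over the last row and column) can be sketched in Python. The function name `end_gap_free_score` is hypothetical and the penalties are passed as negative scores; this is an illustration under those assumptions.

```python
def end_gap_free_score(s, t, match=10, mismatch=-2, space=-5):
    """Best score when gaps at the start or end of the alignment are free."""
    n, m = len(s), len(t)
    # Row 0 and column 0 stay at zero: leading gaps cost nothing.
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)
    # Trailing gaps are free, so the answer is the best entry
    # in the last row or the last column.
    best_last_row = max(C[n])
    best_last_col = max(row[m] for row in C)
    return max(best_last_row, best_last_col)


print(end_gap_free_score("CATTCAG", "CTCGCAGC"))  # 38
```

For the table above, the maximum of the last row is 38 and the maximum of the last column is 33, so the end-gap free score is 38.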

Local Alignment: Motivation
Ignoring stretches of non-coding DNA:
- Non-coding regions are more likely to be subjected to mutations than coding regions.
- Local alignment between two sequences is likely to be between two exons.
Locating protein domains:
- Proteins of different kinds and of different species often exhibit local similarities.
- Local similarities may indicate "functional subunits".

Local alignment: Example
S = g g t c t g a g
T = a a a c g a
Match: +2. Mismatch and space: -1.
Best local alignment:

    g g t c t g a g
    a a a c - g a -

Score = 5

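Local Alignment: Algorithm
C[i, j] = score of optimally aligning a suffix of S[1..i] with a suffix of T[1..j].
Initialize the top row and leftmost column to zero.
The recurrence is the same as for global alignment, with 0 as an additional option in the maximum. The best local score is the largest entry in the table.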
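A score-only version of the local alignment table can be sketched as follows. The function name `local_alignment_score` is hypothetical, penalties are passed as negative scores, and the sketch follows the standard Smith-Waterman formulation, in which every entry is bounded below by zero and the answer is the largest entry anywhere in the table.

```python
def local_alignment_score(s, t, match=2, mismatch=-1, space=-1):
    """Best local alignment score between s and t."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]  # top row and left column are zero
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(0,                    # start a fresh local alignment here
                          C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)
            best = max(best, C[i][j])
    return best


# The example with match +2, mismatch and space -1.
print(local_alignment_score("ggtctgag", "aaacga"))  # 5
```

The score 5 comes from aligning `ctga` with `c-ga`: 2 - 1 + 2 + 2.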

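Local alignment table for S = CATTCAC (rows) and T = CTCGCAGC (columns), with the top row and leftmost column initialized to zero.
Scoring: +1 for a match, -1 for a mismatch, -5 for a space.
The largest entry is 2, reached for example where CA in S is aligned with CA in T.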

Some Results
- Most pairwise sequence alignment problems can be solved in O(mn) time.
- The space requirement can be reduced to O(m+n) without changing the O(mn) run time [Myers88].
- Highly similar sequences can be aligned in O(dn) time, where d measures the distance between the sequences [Landau86].

Reducing space requirements
O(mn) tables are often the limiting factor in computing large alignments.
There is a linear-space technique that only doubles the time required [Hirschberg77].

          λ    C    T    C    G    C    A    G    C
     λ    0    0    0    0    0    0    0    0    0
     C    0   10    5   10    5   10    5    0   10
     A    0    5    8    5    8    5   20   15   10
     T
     T
     C
     A
     G

We first compute C(i,j) for the smallest possible values of i and j, then for increasing values of i and j.
This is usually performed with a table of size (n + 1) x (m + 1).
IDEA: We only need the previous row to calculate the next.
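The previous-row idea can be sketched in Python for the global score. The function name `global_score_two_rows` is hypothetical and penalties are passed as negative scores. Note that this sketch returns only the optimum score: recovering the alignment itself in linear space needs the divide-and-conquer technique cited as [Hirschberg77], which is not shown here.

```python
def global_score_two_rows(s, t, match=10, mismatch=-2, space=-5):
    """Global alignment score using O(m) space instead of O(nm)."""
    m = len(t)
    prev = [j * space for j in range(m + 1)]   # row 0 of the full table
    for i in range(1, len(s) + 1):
        curr = [i * space] + [0] * m           # column 0 of row i
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + w,     # diagonal neighbor
                          prev[j] + space,     # neighbor above
                          curr[j - 1] + space) # neighbor to the left
        prev = curr                            # row i becomes the "previous" row
    return prev[m]


print(global_score_two_rows("CATTCAC", "CTCGCAGC"))  # 33
```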

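Linear-space Alignments
Each level of the divide-and-conquer recursion works on half the table area of the level before it, so the total work is
mn + ½ mn + ¼ mn + 1/8 mn + 1/16 mn + … = 2mn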

Affine Gap Penalty Functions
Gap penalty = h + gk, where
    k = length of gap
    h = gap opening penalty
    g = gap continuation penalty
Can also be solved in O(nm) time using dynamic programming.
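One common way to reach O(nm) time with affine gaps is to keep three tables, one per type of last column (the approach usually attributed to Gotoh). The slides do not say which formulation they use, so the sketch below is an assumption: the function name `affine_gap_score` and the table names `M`, `X`, `Y` are chosen here, `h` and `g` are positive penalties as in the definition above, and only the score is returned.

```python
def affine_gap_score(s, t, match=10, mismatch=-2, h=3, g=1):
    """Global alignment score where a gap of length k costs h + g*k."""
    NEG = float("-inf")
    n, m = len(s), len(t)
    # M: last column pairs s[i] with t[j]
    # X: last column pairs s[i] with a space
    # Y: last column pairs t[j] with a space
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(h + g * i)       # one gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -(h + g * j)       # one gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + w
            X[i][j] = max(M[i - 1][j] - (h + g),   # open a new gap
                          X[i - 1][j] - g,         # extend the current gap
                          Y[i - 1][j] - (h + g))   # switch gap side: new gap
            Y[i][j] = max(M[i][j - 1] - (h + g),
                          Y[i][j - 1] - g,
                          X[i][j - 1] - (h + g))
    return max(M[n][m], X[n][m], Y[n][m])


# One match plus a single gap of length 2: 10 - (3 + 1*2)
print(affine_gap_score("AAA", "A"))  # 5
```

Setting `h=0` reduces the penalty to g per space, so `affine_gap_score("CATTCAC", "CTCGCAGC", h=0, g=5)` should agree with the linear-penalty global score of 33.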