Sequence Alignment Tutorial #3 © Ydo Wexler & Dan Geiger

Sequence Alignment (Reminder): Global Alignment
Input: two sequences S_1, S_2 over the same alphabet.
Output: two sequences S'_1, S'_2 of equal length (S'_1, S'_2 are S_1, S_2 with possibly additional gaps).
Example:
  S_1 = GCGCATGGATTGAGCGA
  S_2 = TGCGCCATTGATGACC
A possible alignment:
  S'_1 = -GCGC-ATGGATTGAGCGA
  S'_2 = TGCGCCATTGAT-GACC--
Goal: determine how similar the two sequences S_1 and S_2 are.
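One way to compute the best global alignment score is the Needleman-Wunsch dynamic program. The sketch below is illustrative only: the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, indel -2) are assumptions, since this slide does not fix a scoring scheme.

```python
def global_alignment_score(s1: str, s2: str, match: int = 1,
                           mismatch: int = -1, indel: int = -2) -> int:
    """Best global alignment score of s1 and s2 (Needleman-Wunsch style DP).

    The default scoring values are illustrative assumptions.
    """
    n, m = len(s1), len(s2)
    # V[i][j] = best score for aligning the prefixes s1[:i] and s2[:j]
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * indel          # s1[:i] aligned against gaps only
    for j in range(1, m + 1):
        V[0][j] = j * indel          # s2[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = V[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            V[i][j] = max(diag,
                          V[i - 1][j] + indel,   # s1[i-1] aligned to a gap
                          V[i][j - 1] + indel)   # s2[j-1] aligned to a gap
    return V[n][m]
```

Under these assumed values the alignment shown on the slide scores 12 - 2 - 10 = 0 (12 matches, 2 mismatches, 5 indels), so the optimal global score of S_1, S_2 is at least 0.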

Sequence Alignment (Reminder): Local Alignment
Input: two sequences S_1, S_2 over the same alphabet.
Output: two sequences S'_1, S'_2 of equal length (S'_1, S'_2 are substrings of S_1, S_2 with possibly additional gaps).
Example:
  S_1 = GCGCATGGATTGAGCGA
  S_2 = TGCGCCATTGATGACC
A possible alignment:
  S'_1 = ATTGA-G
  S'_2 = ATTGATG
Goal: find the pair of substrings of the two input sequences that have the highest similarity.
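The local variant of the dynamic program lets the score restart at 0 and takes the maximum over the whole table (Smith-Waterman). Again a minimal sketch: `local_alignment_score` and its scoring values (match +1, mismatch -1, indel -2) are assumptions for illustration.

```python
def local_alignment_score(s1: str, s2: str, match: int = 1,
                          mismatch: int = -1, indel: int = -2) -> int:
    """Best local alignment score (Smith-Waterman style DP); scores are assumptions."""
    n, m = len(s1), len(s2)
    # V[i][j] = best score of aligning a suffix of s1[:i] with a suffix of s2[:j]
    V = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = V[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            # 0 = start a fresh local alignment here (empty substrings score 0)
            V[i][j] = max(0, diag, V[i - 1][j] + indel, V[i][j - 1] + indel)
            best = max(best, V[i][j])
    return best
```

With these values the local alignment ATTGA-G / ATTGATG scores 6 - 2 = 4, while the gap-free common substring ATTGA alone scores 5, so under this particular scheme the optimum is at least 5.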

  -GCGC-ATGGATTGAGCGA
  TGCGCCATTGAT-GACC-A
Three elements:
- Perfect matches
- Mismatches
- Insertions and deletions (indels)
Score each position independently; the score of an alignment is the sum of its position scores.
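Because each position is scored independently, scoring a given alignment is a single pass over its columns. A sketch with assumed values (match +1, mismatch -1, indel -2) and a hypothetical function name `alignment_score`:

```python
def alignment_score(row1: str, row2: str, match: int = 1,
                    mismatch: int = -1, indel: int = -2) -> int:
    """Sum of independent position scores of a pairwise alignment ('-' = gap).

    The three score values are illustrative assumptions.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            raise ValueError("a column may not consist of two gaps")
        if a == "-" or b == "-":
            total += indel        # insertion or deletion
        elif a == b:
            total += match        # perfect match
        else:
            total += mismatch     # mismatch
    return total
```

For the alignment above this counts 13 matches, 2 mismatches and 4 indels: 13 - 2 - 8 = 3.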

Breaking Number
Input: two sequences M, E over the same alphabet (|M| ≥ |E|).
Output: the smallest k such that there exist partitions M = M_1 M_2 … M_k and E = E_1 E_2 … E_k where E_i is a substring of M_i for all i = 1..k. If no such k exists, return ∞.
Example: M = AAAATTTAAATTTA, E = AATTATA
  M_1 = AAAATTT, M_2 = AAATTT, M_3 = A
  E_1 = AATT,    E_2 = AT,     E_3 = A
  AAAATTTAAATTTA
  --AATT---AT--A
Task: find an O(|M||E|) algorithm for computing the breaking number of M, E.
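Before looking for the O(|M||E|) algorithm, it can help to compute the breaking number straight from the definition on small inputs, for instance to test a faster solution. This brute-force sketch (hypothetical names `blocks_fit` and `breaking_number_brute`) tries every split of E into k blocks, smallest k first, and places the blocks in M greedily. It is exponential in |E| and assumes E is non-empty.

```python
import math
from itertools import combinations

def blocks_fit(M: str, blocks: list) -> bool:
    """True if the blocks occur in M in this order without overlapping.

    Placing each block at its earliest occurrence is safe: it leaves the
    longest possible suffix of M for the remaining blocks.
    """
    pos = 0
    for block in blocks:
        idx = M.find(block, pos)
        if idx < 0:
            return False
        pos = idx + len(block)
    return True

def breaking_number_brute(M: str, E: str) -> float:
    """Breaking number straight from the definition (exponential in |E|)."""
    if not E:
        raise ValueError("E is assumed to be non-empty")
    m = len(E)
    for k in range(1, m + 1):                      # fewest blocks first
        for cuts in combinations(range(1, m), k - 1):
            bounds = (0, *cuts, m)
            blocks = [E[a:b] for a, b in zip(bounds, bounds[1:])]
            if blocks_fit(M, blocks):
                # M_1 ... M_k: cut M right after each block's occurrence
                return k
    return math.inf
```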

Breaking Number (cont.)
Solution: reduce the problem to global alignment with these modifications:
- Do not allow mismatches.
- Do not allow gaps in M.
- No penalty for gaps at the start or end of a sequence.
- A constant penalty for each gap, regardless of its length.
Scoring scheme (an affine gap penalty with gap-introduction term d and gap-elongation term e):
- Match: 0
- Mismatch: -∞
- Gap introduction (d): -1
- Gap elongation (e): 0
  AAAATTTAAATTTA
  --AATT---AT--A
Breaking number = -(score of the optimal alignment) + 1. In the example, the alignment has two internal gaps, so its score is -2 and the breaking number is 3.
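A sketch of the reduction as a dynamic program, assuming E is non-empty. Since gaps in M and mismatches are forbidden, every letter of E is matched to an equal letter of M, and the score only counts internal gap runs in E. The names `breaking_number`, `f` and `g` are hypothetical: `f[i][j]` tracks alignments whose last column matches E[j-1] with M[i-1], and `g[i][j]` keeps a running minimum so that opening a gap costs O(1) per cell, which gives the O(|M||E|) bound.

```python
import math

def breaking_number(M: str, E: str) -> float:
    """Breaking number of M, E via the modified global alignment (a sketch).

    Scoring as on the slide: match 0, mismatches and gaps in M forbidden,
    free end gaps in E, and -1 per internal gap run in E.
    """
    if not E:
        raise ValueError("E is assumed to be non-empty")
    n, m = len(M), len(E)
    INF = math.inf
    # f[i][j]: fewest internal gap runs when E[:j] is aligned into M[:i]
    #          and the last column matches E[j-1] with M[i-1].
    f = [[INF] * (m + 1) for _ in range(n + 1)]
    # g[i][j] = min(f[0][j], ..., f[i][j]): E[:j] placed somewhere inside M[:i].
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if M[i - 1] == E[j - 1]:           # mismatches are not allowed
                if j == 1:
                    f[i][j] = 0                # the leading gap in E is free
                else:
                    contiguous = f[i - 1][j - 1]                      # no gap before this column
                    new_gap = g[i - 2][j - 1] + 1 if i >= 2 else INF  # open one gap (d = 1, e = 0)
                    f[i][j] = min(contiguous, new_gap)
            g[i][j] = min(g[i - 1][j], f[i][j])
    best = g[n][m]                             # the trailing gap in E is free
    return best + 1 if best < INF else INF     # breaking number = -score + 1
```

On the example M = AAAATTTAAATTTA, E = AATTATA it returns 3.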

Breaking Number (cont.)
Complexity: standard O(|M||E|) dynamic programming.
Correctness: a two-way argument.
1. An alignment of score -(k-1) corresponds to a partition of M, E into k blocks.
2. A partition of M, E into k blocks yields an alignment of score at least -(k-1).
Hence:
- If the optimal alignment has score -∞, there is no valid partition (by 2).
- If the optimal alignment has score -k, there is a valid partition into k+1 blocks (by 1), and there is no valid partition into fewer blocks (by 2).

Multiple Sequence Alignment
S_1 = AGGTC, S_2 = GTTCG, S_3 = TGAAC
A possible alignment:
  AGGT-C-
  -G-TTCG
  TG-AAC-

Multiple Sequence Alignment (cont.)
Input: sequences S_1, S_2, …, S_k over the same alphabet.
Output: gapped sequences S'_1, S'_2, …, S'_k of equal length:
1. |S'_1| = |S'_2| = … = |S'_k|
2. Removing the spaces from S'_i yields S_i.
The sum-of-pairs (SP) score of a multiple global alignment is the sum of the scores of all pairwise alignments induced by it.
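The two output conditions can be checked mechanically. A minimal sketch, with a hypothetical function name `is_valid_msa` and '-' assumed as the gap symbol:

```python
def is_valid_msa(seqs: list, rows: list, gap: str = "-") -> bool:
    """True if `rows` is a multiple alignment of `seqs`."""
    if not rows or len(rows) != len(seqs):
        return False
    # Condition 1: all gapped sequences have the same length.
    if len({len(r) for r in rows}) != 1:
        return False
    # Condition 2: removing the spaces from S'_i gives back S_i.
    return all(r.replace(gap, "") == s for s, r in zip(seqs, rows))
```

For example, `is_valid_msa(["AGGTC", "GTTCG", "TGAAC"], ["AGGT-C-", "-G-TTCG", "TG-AAC-"])` returns True.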

Multiple Sequence Alignment Example
Consider the following alignment:
  AC-CDB-
  -C-ADBD
  A-BCDAD
Scoring scheme: match 0, mismatch/indel -1.
SP score: score(S_1,S_2) + score(S_1,S_3) + score(S_2,S_3) = (-3) + (-4) + (-5) = -12
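The SP score can be computed by summing the scores of the induced pairwise alignments. A sketch (hypothetical name `sp_score`) that uses the slide's scheme as defaults; it assumes a column where both rows of a pair have a gap scores 0, i.e. such columns are dropped from the induced pairwise alignment.

```python
from itertools import combinations

def sp_score(rows: list, match: int = 0, mismatch: int = -1,
             indel: int = -1, gap: str = "-") -> int:
    """Sum-of-pairs score: add up the scores of all induced pairwise alignments."""
    total = 0
    for r1, r2 in combinations(rows, 2):   # every pair of rows i < j
        for a, b in zip(r1, r2):
            if a == gap and b == gap:
                continue                   # (-,-) is dropped from the induced pair
            if a == gap or b == gap:
                total += indel
            elif a == b:
                total += match
            else:
                total += mismatch
    return total
```

`sp_score(["AC-CDB-", "-C-ADBD", "A-BCDAD"])` returns -12, and the pair (S_1, S_2) alone contributes -3.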

Multiple Sequence Alignment Complexity
Given k strings of length n, a generalization of the DP algorithm finds an optimal SP alignment:
- Instead of a 2-dimensional table, we have a k-dimensional table.
- Each dimension has length n+1.
- Each entry depends on $2^k - 1$ adjacent entries.
Complexity: $O(2^k n^k)$.
This problem is known to be NP-hard, so no polynomial-time algorithm is known (and none exists unless P = NP).
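A sketch of the k-dimensional dynamic program, practical only for small inputs. Cells are indexed by tuples of prefix lengths and stored in a dict; the $2^k - 1$ predecessors of a cell correspond to the non-zero 0/1 vectors that say which sequences contribute a letter to the last column. The names and the scoring defaults (the match 0, mismatch/indel -1 scheme of the earlier example) are assumptions. Computing each column's SP score adds an O(k^2) factor per predecessor on top of the $O(2^k n^k)$ count.

```python
from itertools import product

GAP = "-"

def column_sp(column, match=0, mismatch=-1, indel=-1):
    """SP score of one alignment column; (-,-) pairs contribute 0."""
    score = 0
    for a in range(len(column)):
        for b in range(a + 1, len(column)):
            x, y = column[a], column[b]
            if x == GAP and y == GAP:
                continue
            if x == GAP or y == GAP:
                score += indel
            else:
                score += match if x == y else mismatch
    return score

def optimal_sp_score(seqs):
    """Optimal SP score via the k-dimensional DP table (exponential in k)."""
    k = len(seqs)
    lengths = [len(s) for s in seqs]
    # Non-zero 0/1 vectors: which sequences put a letter in the last column.
    moves = [mv for mv in product((0, 1), repeat=k) if any(mv)]  # 2^k - 1 of them
    best = {}
    # product() yields cells in lexicographic order, so every predecessor
    # (cell minus a 0/1 vector) has already been filled in.
    for cell in product(*(range(n + 1) for n in lengths)):
        if not any(cell):
            best[cell] = 0                  # all prefixes empty
            continue
        value = None
        for mv in moves:
            prev = tuple(c - d for c, d in zip(cell, mv))
            if min(prev) < 0:
                continue
            column = [seqs[t][cell[t] - 1] if mv[t] else GAP for t in range(k)]
            candidate = best[prev] + column_sp(column)
            if value is None or candidate > value:
                value = candidate
        best[cell] = value
    return best[tuple(lengths)]
```

For ACCDB, CADBD, ABCDAD (the sequences of the example alignment) the result is at least -12, since that alignment is one of the candidates.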

Multiple Sequence Alignment Approximation Algorithm
We use cost instead of score: find an alignment of minimal cost.
Assumption: the cost function δ is a distance function:
- δ(x,x) = 0
- δ(x,y) = δ(y,x) ≥ 0
- δ(x,y) + δ(y,z) ≥ δ(x,z) (triangle inequality; e.g., the cost of a mismatch is at most the cost of two indels)
D(S,T): the cost of a minimum-cost global alignment between S and T.
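The distance assumptions are easy to test for a concrete cost function. In this sketch `make_delta` is a hypothetical helper that builds a cost function from a mismatch cost and an indel cost, and `is_distance` checks the three conditions over the alphabet plus the gap symbol '-'. With indel cost 1, a mismatch cost of 3 breaks the triangle inequality (δ(A,-) + δ(-,C) = 2 < 3), which is exactly the "mismatch at most two indels" condition.

```python
from itertools import product

GAP = "-"

def make_delta(mismatch: float, indel: float):
    """Hypothetical helper: cost 0 for identical symbols (including (-,-)),
    `mismatch` for two different letters, `indel` for a letter against a gap."""
    def delta(x: str, y: str) -> float:
        if x == y:
            return 0
        if GAP in (x, y):
            return indel
        return mismatch
    return delta

def is_distance(delta, alphabet: str) -> bool:
    """Check the three distance assumptions on the alphabet plus GAP."""
    symbols = list(alphabet) + [GAP]
    if any(delta(x, x) != 0 for x in symbols):
        return False
    if any(delta(x, y) != delta(y, x) or delta(x, y) < 0
           for x, y in product(symbols, repeat=2)):
        return False
    return all(delta(x, y) + delta(y, z) >= delta(x, z)
               for x, y, z in product(symbols, repeat=3))
```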

Multiple Sequence Alignment Approximation Algorithm
The 'star' algorithm:
Input: Γ, a set of k strings S_1, …, S_k.
0. For each i < j, calculate D(S_i, S_j).
1. Find the string S' (the center) that minimizes $\sum_{S \in \Gamma,\, S \neq S'} D(S', S)$.
2. Denote S_1 = S' and the remaining strings S_2, …, S_k.
3. Iteratively add S_2, …, S_k to the alignment as follows:
   a. Suppose S_1, …, S_{i-1} are already aligned as S'_1, …, S'_{i-1}.
   b. Align S_i to S'_1 to produce S'_i and S''_1, aligned to each other.
   c. Adjust S'_2, …, S'_{i-1} by adding spaces where spaces were added to S''_1.
   d. Replace S'_1 by S''_1.
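A compact sketch of the star algorithm, assuming the unit cost δ (0 for identical symbols, including (-,-), and 1 otherwise). `align` is a hypothetical helper that returns an optimal alignment as a list of column index pairs, so that step 3c can tell which spaces were newly inserted into S'_1; existing spaces in S'_1 cost δ(-,-) = 0 when they face a new space. Ties in the traceback and in the choice of center are broken arbitrarily.

```python
GAP = "-"

def delta(x: str, y: str) -> int:
    """Assumed unit cost: 0 for identical symbols (including (-,-)), 1 otherwise."""
    return 0 if x == y else 1

def align(a: str, b: str):
    """Min-cost global alignment of a (may already contain gaps) and b.

    Returns (cost, cols); each column is (index into a or None, index into b
    or None), where None means a space is placed in that sequence.
    """
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + delta(a[i - 1], GAP)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + delta(GAP, b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + delta(a[i - 1], b[j - 1]),
                          D[i - 1][j] + delta(a[i - 1], GAP),
                          D[i][j - 1] + delta(GAP, b[j - 1]))
    cols, i, j = [], n, m
    while i > 0 or j > 0:                   # trace back one optimal path
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + delta(a[i - 1], b[j - 1]):
            cols.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + delta(a[i - 1], GAP):
            cols.append((i - 1, None))
            i -= 1
        else:
            cols.append((None, j - 1))      # a new space inserted into a
            j -= 1
    cols.reverse()
    return D[n][m], cols

def star_alignment(strings: list) -> list:
    k = len(strings)
    # Step 0: D(S_i, S_j) for every pair i < j.
    dist = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = align(strings[i], strings[j])[0]
    # Steps 1-2: the center minimizes the sum of distances; it becomes S_1.
    center = min(range(k), key=lambda i: sum(dist[i]))
    order = [center] + [i for i in range(k) if i != center]
    rows = [strings[center]]                # rows[0] plays the role of S'_1
    # Step 3: add the remaining strings one at a time.
    for idx in order[1:]:
        _, cols = align(rows[0], strings[idx])          # step 3b
        # Steps 3c-3d: every existing row (S'_1 included) gets a space
        # wherever a space was inserted into S'_1.
        rows = ["".join(row[ia] if ia is not None else GAP for ia, _ in cols)
                for row in rows]
        rows.append("".join(strings[idx][jb] if jb is not None else GAP
                            for _, jb in cols))
    # Return the rows in the input order.
    result = [""] * k
    for pos, idx in enumerate(order):
        result[idx] = rows[pos]
    return result
```

For example, `star_alignment(["AGGTC", "GTTCG", "TGAAC"])` returns three rows of equal length that give back the inputs when the '-' symbols are removed.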

Multiple Sequence Alignment Approximation Algorithm
Time analysis:
- Choosing S_1: execute DP for all sequence pairs: $O(k^2 n^2)$.
- Adding S_i to the alignment: execute DP for S_i, S'_1: $O(i \cdot n^2)$ (in the i-th stage the length of S'_1 can be up to $i \cdot n$).
Total complexity: $O(k^2 n^2) + \sum_{i=2}^{k} O(i \cdot n^2) = O(k^2 n^2)$.

Multiple Sequence Alignment Approximation Algorithm
Approximation ratio:
- M*: an optimal alignment
- M: the alignment produced by this algorithm
- d(i,j): the distance that M induces on the pair S_i, S_j
For all i: d(1,i) ≤ D(S_1,S_i). At stage i we perform an optimal alignment between S'_1 and S_i, and spaces added in later stages are inserted into the same columns of S'_1 and S'_i, where they cost δ(-,-) = 0.
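The induced distance d(i,j) and the total cost of an alignment follow directly from the definitions. A sketch under the assumed unit cost (0 for identical symbols, including a pair of gaps, 1 otherwise), with hypothetical names `induced_distance` and `msa_cost` and 0-based row indices:

```python
from itertools import combinations

def induced_distance(rows: list, i: int, j: int) -> int:
    """d(i,j): cost induced on rows i and j of a multiple alignment.

    Assumed unit cost: 0 for identical symbols (so a (-,-) column costs 0), else 1.
    """
    return sum(0 if a == b else 1 for a, b in zip(rows[i], rows[j]))

def msa_cost(rows: list) -> int:
    """Total SP cost: sum of d(i,j) over all pairs i < j."""
    return sum(induced_distance(rows, i, j)
               for i, j in combinations(range(len(rows)), 2))
```

On the three-row example alignment AC-CDB- / -C-ADBD / A-BCDAD this gives d = 3, 4 and 5 for the three pairs and a total cost of 12, the negation of its SP score.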

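Multiple Sequence Alignment Approximation Algorithm
Approximation ratio (c(·) denotes the SP cost of an alignment):
By the triangle inequality (applied column by column, so d(i,j) ≤ d(1,i) + d(1,j)) and the bound d(1,i) ≤ D(S_1,S_i):
$$c(M) = \sum_{i<j} d(i,j) \le \sum_{i<j} \big(d(1,i) + d(1,j)\big) = (k-1)\sum_{i=2}^{k} d(1,i) \le (k-1)\sum_{i=2}^{k} D(S_1,S_i)$$
By the definition of S_1 (the center minimizes $\sum_{j} D(S, S_j)$ over all S ∈ Γ):
$$c(M^*) \ge \sum_{i<j} D(S_i,S_j) = \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} D(S_i,S_j) \ge \frac{k}{2}\sum_{j=2}^{k} D(S_1,S_j)$$
Hence $c(M)/c(M^*) \le 2(k-1)/k = 2 - 2/k < 2$.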