Multiple Sequence Alignment: Dynamic Programming

Presentation transcript:

Multiple Sequence Alignment Dynamic Programming

Multiple Sequence Alignment
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--
Goal: bring the greatest number of similar characters into the same column of the alignment.
Similar to the alignment of two sequences.

CLUSTALW MSA MSA of four oxidoreductase NAD binding domain protein sequences. Red: AVFPMILW. Blue: DE. Magenta: RHK. Green: STYHCNGQ. Grey: all others. Residue ranges are shown after sequence names. Chenna et al. Nucleic Acids Research, 2003, Vol. 31, No

Multiple Sequence Alignment: Motivation
Correspondence: find out which parts “do the same thing”
– Similar genes are conserved across widely divergent species, often performing similar functions
Structure prediction
– Use knowledge of the structure of one or more members of a protein MSA to predict the structure of other members
– Structure is more conserved than sequence
Create “profiles” for protein families
– Allow us to search for other members of the family
Genome assembly: automated reconstruction of “contig” maps of genomic fragments such as ESTs
MSA is the starting point for phylogenetic analysis

Multiple Sequence Alignment: Approaches
Optimal global alignments: dynamic programming
– Generalization of Needleman-Wunsch
– Find the alignment that maximizes a score function
– Computationally expensive: time grows as the product of the sequence lengths
Global progressive alignments: match closely related sequences first, using a guide tree
Global iterative alignments: multiple re-building attempts to find the best alignment
Local alignments
– Profiles, blocks, patterns

Scoring a multiple alignment
Three scoring schemes: sum of pairs, star, tree.
[Figure: the three schemes illustrated on a column of A and C residues.]

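Sum of Pairs
Example: five aligned sequences AAA, AAA, AAA, AAC, ACC, where each matching pair in a column scores α and each mismatching pair costs β.
Column 1 (A A A A A): 10 matching pairs, 10α
Column 2 (A A A A C): 6 matching and 4 mismatching pairs, 6α - 4β
Column 3 (A A A C C): 4 matching and 6 mismatching pairs, 4α - 6β
Total: 10α + (6α - 4β) + (4α - 6β) = 20α - 10β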

Sum-of-Pairs Scoring Function
Score of multiple alignment = $\sum_{i<j} \mathrm{score}(S_i, S_j)$
where $\mathrm{score}(S_i, S_j)$ = score of the induced pairwise alignment of $S_i$ and $S_j$
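The sum-of-pairs definition can be sketched in a few lines of Python. This is only an illustration: the function names and the default match, mismatch, and gap values are assumptions, not part of the slides. The example reuses the five sequences AAA, AAA, AAA, AAC, ACC with α = 1 and β = 1, for which 20α - 10β = 10.

```python
from itertools import combinations

def pair_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score one column of an induced pairwise alignment (assumed scoring values)."""
    if a == "-" and b == "-":
        return 0.0  # the column vanishes in the induced pairwise alignment
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs(rows, **scores):
    """Add pair_score over every pair of rows and every column."""
    assert len({len(r) for r in rows}) == 1, "rows must have equal length"
    return sum(
        pair_score(x, y, **scores)
        for r1, r2 in combinations(rows, 2)
        for x, y in zip(r1, r2)
    )

rows = ["AAA", "AAA", "AAA", "AAC", "ACC"]
print(sum_of_pairs(rows))  # 10.0, i.e. 20*1 - 10*1
```

Scoring column by column gives the same total as scoring each induced pairwise alignment separately, because columns where both rows hold a gap contribute nothing.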

Induced Pairwise Alignment
S_1  S - T I S C T G - S - N I
S_2  L - T I - C N G S S - N I
S_3  L R T I S C S G F S Q N I
Induced pairwise alignment of S_1, S_2 (columns where both rows have a gap are dropped):
S_1  S T I S C T G - S N I
S_2  L T I - C N G S S N I
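As a small illustration, the induced pairwise alignment can be computed by deleting every column in which both rows hold a gap. The function name below is hypothetical; the input rows are S 1 and S 2 from the example above, written without spaces.

```python
def induced_pairwise(row_a, row_b):
    """Drop the columns where both rows have a gap; keep everything else."""
    kept = [(x, y) for x, y in zip(row_a, row_b) if not (x == "-" and y == "-")]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)

s1 = "S-TISCTG-S-NI"
s2 = "L-TI-CNGSS-NI"
print(induced_pairwise(s1, s2))  # ('STISCTG-SNI', 'LTI-CNGSSNI')
```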

MSA: Dynamic Programming
The two-sequence alignment algorithm can be generalized to any number of sequences.
E.g., for three sequences X, Y, W define C[i,j,k] = score of the optimum alignment among X[1..i], Y[1..j], W[1..k].
As for two sequences, divide the possible alignments into different classes, depending on how they end.
– Use these classes to devise recurrence relations for C[i,j,k]
– C[i,j,k] is the maximum over all possibilities

MSA: 7 ways an alignment can end for 3 sequences
For X_1 ... X_i, Y_1 ... Y_j, W_1 ... W_k, the last column (X row, Y row, W row) is one of:
(X_i, Y_j, W_k)  (-, Y_j, W_k)  (X_i, -, W_k)  (X_i, Y_j, -)  (-, -, W_k)  (-, Y_j, -)  (X_i, -, -)

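Dynamic programming for three sequences
Each alignment is a path through the dynamic programming matrix.
[Figure: an alignment of the three sequences VSNS, SNA, and AS drawn as a path from the corner labeled Start through a three-dimensional matrix.]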

Dynamic Programming for Three Sequences
There are 7 ways to get to C[i,j,k], e.g. from C[i-1,j-1,k-1] or from C[i-1,j,k-1].
Enumerate all possibilities and choose the best one.
For 3 sequences of length n, time is proportional to n^3.
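A minimal sketch of the three-sequence recurrence follows. The slides do not fix a column score, so this version assumes a sum-of-pairs column score with illustrative match, mismatch, and gap values; `col_score` and `align3_score` are hypothetical names. It returns only the optimal score, not the alignment itself.

```python
from itertools import product

def col_score(col, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one column of three characters (assumed scoring)."""
    total = 0
    for a in range(3):
        for b in range(a + 1, 3):
            x, y = col[a], col[b]
            if x == "-" and y == "-":
                continue  # gap-gap pairs contribute nothing
            total += gap if "-" in (x, y) else (match if x == y else mismatch)
    return total

def align3_score(X, Y, W):
    """C[i][j][k] = best score for aligning X[:i], Y[:j], W[:k]."""
    n, m, p = len(X), len(Y), len(W)
    neg_inf = float("-inf")
    C = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    C[0][0][0] = 0
    # The 7 ways a column can end: every non-empty subset of sequences advances.
    moves = [d for d in product((0, 1), repeat=3) if any(d)]
    for i, j, k in product(range(n + 1), range(m + 1), range(p + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            col = (X[pi] if di else "-", Y[pj] if dj else "-", W[pk] if dk else "-")
            C[i][j][k] = max(C[i][j][k], C[pi][pj][pk] + col_score(col))
    return C[n][m][p]

print(align3_score("AC", "AC", "AC"))  # 6: two columns of three matching pairs
```

The triple loop visits every cell once and tries 7 predecessors per cell, which is where the n^3 running time for three sequences of length n comes from.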

Dynamic Programming MSA: General Case
For k sequences of length n, the dynamic programming algorithm does $(2^k - 1)\,n^k$ operations
– Example: 6 sequences of length 100 require about $6.3 \times 10^{13}$ calculations
Space for the table is $n^k$
Implementations (e.g., WashU MSA 2.1) use tricks and only search a subset of the dynamic programming table
– Even this is expensive. E.g., Baylor CM Search Launcher limits MSA to 8 sequences of 800 characters and 10 minutes of processing time
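To get a feel for how fast the cost grows, the operation count can be evaluated directly: each of the n^k table cells looks at 2^k - 1 predecessor cells. The helper name below is hypothetical, and the count ignores the work needed to score each column.

```python
def dp_cost(k, n):
    """Return (operations, table cells) for k sequences of length n."""
    cells = n ** k                 # size of the k-dimensional table
    ops = (2 ** k - 1) * cells     # 2^k - 1 predecessor cells per table cell
    return ops, cells

for k in (2, 3, 6):
    ops, cells = dp_cost(k, 100)
    print(k, f"{ops:.2e}", f"{cells:.2e}")
# k = 6 gives 6.30e+13 operations and 1.00e+12 cells
```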

Problems with SP scoring
Pair-wise comparisons can over-score evolutionarily distant pairs.
Reason: for 3 or more sequences, SP scoring does not correspond to any evolutionary tree.
[Figure not recovered: the slide contrasts two diagrams here, the second introduced by “But not:”.]

Overcoming problems with SP scoring
Use weights to incorporate evolution in sum-of-pairs scoring:
– Some pair-wise alignments are more important than others. E.g., it is more important to have a good alignment between mouse and human sequences than between mouse and bird
– Assign different weights to different pair-wise alignments. Weight decreases with evolutionary distance
Use the star tree approach:
– One sequence is assigned as the ancestor and all others are contrasted with it
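A weighted sum-of-pairs score might look like the sketch below. The species names, the toy sequences, the weight values, and the scoring parameters are all made up for illustration; the slides only say that the weight should decrease with evolutionary distance.

```python
from itertools import combinations

def pairwise_score(r1, r2, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score of the induced pairwise alignment of two rows (assumed scoring)."""
    total = 0.0
    for x, y in zip(r1, r2):
        if x == "-" and y == "-":
            continue
        total += gap if "-" in (x, y) else (match if x == y else mismatch)
    return total

def weighted_sum_of_pairs(rows, weights):
    """rows: name -> aligned row; weights: frozenset({name_a, name_b}) -> weight."""
    return sum(
        weights[frozenset((a, b))] * pairwise_score(rows[a], rows[b])
        for a, b in combinations(rows, 2)
    )

rows = {"human": "ACGT", "mouse": "ACGA", "bird": "A-GA"}
weights = {  # hypothetical: close pairs weigh more than distant ones
    frozenset(("human", "mouse")): 1.0,
    frozenset(("human", "bird")): 0.5,
    frozenset(("mouse", "bird")): 0.5,
}
print(weighted_sum_of_pairs(rows, weights))  # 2.0
```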

Star Alignments
Construct multiple alignments using pair-wise alignment relative to a fixed sequence.
Out of a set $S = \{S_1, S_2, \ldots, S_r\}$ of sequences, pick the sequence $S_c$ that maximizes
$\mathrm{star\_score}(c) = \sum_{1 \le i \le r,\ i \ne c} \mathrm{sim}(S_c, S_i)$
where $\mathrm{sim}(S_i, S_j)$ is the optimal score of a pair-wise alignment between $S_i$ and $S_j$.

Algorithm
1. Compute sim(S_i, S_j) for every pair (i, j)
2. Compute star_score(i) for every i
3. Choose the index c that maximizes star_score(c) and make it the center of the star
4. Produce a multiple alignment M such that, for every i, the induced pairwise alignment of S_c and S_i is the same as the optimum alignment of S_c and S_i
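Steps 1 to 3 can be sketched as follows. Because sim is a similarity score, the center is taken to be the sequence with the largest star_score, as in the definition of star alignments. The `sim` function here is a plain Needleman-Wunsch score with a linear gap cost and assumed scoring values; the slides do not prescribe a particular scoring scheme.

```python
def sim(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b (assumed scoring values)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def star_center(seqs):
    """Return (index of the center, list of star scores)."""
    scores = [
        sum(sim(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    center = max(range(len(seqs)), key=scores.__getitem__)
    return center, scores

print(star_center(["AAAA", "AAAT", "TTTT"]))  # (1, [-2, 0, -6])
```

In this toy input, AAAT sits between the other two sequences, so it collects the highest total similarity and becomes the center.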

Step 4: Detail
Optimal alignment of S_c and S_1:
S_c  AA--CCTT
S_1  AATGCC--
Optimal alignment of S_c and S_2:
S_c  A-ACC-TT
S_2  AGACCGT-
Merged multiple alignment:
S_c  A-A--CC-TT
S_1  A-ATGCC---
S_2  AGA--CCGT-
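One way to carry out this merge is sketched below: for every position of the center, record how many characters each pairwise alignment inserts before it, widen each slot to the largest insertion, and pad the other rows with gaps. The function name and the choice to left-justify insertions within a slot are assumptions; the slides only require that each induced alignment with the center stays optimal.

```python
def merge_star(center, pairwise):
    """pairwise: list of (center_row, other_row) optimal pairwise alignments."""
    length = len(center)

    def split(c_row, o_row):
        # inserts[p]: characters of the other row placed before center residue p
        # aligned[p]: character of the other row aligned to center residue p
        inserts, aligned, p = [""] * (length + 1), [], 0
        for c, o in zip(c_row, o_row):
            if c == "-":
                inserts[p] += o
            else:
                aligned.append(o)
                p += 1
        return inserts, aligned

    parts = [split(c, o) for c, o in pairwise]
    width = [max(len(ins[p]) for ins, _ in parts) for p in range(length + 1)]

    def build(inserts, aligned):
        out = []
        for p in range(length + 1):
            out.append(inserts[p].ljust(width[p], "-"))  # pad slot with gaps
            if p < length:
                out.append(aligned[p])
        return "".join(out)

    center_row = build([""] * (length + 1), list(center))
    return [center_row] + [build(ins, al) for ins, al in parts]

rows = merge_star("AACCTT", [("AA--CCTT", "AATGCC--"), ("A-ACC-TT", "AGACCGT-")])
print("\n".join(rows))
# A-A--CC-TT
# A-ATGCC---
# AGA--CCGT-
```

Run on the two pairwise alignments of the example, the sketch reproduces the merged alignment shown on the slide.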