Multiple sequence alignment

Tutorial 5: Multiple sequence alignment

Multiple Sequence Alignment – When?
More than two sequences (DNA or protein).
Evolutionary relation: homology → phylogenetic tree.
Detect motifs.
Unaligned sequences:
    GTCGTAGTCGGCTCGAC
    GTCTAGCGAGCGTGAT
    GCGAAGAGGCGAGC
    GCCGTCGCGTCGTAAC
Aligned sequences:
    GTCGTAGTCG-GC-TCGAC
    GTC-TAG-CGAGCGT-GAT
    GC-GAAG-AG-GCG-AG-C
    GCCGTCG-CG-TCGTA-AC
[Figure: phylogenetic tree with leaves A, D, B, C]

Multiple Sequence Alignment – How?
Dynamic programming: finds the optimal alignment, but its cost is exponential in the number of sequences.
Progressive alignment: an efficient heuristic.
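To make the "exponential in the number of sequences" point concrete, here is a minimal Python sketch. It assumes the usual textbook formulation in which k sequences are aligned in a k-dimensional table with one axis per sequence, and every cell looks back at up to 2^k - 1 neighbouring cells. The function names are hypothetical, and the count ignores the cost of scoring each column.

```python
from math import prod

def dp_table_cells(sequences):
    """Cells in the k-dimensional DP table: product of (length + 1) per sequence."""
    return prod(len(s) + 1 for s in sequences)

def predecessors_per_cell(k):
    """Each cell can be reached from up to 2**k - 1 neighbours
    (every non-empty subset of the k sequences may advance by one residue)."""
    return 2 ** k - 1

# The four unaligned example sequences from the slides.
sequences = [
    "GTCGTAGTCGGCTCGAC",
    "GTCTAGCGAGCGTGAT",
    "GCGAAGAGGCGAGC",
    "GCCGTCGCGTCGTAAC",
]

cells = dp_table_cells(sequences)
steps = cells * predecessors_per_cell(len(sequences))
print(f"{cells} cells, about {steps} cell updates")
```

Even for four short sequences the table has tens of thousands of cells, and each additional sequence multiplies the work again, which is why progressive heuristics are used in practice.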

Hierarchical Clustering
A way to represent similarities graphically.
Summarizes a pairwise distance matrix as a dendrogram.
Not all matrices can be embedded in a tree without error.
Example sequences:
    TGTTAAC
    TGT-AAC
    TGT--AC
    ATGT--C
    ATGTGGC
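Hierarchical clustering starts from a pairwise distance matrix. The following is a small Python sketch of how such a matrix could be computed for the five aligned example sequences. It assumes the simplest possible distance, the number of mismatching columns, and treats a gap as just another character; the slides do not say which measure was used for this figure, and `mismatch_count` is a hypothetical helper name.

```python
def mismatch_count(a, b):
    """Number of alignment columns in which two equal-length rows differ."""
    if len(a) != len(b):
        raise ValueError("rows of an alignment must have the same length")
    return sum(x != y for x, y in zip(a, b))

aligned = ["TGTTAAC", "TGT-AAC", "TGT--AC", "ATGT--C", "ATGTGGC"]

# Full symmetric matrix; entry [i][j] is the distance between rows i and j.
matrix = [[mismatch_count(a, b) for b in aligned] for a in aligned]

for name, row in zip(aligned, matrix):
    print(name, row)
```

A clustering method then repeatedly merges the closest pair of rows or clusters to build the dendrogram.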

ClustalW
1. Pairwise alignment: calculate the distance matrix.
2. Build the guide tree.
3. Progressive alignment using the guide tree.
“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J D Thompson et al.

ClustalW: progressive (incremental) alignment
At each step, align two existing alignments or sequences.
Gaps present in older alignments remain fixed.
The guide tree is built with the Neighbor Joining algorithm.
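The rule that gaps in older alignments remain fixed can be illustrated with a short Python sketch. It only shows the bookkeeping: when a later step needs a new gap at some column of an existing alignment, the gap is opened in every row, and gaps that were already there are never removed. Deciding where the new gaps go is the job of the profile alignment itself, which is not shown; `open_gap_columns` is a hypothetical helper, not a ClustalW function.

```python
def open_gap_columns(alignment, columns):
    """Return a copy of `alignment` with a gap column inserted before each
    index in `columns` (indices refer to the original column numbering).
    Existing characters, including existing gaps, are kept unchanged."""
    result = []
    for row in alignment:
        chars = list(row)
        # Insert from right to left so earlier indices stay valid.
        for col in sorted(columns, reverse=True):
            chars.insert(col, "-")
        result.append("".join(chars))
    return result

older = ["TGTTAAC", "TGT-AAC"]       # alignment built in an earlier step
newer = open_gap_columns(older, [4])  # a later step needs a gap before column 4

for row in newer:
    print(row)
```

Removing the gap characters from any row of `newer` gives back the same residues as in `older`, and the gap that `older` already contained is still in place.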

Neighbor Joining Algorithm
An agglomerative hierarchical clustering method.
Constructs an unrooted tree.

Neighbor Joining (not assuming equal divergence)
Step-by-step summary:
1. Calculate all pairwise distances.
2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
3. Define a new node (x).
4. Calculate Dix and Djx, the distances of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
5. Continue until two nodes remain, then connect them with an edge.

Step 1. Calculate all pairwise distances.
         A    B    C    D
    B   22
    C   39   41
    D   39   41   18
    E   41   43   20   10

Measuring Distance
Problem: the fraction of differing positions between unrelated sequences approaches the fraction expected by chance, so the distance measure converges.
Jukes-Cantor correction.
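As an illustration of the Jukes-Cantor correction, here is a minimal Python sketch. It assumes the standard DNA form of the model (four equally likely bases), in which an observed fraction of differing sites p is converted to an estimated distance d = -(3/4) ln(1 - (4/3) p). The function name is hypothetical, and the slides do not state which variant they use.

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed fraction p of differing sites.

    Assumes DNA (4 states). The correction is undefined for p >= 0.75, the
    fraction of differences expected by chance between unrelated sequences.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if p == 0:
        return 0.0
    return -0.75 * math.log(1 - 4 * p / 3)

for p in (0.05, 0.3, 0.6, 0.74):
    print(f"observed {p:.2f} -> corrected {jukes_cantor_distance(p):.3f}")
```

The corrected distance is close to p for similar sequences and grows without bound as p approaches 0.75, which compensates for the convergence of the raw measure.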

Measuring Distance (cont)
Euclidean distance: given a multiple sequence alignment, calculate the square root of the sum of the scores at every position between two sequences. The score increases in proportion to the dissimilarity between residues.

Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
Relative distance between i and j:
    Mij = Dij - (ri + rj) / (N - 2)
where
    Dij is the distance between i and j from the distance table,
    ri is the distance of i from all other sequences (the sum of Dik over all other nodes k),
    N is the number of leaves (= sequences) left in the tree.
Mij takes negative values. As the average distance from the common ancestor to the rest of the nodes increases, Mij has a lower value.
Select the pair that produces the lowest value, and re-evaluate M with every iteration.

Step 2 (cont). The Mij table:
         A       B       C       D
    B   -74
    C   -47.3   -47.3
    D   -44     -44     -57.3
    E   -44     -44     -57.3   -64
A,B is the pair with the minimal Mij value. The Mij table is used only to choose the closest pair (lowest value), not to calculate the distances.
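The selection in Step 2 can be sketched in Python as follows. This assumes the standard Neighbor Joining criterion Mij = Dij - (ri + rj) / (N - 2), where ri is the sum of the distances from i to all other nodes. The distance values are taken from the Step 1 table; entries that are duplicated or unreadable in the transcript were filled in so that they reproduce the Mij values shown on the slide, so treat them as an assumption. The helper names are hypothetical.

```python
from itertools import combinations

# Pairwise distances for the five sequences (each pair stored once).
D = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}

def dist(d, i, j):
    """Symmetric lookup in a table that stores each pair once."""
    return d[(i, j)] if (i, j) in d else d[(j, i)]

def relative_distances(d, nodes):
    """Mij = Dij - (ri + rj) / (N - 2) for every pair of current nodes."""
    n = len(nodes)
    r = {i: sum(dist(d, i, k) for k in nodes if k != i) for i in nodes}
    return {
        (i, j): dist(d, i, j) - (r[i] + r[j]) / (n - 2)
        for i, j in combinations(nodes, 2)
    }

M = relative_distances(D, list("ABCDE"))
closest = min(M, key=M.get)

for pair, value in sorted(M.items()):
    print(pair, round(value, 1))
print("join:", closest)
```

With these values the pair (A, B) has the lowest relative distance, even though D and E have the smallest raw distance.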

Step 3. Define a new node (x).
[Figure: A and B are joined at the new node X; C, D and E are not yet joined]

Step 4. Calculate Dix and Djx, the distances of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
Now we'll calculate the distance from X to all other nodes:
         X    C    D
    C   29
    D   29   18
    E   31   20   10
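The update in Step 4 can be sketched in Python. The slides do not spell out the formulas, so this assumes the standard Neighbor Joining ones: Dix = (Dij + (ri - rj) / (N - 2)) / 2, Djx = Dij - Dix, and Dxk = (Dik + Djk - Dij) / 2 for every remaining node k. The distance values are the same reconstruction of the Step 1 table as above (an assumption where the transcript is unreadable), and `join_pair` is a hypothetical helper name.

```python
from itertools import combinations

D = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}

def dist(d, i, j):
    """Symmetric lookup in a table that stores each pair once."""
    return d[(i, j)] if (i, j) in d else d[(j, i)]

def join_pair(d, nodes, i, j, x):
    """Join i and j at a new node x; return (Dix, Djx, new distance table)."""
    n = len(nodes)
    r_i = sum(dist(d, i, k) for k in nodes if k != i)
    r_j = sum(dist(d, j, k) for k in nodes if k != j)
    d_ij = dist(d, i, j)
    d_ix = (d_ij + (r_i - r_j) / (n - 2)) / 2   # branch length i to x
    d_jx = d_ij - d_ix                          # branch length j to x
    others = [k for k in nodes if k not in (i, j)]
    # Distances among untouched nodes carry over unchanged.
    new_d = {(a, b): dist(d, a, b) for a, b in combinations(others, 2)}
    # Distance from the new node to each remaining node.
    for k in others:
        new_d[(x, k)] = (dist(d, i, k) + dist(d, j, k) - d_ij) / 2
    return d_ix, d_jx, new_d

d_ax, d_bx, D_new = join_pair(D, list("ABCDE"), "A", "B", "X")
print("A-X:", d_ax, "B-X:", d_bx)
for pair, value in D_new.items():
    print(pair, value)
```

The resulting X row (29, 29, 31) matches the table on the slide, and the two branch lengths are the ones that appear on the final tree.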

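Step 5. Continue until two nodes remain.
New Mij table:
         X     C     D
    C   -49
    D   -44   -44
    E   -44   -44   -49
X and C are joined at a new node Y.
[Figure: tree with leaves A, B, C, D, E and internal nodes X, Y]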

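New Dij table:
         Y    D
    D    9
    E   11   10
After D and E are joined at a new node Z, only 2 nodes (Y and Z) are left. Let's calculate all the distances to Z: D-Z = 4, E-Z = 6, Y-Z = 5.
[Figure: tree with leaves A, B, C, D, E and internal nodes X, Y, Z]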

The tree
Branch lengths: A-X 10, B-X 12, X-Y 20, C-Y 9, Y-Z 5, D-Z 4, E-Z 6.
And in newick tree format: ((C,(D,E)),(A,B));
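For readers who want the branch lengths in the Newick string as well, here is a small Python sketch. It assumes one common convention for unrooted trees, writing the tree with a three-way split at an internal node (here Y), and uses the `label:length` Newick syntax. The nested-list representation and the `to_newick` helper are hypothetical choices made for this example.

```python
def to_newick(children):
    """Serialize a list of (child, branch_length) pairs.

    A child is either a leaf name (str) or a nested list of the same shape.
    """
    parts = []
    for child, length in children:
        if isinstance(child, str):
            label = child
        else:
            label = "(" + to_newick(child) + ")"
        parts.append(f"{label}:{length}")
    return ",".join(parts)

# The final tree, viewed from internal node Y:
#   X = (A, B) hangs off Y with length 20, C with length 9,
#   Z = (D, E) with length 5.
tree_at_y = [
    ([("A", 10), ("B", 12)], 20),
    ("C", 9),
    ([("D", 4), ("E", 6)], 5),
]

newick = "(" + to_newick(tree_at_y) + ");"
print(newick)
```

This describes the same unrooted topology as the string without branch lengths; only the point at which the tree is written out differs.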

ClustalW - Input
http://www.ebi.ac.uk/Tools/clustalw2/index.html
Input sequences
Scoring matrix
Gap scoring
Output format
Email address

ClustalW - Output
Match strength in decreasing order: * : .

ClustalW - Output
Pairwise alignment scores
Building tree
Building alignment
Final score

ClustalW - Output
Sequence names
Sequence positions
Match strength in decreasing order: * : .

ClustalW - Output Branch length
