CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight.

Slides:



Advertisements
Similar presentations
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Advertisements

Multiple Alignment Anders Gorm Pedersen Molecular Evolution Group
Blast to Psi-Blast Blast makes use of Scoring Matrix derived from large number of proteins. What if you want to find homologs based upon a specific gene.
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Molecular Evolution Revised 29/12/06
Multiple alignment: heuristics. Consider aligning the following 4 protein sequences S1 = AQPILLLV S2 = ALRLL S3 = AKILLL S4 = CPPVLILV Next consider the.
Heuristic alignment algorithms and cost matrices
Progressive MSA Do pair-wise alignment Develop an evolutionary tree Most closely related sequences are then aligned, then more distant are added. Genetic.
Multiple alignment June 29, 2007 Learning objectives- Review sequence alignment answer and answer questions you may have. Understand how the E value may.
In addition to maximum parsimony (MP) and likelihood methods, pairwise distance methods form the third large group of methods to infer evolutionary trees.
UNIVERSITY OF SOUTH CAROLINA College of Engineering & Information Technology Bioinformatics Algorithms and Data Structures CLUSTAL W Algorithm Lecturer:
Sequence Analysis Tools
Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL Arunesh Mishra CMSC 838 Presentation Authors : Dmitri Mikhailov,
Multiple alignment: heuristics
Multiple sequence alignment
Similar Sequence Similar Function Charles Yan Spring 2006.
Sequence Alignment III CIS 667 February 10, 2004.
BLAST and Multiple Sequence Alignment
Sequence analysis of nucleic acids and proteins: part 1 Based on Chapter 3 of Post-genome Bioinformatics by Minoru Kanehisa, Oxford University Press, 2000.
Multiple Sequence Alignments
Multiple sequence alignment methods 1 Corné Hoogendoorn Denis Miretskiy.
CISC667, F05, Lec8, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Multiple Sequence Alignment Scoring Dynamic Programming algorithms Heuristic algorithms.
Introduction to Bioinformatics From Pairwise to Multiple Alignment.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Chapter 5 Multiple Sequence Alignment.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Multiple sequence alignment
Biology 4900 Biocomputing.
Pair-wise Sequence Alignment What happened to the sequences of similar genes? random mutation deletion, insertion Seq. 1: 515 EVIRMQDNNPFSFQSDVYSYG EVI.
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Practical multiple sequence algorithms Sushmita Roy BMI/CS 576 Sushmita Roy Sep 24th, 2013.
Multiple Sequence Alignment May 12, 2009 Announcements Quiz #2 return (average 30) Hand in homework #7 Learning objectives-Understand ClustalW Homework#8-Due.
Protein Sequence Alignment and Database Searching.
BINF6201/8201 Molecular phylogenetic methods
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
Sequence Based Analysis Tutorial NIH Proteomics Workshop Lai-Su Yeh, Ph.D. Protein Information Resource at Georgetown University Medical Center.
Eidhammer et al. Protein Bioinformatics Chapter 4 1 Multiple Global Sequence Alignment and Phylogenetic trees Inge Jonassen and Ingvar Eidhammer.
OUTLINE Phylogeny UPGMA Neighbor Joining Method Phylogeny Understanding life through time, over long periods of past time, the connections between all.
Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)
Using Traveling Salesman Problem Algorithms to Determine Multiple Sequence Alignment Orders Weiwei Zhong.
Multiple alignment: Feng- Doolittle algorithm. Why multiple alignments? Alignment of more than two sequences Usually gives better information about conserved.
Sequence Alignment Only things that are homologous should be compared in a phylogenetic analysis Homologous – sharing a common ancestor This is true for.
Multiple sequence alignment
HMMs for alignments & Sequence pattern discovery I519 Introduction to Bioinformatics.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Phylogenetic Analysis Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics Figures from Higgs & Attwood.
Applied Bioinformatics Week 8 Jens Allmer. Theory I.
Doug Raiford Lesson 5.  Dynamic programming methods  Needleman-Wunsch (global alignment)  Smith-Waterman (local alignment)  BLAST Fixed: best Linear:
Construction of Substitution matrices
1 MAVID: Constrained Ancestral Alignment of Multiple Sequence Author: Nicholas Bray and Lior Pachter.
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
Step 3: Tools Database Searching
©CMBI 2005 Database Searching BLAST Database Searching Sequence Alignment Scoring Matrices Significance of an alignment BLAST, algorithm BLAST, parameters.
1 Multiple Sequence Alignment(MSA). 2 Multiple Alignment Number of sequences >2 Global alignment Seek an alignment that maximizes score.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Multiple Sequence Alignment (cont.) (Lecture for CS397-CXZ Algorithms in Bioinformatics) Feb. 13, 2004 ChengXiang Zhai Department of Computer Science University.
Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
BLAST and Psi-BLAST and MSA Nov. 1, 2012 Workshop-Use BLAST2 to determine local sequence similarities. Homework #6 due Nov 8 Chapter 5, Problem 8 Chapter.
Multiple Sequence Alignment Dr. Urmila Kulkarni-Kale Bioinformatics Centre University of Pune
Database Scanning/Searching FASTA/BLAST/PSIBLAST G P S Raghava.
Multiple sequence alignment (msa)
The ideal approach is simultaneous alignment and tree estimation.
Multiple Sequence Alignment
In Bioinformatics use a computational method - Dynamic Programming.
Sequence Alignment Algorithms Morten Nielsen BioSys, DTU
Presentation transcript:

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Julie D.Thompson, Desmond G.Higgins and Toby J.Gibson Nucleic Acids Research, 1994, Vol. 22, No. 22, pp Advisor: Prof. Chang-Biau Yang Speaker: Liang-Tai Wei

ABSTRACT The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down- weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.

Algorithm of Clustal W Part 1  Progressive approach ( 漸進式之方法)  1. Pairwise alignment using  1.1 the full mode, which utilizes dynamic programming with the affine gap penalties, i.e. opening (GOP) and extending (GEP).  1.2 the fast mode. Parameters such as ktup, window size etc must be set. (not described in paper)  2. Calculate distance of all pairs of Sequences. Distance = 1 – percent identity  3. Distance matrix construction using the pairwise alignment of all pairs of sequences.

Distance Matrix of a set of globins

Algorithm of Clustal W Part 2  4. Guide tree construction  4.1 UPGMA – unwighted pair-group method with arithmetic mean (old version)  4.2 Neighbor-joining method + mid point method (new version)  Neighbor-joining method >> Unrooted guide tree  Mid point method >> Rooted guide tree

Guide Tree : mid-point method  Determine the branch length using the topology of the un-rooted tree, the distance matrix and the linear algebra.  For every internal node in the un-rooted tree, bisect the tree into left tree and right tree. The total distance of the sequences to this internal node is then calculated.  The real node of the tree can be inferred from above information.

A rooted tree with branch lengths Branch lengths determines weights for each sequence. The most divergent sequence receives the highest weights.

Algorithm of Clustal W Part 3  5. Progressive alignment  Align the sequences following the branching order of the guide tree, from the tips of the tree toward the root  sequence is weighted during alignment  6. Heuristic on gaps  short stretches of hydrophilic residues usually indicate loop or random coil region and are unlikely to have gaps  GOP and GEP is determined on lots of heuristic factors, e.g. where there existing gaps in the neigborhood.

Thanks.