Aligning Sequences With T-Coffee Cédric Notredame Comparative Bioinformatics Group Bioinformatics and Genomics Program.

Presentation transcript:


T-Coffee and Consistency…
SeqA GARFIELD THE LAST FAT CAT
SeqB GARFIELD THE FAST CAT
SeqC GARFIELD THE VERY FAST CAT
SeqD THE FAT CAT

SeqA GARFIELD THE LAST FA-T CAT
SeqB GARFIELD THE FAST CA-T ---
SeqC GARFIELD THE VERY FAST CAT
SeqD THE ---- FA-T CAT

Consistency: Conflicts and Information
[Diagram: pairings among W, X, Y and Z, combined (+) into alternative outcomes (OR), labelled "Non Consistent" and "Consistent"]

T-Coffee and Consistency…
SeqA GARFIELD THE LAST FAT CAT     Prim. Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT    Prim. Weight = 77
SeqC GARFIELD THE VERY FAST CAT

SeqA GARFIELD THE LAST FAT CAT     Prim. Weight = 100
SeqD THE ---- FAT CAT

SeqB GARFIELD THE ---- FAST CAT    Prim. Weight = 100
SeqC GARFIELD THE VERY FAST CAT

SeqC GARFIELD THE VERY FAST CAT    Prim. Weight = 100
SeqD THE ---- FA-T CAT
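The slide does not say how the primary weights are computed. A minimal sketch, assuming the weight is the percent identity over gap-free columns of the pairwise alignment (the function name `primary_weight` is hypothetical): under that assumption the SeqA/SeqB pair gives 88, as on the slide, while the other weights shown may follow a different convention.

```python
def primary_weight(row_a, row_b, gap="-"):
    """Percent identity (rounded down) over columns where neither row has a gap.

    Assumption: this is only one plausible way to weight a pairwise alignment.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = 0
    matches = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue  # columns containing a gap are ignored
        aligned += 1
        if x == y:
            matches += 1
    return 100 * matches // aligned if aligned else 0


# SeqA / SeqB rows from the slide, with the word separators removed.
seq_a = "GARFIELD THE LAST FAT CAT".replace(" ", "")
seq_b = "GARFIELD THE FAST CAT ---".replace(" ", "")
weight_ab = primary_weight(seq_a, seq_b)
print(weight_ab)
```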

T-Coffee and Consistency…
SeqA GARFIELD THE LAST FAT CAT     Prim. Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT    Prim. Weight = 77
SeqC GARFIELD THE VERY FAST CAT

SeqA GARFIELD THE LAST FAT CAT     Prim. Weight = 100
SeqD THE ---- FAT CAT

SeqB GARFIELD THE ---- FAST CAT    Prim. Weight = 100
SeqC GARFIELD THE VERY FAST CAT

SeqC GARFIELD THE VERY FAST CAT    Prim. Weight = 100
SeqD THE ---- FA-T CAT

SeqA GARFIELD THE LAST FAT CAT     Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT    Weight = 77
SeqC GARFIELD THE VERY FAST CAT
SeqB GARFIELD THE ---- FAST CAT

SeqA GARFIELD THE LAST FA-T CAT    Weight = 100
SeqD THE ---- FA-T CAT
SeqB GARFIELD THE ---- FAST CAT

T-Coffee and Consistency…
SeqA GARFIELD THE LAST FAT CAT     Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT    Weight = 77
SeqC GARFIELD THE VERY FAST CAT
SeqB GARFIELD THE ---- FAST CAT

SeqA GARFIELD THE LAST FA-T CAT    Weight = 100
SeqD THE ---- FA-T CAT
SeqB GARFIELD THE ---- FAST CAT
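The three groups above (SeqA/SeqB directly, then through SeqC, then through SeqD) can be read as a library extension step. The sketch below assumes the usual triplet rule: a residue pair keeps its direct weight and gains, for each intermediate residue linked to both, the smaller of the two link weights. The name `extend_library`, the residue labels, and the SeqB/SeqD weight of 100 (not shown on the slides) are assumptions made for illustration.

```python
from collections import defaultdict


def extend_library(primary):
    """Triplet extension of a primary library.

    primary maps (residue_x, residue_y) -> weight, where a residue is a
    (sequence_name, position) tuple. Returns a dict keyed by sorted pairs.
    """
    links = defaultdict(dict)
    for (x, y), weight in primary.items():
        links[x][y] = weight
        links[y][x] = weight

    residues = sorted(links)
    extended = {}
    for i, x in enumerate(residues):
        for y in residues[i + 1:]:
            if x[0] == y[0]:
                continue  # never pair two residues of the same sequence
            total = links[x].get(y, 0)  # direct evidence, if any
            for z in links[x].keys() & links[y].keys():
                if z[0] in (x[0], y[0]):
                    continue
                # evidence through z is limited by its weaker link
                total += min(links[x][z], links[y][z])
            if total:
                extended[(x, y)] = total
    return extended


# The "T" of THE in each sequence (positions are illustrative labels).
t_a, t_b, t_c, t_d = ("SeqA", 8), ("SeqB", 8), ("SeqC", 8), ("SeqD", 0)
primary = {
    (t_a, t_b): 88,
    (t_a, t_c): 77,
    (t_a, t_d): 100,
    (t_b, t_c): 100,
    (t_c, t_d): 100,
    (t_b, t_d): 100,  # assumption: this pair is not listed on the slides
}
extended = extend_library(primary)
print(extended[(t_a, t_b)])
```

With these inputs the SeqA/SeqB pair accumulates 88 directly, 77 through SeqC and 100 through SeqD, matching the three weights on the slide.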

T-Coffee and Consistency…

Methods Data Scalability

Running T-Coffee over the Web

Available Servers and Flavors

Which MSA Method ???

Combining Many MSAs into ONE
[Diagram: MUSCLE, MAFFT, ClustalW and ??????? feeding into T-Coffee]
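One way to picture the combination step is to turn each input alignment into a set of aligned residue pairs and count how many alignments support each pair. This is a sketch under that assumption only; the function names and the toy alignments are hypothetical and no real aligner is called.

```python
from collections import Counter
from itertools import combinations


def aligned_pairs(msa, gap="-"):
    """Return the set of residue pairs placed in the same column of one MSA.

    msa maps sequence name -> aligned row; residues are (name, index) tuples.
    """
    names = list(msa)
    length = len(msa[names[0]])
    if any(len(row) != length for row in msa.values()):
        raise ValueError("all rows of an alignment must have the same length")
    position = dict.fromkeys(names, 0)
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if msa[name][col] != gap:
                present.append((name, position[name]))
                position[name] += 1
        pairs.update(frozenset(p) for p in combinations(present, 2))
    return pairs


def combine(msas):
    """Count, for every residue pair, how many input alignments contain it."""
    support = Counter()
    for msa in msas:
        support.update(aligned_pairs(msa))
    return support


# Three toy alignments of the same two sequences (stand-ins for aligner outputs).
msa_1 = {"x": "AC", "y": "AC"}
msa_2 = {"x": "AC-", "y": "-AC"}
msa_3 = {"x": "AC", "y": "AC"}
support = combine([msa_1, msa_2, msa_3])
print(support.most_common())
```

Pairs supported by more alignments receive higher counts, which could then serve as library weights for the consistency step.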

Consistency and Accuracy

What To Do Without Structures

Using the M-Coffee Server

Integrating New Types of Data Template Based Sequence Alignments

[Diagram. Labels: Experimental Data, TARGET, Templates, Template Aligner, Template Alignment, Template-Sequence Alignment, Template-based Alignment of the Sequences, Primary Library]
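The diagram suggests that an alignment computed between templates is carried back onto the target sequences through each template-sequence alignment. A minimal sketch of that projection, assuming every alignment is given as a residue-index mapping (the function and argument names are hypothetical):

```python
def project_template_alignment(seq_a_to_tmpl, tmpl_alignment, seq_b_to_tmpl):
    """Project a template/template alignment onto two target sequences.

    seq_a_to_tmpl:  residue index in sequence A -> index in template A
    tmpl_alignment: index in template A -> index in template B
    seq_b_to_tmpl:  residue index in sequence B -> index in template B
    Returns the list of (index in A, index in B) pairs that survive.
    """
    tmpl_to_seq_b = {t: s for s, t in seq_b_to_tmpl.items()}
    pairs = []
    for a, tmpl_a in sorted(seq_a_to_tmpl.items()):
        tmpl_b = tmpl_alignment.get(tmpl_a)
        if tmpl_b is None:
            continue  # template residue is unaligned
        b = tmpl_to_seq_b.get(tmpl_b)
        if b is None:
            continue  # aligned template residue has no counterpart in B
        pairs.append((a, b))
    return pairs


# Toy mappings: indices are illustrative only.
seq_a_to_tmpl = {0: 10, 1: 11, 2: 12}
tmpl_alignment = {10: 20, 11: 21, 12: 23}
seq_b_to_tmpl = {0: 20, 1: 21, 2: 22}
projected = project_template_alignment(seq_a_to_tmpl, tmpl_alignment, seq_b_to_tmpl)
print(projected)
```

The projected pairs are the kind of evidence that could be added to the primary library.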

Exploring The Template World
Template            Generator      Alignment Method
RNA Structure       Prediction     RNA Aligner
Protein Structure   BLAST vs PDB   3D Aligner
Profile             BLAST vs NR    Profile/Profile Alignment
Gene Structure      ENSEMBL        Genome Aligner
Promoter            Transfac       Meta-Aligner

Exploring The Template World
Template            Generator    Alignment Method   Mode
RNA Structure       Prediction   RNA Aligner        R-Coffee
Protein Structure   BLAST/PDB    3D Aligner         3D-Coffee
Profile             BLAST/NR     Profile/Profile    PSI-Coffee
Gene Structure      ENSEMBL      Genome Aligner     Exoset
Promoter            Transfac     Meta-Aligner       Meta-Coffee

3D-Coffee/Expresso Incorporating Structural Information

Expresso: Finding the Right Structure
[Diagram. Labels: Sources, BLAST, Templates, SAP, Template Alignment, Source Template Alignment, Remove Templates, Library]

PSI-Coffee Homology Extension

Exploring The Template World

What is Homology Extension?
[Figure: an L facing two candidate L residues, marked "?"]
- Simple scoring schemes result in alignment ambiguities

What is Homology Extension?
[Figure: the same L residues, each now shown with a column of homologous residues (mostly L, with I and V), grouped as Profile 1 and Profile 2]
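The figure's point can be illustrated numerically: once each residue is replaced by the column of its homologues, two positions that both read "L" stop looking identical. The sketch below scores two columns by the probability that a residue drawn from each is the same. That scoring rule, the function names and the column contents are assumptions for illustration, not values read from the figure.

```python
from collections import Counter


def column_profile(residues):
    """Relative residue frequencies in one alignment column."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def profile_match(profile_p, profile_q):
    """Probability that residues drawn from each column are identical."""
    return sum(freq * profile_q.get(aa, 0.0) for aa, freq in profile_p.items())


# A query position whose homologues are all L, compared with two candidate
# positions that both show L in the sequence itself.
query = column_profile("LLLL")
conserved = column_profile("LLLL")   # homologues keep the L
variable = column_profile("LIVI")    # homologues mostly replace the L

print(profile_match(query, conserved))
print(profile_match(query, variable))
```

A sequence-only score would rate both candidates equally; the profile comparison prefers the conserved column.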


PSI-Coffee: Homology Extension
[Diagram. Labels: Sources, BLAST, Templates, Profile Aligner, Template Alignment, Source Template Alignment, Remove Templates, Library]

Benchmarks

Do Benchmarks All Tell the same story? Based on

Method                    Template  Score  Comment
ClustalW-2   Progressive  NO        22.74
PRANK        Gap          NO        26.18  Science2008
MAFFT        Iterative    NO        26.18
Muscle       Iterative    NO        31.37
ProbCons     Consistency  NO        40.80
ProbCons     MonoPhasic   NO        37.53
T-Coffee     Consistency  NO        42.30
M-Coffee4    Consistency  NO        43.60
PSI-Coffee   Consistency  Profile   53.71
PROMAL       Consistency  Profile   55.08
PROMAL-3D    Consistency  PDB
3D-Coffee    Consistency  PDB       61.00  Expresso
Score: fraction of correct columns when compared with a structure-based reference (BB11 of BaliBase).

[Same benchmark table as on the previous slide, with the callout "Consistency"]

[Same benchmark table as on the previous slide, with the callout "Homology Extension"]

[Same benchmark table as on the previous slide, with the callout "Structural Extension"]
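The score in the benchmark table is described as the fraction of correct columns relative to a reference alignment. A minimal sketch of such a column score, expressed as a percentage, assuming a test column counts as correct only if it matches a reference column exactly (function names and toy alignments are hypothetical):

```python
def columns(msa, names, gap="-"):
    """Describe each column by the residue index it holds for every sequence."""
    position = dict.fromkeys(names, 0)
    result = []
    for col in range(len(msa[names[0]])):
        column = []
        for name in names:
            if msa[name][col] == gap:
                column.append(None)
            else:
                column.append(position[name])
                position[name] += 1
        result.append(tuple(column))
    return result


def column_score(test, reference):
    """Percentage of reference columns reproduced exactly in the test MSA."""
    names = sorted(reference)
    reference_columns = columns(reference, names)
    test_columns = set(columns(test, names))
    correct = sum(1 for column in reference_columns if column in test_columns)
    return 100.0 * correct / len(reference_columns)


# Toy example: the test alignment misplaces one gap.
reference = {"a": "AC-G", "b": "ACTG"}
test = {"a": "ACG-", "b": "ACTG"}
score = column_score(test, reference)
print(score)
```

Here two of the four reference columns are recovered, so the score is 50.0.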

T-Coffee and The World
[Diagram. Labels: Users sequences, BLAST/SOAP]
- Some templates are obtained with BLAST
- Queries can be sent to the EBI or the NCBI
- No need for a local BLAST installation