Copyright OpenHelix. No use or reproduction without express written consent.

Presentation transcript:

ClustalW using EBI Toolbox Version 1 An Introduction to Multiple Sequence Alignments (MSA) using the alignment program ClustalW2 at the EBI Toolbox site Materials prepared by: Steffen Schmidt, Ph.D. and Warren C. Lathe III, Ph.D. Updated: Q2 2011

ClustalW Using EBI Interface: Agenda
- Introduction & Credits
- Background and Theory
- The EBI Toolbox Site
- Sequence alignment using ClustalW2
- Viewing the multiple sequence alignment
- Summary
- Exercises
ClustalW2:
ClustalW2 EBI Toolbox:

ClustalW Introduction
- Multiple sequence alignments (MSA) are the basis of many bioinformatics analyses:
  - molecular evolutionary analysis (phylogenetic trees)
  - finding functionally important positions in a sequence family
  - prediction of secondary and tertiary structure of proteins
- Creating a “correct” MSA is difficult; the output of automatic tools can often be improved by human intervention
Figure labels: MyoD from UniProt; smart.embl.de; PDB MyoD

Literature and Software Sources

Theory: Multiple Sequence Alignment (MSA)
- Row: a sequence (protein or nucleotide)
- Column: “equivalent” positions in different sequences
- Gaps can be introduced to slide amino acids to the “correct” position
Figure labels: “equivalent” positions; sequences; gaps
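
The row/column view of an alignment can be sketched in a few lines of Python. This is only an illustration: the three short sequences are made up, and `columns` is a hypothetical helper name, not part of ClustalW or any library.

```python
# A toy alignment: each row is one sequence, '-' marks a gap.
# The sequences are invented for illustration only.
alignment = [
    "MKA-VLT",
    "MKAAVLT",
    "MK--VLS",
]


def columns(rows):
    """Return the alignment column by column (one string per column)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return ["".join(col) for col in zip(*rows)]


cols = columns(alignment)
for index, col in enumerate(cols):
    print(index, col)
```

Each printed column groups the residues that the alignment treats as “equivalent”; a column such as `-A-` shows where gaps were inserted into two of the rows.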

Theory: Problem
- “Equivalent” positions means “evolutionarily related”
- What is the evolutionary history of the sequences in the alignment?
- How can the alignment be explained by a set of amino acid / nucleotide substitutions, insertions, and deletions?
- We only know the sequences of today
- We need to make assumptions about the past

Theory: Parsimony
- Parsimony: the simplest explanation is the best
- Penalize events like insertions / deletions

Theory: Scoring Matrix
- Substitution of similar amino acids is more likely
- Codons: Serine AG(C/T), TC(N); Threonine AC(N); Tryptophan TGG
Probability of substitution:
- Serine / Threonine: frequent
- Serine / Tryptophan: rare
- Threonine / Tryptophan: rare

Theory: Substitution or Scoring Matrix
- A scoring matrix contains two kinds of probabilities:
  - how often an amino acid occurs at random (diagonal)
  - how often a substitution occurs (derived from actual alignments)
- $$\text{Score} = \log_2 \frac{\text{observed frequency of amino acid substitution}}{\text{expected frequency of both amino acids}}$$
- Positive values: more common; negative values: less likely
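
The log-odds score on this slide can be worked through numerically. The sketch below assumes the expected frequency is simply the product of the two background frequencies; real matrices such as PAM or BLOSUM add further conventions (scaling, rounding, pair ordering) that are not covered here. The function name and the frequencies are made up for illustration.

```python
import math


def log_odds_score(observed_pair_freq, freq_a, freq_b):
    """Score = log2(observed / expected), with expected = freq_a * freq_b.

    The product form of 'expected' is an assumption of this sketch.
    """
    expected = freq_a * freq_b
    return math.log2(observed_pair_freq / expected)


# Invented frequencies: the pair is seen twice as often as chance predicts.
common = log_odds_score(0.125, 0.25, 0.25)
# Invented frequencies: the pair is seen half as often as chance predicts.
uncommon = log_odds_score(0.03125, 0.25, 0.25)
print(common, uncommon)
```

A pair observed more often than expected gets a positive score, and a pair observed less often than expected gets a negative one, which matches the sign convention on the slide.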

Theory: Pairwise Alignment
- Multiple sequence alignments are computationally too intensive: need for “shortcuts”
- Pairwise sequence alignments
  - scoring matrix
  - gap penalties
- Two kinds of pairwise sequence alignments:

global
MACMYFASTCAT
---MYFA-TCTT

local
MACMYFASTCAT-
M---YFA-TC-TT
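
The difference between global and local pairwise alignment can be sketched with the textbook dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local). This is not ClustalW's implementation: it assumes a flat match/mismatch score instead of a real scoring matrix and a single linear gap penalty, and `align_score` is a hypothetical name.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best global (default) or local alignment score of a and b.

    Scores are illustrative assumptions: flat match/mismatch values and a
    linear gap penalty, not a substitution matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local alignment: a path may restart anywhere at score 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# The two sequences from the slide, with gaps removed.
seq_a = "MACMYFASTCAT"
seq_b = "MYFATCTT"
global_score = align_score(seq_a, seq_b)
local_score = align_score(seq_a, seq_b, local=True)
print(global_score, local_score)
```

The global score pays for every unaligned residue at the ends, while the local score reports only the best-matching stretch, so the local score is never lower than the global one.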

ClustalW Algorithm Overview
1. pairwise alignment of all sequences against all
2. create phylogenetic tree / guide tree
3. progressively assemble alignment guided by the tree (progressive alignment)

ClustalW Algorithm: Pairwise alignment
- pairwise alignment of all sequences against all
- aligning the complete sequences (global alignment)
- uses scoring matrices to score similarity
- two types of gap penalties: gap opening and gap extension
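
The two gap penalties can be illustrated by scoring an already gapped pair of rows. The sketch below assumes the first position of a gap costs the opening penalty and every further position costs the extension penalty; the penalty values and the flat match/mismatch scores are invented for illustration and are not ClustalW defaults.

```python
def affine_gap_score(row_a, row_b, match=1.0, mismatch=-1.0,
                     gap_open=-10.0, gap_extend=-0.5):
    """Score two gapped rows with separate gap opening and extension costs.

    All numeric values are illustrative assumptions.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0.0
    prev_gap = None  # which row had a gap in the previous column: 'a', 'b' or None
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue  # a column of two gaps carries no information
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Continuing a gap in the same row is cheaper than opening a new one.
            total += gap_extend if side == prev_gap else gap_open
            prev_gap = side
        else:
            total += match if x == y else mismatch
            prev_gap = None
    return total


one_long_gap = affine_gap_score("AC--GT", "ACTTGT")
two_short_gaps = affine_gap_score("A-C-GT", "ATCTGT")
print(one_long_gap, two_short_gaps)
```

With these settings one gap of length two scores higher than two separate gaps of length one, which is the behavior that separate opening and extension penalties are meant to produce.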

ClustalW Algorithm: Guide Tree
- previous step: pairwise alignment of all sequences against all
- create phylogenetic tree / guide tree
  - using the pairwise distance matrix computed above
  - neighbor-joining
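
A pairwise distance matrix of the kind used to build the guide tree can be sketched as follows. The distance here is assumed to be one minus the fraction of identical residues over the gap-free columns; ClustalW's own distance calculation may differ in detail. For brevity the rows are taken from one made-up alignment rather than from separate pairwise alignments, and all names are hypothetical.

```python
from itertools import combinations


def identity_distance(row_a, row_b):
    """Return 1 - fraction of identical residues over columns without gaps."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    identical = sum(1 for x, y in pairs if x == y)
    return 1.0 - identical / len(pairs)


def distance_matrix(aligned):
    """aligned: dict of name -> gapped row. Returns {(name_a, name_b): distance}."""
    return {
        (a, b): identity_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }


# Invented toy rows, not real sequences.
toy = {"seq1": "MKAVLT", "seq2": "MKAVLS", "seq3": "MR-VIS"}
matrix = distance_matrix(toy)
for pair, dist in matrix.items():
    print(pair, round(dist, 3))
```

The smallest entries identify the most similar sequences, which a tree-building method such as neighbor-joining would join first.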

ClustalW Algorithm: Assembly
- previous steps: pairwise alignment of all sequences against all; create phylogenetic tree / guide tree
- progressively assemble alignment guided by the tree
- each alignment is analyzed to build a profile, which is then merged with the profile of the other branch
- gaps introduced in an earlier alignment step are kept
- gap penalties are varied depending on:
  - sequence similarity
  - neighboring amino acid (individual scores)
  - hydrophilic stretches (prone to gaps)
  - previous gaps (extension allowed, new gaps penalized)
- scoring matrix varies depending on the estimated divergence
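
The idea of a profile can be sketched as per-column residue frequencies. This is a simplified assumption: ClustalW additionally weights sequences and adjusts penalties per position, none of which is modeled here. `build_profile` is a hypothetical name and the rows are made up.

```python
from collections import Counter


def build_profile(rows):
    """Return, for each column, a dict of symbol -> relative frequency.

    Gaps ('-') are counted as a symbol of their own in this sketch.
    """
    profile = []
    for col in zip(*rows):
        counts = Counter(col)
        total = len(col)
        profile.append({symbol: n / total for symbol, n in counts.items()})
    return profile


# Invented toy alignment of one branch of the guide tree.
branch = ["MK-V", "MKAV", "MR-V"]
profile = build_profile(branch)
for position, freqs in enumerate(profile):
    print(position, freqs)
```

Aligning two such profiles column against column, instead of sequence against sequence, is what lets a progressive method merge whole branches while keeping the gaps already placed inside each branch.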

ClustalW2: Improvements
- ClustalW2 now offers a choice of tree-building method:
  - neighbor joining (more accurate)
  - UPGMA (faster, less accurate)
- ClustalW2 refinement: removes each sequence, re-aligns it, and tests whether the resulting alignment is better. Two possibilities:
  a) “alignment”: aligning to the complete alignment (faster)
  b) “tree”: aligning to each step of the alignment (more accurate)
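
UPGMA, the faster of the two tree options, is simple enough to sketch: repeatedly merge the two closest clusters and average their distances to all other clusters, weighted by cluster size. The code below is a generic textbook version, not ClustalW2's code. The species labels follow the sample query in this tutorial, but the distances are invented for illustration.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a tree as nested tuples from pairwise distances.

    dist maps a pair (a, b) of names to a distance; order within the pair
    does not matter. Branch lengths are not computed in this sketch.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for other in sizes:
            d[frozenset((merged, other))] = (
                na * d[frozenset((a, other))] + nb * d[frozenset((b, other))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


names = ["human", "mouse", "hamster", "chick"]
# Made-up distances, chosen only to give a clear merge order.
dist = {
    ("human", "mouse"): 0.3,
    ("human", "hamster"): 0.3,
    ("human", "chick"): 0.6,
    ("mouse", "hamster"): 0.2,
    ("mouse", "chick"): 0.6,
    ("hamster", "chick"): 0.6,
}
tree = upgma(names, dist)
print(tree)
```

With these invented distances, mouse and hamster are merged first, then human, and chick joins last.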

ClustalW: Summary
- ClustalW is a “progressive multiple alignment method”
  - uses global pairwise alignments to create a phylogenetic tree
  - stepwise assembly of the MSA guided by the tree
- Drawback: the method depends heavily on the initial tree
  - no guarantee that this tree is correct
  - misaligned regions can’t be corrected later
- You need to look critically at your alignment

EBI Toolbox Overview
Screenshot label: Sequence Analysis

EBI Toolbox for Sequence Analysis
Screenshot label: ClustalW2

ClustalW2 Overview
Screenshot labels: Submit; upload file

ClustalW2 sample query
Screenshot label: paste sequences

>P02647|APOA1_HUMAN Apolipoprotein A-I precursor - Homo sapiens
MKAAVLTLAVLFLTGSQARHFWQQDEPPQSPWDRVKDLATVYVDVLKDSGRDYVSQFEGS
ALGKQLNLKLLDNWDSVTSTFSKLREQLGPVTQEFWDNLEKETEGLRQEMSKDLEEVKAK
VQPYLDDFQKKWQEEMELYRQKVEPLRAELQEGARQKLHELQEKLSPLGEEMRDRARAHV
DALRTHLAPYSDELRQRLAARLEALKENGGARLAEYHAKATEHLSTLSEKAKPALEDLRQ
GLLPVLESFKVSFLSALEEYTKKLNTQ
>Q00623|APOA1_MOUSE Apolipoprotein A-I precursor - Mus musculus
MKAVVLAVALVFLTGSQAWHVWQQDEPQSQWDKVKDFANVYVDAVKDSGRDYVSQFESSS
LGQQLNLNLLENWDTLGSTVSQLQERLGPLTRDFWDNLEKETDWVRQEMNKDLEEVKQKV
QPYLDEFQKKWKEDVELYRQKVAPLGAELQESARQKLQELQGRLSPVAEEFRDRMRTHVD
SLRTQLAPHSEQMRESLAQRLAELKSNPTLNEYHTRAKTHLKTLGEKARPALEDLRHSLM
PMLETLKTKAQSVIDKASETLTAQ
>Q9Z2L4|APOA1_MESAU Apolipoprotein A-I precursor - Mesocricetus auratus
MKTVVLAVAVLFLTGSQARHFWQRDDPQTPWDRVKDFATVYVDAVKDSGREYVSQFETSA
LGKQLNLNLLENWDTLGSTVGRLQEQLGPVTQEFWDNLEKETEWLRREMNKDLEEVKAKV
QPYLDQFQTKWQEEVALYRQKMEPLGAELRDGARQKLQELQEKLTPLGEDLRDRMRHHVD
ALRTKMTPYSDQMRDRLAERLAQLKDSPTLAEYHTKAADHLKAFGEKAKPALEDLRQGLM
PVFESFKTRIMSMVEEASKKLNAQ
>P08250|APOA1_CHICK Apolipoprotein A-I precursor - Gallus gallus
MRGVLVTLAVLFLTGTQARSFWQHDEPQTPLDRIRDMVDVYLETVKASGKDAIAQFESSA
VGKQLDLKLADNLDTLSAAAAKLREDMAPYYKEVREMWLKDTEALRAELTKDLEEVKEKI
RPFLDQFSAKWTEELEQYRQRLTPVAQELKELTKQKVELMQAKLTPVAEEARDRLRGHVE
ELRKNLAPYSDELRQKLSQKLEEIREKGIPQASEYQAKVMEQLSNLREKMTPLVQEFRER
LTPYAENLKNRLISFLDELQKSVA
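
Before pasting sequences into the form, it can help to check that the FASTA text parses as expected. The sketch below is a minimal reader written for this purpose, not part of ClustalW2 or the EBI site; `parse_fasta` is a hypothetical name and the two short records are made up.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of identifier -> sequence.

    The identifier is the first whitespace-delimited token after '>'.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect and join them later.
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


# Invented example records in the same header style as the sample query.
example = """>X00001|TOY1_EXAMPLE Toy protein - made up
MKAAVLTLAV
LFLTG
>X00002|TOY2_EXAMPLE Toy protein - made up
MKAVVLAVAL
"""
sequences = parse_fasta(example)
for identifier, seq in sequences.items():
    print(identifier, len(seq))
```

A quick look at the identifiers and lengths catches common paste errors, such as a missing `>` line that would merge two sequences into one.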

ClustalW2 Alignment Method
Screenshot label: alignment method

Aligning Sequences: Fine-Tuning Slow Alignment
Screenshot labels: options; Fast

Aligning Sequences: Fine-Tuning Fast Alignment
Screenshot labels: Step 3; options

Aligning Sequences: Scoring Parameters

Aligning Sequences: Iteration Parameters

Aligning Sequences: Output Format & Clustering

ClustalW2 General Parameters

ClustalW2 Alignment
Residue colors:
- AVFPMILW: RED, Small (small + hydrophobic (incl. aromatic -Y))
- DE: BLUE, Acidic
- RK: MAGENTA, Basic
- STYHCNGQ: GREEN, Hydroxyl + Amine + Basic - Q
- Others: Gray
Conservation symbols:
- * Asterisks are identical amino acids.
- : Colons are significantly conservative amino acid substitutions.
- . Periods are amino acid substitutions that suggest some conservation.
Screenshot label: conservation
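
The color key and the asterisk line can be reproduced with a small lookup. The groups below are copied from this slide. Only the `*` symbol is implemented, because the `:` and `.` symbols depend on residue groups that the slide does not list. The function names and the toy rows are assumptions for illustration.

```python
# Residue groups and colors as listed on the slide.
COLOR_GROUPS = {
    "AVFPMILW": "red",
    "DE": "blue",
    "RK": "magenta",
    "STYHCNGQ": "green",
}


def residue_color(residue):
    """Return the display color for one residue; anything else is gray."""
    for group, color in COLOR_GROUPS.items():
        if residue in group:
            return color
    return "gray"


def identity_line(rows):
    """Return a string with '*' under columns where all residues are identical."""
    return "".join(
        "*" if len(set(col)) == 1 and col[0] != "-" else " "
        for col in zip(*rows)
    )


# Invented toy alignment.
rows = ["MKA", "MKV", "MRA"]
for row in rows:
    print(row, [residue_color(r) for r in row])
print(identity_line(rows))
```

Only the first column of the toy alignment is fully identical, so only that position receives an asterisk.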

ClustalW2 Output Overview
Screenshot labels: output files; scores

Guide Tree and Cladogram
Screenshot label: Right click for display options

Submission Details
Screenshot label: Input parameters

Jalview Visualization
Screenshot label: Jalview

Jalview Overview
Screenshot labels: alignment; conservation; consensus; quality; position

Jalview Editing: Deleting

Jalview Editing: Sliding Sequences
Screenshot labels: shift; “Q”

Jalview Editing: Removing Columns

Jalview: Saving Alignment

ClustalW Summary
- Multiple sequence alignments examine relationships
- ClustalW at the EBI Tool Site
- Jalview: a multiple sequence alignment editor