Copyright OpenHelix. No use or reproduction without express written consent.

2 ClustalW using EBI Toolbox, Version 1
An Introduction to Multiple Sequence Alignments (MSA) using the alignment program ClustalW2 at the EBI Toolbox site
Materials prepared by: Steffen Schmidt, Ph.D. and Warren C. Lathe III, Ph.D.
www.openhelix.com
Updated: Q2 2011

3 ClustalW Using EBI Interface: Agenda
- Introduction & Credits
- Background and Theory
- The EBI Toolbox Site
- Sequence alignment using ClustalW2
- Viewing the multiple sequence alignment
- Summary
- Exercises
ClustalW2: www.clustal.org
ClustalW2 EBI Toolbox: www.ebi.ac.uk/Tools/clustalw2

4 ClustalW Introduction
Multiple sequence alignments (MSA) are the basis of many bioinformatics analyses:
- molecular evolutionary analysis (phylogenetic trees)
- finding functionally important positions in a sequence family
- prediction of secondary and tertiary structure of proteins
Creating a “correct” MSA is difficult:
- the output of automatic tools can often be improved by human intervention
[Figure labels: MyoD from UniProt; smart.embl.de; PDB MyoD]

5 Literature and Software Sources

7 Theory: Multiple Sequence Alignment (MSA)
- Row: a sequence (protein or nucleotide)
- Column: “equivalent” positions in different sequences
- Gaps can be introduced to slide amino acids to the “correct” position
[Figure labels: sequences; “equivalent” positions; gaps]

8 Theory: Problem
- “Equivalent” positions means “evolutionarily related”
- What is the evolutionary history of the sequences in the alignment?
- How can the alignment be explained by a set of amino acid / nucleotide substitutions, insertions, and deletions?
- We only know the sequences of today
- We need to make assumptions about the past

9 Theory: Parsimony
- Parsimony: the simplest explanation is the best
- Penalize events like insertions / deletions

10 Theory: Scoring Matrix
Substitution of similar amino acids is more likely.
Codons:
- Serine: AG(C/T), TC(N)
- Threonine: AC(N)
- Tryptophan: TGG
Probability of substitution:
- Serine / Threonine: frequent
- Serine / Tryptophan: rare
- Threonine / Tryptophan: rare

11 Theory: Substitution or Scoring Matrix
A scoring matrix contains two kinds of probabilities:
- how often an amino acid occurs at random (diagonal)
- how often a substitution occurs (derived from actual alignments)
Score = log2( observed frequency of amino acid substitution / expected frequency of both amino acids )
(positive values: more common; negative values: less likely)
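
The log-odds formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any published matrix was built: the function name and the frequencies are hypothetical, and the expected frequency is assumed to be the simple product of the two background frequencies.

```python
import math

def log_odds_score(observed_pair_freq, freq_a, freq_b):
    """Score = log2(observed / expected), in bits.

    Assumption: the expected frequency of the pair is the product of the
    two background frequencies (the residues pair up purely by chance).
    """
    expected = freq_a * freq_b
    return math.log2(observed_pair_freq / expected)

# Hypothetical frequencies, chosen only so the arithmetic is easy to follow.
more_common = log_odds_score(0.125, 0.25, 0.25)    # seen twice as often as chance
less_common = log_odds_score(0.03125, 0.25, 0.25)  # seen half as often as chance
print(more_common, less_common)
```

A pair observed more often than chance gets a positive score and a pair observed less often gets a negative one, which matches the sign convention stated on the slide.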

12 Theory: Pairwise Alignment
- Multiple sequence alignments are computationally too intensive
- Need for “shortcuts”
- Pairwise sequence alignments
  - scoring matrix
  - gap penalties
- Two kinds of pairwise sequence alignments:
  global
    MACMYFASTCAT
    ---MYFA-TCTT
  local
    MACMYFASTCAT-
    M---YFA-TC-TT
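
The difference between global and local pairwise alignment can be sketched with textbook dynamic programming (Needleman-Wunsch for global, Smith-Waterman for local). This is an illustrative sketch only: the match, mismatch, and gap values are assumptions, a single linear gap penalty stands in for a real scoring matrix, and only the score is returned, not the alignment itself.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score by dynamic programming.

    local=False: global alignment (both sequences aligned end to end).
    local=True:  local alignment (best-scoring pair of subsequences).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # In a global alignment, leading gaps must be paid for.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]

# The two sequences from the slide, with gaps removed.
print(alignment_score("MACMYFASTCAT", "MYFATCTT"))
print(alignment_score("MACMYFASTCAT", "MYFATCTT", local=True))
```

With the same scoring scheme, the local score is never lower than the global score, because the local variant is free to ignore poorly matching ends.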

13 ClustalW Algorithm Overview
- pairwise alignment of all sequences against all
- create phylogenetic tree / guide tree
- progressively assemble alignment guided by the tree
[Figure: sequence pairs 1-2, 1-4, 2-4, 2-3, 1-3, 3-4; progressive alignment of sequences 1, 3, 2, 4]

14 ClustalW Algorithm: Pairwise alignment
- pairwise alignment of all sequences against all
- aligns the complete sequences (global alignment)
- uses scoring matrices to score similarity
- two types of gap penalties: gap opening & gap extension
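
The two gap penalties named above are usually combined into an affine gap cost: opening a gap is expensive and each further position is cheap. The sketch below assumes one common convention (opening cost covers the first position) and uses illustrative parameter values that do not come from the slides.

```python
def affine_gap_cost(length, gap_open=10.0, gap_extend=0.5):
    """Cost of one gap of `length` positions under an affine model.

    Assumed convention: the opening penalty pays for the first gap
    position, every further position costs the extension penalty.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend

one_long_gap = affine_gap_cost(4)          # a single gap of four positions
four_short_gaps = 4 * affine_gap_cost(1)   # four separate one-position gaps
print(one_long_gap, four_short_gaps)
```

Under this model a single long gap is cheaper than several short ones, which favors explaining the data with fewer insertion or deletion events.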

15 ClustalW Algorithm: Guide Tree
- pairwise alignment of all sequences against all
- create phylogenetic tree / guide tree
  - using the pairwise distance matrix computed above
  - neighbor-joining
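
One simple way to turn a pairwise alignment into an entry of the distance matrix is to use one minus the fraction of identical residues. The sketch below assumes that definition and skips columns containing a gap; the function name is hypothetical, and the exact distance measure ClustalW uses may differ in detail.

```python
def identity_distance(aligned_a, aligned_b):
    """Distance = 1 - (identical columns / columns without a gap)."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # nothing comparable: treat as maximally distant
    identical = sum(1 for x, y in pairs if x == y)
    return 1.0 - identical / len(pairs)

# The global alignment shown on the pairwise alignment slide.
print(identity_distance("MACMYFASTCAT", "---MYFA-TCTT"))
```

Computing this value for every pair of sequences fills the matrix that the tree-building step takes as input.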

16 ClustalW Algorithm: Assembly
- pairwise alignment of all sequences against all
- create phylogenetic tree / guide tree
- progressively assemble alignment guided by the tree
  - each alignment is analyzed to build a profile, which is then merged with the profile of the other branch
  - gaps introduced in an earlier alignment step are kept
  - gap penalties are varied depending on:
    - sequence similarity
    - neighboring amino acid (individual scores)
    - hydrophilic stretches (prone to gaps)
    - previous gaps (extension allowed, new gaps penalized)
  - scoring matrix varies depending on the estimated divergence
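
The idea of a profile can be sketched as per-column residue frequencies. This is a heavily simplified illustration: the function names are hypothetical, a plain match/mismatch score stands in for a real scoring matrix, and the sequence weighting and position-specific gap penalties listed above are left out.

```python
from collections import Counter

def build_profile(aligned_seqs):
    """Return one dict per alignment column: residue -> relative frequency."""
    n = len(aligned_seqs)
    return [
        {residue: count / n for residue, count in Counter(column).items()}
        for column in zip(*aligned_seqs)
    ]

def profile_column_score(col_a, col_b, match=1.0, mismatch=-1.0):
    """Expected match/mismatch score between two profile columns."""
    return sum(
        pa * pb * (match if ra == rb else mismatch)
        for ra, pa in col_a.items()
        for rb, pb in col_b.items()
    )

profile = build_profile(["MKA", "MRA"])
print(profile)
print(profile_column_score(profile[0], profile[0]))
```

Merging two branches then amounts to aligning two such profiles column by column instead of aligning two single sequences.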

17 ClustalW2: Improvements
- ClustalW2 now offers a choice of tree-building method:
  - neighbor joining (more accurate)
  - UPGMA (faster, less accurate)
- ClustalW2 refinement: removes each sequence, re-aligns it, and tests whether the resulting alignment is better. Two possibilities:
  a) “alignment”: aligning to the complete alignment (faster)
  b) “tree”: aligning at each step of the alignment (more accurate)
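
UPGMA, the faster of the two tree options, can be sketched as repeated merging of the closest pair of clusters. The code below is a minimal illustration, not ClustalW2's implementation: the function name and the distances are hypothetical, and the tree is returned as nested tuples without branch lengths.

```python
from itertools import combinations

def upgma(names, leaf_dist):
    """Build a guide tree (nested tuples) by UPGMA.

    leaf_dist maps frozenset({a, b}) to the distance between leaves a and b
    and must contain an entry for every pair of names.
    """
    size = {name: 1 for name in names}  # cluster -> number of leaves in it
    dist = dict(leaf_dist)              # cluster pair -> average distance
    while len(size) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        for other in size:
            if other not in (a, b):
                # Size-weighted average of the distances to the merged parts.
                dist[frozenset((merged, other))] = (
                    size[a] * dist[frozenset((a, other))]
                    + size[b] * dist[frozenset((b, other))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))

# Hypothetical distances: sequences 1 and 3 are closest, then 2 and 4.
names = ["1", "2", "3", "4"]
leaf_dist = {frozenset(pair): 0.5 for pair in combinations(names, 2)}
leaf_dist[frozenset(("1", "3"))] = 0.1
leaf_dist[frozenset(("2", "4"))] = 0.2
print(upgma(names, leaf_dist))
```

The nesting of the result gives the order in which a progressive aligner would merge sequences and sub-alignments.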

18 ClustalW: Summary
- ClustalW is a “progressive multiple alignment method”
  - uses global pairwise alignments to create a phylogenetic tree
  - stepwise assembly of the MSA guided by the tree
- Drawback: the method depends heavily on the initial tree
  - no guarantee that this tree is correct
  - misaligned regions can’t be corrected later
- You need to look critically at your alignment

20 EBI Toolbox Overview
http://www.ebi.ac.uk/
[Figure label: Sequence Analysis]

21 EBI Toolbox for Sequence Analysis
[Figure label: ClustalW2]

23 ClustalW2 Overview
[Figure labels: Submit; upload file]

24 ClustalW2 sample query
paste sequences
>P02647|APOA1_HUMAN Apolipoprotein A-I precursor - Homo sapiens
MKAAVLTLAVLFLTGSQARHFWQQDEPPQSPWDRVKDLATVYVDVLKDSGRDYVSQFEGS
ALGKQLNLKLLDNWDSVTSTFSKLREQLGPVTQEFWDNLEKETEGLRQEMSKDLEEVKAK
VQPYLDDFQKKWQEEMELYRQKVEPLRAELQEGARQKLHELQEKLSPLGEEMRDRARAHV
DALRTHLAPYSDELRQRLAARLEALKENGGARLAEYHAKATEHLSTLSEKAKPALEDLRQ
GLLPVLESFKVSFLSALEEYTKKLNTQ
>Q00623|APOA1_MOUSE Apolipoprotein A-I precursor - Mus musculus
MKAVVLAVALVFLTGSQAWHVWQQDEPQSQWDKVKDFANVYVDAVKDSGRDYVSQFESSS
LGQQLNLNLLENWDTLGSTVSQLQERLGPLTRDFWDNLEKETDWVRQEMNKDLEEVKQKV
QPYLDEFQKKWKEDVELYRQKVAPLGAELQESARQKLQELQGRLSPVAEEFRDRMRTHVD
SLRTQLAPHSEQMRESLAQRLAELKSNPTLNEYHTRAKTHLKTLGEKARPALEDLRHSLM
PMLETLKTKAQSVIDKASETLTAQ
>Q9Z2L4|APOA1_MESAU Apolipoprotein A-I precursor - Mesocricetus auratus
MKTVVLAVAVLFLTGSQARHFWQRDDPQTPWDRVKDFATVYVDAVKDSGREYVSQFETSA
LGKQLNLNLLENWDTLGSTVGRLQEQLGPVTQEFWDNLEKETEWLRREMNKDLEEVKAKV
QPYLDQFQTKWQEEVALYRQKMEPLGAELRDGARQKLQELQEKLTPLGEDLRDRMRHHVD
ALRTKMTPYSDQMRDRLAERLAQLKDSPTLAEYHTKAADHLKAFGEKAKPALEDLRQGLM
PVFESFKTRIMSMVEEASKKLNAQ
>P08250|APOA1_CHICK Apolipoprotein A-I precursor - Gallus gallus
MRGVLVTLAVLFLTGTQARSFWQHDEPQTPLDRIRDMVDVYLETVKASGKDAIAQFESSA
VGKQLDLKLADNLDTLSAAAAKLREDMAPYYKEVREMWLKDTEALRAELTKDLEEVKEKI
RPFLDQFSAKWTEELEQYRQRLTPVAQELKELTKQKVELMQAKLTPVAEEARDRLRGHVE
ELRKNLAPYSDELRQKLSQKLEEIREKGIPQASEYQAKVMEQLSNLREKMTPLVQEFRER
LTPYAENLKNRLISFLDELQKSVA
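
Before pasting or uploading, it can help to check that the query is well-formed FASTA. The sketch below is a minimal parser written for illustration; the function name is hypothetical, and the sample text contains only the first line of two of the sequences above, so the sequences are truncated.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict: first word of the header -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier up to the first space
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {key: "".join(parts) for key, parts in records.items()}

# Truncated sample: only the first sequence line of two entries.
sample = """>P02647|APOA1_HUMAN Apolipoprotein A-I precursor - Homo sapiens
MKAAVLTLAVLFLTGSQARHFWQQDEPPQSPWDRVKDLATVYVDVLKDSGRDYVSQFEGS
>Q00623|APOA1_MOUSE Apolipoprotein A-I precursor - Mus musculus
MKAVVLAVALVFLTGSQAWHVWQQDEPQSQWDKVKDFANVYVDAVKDSGRDYVSQFESSS
"""
sequences = parse_fasta(sample)
for identifier, sequence in sequences.items():
    print(identifier, len(sequence))
```

A multiple alignment needs at least two sequences, so a quick count of the parsed records catches a missing `>` header line early.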

25 ClustalW2 Alignment Method
[Figure label: alignment method]

26 Aligning Sequences: Fine-Tuning Slow Alignment
[Figure labels: options; Fast]

27 Aligning Sequences: Fine-Tuning Fast Alignment
[Figure labels: Step 3; options]

28 Aligning Sequences: Scoring Parameters

29 Aligning Sequences: Iteration Parameters

30 Aligning Sequences: Output Format & Clustering

31 ClustalW2 General Parameters

33 ClustalW2 Alignment
Residue colors:
  AVFPMILW   RED       Small (small + hydrophobic (incl. aromatic - Y))
  DE         BLUE      Acidic
  RK         MAGENTA   Basic
  STYHCNGQ   GREEN     Hydroxyl + Amine + Basic - Q
  Others     GRAY
Conservation symbols:
  *  Asterisks mark identical amino acids.
  :  Colons mark significantly conservative amino acid substitutions.
  .  Periods mark amino acid substitutions that suggest some conservation.
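
The color groups and the asterisk rule above can be sketched as follows. The function names are hypothetical, and the sketch covers only the `*` symbol; the `:` and `.` symbols depend on residue similarity groups that are not listed on the slide, so they are left out.

```python
# Color groups exactly as listed on the slide.
COLOR_GROUPS = {
    "AVFPMILW": "red",
    "DE": "blue",
    "RK": "magenta",
    "STYHCNGQ": "green",
}
RESIDUE_COLOR = {
    residue: color for group, color in COLOR_GROUPS.items() for residue in group
}

def residue_color(residue):
    """Color for one residue; anything not listed is gray."""
    return RESIDUE_COLOR.get(residue.upper(), "gray")

def identity_line(aligned_seqs):
    """Return '*' under fully identical columns and ' ' elsewhere."""
    return "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*aligned_seqs)
    )

# First five residues of the human, mouse, and hamster sample sequences.
rows = ["MKAAV", "MKAVV", "MKTVV"]
for row in rows:
    print(row)
print(identity_line(rows))
```

Reading the symbol line together with the colors gives a quick view of which columns are strictly conserved and which only keep a similar residue type.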

34 ClustalW2 Output Overview
[Figure labels: output files; scores]

35 Guide Tree and Cladogram
Right click for display options

36 Submission Details
[Figure label: Input parameters]

37 Jalview Visualization
[Figure label: Jalview]

38 Jalview Overview
[Figure labels: alignment; conservation; consensus; quality; position]

39 Jalview Editing: Deleting

40 Jalview Editing: Sliding Sequences
[Figure label: shift “Q”]

41 Jalview Editing: Removing Columns

42 Jalview: Saving Alignment

44 ClustalW Summary
- Multiple sequence alignments examine relationships
- ClustalW at the EBI Tool Site
- Jalview: a multiple sequence alignment editor
