Multiple Sequence Alignment (MSA) and Phylogeny. One of the options to get multiple sequence Fasta file.

Presentation transcript:

1 Multiple Sequence Alignment (MSA) and Phylogeny

2 One option for obtaining a multiple-sequence FASTA file

3

4 MSA input: multiple-sequence FASTA file
>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLT
KGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQTKILGNQGSFLT
KGPSKLNDRVDSRRSLWDQGNFTLIIKNLKIEDSDTYICEVGDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAQRMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa]
MDPGTSLRHLFLVLQLAMLPAASGTQEKYLVLGKAGDLAELPCHSSQKKNLPFNWKNSNQTKILGGHGSF
WHTASVTELTSRLDSKKNMWDHGSFPLIIKNLEVTDSGIYICEVEDKRIEVQLLVFRLTASVTRVLLGQS
LTLTLEGPSGSHPTVQWKGPGNKSKNDVKSLLLPQVGLEDSGLWTCTVSQDQKTLVFRSNIFVLAFQKVP
STVYVKEGDQVALSFPLTFEAESLSGELMWRQTKGASSPQSWITFSLKDRKVTVQKSLQNLKLRMAEKLP
LQITLLQALPQYAGSGNLTLVLPEGRLHREVNLVVMRATQSKNEVTCEVLGPTPPKVVLSLKLGNQSMKV
SDQQKLVTVLDPEAGMWRCLLRDKDKVLLESQVEVLPTAFTRAWPELLASVIGGIIGLLFLAGFCIACVK
CWHRRRRAERMSQIKRLLSEKKTCQCAHRQQKNYSLT
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus]
MCRGFSFRHLLPLLLLQLSKLLVVTQGKTVVLGKEGGSAELPCESTSRRSASFAWKSSDQKTILGYKNKL
LIKGSLELYSRFDSRKNAWERGSFPLIINKLRMEDSQTYVCELENKKEEVELWVFRVTFNPGTRLLQGQS
LTLILDSNPKVSDPPIECKHKSSNIVKDSKAFSTHSLRIQDSGIWNCTVTLNQKKHSFDMKLSVLGFAST
SITAYKSEGESAEFSFPLNLGEESLQGELRWKAEKAPSSQSWITFSLKNQKVSVQKSTSNPKFQLSETLP
LTLQIPQVSLQFAGSGNLTLTLDRGILYQEVNLVVMKVTQPDSNTLTCEVMGPTSPKMRLILKQENQEAR
VSRQEKVIQVQAPEAGVWQCLLSEGEEVKMDSKIQVLSKGLNQTMFLAVVLGSAFSFLVFTGLCILFCVR
CRHQQRQAARMSQIKRLLSEKKTCQCSHRMQKSHNLI
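In a FASTA file, each record starts with a ">" header line, and the sequence lines that follow belong to that record. Below is a minimal Python sketch that reads such a file into (header, sequence) pairs. The function name `read_fasta` is hypothetical. For real work, a library parser such as Biopython's `SeqIO` is the usual choice.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.

    The header is the text after '>' on a header line; the sequence lines
    that follow it are concatenated with surrounding whitespace removed.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>seq1 first
MNRGVPF
RHLLL
>seq2 second
MDPGTSL
"""
for name, seq in read_fasta(example):
    print(name, len(seq))
```

Sequence lines are joined, so the 70-character line wrapping used in the file above has no effect on the parsed sequences.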

5 Clustal X

6 Step 1: Load the sequences

7 Uploaded sequences: a little unclear…

8 Edit FASTA headers… Replace each long NCBI header with a short name. The sequences themselves stay the same as on slide 4.
>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens] → >Homo_sapiens_CD4
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes] → >Pan_troglodytes_CD4
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa] → >Sus_scrofa_CD4
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus] → >Rattus_norvegicus_CD4
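This header edit can also be scripted. The sketch below assumes that every header carries the species name in square brackets, as the NCBI headers above do. It builds names in the slide's "Species_name_CD4" pattern. The function names and the `gene` parameter are hypothetical.

```python
import re


def short_name(header, gene="CD4"):
    """Turn an NCBI-style header such as
    'gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]'
    into 'Homo_sapiens_CD4', using the species found in square brackets."""
    match = re.search(r"\[([^\]]+)\]", header)
    if match is None:
        raise ValueError(f"no [species] field in header: {header!r}")
    species = match.group(1).strip().replace(" ", "_")
    return f"{species}_{gene}"


def rename_headers(fasta_text, gene="CD4"):
    """Rewrite every '>' line of a FASTA text; sequence lines are unchanged."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(">" + short_name(line[1:], gene))
        else:
            out.append(line)
    return "\n".join(out)


print(short_name("gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus]"))
# Rattus_norvegicus_CD4
```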

9 Uploaded sequences: much better

10 Step 2: Perform the alignment

11 Multiple Sequence Alignment and conservation view

12 Step 3: Create tree

13 The Newick tree format is used to represent trees as strings.
[Figure: a tree with four leaves, A, B, C and D, in which A groups with C and B groups with D.]
In Newick format: ((A,C),(B,D));
- Each pair of parentheses () encloses a clade in the tree
- A comma “,” separates the members of the corresponding clade
- A semicolon “;” is always the last character
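The rules above are enough to write a small parser. The Python sketch below reads a Newick string into nested tuples of leaf names, and it can write such tuples back out. It keeps only the topology. Branch lengths (":0.1") and internal labels such as bootstrap values are read but discarded. The nested-tuple representation and the function names are choices made for this sketch only. The sketch also assumes that leaf names contain no spaces.

```python
def parse_newick(text):
    """Parse a Newick string (topology only) into nested tuples of leaf names."""
    text = "".join(text.split())  # assumes labels contain no whitespace
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while text[pos] not in "(),:;":
            pos += 1
        label = text[start:pos]
        if text[pos] == ":":  # skip a branch length such as ':0.12'
            pos += 1
            while text[pos] not in "(),;":
                pos += 1
        return label

    def read_node():
        nonlocal pos
        if text[pos] == "(":  # a clade: '(' child {',' child} ')'
            pos += 1
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            read_label()  # internal label (e.g. a bootstrap value), ignored
            return tuple(children)
        return read_label()  # a leaf

    tree = read_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return tree


def to_newick(tree):
    """Serialize nested tuples back to a Newick string."""
    def write(node):
        if isinstance(node, tuple):
            return "(" + ",".join(write(child) for child in node) + ")"
        return node
    return write(tree) + ";"


tree = parse_newick("((A,C),(B,D));")
print(tree)             # (('A', 'C'), ('B', 'D'))
print(to_newick(tree))  # ((A,C),(B,D));
```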

14 Step 4: View the tree with NJPlot. Note: this is an unrooted tree.

15 Step 4.5: defining an outgroup
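One simple way to root an unrooted tree is to place the root on the branch that leads to the outgroup. The Python sketch below assumes that the unrooted tree is stored as an adjacency list. Unnamed internal nodes get hypothetical names, `u` and `v` in the example. The result is a rooted tree written as nested tuples. This sketch does not show how NJPlot does the rooting internally.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the branch leading to `outgroup`.

    `adjacency` maps every node (leaves and internal nodes) to the list of
    its neighbours; a leaf has exactly one neighbour. The result is a rooted
    tree as nested tuples: (outgroup, rest_of_tree).
    """
    neighbours = adjacency[outgroup]
    if len(neighbours) != 1:
        raise ValueError("the outgroup must be a leaf")

    def build(node, parent):
        children = [n for n in adjacency[node] if n != parent]
        if not children:  # reached a leaf
            return node
        return tuple(build(child, node) for child in children)

    return (outgroup, build(neighbours[0], outgroup))


# The unrooted tree ((A,C),(B,D)); with hypothetical internal nodes u and v.
unrooted = {
    "A": ["u"], "C": ["u"], "u": ["A", "C", "v"],
    "B": ["v"], "D": ["v"], "v": ["u", "B", "D"],
}
print(root_with_outgroup(unrooted, "B"))  # ('B', (('A', 'C'), 'D'))
```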

16 Swapping nodes. Note: the order inside a split doesn't matter.
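Because the order inside a split carries no information, two trees that differ only by swapped nodes should compare as equal. In the Python sketch below, the children of every internal node are sorted into a canonical order before the comparison. The sketch assumes nested-tuple trees that are rooted at the same place. Trees written with different roots need a split-based comparison instead.

```python
def canonical(tree):
    """Order-independent form of a tree given as nested tuples of leaf names.

    The children of every internal node are sorted by their own canonical
    form, so swapping the two sides of any split gives the same result.
    """
    if isinstance(tree, tuple):
        return tuple(sorted((canonical(child) for child in tree), key=repr))
    return tree


def same_topology(tree1, tree2):
    return canonical(tree1) == canonical(tree2)


print(same_topology((("A", "C"), ("B", "D")), (("D", "B"), ("C", "A"))))  # True
print(same_topology((("A", "C"), ("B", "D")), (("A", "B"), ("C", "D"))))  # False
```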

17 Bootstrap

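18 A. Resample the alignment columns with replacement (100-1000 times).
Original alignment (columns 1 2 3 4 5 … 100):
1: ATCTG…A
2: ATCTG…C
3: ACTTA…C
4: ACCTA…T
Resampled data set built from columns 1 1 2 4 4 … x:
1: AATTT…T
2: AATTT…G
3: AACTT…T
4: AACTT…T
Resampled data set built from columns 4 7 7 8 9 … x:
1: TTTAT…T
2: TAACC…G
3: TAACC…T
4: TGGGA…T
Resampled data set built from columns 1 5 5 7 8 … x:
1: AGGTA…T
2: AGGAC…G
3: AAAAC…A
4: AAAGG…C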
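Step A can be sketched in a few lines of Python. Each replicate draws as many column indices as the alignment has columns, sampled uniformly with replacement, and then rebuilds every sequence from those columns. The function names and the seed handling are assumptions of this sketch. Column indices are 0-based, so the slide's columns 1 1 2 4 4 become 0, 0, 1, 3, 3.

```python
import random


def resample_columns(alignment, columns):
    """Build a new alignment from the given (0-based) column indices."""
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=100, seed=None):
    """Yield bootstrap data sets, each made of len(alignment[0]) columns
    drawn uniformly with replacement from the original alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    for _ in range(n_replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield resample_columns(alignment, columns)


# The first five columns of the slide's alignment, resampled as columns 1,1,2,4,4:
original = ["ATCTG", "ATCTG", "ACTTA", "ACCTA"]
print(resample_columns(original, [0, 0, 1, 3, 3]))
# ['AATTT', 'AATTT', 'AACTT', 'AACTT']

replicates = list(bootstrap_replicates(original, n_replicates=3, seed=42))
```

Because some columns are drawn more than once, others are left out of each replicate. This is what makes the replicate trees vary.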

19 Bootstrap B. Reconstruct a tree from each resampled data set.
[Figure: the three resampled data sets from slide 18, each with the tree reconstructed from it for Sp1, Sp2, Sp3 and Sp4.]
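Step B needs a tree-building method that can be run on each replicate. Clustal X builds neighbor-joining (NJ) trees. The Python sketch below is a minimal NJ that returns only the topology, as a Newick string. It assumes uncorrected p-distances. Clustal X's own distance options may differ, so its trees need not match this sketch exactly. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Fraction of aligned positions that differ, skipping gapped columns."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(labels, sequences):
    """Neighbor-joining topology of aligned sequences as a Newick string
    (branch lengths are not reported)."""
    # d[a][b]: current distance between clusters a and b, keyed by Newick text
    d = {a: {} for a in labels}
    for (a, sa), (b, sb) in combinations(zip(labels, sequences), 2):
        d[a][b] = d[b][a] = p_distance(sa, sb)
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        total = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Q criterion: join the pair minimizing (n - 2) d(i,j) - R_i - R_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]])
        new = f"({i},{j})"
        d[new] = {}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    return "(" + ",".join(nodes) + ");"  # the last three nodes meet at one point


labels = ["Sp1", "Sp2", "Sp3", "Sp4"]
seqs = ["AAAAAAAA", "AAAAAAAT", "TTTTAAAA", "TTTTAAAT"]
print(neighbor_joining(labels, seqs))  # (Sp3,Sp4,(Sp1,Sp2));
```

Running this function on every bootstrap replicate gives the set of trees that step C summarizes.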

20 Bootstrap C. We compute the majority-rule consensus.
[Figure: the replicate trees for Sp1, Sp2, Sp3 and Sp4 and their majority-rule consensus tree, with support values of 67% and 100%.]
In 67% of the data sets, the split between Sp1+Sp2 and the rest of the tree was found.
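The support value on a split is the fraction of replicate trees that contain that split. A majority-rule consensus keeps the splits found in more than half of the trees. The Python sketch below computes those fractions for trees given as nested tuples. Each split is stored as the side that does not contain a fixed reference taxon, so two trees that draw the same split differently still match. The three replicate trees are made up for illustration. Assembling the final consensus tree from the kept splits is not shown.

```python
from collections import Counter


def leaves(tree):
    """Set of leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaves(child) for child in tree))
    return frozenset([tree])


def splits(tree):
    """Non-trivial splits of a tree, each stored as the side that does NOT
    contain the alphabetically first taxon."""
    all_leaves = leaves(tree)
    reference = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            clade = leaves(node)
            side = clade if reference not in clade else all_leaves - clade
            if 1 < len(side) < len(all_leaves) - 1:  # both sides have >= 2 taxa
                found.add(side)
            for child in node:
                walk(child)

    walk(tree)
    return found


def split_support(trees):
    """Fraction of trees that contain each split."""
    counts = Counter(s for tree in trees for s in splits(tree))
    return {s: counts[s] / len(trees) for s in counts}


def majority_rule(trees):
    """Splits present in more than half of the trees."""
    return {s: f for s, f in split_support(trees).items() if f > 0.5}


trees = [(("Sp1", "Sp2"), ("Sp3", "Sp4")),
         (("Sp2", "Sp1"), ("Sp4", "Sp3")),
         (("Sp1", "Sp3"), ("Sp2", "Sp4"))]  # hypothetical replicate trees
for side, freq in majority_rule(trees).items():
    rest = sorted(leaves(trees[0]) - side)
    print(",".join(rest), "|", ",".join(sorted(side)), f"{freq:.0%}")
# Sp1,Sp2 | Sp3,Sp4 67%
```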

21 Step 3.5 - Bootstrap

22 Bootstrap values on NJPlot. Note: ClustalX saves trees with the .ph extension; trees with bootstrap values are saved with the .phb extension.

23 Detecting selection forces using phylogeny (ConSeq, ConSurf, Selecton)

24 “Important” sites evolve more slowly than “unimportant” ones.

25 Conservation = functional/structural importance

26 Use phylogenetic information
Position       1234567
Human          DMAAHAM
Chimp          DEAAGGC
Cow            DQAAWAP
Fish           DLAACAL
S. cerevisiae  DDGAFAA
S. pombe       DDGALGE
[Figure: a phylogenetic tree of these six sequences, annotated with A and G residues.]
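One way to see why the tree matters is to count the minimum number of residue changes that each column needs on a given tree. The sketch below uses Fitch parsimony for this. It is a simpler technique swapped in for illustration; ConSeq itself estimates rates with Bayesian or maximum-likelihood methods (see its advanced options). It runs on the six sequences above and a tree that is hypothetical, assumed only for this example.

```python
def fitch_changes(tree, states):
    """Minimum number of changes for one alignment column (Fitch parsimony).

    tree: rooted binary tree as nested 2-tuples of leaf names;
    states: dict mapping leaf name -> residue in this column.
    """
    def visit(node):
        if not isinstance(node, tuple):
            return {states[node]}, 0
        if len(node) != 2:
            raise ValueError("this sketch expects a binary tree")
        left, cost_left = visit(node[0])
        right, cost_right = visit(node[1])
        common = left & right
        if common:  # children can agree: no extra change needed
            return common, cost_left + cost_right
        return left | right, cost_left + cost_right + 1

    return visit(tree)[1]


alignment = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}
# Hypothetical tree, assumed for illustration only
tree = (((("Human", "Chimp"), "Cow"), "Fish"), ("S. cerevisiae", "S. pombe"))

for position in range(7):
    column = {name: seq[position] for name, seq in alignment.items()}
    print(position + 1, fitch_changes(tree, column))
```

On this assumed tree, columns 3 and 6 both contain A and G. Column 3 needs one change and column 6 needs two, so the two columns differ once the phylogeny is taken into account.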

27 ConSeq (http://conseq.tau.ac.il): a tool for computing site-specific evolutionary rates

28 Using ConSeq

29 ConSeq results

30 ConSeq conservation scores

31 Conservation scores:
- The scores are standardized: the average score for all residues is zero, and the standard deviation is one
- The lowest score represents the most conserved site in the protein; negative values: slowly evolving (= low evolutionary rate), conserved sites
- The highest score represents the most variable site in the protein; positive values: rapidly evolving (= fast evolutionary rate), variable sites
- Scores are relative to the protein. Scores of different proteins are not comparable!!!
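The standardization described above is a z-score transformation. Each site's rate has the protein's mean rate subtracted and is then divided by the standard deviation. The Python sketch below assumes the population standard deviation, and its per-site rates are made up. ConSeq's exact normalization details are not given on the slide. Each protein is scaled by its own mean and spread, which is why scores from different proteins cannot be compared.

```python
from statistics import mean, pstdev


def standardize(rates):
    """Z-score per-site rates so they have mean 0 and standard deviation 1."""
    mu, sigma = mean(rates), pstdev(rates)  # population SD is an assumption
    if sigma == 0:
        raise ValueError("all rates are equal; scores are undefined")
    return [(r - mu) / sigma for r in rates]


raw_rates = [0.1, 0.3, 0.2, 2.4, 1.0]  # hypothetical per-site rates
scores = standardize(raw_rates)
most_conserved = min(range(len(scores)), key=scores.__getitem__)  # lowest score
most_variable = max(range(len(scores)), key=scores.__getitem__)   # highest score
print([round(s, 2) for s in scores], most_conserved + 1, most_variable + 1)
```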

32 ConSeq results

33 Color-coded results

34 Conservation in the structure
- Protein core: structurally constrained, usually conserved
- Active site: functionally constrained, usually conserved
- Surface loops: tolerant to mutations, usually variable
[Figure: a protein structure with the hydrophobic core, surface loops and active site marked.]

35 Color-coded results

36 ConSurf (http://consurf.tau.ac.il): the same algorithm as ConSeq, but the results are projected onto the 3D structure of the protein

37 Using ConSurf

38 ConSurf results

39 First Glance in Jmol visual presentation

40 ConSeq/ConSurf user intervention (advanced options)
1. Choosing the method for calculating the amino-acid conservation scores: (Bayesian/Max Likelihood)
2. Entering your own MSA file
3. Performing the MSA using: (MUSCLE/CLUSTALW)
4. Collecting the homologs from: (SWISS-PROT/UniProt)
5. Max. number of homologs: (50)
6. No. of PSI-BLAST iterations: (1)
7. PSI-BLAST E-value cutoff: (0.001)
8. Model of substitution for proteins: (JTT/Dayhoff/mtREV/cpREV/WAG)
9. Entering your own PDB file
10. Entering your own TREE file

41 Solution: look at the DNA
- Purifying selection: Syn > Non-syn
- Adaptive evolution = positive selection: Non-syn > Syn
- Neutral evolution (no selection): Syn = Non-syn

42 Selection score: Ka/Ks, also known as dN/dS or ω
$\omega = K_a/K_s = \dfrac{\text{non-synonymous substitution rate}}{\text{synonymous substitution rate}}$
- $K_a/K_s < 1$: purifying selection
- $K_a/K_s > 1$: positive selection
- $K_a/K_s = 1$: no selection
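The slide does not say how the two rates are estimated. Selecton itself fits likelihood models. As a simpler technique swapped in for illustration, the Python sketch below uses a counting method in the style of Nei and Gojobori (1986). It counts synonymous and non-synonymous sites and differences, averages over mutational pathways that avoid stop codons, and then applies a Jukes-Cantor correction. The sketch makes several assumptions: the standard genetic code, uppercase gap-free codon-aligned input, and, as a simplification, counting mutations to a stop codon as non-synonymous when sites are tallied. All function names are hypothetical.

```python
import math
from itertools import permutations

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_sites(codon):
    """(synonymous, non-synonymous) sites of one codon: each of its nine
    single-nucleotide neighbours contributes 1/3 of a site."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1, c2):
    """(synonymous, non-synonymous) differences between two codons,
    averaged over all mutational pathways that avoid stop codons."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    valid_paths = 0
    for order in permutations(positions):
        current, syn, non = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # this pathway passes through a stop codon
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # runs only when the pathway had no stop codon
            syn_total += syn
            non_total += non
            valid_paths += 1
    if valid_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / valid_paths, non_total / valid_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two codon-aligned coding sequences without gaps."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites_s = sites_n = diff_s = diff_n = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            raise ValueError(f"stop codon at nucleotide {i + 1}")
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        sites_s += (s1 + s2) / 2
        sites_n += (n1 + n2) / 2
        ds, dn = codon_differences(c1, c2)
        diff_s += ds
        diff_n += dn
    return jukes_cantor(diff_n / sites_n), jukes_cantor(diff_s / sites_s)


ka, ks = ka_ks("ATGCTGAAA", "ATGCTAAAA")  # one synonymous change (CTG -> CTA, Leu)
print(ka, ks, ka / ks)  # omega = 0 here: no non-synonymous change
```

On realistic data, ω = Ka/Ks is then read as in the list above. The sketch raises errors for sequences with no synonymous sites, and for proportions of 0.75 or more, where the Jukes-Cantor correction is undefined.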

43 http://selecton.tau.ac.il

44 Selecton input
- The user must provide the sequences (there is no PSI-BLAST option)
- Coding sequences only: the ORF alone, with no stop codons
- If an MSA is provided, it must be codon-aligned (RevTrans)
Codon-level sequences!!!
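Before uploading, it can help to check sequences against these requirements. The Python sketch below is a hypothetical pre-check. It does not reproduce Selecton's own validation. It reports a length that is not a multiple of 3, characters other than A, C, G and T, and stop codons, including a trailing one, since the slide asks for no stop codons at all. A second function checks that gaps in an aligned sequence cover whole codons ("---"), which is a property of a codon alignment such as one produced by RevTrans.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_coding_sequence(seq):
    """List the problems that make `seq` unsuitable as codon-level input."""
    problems = []
    seq = seq.upper()
    if len(seq) % 3:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    unexpected = sorted(set(seq) - set("ACGT"))
    if unexpected:
        problems.append(f"unexpected characters: {''.join(unexpected)}")
    for i in range(0, len(seq) - len(seq) % 3, 3):
        if seq[i:i + 3] in STOP_CODONS:
            problems.append(f"stop codon {seq[i:i + 3]} at nucleotide {i + 1}")
    return problems


def is_codon_aligned(aligned_seq):
    """True if every codon of an aligned sequence is either all gaps or gap-free."""
    if len(aligned_seq) % 3:
        return False
    codons = (aligned_seq[i:i + 3] for i in range(0, len(aligned_seq), 3))
    return all(c == "---" or "-" not in c for c in codons)


print(check_coding_sequence("ATGTAAGG"))
# ['length 8 is not a multiple of 3', 'stop codon TAA at nucleotide 4']
print(is_codon_aligned("ATG---AAA"), is_codon_aligned("AT-GAAAAA"))  # True False
```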

45 Comparing H0 and H1 in Selecton

46 Solution: statistics helps us compare hypotheses
- H0: there is no positive selection
- H1: there is positive selection
- H0: compute the probability (likelihood) of the data using a model that does not account for positive selection
- H1: compute the probability (likelihood) of the data using a model that does account for positive selection
- Perform a statistical test (a likelihood ratio test) on H0: if the p-value > 0.05 (α), do not reject H0; if the p-value < 0.05 (α), reject H0
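A likelihood ratio test compares the two maximized log-likelihoods. The statistic 2(lnL1 − lnL0) is compared with a chi-square distribution. The Python sketch below handles 1 or 2 degrees of freedom, which have closed-form tail probabilities in the standard library. The correct degrees of freedom depend on which pair of models is compared, so df = 1 is an assumption here, and the log-likelihood values are made up.

```python
import math


def likelihood_ratio_test(log_l0, log_l1, df=1):
    """Likelihood ratio test of a null model H0 against a richer model H1.

    Returns (statistic, p_value) with statistic = 2 * (lnL1 - lnL0) and the
    p-value taken from a chi-square distribution with `df` degrees of freedom.
    """
    statistic = max(0.0, 2.0 * (log_l1 - log_l0))
    if df == 1:
        p_value = math.erfc(math.sqrt(statistic / 2))  # chi-square(1) tail
    elif df == 2:
        p_value = math.exp(-statistic / 2)              # chi-square(2) tail
    else:
        raise NotImplementedError("use scipy.stats.chi2.sf for other df")
    return statistic, p_value


alpha = 0.05
stat, p = likelihood_ratio_test(log_l0=-1000.0, log_l1=-995.0)  # hypothetical values
print(stat, p, "reject H0" if p < alpha else "do not reject H0")
```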

47 Comparing H0 and H1 in Selecton

48

49 Selecton results:

50 Results: human/rhesus swaps at sites 332 and 335-340 confer human resistance to HIV and rhesus resistance to SIV

