Multiple Sequence Alignment (MSA) and Phylogeny. One of the options to get multiple sequence Fasta file.

Multiple Sequence Alignment (MSA) and Phylogeny

One of the options to get multiple sequence Fasta file

MSA input: multiple-sequence FASTA file

>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLT
KGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQTKILGNQGSFLT
KGPSKLNDRVDSRRSLWDQGNFTLIIKNLKIEDSDTYICEVGDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAQRMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa]
MDPGTSLRHLFLVLQLAMLPAASGTQEKYLVLGKAGDLAELPCHSSQKKNLPFNWKNSNQTKILGGHGSF
WHTASVTELTSRLDSKKNMWDHGSFPLIIKNLEVTDSGIYICEVEDKRIEVQLLVFRLTASVTRVLLGQS
LTLTLEGPSGSHPTVQWKGPGNKSKNDVKSLLLPQVGLEDSGLWTCTVSQDQKTLVFRSNIFVLAFQKVP
STVYVKEGDQVALSFPLTFEAESLSGELMWRQTKGASSPQSWITFSLKDRKVTVQKSLQNLKLRMAEKLP
LQITLLQALPQYAGSGNLTLVLPEGRLHREVNLVVMRATQSKNEVTCEVLGPTPPKVVLSLKLGNQSMKV
SDQQKLVTVLDPEAGMWRCLLRDKDKVLLESQVEVLPTAFTRAWPELLASVIGGIIGLLFLAGFCIACVK
CWHRRRRAERMSQIKRLLSEKKTCQCAHRQQKNYSLT
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus]
MCRGFSFRHLLPLLLLQLSKLLVVTQGKTVVLGKEGGSAELPCESTSRRSASFAWKSSDQKTILGYKNKL
LIKGSLELYSRFDSRKNAWERGSFPLIINKLRMEDSQTYVCELENKKEEVELWVFRVTFNPGTRLLQGQS
LTLILDSNPKVSDPPIECKHKSSNIVKDSKAFSTHSLRIQDSGIWNCTVTLNQKKHSFDMKLSVLGFAST
SITAYKSEGESAEFSFPLNLGEESLQGELRWKAEKAPSSQSWITFSLKNQKVSVQKSTSNPKFQLSETLP
LTLQIPQVSLQFAGSGNLTLTLDRGILYQEVNLVVMKVTQPDSNTLTCEVMGPTSPKMRLILKQENQEAR
VSRQEKVIQVQAPEAGVWQCLLSEGEEVKMDSKIQVLSKGLNQTMFLAVVLGSAFSFLVFTGLCILFCVR
CRHQQRQAARMSQIKRLLSEKKTCQCSHRMQKSHNLI
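To make the input format concrete, here is a minimal sketch of reading a multi-sequence FASTA file in Python. The function name parse_fasta and the toy records are illustrative assumptions, not part of ClustalX; the sketch only relies on the FASTA convention that a header line starts with ">" and the sequence may wrap over several lines.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical toy input: two short records, the first wrapped over two lines.
example = """>seq1 toy record
MNRGVPF
RHLLLVL
>seq2 toy record
MDPGTSL
"""
records = parse_fasta(example)
```

Each record keeps the full header text, so the long NCBI headers above would come back unchanged.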

Clustal X

Step 1: Load the sequences

Uploaded sequences: a little unclear…

Edit the FASTA headers… Replace each long NCBI header with a short name; the sequences themselves stay unchanged:

>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]  ->  >Homo_sapiens_CD4
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes]  ->  >Pan_troglodytes_CD4
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa]  ->  >Sus_scrofa_CD4
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus]  ->  >Rattus_norvegicus_CD4
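The renaming can also be scripted rather than done by hand. The sketch below is one possible way, assuming every header carries the species name in square brackets as in the four headers above; the function name short_name and the gene argument are hypothetical.

```python
import re


def short_name(header, gene):
    """Turn an NCBI-style header into 'Genus_species_GENE'.

    Assumes the species name appears in square brackets, e.g. '[Homo sapiens]'.
    """
    match = re.search(r"\[(.+?)\]", header)
    if match is None:
        raise ValueError("no [species] field in header: " + header)
    species = match.group(1).strip().replace(" ", "_")
    return species + "_" + gene


old = "gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]"
new = short_name(old, "CD4")
```

Short names without spaces or pipe characters are easier to read in the alignment window and in the tree.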

Uploaded sequences: much better

Step 2: Perform alignment

Multiple Sequence Alignment and conservation view

Step 3: Create tree

The Newick tree format is used to represent trees as strings. A tree in which A groups with C and B groups with D is written in Newick format as: ((A,C),(B,D));
- Each pair of parentheses () encloses a clade in the tree
- A comma “,” separates the members of the corresponding clade
- A semicolon “;” is always the last character
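The three Newick rules map directly onto a small recursive parser. The following is a minimal sketch that handles leaf names only (no branch lengths or bootstrap labels, which real .ph and .phb files contain); the function name parse_newick is an assumption for illustration.

```python
def parse_newick(text):
    """Parse a Newick string with leaf names only into nested tuples."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # step over '(' that opens a clade
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # a comma separates members of the clade
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # step over ')' that closes the clade
            return tuple(children)
        # Otherwise read a leaf name up to the next structural character.
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected text after the tree")
    return tree


tree = parse_newick("((A,C),(B,D));")
```

Here tree is a tuple of two clades, each a tuple of two leaf names.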

Step 4: View tree with NJPlot. Note: the tree is unrooted.

Step 4.5: defining an outgroup

Swapping nodes. Note: the order inside a split doesn’t matter.
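One way to see that swapping nodes leaves the tree unchanged is to compare trees in an order-independent form. The sketch below assumes trees are written as nested tuples of distinct leaf names; the helper name canonical is hypothetical.

```python
def canonical(tree):
    """Return an order-independent form of a tree given as nested tuples."""
    if isinstance(tree, str):
        return tree  # a leaf is just its name
    # A frozenset ignores the order of the children in a clade.
    return frozenset(canonical(child) for child in tree)


original = (("A", "C"), ("B", "D"))
swapped = (("D", "B"), ("C", "A"))    # same clades, children swapped
different = (("A", "B"), ("C", "D"))  # a genuinely different grouping

same_tree = canonical(original) == canonical(swapped)
other_tree = canonical(original) == canonical(different)
```

same_tree is True and other_tree is False: only the grouping matters, not the drawing order.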

Bootstrap

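A. Resample the alignment columns with replacement (100-1000 times).

Original alignment (columns 1 2 3 4 5 … 100):
1 : ATCTG…A
2 : ATCTG…C
3 : ACTTA…C
4 : ACCTA…T

Resampled data set (columns 1 1 2 4 4 … x):
1 : AATTT…T
2 : AATTT…G
3 : AACTT…T
4 : AACTT…T

Resampled data set (columns 4 7 7 8 9 … x):
1 : TTTAT…T
2 : TAACC…G
3 : TAACC…T
4 : TGGGA…T

Resampled data set (columns 1 5 5 7 8 … x):
1 : AGGTA…T
2 : AGGAC…G
3 : AAAAC…A
4 : AAAGG…C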
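The resampling step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ClustalX's implementation: the alignment is a dictionary of equal-length strings, only the first five columns of the example are used, and the helper names take_columns and bootstrap_columns are hypothetical.

```python
import random


def take_columns(alignment, picks):
    """Build a new alignment from the given (0-based) column indices."""
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


def bootstrap_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return picks, take_columns(alignment, picks)


# First five columns of the four sequences shown above.
alignment = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "4": "ACCTA"}

# Columns 1 1 2 4 4 of the example, written 0-based.
first_replicate = take_columns(alignment, [0, 0, 1, 3, 3])

# A random replicate; the seed is fixed only to make the run repeatable.
picks, replicate = bootstrap_columns(alignment, random.Random(0))
```

Because whole columns are drawn, every sequence in a replicate is built from the same column indices, which keeps the alignment intact.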

Bootstrap B. Reconstruct a tree from each resampled data set. (Figure: one tree over Sp1, Sp2, Sp3 and Sp4 for each of the three resampled data sets.)

Bootstrap C. Compute the majority-rule consensus tree. (Figure: three bootstrap trees over Sp1, Sp2, Sp3 and Sp4 and their consensus tree, with support values of 67% and 100%.) In 67% of the data sets, the split between Sp1+Sp2 and the rest of the tree was found.
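The support values can be computed by counting how often each group of species appears across the bootstrap trees. The sketch below assumes each tree is already reduced to a set of clades (frozensets of species names); the three toy trees are hypothetical and chosen only so that the percentages match the 67% and 100% in the figure.

```python
from collections import Counter


def clade_support(trees):
    """Percentage of trees in which each clade appears."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: 100.0 * n / len(trees) for clade, n in counts.items()}


def majority_rule(trees):
    """Keep only the clades found in more than half of the trees."""
    return {clade: pct for clade, pct in clade_support(trees).items()
            if pct > 50.0}


# Hypothetical bootstrap trees, each listed by its non-trivial clades.
trees = [
    {frozenset({"Sp1", "Sp2"}), frozenset({"Sp1", "Sp2", "Sp3"})},
    {frozenset({"Sp1", "Sp2"}), frozenset({"Sp1", "Sp2", "Sp3"})},
    {frozenset({"Sp2", "Sp3"}), frozenset({"Sp1", "Sp2", "Sp3"})},
]
support = clade_support(trees)
consensus = majority_rule(trees)
```

With these toy trees, the Sp1+Sp2 clade appears in two of three trees and is kept in the consensus, while Sp2+Sp3 appears in only one and is dropped.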

Step 3.5 - Bootstrap

Bootstrap values on NJPlot. Note: ClustalX saves trees with the .ph extension; trees with bootstrap values are saved with the .phb extension.

Detecting selection forces using phylogeny (ConSeq, ConSurf, Selecton)

“Important” sites evolve more slowly than “unimportant” ones.

Conservation = functional/structural importance

Use phylogenetic information

Position       1234567
Human          DMAAHAM
Chimp          DEAAGGC
Cow            DQAAWAP
Fish           DLAACAL
S. cerevisiae  DDGAFAA
S. pombe       DDGALGE

(Figure: the species tree with A and G states marked at its nodes.)

http://conseq.tau.ac.il Site-specific rate computation tool

Using ConSeq

ConSeq results

ConSeq conservation scores

Conservation scores:
- The scores are standardized: the average score for all residues is zero, and the standard deviation is one
- The lowest score represents the most conserved site in the protein. Negative values: slowly evolving (= low evolutionary rate), conserved sites
- The highest score represents the most variable site in the protein. Positive values: rapidly evolving (= fast evolutionary rate), variable sites
- Scores are relative to the protein. Scores of different proteins are not comparable!
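Standardizing the scores means subtracting the mean and dividing by the standard deviation. The sketch below shows the arithmetic on made-up per-site rates; the rate values are hypothetical, and the use of the population standard deviation is an assumption, since the slides do not say which variant ConSeq uses.

```python
from statistics import mean, pstdev


def standardize(rates):
    """Rescale rates to mean 0 and (population) standard deviation 1."""
    mu = mean(rates)
    sigma = pstdev(rates)
    return [(r - mu) / sigma for r in rates]


# Hypothetical evolutionary rates for four sites of one protein.
rates = [0.2, 0.5, 1.0, 2.3]
scores = standardize(rates)

most_conserved_site = scores.index(min(scores))  # lowest score
most_variable_site = scores.index(max(scores))   # highest score
```

Because the mean and standard deviation are computed within one protein, a given score in one protein says nothing about the same score in another.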

ConSeq results

Color-coded results

Conservation in the structure
- Protein core (hydrophobic core): structurally constrained, usually conserved
- Active site: functionally constrained, usually conserved
- Surface loops: tolerant to mutations, usually variable

Color-coded results

http://consurf.tau.ac.il Same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein

Using ConSurf

ConSurf results

First Glance in Jmol visual presentation

ConSeq/ConSurf user intervention (advanced options)
1. Choosing the method for calculating the amino-acid conservation scores: (Bayesian/Max Likelihood)
2. Entering your own MSA file
3. Performing the MSA using: (MUSCLE/CLUSTALW)
4. Collecting the homologs from: (SWISS-PROT/UniProt)
5. Max. number of homologs: (50)
6. No. of PSI-BLAST iterations: (1)
7. PSI-BLAST E-value cutoff: (0.001)
8. Model of substitution for proteins: (JTT/Dayhoff/mtREV/cpREV/WAG)
9. Entering your own PDB file
10. Entering your own TREE file

Solution: look at the DNA
- Purifying selection: Syn > Non-syn
- Adaptive evolution = positive selection: Non-syn > Syn
- Neutral selection: Syn = Non-syn

Selection score: Ka/Ks (also known as dN/dS, or ω) = non-synonymous mutation rate / synonymous mutation rate
- Selection score (Ka/Ks) < 1: purifying selection
- Selection score (Ka/Ks) > 1: positive selection
- Selection score (Ka/Ks) = 1: no selection
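The three cases amount to a simple comparison of the ratio with 1. The sketch below is illustrative only: the function name selection_regime, the tolerance used to treat a ratio as equal to 1, and the example rates are assumptions, and a real analysis would test whether the ratio differs from 1 significantly rather than compare raw numbers.

```python
def selection_regime(ka, ks, tol=1e-9):
    """Classify a site or gene by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    omega = ka / ks
    if abs(omega - 1.0) <= tol:
        label = "no selection"
    elif omega < 1.0:
        label = "purifying selection"
    else:
        label = "positive selection"
    return omega, label


# Hypothetical rates for three sites.
purifying = selection_regime(0.1, 0.5)
positive = selection_regime(0.6, 0.3)
neutral = selection_regime(0.4, 0.4)
```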

http://selecton.tau.ac.il

Selecton input
- The user must provide the sequences: there is no PSI-BLAST option
- Coding sequences
- Only the ORF
- No stop codons
- If an MSA is provided, it must be codon aligned (RevTrans)
Codon-level sequences!
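The input requirements above can be checked before submitting sequences. The following sketch is not part of Selecton; it assumes the standard genetic code for the stop codons and unaligned DNA input, and the function name check_coding_sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def check_coding_sequence(seq):
    """Return a list of problems found in a DNA coding sequence."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    stops = [i + 1 for i, codon in enumerate(codons) if codon in STOP_CODONS]
    if stops:
        problems.append("stop codon at codon position(s) %s" % stops)
    return problems


ok = check_coding_sequence("ATGGCCAAA")
with_stop = check_coding_sequence("ATGTAAGCC")
bad_length = check_coding_sequence("ATGGC")
```

An empty list means the sequence passes both checks; a terminal stop codon is reported too, since the input must contain none.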

Comparing H0 and H1 in Selecton

Solution: statistics helps us to compare hypotheses
- H0: There’s no positive selection
- H1: There is positive selection
- H0: compute the probability (likelihood) of the data using a model that does not account for positive selection
- H1: compute the probability (likelihood) of the data using a model that does account for positive selection
- Perform a statistical test to accept or reject H0 (likelihood ratio test)
P-value > 0.05 (α): accept H0
P-value < 0.05 (α): reject H0
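The likelihood ratio test can be sketched as follows. The slides do not name the two models or their degrees of freedom, so this example assumes the models differ by one free parameter (a chi-square distribution with 1 degree of freedom) and uses made-up log-likelihoods; the function name likelihood_ratio_test is hypothetical.

```python
import math


def likelihood_ratio_test(lnl_h0, lnl_h1, alpha=0.05):
    """Compare nested models via 2 * (lnL1 - lnL0).

    Assumes 1 degree of freedom, for which the chi-square tail
    probability is erfc(sqrt(statistic / 2)).
    """
    statistic = max(0.0, 2.0 * (lnl_h1 - lnl_h0))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value, p_value < alpha


# Hypothetical log-likelihoods of the same data under each model.
strong = likelihood_ratio_test(-1000.0, -995.0)  # large improvement
weak = likelihood_ratio_test(-1000.0, -999.5)    # small improvement
```

In the first case the P-value falls below 0.05 and H0 is rejected; in the second it does not, so H0 is kept.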

Comparing H0 and H1 in Selecton

Selecton results:

Results: human/rhesus swaps at sites 332 and 335-340 confer human resistance to HIV and rhesus resistance to SIV
