Summary on similarity search, or: why do we care about distant homologies?


1 Summary on similarity search, or: why do we care about distant homologies? A protein from a new pathogenic bacterium: we have no idea what it does. A protein from a model organism: we know what it does, but we do not know which protein does the same in human. A protein related to a disease: we have no idea what it does in relation to the disease.

2 Retinol-binding protein, odorant-binding protein, apolipoprotein D

3 RBP4 and obesity: retinol-binding protein, odorant-binding protein, apolipoprotein D

4 Scoring matrices let you focus on the big (or small) picture. [Figure: retinol-binding protein; PAM250, PAM30, BLOSUM45, BLOSUM80]

5 PSI-BLAST generates scoring matrices that are more powerful than PAM or BLOSUM. [Figure: retinol-binding protein]

6 Phylogenetic trees

7 Phylogeny is the inference of evolutionary relationships. In Greek, phylogeny means "the origin of the tribe". Traditionally, phylogeny relied on the comparison of morphological features between organisms. Today, molecular sequence data are mainly used for phylogenetic analyses. One tree of life: a sketch Darwin made soon after returning from his voyage on HMS Beagle (1831–36) showed his thinking about the diversification of species from a single stock (see Figure, overleaf). This branching, extended by the concept of common descent,

8 Haeckel (1879), Pace (2001)

9 Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based on DNA and protein sequence data. Pre-molecular analysis: the great apes (chimpanzee, gorilla and orangutan) are separate from the human. Molecular analysis: the chimpanzee is more closely related to the human than the gorilla is. [Figure: two trees of human, chimpanzee, gorilla, orangutan]

10 What can we learn from a phylogenetic tree?

11 Determine the closest relatives of an organism we are interested in. Was the extinct quagga more like a zebra or a horse?

12 Which species are closest to human? [Figure: two trees of human, chimpanzee, gorilla, orangutan]

13 Human evolution: modern man and Neanderthals

14 Example: metagenomics. A new field in genomics that aims to study the genomes recovered from environmental samples. It is a powerful tool for accessing the rich biodiversity of native environmental samples, and it helps to find the relationships between species and to identify new species.

15 10^6 cells/ml seawater, 10^7 virus particles/ml seawater, >99% uncultivated microbes. How can we discover new species in the ocean?

16 Relationships can be represented by a phylogenetic tree or dendrogram. [Figure: tree with leaves A, B, C, D, E, F]

17 Phylogenetic tree terminology: a graph composed of nodes and branches. Each branch connects two adjacent nodes. [Figure: tree with leaves A, B, C, D, E, F and root R]

18 Phylogenetic tree terminology: rooted tree and un-rooted tree. [Figure: human, chimp, gorilla and chicken drawn as a rooted tree and as an un-rooted tree]

19 Rooted vs. unrooted trees. [Figure: trees with leaves 1, 2, 3]

20 How can we build a tree with molecular data?
- Trees based on DNA sequence (rRNA)
- Trees based on protein sequences

21 Basic algorithm for constructing a rooted tree: Unweighted Pair Group Method using Arithmetic Averages (UPGMA). Assumption: divergence of sequences is assumed to occur at a constant rate, so the distance to the root is equal.
Sequence a  ACGCGTTGGGCGATGGCAAC
Sequence b  ACGCGTTGGGCGACGGTAAT
Sequence c  ACGCATTGAATGATGATAAT
Sequence d  ACACATTGAGTGTGATAATA

22 Moving from similarity to distance
Sequences:
Sequence a  ACGCGTTGGGCGATGGCAAC
Sequence b  ACACATTGAGTGTGATCAAC
Sequence c  ACACATTGAGTGAGGACAAC
Sequence d  ACGCGTTGGGCGACGGTAAT
Distances*: Dab = 8, Dac = 7, Dad = 5, Dbc = 3, Dbd = 9, Dcd = 8
Distance table:
     a  b  c  d
  a  0  8  7  5
  b  8  0  3  9
  c  7  3  0  8
  d  5  9  8  0
* Can be calculated using different distance metrics
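As a rough illustration of turning sequences into distances, the sketch below counts mismatched positions between each pair of the four sequences on this slide. The function name `mismatch_count` is a hypothetical helper, and the raw mismatch count is only one possible metric: as the slide notes, other metrics can be used, so these counts need not reproduce every entry of the table.

```python
from itertools import combinations

# Sequences as listed on the slide.
sequences = {
    "a": "ACGCGTTGGGCGATGGCAAC",
    "b": "ACACATTGAGTGTGATCAAC",
    "c": "ACACATTGAGTGAGGACAAC",
    "d": "ACGCGTTGGGCGACGGTAAT",
}

def mismatch_count(s, t):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t))

# One distance per unordered pair of sequence names.
distances = {
    (p, q): mismatch_count(sequences[p], sequences[q])
    for p, q in combinations(sorted(sequences), 2)
}

for (p, q), d in distances.items():
    print(f"D{p}{q} = {d}")
```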

23 Constructing a tree starting from a STAR model
Step 1: Choose the nodes with the shortest distance and fuse them.
     a  b  c  d
  a  0  8  7  5
  b  8  0  3  9
  c  7  3  0  8
  d  5  9  8  0
[Figure: star with leaves a, d, c, b]

24 Step 2: Recalculate the distance from each of the remaining sequences (a and d) to the new node (e), and remove the fused nodes (c and b) from the table.
D(ea) = (D(ac) + D(ab) - D(cb)) / 2
D(ed) = (D(dc) + D(db) - D(cb)) / 2
Old table:
     a  b  c  d
  a  0  8  7  5
  b  8  0  3  9
  c  7  3  0  8
  d  5  9  8  0
New table:
     a  d  e
  a  0  5  6
  d  5  0  7
  e  6  7  0
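The arithmetic of Step 2 can be checked with a short sketch. It applies the update formula exactly as written on the slide to the distance table; the names `dist` and `distance_to_fused` are hypothetical helpers introduced only for this illustration.

```python
# Distance table from the slide; it is symmetric, so each pair is stored once.
D = {
    ("a", "b"): 8, ("a", "c"): 7, ("a", "d"): 5,
    ("b", "c"): 3, ("b", "d"): 9, ("c", "d"): 8,
}

def dist(x, y):
    """Look up a distance regardless of the order of the two labels."""
    if x == y:
        return 0
    return D[(x, y)] if (x, y) in D else D[(y, x)]

def distance_to_fused(k, i, j):
    """Distance from node k to the node created by fusing i and j,
    using the slide's formula (D(ki) + D(kj) - D(ij)) / 2."""
    return (dist(k, i) + dist(k, j) - dist(i, j)) / 2

# c and b are fused into the new node e.
d_ea = distance_to_fused("a", "c", "b")
d_ed = distance_to_fused("d", "c", "b")
print(d_ea, d_ed)  # 6.0 7.0
```

These are the values 6 and 7 that appear in row e of the new table.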

25 Step 3: To get a tree, un-fuse c and b by calculating their distances to the new node (e).
Note: the distances Dce and Dde are calculated assuming constant-rate evolution.
     a  d  e
  a  0  5  6
  d  5  0  7
  e  6  7  0
[Figure: a and d attached to the star; c and b attached to e]

26 Next: we want to fuse the next closest nodes (a and d, fused into the new node f).
     a  d  e
  a  0  5  6
  d  5  0  7
  e  6  7  0

27 Finally, we need to calculate the distance between e and f:
D(ef) = (D(ea) + D(ed) - D(ad)) / 2
     f  e
  f  0  4
  e  4  0

28 From a star to a tree. [Figure: the star with leaves a, d, c, b becomes a tree in which a and d join at node f, and c and b join at node e]
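The whole walk from star to tree can be sketched as a loop that repeats Steps 1 and 2 until a single node remains. This is a minimal sketch under stated assumptions, not a full UPGMA implementation: it uses the update formula from the slides, represents each fused node as a nested tuple, does not compute branch lengths, and the function name `build_tree` is hypothetical.

```python
def build_tree(labels, table):
    """Repeatedly fuse the closest pair of nodes until one node remains.

    Returns the nested-tuple topology and the dictionary of all distances
    computed along the way (keyed by frozenset pairs of nodes).
    """
    dist = {frozenset(pair): float(d) for pair, d in table.items()}
    nodes = list(labels)
    while len(nodes) > 1:
        # Step 1: pick the pair of current nodes with the shortest distance.
        i, j = min(
            ((x, y) for n, x in enumerate(nodes) for y in nodes[n + 1:]),
            key=lambda pair: dist[frozenset(pair)],
        )
        fused = (i, j)
        rest = [k for k in nodes if k not in (i, j)]
        # Step 2: distance from every remaining node to the fused node.
        for k in rest:
            dist[frozenset((fused, k))] = (
                dist[frozenset((k, i))]
                + dist[frozenset((k, j))]
                - dist[frozenset((i, j))]
            ) / 2
        nodes = rest + [fused]
    return nodes[0], dist

# Distance table from the slides.
table = {
    ("a", "b"): 8, ("a", "c"): 7, ("a", "d"): 5,
    ("b", "c"): 3, ("b", "d"): 9, ("c", "d"): 8,
}
tree, dist = build_tree(["a", "b", "c", "d"], table)
print(tree)  # (('b', 'c'), ('a', 'd'))
```

The nested tuple groups b with c (node e) and a with d (node f), matching the order in which the slides fuse the nodes.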

29 IMPORTANT: Usually we do not assume a constant mutation rate, so to choose which nodes to fuse we have to calculate the relative distance of each node to all other nodes. Neighbor Joining (NJ) is an algorithm suited to cases where the rate of evolution varies.

30 Human evolution tree: Neighbor Joining, UPGMA

31 The downside of phylogenetic trees: using different regions of the same alignment may produce different trees.
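One way to make "relative distance of each node to all other nodes" concrete is the pair-selection criterion commonly given for Neighbor Joining, Q(i,j) = (n - 2) D(i,j) - sum_k D(i,k) - sum_k D(j,k), where the pair with the lowest Q is fused. The slides do not show this formula, so the sketch below is an assumption about what is meant; it reuses the distance table from the earlier slides, and `q_values` is a hypothetical helper name.

```python
from itertools import combinations

labels = ["a", "b", "c", "d"]
# Distance table from the earlier slides (symmetric, stored once per pair).
D = {
    ("a", "b"): 8, ("a", "c"): 7, ("a", "d"): 5,
    ("b", "c"): 3, ("b", "d"): 9, ("c", "d"): 8,
}

def dist(x, y):
    """Look up a distance regardless of the order of the two labels."""
    if x == y:
        return 0
    return D[(x, y)] if (x, y) in D else D[(y, x)]

def q_values(labels):
    """Neighbor Joining criterion: pair distance corrected by each node's
    total distance to all other nodes."""
    n = len(labels)
    totals = {x: sum(dist(x, k) for k in labels) for x in labels}
    return {
        (x, y): (n - 2) * dist(x, y) - totals[x] - totals[y]
        for x, y in combinations(labels, 2)
    }

Q = q_values(labels)
for pair, q in sorted(Q.items(), key=lambda item: item[1]):
    print(pair, q)
```

For this table the pairs (a, d) and (b, c) tie for the lowest Q, so either could be fused first.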

32 Problems with phylogenetic trees

33 Problems with phylogenetic trees. [Figure: Bacillus, E. coli, Pseudomonas, Salmonella, Aeromonas, Lechevaliera, Burkholderias]

34 What to do ?

35 Bootstrapping
A. We create new data sets by sampling N positions with replacement.
B. We generate 100-1000 such pseudo-data sets.
C. For each such data set we reconstruct a tree, using the same method.
D. We note the agreement between the tree reconstructed from each pseudo-data set and the original tree.
Note: we do not change the number of sequences!

36 Bootstrapped tree: less reliable branch, highly reliable branch

37 Open questions: Do DNA and proteins from the same gene produce different trees? Can different genes have different evolutionary histories?
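Steps A to D can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular package: the alignment is the four sequences from the distance-table slide, and `toy_tree` is a deliberately simple, hypothetical stand-in for a real tree-building method (for four sequences it returns the pairing of taxa with the smallest summed mismatch count). Any tree builder that returns a comparable value could be passed in its place.

```python
import random

# Sequences from the distance-table slide, used here as a small alignment.
alignment = {
    "a": "ACGCGTTGGGCGATGGCAAC",
    "b": "ACACATTGAGTGTGATCAAC",
    "c": "ACACATTGAGTGAGGACAAC",
    "d": "ACGCGTTGGGCGACGGTAAT",
}

def resample_columns(aln, rng):
    """Step A: sample N alignment positions with replacement.
    The number of sequences stays the same."""
    length = len(next(iter(aln.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in aln.items()}

def toy_tree(aln):
    """Hypothetical stand-in for a tree builder (four sequences only)."""
    a, b, c, d = sorted(aln)

    def mismatches(x, y):
        return sum(p != q for p, q in zip(aln[x], aln[y]))

    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(pairings, key=lambda pr: mismatches(*pr[0]) + mismatches(*pr[1]))

def bootstrap_support(aln, build_tree, replicates=100, seed=0):
    """Steps B to D: build a tree per pseudo-data set and report the
    fraction of replicates that agree with the original tree."""
    rng = random.Random(seed)
    original = build_tree(aln)
    hits = sum(
        build_tree(resample_columns(aln, rng)) == original
        for _ in range(replicates)
    )
    return hits / replicates

support = bootstrap_support(alignment, toy_tree)
print(f"support for {toy_tree(alignment)}: {support:.2f}")
```

A real analysis would record agreement per branch rather than for the whole tree, which is what the reliability labels on a bootstrapped tree express.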
