"Nothing in biology makes sense except in the light of evolution" Theodosius Dobzhansky.


1 "Nothing in biology makes sense except in the light of evolution" Theodosius Dobzhansky

2 Homology: bird wing, bat wing, human arm (by Bob Friedman)

3 Homology vs. analogy: Homology (shared ancestry) versus analogy (convergent evolution). A priori, sequences could be similar due to convergent evolution. Examples: bird wing, butterfly wing, bat wing, fly wing.

4 Related proteins: Present-day proteins evolved through substitution and selection from ancestral proteins. Related proteins have similar sequence AND similar structure AND similar function. In this mantra, "similar function" can refer to: identical function, e.g. identical reactions catalyzed in different organisms; similar function, e.g. the same catalytic mechanism but a different substrate (malate and lactate dehydrogenases); or similar subunits and domains that are brought together through a (hypothetical) process called domain shuffling, e.g. the nucleotide-binding domains in hexokinase, myosin, HSP70, and ATP synthases.

5 Homology: Two sequences are homologous if there existed an ancestral molecule in the past that is ancestral to both sequences. Homology is a "yes" or "no" character ("don't know" is also possible): either sequences (or characters) share ancestry or they don't, like pregnancy. Molecular biologists often use homology as a synonym for similarity or percent identity. One often reads: "sequences A and B are 70% homologous." To an evolutionary biologist this sounds as wrong as "70% pregnant." Types of homology: Orthology: the bifurcation in the molecular tree reflects speciation. Paralogy: the bifurcation in the molecular tree reflects gene duplication.
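To make the distinction concrete: percent identity is a number measured on an alignment, whereas homology is a yes/no inference about ancestry. A minimal Python sketch of the measurement, assuming the two sequences are already aligned with `-` as the gap character (the function name `percent_identity` and the choice to skip gapped columns are illustrative conventions, not a standard):

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences that are already aligned.

    Columns where either sequence has a gap are skipped. This is one
    common convention; some tools divide by the full alignment length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned pair (hypothetical, not real proteins).
print(percent_identity("MKT-LLVAG", "MKSALLIAG"))  # 75.0
```

A value such as 75% says how similar the aligned sequences are; whether they are homologous is a separate yes/no conclusion drawn from that similarity.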

6 Sequence similarity vs. homology: The following is based on observation, not on an a priori truth: if two (complex) sequences show significant similarity in their primary sequence, they share ancestry and probably have similar functions (although some proteins have acquired radically new functions, e.g. lysozyme -> lens crystallin).
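"Significant" similarity means similarity that is unlikely to arise by chance. One simple way to estimate this (an assumption chosen for illustration, not the method behind the slide) is a shuffle test: compare the observed number of identical positions with the numbers obtained after randomly reordering one sequence. A minimal Python sketch for two pre-aligned, equal-length sequences; the function names are hypothetical:

```python
import random


def identity_count(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_p_value(a: str, b: str, trials: int = 1000, seed: int = 0) -> float:
    """Empirical p-value for the identity between a and b.

    Null model (an assumption): b's residues in random order, which keeps
    b's amino acid composition but destroys any positional correspondence.
    """
    rng = random.Random(seed)
    observed = identity_count(a, b)
    residues = list(b)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(residues)
        if identity_count(a, "".join(residues)) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed pair itself, so p is never exactly 0.
    return (at_least_as_good + 1) / (trials + 1)


# Hypothetical toy sequence, not a real protein.
query = "ACDEFGHIKLMNPQRSTVWY"
print(shuffle_p_value(query, query))
```

A tiny p-value says the similarity is unlikely to be coincidence, which, following the observation above, is taken as evidence of shared ancestry. Real search tools such as BLAST and FASTA score gapped alignments with substitution matrices; this ungapped identity count is only a simplified illustration of the idea.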

7 The size of protein sequence space (back-of-the-envelope calculation): Consider a protein of 600 amino acids. Assume that every position could hold any of the twenty amino acids. Then the total number of possibilities is 20 choices for the first position times 20 for the second times 20 for the third ... = 20^600 ≈ 4*10^780 different proteins of length 600. For comparison, the universe contains only about 10^89 protons and has an age of about 5*10^17 seconds, or 5*10^29 picoseconds. If every proton in the universe were a supercomputer that explored one possible protein sequence per picosecond, we would have explored only 5*10^118 sequences, i.e. a negligible fraction of the possible sequences of length 600 (about one in 10^662).
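The arithmetic is easiest to check with logarithms, because 20^600 is far beyond the range of ordinary floating-point numbers. A short Python sketch that redoes the calculation with the slide's own figures (10^89 protons, 5*10^29 picoseconds); the helper names are hypothetical:

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """log10 of alphabet_size ** length, the number of possible sequences."""
    return length * math.log10(alphabet_size)


def scientific(log10_value: float) -> str:
    """Format a base-10 logarithm as 'm*10^e'."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return f"{mantissa:.1f}*10^{exponent}"


# Figures taken from the slide.
LENGTH = 600        # residues per protein
PROTONS = 1e89      # protons in the universe
AGE_PS = 5e29       # age of the universe in picoseconds

log_total = log10_sequence_space(LENGTH)
log_explored = math.log10(PROTONS) + math.log10(AGE_PS)  # one sequence per proton per ps
log_fraction = log_total - log_explored

print("possible sequences:", scientific(log_total))
print("sequences explored:", scientific(log_explored))
print("explored: about one in 10^%d" % round(log_fraction))
```

Run as written, it reproduces the slide's numbers: about 4*10^780 possible sequences, 5*10^118 explored, and a fraction of about one in 10^662.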

8 No similarity vs. no homology: If two (complex) sequences show significant similarity in their primary sequence, they share ancestry and probably have similar functions. THE REVERSE IS NOT TRUE: PROTEINS WITH THE SAME OR SIMILAR FUNCTION DO NOT ALWAYS SHOW SIGNIFICANT SEQUENCE SIMILARITY, for one of two reasons: a) they evolved independently (e.g. different types of nucleotide-binding sites); or b) they underwent so many substitution events that no readily detectable similarity remains. Corollary: PROTEINS WITH SHARED ANCESTRY DO NOT ALWAYS SHOW SIGNIFICANT SIMILARITY.
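Reason b) can be illustrated by simulation. A minimal Python sketch under a deliberately simple, assumed model (uniform substitution among the 20 amino acids, all sites equal; the names are hypothetical): repeatedly substitute residues in a random 600-residue "ancestor" and follow how identity to the ancestor decays:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitute(seq: str, n_events: int, rng: random.Random) -> str:
    """Apply n_events substitutions at random positions.

    Each event replaces the current residue with a different amino acid
    chosen uniformly (a deliberately simple model; real substitution
    rates differ between amino acids and between sites).
    """
    residues = list(seq)
    for _ in range(n_events):
        i = rng.randrange(len(residues))
        residues[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[i]])
    return "".join(residues)


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(1)
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(600))
for per_site in (0, 0.5, 1, 2, 5, 10):
    descendant = substitute(ancestor, int(per_site * len(ancestor)), rng)
    print(f"{per_site:>4} substitutions/site -> identity {identity(ancestor, descendant):.2f}")
```

Under this model identity levels off near 1/20 = 5%, the value expected for unrelated random sequences, so after enough substitutions the shared ancestry is no longer visible in the sequence even though it still exists.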

9 Homology: Two sequences are homologous if there existed an ancestral molecule in the past that is ancestral to both sequences. Types of homology: Orthologs: the “deepest” bifurcation in the molecular tree reflects speciation. These are the molecules that people interested in the taxonomic classification of organisms want to study. Paralogs: the “deepest” bifurcation in the molecular tree reflects gene duplication. The study of paralogs and their distribution in genomes provides clues about the way genomes evolved. Gene and genome duplication have emerged as the most important pathway to molecular innovation, including the evolution of developmental pathways. Xenologs: the gene was obtained by the organism through horizontal transfer. The classic examples of xenologs are antibiotic resistance genes, but the history of many other molecules also fits into this category: inteins, self-splicing introns, transposable elements, ion pumps, and other transporters. Synologs: genes ended up in one organism through fusion of lineages. The paradigm is genes that were transferred into the eukaryotic cell together with the endosymbionts that evolved into mitochondria and plastids. (The -logs are often spelled with "ue", as in orthologues.) See Fitch's article in TIG 2000 for more discussion.
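The orthology/paralogy distinction can be read off a gene tree once each bifurcation is labelled as speciation or duplication. One common heuristic for this, the species-overlap rule (swapped in here as an assumption, not taken from the slides), calls a bifurcation a duplication when the same species occurs on both sides of it. A minimal Python sketch with a hypothetical two-gene, two-species tree; xenologs and synologs cannot be detected this way and would need additional information, such as a comparison with the species tree:

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Node:
    name: Optional[str] = None     # gene name, set on leaves only
    species: Optional[str] = None  # species of origin, set on leaves only
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def label_bifurcations(root: Node) -> Dict[FrozenSet[str], str]:
    """Map the gene set under each internal node to 'duplication' or 'speciation'."""
    events: Dict[FrozenSet[str], str] = {}

    def visit(node: Node) -> None:
        if not node.children:
            return
        species = [{leaf.species for leaf in leaves(c)} for c in node.children]
        overlap = any(a & b for a, b in combinations(species, 2))
        genes = frozenset(leaf.name for leaf in leaves(node))
        events[genes] = "duplication" if overlap else "speciation"
        for child in node.children:
            visit(child)

    visit(root)
    return events


def relationship(root: Node, gene_a: str, gene_b: str) -> str:
    """'orthologs' if the bifurcation separating two genes is a speciation, else 'paralogs'."""
    events = label_bifurcations(root)
    # The smallest gene set containing both genes is their most recent common ancestor.
    split = min((genes for genes in events if {gene_a, gene_b} <= genes), key=len)
    return "orthologs" if events[split] == "speciation" else "paralogs"


def gene(name: str, species: str) -> Node:
    return Node(name=name, species=species)


# Hypothetical tree: a duplication (alpha/beta) followed by speciation (human/mouse).
tree = Node(children=[
    Node(children=[gene("alpha_human", "human"), gene("alpha_mouse", "mouse")]),
    Node(children=[gene("beta_human", "human"), gene("beta_mouse", "mouse")]),
])
print(relationship(tree, "alpha_human", "alpha_mouse"))  # orthologs
print(relationship(tree, "alpha_human", "beta_mouse"))   # paralogs
```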

10 Ways to construct protein space: Construction of sequence space (from Eigen et al. 1988), illustrating how a high-dimensional sequence space is built. Each additional sequence position adds another dimension, doubling the diagram for the shorter sequence. Shown is the progression from a single sequence position (line) to a tetramer (hypercube). A four- (or twenty-) letter code can be accommodated either by allowing four (or twenty) values for each dimension (Rechenberg 1973; Casari et al. 1995) or through additional dimensions (Eigen and Winkler-Oswatitsch 1992).
References:
Eigen, M. and R. Winkler-Oswatitsch (1992). Steps Towards Life: A Perspective on Evolution. Oxford; New York, Oxford University Press.
Eigen, M., R. Winkler-Oswatitsch and A. Dress (1988). "Statistical geometry in sequence space: a method of quantitative comparative sequence analysis." Proc Natl Acad Sci U S A 85(16): 5913-7.
Casari, G., C. Sander and A. Valencia (1995). "A method to predict functional residues in proteins." Nat Struct Biol 2(2): 171-8.
Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Stuttgart-Bad Cannstatt, Frommann-Holzboog.
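The construction of sequence space can be made explicit as a graph: each sequence is a vertex, and sequences that differ at exactly one position are neighbours. A minimal Python sketch (the function name `sequence_space` is hypothetical) that builds this graph by brute force, so it is only practical for very short sequences:

```python
from itertools import product


def sequence_space(alphabet: str, length: int):
    """Vertices: all sequences of the given length over the alphabet.
    Edges: pairs of sequences that differ at exactly one position."""
    vertices = ["".join(p) for p in product(alphabet, repeat=length)]
    edges = [
        (u, v)
        for i, u in enumerate(vertices)
        for v in vertices[i + 1:]
        if sum(a != b for a, b in zip(u, v)) == 1
    ]
    return vertices, edges


v, e = sequence_space("01", 4)
print(len(v), len(e))  # 16 32
```

For a binary alphabet and length 4 this gives the 16 vertices and 32 edges of the 4D hypercube. With four (or twenty) values per dimension, as in the first option above, every vertex has length × 3 (or length × 19) neighbours, because any substitution at a single position is one step.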

11 Diversion: From Multidimensional Sequence Space to Fractals

12 One symbol -> 1D coordinate; dimension = pattern length

13 Two symbols -> dimension = length of pattern. Length 1 = 1D:

14 Two symbols -> dimension = length of pattern. Length 2 = 2D: dimensions correspond to positions; for each dimension, two possibilities. Note: here is a possible bifurcation: a larger alphabet could be represented as more choices along the axis of each position!

15 Two symbols -> dimension = length of pattern. Length 3 = 3D:

16 Two symbols -> dimension = length of pattern. Length 4 = 4D: aka hypercube

17 Two symbols -> dimension = length of pattern

18 Three Symbols (the other fork)

19 Four symbols: i.e., with an alphabet of 4, we already have a hypercube (4D) with a pattern size of 2, provided we stick to a binary pattern in each dimension.

20 Hypercubes for alphabets of 2 and 4: 2-character alphabet, pattern size 4; 4-character alphabet, pattern size 2
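The equivalence works because each of 4 symbols can be written as two binary choices. A minimal Python sketch, assuming a hypothetical two-bit code per nucleotide (any one-to-one assignment would do), showing that all 2-letter patterns over a 4-letter alphabet land on exactly the 16 vertices of the same 4D hypercube as the 4-letter patterns over a binary alphabet:

```python
from itertools import product

# Hypothetical assignment of two binary coordinates per nucleotide.
NUCLEOTIDE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def to_hypercube(pattern: str) -> tuple:
    """Map a nucleotide pattern to a vertex of a (2 * len(pattern))-dimensional cube."""
    return tuple(bit for base in pattern for bit in NUCLEOTIDE_BITS[base])


four_letter = {to_hypercube("".join(p)) for p in product("ACGT", repeat=2)}
two_letter = set(product((0, 1), repeat=4))  # binary alphabet, pattern size 4
print(four_letter == two_letter)  # True: both fill all 16 vertices of the 4-cube
```

One consequence of this encoding: with the code above, A (00) and T (11) differ in both bits, so the substitution A -> T is two steps on the hypercube while A -> C is one. Which substitutions count as neighbours depends on the bit assignment, which is the practical difference from allowing four values along each axis.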

21 A three-symbol alphabet suggests a fractal representation

22 3-symbol fractal: enlarge, fill in; the outer pattern repeats the inner pattern = self-similar = fractal

23 3-character alphabet, 3-position pattern: fractal

24 3-character alphabet, 4-position pattern: fractal. Conjecture: for n -> infinity, the fractal might fill a 2D triangle. Note: check Mandelbrot.
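One way to compute such a self-similar layout (an assumption; the slides do not say how the figures were drawn) is the "move halfway toward the symbol's corner" rule used in the chaos game representation of sequences. A minimal Python sketch for a three-letter alphabet placed at the corners of a triangle; the corner coordinates and function names are hypothetical:

```python
from itertools import product

# Hypothetical placement: symbols A, B, C at the corners of an equilateral triangle.
TRIANGLE = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, 3 ** 0.5 / 2)}


def place(pattern: str, corners=TRIANGLE) -> tuple:
    """Start at the centroid of the corners; for each symbol in turn,
    move halfway toward that symbol's corner."""
    x = sum(cx for cx, _ in corners.values()) / len(corners)
    y = sum(cy for _, cy in corners.values()) / len(corners)
    for symbol in pattern:
        cx, cy = corners[symbol]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y


def layout(length: int, corners=TRIANGLE) -> dict:
    """Position of every pattern of the given length over the corner alphabet."""
    patterns = ("".join(p) for p in product(sorted(corners), repeat=length))
    return {pat: place(pat, corners) for pat in patterns}


points = layout(4)
print(len(points))  # 81 patterns of length 4 over a 3-letter alphabet
```

Appending one more symbol shrinks the whole layout by half toward that symbol's corner, so the layout for n + 1 positions consists of three half-size copies of the layout for n positions, which is the self-similarity described above. Passing a different `corners` dictionary (for example four corners for a four-letter alphabet) reuses the same rule.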

25 Same for a 4-character alphabet: 1 position, 2 positions, 3 positions

26 4-character alphabet, continued (with cheating: I didn’t actually add beads): 4 positions

27 4-character alphabet, continued (with cheating: I didn’t actually add beads): 5 positions

28 4-character alphabet, continued (with cheating: I didn’t actually add beads): 6 positions

29 4-character alphabet, continued (with cheating: I didn’t actually add beads): 7 positions

30 Animated GIF: 1-12 positions

31 Protein space in Jalview

32 Alignment of V-, F-, and A-ATPase ATP-binding subunits (catalytic and non-catalytic subunits)

33 UPGMA tree of the V-, F-, and A-ATPase ATP-binding subunits, with a line dropped to partition (and colour) the 4 subunit types (V/A catalytic and non-catalytic, F catalytic and non-catalytic). Note that details of the tree $%#&@.
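UPGMA (unweighted pair group method with arithmetic mean) repeatedly merges the two closest clusters and defines the distance to the merged cluster as the size-weighted average of the old distances; dropping a line through the tree then amounts to undoing every merge above a chosen height. A minimal Python sketch with a hypothetical 4 × 4 toy distance matrix (not the ATPase data; Jalview's own implementation may differ in details such as tie-breaking). The function names `upgma` and `cut` are illustrative:

```python
def upgma(labels, matrix):
    """UPGMA on a symmetric distance matrix.

    Returns the merges in order as (members_a, members_b, height),
    where height is half the distance between the merged clusters.
    """
    clusters = {i: [name] for i, name in enumerate(labels)}
    # Distances keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j]
            for i in range(len(labels)) for j in range(i + 1, len(labels))}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        a, b = clusters.pop(i), clusters.pop(j)
        merges.append((a, b, d_ij / 2))
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Average over all member pairs = size-weighted mean of cluster distances.
            dist[(k, next_id)] = (len(a) * d_ik + len(b) * d_jk) / (len(a) + len(b))
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        clusters[next_id] = a + b
        next_id += 1
    return merges


def cut(labels, merges, height):
    """Groups obtained by dropping a horizontal line at the given height."""
    groups = {frozenset([name]) for name in labels}
    for a, b, h in merges:  # UPGMA merge heights never decrease
        if h > height:
            break
        groups -= {frozenset(a), frozenset(b)}
        groups.add(frozenset(a + b))
    return sorted(sorted(g) for g in groups)


# Hypothetical toy distances between four sequences.
labels = ["a", "b", "c", "d"]
matrix = [[0, 2, 6, 10],
          [2, 0, 6, 10],
          [6, 6, 0, 10],
          [10, 10, 10, 0]]
merges = upgma(labels, matrix)
print(cut(labels, merges, 2))  # [['a', 'b'], ['c'], ['d']]
print(cut(labels, merges, 4))  # [['a', 'b', 'c'], ['d']]
```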

34 PCA of the V-, F-, and A-ATPase ATP-binding subunits, using colours from the UPGMA tree

35 Same PCA of the V-, F-, and A-ATPase ATP-binding subunits, using colours from the UPGMA tree, but rotated slightly (Giardia A subunit selected in grey)

36 Same PCA of the V-, F-, and A-ATPase ATP-binding subunits, using colours from the UPGMA tree, but replacing the 1st axis with the 5th (eukaryotic A subunits selected in grey)

37 Same PCA of the V-, F-, and A-ATPase ATP-binding subunits, using colours from the UPGMA tree, but replacing the 1st axis with the 6th (eukaryotic B subunits selected in grey; forgot rice)
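A PCA of an alignment needs numerical coordinates for each sequence. A minimal Python/NumPy sketch that one-hot encodes each alignment column and projects the sequences onto their principal axes; this encoding is an assumption for illustration, and Jalview's own PCA calculation may differ (for example in how it scores residue similarity). The function names and the toy alignment are hypothetical:

```python
import numpy as np


def one_hot(alignment, alphabet="ACDEFGHIKLMNPQRSTVWY-"):
    """Encode aligned sequences as 0/1 rows, one column per (position, symbol)."""
    index = {symbol: k for k, symbol in enumerate(alphabet)}
    n_cols = len(alignment[0]) * len(alphabet)
    X = np.zeros((len(alignment), n_cols))
    for row, seq in enumerate(alignment):
        for pos, symbol in enumerate(seq):
            X[row, pos * len(alphabet) + index[symbol]] = 1.0
    return X


def pca_scores(X, n_components=6):
    """Coordinates of each row on the first principal axes (SVD of centred data)."""
    centred = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


# Hypothetical toy alignment: two groups of two sequences.
alignment = ["MKVLA", "MKVLG", "MRIIA", "MRIIG"]
scores = pca_scores(one_hot(alignment))
print(scores[:, :2].round(2))  # coordinates on axes 1 and 2
```

Swapping the plotted axes, as on the slides, only means choosing different columns: `scores[:, [4, 1]]` plots the 5th axis against the 2nd. This requires enough sequences, since n centred sequences span at most n - 1 meaningful axes.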

38 Problems: Jalview’s approach requires an alignment, so only homologous sequences can be depicted in the same space. Solution: one could use pattern absence/presence as coordinates.
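Presence/absence coordinates need no alignment: every sequence is described by which short patterns (k-mers) it contains. A minimal Python sketch, assuming patterns are all substrings of a fixed length k (the function names and toy sequences are hypothetical), with a Jaccard distance as one possible way to compare two such profiles:

```python
from itertools import product


def kmers(seq: str, k: int) -> set:
    """Set of all length-k substrings (patterns) present in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_vector(seq: str, k: int, alphabet: str) -> list:
    """0/1 coordinate for every possible k-mer, in itertools.product order."""
    present = kmers(seq, k)
    return [1 if "".join(p) in present else 0 for p in product(alphabet, repeat=k)]


def jaccard_distance(a: str, b: str, k: int) -> float:
    """1 - (patterns shared / patterns seen in either sequence)."""
    pa, pb = kmers(a, k), kmers(b, k)
    union = pa | pb
    return 1 - len(pa & pb) / len(union) if union else 0.0


# Hypothetical toy DNA sequences; they need not be aligned or homologous.
print(presence_vector("ACGTAC", 2, "ACGT"))
print(jaccard_distance("ACGTAC", "ACGTTT", 2))
```

Because every sequence gets a vector of the same length (|alphabet|^k coordinates), unrelated sequences can be placed in one space. The price is that the vector grows quickly (20^3 = 8000 coordinates for protein triplets) and short patterns can be shared by chance.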

