Models of Molecular Evolution II Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections 7.3 – 7.4.


Isochore structure of vertebrate genomes
- Why do patterns of base composition (the frequencies of the four bases, and of the codons used to specify amino acids) differ between genomes?
- Mean G + C content in bacteria ranges from 25% to 75%, but there is little variation within any one genome.
- Vertebrate genomes show a much greater range of G + C values:
  - This is caused by long contiguous sections (> 300 kb), each of uniform G + C content, called isochores.
  - The G + C content of isochores also varies between species.
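Isochores are detected by scanning a genome for long stretches of roughly constant G + C content. A minimal sketch of that scan, assuming plain A/C/G/T sequence strings; the function names are illustrative, and the 10-base window in the example stands in for the > 300 kb windows a real isochore scan would use.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no scorable bases (e.g. a run of N)
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int, step: int = 0) -> list[float]:
    """G + C content of each window of `window` bases, moving `step` bases at a time.

    step defaults to the window size (non-overlapping windows); a trailing
    fragment shorter than one window is ignored.
    """
    step = step or window
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Toy example: a GC-rich block followed by an AT-rich block, in 10-base windows.
print(windowed_gc("GCGC" * 5 + "ATAT" * 5, window=10))  # [1.0, 1.0, 0.0, 0.0]
```

Runs of similar window values correspond to isochore-like blocks; real analyses also have to choose where one block ends and the next begins.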

Properties of vertebrate isochores
G + C-rich isochores:
- Correlate with reverse Giemsa (R) bands
- Early replicating
- High gene density
- SINEs present
- CpG islands in genes
- High G + C content at third codon positions
- High frequency of retroviral sequences
- High frequency of chiasmata
A + T-rich isochores:
- Correlate with Giemsa (G) bands
- Late replicating
- Low gene density (only tissue-specific genes)
- LINEs present
- No CpG islands
- High A + T content at third codon positions
- Low frequency of retroviral sequences
- Low frequency of chiasmata
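The third-codon-position contrast in these lists is usually measured as GC3, the G + C fraction at the third position of each codon. A minimal sketch, assuming an in-frame coding sequence that starts at its first base; `gc3` is an illustrative name, not a library function.

```python
def gc3(cds: str) -> float:
    """G + C fraction at third codon positions of an in-frame coding sequence.

    Any trailing partial codon is ignored, as are non-ACGT characters.
    """
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    scorable = [base for base in thirds if base in "ACGT"]
    if not scorable:
        return 0.0
    return sum(base in "GC" for base in scorable) / len(scorable)


print(gc3("ATGGCCAAG"))  # third positions G, C, G
print(gc3("ATGGCAAAA"))  # third positions G, A, A
```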

Theories on the existence of isochores
- The selectionist hypothesis of Bernardi et al. proposes that the GC-rich isochores found predominantly in warm-blooded vertebrates are an adaptation to higher body temperature:
  - The extra hydrogen bond in a G-C pair (three, against two in an A-T pair) may reduce the risk of thermal damage to DNA.
  - Desert plants also have higher GC contents.
  - GC-rich isochores appear to have arisen independently in birds and mammals, which do not share an immediate warm-blooded ancestor; this independent origin is taken as evidence for selection.
- However, some thermophilic bacteria are AT-rich.

Theories on the existence of isochores
- The neutralist explanation is that isochores simply reflect variation in the mutation process across the genome.
- Studies on argininosuccinate synthetase processed pseudogenes from anthropoid primates:
  - The pseudogenes were derived from the same functional ancestral gene but were then inserted into different parts of the genome.
  - Despite their common ancestry, they now differ in base composition.
  - Because pseudogenes are not subject to selection, the differences in base composition must be due to regional variation in mutation patterns.

Why should mutation patterns vary across genomes?
- The replication hypothesis suggests that genes which replicate early in the cell cycle are more GC-rich than those which replicate later:
  - This is believed to be because the G and C dNTP precursor pools are larger at that time, so replication errors are more likely to incorporate G or C.
- The repair hypothesis assumes that the efficiency of DNA repair varies across the genome:
  - This may result from transcriptionally active regions being repaired more efficiently.
  - CpG islands are maintained by a special repair system, so the efficiency of DNA repair may depend on genomic location.
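The neutralist claim that regional mutation bias alone can set base composition can be made concrete with a standard two-rate mutation model, which is not itself on the slides. Assume an A or T site mutates to G or C at rate u and a G or C site mutates to A or T at rate v, with no selection; base composition then settles at u / (u + v). The rates below are hypothetical and exaggerated so the example converges quickly.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Equilibrium G + C fraction under mutation alone.

    u: rate at which an A or T site mutates to G or C (per site per generation)
    v: rate at which a G or C site mutates to A or T (per site per generation)
    Setting the change per generation, u*(1 - gc) - v*gc, to zero gives u / (u + v).
    """
    return u / (u + v)


def iterate_gc(gc0: float, u: float, v: float, generations: int) -> float:
    """Project G + C fraction forward with no selection: gc += u*(1 - gc) - v*gc."""
    gc = gc0
    for _ in range(generations):
        gc += u * (1 - gc) - v * gc
    return gc


# Two hypothetical genomic regions with different mutation biases.
print(equilibrium_gc(2e-3, 1e-3))           # GC-biased region
print(equilibrium_gc(1e-3, 2e-3))           # AT-biased region
print(iterate_gc(0.5, 2e-3, 1e-3, 10_000))  # converges on the first value
```

Under this model, two pseudogenes inserted into regions with different u/v ratios would drift toward different base compositions without any selection acting on them.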

Why should mutation patterns vary across genomes?
- The recombination hypothesis claims that the isochore structure of vertebrate genomes results from differences in the pattern and frequency of recombination:
  - Low-GC regions should be associated with reduced recombination:
    - Genes with low rates of recombination have low GC values.
    - The large non-recombining region of the Y chromosome has a low GC composition.
  - Because recombination plays such a large part in structuring eukaryote genomes, this is an attractive hypothesis.
- Although the relative contributions of these hypotheses are still unclear, the neutralist interpretation seems more likely.

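Codon usage
[Figure: usage of the six arginine (Arg) codons (CGA, CGC, CGG, CGU, AGA, AGG) and the six leucine (Leu) codons (CUA, CUC, CUG, CUU, UUA, UUG) compared between E. coli and human]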

What determines codon usage?
- Degeneracy of the genetic code: the null hypothesis is that all codons for a particular amino acid are used with equal frequency.
  - This was refuted once nucleotide sequences became available for a wide range of organisms.
- Selectionist argument: highly expressed genes show the most codon bias because they require greater translational efficiency, reflecting coevolution of tRNAs and codons.
- This pattern also supports the neutralist prediction of a relationship between functional constraint and substitution rate.
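The equal-usage null hypothesis can be tested codon by codon with relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all its synonyms were used equally, so RSCU = 1.0 everywhere under the null. A minimal sketch, restricted for brevity to the arginine and leucine families in the codon-usage figure; the dictionary and function names are illustrative.

```python
from collections import Counter

# Synonymous codon families (RNA alphabet) for the two amino acids in the figure.
SYNONYMOUS = {
    "Arg": ["CGA", "CGC", "CGG", "CGU", "AGA", "AGG"],
    "Leu": ["CUA", "CUC", "CUG", "CUU", "UUA", "UUG"],
}


def rscu(codons: list[str]) -> dict[str, float]:
    """Relative synonymous codon usage: observed count / count expected under equal usage.

    Values > 1 mark preferred codons, values < 1 avoided ones. Amino acids that
    never occur in `codons` are skipped, since RSCU is undefined for them.
    """
    counts = Counter(codons)
    result = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        expected = total / len(family)  # equal-usage expectation per synonym
        for c in family:
            result[c] = counts[c] / expected
    return result


print(rscu(["CUG", "CUG", "CUG", "CUU", "UUA", "UUG"]))
```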

Gene expression and codon bias
Highly expressed genes:
- Strong selection for translational efficiency
- Restricted set of tRNAs used
- Strong codon bias
- Low rate of synonymous substitution (few neutral mutations)
Lowly expressed genes:
- Weak selection for translational efficiency
- More tRNAs used
- Weak codon bias
- High rate of synonymous substitution (many neutral mutations)
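One way to put a number on "strong" versus "weak" codon bias for a single amino acid is the effective number of codons, 1 / Σp², where p is the frequency of each synonymous codon: it equals the number of synonyms when all are used equally and 1.0 when only one is used. This sketch omits the small-sample correction used in Wright's effective number of codons (Nc), and the example codon lists are hypothetical.

```python
from collections import Counter


def effective_codons(codons: list[str]) -> float:
    """Effective number of codons used for one amino acid: 1 / sum(p_i ** 2).

    All entries in `codons` are assumed to encode the same amino acid.
    """
    if not codons:
        raise ValueError("need at least one codon")
    counts = Counter(codons)
    n = len(codons)
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


# Hypothetical leucine codons from a highly and a lowly expressed gene.
print(effective_codons(["CUG"] * 18 + ["CUU"] * 2))                         # strong bias
print(effective_codons(["CUA", "CUC", "CUG", "CUU", "UUA", "UUG"] * 10))  # no bias
```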

The molecular clock
- The idea of a molecular clock is central to neutral theory, since it reflects the constancy of the underlying neutral mutation rate (see the earlier globin example).
- This does not imply that all genes and proteins evolve at the same rate:
  - Rates vary greatly between proteins (e.g. fibrinopeptides vs. histones).
  - Variation in rate among genes and proteins is compatible with neutral theory if the underlying cause is differences in selective constraint.
- The key question for the validity of a molecular clock is whether rates of substitution within a gene are constant over evolutionary time.

Neutral theory and the molecular clock
- The rate of nucleotide substitution (fixation) per site per year, $k$, in a diploid population of $N$ individuals ($2N$ gene copies) equals the number of new mutations (neutral, deleterious or advantageous) arising per year, $2N\mu$, where $\mu$ is the mutation rate per gene copy per year, multiplied by their probability of fixation, $u$: $k = 2N\mu u$
- For a neutral mutation, the probability of fixation is the reciprocal of the number of gene copies: $u = 1/(2N)$
- So the substitution rate for a neutral mutation is: $k = (2N\mu)\left(\frac{1}{2N}\right)$

Neutral theory and the molecular clock (continued)
- The population-size terms ($2N$) cancel out, leaving: $k = \mu$
- This is one of the most important formulae in molecular evolution: the rate of substitution of neutral mutations depends only on the underlying mutation rate and is independent of other factors such as population size.
- It also holds for mutants with a very weak selective advantage, e.g. $s < 1/(2N_e)$.
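The cancellation in $k = (2N\mu)(1/(2N)) = \mu$ can be checked numerically: larger populations produce more new mutations, but each has a proportionally smaller chance of fixation. A minimal sketch; the mutation rate is a hypothetical value chosen only for illustration.

```python
def neutral_substitution_rate(two_n: int, mu: float) -> float:
    """k = (2N * mu) * u, with u = 1/(2N) for a neutral mutation."""
    new_mutations = two_n * mu   # new mutations per year among 2N gene copies
    fixation_prob = 1.0 / two_n  # neutral fixation probability
    return new_mutations * fixation_prob


MU = 1e-9  # hypothetical neutral mutation rate per site per year
for two_n in (100, 10_000, 1_000_000):
    print(two_n, neutral_substitution_rate(two_n, MU))  # k stays equal to MU
```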

Substitution of selectively advantageous mutations
- The probability of fixation is roughly twice the selection coefficient, scaled by the ratio of effective to census population size: $u \approx 2sN_e/N$
- Substituting this into the original equation, $k = 2N\mu u$, gives: $k = 4N_e s\mu$
- In this case, the substitution rate of an advantageous mutation also depends on population size and on the magnitude of the selective advantage.
- For natural selection to produce a molecular clock, $N_e$, $s$ and $\mu$ (a combination of ecological, mutational and selective factors) would have to stay the same across evolutionary time, which is highly unlikely.
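The approximation $u \approx 2sN_e/N$ comes from Kimura's diffusion formula for the fixation of a new mutant with additive (genic) selection, $u = (1 - e^{-4N_e s p})/(1 - e^{-4N_e s})$ with starting frequency $p = 1/(2N)$. The sketch below compares the full formula with the approximation and computes $k = 4N_e s\mu$; the parameter values are hypothetical, and the formula is a standard population-genetics result rather than one given on the slide.

```python
import math


def kimura_fixation(s: float, n: int, ne: int) -> float:
    """Kimura's fixation probability for a new additive mutant (assumes s >= 0).

    s: selective advantage; n: census size N (mutant starts at p = 1/2N);
    ne: effective population size N_e.
    """
    p = 1 / (2 * n)
    if s == 0:
        return p  # neutral limit: u = 1/(2N)
    # -expm1(-x) computes 1 - exp(-x) accurately for small x
    return -math.expm1(-4 * ne * s * p) / -math.expm1(-4 * ne * s)


def advantageous_rate(s: float, n: int, ne: int, mu: float) -> float:
    """k = 2N*mu*u with u ~ 2*s*N_e/N, which simplifies to 4*N_e*s*mu."""
    return 2 * n * mu * (2 * s * ne / n)


# Hypothetical population: N = N_e = 10,000, s = 0.01.
print(kimura_fixation(0.01, 10_000, 10_000), "vs approximation", 2 * 0.01)
print(advantageous_rate(0.01, 10_000, 5_000, 1e-8))
```

Unlike the neutral case, the result changes whenever $N_e$ or $s$ changes, which is why selection is not expected to give a steady clock.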

Constancy of the molecular clock
- Neutral theory predicted a molecular clock, and the first protein sequence data appeared to confirm it: this led Kimura to cite the clock as the best evidence for neutrality.
- As more comparative sequence data became available, particularly from mammals, examples of rate variation began to appear.
- A debate arose concerning the constancy of the molecular clock.

Testing the molecular clock
- The dispersion index, $R(t)$, tests whether there is more rate variation between lineages than expected under a Poisson process:
  - $R(t)$ is the variance of the number of substitutions between lineages divided by the mean number.
  - If the data fit a Poisson process, the variance equals the mean and $R(t) = 1.0$; if $R(t) > 1.0$, the clock is said to be overdispersed.
- A star phylogeny should be used, since any phylogenetic structure (e.g. among placental mammals) complicates the calculations.
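The basic statistic is a variance-to-mean ratio. A minimal sketch, assuming substitution counts have already been estimated for independent lineages of a star phylogeny; published estimators add corrections (for example for shared branches and multiple hits) that this omits, and the example counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(substitutions: list[float]) -> float:
    """R(t) = sample variance / mean of substitution counts across lineages.

    About 1 under a Poisson clock; values > 1 indicate an overdispersed clock.
    Needs at least two lineages (statistics.variance raises otherwise).
    """
    m = mean(substitutions)
    if m == 0:
        raise ValueError("mean number of substitutions is zero")
    return variance(substitutions) / m


print(dispersion_index([5, 15, 5, 15]))  # hypothetical counts for four lineages
```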

Testing the molecular clock
- Mammalian protein data presented a serious problem for neutralists.
- The problems are most likely due to inaccuracies in the phylogenies:
  - The "outlier" in the data was the guinea pig.
  - The guinea pig is much more divergent than previously thought.

Protein          Species (n)   Amino acids   R(t)
Haemoglobin α    6             141           1.17
Haemoglobin β    6             146           3.04
Myoglobin        6             153           1.60
Cytochrome c     4             104           3.22
Ribonuclease     4             123           2.15
…-Crystallin     6             175           2.71

The relative rate test
- The relative rate test compares the numbers of substitutions in two closely related taxa, A and B, each measured against a third, more distantly related outgroup, C.
- If A and B have evolved according to a molecular clock, both should be equidistant from C: $d_{AC} = d_{BC}$
- A and B must be each other's closest relatives, and C must not be too distantly related.
[Figure: phylogeny of taxa A and B with outgroup C, with internal node X]
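The comparison $d_{AC}$ versus $d_{BC}$ can be sketched directly from an alignment. This minimal version uses uncorrected p-distances (proportion of differing sites, gapped sites skipped); published tests use distances corrected for multiple substitutions, with standard errors, and the sequences in the example are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gaps ignored)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def relative_rate(a: str, b: str, c: str) -> float:
    """d_AC - d_BC with C as outgroup: near 0 if A and B evolve at the same rate.

    A positive value means the A lineage has accumulated more change since A and B split.
    """
    return p_distance(a, c) - p_distance(b, c)


outgroup = "AAAAAAAAAA"
print(relative_rate("CCAAAAAAAA", "CAAAAAAAAA", outgroup))  # A lineage changed more
```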

The relative rate test
Taxa: 1 = Old World monkey, 2 = human, 3 = New World monkey (outgroup)

Region                                  Length (bp)   d12   d13 - d23
Synonymous sites, nine nuclear genes    3520          6.7   2.3 ± 0.6
…-globin pseudogene                     1827          7.9   1.5 ± 0.4
Three introns                           3376          6.9   1.0 ± 0.5
Two flanking regions                    936           7.9   3.1 ± 1.1

- Because $d_{13} - d_{23} > 0$ in every comparison, lineage 1 (Old World monkey) has accumulated more substitutions than lineage 2 (human) since they diverged.
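Whether each $d_{13} - d_{23}$ value differs from zero can be checked by dividing it by its error term. This sketch assumes the "±" figures are one standard error and that the difference is approximately normally distributed, so |z| > 1.96 corresponds to roughly the 5% level; neither assumption is stated on the slide.

```python
# (region, d13 - d23, standard error), copied from the relative rate test table.
RESULTS = [
    ("Synonymous sites, nine nuclear genes", 2.3, 0.6),
    ("Globin pseudogene (1827 bp)", 1.5, 0.4),
    ("Three introns", 1.0, 0.5),
    ("Two flanking regions", 3.1, 1.1),
]


def z_score(diff: float, se: float) -> float:
    """Difference expressed in standard-error units."""
    return diff / se


for region, diff, se in RESULTS:
    z = z_score(diff, se)
    verdict = "rates differ" if abs(z) > 1.96 else "no significant difference"
    print(f"{region}: z = {z:.2f} ({verdict})")
```

Under these assumptions every region exceeds the threshold, though the intron comparison (z = 2.00) only just does.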

Lineage effects and the molecular clock
- The substitution rate varies with the underlying neutral mutation rate: $k = \mu$
- There are three ways for rates to vary between species:
  - Differences in generation time
  - Differences in metabolic rate
  - Differences in the efficiency of DNA repair
- These are known as lineage effects: neutralists believe that lineage effects alone can account for all variation in the molecular clock.
- Selectionists believe that genes also show rate variation due to other, selection-driven factors (residue effects).

Generation time and the molecular clock

- At the molecular level, generation time ($g$) can be defined as the time it takes for germ-line DNA to replicate, i.e. from one gamete to the next.
- Since most mutations occur at this point, the rate of substitution per year under neutral theory is a function of both the mutation rate per generation and the generation time: $k = \mu/g$
- The general conclusion from molecular data is that the clock is generation-time dependent at silent sites and in non-coding DNA:
  - Silent rates in orang-utan, gorilla and chimpanzee are 1.3-, 2.2- and 1.2-fold faster than in humans, which matches differences in generation times.
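The relation $k = \mu/g$ implies that two species with the same mutation rate per generation tick at different per-year rates in proportion to their generation times. A minimal sketch with hypothetical values for $\mu$ and $g$:

```python
def per_year_rate(mu_per_generation: float, generation_time_years: float) -> float:
    """k = mu / g: neutral substitutions per site per year."""
    if generation_time_years <= 0:
        raise ValueError("generation time must be positive")
    return mu_per_generation / generation_time_years


MU_GEN = 1e-8  # hypothetical mutation rate per site per generation, shared by both species
short_lived = per_year_rate(MU_GEN, 1.0)   # hypothetical 1-year generation time
long_lived = per_year_rate(MU_GEN, 20.0)   # hypothetical 20-year generation time
print(short_lived / long_lived)            # fold difference in per-year clock rate
```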

The metabolic rate hypothesis
- In sharks, the rate of silent change is five- to sevenfold lower than in primates and ungulates, which have similar generation times:
  - This led to the hypothesis that differences in metabolic rate explain differences in mutation rate better than differences in generation time do (the metabolic rate hypothesis).
  - The hypothesis states that organisms with high metabolic rates have higher levels of DNA synthesis.
- Two pieces of mitochondrial DNA evidence support this:
  - Small-bodied animals, which have higher metabolic rates, tend to have higher mutation rates.
  - Warm-blooded animals also have higher mutation rates than cold-blooded animals.

Relationship between body mass and sequence evolution
[Figure: % sequence divergence per Myr (0.1 to 10) plotted against body mass (0.01 to 100,000 kg) on logarithmic axes, for rodents, geese, dogs, primates, horses, bears, whales, newts, frogs, tortoises, salmon, sea turtles and sharks]

DNA repair and mutation
[Figure: DNA suffers direct damage and replication errors; these pass through repair and are either correctly repaired or incorrectly repaired, and incorrect repair results in mutation]

DNA repair and mutation
- Repair mechanisms are extremely complex, and there are many repair pathways.
- There is some evidence supporting the hypothesis that DNA repair influences mutation rate:
  - Highly transcribed genes appear to be repaired more efficiently.
  - Base composition and substitution rates at silent sites in mammalian genes tend to be gene-specific rather than species-specific, suggesting that homologous genes are transcribed and repaired in a similar manner.
- Conversely, closely related species such as hominid primates, which share very similar repair mechanisms, can exhibit greatly differing substitution rates.
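The repair diagram implies a simple accounting: only the events that are not correctly repaired become mutations, so small differences in repair efficiency can produce large differences in mutation rate. This is a toy model under that assumption, not a description of any specific repair pathway, and all input values are hypothetical.

```python
def surviving_mutation_rate(lesions: float, replication_errors: float,
                            repair_accuracy: float) -> float:
    """Toy model: events that are not correctly repaired become mutations.

    lesions, replication_errors: events per site per generation entering repair
    repair_accuracy: fraction of those events correctly repaired (0 to 1)
    """
    if not 0.0 <= repair_accuracy <= 1.0:
        raise ValueError("repair_accuracy must lie between 0 and 1")
    return (lesions + replication_errors) * (1.0 - repair_accuracy)


# Hypothetical inputs: same damage load, two levels of repair efficiency.
poorly_repaired = surviving_mutation_rate(1e-6, 1e-6, 0.999)
well_repaired = surviving_mutation_rate(1e-6, 1e-6, 0.9999)
print(poorly_repaired / well_repaired)  # roughly tenfold
```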
