By Mireya Diaz Department of Epidemiology and Biostatistics for EECS 458.


1 By Mireya Diaz Department of Epidemiology and Biostatistics for EECS 458

2 Agenda
- Basic concepts of population genetics
- The coalescent theory
- Coalescent process of two sequences
- Coalescent time
- Statistical inference
- Applications: reconstruction of human evolutionary history
- Future avenues

3 Basic Concepts in Population Genetics
[Figure: allele frequencies $f_1, f_2, \ldots, f_k$; forces shown: random genetic drift, mutation, selection]

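4 Basic Concepts in Population Genetics
Mutation: plays a limited role in evolution because its effect is slow, but it contributes to the maintenance of alleles in the population.
- Locus with 2 alleles: $A_1$ with frequency $p(n)$ and $A_2$ with frequency $q(n) = 1 - p(n)$ in generation $n$
- Non-overlapping generations
- $A_1 \to A_2$ at rate $u$ and $A_2 \to A_1$ at rate $v$ ($u, v \sim 10^{-5}, 10^{-6}$)
- An allele can mutate at most once per generation
- If the initial gene frequency of $A_1$ is $p(0)$: $p(n) = \frac{v}{u+v} + \left(p(0) - \frac{v}{u+v}\right)(1-u-v)^n$
- As $n \to \infty$, "equilibrium": $p(n) \to \frac{v}{u+v}$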
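
The two-way mutation model on this slide implies the one-generation recursion $p(n+1) = (1-u)\,p(n) + v\,(1 - p(n))$. The sketch below iterates that recursion and compares it with its closed-form solution. The function names and the starting frequency 0.9 are illustrative assumptions, not taken from the slides.

```python
def mutate(p, u, v):
    """One generation of two-way mutation: A1 -> A2 at rate u, A2 -> A1 at rate v."""
    return p * (1.0 - u) + (1.0 - p) * v


def frequency_after(p0, u, v, n):
    """Closed-form solution p(n) of the recursion implemented in mutate()."""
    p_eq = v / (u + v)  # equilibrium frequency of A1
    return p_eq + (p0 - p_eq) * (1.0 - u - v) ** n


u, v = 1e-5, 1e-6   # rates in the range quoted on the slide
p = 0.9             # hypothetical initial frequency p(0)
for _ in range(1000):
    p = mutate(p, u, v)

closed_form = frequency_after(0.9, u, v, 1000)
equilibrium = v / (u + v)
```

With rates this small the frequency barely moves in 1000 generations, which illustrates why the slide calls the effect of mutation slow.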

5 Basic Concepts in Population Genetics
Random genetic drift: change in gene frequency due to random sampling of gametes from a finite population. Important for small populations.
- Each generation, 2N gametes are sampled at random from the parent generation
- $y(n)$: number of gametes of type $A_1$ in generation $n$, in the absence of mutation and selection
- Wright-Fisher model: $P(y(n+1) = j \mid y(n) = i) = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}$
- One allele will eventually be lost
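
A minimal simulation sketch of drift under the Wright-Fisher model, assuming binomial sampling of 2N gametes each generation with no mutation or selection. The population size, starting count, and seed are arbitrary illustrative choices.

```python
import random


def wright_fisher(n_diploid, y0, rng, max_generations=100_000):
    """Simulate y(n), the number of A1 gametes, until A1 is lost or fixed.

    Returns (final_count, generations_elapsed).
    """
    two_n = 2 * n_diploid
    y = y0
    generations = 0
    while 0 < y < two_n and generations < max_generations:
        p = y / two_n
        # Binomial(2N, p) draw: each of the 2N gametes is A1 with probability p.
        y = sum(1 for _ in range(two_n) if rng.random() < p)
        generations += 1
    return y, generations


rng = random.Random(458)  # arbitrary seed, used only for reproducibility
final_count, generations = wright_fisher(n_diploid=10, y0=10, rng=rng)
```

In a small population the count reaches 0 or 2N quickly, which is the sense in which one allele is eventually lost.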

6 Basic Concepts in Population Genetics
Selection: can act at different stages of the life of an organism (e.g. differential fecundity, viability).
- Locus with 2 alleles $A_1$, $A_2$
- Three genotypes: $A_1A_1$ ($w_{11}$), $A_1A_2$ ($w_{12}$), $A_2A_2$ ($w_{22}$), where the fitness $w_{ij}$ is the relative survival chance of zygotes of genotype $A_iA_j$
- Under Hardy-Weinberg equilibrium:
  - $w_{11} > w_{12} > w_{22}$: $A_1$ becomes fixed
  - $w_{11} < w_{12} < w_{22}$: $A_2$ becomes fixed
  - $w_{11}, w_{22} < w_{12}$: overdominance, stable polymorphism
  - $w_{12} < w_{11}, w_{22}$: underdominance, unstable polymorphism; $A_1$ or $A_2$ becomes fixed depending on the initial frequency $f(0)$
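
The four fitness regimes can be checked numerically. The sketch below assumes the standard viability-selection recursion under Hardy-Weinberg proportions, $p' = p\,(p\,w_{11} + q\,w_{12}) / \bar{w}$ with $\bar{w} = p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22}$. The fitness values and starting frequencies are hypothetical.

```python
def select(p, w11, w12, w22):
    """One generation of viability selection on the frequency p of A1."""
    q = 1.0 - p
    mean_fitness = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    return p * (p * w11 + q * w12) / mean_fitness


def iterate(p0, w11, w12, w22, generations=2000):
    """Apply select() repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = select(p, w11, w12, w22)
    return p


directional = iterate(0.01, 1.0, 0.9, 0.8)    # w11 > w12 > w22: A1 goes to fixation
overdominant = iterate(0.01, 0.9, 1.0, 0.8)   # heterozygote fittest: stable polymorphism
under_low = iterate(0.3, 1.0, 0.8, 1.0)       # heterozygote least fit, start below 0.5
under_high = iterate(0.7, 1.0, 0.8, 1.0)      # heterozygote least fit, start above 0.5
```

For the overdominant case the interior equilibrium is $(w_{12} - w_{22}) / (2w_{12} - w_{11} - w_{22})$, which is 2/3 for the values used here. In the underdominant case the outcome depends only on which side of the unstable equilibrium the population starts.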

7 The Coalescent Theory
- Stochastic process: continuous-time Markov process
- Large-population approximation of the Wright-Fisher model and other neutral models
- Probability model for the genealogical tree of a random sample of n genes from a large population
- Most significant progress in theoretical population genetics of the past 2 decades
- Cornerstone for rigorous statistical analysis of molecular data from populations
- Motivation: the need to infer the past from samples taken from the present population
- Seminal work: Kingman, J Appl Prob 19A:27, 1982

8 The Coalescent Theory – Key Idea
- Start with a sample and trace backwards in time to identify events in the past since the most recent common ancestor (MRCA) of the sample
- Consider a sample of n sequences of a DNA region from a population
- Assume no recombination between sequences
- The n sequences are connected by a single phylogenetic tree (genealogy) whose root is the MRCA
[Figure: genealogy with labels Diverge, Coalesce, MRCA]

9 The Coalescent Theory: Usefulness
- Sample-based theory
- By-product: development of highly efficient algorithms for simulating samples under various population genetics models
- Particularly suitable for molecular data
- Estimates parameters of evolutionary models (vs. the history of a specific locus, as in phylogenetics)

10 The Coalescent Process of Two Sequences
- Consider diploid organisms
- Wright-Fisher model:
  - The sequences in a population at a given generation are a random sample, with replacement, from those in the previous generation
  - Mutations at the locus of interest are selectively neutral (they do not affect reproductive success: all individuals are equally likely to reproduce, all lineages equally likely to coalesce)
- P(coalescence in the previous generation) = $\frac{1}{2N}$, where N is the effective population size
- P(coalescence t+1 generations ago) = $\left(1 - \frac{1}{2N}\right)^{t} \frac{1}{2N}$
- For haploid structures, use N rather than 2N
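
Because two lineages pick the same parent with probability 1/2N in each generation, the time to coalescence is geometric: no coalescence for t generations, then coalescence, gives $(1 - 1/2N)^t \cdot 1/2N$, with mean 2N generations. The sketch below checks this by tracing two lineages backwards. The value 2N = 50, the seed, and the function names are illustrative assumptions.

```python
import random


def p_coalesce_at(t_plus_1, two_n):
    """P(two lineages first share a parent exactly t+1 generations ago)."""
    t = t_plus_1 - 1
    return (1.0 - 1.0 / two_n) ** t * (1.0 / two_n)


def simulate_pair(two_n, rng):
    """Trace two lineages back until both pick the same parent copy."""
    generations = 1
    # Each generation back, each lineage picks a parent uniformly among 2N gene copies.
    while rng.randrange(two_n) != rng.randrange(two_n):
        generations += 1
    return generations


rng = random.Random(1982)  # arbitrary seed
two_n = 50                 # hypothetical number of gene copies (2N)
times = [simulate_pair(two_n, rng) for _ in range(20_000)]
mean_time = sum(times) / len(times)   # should be close to 2N
```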

11 The Coalescent Tree
- Topology is independent of branch lengths
- Branch lengths are independent exponential random variables (waiting times between coalescent events)
- Topology is generated by randomly picking lineages to coalesce -> "all topologies are equally likely"
[Figure: genealogical relationship of a sample of genes, with coalescence intervals T2, T3, T4, T5 back to the MRCA]
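
Since the topology does not depend on branch lengths, it can be generated on its own by repeatedly merging two lineages chosen uniformly at random. The following is a minimal sketch of that idea. The nested-tuple tree representation and the helper names are assumptions made for illustration.

```python
import random


def random_topology(labels, rng):
    """Build a genealogy by merging two uniformly chosen lineages until one remains."""
    lineages = list(labels)
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = (lineages[i], lineages[j])
        # Remove the higher index first so the lower index stays valid.
        for index in sorted((i, j), reverse=True):
            lineages.pop(index)
        lineages.append(merged)
    return lineages[0]  # the root corresponds to the MRCA


def leaves(tree):
    """List the sampled sequences (string labels) at the tips of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


tree = random_topology(["a", "b", "c", "d", "e"], random.Random(7))
```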

12 The Coalescent Time
- Assume: number of mutations in a given period ~ Poisson
- Mean coalescence time of two sequences: 2N generations
- Mean number of mutations between two sequences: $\theta = 4N\mu$ ($\mu$: mutation rate per sequence per generation)
- Underlying assumption: random mating (~ organisms with high mobility)
- Coalescent time: time between two successive coalescent events
- Exponential variable with mean $\frac{2}{k(k-1)}$, in units of 2N generations, where k is the number of ancestral sequences between the two events
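
The waiting times can be drawn directly: while k ancestral lineages remain, the time to the next coalescence is exponential with mean 2/(k(k-1)), taking time in units of 2N generations (an assumption about the slide's scaling). This sketch sums the waiting times to obtain the tree height. Sample size, seed, and replicate count are arbitrary.

```python
import random


def coalescent_times(n, rng):
    """Return {k: T_k}, the time during which k lineages exist (units of 2N generations)."""
    times = {}
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0          # mean waiting time is 2 / (k (k - 1))
        times[k] = rng.expovariate(rate)
    return times


rng = random.Random(19)   # arbitrary seed
n = 5                     # hypothetical sample size
replicates = 20_000
mean_height = sum(
    sum(coalescent_times(n, rng).values()) for _ in range(replicates)
) / replicates
```

For n = 5 the average height should be near 2(1 - 1/5) = 1.6, the sum of the mean waiting times.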

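13 Coalescent Tree Parameters
- P(2 lineages pick the same parent and coalesce) = $\frac{1}{2N}$
- P(2 lineages remain distinct) = $1 - \frac{1}{2N}$
- Expected time to MRCA (height of the tree): $E[T_{MRCA}] = \sum_{k=2}^{n} \frac{2}{k(k-1)} = 2\left(1 - \frac{1}{n}\right)$, in units of 2N generations
- Expected total branch length of the tree: $E[T_{total}] = \sum_{k=2}^{n} k \cdot \frac{2}{k(k-1)} = 2\sum_{i=1}^{n-1} \frac{1}{i}$, in units of 2N generations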
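
Both expectations follow from the mean waiting time 2/(k(k-1)) while k lineages remain: the height adds each waiting time once, and the total branch length weights it by the k branches present. A small exact-arithmetic sketch, assuming time in units of 2N generations (function names are illustrative):

```python
from fractions import Fraction


def expected_tmrca(n):
    """Expected tree height: sum over k = 2..n of 2 / (k (k - 1))."""
    return sum(Fraction(2, k * (k - 1)) for k in range(2, n + 1))


def expected_total_length(n):
    """Expected total branch length: k lineages each contribute the k-lineage waiting time."""
    return sum(k * Fraction(2, k * (k - 1)) for k in range(2, n + 1))


height_10 = expected_tmrca(10)          # Fraction(9, 5), i.e. 1.8
length_4 = expected_total_length(4)     # Fraction(11, 3)
```

The height stays below 2 however large the sample, while the total length keeps growing like a harmonic sum, so adding sequences mostly adds short recent branches.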

14 The Coalescent Theory & Statistical Inference
- Mutation rate
- Age of MRCA
- Recombination rate
- Ancestral population size
- Migration rate

15 Reconstruction of Human Evolutionary History
- Goal: estimate times of evolutionary events (major migrations) and demographic history (population bottlenecks, expansions)
- Haploid sequences: mtDNA, Y chromosome
- Case study: recent common ancestry of the human Y chromosome
- Source: Thomson et al. PNAS 2000; 97:7360-5
- Estimates: expected time to MRCA and ages of certain mutations
- Data: 53-70 chromosomes, sequence variation at three genes (SMCY, DBY, DFFRY) on the Y chromosome

16 Recent common ancestry of Y chromosome
Summary of gene characteristics from sample (source: Table 1 from article)
Gene    Seq length  Sample size  No. polym.  No. substitutions  Mutation rate
SMCY    39,931      53           47 (41)     528                1.32x10^-9
DBY     8,547       70           14 (12)     107                1.25x10^-9
DFFRY   15,642      70           17 (15)     159                1.02x10^-9
All     64,120      43           65 (56)     794                1.24x10^-9
(#) in no. polymorphisms: after removal of length variants, repeat sequences, indels
- For ages of major events: need a mutation rate estimate (single-nucleotide substitutions)
- Based on substitutions between chimpanzee and human sequences
- Mutation rate per site per year = No. subst. / ($2 \times T_{split} \times L$)
- $T_{split}$: time since the chimpanzee-human split (~5M years ago); L: sequence length
- Assumption: selective neutrality of all changes on Y since divergence
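
The mutation rates in the table can be reproduced from the other columns with the formula on this slide. The sketch below takes $T_{split}$ as exactly 5 million years (the slide says approximately) and L as the sequence length column. The dictionary layout and function name are illustrative.

```python
T_SPLIT_YEARS = 5_000_000  # assumed exact value for the "~5M years" on the slide

# gene: (sequence length L, number of human-chimpanzee substitutions), from the table
GENES = {
    "SMCY": (39_931, 528),
    "DBY": (8_547, 107),
    "DFFRY": (15_642, 159),
    "All": (64_120, 794),
}


def mutation_rate(substitutions, length, t_split=T_SPLIT_YEARS):
    """Substitutions per site per year; the factor 2 counts both diverging lineages."""
    return substitutions / (2 * t_split * length)


rates = {gene: mutation_rate(subs, length) for gene, (length, subs) in GENES.items()}
```

Rounded to three significant figures, the computed values agree with the tabulated 1.32, 1.25, 1.02, and 1.24 (all x10^-9).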

17 GENETREE Analysis
- Software: www.stats.ox.ac.uk/~stephens/group/software.html
- Estimates the mean number of mutations: $\theta = 2N_e\mu$
  - $N_e$: effective number of Y chromosomes in the population
  - $\mu$: mutation rate per gene per generation
- Also: expected ages of mutations, time since MRCA
- Assumptions: coalescent process, infinitely-many-sites mutation (mutation rate low enough that each mutation occurs at a new site)
- Four insertions, three deletions, two repeat mutations (different rates from single-nucleotide substitutions)
- Only one segregating site in SMCY appeared to have mutated more than once -> data fit the infinitely-many-sites model

18 Recent common ancestry of Y chromosome
MRCA distribution under constant population:
Gene    T_MRCA (1)  95% CI             T_MRCA (2)  95% CI
SMCY    0.56        (0.40, 0.82)       85,000      (61,000, 125,000)
DBY     0.83        (0.60, 1.10)       154,000     (112,000, 206,000)
DFFRY   0.96        (0.55, 1.21)       120,000     (69,000, 152,000)
All     0.55        (0.36, 0.98)       84,000      (55,000, 149,000)
MRCA distribution under exponential population growth:
Gene    T_MRCA (1)  95% CI             T_MRCA (2)  95% CI
SMCY    0.0731      (0.0618, 0.1030)   48,000      (41,000, 68,000)
DBY     0.0538      (0.0382, 0.0975)   55,000      (39,000, 100,000)
DFFRY   0.0582      (0.0440, 0.0720)   53,000      (40,000, 65,000)
All     0.0853      (0.0580, 0.2070)   59,000      (40,000, 140,000)
(1) Expected age in $N_e$ generations. (2) Value in years = $N_e$*25
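
One possible reading of the footnotes is that an age T in units of $N_e$ generations converts to years as T x $N_e$ x 25, with 25 years per generation. Under that assumption, the sketch below performs the conversion and, conversely, backs out the $N_e$ implied by a row of the table. The function names are hypothetical, and because the tabulated years are rounded the implied $N_e$ is only approximate.

```python
GENERATION_YEARS = 25  # years per generation, from footnote 2


def years_from_scaled(t_scaled, n_e, generation_years=GENERATION_YEARS):
    """Convert an age in units of N_e generations to years (assumed interpretation)."""
    return t_scaled * n_e * generation_years


def implied_n_e(t_scaled, years, generation_years=GENERATION_YEARS):
    """Effective size N_e implied by a (scaled age, age in years) pair."""
    return years / (t_scaled * generation_years)


# SMCY under constant population size: T = 0.56, reported as 85,000 years
n_e_smcy = implied_n_e(0.56, 85_000)
```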

19 GENETREE Analysis
Expected ages of mutations in tree:
- Mutation 1: 47,000 years (35,000; 89,000) – male movement out of Africa
- Mutation 2: 40,000 years (31,000; 79,000) – beginning of global expansion
[Figure: gene tree with counts by region: Africa, Asia, Oceania]

20 Future Avenues
- Population genetics models: incorporation of migration, population growth, recombination, natural selection
- Longitudinal analysis
- Evolutionary analysis of quantitative trait loci (QTL)
- Properties of coalescent theory (CT):
  - Accuracy of the coalescent approximation under combinations of population size, sample size, mutation rate
  - Properties of estimators under MCMC

21 References
- Handbook of Statistical Genetics, 2nd edition, Vol. 2
- Nature 2002; 3:380-390
- Theoretical Population Biology 1999; 56:1-10

