
A New Nonparametric Bayesian Model for Genetic Recombination in Open Ancestral Space Presented by Chunping Wang Machine Learning Group, Duke University.


1 A New Nonparametric Bayesian Model for Genetic Recombination in Open Ancestral Space Presented by Chunping Wang Machine Learning Group, Duke University February 26, 2007 Paper by E. P. Xing and K-A. Sohn

2 Outline: Terminology and Introduction; DP Mixtures for Non-recombination Inheritance; HMDP for Recombination; Results; Conclusions

3 Terminology and Introduction (1) Allele: a viable DNA coding on a chromosome (an observation). Locus: the location of an allele (the index of an observation). Haplotype: a sequence of alleles (a data sequence). Recombination: an exchange of pieces between paired chromosomes (a state transition). Mutation: any change to a haplotype during inheritance (an emission).

4 Terminology and Introduction (2) [Figure: ancestors and descendants]

5 Terminology and Introduction (3) Problems: 1. Ancestral inference: recovering ancestral haplotypes; 2. Recombination analysis: inferring the recombination hotspots; 3. Ancestral mapping: inferring the ancestral origin of each allele in each modern haplotype.

6 DP Mixtures for Non-recombination Inheritance (1) Non-recombination: only mutation may occur during inheritance; each modern haplotype originates from a single ancestor. This holds only for haplotypes spanning a short region of a chromosome.

7 DP Mixtures for Non-recombination Inheritance (2) [Model equations not preserved in this transcript.] The distinct values of the mixture parameters each denote the pair formed by the k-th ancestor and the mutation parameter corresponding to the k-th ancestor.
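The clustering behaviour of a DP mixture can be illustrated with the Chinese restaurant process, which assigns each haplotype either to an existing ancestor or to a new one. This is a minimal sketch, not the presenter's code: the function name `crp_assignments` and the concentration parameter `tau` are hypothetical, and ancestors are represented only by integer labels.

```python
import random

def crp_assignments(n, tau, seed=0):
    """Assign n haplotypes to ancestor labels via the Chinese restaurant process.

    tau (assumed name) is the DP concentration parameter; larger values
    make new ancestors more likely.
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of haplotypes currently assigned to ancestor k
    labels = []   # labels[i] = ancestor label of haplotype i
    for _ in range(n):
        # An existing ancestor k is chosen with weight counts[k];
        # a brand-new ancestor is chosen with weight tau.
        weights = counts + [tau]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(200, tau=1.0)
print(len(counts), "ancestors used for", len(labels), "haplotypes")
```

The number of ancestors is not fixed in advance; it grows with the data, which is the property that makes the DP prior attractive when the number of ancestral haplotypes is unknown.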

8 DP Mixtures for Non-recombination Inheritance (3)

9 HMDP for Recombination (1) For long haplotypes possibly bearing multiple ancestors, we consider recombinations (state-transitions across discrete space-interval).

10 HMDP for Recombination (2) Each row of the HMM transition matrix is a DP. These DPs are linked by a top-level master DP and share the same set of target states. The mixing proportions of the j-th lower-level DP form the j-th row of the transition matrix. [Symbols not preserved in this transcript.]
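To make the two-level structure concrete, the following sketch draws a transition matrix from a finite truncation of a hierarchical DP: shared top-level weights come from truncated stick-breaking, and each row is a Dirichlet draw centred on those weights. The truncation level `K`, the concentration parameters `gamma` and `alpha`, and all function names are assumptions for illustration; the slides do not specify them, and the true model has no fixed truncation.

```python
import random

def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking weights of length K that sum to 1."""
    beta, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)
        beta.append(remaining * v)
        remaining *= 1.0 - v
    beta.append(remaining)  # leftover mass goes to the last component
    return beta

def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_transition_matrix(K=10, gamma=1.0, alpha=5.0, seed=0):
    """Top-level weights beta are shared; every row is drawn around beta."""
    rng = random.Random(seed)
    beta = stick_breaking(gamma, K, rng)
    # max(...) guards against a zero shape parameter from numerical underflow.
    rows = [dirichlet([max(alpha * b, 1e-12) for b in beta], rng)
            for _ in range(K)]
    return beta, rows

beta, rows = sample_transition_matrix()
```

Because every row is drawn around the same `beta`, all rows place their mass on the same set of target states, which is the linking role the master DP plays.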

11 HMDP for Recombination (3) [Figure: modern haplotype and ancestor haplotypes] For the i-th modern haplotype H_i, a set of indicators, one per locus, specifies the corresponding ancestral haplotype at each locus. When no recombination takes place during the inheritance process producing H_i, all of its indicators point to the same ancestor; when a recombination occurs between loci t and t+1, the indicators at t and t+1 differ.

12 HMDP for Recombination (4) Introduce a Poisson point process to control the duration of non-recombinant inheritance (space-inhomogeneous). Denote by d the physical distance between loci t and t+1, and by r the recombination rate per unit distance. Then x, the number of recombinations between the two loci, is Poisson distributed with mean dr.
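Under a Poisson point process with rate r per unit distance, the number of recombinations over a distance d has mean dr, so the probability of no recombination is exp(-dr). A small sketch, with hypothetical function names:

```python
import math

def poisson_pmf(x, d, r):
    """P(x recombinations) between two loci at distance d, rate r per unit distance."""
    mean = d * r
    return math.exp(-mean) * mean ** x / math.factorial(x)

def prob_no_recombination(d, r):
    """P(x = 0): the two adjacent loci are inherited from the same ancestor."""
    return math.exp(-d * r)

# Closely spaced loci are rarely separated; distant ones almost always are.
for d in (0.1, 1.0, 10.0):
    print(d, prob_no_recombination(d, r=0.5))
```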

13 HMDP for Recombination (5) Combining the Poisson process with the standard stationary HMDP gives the non-stationary state transition probability [formula not preserved in this transcript]. As d or r goes to infinity, the inhomogeneous HMDP model reduces to a standard HMDP.
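The transition formula is not preserved in this transcript. One form consistent with the stated limiting behaviour is a mixture: with probability exp(-dr) no recombination occurs and the state is kept, and otherwise the next state is drawn from the stationary HMDP row. The sketch below assumes that form; `pi` stands for a (truncated) stationary transition matrix and the function name is hypothetical.

```python
import math

def inhomogeneous_transition(pi, d, r):
    """Assumed form: P[j][k] = exp(-d*r) * [j == k] + (1 - exp(-d*r)) * pi[j][k]."""
    stay = math.exp(-d * r)  # probability of no recombination over distance d
    K = len(pi)
    return [[stay * (1.0 if j == k else 0.0) + (1.0 - stay) * pi[j][k]
             for k in range(K)]
            for j in range(K)]

pi = [[0.2, 0.8],
      [0.5, 0.5]]
near = inhomogeneous_transition(pi, d=0.0, r=1.0)   # identity: no transition possible
far = inhomogeneous_transition(pi, d=1e6, r=1.0)    # recovers the stationary rows
```

At d = 0 the matrix is the identity, and for very large dr it coincides with `pi`, matching the statement that the inhomogeneous model reduces to a standard HMDP in that limit.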

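14 HMDP for Recombination (6) Inference. The emission function: [formula and parameter definitions not preserved in this transcript]. The prior base: uniform. Integrating the emission function over its parameter gives the marginal likelihood: [formula not preserved].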

15 HMDP for Recombination (7) Inference: two sampling stages. 1. Sample the ancestor indicators given all haplotypes h and the most recently sampled ancestor pool a; 2. Sample every ancestor A_k given all haplotypes h and the current indicators. Combining the HDP prior and the marginal likelihood, we can infer the posterior for the indicators and the ancestors, which are the variables of interest.

16 Results (1) Simulated data: 30 populations, each including 200 haplotypes derived from K=5 ancestral haplotypes; T=100. Compared: HMDP and HMMs with K=3, 5 and 10. [Figure: average ancestor reconstruction errors for the five ancestors] Even the HMM with K=5 cannot beat the HMDP.

17 Results (2) [Figure: box plot of the empirical recombination rates. The vertical gray lines mark the pre-specified recombination hotspots; two thresholds are shown, labelled Threshold 1 and Threshold 2.]

18 Results (3) Population maps: 1. true map; 2. HMDP; 3-5. HMMs with K=3,5,10 Each vertical thin line – one modern haplotype; Each color – one ancestral haplotype. Measure for accuracy: the mean squared distance to the true map

19 Results (4) Real haplotype data set 1: Daly data, a single population, 512 haplotypes, T=103. [Figure. Bottom: empirical recombination rates. Upper vertical lines: recombination hotspots. Red dotted lines: HMM; blue dashed lines: MDL; black solid lines: HMDP.]

20 Results (5) A Gaussian mixture is fitted to the empirical recombination rates in order to choose the threshold.

21 Results (6) Estimated population map Each vertical thin line – one modern haplotype; Each color – one ancestral haplotype.

22 Conclusions The HMDP model is an application and extension of the HDP to population genetics; the HDP allows the state space of the HMM to be infinite, which makes it suitable for inferring an unknown number of ancestral haplotypes; the HMDP model also allows the recombination rates to be non-stationary; the HMDP model can jointly infer a number of important genetic variables.

