Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture Duke University Machine Learning Group Presented by Kai Ni August.


1 Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture Duke University Machine Learning Group Presented by Kai Ni August 24, 2006 Paper by E. Xing, K. Sohn, M. Jordan and Y. Teh, ICML 2006

2 Outline Background; Dirichlet process mixture; Hierarchical Dirichlet process mixture; Application to haplotype inference

3 Motivation Problem – Uncovering the haplotypes of single nucleotide polymorphisms (SNPs) within and between populations. Methods – Coalescence, finite and infinite mixtures, and maximal parsimony. Applications – Biological and medical analysis; – Genetic demography studies.

4 Background A SNP haplotype is a list of alleles at contiguous sites in a local region of a single chromosome. A haplotype is inherited as a unit. For diploid organisms, two haplotypes together make up a genotype, which is a list of unordered pairs of alleles in a region. Haplotype inference from genotype data can be formulated as a mixture model. An HDP mixture is used in this paper.
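
Why haplotype inference is ambiguous can be seen in a small sketch. It assumes binary alleles coded 0/1 and uses hypothetical helper names (`genotype`, `compatible_pairs`) that are not from the paper: the genotype keeps only the unordered allele pair at each site, so several haplotype pairs can explain the same genotype.

```python
from itertools import product

def genotype(h1, h2):
    """Combine two haplotypes into a genotype: one unordered allele pair per site."""
    return [tuple(sorted(pair)) for pair in zip(h1, h2)]

def compatible_pairs(g):
    """Enumerate the distinct unordered haplotype pairs consistent with genotype g."""
    pairs = set()
    # At each site, either allele of the pair may sit on the first haplotype.
    for choice in product([0, 1], repeat=len(g)):
        h1 = tuple(site[c] for site, c in zip(g, choice))
        h2 = tuple(site[1 - c] for site, c in zip(g, choice))
        pairs.add(frozenset([h1, h2]))  # frozenset makes the pair unordered
    return pairs

# Two heterozygous sites and one homozygous site (illustrative values).
g = genotype([0, 1, 1], [1, 1, 0])
candidates = compatible_pairs(g)
```

With two heterozygous sites there are two candidate haplotype pairs; the mixture model's job is to decide which candidates are plausible given the whole population.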

5 Dirichlet Processes A single clustering problem can be analyzed with a Dirichlet process (DP).

6 DP mixture model G can be viewed as a mixture model with infinitely many components.
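
One common way to see the "infinitely many components" in practice is the Chinese restaurant process (urn) view of a DP, in which the number of occupied clusters grows with the data. The following is a minimal sketch of sampling cluster labels from that prior; the function name `crp_assignments` and the concentration parameter name `alpha` are assumptions for illustration, not notation from the slides.

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sample cluster labels for n points from a Chinese restaurant process prior."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently in cluster k
    labels = []
    for _ in range(n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(50, alpha=1.0)
```

Larger `alpha` tends to produce more clusters; no upper bound on the number of clusters is fixed in advance.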

7 DP-Haplotyper Each individual i from ethnic group j has a genotype over T contiguous SNPs and a corresponding pair of paternal/maternal haplotypes. Each haplotype H is assumed to be a random perturbation of an ancestral haplotype A, or founder. DP-Haplotyper is a DP mixture model for a single population group.

8 Graphical model of DP-Haplotyper

9 Hierarchical Dirichlet Process Each group is modeled with a DP $G_j$, and the group-specific DPs are linked via a global DP $G_0$. $G_0$ defines the set of mixture components used by all the groups. Different groups share the same set of mixture components (underlying clusters), but with different mixture proportions.

10 HDP mixture model An HDP can be used as the prior distribution over the factors for nested group data. Consider two levels of DPs. $G_0$ links the child DPs $G_j$ and forces them to share components. The $G_j$ are conditionally independent given $G_0$.
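
The two-level construction can be written out as a generative model. The block below is a sketch in the notation commonly used for HDPs; the symbols $\gamma$ and $\alpha_0$ (concentration parameters), $H$ (top-level base measure), $F$ (observation distribution), and $x_{ji}$ (observation i in group j) are assumptions, since the slide's own equations are not preserved in this transcript.

```latex
\begin{align*}
G_0 \mid \gamma, H &\sim \mathrm{DP}(\gamma, H) \\
G_j \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0) && \text{for each group } j \\
\phi_{ji} \mid G_j &\sim G_j && \text{for each observation } i \text{ in group } j \\
x_{ji} \mid \phi_{ji} &\sim F(\phi_{ji})
\end{align*}
```

Because $G_0$ is discrete with probability one, every $G_j$ places its mass on the same atoms, which is what forces the groups to share components.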

11 HDP – Chinese Restaurant Franchise First level: within each group, a DP mixture. – $\phi_{j1}, \ldots, \phi_{j(i-1)}$ are i.i.d. random variables distributed according to $G_j$; $\psi_{j1}, \ldots, \psi_{jT_j}$ are the values taken on by $\phi_{j1}, \ldots, \phi_{j(i-1)}$; $n_{jt}$ is the number of $\phi_{ji'} = \psi_{jt}$, $0 < i' < i$. Second level: across groups, sharing clusters. – The base measure of each group is a draw from a DP. – $\theta_1, \ldots, \theta_K$ are the values taken on by $\psi_{j1}, \ldots, \psi_{jT_j}$; $m_k$ is the number of $\psi_{jt} = \theta_k$ over all $j, t$.
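
The counts defined on this slide enter the standard Chinese restaurant franchise predictive distributions, sketched below. The concentration parameters $\alpha_0$ and $\gamma$ and the top-level base measure $H$ are assumed names following the usual HDP convention; they do not appear in the transcript.

```latex
\begin{align*}
\phi_{ji} \mid \phi_{j1}, \ldots, \phi_{j(i-1)}, \alpha_0, G_0
  &\sim \sum_{t=1}^{T_j} \frac{n_{jt}}{i - 1 + \alpha_0}\, \delta_{\psi_{jt}}
      + \frac{\alpha_0}{i - 1 + \alpha_0}\, G_0 \\
\psi_{jt} \mid \text{all previously drawn } \psi, \gamma, H
  &\sim \sum_{k=1}^{K} \frac{m_k}{\sum_{k'} m_{k'} + \gamma}\, \delta_{\theta_k}
      + \frac{\gamma}{\sum_{k'} m_{k'} + \gamma}\, H
\end{align*}
```

The first line is the within-group urn (reuse table t in proportion to $n_{jt}$, or open a new table); the second is the shared urn across groups (reuse component k in proportion to $m_k$, or draw a new component from $H$).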

12 HDP-Haplotyper model

13 Parameterization of the model Underlying mixture component $A_k := [A_{k,1}, \ldots, A_{k,T}]$: the founding haplotype configuration. Base measure: p(A) is a uniform distribution and p( ) is a beta distribution. Remaining components: the inheritance model and the genotyping model.
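
The inheritance and genotyping equations are not preserved in this transcript, so the following is only a sketch of the general idea, under loudly stated assumptions: alleles are binary, each site of a haplotype copies the founder allele independently with probability `theta` (a hypothetical fidelity parameter) and otherwise flips, and the genotype is observed without noise. The paper's actual models may differ.

```python
import random

ALLELES = (0, 1)  # assumption: binary SNP alleles

def inherit(founder, theta, rng):
    """Perturb a founder haplotype: keep each allele with probability theta (assumed model)."""
    hap = []
    for a in founder:
        if rng.random() < theta:
            hap.append(a)  # allele inherited unchanged
        else:
            hap.append(rng.choice([b for b in ALLELES if b != a]))  # mutated allele
    return hap

def observe_genotype(h_pat, h_mat):
    """Noise-free genotyping (a simplification): unordered allele pair per site."""
    return [tuple(sorted(p)) for p in zip(h_pat, h_mat)]

rng = random.Random(1)
founder = [0, 1, 1, 0]
h_pat = inherit(founder, 0.95, rng)
h_mat = inherit(founder, 0.95, rng)
g = observe_genotype(h_pat, h_mat)
```

In a full model, the founder for each haplotype would itself be chosen by the mixture, and `theta` would be drawn from its prior rather than fixed.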

14 Gibbs Sampling Gibbs sampling variants include: (list not preserved). The sampling scheme is similar to a two-level urn model.
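
The two-level urn can be illustrated by sampling from the prior alone. This is not the paper's Gibbs sampler, which would also weight each choice by the likelihood of the data; it is a minimal sketch in which the function name `sample_franchise` and the parameter names `alpha0` and `gamma` are assumptions.

```python
import random

def sample_franchise(group_sizes, alpha0, gamma, seed=0):
    """Assign each item in each group to a global component via a two-level urn."""
    rng = random.Random(seed)
    m = []        # m[k]: number of tables (over all groups) using component k
    result = []
    for n_j in group_sizes:
        n_jt = []   # n_jt[t]: number of items at table t of this group
        dish = []   # dish[t]: global component index used by table t
        comps = []
        for _ in range(n_j):
            # Bottom-level urn: existing table t w.p. ~ n_jt[t], new table w.p. ~ alpha0.
            t = rng.choices(range(len(n_jt) + 1), weights=n_jt + [alpha0])[0]
            if t == len(n_jt):
                # Top-level urn: existing component k w.p. ~ m[k], new one w.p. ~ gamma.
                k = rng.choices(range(len(m) + 1), weights=m + [gamma])[0]
                if k == len(m):
                    m.append(0)
                m[k] += 1
                n_jt.append(0)
                dish.append(k)
            n_jt[t] += 1
            comps.append(dish[t])
        result.append(comps)
    return result, m

assignments, table_counts = sample_franchise([20] * 5, alpha0=1.0, gamma=1.0)
```

Because new tables in every group draw from the same top-level urn, component indices recur across groups, while each group's proportions differ.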

15 Simulated data 100 individuals from 5 groups (20 each). Each group has 2 shared founders and 3 unique founders, for a total of 17 founders (2 shared + 5 × 3 unique).
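
The founder count can be checked with a short sketch. It assumes the 2 shared founders are common to all 5 groups, which is the reading consistent with the stated total; the founder labels are hypothetical.

```python
n_groups, per_group = 5, 20
n_shared, n_unique_per_group = 2, 3

# Assumption: the same 2 shared founders appear in every group.
shared_founders = [f"shared_{s}" for s in range(n_shared)]
founders_by_group = {
    j: shared_founders + [f"group{j}_unique_{u}" for u in range(n_unique_per_group)]
    for j in range(n_groups)
}

all_founders = {f for fs in founders_by_group.values() for f in fs}
total_individuals = n_groups * per_group
print(len(all_founders), total_individuals)
```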

16 Real data International HapMap Project, containing genotypes from four populations.

17 Conclusion The authors proposed an HDP mixture model for haplotype inference across multiple populations. The HDP prior couples multiple heterogeneous populations and facilitates sharing mixture components across multiple infinite mixture models. In the future, longer SNP sequences will be considered. The HDP can also be generalized to problems in which the group labels are unknown and must be inferred.

