Course: Neural Networks, Instructor: Professor L.Behera. -Joy Bhattacharjee, Department of ChE, IIT Kanpur. Johann Peter Gustav Lejeune Dirichlet.


2 Course: Neural Networks, Instructor: Professor L.Behera. -Joy Bhattacharjee, Department of ChE, IIT Kanpur. Johann Peter Gustav Lejeune Dirichlet

3 What is a Dirichlet Process? The Dirichlet process is a stochastic process used in Bayesian nonparametric models of data, particularly in Dirichlet process mixture models (also known as infinite mixture models). It is a distribution over distributions, i.e. each draw from a Dirichlet process is itself a distribution. It is called a Dirichlet process because its finite-dimensional marginal distributions are Dirichlet distributed.
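To make "each draw is itself a distribution" concrete, here is a minimal sketch that approximates one draw G from a Dirichlet process by truncated stick breaking. The function name `sample_dp_truncated`, the truncation level `num_atoms`, the concentration value 2.0 and the standard normal base distribution are all assumptions chosen for illustration; they do not come from the slides.

```python
import numpy as np

def sample_dp_truncated(alpha, base_sampler, num_atoms=500, seed=0):
    """Approximate one draw G ~ DP(alpha, G0) with a finite number of atoms."""
    rng = np.random.default_rng(seed)
    # Fraction broken off the remaining stick at each step: Beta(1, alpha).
    betas = rng.beta(1.0, alpha, size=num_atoms)
    # Length of stick left before each break: prod_{j<k} (1 - beta_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    weights /= weights.sum()  # renormalise the mass lost to truncation
    # Atom locations are i.i.d. draws from the base distribution G0.
    atoms = base_sampler(rng, num_atoms)
    return atoms, weights

# G0 is assumed to be a standard normal here (illustrative choice).
atoms, weights = sample_dp_truncated(2.0, lambda rng, n: rng.normal(0.0, 1.0, size=n))

# G is a discrete distribution, so sampling from it can repeat values.
samples = np.random.default_rng(1).choice(atoms, size=10, p=weights)
```

The returned pair (`atoms`, `weights`) is the draw: a discrete probability distribution, not a single number.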

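6 Dirichlet Priors
– A distribution over possible parameter vectors of the multinomial distribution
– Thus values must lie in the probability simplex: $\theta_i \ge 0$ and $\sum_{i=1}^{k} \theta_i = 1$ (for k outcomes this is a (k-1)-dimensional simplex)
– The Beta distribution is the 2-parameter special case
– Expectation: $E[\theta_i] = \alpha_i / \sum_{j=1}^{k} \alpha_j$
– A conjugate prior to the multinomial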
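Conjugacy to the multinomial means the posterior is again a Dirichlet, obtained by adding the observed counts to the prior parameters. The sketch below shows that update; the values of `alpha` and `counts` are made up for illustration.

```python
import numpy as np

alpha = np.array([1.0, 2.0, 3.0])   # prior Dir(alpha); illustrative values
counts = np.array([10, 0, 5])       # hypothetical multinomial counts per category

prior_mean = alpha / alpha.sum()    # expectation of the prior

# Conjugate update: posterior is Dir(alpha + counts).
posterior = alpha + counts
posterior_mean = posterior / posterior.sum()

# Monte Carlo check: every draw lies on the simplex, and the sample mean
# should be close to the analytic posterior mean.
rng = np.random.default_rng(0)
draws = rng.dirichlet(posterior, size=20000)
```

With two categories the same update reduces to the familiar Beta-binomial case.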

7 What is a Dirichlet Distribution? Methods to generate a Dirichlet distribution:
1. Pólya's urn
2. Stick breaking
3. Chinese restaurant process

8 Samples from a DP

12 Dirichlet Distribution

13 Pólya's urn scheme: Suppose we want to generate a realization of Q ~ Dir(α). To start, put α_i balls of color i, for i = 1, 2, ..., k, in an urn. Note that α_i > 0 is not necessarily an integer, so we may have a fractional or even an irrational number of balls of color i in our urn! At each iteration, draw one ball uniformly at random from the urn, and then place it back into the urn along with an additional ball of the same color. As we iterate this procedure more and more times, the proportions of balls of each color will converge to a pmf that is a sample from the distribution Dir(α).
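The urn scheme above can be simulated directly. This is a minimal sketch: the function name `polya_urn`, the number of iterations and the example parameter vector are assumptions for illustration, and a finite number of draws only approximates the limiting proportions.

```python
import numpy as np

def polya_urn(alpha, num_draws=300, rng=None):
    """Run the urn for num_draws steps and return the colour proportions."""
    rng = np.random.default_rng(0) if rng is None else rng
    balls = np.asarray(alpha, dtype=float).copy()  # fractional "balls" are allowed
    for _ in range(num_draws):
        # Draw a colour with probability proportional to its current amount.
        color = rng.choice(len(balls), p=balls / balls.sum())
        balls[color] += 1.0                        # return it plus one more of that colour
    return balls / balls.sum()

alpha = [0.5, 1.0, 2.5]                            # illustrative parameters
q = polya_urn(alpha)                               # one approximate sample from Dir(alpha)

# Averaging many independent urns should approach the Dirichlet mean alpha / sum(alpha).
rng = np.random.default_rng(1)
mean_q = np.mean([polya_urn(alpha, rng=rng) for _ in range(100)], axis=0)
```

Each run ends at a different pmf `q`; only the average over runs is tied to α / Σα.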

14 Mathematical form:

15 Stick Breaking Process The stick-breaking approach to generating a random vector with a Dir(α) distribution involves iteratively breaking a stick of length 1 into k pieces in such a way that the lengths of the k pieces follow a Dir(α) distribution. The following figure illustrates this process with simulation results.
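One standard way to carry out this stick breaking is to break off each piece as a Beta-distributed fraction of whatever stick remains. The slide does not spell out the fractions, so the Beta parameters below are an assumption based on the usual marginal property of the Dirichlet; the function name `dirichlet_stick_breaking` is illustrative.

```python
import numpy as np

def dirichlet_stick_breaking(alpha, rng):
    """Generate one Dir(alpha) vector by breaking a unit stick into k pieces."""
    alpha = np.asarray(alpha, dtype=float)
    k = len(alpha)
    q = np.zeros(k)
    remaining = 1.0
    for j in range(k - 1):
        # Fraction of the remaining stick given to piece j:
        # Beta(alpha_j, sum of the alphas not yet used).
        u = rng.beta(alpha[j], alpha[j + 1:].sum())
        q[j] = u * remaining
        remaining -= q[j]
    q[k - 1] = remaining  # the last piece is whatever is left
    return q

rng = np.random.default_rng(0)
draws = np.array([dirichlet_stick_breaking([1.0, 2.0, 3.0], rng) for _ in range(20000)])
```

Every row of `draws` sums to 1, and the column means should be close to α / Σα.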

16 Stick Breaking Process (figure labels: G0, G)

17 Chinese Restaurant Process

19 Nested CRP To generate a document given a tree with L levels:
– Choose a path from the root of the tree to a leaf
– Draw a vector of topic mixing proportions from an L-dimensional Dirichlet
– Generate the words in the document from a mixture of the topics along the path, with these mixing proportions
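The three steps above can be sketched as follows. This is only an illustration under several assumptions not stated on the slide: the path is chosen by running a CRP with parameter `gamma` at every node, each node's topic is drawn lazily from a symmetric Dirichlet with parameter `eta` over a vocabulary of integer word ids, and the mixing proportions use a uniform Dirichlet. All names (`ncrp_path`, `generate_document`, `tree_counts`, `topics`) are hypothetical.

```python
import numpy as np

def ncrp_path(tree_counts, depth, gamma, rng):
    """Choose a root-to-leaf path; tree_counts maps a node to its child visit counts."""
    path = ()                                       # the root is the empty prefix
    for _ in range(depth - 1):
        children = tree_counts.setdefault(path, [])
        probs = np.array(children + [gamma], dtype=float)
        probs /= probs.sum()
        child = int(rng.choice(len(probs), p=probs))
        if child == len(children):                  # open a new branch
            children.append(0)
        children[child] += 1
        path = path + (child,)
    return path

def generate_document(tree_counts, topics, depth, num_words, vocab_size,
                      rng, gamma=1.0, eta=0.5):
    path = ncrp_path(tree_counts, depth, gamma, rng)
    nodes = [path[:level] for level in range(depth)]  # one node per level
    for node in nodes:
        if node not in topics:                        # lazily create the node's topic
            topics[node] = rng.dirichlet(np.full(vocab_size, eta))
    theta = rng.dirichlet(np.ones(depth))             # L-dimensional mixing proportions
    levels = rng.choice(depth, size=num_words, p=theta)
    words = [int(rng.choice(vocab_size, p=topics[nodes[l]])) for l in levels]
    return path, theta, words

rng = np.random.default_rng(0)
tree_counts, topics = {}, {}
docs = [generate_document(tree_counts, topics, depth=3, num_words=20,
                          vocab_size=50, rng=rng) for _ in range(3)]
```

Because `tree_counts` is shared across calls, later documents tend to reuse popular branches, which is the "rich get richer" behaviour of the CRP.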

20 Nested CRP (figure panels: Day 1, Day 2, Day 3)

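21 Properties of the DP
Let $(\Theta, \mathcal{B})$ be a measurable space, $G_0$ be a probability measure on the space, and $\alpha$ be a positive real number.
A Dirichlet process is any distribution of a random probability measure $G$ over $(\Theta, \mathcal{B})$ such that, for all finite partitions $(A_1, \dots, A_r)$ of $\Theta$,
$(G(A_1), \dots, G(A_r)) \sim \mathrm{Dir}(\alpha G_0(A_1), \dots, \alpha G_0(A_r))$
Values drawn from G, where G is a draw from the DP, are generally not distinct.
The number of distinct values grows with O(log n).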
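The O(log n) growth in the number of distinct values can be checked with a Chinese restaurant process simulation, where each table corresponds to one distinct value. This is a sketch: the function name `crp_table_counts`, the concentration `alpha_dp = 1.0`, and the sample sizes are assumptions for illustration.

```python
import numpy as np

def crp_table_counts(n, alpha, rng):
    """Seat n customers; return the list of table sizes."""
    tables = []
    for i in range(n):
        # Existing table t with prob size_t / (i + alpha), new table with prob alpha / (i + alpha).
        probs = np.array(tables + [alpha], dtype=float) / (i + alpha)
        t = int(rng.choice(len(probs), p=probs))
        if t == len(tables):
            tables.append(1)
        else:
            tables[t] += 1
    return tables

rng = np.random.default_rng(0)
n, alpha_dp = 1000, 1.0
runs = [crp_table_counts(n, alpha_dp, rng) for _ in range(20)]
mean_tables = np.mean([len(t) for t in runs])

# Exact expected number of tables: sum_i alpha / (alpha + i), roughly alpha * log(n).
expected_tables = sum(alpha_dp / (alpha_dp + i) for i in range(n))
```

Comparing `mean_tables` with `expected_tables` for increasing `n` shows the slow, logarithmic growth.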

22 In general, an infinite set of random variables is said to be infinitely exchangeable if for every finite subset $\{x_1, \dots, x_n\}$ and for any permutation $\sigma$ we have $p(x_1, \dots, x_n) = p(x_{\sigma(1)}, \dots, x_{\sigma(n)})$. Note that infinite exchangeability is not the same as being independent and identically distributed (i.i.d.)! Using de Finetti's theorem, it is possible to show that our draws are infinitely exchangeable. Thus the mixture components may be sampled in any order.
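Exchangeability without independence can be verified exactly on a small two-colour Pólya urn, used here as a finite stand-in for the DP draws discussed above. The helper name `urn_sequence_probability` and the parameter values are assumptions for illustration; exact fractions avoid floating-point noise.

```python
from fractions import Fraction
from itertools import permutations

def urn_sequence_probability(colors, alpha):
    """Exact probability of observing this colour sequence from a Polya urn."""
    balls = [Fraction(a) for a in alpha]
    prob = Fraction(1)
    for c in colors:
        prob *= balls[c] / sum(balls)  # chance of drawing colour c now
        balls[c] += 1                  # reinforce the drawn colour
    return prob

alpha = [Fraction(1, 2), Fraction(3, 2)]
seq = (0, 1, 1, 0)

# Exchangeable: every reordering of the sequence has the same probability.
probs = {urn_sequence_probability(p, alpha) for p in permutations(seq)}

# Not i.i.d.: P(0, 0) differs from P(0) * P(0).
p_first = urn_sequence_probability((0,), alpha)
p_both = urn_sequence_probability((0, 0), alpha)
```

Here `probs` collapses to a single value, while `p_both` is larger than `p_first ** 2` because a first draw of colour 0 makes a second one more likely.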

23 Mixture Model Inference
– We want to find a clustering of the data: an assignment of values to the hidden class variable
– Sometimes we also want the component parameters
– In most finite mixture models, this can be found with EM
– The Dirichlet process is a non-parametric prior, and doesn't permit EM
– We use Gibbs sampling instead
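As an illustration of Gibbs sampling for a DP mixture, here is a minimal collapsed sampler for one-dimensional data. The slides do not specify a model, so everything here is an assumption: Gaussian components with known variance `sigma2`, a normal prior N(`mu0`, `tau2`) on each component mean, CRP-style assignment probabilities, and the names `log_predictive` and `dp_mixture_gibbs`.

```python
import numpy as np

def log_predictive(x, n, s, mu0, tau2, sigma2):
    """Log density of x under a cluster holding n points with sum s (mean integrated out)."""
    prec = 1.0 / tau2 + n / sigma2
    mean = (mu0 / tau2 + s / sigma2) / prec
    var = 1.0 / prec + sigma2
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def dp_mixture_gibbs(x, alpha=1.0, mu0=0.0, tau2=25.0, sigma2=1.0, sweeps=30, seed=0):
    rng = np.random.default_rng(seed)
    z = np.zeros(len(x), dtype=int)            # start with everything in one cluster
    counts, sums = {0: len(x)}, {0: float(np.sum(x))}
    for _ in range(sweeps):
        for i in range(len(x)):
            k = int(z[i])                      # remove point i from its cluster
            counts[k] -= 1
            sums[k] -= x[i]
            if counts[k] == 0:
                del counts[k], sums[k]
            labels = list(counts)
            # Existing cluster: weight n_c * predictive; new cluster: alpha * prior predictive.
            logp = [np.log(counts[c]) + log_predictive(x[i], counts[c], sums[c], mu0, tau2, sigma2)
                    for c in labels]
            logp.append(np.log(alpha) + log_predictive(x[i], 0, 0.0, mu0, tau2, sigma2))
            p = np.exp(np.array(logp) - max(logp))
            p /= p.sum()
            j = int(rng.choice(len(p), p=p))
            if j == len(labels):               # open a new cluster
                k_new = max(counts, default=-1) + 1
                counts[k_new], sums[k_new] = 0, 0.0
            else:
                k_new = labels[j]
            z[i] = k_new
            counts[k_new] += 1
            sums[k_new] += x[i]
    return z

# Two well-separated synthetic groups (made-up data for the example).
data_rng = np.random.default_rng(42)
x = np.concatenate([data_rng.normal(-5.0, 1.0, 30), data_rng.normal(5.0, 1.0, 30)])
z = dp_mixture_gibbs(x)
```

The number of clusters is never fixed in advance; it is whatever set of labels survives in `z` after the sweeps.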

24 Finite mixture model

25 Infinite mixture model

26 DP Mixture model

27 Agglomerative Clustering
Pros: Doesn't need a generative model (number of clusters, parametric distribution)
Cons: Ad hoc, no probabilistic foundation, intractable for large data sets
(figure labels: Num Clusters, Max Distance)

28 Mixture Model Clustering
Examples: K-means, mixture of Gaussians, Naïve Bayes
Pros: Sound probabilistic foundation, efficient even for large data sets
Cons: Requires a generative model, including the number of clusters (mixture components)

29 Applications
Clustering in Natural Language Processing
– Document clustering for topic, genre, sentiment…
– Word clustering for part of speech (POS), word sense disambiguation (WSD), synonymy…
– Topic clustering across documents
– Noun coreference: we don't know how many entities there are
– Other identity uncertainty problems: deduping, etc.
– Grammar induction
Sequence modeling: the infinite HMM
– Topic segmentation
– Sequence models for POS tagging
Society modeling in public places
Unsupervised machine learning

30 References:
– Bela A. Frigyik, Amol Kapila, and Maya R. Gupta, University of Washington, Seattle. Introduction to Dirichlet distribution and related processes. UWEE Technical Report, report number UWEETR
– Yee Whye Teh, University College London. Dirichlet Process.
– Khalid El-Arini, Select Lab meeting, October
– Teg Granager, Natural Language Processing, Stanford University. Introduction to Chinese Restaurant problem and Stick breaking scheme.
– Wikipedia

31 Questions?
– Suggest some distributions that can be used with a Dirichlet process to find classes.
– What are the applications in finite mixture models?
– Comment on: The DP of a cluster is also a Dirichlet distribution.

