
1
**Dirichlet Processes in Dialogue Modelling**

Nigel Crook, March 2009

2
**Overview**

Inputs and Outputs
The COMPANIONS project
Dialogue Acts
Document Clustering
Multinomial Distribution
Dirichlet Distribution
Graphical Models
Bayesian Finite Mixture Models
Dirichlet Processes
Chinese Restaurant Process
Concluding Thoughts

With thanks to Percy Liang and Dan Klein (UC Berkeley)¹

¹ Structured Bayesian Nonparametric Models with Variational Inference, ACL Tutorial, Prague, Czech Republic, June 24, 2007.

3
**The COMPANIONS project**

COMPANIONS: Intelligent, Persistent, Personalised Multimodal Interfaces to the Internet.
One Companion on many platforms:
“Okay, but please play some relaxing music then.”
“Your pulse is a bit high, please slow down a bit.”

4
**The COMPANIONS project**

Proposed Dialogue System Architecture:

USER → Signal → Speech Recognition → Words → Language Understanding → Concepts → Dialogue Model → User Intentions (DAs) → Dialogue Manager (with DB)
Dialogue Manager → System Intentions (DAs) → Language Generation → Words → Speech Synthesizer → Signal → USER

5
Dialogue Acts

A Dialogue Act (DA) is a linguistic abstraction that attempts to capture the intention/purpose of an utterance. DAs are based on the concept of a speech act: “When we say something, we do something” (Austin, 1962).

Examples of DA labels using the DAMSL scheme on the Switchboard corpus:

| Example | Dialogue Act |
|---|---|
| Me, I’m in the legal department. | Statement-non-opinion |
| Uh-huh. | Acknowledge (Backchannel) |
| I think it’s great | Statement-opinion |
| That’s exactly it. | Agree/Accept |
| So, - | Abandoned or Turn-Exit |
| I can imagine. | Appreciation |
| Do you have to have any special training? | Yes-No-Question |

6
**Dialogue Act Classification**

Research question: can major DA categories be identified automatically by clustering utterances? Each utterance can be treated as a ‘bag of (content) words’, e.g. “What time is the next train to Oxford?” We can then apply methods from document clustering.
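The bag-of-words reduction can be sketched as follows (a toy sketch: the `bag_of_words` helper and the stopword list are illustrative, not part of the original system):

```python
from collections import Counter

def bag_of_words(utterance, stopwords):
    """Reduce an utterance to a bag (multiset) of content words."""
    words = [w.strip("?!.,").lower() for w in utterance.split()]
    return Counter(w for w in words if w and w not in stopwords)

stops = {"what", "is", "the", "to"}  # a toy stopword list
print(bag_of_words("What time is the next train to Oxford ?", stops))
```

Word order is discarded; only content-word counts remain, which is exactly the representation the clustering methods below operate on.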

7
Document Clustering

Working example: document clustering.

8
**Document Clustering**

Each document is a ‘bag of (content) words’.

How many clusters? In parametric methods the number of clusters is specified at the outset. Bayesian nonparametric methods (e.g. Gaussian Processes, Dirichlet Processes) do not fix this in advance; a Dirichlet Process mixture infers the number of clusters from the data.

9
**Multinomial Distribution**

A multinomial probability distribution is a distribution over all possible outcomes of a multinomial experiment, e.g. the six faces of a fair die or of a weighted die. Each draw from a multinomial distribution yields an integer, e.g. 5, 2, 3, 2, 6, …
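Drawing from a multinomial can be sketched with the standard library (a minimal sketch; the `draw_multinomial` helper and the specific weights are illustrative):

```python
import random

def draw_multinomial(probs, n_draws, rng):
    """Draw n_draws outcomes (numbered 1..k) from a multinomial distribution."""
    outcomes = list(range(1, len(probs) + 1))
    return rng.choices(outcomes, weights=probs, k=n_draws)

rng = random.Random(0)
fair = [1 / 6] * 6                                    # fair die
weighted = [0.05, 0.05, 0.05, 0.05, 0.05, 0.75]       # die weighted toward 6

draws = draw_multinomial(weighted, 1000, rng)
print(draws[:5])                  # a few individual draws
print(draws.count(6) / 1000)      # empirical frequency of 6, near 0.75
```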

10
**Dirichlet Distribution**

Each point on a k-dimensional simplex is a multinomial probability distribution over k outcomes.

11
**Dirichlet Distribution**

A Dirichlet Distribution is a distribution over the multinomial distributions in the simplex.

12
**Dirichlet Distribution**

The Dirichlet Distribution is parameterised by a set of concentration parameters α₁, …, α_k defined over the k-simplex. A draw from a Dirichlet Distribution is written θ ~ Dirichlet(α₁, …, α_k), where θ is a multinomial distribution over k outcomes.

13
**Dirichlet Distribution**

Example draws from a Dirichlet Distribution over the 3-simplex: Dirichlet(5, 5, 5), Dirichlet(0.2, 5, 0.2), Dirichlet(0.5, 0.5, 0.5).
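Such draws can be generated with the standard gamma-normalisation construction (a sketch; `draw_dirichlet` is an illustrative helper, using only the stdlib):

```python
import random

def draw_dirichlet(alphas, rng):
    """One draw from Dirichlet(alphas): normalised independent Gamma samples."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(42)
# Large symmetric alphas concentrate mass near the centre of the simplex:
print(draw_dirichlet([5, 5, 5], rng))
# Small alphas push mass toward the corners (sparse multinomials):
print(draw_dirichlet([0.5, 0.5, 0.5], rng))
```

Each printed list is itself a multinomial distribution (non-negative, summing to 1), matching the “distribution over distributions” picture above.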

14
Graphical Models

A directed edge A → B encodes the factorisation p(A, B) = p(B|A)p(A). Plate notation: a plate around a node Bᵢ with index i = 1 … n stands for n repeated nodes B₁ … Bₙ, all sharing the parent A.

15
**Bayesian Finite Mixture Model**

Parameters: θ = (π, φ) = (π₁ … π_k, φ₁ … φ_k)
Hidden variables: z = (z₁ … z_n)
Observed data: x = (x₁ … x_n)

π ~ Dirichlet_k(α, …, α)
Components φ_z (z ∈ {1 … k}) are drawn from a base measure G₀:
φ_z ~ G₀ (e.g. Dirichlet_v(β, …, β))
For each data point (document) a component z_i is drawn:
z_i ~ Multinomial(π)
and the data point is drawn from some distribution F(φ):
x_i ~ F(φ_{z_i}) (e.g. Multinomial(φ_{z_i}))

16
**Bayesian Finite Mixture Model**

Document clustering example: k = 2 clusters
π ~ Dirichlet_k(α, α)
v = 3 word types (A, B, C)
φ_z ~ Dirichlet_v(β, β, β)
Choose a source for each data point (document) i ∈ {1, …, n}:
z_i ~ Multinomial_k(π), e.g. z₁ = 1, z₂ = 2, z₃ = 2, z₄ = 1, z₅ = 2
Generate the data point (the words in the document) using that source:
x_i ~ Multinomial_v(φ_{z_i}), e.g. x₁ = ACAAB, x₂ = ACCBCC, x₃ = CCC, x₄ = CABAAC, x₅ = ACC
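The generative process above can be sketched end to end (a toy sketch; `generate_corpus` and its parameters are illustrative, mirroring the data-generation demo on the next slide):

```python
import random

def generate_corpus(n_docs, alpha, beta, k, v, rng):
    """Generate documents from a Bayesian finite mixture model."""
    def dirichlet(alphas):
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        s = sum(g)
        return [x / s for x in g]

    pi = dirichlet([alpha] * k)                        # mixing weights over k clusters
    phis = [dirichlet([beta] * v) for _ in range(k)]   # per-cluster word distributions
    docs = []
    for _ in range(n_docs):
        z = rng.choices(range(k), weights=pi, k=1)[0]  # choose a source cluster
        length = rng.randint(5, 20)                    # document length
        words = rng.choices(range(v), weights=phis[z], k=length)
        docs.append((z, words))
    return docs

rng = random.Random(7)
for z, words in generate_corpus(3, 1.0, 0.5, 2, 3, rng):
    print("cluster", z, "words", words)
```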

17
Data Generation Demo

Id: 0   Component = 1   words = [1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1]
Id: 1   Component = 0   words = [0, 1, 2, 2, 0, 0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 2]
Id: 2   Component = 1   words = [1, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
Id: 3   Component = 1   words = [1, 1, 1, 1, 1]
Id: 4   Component = 1   words = [1, 1, 1, 1, 1, 1, 1, 1]
Id: 5   Component = 0   words = [0, 2, 0, 0, 0, 2, 2, 0, 0, 1, 0, 2, 0, 2, 1, 2, 0, 0, 2]
Id: 6   Component = 1   words = [1, 1, 1, 1, 1, 1]
Id: 7   Component = 0   words = [0, 2, 2, 0, 0, 2, 2, 0, 2, 0]
Id: 8   Component = 0   words = [0, 0, 2, 1, 2, 2]
Id: 9   Component = 0   words = [2, 0, 1, 0, 2, 0, 2, 1, 0, 2, 2, 1, 1, 2, 0]
Id: 10  Component = 1   words = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2]
Id: 11  Component = 2   words = [0, 0, 2, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
Id: 12  Component = 0   words = [1, 0, 1, 0, 0, 0, 2, 2, 0, 0, 2, 0, 2, 1, 0, 0]
Id: 13  Component = 1   words = [1, 1, 1, 2, 1, 1, 1]
Id: 14  Component = 0   words = [0, 2, 2, 0, 2, 0, 2, 0, 0, 0, 2, 1, 2]
Id: 15  Component = 0   words = [2, 0, 0, 0, 1, 2, 0, 2, 0, 2, 0, 2, 0]
Id: 16  Component = 1   words = [1, 1, 1, 1, 1]
Id: 17  Component = 0   words = [1, 1, 0, 0, 2, 1, 2, 0, 0, 0, 1, 2, 1]
Id: 18  Component = 1   words = [1, 1, 1, 1, 1, 1, 0, 2, 1]
Id: 19  Component = 1   words = [1, 1, 0, 2, 1, 1, 1, 1, 0]
Id: 20  Component = 2   words = [0, 1, 0, 2, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 2]

18
Dirichlet Processes

Dirichlet Processes can be thought of as a generalisation of the Dirichlet distribution to infinite dimensions … but not quite! As the dimension k of a Dirichlet distribution increases (plots for k = 2, 4, 6, 8, 10, 12, 18), the distribution remains symmetric across components. For a Dirichlet Process we instead need the larger components to appear near the beginning of the distribution on average.

19
Dirichlet Processes

Stick breaking construction (GEM): draw β_k ~ Beta(1, α) for k = 1, 2, …, and set π_k = β_k ∏_{l<k} (1 − β_l). Starting from a stick of length 1, each π_k breaks off a Beta-distributed fraction of what remains, so earlier components are larger on average.
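A truncated version of the stick-breaking construction can be sketched as follows (a sketch; `stick_breaking` is an illustrative helper, and the truncation to a fixed number of sticks is for demonstration only):

```python
import random

def stick_breaking(alpha, n_sticks, rng):
    """Truncated GEM(alpha): break off Beta(1, alpha) fractions of the remaining stick."""
    weights, remaining = [], 1.0
    for _ in range(n_sticks):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)   # pi_k = beta_k * prod_{l<k}(1 - beta_l)
        remaining *= 1.0 - frac            # what is left of the stick
    return weights

rng = random.Random(3)
pi = stick_breaking(alpha=1.0, n_sticks=10, rng=rng)
print([round(w, 3) for w in pi])
```

The weights tend to decay with k, giving the asymmetry a Dirichlet Process needs: larger components appear near the beginning on average.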

20
**Dirichlet Processes Mixture Model**

Definition:
π ~ GEM(α), π = (π₁, π₂, …)
Components φ_z (z ∈ {1, 2, …}) are drawn from a base measure G₀: φ_z ~ G₀
For each data point (document) a component z_i is drawn: z_i ~ Multinomial(π)
and the data point is drawn from some distribution F(φ): x_i ~ F(φ_{z_i}) (e.g. Multinomial(φ_{z_i}))
This replaces the finite π ~ Dirichlet_k(α, …, α) of the finite mixture model with an infinite stick-breaking prior.

21
**Chinese Restaurant Process**

The Chinese Restaurant Process is one view of DPs:
Tables = clusters
Customers = data points (documents)
Dishes = component parameters φ

22
**Chinese Restaurant Process**

Shut your eyes if you don’t want to see any more maths …

z_i | z₁, …, z_{i−1}: customer i sits at an existing table t with probability n_t / (i − 1 + α), where n_t is the number of customers already at table t, and at a new table with probability α / (i − 1 + α).

The “rich get richer” principle: tables with more customers attract more customers on average.
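The seating rule can be simulated directly (a sketch; `crp_seating` is an illustrative helper implementing the probabilities above):

```python
import random

def crp_seating(n_customers, alpha, rng):
    """Seat customers one by one; table t attracts with prob n_t/(i-1+alpha),
    and a new table opens with prob alpha/(i-1+alpha)."""
    tables = []       # tables[t] = number of customers at table t
    assignments = []  # table index chosen by each customer
    for _ in range(n_customers):
        weights = tables + [alpha]      # existing tables, plus a possible new one
        t = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if t == len(tables):
            tables.append(1)            # open a new table
        else:
            tables[t] += 1
        assignments.append(t)
    return tables, assignments

rng = random.Random(5)
tables, _ = crp_seating(100, alpha=1.0, rng=rng)
print("table sizes:", tables)   # typically a few big tables and many small ones
```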

23
**CRP Initial Clustering Demo**

Initial table allocations: 100 documents, 3 sources, 5 to 20 words per document.

24
**CRP Table Parameters**

Each cluster (table) is given a parameter (dish) φ_t which all the data points (customers) in that cluster share. These are drawn from the base measure G₀ (a Dirichlet distribution in this case).

25
**CRP Inference**

The goal of Bayesian inference is to calculate the posterior p(π, φ, z | x). The posterior cannot usually be sampled directly, but we can use Gibbs sampling …
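One Gibbs sweep for a CRP mixture can be sketched as follows (a minimal collapsed-Gibbs sketch, assuming a symmetric Dirichlet(β) prior over each table’s word distribution; `doc_loglik` and `gibbs_sweep` are illustrative helpers, not the authors’ implementation):

```python
import random
from math import log, exp

def doc_loglik(doc, counts, total, beta, v):
    """log p(doc | words already at the table) under a symmetric Dirichlet(beta) prior,
    computed by the sequential (collapsed) predictive rule."""
    counts = dict(counts)
    ll = 0.0
    for w in doc:
        ll += log((counts.get(w, 0) + beta) / (total + v * beta))
        counts[w] = counts.get(w, 0) + 1
        total += 1
    return ll

def gibbs_sweep(docs, z, alpha, beta, v, rng):
    """One sweep: unseat each document, then reseat it with
    probability proportional to (CRP prior) x (word likelihood)."""
    for i, doc in enumerate(docs):
        z[i] = None
        tables = sorted({t for t in z if t is not None})
        logps, labels = [], []
        for t in tables:
            members = [docs[j] for j in range(len(docs)) if z[j] == t]
            counts = {}
            for d in members:
                for w in d:
                    counts[w] = counts.get(w, 0) + 1
            logps.append(log(len(members)) +
                         doc_loglik(doc, counts, sum(counts.values()), beta, v))
            labels.append(t)
        # new table: prior mass alpha, likelihood under an empty table
        logps.append(log(alpha) + doc_loglik(doc, {}, 0, beta, v))
        labels.append(max(tables) + 1 if tables else 0)
        m = max(logps)
        weights = [exp(p - m) for p in logps]  # normalise in log space
        z[i] = rng.choices(labels, weights=weights, k=1)[0]
    return z

rng = random.Random(0)
docs = [[0, 0, 0, 1], [0, 0, 1], [2, 2, 2], [2, 2, 1]]
z = [0, 0, 0, 0]                    # initial allocation: everyone at table 0
for _ in range(10):
    z = gibbs_sweep(docs, z, alpha=1.0, beta=0.5, v=3, rng=rng)
print("assignments:", z)
```

The shared denominators of the CRP probabilities cancel in the normalisation, so only table sizes (or α) times the document likelihood are needed.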

26
**CRP Inference - reclustering**

(Diagram: customers x1 … x7 seated at tables 1–5; during reclustering, data points such as x2 and x4 are unseated and reassigned to an existing or new table.)

27
**CRP Inference – table updates**

(Diagram: per-table word-count histograms over word types 1–3 are summed as tables gain or lose documents, updating each table’s parameter.)

28
CRP Inference Demo

29
Concluding Thoughts

The CRP works well on the toy document clustering example: document size 100+ words, up to 6 word types, 100–500 documents. Will it work when clustering utterances? Utterance size is 1–20 words; this is a much harder classification problem.
