
1
**Dirichlet Processes in Dialogue Modelling**

Nigel Crook, March 2009

2
**Overview**

- The COMPANIONS project
- Dialogue Acts
- Document Clustering
- Multinomial Distribution
- Dirichlet Distribution
- Graphical Models
- Bayesian Finite Mixture Models
- Dirichlet Processes
- Chinese Restaurant Process
- Concluding Thoughts

With thanks to Percy Liang and Dan Klein (UC Berkeley)¹.

¹ Structured Bayesian Nonparametric Models with Variational Inference, ACL Tutorial, Prague, Czech Republic, June 24, 2007.

3
**The COMPANIONS project**

COMPANIONS: Intelligent, Persistent, Personalised Multimodal Interfaces to the Internet.

One Companion on many platforms. Example utterances: “Okay, but please play some relaxing music then.” “Your pulse is a bit high, please slow down a bit.”

4
**The COMPANIONS project**

Proposed Dialogue System Architecture:

USER → (signal) → Speech Recognition → (words) → Language Understanding → (concepts) → Dialogue Model → (user intentions, DAs) → Dialogue Manager (with DB) → (system intentions, DAs) → Language Generation → (words) → Speech Synthesizer → (signal) → USER

5
**Dialogue Acts**

A Dialogue Act (DA) is a linguistic abstraction that attempts to capture the intention/purpose of an utterance. DAs are based on the concept of a speech act: “When we say something, we do something” (Austin, 1962).

Examples of DA labels using the DAMSL scheme on the Switchboard corpus:

| Example | Dialogue Act |
|---|---|
| Me, I’m in the legal department. | Statement-non-opinion |
| Uh-huh. | Acknowledge (Backchannel) |
| I think it’s great | Statement-opinion |
| That’s exactly it. | Agree/Accept |
| So, - | Abandoned or Turn-Exit |
| I can imagine. | Appreciation |
| Do you have to have any special training? | Yes-No-Question |

6
**Dialogue Act Classification**

Research question: can major DA categories be identified automatically by clustering utterances? Each utterance can be treated as a ‘bag of (content) words’, e.g. “What time is the next train to Oxford?”. Methods from document clustering can then be applied.
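Here is a minimal Python sketch of turning an utterance into a bag of content words. The stop-word list `STOP_WORDS` and the function `bag_of_content_words` are hypothetical. The slides do not say how function words were filtered or how utterances were tokenised.

```python
from collections import Counter
import re

# Hypothetical stop-word list; the real filtering used in the project is not specified.
STOP_WORDS = {"what", "is", "the", "to", "a", "an", "of", "in", "on", "at", "for"}

def bag_of_content_words(utterance: str) -> Counter:
    """Lower-case, split into word tokens, drop stop words, and count what is left."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

print(bag_of_content_words("What time is the next train to Oxford ?"))
# Counter({'time': 1, 'next': 1, 'train': 1, 'oxford': 1})
```

Word order is discarded, so any utterance with the same content words maps to the same bag.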

7
**Document Clustering**

Working example: document clustering.

8
**Document Clustering**

Each document is a ‘bag of (content) words’.

How many clusters? In parametric methods the number of clusters is specified at the outset. Bayesian nonparametric methods (e.g. Gaussian Processes and Dirichlet Processes) let the model's complexity grow with the data; Dirichlet Processes, in particular, infer the number of clusters automatically.

9
**Multinomial Distribution**

A multinomial probability distribution is a distribution over all the possible outcomes of a multinomial experiment.

[Figure: bar charts over outcomes 1–6 for a fair die and a weighted die]

Each draw from a multinomial distribution yields an integer outcome, e.g. 5, 2, 3, 2, 6 …
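The following sketch uses Python's standard `random` module to draw single outcomes from a fair die and a weighted die. The weights of the weighted die are hypothetical, since the slide only shows its bar chart.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]
FAIR = [1 / 6] * 6
# Hypothetical weights for a die loaded towards 6 (the slide's weights are not given).
WEIGHTED = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

rng = random.Random(0)  # seeded so the draws are reproducible

# Each draw returns one integer outcome.
fair_draws = rng.choices(FACES, weights=FAIR, k=5)
weighted_draws = rng.choices(FACES, weights=WEIGHTED, k=5)
print(fair_draws, weighted_draws)

# Over many draws the empirical frequencies approach the distribution's probabilities.
many = rng.choices(FACES, weights=WEIGHTED, k=10_000)
print(many.count(6) / len(many))  # close to 0.5
```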

10
**Dirichlet Distribution**

Each point on a k-dimensional simplex is a multinomial probability distribution:

[Figure: points on the simplex with vertices 1, 2, 3, each shown with its bar chart over outcomes 1–3]

11
**Dirichlet Distribution**

A Dirichlet Distribution is a distribution over multinomial distributions in the simplex.

12
**Dirichlet Distribution**

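The Dirichlet Distribution is parameterised by a set of concentration constants $\alpha = (\alpha_1, \dots, \alpha_k)$, $\alpha_j > 0$, defined over the k-simplex. A draw from a Dirichlet Distribution is written as $\phi \sim \text{Dirichlet}(\alpha_1, \dots, \alpha_k)$, where $\phi$ is a multinomial distribution over k outcomes.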
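The slide does not show the density. For reference, the standard Dirichlet density over the k-simplex, with concentration constants $\alpha_1, \dots, \alpha_k$, is given below. Its mean is $E[\phi_j] = \alpha_j / \sum_{l} \alpha_l$, so equal constants centre the draws on the uniform distribution.

$$
p(\phi \mid \alpha_1, \dots, \alpha_k) = \frac{\Gamma\left(\sum_{j=1}^{k} \alpha_j\right)}{\prod_{j=1}^{k} \Gamma(\alpha_j)} \prod_{j=1}^{k} \phi_j^{\alpha_j - 1}, \qquad \phi_j \ge 0, \quad \sum_{j=1}^{k} \phi_j = 1
$$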

13
**Dirichlet Distribution**

Example draws from a Dirichlet Distribution over the 3-simplex (figure panels): Dirichlet(5, 5, 5), Dirichlet(0.2, 5, 0.2) and Dirichlet(0.5, 0.5, 0.5).
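The sketch below samples from a Dirichlet by normalising independent Gamma draws, a standard construction. It assumes nothing about how the slide's figures were produced. It compares the three parameter settings by the average size of the largest component: about 1/3 means the draws sit near the middle of the simplex, and values near 1 mean they sit near a corner. The helper names are hypothetical.

```python
import random

def sample_dirichlet(alphas, rng):
    """One point on the simplex: normalise independent Gamma(alpha_j, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def mean_largest_component(alphas, n_draws=2000, seed=0):
    """Average of max_j phi_j over many draws from Dirichlet(alphas)."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alphas, rng)) for _ in range(n_draws)) / n_draws

for alphas in [(5, 5, 5), (0.2, 5, 0.2), (0.5, 0.5, 0.5)]:
    print(alphas, round(mean_largest_component(alphas), 3))
```

Large equal constants keep draws close to uniform. Small constants push mass onto one outcome. An unequal setting such as (0.2, 5, 0.2) concentrates draws near one particular vertex.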

14
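**Graphical Models**

An arrow from A to B encodes the factorisation $p(A,B) = p(B \mid A)\,p(A)$.

[Figure: A with children B1, B2, …, Bn, drawn equivalently in plate notation as A → Bi inside a plate over i = 1, …, n]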
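In plate notation, the n copies $B_1, \dots, B_n$ are conditionally independent given A. The joint distribution therefore factorises as follows (a standard identity):

$$
p(A, B_1, \dots, B_n) = p(A) \prod_{i=1}^{n} p(B_i \mid A)
$$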

15
**Bayesian Finite Mixture Model**

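Parameters: $\theta = (\pi, \phi) = (\pi_1, \dots, \pi_k, \phi_1, \dots, \phi_k)$. Hidden variables: $z = (z_1, \dots, z_n)$. Observed data: $x = (x_1, \dots, x_n)$.

[Figure: graphical model π → z_i → x_i ← φ_z, with a plate over z = 1, …, k and a plate over i = 1, …, n]

Mixture weights: $\pi \sim \text{Dirichlet}_k(\alpha, \dots, \alpha)$.

Components $\phi_z$ ($z \in \{1, \dots, k\}$) are drawn from a base measure $G_0$: $\phi_z \sim G_0$ (e.g. $\text{Dirichlet}_v(\beta, \dots, \beta)$).

For each data point (document) a component is drawn, $z_i \sim \text{Multinomial}(\pi)$, and the data point is drawn from some distribution $F(\cdot)$: $x_i \sim F(\phi_{z_i})$ (e.g. $\text{Multinomial}(\phi_{z_i})$).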
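Reading the generative steps off the graphical model gives the joint distribution below. This is a standard derivation and is not shown on the slide. $G_0(\phi_j)$ denotes the base-measure density, and $p(z_i \mid \pi) = \pi_{z_i}$. Dividing by $p(x)$ gives the posterior $p(\pi, \phi, z \mid x)$, which Bayesian inference targets.

$$
p(\pi, \phi, z, x) = \text{Dirichlet}_k(\pi \mid \alpha, \dots, \alpha) \prod_{j=1}^{k} G_0(\phi_j) \prod_{i=1}^{n} \pi_{z_i} \, F(x_i \mid \phi_{z_i})
$$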

16
**Bayesian Finite Mixture Model**

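Document clustering example:

- k = 2 clusters: $\pi \sim \text{Dirichlet}_k(\alpha, \alpha)$ (figure: bar chart of π over clusters 1, 2)
- v = 3 word types: $\phi_z \sim \text{Dirichlet}_v(\beta, \beta, \beta)$ (figure: bar charts of each φ_z over word types 1–3)
- Choose a source for each data point (document) $i \in \{1, \dots, n\}$: $z_i \sim \text{Multinomial}_k(\pi)$, giving $z_1 = 1$, $z_2 = 2$, $z_3 = 2$, $z_4 = 1$, $z_5 = 2$.
- Generate the data point (words in document) using its source: $x_i \sim \text{Multinomial}_v(\phi_{z_i})$, giving $x_1$ = ACAAB, $x_2$ = ACCBCC, $x_3$ = CCC, $x_4$ = CABAAC, $x_5$ = ACC.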
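This minimal Python sketch follows the generative steps of the example above, with k = 2 sources and v = 3 word types A, B, C. The concentration values `alpha` and `beta`, the document-length range and the function names are hypothetical choices. This is not the author's demo code.

```python
import random

WORD_TYPES = "ABC"  # v = 3 word types

def sample_dirichlet(alphas, rng):
    """Normalised Gamma draws give a Dirichlet sample."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def generate_corpus(n_docs, k=2, alpha=1.0, beta=1.0, min_len=3, max_len=6, seed=0):
    """Sample (z_i, x_i) pairs from a Bayesian finite mixture of multinomials.

    alpha, beta, min_len and max_len are hypothetical settings, not from the slides.
    """
    rng = random.Random(seed)
    pi = sample_dirichlet([alpha] * k, rng)                   # pi ~ Dirichlet_k(alpha, ..., alpha)
    phis = [sample_dirichlet([beta] * len(WORD_TYPES), rng)   # phi_z ~ Dirichlet_v(beta, ..., beta)
            for _ in range(k)]
    corpus = []
    for _ in range(n_docs):
        z = rng.choices(range(k), weights=pi)[0]              # z_i ~ Multinomial_k(pi)
        length = rng.randint(min_len, max_len)
        words = "".join(rng.choices(WORD_TYPES, weights=phis[z], k=length))  # x_i ~ Multinomial_v(phi_{z_i})
        corpus.append((z + 1, words))                         # report sources as 1..k
    return pi, phis, corpus

pi, phis, corpus = generate_corpus(5)
for i, (z, words) in enumerate(corpus, start=1):
    print(f"z{i} = {z}  x{i} = {words}")
```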

17
**Data Generation Demo**

Component assignment and word-type sequence for each generated document:

- Id 0: Component = 1, words = [1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1]
- Id 1: Component = 0, words = [0, 1, 2, 2, 0, 0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 2]
- Id 2: Component = 1, words = [1, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
- Id 3: Component = 1, words = [1, 1, 1, 1, 1]
- Id 4: Component = 1, words = [1, 1, 1, 1, 1, 1, 1, 1]
- Id 5: Component = 0, words = [0, 2, 0, 0, 0, 2, 2, 0, 0, 1, 0, 2, 0, 2, 1, 2, 0, 0, 2]
- Id 6: Component = 1, words = [1, 1, 1, 1, 1, 1]
- Id 7: Component = 0, words = [0, 2, 2, 0, 0, 2, 2, 0, 2, 0]
- Id 8: Component = 0, words = [0, 0, 2, 1, 2, 2]
- Id 9: Component = 0, words = [2, 0, 1, 0, 2, 0, 2, 1, 0, 2, 2, 1, 1, 2, 0]
- Id 10: Component = 1, words = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2]
- Id 11: Component = 2, words = [0, 0, 2, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
- Id 12: Component = 0, words = [1, 0, 1, 0, 0, 0, 2, 2, 0, 0, 2, 0, 2, 1, 0, 0]
- Id 13: Component = 1, words = [1, 1, 1, 2, 1, 1, 1]
- Id 14: Component = 0, words = [0, 2, 2, 0, 2, 0, 2, 0, 0, 0, 2, 1, 2]
- Id 15: Component = 0, words = [2, 0, 0, 0, 1, 2, 0, 2, 0, 2, 0, 2, 0]
- Id 16: Component = 1, words = [1, 1, 1, 1, 1]
- Id 17: Component = 0, words = [1, 1, 0, 0, 2, 1, 2, 0, 0, 0, 1, 2, 1]
- Id 18: Component = 1, words = [1, 1, 1, 1, 1, 1, 0, 2, 1]
- Id 19: Component = 1, words = [1, 1, 0, 2, 1, 1, 1, 1, 0]
- Id 20: Component = 2, words = [0, 1, 0, 2, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 2]

18
**Dirichlet Processes**

Dirichlet Processes can be thought of as a generalisation of Dirichlet distributions to infinite dimensions … but not quite!

As the dimension k of a Dirichlet distribution increases (figure panels: k = 2, 4, 6, 8, 10, 12, 18), the distribution remains symmetric: no component is favoured over any other. For a Dirichlet Process we need the larger components to appear near the beginning of the distribution on average.

19
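**Dirichlet Processes**

Stick-breaking construction (GEM): start with a stick of length 1 and break off a fraction $v_1 \sim \text{Beta}(1, \alpha)$ of it as $\pi_1$. Then break off a fraction $v_2 \sim \text{Beta}(1, \alpha)$ of what remains as $\pi_2$, and so on: $\pi_k = v_k \prod_{l=1}^{k-1} (1 - v_l)$.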
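Below is a truncated stick-breaking sketch in Python. It repeatedly breaks off a Beta(1, α) fraction $v_k$ of the remaining stick, so that $\pi_k = v_k \prod_{l<k} (1 - v_l)$. Only the first `n_sticks` weights are generated, and the rest of the infinite sequence is reported as the leftover `tail` mass. The truncation and function name are illustrative assumptions.

```python
import random

def stick_breaking(alpha, n_sticks, seed=0):
    """Truncated draw from GEM(alpha): returns pi_1..pi_n and the unbroken remainder."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0                          # start with a stick of length 1
    for _ in range(n_sticks):
        frac = rng.betavariate(1.0, alpha)   # v_k ~ Beta(1, alpha)
        weights.append(remaining * frac)     # pi_k = v_k * prod_{l<k} (1 - v_l)
        remaining *= 1.0 - frac              # what is left of the stick
    return weights, remaining                # remaining = mass of all later sticks

weights, tail = stick_breaking(alpha=1.0, n_sticks=10)
print([round(w, 3) for w in weights], round(tail, 3))
```

Each break takes a fraction of a shrinking remainder, so earlier weights are larger on average. This provides the ordering that a symmetric Dirichlet lacks.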

20
**Dirichlet Processes Mixture Model**

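Definition: as in the Bayesian finite mixture model, but with the mixture weights drawn from GEM(α) instead of $\text{Dirichlet}_k(\alpha, \dots, \alpha)$:

- $\pi \sim \text{GEM}(\alpha)$, so $\pi = (\pi_1, \dots, \pi_\infty)$
- Components $\phi_z$ ($z \in \{1, \dots, \infty\}$): $\phi_z \sim G_0$
- For each data point (document) a component is drawn, $z_i \sim \text{Multinomial}(\pi)$, and the data point is drawn from some distribution $F(\cdot)$: $x_i \sim F(\phi_{z_i})$ (e.g. $\text{Multinomial}(\phi_{z_i})$)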

21
**Chinese Restaurant Process**

The Chinese Restaurant Process is one view of DPs:

- Tables = clusters
- Customers = data points (documents)
- Dishes = component parameters

[Figure: customers x1, x2, …, x7, … seated at tables 1–5]

22
**Chinese Restaurant Process**

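Shut your eyes if you don’t want to see any more maths …

$P(z_i = k \mid z_1, \dots, z_{i-1}) = \frac{n_k}{\alpha + i - 1}$ for an occupied table $k$ with $n_k$ customers, and $P(z_i = \text{new table} \mid z_1, \dots, z_{i-1}) = \frac{\alpha}{\alpha + i - 1}$.

The “rich get richer” principle: tables with more customers get more customers on average.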
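Here is a minimal Python simulation of CRP seating. Customer $i$ joins occupied table $k$ with probability proportional to $n_k$, or opens a new table with probability proportional to $\alpha$, and the normaliser is $\alpha + i - 1$. The function name and return format are illustrative choices.

```python
import random

def crp_seating(n_customers, alpha, seed=0):
    """Seat customers one at a time; return each customer's table (0-based) and table sizes."""
    rng = random.Random(seed)
    tables, sizes = [], []
    for i in range(n_customers):       # i = number of customers already seated
        # Occupied table k has weight n_k, a new table has weight alpha (total alpha + i).
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)            # open a new table
        else:
            sizes[k] += 1              # join an occupied table: "rich get richer"
        tables.append(k)
    return tables, sizes

tables, sizes = crp_seating(100, alpha=1.0)
print(sizes)
```

The number of tables is not fixed in advance. It grows slowly with the number of customers, at a rate controlled by $\alpha$.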

23
**CRP Initial Clustering Demo**

Initial table allocations: 100 documents, 3 sources, 5 to 20 words per document.

24
**CRP Table Parameters**

Each cluster (table) $k$ is given a parameter (dish) $\phi_k$, which all the data points (customers) in that cluster share. These are drawn from the base measure $G_0$ (a Dirichlet distribution in this case).

[Figure: tables 1–5, each with its dish shown as a bar chart over word types 1–3]

25
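**CRP Inference**

The goal of Bayesian inference is to calculate the posterior $p(\pi, \phi, z \mid x)$.

[Figure: graphical model π → z_i → x_i ← φ_z, with a plate over z = 1, …, k and a plate over i = 1, …, n]

The posterior usually cannot be sampled directly, so Gibbs sampling can be used instead …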

26
**CRP Inference - reclustering**

[Figure: customers x1–x7 at tables 1–5, with x2 and x4 shown being reseated; each table's dish is drawn as a bar chart over word types 1–3]
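The sketch below shows one way a reclustering sweep could be written in Python. The slides do not say which Gibbs variant their demo used. This sketch swaps in an auxiliary-dish scheme in the style of Neal's Algorithm 8 with a single auxiliary dish. Each customer is removed and then reseated with weight $n_k \, p(x_i \mid \phi_k)$ at an occupied table, or $\alpha \, p(x_i \mid \phi_{\text{new}})$ at a new table whose dish comes from $G_0 = \text{Dirichlet}(\beta, \dots, \beta)$. Documents are word-count vectors. All names and data are hypothetical.

```python
import math
import random

def sample_dirichlet(alphas, rng):
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def log_lik(counts, phi):
    """log p(x | phi) for a bag of words, dropping the multinomial coefficient (it cancels)."""
    return sum(c * math.log(p) for c, p in zip(counts, phi) if c > 0)

def recluster_sweep(docs, z, dishes, alpha, beta, rng):
    """One Gibbs sweep over table ids z (updated in place); dishes maps table id -> phi."""
    V = len(docs[0])
    next_id = max(dishes) + 1
    for i, counts in enumerate(docs):
        old = z[i]
        z[i] = None                                  # remove customer i from its table
        sizes = {}
        for t in z:
            if t is not None:
                sizes[t] = sizes.get(t, 0) + 1
        if old in sizes:
            aux = sample_dirichlet([beta] * V, rng)  # candidate new dish drawn from G0
        else:
            aux = dishes.pop(old)                    # i sat alone: its dish is the candidate
        cands = list(sizes)
        logw = [math.log(sizes[t]) + log_lik(counts, dishes[t]) for t in cands]
        logw.append(math.log(alpha) + log_lik(counts, aux))
        top = max(logw)
        weights = [math.exp(w - top) for w in logw]  # subtract max for numerical stability
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(cands):                     # open a new table with the candidate dish
            dishes[next_id] = aux
            z[i] = next_id
            next_id += 1
        else:
            z[i] = cands[choice]
    return z, dishes

rng = random.Random(0)
docs = [[5, 0, 1], [6, 1, 0], [0, 1, 7], [1, 0, 6], [0, 6, 1]]  # hypothetical counts, 3 word types
z = [0] * len(docs)                                              # everyone starts at table 0
dishes = {0: sample_dirichlet([1.0] * 3, rng)}
for _ in range(20):
    z, dishes = recluster_sweep(docs, z, dishes, alpha=1.0, beta=1.0, rng=rng)
print(z)
```

This sweep holds existing dishes fixed. In a full sampler it would alternate with a step that resamples each table's dish.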

27
**CRP Inference – table updates**

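Each table's dish is updated from the customers seated at it: the prior plus the word counts of the table's customers gives the updated dish.

[Figure: for tables 1–4 with customers x1–x7, bar charts over word types 1–3 showing prior + counts = updated dish]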
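With a Dirichlet base measure, the table update has a closed form. This relies on standard Dirichlet–multinomial conjugacy, which is assumed here because the slide only shows "prior + counts". The posterior for a table's dish is Dirichlet(β + summed word counts of its customers). A minimal Python sketch, with hypothetical names and data:

```python
import random

def sample_dirichlet(alphas, rng):
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def update_dishes(docs, z, beta, rng):
    """Resample each table's dish from Dirichlet(beta + summed word counts at that table)."""
    V = len(docs[0])
    totals = {}
    for counts, t in zip(docs, z):
        acc = totals.setdefault(t, [0] * V)
        for w in range(V):
            acc[w] += counts[w]
    posterior_params = {t: [beta + c for c in acc] for t, acc in totals.items()}
    dishes = {t: sample_dirichlet(params, rng) for t, params in posterior_params.items()}
    return dishes, posterior_params

docs = [[5, 0, 1], [6, 1, 0], [0, 1, 7]]   # hypothetical word counts over 3 word types
z = [0, 0, 1]                              # first two documents share table 0
dishes, params = update_dishes(docs, z, beta=1.0, rng=random.Random(0))
print(params)  # {0: [12.0, 2.0, 2.0], 1: [1.0, 2.0, 8.0]}
```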

28
**CRP Inference Demo**

29
**Concluding Thoughts**

The CRP works well on the toy document clustering example:

- Document size: 100+ words
- Up to 6 word types
- 100–500 documents

Will it work when clustering utterances? Utterances are only 1–20 words long, so this is a much harder classification problem.
