Statistical Spoken Dialogue Systems, Talk 2 – Belief tracking. CLARA Workshop. Presented by Blaise Thomson, Cambridge University Engineering Department


1 Statistical Spoken Dialogue Systems, Talk 2 – Belief tracking. CLARA Workshop. Presented by Blaise Thomson, Cambridge University Engineering Department

2 Human-machine spoken dialogue. Typical structure of a spoken dialogue system: the user's waveforms pass through the Recognizer (waveforms to words) and the Semantic Decoder (words to dialogue acts) to the Dialogue Manager; replies flow back through the Message Generator and the Synthesizer to the user. Example: "I want a restaurant" maps to inform(type=restaurant); "What kind of food do you want?" maps to request(food).

3 Spoken Dialogue Systems – State of the art

4 Outline
Introduction
An example user model (spoken dialogue model)
The Partially Observable Markov Decision Process (POMDP)
POMDP models for dialogue systems
POMDP models for off-line experiments
POMDP models for simulation
Inference
Belief propagation (fixed parameters)
Expectation Propagation (learning parameters)
Optimisations
Results

5 Intro – An example user model? The Partially Observable Markov Decision Process (POMDP): a probabilistic model of what the user will say. Variables: the dialogue state, s_t (e.g. the user wants a restaurant); the system action, a_t (e.g. "What type of food?"); and the observation of what was said, o_t (e.g. an N-best semantic list). Assumes an Input-Output Hidden Markov structure. [Diagram: chain s_1 → s_2 → ... → s_T, with observations o_1, ..., o_T and actions a_1, ..., a_T]

6 Intro – Simplifying the POMDP user model. Typically split the dialogue state, s_t. [Diagram: chain s_1 → s_2 → ... → s_T, with observations o_1, ..., o_T and actions a_1, ..., a_T]

7 Intro – Simplifying the POMDP user model. Typically split the dialogue state, s_t, into the true user goal, g_t, and the true user act, u_t. [Diagram: chain g_1 → g_2 → ... → g_T, with user acts u_1, ..., u_T, observations o_1, ..., o_T and actions a_1, ..., a_T]

8 Intro – Simplifying the POMDP user model. Further split the goal, g_t, into sub-goals g_{t,c}. e.g. The user wants a Chinese restaurant: food=Chinese, type=restaurant. [Diagram: g_t factored into g_{t,food}, g_{t,type}, g_{t,stars}, g_{t,area}]

9 Intro – Simplifying the POMDP user model. [Diagram: two-turn factored network with sub-goals g_type and g_food, user acts u_type and u_food, observation o and action a, grouped into goal G and user act U]

10 Intro – POMDP models for dialogue systems
System: How can I help you?
User: I'm looking for a beer [0.5] / I'm looking for a bar [0.4]
System: Sorry, what did you say?
User: bar [0.3] / bye [0.3]
When decisions are based on probabilistic user goals: Partially Observable Markov Decision Processes (POMDPs). [Belief bar chart over Beer / Bar / Bye]

11 Intro – POMDP models for dialogue systems

12 Intro – belief model for dialogue systems. Choose actions according to beliefs in the goal instead of the most likely hypothesis, e.g. confirm(beer). More robust, for some key reasons: the full hypothesis list is used, and a user model is incorporated. [Belief bar chart over Beer / Bar / Bye]
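As a minimal sketch of acting on beliefs rather than the single top hypothesis, consider a hand-crafted rule over the belief distribution. The thresholds (0.8, 0.4) and action names here are purely illustrative assumptions, not the policies described in the talk (which are learned with reinforcement learning):

```python
# Illustrative belief-based action selection (hypothetical thresholds).
def choose_action(belief):
    """belief: dict mapping goal -> probability (sums to 1)."""
    goal, p = max(belief.items(), key=lambda kv: kv[1])
    if p > 0.8:
        return f"act_on({goal})"   # confident enough to act on the goal
    if p > 0.4:
        return f"confirm({goal})"  # hedge: confirm the top hypothesis
    return "request(repeat)"       # too uncertain: ask the user again

print(choose_action({"beer": 0.5, "bar": 0.4, "bye": 0.1}))  # confirm(beer)
```

Because the rule sees the whole distribution, a close second hypothesis (bar at 0.4) pushes the system towards confirmation rather than a premature commitment to "beer".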

13 Intro – POMDP models for off-line experiments. Example dialogue: "How can I help you?" – "I'm looking for a beer" / "I'm looking for a bar"; "Sorry, what did you say?" – "bar" / "bye". [Diagram over hypotheses Beer / Bar / Bye, with confidence scores 0.5, 0.4, 0.2, 0.7, 0.3, 0.5, 0.1]

14 Intro – POMDP models for simulation. Often useful to be able to simulate how people behave: for reinforcement learning, and for testing a given system. In theory, simply generate from the POMDP user model. [Diagram: sampled network with g_type=restaurant, g_food=Chinese, user act inform(type=restaurant), then silence()]

15 An example – voicemail. We have a voicemail system with 2 possible user goals: g = SAVE (the user wants to save) and g = DEL (the user wants to delete). In each turn, until we save or delete, we observe one of two things: o = OSAVE (the user said "save") or o = ODEL (the user said "delete"). We assume that the goal may change between turns, and for the moment we look at only two turns. We start completely unsure what the user wants.

16 An example – exercise. Observation probability, P(o | g):

g \ o    OSAVE   ODEL
SAVE     0.8     0.2
DEL      0.2     0.8

If we observe the user saying they want to save, what is the probability that they want to save? i.e. P(g_1 = SAVE | o_1 = OSAVE). Use Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
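The one-turn Bayes update can be checked numerically. This sketch assumes the uniform 0.5/0.5 prior from slide 15 and the observation table above (the SAVE row mirroring the DEL row, as used in slide 18's working):

```python
# One-turn Bayes update for the voicemail example.
p_obs = {"SAVE": {"OSAVE": 0.8, "ODEL": 0.2},
         "DEL":  {"OSAVE": 0.2, "ODEL": 0.8}}
prior = {"SAVE": 0.5, "DEL": 0.5}

def posterior(o):
    # Bayes: P(g|o) = P(o|g) P(g) / P(o), with P(o) as the normaliser.
    joint = {g: p_obs[g][o] * prior[g] for g in prior}
    z = sum(joint.values())
    return {g: v / z for g, v in joint.items()}

print(posterior("OSAVE")["SAVE"])  # 0.8
```

With a uniform prior the posterior simply renormalises the observation column: 0.5*0.8 / (0.5*0.8 + 0.5*0.2) = 0.8.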

17 An example – exercise. Observation probability, P(o | g):

g \ o    OSAVE   ODEL
SAVE     0.8     0.2
DEL      0.2     0.8

Transition probability, P(g' | g):

g \ g'   SAVE    DEL
SAVE     0.9     0.1
DEL      0.0     1.0

If we observe the user saying they want to save and then saying they want to delete, what is the probability they want to save in the second turn? i.e. what is P(g_2 = SAVE | o_1 = OSAVE, o_2 = ODEL)?

18 An example – answer

g_2    from g_1=SAVE             from g_1=DEL              TOTAL    PROB
SAVE   0.5*0.8*0.9*0.2 = 0.072   0.5*0.2*0.0*0.2 = 0.000   0.072    0.072/0.184 ≈ 0.39
DEL    0.5*0.8*0.1*0.8 = 0.032   0.5*0.2*1.0*0.8 = 0.080   0.112    0.112/0.184 ≈ 0.61
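The two-turn table can be reproduced directly by summing over the hidden first-turn goal, using the tables from slides 16 and 17:

```python
# Two-turn computation: P(g2 | o1=OSAVE, o2=ODEL) for the voicemail example.
p_obs = {"SAVE": {"OSAVE": 0.8, "ODEL": 0.2}, "DEL": {"OSAVE": 0.2, "ODEL": 0.8}}
p_trans = {"SAVE": {"SAVE": 0.9, "DEL": 0.1}, "DEL": {"SAVE": 0.0, "DEL": 1.0}}
prior = {"SAVE": 0.5, "DEL": 0.5}

unnorm = {}
for g2 in ("SAVE", "DEL"):
    # sum over the hidden g1, then weight by the second observation
    unnorm[g2] = sum(prior[g1] * p_obs[g1]["OSAVE"] * p_trans[g1][g2]
                     for g1 in ("SAVE", "DEL")) * p_obs[g2]["ODEL"]
z = sum(unnorm.values())
post = {g: v / z for g, v in unnorm.items()}
print(round(post["SAVE"], 3))  # 0.391
```

The unnormalised totals are 0.072 and 0.112, and the normaliser is their sum, 0.184, exactly as in the table.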

19 An example – expanding further. In general we will want to compute probabilities conditional on the observations (we will call this the data, D). This always becomes a marginal on the joint distribution with the observation probabilities fixed, e.g. P(g_2 | D) ∝ Σ_{g_1} P(g_1) P(o_1 | g_1) P(g_2 | g_1) P(o_2 | g_2). These sums can be computed much more cleverly using dynamic programming.
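The dynamic-programming idea can be sketched as a forward (filtering) pass that alternates predict and update steps, so the cost grows linearly with the number of turns instead of exponentially. The tables are the voicemail example's; the function name is an illustrative choice:

```python
# Forward pass over an observation sequence (dynamic programming).
p_obs = {"SAVE": {"OSAVE": 0.8, "ODEL": 0.2}, "DEL": {"OSAVE": 0.2, "ODEL": 0.8}}
p_trans = {"SAVE": {"SAVE": 0.9, "DEL": 0.1}, "DEL": {"SAVE": 0.0, "DEL": 1.0}}

def forward(observations, prior=None):
    """Return P(g_T | o_1..o_T) by alternating predict and update steps."""
    belief = dict(prior or {"SAVE": 0.5, "DEL": 0.5})
    for t, o in enumerate(observations):
        if t > 0:  # predict: push the belief through the transition model
            belief = {g2: sum(belief[g1] * p_trans[g1][g2] for g1 in belief)
                      for g2 in belief}
        # update: weight by the observation likelihood and renormalise
        belief = {g: belief[g] * p_obs[g][o] for g in belief}
        z = sum(belief.values())
        belief = {g: v / z for g, v in belief.items()}
    return belief

print(forward(["OSAVE", "ODEL"]))  # SAVE ≈ 0.391, matching slide 18
```

Each turn touches only the previous turn's belief, which is exactly the saving dynamic programming provides over expanding the full sum.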

20 Belief Propagation. Interested in the marginals p(x | D). Assume the network is a tree with observations above and below: D = {D_a, D_b}. [Diagram: node x, with data D_a above and D_b below]

21 Belief Propagation. When we split D_b = {D_c, D_d}, these are called the messages into x. We have one message for every probability factor connected to x. [Diagram: node x, with D_a above and D_c, D_d below]

22 Belief Propagation – message passing. [Diagram: factor connecting a and b, with data D_a above a and D_b below b]

23 Belief Propagation – message passing. [Diagram: factors connecting a, b and c, with data D_a, D_b and D_c]

24 Belief Propagation. We can do the same thing repeatedly: start on one side, and keep computing p(x | D_a); then start from the other end, and keep computing p(D_b | x). To get a marginal, simply multiply these (and renormalise).

25 Belief Propagation – our example. [Diagram: chain g_1 → g_2, with observations o_1 and o_2] Write probabilities as vectors with SAVE on top.
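Following the slide's convention of writing beliefs as [P(SAVE), P(DEL)] vectors, the two sweeps and their product can be sketched on the two-turn voicemail chain. The forward messages give p(g_t | observations so far) and the backward messages give p(later observations | g_t); their normalised elementwise product is the marginal:

```python
# Forward-backward message passing on the two-turn voicemail chain.
OBS = {"OSAVE": [0.8, 0.2], "ODEL": [0.2, 0.8]}  # P(o|g) column per observation
TRANS = [[0.9, 0.1], [0.0, 1.0]]                 # TRANS[g1][g2]

def normalise(v):
    z = sum(v)
    return [x / z for x in v]

obs = ["OSAVE", "ODEL"]
# forward sweep: alpha[t] proportional to p(g_t | o_1..o_t)
alpha = normalise([0.5 * OBS[obs[0]][g] for g in range(2)])
alphas = [alpha]
for o in obs[1:]:
    pred = [sum(alpha[g1] * TRANS[g1][g2] for g1 in range(2)) for g2 in range(2)]
    alpha = normalise([pred[g] * OBS[o][g] for g in range(2)])
    alphas.append(alpha)
# backward sweep: beta[t][g] proportional to p(o_{t+1}..o_T | g_t)
beta = [1.0, 1.0]
betas = [beta]
for o in reversed(obs[1:]):
    beta = [sum(TRANS[g1][g2] * OBS[o][g2] * beta[g2] for g2 in range(2))
            for g1 in range(2)]
    betas.insert(0, beta)
# marginal at each turn = normalised elementwise product of the two messages
marginals = [normalise([a[g] * b[g] for g in range(2)])
             for a, b in zip(alphas, betas)]
print([round(p, 3) for p in marginals[1]])  # [0.391, 0.609], as in slide 18
```

Note that the turn-1 marginal also changes once the second observation is folded in through the backward message, which is exactly what the single forward pass alone cannot give.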

26 Parameter Learning – The problem. [Diagram: two-turn factored network with sub-goals g_type and g_food, user acts u_type and u_food, observation o and action a, grouped into U and G]

27 Parameter Learning – The problem. For every (action, goal, goal) triple there is a parameter: the parameters form a probability table, P(g_t | g_{t-1}, a_t). The goals are all hidden and factorised, and there are many of them. [Diagram: g_{t-1} → g_t, with action a_t]

28 Parameter Learning – Some options
1. Hand-craft (Roy et al, Zhang et al, Young et al, Thomson et al, Bui et al)
2. Annotate the user goal and use Maximum Likelihood (Williams et al, Kim et al, Henderson & Lemon) – isn't always possible
3. Expectation Maximisation (Doshi & Roy – 7 states; Syed et al – no goal changes) – uses an unfactorised state; intractable
4. Expectation Propagation (EP) – allows parameter tying (details in paper); handles factorised hidden variables; handles large state spaces; doesn't require any annotations (including of the user act), though it does use the semantic decoder output

29 Belief Propagation as message passing. [Diagram: factor between a and b, with data D_a and D_b]
Message from outside the factor, q\(a): the input message from above a.
Message from outside the factor, q\(b): the product of the input messages below b.
Message from this factor to b: q*(b). Message from this factor to a: q*(a).

30 Belief Propagation as message passing. Think in terms of approximations from each probability factor. [Diagram: factor between a and b, with data D_a and D_b]
Message from outside the network: q\(a) = p(a | D_a).
Message from outside the network: q\(b) = p(D_b | b).
Message from this factor: q*(b) = p(b | D_a).
Message from this factor: q*(a) = p(D_b | a).

31 Belief Propagation – Unknown parameters? Imagine we have a discrete choice for the parameters. Integrate over our estimate from the rest of the network; to estimate the message, we want to sum over a and b.

32 Belief Propagation – Unknown parameters? But we actually have continuous parameters. Again, integrate over our estimate from the rest of the network; to estimate the message, we want to sum over a and b (the sums over parameters become integrals).

33 Expectation Propagation. This doesn't make sense as it stands – the message must be a probability! Multiplying by the cavity distribution q\(θ) gives a proper posterior. Choose q*(θ) to minimise the KL divergence with this. If we restrict ourselves to Dirichlet distributions, we need to find the Dirichlet that best matches a mixture of Dirichlets.
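As an illustrative stand-in for the projection step, consider the one-dimensional case: approximating a mixture of Beta distributions (the two-value Dirichlet) with a single Beta. This sketch matches the mixture's mean and variance rather than minimising KL divergence properly (true KL minimisation for the Dirichlet family matches the expected sufficient statistics E[log θ], which needs digamma inversion), so treat it as a simplified illustration only:

```python
# Illustrative projection of a Beta mixture onto a single Beta by
# mean/variance matching (a simpler stand-in for KL moment matching).
def beta_mean_var(a, b):
    m = a / (a + b)
    v = a * b / ((a + b) ** 2 * (a + b + 1))
    return m, v

def project_mixture(components):
    """components: list of (weight, a, b); weights sum to 1."""
    # mixture mean and variance via the laws of total mean/variance
    mean = sum(w * beta_mean_var(a, b)[0] for w, a, b in components)
    second = sum(w * (beta_mean_var(a, b)[1] + beta_mean_var(a, b)[0] ** 2)
                 for w, a, b in components)
    var = second - mean ** 2
    # invert the Beta mean/variance formulas: a + b = m(1-m)/v - 1
    s = mean * (1 - mean) / var - 1
    return mean * s, (1 - mean) * s  # (a, b) of the single matching Beta

a, b = project_mixture([(0.6, 2.0, 5.0), (0.4, 8.0, 3.0)])
```

The component weights and parameters above are made-up numbers chosen only to exercise the projection; the resulting single Beta reproduces the mixture's first two moments exactly.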

34 Expectation Propagation – Example. [Diagram: two-turn factored network with goal g_type, user act u, observation o and action a]

35 Expectation Propagation – Example. [The same network; message passing begins]

36 Expectation Propagation – Example. Incoming observation message: p(o | inform(type=bar)) [0.5], p(o | inform(type=hotel)) [0.2].

37 Expectation Propagation – Example. This gives the user-act message: inform(type=bar) [0.5], inform(type=hotel) [0.2].

38 Expectation Propagation – Example. Combining with the act model, p(u=bar | g) 0.4 and p(u=hotel | g) 0.1, gives the goal message: type=bar [0.45], type=hotel [0.18].

39 Expectation Propagation – Example. After the transition factor the goal message becomes: type=bar [0.44], type=hotel [0.17].

40 Expectation Propagation – Example. The second turn's observation message arrives: p(o | inform(type=bar)) [0.6], p(o | inform(type=rest)) [0.3].

41 Expectation Propagation – Example. [The messages combined over both turns]

42 Expectation Propagation – Optimisation 1. In dialogue systems, most of the values are equally likely. We can use this to reduce computation: compute the q distributions only once, and multiply instead of summing the same value repeatedly. (Example: "Number of stars?" – "Three stars please".)

43 Expectation Propagation – Optimisation 2. For each value, assume the transition probability to most other values is the same (a mostly constant factor), e.g. a constant probability of change. The reduced number of parameters means we can speed up learning too!
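Under one natural reading of "constant probability of change" (an assumption here, not the talk's exact parameterisation): stay with probability 1 - eps, otherwise move to one of the other N - 1 values uniformly. The transition update then collapses from an O(N^2) matrix product to an O(N) rescaling, since new[v] = (1 - eps) * old[v] + eps/(N-1) * (1 - old[v]):

```python
# O(N) transition update for a constant-probability-of-change model.
def transition_update(belief, eps):
    n = len(belief)
    return [(1 - eps) * p + eps / (n - 1) * (1 - p) for p in belief]

# cross-check against the full O(N^2) matrix computation on a small example
belief = [0.7, 0.2, 0.1]
eps = 0.3
full = [sum(belief[i] * ((1 - eps) if i == j else eps / 2) for i in range(3))
        for j in range(3)]
fast = transition_update(belief, eps)
```

The same collapse is what makes the reduced parameter count pay off at learning time: there is effectively one free parameter (eps) per factor instead of a full N-by-N table.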

44 Results – Computation times. [Bar chart: computation times with no optimisation, grouping, constant change, and both optimisations]

45 Results – Simulated re-ranking. Train on 1000 simulated dialogues; re-rank simulated semantics on 1000 dialogues. Oracle accuracy is 93.5%.
TAcc – semantic accuracy of the top hypothesis
NCE – Normalised Cross Entropy score (confidence scores)
ICE – Item Cross Entropy score (accuracy + confidence)
[Table: TAcc / NCE / ICE for no rescoring, trained with noisy semantics, and trained with semantic annotations]

46 Results – Data re-ranking. Train on the Mar09 TownInfo trial data (720 dialogues); test on the Feb08 TownInfo trial data (648 dialogues). Oracle accuracy is 79.2%.
[Table: TAcc / NCE / ICE for no rescoring, trained with noisy semantics, and trained with semantic annotations]

47 Results – Simulated dialogue management. Use reinforcement learning (the Natural Actor Critic algorithm) to train two systems: one uses hand-crafted parameters; the other uses parameters learned from 1000 simulated dialogues.

48 Results – Live evaluations (control). Tested in the Spoken Dialogue Challenge: provide bus timetables in Pittsburgh; 800 road names (pairs represent a stop); required to get the place from, the place to, and the time. All parameters of the Cambridge system were hand-crafted.
[Table: number of dialogues, number of successes, % success and WER for the BASELINE and Cambridge systems]

49 Results – Live evaluations (control). [Plot: estimated success rate against WER, with success and failure points for the CAM and BASELINE systems]

50 Summary
POMDP models are an effective model of dialogue: for use in dialogue systems, and for re-ranking semantic hypotheses off-line.
Expectation Propagation allows parameter learning for complex models, without annotations of the dialogue state.
Experiments show: EP gives improvements in re-ranked hypotheses; EP gives improvements in simulated dialogue management performance; probabilistic belief gives improvements in live dialogue management performance.

51 Current/Future work. Using the POMDP as a simulator too. Need to change the model to better handle user acts (the sub-acts are not independent!). [Diagram: network linking g_type and g_food, with user acts u_type and u_food and a joint act u]

52 The End – Thanks! Dialogue Group homepage: My homepage:

53 Expectation Propagation – Optimisations. [Diagram: factor between A and B. Assume the message is constant for values of A outside A*; compute this part offline.]

