Statistical Dialogue Modelling Milica Gašić Dialogue Systems Group.


2 Statistical Dialogue Modelling Milica Gašić Dialogue Systems Group

3 Why are current methods poor?

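4 Dialogue Manager. A spoken dialogue system is a pipeline: Speech Recogniser → Semantic Decoder → Dialogue Manager → Natural Language Generator → Text to Speech Synthesiser. For example, the user says "I'm looking for a restaurant", the semantic decoder outputs inform(type=restaurant), the dialogue manager replies with request(food), and the generator produces "What kind of food would you like?". The dialogue manager has two parts: the dialogue model (what does the user want?) and the dialogue policy (what to say back to the user?).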
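The dialogue acts exchanged between these components, such as inform(type=restaurant) and request(food), share a simple act-type(slot=value, ...) shape. A minimal sketch of a parser for that string form follows, assuming acts are written as on these slides (no nested parentheses, no commas or equals signs inside values); the function name parse_act is hypothetical.

```python
import re
from typing import List, Tuple

# Matches "acttype(args)", allowing a space before "(" and empty args, e.g. "hello()".
_ACT_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")


def parse_act(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a dialogue act string into its act type and (slot, value) pairs.

    Arguments without '=' (as in request(food)) are returned with value ''.
    """
    match = _ACT_RE.match(text)
    if match is None:
        raise ValueError(f"not a dialogue act: {text!r}")
    act_type, args = match.group(1), match.group(2).strip()
    pairs: List[Tuple[str, str]] = []
    if args:
        for arg in args.split(","):
            slot, _, value = arg.partition("=")
            pairs.append((slot.strip(), value.strip()))
    return act_type, pairs


print(parse_act("inform(type=restaurant,food=thai)"))
print(parse_act("request(food)"))
```

Keeping the act type separate from the slot-value pairs makes it easy to pool probabilities per slot value, which is what the trackers in the later examples do.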

5 Elements of Dialogue Management. At each turn $t = 1, \dots, T$ the dialogue has a state $s_t$ (what the user wants) and an observation $o_t$ (what the system hears), and at turns $1, \dots, T-1$ an action $a_t$ (what the system says).

6 Example (Commercial systems). The user says "I'm looking for a Thai restaurant." and the system observes hello(type=restaurant) 0.6, inform(type=restaurant,food=thai) 0.4. The state records only type=restaurant (R), with score 0.6, and the system asks "You are looking for a restaurant right?". The user answers "Thai." and the system observes hello() 0.5, inform(food=turkish) 0.3, inform(food=thai) 0.2. The state still records only type=R: the food slot is never filled, because Thai is never the top hypothesis.
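The behaviour on this slide can be reproduced by a tracker that commits to the single highest-scoring semantic hypothesis at each turn. The sketch below assumes exactly that rule; the name one_best_tracker and the data layout are illustrative, and hello() is represented as a hypothesis with no slot-value pairs.

```python
from typing import Dict, List, Tuple

# One N-best list per user turn: (slot-value pairs in the hypothesis, probability).
Hypothesis = Tuple[Dict[str, str], float]

TURNS: List[List[Hypothesis]] = [
    # "I'm looking for a Thai restaurant."
    [({"type": "restaurant"}, 0.6), ({"type": "restaurant", "food": "thai"}, 0.4)],
    # "Thai."  (hello() carries no slot-value pairs)
    [({}, 0.5), ({"food": "turkish"}, 0.3), ({"food": "thai"}, 0.2)],
]


def one_best_tracker(turns: List[List[Hypothesis]]) -> List[Dict[str, str]]:
    """Keep only the top hypothesis at each turn; return the state after every turn."""
    state: Dict[str, str] = {}
    history: List[Dict[str, str]] = []
    for nbest in turns:
        top_slots, _ = max(nbest, key=lambda hyp: hyp[1])
        state.update(top_slots)  # lower-ranked hypotheses are discarded entirely
        history.append(dict(state))
    return history


for turn, state in enumerate(one_best_tracker(TURNS), start=1):
    print(turn, state)
```

Because the 0.4 and 0.2 hypotheses for Thai are thrown away, the evidence for Thai never accumulates.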

7 Example (Baseline tracker in the practical session). Same dialogue and observations as above. The system asks "What kind of food do you want?" and, after the user's "Thai.", the tracker holds type=restaurant (R) with probability 1.0 and food Thai (TH) 0.4, Turkish (TR) 0.3.
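The slide does not spell out the baseline's update rule. One rule that reproduces the numbers shown, assumed here for illustration, keeps for every slot value the highest probability it has received so far, where the per-turn probability slu(s) sums all hypotheses that mention s (so type=restaurant gets 0.6 + 0.4 = 1.0). The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hypothesis = Tuple[Dict[str, str], float]

TURNS: List[List[Hypothesis]] = [
    [({"type": "restaurant"}, 0.6), ({"type": "restaurant", "food": "thai"}, 0.4)],
    [({}, 0.5), ({"food": "turkish"}, 0.3), ({"food": "thai"}, 0.2)],
]


def slu_marginals(nbest: List[Hypothesis]) -> Dict[str, Dict[str, float]]:
    """slu(s): total probability of the hypotheses that mention each slot=value."""
    marg: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for slots, prob in nbest:
        for slot, value in slots.items():
            marg[slot][value] += prob
    return marg


def max_tracker(turns: List[List[Hypothesis]]) -> Dict[str, Dict[str, float]]:
    """Score each slot value by the highest slu probability it has received so far."""
    belief: Dict[str, Dict[str, float]] = defaultdict(dict)
    for nbest in turns:
        for slot, values in slu_marginals(nbest).items():
            for value, p in values.items():
                belief[slot][value] = max(p, belief[slot].get(value, 0.0))
    return belief


belief = max_tracker(TURNS)
print({slot: dict(values) for slot, values in belief.items()})
```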

8 Example (Focus tracker in the practical session). Same dialogue and observations. After the first turn the tracker holds type=R 1.0 and food=TH 0.4, and the system asks "What kind of food do you want?". After "Thai." it holds type=R 1.0, food=TH 0.4 and TR 0.3, and the system asks "Did you say Thai or Turkish?".

9 Challenges in dialogue modelling: How to define the state space? How to tractably maintain the belief state? Which actions to take?

10 1. How to represent dialogue state? The state needs to know what happened before, the dialogue history, so that it satisfies the Markov property; what the user wants, the user goal, because the dialogue is task-oriented; and what the user says, the user act, so that it is robust to errors.

11 Ontology defines possible user goals: the type is restaurant or hotel; a restaurant has the slots area (north, south) and food (Chinese, Indian); a hotel has the slot stars.
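An ontology like this is naturally a mapping from slots to their possible values, and the set of user goals is the Cartesian product of those value sets. The sketch below covers only the restaurant branch of the slide (the hotel's star values are not given) and assumes each goal picks exactly one value per slot, ignoring "don't care"; enumerate_goals is a hypothetical helper.

```python
from itertools import product
from typing import Dict, List

# Restaurant part of the ontology on the slide: each slot and its possible values.
RESTAURANT_SLOTS: Dict[str, List[str]] = {
    "area": ["north", "south"],
    "food": ["Chinese", "Indian"],
}


def enumerate_goals(slots: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """List every user goal: one value chosen for each slot."""
    names = list(slots)
    return [dict(zip(names, combo)) for combo in product(*(slots[n] for n in names))]


goals = enumerate_goals(RESTAURANT_SLOTS)
print(len(goals))  # 2 * 2 = 4
for goal in goals:
    print(goal)

# The count is a product over slots, so it grows exponentially with the number of
# slots: 10 slots with 10 values each would already give 10**10 goals.
```

This multiplicative growth is why tracking a belief over whole goals quickly becomes intractable.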

12 Example of a belief state (PRACTICAL SESSION): goal-labels: food = British 1.0, area = Don't-care 0.97; method-label: byconstraints 0.72; requested-slots: pricerange 0.88.
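One convenient way to hold this belief state in code is a nested dictionary that mirrors the slide's labels. The layout and the helper top_value below are illustrative assumptions; only the values shown on the slide are included, so the remaining probability mass of each distribution is implicit.

```python
from typing import Dict, Optional, Tuple

# The belief state from the slide as nested dictionaries of probabilities.
belief_state = {
    "goal-labels": {"food": {"British": 1.0}, "area": {"Don't-care": 0.97}},
    "method-label": {"byconstraints": 0.72},
    "requested-slots": {"pricerange": 0.88},
}


def top_value(dist: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return the most probable (value, probability) pair, or None if empty."""
    if not dist:
        return None
    value = max(dist, key=dist.get)
    return value, dist[value]


for slot, dist in belief_state["goal-labels"].items():
    print(slot, top_value(dist))
print("method", top_value(belief_state["method-label"]))
```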

13 2. How to track belief state?

14 Generative vs Discriminative models. Discriminative models: the state depends on the observation ($o_t \rightarrow s_t$). Generative models: the state generates the observation ($s_t \rightarrow o_t$).
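In the slide's notation, the difference can be stated as follows (a sketch, not taken from the slides): a discriminative tracker models the posterior over the state directly, while a generative model specifies how the state produces the observation and recovers the posterior with Bayes' rule.

```latex
\text{Discriminative:}\quad p(s_t \mid o_t) \text{ is modelled directly}
\\[4pt]
\text{Generative:}\quad p(s_t \mid o_t) = \frac{p(o_t \mid s_t)\, p(s_t)}{\sum_{s'} p(o_t \mid s')\, p(s')}
```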

15 Focus Dialogue State Tracker: a discriminative model. In its update rule, slu(s) is the probability that the semantic decoder assigns to the state element s (TO BE IMPLEMENTED AT THE PRACTICAL SESSION).
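For reference, the focus update is commonly written as $p_t(s) = \big(1 - \sum_{s'} \mathrm{slu}_t(s')\big)\, p_{t-1}(s) + \mathrm{slu}_t(s)$, where the sum runs over the values of the same slot observed at turn $t$; this is the form used by the DSTC2 "focus" baseline and is assumed here to match the practical. Probability mass on a slot shifts towards whatever the user has just said about it, and slots not mentioned keep their old belief. The sketch applies it to the example dialogue; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hypothesis = Tuple[Dict[str, str], float]

TURNS: List[List[Hypothesis]] = [
    [({"type": "restaurant"}, 0.6), ({"type": "restaurant", "food": "thai"}, 0.4)],
    [({}, 0.5), ({"food": "turkish"}, 0.3), ({"food": "thai"}, 0.2)],
]


def slu_marginals(nbest: List[Hypothesis]) -> Dict[str, Dict[str, float]]:
    """slu(s): total probability of the hypotheses that mention each slot=value."""
    marg: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for slots, prob in nbest:
        for slot, value in slots.items():
            marg[slot][value] += prob
    return marg


def focus_tracker(turns: List[List[Hypothesis]]) -> Dict[str, Dict[str, float]]:
    belief: Dict[str, Dict[str, float]] = defaultdict(dict)
    for nbest in turns:
        for slot, slu in slu_marginals(nbest).items():
            # Probability that this turn says something about the slot at all.
            mentioned = sum(slu.values())
            for value in set(belief[slot]) | set(slu):
                old = belief[slot].get(value, 0.0)
                belief[slot][value] = (1.0 - mentioned) * old + slu.get(value, 0.0)
    return belief


belief = focus_tracker(TURNS)
print({slot: dict(values) for slot, values in belief.items()})
```

On the second turn, 0.5 of the food mass is "re-assigned": Thai keeps 0.5 × 0.4 and gains 0.2, giving 0.4, while Turkish gets 0.3.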

16 Partially Observable Markov Decision Process (POMDP): a generative model. [Figure: graphical model with action $a_t$, reward $r_t$, states $s_t$, $s_{t+1}$ and observations $o_t$, $o_{t+1}$.] The state is unobservable and depends on the previous state and action through the transition probability $P(s_{t+1} \mid s_t, a_t)$. The state generates a noisy observation through the observation probability $P(o_t \mid s_t)$.
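Combining the two probabilities gives the exact belief update $b_{t+1}(s') \propto P(o_{t+1} \mid s') \sum_{s} P(s' \mid s, a_t)\, b_t(s)$. A minimal sketch over a tiny, made-up state space follows: two possible food goals, with invented transition and observation probabilities. All names and numbers are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical two-state example: the user's food goal.
STATES: List[str] = ["thai", "turkish"]

# P(s' | s, a): assumed the goal persists with probability 0.9 after request(food).
TRANSITION: Dict[Tuple[str, str, str], float] = {
    (s, "request(food)", s_next): (0.9 if s == s_next else 0.1)
    for s in STATES
    for s_next in STATES
}

# P(o | s'): assumed the recogniser reports the true food 80% of the time.
OBSERVATION: Dict[Tuple[str, str], float] = {
    (s, o): (0.8 if o == "heard_" + s else 0.2)
    for s in STATES
    for o in ("heard_thai", "heard_turkish")
}


def belief_update(belief: Dict[str, float], action: str, observation: str) -> Dict[str, float]:
    """b'(s') proportional to P(o | s') * sum_s P(s' | s, a) * b(s)."""
    unnormalised: Dict[str, float] = {}
    for s_next in STATES:
        # Summation over every previous state: the costly step in large state spaces.
        predicted = sum(TRANSITION.get((s, action, s_next), 0.0) * belief[s] for s in STATES)
        unnormalised[s_next] = OBSERVATION.get((s_next, observation), 0.0) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}


print(belief_update({"thai": 0.7, "turkish": 0.3}, "request(food)", "heard_thai"))
```

The nested loop costs $|S|^2$ operations per turn, which is harmless here but intractable when $s$ ranges over every combination of goal, user act and history.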

17 A bit of theory… Belief propagation. Probabilities are conditional on the observations. We are interested in the marginal probabilities $p(x \mid D)$, where the observations are split as $D = \{D_a, D_b\}$ on the two sides of node $x$.

18 A bit of theory… Belief propagation. Split $D_b$ further into $D_c$ and $D_d$.

19 A bit of theory… Belief propagation. [Figure: nodes $a$ and $c$ with data $D_a$ and $D_c$, and node $b$ with data $D_b$.]

20 A bit of theory… Belief propagation. [Figure: nodes $a$ and $b$ with data $D_a$ and $D_b$.]
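The step these slides build towards can be written out as follows, assuming (as the figures suggest) that $D_a$ and $D_b$ are conditionally independent given $x$. The marginal splits into a message from one side, $p(x \mid D_a)$, and a message from the other, $p(D_b \mid x)$, and the same split can be applied recursively when $D_b$ is divided further.

```latex
p(x \mid D_a, D_b)
  = \frac{p(D_b \mid x, D_a)\, p(x \mid D_a)}{p(D_b \mid D_a)}
  = \frac{p(D_b \mid x)\, p(x \mid D_a)}{p(D_b \mid D_a)}
  \propto p(x \mid D_a)\, p(D_b \mid x)
```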

21 Belief state tracking. [Figure: the POMDP graphical model with $a_t$, $r_t$, $s_t$, $s_{t+1}$, $o_t$, $o_{t+1}$.] Updating the belief state requires a summation over all possible dialogue states at every dialogue turn, which is intractable!

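22 Dialogue state factorisation. Decompose the state $s_t$ into conditionally independent elements: the user goal $g_t$, the user action $u_t$ and the dialogue history $d_t$. [Figure: factorised model with $a_t$, $r_t$, $g_t$, $u_t$, $d_t$, $o_t$ and $g_{t+1}$, $u_{t+1}$, $d_{t+1}$, $o_{t+1}$.]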

23 Belief update. [Figure: factorised model with $a_t$, $r_t$, $g_t$, $u_t$, $d_t$, $o_t$ and $g_{t+1}$, $u_{t+1}$, $d_{t+1}$, $o_{t+1}$.] The update still requires a summation over all possible goals, which is intractable, and a summation over all possible histories and user actions, which is also intractable.

24 Bayesian Update of Dialogue State (BUDS) system: further decomposes the dialogue state, giving a tractable belief state update and allowing the shape of the distributions to be learned.

25 Bayesian network model for dialogue. [Figure: the factorised model with $g_t$, $u_t$, $d_t$ each split per slot, e.g. $g_t^{food}$, $d_t^{food}$, $u_t^{food}$ and $g_t^{area}$, $d_t^{area}$, $u_t^{area}$, with the same structure at turn $t+1$.]

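26 Belief tracking. For each node $x$: start at one end of the network and propagate towards $x$ to obtain $p(x \mid D_a)$; then start at the other end and propagate back to obtain $p(D_b \mid x)$. To get the marginal, multiply these and normalise: $p(x \mid D) \propto p(x \mid D_a)\, p(D_b \mid x)$.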
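For a chain-structured model (a hidden Markov model, the simplest special case of such networks) this recipe is the forward-backward algorithm. In the sketch below the forward pass computes $p(x_t \mid o_{1:t})$, the backward pass computes a quantity proportional to $p(o_{t+1:T} \mid x_t)$, and their normalised product is the marginal $p(x_t \mid o_{1:T})$. The two-state parameters in the usage line are made up.

```python
from typing import List


def forward_backward(
    prior: List[float],        # prior[i] = p(x_1 = i)
    trans: List[List[float]],  # trans[i][j] = p(x_{t+1} = j | x_t = i)
    emit: List[List[float]],   # emit[i][o] = p(o | x = i)
    obs: List[int],
) -> List[List[float]]:
    n, T = len(prior), len(obs)
    # Forward message: alpha[t][i] = p(x_t = i | o_1..o_t), normalised at each step.
    alpha: List[List[float]] = []
    for t, o in enumerate(obs):
        if t == 0:
            pred = list(prior)
        else:
            pred = [sum(alpha[t - 1][i] * trans[i][j] for i in range(n)) for j in range(n)]
        a = [pred[j] * emit[j][o] for j in range(n)]
        z = sum(a)
        alpha.append([v / z for v in a])
    # Backward message: beta[t][i] proportional to p(o_{t+1}..o_T | x_t = i).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
             for i in range(n)]
        z = sum(b)
        beta[t] = [v / z for v in b]
    # Marginal: multiply the two messages and normalise.
    marginals = []
    for t in range(T):
        m = [alpha[t][i] * beta[t][i] for i in range(n)]
        z = sum(m)
        marginals.append([v / z for v in m])
    return marginals


print(forward_backward([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]], [0, 1]))
```

Each message is rescaled to sum to one purely for numerical stability; the constants cancel in the final normalisation.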

27 3. Which action to take? Given the belief state type=R 1.0, food TH 0.4, TR 0.3, the system could say "What kind of food do you want?" (request(food)), "Did you say you want Thai food?" (confirm(food=Thai)) or "Do you want Thai or Turkish?" (select(food=Thai,food=Turkish)).

28 Dialogue as a sequential decision process. [Figure: turns with states $s_1, \dots, s_T$, observations $o_1, \dots, o_T$, actions $a_1, \dots, a_{T-1}$ and rewards $r_1, \dots, r_T$.] We want to optimise the sequence of actions based on the reward.

29 Reward function. The reward is a measure of how good the dialogue is.
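The slide leaves the reward unspecified. A common choice in this line of work, assumed here, is a fixed bonus (20) for a successful dialogue minus 1 per turn, so that short successful dialogues score highest. The sketch computes those per-turn rewards and their discounted sum; the names and constants are illustrative.

```python
from typing import List

SUCCESS_BONUS = 20.0  # assumed reward for a successful dialogue
TURN_PENALTY = 1.0    # assumed cost of each turn


def turn_rewards(num_turns: int, success: bool) -> List[float]:
    """Per-turn rewards: -1 every turn, plus the success bonus at the final turn."""
    rewards = [-TURN_PENALTY] * num_turns
    if success and num_turns > 0:
        rewards[-1] += SUCCESS_BONUS
    return rewards


def discounted_return(rewards: List[float], gamma: float = 1.0) -> float:
    """Sum of gamma**k * r_k, with the discount factor gamma in (0, 1]."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return sum(gamma ** k * r for k, r in enumerate(rewards))


print(discounted_return(turn_rewards(5, success=True)))   # 20 - 5 = 15.0
print(discounted_return(turn_rewards(5, success=False)))  # -5.0
```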

30 Q-function. The Q-function measures the expected discounted reward that can be obtained from a belief state when a given action is taken. It takes into account the rewards of future actions. Optimising the Q-function is equivalent to optimising the policy. Its components are the discount factor in (0,1], the reward, the starting belief state, the starting action, and the expectation with respect to the policy π.
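Putting the listed components together, one standard way to write this definition (assumed here) uses $b$ for the starting belief state, $a$ for the starting action, $r$ for the reward and $\gamma \in (0,1]$ for the discount factor. The optimal policy then acts greedily with respect to the optimal Q-function, which is why optimising one is equivalent to optimising the other.

```latex
Q^{\pi}(b, a) = \mathbb{E}_{\pi}\Big[\sum_{k=0}^{T-t} \gamma^{k}\, r_{t+k} \;\Big|\; b_t = b,\; a_t = a\Big],
\qquad
\pi^{*}(b) = \arg\max_{a} Q^{*}(b, a)
```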

31 How to optimise the Q-function? On-line, in interaction with the environment. Standard methods take too many dialogues, so they need a simulated user. Alternatively, use a sample-efficient method (e.g. Gaussian processes, Kalman filters) and optimise in direct interaction with human users.

32 Learning in interaction with real people. Success rate (%): simulator-trained policy 93.5 ± 1.2; on-line trained policy 96.8 ± 0.9.

33 Conclusions. Statistical dialogue modelling requires a compact state representation; the dialogue model can be generative or discriminative; the policy is optimised to maximise the reward.

