Dialogue as a Partially Observable Markov Decision Process (POMDP)
[Diagram: POMDP with action $a_t$, states $s_t$ and $s_{t+1}$, reward $r_t$, and observations $o_t$ and $o_{t+1}$]
The state is unobservable and depends on the previous state and action: $P(s_{t+1} \mid s_t, a_t)$, the transition probability.
The state emits a noisy observation: $P(o_t \mid s_t)$, the observation probability.
Action selection (the policy) is based on the distribution over all states at every time step $t$: the belief state $b(s_t)$.
How to track the belief state?
[Diagram: POMDP with nodes $a_t$, $s_t$, $s_{t+1}$, $r_t$, $o_t$, $o_{t+1}$]
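For reference, the standard POMDP belief update can be written with a transition probability $P(s_{t+1} \mid s_t, a_t)$ and an observation probability $P(o_{t+1} \mid s_{t+1})$. The normalising constant $\eta$ is notation introduced here and does not appear in the slides.

```latex
b(s_{t+1}) = \eta \, P(o_{t+1} \mid s_{t+1}) \sum_{s_t} P(s_{t+1} \mid s_t, a_t) \, b(s_t)
```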
Belief state tracking
[Diagram: POMDP with nodes $a_t$, $s_t$, $s_{t+1}$, $r_t$, $o_t$, $o_{t+1}$]
Requires summation over all possible dialogue states at every dialogue turn: intractable!
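A minimal sketch of one exact belief update step, to make the cost of the summation concrete. The two-state toy problem, the state and observation names, and all probabilities are assumptions chosen for illustration; they do not come from the slides.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Return the new belief after taking `action` and receiving `observation`.

    belief:            dict mapping state -> probability
    transition:        dict mapping (state, action) -> {next_state: probability}
    observation_model: dict mapping next_state -> {observation: probability}
    """
    states = list(belief)
    unnormalised = {}
    for next_state in states:
        # Sum over every previous state: this inner loop is the expensive part.
        predicted = sum(
            transition[(state, action)].get(next_state, 0.0) * belief[state]
            for state in states
        )
        likelihood = observation_model[next_state].get(observation, 0.0)
        unnormalised[next_state] = likelihood * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {state: p / total for state, p in unnormalised.items()}


# Hypothetical toy problem with two hidden user goals.
transition = {
    ("wants_cheap", "ask_price"): {"wants_cheap": 0.9, "wants_expensive": 0.1},
    ("wants_expensive", "ask_price"): {"wants_cheap": 0.1, "wants_expensive": 0.9},
}
observation_model = {
    "wants_cheap": {"heard_cheap": 0.8, "heard_expensive": 0.2},
    "wants_expensive": {"heard_cheap": 0.3, "heard_expensive": 0.7},
}
belief = {"wants_cheap": 0.5, "wants_expensive": 0.5}
new_belief = belief_update(
    belief, "ask_price", "heard_cheap", transition, observation_model
)
print(new_belief)
```

The nested loop visits every pair of states, so the cost per turn grows quadratically with the size of the state space. That is what makes the exact update impractical for realistic dialogue states.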
Challenges in POMDP dialogue modelling
How to define the state space?
How to tractably maintain the belief state?
How to define the transition and observation probabilities?
How to represent the dialogue state?
It needs to know what happened before, the dialogue history (Markov property).
It needs to know what the user wants, the user goal (task-oriented dialogue).
It needs to know what the user says, the user act (robustness to errors).
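Dialogue state factorisation
Decompose the state $s_t$ into conditionally independent elements: the user goal $g_t$, the user action $u_t$, and the dialogue history $d_t$.
[Diagram: factorised network with nodes $a_t$, $r_t$, $o_t$, $o_{t+1}$, $g_t$, $u_t$, $d_t$, $g_{t+1}$, $u_{t+1}$, $d_{t+1}$]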
Belief update
[Diagram: factorised network with nodes $a_t$, $r_t$, $o_t$, $o_{t+1}$, $g_t$, $u_t$, $d_t$, $g_{t+1}$, $u_{t+1}$, $d_{t+1}$]
Requires summation over all possible goals: intractable!
Requires summation over all possible histories and user actions: intractable!
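One common way to write the factorised update is sketched below. It assumes that the observation depends only on the user action, the user action on the new goal and the system action, and the history on the new goal, the new user action, the previous history, and the system action. These independence assumptions and the normalising constant $\eta$ are stated here for illustration and are not taken from the slides.

```latex
b(g_{t+1}, u_{t+1}, d_{t+1}) = \eta \, P(o_{t+1} \mid u_{t+1}) \, P(u_{t+1} \mid g_{t+1}, a_t)
  \sum_{g_t} P(g_{t+1} \mid g_t, a_t)
  \sum_{d_t} P(d_{t+1} \mid g_{t+1}, u_{t+1}, d_t, a_t)
  \sum_{u_t} b(g_t, u_t, d_t)
```

The outer sum runs over all goals and the inner sums over all histories and user actions, which are the two sources of intractability named above.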
Dialogue models for real-world dialogue systems
Hidden Information State (HIS) system
Bayesian Update of Dialogue State (BUDS) system
Hidden Information State system
A real-world dialogue system based on a POMDP.
Takes an N-best list of user utterances as input.
Maintains a distribution over the most probable dialogue states in real time.
Hidden Information State system: dialogue acts
Example utterance: "Is there um maybe a cheap place in the centre of town please?"
Dialogue act: inform(pricerange=cheap, area=centre)
A dialogue act consists of a dialogue act type (inform, request, confirm, …) and semantic slots and values (type=restaurant, food=Chinese, …).
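A dialogue act of this form can be held in a small data structure. The sketch below is one possible representation; the class name `DialogueAct` and the function `parse_dialogue_act` are hypothetical and are not part of the HIS system.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DialogueAct:
    """A dialogue act: an act type plus slot-value pairs."""
    act_type: str
    slots: dict = field(default_factory=dict)


# Matches "type(body)", allowing spaces around the brackets.
_ACT_PATTERN = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")


def parse_dialogue_act(text):
    """Parse a string such as 'inform(pricerange=cheap, area=centre)'."""
    match = _ACT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a dialogue act: {text!r}")
    act_type, body = match.groups()
    slots = {}
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            slot, value = (part.strip() for part in item.split("=", 1))
            slots[slot] = value
        else:
            # A bare slot name, as in request(task), has no value.
            slots[item] = None
    return DialogueAct(act_type, slots)


act = parse_dialogue_act("inform ( pricerange = cheap, area = centre)")
print(act.act_type, act.slots)
```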
Hidden Information State system: ontology
[Diagram: ontology tree with nodes type, restaurant, area (north, south), food (Chinese, Indian), hotel, stars]
Hidden Information State system: belief update
Only the user acts from the N-best list are considered.
Dialogue histories take a small number of values.
Goals are grouped into partitions.
All probabilities are handcrafted.
Dialogue history in the HIS system
The dialogue history ideally represents everything that has happened.
History states: system informed, user informed, user requested, system requested.
For each concept in the dialogue, a history state is either 1 or 0 and is defined by a finite state automaton.
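A finite state automaton over history states can be sketched as a lookup table keyed by the current state and the latest event. The transition table, the event names, and the extra "init" state below are assumptions made for illustration; the actual automaton used in the HIS system is not given in the slides.

```python
HISTORY_STATES = (
    "init",
    "user_requested",
    "user_informed",
    "system_requested",
    "system_informed",
)

# Hypothetical transitions: (current state, event) -> next state.
TRANSITIONS = {
    ("init", "user_inform"): "user_informed",
    ("init", "user_request"): "user_requested",
    ("init", "system_request"): "system_requested",
    ("system_requested", "user_inform"): "user_informed",
    ("user_requested", "system_inform"): "system_informed",
    ("user_informed", "system_inform"): "system_informed",
    ("system_informed", "user_request"): "user_requested",
}


def step(state, event):
    """Advance the automaton; events with no listed transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def track_history(events):
    """Track one history state per concept from a sequence of (concept, event) pairs."""
    history = {}
    for concept, event in events:
        history[concept] = step(history.get(concept, "init"), event)
    return history


def indicator(state):
    """Encode a history state as 0/1 flags, one per possible state."""
    return {name: int(name == state) for name in HISTORY_STATES}


events = [
    ("area", "system_request"),
    ("area", "user_inform"),
    ("food", "user_request"),
    ("food", "system_inform"),
]
history = track_history(events)
print(history)
print(indicator(history["area"]))
```

Because each concept has only a handful of history states, the history part of the dialogue state stays small even in long dialogues.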
HIS partitions
Represent groups of (most probable) goals.
Dynamically built during the dialogue.
The goal transition probability $P(g_{t+1} \mid g_t, a_t)$ is set to a high value if $g_{t+1}$ is in line with $g_t$ and $a_t$, otherwise to a small value.
HIS partitions: example
System: How may I help you? request(task)
User: I’d like a restaurant in the centre. inform(entity=venue, type=restaurant, area=centre)
[Diagram: partition tree after splitting on entity=venue, type=restaurant, and area=central. Partitions: entity=!venue; entity=venue, type=!restaurant; entity=venue, type=restaurant, area=!central; entity=venue, type=!restaurant, area=central; entity=venue, type=restaurant, area=central]
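The splitting in this example can be sketched as follows. Each partition records which slot values it requires and which it rules out, and a split divides its belief mass between a matching child and a remainder. The `Partition` class, the `SLOT_PARENT` table, and the split fractions are hypothetical, chosen only to reproduce the shape of the tree; they are not the HIS implementation.

```python
from dataclasses import dataclass, field

# Hypothetical ontology fragment: a slot only applies under a given parent value.
SLOT_PARENT = {"type": ("entity", "venue"), "area": ("entity", "venue")}


@dataclass
class Partition:
    equals: dict = field(default_factory=dict)    # slot -> required value
    excludes: dict = field(default_factory=dict)  # slot -> set of ruled-out values
    belief: float = 1.0


def applicable(partition, slot):
    """A slot can be split on only if its parent value is fixed in the partition."""
    parent = SLOT_PARENT.get(slot)
    return parent is None or partition.equals.get(parent[0]) == parent[1]


def split(partition, slot, value, fraction):
    """Split on slot=value; `fraction` of the belief goes to the matching child."""
    decided = slot in partition.equals or value in partition.excludes.get(slot, set())
    if decided or not applicable(partition, slot):
        return [partition]
    matching = Partition(
        equals={**partition.equals, slot: value},
        excludes={s: set(v) for s, v in partition.excludes.items() if s != slot},
        belief=partition.belief * fraction,
    )
    remainder_excludes = {s: set(v) for s, v in partition.excludes.items()}
    remainder_excludes.setdefault(slot, set()).add(value)
    remainder = Partition(
        equals=dict(partition.equals),
        excludes=remainder_excludes,
        belief=partition.belief * (1.0 - fraction),
    )
    return [matching, remainder]


def split_all(partitions, slot, value, fraction):
    """Apply one split to every partition in the current set."""
    result = []
    for partition in partitions:
        result.extend(split(partition, slot, value, fraction))
    return result


# Start from a single partition covering all goals, then split on each
# slot-value pair mentioned by the user. Fractions are illustrative only.
partitions = [Partition()]
for slot, value, fraction in [
    ("entity", "venue", 0.9),
    ("type", "restaurant", 0.2),
    ("area", "central", 0.5),
]:
    partitions = split_all(partitions, slot, value, fraction)

for partition in partitions:
    print(partition)
```

Only the values mentioned in the dialogue create new partitions, so the number of partitions grows with what has been said rather than with the size of the ontology.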
Pruning
[Diagram: partition tree with partitions entity=!venue; entity=venue, type=!restaurant; entity=venue, type=restaurant, area=!central; entity=venue, type=!restaurant, area=central; entity=venue, type=restaurant, area=central. Annotations: entity=venue 0.9, type=restaurant 0.2, area=central 0.5]
Hidden Information State systems
Any limitations?
Bayesian Update of Dialogue State system
Further decomposes the dialogue state.
Tractable belief state update.
Learning of the shape of the distributions.
Bayesian network model for dialogue
[Diagram: the factorised network with nodes $a_t$, $r_t$, $o_t$, $o_{t+1}$, $g$, $u$, $d$, further decomposed per slot into goal, user action, and history nodes for food and for area at time steps $t$ and $t+1$]
Belief tracking
For each node $x$:
Start on one side and keep propagating to obtain $p(x \mid D_a)$.
Then start on the other end and keep propagating to obtain $p(D_b \mid x)$.
To get the marginal, simply multiply these (up to normalisation).
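On a simple chain of discrete variables, this two-pass procedure is the forward-backward algorithm. The sketch below assumes a chain with one shared transition matrix and a local evidence likelihood per node; the function name and the numbers are illustrative and are not taken from the BUDS system, which passes messages over a richer network.

```python
def normalise(vector):
    """Scale a list of non-negative numbers so that it sums to one."""
    total = sum(vector)
    if total == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [v / total for v in vector]


def chain_marginals(prior, transition, likelihoods):
    """Posterior marginal of every node in a chain.

    prior:       list, prior over the first node
    transition:  transition[i][j] = p(next = j | current = i)
    likelihoods: likelihoods[t][j] = p(evidence at node t | node t = j)
    """
    n_states = len(prior)
    n_steps = len(likelihoods)

    # Forward pass: p(x | D_a), the evidence up to and including this node.
    forward = []
    predicted = prior
    for t in range(n_steps):
        if t > 0:
            predicted = [
                sum(forward[t - 1][i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        forward.append(
            normalise([predicted[j] * likelihoods[t][j] for j in range(n_states)])
        )

    # Backward pass: proportional to p(D_b | x), the evidence after this node.
    backward = [[1.0] * n_states for _ in range(n_steps)]
    for t in range(n_steps - 2, -1, -1):
        backward[t] = normalise([
            sum(
                transition[i][j] * likelihoods[t + 1][j] * backward[t + 1][j]
                for j in range(n_states)
            )
            for i in range(n_states)
        ])

    # Multiply the two messages and renormalise to get each marginal.
    return [
        normalise([f * b for f, b in zip(forward[t], backward[t])])
        for t in range(n_steps)
    ]


prior = [0.6, 0.4]
transition = [[0.7, 0.3], [0.2, 0.8]]
likelihoods = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.7]]
marginals = chain_marginals(prior, transition, likelihoods)
print(marginals)
```

Each pass touches every node once, so the cost is linear in the length of the chain instead of exponential in the number of variables.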
Bayesian network model for dialogue
[Diagram: the per-slot network for food and area at time steps $t$ and $t+1$, with nodes $a_t$, $r_t$, $o_t$, $o_{t+1}$ and an added parameter node $\theta$]
Training policy using different parameters
Policy trained using reinforcement learning (explained in the next lecture).
Evaluated at different levels of error in the user input.
[Plot: average reward]
Summary
Essential ingredients to include in the dialogue state
Maintaining the belief state
Dialogue modelling for real-world problems
Learning the shapes of probability distributions