DQA meeting: 17.07.2007: Learning more effective dialogue strategies using limited dialogue move features Matthew Frampton & Oliver Lemon, Coling/ACL-2006.


1 DQA meeting: 17.07.2007: Learning more effective dialogue strategies using limited dialogue move features
Matthew Frampton & Oliver Lemon, Coling/ACL-2006
Presented by: Mark Hepple

2 Data-driven methodology for SDS development
Broader context = realising a "data-driven" methodology for creating SDSs, with the following steps:
– 1. Collect data (using a prototype or WOZ)
– 2. Build a probabilistic user simulation from the data (covering user behaviour, ASR errors)
– 3. [Feature selection, using the user simulation]
– 4. Learn a dialogue strategy, using reinforcement learning over system interactions with the simulation

3 Task
Information-seeking dialogue systems
– Specifically task-oriented, slot-filling dialogues, leading to a database query
– E.g. getting user requirements for a flight booking (cf. the COMMUNICATOR task)
– Aim is to achieve an effective system strategy for such dialogue interactions

4 Reinforcement Learning
System modelled as a Markov Decision Process (MDP)
– models decision making in situations where outcomes are partly random, partly under system control
Reinforcement learning is used to learn an effective policy
– determines the best action to take in each situation
Aim is to maximise overall reward
– need a reward function, assigning a reward value to different dialogues
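The MDP/RL setup above can be sketched with a standard tabular Q-learning update. This is a minimal illustration, not the authors' implementation: the state names, action names, and parameter values are all assumptions.

```python
# Minimal tabular Q-learning sketch for a dialogue MDP (illustrative only;
# states, actions, and parameters are assumptions, not the paper's setup).
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # learning rate, discount, exploration

Q = defaultdict(float)  # maps (state, action) -> estimated value

def choose_action(state, actions):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One-step backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

In training, the system would run many simulated dialogues against the user simulation, calling `q_update` after each system turn with the reward from the reward function.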

5 Action set of the dialogue system
– 1. Ask open question ("How may I help you?")
– 2. Ask value for slot 1..n
– 3. Explicitly confirm a slot 1..n
– 4. Ask for slot k, whilst implicitly confirming slot k-1 or k+1
– 5. Give help
– 6. Pass to human operator
– 7. Database query
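The seven action types above could be encoded as follows; the enum names are hypothetical, and in practice the slot-directed actions would be instantiated once per slot.

```python
from enum import Enum, auto

# Hypothetical encoding of the slide's action set. Slot-directed actions
# (ASK_SLOT, EXPLICIT_CONFIRM, ASK_WITH_IMPLICIT) would be parameterised
# by a slot index 1..n in a full system.
class SystemAction(Enum):
    ASK_OPEN = auto()           # "How may I help you?"
    ASK_SLOT = auto()           # ask value for slot i
    EXPLICIT_CONFIRM = auto()   # explicitly confirm slot i
    ASK_WITH_IMPLICIT = auto()  # ask slot k, implicitly confirming slot k-1 or k+1
    GIVE_HELP = auto()
    PASS_TO_OPERATOR = auto()
    DB_QUERY = auto()
```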

6 Reward function
The reward function is "all-or-nothing":
– 1. DB query, all slots confirmed = +100
– 2. Any other DB query = -75
– 3. User simulation hangs up = -100
– 4. System passes to human operator = -50
– 5. Each system turn = -5
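The scheme above translates directly into code. A sketch, assuming the per-turn penalty applies on every system turn and the other rewards at dialogue end:

```python
# Sketch of the all-or-nothing reward function from the slide.
# The event names ("db_query", "hang_up", "pass_to_operator") are
# assumed labels, not identifiers from the paper.

TURN_PENALTY = -5  # applied on each system turn

def terminal_reward(event, all_slots_confirmed=False):
    """Reward at dialogue end, following the slide's scheme."""
    if event == "db_query":
        return 100 if all_slots_confirmed else -75
    if event == "hang_up":
        return -100
    if event == "pass_to_operator":
        return -50
    raise ValueError(f"unknown terminal event: {event}")
```

The steep penalty for a premature DB query (-75) relative to the per-turn cost (-5) is what pushes the learner to keep filling and confirming slots rather than querying early.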

7 N-Gram User Simulation
Employs the n-gram user simulation of Georgila, Lemon & Henderson:
– Derived from an annotated version of the COMMUNICATOR data
– Treats dialogue as a sequence of pairs of DAs/tasks
– Outputs the next user "utterance" as a DA/task pair, based on the last n-1 pairs
– Incorporates the effects of ASR errors (built from user utterances as recognised by the ASR components of the original COMMUNICATOR systems)
– Has separate 4- and 5-gram simulations, used for training/testing
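The core of such a simulation is conditioning the next user DA/task pair on the last n-1 pairs. A minimal sketch, with a made-up training corpus and no smoothing or ASR-error modelling:

```python
# Sketch of an n-gram user simulation: sample the next user DA/task pair
# given the last n-1 pairs. Counts would come from an annotated corpus
# (e.g. COMMUNICATOR); this skeleton omits back-off and ASR-error effects.
import random
from collections import defaultdict

class NGramUserSim:
    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, dialogues):
        """dialogues: lists of (dialogue_act, task) pairs."""
        for dlg in dialogues:
            for i in range(len(dlg)):
                context = tuple(dlg[max(0, i - self.n + 1):i])
                self.counts[context][dlg[i]] += 1

    def next_user_move(self, history):
        """Sample the next DA/task pair from counts for the last n-1 pairs."""
        context = tuple(history[-(self.n - 1):])
        dist = self.counts.get(context)
        if not dist:
            return None  # unseen context; a real simulation would back off
        moves, weights = zip(*dist.items())
        return random.choices(moves, weights=weights)[0]
```

Training one strategy against the 4-gram simulation and testing it against the 5-gram one (and vice versa) guards against the policy overfitting a single simulated user.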

8 Key question: what context features to use
Past work has used only limited state information
– based on the number/fill-status of slots
Proposal: include richer context information, specifically
– the dialogue act (DA) of the last system turn
– the DA of the last user turn
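The contrast between the slot-only state and the augmented states can be made concrete; the 0/1/2 encoding for empty/filled/confirmed slots is an assumption for illustration, not the paper's actual feature encoding.

```python
# Illustrative state representations for the three conditions compared in
# the paper. Slot status 0/1/2 = empty/filled/confirmed (assumed encoding).

def baseline_state(slots):
    """Slot features only."""
    return tuple(slots)                       # e.g. (2, 1, 0, 0)

def strat2_state(slots, last_user_da):
    """Slot features + last user dialogue act."""
    return tuple(slots) + (last_user_da,)

def strat3_state(slots, last_user_da, last_system_da):
    """Slot features + last user and system dialogue acts."""
    return tuple(slots) + (last_user_da, last_system_da)
```

Because the augmented states include the last DAs, two turns with identical slot status can still map to different states, which is what lets the learner react differently after a failed turn.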

9 Experiments
Compare 3 systems
– Baseline: slot features only
– Strat 2: slot features + last user DA
– Strat 3: slot features + last user + system DAs
Train with the 4-gram and test with the 5-gram user simulation, and vice versa

10 Results
Main reported result is the improvement in the average reward level of dialogues for the strategies, compared to the baseline
– Strat 2 improves over the baseline by 4.9%
– Strat 3 improves over the baseline by 7.8%
All 3 strategies achieve 100% slot filling and confirmation
The augmented strategies also improve over the baseline w.r.t. average dialogue length

11 Qualitative Analysis
Learns to:
– only query the DB when all slots are filled
– not pass to the operator
– use implicit confirmation where possible
Emergent behaviour:
– When the baseline system fails to fill/confirm a slot from user input, the state remains the same, and the system will repeat the same action
– For the augmented systems, the state changes, so they can learn to take a different action, e.g. ask about a different slot, or use the "give help" action

12 Questions/Comments
Value of performance improvement figures based on reward?
Does improvement w.r.t. the reward function imply improvement for human-machine dialogues?
Validity of comparisons to COMMUNICATOR systems
Why does system performance improve?
– Is avoidance of repetition the key?
