A Meeting Browser that Learns Patrick Ehlen * Matthew Purver * John Niekrasz Computational Semantics Laboratory Center for the Study of Language and Information Stanford University
The CALO Meeting Assistant Observe human-human meetings – Audio recording & speech recognition – Video recording & processing – Process written and typed notes, & whiteboard sketches Produce a useful record of the interaction
The CALO Meeting Assistant (cont’d) Offload some cognitive effort from participants during meetings Learn to do this better over time For now, focus on identifying: – Action items people commit to during meeting – Topics discussed during meeting
Human Interpretation Problems Compare to a new temp taking notes during meetings of Spacely Sprockets
Human Interpretation Problems (cont’d) Problems in recognizing and interpreting content: – Lack of lexical knowledge, specialized idioms, etc. – Overhearer understanding problem (Schober & Clark, 1989)
Machine Interpretation Problems Machine interpretation: – Similar problems with lexicon – Trying to do interpretation of messy, overlapping multi-party speech transcribed by ASR, with multiple word-level hypotheses
Human Interpretation Solution Human temp can still do a good job (while understanding little) at identifying things like action items When people commit to do things with others, they adhere to a rough dialogue pattern Temp performs shallow discourse understanding Then gets implicit or explicit “corrections” from meeting participants
How Do We Do Shallow Understanding of Action Items? Four types of dialogue moves: – Description of task: "Somebody needs to do this TZ-3146!" – Owner: "I guess I could do it." – Timeframe: "Can you do it by tomorrow?" – Agreement: "Sure." / "Sounds good to me!" / "Excellent!" / "Sweet!"
Machine Interpretation Shallow understanding for action item detection: – Use our knowledge of this exemplary pattern – Skip over deep semantic processing – Create classifiers that identify those individual moves – Posit action items – Get feedback from participants after meeting
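The shallow pipeline above can be sketched as follows. This is a minimal, hypothetical illustration: the keyword rules stand in for the real trained move classifiers, and all names, thresholds, and the co-occurrence window are assumptions, not the CALO implementation.

```python
# Sketch of shallow action-item detection: classify each utterance into
# dialogue move types, then posit an action item when enough distinct
# move types co-occur in a short window. Keyword rules are stand-ins
# for the four trained subclassifiers.
from dataclasses import dataclass, field

MOVE_TYPES = ("task", "owner", "timeframe", "agreement")

@dataclass
class Utterance:
    speaker: str
    text: str
    moves: set = field(default_factory=set)  # move types detected so far

def classify_moves(utt):
    """Stand-in for the four per-move classifiers (keyword heuristics)."""
    text = utt.text.lower()
    if "needs to" in text or "somebody" in text:
        utt.moves.add("task")
    if "i could do it" in text or "i'll" in text:
        utt.moves.add("owner")
    if "by tomorrow" in text or "by friday" in text:
        utt.moves.add("timeframe")
    if text.strip(" !.") in ("sure", "sounds good to me", "excellent", "sweet"):
        utt.moves.add("agreement")
    return utt.moves

def posit_action_items(utterances, window=5):
    """Posit an action item when at least three distinct move types
    co-occur within a window of consecutive utterances."""
    items = []
    for i in range(len(utterances)):
        span = utterances[i:i + window]
        seen = set()
        for u in span:
            seen |= u.moves
        if len(seen & set(MOVE_TYPES)) >= 3:  # superclass decision
            items.append(span)
            break  # keep the sketch simple: report the first hit only
    return items
```

The window-based co-occurrence check mirrors the idea that the four moves form a rough dialogue pattern close together in the conversation, rather than requiring any one utterance to carry the whole action item.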
Challenge to Machine Learning and UI Design Detection challenges: – classify 4 different types of dialogue moves – want classifiers to improve over time – thus, need differential feedback on interpretations of these different types of dialogue moves Participants should see and evaluate our results while doing something that's valuable to them, and, from those user actions, provide the feedback we need for learning
Feedback Proliferation Problem To improve action item detection, need feedback on performance of five classifiers (4 utterance classes, plus overall “this is an action item” class) All on noisy, human-human, multi-party ASR results So, we could use a lot of feedback
Feedback Proliferation Problem (cont’d) Need a system for obtaining feedback from users that is: – light-weight and usable – valuable to users (so they will use it) – able to solicit different types of feedback in a non-intrusive, almost invisible way
Feedback Proliferation Solution Meeting Rapporteur – a type of meeting browser used after the meeting
Feedback Proliferation Solution (cont’d) Many “meeting browser” tools are developed for research, and focus on signal replay Ours: – tool to commit action items from meeting to user’s to-do list – relies on implicit user supervision to gather feedback to retrain classification models
[Screenshot: Meeting Rapporteur action item view] – Subclass hypotheses are listed; the top hypothesis is highlighted – Mouse over hypotheses to change them – Click to edit them (confirm, reject, replace, create)
Feedback Loop Each participant’s implicit feedback for a meeting is stored as an “overlay” to the original meeting data – Overlay is reapplied when participant views meeting data again – Same implicit feedback also retrains models – Creates a personalized representation of meeting for each participant, and personalized classification models
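The overlay mechanism above can be sketched as follows. This is an illustrative data-structure sketch, not the CALO storage format: feedback is held as a dict of edits keyed by action-item id and reapplied on top of the untouched original record; all field names are assumptions.

```python
# Per-participant "overlay": the original meeting record is never
# mutated; a participant's confirmations, rejections, and edits are
# applied on top of it to produce a personalized view.
import copy

def apply_overlay(meeting, overlay):
    """Return a personalized view of the meeting data with one
    participant's feedback applied."""
    view = copy.deepcopy(meeting)        # leave the original intact
    for item_id, edits in overlay.items():
        item = view["action_items"][item_id]
        if edits.get("rejected"):
            item["status"] = "rejected"  # superclass-level feedback
            continue
        item["status"] = "confirmed"
        for field_name, value in edits.get("fields", {}).items():
            item[field_name] = value     # subclass-level feedback
    return view
```

Because the overlay is stored separately from the meeting data, the same edits can be reapplied whenever the participant reopens the meeting, and independently fed back into retraining the classification models.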
Problem In practice (e.g., CALO Y3 CLP data): – seem to get a lot of feedback at the superclass level (i.e., people are willing to accept or delete an action item) – but not as much (other than implicit confirmation) at subclass level (i.e., people are not as willing to change descriptions, etc)
Questions User feedback provides information along different dimensions: – Information about the time an event (like discussion of an action item) happened – Information about the text that describes aspects of the event (like the task description, owner, and timeframe)
Questions (cont’d) Which of these dimensions contribute most to improving models during retraining? Which dimensions require more cognitive effort for the user when giving feedback? What is the best balance between getting feedback information and not bugging the user too much? What is the best use of initiative in such a system (user- vs. system-initiative)? – During meeting? – After meeting?
Ideal Feedback Experiment Turn gold-standard human annotations of meeting data into posited “ideal” human feedback Using that ideal feedback to retrain, determine which dimensions (time, text, initiative) contribute most to improving classifiers
Ideal Feedback Experiment (cont’d) Results: – both time and text dimensions alone improve accuracy over raw classifier – using both time and text together performs best – textual information is more useful than temporal – user initiative provides extra information not gained by system-initiative
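The shape of this dimension-comparison experiment can be sketched as follows. This is a hypothetical harness, not the actual experiment code: `retrain` and `evaluate` stand in for the real classifier pipeline, and the field names are assumed labels for the time and text dimensions of a feedback record.

```python
# Compare retraining on different projections of the ideal feedback:
# time-only (when the action item was discussed), text-only (task,
# owner, timeframe descriptions), and both together.
def project(feedback, dims):
    """Keep only the requested dimensions of each feedback record
    (the label always survives)."""
    return [{k: v for k, v in rec.items() if k in dims or k == "label"}
            for rec in feedback]

def compare_dimensions(feedback, retrain, evaluate):
    """Retrain once per dimension subset and report a score for each."""
    results = {}
    for name, dims in [("time", {"start", "end"}),
                       ("text", {"task", "owner", "timeframe"}),
                       ("time+text", {"start", "end",
                                      "task", "owner", "timeframe"})]:
        model = retrain(project(feedback, dims))
        results[name] = evaluate(model)
    return results
```

Running the same retraining pipeline over each projection is what lets the experiment attribute accuracy gains to a specific dimension rather than to the feedback as a whole.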
Wizard-of-Oz Experiment Create different Meeting Assistant interfaces and feedback devices (including our Meeting Rapporteur) See how real-world feedback data compares to the ideal feedback described above Assess how the tools affect and change behavior during meetings
Action Item Identification Use four classifiers to identify dialogue moves associated with action items in utterances of meeting participants Then posit the existence of an action item, along with its semantic properties (what, who, when), using those utterances [Diagram: linearized utterances u1, u2, …, uN feeding the four move classifiers: task, owner, timeframe, agreement]
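Positing the action item's semantic properties from the move-classified utterances might look like the sketch below. It is illustrative only: a real system would work from ranked classifier hypotheses, whereas here the first utterance tagged with each move simply fills the corresponding slot; the slot names are assumptions.

```python
# Build a structured action-item record (what, who, when) from
# (move_type, utterance_text) pairs produced by the move classifiers.
def build_action_item(classified):
    """classified: list of (move_type, utterance_text) pairs,
    in utterance order."""
    props = {"what": None, "who": None, "when": None, "agreed": False}
    slot = {"task": "what", "owner": "who", "timeframe": "when"}
    for move, text in classified:
        if move == "agreement":
            props["agreed"] = True           # commitment was ratified
        elif move in slot and props[slot[move]] is None:
            props[slot[move]] = text         # first hypothesis wins
    return props
```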
Like hiring a new temp to take notes during meetings of Spacely Sprockets. Even if we say, "Just write down the action items people agree to, and the topics," that temp will run up against a couple of problems in recognizing and interpreting content (rooted in the collaborative underpinnings of semantics): – Overhearer understanding problem (Schober & Clark, 1989) – Lack of vocabulary knowledge, etc. A machine overhearer that uses noisy multi-party transcripts fares even worse. We do what a human overhearer might do: shallow discourse understanding. If you went into an unfamiliar meeting, you might not understand what was being talked about, but you could still tell when somebody agreed to do something. Why? Because when people make commitments to do things with others, they typically adhere to a certain kind of dialogue pattern.