1 Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments
Matthew Purver, Patrick Ehlen, John Niekrasz
Computational Semantics Laboratory
Center for the Study of Language and Information
Stanford University

2 The CALO Project
Multi-institution, multi-disciplinary project working towards an intelligent personal assistant that learns
Three major areas:
– managing personal data: clustering email and documents, managing contacts
– assisting with task execution: learning to carry out computer-based tasks
– observing interaction in meetings

3 The CALO Meeting Assistant
Observe human-human meetings:
– Audio recording & speech recognition (ICSI/CMU)
– Video recording & processing (MIT/CMU)
– Written notes, via digital ink (NIS) or typed (CMU)
– Whiteboard sketch recognition (NIS)
Produce a useful record of the interaction:
– answer questions about what happened
– can be used by attendees or non-attendees
Learn to do this better over time (LITW)

4 The CALO Meeting Assistant
Primary focus on the end user: develop something that can really help people deal with all of the meetings they have to attend

5 What do people want to know from meetings?

6 Banerjee et al. (2005) survey of 12 academics:
– Missed meeting: what do you want to know?
– Topics: which were discussed, and what was said?
– Decisions: what decisions were made?
– Action items/tasks: was I assigned something?

7 What do people want to know from meetings?
Banerjee et al. (2005) survey of 12 academics:
– Missed meeting: what do you want to know?
– Topics: which were discussed, and what was said?
– Decisions: what decisions were made?
– Action items/tasks: was I assigned something?
Lisowska et al. (2004) survey of 28 people:
– What would you ask a meeting reporter system?
– Similar responses about topics and decisions
– Who attended? Who asked/decided what?
– Did they talk about me?

8 Purpose
A helpful system not only records and transcribes a meeting, but extracts (from streams of potentially messy human-human speech):
– topics discussed
– decisions made
– tasks assigned (“action items”)
The system should highlight this information over meeting “noise”

9 Example
An impromptu meeting you might have after your team has boarded a rebel spacecraft in search of stolen plans, and you’re trying to figure out what to do next

10 Commander, tear this ship apart until you’ve found those plans!


12 A section of discourse in a meeting where someone is made responsible for taking care of something


14 Action Items
Concrete decisions; public commitments to be responsible for a particular task
We want to know:
– Can we find them?
– Can we produce useful descriptions of them?
We are not aware of previous discourse-based work on this

15 Action Item Detection in Email (Corston-Oliver et al., 2004)
Marked a corpus of email with “dialogue acts”
Task act:
– “items appropriate to add to an ongoing to-do list”
Good inter-annotator agreement (kappa > 0.8)
Per-sentence classification using SVMs
– lexical features, e.g. n-grams; punctuation; message features
– f-scores around 0.6
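
A minimal sketch (Python/scikit-learn, not the authors' code) of per-sentence classification in this style: a linear SVM over word n-gram features. The sentences and labels below are invented placeholders.

  # Per-sentence "task act" detection with a linear SVM over n-gram features.
  # Data and labels are toy placeholders, not the Corston-Oliver corpus.
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.pipeline import make_pipeline
  from sklearn.svm import LinearSVC

  sentences = [
      "please add the budget review to your to-do list",   # task act
      "thanks for the update yesterday",                    # other
      "can you send the revised draft by friday",           # task act
      "see you at lunch",                                   # other
  ]
  labels = [1, 0, 1, 0]

  # Word unigrams and bigrams as lexical features; the original work also used
  # punctuation and message-level features on top of these.
  clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
  clf.fit(sentences, labels)
  print(clf.predict(["could you add this to the to-do list"]))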

16 A First Try: Flat Annotation
Gruenstein et al. (2005) annotated 65 meetings drawn from:
– ICSI Meeting Corpus (Janin et al., 2003)
– ISL Meeting Corpus (Burger et al., 2002)
Two human annotators, instructed to “mark utterances relating to action items”
– create groups of utterances for each action item
– no distinction made between utterance type/role

17 A First Try: Flat Annotation (cont’d)
The two annotators identified 921 and 1267 action item-related utterances respectively
Human agreement poor (κ < 0.4)
Tried binary classification using SVMs (as in Corston-Oliver et al.)
Precision, recall, f-score: all below 0.25
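
A small worked example (invented annotations) of Cohen's kappa, the agreement measure quoted here and on later slides; scikit-learn computes it directly.

  # Cohen's kappa = (P_o - P_e) / (1 - P_e): observed agreement corrected for
  # the agreement expected by chance. The annotations below are invented.
  from sklearn.metrics import cohen_kappa_score

  annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # 1 = action-item-related
  annotator_b = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
  print(cohen_kappa_score(annotator_a, annotator_b))  # ~0.58; below 0.4 counts as poor here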

18 Try a more restricted dataset?
Sequence of 5 (related) CALO meetings
– similar amount of ICSI/ISL data for training
Same annotation schema
SVMs with words & n-grams as features
– also tried other discriminative classifiers, and 2- and 3-grams, with no improvement
Similar performance
– improved f-scores (0.30-0.38), but still poor
– recall up to 0.67, precision still low (< 0.36)
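
To make the f-score figures concrete: the f-score is the harmonic mean of precision and recall, so low precision caps it even when recall is reasonable. The precision/recall pair below is illustrative, not taken from the slides.

  # Harmonic-mean F1; example values chosen only to show how low precision
  # limits the score.
  def f1(precision, recall):
      return 2 * precision * recall / (precision + recall)

  print(f1(0.30, 0.52))  # ~0.38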

19 Should we be surprised?
Our human annotator agreement was poor
The DAMSL schema has dialogue acts Commit and Action-directive
– annotator agreement poor (κ ~ 0.15) (Core & Allen, 1997)
The ICSI MRDA schema has a dialogue act commit
– most DA tagging work concentrates on 5 broad DA classes
Perhaps “action items” comprise a more heterogeneous set of utterances

20 Rethinking Action Item Acts
Maybe action items are not aptly described as singular “dialogue acts”
Rather: multiple people making multiple contributions of several types
Action item-related utterances represent a form of group action, or social action
That social action has several components, giving rise to a heterogeneous set of utterances
What are those components?

21 Commander, tear this ship apart until you’ve found those plans!
– A person commits or is committed to “own” the action item

22 Commander, tear this ship apart until you’ve found those plans!
– A person commits or is committed to “own” the action item
– A description of the task itself is given

23 Commander, tear this ship apart until you’ve found those plans!
– A person commits or is committed to “own” the action item
– A description of the task itself is given
– A timeframe is specified

24 Yes, Lord Vader!
– A person commits or is committed to “own” the action item
– A description of the task itself is given
– A timeframe is specified
– Some form of agreement

25 Exploiting discourse structure
Action items have distinctive properties
– task description, owner, timeframe, agreement
Action item utterances can simultaneously play different roles
– assigning properties
– agreeing/committing
These classes may be more homogeneous & distinct than “action item” utterances taken as a single class
– could improve classification performance

26 New annotation schema
Annotated and classified again using the new schema
Classify utterances by their role in the action item discourse
– an utterance can play more than one role
Define action items by grouping subclass utterances together into an action-item discussion
– a subclass can be missing
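
One plausible realisation of "an utterance can play more than one role" is a bank of binary classifiers, one per subclass, i.e. multi-label classification. The sketch below (scikit-learn, invented utterances and labels) is an assumption about the setup, not the actual CALO implementation.

  # One-vs-rest multi-label classification: an utterance may carry any subset
  # of the four subclass labels. Utterances and labels below are toy examples.
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.multiclass import OneVsRestClassifier
  from sklearn.pipeline import make_pipeline
  from sklearn.preprocessing import MultiLabelBinarizer
  from sklearn.svm import LinearSVC

  utterances = [
      "jack i'd like you to come back to me with the details",  # owner + description
      "by the start of week three",                             # timeframe
      "okay",                                                   # agreement
      "so what was the weather like",                           # none
  ]
  labels = [["owner", "description"], ["timeframe"], ["agreement"], []]

  mlb = MultiLabelBinarizer(classes=["description", "owner", "timeframe", "agreement"])
  Y = mlb.fit_transform(labels)
  clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                      OneVsRestClassifier(LinearSVC()))
  clf.fit(utterances, Y)
  print(mlb.inverse_transform(clf.predict(["okay i'll send you the details"])))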

27 Action Item discourse: an example

28 New Experiment
Annotated the same set of CALO/ICSI/ISL data using the new schema
Trained classifiers to identify utterances belonging to each of the 4 subclasses

29 Encouraging signs
Between-class distinction (cosine distances):
– agreement vs. any other is good: 0.05 to 0.12
– timeframe vs. description is OK: 0.25
– owner/timeframe/description: 0.36 to 0.47
Improved inter-annotator agreement?
– timeframe: κ = 0.86
– owner 0.77, agreement & description 0.73
– warning: this is only on one meeting, although it’s the most difficult one we could find
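
The slides do not spell out how these between-class figures were computed; one simple way to get numbers of this kind is to build a bag-of-words vector per subclass and compare the vectors with cosine similarity (cosine distance being one minus that), so that low values mean the classes are lexically well separated. The per-class text below is invented.

  # Between-class lexical comparison via cosine similarity of per-class
  # bag-of-words vectors. The per-class text is invented for illustration.
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  class_text = {
      "agreement":   "okay okay yeah sure right sounds good will do",
      "timeframe":   "by friday next week the end of the month tomorrow",
      "owner":       "jack can you take this i'd like you to handle it",
      "description": "the details on the printer and server the report draft",
  }
  names = list(class_text)
  X = CountVectorizer().fit_transform(class_text.values())
  sim = cosine_similarity(X)
  for i in range(len(names)):
      for j in range(i + 1, len(names)):
          print(f"{names[i]} vs {names[j]}: {sim[i, j]:.2f}")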

30 Combined classification
Still don’t have enough data for proper combined classification
– recall 0.3 to 0.5, precision 0.1 to 0.5
– agreement subclass is best, with f-score = 0.40
Overall decision based on sub-classifier outputs
Ad-hoc heuristic:
– prior context window of 5 utterances
– agreement plus one other class
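
Read literally, the heuristic hypothesises an action item when an agreement utterance is preceded, within a five-utterance window, by at least one utterance of another subclass. The sketch below is one possible reading of that rule over the sub-classifier outputs, with invented tag sets; it is not the actual CALO code.

  # Ad-hoc combination heuristic (one reading of the slide): fire at an
  # "agreement" utterance if any other subclass occurs in the previous
  # WINDOW utterances. tags[i] is the set of subclass labels predicted
  # for utterance i by the individual classifiers.
  WINDOW = 5
  OTHER = {"description", "owner", "timeframe"}

  def detect_action_items(tags):
      hits = []
      for i, t in enumerate(tags):
          if "agreement" in t and any(OTHER & c for c in tags[max(0, i - WINDOW):i]):
              hits.append(i)  # index of the agreeing utterance
      return hits

  # toy run: owner/description at 1, timeframe at 2, agreement at 4 -> [4]
  print(detect_action_items([set(), {"owner", "description"}, {"timeframe"}, set(), {"agreement"}]))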

31 Questions we can ask
Does overall classification look useful?
– whole-action-item f-score 0.40 to 1.0 (one meeting perfectly correlated with human annotation)
Does the overall output improve the sub-classifiers?
– agreement: f-score 0.40 → 0.43
– timeframe: f-score 0.26 → 0.07
– owner: f-score 0.12 → 0.24
– description: f-score 0.33 → 0.24

32 Example output
From a CALO meeting (t = timeframe, o = owner, d = description, a = agreement):
– t = [the, start, of, week, three, just, to]
– o = [reconfirm, everything, and, at, that, time, jack, i'd, like, you, to, come, back, to, me, with, the]
– d = [the, details, on, the, printer, and, server]
– a = [okay]
Another (less nice?) example:
– o = [/h#/, so, jack, /uh/, for, i'd, like, you, to]
– d = [have, one, more, meeting, on, /um/, /h#/, /uh/]
– t = [in, in, a, couple, days, about, /uh/]
– a = [/ls/, okay]

33 Where next for action items?
More data annotation
– using NOMOS, our annotation tool
Meeting browser to get user feedback
Improved individual classifiers
Improved combined classifier
– maximum entropy model
– not enough data yet
Moving from words to symbolic output
– Gemini (Dowding et al., 1990) bottom-up parser
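
On the planned maximum entropy combiner: maximum entropy classification is equivalent to (multinomial) logistic regression, so one natural combiner would take the sub-classifier confidences over a context window as features and predict whether the window contains an action-item discussion. Everything in the sketch below (feature layout, values, labels) is assumed for illustration, not taken from the project.

  # A logistic-regression (maxent) combiner over sub-classifier confidences.
  # Rows are context windows; columns are the maximum confidence in the window
  # for [description, owner, timeframe, agreement]. All values are invented.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  X = np.array([
      [0.9, 0.8, 0.2, 0.9],   # window containing an action-item discussion
      [0.1, 0.0, 0.1, 0.2],   # no action item
      [0.7, 0.1, 0.8, 0.9],   # action item
      [0.3, 0.2, 0.0, 0.1],   # no action item
  ])
  y = np.array([1, 0, 1, 0])
  combiner = LogisticRegression().fit(X, y)
  print(combiner.predict_proba([[0.8, 0.6, 0.5, 0.9]])[0, 1])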


