HTL-ACTS Workshop, June 2006, New York City Improving Email Speech Acts Analysis via N-gram Selection Vitor R. Carvalho & William W. Cohen Carnegie Mellon.

1 HTL-ACTS Workshop, June 2006, New York City Improving Email Speech Acts Analysis via N-gram Selection Vitor R. Carvalho & William W. Cohen Carnegie Mellon University

2 Outline
1. Email Speech Acts: Can we do it? What for?
   o Introduction
   o Data
   o Applications
2. Language Cues
   o Preprocessing
   o N-grams
3. Results

3 Motivation
- Email classification for
  o topic/folder identification
  o spam/non-spam
- Speech-act classification in conversational speech (aka dialog act classification)
  o email is new domain - multiple acts/msg
- Winograd’s Coordinator (1987): users manually annotated email with intent.
  o Extra work for (lazy) users
- Murakoshi et al (1999): hand-coded rules for identifying speech-act like labels in Japanese emails

4 “Email Acts” Taxonomy
- An Act is described as a verb-noun pair (e.g., propose meeting, request information)
  o Not all pairs make sense
- Single email message may contain multiple acts
- Try to describe commonly observed behaviors, rather than all possible speech acts in English
- Also include non-linguistic usage of email (e.g. delivery of files)
Example message:
  From: Benjamin Han
  To: Vitor Carvalho
  Subject: LTI Student Research Symposium
  Hey Vitor
  When exactly is the LTI SRS submission deadline?
  Also, don’t forget to ask Eric about the SRS webpage.
  Thanks.
  Ben
Acts in this message: Request - Information; Reminder - Action/Task

5 Classifying Email into Acts [Cohen, Carvalho & Mitchell, EMNLP-04]
- An Act is a verb-noun pair (e.g., propose meeting)
- One single email message may contain multiple acts. Not all pairs make sense.
- Try to describe commonly observed behaviors, rather than all possible speech acts.
- Also include non-linguistic usage of email (delivery of files)
- Most of the acts can be learned (EMNLP-04)
[Figure: taxonomy of email acts, split into Verbs and Nouns]

6 Email Acts - Applications
- Improved email clients.
- Negotiating/managing shared tasks is a central use of email
- Tracking commitments, delegations, pending answers
- Integrating to-do/task lists to email, etc.
- Email overload
- Iterative Learning of Email Tasks and Speech Acts
- Predicting Social Roles and Group Leadership.
Cited work: Kushmerick et al, AAAI-06; Kushmerick & Khousainov, IJCAI-05, CEAS-05; Leusky, SIGIR-04; Carvalho et al., in progress

7 Data: CSPACE Corpus
- Few large, free, natural email corpora are available
- CSPACE corpus (Kraut & Fussell)
  o Emails associated with a semester-long project for Carnegie Mellon MBA students in 1997
  o 15,000 messages from 277 students, divided into 50 teams (4 to 6 students/team)
  o Rich in task negotiation.
  o 1500+ messages (5 teams) had their “Speech Acts” labeled.
  o One of the teams was double labeled, and the inter-annotator agreement (Kappa) ranges from 0.72 to 0.83 for the most frequent acts.

8 Inter-Annotator Agreement
- Kappa Statistic
  o A = probability of agreement in a category
  o R = prob. of agreement for 2 annotators labeling at random
  o Kappa range: -1…+1
Inter-Annotator Agreement
  Email Act   Kappa
  Deliver     0.75
  Commit      0.72
  Request     0.81
  Amend       0.83
  Meeting     0.82
  Propose     0.72
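The Kappa statistic on this slide is conventionally computed as Kappa = (A - R) / (1 - R). The sketch below illustrates that formula for one act, treating each message as a yes/no decision per annotator. The slide does not say how R was estimated; estimating it from each annotator's own base rate (Cohen's kappa) is an assumption here, and the function names are illustrative only.

```python
def kappa(agreement: float, chance: float) -> float:
    """Kappa = (A - R) / (1 - R), with A observed and R chance agreement."""
    if chance >= 1.0:
        raise ValueError("chance agreement must be below 1")
    return (agreement - chance) / (1.0 - chance)


def kappa_from_labels(labels_a, labels_b) -> float:
    """Kappa for two annotators marking one act as present (1) or absent (0)."""
    n = len(labels_a)
    # A: fraction of messages on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # R (assumption): chance that both say yes or both say no if each
    # annotator labels independently at their own base rate.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    chance = p_a * p_b + (1 - p_a) * (1 - p_b)
    return kappa(observed, chance)
```

Perfect agreement gives 1, agreement no better than chance gives 0, and systematic disagreement pushes the value toward -1, which matches the range stated on the slide.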

9 Overview on Entire Corpus
Error Rate
  Act / Learner   V-Perceptron   AdaBoost   SVM    Decision Trees
  Request         0.25           0.22       0.23   0.2
  Propose         0.11           0.12              0.1
  Deliver         0.26           0.28       0.27   0.3
  Commit          0.15           0.14       0.17   0.15

10 Preprocessing
- Signature and quoted-text removal
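The slide does not describe how signatures and quoted text were detected. As a rough illustration only, the sketch below assumes three common plain-text conventions: quoted lines start with ">", a signature follows a line containing only "--", and a forwarded or replied-to message follows an "Original Message" separator. The patterns and the function name are assumptions, not the authors' preprocessing.

```python
import re

# Assumed conventions, not taken from the slides.
QUOTE_LINE = re.compile(r"^\s*>")
REPLY_SEPARATOR = re.compile(r"^\s*-{2,}\s*Original Message\s*-{2,}\s*$", re.IGNORECASE)
SIGNATURE_DELIMITER = re.compile(r"^--\s*$")


def strip_signature_and_quotes(body: str) -> str:
    """Keep only the text the sender wrote in this message."""
    kept = []
    for line in body.splitlines():
        if SIGNATURE_DELIMITER.match(line) or REPLY_SEPARATOR.match(line):
            break  # everything below is signature or an earlier message
        if QUOTE_LINE.match(line):
            continue  # drop quoted reply text
        kept.append(line)
    return "\n".join(kept).strip()
```

Removing this material keeps n-grams from earlier messages or boilerplate signatures from being counted as cues for the current message's acts.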

11 Request Act: IG n-grams
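"IG" here refers to Information Gain, used to rank n-grams as cues for an act. The slide's ranked list did not survive extraction, so the following is only a minimal sketch of the scoring idea: each n-gram is treated as a binary feature (present or absent in a message) and scored by how much it reduces the entropy of the act label. The tokenization, the maximum n-gram length, and all names are assumptions.

```python
import math
from collections import Counter


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a pos/neg count split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def information_gain(docs, labels, max_n=3):
    """docs: list of token lists; labels: True if the message has the act.

    Returns a dict mapping each n-gram (n = 1..max_n) to its information gain.
    """
    total = len(docs)
    pos_total = sum(labels)
    base = entropy(pos_total, total - pos_total)
    df = Counter()      # messages containing the n-gram
    df_pos = Counter()  # of those, messages labeled with the act
    for tokens, label in zip(docs, labels):
        seen = {g for n in range(1, max_n + 1) for g in ngrams(tokens, n)}
        for g in seen:
            df[g] += 1
            if label:
                df_pos[g] += 1
    scores = {}
    for g, with_g in df.items():
        pos_with = df_pos[g]
        without = total - with_g
        pos_without = pos_total - pos_with
        # Expected entropy of the label after splitting on presence of g.
        cond = (with_g / total) * entropy(pos_with, with_g - pos_with)
        if without:
            cond += (without / total) * entropy(pos_without, without - pos_without)
        scores[g] = base - cond
    return scores
```

Sorting the returned scores in descending order and keeping the top entries is one simple way to select n-grams for a Request classifier.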

13 Error Rate Analysis

15 Idea: Predicting Acts from Surrounding Acts
[Figure: Example of Email Thread Sequence, with messages labeled Delivery, Request, Commit, and Proposal]
- Act has little or no correlation with other acts of same message
- Strong correlation with previous and next message’s acts
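One way to act on this idea is to give each message extra features describing the acts of its neighbors in the thread. The sketch below is a hypothetical illustration, not the authors' method: the Message fields, the "parent:"/"child:" feature naming, and the use of reply links to define previous and next messages are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    msg_id: str
    parent_id: Optional[str]  # message this one replies to, if any
    acts: set = field(default_factory=set)  # known or predicted acts


def relational_features(thread):
    """Map each message id to features built from its parent's and children's acts."""
    by_id = {m.msg_id: m for m in thread}
    children = {m.msg_id: [] for m in thread}
    for m in thread:
        if m.parent_id in children:
            children[m.parent_id].append(m)
    features = {}
    for m in thread:
        feats = set()
        parent = by_id.get(m.parent_id)
        if parent is not None:
            feats |= {f"parent:{act}" for act in parent.acts}
        for child in children[m.msg_id]:
            feats |= {f"child:{act}" for act in child.acts}
        features[m.msg_id] = feats
    return features
```

For a thread in which a Request is answered by a Commit and then a Delivery, the middle message would receive the features "parent:Request" and "child:Delivery", which a classifier could combine with the message's own n-gram features.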
