
1 Learning to Classify Email into “Speech Acts”
William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell
Presented by Vitor R. Carvalho, IR Discussion Series - August 12th, 2004 - CMU

2 Imagine a hypothetical email assistant that can detect “speech acts”…
(1) “Do you have any data with xml-tagged names? I need it ASAP!” An urgent Request is detected: the assistant may take action, and the request is marked pending.
(2) “Sure. I’ll put it together by Sunday.” A Commitment is detected: the assistant asks “Should I add this Commitment to your to-do list?” and “Should I send Vitor a reminder on Sunday?”
(3) “Here’s the tar ball on afs: ~vitor/names.tar.gz” A Delivery of data is detected: the Delivery is sent, the pending request is cancelled, and the to-do list is updated.

3 Outline
1) Setting the base: the “email speech act” taxonomy; data; inter-annotator agreement
2) Results: learnability of “email acts”; different learning algorithms, acts, and representations
3) Improvements: collective/relational/iterative classification

4 Related Work
Email classification for topic/folder identification and for spam/non-spam.
Speech-act classification in conversational speech; email is a new domain, with multiple acts per message.
Winograd’s Coordinator (1987): users manually annotated email with intent, which meant extra work for (lazy) users.
Murakoshi et al. (1999): hand-coded rules for identifying speech-act-like labels in Japanese emails.

5 “Email Acts” Taxonomy
A single email message may contain multiple acts. An act is described as a verb-noun pair (e.g., propose meeting, request information); not all pairs make sense. The taxonomy tries to describe commonly observed behaviors rather than all possible speech acts in English, and it also includes non-linguistic uses of email (e.g., delivery of files).
Example:
From: Benjamin Han
To: Vitor Carvalho
Subject: LTI Student Research Symposium
Hey Vitor,
When exactly is the LTI SRS submission deadline? [Request - Information]
Also, don’t forget to ask Eric about the SRS webpage. [Reminder - action/task]
See you,
Ben

6 A Taxonomy of “Email Acts”: Verbs
[Diagram: verb taxonomy] The verbs are Request, Propose, Amend, Commit, Deliver, Refuse, Greet, Remind, and Other; the diagram groups the negotiation verbs into those that initiate a negotiation and those that conclude one.

7 A Taxonomy of “Email Acts”: Nouns
[Diagram: noun taxonomy] Nouns split at the top level into Activity and Information; subtypes include Meeting, Logistics, Data, Opinion, Ongoing Activity, Single Event, Short Term Task, Committee, and Other.

8 A Taxonomy of “Email Acts”: Nouns (continued)
(Same noun taxonomy as above.) Future work: integration with task-oriented email clustering. We will only consider predicting the top-level acts, not the recursive structure.

9 Corpora
Four datasets were used. From CSpace (a management game at GSIA): N01F3 (351 email messages), N02F2 (341 email messages), and N03F2 (443 email messages). From Project World CALO (a simulation game at SRI): Pw_calo (222 email messages). Each group had 4 to 6 participants. N03F2 was manually labeled by two different annotators (what’s the agreement?).

10 Corpora
Few large, natural email corpora are available.
CSPACE corpus (Kraut & Fussell): email associated with a semester-long project for GSIA MBA students in 1997; 15,000 messages from 277 students in 50 teams (4 to 6 per team); rich in task negotiation. N02F2, N01F3, N03F2 contain all messages from students in three teams (341, 351, and 443 messages).
SRI’s “Project World” CALO corpus: 6 people in an artificial task scenario over four days; 222 messages (publicly available), double-labeled.

11 Inter-Annotator Agreement
Kappa statistic: Kappa = (A - R) / (1 - R), where A is the probability of agreement in a category and R is the probability of agreement if the two annotators labeled at random. Kappa ranges from -1 to +1.
Inter-annotator agreement:
Email Act   Kappa
Deliver     0.75
Commit      0.72
Request     0.81
Amend       0.83
Propose     0.72
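A minimal sketch of the kappa computation, written to match the A and R definitions above; the two annotator label lists are made-up toy data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    # Kappa = (A - R) / (1 - R): A is the observed agreement, R is the
    # agreement expected if both annotators labeled at random according
    # to their own label frequencies.
    n = len(labels_a)
    agreement = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    random_agreement = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in freq_a)
    return (agreement - random_agreement) / (1 - random_agreement)

# Toy example: two annotators tagging five messages for "Request".
ann1 = ["req", "req", "other", "req", "other"]
ann2 = ["req", "other", "other", "req", "other"]
print(cohen_kappa(ann1, ann2))  # ~0.615
```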

12 Inter-Annotator Agreement for messages with a single “verb”

13 Learnability of Email Acts
Features: unweighted word-frequency counts (bag of words), evaluated with 5-fold cross-validation. (Directive = Request, Propose, or Amend.) A sketch of this setup follows.
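A sketch of the experimental setup just described (unweighted word counts, a linear classifier, 5-fold cross-validation), assuming scikit-learn; the messages and labels below are stand-ins for the labeled corpora.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder data: in the experiments these would be email bodies and
# binary act labels (here, Request vs. not-Request).
messages = [
    "Could you send me the xml-tagged names?",
    "When exactly is the submission deadline?",
    "Please advise on the meeting logistics.",
    "Can you review the draft by Friday?",
    "Do you have the budget figures?",
    "Sure, I'll put the data together by Sunday.",
    "Here is the tar ball with the names.",
    "The meeting went well, thanks everyone.",
    "I agree with the proposed schedule.",
    "Attached are the slides from today.",
]
is_request = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Unweighted word-frequency counts (BOW) feeding a linear classifier,
# scored by 5-fold cross-validation.
model = make_pipeline(CountVectorizer(), LinearSVC())
scores = cross_val_score(model, messages, is_request, cv=5, scoring="f1")
print(scores.mean())
```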

14 Using Different Learners (Directive act = Request, Propose, or Amend)

15 Learning Requests only

16 Learning Commissives (Commissive act = Delivery or Commitment)

17 Learning Deliveries only

18 Learning to recognize Commitments

19 Overview on Entire Corpus (positive/negative counts per act; Error and F1 for each learner)

Act (pos/neg)         Metric  Voted Perceptron  AdaBoost  SVM   Decision Trees
Request (450/907)     Error   0.25              0.22      0.23  0.20
                      F1      0.58              0.65      0.64  0.69
Propose (140/1217)    Error   0.11              0.12      0.12  0.10
                      F1      0.19              0.26      0.44  0.13
Deliver (873/484)     Error   0.26              0.28      0.27  0.30
                      F1      0.80              0.78      0.78  0.76
Commit (208/1149)     Error   0.15              0.14      0.17  0.15
                      F1      0.21              0.44      0.47  0.11
Directive (605/752)   Error   0.25              0.23      0.23  0.19
                      F1      0.72              0.73      0.73  0.78
Commissive (993/364)  Error   0.23              0.23      0.24  0.22
                      F1      0.84              0.84      0.83  0.85

20 Multi-class: learning algorithm vs. inter-annotator agreement (for messages with a single category)

Annot1 vs. learner (rows = Annot1 label, columns = predicted):
          Req  Prop  Amd  Cmt  Dlv
Request    27     0    4    3   24
Propose     1     2    0    4    8
Amend       2     0    6    6    4
Commit      1     1    3    1    1
Deliver    17     2    5   12  104

Annot1 vs. Annot2:
          Req  Prop  Amd  Cmt  Dlv
Request    55     1    0    1    1
Propose     0    11    1    3    0
Amend       0     0   15    1    2
Commit      0     0    0   24    3
Deliver     0     1    0    4  135

21 Most Informative Features (they are common words)
Top-feature lists were given for Request+Amend+Propose, for Commit, and for Deliver; the most informative features are common words.
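One way such lists can be produced, sketched here as an assumption rather than the authors' actual procedure: read the largest positive weights off a per-class linear classifier. The training snippets are invented.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy training data; the real lists came from the labeled CSpace/CALO corpora.
messages = ["could you send the report", "can you fix this please",
            "i will finish it by sunday", "i agree to handle the task",
            "here is the file you wanted", "attached is the data"]
labels = ["request", "request", "commit", "commit", "deliver", "deliver"]

vec = CountVectorizer()
X = vec.fit_transform(messages)
clf = LinearSVC().fit(X, labels)  # one-vs-rest: one weight vector per act

# For each act, the words with the largest positive weights are the most
# informative features for predicting that act.
vocab = np.array(vec.get_feature_names_out())
for act, weights in zip(clf.classes_, clf.coef_):
    top = vocab[np.argsort(weights)[-3:][::-1]]
    print(act, list(top))
```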

22 Learning: document representation
Variants explored (see the sketch after this list):
- TFIDF -> TF weighting (don’t downweight common words)
- Bigrams: for Commit, “i will” and “i agree” are in the top 5 features; for Directive, “do you”, “could you”, “can you”, and “please advise” are in the top 25
- Count of time expressions
- Words near a time expression
- Words near a proper noun or pronoun
- POS counts
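A sketch of how the first two variants (plain TF weighting and bigrams) can be realized with scikit-learn's TfidfVectorizer; disabling IDF keeps common words such as "will" and "you" influential.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Baseline: TFIDF weighting, which downweights common words.
tfidf = TfidfVectorizer()

# Variant: plain TF weighting (use_idf=False) plus unigram and bigram
# features, so cues like "i will" and "could you" become features.
tf_bigrams = TfidfVectorizer(use_idf=False, ngram_range=(1, 2))

docs = ["i will put the data together by sunday",
        "could you send me the tagged names"]
X = tf_bigrams.fit_transform(docs)
print(tf_bigrams.get_feature_names_out()[:8])  # mix of unigrams and bigrams
```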

23 Baseline classifier: linear-kernel SVM with TFIDF weighting

24 Collective Classification (relational)

25 Collective Classification
The BOW classifier’s outputs are used as features (7 binary features: req, dlv, amd, prop, etc.). MaxEnt learner; training set = N03F2, test set = N01F3. Features come from the current msg + parent msg + child msg (1st child only). “Related” msgs = messages with a parent and/or child message. A sketch of the feature construction follows the table.

N01F3 dataset               Req    Dlv    Cmt    Prop   Amd    Req+Amd+Prop  Dlv+Cmt
Entire dataset (351)  F1    54.61  74.47  34.61  28.98  16.00  68.30         80.97
                      Kappa 28.21  34.88  23.94  21.76  13.02  35.00         22.84
“Related” msgs (170)  F1    56.92  71.71  38.09  39.21  22.22  75.00         80.47
                      Kappa 33.08  32.74  24.02  28.72  17.93  43.70         27.14

… the relational features are useful for “related” messages.
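A sketch of the relational feature construction described above, with an assumed Message object carrying parent/children links and an assumed bow_predict function returning the seven binary act predictions; MaxEnt is realized as logistic regression.

```python
from sklearn.linear_model import LogisticRegression

# The five acts named on the slide plus two placeholders for the rest.
ACTS = ["req", "dlv", "amd", "prop", "cmt", "remind", "grt"]

def relational_features(msg, bow_predict):
    # Concatenate the baseline act predictions for the message itself,
    # its parent, and its first child (zeros when a relative is missing).
    feats = list(bow_predict(msg))
    feats += list(bow_predict(msg.parent)) if msg.parent else [0] * len(ACTS)
    feats += list(bow_predict(msg.children[0])) if msg.children else [0] * len(ACTS)
    return feats

# MaxEnt = multinomial logistic regression; train on N03F2, test on N01F3:
# maxent = LogisticRegression(max_iter=1000)
# maxent.fit([relational_features(m, bow_predict) for m in n03f2], y_n03f2)
```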

26 Collective/Iterative Classification
Start with the baseline (BOW) predictions. How should updates be made?
- In chronological order
- Using “family” heuristics (child first, parent first, etc.)
- Using the posterior probability from a Maximum Entropy learner (threshold, ranking, etc.)
[Diagram: a thread timeline with per-message posterior probabilities 0.85, 0.53, 0.65, 0.95, 0.85, 0.93.]
A minimal sketch of the posterior-ordered variant follows.
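A minimal sketch of the posterior-ordered update strategy, assuming a baseline classifier with a per-message predict method and a relational classifier whose predict_proba sees the current labels of a message's parent and child; both interfaces are invented for illustration.

```python
def iterative_classify(msgs, baseline, relational, n_iters=3):
    # Start from the baseline (BOW) predictions.
    labels = {m.id: baseline.predict(m) for m in msgs}
    for _ in range(n_iters):
        # Re-score every message using its neighbours' current labels.
        scored = []
        for m in msgs:
            probs = relational.predict_proba(m, labels)  # {act: probability}
            best = max(probs, key=probs.get)
            scored.append((probs[best], m, best))
        # Commit the most confident predictions first (posterior ordering).
        for confidence, m, best in sorted(scored, key=lambda t: t[0], reverse=True):
            labels[m.id] = best
    return labels
```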

27 Iterative Classification: Commitment

28 Iterative Classification: Request

29 Iterative Classification: Dlv+Cmt

30 Conclusions/Summary
Negotiating and managing shared tasks is a central use of email. We proposed a taxonomy of “email acts” that could be useful for tracking commitments, delegations, and pending answers, and for integrating to-do lists and calendars with email. Inter-annotator agreement reached kappa values in the 0.70-0.83 range. Learned classifiers can recognize acts with reasonable accuracy (90% precision at 50-60% recall for the top level of the taxonomy). Fancy tricks with IE, bigrams, and POS offer modest improvement over baseline TF-weighted systems.

31 Conclusions/Future Work
Teamwork (collective/iterative classification) seems to help a lot! Future work: integrate all features with the best learners and tricks, then tune the system; social network analysis.

