HTL-ACTS Workshop, June 2006, New York City
Improving Email Speech Acts Analysis via N-gram Selection
Vitor R. Carvalho & William W. Cohen
Carnegie Mellon University
Outline
1. Speech Acts: Can we do it? What for?
   o Introduction
   o Data
   o Applications
2. Language Cues
   o Preprocessing
   o N-grams
3. Results
Motivation
o Email classification for
   o topic/folder identification
   o spam/non-spam
o Speech-act classification in conversational speech (aka dialog act classification)
o Email is a new domain: multiple acts per message
o Winograd’s Coordinator (1987): users manually annotated email with intent.
   o Extra work for (lazy) users
o Murakoshi et al. (1999): hand-coded rules for identifying speech-act-like labels in Japanese emails
“Email Acts” Taxonomy
o An Act is described as a verb-noun pair (e.g., propose meeting, request information)
   o Not all pairs make sense
o A single message may contain multiple acts
o The taxonomy tries to describe commonly observed behaviors, rather than all possible speech acts in English
o It also includes non-linguistic usage of email (e.g., delivery of files)

Example message:
   From: Benjamin Han
   To: Vitor Carvalho
   Subject: LTI Student Research Symposium

   Hey Vitor
   When exactly is the LTI SRS submission deadline?
   Also, don’t forget to ask Eric about the SRS webpage.
   Thanks.
   Ben

Acts in this message: Request - Information; Reminder - Action/Task
Classifying Email into Acts [Cohen, Carvalho & Mitchell, EMNLP-04]
o An Act is a verb-noun pair (e.g., propose meeting)
o A single message may contain multiple acts. Not all pairs make sense.
o The taxonomy tries to describe commonly observed behaviors, rather than all possible speech acts.
o It also includes non-linguistic usage of email (delivery of files)
o Most of the acts can be learned (EMNLP-04)
[Figure: taxonomy of Verbs and Nouns]
Email Acts - Applications
o Improved email clients.
o Negotiating/managing shared tasks is a central use of email
o Tracking commitments, delegations, pending answers
o Integrating to-do/task lists to email, etc.
o Email overload
o Iterative Learning of Tasks and Speech Acts
o Predicting Social Roles and Group Leadership.
References: Kushmerick et al., AAAI-06; Kushmerick & Khousainov, IJCAI-05, CEAS-05; Leusky, SIGIR-04; Carvalho et al., in progress
Data: CSPACE Corpus
o Few large, free, natural email corpora are available
o CSPACE corpus (Kraut & Fussell)
   o Emails associated with a semester-long project for Carnegie Mellon MBA students in 1997
   o 15,000 messages from 277 students, divided into 50 teams (4 to 6 students/team)
   o Rich in task negotiation.
   o Messages (5 teams) had their “Speech Acts” labeled.
   o One of the teams was double labeled, and the inter-annotator agreement (Kappa) ranges from 0.72 to 0.83 for the most frequent acts.
Inter-Annotator Agreement
o Kappa Statistic
   o A = probability of agreement in a category
   o R = probability of agreement for 2 annotators labeling at random
   o Kappa range: -1…+1

Act       Kappa
Deliver   0.75
Commit    0.72
Request   0.81
Amend     0.83
Meeting   0.82
Propose   0.72
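With A and R as defined above, the standard kappa statistic is Kappa = (A - R) / (1 - R). The following is a minimal Python sketch of that computation, assuming each annotator marks every message with a binary label (1 if the act is present, 0 otherwise). The function names and the toy labels are illustrative and do not come from the original work.

```python
def kappa(a, r):
    """Kappa from observed agreement a and chance agreement r."""
    if r >= 1.0:
        raise ValueError("chance agreement must be below 1")
    return (a - r) / (1.0 - r)


def binary_kappa(labels_1, labels_2):
    """Kappa for one act, given two annotators' 0/1 labels per message."""
    n = len(labels_1)
    # A: fraction of messages where both annotators gave the same label.
    a = sum(x == y for x, y in zip(labels_1, labels_2)) / n
    # R: agreement expected if each annotator labeled at random
    # with their own observed rate of marking the act as present.
    p1 = sum(labels_1) / n
    p2 = sum(labels_2) / n
    r = p1 * p2 + (1 - p1) * (1 - p2)
    return kappa(a, r)


if __name__ == "__main__":
    annotator_1 = [1, 1, 0, 0]  # hypothetical labels for one act
    annotator_2 = [1, 1, 0, 1]
    print(binary_kappa(annotator_1, annotator_2))
```

A value of 1 means perfect agreement and a value of 0 means agreement no better than chance.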
Overview on Entire Corpus
[Figure: error rate per act (Request, Propose, Deliver, Commit) for each learner (V-Perceptron, AdaBoost, SVM, Decision Trees)]
Preprocessing
o Signature and quoted text removal
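The slides do not say how signatures and quoted text are detected. As a rough sketch only, the Python below assumes three common plain-text conventions: quoted lines start with ">", a reply header looks like "On ... wrote:", and the signature follows a line containing only "--". The regular expressions and the function name are hypothetical choices for illustration, not the authors' method.

```python
import re

# Assumed conventions (not taken from the slides):
SIGNATURE_DELIMITER = re.compile(r"^--\s*$")    # line with only "--"
QUOTED_LINE = re.compile(r"^\s*>")              # "> quoted text"
REPLY_HEADER = re.compile(r"^On .+ wrote:\s*$")  # "On <date>, X wrote:"


def strip_quoted_and_signature(body):
    """Return the message body without quoted lines and signature."""
    kept = []
    for line in body.splitlines():
        if SIGNATURE_DELIMITER.match(line):
            break  # everything after the delimiter is signature
        if QUOTED_LINE.match(line) or REPLY_HEADER.match(line):
            continue  # drop text copied from an earlier message
        kept.append(line)
    return "\n".join(kept).strip()


if __name__ == "__main__":
    message = "Sure, I will do it.\n> Can you send the file?\n-- \nBen\nCMU"
    print(strip_quoted_and_signature(message))
```

Removing this material keeps cues from earlier messages and boilerplate sign-offs out of the features for the current message.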
Request Act: IG n-grams
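Assuming "IG" here stands for information gain, n-grams can be ranked by how much knowing whether an n-gram occurs in a message reduces uncertainty about the Request label. The Python below is a minimal sketch under that assumption, using binary presence features and n-grams up to length 3. The function names, the maximum length, and the toy messages are illustrative and not taken from the original work.

```python
import math
from collections import Counter


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos/neg counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def information_gain(docs, labels, max_n=3):
    """Score each n-gram by the information gain of its presence.

    docs: list of token lists; labels: list of 0/1 (1 = act present).
    """
    n_docs = len(docs)
    n_pos = sum(labels)
    base = entropy(n_pos, n_docs - n_pos)
    present = Counter()      # messages containing the n-gram
    present_pos = Counter()  # positive messages containing the n-gram
    for tokens, y in zip(docs, labels):
        feats = {g for n in range(1, max_n + 1) for g in ngrams(tokens, n)}
        for g in feats:
            present[g] += 1
            present_pos[g] += y
    scores = {}
    for g, k in present.items():
        kp = present_pos[g]
        h_with = entropy(kp, k - kp)
        h_without = entropy(n_pos - kp, (n_docs - k) - (n_pos - kp))
        scores[g] = (base
                     - (k / n_docs) * h_with
                     - ((n_docs - k) / n_docs) * h_without)
    return scores


if __name__ == "__main__":
    docs = [["could", "you", "send"], ["could", "you", "check"],
            ["here", "is", "it"], ["see", "attached"]]
    labels = [1, 1, 0, 0]  # hypothetical Request labels
    scores = information_gain(docs, labels)
    for g in sorted(scores, key=scores.get, reverse=True)[:5]:
        print(g, round(scores[g], 3))
```

In this toy data, n-grams that occur in every Request message and in no other message receive the highest score.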
Error Rate Analysis
Idea: Predicting Acts from Surrounding Acts
[Figure: Example of Thread Sequence, with messages labeled Delivery, Request, Commit, Proposal, Request, Commit, Delivery, Commit, Delivery]
o An act has little or no correlation with other acts of the same message
o Strong correlation with the previous and next message’s acts
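One way to expose this correlation to a classifier is to add features describing the acts of the neighboring messages. The Python below is a sketch only, assuming a linear thread (each message has at most one previous and one next message) and a fixed set of four acts. The function name, feature names, and thread layout are hypothetical, and at prediction time the neighbors' acts would have to come from a classifier's own estimates rather than from gold labels.

```python
ACTS = ("Request", "Propose", "Deliver", "Commit")


def context_features(thread, index):
    """Binary features from the acts of the previous and next message.

    thread: list of sets of act names, one set per message, in time order.
    index: position of the message being classified.
    """
    prev_acts = thread[index - 1] if index > 0 else set()
    next_acts = thread[index + 1] if index + 1 < len(thread) else set()
    feats = {}
    for act in ACTS:
        feats["prev_" + act] = int(act in prev_acts)
        feats["next_" + act] = int(act in next_acts)
    return feats


if __name__ == "__main__":
    # Hypothetical three-message thread.
    thread = [{"Request"}, {"Commit"}, {"Deliver"}]
    print(context_features(thread, 1))
```

These features can be appended to the n-gram features of the message itself.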