HTL-ACTS Workshop, June 2006, New York City Improving Email Speech Acts Analysis via N-gram Selection Vitor R. Carvalho & William W. Cohen Carnegie Mellon.

Presentation transcript:

HTL-ACTS Workshop, June 2006, New York City
Improving Email Speech Acts Analysis via N-gram Selection
Vitor R. Carvalho & William W. Cohen
Carnegie Mellon University

Outline
1. Speech Acts: Can we do it? What for?
  o Introduction
  o Data
  o Applications
2. Language Cues
  o Preprocessing
  o N-grams
3. Results

Motivation
- Email classification for
  o topic/folder identification
  o spam/non-spam
- Speech-act classification in conversational speech (aka dialog act classification)
- Email is a new domain: multiple acts per message
- Winograd’s Coordinator (1987): users manually annotated email with intent.
  o Extra work for (lazy) users
- Murakoshi et al (1999): hand-coded rules for identifying speech-act-like labels in Japanese emails

“Email Acts” Taxonomy
- An Act is described as a verb-noun pair (e.g., propose meeting, request information). Not all pairs make sense.
- A single message may contain multiple acts
- Try to describe commonly observed behaviors, rather than all possible speech acts in English
- Also include non-linguistic usage of email (e.g. delivery of files)

Example message:

From: Benjamin Han
To: Vitor Carvalho
Subject: LTI Student Research Symposium

Hey Vitor
When exactly is the LTI SRS submission deadline?
Also, don’t forget to ask Eric about the SRS webpage.
Thanks. Ben

Acts in this message: Request - Information; Reminder - Action/Task
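The verb-noun representation and the "multiple acts per message" property can be sketched as a small data structure. This is an illustration only: the `EmailAct` class, the `VERBS` list, and the `verb_indicators` helper are hypothetical names, the verb list is drawn from labels that appear on the slides and is not the full taxonomy, and treating each verb as its own 0/1 target is an assumption about how the labels might be fed to a learner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAct:
    """One act, described as a verb-noun pair."""
    verb: str
    noun: str


# Illustrative verb labels taken from the slides (not the full taxonomy).
VERBS = ["request", "propose", "commit", "deliver", "amend", "remind"]


def verb_indicators(acts):
    """Map a message's set of acts to one 0/1 target per verb."""
    present = {act.verb for act in acts}
    return {verb: int(verb in present) for verb in VERBS}


# The example message above carries two acts at once.
example_acts = {
    EmailAct("request", "information"),
    EmailAct("remind", "action/task"),
}
targets = verb_indicators(example_acts)
print(targets)
```

Keeping the acts of a message in a set makes the multi-label nature explicit: a single message can switch on several targets.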

Classifying Email into Acts [Cohen, Carvalho & Mitchell, EMNLP-04]
- An Act is a verb-noun pair (e.g., propose meeting)
- One single message may contain multiple acts. Not all pairs make sense.
- Try to describe commonly observed behaviors, rather than all possible speech acts.
- Also include non-linguistic usage of email (delivery of files)
- Most of the acts can be learned (EMNLP-04)
[Figure: taxonomy of act Verbs and Nouns]

Email Acts - Applications
- Improved email clients.
- Negotiating/managing shared tasks is a central use of email
- Tracking commitments, delegations, pending answers
- Integrating to-do/task lists to email, etc.
- Email overload
- Iterative Learning of Tasks and Speech Acts
- Predicting Social Roles and Group Leadership.
Cited work: Kushmerick et al, AAAI-06; Kushmerick & Khousainov, IJCAI-05, CEAS-05; Leusky, SIGIR-04; Carvalho et al., in progress

Data: CSPACE Corpus
- Few large, free, natural email corpora are available
- CSPACE corpus (Kraut & Fussell)
  o Emails associated with a semester-long project for Carnegie Mellon MBA students in 1997
  o 15,000 messages from 277 students, divided into 50 teams (4 to 6 students/team)
  o Rich in task negotiation.
  o Messages from 5 teams had their “Speech Acts” labeled.
  o One of the teams was double labeled, and the inter-annotator agreement ranges from 0.72 to 0.83 (Kappa) for the most frequent acts.

Inter-Annotator Agreement
- Kappa statistic: Kappa = (A - R) / (1 - R)
  o A = probability of agreement in a category
  o R = probability of agreement for 2 annotators labeling at random
  o Kappa range: -1…+1

Inter-Annotator Agreement
Act      Kappa
Deliver  0.75
Commit   0.72
Request  0.81
Amend    0.83
Meeting  0.82
Propose  0.72
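A minimal sketch of how Kappa = (A - R) / (1 - R) could be computed for one act, given two annotators' binary labels over the same messages. It assumes the standard Cohen's kappa definition of R, where each annotator labels at random with their own observed rate of positive labels; the function name and the toy labels are illustrative and are not data from the corpus.

```python
def kappa(labels_a, labels_b):
    """Kappa for two annotators giving binary labels (1 = act present)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # A: observed probability of agreement.
    a = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # R: agreement expected if each annotator labeled at random,
    # using their own observed rate of positive labels.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    r = p_a * p_b + (1 - p_a) * (1 - p_b)
    if r == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (a - r) / (1 - r)


# Toy labels for one act over eight messages (made up for illustration).
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 0, 1]
print(round(kappa(annotator_1, annotator_2), 3))
```

Computing Kappa separately per act, as in the table, avoids mixing frequent and rare acts into a single agreement figure.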

Overview on Entire Corpus
[Table: Error Rate by Act (Request, Propose, Deliver, Commit) and Learner (V-Perceptron, AdaBoost, SVM, Decision Trees)]

Preprocessing
- Signature and quoted text removal
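The slide does not say how signatures and quoted text were removed. The sketch below swaps in a simple rule-based heuristic purely for illustration: it drops lines that start with ">" and cuts the message at a "-- " signature delimiter or at a reply header. The regular expressions and the function name are assumptions, not the authors' method.

```python
import re

# Heuristic patterns (assumptions for illustration only).
QUOTE_LINE = re.compile(r"^\s*>")
REPLY_HEADER = re.compile(
    r"^\s*(On .+ wrote:|-+\s*Original Message\s*-+)\s*$", re.IGNORECASE
)
SIGNATURE_MARK = re.compile(r"^--\s*$")


def strip_quoted_and_signature(body):
    """Return the message body without quoted lines and trailing signature."""
    kept = []
    for line in body.splitlines():
        if SIGNATURE_MARK.match(line) or REPLY_HEADER.match(line):
            break  # everything below is signature or quoted reply
        if QUOTE_LINE.match(line):
            continue  # inline quoted line
        kept.append(line)
    return "\n".join(kept).strip()


raw = "Sounds good, see you at 3pm.\n> When can we meet?\n-- \nBen\nCMU"
print(strip_quoted_and_signature(raw))
```

Removing this material first keeps n-grams from a quoted earlier message, or from a signature block, from being counted as cues for the current message's acts.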

Request Act: IG n-grams
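A rough sketch of ranking n-grams for one act, assuming "IG" stands for information gain: each n-gram is scored by how much knowing its presence in a message reduces the entropy of the act label. Whitespace tokenization, the n-gram length limit, the function names, and the four toy messages are all assumptions made for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """All contiguous n-grams of length 1..n_max, joined by spaces."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def entropy(pos, total):
    """Binary entropy (bits) of a set with `pos` positives out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(messages, labels, n_max=3):
    """Score each n-gram by the label entropy removed by its presence."""
    total = len(messages)
    pos_total = sum(labels)
    doc_count = Counter()  # messages containing the n-gram
    pos_count = Counter()  # positive messages containing the n-gram
    for text, label in zip(messages, labels):
        for gram in ngrams(text.lower().split(), n_max):
            doc_count[gram] += 1
            pos_count[gram] += label
    base = entropy(pos_total, total)
    scores = {}
    for gram, with_gram in doc_count.items():
        without = total - with_gram
        conditional = (
            (with_gram / total) * entropy(pos_count[gram], with_gram)
            + (without / total) * entropy(pos_total - pos_count[gram], without)
        )
        scores[gram] = base - conditional
    return scores


# Toy messages, labeled 1 if they contain a Request act (made up).
messages = [
    "could you send me the report",
    "could you please review this",
    "here is the report",
    "i will send it tomorrow",
]
is_request = [1, 1, 0, 0]
scores = information_gain(messages, is_request)
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```

In this toy set the n-grams that occur only in the request messages get the highest score, while words shared evenly between classes, such as "send", score zero.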

Error Rate Analysis

Idea: Predicting Acts from Surrounding Acts
- Act has little or no correlation with other acts of same message
- Strong correlation with previous and next message’s acts
[Figure: Example of Thread Sequence, with messages labeled Delivery, Request, Commit, Proposal]
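One way to picture this idea is to turn the acts of the previous and next message in a thread into extra binary features for the current message. The sketch below assumes a thread is a simple linear list of messages whose acts are already known; in practice the neighbors' acts would themselves be predictions. The `ACTS` list and `neighbor_features` function are hypothetical names.

```python
# Act labels used for the neighbor features (illustrative subset).
ACTS = ["request", "propose", "commit", "deliver"]


def neighbor_features(thread_acts, index):
    """Binary features for message `index`, built from the acts of the
    previous and next message in the same thread."""
    features = {}
    for offset, name in ((-1, "prev"), (1, "next")):
        j = index + offset
        # Messages at the thread boundary have no neighbor on that side.
        neighbor = thread_acts[j] if 0 <= j < len(thread_acts) else set()
        for act in ACTS:
            features[f"{name}_{act}"] = int(act in neighbor)
    return features


# A toy three-message thread; each message has a set of acts.
thread = [{"request"}, {"commit", "deliver"}, {"deliver"}]
feats = neighbor_features(thread, 1)
print(feats)
```

These features would be appended to a message's own n-gram features, so a classifier can learn, for example, that a Commit often follows a Request.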