Opinion Mining in a Telephone Survey Corpus
Presenter: Shih-Hsiang
Reference: [1] Robert E. Schapire and Yoram Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39:135–168, 2000.
ICSLP 2006 / SLT 2006 / EuroSpeech 2007

2 Outline
– Introduction
– A corpus of telephone surveys
– Automatic opinion segmentation
– Modeling user satisfaction
– BoosTexter
– Experiments

3 Introduction
Over the past several years, there has been an increasing number of publications focused on opinion or sentiment and subjectivity detection in text:
– The automatic analysis of surveys: automatically extract positive and negative perceptions about features of products or services
– Information Extraction, Summarization and Question Answering: distinguish highly subjective stances from a mostly objective presentation of facts
An emerging application of telephone services consists in asking users for their opinion about the resolution of a problem:
– Users typically describe their problem and express opinions on the way it was treated
– They describe their problem with more than one sentence and a certain amount of redundancy
– The type of problem and the level of user satisfaction are only part of the semantic content of the user's discourse

4 Introduction (cont.)
Opinion mining constitutes a problem that is orthogonal to typical topic detection tasks in message classification:
– A user can be totally or partially satisfied with part of a service and not satisfied with other aspects
– User messages may contain a variable number of sentences of highly variable length
– Repetitions, hesitations and reformulations of the same concept are frequent
– Private situations, out-of-domain comments, problems … etc.

5 A corpus of telephone surveys
Users are invited through a short message to call a toll-free number where they can express their satisfaction with regard to the customer service they recently called:
– These messages are currently processed by operators who listen to and qualify the messages according to a variety of criteria
– A subset of 1779 messages, collected over a 3-month period, has additionally been transcribed manually in order to train models
Two kinds of opinion expression have been manually annotated:
– Global satisfaction (called SatGlob): an indication of the perceived user judgment on the service (positive, negative or neutral)
– Finer-grained opinion analysis to characterize users' opinions:
  » the courtesy of the customer service operators (Courtesy)
  » the efficiency of the customer service (Efficiency)
  » the amount of time one has to wait on the phone before reaching an operator (Rapidity)
  » all expressions of opinion on subjects other than the first three (Other)

6 A corpus of telephone surveys (cont.)
These last four criteria can receive two polarities: positive or negative
– The goal of the opinion mining process is to automatically retrieve these opinion labels
Manually annotating opinion expressions can be a difficult task for some ambiguous messages
– This is particularly the case for semantic labels that can't be precisely defined
– The Kappa measure is accepted as a reliable inter-annotator agreement measure. For the SatGlob label:
  » A value of K = 0.6 has been obtained with the 3 labels positive, negative and neutral
  » By considering only the messages labeled positive or negative, a Kappa value of K = 0.9 can be obtained
  » This means that it is the neutral label that is the most ambiguous

7 Kappa Measure
A measure of agreement between two observers that takes into account the agreement that could occur by chance (expected agreement):

$$\kappa = \frac{P(a) - P(e)}{1 - P(e)} \quad \text{or, equivalently,} \quad \kappa = 1 - \frac{1 - P(a)}{1 - P(e)}$$

where P(a) is the observed agreement and P(e) the expected (chance) agreement.

Interpretation:
  Kappa         Interpretation
  < 0           No agreement
  0.00 – 0.20   Slight agreement
  0.21 – 0.40   Fair agreement
  0.41 – 0.60   Moderate agreement
  0.61 – 0.80   Substantial agreement
  0.81 – 1.00   Almost perfect agreement

8 Kappa Measure (cont.) PositiveNegative PositiveAB NegativeCD

9 A corpus of telephone surveys (cont.)
The average length (in words) of a message according to the number of opinions expressed is presented in a table
[Table: average message length (in words) by number of opinions expressed]
– Even if the messages expressing only one opinion are the shortest, the length of a message is not in itself a reliable indicator
– Another difficulty of this kind of corpus is that a single message can contain the same opinion expression several times, with different polarities:
  "yes uh uh here is XX XX on the phone well I've called the customer service yep the people were very nice I've been given valuable information but it still doesn't work so I still don't know if I did something wrong or [...]"

10 Baseline ASR model
Due to the lack of constraints on users' elocution, and to the nature of the open question they are asked:
– A large dispersion can be observed in the word frequency distribution
– Once Named Entities (such as phone numbers and last names) have been parsed and replaced by a single label, the training corpus contains close to 3000 different words for a total of 51k occurrences
  » Nearly half of these words occur just once; restricting the lexicon to words that occur at least twice leads to a lexicon of 1564 words
A first bigram language model has been estimated with this lexicon:
– Because of the very high level of disfluencies and noise, especially in long messages, the WER obtained with this model is high: 58%
– However, the WER is not the same for all messages; for example, short messages obtain better performance
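As an illustration of the lexicon cutoff and bigram estimation described above, here is a minimal sketch (not the authors' actual toolchain); the `<unk>` token and the add-one smoothing are assumptions.

```python
from collections import Counter

def build_bigram_lm(sentences, min_count=2):
    """Estimate an add-one-smoothed bigram LM over a frequency-cutoff lexicon."""
    counts = Counter(w for s in sentences for w in s)
    # Keep only words seen at least min_count times; the rest map to <unk>
    lexicon = {w for w, c in counts.items() if c >= min_count} | {"<s>", "</s>", "<unk>"}
    norm = lambda w: w if w in lexicon else "<unk>"

    unigram, bigram = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + [norm(w) for w in s] + ["</s>"]
        unigram.update(toks[:-1])                 # context counts
        bigram.update(zip(toks[:-1], toks[1:]))   # adjacent word pairs

    V = len(lexicon)
    def prob(prev, word):
        prev, word = norm(prev), norm(word)
        return (bigram[(prev, word)] + 1) / (unigram[prev] + V)  # add-one smoothing
    return prob

# Toy usage
p = build_bigram_lm([["the", "service", "was", "nice"],
                     ["the", "service", "was", "slow"]])
print(p("service", "was"))
```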

11 Automatic opinion segmentation
The objective is to facilitate the classification task by providing fragments of messages instead of the whole message:
– A first attempt consists in segmenting messages by means of acoustic features, through the detection of pauses
  » But long segments carrying several opinion expressions (pronounced without any pause between them) still remain, while some segmentations are made in the middle of a single opinion expression
– In a second approach, the problems of segmentation and recognition have been integrated through a new type of language model
  » The idea is to explicitly model only those portions of messages that carry opinion expressions
  » To this end, a sub-corpus has been extracted for each opinion label, containing all segments associated with this label in the initial training corpus; a specific bigram language model has then been estimated on each sub-corpus
  » Along with these sub-models, a global bigram language model has been estimated over a label lexicon of size 9: the 8 opinion labels themselves and a garbage label modeling portions that do not correspond to any opinion expression

12 Automatic opinion segmentation (cont.)
This global LM makes it possible to model the possible co-occurrences of opinions in a single message
– In order to obtain a single fully compiled recognition model, each occurrence of an opinion label in the global LM is replaced by the corresponding sub-LM
– The output of the system is a string of segments separated by garbage symbols (see the sketch below)
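A minimal generative sketch of this two-level model (the actual system compiles it into a single recognition network; the transition tables and phrase lists below are invented placeholders):

```python
import random

# Hypothetical label-level bigram transitions (opinion labels + GARBAGE);
# only a few labels are shown, with invented probabilities.
label_bigram = {
    "<s>":       [("GARBAGE", 0.7), ("Courtesy+", 0.3)],
    "GARBAGE":   [("Courtesy+", 0.4), ("Rapidity-", 0.3), ("</s>", 0.3)],
    "Courtesy+": [("GARBAGE", 0.8), ("</s>", 0.2)],
    "Rapidity-": [("GARBAGE", 0.8), ("</s>", 0.2)],
}

# Hypothetical word-level sub-LMs, one per label (reduced to phrase lists here)
sub_lm = {
    "GARBAGE":   ["uh well you see", "here is XX on the phone"],
    "Courtesy+": ["the people were very nice"],
    "Rapidity-": ["I waited far too long"],
}

def pick(choices):
    r, acc = random.random(), 0.0
    for item, p in choices:
        acc += p
        if r < acc:
            return item
    return choices[-1][0]

def generate_message():
    """Walk the label-level bigram; expand each label with its sub-LM."""
    label, out = pick(label_bigram["<s>"]), []
    while label != "</s>":
        out.append(random.choice(sub_lm[label]))
        label = pick(label_bigram[label])
    return " | ".join(out)  # segments alternate opinion and garbage spans

print(generate_message())
```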

13 Modeling user satisfaction
Previous work on message classification or call routing has used classification methods like Boosting or Support Vector Machines
– The classification tool used in this study is BoosTexter
  » The weak classifiers are given as input; they can be the occurrence or absence of a specific word or n-gram, a numerical value (like the utterance length), or a combination of them
  » At the end of the training process, the list of selected classifiers is obtained, as well as the weight of each of them in the calculation of the classification score for each conceptual constituent of the tagset
Previous studies on sentiment analysis in text have shown the usefulness of integrating prior knowledge in the classification process by means of lexicons
– These words, called seed words, are likely to attach a positive or negative polarity to a stance
– The training corpus is used to train BoosTexter

14 Modeling user satisfaction (cont.)
After n iterations, n weak classifiers are obtained
– Each of them represents the occurrence of a word or an n-gram sequence of words
– All the selected words are then added to the manual lexicon of seed words
– Each word is then replaced by its lemma to improve the generalization capability of the classification model
  » 565 lemmas have been selected
The features chosen in this paper are 1-gram, 2-gram and 3-gram features on the words or the seed words (a feature-extraction sketch follows)
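A minimal sketch of this kind of feature extraction (the seed lexicon and lemmatizer below are invented stand-ins; the paper's actual lexicon and lemma list are not reproduced):

```python
# Hypothetical seed lexicon mapping lemmas to a polarity tag
SEED_WORDS = {"nice": "POS", "valuable": "POS", "slow": "NEG", "long": "NEG"}

# Stand-in lemmatizer: a real system would use a morphological analyzer
LEMMAS = {"waited": "wait", "people": "person"}

def lemmatize(word):
    return LEMMAS.get(word, word)

def extract_features(tokens, max_n=3, use_seeds=True):
    """Produce 1/2/3-gram features over lemmas, optionally mapping
    seed words to their polarity tag (the 'seed word' representation)."""
    lemmas = [lemmatize(t.lower()) for t in tokens]
    if use_seeds:
        lemmas = [SEED_WORDS.get(w, w) for w in lemmas]
    feats = []
    for n in range(1, max_n + 1):
        feats += ["_".join(lemmas[i:i + n]) for i in range(len(lemmas) - n + 1)]
    return feats

print(extract_features("the people were very nice".split()))
```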

15 AdaBoost.MH
Designed to minimize Hamming loss. The distribution over example-label pairs is updated on each round as

$$D_{t+1}(i,\ell) = \frac{D_t(i,\ell)\,\exp\big(-\alpha_t\, Y_i[\ell]\, h_t(x_i,\ell)\big)}{Z_t}$$

In the typical case that α_t is positive, the distribution D_t is updated in a manner that increases the weight of example-label pairs which are misclassified by h_t (i.e., for which Y_i[ℓ] and h_t(x_i,ℓ) differ in sign).
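A minimal sketch of this update step (assuming real-valued weak predictions h_t(x_i, ℓ) and labels Y_i[ℓ] ∈ {−1, +1}; not the BoosTexter implementation itself):

```python
import math

def adaboost_mh_update(D, Y, h, alpha):
    """One AdaBoost.MH reweighting step.

    D: dict (i, l) -> weight, a distribution over example-label pairs
    Y: dict (i, l) -> +1/-1 true label assignment
    h: dict (i, l) -> real-valued weak prediction h_t(x_i, l)
    alpha: step weight alpha_t (> 0 in the typical case)
    """
    new_D = {key: D[key] * math.exp(-alpha * Y[key] * h[key]) for key in D}
    Z = sum(new_D.values())              # normalizer Z_t
    return {key: w / Z for key, w in new_D.items()}

# Toy usage: one example, two labels; the weak hypothesis gets label 'b' wrong
# (signs of Y and h differ), so its weight increases after the update.
D = {(0, "a"): 0.5, (0, "b"): 0.5}
Y = {(0, "a"): +1, (0, "b"): -1}
h = {(0, "a"): +0.8, (0, "b"): +0.3}
print(adaboost_mh_update(D, Y, h, alpha=0.5))
```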

16 AdaBoost.MR
Designed to find a hypothesis which ranks the labels in a manner that hopefully places the correct labels at the top of the ranking. The distribution, maintained over triples of an example, a wrong label ℓ_0 and a correct label ℓ_1, is updated as

$$D_{t+1}(i,\ell_0,\ell_1) = \frac{D_t(i,\ell_0,\ell_1)\,\exp\big(\tfrac{1}{2}\alpha_t\,(h_t(x_i,\ell_0) - h_t(x_i,\ell_1))\big)}{Z_t}$$

Assuming momentarily that α_t > 0, this rule decreases the weight D_t(i, ℓ_0, ℓ_1) if h_t gives a correct ranking (h_t(x_i, ℓ_1) > h_t(x_i, ℓ_0)), and increases this weight otherwise.

17 BoosTexter
Weak hypotheses h make predictions of the form

$$h(x,\ell) = \begin{cases} c_{0\ell} & \text{if } w \notin x \\ c_{1\ell} & \text{if } w \in x \end{cases}$$

where w is a term (a word or n-gram).
– Weak learners search all possible terms. For each term, the values c_{jℓ} are chosen as

$$c_{j\ell} = \frac{1}{2}\ln\!\left(\frac{W_+^{j\ell} + \varepsilon}{W_-^{j\ell} + \varepsilon}\right)$$

(where W_b^{jℓ} is the total weight under D_t of the example-label pairs with sign b whose term presence is j, and ε is a small smoothing constant), and a score

$$Z_t = 2\sum_{j \in \{0,1\}} \sum_{\ell} \sqrt{W_+^{j\ell}\, W_-^{j\ell}}$$

is defined for the resulting weak hypothesis; the term with the smallest score is selected.
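A minimal sketch of this weak-learner search under the formulas above (binary word-presence stumps only; the data layout and the smoothing constant are assumptions, and this is not the BoosTexter implementation itself):

```python
import math

def best_stump(examples, labels, D, terms, eps=1e-4):
    """Search all candidate terms for the stump minimizing Z_t.

    examples: list of token sets
    labels:   list of dicts l -> +1/-1
    D:        dict (i, l) -> weight over example-label pairs
    terms:    candidate words/n-grams
    """
    best = None
    for w in terms:
        # W[(j, l, b)]: weight of pairs with term presence j (0/1),
        # label l and sign b
        W = {}
        for i, x in enumerate(examples):
            j = int(w in x)
            for l, y in labels[i].items():
                W[(j, l, y)] = W.get((j, l, y), 0.0) + D[(i, l)]
        label_set = labels[0].keys()
        Z = 2 * sum(math.sqrt(W.get((j, l, +1), 0.0) * W.get((j, l, -1), 0.0))
                    for j in (0, 1) for l in label_set)
        if best is None or Z < best[0]:
            c = {(j, l): 0.5 * math.log((W.get((j, l, +1), 0.0) + eps) /
                                        (W.get((j, l, -1), 0.0) + eps))
                 for j in (0, 1) for l in label_set}
            best = (Z, w, c)
    return best  # (score Z_t, selected term, output values c_jl)

# Toy usage: "nice" (or equivalently "slow") perfectly separates the labels,
# while "very" appears in both messages and is uninformative.
examples = [{"very", "nice"}, {"very", "slow"}]
labels = [{"POS": +1, "NEG": -1}, {"POS": -1, "NEG": +1}]
D = {(i, l): 0.25 for i in range(2) for l in ("POS", "NEG")}
print(best_stump(examples, labels, D, terms={"nice", "slow", "very"}))
```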

18 Experiments
The test corpus, containing 580 messages, is processed under 3 conditions:
– REF: the reference transcriptions (with manual opinion segmentation)
– RECO1: automatic transcription, segmentation done according to pauses in the speech signal
– RECO2: transcription with the language model for the 8 opinion labels, segmentation done by the ASR process
For each condition the results are given according to two data representations:
– messages represented only by their words, or
– messages filtered according to the seed words
Precision (P), Recall (R) and F-measure are reported for each condition.
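For reference, the F-measure used here combines the two as their harmonic mean (standard definition, not spelled out on the slides):

```latex
F = \frac{2 \cdot P \cdot R}{P + R}
```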

19 Experimental Results - Global satisfaction
Because no segmentation is needed (the SatGlob label applies to the whole message), only RECO1 is used here:
– The results are much better when considering only the messages containing an explicit opinion
  » When adding the neutral polarity, the performance is slightly worse, matching the poor Kappa inter-annotator agreement obtained with 3 polarities
– The loss in F-measure is only 5% despite the very high WER of the automatic transcriptions
– Representing the messages by means of seed words also seems to increase the robustness of the classification

20 Experimental Results - Fine-grained opinion detection
The segmentation method RECO2 outperforms the baseline based on pause detection
– But the results this time are very far from those obtained on the manual transcriptions, mainly due to the insertion of segments by RECO2, leading to a good recall value but poor precision
The use of seed words increases the performance under every condition