Deeper Sentiment Analysis Using Machine Translation Technology. Kanayama Hiroshi, Nasukawa Tetsuya. Tokyo Research Laboratory, IBM Japan. COLING 2004.


Abstract
• This paper proposes a new paradigm for sentiment analysis: translation from text documents to a set of sentiment units.
• The approach makes use of an existing transfer-based machine translation engine.

Introduction
• Sentiment analysis (SA) is the task of obtaining someone’s feelings as expressed in positive or negative comments (favorable or unfavorable), questions, and requests.
• SA is becoming a useful tool for commercial activities.
• This paper describes a method to extract a set of sentiment units from sentences, which is the key component of SA.

Introduction
• A sentiment unit is a tuple of a sentiment, a predicate, and its arguments.
• Example: “It has excellent lens, but the price is too high. I don’t think the quality of the recharger has any problem.”
  → [favorable] excellent (lens)
  → [unfavorable] high (price)
  → [favorable] problematic+neg (recharger)
• These three sentiment units indicate that the camera has good features in its lens and recharger, and a bad feature in its price.
• Extracting these sentiment units is not a trivial task, because many syntactic and semantic operations are required.
• A sentiment unit should be constructed as the smallest possible informative unit, so that it is easy to handle in the organizing processes after extraction.

Introduction
• We implemented an accurate sentiment analyzer by making use of an existing transfer-based machine translation engine (Watanabe, 1992), replacing the translation patterns and bilingual lexicons with sentiment patterns and a sentiment polarity lexicon.
• The analyzer uses deep analysis techniques such as those used for machine translation, where all of the syntactic and semantic phenomena must be handled.

Introduction
• Our SA system attaches importance to each individual sentiment expression, rather than to the quantitative tendencies of reputation.

Sentiment Unit
• A predicate is a word, typically a verb or an adjective, which conveys the main notion of the sentiment unit.
• An argument is also a word, typically a noun, which modifies the predicate with a case postposition in Japanese. Arguments roughly correspond to the subject and object of the predicate in English.
• For example, the sentence “ABC123 has an excellent lens.” yields:
  → [fav] excellent

Sentiment Unit
• Semantically similar representations should be aggregated to organize extracted sentiments.
• Predicates may have features, such as negation, facility, difficulty, etc.
  “ABC123 doesn’t have an excellent lens.” → [unf] excellent + neg
  “Easy to break.” → [unf] break + facil
  “Difficult to learn.” → [unf] learn + diff
• The surface string is the corresponding part of the original text. It is used for reference when viewing the output of SA.
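The sentiment unit described on these slides can be sketched as a small record type. This is only an illustration, not the authors' data structure: the class name `SentimentUnit`, its field names, the `key` and `render` helpers, and the fourth sample sentence are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentUnit:
    """One extracted sentiment (hypothetical layout, not from the paper)."""
    polarity: str            # "fav" or "unf"
    predicate: str           # e.g. "excellent", "break"
    arguments: tuple = ()    # e.g. ("lens",)
    features: tuple = ()     # e.g. ("neg",), ("facil",), ("diff",)
    surface: str = ""        # original text span, kept for reference only

    def key(self):
        # The surface string is left out, so different wordings of the
        # same sentiment fall under one key when aggregating.
        return (self.polarity, self.predicate, self.arguments, self.features)

    def render(self):
        pred = " + ".join((self.predicate,) + self.features)
        args = f" ({', '.join(self.arguments)})" if self.arguments else ""
        return f"[{self.polarity}] {pred}{args}"


units = [
    SentimentUnit("fav", "excellent", ("lens",), surface="It has excellent lens"),
    SentimentUnit("unf", "high", ("price",), surface="the price is too high"),
    SentimentUnit("unf", "break", features=("facil",), surface="Easy to break."),
    # Made-up second wording of the first sentiment, to show aggregation.
    SentimentUnit("fav", "excellent", ("lens",), surface="The lens is excellent."),
]

counts = Counter(u.key() for u in units)   # aggregate equivalent units
rendered = [u.render() for u in units]     # slide-style notation
```

Keeping the surface string out of the aggregation key is one way to let variations in expression collapse into a single entry while still retaining the original text for display.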

Implementation: Transfer-based Machine Translation Engine
• The transfer-based machine translation system consists of three parts:
  a source language syntactic parser,
  a bilingual transfer component, which handles the syntactic tree structures,
  a target language generator.

Implementation

Techniques Required for Sentiment Analysis
• Full syntactic parsing plays an important role in extracting sentiments correctly, because the results of a shallow parser alone are not always reliable. For example, an expression such as “I don’t think X is good” is not a favorable opinion about X, even though “X is good” appears on the surface. Therefore we use top-down pattern matching on the tree structures from the full parse to find each sentiment fragment.
• In our method, the top node is examined first to see whether the node and its combination of child nodes match one of the patterns in the pattern repository. In this top-down manner, the node “don’t think” in the above example is examined before “X is good”.
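The top-down matching described above can be sketched on a toy tree. This is a minimal illustration under stated assumptions, not the engine's actual algorithm: the `Node` class, the `PRINCIPAL`, `NEGATING`, and `FLIP` tables, and the convention that the embedded clause is the last child are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)


# Hypothetical toy tables standing in for the pattern repository.
PRINCIPAL = {"good": "fav", "bad": "unf"}   # predicate -> polarity
NEGATING = {"don't think"}                  # words that negate their clause
FLIP = {"fav": "unf", "unf": "fav"}


def match(node):
    """Match from `node` downward; return (polarity, predicate, args) or None."""
    if node.word in NEGATING and node.children:
        # The higher node is examined first; only then is its clause matched.
        inner = match(node.children[-1])
        if inner is None:
            return None
        polarity, predicate, args = inner
        return FLIP[polarity], predicate + " + neg", args
    if node.word in PRINCIPAL:
        args = tuple(child.word for child in node.children)
        return PRINCIPAL[node.word], node.word, args
    # No pattern applies at this node, so nothing below it is extracted.
    return None


# "I don't think X is good" as a toy dependency tree.
tree = Node("don't think", [Node("good", [Node("X")])])
result = match(tree)
```

Because matching starts at the root, the embedded “X is good” is only ever interpreted in the context of the node above it, which is what keeps the favorable reading from leaking out.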

Techniques Required for Sentiment Analysis
• There are three types of patterns: principal patterns, auxiliary patterns, and nominal patterns.
Principal patterns
• One pattern converts the Japanese expression “noun ga warui” to a sentiment unit “[unf] bad”.
• Another pattern converts the expression “noun wo ki-ni iru” to a sentiment unit “[fav] like”.

Techniques Required for Sentiment Analysis
Auxiliary patterns
• An auxiliary pattern expands the scope of matching.
• One such pattern matches phrases such as “X-wa yoi-to omowa-nai.” (“I don’t think X is good.”) and produces a sentiment unit with the negation feature. When this pattern is attached to a principal pattern, the favorability is inverted.
Nominal patterns
• A nominal pattern converts a noun phrase such as “renzu-no shitsu” (quality of the lens) into just “lens”.
• Example: “The quality of the lens is good.” → [fav] good, with the argument reduced from “quality of the lens” to “lens”.
• Another pattern is used for compound nouns such as “juuden jikan” (recharging time). A sentiment unit “long” with only part of the compound as its argument is not informative, but “long” applied to the whole compound can be regarded as an [unf] sentiment.
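The argument reduction performed by nominal patterns can be illustrated with a small string-based sketch. It works on English glosses rather than on Japanese parse trees, so it is only an analogy to the real patterns; the `ATTRIBUTE_NOUNS` set and the `reduce_argument` function are assumptions introduced for this example.

```python
# Hypothetical set of nouns that add little information as arguments.
ATTRIBUTE_NOUNS = {"quality", "performance"}
ARTICLES = {"the", "a", "an"}


def reduce_argument(noun_phrase):
    """Reduce 'ATTRIBUTE of (the) X' to 'X'; leave other phrases untouched."""
    words = noun_phrase.lower().split()
    if len(words) >= 3 and words[0] in ATTRIBUTE_NOUNS and words[1] == "of":
        rest = words[2:]
        if rest[0] in ARTICLES:
            rest = rest[1:]
        if rest:
            return " ".join(rest)
    # Compound nouns such as "recharging time" are kept whole.
    return noun_phrase


reduced = reduce_argument("quality of the lens")
compound = reduce_argument("recharging time")
```

The two cases pull in opposite directions on purpose: an uninformative head noun is dropped, while a compound is preserved because splitting it would lose the part that makes the sentiment meaningful.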

Disambiguation of sentiment polarity
• Some adjectives and verbs may be used as both favorable and unfavorable predicates. This variation in sentiment polarity can be disambiguated naturally, in the same manner as word sense disambiguation in machine translation.
  “The resolution is high.” → fav
  “ABC123 is expensive.” → unf
• The semantic category assigned to a noun holds the information used for this type of disambiguation.
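A rough sketch of category-based polarity disambiguation follows. It assumes the ambiguous predicate is the Japanese adjective “takai”, which means both “high” and “expensive”; the category names, both lookup tables, and the `disambiguate` function are hypothetical and far simpler than a real semantic lexicon.

```python
# Hypothetical semantic categories assigned to nouns.
NOUN_CATEGORY = {
    "resolution": "capability",
    "capabilities": "capability",
    "price": "cost",
    "ABC123": "product",
}

# Polarity of an ambiguous predicate, keyed by the argument's category.
POLARITY_BY_CATEGORY = {
    ("takai", "capability"): "fav",   # "high" resolution
    ("takai", "cost"): "unf",         # "high" price
    ("takai", "product"): "unf",      # "expensive" product
}


def disambiguate(predicate, noun):
    """Return "fav", "unf", or None when the lexicon has no entry."""
    category = NOUN_CATEGORY.get(noun)
    return POLARITY_BY_CATEGORY.get((predicate, category))


high_resolution = disambiguate("takai", "resolution")
expensive_camera = disambiguate("takai", "ABC123")
```

Returning `None` for unknown nouns, rather than a default polarity, mirrors the idea that a unit should only be produced when the resources support it.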

Resources
• Principal patterns: verbal and adjectival patterns were prepared, and a sentiment polarity was assigned to each word (3752 words in total).
• Auxiliary/nominal patterns: 95 auxiliary patterns and 36 nominal patterns were created manually.
• Polarity lexicon: some nouns were assigned a sentiment polarity, e.g. [unf] for ‘noise’ (“There are many...”).
• Some patterns and lexicons are domain dependent. Fortunately, the translation engine used here has a function to selectively use domain-dependent dictionaries, and thus we can prepare patterns which are especially suited to the domain of digital cameras.

Evaluation
• The data came from bulletin boards on the WWW that discuss digital cameras.
• A total of 200 randomly selected sentences were analyzed by our system.
• The resources were created by looking at other parts of the texts from the same domain.

Experiment 1
• To assess the reliability of the extracted sentiment polarity, three metrics are used: weak precision, strong precision, and recall.
• Two methods are compared: (a) the method based on the machine translation engine, and (b) the lexicon-only method, which emulates the shallow parsing approach.
• The lexicon-only method:
  uses a simple polarity lexicon of adjectives and verbs,
  performs no disambiguation,
  handles direct negation of an adjective or verb.

Experiment 1
• The MT method outputs a sentiment unit only when the expression is reachable from the root node of the syntactic tree through a combination of sentiment fragments, while the lexicon-only method picks up sentiment units from any node in the syntactic tree.
• The following sentence is an example where the lexicon-only method output the wrong sentiment unit, while the MT method did not output it:
  “gashitsu-ga kirei-da-to iu hyouka-ha uke-masen-deshi-ta.” (‘There was no opinion that the picture was sharp.’)
  → [fav] clear
• In the lexicon-only method, some errors occurred due to ambiguity in the sentiment polarity of an adjective or a verb, e.g. “Capabilities are high.”, since high/expensive is always assigned the [unf] feature.
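The contrast between the two extraction strategies can be sketched on a toy tree. This is a simplified illustration, not either system as evaluated: the nested-tuple tree encoding, the `LEXICON` and `PASS_THROUGH` tables, and both function names are assumptions, and the tree is a loose English gloss of the Japanese example.

```python
# A parse tree as nested tuples: (word, child, child, ...).
# Loose gloss of "There was no opinion that the picture was sharp."
TREE = ("no", ("opinion", ("clear", ("picture",))))

LEXICON = {"clear": "fav", "noisy": "unf"}   # hypothetical polarity lexicon
PASS_THROUGH = {"seems"}                     # hypothetical fragments that keep matching alive


def lexicon_only(tree):
    """Emit a unit for every lexicon word found anywhere in the tree."""
    word, *children = tree
    found = [(LEXICON[word], word)] if word in LEXICON else []
    for child in children:
        found += lexicon_only(child)
    return found


def root_reachable(tree):
    """Emit units only along an unbroken chain of matches from the root."""
    word, *children = tree
    if word in LEXICON:
        return [(LEXICON[word], word)]
    if word in PASS_THROUGH:
        found = []
        for child in children:
            found += root_reachable(child)
        return found
    # The chain is broken here, so nothing below this node is extracted.
    return []


shallow = lexicon_only(TREE)     # picks up "clear" despite the negated context
deep = root_reachable(TREE)      # stops at "no", which matches no fragment
```

The second function trades recall for precision: it stays silent on structures it has no pattern for, which is the behavior the slide credits for avoiding the wrong [fav] unit.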

Experiment 2
• The scope of the extracted sentiment units is compared between the MT method (a) and method (c): a naïve method that supports only naïve predicate-argument structures and doesn’t use nominal patterns.
• The output of the MT method was less redundant and more informative than that of the naïve method.
• Example: “It seems the function was enhanced last May.”
  → (a) [fav] enhance
  → (c) [fav] enhance
• Example: “A zoom is more desirable.”
  → (a) [fav] desirable
  → (c) [fav] desirable

Conclusion
• We have shown that deep syntactic and semantic analysis makes the reliable extraction of sentiment units possible, and that the outlining of sentiments became useful because of the aggregation of variations in expressions and the informative output of the arguments.
• When we regard the extraction of sentiment units as a kind of translation, many techniques that have been studied for machine translation, such as word sense disambiguation and anaphora resolution, can accelerate the further enhancement of sentiment analysis.