Deeper Sentiment Analysis Using Machine Translation Technology Kanayama Hiroshi, Nasukawa Tetsuya Tokyo Research Laboratory, IBM Japan COLING 2004
abstract This paper proposes a new paradigm for sentiment analysis: translation from text documents to a set of sentiment units, making use of an existing transfer-based machine translation engine.
introduction Sentiment analysis (SA) is the task of obtaining someone’s feelings as expressed in positive or negative comments (favorable or unfavorable), questions, and requests. SA is becoming a useful tool for commercial activities. This paper describes a method to extract a set of sentiment units from sentences, which is the key component of SA.
introduction A sentiment unit is a tuple of a sentiment, a predicate, and its arguments. For example, consider these sentences:
It has an excellent lens, but the price is too high. I don’t think the quality of the recharger has any problem.
[favorable] excellent (lens)
[unfavorable] high (price)
[favorable] problematic+neg (recharger)
The three sentiment units indicate that the camera has good features in its lens and recharger, and a bad feature in its price. The extraction of these sentiment units is not a trivial task because many syntactic and semantic operations are required. A sentiment unit should be constructed as the smallest possible informative unit so that it is easy to handle in the organizing processes after extraction.
introduction We implemented an accurate sentiment analyzer by making use of an existing transfer-based machine translation engine (Watanabe, 1992), replacing the translation patterns and bilingual lexicons with sentiment patterns and a sentiment polarity lexicon. The analyzer uses deep analysis techniques such as those used for machine translation, where all of the syntactic and semantic phenomena must be handled.
introduction Our SA system attaches importance to each individual sentiment expression, rather than to the quantitative tendencies of reputation.
Sentiment Unit A predicate is a word, typically a verb or an adjective, which conveys the main notion of the sentiment unit. An argument is also a word, typically a noun, which modifies the predicate with a case postposition in Japanese. Arguments roughly correspond to the subject and the object of the predicate in English. For example, the sentence “ABC123 has an excellent lens.” gives: [fav] excellent
Sentiment Unit Semantically similar representations should be aggregated to organize the extracted sentiments. Predicates may have features such as negation, facility, and difficulty:
“ABC123 doesn’t have an excellent lens.” [unf] excellent+neg
“Easy to break.” [unf] break+facil
“Difficult to learn.” [unf] learn+diff
The surface string is the corresponding part of the original text. It is used for reference when viewing the output of SA.
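As a minimal sketch of the representation described above, a sentiment unit can be modeled as a small record holding a polarity, a predicate with optional features, and its arguments. The class name `SentimentUnit`, its field names, and the `aggregate` helper are hypothetical; they illustrate the idea and are not taken from the paper's implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentUnit:
    """Hypothetical record for one sentiment unit (all names are illustrative)."""
    polarity: str            # "fav" or "unf"
    predicate: str           # e.g. "excellent"
    arguments: tuple = ()    # e.g. ("lens",)
    features: tuple = ()     # e.g. ("neg",), ("facil",), ("diff",)

    def __str__(self):
        # Render in the slide notation, e.g. "[fav] problematic+neg (recharger)".
        pred = "+".join((self.predicate,) + self.features)
        args = f" ({', '.join(self.arguments)})" if self.arguments else ""
        return f"[{self.polarity}] {pred}{args}"


def aggregate(units):
    """Count identical units so that repeated opinions are grouped together."""
    return Counter(units)


units = [
    SentimentUnit("fav", "excellent", ("lens",)),
    SentimentUnit("unf", "high", ("price",)),
    SentimentUnit("fav", "problematic", ("recharger",), ("neg",)),
    SentimentUnit("fav", "excellent", ("lens",)),
]
for unit, count in aggregate(units).items():
    print(count, unit)
```

Making the record immutable lets identical units be counted directly, which is one simple way to aggregate repeated opinions; grouping semantically similar predicates would need an additional normalization step that is not shown here.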
Implementation: Transfer-based Machine Translation Engine The transfer-based machine translation system consists of three parts: a source language syntactic parser, a bilingual transfer component which handles the syntactic tree structures, and a target language generator.
Techniques Required for Sentiment Analysis Full syntactic parsing plays an important role in extracting sentiments correctly, because the results of a shallow parser alone are not always reliable. For example, an expression such as “I don’t think X is good” is not a favorable opinion about X, even though “X is good” appears on the surface. Therefore we use top-down pattern matching on the tree structures from the full parse to find each sentiment fragment. In our method, the top node is examined first to see whether the node and its combination of child nodes match one of the patterns in the pattern repository. In this top-down manner, the node “don’t think” in the above example is examined before “X is good”.
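The top-down matching described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the engine's actual algorithm: the `Node` class, the two small pattern tables, and the `extract` function are all hypothetical, and a real parse tree would carry much richer information than a head word and a list of children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy parse-tree node: a head word plus its dependent nodes."""
    word: str
    children: list = field(default_factory=list)


# Hypothetical pattern repository: predicate -> polarity.
PRINCIPAL = {"good": "fav", "bad": "unf"}
# Hypothetical set of nodes that negate the clause they dominate.
NEGATING = {"don't think"}
FLIP = {"fav": "unf", "unf": "fav"}


def extract(node, negated=False):
    """Match patterns starting from the top node and working downward.

    A negating node is visited before the clause it dominates, so the
    negation is already known when the embedded predicate is reached.
    """
    if node.word in NEGATING:
        units = []
        for child in node.children:
            units.extend(extract(child, not negated))
        return units
    if node.word in PRINCIPAL:
        polarity = PRINCIPAL[node.word]
        args = tuple(child.word for child in node.children)
        if negated:
            return [(FLIP[polarity], node.word + "+neg", args)]
        return [(polarity, node.word, args)]
    return []  # no pattern matches this node, so nothing below it is used


# "I don't think X is good": the negating node sits above "good".
tree = Node("don't think", [Node("I"), Node("good", [Node("X")])])
print(extract(tree))
```

Because the search stops at any node that matches no pattern, a predicate is only reported when it is connected to the root through matched nodes, which is the behavior a surface-level lookup of “X is good” cannot provide.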
Techniques Required for Sentiment Analysis There are three types of patterns: principal patterns, auxiliary patterns, and nominal patterns. Principal patterns: one pattern converts the Japanese expression “noun ga warui” to a sentiment unit “[unf] bad”; another converts the expression “noun wo ki-ni iru” to a sentiment unit “[fav] like”.
Techniques Required for Sentiment Analysis Auxiliary patterns expand the scope of matching. One such pattern matches phrases such as “X-wa yoi-to omowa-nai. (I don’t think X is good.)” and produces a sentiment unit with the negation feature. When this pattern is attached to a principal pattern, its favorability is inverted.
Nominal patterns: using one such pattern, a noun phrase “renzu-no shitsu (quality of the lens)” is converted into just “lens”. Ex: The quality of the lens is good. [fav] good ?[fav] good
Another pattern is used for compound nouns such as “juuden jikan (recharging time)”. A sentiment unit “long” is not informative by itself, but “long” with the compound noun as its argument can be regarded as an [unf] sentiment.
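A rough sketch of how the nominal patterns above could choose the argument to report is given below. The function `apply_nominal_pattern` and the set `LIGHT_HEADS` are hypothetical, and treating a compound noun by keeping both of its parts is an assumption made for illustration rather than a description of the actual patterns.

```python
# Hypothetical set of head nouns that add little information on their own.
LIGHT_HEADS = {"quality"}


def apply_nominal_pattern(head, modifier=None):
    """Pick the argument to report for a noun phrase 'modifier + head'.

    - "quality" modified by "lens"   -> "lens" (the head is dropped)
    - "time" modified by "recharging" -> "recharging time" (assumed: kept whole)
    - a bare noun                     -> returned unchanged
    """
    if modifier is None:
        return head
    if head in LIGHT_HEADS:
        return modifier
    return f"{modifier} {head}"


print(apply_nominal_pattern("quality", "lens"))
print(apply_nominal_pattern("time", "recharging"))
```

The point of the sketch is only that the reported argument should be the most informative noun or noun compound, so that the resulting sentiment unit stays small but meaningful.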
Disambiguation of sentiment polarity Some adjectives and verbs may be used as both favorable and unfavorable predicates. This variation in sentiment polarity can be disambiguated naturally, in the same manner as word sense disambiguation in machine translation.
“The resolution is high.” [fav]
“ABC123 is expensive.” [unf]
The semantic category assigned to a noun holds the information used for this type of disambiguation.
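The disambiguation above can be illustrated with a small lookup keyed on the semantic category of the argument noun. This is only a sketch: the category names, both tables, and the `polarity` function are hypothetical, and the Japanese adjective “takai” (which covers both “high” and “expensive”) is used here as an illustrative ambiguous predicate.

```python
# Hypothetical semantic categories assigned to nouns.
NOUN_CATEGORY = {
    "resolution": "performance",
    "ABC123": "product",
    "price": "cost",
}

# Polarity of an ambiguous predicate, keyed by the category of its argument.
AMBIGUOUS = {
    "takai": {"performance": "fav", "product": "unf", "cost": "unf"},
}


def polarity(predicate, noun):
    """Return "fav", "unf", or None when the tables give no answer."""
    category = NOUN_CATEGORY.get(noun)
    return AMBIGUOUS.get(predicate, {}).get(category)


print(polarity("takai", "resolution"))
print(polarity("takai", "ABC123"))
```

A method that stores a single polarity per adjective cannot make this distinction, which is why the noun's semantic category is consulted before a polarity is assigned.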
Resources Principal patterns: verbal and adjectival patterns, with a sentiment polarity assigned to each word (3752 words in total). Auxiliary/nominal patterns: 95 auxiliary patterns and 36 nominal patterns were created manually. Polarity lexicon: some nouns were assigned a sentiment polarity, e.g. [unf] for ‘noise’ (There are many...). Some patterns and lexicons are domain dependent. Fortunately, the translation engine used here has a function to selectively use domain-dependent dictionaries, and thus we can prepare patterns that are especially suited to the domain of digital cameras.
Evaluation The test data came from bulletin boards on the WWW that discuss digital cameras. A total of 200 randomly selected sentences were analyzed by our system. The resources were created by looking at other parts of texts from the same domain.
Experiment 1 To assess the reliability of the extracted sentiment polarity, three metrics are used: weak precision, strong precision, and recall. Two methods are compared: (a) the method based on the machine translation engine, and (b) the lexicon-only method, which emulates the shallow parsing approach. Method (b) uses a simple polarity lexicon of adjectives and verbs, with no disambiguation; only direct negation of an adjective or verb is handled.
Experiment 1 The MT method outputs a sentiment unit only when the expression is reachable from the root node of the syntactic tree through a combination of sentiment fragments, while the lexicon-only method picks up sentiment units from any node in the syntactic tree. The following sentence is an example where the lexicon-only method output a wrong sentiment unit, while the MT method did not output it:
gashitsu-ga kirei-da-to iu hyouka-ha uke-masen-deshi-ta. ‘There was no opinion that the picture was sharp.’ [fav] clear
In the lexicon-only method, some errors occurred due to the ambiguity in the sentiment polarity of an adjective or a verb, e.g. “Capabilities are high.”, since high/expensive is always assigned the [unf] feature.
Experiment 2 The scope of the extracted sentiment units is compared between the MT method and (c), a method that supports only naive predicate-argument structures and doesn’t use nominal patterns. The output of the MT method was less redundant and more informative than that of the naive method.
Ex: It seems the function was enhanced last May. (A) [fav] enhance (C) [fav] enhance
Ex: A zoom is more desirable. (A) [fav] desirable (C) [fav] desirable
conclusion We have shown that deep syntactic and semantic analysis makes possible the reliable extraction of sentiment units, and that the outlining of sentiments became useful because of the aggregation of variations in expressions and the informative outputs of the arguments. When we regard the extraction of sentiment units as a kind of translation, many techniques that have been studied for the purpose of machine translation, such as word sense disambiguation and anaphora resolution, can accelerate the further enhancement of sentiment analysis.