Polarity Dictionary: The polarity dictionary contains two kinds of words: polarity words and modifier words. Polarity words have six attributes: text, POS, def, exceptional-feature, dynamic-polarity, and strength. The text attribute is the word itself, and the POS attribute gives its part of speech. The def attribute is the word's concept definition from HowNet. The exceptional-feature and dynamic-polarity attributes handle special cases in which a word takes a polarity different from its basic polarity. For example, the word “high” is positive when it modifies “quality” but negative when it modifies “price”. The strength attribute reflects how strong the word's polarity is. Modifier words are words that can strengthen, weaken, or even reverse the polarity of polarity words, and their attributes are very similar to those of polarity words.

The corpus used in our system consists of reviews posted on a Bulletin Board, which is available from the following website:

Many reviews in the corpus use irregular punctuation, so criteria for splitting sentences must be built first. Each sentence is then processed in a stage we call element construction, in which several tools and resources (the syntactic parser, the POS tagger, the ontology, and the polarity dictionary) are used to build a dependency syntactic structure and to tag each word according to its potential use in later stages. The pronominal resolution and ellipsis recovery module deals mainly with feature words, which in our system are car names or the names of car features. This is followed by an element reconstruction stage. In the last two stages, we first identify constituent relations using a pattern library built from training data, and then summarize the opinions at the paragraph level. Finally, the results can be visualized with the Opinion Observer.
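To make the dictionary attributes concrete, here is a minimal sketch of how a polarity word with an exceptional feature and a modifier word might be represented and combined into a signed score. The numeric encoding (basic polarity as +1 or -1, modifiers as a multiplicative factor), the field and function names, the CTB-style POS tags, and the strength values are all assumptions for illustration; the system's actual dictionary format and scoring rule are not given here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence


@dataclass
class PolarityWord:
    text: str
    pos: str
    definition: str   # "def": concept definition, taken from HowNet in the original system
    polarity: int     # basic polarity; assumed encoding: +1 positive, -1 negative
    strength: float   # "strength" attribute; assumed numeric scale
    # exceptional feature -> dynamic polarity used when the word modifies that feature
    dynamic_polarity: Dict[str, int] = field(default_factory=dict)


@dataclass
class ModifierWord:
    text: str
    pos: str
    factor: float     # assumed: >1 strengthens, between 0 and 1 weakens, negative reverses


def score(word: PolarityWord, target: Optional[str] = None,
          modifiers: Sequence[ModifierWord] = ()) -> float:
    """Signed polarity of `word` when it modifies `target`, after applying modifiers."""
    # Fall back to the basic polarity unless `target` is an exceptional feature.
    polarity = word.dynamic_polarity.get(target, word.polarity)
    value = polarity * word.strength
    for modifier in modifiers:
        value *= modifier.factor
    return value


# Hypothetical entries; values chosen only to illustrate the mechanism.
high = PolarityWord("high", "VA", "(HowNet def)", polarity=+1, strength=1.0,
                    dynamic_polarity={"price": -1})
very = ModifierWord("very", "AD", factor=1.5)
not_ = ModifierWord("not", "AD", factor=-1.0)

print(score(high, "quality"))          # 1.0
print(score(high, "price", [very]))    # -1.5
print(score(high, "quality", [not_]))  # -1.0
```

Here “high” keeps its basic positive polarity for “quality”, takes its dynamic (negative) polarity for the exceptional feature “price”, and a reversing modifier flips the sign.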
An Opinion Mining System for Chinese Automobile Reviews
Tianfang Yao, Qingyang Nie, Jianchao Li, Linlin Li, Decheng Lou, Ke Chen, Yu Fu
Department of Computer Science and Engineering, Shanghai Jiao Tong University
800 Dong Chuan Rd., Shanghai, China

1. Introduction
As online commerce becomes widespread, the number of customer reviews of products is growing rapidly, so it is difficult for a customer to read all of the reviews and make a reasonable decision about whether to purchase a certain product. Our main task is to extract the opinions that customers express in reviews about different features of different car brands, and to determine whether each opinion is positive, negative, or neutral and how strong it is. In this paper, we introduce Surveyer, a practical system that performs opinion mining with natural language processing techniques, together with its related algorithms.

2. Interface of Opinion Observer
In this system, users can make two kinds of comparisons: between different brands, and between different parts of a certain car. In the left figure, six products are selected for comparison. Users choose brands from the left column of the interface and select “compared cars” from the top menu, and a bar chart appears on the right. Bars above the x-axis show the number of positive opinions (in red), and bars below the x-axis show the number of negative opinions (in blue), giving a clear statistical view of consumer reviews. The right figure is similar to the left one, except that it compares features of cars, giving a distinct impression of how consumers view different features of each product.

System Architecture

6. A Self-developed Annotation Tool
The Surveyer annotation tool is designed not only to meet annotation needs, but also to show the processing flow of the system.
It gives a clear, step-by-step view of how Surveyer extracts opinions and determines their polarity. The rule file automatically generated from annotated data can also be exported here.

[Figure: System architecture. Corpus (comments) → Preprocess (simple sentence split) → Element Construction (syntactic parser, POS tagger, ontology, polarity dictionary) → Pronominal Resolution & Ellipsis Recovery → Element Reconstruction → Constituent Relation Extraction (patterns) → Paragraph Polarity Analysis → Structured Analysis Result]

4. Resource Building: Ontology & Polarity Dictionary
Ontology: Our ontology contains two taxonomies, one for cars and one for car features. Each category in a taxonomy has a name, weight attributes, and extra information such as synonyms of the name. All categories are arranged hierarchically to describe the relations between different cars or between different car features.

5. Pattern Generation and Effective Evaluation
[Figure: Pattern types: car-feature, car-car, polarity-modifier, car-polarity, feature-polarity, and feature-feature patterns. Rules combine the POS of each word (from the POS tagger) with the up-route and down-route in the parse tree (from the syntactic parser).]

Generation: Two features are used to generate patterns: the syntactic nodes in the parse tree and the part of speech of each related word. Four annotators manually annotated the training data, and rules are generated automatically from the annotated texts according to predefined criteria. Several optimization methods are applied before the generated rules are added to the pattern library, which is then used to identify relations in new text.

Evaluation: We ran several tests to evaluate the effectiveness of this pattern-building method. Using human-annotated test data as the gold standard, we obtained an average recall of 80% and precision of 60%, measured mainly on feature-polarity patterns and car-polarity patterns.
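For an evaluation like the one above, precision is the fraction of extracted opinions that appear in the gold standard, and recall is the fraction of gold-standard opinions that were extracted. The sketch below computes both by exact matching, and again after discarding polarity strength, which separates strength errors from direction errors. The opinion encoding (target, direction, strength) and the toy data are hypothetical and do not reproduce the reported figures.

```python
from typing import Set, Tuple

# Assumed encoding of one opinion: (target, direction, strength), direction is +1 or -1.
Opinion = Tuple[str, int, int]


def precision_recall(predicted: Set[tuple], gold: Set[tuple]) -> Tuple[float, float]:
    """Exact-match precision and recall of predicted items against the gold standard."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def direction_only(opinions: Set[Opinion]) -> Set[Tuple[str, int]]:
    """Drop the strength so that only the polarity direction has to match."""
    return {(target, direction) for target, direction, _ in opinions}


# Toy data for illustration only (not from the system's test set).
gold = {("engine", +1, 2), ("price", -1, 1), ("seat", +1, 1), ("brake", -1, 2)}
predicted = {("engine", +1, 1), ("price", -1, 1), ("seat", +1, 1),
             ("brake", -1, 2), ("color", +1, 1)}

print(precision_recall(predicted, gold))                                  # (0.6, 0.75)
print(precision_recall(direction_only(predicted), direction_only(gold)))  # (0.8, 1.0)
```

In this toy case the only disagreement on a shared target is the strength of the “engine” opinion, so both scores rise once strength is ignored.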
Most errors concern polarity strength; the direction of polarity is correct most of the time. The results are promising: using only part-of-speech and syntactic features, the method achieves relatively high performance. In future research, we will consider adding more features to rebuild the pattern knowledge base.
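As a rough illustration of the Generation step in section 5, the sketch below turns annotated word pairs into patterns built from the two features the method uses: the POS of each word, and the route between the words in the parse tree (an up-route to their lowest common ancestor and a down-route from it). The parent-pointer tree encoding, the constituency labels and Penn Chinese Treebank-style tags, the toy clauses, and the minimum-frequency filter `min_count` (a stand-in for the unspecified predefined criteria) are all assumptions; every function name here is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: node id -> (label, parent id); the root's parent is None.
Tree = Dict[int, Tuple[str, Optional[int]]]
# (relation, POS of word a, up-route labels, down-route labels, POS of word b)
Pattern = Tuple[str, str, Tuple[str, ...], Tuple[str, ...], str]
Example = Tuple[Tree, Dict[int, str], str, int, int]  # tree, POS map, relation, a, b


def ancestors(tree: Tree, node: int) -> List[int]:
    """Return [node, parent, grandparent, ..., root]."""
    chain: List[int] = []
    current: Optional[int] = node
    while current is not None:
        chain.append(current)
        current = tree[current][1]
    return chain


def route(tree: Tree, a: int, b: int) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Labels from a's parent up to the lowest common ancestor, then down towards b."""
    up, down = ancestors(tree, a), ancestors(tree, b)
    common = next(n for n in up if n in down)
    up_labels = tuple(tree[n][0] for n in up[1:up.index(common) + 1])
    down_labels = tuple(tree[n][0] for n in reversed(down[1:down.index(common)]))
    return up_labels, down_labels


def to_pattern(tree: Tree, pos: Dict[int, str], relation: str, a: int, b: int) -> Pattern:
    up, down = route(tree, a, b)
    return (relation, pos[a], up, down, pos[b])


def generate_rules(examples: List[Example], min_count: int = 2) -> Set[Pattern]:
    """Keep patterns seen at least min_count times (assumed selection criterion)."""
    counts = Counter(to_pattern(*example) for example in examples)
    return {pattern for pattern, n in counts.items() if n >= min_count}


def match(rules: Set[Pattern], tree: Tree, pos: Dict[int, str],
          a: int, b: int) -> Optional[str]:
    """Return the relation of a rule whose features match the pair (a, b), if any."""
    up, down = route(tree, a, b)
    for relation, pos_a, rule_up, rule_down, pos_b in rules:
        if (pos_a, rule_up, rule_down, pos_b) == (pos[a], up, down, pos[b]):
            return relation
    return None


def clause(noun: str, adj: str) -> Tuple[Tree, Dict[int, str]]:
    """Toy parse of a '<noun> <adjective>' clause such as 'engine powerful'."""
    tree: Tree = {0: ("IP", None), 1: ("NP", 0), 2: ("VP", 0), 3: (noun, 1), 4: (adj, 2)}
    return tree, {3: "NN", 4: "VA"}


t1, p1 = clause("engine", "powerful")
t2, p2 = clause("seat", "comfortable")
examples: List[Example] = [
    (t1, p1, "feature-polarity", 3, 4),
    (t2, p2, "feature-polarity", 3, 4),
    (t2, p2, "feature-polarity", 4, 3),  # seen only once, so filtered out
]
rules = generate_rules(examples)
t3, p3 = clause("brake", "sensitive")
print(match(rules, t3, p3, 3, 4))  # feature-polarity
print(match(rules, t3, p3, 4, 3))  # None
```

In this representation, adding another feature to the pattern knowledge base would amount to widening the `Pattern` tuple and computing the extra field in `to_pattern`.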