Exploiting Discourse Structure for Sentiment Analysis of Text OR 2013 Alexander Hogenboom In collaboration with Flavius Frasincar, Uzay Kaymak, and Franciska.

Slides:



Advertisements
Similar presentations
CILC2011 A framework for structured knowledge extraction and representation from natural language via deep sentence analysis Stefania Costantini Niva Florio.
Advertisements

RCQ-ACS: RDF Chain Query Optimization Using an Ant Colony System WI 2012 Alexander Hogenboom Erasmus University Rotterdam Ewout Niewenhuijse.
Polarity Analysis of Texts using Discourse Structure CIKM 2011 Bas Heerschop Erasmus University Rotterdam Frank Goossen Erasmus.
Distant Supervision for Emotion Classification in Twitter posts 1/17.
Learning Semantic Information Extraction Rules from News The Dutch-Belgian Database Day 2013 (DBDBD 2013) Frederik Hogenboom Erasmus.
Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle
A Linguistic Approach for Semantic Web Service Discovery International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) July 13, 2012 Jordy.
Title Course opinion mining methodology for knowledge discovery, based on web social media Authors Sotirios Kontogiannis Ioannis Kazanidis Stavros Valsamidis.
Pollyanna Gonçalves (UFMG, Brazil) Matheus Araújo (UFMG, Brazil) Fabrício Benevenuto (UFMG, Brazil) Meeyoung Cha (KAIST, Korea) Comparing and Combining.
Sentiment Analysis An Overview of Concepts and Selected Techniques.
Determining Negation Scope and Strength in Sentiment Analysis SMC 2011 Paul van Iterson Erasmus School of Economics Erasmus University Rotterdam
Made with OpenOffice.org 1 Sentiment Classification using Word Sub-Sequences and Dependency Sub-Trees Pacific-Asia Knowledge Discovery and Data Mining.
Jean-Eudes Ranvier 17/05/2015Planet Data - Madrid Trustworthiness assessment (on web pages) Task 3.3.
Exploiting Emoticons in Sentiment Analysis SAC 2013 Daniella Bal Erasmus University Rotterdam Flavius Frasincar Erasmus University.
A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts 04 10, 2014 Hyun Geun Soo Bo Pang and Lillian Lee (2004)
Recognizing Implicit Discourse Relations in the Penn Discourse Treebank Ziheng Lin, Min-Yen Kan, and Hwee Tou Ng Department of Computer Science National.
Erasmus University Rotterdam Frederik HogenboomEconometric Institute School of Economics Flavius Frasincar.
RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms BNAIC 2009 Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak Erasmus.
Supervised classification performance (prediction) assessment Dr. Huiru Zheng Dr. Franscisco Azuaje School of Computing and Mathematics Faculty of Engineering.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
Automatically Annotating Web Pages Using Google Rich Snippets 11th Dutch-Belgian Information Retrieval Workshop (DIR 2011) February 4, 2011 Frederik Hogenboom.
Optimizing RDF Chain Queries using Genetic Algorithms DBDBD 2010 Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak Erasmus University.
Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011) September.
An Overview of Event Extraction from Text Workhop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE'11) October 23,
News Personalization using the CF-IDF Semantic Recommender International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011) May 25, 2011.
Semantic Video Classification Based on Subtitles and Domain Terminologies Polyxeni Katsiouli, Vassileios Tsetsos, Stathes Hadjiefthymiades P ervasive C.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
Analyzing Sentiment in a Large Set of Web Data while Accounting for Negation AWIC 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam.
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Sentiment Analysis with a Multilingual Pipeline 12th International Conference on Web Information System Engineering (WISE 2011) October 13, 2011 Daniëlla.
Mining and Summarizing Customer Reviews
Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews K. Dave et al, WWW 2003, citations Presented by Sarah.
A Joint Model of Feature Mining and Sentiment Analysis for Product Review Rating Jorge Carrillo de Albornoz Laura Plaza Pablo Gervás Alberto Díaz Universidad.
More than words: Social networks’ text mining for consumer brand sentiments A Case on Text Mining Key words: Sentiment analysis, SNS Mining Opinion Mining,
(ACM KDD 09’) Prem Melville, Wojciech Gryc, Richard D. Lawrence
Erasmus University Rotterdam Introduction Nowadays, emerging news on economic events such as acquisitions has a substantial impact on the financial markets.
Erasmus University Rotterdam Introduction With the vast amount of information available on the Web, there is an increasing need to structure Web data in.
CAREERS IN LINGUISTICS OUTSIDE OF ACADEMIA CAREERS IN INDUSTRY.
TOWL Time-determined ontology-based information system for real-time stock market analysis Econometric Institute Erasmus School of Economics Erasmus University.
Aiding WSD by exploiting hypo/hypernymy relations in a restricted framework MEANING project Experiment 6.H(d) Luis Villarejo and Lluís M à rquez.
Lemmatization Tagging LELA /20 Lemmatization Basic form of annotation involving identification of underlying lemmas (lexemes) of the words in.
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text Jed Hassell.
RCDL Conference, Petrozavodsk, Russia Context-Based Retrieval in Digital Libraries: Approach and Technological Framework Kurt Sandkuhl, Alexander Smirnov,
A Language Independent Method for Question Classification COLING 2004.
A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source:
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Instance Filtering for Entity Recognition Advisor : Dr.
*Erasmus University Rotterdam P.O. Box 1738, NL-3000 DR Rotterdam, the Netherlands † Teezir BV Wilhelminapark 46, NL-3581 NL, Utrecht, the Netherlands.
Indirect Supervision Protocols for Learning in Natural Language Processing II. Learning by Inventing Binary Labels This work is supported by DARPA funding.
How Useful are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San.
Semantics-Based News Recommendation with SF-IDF+ International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013) June 13, 2013 Marnix Moerland.
Erasmus University Rotterdam Introduction Content-based news recommendation is traditionally performed using the cosine similarity and TF-IDF weighting.
Towards Cross-Language Sentiment Analysis through Universal Star Ratings KMO 2012 Malissa Bal Erasmus University Rotterdam Flavius.
TEXT ANALYTICS - LABS Maha Althobaiti Udo Kruschwitz Massimo Poesio.
Lexico-semantic Patterns for Information Extraction from Text The International Conference on Operations Research 2013 (OR 2013) Frederik Hogenboom
Automatic recognition of discourse relations Lecture 3.
Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors by Dennis DeCoste and Dominic Mazzoni International.
Using Wikipedia for Hierarchical Finer Categorization of Named Entities Aasish Pappu Language Technologies Institute Carnegie Mellon University PACLIC.
Semantics-Based News Recommendation International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012) June 14, 2012 Michel Capelle
Part-of-Speech Tagging with Limited Training Corpora Robert Staubs Period 1.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Sentiment Analysis Using Common- Sense and Context Information Basant Agarwal 1,2, Namita Mittal 2, Pooja Bansal 2, and Sonal Garg 2 1 Department of Computer.
A Document-Level Sentiment Analysis Approach Using Artificial Neural Network and Sentiment Lexicons Yan Zhu.
Introduction to RST (Rhetorical Structure Theory)
Language Identification and Part-of-Speech Tagging
Kim Schouten, Flavius Frasincar, and Rommert Dekker
Aspect-Based Sentiment Analysis Using Lexico-Semantic Patterns
Aspect-Based Sentiment Analysis on the Web using Rhetorical Structure Theory Rowan Hoogervorst1, Erik Essink1, Wouter Jansen1, Max van den Helder1 Kim.
Web News Sentence Searching Using Linguistic Graph Similarity
Hierarchical, Perceptron-like Learning for OBIE
Presentation transcript:

Exploiting Discourse Structure for Sentiment Analysis of Text OR 2013 Alexander Hogenboom In collaboration with Flavius Frasincar, Uzay Kaymak, and Franciska de Jong September 4, 2013 erasmus studio

Introduction (1) Information monitoring tools are needed for tracking sentiment in today’s complex systems The Web offers an overwhelming amount of textual data, containing traces of sentiment OR

Introduction (2) An intuitive approach to sentiment analysis involves scanning a text for cues signaling its polarity Let us consider the following negative review: –Example: Although Brad Pitt’s well-deserved fall off a cliff was quite entertaining, this movie was terrible! We hypothesize that structural aspects of content play an important role in conveying sentiment How can structural aspects of natural language text be exploited when determining its polarity? OR Although Brad Pitt’s well-deserved fall off a cliff was quite entertaining,

Sentiment Analysis Sentiment analysis is typically focused on determining the polarity of natural language text Applications in summarizing reviews, determining a general mood (consumer confidence, politics) State-of-the-art approaches classify polarity of natural language text by analyzing vector representations using, e.g., machine learning techniques Alternative approaches are lexicon-based, which renders them robust across domains and texts and enables linguistic analysis at a deeper level 44 OR 2013

Exploiting Structural Aspects of Text (1) Early approaches involve accounting for segments’ positions in a text or their semantic cohesion Recent work exploits discursive relations by applying the Rhetorical Structure Theory (RST) RST can be used to split a text into a hierarchical structure of rhetorically related segments Nucleus segments form the core of a text, whereas satellites support the nuclei Many types of relations between segments exist, e.g., background, elaboration, explanation, contrast, etc. 55 OR 2013

Exploiting Structural Aspects of Text (2) Existing work differentiates between important and less important segments w.r.t. the overall sentiment Previously proposed method assigns different weights to nuclei and satellites We propose to differentiate among rhetorical roles, i.e., RST relation types 66 OR 2013 Although Brad Pitt’s well- deserved fall off a cliff was quite entertaining, this movie was terrible! 1-2 Contrast

Framework (1) Lexicon-based document-level polarity classification Based on its lemma, Part-of-Speech (POS), and disambiguated word sense (Lesk-based), each individual word is scored in the range [-1,1] by using sentiment scores from SentiWordNet 3.0 Word scores are aggregated and corrected for a bias towards positivity in order to classify text as positive (corrected score ≥ 0) or negative (corrected score < 0) Discourse parsing is applied in order to determine appropriate weights for word scores in this process 77 OR 2013

Framework (2) Simple discourse parsing: weights proportional to position of words in full text Sentence-level PArsing of DiscoursE (SPADE): –SPADE I: Assign top-level RST nuclei weights of 1 Assign top-level RST satellites weights of 0 –SPADE II: Assign top-level RST nuclei weights of 1.5 Assign top-level RST satellites weights of 0.5 –SPADE X: Our novel, extended SPADE-based approach Assign top-level RST nuclei and satellite types different weights in the range [-2, 2], optimized by means of a genetic algorithm 88 OR 2013

Evaluation (1) Implementation in Java, OpenNLP POS tagger, Java WordNet Library (JWNL) API for lemmatization, SentiWordNet 3.0 sentiment lexicon Corpus of 500 positive and 500 negative English movie reviews (60% training set, 40% test set) Baselines: –All words assigned weight of 1 –Simple discourse parsing (position-based) –SPADE I –SPADE II 99 OR 2013

Evaluation (2) Alternative: SPADE X Optimized weight for nuclei equals Most significant RST relation weights: –Elaboration: –Enablement:0.956 –Attribution:0.451 –Contrast: OR 2013

Evaluation (3) 11 OR 2013 PositiveNegativeOverall MethodPrec.Rec.F1Prec.Rec.F1Acc.F1 Baseline Simple SPADE I SPADE II SPADE X

Conclusions Recent sentiment analysis methods consider more and more aspects of content other than word frequencies We identify important and less important text segments and weight them accordingly when analyzing sentiment Both nuclei and satellites appear to play an important role in conveying sentiment, whereas satellites have until now been deemed predominantly irrelevant Significantly improved overall polarity classification accuracy and macro-level F1 w.r.t. not accounting for structural aspects of content comes at a cost of increased processing times OR

Future Work Account for full discourse (RST) trees rather than for top-level segmentations only Perform paragraph-level or document-level analysis of structure rather than sentence-level analysis Explore other (faster) methods of identifying discourse structure in natural language text Investigate our findings’ applicability to vector-based machine learning approaches to sentiment analysis Evaluate our findings on different corpora OR

Questions? Alexander Hogenboom Econometric Institute Erasmus University Rotterdam P.O. Box 1738, NL-3000 DR Rotterdam, the Netherlands OR