Erasmus University Rotterdam

Similar presentations
Mustafa Cayci INFS 795 An Evaluation on Feature Selection for Text Clustering.

Computational Statistics. Basic ideas  Predict values that are hard to measure irl, by using co-variables (other properties from the same measurement.
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Imbalanced data David Kauchak CS 451 – Fall 2013.
Supervised Learning Techniques over Twitter Data Kleisarchaki Sofia.
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Opinion Spam and Analysis Nitin Jindal and Bing Liu Department of Computer Science University of Illinois at Chicago.
Basis Expansion and Regularization Presenter: Hongliang Fei Brian Quanz Brian Quanz Date: July 03, 2008.
What is Statistical Modeling
Predicting the Semantic Orientation of Adjective Vasileios Hatzivassiloglou and Kathleen R. McKeown Presented By Yash Satsangi.
Evaluation of Results (classifiers, and beyond) Biplav Srivastava Sources: [Witten&Frank00] Witten, I.H. and Frank, E. Data Mining - Practical Machine.
Analyzing Sentiment in a Large Set of Web Data while Accounting for Negation AWIC 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam.
Ensemble Learning (2), Tree and Forest
CSCI 347 / CS 4206: Data Mining Module 04: Algorithms Topic 06: Regression.
EVALUATION David Kauchak CS 451 – Fall Admin Assignment 3 - change constructor to take zero parameters - instead, in the train method, call getFeatureIndices()
Mining and Summarizing Customer Reviews
Opinion mining in social networks Student: Aleksandar Ponjavić 3244/2014 Mentor: Profesor dr Veljko Milutinović.
Predicting Income from Census Data using Multiple Classifiers Presented By: Arghya Kusum Das Arnab Ganguly Manohar Karki Saikat Basu Subhajit Sidhanta.
Opinion Mining Using Econometrics: A Case Study on Reputation Systems Anindya Ghose, Panagiotis G. Ipeirotis, and Arun Sundararajan Leonard N. Stern School.
Automatically Identifying Localizable Queries Center for E-Business Technology Seoul National University Seoul, Korea Nam, Kwang-hyun Intelligent Database.
Processing of large document collections Part 2 (Text categorization, term selection) Helena Ahonen-Myka Spring 2005.
Outline What Neural Networks are and why they are desirable Historical background Applications Strengths neural networks and advantages Status N.N and.
Use of web scraping and text mining techniques in the Istat survey on “Information and Communication Technology in enterprises” Giulio Barcaroli(*), Alessandra.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Automatic Detection of Tags for Political Blogs Khairun-nisa Hassanali Vasileios Hatzivassiloglou The University.
“PREDICTIVE MODELING” CoSBBI, July Jennifer Hu.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.6: Linear Models Rodney Nielsen Many of.
A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source:
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Extracting meaningful labels for WEBSOM text archives Advisor.
CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.
LATENT SEMANTIC INDEXING Hande Zırtıloğlu Levent Altunyurt.
Today Ensemble Methods. Recap of the course. Classifier Fusion
Introductory Statistics. Learning Objectives l Distinguish between different data types l Evaluate the central tendency of realistic business data l Evaluate.
Evolutionary Algorithms for Finding Optimal Gene Sets in Micro array Prediction. J. M. Deutsch Presented by: Shruti Sharma.
Computational Intelligence: Methods and Applications Lecture 16 Model evaluation and ROC Włodzisław Duch Dept. of Informatics, UMK Google: W Duch.
USE RECIPE INGREDIENTS TO PREDICT THE CATEGORY OF CUISINE Group 7 – MEI, Yan & HUANG, Chenyu.
Creating Subjective and Objective Sentence Classifier from Unannotated Texts Janyce Wiebe and Ellen Riloff Department of Computer Science University of.
Multiple Logistic Regression STAT E-150 Statistical Methods.
Machine Learning Tutorial-2. Recall, Precision, F-measure, Accuracy Ch. 5.
Iterative similarity based adaptation technique for Cross Domain text classification Under: Prof. Amitabha Mukherjee By: Narendra Roy Roll no: Group:
Assignments CS fall Assignment 1 due Generate the in silico data set of 2sin(1.5x)+ N (0,1) with 100 random values of x between.
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Show Me the Money! Deriving the Pricing Power of Product Features by Mining Consumer Reviews Nikolay Archak, Anindya Ghose, and Panagiotis G. Ipeirotis.
From Words to Senses: A Case Study of Subjectivity Recognition Author: Fangzhong Su & Katja Markert (University of Leeds, UK) Source: COLING 2008 Reporter:
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Lecture 5: Statistical Methods for Classification CAP 5415: Computer Vision Fall 2006.
Word Sense and Subjectivity (Coling/ACL 2006) Janyce Wiebe Rada Mihalcea University of Pittsburgh University of North Texas Acknowledgements: This slide.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Chapter 11 – Neural Nets © Galit Shmueli and Peter Bruce 2010 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
Opinion Spam and Analysis Software Engineering Lab 최효린.
An Effective Statistical Approach to Blog Post Opinion Retrieval Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (CIKM 2008)
Language Identification and Part-of-Speech Tagging
Lecture 1.31 Criteria for optimal reception of radio signals.
Kim Schouten, Flavius Frasincar, and Rommert Dekker
Linguistic Graph Similarity for News Sentence Searching
Aspect-Based Sentiment Analysis Using Lexico-Semantic Patterns
Aspect-Based Sentiment Analysis on the Web using Rhetorical Structure Theory Rowan Hoogervorst1, Erik Essink1, Wouter Jansen1, Max van den Helder1 Kim.
Web News Sentence Searching Using Linguistic Graph Similarity
Descriptive Statistics (Part 2)
Boosting and Additive Trees
Aspect-based sentiment analysis
Introduction Task: extracting relational facts from text
iSRD Spam Review Detection with Imbalanced Data Distributions
Review-Level Aspect-Based Sentiment Analysis Using an Ontology
Ontology-Driven Sentiment Analysis of Product and Service Aspects
Ensemble learning Reminder - Bagging of Trees Random Forest
Exploring Lexico-Semantic Patterns for Aspect-Based Sentiment Analysis
Ontology-Enhanced Aspect-Based Sentiment Analysis
Evaluation David Kauchak CS 158 – Fall 2019.
Presentation transcript:

Detection of Multiple Implicit Features per Sentence in Consumer Review Data
Flavius Frasincar* frasincar@ese.eur.nl
Erasmus University Rotterdam, The Netherlands
* Joint work with Nikoleta Dosoula, Roel Griep, Rick den Ridder, Rick Slangen and Kim Schouten

Contents Motivation Related Work Method Data Evaluation Conclusion

Motivation
Due to the convenience of shopping online, there is an increasing number of Web shops
Web shops often provide a platform for consumers to share their experiences, which leads to an increasing number of product reviews:
In 2014 the number of reviews on Amazon exceeded 10 million
Product reviews are used for decision making:
Consumers: decide or confirm which products to buy
Producers: improve existing products or develop new ones, marketing campaigns, etc.

Motivation
Reading all reviews is time-consuming, hence the need for automation
Sentiment mining is defined as the automatic assessment of the sentiment expressed in text (in our case by consumers in product reviews)
Several granularities of sentiment mining:
Review-level
Sentence-level
Aspect-level (product aspects are sometimes referred to as product features): Aspect-Based Sentiment Analysis (ABSA) [our focus here]

Motivation
Aspect-Based Sentiment Analysis has two stages:
Aspect detection:
Explicit aspect detection: aspects appear literally in the product reviews [relatively easy]
Implicit aspect detection: aspects do not appear literally in the product reviews [our focus here]
Sentiment detection: assigning the sentiment associated with explicit or implicit aspects
Main problem: in previous work we proposed an approach that detects at most one implicit feature per sentence, but a sentence can have more than one aspect
How to find all product aspects mentioned in a review sentence?

Main Idea and Evaluation Result
Two-step approach:
Use a classifier to predict the presence of multiple (more than one) features in a sentence
Extend our previous approach to predict more than one implicit feature in a sentence
Evaluation result:
Collection of restaurant reviews from SemEval 2014
The old approach has an F1 of 62.9%; we obtain an F1 of 64.5%
This is a statistically significant increase of 1.6 percentage points in F1 (p < 0.01)

Related Work
Explicit features available:
Use the co-occurrence (per sentence) matrix between explicit features and other words from the training data
Compute a score per sentence for each explicit feature by summing up its co-occurrences with the words in the considered test sentence
The explicit feature with the largest score that also passes a (learned) threshold is detected as an implicit feature
Disadvantages:
Explicit feature annotations are needed
An implicit feature can only be selected from the list of explicit features

Related Work
Implicit features available:
Use the co-occurrence (per sentence) matrix between implicit features and other words from the training data
Compute a score per sentence for each implicit feature (from the training data) by summing up its co-occurrences with the words of the considered test sentence
The implicit feature with the largest score that also passes a (learned) threshold is detected as an implicit feature
Advantages:
No explicit feature annotations are needed
An implicit feature does not have to appear as an explicit feature
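Building the co-occurrence matrix from the training data can be sketched in Python as follows (a minimal illustration, not the authors' implementation; the input format, sentences as (lemmas, annotated-features) pairs, is an assumption):

```python
from collections import defaultdict

def build_cooccurrence(train_sentences):
    """Count, per sentence, how often each annotated implicit feature
    co-occurs with each lemma. `train_sentences` is a list of
    (lemmas, features) pairs: the lemmas of one sentence and the set
    of implicit features annotated for it (illustrative format)."""
    counts = defaultdict(int)  # (feature, lemma) -> number of sentences
    for lemmas, features in train_sentences:
        for feature in features:
            for lemma in set(lemmas):  # count each lemma once per sentence
                counts[(feature, lemma)] += 1
    return counts
```

The resulting dictionary plays the role of the co-occurrence matrix: looking up a (feature, lemma) pair gives the number of training sentences in which both appear.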

Main Problem
The previous approaches are able to find at most one feature per sentence
Example of a sentence with multiple features: "The fish is great, but the food is very expensive" has:
the 'quality' feature with sentiment word 'great'
the 'price' feature with sentiment word 'expensive'
How to update the second approach (where implicit features are available) to cope with multiple features per sentence?

Method
List F: all features appearing in the training data
List L: all unique lemmas appearing in the training data
Matrix C of size |F| x |L| stores the co-occurrences between the elements of F and the elements of L
for each test sentence s:
  for each f_i in F:
    score(f_i) = (1/n) · Σ_{j=1}^{n} c_ij · o_j
where:
n is the number of words in s
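A minimal Python sketch of this averaged co-occurrence score, assuming the counts of matrix C are stored in a dict keyed by (feature, lemma) pairs (an illustrative representation, not the authors' code):

```python
def score_feature(feature, sentence_lemmas, counts):
    """Average co-occurrence of `feature` with the words of a test
    sentence: score(f_i) = (1/n) * sum over the n sentence words of
    the corresponding c_ij entry. `counts` maps (feature, lemma)
    pairs to co-occurrence counts (illustrative representation)."""
    n = len(sentence_lemmas)
    if n == 0:
        return 0.0
    return sum(counts.get((feature, lemma), 0) for lemma in sentence_lemmas) / n
```

For example, with counts {("food", "fish"): 2, ("food", "great"): 1} and the sentence lemmas ["fish", "great"], the score of 'food' is (2 + 1) / 2 = 1.5.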

Method
Approach 1:
Select all features that have a score exceeding a learned threshold
Disadvantage: for data sets with few implicit features, too many features will be selected
Approach 2:
Use a classifier to determine the number of features
Based on this number, assign the top-scoring features to the sentence
Disadvantage: it is difficult to predict the exact number of features (hard task)
Solution: use a simpler classifier

Method
Use a classifier to predict whether there is more than 1 feature for the considered test sentence (true) or 0 or 1 features (false)
for each test sentence s:
  if classifier(s) then /* classifier predicts more than 1 feature */
    for each f_i in F:
      if score(f_i) > ε then assign f_i to s
  else /* classifier predicts 0 or 1 features */
    fBestScore = 0; fBest = null
    for each f_i in F:
      if score(f_i) > fBestScore then
        fBestScore = score(f_i); fBest = f_i
    if fBestScore > ε then assign fBest to s
where ε is a first threshold trained on the training data (in the interval [0,1])
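The two branches can be sketched in Python as follows (a self-contained illustration, with the co-occurrence counts in a dict keyed by (feature, lemma) pairs and the averaged score computed inline; all names are illustrative, not from the slides):

```python
def assign_features(sentence_lemmas, features, counts, eps, predicts_multi):
    """Assign implicit features to a sentence. If the classifier predicts
    more than one feature (predicts_multi), keep every feature whose
    averaged co-occurrence score exceeds eps; otherwise keep at most the
    single best-scoring feature, and only if it exceeds eps."""
    def score(feature):
        n = len(sentence_lemmas)
        if n == 0:
            return 0.0
        return sum(counts.get((feature, l), 0) for l in sentence_lemmas) / n

    if predicts_multi:
        return [f for f in features if score(f) > eps]
    best = max(features, key=score, default=None)
    if best is not None and score(best) > eps:
        return [best]
    return []
```

Note that the single-feature branch can also return an empty list, which covers the "0 or 1 features" case of the slide.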

Method
We use logistic regression as classifier
The classifier uses a threshold δ to determine when to predict more than 1 feature for the considered test sentence
score_s = log(p_s / (1 − p_s)) = β_0 + β_1·#NN_s + β_2·#JJ_s + β_3·#Comma_s + β_4·#And_s
where:
p_s is the probability that sentence s contains multiple implicit features
#NN_s is the number of nouns in sentence s
#JJ_s is the number of adjectives in sentence s
#Comma_s is the number of commas in sentence s
#And_s is the number of 'and's in sentence s
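Extracting the four counts and computing the linear score can be sketched as follows (the token/tag input format and the use of Penn Treebank tags NN*/JJ* for nouns and adjectives are assumptions; the coefficients are passed in):

```python
import math

def multi_feature_score(tokens, pos_tags, coefs):
    """Logistic-regression score for 'sentence s has multiple implicit
    features'. coefs = (b0, b1, b2, b3, b4) weight the constant and the
    counts of nouns, adjectives, commas, and 'and's. Penn Treebank POS
    tags (NN*, JJ*) are assumed for the noun/adjective counts."""
    nns = sum(1 for t in pos_tags if t.startswith("NN"))
    jjs = sum(1 for t in pos_tags if t.startswith("JJ"))
    commas = tokens.count(",")
    ands = sum(1 for w in tokens if w.lower() == "and")
    b0, b1, b2, b3, b4 = coefs
    score = b0 + b1 * nns + b2 * jjs + b3 * commas + b4 * ands
    p = 1.0 / (1.0 + math.exp(-score))  # probability of multiple features
    return score, p
```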

Method
for each test sentence s:
  if score_s > δ then classifier(s) = true
  else classifier(s) = false
where δ is a threshold trained on the training data (in the interval (−∞, ∞))
The new algorithm is trained in two steps using the training data:
The threshold of the classifier (δ) is trained first [using a custom-made gold standard based on the original annotations]
The threshold of the feature detector (ε) is trained second (using the predictions of the optimized classifier) [using the original annotations as gold standard]
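The sequential threshold training can be sketched as a grid search (an assumption — the slides do not specify the search procedure; the metric callables stand in for evaluating the classifier and the feature detector on the training data):

```python
def train_two_thresholds(delta_grid, eps_grid, classifier_metric, detector_metric):
    """Two-step training: first pick the classifier threshold delta by
    maximizing its metric against the multi-feature gold standard; then,
    with delta fixed, pick the detector threshold eps by maximizing the
    detector metric against the original annotations. Grid search is an
    illustrative choice; the metric callables are placeholders."""
    delta = max(delta_grid, key=classifier_metric)
    eps = max(eps_grid, key=lambda e: detector_metric(delta, e))
    return delta, eps
```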

Data
Collection of restaurant reviews from SemEval 2014
Every review sentence is annotated with at least one of five implicit features:
'food'
'service'
'ambience'
'price'
'anecdotes/miscellaneous'
All 3,044 sentences contain at least one implicit feature
The 'anecdotes/miscellaneous' feature carries little semantics, so we remove it from the data set:
We keep only four implicit features
Some sentences now have no implicit feature (which fits our setup well)

Data
Distribution of the number of implicit features per sentence:
32.7% of the sentences contain no implicit feature
52.6% of the sentences contain one implicit feature (a small majority)
14.8% of the sentences contain more than one implicit feature

Data
Frequencies of the four unique features
'food' is the most frequent, followed by 'service' (about half as frequent), then 'ambience', and then 'price'

Data
Co-occurrence frequencies of the four unique features
More than 4% of the sentences refer to both 'food' and 'price', and almost the same percentage refer to 'food' and 'service' (most sentences contain only one implicit feature)

Evaluation
10-fold cross-validation
Coefficients of the logistic regression for the classifier (full data set):

Predictor variable   Coefficient   p-value
Constant             -3.019479     0.0000
#NNs                  0.116899     0.0002
#JJs                  0.335530
Commas                0.216417     0.0004
Ands                  0.399415

All variables are significant at p-value < 0.01
We have also tried (but did not achieve statistical significance):
Number of words in a sentence (some information is already captured by #NNs and #JJs)
Number of subjects in a sentence (the subject is often the product instead of a feature)

Evaluation
Specifications of 1000 logistic regressions on 90% subsamples
The constant is excluded, as it does not influence the results with a trained threshold

Variable   Mean       Median    Std. dev.
#NNs       0.117361   0.11768   0.011342
#JJs       0.335538   0.33536   0.014345
Commas     0.216409   0.21672   0.023185
Ands       0.399507   0.39892   0.023409

Evaluation
Box-plot of the coefficients of 1000 logistic regressions on 90% subsamples

Evaluation Classifier uses F where  = 1.8 Almost 2 times more importance given to recall than precision Recall is more important than precision, as some of the low precision can be corrected by the feature detector After  = 1.8 there is a sharp decrease in precision, while recall increases only a little bit 

Evaluation
Mean F1-scores with different part-of-speech filters
The old algorithm had an F1 of 62.9%; the new one has an F1 of 64.5%, hence an improvement of 1.6 percentage points
With a perfect classifier the F1 would be 69.3%, so the achieved improvement is 1.6/(69.3 − 62.9) = 25% of the maximum possible improvement for the classifier
The gap between 64.5% and 69.3% is error due to the classifier; the gap up to 85.2% is error due to the feature detector
The best part-of-speech filter is NN+JJ (F1 = 64.5%), but the difference with NN (F1 = 64.1%) is very small

Conclusion
Implicit feature detection with a two-step approach:
Classifier: classify whether a sentence has more than 1 feature
Feature detector: detect features per sentence
Case 1: select all features that pass a threshold
Case 2: select at most one feature, i.e., the best feature if it passes the threshold
The classifier uses features such as:
Number of nouns in a sentence
Number of adjectives in a sentence
Number of commas in a sentence
Number of 'and's in a sentence

Conclusion
Future work:
Use more advanced classifiers, such as Support Vector Machines or Random Forests
Learn the number of implicit features per sentence (a more advanced form of our current classifier)
Improve the feature detector using a multi-label classifier per sentence (a more advanced form of our current rule-based feature detector)
Compute the sentiment associated with:
Explicit features
Implicit features (determining the scope of features and weighting sentiment words in relation to features)