Opinion Mining Using Econometrics: A Case Study on Reputation Systems Anindya Ghose, Panagiotis G. Ipeirotis, and Arun Sundararajan Leonard N. Stern School.

Presentation transcript:

Opinion Mining Using Econometrics: A Case Study on Reputation Systems Anindya Ghose, Panagiotis G. Ipeirotis, and Arun Sundararajan Leonard N. Stern School of Business, New York University ACL 2007

Questions/Challenges
– What makes an opinion positive or negative? Is there an objective measure for this task?
– How can we rank opinions according to their strength? Can we define an objective measure for ranking opinions?
– How does the context change the polarity and strength of an opinion, and how can we take the context into consideration?

Introduction
– Reputation systems in electronic markets
– Pricing power of merchants on Amazon.com
– About 9,500 transactions over 180 days
– Textual feedback and star ratings
– Discover polarity and strength without the need for human annotations or linguistic resources

Arguments
Textual feedback affects a merchant's power to charge higher prices than the competition for the same product and still make a sale.

Reputation Systems
A reputation profile:
– Past transactions for the merchant.
– Numerical ratings from buyers who have completed transactions.
– Chronological list of textual feedback provided by these buyers.

Price Premiums
– Price premium
– Relative price premium
– Relative average price premium
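
The three quantities named on this slide can be sketched as below. The slide's formulas did not survive, so the definitions here are assumptions: the price premium is taken as the seller's price minus a competitor's price for the same product, the relative premium divides that difference by the seller's price, and the averaged variant takes the mean over all competitors. Function names and prices are hypothetical.

```python
from statistics import mean

def price_premium(seller_price, competitor_price):
    # Assumed definition: how much more the seller charges than one competitor.
    return seller_price - competitor_price

def relative_price_premium(seller_price, competitor_price):
    # Assumed definition: the premium as a fraction of the seller's own price.
    return (seller_price - competitor_price) / seller_price

def average_relative_price_premium(seller_price, competitor_prices):
    # Assumed definition: mean relative premium over all competing listings.
    return mean(relative_price_premium(seller_price, p) for p in competitor_prices)

# Hypothetical listing: one seller at $100 against two competitors.
premium = price_premium(100.0, 90.0)
relative = relative_price_premium(100.0, 90.0)
averaged = average_relative_price_premium(100.0, [90.0, 110.0])
```

One transaction yields one premium per competing listing, which is presumably why the data set has many more price premiums than transactions.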

Data
Transaction data:
– 1,078 merchants
– 9,487 unique transactions
– 107,922 price premiums
Reputation data:
– 4,932 postings per merchant
– Numerical ratings: one to five stars
– Each seller's exact feedback profile is reconstructed as of the time of each transaction

Econometrics-based Opinion Mining
Retrieving the dimensions of reputation
– Features are expressed by nouns, noun phrases, verbs, and verb phrases.
– For example, X_1 might be shipping and X_2 might be packaging.

Reputation dimension example
X = (delivery, packaging, service)
– Post 1: "I was impressed by the speedy delivery! Great service!"
– Post 2: "The item arrived in awful packaging and the delivery was slow."
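
A minimal sketch of how modifier-dimension pairs could be pulled out of the two example posts. This is not the authors' pipeline: it assumes a small hand-written lexicon (`DIMENSIONS`, `MODIFIERS`) and two naive word-order patterns instead of real part-of-speech tagging, and every name in it is hypothetical.

```python
import re

# Hypothetical lexicons for the example; a real system would derive these from tagged text.
DIMENSIONS = {"delivery", "packaging", "service"}
MODIFIERS = {"speedy", "great", "awful", "slow"}
COPULAS = {"is", "was", "are", "were"}

def extract_pairs(post):
    """Return (modifier, dimension) pairs found in one feedback posting."""
    pairs = []
    for sentence in re.split(r"[.!?]+", post.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        for i, token in enumerate(tokens):
            if token not in DIMENSIONS:
                continue
            # Pattern 1: modifier directly before the dimension ("speedy delivery").
            if i > 0 and tokens[i - 1] in MODIFIERS:
                pairs.append((tokens[i - 1], token))
            # Pattern 2: dimension, copula, modifier ("delivery was slow").
            elif (i + 2 < len(tokens)
                  and tokens[i + 1] in COPULAS
                  and tokens[i + 2] in MODIFIERS):
                pairs.append((tokens[i + 2], token))
    return pairs

post_1 = "I was impressed by the speedy delivery! Great service!"
post_2 = "The item arrived in awful packaging and the delivery was slow."
pairs_1 = extract_pairs(post_1)
pairs_2 = extract_pairs(post_2)
```

Each pair then becomes one candidate text variable, so the same dimension (delivery) can appear with modifiers of opposite tone.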

Scoring the dimensions of reputation
– Construct an n x p matrix M(s_i).
– A total of 151 unique dimensions and 142 modifiers.
– Posting-specific weight: c is the probability of clicking on the "Next" link; K is the number of postings that appear on each page.

Posting-specific weight example
The weight drops exponentially as the page number increases. If K = 25 and there are 51 postings in total:
– weight of post 1 (page 1) = 1/(25 + 25c + c^2)
– weight of post 26 (page 2) = c/(25 + 25c + c^2)
– weight of post 51 (page 3) = c^2/(25 + 25c + c^2)
Score of modifier-dimension (feature) pair:
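
A minimal sketch of the posting-specific weight and of a weighted-count score for one modifier-dimension pair. Both formulas are inferred from the worked example rather than copied from the slide: postings are ranked from 1, page 1 holds the first K postings, a posting on page g gets raw weight c^(g-1), and the weights are normalized to sum to 1. The pair score is assumed to be the weighted count of the pair's appearances. Function names are hypothetical.

```python
def posting_weights(n_postings, per_page, c):
    """Normalized weight of each posting, in rank order (rank 1 is on page 1)."""
    # A posting at rank r sits on page (r - 1) // per_page + 1, so its raw
    # weight is c raised to the number of "Next" clicks needed to reach it.
    raw = [c ** ((rank - 1) // per_page) for rank in range(1, n_postings + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def pair_score(appearances, weights):
    """Weighted count of one modifier-dimension pair across a profile.

    appearances[j] is how often the pair occurs in posting j (assumed input).
    """
    return sum(a * w for a, w in zip(appearances, weights))

# The slide's setting: K = 25 postings per page, 51 postings, with c = 0.5 as an
# arbitrary illustrative click probability.
weights = posting_weights(51, 25, 0.5)
first, twenty_sixth, last = weights[0], weights[25], weights[50]
```

With these assumptions the normalizing constant is 25 + 25c + c^2, because two full pages contribute 25 postings each and the third page holds a single posting.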

Reputation Score
– Modifier-dimension pair score = strength
– Feature weight = polarity
– A correlation between the appearance of modifier-dimension opinion phrases in the merchants' feedback and the price premiums observed for each transaction.

Scoring by regression
– Regressor coefficients
– Control variables
– Fixed effects
– The error term
Score: count the appearances, weighting each appearance using the definition of r_i
Variations:

Regression Settings
Predicting
Control variables:
– The product's price on Amazon
– The average star rating of the merchant
– The number of the merchant's past transactions
– The number of sellers for the product
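
The regression behind these settings can be written out roughly as follows. This is a sketch assembled from the components the slides name (control variables, text scores, fixed effects, error term), not the authors' exact equation, and all symbols are illustrative: t indexes an observed price premium, X_{c,t} are the control variables, Score_{k,t} is the weighted score of the k-th modifier-dimension pair in the merchant's profile at that time, f_{g(t)} is a fixed effect for the group that observation t belongs to (the grouping is not stated on the slides), and ε_t is the error term.

```latex
\mathrm{Premium}_{t}
  \;=\; \sum_{c} \beta_{c}\, X_{c,t}
  \;+\; \sum_{k} \gamma_{k}\, \mathrm{Score}_{k,t}
  \;+\; f_{g(t)}
  \;+\; \varepsilon_{t}
```

Under this reading, each estimated coefficient γ_k is the dollar value that the regression attaches to the k-th modifier-dimension pair.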

Experimental Results
– Human recall: two annotators, on a random sample of 1,000 posts:
– Computer recall: averaged over the two annotators
– Precision is not an issue; noise will be ruled out by the regression

Estimating polarity and strength
"Good packaging" = -$0.58

Price Premiums vs. Ratings
– Many researchers assume that text feedback does not influence buyers, and use only the star ratings as a summary of opinions.
– Examine the R^2 fit of the regression with and without the text variables.
– Without: R^2 = 0.35; with: R^2 = 0.63
– Text contains significantly more information than the ratings.
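
A minimal sketch of the with/without comparison, using ordinary least squares written in plain Python. The data below are invented purely to exercise the code and have no connection to the R^2 values reported on the slide; the variable names (`stars`, `text_score`, `premium`) and both helper functions are hypothetical, and the sketch leaves out the fixed effects and most control variables.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols_r_squared(columns, y):
    """Fit y on an intercept plus the given regressor columns; return R^2."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y.
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * y_i for row, y_i in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b_j * x_j for b_j, x_j in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((y_i - f_i) ** 2 for y_i, f_i in zip(y, fitted))
    ss_tot = sum((y_i - y_bar) ** 2 for y_i in y)
    return 1.0 - ss_res / ss_tot

# Invented toy data: one row per observed price premium.
stars = [4.0, 4.5, 5.0, 3.5, 4.0, 4.8, 3.0, 4.2]
text_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.5, 0.7]
premium = [1.0, 4.1, 2.9, 0.2, 3.4, 2.3, 0.9, 3.2]

r2_without_text = ols_r_squared([stars], premium)
r2_with_text = ols_r_squared([stars, text_score], premium)
print(f"without text: {r2_without_text:.2f}, with text: {r2_with_text:.2f}")
```

Adding regressors can never lower the in-sample R^2, so the size of the gain is what matters when judging how much the text variables contribute.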

Prediction
– Predict which merchant will make a sale.
– C4.5 classifier, with four months of data for training and two months for testing.

Possible applications
– Examine the effect of product reviews on product sales and detect the weight that customers put on different product features.
– Analyze the effect of news stories on stock prices: how opinion holders and their wording can cause the market to move up or down.
– Extract the pragmatic effect of news and blogs on elections or other political events.