

Lexicon-Based Sentiment Analysis Using the Most-Mentioned Word Tree
Oct 10th, 2014
Bo-Hyun Kim, Sr. Software Engineer
With Lina Chen, Sr. Software Engineer
HP Big Data Business Unit
#GHC

What to Expect
• Sentiment Analysis
  − What is it?
  − Why is it interesting?
  − How HP Vertica Pulse works
  − Achieving greater accuracy
  − Different point of view using the most-mentioned word tree

What I Expect
• A 5-star rating on GHC app
I just expect you to enjoy and learn!

Sentiment Analysis
• In plain English
  − Sentiment analysis is the process of automatically detecting whether a text segment contains emotional or opinionated content and determining its polarity (e.g., “thumbs up” or “thumbs down”). It is a field of research that has received significant attention in recent years, both in academia and in industry. [Wright, 2009]

Gimme Examples!
• Also known as:
  − Opinion Mining
  − Text Mining
• Determine people’s general opinion
  − “I just got a new car, and I’m loving it”
  − “My new car isn’t as fast as I thought.”

Why are we interested?
• Increasing (every minute!) web usage
  − Articles
  − Blogs
  − Comments
• Power of Social Media
  − Online Shopping
  − Customer Reviews
  − Recommended products on Amazon
  − How other people feel about the product

Product Review

Data… Data… Data…

HP Vertica Pulse

How to Analyze?
• Lexicon-based approach – HP Labs [Zhang et al., 2011]
• Choose a product, person, event, organization, or topic [Hu and Liu, 2004] whose opinion you want to analyze
• Determine the Semantic Orientation score of opinion lexicons

Word        Semantic Orientation Value
Fabulous    +3
Good        +1
Bad
Nasty       -3
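The lookup step on this slide can be sketched in Python. This is a minimal illustration, not HP Vertica Pulse code: the dictionary holds only the three values that are legible on the slide, and the names `SEMANTIC_ORIENTATION` and `lexicon_hits` are hypothetical.

```python
import re

# Semantic Orientation values taken from the slide's table.
# "Bad" is left out because its value is not legible in the transcript.
SEMANTIC_ORIENTATION = {"fabulous": 3, "good": 1, "nasty": -3}


def lexicon_hits(text):
    """Return (word, value) pairs for every opinion word found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [(t, SEMANTIC_ORIENTATION[t]) for t in tokens if t in SEMANTIC_ORIENTATION]


print(lexicon_hits("Fabulous screen, nasty battery"))
```

A real lexicon would hold thousands of entries; the point here is only that each opinion word carries a signed strength that later steps aggregate.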

Sentiment Scoring
• Input: text or sentence
• Output: for each attribute or entity, generates a sentiment score ranging from -1 to 1
  − -1: Negative sentiment
  −  0: Neutral sentiment
  −  1: Positive sentiment
• Entity-level lexicon-based sentiment scoring
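One way to picture entity-level scoring in the range -1 to 1 is the sketch below. It assumes, purely for illustration, that an entity's score is the sum of the lexicon values in sentences mentioning it, divided by the sum of their absolute values. The slides do not state how Pulse normalizes scores, and `entity_scores` is a hypothetical name.

```python
import re

# Values taken from the lexicon table on the previous slide.
LEXICON = {"fabulous": 3, "good": 1, "nasty": -3}


def entity_scores(text, entities):
    """Score each entity in [-1, 1] from the sentences that mention it."""
    sentences = re.split(r"[.!?]+", text.lower())
    scores = {}
    for entity in entities:
        values = []
        for sentence in sentences:
            tokens = re.findall(r"[a-z']+", sentence)
            if entity.lower() in tokens:
                values.extend(LEXICON[t] for t in tokens if t in LEXICON)
        total = sum(abs(v) for v in values)
        # Assumed normalization: signed sum over absolute sum; 0.0 means neutral.
        scores[entity] = sum(values) / total if total else 0.0
    return scores


print(entity_scores("The screen is fabulous. The battery is nasty.",
                    ["screen", "battery", "camera"]))
```

An entity that is never mentioned, or is mentioned without any opinion word, comes out neutral (0.0).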

Limitation
• Semantic Orientation value(‘missed’) = -1
• Gives more weight to the closely located word
• Accuracy can suffer
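The weighting described here can be illustrated with a distance-based aggregation, in which each opinion word's value is divided by its token distance from the entity. This is an assumed formula chosen to match the slide's description, not a confirmed detail of Pulse, and `distance_weighted_score` is a hypothetical name.

```python
import re

# 'missed' = -1 is from this slide; 'fabulous' = +3 is from the lexicon table.
LEXICON = {"missed": -1, "fabulous": 3}


def distance_weighted_score(text, entity):
    """Sum lexicon values, each divided by token distance to the nearest entity mention."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positions = [i for i, t in enumerate(tokens) if t == entity.lower()]
    if not positions:
        return 0.0
    score = 0.0
    for i, token in enumerate(tokens):
        if token in LEXICON:
            distance = min(abs(i - p) for p in positions)
            if distance > 0:  # skip the case where the entity itself is an opinion word
                score += LEXICON[token] / distance
    return score


print(distance_weighted_score("I missed the bus but my car is fabulous", "car"))
print(distance_weighted_score("I missed my car", "car"))
```

In the second call, "missed" sits two tokens from "car" and drags the score to -0.5, although the sentence is not a complaint about the car. That is the kind of accuracy loss this slide points at.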

Improve accuracy
• Accuracy is what we strive for!
• More robust pre-processing
  − Prune data to fit different sources of user opinion (e.g. Twitter vs. YouTube comments)
• Naïve Bayes Classifier Training
• Tune accordingly

Data Set
• Test dataset
  − Collected by Stanford students
  − In 2009
  − Over 3 million tweets with tested score
  − Analyzed 3500 tweets
• Collected dataset
  − HP Vertica Pulse Twitter Connector
  − In 2014
  − Total of 1.2 million tweets

Data Pruning
• Remove
  − Job postings: #job, #jobs, #tweetmyjob
  − Links
  − Duplicates
  − Twitter-specific characters: #
  − Emoticons: “I hate my life :-)” (sarcasm is a widespread disease)
• After pruning
  − About 24% of the 1.2 million tweets remained
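The pruning rules listed above can be sketched as a small filter. The regular expressions are simplified assumptions (the emoticon pattern covers only a few common forms), the sketch assumes emoticons are stripped from the text rather than the whole tweet being dropped, and `prune` is a hypothetical name.

```python
import re

JOB_TAGS = {"#job", "#jobs", "#tweetmyjob"}          # hashtags named on the slide
URL_RE = re.compile(r"https?://\S+|www\.\S+")        # simplified link pattern
EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")          # simplified emoticon pattern


def prune(tweets):
    """Drop job postings and duplicates; strip links, emoticons, and '#'."""
    seen = set()
    kept = []
    for tweet in tweets:
        words = tweet.lower().split()
        if any(w.strip(".,!?") in JOB_TAGS for w in words):
            continue                                  # job posting: drop the tweet
        text = URL_RE.sub("", tweet)
        text = EMOTICON_RE.sub("", text)
        text = text.replace("#", "")
        text = " ".join(text.split())                 # collapse leftover whitespace
        key = text.lower()
        if text and key not in seen:                  # case-insensitive de-duplication
            seen.add(key)
            kept.append(text)
    return kept


sample = [
    "Hiring now #jobs http://x.co/1",
    "I hate my life :-)",
    "Love my #car http://t.co/abc",
    "love my car",
]
print(prune(sample))
```

Links are removed before emoticons so that the "://" inside a URL is never mistaken for an emoticon.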

Naïve Bayes Classifier
• Results:
  − Final accuracy: 0.788
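For readers unfamiliar with the technique, here is a minimal multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing. It is a generic textbook sketch on toy data, not the classifier or training set used in the talk, and it makes no attempt to reproduce the reported accuracy.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()             # documents per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        self.vocab = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # log prior
            n_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocab:
                    continue                      # ignore words never seen in training
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen (word, label) pairs from zeroing the product.
                logp += math.log((count + 1) / (n_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def accuracy(model, labeled_texts):
    """Fraction of texts whose predicted label matches the given label."""
    correct = sum(model.predict(text) == label for text, label in labeled_texts)
    return correct / len(labeled_texts)


toy = [
    ("love this car", "pos"),
    ("great fabulous ride", "pos"),
    ("hate this traffic", "neg"),
    ("nasty bad service", "neg"),
]
model = NaiveBayes()
model.train(toy)
print(model.predict("love fabulous"), model.predict("nasty traffic"))
```

In practice accuracy is measured on held-out labeled tweets, not on the training data.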

Tuning Pulse
• Positive words
• Negative words
• Neutral words
• White lists
• Stop words
• Synonym mappings
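The effect of such tuning lists can be pictured with a generic Python sketch. This is not Pulse's interface: the function names are hypothetical, user-supplied positive and negative words are assumed to get values of +1 and -1, and white lists are left out because the slides do not say how they are applied.

```python
def tune_tokens(tokens, stop_words, synonyms):
    """Map synonyms to a canonical word, then drop stop words."""
    tuned = []
    for token in tokens:
        token = synonyms.get(token, token)
        if token not in stop_words:
            tuned.append(token)
    return tuned


def tuned_lexicon(base, positive, negative, neutral):
    """Return a copy of the base lexicon with user overrides applied."""
    lexicon = dict(base)
    for word in positive:
        lexicon[word] = 1        # assumed default strength for added positive words
    for word in negative:
        lexicon[word] = -1       # assumed default strength for added negative words
    for word in neutral:
        lexicon.pop(word, None)  # neutral words no longer contribute to the score
    return lexicon


base = {"fabulous": 3, "missed": -1}
print(tuned_lexicon(base, positive={"affordable"}, negative={"pricey"}, neutral={"missed"}))
print(tune_tokens(["the", "auto", "is", "fabulous"],
                  stop_words={"the", "is"}, synonyms={"auto": "car"}))
```

Marking "missed" as neutral is one way to address the limitation noted earlier for that word.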

Accuracy Comparison
• Sentiment scores generated for each phase

Trend/Targeted Analysis
• Targeted dataset analysis can help improve accuracy
• Identify the most-mentioned words
  − Use the most-recurrent words to narrow the scope of analysis
• Find new trends
  − Government healthcare (2009) vs. Obamacare (2014)
• Are we looking at the targeted data?
  − “Solve healthcare challenges with technology!”
  − “Healthcare After ObamaCare”
  − “Get affordable healthcare at HealthCare.gov”
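Identifying the most-mentioned words amounts to a frequency count over the tweets. The sketch below uses the three example tweets from this slide. The stop-word list and the name `most_mentioned` are assumptions made for the illustration.

```python
import re
from collections import Counter

STOP_WORDS = {"with", "after", "get", "at"}  # tiny illustrative stop-word list


def most_mentioned(tweets, n):
    """Return the n most frequent non-stop-word tokens with their counts."""
    counts = Counter()
    for tweet in tweets:
        for token in re.findall(r"[a-z]+", tweet.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(n)


tweets = [
    "Solve healthcare challenges with technology!",
    "Healthcare After ObamaCare",
    "Get affordable healthcare at HealthCare.gov",
]
print(most_mentioned(tweets, 1))  # → [('healthcare', 4)]
```

Note that "HealthCare.gov" contributes a "healthcare" token too, so a raw count mixes opinions about healthcare with mentions of a website. That is the targeting question this slide raises.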

Generating Tree
• Increase the relevance of the sentiment score by running the sentiment analysis on the entity, as well as on the most-recurrent words, to identify:
  − Homonyms that machines do not understand
  − More accurate scores based on user interest
• Generate tree using Text Search
  − Merge stemmed word variants, e.g. query, queries, querying…
  − Lucene (Apache open source)
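Merging word variants before counting can be illustrated with a deliberately crude suffix stripper. This stands in for a real stemmer only for the sake of a self-contained example. It is not Lucene's stemming (Lucene is a Java library with its own analyzers), and the rules below will mis-stem many English words.

```python
from collections import Counter


def crude_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # queries -> query
    if w.endswith("ing") and len(w) > 5:
        return w[:-3]                # querying -> query
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]                # tweets -> tweet
    return w


def merged_counts(words):
    """Count words after merging variants that share a stem."""
    return Counter(crude_stem(w) for w in words)


print(merged_counts(["query", "queries", "querying", "Lucene"]))
```

With the variants merged, "query" is counted three times instead of appearing as three separate low-frequency words.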

Tree View
[Figure: word tree rooted at “healthcare”, with nodes obamacare, !(Obamacare), obama, !(Obama), health, !(health)]
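One possible reading of this tree is a recursive split of the tweets into those that mention a word and those that do not, written !(word) on the slide. The sketch below follows that reading. The actual construction used in the talk is not recoverable from the transcript, so the structure and the name `build_tree` are assumptions.

```python
import re


def build_tree(tweets, words):
    """Split tweets on each word in turn: 'with' mentions it, 'without' is !(word)."""
    if not words or not tweets:
        return {"count": len(tweets)}
    word = words[0]
    with_word, without_word = [], []
    for tweet in tweets:
        tokens = set(re.findall(r"[a-z]+", tweet.lower()))
        (with_word if word in tokens else without_word).append(tweet)
    return {
        "word": word,
        "count": len(tweets),
        "with": build_tree(with_word, words[1:]),
        "without": build_tree(without_word, words[1:]),
    }


tweets = [
    "Healthcare After ObamaCare",
    "Get affordable healthcare at HealthCare.gov",
    "Obama on health",
]
tree = build_tree(tweets, ["healthcare", "obamacare"])
print(tree["with"]["with"]["count"], tree["with"]["without"]["count"])
```

Sentiment analysis could then be run separately on each branch, so that tweets about "healthcare" with and without "obamacare" get their own scores.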

Thank you
Questions?
Many thanks to*:
• Tim Donar, Solution Engineer
• Beth Favini, Tech Pubs Sr. Manager
• Judith Plummer, Tech Pubs Editor in Chief
* In alphabetical order

2014 Got Feedback? Rate and Review the session using the GHC Mobile App To download visit