Prof. Leticia Arco García

1 Prof. Leticia Arco García leticiaarco@gmail.com
Central University of Las Villas, Cuba Artificial intelligence Lab Computer Science Department Opinion Mining Prof. Leticia Arco García

2 Motivation
Someone who wants to buy a car looks for comments and reviews. Someone who just bought a car comments on it and writes about his experience. A car manufacturer gets feedback from customers to improve its products and adjust its marketing strategies.

3 Opinions
Opinions are central to almost all human activities and are key influencers of our behaviours. Our beliefs and perceptions of reality, and the choices we make, are, to a considerable degree, conditioned upon how others see and evaluate the world. When we need to make a decision, we often seek out the opinions of others. This is not only true for individuals but also true for organizations.

4 Social media on the Web
With the explosive growth of social media on the Web, individuals and organizations are increasingly using the content in these media for decision making.

5 Contents
Origin and definition; different levels of analysis; opinions: definition, types and main problems; sentiment analysis tasks; polarity detection: two approaches; other sentiment analysis proposals; lexical resources and available datasets; applications in Business Informatics; our results; challenges: SemEval-2017 and TASS 2017.

6 Origin (1/2)
Some earlier work on interpretation of metaphors, sentiment adjectives, subjectivity, viewpoints and affects ( ). Learning subjective adjectives from corpora (Wiebe, 2000). Yahoo! for Amazon: Extracting market sentiment from stock message boards (Das and Chen, 2001). An operational system for detecting and tracking opinions in on-line discussion (Tong, 2001).

7 Origin (2/2)
The term "sentiment analysis" appeared in: Sentiment analysis: Capturing favourability using natural language processing (Nasukawa and Yi, 2003). The term "opinion mining" appeared in: Mining the peanut gallery: Opinion extraction and semantic classification of product reviews (Dave et al., 2003).

8 Definition
Sentiment analysis is the field of study that analyses people’s opinions, sentiments, evaluations, appraisals, attitudes and emotions towards entities such as products, services, organizations, individuals, issues, events and topics, and towards their attributes. Related names for the field: emotion analysis, review mining, opinion extraction, sentiment analysis, subjectivity analysis, opinion mining, sentiment mining and affect analysis.

9 Different levels of analysis (1/2)
Document level: classifies whether a whole opinion document expresses a positive or negative sentiment. Assumption: each document expresses opinions on a single entity. Sentence level: determines whether each sentence expresses a positive, negative or neutral opinion; it is closely related to subjectivity classification. Example: “The iPhone’s call quality is good, but its battery life is short.” Neither level discovers what exactly people liked and did not like.

10 Different levels of analysis (2/2)
Entity and aspect level: performs finer-grained analysis and looks directly at the opinion itself. Goal: discover sentiments on entities and/or their aspects. In “The iPhone’s call quality is good, but its battery life is short.”, the entity is iPhone and the aspects are call quality and battery life; the sentiment on the iPhone’s call quality is positive and the sentiment on its battery life is negative.

11 Opinion
The meaning of opinion itself is still very broad. Sentiment analysis mainly focuses on opinions which express or imply positive or negative sentiments.
[Figure: annotated example review in which statements 1, 2, 3 and 4 are positive and statement 5 is negative. Labels: opinion source (opinion holders), date, sentiment orientations (opinion polarities), topic.]

12 Opinion definition
An opinion is a quintuple $(e_i, a_{ij}, s_{ijkl}, h_k, t_l)$, where $e_i$ is the name of an entity, $a_{ij}$ is an aspect of $e_i$, $s_{ijkl}$ is the sentiment on aspect $a_{ij}$ of entity $e_i$, $h_k$ is the opinion holder, and $t_l$ is the time when the opinion is expressed by $h_k$. The sentiment $s_{ijkl}$ is positive, negative or neutral, or is expressed with different strength/intensity levels, e.g., 1 to 5.
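A minimal sketch of how the quintuple could be held in code, assuming a three-valued sentiment (the 1-to-5 intensity variant would replace the enum with an integer). The class and field names are illustrative only, and the holder and date in the example are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Sentiment(Enum):
    NEGATIVE = -1
    NEUTRAL = 0
    POSITIVE = 1


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (e_i, a_ij, s_ijkl, h_k, t_l)."""
    entity: str           # e_i: the target entity
    aspect: str           # a_ij: an aspect of the entity
    sentiment: Sentiment  # s_ijkl: sentiment on that aspect
    holder: str           # h_k: the opinion holder
    time: str             # t_l: when h_k expressed it (ISO date, assumed)


# The iPhone sentence from the entity and aspect level slide;
# holder and date are hypothetical.
opinions = [
    Opinion("iPhone", "call quality", Sentiment.POSITIVE, "user_1", "2017-03-01"),
    Opinion("iPhone", "battery life", Sentiment.NEGATIVE, "user_1", "2017-03-01"),
]

for op in opinions:
    print(op.entity, op.aspect, op.sentiment.name)
```

A single sentence can thus yield several quintuples, one per aspect, which is what distinguishes aspect-level analysis from document- or sentence-level labels.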

13 Types of opinions (1/2)
Regular opinions express a sentiment only on a particular entity or an aspect of the entity. Direct opinion: “Belgian chocolates taste very good.” Indirect opinion: “After injection of the drug, my joints felt worse.” Comparative opinions compare multiple entities based on some of their shared aspects: “Belgian beers taste much better than Cuban beers.”

14 Types of opinions (2/2)
An explicit opinion is a subjective statement that gives a regular or comparative opinion: “UHASSELT is a very good university.” An implicit opinion is an objective statement that implies a regular or comparative opinion: “The battery life of Nokia phones is longer than Samsung phones.” Explicit opinions are easier to detect and to classify than implicit opinions.

15 Sentiment analysis is an NLP problem
It touches every aspect of NLP, e.g., co-reference resolution, negation handling and word sense disambiguation. At the same time, sentiment analysis is a highly restricted NLP problem: it does not need to fully understand the semantics of each sentence or document, only some aspects of it, namely the positive or negative sentiments, their target entities and their topics.

16 Opinion mining is more difficult than text mining
Opinion texts contain informal language, abbreviations, emoticons, spelling and typographical errors, and ironic and sarcastic language, and understanding them requires knowledge at the language level and at the cultural level. These characteristics make opinion mining more difficult than other text mining tasks.
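Part of this difficulty (elongated words, emoticons, abbreviations) is usually reduced by normalizing the text before analysis. A minimal sketch, assuming tiny hypothetical emoticon, abbreviation and vocabulary lists in place of real resources; irony and cultural knowledge are not addressed at all.

```python
import re

# Hypothetical resources; real systems use much larger lists.
EMOTICONS = {":-(": "EMO_NEG", ":(": "EMO_NEG", ":-)": "EMO_POS", ":)": "EMO_POS"}
ABBREVIATIONS = {"gr8": "great", "u": "you"}
VOCAB = {"i", "like", "good", "winter", "summer", "does", "not", "arrive", "yet", "great", "you"}


def squeeze(word):
    """Shorten elongated words: try 2 repeated letters, then 1, using the vocabulary."""
    two = re.sub(r"(\w)\1{2,}", r"\1\1", word)
    if two in VOCAB:
        return two
    one = re.sub(r"(\w)\1+", r"\1", word)
    return one if one in VOCAB else two


def normalize(text):
    text = text.lower()
    # Replace emoticons with tags before punctuation is discarded.
    for emo, tag in EMOTICONS.items():
        text = text.replace(emo, f" {tag} ")
    tokens = re.findall(r"EMO_[A-Z]+|[a-z0-9]+", text)
    out = []
    for tok in tokens:
        if tok.startswith("EMO_"):
            out.append(tok)
        else:
            out.append(squeeze(ABBREVIATIONS.get(tok, tok)))
    return out


print(normalize("I liiikeee winter, summer does not arrive yet :-("))
```

Keeping the emoticon as a token matters here: the text itself is mildly positive, while the emoticon carries the negative sentiment.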

17 Sentiment analysis tasks
Objective of sentiment analysis: given an opinion document, discover all opinion quintuples. Tasks: entity extraction and categorization; aspect extraction and categorization; opinion holder extraction and categorization; time extraction and standardization; aspect sentiment classification; opinion quintuple generation.

18 Polarity detection: two approaches
Semantic approaches are characterized by the use of dictionaries of words (lexicons) annotated with their semantic orientation of polarity or opinion. Computational learning techniques consist of training a classifier with any supervised learning algorithm on a collection of annotated texts.
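A minimal sketch of the semantic (lexicon-based) approach, assuming a toy lexicon with hypothetical orientation values and a simple rule in which a negator flips the next sentiment word. Real systems use resources such as SentiWordNet and much richer rules.

```python
import re

# Toy lexicon: word -> semantic orientation (hypothetical values).
LEXICON = {"good": 1, "great": 1, "wonderful": 2, "amazing": 2,
           "bad": -1, "poor": -1, "terrible": -2}
NEGATIONS = {"not", "never", "no"}


def lexicon_polarity(text):
    """Sum word orientations; a negator flips the next sentiment word."""
    score = 0
    negate = False
    for tok in re.findall(r"[a-z']+", text.lower()):
        if tok in NEGATIONS:
            negate = True
        elif tok in LEXICON:
            score += -LEXICON[tok] if negate else LEXICON[tok]
            negate = False  # the negator only affects one sentiment word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("The screen is amazing but the battery is poor"))
print(lexicon_polarity("The food was not good"))
```

The same sketch labels “What a great car! It stopped working in two days.” as positive, which is exactly the kind of failure discussed under the problems of sentiment words.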

19 Words expressing feeling or opinion
Positive opinion: good, wonderful, amazing, … Negative opinion: bad, poor, terrible, … Together they form a sentiment lexicon or opinion lexicon (the words are also called sentiment words, opinion words, polar words or opinion-bearing words). There are two types: the base type (e.g., good, bad) and the comparative type (e.g., better, worse).

20 Approaches to compile sentiment words
Manual approach: labour intensive and time consuming; useful as a final check in automated approaches. Dictionary-based approach: start from a few seed sentiment words and bootstrap using the synonym and antonym structure of a dictionary. Corpus-based approach: given a seed list of known sentiment words, discover other sentiment words and their orientations from a domain corpus, or adapt a general-purpose sentiment lexicon to a new one using a domain corpus, for sentiment analysis applications in that domain.
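The dictionary-based bootstrapping step can be sketched as a breadth-first expansion: synonyms inherit a word's orientation and antonyms receive the opposite one. The miniature thesaurus below is hypothetical (a real system would query WordNet), and conflicting labels are not resolved, which is where the manual final check comes in.

```python
from collections import deque

# Hypothetical miniature thesaurus (directed for simplicity).
SYNONYMS = {
    "good": ["fine", "excellent"],
    "excellent": ["superb"],
    "bad": ["poor"],
    "poor": ["inferior"],
}
ANTONYMS = {
    "good": ["bad"],
    "superb": ["awful"],
}


def bootstrap(seeds):
    """Expand {word: +1/-1} seeds through synonym and antonym links (first label wins)."""
    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for syn in SYNONYMS.get(word, []):
            if syn not in orientation:
                orientation[syn] = orientation[word]
                queue.append(syn)
        for ant in ANTONYMS.get(word, []):
            if ant not in orientation:
                orientation[ant] = -orientation[word]
                queue.append(ant)
    return orientation


print(bootstrap({"good": 1}))
```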

21 Sentiment lexicon
Although sentiment words and phrases are important for sentiment analysis, using only them is far from sufficient: a sentiment lexicon is necessary but not sufficient for sentiment analysis.

22 Some problems of sentiment words
Sentiment words may have opposite orientations in different application domains; a sentence containing sentiment words may not express any sentiment; sarcastic sentences, with or without sentiment words, are hard to deal with; and many sentences without sentiment words can also imply opinions. Examples: “It is a large dictionary, covering thousands of words.” “He has put on weight, and is now quite large.” “He likes to talk large, but I think he exaggerates.” “Can you tell me which camera is good?” “If I can find a good camera in the shop, I'll buy it.” “He killed the ant before it could bite him.” “I HATE to admit it but, I LOVE admitting things.” “I liiikeee winter, summer does not arrive yet :-(” “What a great car! It stopped working in two days.” “This washer uses a lot of water.”

23 Sentiment classification using supervised learning (1/3)
Two-class classification problem: positive and negative. The training and testing data used are normally product reviews: a review with 4 or 5 stars is considered a positive review, and a review with 1 or 2 stars is considered a negative review. First approaches: Naïve Bayes classification and Support Vector Machines.
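A minimal sketch of this setup: star ratings are turned into labels as described above (3-star reviews are simply discarded here, an assumption), and a multinomial Naïve Bayes with add-one smoothing is trained from scratch. The five reviews are made-up toy data, not a real dataset.

```python
import math
import re
from collections import Counter


def star_label(stars):
    """4-5 stars -> positive, 1-2 stars -> negative, 3 stars -> discarded."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_score(c):
            # Multinomial likelihood with Laplace (add-one) smoothing.
            denom = self.totals[c] + len(self.vocab)
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokenize(doc) if w in self.vocab)
        return max(self.classes, key=log_score)


reviews = [
    ("Great phone, excellent screen", 5),
    ("Good value and great battery", 4),
    ("It is okay I guess", 3),
    ("Terrible battery, poor screen", 1),
    ("Bad support and poor build", 2),
]
labelled = [(t, star_label(s)) for t, s in reviews if star_label(s) is not None]
docs, labels = zip(*labelled)
model = NaiveBayes().fit(list(docs), list(labels))
print(model.predict("excellent battery"), model.predict("poor screen"))
```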

24 Sentiment classification using supervised learning (2/3)
Like other supervised machine learning applications, the key for sentiment classification is the engineering of a set of effective features: terms and their frequency, part of speech, sentiment words and phrases, rules of opinions, sentiment shifters, and syntactic dependency.
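Three of these feature families (term frequencies, sentiment-word counts and sentiment shifters) can be sketched without external tools; part of speech and syntactic dependency would need a tagger and a parser and are left out. The lexicons are hypothetical, and the negation scope is assumed to end at the next punctuation mark.

```python
import re
from collections import Counter

# Hypothetical tiny lexicons.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}
SHIFTERS = {"not", "never", "no", "hardly"}  # sentiment shifters (negators)
PUNCTUATION = set(".,!?;")


def extract_features(text):
    """Term-frequency features plus sentiment-word counts, with negation scope."""
    tokens = re.findall(r"[a-z]+|[.,!?;]", text.lower())
    features = Counter()
    negated = False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False  # a punctuation mark closes the negation scope
            continue
        if tok in SHIFTERS:
            negated = True
            continue
        # Terms inside a negation scope get a NOT_ prefix.
        features["term=" + ("NOT_" + tok if negated else tok)] += 1
        if tok in POSITIVE:
            features["neg_words" if negated else "pos_words"] += 1
        elif tok in NEGATIVE:
            features["pos_words" if negated else "neg_words"] += 1
    return features


print(extract_features("The screen is not good, but the sound is great."))
```

The resulting `Counter` can be fed to any supervised learner; the `NOT_` prefix lets the model learn separate weights for “good” and “not good”.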

25 Sentiment classification using supervised learning (3/3)
Apart from classifying positive and negative sentiments, researchers have also studied predicting the rating scores (e.g., 1–5 stars) of reviews, which is a regression problem.

26 Subjectivity classification
Objective sentences express factual information about the world; subjective sentences express subjective views and opinions. Is subjectivity equivalent to sentiment? No: “I think that he went home.” is subjective but expresses no positive or negative sentiment, while “The phone broke in two days.” is objective but implies a negative sentiment.

27 Emotion
Emotions are our subjective feelings and thoughts. Six primary emotions: love, joy, surprise, anger, sadness and fear. Opinions that we study in sentiment analysis are mostly evaluations. Rational evaluations come from rational reasoning, tangible beliefs and utilitarian attitudes. Emotional evaluations come from non-tangible and emotional responses to entities, which go deep into people’s state of mind. Five sentiment ratings: emotional negative (-2), rational negative (-1), neutral (0), rational positive (+1) and emotional positive (+2).

28 Aspect-based sentiment analysis
Lexicon-based methods for this task are typically unsupervised. They rely on a sentiment lexicon, composite expressions, rules of opinions and the sentence parse tree, handle sentiment shifters and but-clauses, and finally aggregate the opinions.
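A minimal sketch of this lexicon-based idea for one aspect, assuming a simple distance-weighted aggregation: each sentiment word contributes its orientation divided by its token distance to the aspect, a negator directly before a word flips it, and in a but-sentence only the clause that mentions the aspect is used. The lexicon is hypothetical (“short” is marked negative as it is for battery life); composite expressions, opinion rules and parse trees are not modelled.

```python
import re

# Hypothetical lexicon; "short" is negative here because of battery life.
LEXICON = {"good": 1, "great": 1, "long": 1, "bad": -1, "poor": -1, "short": -1}
NEGATORS = {"not", "never", "no"}


def aspect_score(sentence, aspect):
    """Distance-weighted sentiment score of `aspect` in `sentence` (> 0 positive)."""
    aspect_tokens = aspect.lower().split()
    n = len(aspect_tokens)
    # But-clause handling: split at "but" and keep the clause with the aspect.
    for clause in re.split(r"\bbut\b", sentence.lower()):
        tokens = re.findall(r"[a-z']+", clause)
        starts = [i for i in range(len(tokens)) if tokens[i:i + n] == aspect_tokens]
        if not starts:
            continue
        where = starts[0]
        score = 0.0
        for i, tok in enumerate(tokens):
            if tok in LEXICON:
                orientation = LEXICON[tok]
                if i > 0 and tokens[i - 1] in NEGATORS:
                    orientation = -orientation  # sentiment shifter
                score += orientation / max(1, abs(i - where))
        return score
    return 0.0  # aspect not mentioned


s = "The iPhone's call quality is good, but its battery life is short."
print(aspect_score(s, "call quality"), aspect_score(s, "battery life"))
```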

29 Aspect extraction approaches
Extraction based on frequent nouns and noun phrases; extraction by exploiting opinion and target relations; extraction using supervised learning; extraction using topic modelling; semantic classification and deep learning.
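The first approach, extraction based on frequent nouns and noun phrases, can be sketched as follows. The sentences are assumed to be already POS-tagged with Penn Treebank tags (in practice a tagger would produce them), a noun phrase is simplified to a run of common nouns, and the minimum support of 50% of sentences is an arbitrary choice.

```python
from collections import Counter

# Hypothetical pre-tagged review sentences (Penn Treebank tags).
TAGGED = [
    [("the", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("short", "JJ")],
    [("great", "JJ"), ("battery", "NN"), ("life", "NN"), ("and", "CC"), ("screen", "NN")],
    [("the", "DT"), ("screen", "NN"), ("is", "VBZ"), ("bright", "JJ")],
    [("i", "PRP"), ("bought", "VBD"), ("it", "PRP"), ("on", "IN"), ("monday", "NNP")],
]


def noun_phrases(sentence):
    """Yield maximal runs of common nouns (NN, NNS) as candidate aspects."""
    run = []
    for word, tag in sentence:
        if tag in ("NN", "NNS"):
            run.append(word)
        elif run:
            yield " ".join(run)
            run = []
    if run:
        yield " ".join(run)


def frequent_aspects(tagged, min_support=0.5):
    """Keep candidates that occur in at least min_support of the sentences."""
    df = Counter()
    for sentence in tagged:
        df.update(set(noun_phrases(sentence)))  # count once per sentence
    threshold = min_support * len(tagged)
    return sorted(a for a, n in df.items() if n >= threshold)


print(frequent_aspects(TAGGED))
```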

30 Grouping aspects into categories
Aspect expressions need to be grouped into synonymous aspect categories, where each category represents a unique aspect; for phones, “call quality” and “voice quality” refer to the same aspect. Grouping the expressions of the same aspect is critical for opinion analysis. WordNet and other thesauri help only partially: “movie” and “picture” are synonyms in movie reviews, but in camera reviews “picture” is more likely to be synonymous with “photo”, and “movie” with “video”. Moreover, many aspect expressions are multi-word phrases, which cannot be easily handled with dictionaries.
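Once synonymy decisions are available, grouping itself is a connected-components problem. A minimal sketch with a union-find structure, assuming hypothetical domain-specific synonym pairs; producing those pairs (the hard part discussed above) is not shown.

```python
def group_aspects(expressions, synonym_pairs):
    """Group aspect expressions into categories via union-find over synonym pairs."""
    parent = {e: e for e in expressions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in synonym_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for e in expressions:
        groups.setdefault(find(e), []).append(e)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical phone-domain input.
phone_aspects = ["call quality", "voice quality", "battery life", "battery", "screen"]
phone_synonyms = [("call quality", "voice quality"), ("battery life", "battery")]
print(group_aspects(phone_aspects, phone_synonyms))
```

Because the pairs are an input, the same code can group “picture” with “movie” for movie reviews and with “photo” for camera reviews.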

31 Opinion summarization (1/3)
[Figure: aspect-based opinion summary, with labels for the different entity names and the different aspect names.]
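The counts behind such an aspect-based summary can be sketched by tallying positive and negative opinions per (entity, aspect) pair. The input triples are hypothetical outputs of an aspect-level classifier; neutral opinions are ignored, and the "GENERAL" aspect (opinions on the entity as a whole) is a naming convention assumed here.

```python
from collections import defaultdict

# Hypothetical (entity, aspect, sentiment) triples.
triples = [
    ("camera", "picture quality", "positive"),
    ("camera", "picture quality", "positive"),
    ("camera", "picture quality", "negative"),
    ("camera", "battery life", "negative"),
    ("camera", "GENERAL", "positive"),
]


def summarize(triples):
    """Count positive and negative opinions per (entity, aspect)."""
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for entity, aspect, sentiment in triples:
        if sentiment in ("positive", "negative"):
            summary[(entity, aspect)][sentiment] += 1
    return dict(summary)


for (entity, aspect), counts in sorted(summarize(triples).items()):
    print(f"{entity} / {aspect}: +{counts['positive']} -{counts['negative']}")
```

These per-aspect counts are what the bar-style visualizations on the following slides display.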

32 Opinion summarization (2/3)
Visualization of aspect-based summary of opinions on a digital camera

33 Opinion summarization (3/3)
Visualization of aspect-based summaries of opinions

34 Opinion spammers
A key feature of social media is that it enables anyone from anywhere in the world to freely express his/her views and opinions without disclosing his/her true identity and without fear of undesirable consequences. Opinion spammers include friends and family, competitors, company employees, businesses that provide fake review writing services, and even genuine customers: some businesses give discounts and even full refunds to some of their customers on the condition that the customers write positive reviews for them. Agencies and political organizations may employ people to post messages that secretly influence social media conversations and spread lies and disinformation.

35 Opinion spam detection
Information available: review content (linguistic features), meta-data about the review (user-id, star rating, time, host IP address, …) and product information. Supervised detection: there is no labelled training data for learning, so duplicate reviews are exploited and features are created. Unsupervised detection: spam detection based on atypical behaviours, spam detection using a review graph, and group spam detection using frequent pattern mining.
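Exploiting duplicate reviews starts with finding them. A minimal sketch using word-bigram shingles and Jaccard similarity; the 0.7 threshold and the three reviews are hypothetical, and a large collection would need hashing (e.g., MinHash) instead of comparing every pair.

```python
import re
from itertools import combinations


def shingles(text, k=2):
    """Set of word k-grams of a review."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def near_duplicates(reviews, threshold=0.7):
    """Pairs of review ids whose shingle sets are at least `threshold` similar."""
    sets = {rid: shingles(text) for rid, text in reviews.items()}
    return [(r1, r2) for r1, r2 in combinations(sorted(sets), 2)
            if jaccard(sets[r1], sets[r2]) >= threshold]


reviews = {
    "r1": "Best hotel ever, the staff were amazing and the room was spotless.",
    "r2": "Best hotel ever! The staff were amazing and the room was spotless!!",
    "r3": "The room was small and the breakfast was expensive.",
}
print(near_duplicates(reviews))
```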

36 Cross-domain sentiment classification
A classifier trained using opinion documents from one domain often performs poorly on test data from another domain: the words and even the language constructs used for expressing opinions can be quite different across domains, and the same word may be positive in one domain but negative in another. Domain adaptation or transfer learning is therefore needed, either with a small amount of labelled training data for the new domain or with no labelled data for the new domain. Lifelong machine learning aims to learn as humans do: it retains knowledge learned from previous tasks and uses it to help future learning, in a continuous learning process where the learner has performed a sequence of learning tasks.

37 Cross-language sentiment classification
Perform sentiment classification of opinion documents in multiple languages. Researchers from different countries want to build sentiment analysis systems in their own languages, and companies want to know and compare consumer opinions about their products and services in different countries. Approaches: co-training methods and lexical resources.

38 Opinion search and retrieval
Find public opinions about a particular entity or an aspect of the entity, e.g., find customer opinions about a digital camera. Find opinions of a person or organization (i.e., opinion holder) about a particular entity or an aspect of the entity (or topic), e.g., find Charles Michel’s opinion about terrorism.

39 Lexical resources
WordNet Affect, SentiWordNet, General Inquirer

40 WordNet Affect
WordNet-Affect is an extension of WordNet Domains, including a subset of synsets suitable to represent affective concepts correlated with affective words. Affective labels (a-labels) are assigned to a number of WordNet synsets.

41 WordNet Affect: Terms and affective categories

42 Some terms related to "university" through their emotional categories

43 SentiWordNet
SentiWordNet is a lexical resource for opinion mining. It assigns to each synset of WordNet three sentiment scores: positivity, negativity and objectivity. SentiWordNet is generated in two steps: 1. a weak-supervision, semi-supervised learning step; 2. a random-walk step.

44 General Inquirer
Harvard categories: Positive, Negative, Strong, Weak, Active, Passive, …; Pleasure, Pain, Feel, Arousal, Virtue, Emotion, … New categories based on social cognition. Lasswell value dictionary categories.

45 Some publicly available datasets
Stanford large movie dataset, Movie, TripAdvisor, TBOD, ISEAR, DUC data, Spinn3r dataset, HASH, EMOT, OpinRank dataset.

46 Opinion mining and enterprises
Enterprises are open and flexible in the use of technological tools to “sense” customers and the market. Acquiring information in real time allows the company to be agile and to develop “Sense and Response” capabilities: an agile enterprise responds immediately to any internal or external event, such as customer demand or customer opinions. Knowing what customers think of a given product/service helps top management introduce improvements in processes and products. Customer opinions represent a potential source of knowledge to be considered for the acquisition of competitive advantages.

47 Opinions are very important for decision making
Gretzel and Yoo (2008) demonstrate that 97.7% of travel booking decisions are made after consulting other travellers’ opinions, of which 77.9% involve the use of customer reviews as a source of information that helps make a better decision.
Gretzel, U. & Yoo, K. H. (2008). Use and Impact of Online Travel Reviews. Information and Communication Technologies in Tourism. Innsbruck, Austria.

48 How can a sentiment analysis tool help my brand?
Better understand the motivations behind sentiment; learn from social posts, news, reviews and more; benchmark against competitors; track purchase intent; evaluate campaign impact; analyse product launch response.

49 Some sentiment analysis tools
Opinion Crawl, MeaningCloud, Trackur, SAS, OpenText, StatSoft, NetOwl Extractor, Meltwater.

50 Cloud‐based Event‐processing Architecture for Opinion Mining (1/2)
Smart distributed architecture for opinion mining on internet-based content that answers two key challenges: integrating heterogeneous data sources, and adapting to events through dynamic system configuration. A novel approach of semantic complex event processing in a cloud environment, capturing different levels of information: event data, content from various heterogeneous sources, distributed sources, and dynamic co-reference resolution.

51 Cloud‐based Event‐processing Architecture for Opinion Mining (2/2)
Topic modelling and sentiment analysis; deep linguistic and interlinking analysis; transfer learning and active learning of opinions; cloud computing and event processing.

52 Enterprise information fusion for real-time business intelligence (1/2)
Correlate the external events in real-time with known facts about the internal operations and transactions of the enterprise and its ecosystem

53 Enterprise information fusion for real-time business intelligence (2/2)
News event detection from Twitter

54 Identifying customer preferences about tourism products using an aspect-based opinion mining approach

55 A novel application mining for competitive intelligence
A new method to extract opinion patterns from customer reviews, and its application to evaluating resources or internal factors in an enterprise. Steps: opinion gathering, text pre-processing, factor and polarity detection, internal factor evaluation.

56 Customer voice sensor
The call centre is an important intermediary between an enterprise and its customers: it helps customers solve their problems, and it allows the enterprise to deeply analyse the customer's voice and achieve a distinct market positioning. A comprehensive opinion mining system for call centre conversations.

57 Mobile application for customers’ reviews opinion mining

58 The Power of Text-mining in Business Process Management (1/2)

59 The Power of Text-mining in Business Process Management (2/2)

60 PosNeg Opinion
Input: opinions. Steps: identify the terms; lexically disambiguate each term; obtain all meanings of each term; classify each term as positive or negative; evaluate the opinion.

61 Improving SentiWordNet 3.0
84342 terms in total. Preprocessing stage: split the terms according to whether they have polarity values assigned: 5037 with assigned values, 79305 without.
Stage 1: assign polarity values considering the synonyms of terms without assigned polarity values: 51027 assigned, 28278 still without values.
Stage 2: assign inverse polarity values considering the antonyms of terms without assigned polarity values: 5678 assigned, 22600 still without values.
Stage 3: assign polarity values considering the synonyms of terms with assigned polarity values: 5770 assigned, 16830 still without values.
Stage 4: assign inverse polarity values considering the antonyms of terms with assigned polarity values: 1539 assigned, 15291 still without values.
Result: 69051 terms with assigned polarity values and 15291 terms without.
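A minimal sketch of one propagation stage, assuming that an unscored term takes the mean (positive, negative) scores of its already-scored related terms, with the two scores swapped for antonyms. This averaging rule is an assumption for illustration, not necessarily the rule used in the work above; the seed scores and relations are hypothetical. The last lines re-check the stage counts reported on the slide.

```python
def propagate(scores, relations, invert=False):
    """One stage: unscored terms get the mean scores of their scored related terms.

    With invert=True (antonyms) positive and negative are swapped.
    Returns the number of newly scored terms.
    """
    new = {}
    for term, related in relations.items():
        if term in scores:
            continue
        known = [scores[r] for r in related if r in scores]
        if known:
            pos = sum(p for p, _ in known) / len(known)
            neg = sum(n for _, n in known) / len(known)
            new[term] = (neg, pos) if invert else (pos, neg)
    scores.update(new)  # applied after the pass: a stage only uses earlier values
    return len(new)


# Hypothetical (positive, negative) seed scores and relations.
scores = {"good": (0.75, 0.0), "bad": (0.0, 0.625)}
synonyms = {"fine": ["good"], "nice": ["good", "fine"], "awful": ["bad"]}
antonyms = {"evil": ["good"], "ugly": ["pretty"]}

stage1 = propagate(scores, synonyms)
stage2 = propagate(scores, antonyms, invert=True)

# Bookkeeping check with the counts reported on the slide.
total, initially_scored = 84342, 5037
remaining = total - initially_scored
for assigned in (51027, 5678, 5770, 1539):
    remaining -= assigned
print(stage1, stage2, remaining, total - remaining)
```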

62 SpanishSentiWordNet
The intralinguistic index links each Spanish term and its POS label to the English meanings of the term, e.g., agresor (n): attacker, assailant, aggressor, assaulter, aggressor, robber. The improved SentiWordNet 3.0 gives the negative and positive polarities of each meaning, and the polarity of the Spanish term is evaluated by adding the positive and negative polarities of its meanings. The resulting negative and positive polarities of the Spanish terms form SpanishSentiWordNet.
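A minimal sketch of the aggregation for “agresor”, assuming made-up (positive, negative) scores for its English meanings; real values would come from the improved SentiWordNet 3.0. The slide lists “aggressor” twice (probably two senses); the sketch keeps one. The rule that the larger sum decides the label is also an assumption.

```python
# Hypothetical (positive, negative) scores for the English meanings.
MEANING_SCORES = {
    "attacker": (0.0, 0.5),
    "assailant": (0.0, 0.375),
    "aggressor": (0.0, 0.5),
    "assaulter": (0.0, 0.25),
    "robber": (0.0, 0.625),
}

# Intralinguistic index: (Spanish term, POS label) -> English meanings.
INDEX = {("agresor", "n"): ["attacker", "assailant", "aggressor", "assaulter", "robber"]}


def spanish_term_polarity(term, pos_tag):
    """Add up the positive and negative polarities of all meanings of a Spanish term."""
    meanings = INDEX[(term, pos_tag)]
    positive = sum(MEANING_SCORES[m][0] for m in meanings)
    negative = sum(MEANING_SCORES[m][1] for m in meanings)
    # Assumed decision rule: the larger sum determines the label.
    if positive > negative:
        label = "positive"
    elif negative > positive:
        label = "negative"
    else:
        label = "neutral"
    return positive, negative, label


print(spanish_term_polarity("agresor", "n"))
```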

63 Topic detection assisting polarity detection
Example review: “We stay at Hilton for 4 nights last march. It was a pleasant stay. We got a large room with 2 double beds and 2 bathrooms, The TV was Ok, a 27' CRT Flat Screen. The concierge was very friendly when we need. The room was very cleaned when we arrived, we ordered some pizzas from room service and the pizza was Ok also. The main Hall is beautiful. The breakfast is charged, 20 dollars, kinda expensive. The internet access (WiFi) is charged, 13 dollars/day. Pros: Low rate price, huge rooms, close to attractions at Loop, close to metro station. Cons: Expensive breakfast, Internet access charged. Tip: When leaving the building, always use the Michigan Av exit. Its a great view.”
Highlighted segments: “We got a large room with 2 double beds and 2 bathrooms, The TV was Ok, a 27' CRT Flat Screen.” “The concierge was very friendly when we need.” “The breakfast is charged, 20 dollars, kinda expensive.” “The room was very cleaned when we arrived”

64 Schema for topic segmentation and detection
Textual corpora → identify textual units → textual units → pre-process → tokens → represent textual units (vectors, graphs, probabilistic distributions) → segment → segments → represent segments → cluster segments → segment clusters (topics) → label segment clusters → topics and corresponding labels. Framework: OpinionTopicDetection. Desktop application: OpinionTD.
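A minimal sketch of the last part of this schema, assuming sentences as segments, bag-of-words vectors, a single-pass clustering by cosine similarity (threshold 0.3) and a cluster label equal to its most frequent term. These are illustrative choices, not the ones implemented in OpinionTopicDetection; the stopword list and segments are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list used in the pre-processing step.
STOPWORDS = {"the", "was", "is", "a", "we", "at", "and", "very"}


def represent(segment):
    """Pre-process a segment and represent it as a bag-of-words vector."""
    return Counter(w for w in re.findall(r"[a-z]+", segment.lower()) if w not in STOPWORDS)


def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster_segments(segments, threshold=0.3):
    """Single-pass clustering: join the most similar cluster or start a new one."""
    clusters = []
    for seg in segments:
        vec = represent(seg)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(seg)
            best["centroid"].update(vec)  # summed vector; cosine ignores its scale
        else:
            clusters.append({"centroid": Counter(vec), "members": [seg]})
    return clusters


def label(cluster):
    """Label a segment cluster (topic) with its most frequent term."""
    return cluster["centroid"].most_common(1)[0][0]


segments = [
    "The room was clean.",
    "The room was large.",
    "The breakfast was expensive.",
    "Breakfast is expensive at 20 dollars.",
]
for cluster in cluster_segments(segments):
    print(label(cluster), cluster["members"])
```

Polarity detection can then run per topic, so that “expensive” is attributed to the breakfast rather than to the hotel as a whole.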

65 Open issues and future directions (1/2)
Data collected from various resources are often very noisy, misspelt and unstructured. There is a lack of a universal opinion grading system across sentiment dictionaries. Online and political discussions often contain ironic and sarcastic sentences. For better product comparison, a set of products should be compared with respect to their common aspects. The lack of a proper review spam dataset is a major obstacle to opinion spam detection.

66 Open issues and future directions (2/2)
Very few attempts have been made to exploit the potential of optimization techniques for feature selection. There is a lack of opinion mining systems in non-English languages. Cross-domain sentiment analysis is still a major challenge. Aspect-level sentiment analysis is much needed for the comparative visualization of similar kinds of products. The main challenge in review helpfulness prediction lies in validating the proposed methods.

67 Challenges
SemEval-2017 (Detecting sentiment, humor, and truth): Task 4: Sentiment Analysis in Twitter; Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News; Task 6: #HashtagWars: Learning a Sense of Humor; Task 7: Detection and Interpretation of English Puns; Task 8: RumourEval: Determining rumour veracity and support for rumours. TASS 2017: Task 1: Sentiment analysis at tweet level; Task 2: Aspect-based sentiment analysis.

68 Prof. Leticia Arco García leticiaarco@gmail.com
Central University of Las Villas, Cuba Artificial intelligence Lab Computer Science Department Thanks! Questions, ideas, suggestions, comments, … Opinion Mining Prof. Leticia Arco García

