Event Detection and Opinion Mining

Ji Yang, Wenzhu Tong

Event Detection
    News only
    Twitter only
    Jointly on news and Twitter

Event Detection in News
Retrospective Event Detection: cluster historical news articles into events
    Group Average Clustering with Buckets [40]
    Probabilistic Model [23]
Online Event Detection: classify each newly arrived article as New or Old
    Single-pass clustering [2, 40]
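Single-pass clustering for online event detection can be sketched as follows. This is a minimal illustration, not the exact procedure of [2, 40]: the bag-of-words representation, the cosine similarity, the summed-count centroids and the threshold value of 0.3 are all assumptions made for the example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_pass(docs, threshold=0.3):
    """Process documents in arrival order; return (label, cluster_id) pairs.

    A document joins the most similar existing cluster ("Old") when the
    similarity reaches the threshold (an assumed value); otherwise it
    starts a new cluster ("New").
    """
    centroids = []  # one summed term-count vector per cluster
    decisions = []
    for text in docs:
        vec = Counter(text.lower().split())
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is None or sims[best] < threshold:
            centroids.append(vec)
            decisions.append(("New", len(centroids) - 1))
        else:
            centroids[best] = centroids[best] + vec
            decisions.append(("Old", best))
    return decisions

stream = [
    "earthquake hits coastal city",
    "coastal city earthquake damage reported",
    "election results announced tonight",
]
decisions = single_pass(stream)
```

Each article is seen exactly once, so the New/Old decision is available as soon as the article arrives.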

Event Detection in Twitter
Disaster Events [30]
    Semantic analysis with SVM
    Temporal analysis based on Poisson distribution
    Spatial analysis as Markov process
Tweet Clustering [37]
    Words are modeled as wavelet signals
    Modularity-based graph partition
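The slide only names the Poisson distribution for the temporal analysis. One simple way such a test could look, which is an assumption here and not necessarily the method of [30], is to flag a time window whose tweet count is very unlikely under a baseline Poisson rate. The function names, the baseline rate and the significance level are hypothetical.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def is_burst(count, baseline_rate, alpha=0.01):
    """Flag a window whose count is improbable under the baseline rate.

    baseline_rate is the assumed expected number of matching tweets per
    window in normal conditions; alpha is an assumed significance level.
    """
    return poisson_tail(count, baseline_rate) < alpha

quiet_window = is_burst(2, baseline_rate=2.0)
busy_window = is_burst(12, baseline_rate=2.0)
```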

Joint Event Detection
Cross-Collection Topic Aspect Model [9]
    General topic model, news-specific topic model, and tweet-specific topic model
    Topics and aspects are incorporated
ET-LDA [17]
    Event evolution in news follows a Markov process
    Tweet topic is either a general topic or a specific event topic

Joint Event Detection
Probabilistic Source LDA [10]
    Heterogeneous sources including news and tweets
    Local topic models for each source
    Topic-topic congruence between different sources
Time-dependent topic model [15]
    Local topics for each source and common topics
    Dirichlet parameter for each topic is associated with a time-dependent function

Linking Tweets to News
WTMF-G [12]
    A matrix factorization model to enrich short tweets with latent tokens
    Hashtags, named entities, and temporal relations are modeled as three graph regularizations
    The news article most similar to the latent vector of a given tweet is linked to that tweet
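The final linking step can be sketched as a nearest-neighbor lookup over latent vectors. The sketch assumes the latent vectors have already been produced by the factorization model; the vectors, the identifiers and the choice of cosine similarity are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def link_tweet(tweet_vec, news_vecs):
    """Return the id of the news article whose latent vector is closest."""
    return max(news_vecs, key=lambda news_id: cosine(tweet_vec, news_vecs[news_id]))

# Hypothetical latent vectors standing in for the output of the model.
news_vecs = {"news-1": [0.9, 0.1, 0.0], "news-2": [0.1, 0.8, 0.3]}
tweet_vec = [0.2, 0.7, 0.4]
linked = link_tweet(tweet_vec, news_vecs)
```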

Linking News to Tweets
Finding Relevant Messages [33]
    Multiple query models from the source news article and social media to retrieve relevant messages
    Merge the ranked lists of the different query models with data fusion techniques
Mapping News to Hashtags [32]
    Tweets are retrieved and separated per article by a shallow matching of keywords
    Classify and score article-hashtag pairs
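The slide does not say which data fusion technique [33] uses. As one common example, reciprocal rank fusion merges several ranked lists using only rank positions. The constant k=60 and the message identifiers are assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists: each item scores the sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical result lists from three query models for one news article.
fused = reciprocal_rank_fusion([
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
])
```

Messages that several query models rank highly rise to the top, even when no single model ranks them first.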

Opinion Mining
    Growing popularity of opinion-rich resources
    An eruption of research, mostly text analysis
    General review
    Specific opinion-rich resource, i.e., Twitter
With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people can now actively use information technologies to seek out and understand the opinions of others. In the last two decades, an eruption of research has treated the problem of opinion mining and sentiment analysis computationally, mostly based on text analysis. We review techniques and approaches in this field, with an emphasis on one specific opinion-rich resource, Twitter.

Sentiment Polarity Classification
    Datasets
    Online review context
    Political perspectives
    News orientation
    As part of the framework for summarizing text units
A large portion of work falls into the category of sentiment polarity classification. Much of it is conducted in the online review context, where classification methods label a piece of review text as expressing either a positive or a negative opinion about a product. Other work is more problem-specific, such as identifying political perspectives (for example, Republican versus Democratic) or news orientation. Some methods also employ polarity classification as part of a larger framework and use the polarized opinions to summarize text units on various topics.

Sentiment Polarity Classification
    Multi-class text categorization vs. general topic-based multi-class classification
    Regression using probabilistic generative models
    LARA [KDD’10]: rating regression
    LARAM [KDD’11]: latent topic model
    JMARS [KDD’14]: collaborative filtering and topic model
The more general problem of sentiment classification must determine the user’s evaluation on a multi-point scale or over multiple aspects, and can be partly viewed as a multi-class text categorization problem. However, unlike general topic-based multi-class classification, sentiment-related multi-class classification can be naturally formulated as a regression problem, because ratings are ordinal.
LARA aims at analyzing the opinions expressed in each review at the level of topical aspects, discovering each individual reviewer’s latent rating on each aspect as well as the relative importance weight placed on different aspects when forming the overall judgement. It takes rating scores, review texts and a list of major aspects identified by a bootstrapping-based algorithm as input, and models the generation process of the review texts. The model is estimated through an EM algorithm.
LARAM further extends the power of the model in [35] by incorporating a latent topic model, so that the model can work without the supervision of a list of major aspects and can properly associate words with latent aspects automatically identified within the review context.
For better modeling of the multiple aspects of objects as well as the multiple interests of individuals, a probabilistic model called JMARS was proposed based on collaborative filtering and topic modeling. It is formalized in the movie review scenario, because those reviews often inform both about the content of a movie and about the interests of a user. Its generative model explicitly considers the interests of users and the aspects of movies to generate the review text, and is estimated under the guidance of the overall rating that accompanies each review text.
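The idea of latent aspect ratings combined through importance weights can be illustrated with a small forward computation. This is only a sketch of the weighted-combination idea described above, not the estimation procedure of LARA; the aspect names, ratings and weights are made-up values.

```python
def overall_rating(aspect_ratings, aspect_weights):
    """Combine per-aspect ratings into one score using importance weights.

    Weights are normalized to sum to 1, so the result stays on the same
    scale as the aspect ratings.
    """
    total = sum(aspect_weights.values())
    return sum(aspect_ratings[a] * w / total for a, w in aspect_weights.items())

# Hypothetical latent values for a single reviewer and a single review.
ratings = {"location": 4.0, "service": 2.0, "value": 5.0}
weights = {"location": 0.5, "service": 0.3, "value": 0.2}
predicted = overall_rating(ratings, weights)
```

In the models above both the aspect ratings and the weights are latent and are inferred from the review text and the observed overall rating; here they are simply given.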

Subjective Information Detection
    Detect subjective sentences based on adjectives
    Use similarity among word distributions to compute subjectivity according to word sense
    Unsupervised approaches to create sentiment lexicons
While polarity classification assumes that the incoming documents are always opinionated, some work has been done on detecting subjective information within texts. A representative early work on subjectivity detection decides whether a given sentence is subjective based on the adjectives appearing in it. For the subjectivity of single words, some work uses similarity among word distributions to compute subjectivity according to word sense. Some unsupervised approaches create sentiment lexicons from text corpora. Functions based on the subjective indicators in the lexicon are then designed to compute the degree of subjectivity of text units.
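A lexicon-based subjectivity function can be as simple as the share of tokens that appear in the lexicon. This is a minimal sketch under that assumption; the tiny lexicon and the scoring function are hypothetical and do not come from any of the cited works.

```python
# Hypothetical subjective indicators; a real lexicon would be learned from a corpus.
SUBJECTIVE_LEXICON = {"wonderful", "awful", "boring", "brilliant"}

def subjectivity(text, lexicon=SUBJECTIVE_LEXICON):
    """Fraction of whitespace-separated tokens found in the lexicon."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)

opinionated = subjectivity("The movie was wonderful and awful")
factual = subjectivity("The train leaves at noon")
```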

Language model adaption
    Jointly model sentiment words, topic words and sentiment polarity as a triple
    Compare posterior distributions using KL divergence to determine perspectives
    KL divergence between different aspects is an order of magnitude smaller than that between different topics
Language models have also been applied to various opinion mining and sentiment analysis tasks. As an example, one work jointly models sentiment words, topic words and sentiment polarity in a sentence as a triple, so that it can rank documents or sentences by both sentiment relevance and topic relevance. Another work models the generation of documents with Dirichlet priors and uses Kullback-Leibler (KL) divergence to compare the posterior distributions of documents and determine whether they come from different perspectives. It finds that the KL divergence between different aspects is an order of magnitude smaller than that between different topics, which may also help explain the difficulty of sentiment analysis compared with general topic analysis.
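The KL comparison can be sketched with unigram models smoothed by a symmetric Dirichlet prior (the posterior mean is the count plus alpha, normalized). The two short documents, the value of alpha and the function names are assumptions for illustration; the cited work's actual model may differ.

```python
import math
from collections import Counter

def smoothed_unigram(tokens, vocab, alpha=0.1):
    """Posterior-mean unigram distribution under a symmetric Dirichlet(alpha) prior."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

doc_a = "tax cuts help growth".split()
doc_b = "tax cuts hurt workers".split()
vocab = sorted(set(doc_a) | set(doc_b))
p = smoothed_unigram(doc_a, vocab)
q = smoothed_unigram(doc_b, vocab)
divergence = kl_divergence(p, q)
```

Smoothing keeps every probability strictly positive, so the logarithm is always defined. KL divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.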

Opinion Summarization
    Extracted sentiment information as summary
    Capture key aspects by single passages
    Location and constituent words matter
    Track the change of sentiment orientation from one sentence to the next
    Combination of isotonic regression and conditional random fields
    Generate summary using local extrema
    Use graphs to represent opinions among entities
Another group of work is on opinion summarization. A summary is the aggregation and representation of sentiment information extracted from individual documents or collections of documents. Here, one work attempts to capture key aspects of the author’s opinion in a document through single passages, using Naive Bayes and regularized logistic regression models. Some of its experiments show that a sentence’s location and constituent words are valuable predictors of whether it should be chosen as a sentiment summary. Another work views each entire document as a timeline and tracks the change of sentiment orientation from one sentence to the next. After modeling the sentiment flow using a combination of isotonic regression and conditional random fields, it generates sentiment summaries by picking the sentences at local extrema of the sentiment flow. Graphs are also employed to represent the output of sentiment extraction from single documents, because they are well suited to cases where the important information consists of a set of entities and the opinions that some of those entities hold about each other.
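The extrema-picking step can be sketched on its own. The sketch assumes that a per-sentence sentiment score is already available (in the cited work this comes from the isotonic regression and CRF model, which is not reproduced here); the scores below are made up.

```python
def local_extrema(scores):
    """Indices of interior sentences whose score is a strict local max or min."""
    picked = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            picked.append(i)
    return picked

# Hypothetical sentiment flow: one score per sentence, in document order.
flow = [0.1, 0.6, 0.2, -0.5, -0.1, 0.3]
summary_indices = local_extrema(flow)
```

The selected indices mark the sentences where the sentiment peaks or bottoms out, which are the candidates for the summary.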

Summary for document collections
    Decide same semantic content
    Redundancy and conflicts matter
    Modify LDA to incorporate the influence of aspects and sentiments on the generative process of reviews
    Classic natural language generation pipeline of content selection, lexical selection and sentence planning
It is also intriguing and challenging to develop sentiment summaries for document collections. The most important problem is deciding whether two sentences or text passages have the same semantic content. The unique challenge in the opinion-oriented setting is that the redundancy of opinions matters and conflicting sets of opinions are common. Some work modifies classic topic models like LDA to incorporate the influence of aspects and sentiments on the generative process of reviews, while other work applies the classic natural language generation pipeline of content selection, lexical selection and sentence planning to sentiment summarization.

Summary for document collections
    Create textual summaries for sets of documents using headlines
    Select a few documents of interest as representative samples of opinions
    Mine only product features that have been commented on by customers
    Identify opinion sentences in each review and summarize the results
    Use synonym set and antonym set in WordNet
Some techniques are developed specifically for opinion-based summarization. For example, one work proposes to create textual summaries for sets of documents using the headlines of documents. It chooses the documents with the most positive on-topic sentences, based on topic modeling and sentence polarity classification. Another work attempts to select a few documents of interest as representative samples of opinions from the corpus for presentation to the user, so that both positive and negative points of view are covered, rather than just the dominant sentiment. A third work differentiates sentiment summarization from traditional text summarization by mining only the product features that customers have commented on in e-commerce websites. It identifies the opinion sentences in each review and summarizes the results using the adjective synonym sets and antonym sets in WordNet.
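One way synonym and antonym sets can be used is to spread a known orientation from a few seed adjectives: synonyms inherit the orientation and antonyms receive the opposite one. The sketch below assumes that reading. The small dictionaries stand in for WordNet lookups and are hypothetical, as is the seed list.

```python
from collections import deque

def propagate(seeds, synonyms, antonyms):
    """Spread +1/-1 orientations from seed adjectives over synonym/antonym links."""
    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for other in synonyms.get(word, ()):
            if other not in orientation:
                orientation[other] = orientation[word]   # same orientation
                queue.append(other)
        for other in antonyms.get(word, ()):
            if other not in orientation:
                orientation[other] = -orientation[word]  # opposite orientation
                queue.append(other)
    return orientation

# Hypothetical stand-ins for WordNet synonym and antonym sets.
synonyms = {"good": ["great"], "bad": ["poor"]}
antonyms = {"good": ["bad"]}
orientation = propagate({"good": 1}, synonyms, antonyms)
```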

Opinion mining on Twitter
    Variability and instantaneity
    Sentiment classification first visit
    Basic text features like unigrams, bigrams
    Emoticons as noisy labels for training
    Classifiers including Naïve Bayes, maximum entropy and SVM
    Standard baseline
Microblogs have evolved into a source of varied kinds of information, and Twitter is the most popular and well-developed among them. The most valuable properties of Twitter are its variability and instantaneity: its content is a natural mixture of texts representing human sentiments and attitudes about an incredible breadth of topics, providing a rich text resource for opinion mining and sentiment analysis.
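Training with emoticons as noisy labels and unigram features can be sketched with a small multinomial Naive Bayes classifier. The emoticon lists, the add-one smoothing, the whitespace tokenization and the example tweets are assumptions for illustration, not the setup of any particular paper.

```python
import math
from collections import Counter, defaultdict

POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")

def noisy_label(tweet):
    """'pos' or 'neg' from emoticons; None when absent or contradictory."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"

def unigrams(tweet):
    """Lowercased tokens with the labeling emoticons removed."""
    for e in POSITIVE + NEGATIVE:
        tweet = tweet.replace(e, " ")
    return tweet.lower().split()

def train(tweets):
    """Count classes and per-class unigrams over emoticon-labeled tweets."""
    class_counts, word_counts = Counter(), defaultdict(Counter)
    for t in tweets:
        label = noisy_label(t)
        if label is not None:
            class_counts[label] += 1
            word_counts[label].update(unigrams(t))
    return class_counts, word_counts

def predict(model, tweet):
    """Most probable class under Naive Bayes with add-one smoothing."""
    class_counts, word_counts = model
    vocab = set().union(*word_counts.values())
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        n = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for w in unigrams(tweet):
            score += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train([
    "love this phone :)", "great battery life :D",
    "hate the screen :(", "terrible battery :-(",
    "new phone arrived",  # no emoticon, so it is skipped
])
```

Removing the emoticons before counting keeps the classifier from simply memorizing the labeling signal.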

Opinion mining on Twitter
    Tree representation of tweets and Partial Tree kernel for similarity computation
    Various features like n-grams, lexicon features, part-of-speech features and other micro-blogging features
    Method to automatically collect corpus, statistical linguistic analysis and POS-tagging
Much work has followed on the problem of opinion mining on Twitter. One study concluded that the best results in its evaluation come from the n-gram and lexicon features, and that part-of-speech features may not be useful for sentiment analysis in the microblogging domain. Another uses happy and sad emoticons to query Twitter for a corpus with positive and negative sentiments, and queries the accounts of popular newspapers and magazines to get objective tweets. Its analysis hints at different patterns in how these two groups of texts are formed.

Entity-centric topic-based summarization framework
    Mine topics from #hashtags
    Graph-based topic extraction
    Generate templates for insight tweets
    Classify entity-dependent opinion tweets
    Opinion summary through a unified optimization framework
This framework develops topic-related opinion summaries for entities on Twitter such as celebrities and brands. Templates generalized from paraphrasing are used to identify tweets with deep insights, and an entity-dependent sentiment classification approach is employed to identify the opinion that tweets express towards given entities. The opinion summary is finally generated through a unified optimization framework that integrates information from the dimensions of topic, opinion and insight. This work provides a good roadmap for sentiment analysis on microblog data, covering most of the challenges and corresponding possible solutions.
