CSC 594 Topics in AI – Text Mining and Analytics


1 CSC 594 Topics in AI – Text Mining and Analytics
Fall 2015/16
10. Sentiment Analysis

2 Sentiment Analysis
Sentiment analysis extracts and identifies the polarity of sentiments expressed in texts. Lately it has been widely applied to reviews, opinion pieces, and texts from social media. However, conducting sentiment analysis poses many challenges, e.g.:
- Judgement of sentiment (existence, degree/granularity) is not clear-cut.
- Sentiments depend on the domain and context (e.g. “addictive”).
- Sentences with negations (“not”, “no”, “__n’t”, etc.).
- Sentences with comparatives (“A is better than B, but still has problems”).
- User texts contain spelling errors, irregular typography (e.g. emoticons), and ungrammatical sentences.
- Words/expressions that imply sentiments are subtle (sentiment lexicon).
- Multiple sentiments can be expressed in one sentence/document.
- Possibility of sarcasm.
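To make the negation challenge concrete, here is a minimal sketch of one common heuristic: tagging every token that follows a negation cue, up to the end of the clause, so that "good" and "good_NEG" become distinct features. The cue list, the clause delimiters, and the function name `mark_negation` are illustrative assumptions, not part of the slides.

```python
import re

# Hypothetical, deliberately small lists; real systems use longer ones.
NEGATION_CUES = {"not", "no", "never"}
CLAUSE_END = {".", ",", ";", "!", "?", "but"}

def mark_negation(text):
    """Append _NEG to each token between a negation cue and the end of its clause."""
    # Words (optionally with an apostrophe part, e.g. "don't") and punctuation marks.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?|[.,;!?]", text.lower())
    out, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False          # a clause boundary closes the negation scope
            out.append(tok)
        elif tok in NEGATION_CUES or tok.endswith("n't"):
            negated = True           # open a negation scope
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return out

marked = mark_negation("The plot is not good, but the acting is great.")
print(marked)
```

This only approximates negation scope; it assumes straight apostrophes and does not handle comparatives or sarcasm.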

3 Sentiment Analysis Tasks (1)
Supervised: Classify documents into sentiment categories (positive, negative, neutral, etc.)
Goals/End Products:
- Predictive models for sentiment categorization
- “Important/relevant features” that determine the sentiments → look at the features that are weighted most heavily in the resulting model.
Text Pre-processing:
- Standard pre-processing – stemming/lemmatizing, removing stop words
- Part-of-speech tagging – often focusing on adjectives and nouns
- Term weighting
- N-grams or noun groups/phrases – a unigram is often too small a unit
Common techniques (in machine learning):
- Typical classification algorithms, such as SVM, Decision Tree, KNN.
- Naïve Bayes (as with general text classification)
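As a rough illustration of the supervised setup, the sketch below trains a multinomial Naïve Bayes classifier with add-one smoothing on unigram and bigram features. The class name `NaiveBayes`, the `features` helper, and the four training sentences are made up for illustration; a real experiment would also apply the pre-processing steps listed above and use far more data.

```python
import math
from collections import Counter, defaultdict

def features(text):
    """Unigrams plus bigrams from a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    bigrams = [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

class NaiveBayes:
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.feature_counts[label].update(features(doc))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            score = math.log(n / n_docs)  # log prior
            for f in features(doc):
                if f in self.vocab:  # ignore features never seen in training
                    # add-one (Laplace) smoothed log likelihood
                    score += math.log((counts[f] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy, made-up training data.
train_docs = [
    "great graphics and fun gameplay",
    "awesome story great fun",
    "boring story and bad graphics",
    "bad controls boring gameplay",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("great fun"))
print(model.predict("boring and bad"))
```

Inspecting `model.feature_counts` per class gives a crude view of which features carry the most weight, which corresponds to the second end product listed above.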

4 Sentiment Analysis Tasks (2)
Unsupervised: The typical goal is to mine opinions for features/aspects
- Example: product features (e.g. “awesome graphics”)
- Features/aspects are often pre-defined (for specific domains).
- Sometimes (pre-defined) sentiment lexicons are also used.
- However, automatic identification of features or of a sentiment lexicon is possible as well.
Text Pre-processing:
- Standard pre-processing, POS-tagging, and possibly n-grams (or noun groups) are applied.
- Processing is done at the sentence level to get a narrower context.
- Deeper NLP is often applied to extract more precise results.
Common techniques:
- Word Association/Collocations – PMI, Likelihood
- Clustering – to obtain general topics of the opinions in a corpus
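The word-association idea can be sketched with pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), where the probabilities are estimated here as the fraction of sentences containing a word or a word pair. The function name `sentence_pmi` and the four example sentences are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def sentence_pmi(sentences):
    """PMI for every word pair that co-occurs in at least one sentence."""
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        # Count each word once per sentence; sort so pair keys are alphabetical.
        words = sorted(set(sent.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    pmi = {}
    for (x, y), c in pair_counts.items():
        p_xy = c / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi

# Toy, made-up review sentences.
sentences = [
    "awesome graphics",
    "awesome graphics overall",
    "terrible controls",
    "graphics lag sometimes",
]
pmi = sentence_pmi(sentences)
print(pmi[("awesome", "graphics")])
print(pmi[("controls", "terrible")])
```

A positive PMI means the two words co-occur more often than chance would predict. With counts this small the estimates are unreliable, and PMI is known to overrate rare pairs, so a minimum-frequency cutoff is usually applied in practice.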

5 Sentiment Lexicon for English (around 6800 words) – from (Hu and Liu, KDD-2004),
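A lexicon of this kind can be applied by simply counting positive and negative words. The sketch below uses two tiny, hypothetical word sets as stand-ins for the real lexicon, and the function names `lexicon_score` and `polarity` are illustrative.

```python
import re

# Hypothetical stand-ins for a real sentiment lexicon (a few words only).
POSITIVE = {"awesome", "great", "good", "love"}
NEGATIVE = {"bad", "terrible", "boring", "hate"}

def lexicon_score(text):
    """Number of positive lexicon words minus number of negative ones."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg

def polarity(text):
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("Awesome graphics, great story"))
print(polarity("Boring and bad"))
print(polarity("The graphics are not good"))
```

The last call is labeled positive because plain word counting ignores negation, which is one reason lexicon lookups are usually combined with further processing.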

