Kevin Hsin-Yih Lin, Changhua Yang, Hsin-Hsi Chen

1 Emotion Classification of Online News Articles from the Reader’s Perspective
Kevin Hsin-Yih Lin, Changhua Yang, Hsin-Hsi Chen. Department of Computer Science and Information Engineering, National Taiwan University. IEEE 2008

2 1. Introduction Past studies on the emotion classification of documents focus on the writer’s emotional state. This paper addresses the problem from the reader’s perspective. Reader and writer emotions are distinct, because they do not always agree. Example: an article about an infamous politician’s miserable day, where the two may differ.

3 1. Introduction Reader-emotion classification has several applications. One is integrating reader-emotion classification into a web search engine, for example so that a dog-loving girl can find heartwarming puppy stories. Another application is to classify a website’s contents into emotion classes, as Yahoo! Kimo News does. That approach needs feedback from users; an automatic method can relieve the problem.

4 1. Introduction An essential prerequisite to realizing the above applications is the ability to classify documents into reader-emotion categories. Research in such an area was difficult in the past, due to the scarcity of large manually-annotated corpora. Now we have many websites like Yahoo! Kimo News or United Daily News.

5 1. Introduction They classify online news articles into reader-emotion categories: given a set N of news articles and a set E of emotions, the goal is to find a function f: N → E. Their experiments adopt a machine learning method and involve several different feature types.

6 2. Related Work Past research on emotion classification focuses on writers. Only a few studies (K. H. Lin 2007) address the reader aspect. Some studies relating to writer emotion: Pang et al. (2002) design a classifier to decide whether a movie review contains a positive or negative sentiment. Their results reveal that using SVM with word unigram features outperforms other combinations of unigram, bigram, part-of-speech, word position, and adjective features.

7 2. Related Work More work has been done to search for features better than unigrams. Mullen and Collier (2004), and Hu et al. (2005) exploit word sentiments to achieve better classification accuracy. Cui et al. (2006) show that high-order n-grams are beneficial if the corpus size is large enough. Sentiment classification of texts is not restricted to the document level. Wiebe (2000) conducts experiments to learn the subjectivity of adjectives. Kim and Hovy (2004) study sentence sentiments.

8 3. Constructing the Corpus
They obtain Chinese news articles from Yahoo! Kimo News, which allows a user to cast a vote for one of eight emotions. They collect news articles along with their voting statistics a week after their publication dates to ensure that the vote counts have stabilized. They use Yahoo!’s eight emotions: happy, sad, angry, surprising, boring, heartwarming, awesome, and useful. They treat the most dominant emotion of a news article as the article’s emotion class. The corpus consists of news articles dating from January 24, 2007 to August 7, 2007.
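The labeling rule above (the most dominant emotion becomes the class) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `dominant_emotion`, the dictionary layout of the vote counts, and the tie-breaking rule are all assumptions.

```python
# The eight reader emotions offered by Yahoo! Kimo News, as listed on the slide.
EMOTIONS = ["happy", "sad", "angry", "surprising",
            "boring", "heartwarming", "awesome", "useful"]

def dominant_emotion(votes):
    """Return the emotion with the most votes.

    `votes` maps emotion name -> vote count (hypothetical layout).
    Ties go to the emotion listed first in EMOTIONS; the slides do not
    say how ties are handled, so this is an assumption.
    """
    return max(EMOTIONS, key=lambda e: votes.get(e, 0))

example_votes = {"happy": 3, "heartwarming": 10, "useful": 1}
label = dominant_emotion(example_votes)  # "heartwarming"
```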

9 4. Extracting Features After obtaining news articles, the next step is to convert them into features. Five different types of features are used: Chinese character bigrams, Chinese words, news metadata, affix similarities, and word emotions.

10 4. Extracting Features 4.1 Basic Features
Chinese character bigrams: Taken from the headline and content of each news article. A binary value indicates the presence of a bigram: if a bigram appears at least once in a news article, the bigram has a feature value of 1.
Chinese words: Words are extracted with the Stanford NLP Group’s Chinese word segmenter. As in the case of bigrams, binary feature values are used.
News metadata: News category, agency, hour of publication, reporter, and event location. Again, binary feature values are used.
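A minimal sketch of the binary character-bigram features described above. The helper names, the fixed `vocabulary` list, and the choice to extract bigrams from the headline and the content separately (so that no bigram spans the boundary between them) are assumptions made for illustration.

```python
def char_bigrams(text):
    """Return the set of adjacent character pairs in `text`."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def binary_bigram_features(headline, content, vocabulary):
    """Return a 0/1 vector: 1 if the bigram occurs at least once in the article.

    `vocabulary` is a hypothetical ordered list of bigrams seen in training.
    """
    present = char_bigrams(headline) | char_bigrams(content)
    return [1 if bigram in present else 0 for bigram in vocabulary]

vocabulary = ["球隊", "輸球", "贏球"]
features = binary_bigram_features("球隊贏球", "", vocabulary)  # [1, 0, 1]
```

The same 0/1 encoding would apply to segmented words and to metadata values, with the bigram set replaced by the set of words or metadata items.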

11 4. Extracting Features 4.2 Affix Similarity Features
Affix similarity is computed by first identifying all the common substrings between a news article and the training data of an emotion class. The similarity is then quantified based on the number and lengths of the common substrings. Affix similarity can be divided into two parts: prefix similarity and suffix similarity.

12 4. Extracting Features Compute the emotion scores of an article:
1. Reset every emotion value Ve to zero.
2. For each suffix t of the article (each t in T), perform steps 3 to 5.
3. Find the string s* in S that has the longest common prefix with t.
4. Let e* be the emotion associated with s*.
5. Update the article’s score for emotion e* by adding LCP(t, s*) to Ve*.
Finally, normalize and return the set of emotion values.
E: The set of emotions.
S: The set containing all suffixes of all news articles in the training corpus.
T: The set containing all suffixes we wish to obtain features from.
Ve: A value representing the degree of similarity between emotion e and the test news article.
LCP(t, s): A function which returns the length of the longest common prefix of t and s.
EMOTION(E, s*): A function which returns the emotion associated with s*.

13 4. Extracting Features
Data inputs:
Training corpus: S = {“The team won”, “team won”, “won”, “This team lost”, “team lost”, “lost”}
Article string: T = {“This team won”, “team won”, “won”}
Emotions: E = {“happy”, “sad”}
Processing:
Assume we are now at line 2 of Algorithm 1 and t = “This team won”. Comparing t against S, “This team lost” has the longest common prefix with “This team won”. Then s* is “This team lost” and e* is “sad”. Because the longest common prefix of t and s* has length 2, Ve*, that is Vsad, increases by 2. The same steps are repeated for t = “team won” and t = “won”. After the whole procedure finishes, the emotion feature values of this article are Vsad = 2/3 and Vhappy = 3/3.
Contributions by emotion: Happy: t = “team won” (2) + t = “won” (1). Sad: t = “This team won” (2).
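The worked example can be reproduced with a short sketch. It assumes word-level tokens (as in the English example; Chinese text would more likely use characters), division by the largest score as the normalization step (inferred from the 2/3 and 3/3 values), and first-match tie-breaking. The function names are illustrative and not taken from the paper.

```python
def suffixes(tokens):
    """All suffixes of a token list, longest first."""
    return [tuple(tokens[i:]) for i in range(len(tokens))]

def lcp(t, s):
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for a, b in zip(t, s):
        if a != b:
            break
        n += 1
    return n

def suffix_similarity(training, article, emotions):
    """Return normalized suffix-similarity scores, one per emotion.

    `training` is a list of (text, emotion) pairs; `article` is the text
    to extract features from.
    """
    # S: every suffix of every training article, tagged with its emotion.
    S = [(suf, emo) for text, emo in training for suf in suffixes(text.split())]
    V = {e: 0 for e in emotions}                       # step 1: reset scores
    for t in suffixes(article.split()):                # step 2: each suffix t
        s_star, e_star = max(S, key=lambda p: lcp(t, p[0]))  # steps 3 and 4
        V[e_star] += lcp(t, s_star)                    # step 5: update score
    top = max(V.values())
    # Assumed normalization: divide by the largest score.
    return {e: (v / top if top else 0.0) for e, v in V.items()}

training = [("The team won", "happy"), ("This team lost", "sad")]
scores = suffix_similarity(training, "This team won", ["happy", "sad"])
# scores["happy"] is 3/3 and scores["sad"] is 2/3, matching the slide.
```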

14 4. Extracting Features The algorithm for computing prefix similarity is the same as the suffix similarity algorithm, except that all strings in S and T are reversed. In the last example, T would become {“won team This”, “team This”, “This”}.
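A small sketch of the reversal step, assuming word-level tokens as in the example. The function name `reversed_suffixes` is hypothetical; the point is only that reversing the token order and then taking suffixes yields the set T shown above.

```python
def reversed_suffixes(text):
    """Reverse the token order, then list every suffix of the reversed string."""
    tokens = text.split()[::-1]
    return [" ".join(tokens[i:]) for i in range(len(tokens))]

T = reversed_suffixes("This team won")
# T == ["won team This", "team This", "This"]
```

Applying the same transformation to every training article gives the reversed S, after which the suffix-similarity procedure can be reused unchanged.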

15 4. Extracting Features 4.3. Word Emotion Features
Many words have implied emotional meanings; for example, “wonderful” implies happiness. We first generate an emotion lexicon using the method of C. Yang, K. H. Lin (2007). The lexicon contains entries describing collocation information between words and 40 emotions. Each entry in the lexicon is a 3-tuple (w, b, m), where w is a Chinese word, b is a blog emotion, and m is the point-wise mutual information of w and b.

16 4. Extracting Features Suppose we have a test news article string “an excellent and tearful story”. Then W = {“an”, “excellent”, “and”, “tearful”, “story”}. Suppose only the words “excellent” and “tearful” appear in L (the emotion lexicon), and the associated entries are (“excellent”, happy, 9), (“excellent”, surprising, 5), (“tearful”, sad, 7), and (“tearful”, surprising, 3). Then the feature values are Vhappy = 9/9, Vsurprising = 8/9, and Vsad = 7/9 (the per-emotion sums 9, 5 + 3 = 8, and 7, each divided by the largest sum, 9). The values of the other 37 features (the other emotions) are 0.
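The example values can be checked with a short sketch. It assumes the feature for each emotion is the sum of the lexicon values of the article's distinct words, divided by the largest sum; this rule is inferred from the numbers on the slide rather than stated there. The lexicon layout and the function name are hypothetical.

```python
def word_emotion_features(words, lexicon, emotions):
    """Sum lexicon values per emotion over the distinct words, then normalize.

    `lexicon` maps word -> list of (emotion, value) pairs.
    """
    V = {e: 0.0 for e in emotions}
    for w in set(words):                      # W is a set of words
        for emotion, value in lexicon.get(w, []):
            V[emotion] += value
    top = max(V.values())
    # Assumed normalization: divide by the largest per-emotion sum.
    return {e: (v / top if top > 0 else 0.0) for e, v in V.items()}

lexicon = {
    "excellent": [("happy", 9), ("surprising", 5)],
    "tearful": [("sad", 7), ("surprising", 3)],
}
words = "an excellent and tearful story".split()
features = word_emotion_features(
    words, lexicon, ["happy", "surprising", "sad", "angry"])
# happy = 9/9, surprising = 8/9, sad = 7/9, angry = 0
```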

17 5. Experiment and Results
5.1 Experiment Setup Given the strong performance of support vector machines (SVM) in many classification tasks, they choose SVM as the classifier. The implementation they use is libsvm. To estimate the optimal cost parameter C, they perform four-fold cross-validation on the training data. A linear kernel is used.

18 5. Experiment and Results
5.1 Experiment Setup Other methods are implemented for performance comparison. Baseline: Naïve Bayes (NB) on Chinese character bigrams and Chinese words. Writer-emotion classification methods: Pang (2002) and Cui (2006), extended to handle multi-class classification.

19 5. Experiment and Results
5.3. Results and Discussions The best model’s accuracy is significantly higher than that of every other model with p-value ≤ 0.01. Abbreviations: SVM: support vector machines; PA: passive-aggressive classifier; NB: Naïve Bayes classifier; BI: bigram; WD: word; MT: metadata; AS: affix similarity; WE: word emotion; CN: Cui’s combined word n-gram. The number following CN is the number of features kept after performing χ² test filtering.

20 5. Experiment and Results
Analyzing each feature type individually, we see that SVM+BI has the best accuracy. It is also worth noting that SVM+AS obtains a relatively high accuracy with only 16 distinct features in total. In contrast, BI consists of 865,451 distinct features.

21 5. Experiment and Results
Let us investigate the effect of adding a feature type to an existing feature combination. Classification accuracy increases when AS is added to any combination of BI, WD, MT and WE, and every such improvement is statistically significant. The increase indicates that AS is able to capture some important emotion-related characteristics. As for BI, adding it to any combination improves accuracy with p-value ≤ 0.01, so BI is also an important feature type. In contrast, adding WD, MT or WE to an existing feature combination neither consistently increases nor consistently decreases accuracy. However, adding WE to SVM+BI+WD+MT+AS produces the model with the highest accuracy.

22 5. Experiment and Results
Both SVM+BI and SVM+WD perform better than their NB+BI and NB+WD baseline counterparts. Pang’s word unigram classifier is equivalent to the SVM+WD model, which achieves a relatively high accuracy. Unlike the observation made in Pang’s work, however, combining WD with other feature types can improve accuracy in this experiment. The PA classifier does not perform as well as SVM when used with Cui’s n-gram features. The accuracy obtained with Cui’s n-gram features is beaten by the simpler SVM+BI model, which has a slightly higher feature count of 865,451. So, contrary to Cui’s results, using high-order n-grams does not improve accuracy in this experiment.

23 6. Reader Behavior Versus Classifier Behavior
Examine how closely the best classifier, SVM+BI+WD+MT+AS+WE, models reader behavior. Observe the similarities and differences between the classifier’s confusion matrix and the news articles’ emotional distributions. Figure panels: average votes in the happy class; classification result.

24 6. Reader Behavior Versus Classifier Behavior
Figure 1(a) shows that if most people feel heartwarming after reading a news article, then many other people are going to feel happy. Figure 2(b) reveals that if the SVM classifier wrongly classifies a happy article, then the incorrect category is most likely to be the boring class. Figures 1(a) to 1(c) are placed directly above 2(a) to 2(c) so that we can observe the similarities and differences between the readers’ and the classifier’s behavior. Only histograms for heartwarming, happy and useful classes are shown, because the patterns they exhibit are representative of the characteristics found in other histograms.

25 6. Reader Behavior Versus Classifier Behavior
In Figure 1(a), the happy class receives 20% of the votes on average when the most dominant class is heartwarming. However, the percentage of instances wrongly assigned to the happy category is only 6% in Figure 2(a). In fact, the happy class is not even the category that the SVM classifier is most likely to wrongly classify a heartwarming instance into. Although Figure 1(a) indicates that many readers are likely to feel happy after reading a heartwarming article, Figure 2(a) shows that the classifier does not exhibit this tendency. This implies that heartwarming articles have certain discriminative features that differentiate them from happy articles.

26 6. Reader Behavior Versus Classifier Behavior
They use the χ² test to measure how discriminative a feature is, and inspect the features that appear in heartwarming instances. The Chinese translations of words such as affect, caring, story and mother are among the most discriminative features according to the χ² test. The news category charity is also prevalent and discriminative.

27 6. Reader Behavior Versus Classifier Behavior
Figure 1(b) shows that if most readers feel happy after reading an article, then the vote counts for other emotions will be quite low. The SVM classifier’s high accuracy for the happy class mirrors this pattern. Many happy news articles have features related to sports, especially baseball. In fact, the sports and baseball news categories are the two most discriminative features for the happy class according to the χ² test. 27.9% of all the happy articles in the training corpus are in the baseball category; the probability of an article in this news category belonging to the happy class is [value missing in the transcript]. 50.3% of all the happy instances in the training corpus are in the sports category; the probability of a sports news article belonging to the happy class is [value missing in the transcript]. The highly skewed emotional distribution of the happy class is likely to be an effect of the readers’ great interest in sports.

28 6. Reader Behavior Versus Classifier Behavior
Figure 1(c) and Figure 2(c) display different patterns. The classifier has an outstanding accuracy of 0.90 for the useful class, but the average fraction of reader votes is 0.65. It is discovered that certain news categories have very large χ2 values with respect to the useful class. For example, 92% of all the news articles in the weather news category are in the useful class in the training corpus. The corresponding percentage for the test corpus is 86%. Other news categories associated with the useful class include cosmetics, financial management, and health. These observations are intuitive, because the news categories listed above should contain news articles with practical information. The emotional ambiguities indicated by the readers’ voting patterns do not necessarily translate into classifier performance.

29 7. Ranking Emotions Sometimes more than one emotion may be prevalent in a news article. In such cases, it would be useful to provide a ranking of emotions. To rank emotions, we use regression on an emotion to predict its percentage of votes in a news article. To perform regression, we adopt support vector regression (SVR).
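Once a regressor has predicted a vote percentage for each emotion, producing the ranked list is a sort. The sketch below shows only that final step and takes the predicted percentages as given; the SVR models themselves are not reproduced, and the function name and tie-breaking rule are assumptions.

```python
def rank_emotions(predicted_share):
    """Order emotions from highest to lowest predicted vote share.

    `predicted_share` maps emotion -> predicted fraction of votes.
    Ties are broken alphabetically (an assumption; the slides do not say).
    """
    return sorted(predicted_share, key=lambda e: (-predicted_share[e], e))

ranking = rank_emotions({"happy": 0.2, "heartwarming": 0.5, "useful": 0.3})
# ranking == ["heartwarming", "useful", "happy"]
```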

30 7. Ranking Emotions The evaluation metric is accuracy at n, which considers a proposed emotion list to be correct if its first n emotions are both the same as and in the same order as the true emotion list’s first n emotions.
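The metric as defined above can be sketched in a few lines. The function names and the toy rankings are illustrative assumptions, not data from the experiments.

```python
def correct_at_n(predicted, truth, n):
    """True if the first n emotions match in both identity and order."""
    return predicted[:n] == truth[:n]

def accuracy_at_n(predictions, truths, n):
    """Fraction of articles whose top-n predicted ranking is exactly right."""
    hits = sum(correct_at_n(p, t, n) for p, t in zip(predictions, truths))
    return hits / len(truths)

predictions = [["happy", "useful", "sad"], ["sad", "angry", "happy"]]
truths = [["happy", "sad", "useful"], ["sad", "angry", "boring"]]
# accuracy_at_n(predictions, truths, 1) is 1.0
# accuracy_at_n(predictions, truths, 2) is 0.5
# accuracy_at_n(predictions, truths, 3) is 0.0
```

Because a list that is correct at n is also correct at every smaller n, the metric can only stay the same or fall as n grows.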

31 7. Ranking Emotions The accuracy for predicting the most dominant emotion (i.e., accuracy at n = 1) is slightly lower than the best accuracy in the classification experiment. The sharp decrease in accuracy as n increases reflects the difficulty of the ranking task. We regard each unique emotion sequence of length n as a class. In particular, when n = 8, we are essentially classifying news articles into 8! = 40,320 classes. Generating a completely correct ranked list is a difficult task.
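Treating each ordered sequence of n distinct emotions as a class, the number of classes is the number of n-permutations of the eight emotions. The sketch below computes it, assuming no ties in the true ranking; the function name is hypothetical, and `math.perm` requires Python 3.8 or later.

```python
import math

def num_ranking_classes(num_emotions, n):
    """Number of ordered sequences of n distinct emotions out of num_emotions."""
    return math.perm(num_emotions, n)

class_counts = {n: num_ranking_classes(8, n) for n in range(1, 9)}
# n = 1 gives 8 classes, n = 2 gives 56, and n = 8 gives 8! = 40320.
```

The rapid growth in the number of classes is one way to see why accuracy drops so sharply as n increases.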
