Emoticon Analysis for Chinese Health and Fitness Topics

Shuo Yu¹, Hongyi Zhu¹, Shan Jiang², Hsinchun Chen¹,²
¹ School of Economics and Management, Tsinghua University, China
² Artificial Intelligence Lab, The University of Arizona, USA
Outline
– Introduction
– Literature Review
– Research Gaps and Questions
– Research Design
– Test-bed and Evaluations
– Conclusions and Future Directions
1. Introduction

An emoticon is a meta-communicative pictorial representation of a facial expression, which serves to convey information about the user's emotional state.
– Compensating for the loss of non-verbal cues in text-based online communication, emoticons have been widely used by Internet users to facilitate text-based communication (Chiu, 2007; Suzuki and Tsuda, 2006).
Recently, an increasing use of emoticons has been observed in Chinese web forums and other social networking sites that feature health and fitness topics.
1. Introduction

Some example comments on health and fitness topics from video websites are shown in the table below.

Comment example | English translation
现在跳到这里还没有喘气受不了的感觉,运动锻炼万岁 ~\(≧▽≦)/~ | Danced this far without feeling short of breath. Hooray sports and exercise! ~\(≧▽≦)/~
第一次完整做完 妈蛋明天跑不动了怎么办 QAQ | Finished the full course for the first time. What should I do if I have no strength to run tomorrow QAQ
生姜可以加多点 让热浪来的更猛烈些吧 _(゚Д゚)ノ | Add more ginger and make it even hotter and spicier! _(゚Д゚)ノ
培根咸是烤的时候的问题,你可以不放那么咸啊 -_-|| | The bacon being salty is a roasting problem; you could make it less salty -_-||
1. Introduction

Analyzing emoticons on these platforms can provide insights into how people use emoticons to express health-related concerns, and can reveal the feelings and moods hidden beneath the text.
Incorporating emoticon analysis may improve the performance of sentiment analysis on health and fitness topics, and may also help monitor people's emotional conditions through social networks, especially for those who should avoid strong emotional fluctuations.
1. Introduction

We developed a novel kinesics model with affect mapping to perform emoticon analysis on Chinese health and fitness topics. The system:
– Utilizes a large lexicon of emotion symbols constructed from existing online emoticon dictionaries, adapted to the Chinese context.
– Segments emoticons into semantic components (e.g., eyes and mouth) and evaluates their roles.
– Extracts emoticons from online text and classifies them into one of 7 pre-defined affect categories.
2. Literature Review

Four main streams of emoticon research are reviewed:
– Emoticons and user behaviors
– Emoticons and sentiment analysis
– Emoticon generation and recommendation
– Emoticon extraction and classification
2.1 User Behaviors

This stream of research focuses on the motivations for using emoticons, the outcomes of using them, and how they affect individuals' behaviors.
– Emoticons have been found to help clarify and enhance the meaning of text messages (Rezabek and Cochenour, 1998).
– Derks et al. (2007) showed that people tend to use emoticons in positive contexts and with their friends.
These studies have contributed valuable behavioral investigations into the pragmatics of emoticons as a supplement to verbal language. However, they mostly focused on a small, selected set of emoticons.
2.2 Sentiment Analysis

This stream of research focuses on using the valence of emoticons to improve classification performance in sentiment analysis.
– Yang et al. (2007) used emoticons to perform sentiment tagging of sentences in articles, avoiding manual annotation.
– Poongodi and Radha (2013) proposed sentiment analysis models involving emoticons as features.
These studies demonstrated that incorporating emoticons into sentiment analysis can yield positive outcomes. However, the coverage of emoticon types was limited, and the structure of emoticons was not fully analyzed.
2.3 Generation and Recommendation

This stream of research focuses on emoticon generation and recommendation in human-computer interaction systems and input methods.
– Nakamura et al. (2003) used an algorithm to learn relationships between sentences and emoticon areas, and proposed a system to automatically generate emoticons in a natural language dialogue system.
– Urabe et al. (2013) proposed an emoticon recommendation system that automatically suggests emoticons based on users' emotional expressions in Japanese sentences.
Although emoticon recommendation is not a focus of our system, the idea of exploiting the semantic areas of emoticons is applied in our system.
2.4 Extraction and Classification

Various statistical and machine learning-based methods have been used to identify emoticons in text and to classify them by underlying emotion.
– Tanaka et al. (2005) used kernel methods for emoticon classification.
– Yamada et al. (2007) applied n-gram statistics to emoticon classification.
– Ptaszynski et al. (2010) used a kinesics-based model for emoticon extraction and classification, built on a large emoticon database collected from the Internet.
However, the studies above were conducted in a Japanese context. Little work has been done to analyze emoticons in a Chinese context, where the characters and symbols used by Internet users may differ.
3. Research Gaps and Questions

The following research gaps are identified:
– Most prior studies have focused on the simple emotion valence carried by emoticons.
– Studies pursuing a more refined classification of emoticons into affect categories have mostly been conducted in a Japanese context.
To address these gaps, we developed an emoticon analysis system to extract and classify text-based emoticons from Chinese health and fitness topics, and we ask the following research questions:
– How can we effectively extract emoticons from Chinese text?
– How accurately can our system classify emoticons into different affect categories?
4. Research Design

Figure 1 shows our research framework, which consists of four major components:
– Kinesics representation
– Affect mapping
– Emoticon extraction
– Emoticon classification
4.1 Kinesics Representation

We first utilized 7 online Japanese emoticon dictionaries to construct a lexicon of emoticons.
– These dictionaries have been used in prior emoticon research (Ptaszynski et al., 2010).
– Altogether, 11,988 emoticons were collected. Emoticons on these websites are already categorized with a tag, e.g., "smile" or "cry."
We observed that a large proportion of the collected emoticons contained Japanese kana, which are part of the Japanese language and rarely seen on Chinese websites. As shown in Table 1, we truncated kana from the raw emoticons, since they are not essential parts of the emoticons themselves.
4.1 Kinesics Representation

To segment emoticons into semantic parts that represent different functional components, we adopted the kinesics model of Ptaszynski et al. (2010) and represented each emoticon with the following nine-component model:

{S1} {B1} {S2} {EL} {M} {ER} {S3} {B2} {S4}

where {B1} and {B2} are the emoticon borders, {S1}-{S4} are additional areas, {EL} and {ER} are the left and right eyes, and {M} is the mouth. Table 2 shows some examples of the kinesics representation of emoticons. Any component in the model can be empty except the {EL M ER} triplet: this eye-mouth-eye triplet is the essential component of any emoticon.
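To make the model concrete, here is a minimal sketch of the nine-component segmentation as a regular expression. The eye/mouth character inventories and round-bracket borders are illustrative placeholders, not the system's actual component lexicons:

```python
import re

# Hedged sketch of the nine-component model
# {S1}{B1}{S2}{EL}{M}{ER}{S3}{B2}{S4}. The character inventories
# below are illustrative assumptions, not the real lexicons.
EYES = "≧≦ ゚̄ToO^`´"   # candidate eye characters (EL, ER)
MOUTHS = "▽Дo_▼ω"      # candidate mouth characters (M)

KINESICS = re.compile(
    "(?P<S1>.*?)"        # S1: area outside the left border
    "(?P<B1>[(])"        # B1: left border
    "(?P<S2>.*?)"        # S2: area between border and left eye
    f"(?P<EL>[{EYES}])"  # EL: left eye
    f"(?P<M>[{MOUTHS}])" # M: mouth
    f"(?P<ER>[{EYES}])"  # ER: right eye
    "(?P<S3>.*?)"        # S3: area between right eye and border
    "(?P<B2>[)])"        # B2: right border
    "(?P<S4>.*)"         # S4: area outside the right border
)

m = KINESICS.match("~\\(≧▽≦)/~")
print(m.groupdict())
# {'S1': '~\\', 'B1': '(', 'S2': '', 'EL': '≧', 'M': '▽',
#  'ER': '≦', 'S3': '', 'B2': ')', 'S4': '/~'}
```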
4.2 Affect Mapping

To fit the Chinese language context, we adopted the 7-type emotion categorization schema suggested by Chen (2009).
– The schema classifies emotions into happiness, sadness, fear, disgust, anger, surprise, and love.
Since the original tags in the online dictionaries were often redundant and/or ambiguous, we manually mapped each tag onto one of the 7 categories.
As shown in Figure 2, happiness was the emotion most frequently expressed by Internet users. Positive emoticons (happiness and love) accounted for 51% and negative ones for 49%, indicating a balanced distribution of emotion valence.
4.2 Affect Mapping

Furthermore, we separately mapped the {EL M ER} triplets and the {S1}, {S2}, {S3}, {S4}, {EL}, {M}, and {ER} components onto affect categories, because the same component can be used to express different emotions, and new emoticons may be created from these components.
– An example is shown in the table below.
– In this case, the triplet "゚Д゚" would be assigned "Happiness (1), Sadness (1), Anger (1), Surprise (1)" as its affect categories.
– Affect category frequencies are calculated accordingly.

Triplet | Happiness | Sadness | Anger | Surprise
゚Д゚ | ( o゚Д゚o ) | (ll゚д゚) | (#゚Д゚) | Σ(゚Д゚ノ)
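A minimal sketch of this component-level affect counting, restating the table's example (the triplet is assumed to be already segmented out of each dictionary emoticon):

```python
from collections import Counter, defaultdict

# Hedged sketch of affect mapping for shared components. The
# (triplet, tag) pairs restate the table's example; in the real
# system they would come from segmenting the 11,988-entry lexicon.
segmented = [
    ("゚Д゚", "happiness"),  # from ( o゚Д゚o )
    ("゚Д゚", "sadness"),    # from (ll゚д゚)
    ("゚Д゚", "anger"),      # from (#゚Д゚)
    ("゚Д゚", "surprise"),   # from Σ(゚Д゚ノ)
]

triplet_affects = defaultdict(Counter)
for triplet, tag in segmented:
    triplet_affects[triplet][tag] += 1

print(triplet_affects["゚Д゚"])
# Counter({'happiness': 1, 'sadness': 1, 'anger': 1, 'surprise': 1})
```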
4.3 Emoticon Extraction

The extraction procedure first searches the emoticon lexicon, then the triplet lexicon, then tries to assemble a triplet from individual components, and finally uses the borders and a regular expression to pick up the remaining areas:

read(input)
// 1. Search the emoticon lexicon
output = emoticon_lexicon.find(input)
if output != null
    return output
end if
// 2. Search the triplet lexicon
triplet = triplet_lexicon.find(input)
// 3. If both the emoticon and triplet lexicons failed, search the
//    individual component lexicons and try to constitute a triplet
if triplet == null
    do
        (EL, M, ER) = individual_kinesics_component_lexicon.find(input)
        triplet = EL + M + ER
    until input.contains(triplet) or lexicon is fully traversed
end if
// 4. Localize the borders, then use a regular expression to
//    extract the remaining components
(B1, B2) = localize_borders(input)
regex = "(.*)" + B1 + "(.*)" + triplet + "(.*)" + B2 + "(.*)"
(S1, S2, S3, S4) = regex.match(input)
// return the assembled emoticon {S1 B1 S2 EL M ER S3 B2 S4}
output = S1 + B1 + S2 + triplet + S3 + B2 + S4
return output
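A hedged Python rendering of this procedure, with toy lexicons and round-bracket borders (component-level triplet assembly is omitted, and the lexicons are placeholder assumptions, not the real dictionary-derived resources):

```python
import re

# Hedged sketch of the extraction procedure; toy lexicons only.
emoticon_lexicon = {"(T_T)", "(#゚Д゚)"}
triplet_lexicon = {"T_T", "゚Д゚", "≧▽≦"}

def extract(text):
    # 1. Exact search against the emoticon lexicon
    for emoticon in emoticon_lexicon:
        if emoticon in text:
            return emoticon
    # 2. Otherwise look for a known triplet
    #    (assembly from individual components is omitted here)
    triplet = next((t for t in triplet_lexicon if t in text), None)
    if triplet is None:
        return None
    # 3. Localize borders and pick up S1-S4 with a regex
    pattern = re.compile(
        r"(.*?)([(])(.*?)" + re.escape(triplet) + r"(.*?)([)])(.*)"
    )
    m = pattern.search(text)
    if m is None:
        return None
    s1, b1, s2, s3, b2, s4 = m.groups()
    return s1 + b1 + s2 + triplet + s3 + b2 + s4

# In practice, separating S1/S4 from ordinary surrounding text
# needs extra heuristics; here the input is the emoticon alone.
print(extract("~\\(≧▽≦)/~"))  # ~\(≧▽≦)/~
```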
4.4 Emoticon Classification

// triplet and S1 to S4 are inherited from the extraction procedure
// 1. Read the affect category frequencies for the triplet and S1 to S4
(happiness[0], sadness[0], fear[0], disgust[0], anger[0], surprise[0], love[0]) = affect_category_frequency_for_triplets(triplet)
for i = 1 to 4
    (happiness[i], sadness[i], fear[i], disgust[i], anger[i], surprise[i], love[i]) = affect_category_frequency_for_si(S[i])
end for
// 2. Get the largest frequency value, and infer the affect category
//    for the entire emoticon
largest_frequency = max(happiness[0~4], sadness[0~4], fear[0~4], disgust[0~4], anger[0~4], surprise[0~4], love[0~4])
for i = 0 to 4
    if largest_frequency == happiness[i]
        return "happiness"
    else if ... (other categories omitted)
    else if largest_frequency == love[i]
        return "love"
    end if
end for
4.4 Emoticon Classification

For example, the emoticon "o(* ̄▽ ̄*)o" is first segmented into its semantic components:
Triplet: " ̄▽ ̄", S1: "o", S2: "*", S3: "*", S4: "o"
Then the affect category frequencies for these components are fetched from the database:

Component | Position | Happiness | Sadness | Fear | Disgust | Anger | Surprise | Love
 ̄▽ ̄ | Triplet | 0.64 | 0.03 | – | 0 | 0.01 | 0.03 | 0.26
o | S1 | 0.38 | 0.16 | 0.04 | 0.02 | 0.18 | 0.01 | 0.21
* | S2 | 0.36 | 0.06 | 0.16 | 0.01 | 0.05 | 0.04 | 0.32
* | S3 | 0.37 | 0.07 | 0.16 | 0.01 | 0.05 | 0.04 | 0.30
o | S4 | 0.49 | 0.11 | 0.04 | 0.01 | 0.13 | 0.03 | 0.19

The largest frequency over all components is 0.64 (happiness, from the triplet), so the emoticon is classified as happiness.
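A minimal runnable sketch of this argmax rule over the table above. The triplet's fear value was not recoverable and is assumed to be 0.03 so that the row sums to 1.00, like the other rows; this assumption does not affect the outcome:

```python
# Sketch of the classification rule, using the worked example's
# frequency table for "o(* ̄▽ ̄*)o".
CATEGORIES = ["happiness", "sadness", "fear", "disgust",
              "anger", "surprise", "love"]

# Rows: triplet " ̄▽ ̄", then S1="o", S2="*", S3="*", S4="o".
# The triplet's fear cell (0.03) is an assumption (row sums to 1.00).
FREQUENCIES = [
    [0.64, 0.03, 0.03, 0.00, 0.01, 0.03, 0.26],  # triplet  ̄▽ ̄
    [0.38, 0.16, 0.04, 0.02, 0.18, 0.01, 0.21],  # S1 "o"
    [0.36, 0.06, 0.16, 0.01, 0.05, 0.04, 0.32],  # S2 "*"
    [0.37, 0.07, 0.16, 0.01, 0.05, 0.04, 0.30],  # S3 "*"
    [0.49, 0.11, 0.04, 0.01, 0.13, 0.03, 0.19],  # S4 "o"
]

# The single largest frequency over all components labels the emoticon.
category, frequency = max(
    ((CATEGORIES[j], value)
     for row in FREQUENCIES
     for j, value in enumerate(row)),
    key=lambda pair: pair[1],
)
print(category, frequency)  # happiness 0.64
```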
5. Test-bed and Evaluations

– Research test-bed
– System evaluation of extraction
– System evaluation of classification
5.1 Research Test-bed

We collected comments and replies on food- and sports-related topics from two large, well-known Chinese video sites (bilibili.tv and acfun.tv).
– Users upload their self-made videos to these websites.
– The sites are popular among Chinese youngsters.
– We expected to obtain a corpus relatively rich in emoticons.
In total, 1,003,244 comments were collected.
5.1 Research Test-bed

We manually annotated 2,000 comments, among which 985 contained emoticons. To create a test dataset for evaluation, each comment containing an emoticon was manually tagged with one of the 7 affect categories.
– Otherwise, the comment was annotated as containing no emoticon.
(Manual data collection was used to complement automatic data crawling whenever needed.)

Comment example | English translation | Tag
现在跳到这里还没有喘气受不了的感觉,运动锻炼万岁 ~\(≧▽≦)/~ | Danced this far without feeling short of breath. Hooray sports and exercise! ~\(≧▽≦)/~ | Happiness
第一次完整做完 妈蛋明天跑不动了怎么办 | Finished the full course for the first time. What should I do if I have no strength to run tomorrow | None
5.2 Evaluation of Emoticon Extraction

We evaluated our system's emoticon extraction performance by comparing its results with the manual annotations on the test dataset.

Precision | Recall | F-measure
99.4% | 90.9% | 95.0%

The 95.0% F-measure indicates that our system is more effective at extracting emoticons from Chinese text than the system of Tanaka et al. (2005), which reported an 86.1% F-measure.
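For reference, the F-measure is the harmonic mean of precision and recall; a quick check reproduces the reported value:

```python
# F-measure as the harmonic mean of precision and recall
precision, recall = 0.994, 0.909
f_measure = 2 * precision * recall / (precision + recall)
print(f"{f_measure:.1%}")  # 95.0%
```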
5.2 Evaluation of Emoticon Extraction

The high precision indicates that emoticons mostly consist of distinctive characters that are rarely used in ordinary text.
– The remaining 0.6%, however, shows that some lexicon entries are not always used as emoticons.
– E.g., "00" and "==" could be misinterpreted as two eyes, while they actually appear in "100" or as an equals sign.
The lower recall (90.9%) indicates that the system may be weak in dealing with some emoticons, especially those containing visually similar but different characters.
– E.g., "`∀`" vs. "´∀`", and "^◯^" vs. "^O^"
5.3 Evaluation of Classification

We compared our system with two benchmark classification systems used in prior research:
– Benchmark 1: classification based on the unigram model, following Yamada et al. (2007). Each character is considered regardless of the position in which it appears. For example:
P("T_T" | "happiness") = P("T" | "happiness") × P("_" | "happiness") × P("T" | "happiness")
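A minimal sketch of such a position-free unigram scorer; the training pairs, vocabulary size, and add-one smoothing are illustrative assumptions, not Yamada et al.'s actual setup:

```python
from collections import Counter
from math import prod

# Hedged sketch of Benchmark 1: a position-free unigram model
# with toy training pairs and add-one smoothing.
training = [("(T_T)", "sadness"), ("(^o^)", "happiness"),
            ("(ToT)", "sadness"), ("(^_^)", "happiness")]

char_counts = {}
for emoticon, label in training:
    char_counts.setdefault(label, Counter()).update(emoticon)

def score(emoticon, label, vocab_size=100):
    counts = char_counts[label]
    total = sum(counts.values())
    # P(emoticon|label) = product over characters, position ignored
    return prod((counts[c] + 1) / (total + vocab_size)
                for c in emoticon)

for label in ("sadness", "happiness"):
    print(label, score("T_T", label))  # sadness scores higher
```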
5.3 Evaluation of Classification

– Benchmark 2: classification based only on {EL M ER} triplets, following Ptaszynski et al. (2010). This benchmark ignores the kinesics components {S1}{S2}{S3}{S4} whenever a triplet is found. For example, when dealing with "Σ(゚Д゚ノ)", the system reports the category frequencies of the triplet "゚Д゚" as a substitute for the whole emoticon, without considering the components "Σ" and "ノ".
5.3 Evaluation of Classification

The classification procedure deals only with the extracted emoticon itself, without considering the surrounding text. Two datasets were used for the classification evaluation:
– The lexicon set of 11,988 emoticons
– The test set of 985 emoticons
Each item in both sets has a real affect category (tagged or annotated). Each system infers the affect category of an item, which is then compared to the real category. Accuracy is calculated as the proportion of correct inferences.

Emoticon | Real category | Benchmark 1 | Benchmark 2 | Our system
(#゚Д゚) | Anger | Happiness | Surprise | Anger
o( `Д´*)o | Anger | Happiness | – | Anger
(T_T) | Sadness | – | – | –
5.3 Evaluation of Classification

The emoticon classification tasks were performed on both the lexicon dataset and the manually annotated test dataset. Paired t-tests were conducted to compare classification accuracy.

Accuracy | Benchmark 1 | Benchmark 2 | Our system
Lexicon dataset | 49.6% | 69.5% | 88.5%
Test dataset | 40.3% | 62.0% | 73.1%

T-test value (H0: our system = benchmark) | Benchmark 1 | Benchmark 2 | Our system
Lexicon dataset | 57.98** | 12.00** | –
Test dataset | 6.18** | 2.42* | –

*: p-value < 0.05; **: p-value < 0.01
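To illustrate the mechanics of such a paired comparison, here is a hedged sketch using scipy; the 0/1 correctness vectors are synthetic stand-ins drawn to match the reported accuracies, since the paper's per-item results are not available here:

```python
import numpy as np
from scipy.stats import ttest_rel

# Synthetic per-emoticon correctness indicators (0/1), sized to the
# 985-item test set; these are illustrative, not the actual results.
rng = np.random.default_rng(0)
n = 985
ours = (rng.random(n) < 0.731).astype(float)       # ~73.1% correct
benchmark = (rng.random(n) < 0.620).astype(float)  # ~62.0% correct

t_stat, p_value = ttest_rel(ours, benchmark)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```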
5.3 Evaluation of Classification

The results show that our system significantly outperformed both benchmarks on both datasets.
The relatively low accuracy of Benchmark 1 suggests that a pure unigram model is insufficient for describing the structure of emoticons.
– E.g., the "o" in "o_o" and in "^o^" occupies different positions (eye vs. mouth) and thus expresses different affects: the first is a surprised face, whereas the second is a happy face.
The relatively low accuracy of Benchmark 2 suggests that the additional kinesics components {S1}, {S2}, {S3}, and {S4} should be taken into account in addition to triplets.
– E.g., "Σ(゚Д゚ノ)" is a surprised face, "(#゚Д゚)" is an angry face, and "ヾ(゚Д゚;)" is a disgusted face. The triplets in these emoticons are identical ("゚Д゚"), so the additional components "Σ", "ノ", "#", "ヾ", and ";" account for their affect inclinations.
6. Conclusions and Future Directions

In this study, we used a novel kinesics model with affect mapping to perform emoticon analysis on Chinese health and fitness topics. We developed an emoticon analysis system to extract and classify emoticons, which may shed light on the feelings and moods hidden beneath the text.
– Comments from Chinese video websites were used for empirical tests.
6. Conclusions and Future Directions

This study examined only text-based emoticons, not the image-based emoticons that are also popular.
– Handling image-based emoticons would require image recognition technology, but might also help resolve the system's weakness in recognizing visually similar characters.
– E.g., "^◯^" vs. "^O^"
More advanced machine learning methods can also be applied once the structure of emoticons is more deeply understood.
References

Cao, Z., & Ye, J. (2009, November). Attention Savings and Emoticons Usage in BBS. In Proceedings of the Fourth International Conference on Computer Sciences and Convergence Information Technology (ICCIT'09) (pp. 416-419). IEEE.
Chen, J. (2009). The Construction and Application of Chinese Emotion Word Ontology. Master's Thesis, Dalian University of Technology, China.
Chiu, K. C. (2007). Explorations in the Effect of Emoticon on Negotiation Process from the Aspect of Communication. Master's Thesis, Department of Information Management, National Sun Yat-sen University, Taiwan.
Derks, D., Bos, A. E., & Grumbkow, J. V. (2007). Emoticons and Social Interaction on the Internet: The Importance of Social Context. Computers in Human Behavior, 23(1), 842-849.
Ekman, P. (1999). Basic Emotions. Handbook of Cognition and Emotion, 98, 45-60.
Face-mark Party: http://www.facemark.jp/facemark.htm
Jia, S., Di, S., & Fan, T. (2013). Text Sentiment Analysis Model Based on Emoticons and Emotional Words. Journal of the Hebei Academy of Sciences, 30(2), 11-15.
References

Kaomoji-café: http://kaomojicafe.jp/
Kaomoji Paradise: http://kaopara.net/
Kaomoji Station: http://kaosute.net/jisyo/kanjou.shtml
Kaomojisyo: http://matsucon.net/material/dic/
Kaomoji-toshokan: http://www.kaomoji.com/kao/text/
Kaomojiya: http://kaomojiya.com/
Nakamura, J., Ikeda, T., Inui, N., & Kotani, Y. (2003, October). Learning Face Marks for Natural Language Dialogue Systems. In Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering (pp. 180-185). IEEE.
Poongodi, S., & Radha, N. (2013). Classification of User Opinions from Tweets Using Machine Learning Techniques. International Journal of Advanced Research in Computer Science and Software Engineering, 3(9).
Ptaszynski, M., Maciejewski, J., Dybala, P., Rzepka, R., & Araki, K. (2010). CAO: A Fully Automatic Emoticon Analysis System Based on Theory of Kinesics. IEEE Transactions on Affective Computing, 1(1), 46-59.
References

Read, J. (2005, June). Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification. In Proceedings of the ACL Student Research Workshop (pp. 43-48). Association for Computational Linguistics.
Rezabek, L. L., & Cochenour, J. J. (1998). Visual Cues in Computer-Mediated Communication: Supplementing Text with Emoticons. Journal of Visual Literacy, 18(2).
Suzuki, N., & Tsuda, K. (2006, January). Express Emoticons Choice Method for Smooth Communication of E-business. In Knowledge-Based Intelligent Information and Engineering Systems (pp. 296-302). Springer Berlin Heidelberg.
Tanaka, Y., Takamura, H., & Okumura, M. (2005, January). Extraction and Classification of Facemarks. In Proceedings of the 10th International Conference on Intelligent User Interfaces (pp. 28-34). ACM.
Urabe, Y., Rafal, R., & Araki, K. (2013, September). Emoticon Recommendation for Japanese Computer-Mediated Communication. In Proceedings of the Seventh International Conference on Semantic Computing (ICSC) (pp. 25-31). IEEE.
References

Walther, J. B., & D'Addario, K. P. (2001). The Impacts of Emoticons on Message Interpretation in Computer-Mediated Communication. Social Science Computer Review, 19(3), 324-347.
Wolf, A. (2000). Emotional Expression Online: Gender Differences in Emoticon Use. CyberPsychology & Behavior, 3(5), 827-833.
Yamada, T., Tsuchiya, S., Kuroiwa, S., & Ren, F. (2007, August). Classification of Facemarks Using N-gram. In Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering (pp. 322-327). IEEE.
Yang, C., Lin, K. H., & Chen, H. H. (2007, November). Emotion Classification Using Web Blog Corpora. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (pp. 275-278). IEEE.
Thank you! Q & A