How Useful are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings
Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San Pedro

Published at WWW'10
Presented by Hyewon Lim, 19 June 2015

Outline
• Introduction
• Data
• Sentiment Analysis of Rated Comments
• Predicting Comment Ratings
• Comment Ratings and Polarizing YouTube Content
• Category Dependencies of Ratings
• Conclusion and Future Work

Introduction
• YouTube
  ‒ Traffic: more than 20% of all web traffic and 10% of all Internet traffic
  ‒ Accounts for 60% of the videos watched online
• Social tools on YouTube help users to
  ‒ filter relevant opinions
  ‒ skip offensive or inappropriate comments

Introduction
• Can we predict the community feedback for comments?
• Is there a connection between sentiment and comment ratings?
• Can comment ratings be an indicator for polarizing content?
• Do comment ratings and sentiment depend on the topic of the discussed content?

Data
• Collect 756 keyword queries
  ‒ From Google's Zeitgeist archive
  ‒ Remove inappropriate queries (e.g., "windows update")
• Collect information for each video (2009)
  ‒ The first 500 comments, with authors, timestamps, and comment ratings
  ‒ Metadata: title, tags, category, description, upload date, and statistics
  ‒ Statistics: overall number of comments, views, and star ratings
• Final size
  ‒ 67,290 videos
  ‒ About 6.1 million comments
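
To make the collected fields concrete, here is a minimal Python sketch of one possible record layout. All class and field names are hypothetical; only the list of fields and the 500-comment cap come from the slide, and the order in which comments arrive (which decides what "first" means) is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The slide stores "the first 500 comments" per video; this cap mirrors that.
MAX_COMMENTS_PER_VIDEO = 500


@dataclass
class Comment:
    author: str
    timestamp: str  # hypothetical format, e.g. an ISO-8601 string
    rating: int     # community rating; can be negative


@dataclass
class Video:
    # Metadata fields listed on the slide (field names are hypothetical)
    title: str
    tags: list[str]
    category: str
    description: str
    upload_date: str
    # Statistics fields listed on the slide
    num_comments: int  # overall number of comments reported for the video
    num_views: int
    star_rating: float
    comments: list[Comment] = field(default_factory=list)

    def add_comment(self, comment: Comment) -> bool:
        """Store a comment unless the per-video cap is already reached."""
        if len(self.comments) >= MAX_COMMENTS_PER_VIDEO:
            return False
        self.comments.append(comment)
        return True


def corpus_size(videos: list[Video]) -> tuple[int, int]:
    """Return (number of videos, number of stored comments)."""
    return len(videos), sum(len(v.comments) for v in videos)
```

Note that `num_comments` (what YouTube reports) and `len(comments)` (what was stored) can differ once the cap is hit.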

Sentiment Analysis of Rated Comments
Do comment language and sentiment have an influence on comment rating?
• WordNet
  ‒ Thesaurus containing textual descriptions of terms and relationships between terms
  ‒ [Figure: WordNet example with the terms Vehicle, Car, Automobile]
• SentiWordNet
  ‒ A lexical resource built on top of WordNet
  ‒ Each entry carries a triple of senti values (pos, neg, obj)
    ◦ e.g., good = (0.875, 0.0, 0.125), ill = (0.25, 0.375, 0.375)
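
A small sketch of how SentiWordNet-style triples could be turned into a per-comment sentiment profile. Only the entries for "good" and "ill" are taken from the slide; the averaging scheme, the whitespace tokenizer, and the function names are illustrative assumptions, not the paper's procedure.

```python
# SentiWordNet-style (pos, neg, obj) triples; values copied from the slide.
LEXICON = {
    "good": (0.875, 0.0, 0.125),
    "ill": (0.25, 0.375, 0.375),
}


def check_triple(triple, tol=1e-9):
    """In SentiWordNet the three scores of an entry sum to 1."""
    return abs(sum(triple) - 1.0) <= tol


def comment_sentiment(text, lexicon=LEXICON):
    """Average (pos, neg, obj) over the comment's terms found in the lexicon.

    Returns None if no term is covered. Tokenization is a naive lowercase
    whitespace split (an assumption, not the paper's preprocessing).
    """
    hits = [lexicon[t] for t in text.lower().split() if t in lexicon]
    if not hits:
        return None
    n = len(hits)
    return tuple(sum(h[i] for h in hits) / n for i in range(3))
```

With NLTK installed and its `sentiwordnet` corpus downloaded, real scores can be read from `nltk.corpus.sentiwordnet` (`senti_synsets(word)` yields entries with `pos_score()`, `neg_score()` and `obj_score()`); how to choose or combine the senses of an ambiguous word is a separate design decision.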

Sentiment Analysis of Rated Comments
• SentiWordNet-based analysis of terms
  ‒ Terms corresponding to negatively rated comments tend towards higher negative senti-value assignments

Sentiment Analysis of Rated Comments
• Sentiment analysis of ratings
  ‒ Intuition: the choice of terms can provoke strong reactions of approval or denial and therefore determine the final rating score
  ‒ [Figure: rating axis marked −5, 0, 5 with the comment subsets 5Neg, 0Dist, 5Pos]
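
The comparison suggested by the figure can be sketched as follows: split comments into the 5Neg, 0Dist and 5Pos subsets and compare the average negative senti value of their terms. The thresholds (≤ −5, = 0, ≥ 5) are an assumption read off the axis labels, and the two-entry lexicon is the slide's example rather than a real resource.

```python
from statistics import mean

# (pos, neg, obj) triples from the slide's example; not a full lexicon.
LEXICON = {"good": (0.875, 0.0, 0.125), "ill": (0.25, 0.375, 0.375)}


def partition_by_rating(comments):
    """Split (text, rating) pairs into the three subsets named in the figure.

    Assumed thresholds: 5Neg = rating <= -5, 0Dist = rating == 0,
    5Pos = rating >= 5. All other comments are left out.
    """
    subsets = {"5Neg": [], "0Dist": [], "5Pos": []}
    for text, rating in comments:
        if rating <= -5:
            subsets["5Neg"].append(text)
        elif rating == 0:
            subsets["0Dist"].append(text)
        elif rating >= 5:
            subsets["5Pos"].append(text)
    return subsets


def mean_term_score(texts, lexicon, index):
    """Mean score (index 0 = pos, 1 = neg, 2 = obj) over all covered terms."""
    scores = [
        lexicon[t][index]
        for text in texts
        for t in text.lower().split()
        if t in lexicon
    ]
    return mean(scores) if scores else None
```

If the intuition holds, `mean_term_score(subsets["5Neg"], LEXICON, 1)` should come out higher than the same value for `5Pos`.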

Predicting Comment Ratings
• Can we predict community acceptance?
  ‒ Categorize comments as likely to obtain a high overall rating or not
• Term-based representations of comments
• Support vector machine (SVM) classification
• Settings considered
  ‒ Different levels of restrictiveness (distinct thresholds): comments rated above +t vs. below −t, for t = 2, 5, 7
  ‒ Different amounts of randomly chosen training comments (accepted/unaccepted): T = 1000, 10000, 50000, …
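
A dependency-free sketch of this setup: label comments with a threshold t, represent them as term-frequency vectors, and train a linear SVM. The slide does not say which SVM implementation was used; this swaps in the Pegasos stochastic sub-gradient solver (basic variant, no bias term, no projection step) purely for illustration, and all function names are hypothetical.

```python
import random
from collections import Counter


def label(rating, t):
    """+1 = accepted (rating above +t), -1 = unaccepted (below -t), None = dropped."""
    if rating > t:
        return 1
    if rating < -t:
        return -1
    return None


def featurize(text):
    """Term-frequency vector as a sparse dict (naive whitespace tokenization)."""
    return Counter(text.lower().split())


def dot(w, x):
    return sum(w.get(term, 0.0) * value for term, value in x.items())


def train_pegasos(data, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty, no bias) trained with Pegasos.

    data: list of (feature dict, label) pairs, label in {+1, -1}.
    """
    rng = random.Random(seed)
    w = {}
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)        # Pegasos learning rate
            x, y = data[i]
            violates = y * dot(w, x) < 1    # margin check uses the old w
            for term in w:                  # shrink: w <- (1 - eta * lam) * w
                w[term] *= 1.0 - eta * lam
            if violates:                    # sub-gradient step on the hinge loss
                for term, value in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * value
    return w


def predict(w, text):
    return 1 if dot(w, featurize(text)) >= 0 else -1
```

In practice a library solver would be the usual choice, for example scikit-learn's `CountVectorizer` with `LinearSVC`; varying `t` and the number of training comments corresponds to the two settings listed on the slide.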

Comment Ratings and Polarizing YouTube Content
1. Variance of comment ratings as an indicator for polarizing videos
  ‒ User evaluation
    ◦ Rank videos by the variance of their comment ratings; take the top-50 and bottom-50
    ◦ Put these 100 videos into random order
    ◦ Evaluated by 5 users on a 3-point Likert scale (3: polarizing, 2: in between, 1: rather neutral)
  ‒ Mean user rating for top-ranked videos: … / bottom-ranked videos: 1.25
  ⇨ Polarizing videos tend to trigger more diverse comment rating behavior
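
The selection procedure can be sketched like this. Population variance, the seeded shuffle, and all names are assumptions; the slide only states that the top-50 and bottom-50 videos by variance were shuffled and judged by 5 users on a 3-point scale.

```python
import random
from statistics import mean, pvariance


def rating_variance(ratings):
    """Population variance of a video's comment ratings; None if fewer than 2 ratings."""
    return pvariance(ratings) if len(ratings) >= 2 else None


def build_evaluation_set(videos, k=50, seed=0):
    """Pick the k highest- and k lowest-variance videos and shuffle them together.

    videos: dict mapping a video id to its list of comment ratings.
    Returns (shuffled ids for the evaluators, top-k ids, bottom-k ids).
    """
    scored = [(vid, rating_variance(r)) for vid, r in videos.items()]
    scored = [(vid, var) for vid, var in scored if var is not None]
    if len(scored) < 2 * k:
        raise ValueError("need at least 2*k videos with two or more ratings")
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = [vid for vid, _ in scored[:k]]
    bottom = [vid for vid, _ in scored[-k:]]
    shuffled = top + bottom
    random.Random(seed).shuffle(shuffled)  # hide the ranking from the evaluators
    return shuffled, top, bottom


def mean_likert(judgments, ids):
    """Mean 3-point Likert score (3 = polarizing, 1 = rather neutral) over the given ids.

    judgments: dict mapping a video id to the scores given by the evaluators.
    """
    return mean(score for vid in ids for score in judgments[vid])
```

Comparing `mean_likert(judgments, top)` with `mean_likert(judgments, bottom)` gives the two numbers reported on the slide.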

Comment Ratings and Polarizing YouTube Content
2. Variance of comment ratings as an indicator for polarizing topics
  ‒ 1,413 tags occurring in at least 50 videos
  ‒ User evaluation
    ◦ Mean user rating for tags in the top-100: 1.53 / bottom-100: 1.16
  ⇨ Tags corresponding to polarizing topics tend to be connected to more diverse comment rating behavior
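
The slide does not say how a tag's variance is derived from its videos. One hypothetical choice, sketched below, averages the per-video rating variances over all videos carrying the tag and keeps only tags that occur in at least 50 videos.

```python
from collections import defaultdict
from statistics import mean, pvariance


def tag_polarization_scores(videos, min_videos=50):
    """Rank tags by a (hypothetical) rating-diversity score.

    videos: iterable of (tags, comment_ratings) pairs.
    A tag's score is the mean of the per-video rating variances over the
    videos carrying it; tags found in fewer than min_videos videos are dropped.
    Returns (tag, score) pairs, most diverse first.
    """
    occurrences = defaultdict(int)
    variances = defaultdict(list)
    for tags, ratings in videos:
        var = pvariance(ratings) if ratings else None  # no comments, no variance
        for tag in set(tags):
            occurrences[tag] += 1
            if var is not None:
                variances[tag].append(var)
    scores = [
        (tag, mean(variances[tag]))
        for tag, count in occurrences.items()
        if count >= min_videos and variances[tag]
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Pooling all comment ratings of a tag's videos into one sample would be an equally plausible alternative; it weights heavily commented videos more.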

Category Dependencies of Ratings
[Figure: the categories News & Politics, Sports, and Science, with the questions "Comments? Discussions? Feedback?"]

Category Dependencies of Ratings
• Classification

Category Dependencies of Ratings
• Analysis of comment ratings for different categories
  ‒ Intuition: some topics are more prone to generate intense discussions than others
    ◦ Science videos: the majority of comments are rated 0
    ◦ Politics videos: comments are rated more negatively
    ◦ Music videos: comments are rated more positively
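
Observations like these can be checked by tabulating, per category, the mean rating and the shares of zero, negative, and positive ratings. The sketch below is illustrative; its input format and names are assumptions.

```python
from collections import defaultdict
from statistics import mean


def category_rating_summary(comments):
    """comments: iterable of (category, rating) pairs.

    Returns {category: {"n", "mean", "share_zero", "share_neg", "share_pos"}}.
    """
    by_cat = defaultdict(list)
    for cat, rating in comments:
        by_cat[cat].append(rating)
    summary = {}
    for cat, rs in by_cat.items():
        n = len(rs)
        summary[cat] = {
            "n": n,
            "mean": mean(rs),
            "share_zero": sum(r == 0 for r in rs) / n,
            "share_neg": sum(r < 0 for r in rs) / n,
            "share_pos": sum(r > 0 for r in rs) / n,
        }
    return summary
```

A "majority of 0-scored comments" for Science would show up as `share_zero > 0.5` for that category.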

Category Dependencies of Ratings
• Analysis of comment ratings for different categories (cont.)
  ‒ Test whether the differences in rating scores across categories are statistically significant
    ◦ One-way ANOVA, followed by a Games-Howell post hoc test
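
For reference, a stdlib-only sketch of the two statistics named on the slide: the one-way ANOVA statistic F = (SS_between / (k − 1)) / (SS_within / (N − k)) for k groups and N ratings in total, and the pairwise Games-Howell statistic with Welch-Satterthwaite degrees of freedom. It stops short of p-values, which need the F and studentized range distributions; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA; groups is a list of rating samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def games_howell(groups):
    """Pairwise Games-Howell statistics {(i, j): (q, df)}.

    q  = |mean_i - mean_j| / sqrt((s_i^2/n_i + s_j^2/n_j) / 2)
    df = Welch-Satterthwaite degrees of freedom
    Each group needs at least two values; the two groups of a pair must not
    both have zero variance.
    """
    stats = [(mean(g), variance(g) / len(g), len(g)) for g in groups]
    result = {}
    for i, j in combinations(range(len(groups)), 2):
        m_i, a_i, n_i = stats[i]
        m_j, a_j, n_j = stats[j]
        q = abs(m_i - m_j) / sqrt((a_i + a_j) / 2)
        df = (a_i + a_j) ** 2 / (a_i ** 2 / (n_i - 1) + a_j ** 2 / (n_j - 1))
        result[(i, j)] = (q, df)
    return result
```

`scipy.stats.f_oneway(*groups)` should return the same F value together with its p-value, and recent SciPy versions provide `scipy.stats.studentized_range`, whose `sf(q, k, df)` gives the Games-Howell p-value for a pair when there are k groups.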

Conclusion and Future Work
• In-depth analysis of YouTube comments
  ‒ Different aspects of comment ratings on the YouTube platform
  ‒ Automatically determining the community acceptance of comments
  ‒ Rating behavior can often be connected to polarizing topics and content
• Future work
  ‒ Temporal aspects
  ‒ Additional stylistic and linguistic features
  ‒ User relationships
  ‒ Techniques for aggregating information obtained from comments and ratings
• Application
  ‒ Comment search