Modeling User Interactions in Social Media Eugene Agichtein Emory University
Outline
User-generated content
Community Question Answering
Contributor authority
Content quality
Asker satisfaction
Open problems
Trends in search and social media
Search in the East: heavily influenced by social media (Naver, Baidu Knows, TaskCn, …)
Search in the West: social media mostly indexed/integrated in search repositories
Two opposite trends in social media search:
– Moving towards point relevance (answers, knowledge search)
– Moving towards browsing experience, subscription/push model
How to integrate “active” engagement and contribution with “passive” viewing of content?
Social Media Today
Published: 4Gb/day; social media: 10Gb/day; page views: Gb/day
Technorati+Blogpulse: ~120M blogs, ~2M posts/day
Twitter (since 11/07): ~2M users, ~3M msgs/day
Facebook/Myspace: M users, average 19 min/day
Yahoo! Answers: 90M users, ~20M questions, ~400M answers
[From Andrew Tomkins/Yahoo!, SSM 2008 Keynote]
People Helping People
Naver: popularity reportedly exceeds web search
Yahoo! Answers: some users answer thousands of questions daily – and get a t-shirt
Open, “quirky”; information shared, not “sold”
Unlike Wikipedia:
– Chatty threads: opinions, support, validation
– No core group of moderators to enforce “quality”
Where is the nearest car rental to Carnegie Mellon University?
Successful Search
Give up on “magic”: look up the CMU address/zipcode, then issue the Google Maps query “car rental near:5000 Forbes Avenue Pittsburgh, PA 15213”
Total time: 7-10 minutes of active “work”
Someone must know this…
+0 minutes: 11pm
+1 minute
+7 hours: perfect answer
Why would one wait hours? Rational thinking: effective use of time
– Unique information need
– Subjective/normative question
– Complex
– Human contact/community
– Multiple viewpoints
Challenges in ____ing Social Media
Estimating contributor expertise
Estimating content quality
Inferring user intent
Predicting satisfaction: general, personalized
Matching askers with answerers
Searching archives
Detecting spam
Work done in collaboration with: Qi Guo, Yandong Liu, Abulimiti Aji, Jiang Bian, Pawel Jurczyk
Thanks: Prof. Hongyuan Zha; Yahoo! Research: ChaTo Castillo, Gilad Mishne, Aris Gionis, Debora Donato, Ravi Kumar
Related Work
Adamic et al., WWW 2007, WWW 2008 – expertise sharing, network structure
Kumar et al.: information diffusion in blogspace
Harper et al., CHI 2008: answer quality
Leskovec et al.: cascades, preferential attachment models
Glance & Hurst: blogging
Kraut et al.: community participation and retention
SSM 2008 Workshop (Searching Social Media)
Elsas et al., blog search, ICWSM 2008
Estimating Contributor Authority
[Diagram: bipartite graph linking questions and answers to users; askers act as hubs, answerers as authorities]
P. Jurczyk and E. Agichtein, Discovering Authorities in Question Answer Communities Using Link Analysis (poster), CIKM 2007
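The link-analysis idea can be sketched with a few lines of HITS-style power iteration over an asker-to-answerer graph. The edge list below is invented for illustration, not real Yahoo! Answers data; variable and user names are mine.

```python
# Minimal HITS-style power iteration over an asker -> answerer graph.
# Each edge (u, v) means user u asked a question that user v answered.
# Hub scores capture askers; authority scores capture answerers.

def hits(edges, iters=50):
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # Authority: sum of hub scores of the askers whose questions you answered.
        auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub: sum of authority scores of the answerers your questions attracted.
        hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth

edges = [("asker1", "expert"), ("asker2", "expert"),
         ("asker1", "casual"), ("asker3", "expert")]
hub, auth = hits(edges)
best = max(auth, key=auth.get)  # the user who answered the most askers
```

With this toy graph, "expert" (answering three distinct askers) receives the highest authority score, matching the intuition behind the approach.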
Finding Authorities: Results
Qualitative Observations
[Examples where HITS is effective vs. where HITS is ineffective]
Trolls
Estimating Content Quality
E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne, Finding High Quality Content in Social Media, WSDM 2008
Question quality features, from all subsets:
– UQV: Average number of “stars” to questions by the same asker
– Punctuation density in the question’s subject
– The question’s category (assigned by the asker)
– “Normalized clickthrough”: the number of clicks on the question thread, normalized by the average number of clicks for all questions in its category
– UAV: Average number of “thumbs up” received by answers written by the asker of the current question
– Number of words per sentence
– UA: Average number of answers with references (URLs) given by the asker of the current question
– UQ: Fraction of questions asked by the asker in which he opens the question’s answers to voting (instead of picking the best answer by hand)
– UQ: Average length of the questions by the asker
– UAV: The number of “best answers” authored by the user
– U: The number of days the user was active in the system
– UAV: “Thumbs up” received by the answers written by the asker of the current question, minus “thumbs down”, divided by total number of “thumbs” received
– “Clicks over views”: the number of clicks on a question thread divided by the number of times the question thread was retrieved as a search result (see [2])
– The KL-divergence between the question’s language model and a model estimated from a collection of questions answered by the Yahoo editorial team (available in
Answer quality features:
– Answer length
– The number of words in the answer with a corpus frequency larger than c
– UAV: The number of “thumbs up” minus “thumbs down” received by the answerer, divided by the total number of “thumbs” s/he has received
– The entropy of the trigram character-level model of the answer
– UAV: The fraction of answers of the answerer that have been picked as best answers (either by the askers of such questions, or by community voting)
– The unique number of words in the answer
– U: Average number of abuse reports received by the answerer over all his/her questions and answers
– UAV: Average number of abuse reports received by the answerer over his/her answers
– The non-stopword word overlap between the question and the answer
– The Kincaid [21] score of the answer
– QUA: The average number of answers received by the questions asked by the asker of this answer
– The ratio between the length of the question and the length of the answer
– UAV: The number of “thumbs up” minus “thumbs down” received by the answerer
– QUAV: The average number of “thumbs” received by the answers to other questions asked by the asker of this answer
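Several of the surface features above are straightforward to compute directly from the text and votes. A sketch, assuming a toy stopword list; the function name, feature keys, and example values are mine, not from the paper:

```python
import re

# Illustrative stopword list; the paper's actual list is not specified here.
STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "it"}

def answer_features(question, answer, thumbs_up, thumbs_down):
    """Compute a handful of the surface features listed above."""
    q_words = set(re.findall(r"\w+", question.lower())) - STOPWORDS
    a_words = re.findall(r"\w+", answer.lower())
    total_thumbs = thumbs_up + thumbs_down
    return {
        "answer_length": len(a_words),                       # answer length in words
        "unique_words": len(set(a_words)),                   # unique word count
        "qa_overlap": len(q_words & (set(a_words) - STOPWORDS)),  # non-stopword overlap
        "qa_length_ratio": len(question) / max(len(answer), 1),   # question/answer length ratio
        "thumbs_ratio": (thumbs_up - thumbs_down) / total_thumbs if total_thumbs else 0.0,
    }

f = answer_features(
    "What is the difference between chemotherapy and radiation?",
    "Chemotherapy uses drugs, radiation uses high-energy beams.",
    thumbs_up=8, thumbs_down=2)
```

These feed a standard classifier alongside the user-history (U*) and clickthrough features.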
Rating Dynamics
Editorial Quality != Popularity != Usefulness
Yahoo! Answers: Time to Fulfillment
Time to close a question (hours) for sample question categories:
1. FIFA World Cup
2. Optical
3. Poetry
4. Football (American)
5. Scottish Football (Soccer)
6. Medicine
7. Winter Sports
8. Special Education
9. General Health Care
10. Outdoor Recreation
Predicting Asker Satisfaction
Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community.
– “Satisfied”: the asker has closed the question AND selected the best answer AND rated the best answer >= 3 “stars”
– Else, “Unsatisfied”
Yandong Liu, Jiang Bian
Y. Liu, J. Bian, and E. Agichtein, Predicting Information Seeker Satisfaction in Community Question Answering, SIGIR 2008
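The “satisfied” definition above is mechanical enough to state directly in code; a minimal sketch (function and argument names are mine):

```python
def is_satisfied(closed_by_asker, chose_best_answer, best_answer_stars):
    """Label a question 'satisfied' per the definition above:
    the asker closed the question, picked a best answer, and
    rated it at least 3 stars. Anything else is 'unsatisfied'."""
    return closed_by_asker and chose_best_answer and best_answer_stars >= 3

a = is_satisfied(True, True, 4)   # satisfied
b = is_satisfied(True, True, 2)   # unsatisfied: rating below 3 stars
c = is_satisfied(True, False, 5)  # unsatisfied: best answer not chosen by asker
```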
Motivation
Save time: don’t bother to post
Suggest a good forum for the information need
Notify the user when a satisfactory answer is contributed
From “relevance” to information-need fulfillment
Explicit ratings from asker & community
ASP: Asker Satisfaction Prediction
[Diagram: a classifier combines feature groups – question text, category, answerer history, asker history, answers – plus external resources (Wikipedia, news) to predict whether the asker is satisfied or not satisfied]
Datasets
Crawled from Yahoo! Answers in early 2008 (thanks, Yahoo!)
Questions: 216,170 | Answers: 1,963,615 | Askers: 158, | Categories: | % Satisfied:
Available at
Dataset Statistics
Category | #Q | #A | #A per Q | % Satisfied | Avg asker rating | Time to close by asker
2006 FIFA World Cup(TM) | | | | % | | minutes
Mental Health | | | | % | | day and 13 hours
Mathematics | | | | % | | minutes
Diet & Fitness | | | | % | | days
Asker satisfaction varies by category
#Q, #A, time to close, … → asker satisfaction
Satisfaction Prediction: Human Judges
Truth: asker’s rating; a random sample of 130 questions
Researchers:
– Agreement: 0.82, F1: 0.45
Amazon Mechanical Turk:
– Five workers per question
– Agreement: 0.9, F1:
– Best when at least 4 out of 5 raters agree
ASP vs. Humans (F1)
Classifier | With Text | Without Text | Selected Features
ASP_SVM
ASP_C4.5
ASP_RandomForest
ASP_Boosting 0.67
ASP_NB
Best Human Perf 0.61
Baseline (naïve) 0.66
ASP is significantly more effective than humans
Human F1 is lower than the naïve baseline!
Features by Information Gain
– Q: Asker’s previous rating
– Q: Average past rating by asker
– UH: Member since (interval)
– UH: Average # answers for past Qs
– UH: Previous Q resolved for the asker
– CA: Average rating for the category
– UH: Total number of answers received
– CA: Average voter rating
– Q: Question posting time
– CA: Average # answers per Q
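Rankings like the one above come from computing information gain of each feature against the binary satisfaction label. A toy sketch with made-up data (the feature and label vectors are invented for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Information gain of a discrete feature w.r.t. the labels:
    H(labels) minus the entropy remaining after splitting on the feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# Toy data: "asker's previous rating was high" vs. the satisfaction label.
prev_rating_high = [1, 1, 1, 0, 0, 0]
satisfied        = [1, 1, 1, 0, 0, 1]
gain = info_gain(prev_rating_high, satisfied)  # positive: the feature is informative
```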
“Offline” vs. “Online” Prediction
Offline prediction:
– All features (question, answer, asker & category)
– F1: 0.77
Online prediction:
– No answer features
– Only asker history and question features (stars, #comments, sum of votes, …)
– F1: 0.74
Feature Ablation (Precision, Recall, F1)
– Selected features
– No question-answer features
– No answerer features
– No category features
– No asker features
– No question features
Asker & question features are most important.
Answer quality, answerer expertise, and category characteristics may matter less: caring or supportive answers are often preferred.
Satisfaction: Varying by Asker Experience
Group together questions from askers with the same number of previous questions
Accuracy of prediction increases dramatically, reaching an F1 of 0.9 for askers with >= 5 questions
Personalized Prediction of Asker Satisfaction
Same information != same usefulness for different users!
A personalized classifier achieves surprisingly good accuracy (even with just 1 previous question!)
The simple strategy of grouping users by the number of previous questions is more effective than other methods for users with a moderate amount of history
For users with >= 20 questions, textual features are more significant
Some Personalized Models
Satisfaction Prediction When Grouping Users by “Age”
Self-Selection: First Experience Is Crucial
[Plots: days as member vs. rating; # previous questions vs. rating]
Summary
Asker satisfaction is predictable
– Can achieve higher-than-human accuracy by exploiting interaction history
User’s experience is important
General model: one-size-fits-all
– 2,000 questions are enough to train the model
Personalized satisfaction prediction:
– Helps with sufficient data (>= 1 previous interaction; text patterns become observable with >= 20 previous interactions)
Problems
Sparsity: most users post only a single question
Cold-start problem
CF: individualized content, no (visible) rating history
– Cf. Digg: ratings are public
Subjective information needs
Subjectivity in CQA
How can we exploit the structure of CQA to improve question classification?
Case study: question subjectivity prediction
– Subjective: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?
– Objective: What is the difference between chemotherapy and radiation treatments?
B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, EMNLP 2008
Dataset Statistics (~1000 questions)
[Chart: counts of objective vs. subjective questions]
Key Observations
Analysis of real questions in CQA is challenging:
– Typically complex and subjective
– Can be ill-phrased and vague
– Not enough annotated data
Idea: can we utilize the inherent structure of CQA interactions, and use unlabeled CQA data, to improve classification performance?
Natural Approach: Co-Training
Introduced in: Combining Labeled and Unlabeled Data with Co-Training, Blum and Mitchell, 1998
Two views of the data (e.g., content and hyperlinks in web pages):
– Provide complementary information
– Iteratively construct additional labeled data
Questions and Answers: Two Views
Example:
– Q: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?
– A: My mom has one as she is diabetic so its important for her to monitor it she finds it useful.
Answers usually match/fit the question (“My mom… she finds…”)
Askers can usually identify matching answers by selecting the “best answer”
CoCQA: A Co-Training Framework over Questions and Answers
[Diagram: two classifiers, C_Q over question text and C_A over answer text, are trained on the labeled data; each classifies the unlabeled data, the most confident predictions are added to the training pool, performance is checked on a held-out validation set, and the loop repeats until a stopping criterion is met]
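A toy version of the co-training loop, with a tiny bag-of-words scorer standing in for the SVM base learner actually used; all data, class names, and hyperparameters below are invented for illustration:

```python
from collections import Counter

class TinyTextClassifier:
    """Bag-of-words stand-in for the SVM base learner:
    scores a text by normalized word overlap with each class."""
    def fit(self, texts, labels):
        self.counts = {}
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(text.lower().split())
        return self
    def predict(self, text):
        """Return (label, confidence margin between the top two classes)."""
        words = text.lower().split()
        scores = {label: sum(c[w] for w in words) / sum(c.values())
                  for label, c in self.counts.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        margin = scores[ranked[0]] - (scores[ranked[1]] if len(ranked) > 1 else 0.0)
        return ranked[0], margin

def cocqa(questions, answers, labels, unlabeled, rounds=2, k=1):
    """Each round: train one classifier per view, pick the k most
    confidently labeled unlabeled (question, answer) pairs across
    both views, and add them to the shared training pool."""
    q_pool, a_pool, y_pool = list(questions), list(answers), list(labels)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        cq = TinyTextClassifier().fit(q_pool, y_pool)
        ca = TinyTextClassifier().fit(a_pool, y_pool)
        preds = []
        for i, (q, a) in enumerate(pool):
            for label, margin in (cq.predict(q), ca.predict(a)):
                preds.append((margin, i, label))
        preds.sort(reverse=True)  # most confident first
        chosen = set()
        for _, i, label in preds:
            if len(chosen) >= k:
                break
            if i not in chosen:
                chosen.add(i)
                q_pool.append(pool[i][0]); a_pool.append(pool[i][1]); y_pool.append(label)
        pool = [p for i, p in enumerate(pool) if i not in chosen]
    return TinyTextClassifier().fit(q_pool, y_pool)

labeled_q = ["do you think this is worth getting", "what is the capital of france"]
labeled_a = ["i think yes it is great", "the capital is paris"]
labels = ["subjective", "objective"]
unlabeled = [("what do you think about this phone", "i think it is great"),
             ("what year did the war end", "the war ended in 1945")]
model = cocqa(labeled_q, labeled_a, labels, unlabeled)
```

This simplification pools the pseudo-labeled examples for both views at once; the full framework also uses the validation set to decide when to stop.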
Results Summary
Method | Question features | Question + Best Answer features
Supervised | |
GE | (-0.7%) | (+3.2%)
CoCQA | (+1.9%) | (+7.2%)
CoCQA for varying amounts of labeled data
Summary
User-generated content:
– Growing
– Important: impact on mainstream media, scholarly publishing, …
– Can provide insight into information seeking and social processes
– “Training” data for IR, machine learning, NLP, …
– Need to re-think quality, impact, usefulness
Current Work
Intelligently route a question to “good” answerers
Improve web search ranking by incorporating CQA data
“Cost” models for CQA-based question processing vs. other methods
Dynamics of user feedback
Discourse analysis
Takeaways
People specify their information need fully when they know humans are on the other end
The next generation of search must cope with complex, subjective, and personal information needs
To move beyond relevance, we must be able to model user satisfaction
CQA generates rich data that allows us (and other researchers) to study user satisfaction, interactions, and intent for real users
Estimating contributor expertise [CIKM 2007] Estimating content quality [WSDM 2008] Inferring asker intent [EMNLP 2008] Predicting satisfaction [SIGIR 2008, ACL 2008] Matching askers with answerers Searching CQA archives [WWW 2008] Coping with spam [AIRWeb 2008] Thank you!
Backup Slides
Question-Answer Features
Q: length, posting time, …
QA: length, KL divergence
Q: votes
Q: terms
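The KL-divergence feature can be sketched as follows. The short reference string stands in for the collection of editorially answered questions mentioned earlier, and the add-epsilon smoothing scheme is my choice, not necessarily the one used in the paper:

```python
import math
from collections import Counter

def kl_divergence(text, reference_text, smooth=1e-6):
    """KL divergence between a unigram language model of `text`
    and one estimated from `reference_text`, with add-epsilon
    smoothing over the joint vocabulary."""
    p = Counter(text.lower().split())
    q = Counter(reference_text.lower().split())
    vocab = set(p) | set(q)
    n_p, n_q = sum(p.values()), sum(q.values())
    kl = 0.0
    for w in vocab:
        pw = (p[w] + smooth) / (n_p + smooth * len(vocab))
        qw = (q[w] + smooth) / (n_q + smooth * len(vocab))
        kl += pw * math.log(pw / qw)
    return kl

# A question close to the reference model diverges less than slang-heavy text.
ref = "what is the best way to learn python programming"
near = kl_divergence("what is the best way to learn java", ref)
far = kl_divergence("lol omg u guys r crazy", ref)
```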
User Features
U: member since
U: total points
U: #questions
U: #answers
Category Features
CA: Average time to close a question
CA: Average # answers per question
CA: Average asker rating
CA: Average voter rating
CA: Average # questions per hour
CA: Average # answers per hour
Category | #Q | #A | #A per Q | % Satisfied | Avg asker rating | Time to close by asker
General Health | | | | % | | day and 13 hours
Backup slides
Prediction Methods
Heuristic: # answers
Baseline: guess the majority class (satisfied)
ASP (our system):
– ASP_SVM: with the SVM classifier
– ASP_C4.5: with the C4.5 classifier
– ASP_RandomForest: with the RandomForest classifier
– ASP_Boosting: with the AdaBoost algorithm combining weak learners
– ASP_NaiveBayes: with the Naive Bayes classifier
– …
Satisfaction Prediction: Human Performance (Cont’d): Amazon Mechanical Turk
Methodology:
– Used the same 130 questions
– For each question, list the best answer, as well as the other four answers ordered by votes
– Five independent raters for each question
– Agreement: 0.9, F1:
– Best accuracy achieved when at least 4 out of 5 raters predicted the asker to be “satisfied” (otherwise, labeled “unsatisfied”)
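The aggregation rule described above, predicting “satisfied” only when at least 4 of the 5 workers say so, is simple to state in code; a sketch with hypothetical ratings:

```python
def aggregate_turk(ratings, threshold=4):
    """Aggregate five binary worker judgments (1 = predicted satisfied):
    output 'satisfied' only when at least `threshold` workers agree,
    otherwise 'unsatisfied'."""
    return "satisfied" if sum(ratings) >= threshold else "unsatisfied"

strong = aggregate_turk([1, 1, 1, 1, 0])  # 4 of 5 agree
weak = aggregate_turk([1, 1, 1, 0, 0])    # only 3 of 5 agree
```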
Some Results
Details of CoCQA Implementation
Base classifier: LibSVM
Term frequency as term weight
– Also tried binary, TF*IDF
Select top K examples with highest confidence
– Margin value in SVM
Feature Set
Character 3-grams: has, any, nyo, yon, one, …
Words: Has, anyone, got, mom, she, finds, …
Words with character 3-grams
Word n-grams (n <= 3, i.e. w_i, w_i w_i+1, w_i w_i+1 w_i+2): Has anyone got, anyone got one, she finds it, …
Word and POS n-grams (n <= 3, i.e. w_i, w_i w_i+1, w_i POS_i+1, POS_i w_i+1, POS_i POS_i+1, etc.): NP VBP, She PRP, VBP finds, …
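The character and word n-gram features can be extracted with a few lines of code; a sketch whose example strings echo the slide’s examples (POS n-grams would additionally require a part-of-speech tagger, omitted here):

```python
def char_ngrams(text, n=3):
    """Overlapping character n-grams, lowercased."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def word_ngrams(text, max_n=3):
    """All word n-grams up to length max_n: w_i, w_i w_i+1, ..."""
    words = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams

chars = char_ngrams("has anyone")
grams = word_ngrams("Has anyone got one")
```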