Modeling Information Seeking Behavior in Social Media Eugene Agichtein Intelligent Information Access Lab (IRLab)

Eugene Agichtein, Emory University, IR Lab
Research areas:
- Modeling information seeking behavior
- Web search and social media search
- Text and data mining for medical informatics and public health
Students: Qi Guo (3rd-year PhD), Yandong Liu (2nd-year PhD), Ablimit Aji (2nd-year PhD); 1st-year graduate students: Julia Kiseleva, Dmitry Lagun, Qiaoling Liu, Wang Yu
In collaboration with:
- Beth Buffalo (Neurology)
- Charlie Clarke (Waterloo)
- Ernie Garcia (Radiology)
- Phil Wolff (Psychology)
- Hongyuan Zha (GaTech)

Online Behavior and Interactions
- Information sharing: blogs, forums, discussions
- Search logs: queries, clicks
- Client-side behavior: gaze tracking, mouse movement, scrolling

Research Overview
- Information sharing
- Health Informatics
- Cognitive Diagnostics
- Intelligent search
Discover Models of Behavior (machine learning/data mining)

Key Challenges for Web Search
- Query interpretation (infer intent)
- Ranking (high dimensionality)
- Evaluation (system improvement)
- Result presentation (information visualization)

Contextualized Intent Inference
- SERP text
- Mouse trajectory, hovering/dynamics
- Scrolling
- Clicks

Research Intent

Purchase Intent

Relationship between behavior and intent?
- Search intent is contextualized within a search session
- Implication 1: model session-level state
- Implication 2: improve detection based on client-side interactions

Model: Linear-Chain CRF
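The slide names the model but gives no details. A linear-chain CRF labels each step of a session while scoring both per-step evidence and label-to-label transitions, which is how it can carry session-level state. The sketch below shows only the decoding step (Viterbi) under stated assumptions. The two labels are hypothetical and borrowed from the "Research Intent" and "Purchase Intent" slides. The scores are made-up log-potentials, not learned weights. Training and the actual feature set are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical intent labels (assumption, taken from the slide titles).
LABELS = ["research", "purchase"]


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str] = LABELS,
) -> Tuple[List[str], float]:
    """Highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]:     log-potential of label y at session step t
                         (in a trained model, a weighted sum of query/click/mouse features)
    transitions[(a, b)]: log-potential of moving from label a to label b
    """
    if not emissions:
        return [], 0.0
    # best[y]: best score of any label prefix that ends in y at the current step
    best = {y: emissions[0][y] for y in labels}
    backptrs: List[Dict[str, str]] = []
    for step in emissions[1:]:
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in labels:
            prev, score = max(
                ((a, best[a] + transitions[(a, y)]) for a in labels),
                key=lambda pair: pair[1],
            )
            new_best[y] = score + step[y]
            ptr[y] = prev
        best = new_best
        backptrs.append(ptr)
    # Trace back from the best final label.
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Toy three-query session with made-up scores.
session = [
    {"research": 2.0, "purchase": 0.5},
    {"research": 0.8, "purchase": 1.0},
    {"research": 0.2, "purchase": 2.5},
]
trans = {("research", "research"): 0.5, ("research", "purchase"): 0.0,
         ("purchase", "research"): -1.0, ("purchase", "purchase"): 0.5}
print(viterbi(session, trans))  # → (['research', 'purchase', 'purchase'], 6.0)
```

The transition scores let evidence late in the session (a strong purchase signal at step 3) relabel an ambiguous earlier step. A per-query classifier cannot do that.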

Results: Ad Click Prediction
200%+ precision improvement (within a search mission)

Research Overview
- Information sharing
- Health Informatics
- Cognitive Diagnostics
- Intelligent search
Discover Models of Behavior (machine learning/data mining)

Finding Information Online (Revisited)
Next generation of search: algorithmically mediated information exchange
CQA (collaborative question answering):
- Realistic information exchange
- Searching archives
- Train NLP, IR, QA systems
- Study of social behavior, norms
- Content quality, asker satisfaction
Current and future work

Goal: Hybrid Human-Powered Search

Talk Outline
- Overview of the Emory IR Lab
- Intent-centric Web Search
- Classifying intent of a query
- Contextualized search intent detection

(Text) Social Media Today
- Published: 4 Gb/day
- Social Media: 10 Gb/day
- Technorati + Blogpulse: 120M blogs, 2M posts/day
- Twitter (since 11/07): 2M users, 3M msgs/day
- Facebook/Myspace: M users, avg 19 m/day
- Yahoo! Answers: 90M users, 20M questions, 400M answers
[Data from Andrew Tomkins, SSM 2008 keynote]
Yes, we could read your blog. Or, you could tell us about your day.

Total time: 7-10 minutes, active “work”

Someone must know this…

+1 minute

+7 hours: perfect answer

Update (2/15/2009)

Finding Information Online (Revisited)
Next generation of search: algorithmically mediated information exchange
CQA (collaborative question answering):
- Realistic information exchange
- Searching archives
- Train NLP, IR, QA systems
- Study of social behavior, norms
- Content quality, asker satisfaction
Current and future work

(Some) Related Work
- Adamic et al., WWW 2007, WWW 2008: expertise sharing, network structure
- Elsas et al., SIGIR 2008: blog search
- Glance et al.: BlogPulse, popularity, information sharing
- Harper et al., CHI 2008, 2009: answer quality across multiple CQA sites
- Kraut et al.: community participation
- Kumar et al., WWW 2004, KDD 2008, …: information diffusion in blogspace, network evolution
- SIGIR 2009 Workshop on Searching Social Media

Finding High-Quality Content in Social Media
Quality dimensions, as judged by professional editors:
- Well-written
- Interesting
- Relevant (answer)
- Factually correct
- Popular?
- Provocative?
- Useful?
E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, “Finding High Quality Content in Social Media,” WSDM 2008

Social Media Content Quality
E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne, “Finding High Quality Content in Social Media,” WSDM 2008

How do Question and Answer Quality relate?

Community

Link Analysis for Authority Estimation
[Figure: Users 1-6 linked through Questions 1-3 and Answers 1-6; an asker acts as a hub, an answerer as an authority]
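The figure casts askers as hubs and answerers as authorities, which is the HITS setting (see the CIKM 2007 reference on estimating contributor authority). The sketch below is a plain power-iteration HITS over a user graph under stated assumptions. It adds one directed edge (asker, answerer) per answer, so repeated answers to the same asker count multiple times. The user IDs are hypothetical, and the paper's exact graph construction and weighting may differ.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hits(edges: Iterable[Tuple[str, str]], iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """HITS over (asker, answerer) edges: askers act as hubs, answerers as authorities."""
    out_links = defaultdict(list)  # asker -> users who answered that asker's questions
    in_links = defaultdict(list)   # answerer -> askers whose questions they answered
    nodes = set()
    for asker, answerer in edges:
        out_links[asker].append(answerer)
        in_links[answerer].append(asker)
        nodes.update((asker, answerer))
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the askers this user answered.
        auth = {n: sum(hub[a] for a in in_links[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of the users who answered this asker.
        hub = {n: sum(auth[b] for b in out_links[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical toy graph: u3 answers questions from three different askers.
edges = [("u1", "u3"), ("u1", "u4"), ("u2", "u3"), ("u2", "u5"), ("u6", "u3")]
hub, auth = hits(edges)
print(max(auth, key=auth.get))  # → u3
```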

Qualitative Observations
[Figure: examples where HITS is effective and where HITS is ineffective]

Random forest classifier

Result 1: Identifying High Quality Questions

Top Features for Question Classification
- Asker popularity (“stars”)
- Punctuation density
- Question category
- Page views
- KL divergence from reference LM
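Two of these features are simple to compute. The sketch below shows one plausible definition of each. Punctuation density is taken here as the fraction of non-whitespace characters that are ASCII punctuation. The KL feature compares a unigram language model of the question against a reference unigram model. The tokenizer, the `epsilon` fallback for unseen words, and the function names are all assumptions for illustration. The WSDM 2008 paper's exact definitions and smoothing may differ.

```python
import math
import re
import string
from collections import Counter
from typing import Dict


def punctuation_density(text: str) -> float:
    """Fraction of non-whitespace characters that are ASCII punctuation (assumed definition)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in string.punctuation for c in chars) / len(chars)


def kl_from_reference(text: str, reference: Dict[str, float], epsilon: float = 1e-6) -> float:
    """KL(P_text || P_ref) in nats, with P_text a maximum-likelihood unigram model of `text`.

    Words missing from the reference get probability `epsilon`. This is crude smoothing,
    and the reference is not renormalized afterwards.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    total = len(tokens)
    kl = 0.0
    for word, count in Counter(tokens).items():
        p = count / total
        q = reference.get(word, epsilon)
        kl += p * math.log(p / q)
    return kl
```

A question whose word distribution matches the reference model scores near 0. Unusual vocabulary, such as misspellings, pushes the score up.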

Identifying High Quality Answers

Top Features for Answer Classification
- Answer length
- Community ratings
- Answerer reputation
- Word overlap
- Kincaid readability score
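The Kincaid score is presumably the Flesch-Kincaid grade level, 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below computes it alongside one possible word-overlap feature. Here word overlap is taken as the fraction of distinct question words that reappear in the answer. Three choices are assumptions: the vowel-run syllable counter (a rough heuristic), the sentence splitter, and that overlap definition.

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels (including y), at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    words = _words(text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def word_overlap(question: str, answer: str) -> float:
    """Fraction of distinct question words that also occur in the answer (assumed definition)."""
    q = set(_words(question))
    if not q:
        return 0.0
    return len(q & set(_words(answer))) / len(q)
```

For "The cat sat." (3 words, 1 sentence, 3 syllables) the grade is 1.17 + 11.8 − 15.59 = −2.62. Very short, simple text can yield negative grades.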

Finding Information Online (Revisited)
Next generation of search: human-machine-human
CQA: a case study in complex IR
- Content quality
- Asker satisfaction
- Understanding the interactions

Dimensions of “Quality”, as judged by the asker (or community):
- Well-written
- Interesting
- Relevant (answer)
- Factually correct
- Popular?
- Timely?
- Provocative?
- Useful?

Are Editor Labels “Meaningful” for CGC?
- Information seeking process: the user wants to find useful information about a topic despite incomplete knowledge (N. Belkin: “Anomalous states of knowledge”)
- Goal: model directly whether the user found satisfactory information
- Specific (amenable) case: CQA

Yahoo! Answers: The Good News
- Active community of millions of users in many countries and languages
- Effective for subjective information needs; a great forum for socialization/chat
- Can be invaluable for hard-to-find information not available on the web

Yahoo! Answers: The Bad News
- May have to wait a long time to get a satisfactory answer
- May never obtain a satisfying answer
[Figure: time to close a question (hours), by category: 1. FIFA World Cup, 2. Optical, 3. Poetry, 4. Football (American), 5. Soccer, 6. Medicine, 7. Winter Sports, 8. Special Education, 9. General Health Care, 10. Outdoor Recreation]

Predicting Asker Satisfaction
Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community.
- “Satisfied”: the asker has closed the question AND selected the best answer AND rated the best answer >= 3 “stars” (# not important)
- Else, “Unsatisfied”
Y. Liu, J. Bian, and E. Agichtein, SIGIR 2008
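The label definition above is a conjunction of three conditions, so it maps directly onto a predicate. The sketch below encodes it under stated assumptions. The record type and its field names are hypothetical, since the slide does not describe a data schema. The 3-star threshold comes from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionOutcome:
    # Hypothetical fields; the slide does not specify a data schema.
    closed_by_asker: bool
    best_answer_selected: bool
    best_answer_rating: Optional[int]  # asker's star rating of the chosen best answer


def is_satisfied(q: QuestionOutcome, min_stars: int = 3) -> bool:
    """'Satisfied' per the slide: closed AND best answer selected AND rated >= 3 stars."""
    return (
        q.closed_by_asker
        and q.best_answer_selected
        and q.best_answer_rating is not None
        and q.best_answer_rating >= min_stars
    )
```

Every other outcome, including a question that was never closed, falls into the "Unsatisfied" class.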

ASP: Asker Satisfaction Prediction
[Figure: Question, Answer, Asker History, Answerer History, Category, and Text features (with Wikipedia and News) feed a Classifier that outputs “asker is satisfied” or “asker is not satisfied”]

Experimental Setup: Data
Questions | Answers | Askers | Categories | % Satisfied
216,170 | 1,963,615 | 158,… | … | …%
Crawled from Yahoo! Answers in early 2008
“Anonymized” dataset available at:
1/2009: Yahoo! Webscope: “Comprehensive” Answers dataset, ~5M questions & answers

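Satisfaction by Topic
Topic | Questions | Answers | A per Q | Satisfied | Asker rating | Time to close by asker
2006 FIFA World Cup | … | … | … | …% | … | … minutes
Mental Health | … | … | … | …% | … | … days
Mathematics | … | … | … | …% | … | … minutes
Diet & Fitness | … | … | … | …% | … | … days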

Satisfaction Prediction: Human Judges
Truth: the asker’s rating; a random sample of 130 questions
- Researchers: agreement 0.82, F1 0.45 (F1 = 2P*R/(P+R))
- Amazon Mechanical Turk: five workers per question; agreement 0.9, F1 0.61; best when at least 4 out of 5 raters agree
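The F1 figures are computed for the "satisfied" class as 2PR/(P+R). The slide does not say how the five Mechanical Turk votes were combined. The sketch below assumes one plausible rule: predict "satisfied" only when at least 4 of the 5 workers vote that way. The helper names are hypothetical.

```python
from typing import List, Sequence


def f1_score(predicted: Sequence[bool], truth: Sequence[bool]) -> float:
    """F1 = 2PR/(P+R) for the positive ("satisfied") class."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(t and not p for p, t in zip(predicted, truth))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def aggregate_votes(votes: List[bool], min_agree: int = 4) -> bool:
    """Predict 'satisfied' only if at least `min_agree` workers voted satisfied (assumed rule)."""
    return sum(votes) >= min_agree
```

Raising `min_agree` makes the crowd label more conservative. That trades recall for precision on the satisfied class.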

Performance: ASP vs. Humans (F1, Satisfied)
Classifiers compared (columns: With Text, Without Text, Selected Features): ASP_SVM, ASP_C, ASP_RandomForest, ASP_Boosting, ASP_NB
- ASP_Boosting: 0.67
- Best human performance: 0.61
- Baseline (random): …
ASP is significantly more effective than humans.
Human F1 is lower than the random baseline!

Top Features by Information Gain
- 0.14 Q: Asker’s previous rating
- 0.14 Q: Average past rating by asker
- 0.10 UH: Member since (interval)
- 0.05 UH: Average # answers for past Q
- 0.05 UH: Previous Q resolved for the asker
- 0.04 CA: Average asker rating for category
- 0.04 UH: Total number of answers received
- …
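The ranking above uses information gain, IG(Y; X) = H(Y) − Σₓ P(X=x) H(Y | X=x), where Y is the satisfaction label and X is a feature. The sketch below computes it for a discrete feature, using entropy in bits; the slide does not state which log base its scores use. Continuous features such as "member since" would need to be binned first, and that binning is an assumption not shown here.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy of the label distribution, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum_x P(X=x) * H(Y | X=x) for a discrete (or pre-binned) feature."""
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[x].append(y)
    n = len(labels)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional
```

A feature that perfectly separates satisfied from unsatisfied questions gains the full H(Y). A feature independent of the label gains 0.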

“Offline” vs. “Online” Prediction
- Offline prediction (AFTER answers arrive): all features (question, answer, asker & category); F1: 0.77
- Online prediction (BEFORE question posted): NO answer features; only asker history and question features (stars, #comments, sum of votes…); F1: 0.74

Personalized Prediction of Satisfaction
- Same information != same usefulness for different searchers!
- Personalization vs. “Groupization”?
Y. Liu and E. Agichtein, “You've Got Answers: Personalized Models for Predicting Success in Community Question Answering,” ACL 2008

Example Personalized Models

Outline
Next generation of search: algorithmically mediated information exchange
CQA: a case study in complex IR
- Content quality
- Asker satisfaction

Current Work (in Progress)
- Partially supervised models of expertise (Bian et al., WWW 2009)
- Real-time CQA
- Sentiment, temporal sensitivity analysis
- Understanding social media dynamics

Answer Arrival

Exponential Decay Model [Lerman 2007]
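The slide cites the model without giving its form. One common exponential-decay formulation is shown here as an assumption; it is not necessarily Lerman's exact model. Answers arrive at rate λ(t) = λ₀·e^(−t/τ), so the expected number received by time t is N(t) = λ₀·τ·(1 − e^(−t/τ)), which levels off at λ₀·τ. The parameter names `rate0` and `tau` are illustrative.

```python
import math


def arrival_rate(t: float, rate0: float, tau: float) -> float:
    """Instantaneous answer-arrival rate lambda(t) = rate0 * exp(-t / tau) (assumed form)."""
    return rate0 * math.exp(-t / tau)


def expected_answers(t: float, rate0: float, tau: float) -> float:
    """Expected cumulative answers by time t: the integral of arrival_rate from 0 to t."""
    return rate0 * tau * (1.0 - math.exp(-t / tau))


# Example: 6 answers/hour initially, decay constant 2 hours.
# The total tends to 6 * 2 = 12 answers; by t = 2 hours about 7.59 have arrived.
print(expected_answers(2.0, rate0=6.0, tau=2.0))
```

In this form, τ captures how quickly a question drops out of view. The time-to-close differences across categories suggest τ varies by category.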

Factors Influencing Dynamics

Example: Answer Arrival | Category

Subjectivity

Answer, Rating Arrival

Preliminary Results: Modeling SM Dynamics for Real-Time Classification
Adapt social media dynamics models to classification, e.g., predict ratings → feature value:
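The slide's formula did not survive, so the following is only one way such a feature could work. It is not the method shown in the talk. Assume ratings arrive under an exponential-decay rate with time constant τ. Then by time t a fraction 1 − e^(−t/τ) of all ratings should have arrived. That suggests extrapolating the eventual total from the count observed so far and using the extrapolation as a real-time feature. The function name and the known-τ assumption are both hypothetical.

```python
import math


def extrapolated_total(observed: int, t: float, tau: float) -> float:
    """Estimate the eventual total count from `observed` events seen by time t.

    Assumes an exponential-decay arrival rate with time constant `tau`, under which
    a fraction 1 - exp(-t / tau) of all events has arrived by time t.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return observed / (1.0 - math.exp(-t / tau))


# Example: 5 ratings observed after one time constant (t = tau = 2 hours)
# extrapolates to roughly 7.91 ratings in total.
print(extrapolated_total(5, t=2.0, tau=2.0))
```

Early in a question's life the correction factor is large and noisy. Late in its life the feature converges to the raw observed count.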

Outline
Next generation of search: algorithmically mediated information exchange
CQA: a case study in complex IR
- Content quality
- Asker satisfaction
- Understanding social media dynamics

Question Urgency
Problem: a growing volume of questions competing for visibility
- Time-sensitive (urgent) questions are pushed out by newer questions
- Delayed responses may become useless to the seeker, which wastes site resources and responders’ time

Goal: Query Processing over Web and Social Systems

Takeaways
- Robust machine learning over behavior data → system improvements, insights into behavior
- Contextualized models for NLP and text mining → system improvements, insights into interactions
- Mining social media: potential for transformative impact on IR, sociology, psychology, medical informatics, public health, …

References
- Modeling web search behavior [SIGIR 2006, 2007]
- Estimating content quality [WSDM 2008]
- Estimating contributor authority [CIKM 2007]
- Searching CQA archives [WWW 2008, WWW 2009]
- Inferring asker intent [EMNLP 2008]
- Predicting satisfaction [SIGIR 2008, ACL 2008, TKDE]
- Coping with spam [AIRWeb 2008]
More information, datasets, papers, slides:

Thank you!
Yandex (for hosting my visit)
Supported by: