
Modeling User Interactions in Web Search and Social Media Eugene Agichtein Intelligent Information Access Lab Emory University

Intelligent Information Access Lab. Research areas: – Information retrieval & extraction, text mining, and information integration – User behavior modeling, social networks and interactions, social media. People: Walter Askew (EC '09), Qi Guo (2nd-year Ph.D.), Yandong Liu (2nd-year Ph.D.), Ryan Kelly (Emory '10), Alvin Grissom (2nd-year M.S.), Abulimiti Aji (1st-year Ph.D.), and colleagues at Yahoo! Research, Microsoft Research, Emory Libraries, Psychology, Emory School of Medicine, Neuroscience, and the Georgia Tech College of Computing.

3 User Interactions: The 3rd Dimension of the Web. The amount of interaction data exceeds web content and structure – Published: 4 GB/day; social media: 10 GB/day – Page views: 100 GB/day [Andrew Tomkins, Yahoo! Search, 2007]

Talk Outline. Web search interactions – Click modeling – Browsing. Social media – Content quality – User satisfaction – Ranking and filtering

Interpreting User Interactions. Clickthrough and subsequent browsing behavior of individual users is influenced by many factors – Relevance of a result to a query – Visual appearance and layout – Result presentation order – Context, history, etc. General idea: – Aggregate interactions across all users and queries – Compute "expected" behavior for any query/page – Recover the relevance signal for a given query

Case Study: Clickthrough. Clickthrough frequency for all queries in the sample: clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d)

Clickthrough for Queries with Known Position of Top Relevant Result. Relative clickthrough for queries with a known relevant result in position 1 or position 3, respectively: the top non-relevant document still receives higher clickthrough than the top relevant document.

Model Deviation from "Expected" Behavior. Relevance component = deviation from "expected" behavior: relevance(q, d) = observed(q, d) - expected(p)
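To make the aggregation concrete, here is a minimal Python sketch of this deviation model, assuming click logs are available as (query, url, position, clicked) records; all names are illustrative rather than taken from the original system.

```python
from collections import defaultdict

def expected_ctr_by_position(click_log):
    """Estimate the position prior: average clickthrough rate at each
    result position, aggregated over all queries and users."""
    clicks, views = defaultdict(float), defaultdict(float)
    for _query, _url, pos, clicked in click_log:
        views[pos] += 1.0
        clicks[pos] += 1.0 if clicked else 0.0
    return {pos: clicks[pos] / views[pos] for pos in views}

def relevance_deviation(click_log):
    """relevance(q, d) ~= observed clickthrough(q, d) - expected(position),
    i.e. how much a result is clicked above or below the position prior."""
    expected = expected_ctr_by_position(click_log)
    obs_clicks, obs_views, pos_of = defaultdict(float), defaultdict(float), {}
    for query, url, pos, clicked in click_log:
        key = (query, url)
        obs_views[key] += 1.0
        obs_clicks[key] += 1.0 if clicked else 0.0
        pos_of[key] = pos  # assumes a stable position per (query, url)
    return {key: obs_clicks[key] / obs_views[key] - expected[pos_of[key]]
            for key in obs_views}
```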

Predicting Result Preferences Task: predict pairwise preferences – A user will prefer Result A > Result B Models for preference prediction – Current search engine ranking – Clickthrough – Full user behavior model

Predicting Result Preferences: Granka et al., SIGIR 2005. SA+N: "Skip Above" and "Skip Next" – Adapted from Joachims et al. [SIGIR '05] – Motivated by gaze tracking. Example: clicks on results 2 and 4 – Skip Above: 4 > (1, 3), 2 > 1 – Skip Next: 4 > 5, 2 > 3

Our Extension: Use Click Distribution. CD: distributional model, extends SA+N – A clickthrough is considered only if its frequency exceeds the expected frequency by more than ε. Example: the click on result 2 is likely "by chance", so we derive 4 > (1, 2, 3, 5), but not 2 > (1, 3).
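A minimal sketch of how such distribution-aware preferences could be extracted, assuming per-query click frequencies and per-position expected frequencies have already been aggregated; the epsilon value and helper names are illustrative, and the preference rules follow the SA+N skip strategies with the CD filter applied first.

```python
def cd_preferences(results, click_freq, expected, epsilon=0.1):
    """Click-distribution (CD) preference extraction, illustrative sketch.

    results:    result ids in ranked order (position 1 first)
    click_freq: observed click frequency for each result of this query
    expected:   expected click frequency for each 1-based position
    A click is trusted only if it exceeds the expected frequency for its
    position by more than epsilon; a trusted result is then preferred over
    every result ranked above it and over the result immediately below it.
    """
    prefs = []
    for i, doc in enumerate(results):
        pos = i + 1
        if click_freq.get(doc, 0.0) - expected.get(pos, 0.0) <= epsilon:
            continue                      # click likely "by chance"
        for other in results[:i]:         # skip-above style preferences
            prefs.append((doc, other))
        if i + 1 < len(results):          # skip-next style preference
            prefs.append((doc, results[i + 1]))
    return prefs
```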

Results: Click Deviation vs. Skip Above+Next

Problem: users click based on result summaries ("captions" or "snippets"). Effect of Caption Features on Clickthrough Inversions: C. Clarke, E. Agichtein, S. Dumais, R. White, SIGIR 2007

Clickthrough Inversions

Relevance is Not the Dominant Factor!

Snippet Features Studied

Feature Importance

Important Words in Snippet

Summary. Clickthrough inversions are a powerful tool for assessing the influence of caption features. Relatively simple caption features can significantly influence user behavior. Accounting for this summary bias can help predict relevance from clickthrough more accurately.

20 Idea: go beyond clickthrough/download counts. Behavior features by group:
Presentation: ResultPosition (position of the URL in the current ranking); QueryTitleOverlap (fraction of query terms in the result title)
Clickthrough: DeliberationTime (seconds between the query and the first click); ClickFrequency (fraction of all clicks landing on the page); ClickDeviation (deviation from the expected click frequency)
Browsing: DwellTime (result page dwell time); DwellTimeDeviation (deviation from the expected dwell time for the query)
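For reference, the feature groups above could be carried around as a simple record type; this is only an illustrative container, with field names mirroring the slide.

```python
from dataclasses import dataclass

@dataclass
class BehaviorFeatures:
    # Presentation
    result_position: int         # position of the URL in the current ranking
    query_title_overlap: float   # fraction of query terms in the result title
    # Clickthrough
    deliberation_time: float     # seconds between the query and the first click
    click_frequency: float       # fraction of all clicks landing on this page
    click_deviation: float       # deviation from expected click frequency
    # Browsing
    dwell_time: float            # result page dwell time
    dwell_time_deviation: float  # deviation from expected dwell time for the query
```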

User Behavior Model. Full set of interaction features – Presentation, clickthrough, browsing. Train the model with explicit judgments – Input: behavior feature vectors for each query-page pair in rated results – Use RankNet (Burges et al., ICML 2005) to discover model weights – Output: a neural net that can assign a "relevance" score to a behavior feature vector

RankNet for User Behavior. RankNet: a general, scalable, robust neural-net training algorithm and implementation. Optimized for ranking – predicts an ordering of items, not a score for each. Trains on pairs (where the first point is to be ranked higher than or equal to the second) – Extremely efficient – Uses a cross-entropy cost (probabilistic model) – Uses gradient descent to set weights – Restarts to escape local minima

RankNet [Burges et al. 2005]: for query results 1 and 2, present the pair of feature vectors and labels, with label(1) > label(2). Feature Vector 1 and Label 1 produce NN output 1.

RankNet [Burges et al. 2005]: Feature Vector 2 and Label 2 produce NN output 2, so both results of the pair are now scored.

RankNet [Burges et al. 2005]: the error is a function of both outputs (we desire output 1 > output 2).

RankNet [Burges et al. 2005]: update the feature weights – cost function: f(o1 - o2), details in the Burges et al. paper – modified back-propagation.
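A compact numpy sketch of that pairwise training step, with a linear scorer standing in for the neural net and the target probability fixed at 1 (the first item of each pair is preferred); the cost is the cross-entropy form from Burges et al., but the simplified gradient update here is an assumption, not the original implementation.

```python
import numpy as np

def ranknet_cost_and_grad(w, x1, x2):
    """Pairwise RankNet cost for one preference pair (x1 should outrank x2).

    With a linear scorer f(x) = w . x and target P = 1, the cost is
        C = log(1 + exp(-(f(x1) - f(x2)))).
    Returns the cost and its gradient with respect to w.
    """
    diff = np.dot(w, x1) - np.dot(w, x2)
    cost = np.log1p(np.exp(-diff))
    lam = -1.0 / (1.0 + np.exp(diff))   # dC/d(diff)
    grad = lam * (x1 - x2)              # chain rule through the linear scorer
    return cost, grad

def train_pairwise(pairs, dim, lr=0.1, epochs=20, seed=0):
    """Gradient descent over preference pairs [(x_preferred, x_other), ...]."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=dim)
    for _ in range(epochs):
        for x1, x2 in pairs:
            _, grad = ranknet_cost_and_grad(w, x1, x2)
            w -= lr * grad
    return w
```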

Predicting with RankNet: present an individual feature vector and get a score (NN output).

28 Example results: Predicting User Preferences. Baseline < SA+N < CD << UserBehavior. Rich user behavior features result in a dramatic improvement.

How to Use Behavior Models for Ranking? Use interactions from previous instances of the query – General-purpose (not personalized) – Only for queries with past user interactions. Models: – Rerank, clickthrough only: reorder results by number of clicks – Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences – Integrate directly into the ranker: incorporate user interactions as features for the ranker

Enhance Ranker Features with User Behavior Features For a given query – Merge original feature set with user behavior features when available – User behavior features computed from previous interactions with same query Train RankNet [Burges et al., ICML’05] on the enhanced feature set

Feature Merging: Details. Value scaling: – Binning vs. log-linear vs. linear (e.g., μ=0, σ=1). Missing values: – use 0? (what does 0 mean for features normalized so that μ=0?). Runtime: significant plumbing problems. Example (query "SIGIR", fake results with fake feature values): columns Result URL, BM25, PageRank, …, Clicks, DwellTime, …; rows sigir2007.org (behavior features missing: "??"), sigir2006.org, acm.org/sigs/sigir/ (1.22, …).
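A hypothetical sketch of the merging step, assuming content features are always present and behavior features exist only for URLs with past interactions for the query; z-scoring and zero-filling are one plausible reading of the scaling and missing-value questions raised above, not the choices actually made in the original system.

```python
import numpy as np

def merge_features(content_feats, behavior_feats, behavior_dim):
    """Append user-behavior features to the original ranker features.

    content_feats:  dict url -> np.array of content/link features (always present)
    behavior_feats: dict url -> np.array of behavior features (only for URLs
                    with past interactions for this query)
    Behavior features are z-scored (mu=0, sigma=1) over the URLs that have them,
    and missing ones are zero-filled, so 0 roughly means "no deviation from
    average" rather than an arbitrary value.
    """
    observed = np.array(list(behavior_feats.values()))
    if len(observed):
        mu = observed.mean(axis=0)
        sigma = observed.std(axis=0) + 1e-9
    merged = {}
    for url, cf in content_feats.items():
        if url in behavior_feats:
            bf = (behavior_feats[url] - mu) / sigma
        else:
            bf = np.zeros(behavior_dim)   # no past interactions for this URL
        merged[url] = np.concatenate([cf, bf])
    return merged
```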

Evaluation Metrics. Precision at K: fraction of relevant results in the top K. NDCG at K: normalized discounted cumulative gain – top-ranked results matter most. MAP: mean average precision – the average precision for a query is the mean of the precision-at-K values computed after each relevant document is retrieved.
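For concreteness, minimal Python versions of these metrics (using the common log2 discount for DCG; other gain/discount variants exist):

```python
import math

def precision_at_k(ranked_rels, k):
    """Fraction of relevant documents in the top k (ranked_rels: list of 0/1)."""
    return sum(ranked_rels[:k]) / k

def ndcg_at_k(ranked_gains, k):
    """Normalized discounted cumulative gain at k (gains may be graded)."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

def average_precision(ranked_rels):
    """Mean of precision@K taken at the rank of each relevant document."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

# MAP is the mean of average_precision over all queries in the test set.
```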

Content, User Behavior: NDCG BM25 < Rerank-CT < Rerank-All < +All

Full Search Engine, User Behavior: NDCG, MAP. MAP: RN 0.270; RN+All improves 19.13% over RN; BM25+All improves 23.71% over BM25.

User Behavior Complements Content and Web Topology. RN (content + links): 0.632; RN + All (user behavior): +10%; BM25 + All: +31% over BM25.

Which Queries Benefit Most Most gains are for queries with poor ranking

Result Summary. Incorporating user behavior into web search ranking dramatically improves relevance. Providing rich user-interaction features directly to the ranker is the most effective strategy. Large improvements are shown for up to 50% of test queries.

38 User Generated Content

39 Some goals of mining social media: find high-quality content; find relevant and high-quality content; use millions of interactions to – understand complex information needs – model subjective information seeking – understand cultural dynamics.


42 Lifecycle of a Question in CQA: the asker chooses a category, composes the question, and opens it; answers arrive, and the asker examines them. If a satisfying answer is found, the asker closes the question, chooses the best answer, and gives ratings; otherwise the question is eventually closed by the system and the best answer is chosen by voters.


48 Community


54 Editorial Quality != User Popularity != Usefulness

Are editor/judge labels "meaningful"? Information-seeking process: the user wants to find useful information about a topic, starting from incomplete knowledge (N. Belkin: "Anomalous States of Knowledge"). We want to model directly whether the user found satisfactory information. A specific (amenable) case: CQA.

56 Yahoo! Answers: The Good News. Active community of millions of users in many countries and languages. Accumulated a great number of questions and answers. Effective for subjective information needs – a great forum for socialization/chat. (Can be) invaluable for hard-to-find information not available on the web.


58 Yahoo! Answers: The Bad News. You may have to wait a long time to get a satisfactory answer, and you may never obtain a satisfying answer. Time to close a question (hours) for sample question categories: 1. FIFA World Cup, 2. Optical, 3. Poetry, 4. Football (American), 5. Scottish Football (Soccer), 6. Medicine, 7. Winter Sports, 8. Special Education, 9. General Health Care, 10. Outdoor Recreation.

59 Asker Satisfaction Problem. Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community. "Satisfied" is defined as: the asker personally has closed the question AND selected the best answer AND provided a rating of at least 3 "stars" for the best answer; otherwise, the asker is "unsatisfied".
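The label definition above translates directly into a predicate; the attribute names here are hypothetical, chosen only to mirror the three conditions on the slide.

```python
def asker_satisfied(question):
    """'Satisfied' label as defined on the slide: the asker personally closed
    the question, selected a best answer, and rated it at least 3 stars."""
    return (question.closed_by_asker
            and question.best_answer_chosen_by_asker
            and question.best_answer_rating >= 3)
```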

Approach: Machine Learning over Content and Usage Features. Theme: holistic integration of content analysis and usage analysis. Method: supervised (and later partially supervised) machine learning over features. Tools: – Weka (ML library): SVM, boosting, decision trees, Naïve Bayes, … – Part-of-speech taggers, chunkers – Corpora (Wikipedia, web, queries, …)

61 Satisfaction Prediction Features. Approach: classification algorithms from machine learning. Feature groups – question features, answer features, asker history features, answerer history features, category features, and textual features – feed a classifier (Support Vector Machines, Decision Tree, Boosting, Naïve Bayes) that predicts whether the asker is satisfied or not.

62 Prediction Algorithms. Heuristic: # answers. Baseline: simply predicts the majority class (satisfied). ASP_SVM: our system with the SVM classifier. ASP_C4.5: with the C4.5 classifier. ASP_RandomForest: with the RandomForest classifier. ASP_Boosting: with the AdaBoost algorithm combining weak learners. ASP_NaiveBayes: with the Naïve Bayes classifier.
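The original experiments used Weka; a rough scikit-learn equivalent, with stand-ins for each ASP_* variant (scikit-learn has no exact C4.5, so a generic decision tree is used), might look like this:

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

classifiers = {
    "ASP_SVM": SVC(),
    "ASP_C4.5-like": DecisionTreeClassifier(),   # C4.5 itself is Weka's J48
    "ASP_RandomForest": RandomForestClassifier(),
    "ASP_Boosting": AdaBoostClassifier(),
    "ASP_NaiveBayes": GaussianNB(),
}

def compare(X, y):
    """Cross-validated F1 for each candidate classifier.
    X: feature matrix, y: 0/1 satisfaction labels (hypothetical inputs)."""
    return {name: cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
            for name, clf in classifiers.items()}
```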

63 Evaluation metrics. Precision – the fraction of predicted-satisfied asker information needs that were indeed rated satisfactory by the asker. Recall – the fraction of all rated-satisfied questions that were correctly identified by the system. F1 – the harmonic mean of precision and recall, computed as 2*(precision*recall)/(precision+recall). Accuracy – the overall fraction of instances classified correctly into the proper class.

64 Datasets. Crawled from Yahoo! Answers in early 2008 (thanks to Yahoo! for support): 216,170 questions, 1,963,615 answers, from over 158,000 askers across multiple categories. Data is available.

65 Dataset Statistics. Per-category statistics (columns: #Q, #A, #A per Q, % satisfied, avg. asker rating, time to close by asker) for sample categories: 2006 FIFA World Cup(TM) and Mathematics questions close within minutes, Mental Health in about a day and 13 hours, Diet & Fitness in days. Asker satisfaction varies significantly across different categories. #Q, #A, time to close, … are predictive of asker satisfaction.

66 Satisfaction Prediction: Human Performance. Truth: the asker's rating. A random sample of 130 questions was annotated by researchers to calibrate asker satisfaction – Agreement: 0.82 – F1: 0.45.

67 Satisfaction Prediction: Human Performance (Cont'd): Amazon Mechanical Turk. A service provided by Amazon: workers submit responses to a Human Intelligence Task (HIT) for a small payment per item; thousands of items can usually be labeled in hours.

68 Satisfaction Prediction: Human Performance (Cont'd): Amazon Mechanical Turk. Methodology – Used the same 130 questions – For each question, list the best answer as well as the four other answers ordered by votes – Five independent raters for each question – Agreement: 0.9 – Best accuracy achieved when at least 4 out of 5 raters predicted the asker to be "satisfied" (otherwise, labeled "unsatisfied").

69 Comparison of Human and Automatic Performance (F1 measure). Classifiers (ASP_SVM, ASP_C4.5, ASP_RandomForest, ASP_Boosting, ASP_NB) were evaluated with text features, without text features, and with selected features; best human performance: 0.61; baseline (naïve): 0.66. C4.5 is the most effective classifier for this task, and human F1 performance is lower than the naïve baseline!

70 Features by Information Gain (Satisfied class): Q: asker's previous rating; Q: average past rating by asker; UH: member since (interval); UH: average # answers for past questions; UH: previous questions resolved for the asker; CA: average asker rating for the category; UH: total number of answers received; CA: average voter rating; Q: question posting time; CA: average # answers per question.

71 "Offline" vs. "Online" Prediction. Offline prediction: – all features (question, answer, asker & category) – F1: 0.77. Online prediction: – no answer features – only asker history and question features (stars, #comments, sum of votes, …) – F1: 0.74.

72 Feature Ablation. Precision, recall, and F1 were measured for: selected features, no question-answer features, no answerer features, no category features, no asker features, and no question features. Asker & question features are most important. Answer quality, answerer expertise, and category characteristics may not be as important; caring or supportive answers might sometimes be preferred.

73 Satisfaction: varying by asker experience. Group together questions from askers with the same number of previous questions. The accuracy of prediction increases dramatically, reaching an F1 of 0.9 for askers with >= 5 previous questions.

74 Personalized Prediction of Asker Satisfaction. The same information != the same usefulness for different users! A personalized classifier achieves surprisingly good accuracy (even with just 1 previous question). A simple strategy of grouping users by the number of previous questions is more effective than other methods for users with a moderate amount of history. For users with >= 20 questions, textual features become more significant.
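A small sketch of that grouping strategy, assuming each training example carries the asker's number of previous questions; the cap of 5 and the choice of a decision tree are assumptions for illustration only.

```python
from collections import defaultdict
from sklearn.tree import DecisionTreeClassifier

def train_group_models(examples):
    """Group training questions by the asker's number of previous questions
    (capped, so sparse groups are pooled) and fit one classifier per group.

    examples: iterable of (feature_vector, label, n_prev_questions)."""
    groups = defaultdict(list)
    for x, y, n_prev_questions in examples:
        groups[min(n_prev_questions, 5)].append((x, y))
    models = {}
    for g, data in groups.items():
        X = [x for x, _ in data]
        y = [label for _, label in data]
        models[g] = DecisionTreeClassifier().fit(X, y)
    return models
```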

75 Some Results

76 Some Personalized Models

77 Summary. Asker satisfaction is predictable – can achieve higher-than-human accuracy by exploiting interaction history. The user's experience is important. General model: one-size-fits-all – 2,000 training questions are enough. Personalized satisfaction prediction: – helps with sufficient data (>= 1 previous interaction; text patterns become observable with >= 20 previous interactions).

78 Other tasks in progress: subjectivity and sentiment analysis – B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, in Proc. of EMNLP 2008; discourse analysis; cross-cultural comparisons; CQA vs. web search comparison.

79 Summary. User-generated content: – Growing – Important: impact on mainstream media, scholarly publishing, … – Can provide insight into information seeking and social processes – "Training" data for IR, machine learning, NLP, … – Need to re-think quality, impact, usefulness.

References
Y. Liu, J. Bian, and E. Agichtein, Predicting Information Seeker Satisfaction in Community Question Answering, in Proc. of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2008.
Y. Liu and E. Agichtein, You've Got Answers: Towards Personalized Models for Predicting Success in Community Question Answering (short paper), in Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), 2008.
B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, in Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008.
E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, Finding High-Quality Content in Social Media, in Proc. of the ACM Web Search and Data Mining Conference (WSDM), 2008.
C. Clarke, E. Agichtein, S. T. Dumais, and R. W. White, The Influence of Caption Features on Clickthrough Patterns in Web Search, in Proc. of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2007.
P. Jurczyk and E. Agichtein, Discovering Authorities in Question Answer Communities Using Link Analysis (short paper), in Proc. of the ACM Conference on Information and Knowledge Management (CIKM), 2007.
E. Agichtein, E. Brill, and S. T. Dumais, Improving Web Search Ranking by Incorporating User Behavior Information, in Proc. of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2006.
E. Agichtein, E. Brill, S. T. Dumais, and R. Ragno, Learning User Interaction Models for Predicting Web Search Result Preferences, in Proc. of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2006.

Thank you!

82 Question-Answer Features: Q: length, posting time, …; QA: length, KL divergence; Q: votes; Q: terms.

83 User Features U: Member since U: Total points U: #Questions U: #Answers

84 Category Features: CA: average time to close a question; CA: average # answers per question; CA: average asker rating; CA: average voter rating; CA: average # questions per hour; CA: average # answers per hour. Example category row (General Health): time to close by asker on the order of a day and 13 hours.