A Classification-based Approach to Question Routing in Community Question Answering
Tom Chao Zhou 1, Michael R. Lyu 1, Irwin King 1,2
1 The Chinese University of Hong Kong
2 AT&T Labs Research
{czhou,lyu,king}@cse.cuhk.edu.hk, irwin@research.att.com
Workshop on Community Question Answering on the Web, in conjunction with World Wide Web 2012
April 17, 2012

–Introduction
–Problem Definition and Feature
–Experiments
–Conclusions and Future Work
–Related Work

Community-based Question Answering
–Knowledge dissemination, information seeking
–Natural language questions
–Explicit, self-contained answers

How CQA Works
–CQA users submit a question
–Get answers?
  Yes: answer selection, question resolved
  No: question not resolved
–The number of posted questions grows fast. Can users get their questions resolved within a reasonable period?

Whether Questions Get Resolved
–Randomly sample 140 questions from each category in Yahoo! Answers
–26 top-level categories
–In total 3,640 questions
–Track the status of each question

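Percentage of Questions Resolved
[Figure: percentage of questions resolved at points 1 to 9 on the horizontal axis]
Point     1       2       3       4       5       6       7       8       9
Resolved  11.95%  19.95%  24.75%  26.48%  27.31%  51.32%  61.92%  63.41%  64.45%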

How CQA Works
–CQA users submit a question
–Get answers?
  Yes: answer selection, question resolved
  No: question not resolved
–What if we carefully select a set of CQA users who may be interested in the question?

Question Routing
Definition
–Routing open questions to suitable answerers who may be interested in the question
–Interested in the question: Yes
–Not interested in the question: No

Question Routing
Benefits
–Asker's perspective
  Reduces the time lag between when a question is posted and when it is answered
–Answerer's perspective
  Answerers are more enthusiastic about answering questions that interest them
–CQA's perspective
  Leverages users' passion for answering, which improves the CQA service and boosts users' attachment and loyalty to the system

Problem Definition
Question Routing Problem
–Given a question and a user in CQA, determine whether the user will contribute his/her knowledge to answer the question

Feature Investigation
Local Features
–Only local information about the question, the user history and the question-user relationship is needed
Global Features
–Take into account the global information of CQA
–Consider the category as the global information
–Questions in the same category discuss similar topics
–Incorporating global information acts as a smoothing effect

Feature Investigation
Feature Investigation Summary
# of features    Question  User History  Question-User Relationship
Local Features   3         10            7
Global Features  3         2             1

Local Features
Question (3 features)
–Question Length
  Agichtein et al. 2008 found question length to be an important feature for measuring question quality
  1. Title length
  2. Detail length
–Question Type
  3. 5W1H type: why, what, where, who and how
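The three local question features can be sketched as follows. This is a minimal illustration, not the authors' extraction code: it assumes lengths are counted in word tokens, that the question type is the first interrogative found in the title, and it adds "when" to the word list because 5W1H conventionally includes it. The function name `question_features` and the fallback label "other" are hypothetical.

```python
import re

# Interrogatives treated as question types. "when" is an assumption added
# here; the slide lists only why, what, where, who and how.
WH_WORDS = ("why", "what", "where", "who", "how", "when")


def question_features(title: str, detail: str) -> dict:
    """Return title length, detail length and 5W1H type for one question."""
    # Assumption: length is the number of word tokens, not characters.
    title_tokens = re.findall(r"\w+", title.lower())
    detail_tokens = re.findall(r"\w+", detail.lower())
    # Question type: first interrogative in the title, else "other".
    qtype = next((t for t in title_tokens if t in WH_WORDS), "other")
    return {
        "title_length": len(title_tokens),
        "detail_length": len(detail_tokens),
        "question_type": qtype,
    }


example = question_features("How do I reset my router?",
                            "It stopped working yesterday.")
print(example)
```

In a classifier the categorical question type would still need a numeric encoding, for example one indicator per interrogative.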

Local Features
User History (10 features)
–A user's history has implications for the user's interests and behaviors
–Profile, question and answering behaviors
  1. Member since
  2. Percentage of best answers
  3. Total points
  4. Number of answers
  5. Number of best answers
  6. Number of asked questions
  7. Number of resolved questions

Local Features
User History (10 features)
  8. Number of stars received
  9. Answer/question ratio
  10. Best answer/question ratio

Local Features
Question-User Relationship (7 features)
–Capture the relationship between a question and a user
–Features adapted from the existing CQA service
  1. Top contributor
–Features that measure the extent to which the user is interested in the category the given question belongs to
  2. Ratio of answered questions in the category
  3. Ratio of best-answered questions in the category
  4. Ratio of asked questions in the category
  5. Ratio of starred questions in the category

Local Features
Question-User Relationship (7 features)
–Features describing the similarity of the question's language model and the user's language model
  6. KL-divergence between the given question and the user's answered questions
  7. KL-divergence between the given question and the user's background language model (answered, asked, and starred questions)

Global Features
Question (3 features)
–Category-level features that smooth each question
  1. Average title length
  2. Average detail length
–Whether the question is representative of the category
  3. KL-divergence between the given question and the questions in the category the given question belongs to

Global Features
User History (2 features)
–Capture the uniqueness of a user
Question-User Relationship (1 feature)
–The more similar the language model of a user's answered questions is to that of the questions in a category, the more likely the user is to answer questions from that category
–KL-divergence between the user's answered questions and the questions in the category the given question belongs to

Experiments
Classification Algorithm
–Support vector machines (SVM) with linear kernel
Metrics
–Precision, recall, F1 for the positive class
–Accuracy for both classes
Dataset
–Crawled from the "Answers", "Questions", and "Starred Questions" pages of 3,500 Yahoo! Answers users
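The four metrics can be computed from a list of true labels and predictions as in the sketch below. It assumes labels are encoded as 1 (the user answers the question) and 0 (the user does not); the function names `binary_metrics` and `f1_from` are hypothetical, and the toy labels are made up for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(y_true, y_pred) -> dict:
    """Precision, recall and F1 for the positive class, accuracy overall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy example: 2 true positives, 1 false positive, 1 false negative,
# 2 true negatives.
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```

The same F1 formula reproduces the reported figures: the User History precision and recall (0.8278, 0.4682) give an F1 of 0.5981 after rounding to four decimals.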

Effect of Local Features
                            Precision  Recall  F1      Accuracy
Question                    0.5314     0.3896  0.4496  0.5157
User History                0.8278     0.4682  0.5981  0.6805
Question-User Relationship  0.5824     0.935   0.7178  0.6267
–Question-User Relationship achieves the best F1 and recall
  Captures the user's performance and interests in the category of the given question
  Captures the semantic relatedness of the given question and the user
–User History achieves the best precision
  Some users are quite active in the system
  These highly active users account for only a small percentage of all users

Effect of Local Features
                                   Precision  Recall  F1      Accuracy
Q + QU Relationship                0.5974     0.9134  0.7223  0.6435
U + QU Relationship                0.7362     0.8275  0.7792  0.7619
Q + U + QU Relationship            0.7418     0.8253  0.7814  0.7655
Top 10 features in Local features  0.6964     0.8095  0.7487  0.7241
–The combination of all local features achieves the best F1
–Results of employing the top 10 features are also encouraging

Effect of Local Features
Two most important local features
–KL-divergence between the given question and the questions answered by the user
  Captures the most accurate semantic relatedness between the given question and the knowledge of the user
–KL-divergence between the given question and the questions answered, asked, and starred by the user
  Also considers the user's interests by incorporating other factors

Effect of Local and Global Features
                Precision  Recall  F1      Accuracy
Local           0.7418     0.8253  0.7814  0.7655
Global          0.5779     0.8713  0.6949  0.6109
Local + Global  0.7279     0.8499  0.7842  0.7689
–The combination of local and global features retains the best elements of the two and consequently achieves the best F1 score

Effect of Local and Global Features
Three most important features
–KL-divergence between the given question and the questions answered by the user
–KL-divergence between the given question and the questions answered, asked, and starred by the user
–KL-divergence between the given question and the questions from the same category
  If a question is quite typical of its category, it has a higher chance of being answered by users; this could also partially explain why CQA services usually have well-structured categories

Related Work
Question Routing
–Zhou et al. 2009: expertise-based question routing
–Li and King 2010: language-model-based framework for combining expertise estimation and availability estimation
–Li et al. 2011: category-sensitive language model
Link Analysis and Expert Finding
–Jurczyk and Agichtein 2007
–Zhang, Ackerman and Adamic 2007
–Apply PageRank and HITS in social media

Conclusions
–Formulate question routing as a classification task
–Derive a variety of local and global features
–Analyze the contributions from different sources
–Thorough experimental study

Future Work
–Semi-supervised approach
–Incorporate social aspects into the model

Thanks
Q&A

FAQ
How to prepare positive and negative instances?
–If a user answered a question, we consider the question-user pair a positive instance
–If a user asked a question, we consider the question-user pair a negative instance
–Assumption
  If a user asked a question, it might mean that he/she did not possess the knowledge about the question
–A more realistic negative instance?
  An open question is presented to a user, but the user does not answer it (data only available to CQA owners)
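The labeling rule above can be sketched in a few lines. This is an illustration under assumed data structures, not the authors' pipeline: each user's history is taken to be a mapping from a user id to the ids of the questions that user answered or asked, and the names `build_instances`, `answered` and `asked` are hypothetical.

```python
def build_instances(answered: dict, asked: dict) -> list:
    """Return (question_id, user_id, label) triples.

    label 1: the user answered the question (positive instance).
    label 0: the user asked the question (negative instance).
    """
    instances = []
    for user, questions in answered.items():
        for question in questions:
            instances.append((question, user, 1))
    for user, questions in asked.items():
        for question in questions:
            instances.append((question, user, 0))
    return instances


# Made-up ids for illustration only.
answered = {"u1": ["q10", "q11"], "u2": ["q12"]}
asked = {"u1": ["q20"], "u2": []}
for row in build_instances(answered, asked):
    print(row)
```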

FAQ
How to know the importance of each feature?
–We employ an SVM with a linear kernel. Features can be ranked by the absolute values of their weights in the SVM model; the weight of the j-th feature is calculated as follows:
$$ w_j = \sum_{i=1}^{n} \alpha_i y_i x_i^j $$
–n is the number of training samples, α_i is the coefficient of the i-th training sample (nonzero only for support vectors), y_i is its label, and x_i^j is the value of the j-th feature of observation x_i
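For a linear SVM, the weight of feature j is the sum over training samples of α_i · y_i · x_i^j, where α_i is the coefficient of sample i, y_i its label in {+1, -1}, and x_i^j its j-th feature value. The sketch below computes these weights and ranks features by absolute weight. The coefficients, labels and feature values are made up for illustration, and the function names are hypothetical; in practice the α_i come from a trained model.

```python
def linear_svm_weights(alphas, labels, X):
    """w_j = sum_i alpha_i * y_i * x_i[j] for every feature j."""
    n_features = len(X[0])
    return [
        sum(a * y * x[j] for a, y, x in zip(alphas, labels, X))
        for j in range(n_features)
    ]


def rank_features(weights, names):
    """Sort (name, weight) pairs by absolute weight, largest first."""
    return sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)


# Toy values: the second sample has alpha = 0, so it is not a support
# vector and contributes nothing to the weights.
alphas = [0.5, 0.0, 0.5]
labels = [1, 1, -1]
X = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0]]

weights = linear_svm_weights(alphas, labels, X)
print(rank_features(weights, ["f0", "f1"]))
```

Ranking by absolute weight is only meaningful when the features are on comparable scales, so feature normalization before training is a reasonable precaution.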

FAQ
How to calculate KL-divergence?
–The Kullback-Leibler divergence of two distributions P and Q is calculated as follows:
$$ D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} $$
–P(i) and Q(i) are estimated by Maximum Likelihood Estimation (MLE)
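A minimal sketch of the KL-divergence between two unigram language models follows. Pure MLE assigns zero probability to unseen words, which makes the divergence undefined whenever Q(i) = 0 and P(i) > 0, so this example adds a small additive smoothing constant over a shared vocabulary. That smoothing, the natural logarithm, and the names `smoothed_distribution` and `kl_divergence` are assumptions of this sketch, not details given in the slides.

```python
import math
from collections import Counter


def smoothed_distribution(tokens, vocabulary, alpha=0.01):
    """Unigram probabilities over `vocabulary` with additive smoothing.

    With alpha = 0 this reduces to the MLE estimate count / total.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)), natural logarithm."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


# Made-up token lists standing in for a question and a user's answered
# questions.
question = "how to fix a flat bike tire".split()
user_history = "bike tire repair tips how to patch a bike tube".split()
vocabulary = sorted(set(question) | set(user_history))

p = smoothed_distribution(question, vocabulary)
q = smoothed_distribution(user_history, vocabulary)
print(kl_divergence(p, q))
```

A smaller value means the question's language model is closer to the user's, and the divergence is not symmetric, so the order of the two arguments matters.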
