Answer Generating Methods for Community Question and Answering Portals (Tao Haoxiong, Hao Yu, Zhu Xiaoyan; Department of Computer Science, Tsinghua University)


1 Answer Generating Methods for Community Question and Answering Portals
Tao Haoxiong, Hao Yu, Zhu Xiaoyan
Department of Computer Science, Tsinghua University

2 Outline
Introduction and Related Work
List-type Question
– Answer generating method
– Method result and analysis
Solution-type Question
– Visible list
– Select the best list
– Experiment and analysis
Conclusion
Future Work

3 Introduction
Online community question answering (cQA) portals such as Soso Wenwen and Baidu Zhidao have become a popular way to acquire information.
But they have some limitations:
– Answers do not arrive in real time.
– Many answers are of low quality.

4 Related Work
To overcome the lack of real-time answers, cQA portals offer a search service, but it has drawbacks:
– Users must click through links to read the full answers.
– Finding useful information takes a long time.

5 Related Work
To return high-quality answers:
– Predict the quality of cQA answers, using user profile features, text features, etc.
– Use multi-document summarization to summarize answers. The result is more comprehensive but less readable.
– To improve answer quality, almost all well-performing systems introduce a question taxonomy.

6 Related Work
The question taxonomy proposed by Fan Bu contains 6 question types:
Type        Proportion
List        23.8%
Solution    19.7%
Reason      18.1%
Navigation  14.8%
Fact        14.4%
Definition  7.5%
Examples:
– List-type: List Nobel Prize winners in the 1990s.
– Solution-type: How to make pizza?

7 Research Framework
We propose answer generating methods for both List-type and Solution-type questions.

8 List-type Question
Each answer is a single phrase or a list of phrases.

9 Answer Generating Method
Two characteristics of answers:
– The “Best Answer” often does not contain all answer points.
– Answer points that are high-quality or relevant to the question often appear in more than one answer.
We propose a method based on clustering of answer points.
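The slides do not spell out the clustering procedure, so the following is only a minimal sketch of the idea: answer points that recur across several answers are grouped together, and groups are ranked by how many distinct answers support them. The character-bigram Jaccard similarity, the greedy single-pass grouping, the default threshold of 0.5, and all function names are assumptions made for illustration, not the authors' method.

```python
def bigrams(text):
    """Character bigrams of a phrase after lowercasing and removing whitespace."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_answer_points(answers, threshold=0.5):
    """Greedy single-pass clustering of answer points (illustrative only).

    answers: one list of answer-point strings per community answer.
    Returns (representative point, supporting answer count) pairs,
    ranked by the number of distinct answers that mention the point.
    """
    clusters = []
    for answer_idx, points in enumerate(answers):
        for point in points:
            grams = bigrams(point)
            best, best_sim = None, threshold
            for cluster in clusters:
                sim = jaccard(grams, cluster["grams"])
                if sim >= best_sim:
                    best, best_sim = cluster, sim
            if best is None:
                # No existing cluster is similar enough: start a new one.
                clusters.append({"rep": point, "grams": grams, "support": {answer_idx}})
            else:
                best["support"].add(answer_idx)
    clusters.sort(key=lambda c: len(c["support"]), reverse=True)
    return [(c["rep"], len(c["support"])) for c in clusters]


# Hypothetical answers to "List Nobel Prize winners in the 1990s", already split into points.
answers = [
    ["Nelson Mandela", "Mikhail Gorbachev"],
    ["nelson mandela", "Aung San Suu Kyi"],
    ["Mikhail Gorbachev", "Nelson  Mandela"],
]
ranked = cluster_answer_points(answers)
top_two = ranked[:2]  # truncating the ranked list controls the answer length
```

Because the output is a ranked list, keeping only the first few entries is one simple way to bound the length of the generated answer.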

10 Answer Generating Method

11 Answer Generating Method

12 Example of the Method Result

13 Method Result and Analysis
The result contains more answer points than the “Best Answer”.
Outputs are ranked.
The answer length is easy to control.
Further research is needed on:
– Splitting an answer into answer points.
– Choosing the clustering threshold.

14 Solution-type Question
Visible List

15 Solution-type Question
Visible List
– We chose 1179 solved Solution-type questions from Baidu Zhidao; 30% of them have answers containing visible lists.
– The average length of a “Best Answer” is above 1400 words, while the average length of a visible list is about 600 words.
– 55% of the questions with visible lists have more than one visible list.
We propose a method to select the best list.

16 Select the Best List
Features:
– FirstList: 1 if the list is the first list in the answer, otherwise 0.
– GuideSimilarity: cosine similarity between the guide words and the question title.
   Guide words: "List 4: Three methods for treating chronic pharyngitis" (列表四:三种方法巧疗慢性咽炎)
   Question title: "Question: How is chronic pharyngitis treated?" (问题:慢性咽炎怎么治疗?)
– ContentSimilarity: cosine similarity between the list content and the question.
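As a rough illustration of the similarity features, the sketch below computes cosine similarity over bag-of-words counts, plus the binary FirstList feature. The slides do not say how text is tokenized or weighted, so the pre-tokenized English inputs, the raw term counts, and the function names are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_list(list_index):
    """FirstList feature: 1.0 for the first list in an answer, else 0.0."""
    return 1.0 if list_index == 0 else 0.0


# Hypothetical tokenization of the (translated) guide words and question title.
guide_tokens = ["three", "methods", "treat", "chronic", "pharyngitis"]
title_tokens = ["how", "treat", "chronic", "pharyngitis"]
guide_similarity = cosine_similarity(guide_tokens, title_tokens)
```

With non-negative counts the result always lies in [0, 1], which matches the feature range stated in the slides. ContentSimilarity could reuse the same function with the list content and the question as inputs.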

17 Select the Best List
Features:
– VPRatio: the proportion of words in the list content that are verbs or prepositions.
– SummaryScore: if the summarized answer contains N sentences and a visible list contains k of those N sentences, the list's summary score is k/N.
Method:
– Each feature takes a value in [0, 1]. We use a Learning to Rank model to learn the weight of each feature.
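The remaining features and the final selection step can be sketched as follows. The slides state only that a Learning to Rank model learns a weight per feature; the linear weighted sum, the placeholder weights, the part-of-speech tag names "v" and "p", and all function names below are assumptions made for illustration.

```python
def summary_score(list_sentences, summary_sentences):
    """SummaryScore k/N: share of the N summary sentences found in the list."""
    if not summary_sentences:
        return 0.0
    in_list = set(list_sentences)
    k = sum(1 for sentence in summary_sentences if sentence in in_list)
    return k / len(summary_sentences)


def vp_ratio(pos_tags, target_tags=("v", "p")):
    """VPRatio: share of words tagged as verb or preposition (tag names assumed)."""
    if not pos_tags:
        return 0.0
    return sum(1 for tag in pos_tags if tag in target_tags) / len(pos_tags)


def select_best_list(feature_rows, weights):
    """Score each candidate list with a weighted sum and return the best index."""
    scores = [sum(w * row[name] for name, w in weights.items()) for row in feature_rows]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Placeholder weights; in the slides these come from a Learning to Rank model.
weights = {"FirstList": 0.1, "GuideSimilarity": 0.3, "ContentSimilarity": 0.3,
           "VPRatio": 0.1, "SummaryScore": 0.2}
candidates = [
    {"FirstList": 1.0, "GuideSimilarity": 0.2, "ContentSimilarity": 0.3,
     "VPRatio": 0.4, "SummaryScore": 0.25},
    {"FirstList": 0.0, "GuideSimilarity": 0.7, "ContentSimilarity": 0.6,
     "VPRatio": 0.5, "SummaryScore": 0.5},
]
best_index, scores = select_best_list(candidates, weights)
```

In this made-up example the second list wins despite not being first in the answer, because its similarity and summary features outweigh the FirstList bonus.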

18 Experiment and Analysis
Dataset:
– We chose 1179 questions from Baidu Zhidao; 358 (30%) of them have visible lists.
– Of those 358 questions, 196 (55%) have more than one list.
– We manually assigned scores for the 196 questions with more than one visible list: 1 for high quality, 0 for low quality.
Two evaluations:
– Evaluate the method of selecting the best list.
– Evaluate the quality of the visible list as the answer.
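A quick arithmetic check of the dataset figures shows which base each percentage uses: 30% is relative to all 1179 questions, while 55% is relative to the 358 questions that have visible lists. The variable names are illustrative.

```python
total_questions = 1179  # questions chosen from Baidu Zhidao
with_lists = 358        # questions whose answers contain visible lists
multi_list = 196        # questions with more than one visible list

share_with_lists = with_lists / total_questions  # share of all questions
share_multi_list = multi_list / with_lists       # share of questions with lists
share_multi_of_all = multi_list / total_questions

print(f"{share_with_lists:.1%} {share_multi_list:.1%} {share_multi_of_all:.1%}")
```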

19 Result of Selected Visible-lists
*Random selection: 51.7%

20 Evaluate Visible List as Answer
We manually compared the quality of the “Best Answer” and the visible list for each question:
– The comparison focuses mainly on relevance to the question, completeness, and whether redundant information is present.
The average length of a visible list is about 600 words, while the average length of a “Best Answer” is more than 1400 words.

21 Conclusion
Relying on similar questions and their answers from cQA portals, we propose answer generating methods for List-type and Solution-type questions:
– List-type questions: based on the clustering of answer points.
– Solution-type questions: based on visible lists.

22 Future Work
List-type questions:
– Split answers into answer points more robustly.
Solution-type questions:
– Introduce more semantic features to improve the semantic relevance between the selected list and the question.
Other types of questions:
– Research how to generate high-quality answers.

23 Thanks

