Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University SIGIR 2009.

1 Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University SIGIR 2009

2 Outline Introduction Related Work Problem Definition Classification Methods Experiments Conclusion

3 Introduction Online users share ideas, discuss issues, and form communities within discussion boards (online forums). Knowledge discovery and information extraction. Several potential applications of mining QA content: search engines, online QA services, experts in social media, knowledge base of automatic chat-bots.

4 Related Work Cong et al., 2008: developed a classification-based method for question detection using sequential pattern features extracted from both questions and non-questions in forums. Preprocessing applies a POS tagger while keeping 5W1H and modal words. Time-consuming. Focuses on question sentences or question paragraphs.

5 Related Work(cont’d) Knowledge acquisition from discussion boards: Zhou and Hovy, 2005; Feng et al., 2006. Using non-textual features such as click count to predict the quality of answers: Jeon et al., 2006. In general, none of this related work needs to detect questions.

6 Tasks Tasks: Identifying question-related first posts. Finding potential answers in subsequent responses within the corresponding threads. Some questions…

7 Tasks(cont’d) Some questions: Can we detect question-related threads in an efficient and effective manner? What other features can be used to improve the performance? How much can combinations of some simple heuristics improve performance? Are traditional relevance-based approaches suitable for this QA content?

8 Problem Definition Questions Focus on finding whether the first post is a question post Treat the whole post as a question post:

11 Problem Definition(cont’d) Answers: If one of the replies contains answers to the questions posed in the first post, regard that reply as an answer post. Also treat replies that do not contain the actual answer but provide links to other potential answers as answer posts. Result from the system: question-answer post pairs.

12 Classification Methods(1/3) NTU CSIE LIBSVM 2.88. Question detection: question mark, 5W1H words, total number of posts within one thread, authorship, N-gram.
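
The question-detection features listed on this slide could be extracted along the following lines. This is a minimal sketch, not the authors' implementation: the function name `question_features`, the `(author, text)` thread representation, the reading of "authorship" as the number of posts written by the thread starter, and the choice of bigrams for the N-gram feature are all assumptions made for illustration.

```python
import re

# The 5W1H question words named on the slide.
FIVE_W_ONE_H = {"what", "when", "where", "who", "why", "how"}


def question_features(posts):
    """Extract simple question-detection features from one thread.

    posts: list of (author, text) tuples in thread order; posts[0] is the
    first post. This representation is hypothetical.
    """
    starter, first_text = posts[0]
    tokens = re.findall(r"[a-z']+", first_text.lower())
    return {
        # Number of question marks in the first post.
        "question_marks": first_text.count("?"),
        # Number of 5W1H words in the first post.
        "five_w_one_h": sum(1 for t in tokens if t in FIVE_W_ONE_H),
        # Total number of posts within the thread.
        "num_posts": len(posts),
        # Assumption: "authorship" = posts written by the thread starter.
        "starter_posts": sum(1 for author, _ in posts if author == starter),
        # Assumption: word bigrams stand in for the N-gram feature.
        "bigrams": list(zip(tokens, tokens[1:])),
    }


thread = [
    ("alice", "How do I install the driver? What package is needed?"),
    ("bob", "Try apt."),
    ("alice", "Thanks!"),
]
features = question_features(thread)
```

A dictionary like `features` would still need to be turned into a numeric vector before it could be passed to an SVM such as LIBSVM.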

13 Classification Methods(2/3) Answer detection: the position of the answer post, authorship, N-gram, stop words, query likelihood model score.
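
The query likelihood model score used as an answer-detection feature can be illustrated as follows. This is a sketch under stated assumptions: the slide does not say which smoothing was used, so Jelinek-Mercer smoothing with `lam=0.5` is an assumption, as are the function names, the tokenizer, and the use of the thread itself as the background collection. The first post is treated as the query and each reply as a document.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (a simplification chosen for this sketch)."""
    return re.findall(r"[a-z']+", text.lower())


def query_likelihood(question, reply, collection, lam=0.5):
    """Log-probability of the question under the reply's language model.

    Uses Jelinek-Mercer smoothing (an assumption):
        p(w | reply) = (1 - lam) * tf(w, reply) / |reply|
                       + lam * cf(w) / |collection|
    """
    doc = Counter(tokenize(reply))
    coll = Counter(tok for text in collection for tok in tokenize(text))
    doc_len = sum(doc.values())
    coll_len = sum(coll.values())
    score = 0.0
    for w in tokenize(question):
        p_doc = doc[w] / doc_len if doc_len else 0.0
        p_coll = coll[w] / coll_len if coll_len else 0.0
        p = (1 - lam) * p_doc + lam * p_coll
        # Floor avoids log(0) for terms absent from the whole collection.
        score += math.log(max(p, 1e-12))
    return score


question = "how to install nvidia driver"
replies = ["install the nvidia driver with apt", "nice photo"]
collection = [question] + replies
scores = [query_likelihood(question, r, collection) for r in replies]
```

Here the first reply shares terms with the question and so receives the higher (less negative) score; in a feature-based classifier this score would be one input alongside position and authorship.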

14 Classification Methods(3/3) Cong et al., 2008. Sequential pattern mining. Graph-based model. Query likelihood language model. KL-divergence language model.

15 Experiments(1/9) Data: crawled 555,954 threads from the Ubuntu dataset and 721,422 threads from Photography On The Net. Question detection task: randomly sampled 572 threads from the Ubuntu dataset and 500 threads from the DC dataset. Answer detection task: randomly sampled 500 question-related threads from both datasets.

16 Experiments(2/9)

20 Experiments(3/9)

21 Experiments(4/9)

23 Experiments(5/9)

26 Experiments(6/9)

27 Experiments(7/9)

30 Experiments(8/9) Propose a ranking scheme. Ranking score: V1: position + authorship; V2: position; V3: authorship.

31 Experiments(9/9)

32 Conclusion The use of N-grams and the combination of several non-content features can improve performance. Relevance-based retrieval methods would not be effective on their own in tackling the problem, but performance can be improved by combining them with non-content features. Designed a simple ranking scheme that outperforms previous approaches.

33 Combine several potential answers together to make a better answer? A good understanding of the interaction of question answering in discussion boards.

34 Thank You!

