Finding Question-Answer Pairs from Online Forums ACM, SIGIR 08 Gao Cong Aalborg University, Aalborg, Denmark Long Wang Tianjin University, Tianjin, China.


1 Finding Question-Answer Pairs from Online Forums ACM, SIGIR 08 Gao Cong Aalborg University, Aalborg, Denmark Long Wang Tianjin University, Tianjin, China Chin-Yew Lin Microsoft Research Asia, Beijing, China Young-In Song Korea University, Seoul, South Korea Yueheng Sun Tianjin University, Tianjin, China

2 Introduction Yahoo! Answers. Forums contain a huge amount of valuable user-generated content on a variety of topics. Goal: find question-answer pairs in forums.

3 Algorithms Question Detection. 5W1H: most questions do not begin with a 5W1H word. Question mark: 30% of questions do not end with a question mark. Example: “I am wondering where I can buy cheap and good clothing in beijing.” Labeled Sequential Pattern (LSP).
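The two simple baselines named on this slide can be sketched as follows. This is only an illustration of why they miss questions, not the authors' LSP method; the word list and the function names `starts_with_5w1h` and `ends_with_question_mark` are assumptions made for this sketch.

```python
import re

# Assumed 5W1H word list; the slides do not give the exact list used.
FIVE_W_ONE_H = ("what", "who", "when", "where", "why", "how")


def starts_with_5w1h(sentence: str) -> bool:
    """Return True if the first word of the sentence is a 5W1H word."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return bool(words) and words[0] in FIVE_W_ONE_H


def ends_with_question_mark(sentence: str) -> bool:
    """Return True if the sentence, ignoring outer whitespace, ends with '?'."""
    return sentence.strip().endswith("?")


example = "I am wondering where I can buy cheap and good clothing in beijing."
# Both baselines reject the slide's example, although it is a question.
print(starts_with_5w1h(example), ends_with_question_mark(example))
```

The slide's example sentence contains "where" in the middle and ends with a period, so neither rule fires on it.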

4 Graph based propagation method Building Graph. Given a question q and the set A_q of its candidate answers: for each pair of candidate answers a1, a2, compute KL(a1|a2). If 1/(1+KL(a1|a2)) is larger than a threshold θ, add an edge from a1 to a2.
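A minimal sketch of the graph-building step described on this slide. The slides do not say how the language models behind KL(a1|a2) are estimated, so the add-one smoothed unigram model, the whitespace tokenizer, and the names `unigram_model`, `kl_divergence`, and `build_graph` are all assumptions of this sketch.

```python
import math
from collections import Counter


def unigram_model(text, vocab, mu=1.0):
    """Additively smoothed unigram distribution over vocab (smoothing is an assumption)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def build_graph(answers, theta):
    """Return directed edges (i, j, similarity) where 1 / (1 + KL(a_i | a_j)) > theta."""
    vocab = sorted({w for a in answers for w in a.lower().split()})
    models = [unigram_model(a, vocab) for a in answers]
    edges = []
    for i, p in enumerate(models):
        for j, q in enumerate(models):
            if i == j:
                continue
            similarity = 1.0 / (1.0 + kl_divergence(p, q))
            if similarity > theta:
                edges.append((i, j, similarity))
    return edges


candidates = [
    "the silk market has cheap clothing",
    "cheap clothing at the silk market",
    "take the subway to the airport",
]
for i, j, sim in build_graph(candidates, theta=0.9):
    print(i, "->", j, round(sim, 3))
```

Because KL divergence is asymmetric, the edge from a1 to a2 and the edge from a2 to a1 are tested separately, which is why the graph is directed.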

5 Graph based propagation method Edge Weight. Normalized. λ=0.01

6 Computing Propagated Scores Propagation without initial score. Propagation with initial score.

7 Answer Detection score(q,a): Cosine similarity. Query likelihood language model. KL-divergence language model.
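Two of the score(q,a) options listed on this slide can be sketched as below. The slides give no formulas, so the raw term-frequency vectors, the additive smoothing, and the names `cosine_score` and `query_likelihood_score` are assumptions chosen for a small illustration.

```python
import math
from collections import Counter


def cosine_score(question, answer):
    """Cosine similarity between raw term-frequency vectors of question and answer."""
    q = Counter(question.lower().split())
    a = Counter(answer.lower().split())
    dot = sum(q[w] * a[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in a.values())
    )
    return dot / norm if norm else 0.0


def query_likelihood_score(question, answer, alpha=1.0):
    """log P(question | answer model), with additive smoothing (an assumption)."""
    q_tokens = question.lower().split()
    a_counts = Counter(answer.lower().split())
    vocab_size = len(set(q_tokens) | set(a_counts))
    denom = sum(a_counts.values()) + alpha * vocab_size
    return sum(math.log((a_counts[w] + alpha) / denom) for w in q_tokens)


q = "cheap clothing beijing"
good = "try the silk market for cheap clothing in beijing"
bad = "the airport train runs every ten minutes"
print(cosine_score(q, good) > cosine_score(q, bad))
print(query_likelihood_score(q, good) > query_likelihood_score(q, bad))
```

Both functions rank the on-topic candidate above the off-topic one; a real system would normally smooth the answer model against collection statistics instead of the tiny joint vocabulary used here.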

8 Experiment Data. Three forums of different scales were selected to obtain source data. Two annotators. The kappa statistic for identifying questions is 0.96. The kappa statistic for linking answers and questions given a question is 0.69.
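For readers unfamiliar with the kappa statistic, here is a small sketch of how agreement between two annotators is commonly computed. It assumes Cohen's kappa, the usual choice for two annotators; the slides do not state which variant was used, and the toy labels below are invented for illustration only.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels (1 = question, 0 = not a question), not data from the paper.
annotator_1 = [1, 1, 0, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 0, 0]
print(cohens_kappa(annotator_1, annotator_2))
```

A value near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance.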

9 Experiment Q-TInter: intersection of the two annotators' labels.

10 Experiment 1,535 questions from 600 threads; 284 questions do not have answers.

11 Experiment Improved results on subsets. Of 486 first questions, only 21 do not have answers in the A-TUnion data and 45 in the A-TInter data.

12 Experiment G_K: computing weight with KL-divergence alone. G_1: propagation without initial score. G_2: propagation with initial score.

13 Experiment Data from three forums: Tripadvisor, Lonely Planet, and Bootsnall.

14 Thank You.

15 Experiment

