
1 Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding Delphine Bernhard and Iryna Gurevych Ubiquitous Knowledge Processing (UKP) Lab Computer Science Department Technische Universität Darmstadt, Hochschulstraße 10 D-64289 Darmstadt, Germany ACL 2009 Reporter: Kan-Wen Tien Date: 2009.10.22

2 Outline Introduction Related Work Parallel Datasets Semantic Relatedness Experiments Answer Finding Experiments Conclusion

3 Introduction Related Work Parallel Datasets Semantic Relatedness Experiments Answer Finding Experiments Conclusion

4 Introduction Lexical gap between queries and documents, or between questions and answers Several solutions: – Query reformulation, question paraphrasing – Query expansion – Semantic information retrieval

5 Introduction Another solution: integrate monolingual statistical translation models into the retrieval process (Berger and Lafferty, 1999) Drawback: limited availability of truly parallel monolingual corpora – Training data often consist of question-answer pairs, usually extracted from the evaluation corpus itself

6 Introduction Related Work Parallel Datasets Semantic Relatedness Experiments Answer Finding Experiments Conclusion

7 Related Work Statistical translation models for retrieval – Build synthetic training data – Train translation models on Q&A pairs (answers -> source language, questions -> target language) – Select the most important terms to build compact translation models

8 Introduction Related Work Parallel Datasets Semantic Relatedness Experiments Answer Finding Experiments Conclusion

9 Parallel Datasets Two kinds of data resources: (1) Manually-tagged question reformulations and question-answer pairs from the WikiAnswers social Q&A site (2) Glosses from WordNet, Wiktionary, Wikipedia and Simple Wikipedia


11 Parallel Datasets (1) Manually-tagged question reformulations and question-answer pairs from the social Q&A site WikiAnswers (WA) – Question-Answer Pairs (WAQA) – Question Reformulations (WAQ) [URL]

12 Parallel Datasets (2) Glosses from WordNet, Wiktionary, Wikipedia and Simple Wikipedia Lexical Semantic Resources (LSR) – Glosses are aligned across resources at the word sense level to form parallel pairs Example:

13 Parallel Datasets Example: “moon” – WordNet (sense 1): The natural satellite of the Earth. – English Wiktionary: The Moon, the satellite of planet Earth. – English Wikipedia: The Moon (Latin: Luna) is Earth’s only natural satellite and the fifth largest natural satellite in the Solar System.

14 Parallel Datasets Three datasets: – Question-Answer Pairs (WAQA): 1,227,362 parallel pairs – Question Reformulations (WAQ): 4,379,620 parallel pairs – Lexical Semantic Resources (LSR): 397,136 pairs

15 Parallel Datasets Translation Model Training – Pre-processing steps – GIZA++ SMT Toolkit -> word-to-word translation probabilities – IBM translation model 1
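GIZA++ implements the IBM translation models; as an illustration of what IBM Model 1 estimates, here is a minimal EM sketch in Python. It is a simplified stand-in for the actual GIZA++ training (no NULL word, no smoothing, invented toy data), not the paper's pipeline:

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate word-to-word translation probabilities t(target | source)
    from (source_tokens, target_tokens) sentence pairs with IBM Model 1 EM."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    # Uniform initialisation over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(target, source)
        total = defaultdict(float)  # expected counts c(source)
        for src, tgt in pairs:
            for w in tgt:
                # How strongly each source word explains target word w.
                z = sum(t[(w, s)] for s in src)
                for s in src:
                    delta = t[(w, s)] / z
                    count[(w, s)] += delta
                    total[s] += delta
        for (w, s), c in count.items():  # M-step: renormalise
            t[(w, s)] = c / total[s]
    return t

# Toy Q&A pairs: answers as the source language, questions as the target,
# mirroring the training setup described on the Related Work slide.
pairs = [
    (["the", "moon", "orbits", "earth"], ["what", "orbits", "earth"]),
    (["the", "moon", "is", "a", "satellite"], ["what", "is", "the", "moon"]),
]
t = train_ibm_model1(pairs)
print(t[("moon", "satellite")])  # P(question word "moon" | answer word "satellite")
```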

16 Parallel Datasets Combination of the datasets – Lin: linear combination of the models after training – Pool: concatenating the corpora before training
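A minimal sketch of the two strategies; the dictionary representation and the equal interpolation weights are assumptions for illustration, not taken from the paper:

```python
def lin_combine(models, weights):
    """'Lin': linearly interpolate translation tables trained separately.
    models: dicts mapping (target_word, source_word) -> probability."""
    keys = set().union(*models)
    return {k: sum(w * m.get(k, 0.0) for m, w in zip(models, weights))
            for k in keys}

def pool_corpora(corpora):
    """'Pool': concatenate the parallel corpora; a single translation
    model is then trained on the pooled data (e.g. with GIZA++)."""
    return [pair for corpus in corpora for pair in corpus]

# E.g. combine a WAQA-trained and an LSR-trained table with equal weight.
combined = lin_combine(
    [{("moon", "satellite"): 0.4}, {("moon", "satellite"): 0.2}],
    [0.5, 0.5])
print(combined[("moon", "satellite")])  # 0.5*0.4 + 0.5*0.2 = 0.3
```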

17 Parallel Datasets


20 Introduction Related Work Parallel Datasets Semantic Relatedness Experiments Answer Finding Experiments Conclusion

21 Semantic Relatedness Experiments Goal: intrinsic evaluation of the word translation probabilities as a semantic relatedness measure – Compared against a concept vector based measure relying on Explicit Semantic Analysis (Gabrilovich and Markovitch, 2007) – Compared against traditional semantic relatedness measures
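As a rough sketch of what a concept vector based measure computes (not the paper's implementation): each word is mapped to a vector of association weights over Wikipedia concepts, and relatedness is the cosine of the two vectors. The vectors below are invented toy values:

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse concept vectors (dict: concept -> weight)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# In ESA, the weights would be TF-IDF scores of the word in each
# Wikipedia article ("concept"); these toy vectors are invented.
moon = {"Moon": 0.9, "Natural satellite": 0.7, "Earth": 0.4}
satellite = {"Natural satellite": 0.8, "Orbit": 0.5, "Earth": 0.3}
print(cosine(moon, satellite))
```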


23 Semantic Relatedness Experiments


25 Semantic Relatedness Experiments Test data set: 353 word pairs – Created by Finkelstein et al. (2002) – Fin1-153: 153 pairs – Fin2-200: 200 pairs

26 Semantic Relatedness Experiments Evaluation: Spearman’s rank correlation coefficient between model scores and human judgements (ranges from -1 to +1) [URL]
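For example, with SciPy (the scores below are invented toy values; in the experiments the model ranking is correlated with the human gold-standard ranking):

```python
from scipy.stats import spearmanr

# Human relatedness judgements and model scores for the same word pairs
# (toy values for illustration only).
human_scores = [7.5, 3.2, 9.0, 1.1]      # e.g. Finkelstein et al. gold standard
model_scores = [0.61, 0.12, 0.83, 0.05]  # e.g. word translation probabilities

rho, p_value = spearmanr(human_scores, model_scores)
print(rho)  # +1: identical rankings, 0: no correlation, -1: reversed rankings
```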


28 Introduction Related Work Parallel Datasets Semantic Relatedness Experiments Answer Finding Experiments Conclusion

29 Answer Finding Experiments Goal: provide an extrinsic evaluation of the translation probabilities by employing them in an answer finding task. A ranking function is used to perform retrieval.


32 Answer Finding Experiments Ranking function (β = 0.8, λ = 0.5): a mixture of a query likelihood model and a translation model
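The formula itself appeared only as an image on these slides. As a hedged reconstruction based on the translation-based language model of Xue et al. (2008) that the paper builds on, the ranking function has the following form, where q is the question, A a candidate answer, C the whole collection, P_ml a maximum-likelihood estimate, and P(w | t) the word-to-word translation probabilities trained with GIZA++:

```latex
P(q \mid A) = \prod_{w \in q} P(w \mid A)

P(w \mid A) = (1 - \lambda)\, P_{mx}(w \mid A) + \lambda\, P_{ml}(w \mid C)

P_{mx}(w \mid A) = (1 - \beta)\, P_{ml}(w \mid A)
                 + \beta \sum_{t \in A} P(w \mid t)\, P_{ml}(t \mid A)
```

Setting β = 0 removes the translation component and reduces the model to the query likelihood model, which matches the QLM baseline on slide 36; λ controls smoothing with the collection model.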

33 Answer Finding Experiments Testing data: Microsoft Research QA Corpus 1,364 questions, 9,780 answers 5 levels of relevance judgements: – 0: No Judgement Made – 1: Exact Answer – 3: Off Topic – 4: On Topic, Off Target – 5: Partial Answer


35 Answer Finding Experiments

36 Answer Finding Experiments Evaluation metrics: Mean Average Precision (MAP) and Mean R-Precision (R-Prec) Baselines: – Query likelihood model (QLM), i.e. β = 0 – Lucene
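A minimal sketch of the two metrics for a single question (MAP is the mean of the per-question average precision values; the toy ranking below is invented):

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of the precision values at each relevant hit."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(1 for doc in ranked[:r] if doc in relevant) / r if r else 0.0

ranked = ["a3", "a1", "a7", "a2"]  # system ranking (toy answer IDs)
relevant = {"a1", "a2"}            # gold-standard relevant answers
print(average_precision(ranked, relevant))  # (1/2 + 2/4) / 2 = 0.5
print(r_precision(ranked, relevant))        # 1 relevant in top 2 -> 0.5
```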

37 Answer Finding Experiments


45 Introduction Related Work Parallel Datasets Semantic Relatedness Experiments Answer Finding Experiments Conclusion

46 Conclusion – Proposed new kinds of datasets for training translation models – Provided the first intrinsic evaluation of word translation probabilities with respect to human relatedness rankings for reference word pairs – Presented models based on translation probabilities for answer finding

47 Thank you!

