Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding Delphine Bernhard and Iryna Gurevych Ubiquitous.

Presentation transcript:

Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding Delphine Bernhard and Iryna Gurevych Ubiquitous Knowledge Processing (UKP) Lab Computer Science Department Technische Universität Darmstadt, Hochschulstraße 10 D Darmstadt, Germany ACL 2009 Reporter: Kan-Wen Tien Date:

Outline: Introduction, Related Work, Parallel Datasets, Semantic Relatedness Experiments, Answer Finding Experiments, Conclusion

Introduction Lexical gap between queries and documents, or between questions and answers. Several solutions: – Query reformulation, query paraphrasing – Query expansion – Semantic information retrieval

Introduction Several solutions: – Integrate monolingual statistical translation models in the retrieval process (1999) Drawback: limited availability of truly parallel monolingual corpora. Training data often consist of question-answer pairs, usually extracted from the evaluation corpus itself

Related Work Statistical translation models for retrieval Built synthetic training data Train translation models on Q&A pairs – Answers -> source language – Questions -> target language Select the most important terms to build compact translation models

Parallel Datasets Different data resources: (1) Manually-tagged question reformulations and question-answer pairs from the WikiAnswers social Q&A site (2) Glosses from WordNet, Wiktionary, Wikipedia and Simple Wikipedia

Parallel Datasets (1) Manually-tagged question reformulations and question-answer pairs From social Q&A sites: WikiAnswers (WA) – Question-Answer Pairs (WAQA) – Question Reformulations (WAQ) [URL]

Parallel Datasets (2) Glosses from WordNet, Wiktionary, Wikipedia and Simple Wikipedia Lexical Semantic Resources (LSR) – Word sense alignment

Parallel Datasets Example: “moon” – WordNet (sense 1): The natural satellite of the Earth. – English Wiktionary: The Moon, the satellite of planet Earth. – English Wikipedia: The Moon (Latin: Luna) is Earth’s only natural satellite and the fifth largest natural satellite in the Solar System.

Parallel Datasets Three datasets: – Question-Answer Pairs (WAQA): 1,227,362 parallel pairs – Question Reformulations (WAQ): 4,379,620 parallel pairs – Lexical Semantic Resources (LSR): 397,136 pairs

Parallel Datasets Translation Model Training – Pre-processing steps – GIZA++ SMT Toolkit -> word-to-word translation probabilities – IBM translation model 1
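The training step can be illustrated with a toy version of IBM Model 1. This is a minimal sketch in plain Python, not GIZA++ and not the authors' pipeline: the function name `train_ibm_model1`, the toy pairs, and the omission of the NULL source word are all assumptions made for illustration. It estimates word-to-word translation probabilities P(target word | source word) by expectation-maximization over parallel pairs.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate P(target_word | source_word) with EM (IBM Model 1, no NULL word).

    pairs: list of (source_tokens, target_tokens) tuples.
    Returns a dict mapping (target_word, source_word) -> probability.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts for (target, source)
        total = defaultdict(float)  # expected counts for each source word
        for src, tgt in pairs:
            for w in tgt:
                # E-step: distribute one unit of mass for w over the source words.
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize the expected counts per source word.
        t = defaultdict(float, {(w, s): c / total[s] for (w, s), c in count.items()})
    return dict(t)


if __name__ == "__main__":
    # Hypothetical toy pairs: (answer or gloss words, question words).
    toy_pairs = [
        (["moon", "orbit"], ["satellite", "path"]),
        (["moon", "light"], ["satellite", "glow"]),
        (["sun", "light"], ["star", "glow"]),
    ]
    probs = train_ibm_model1(toy_pairs)
    for (target, source), p in sorted(probs.items(), key=lambda kv: -kv[1])[:5]:
        print(f"P({target} | {source}) = {p:.3f}")
```

Because "satellite" co-occurs with "moon" in two pairs while "path" does so in only one, the estimated P(satellite | moon) ends up larger than P(path | moon).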

Parallel Datasets Combination of the datasets – Lin (combination of models after training) – Pool (concatenating the corpora before training)
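The two combination strategies can be sketched as follows. This is an illustrative sketch only: the function names `combine_linear` and `pool_corpora` are hypothetical, and the equal default weights in the Lin variant are an assumption, since the slide does not state how the models are weighted.

```python
def combine_linear(models, weights=None):
    """Lin: combine already-trained translation tables by weighted sum.

    models: list of dicts mapping (target_word, source_word) -> probability.
    weights: one weight per model; equal weights are assumed if omitted.
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    combined = {}
    for model, weight in zip(models, weights):
        for pair, prob in model.items():
            combined[pair] = combined.get(pair, 0.0) + weight * prob
    return combined


def pool_corpora(*corpora):
    """Pool: concatenate parallel corpora so one model is trained on all pairs."""
    pooled = []
    for corpus in corpora:
        pooled.extend(corpus)
    return pooled


if __name__ == "__main__":
    waqa_model = {("satellite", "moon"): 1.0}
    lsr_model = {("satellite", "moon"): 0.5, ("luna", "moon"): 0.5}
    print(combine_linear([waqa_model, lsr_model]))
    print(pool_corpora([("q1", "a1")], [("gloss1", "gloss2")]))
```

With Lin, each dataset keeps its own trained model and the probabilities are mixed afterwards; with Pool, larger datasets contribute more training pairs and can therefore dominate the single resulting model.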

Semantic Relatedness Experiments Goal: compare word translation probabilities with traditional semantic relatedness measures – Concept-vector-based measure relying on Explicit Semantic Analysis (Gabrilovich and Markovitch, 2007)

Semantic Relatedness Experiments Testing data set: 353 word-to-word pairs – Created by Finkelstein et al. (2002) – Fin1-153: 153 pairs – Fin2-200: 200 pairs

Semantic Relatedness Experiments Evaluation: Spearman’s rank correlation coefficient, which ranges from -1 (opposite rankings) through 0 (no correlation) to +1 (identical rankings) [URL]
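Spearman's coefficient is the Pearson correlation computed on ranks rather than raw scores. The following is a minimal pure-Python sketch; the helper names `average_ranks` and `spearman` and the sample scores are illustrative assumptions, not data from the experiments. Ties receive the average of the ranks they span.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical human relatedness ratings vs. model scores for three word pairs.
    human = [7.5, 3.1, 9.0]
    model = [0.20, 0.25, 0.60]
    print(spearman(human, model))
```

Only the ordering of the scores matters, which is why translation probabilities and human ratings on different scales can be compared directly.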

Answer Finding Experiments Goal: provide an extrinsic evaluation of the translation probabilities by employing them in an answer finding task. Using a ranking function to perform retrieval

Answer Finding Experiments Ranking function (β = 0.8, λ = 0.5), combining a query likelihood model and a translation model
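The formula itself is not preserved in this transcript. The sketch below assumes one common form of a translation-based language model for Q&A retrieval: for each question word, a maximum-likelihood estimate from the answer is mixed with a translation estimate using β, and the mixture is smoothed against collection statistics using λ. The function name `score_answer`, its parameters, and the assignment of β and λ to these two roles are assumptions; the assignment is at least consistent with the slides' statement that β = 0 yields the plain query likelihood baseline.

```python
import math
from collections import Counter


def score_answer(question, answer, collection_counts, collection_size,
                 trans_prob, beta=0.8, lam=0.5):
    """Return log P(question | answer) under the assumed mixture model.

    question, answer: lists of tokens.
    collection_counts: dict word -> frequency in the whole answer collection.
    collection_size: total number of tokens in the collection.
    trans_prob: dict (question_word, answer_word) -> P(question_word | answer_word).
    """
    counts = Counter(answer)
    length = len(answer) or 1  # avoid division by zero for empty answers
    log_p = 0.0
    for w in question:
        # Query likelihood component: maximum-likelihood estimate from the answer.
        p_ml = counts[w] / length
        # Translation component: sum over answer words t of P(w | t) * P_ml(t | answer).
        p_tr = sum(trans_prob.get((w, t), 0.0) * c / length for t, c in counts.items())
        p_mix = (1 - beta) * p_ml + beta * p_tr
        # Smooth with the collection language model.
        p_coll = collection_counts.get(w, 0) / collection_size
        p = (1 - lam) * p_mix + lam * p_coll
        log_p += math.log(p) if p > 0 else float("-inf")
    return log_p


if __name__ == "__main__":
    trans = {("moon", "satellite"): 0.4}
    coll = {"moon": 1, "satellite": 2, "car": 3}
    on_topic = score_answer(["moon"], ["satellite", "earth"], coll, 10, trans)
    off_topic = score_answer(["moon"], ["car", "engine"], coll, 10, trans)
    print(on_topic > off_topic)
```

In the toy call, neither answer contains the word "moon", so a pure query likelihood model scores them identically; the translation component is what lets the answer mentioning "satellite" rank higher.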

Answer Finding Experiments Testing data: Microsoft Research QA Corpus – 1,364 questions, 9,780 answers – 5 levels of relevance judgement: 0: No Judgement Made; 1: Exact Answer; 3: Off Topic; 4: On Topic, Off Target; 5: Partial Answer

Answer Finding Experiments Evaluation measures: – Mean Average Precision (MAP) – Mean R-Precision (R-Prec) Baselines: – Query likelihood model (QLM), i.e. β = 0 – Lucene
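Both evaluation measures can be computed from a ranked answer list and the set of relevant answers for each question. This is a minimal sketch with hypothetical function names and toy data; which relevance levels of the corpus count as relevant is not specified here and is left to the caller.

```python
def average_precision(ranked, relevant):
    """Average of precision values at the ranks where relevant items appear."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def r_precision(ranked, relevant):
    """Precision within the top R results, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r


def mean_over_queries(metric, runs):
    """Average a per-question metric over (ranked, relevant) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "b", "c", "d"], {"a", "c"}),
        (["x", "y", "z"], {"y"}),
    ]
    print("MAP:", mean_over_queries(average_precision, runs))
    print("Mean R-Prec:", mean_over_queries(r_precision, runs))
```

MAP rewards placing every relevant answer early in the list, while R-Precision only checks how many of the top R positions are filled with relevant answers.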

Conclusion – Propose new kinds of datasets for training – Provide the first intrinsic evaluation of word translation probabilities with respect to human relatedness rankings for reference word pairs – Models based on translation probabilities for answer finding

Thank you!