
“How much context do you need?” An experiment about context size in Interactive Cross-language Question Answering B. Navarro, L. Moreno-Monteagudo, E. Noguera, S. Vázquez, F. Llopis and A. Montoyo Natural Language Research Group (GPLSI) University of Alicante

Index 1. Introduction. 2. Question translation, disambiguation and expansion. 3. Baseline system: Passage Retrieval system (IR-n) improved with Named Entity Recognition. 4. Experimental system: A QA system based on syntactic-semantic similarity. 5. Results. 6. Conclusions and future work.

Introduction To decide whether an answer shown by an iQ&A system is correct, a user draws on three sources of information: the context in which the (possible) answer appears, (previous) knowledge about the topic, and the question itself. The context is the main source of information available to the user. Based on the information the context provides, the user decides whether the answer is correct, or whether the question needs to be refined.

Introduction Problem: The language of the context is different from the language of the query and the language of the user. The user must deal with a language of which he or she has null or only passive knowledge. There are two approaches to this problem: translate the possible answer and its context into the user's language with a Machine Translation system, or look for alternative methods of interaction. We are looking for alternative methods of interaction, avoiding the use of Machine Translation systems.

Objective of the experiment To determine the optimum context size in an interactive cross-language QA framework. The baseline system shows a complete passage: maximum context. It has been improved with a named entity recognition system. The experimental system shows only a clause: minimum context. It is a pilot version of a QA system based on syntactic-semantic similarity.

Objective of the experiment Secondary objectives: To assess the usefulness of a WSD system based on Relevant Domains applied to question disambiguation. To carry out a pilot evaluation of a QA system based on syntactic-semantic similarity (the experimental system).

Index 1. Introduction. 2. Question translation, disambiguation and expansion. 3. Baseline system: Passage Retrieval system (IR-n) improved with Named Entity Recognition. 4. Experimental system: A QA system based on syntactic-semantic similarity. 5. Results. 6. Conclusions and future work.

Question translation, disambiguation and expansion The users' mother tongue is Spanish. The questions are written in Spanish. The texts (answers) are written in English. Users have passive knowledge of English: they can understand some words/sentences in English, but they cannot correctly formulate a question in English.

Question translation The questions have been translated into English with three machine translation systems available on the web: Systran Babelfish, Reverso Soft., and Google. We have selected the words common to two or three of the translations. If there is no word common to the three translations, we have selected all the words obtained.
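The voting step above can be sketched as follows: keep any word proposed by at least two of the three MT systems, and fall back to the union of all words when there is no agreement. The example translations and function name are illustrative, not the authors' code.

```python
# Translation voting: keep words that at least two of the three MT
# systems agree on; if no word overlaps, keep everything.
from collections import Counter

def select_query_terms(translations):
    """translations: one token list per MT system; returns selected words."""
    # Count in how many translations each word appears (a set per system,
    # so repeats inside one translation count only once).
    votes = Counter(w for t in translations for w in set(t))
    common = [w for w, n in votes.items() if n >= 2]
    # Fallback: no agreement at all -> union of all words obtained.
    return common if common else sorted(votes)

# Invented example translations of one Spanish question.
systran = ["where", "is", "the", "capital", "of", "Spain"]
reverso = ["where", "is", "Spain", "capital"]
google  = ["where", "lies", "the", "Spanish", "capital"]
print(select_query_terms([systran, reverso, google]))
```

Words proposed by only one system ("of", "lies", "Spanish") are dropped; the rest form the translated query.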

Question disambiguation The goal is to obtain the correct sense of each selected word. WSD method: Relevant Domains, an unsupervised method. Relevant Domains are obtained from WordNet Domains (Magnini & Cavaglia 2000): the domains most frequently associated with a word. The system compares a context vector and a sense vector: Context vector: representative domains of the context words (in the question). Sense vector: domains related to each sense of a polysemous word (obtained from the glosses).
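The comparison between the context vector and the sense vectors can be sketched as below: each vector maps domains to weights, and the sense whose vector is most similar to the context vector wins. The domain labels and weights are invented for illustration; the real ones come from WordNet Domains, and the similarity function here is a plain cosine, used as a stand-in for the Relevant Domains comparison.

```python
# Sense selection by domain-vector similarity (minimal sketch).
import math

def cosine(u, v):
    """Cosine similarity between two sparse {domain: weight} vectors."""
    dot = sum(u.get(d, 0.0) * w for d, w in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def disambiguate(context_vector, sense_vectors):
    """sense_vectors: {sense_id: domain vector}; returns the best sense id."""
    return max(sense_vectors, key=lambda s: cosine(context_vector, sense_vectors[s]))

context = {"economy": 0.7, "banking": 0.5}           # from the question words
senses = {
    "bank#1": {"banking": 0.9, "economy": 0.4},      # financial institution
    "bank#2": {"geography": 0.8, "hydrology": 0.6},  # river bank
}
print(disambiguate(context, senses))  # -> bank#1
```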

Question expansion Once the correct sense of each word has been obtained, we expand each question with a list of synonyms. We have only one sense per word. The list of synonyms is obtained from the WordNet synset.
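The expansion step then reduces to a synset lookup per disambiguated word. The tiny synset table below is a hand-written stand-in for a real WordNet query; the function name is illustrative.

```python
# Question expansion: replace each word by the lemmas of its chosen synset.
SYNSETS = {  # sense id -> synonym lemmas (illustrative subset of WordNet)
    "car#1": ["car", "auto", "automobile", "machine", "motorcar"],
    "bank#1": ["bank", "banking_company", "depository_financial_institution"],
}

def expand_query(terms_with_senses):
    """terms_with_senses: {word: sense_id}; returns the expanded term list."""
    expanded = []
    for word, sense in terms_with_senses.items():
        # Unknown sense: keep the original word, expand nothing.
        expanded.extend(SYNSETS.get(sense, [word]))
    return expanded

print(expand_query({"car": "car#1"}))
```

Because exactly one sense per word survives disambiguation, the expansion stays small and on-topic instead of pulling in synonyms of every sense.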

Index 1. Introduction. 2. Question translation, disambiguation and expansion. 3. Baseline system: Passage Retrieval system (IR-n) improved with Named Entity Recognition. 4. Experimental system: A QA system based on syntactic-semantic similarity. 5. Results. 6. Conclusions and future work.

Baseline system The IR-n system: a passage retrieval system (see poster). The system shows users the passage (in English) containing a possible answer. If the correct answer does not appear in the first passage, the user checks the next one, up to 50 passages.

Baseline system + NE system (I) In order to improve the interaction, the baseline system uses DRAMNERI, a Named Entity Recognition system (Toral et al. 2005) based on rules and gazetteers. All entities in the passage matching the type of entity sought in the question are shown in a different color. Users can change the entity type if it is not correct. Question words that appear in the passages are shown in a different color too.

Baseline system + NE system (II)

Baseline system + synonyms (I) In order to improve the retrieval process, users can refine the question with the set of synonyms extracted during question disambiguation. In any case, if users want, they can see the whole document.

Baseline system + synonyms (II)

Index 1. Introduction. 2. Question translation, disambiguation and expansion. 3. Baseline system: Passage Retrieval system (IR-n) improved with Named Entity Recognition. 4. Experimental system: A QA system based on syntactic-semantic similarity. 5. Results. 6. Conclusions and future work.

Experimental system A pilot QA system based on syntactic-semantic similarity. The system shows users only the minimum context: a clause (in English) containing the possible answer, i.e. the set of words related to a verb in a sentence. If the correct answer does not appear in the first clause, the user checks the next one, up to 250 clauses.

Hypothesis There is a deep semantic relation between a question and its answer. The question is a clause (or more than one, if it is a complex question). The answer appears in a clause. Objective: to calculate the syntactic-semantic similarity between the question and the clauses in which a possible answer appears.

Syntactic-semantic patterns Both the question and the possible answers are formally represented as syntactic-semantic patterns. A syntactic-semantic pattern is the subcategorization frame of a verb: the verb: lemma + sense; arguments and adjuncts: head noun (lemma) and its sense(s). Patterns are extracted from the passages returned by IR-n. They are processed with a POS tagger (TreeTagger, Schmid 94) and a syntactic parser (SUPAR, Palomar et al. 99). Senses are taken from EuroWordNet (Vossen 98).

Process The QA system calculates the syntactic-semantic similarity between the question pattern and all possible answer patterns. The pattern with the highest syntactic-semantic similarity to the question represents the clause with the correct answer.

Process. Step 1 First, a proper-noun filter is applied. Hypothesis: if a proper noun appears in the question, it must appear in the answer. Users need this information to decide whether the clause is correct. At least one proper noun from the question must appear in the answer.
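The filter above can be sketched as a simple set intersection: a candidate clause survives only if it shares at least one proper noun with the question (clauses are always kept when the question has no proper noun at all). The function name and example names are illustrative.

```python
# Proper-noun filter over candidate answer clauses.
def passes_filter(question_pns, clause_pns):
    """question_pns, clause_pns: sets of proper nouns; True = keep clause."""
    if not question_pns:
        # No proper noun in the question: the filter does not apply.
        return True
    # At least one proper noun of the question must appear in the clause.
    return bool(question_pns & clause_pns)

print(passes_filter({"Nobel", "Einstein"}, {"Einstein", "Zurich"}))  # True
print(passes_filter({"Nobel"}, {"Zurich"}))                          # False
```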

Process. A syntactic-semantic similarity measure is applied: Sim(Pq, Pa) = 2 · Sim(Vpq, Vpa) + (NumAqa + NumPNqa) / 2 where: Sim(Vpq, Vpa) is the semantic similarity between the two verbs, based on the semantic similarity of (Lin 98). (NumAqa + NumPNqa) is the number of matching arguments: equal head-noun lemmas and equal proper names.
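A direct reading of this measure, on a minimal pattern representation (verb plus sets of argument head lemmas and proper nouns), looks like the sketch below. The verb-similarity function is a toy stand-in for Lin's (1998) measure over EuroWordNet, and all pattern data is invented for illustration.

```python
# The slide's similarity measure on minimal pattern dicts.
def pattern_similarity(q, a, verb_sim):
    """q, a: {'verb': str, 'args': set of head lemmas, 'pns': set of PNs}."""
    num_args = len(q["args"] & a["args"])  # equal head-noun lemmas
    num_pns = len(q["pns"] & a["pns"])     # equal proper names
    # Sim(Pq, Pa) = 2 * Sim(Vpq, Vpa) + (NumAqa + NumPNqa) / 2
    return 2 * verb_sim(q["verb"], a["verb"]) + (num_args + num_pns) / 2

question = {"verb": "win", "args": {"prize"}, "pns": {"Nobel"}}
answer = {"verb": "receive", "args": {"prize", "year"}, "pns": {"Nobel"}}

# Stand-in for the Lin (1998) verb similarity.
toy_sim = lambda v1, v2: 0.8 if {v1, v2} == {"win", "receive"} else 0.0

print(pattern_similarity(question, answer, toy_sim))  # 2*0.8 + (1+1)/2 = 2.6
```

The answer clauses can then simply be sorted by this score, highest first, before being shown to the user.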

Interaction The selected clauses are shown to the user in order, from the most similar to the least. The user must select the clause with the correct answer.

Interaction

Index 1. Introduction. 2. Question translation, disambiguation and expansion. 3. Baseline system: Passage Retrieval system (IR-n) improved with Named Entity Recognition. 4. Experimental system: A QA system based on syntactic-semantic similarity. 5. Results. 6. Conclusions and future work.

General results

User by user (lenient)

User by user (strict)

Time consuming

NE and Synonyms All users said that the named-entity information was useful for locating the correct answer. However, users did not use the synonyms or the query expansion.

Index 1. Introduction. 2. Question translation, disambiguation and expansion. 3. Baseline system: Passage Retrieval system (IR-n) improved with Named Entity Recognition. 4. Experimental system: A QA system based on syntactic-semantic similarity. 5. Results. 6. Conclusions and future work.

Conclusions It is difficult to establish a fixed context size for optimum interaction in iQ&A. In general, a wide context is better. However, for users with poor knowledge of the language of the answer, it is more useful and faster to interact with a short context.

Future work To improve the pattern extraction. To refine the syntactic-semantic similarity measure. To apply a semantic parser (semantic roles) in order to detect the correct answer.