Multilingual experiments of CLEF 2003
Eija Airio, Heikki Keskustalo, Turid Hedlund, Ari Pirkola
Department of Information Studies, University of Tampere, Finland

Multilingual indexing
There are two possibilities:
- create a common index for all the languages
- create a separate index for each language
UTA followed the approach of separate indexes (see the sketch below).
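A minimal sketch of the separate-index organisation, under stated assumptions: `build_index` and the index objects with a `search` method are hypothetical placeholders, not a real API.

```python
# Hypothetical sketch: one index per language instead of one common index.

LANGS = ["de", "fr", "it", "en", "es", "nl", "fi", "sv"]

def build_separate_indexes(collections, build_index):
    """Build one index per language collection (hypothetical builder)."""
    return {lang: build_index(collections[lang]) for lang in LANGS}

def run_topic(queries, indexes, k=1000):
    """Search every language index separately; the per-language
    result lists are merged afterwards (see the merging sketch below)."""
    return {lang: idx.search(queries[lang], k)
            for lang, idx in indexes.items()}
```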

Our result merging strategies in CLEF 2003
- the raw score approach, as a baseline
- the dataset-size-based method: take 185 German, 81 French, 99 Italian, 106 English, 285 Spanish, 120 Dutch, 35 Finnish and 89 Swedish documents (sum = 1000 docs)
- the score-difference-based method: every score is compared with the best score of the topic, and only documents whose score differs from the best by less than a predefined value are taken to the final list (e.g., with a difference value of 0.08, a document scoring within 0.08 of the topic's best score is taken, but one falling further below it is not); the final ordering (1000 docs / topic) is done by the raw score merging strategy
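The three strategies can be sketched as follows. The per-language quotas come from the slide; the data layout (per-language lists of `(doc_id, score)` pairs) and the reading of "best score of the topic" as the best score across all sub-lists are assumptions.

```python
# Sketch of the three merging strategies; `results` maps a language
# code to that language's ranked list of (doc_id, score) pairs.

QUOTAS = {"de": 185, "fr": 81, "it": 99, "en": 106,
          "es": 285, "nl": 120, "fi": 35, "sv": 89}  # sums to 1000

def raw_score_merge(results, k=1000):
    """Baseline: pool all lists and rank purely by retrieval score."""
    pooled = [hit for hits in results.values() for hit in hits]
    return sorted(pooled, key=lambda hit: hit[1], reverse=True)[:k]

def dataset_size_merge(results, quotas=QUOTAS):
    """Take a collection-size-based quota from each list; ordering the
    pooled 1000 documents by raw score is an assumption."""
    pooled = []
    for lang, hits in results.items():
        pooled.extend(hits[:quotas[lang]])
    return sorted(pooled, key=lambda hit: hit[1], reverse=True)

def score_difference_merge(results, delta=0.08, k=1000):
    """Keep only documents within `delta` of the topic's best score,
    then order the survivors by raw score."""
    pooled = [hit for hits in results.values() for hit in hits]
    best = max(score for _, score in pooled)
    survivors = [hit for hit in pooled if best - hit[1] <= delta]
    return sorted(survivors, key=lambda hit: hit[1], reverse=True)[:k]
```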

Indexing methods
- inflected index: dataset words are stored as such; employed by WWW search engines
- normalized index: words are normalized by stemming or morphological analysis
We applied normalized indexing in our CLEF 2003 runs.

Word normalization methods
- stemming: suitable for languages with weak morphology; several stemming techniques exist; in CLEF 2003 we applied mostly stemmers based on the Porter stemmer
- morphological analysis: a full description of inflectional morphology plus a large lexicon of basic vocabulary; suitable for languages with strong morphology
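As a rough illustration of the two styles: the stemming branch below uses NLTK's real Porter stemmer, while `morph_analyze` is a hypothetical stand-in for a lexicon-based morphological analyzer.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_normalize(token):
    # Suffix stripping: fast and lexicon-free, which is why it tends
    # to suffice for weakly inflected languages such as English.
    return stemmer.stem(token)

def morph_normalize(token, morph_analyze):
    # Lexicon-based analysis maps an inflected form to its base
    # form(s), which strongly inflected languages like Finnish need.
    lemmas = morph_analyze(token)  # hypothetical analyzer call
    return lemmas[0] if lemmas else token
```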

UTA indexes
UTA applied both stemmers and morphological analyzers in the multilingual runs of CLEF 2003:
- for English, Finnish and Swedish we built both stemmed and morphologically analyzed indexes
- for Dutch, French, German, Italian and Spanish we built stemmed indexes

The UTACLIR process
1. each source word is normalized using a morphological analyzer
2. source stop words are removed
3. each normalized source word is translated
4. translated words are normalized (by a morphological analyzer or a stemmer, depending on the target language code)
5. target stop words are removed
6. if a source word is untranslatable, the two highest-ranked words obtained by n-gram matching against the target index are selected as query words
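A schematic rendering of this word-by-word flow. All the component functions passed in (`analyze`, `translate`, `normalize_target`, `ngram_candidates`) are hypothetical placeholders for the real UTACLIR components.

```python
def utaclir_query(source_words, source_stop, target_stop,
                  analyze, translate, normalize_target, ngram_candidates):
    """Build a target-language query word by word (schematic sketch)."""
    query = []
    for word in source_words:
        lemma = analyze(word)                  # source-side normalization
        if lemma in source_stop:
            continue                           # drop source stop words
        translations = translate(lemma)
        if translations:
            for t in translations:
                t = normalize_target(t)        # stem or analyze, per target
                if t not in target_stop:       # drop target stop words
                    query.append(t)
        else:
            # Untranslatable word: fall back to the two highest-ranked
            # n-gram matches from the target index.
            query.extend(ngram_candidates(lemma, n_best=2))
    return query
```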

Our results

index type      merging strategy    average precis. %    difference %
morph./stem.    raw score           18.6
morph./stem.    dataset size
morph./stem.    score diff/top
morph./stem.    round robin
stemmed         dataset size
stemmed         score diff/top
stemmed         raw score
stemmed         round robin

The results of our additional monolingual English, bilingual English-Finnish and bilingual English-Swedish runs

language    index type        average precis. %    difference %
English     morphol. anal.
English     stemmed
Finnish     morphol. anal.
Finnish     stemmed
Swedish     morphol. anal.
Swedish     stemmed

Conclusions
- all the result merging strategies we applied produced almost equal results
- in the multilingual task, performance did not vary depending on the index type

Conclusions II
- the impact of different word normalization methods on IR performance has not been investigated properly
- our monolingual and bilingual tests show that stemming is an adequate normalization method for English, but not for Finnish or Swedish
- so far, morphological analysis seems to offer a hard baseline for competing methods (e.g., stemming) in Finnish and Swedish
- the reasons why stemming is inadequate for Finnish and Swedish may differ between the two languages and should be investigated