
Evaluating Cross-language Information Retrieval Systems
Carol Peters, IEI-CNR
SPINN Seminar, Copenhagen, October 2001

Outline
• Why IR system evaluation is important
• Evaluation programs
• An example

What is an IR System Evaluation Campaign?
• An activity which tests the performance of different systems on a given task (or set of tasks) under standard conditions
• Permits contrastive analysis of approaches and technologies

How well does a system meet the information need?
• System evaluation: how good are the document rankings?
• User-based evaluation: how satisfied is the user?

Why we need Evaluation
• evaluation permits hypotheses to be validated and progress to be assessed
• evaluation helps to identify areas where more R&D is needed
• evaluation saves developers time and money
CLIR systems are still at an experimental stage: evaluation is particularly important!

CLIR System Evaluation is Complex
CLIR systems integrate many components and technologies:
• need to evaluate single components
• need to evaluate overall system performance
• need to distinguish methodological aspects from linguistic knowledge

Technology vs. Usage Evaluation
Usage evaluation:
• shows the value of a technology for the user
• determines the technology thresholds that are indispensable for specific usage
• provides directions for the choice of criteria for technology evaluation
The influence of language and culture on the usability of technology needs to be understood.

Organising an Evaluation Activity
• select control task(s)
• provide data to test and tune systems
• define the protocol and metrics to be used in assessing results
The aim is an objective comparison between systems and approaches.

Test Collection
• Set of documents: must be representative of the task of interest; must be large
• Set of "topics": statements of user needs from which the system's data structure (the query) is extracted
• Relevance judgments: judgments vary by assessor, but there is no evidence that these differences affect the comparative evaluation of systems
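To make the three components concrete, here is a minimal sketch of how a test collection might be represented in code (Python; the class names, fields and document identifiers are illustrative assumptions, not CLEF's actual file formats):

from dataclasses import dataclass
from typing import Dict, Set

@dataclass
class Topic:
    topic_id: str
    title: str
    description: str
    narrative: str

@dataclass
class Document:
    doc_id: str
    language: str
    text: str

# Relevance judgments ("qrels"): topic id -> ids of the documents judged relevant.
# The identifiers below are made up purely for illustration.
qrels: Dict[str, Set[str]] = {
    "C041": {"DOC-DE-000123", "DOC-IT-004567"},
}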

Using Pooling to Create Large Test Collections
1. Assessors create topics.
2. A variety of different systems retrieve the top 1000 documents for each topic.
3. Pools of unique documents are formed from all submissions, which the assessors judge for relevance.
4. Systems are evaluated using the relevance judgments.
(Ellen Voorhees, CLEF 2001 Workshop)
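A minimal sketch of the pooling step itself, assuming each run is simply a mapping from topic id to a ranked list of document ids (names and structure are illustrative):

from typing import Dict, List, Set

def build_pools(runs: List[Dict[str, List[str]]], depth: int = 1000) -> Dict[str, Set[str]]:
    """Merge the top-`depth` documents of every submitted run into one
    pool of unique document ids per topic, ready for manual assessment."""
    pools: Dict[str, Set[str]] = {}
    for run in runs:                      # each run: topic_id -> ranked doc ids
        for topic_id, ranking in run.items():
            pools.setdefault(topic_id, set()).update(ranking[:depth])
    return pools

Only the documents that end up in these pools are judged; everything outside the pool is treated as non-relevant at evaluation time.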

Cross-language Test Collections
Consistency is harder to obtain than for monolingual collections:
• parallel or comparable document collections
• multiple assessors per topic creation and relevance assessment (for each language)
• must take care when comparing different language evaluations (e.g., a cross-language run against a monolingual baseline)
Pooling is harder to coordinate:
• need large, diverse pools for all languages
• retrieval results are not balanced across languages
(Taken from Ellen Voorhees, CLEF 2001 Workshop)

Evaluation Measures
• Recall: measures the ability of the system to find all relevant items
  recall = (no. of relevant items retrieved) / (no. of relevant items in collection)
• Precision: measures the ability of the system to find only relevant items
  precision = (no. of relevant items retrieved) / (total no. of items retrieved)
A recall-precision graph is used to compare systems.
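Both measures can be computed directly from a ranked result list and the relevance judgments; the sketch below (plain Python, illustrative only) also produces the (recall, precision) points used to draw a recall-precision graph:

from typing import List, Set

def precision_recall(ranking: List[str], relevant: Set[str]):
    """Set-based precision and recall for one topic:
    ranking  - retrieved document ids, best first
    relevant - the relevant document ids from the qrels."""
    retrieved_relevant = sum(1 for doc_id in ranking if doc_id in relevant)
    precision = retrieved_relevant / len(ranking) if ranking else 0.0
    recall = retrieved_relevant / len(relevant) if relevant else 0.0
    return precision, recall

def recall_precision_points(ranking: List[str], relevant: Set[str]):
    """Points for a recall-precision graph: precision measured each time
    another relevant document is found in the ranking."""
    points, found = [], 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            found += 1
            points.append((found / len(relevant), found / rank))
    return points  # list of (recall, precision) pairs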

Main CLIR Evaluation Programs
• TIDES: sponsors TREC (Text REtrieval Conference) and TDT (Topic Detection and Tracking); Chinese-English tracks in 2000; TREC focusing on English/French to Arabic in 2001
• NTCIR: National Institute of Informatics, Tokyo; Chinese-English and Japanese-English cross-language tracks
• AMARYLLIS: focused on French; the campaign included a cross-language track; 3rd campaign begins Sept. 2001
• CLEF: Cross-Language Evaluation Forum; cross-language evaluation for European languages

Cross-Language Evaluation Forum
• Funded by the DELOS Network of Excellence for Digital Libraries and the US National Institute of Standards and Technology
• Extension of the CLIR track at TREC
• Coordination is distributed: national sites for each language in the multilingual collection

CLEF Partners
• Eurospider, Zurich, Switzerland (Peter Schäuble, Martin Braschler)
• IEEC-UNED, Madrid, Spain (Felisa Verdejo, Julio Gonzalo)
• IEI-CNR, Pisa, Italy (Carol Peters)
• IZ Sozialwissenschaften, Bonn, Germany (Michael Kluck)
• NIST, Gaithersburg MD, USA (Donna Harman, Ellen Voorhees)
• University of Hildesheim, Germany (Christa Womser-Hacker)
• University of Twente, The Netherlands (Djoerd Hiemstra)

CLEF: Main Goals
Promote research by providing an appropriate infrastructure for:
• CLIR system evaluation, testing and tuning
• comparison and discussion of results
• building of test suites for system developers

CLEF 2001 Task Description
Four main evaluation tracks in CLEF 2001:
• multilingual information retrieval
• bilingual IR
• monolingual (non-English) IR
• domain-specific IR
plus an experimental track for interactive cross-language systems

CLEF 2001 Data Collection
• Multilingual comparable corpus of news agency and newspaper documents for six languages (DE, EN, FR, IT, NL, SP); nearly 1 million documents
• Common set of 50 topics (from which queries are extracted) created in 9 European languages (DE, EN, FR, IT, NL, SP + FI, RU, SV) and 3 Asian languages (JP, TH, ZH)

CLEF 2001 Creating the Queries
Example topic:
• Title: European Industry
• Description: What factors damage the competitiveness of European industry on the world's markets?
• Narrative: Relevant documents discuss factors that render European industry and manufactured goods less competitive with respect to the rest of the world, e.g. North America or Asia. Relevant documents must report data for Europe as a whole rather than for single European nations.
Queries are extracted from topics, using one or more fields.
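As an illustration of field-based query extraction, a small sketch (the dictionary keys and the regex tokenization are assumptions for the example, not the CLEF topic format):

import re

def extract_query(topic: dict, fields=("title", "description")) -> list:
    """Build a simple bag-of-words query from the chosen topic fields
    (title only, title + description, or all three fields)."""
    text = " ".join(topic[f] for f in fields)
    return [t.lower() for t in re.findall(r"\w+", text)]

example_topic = {
    "title": "European Industry",
    "description": "What factors damage the competitiveness of European industry "
                   "on the world's markets?",
    "narrative": "Relevant documents discuss factors that render European industry "
                 "less competitive...",
}
print(extract_query(example_topic))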

CLEF 2001 Creating the Queries
• Distributed activity (Bonn, Gaithersburg, Pisa, Hildesheim, Twente, Madrid)
• Each group produced candidate topics: 1/3 local, 1/3 European, 1/3 international
• Topic selection at a meeting in Pisa (50 topics)
• Topics were created in DE, EN, FR, IT, NL, SP and additionally translated into SV, RU, FI and TH, JP, ZH
• Cleanup after topic translation

CLEF 2001 Multilingual IR (diagram): topics in any of the topic languages (DE, EN, FR, IT, NL, SP, FI, SV, RU, ZH, JP, TH) are submitted to the participant's cross-language information retrieval system, which searches the English, German, French, Italian and Spanish document collections and produces one result list of DE, EN, FR, IT and SP documents ranked in decreasing order of estimated relevance.

CLEF 2001 Bilingual IR
Task: query the English or Dutch target document collections
Goal: retrieve documents in the target language, presenting the results as a ranked list
An easier task for beginners!

CLEF 2001 Monolingual IR
Task: querying document collections in FR, DE, IT, NL or SP
Goal: acquire a better understanding of language-dependent retrieval problems
• different languages present different retrieval problems
• issues include word order, morphology, diacritic characters, language variants (diacritic handling is sketched below)
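As one concrete example of a language-dependent issue, a sketch of diacritic folding; whether accents should be stripped at all is itself a per-language design decision, so this is purely illustrative:

import unicodedata

def strip_diacritics(text: str) -> str:
    """Fold accented characters to their base form (é -> e, ü -> u) so that
    query and document spellings with and without diacritics can match."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("compétitivité européenne"))   # 'competitivite europeenne'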

CLEF 2001 Domain-Specific IR
Task: querying a structured database from a vertical domain (social sciences) in German
• German/English/Russian thesaurus and English translations of document titles
• monolingual or cross-language task
Goal: understand the implications of querying in a domain-specific context

CLEF 2001 Interactive Cross-Language Track
Task: interactive document selection in an "unknown" target language
Goal: evaluation of results presentation rather than system performance

CLEF 2001: Participation
34 participants from 15 different countries (North America, Asia, Europe)

Details of Experiments
Track              # Participants   # Runs/Experiments
Multilingual              8                26
Bilingual to EN          19                61
Bilingual to NL           3                 3
Monolingual DE           12                25
Monolingual ES           10                22
Monolingual FR            9                18
Monolingual IT            8                14
Monolingual NL            9                19
Domain-specific           1                 4
Interactive               3                 6

Runs per Topic Language (chart)

Topic Fields (chart)

CLEF 2001 Participation
CMU, Eidetica, Eurospider*, Greenwich U, HKUST, Hummingbird, IAI*, IRIT*, ITC-irst*, JHU-APL*, Kasetsart U, KCSL Inc., Medialab, Nara Inst. of Tech., National Taiwan U, OCE Tech. BV, SICS/Conexor, SINAI/U Jaen, Thomson Legal*, TNO TPD*, U Alicante, U Amsterdam, U Exeter, U Glasgow*, U Maryland* (interactive only), U Montreal/RALI*, U Neuchâtel, U Salamanca*, U Sheffield* (interactive only), U Tampere*, U Twente (*), UC Berkeley (2 groups)*, UNED (interactive only)
(* = also participated in 2000)

CLEF 2001 Approaches
All traditional approaches were used:
• commercial MT systems (Systran, Babelfish, Globalink Power Translator, ...)
• both query and document translation were tried
• bilingual dictionary look-up (on-line and in-house tools; sketched below)
• aligned parallel corpora (web-derived)
• comparable corpora (similarity thesaurus)
• conceptual networks (EuroWordNet, ZH-EN wordnet)
• multilingual thesaurus (domain-specific task)
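To make the dictionary look-up approach concrete, a minimal sketch of dictionary-based query translation (the toy dictionary and the fallback of keeping untranslated terms are illustrative assumptions, not any particular group's method):

# Toy bilingual dictionary: source term -> list of candidate target translations.
EN_TO_IT = {
    "european": ["europeo", "europea"],
    "industry": ["industria"],
    "competitiveness": ["competitività"],
}

def translate_query(terms, dictionary):
    """Replace each source term by all its dictionary translations;
    terms not found in the dictionary (e.g. proper names) are kept
    untranslated, a common fallback."""
    translated = []
    for term in terms:
        translated.extend(dictionary.get(term, [term]))
    return translated

print(translate_query(["european", "industry", "exports"], EN_TO_IT))
# ['europeo', 'europea', 'industria', 'exports']

Keeping all candidate translations, as here, inflates the query with translation ambiguity; disambiguation and term weighting are exactly the kinds of techniques the campaign compares.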

CLEF 2001 Techniques Tested
Text processing for multiple languages:
• Porter stemmer, Inxight commercial stemmer, on-site tools
  - simple generic "quick & dirty" stemming
  - language-independent stemming
• separate stopword lists vs. a single list
• morphological analysis
• n-gram indexing, word segmentation, decompounding (e.g. Chinese, German); n-gram indexing is sketched below
• use of NLP methods, e.g. phrase identification, morphosyntactic analysis
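A small sketch of language-independent character n-gram indexing, one of the techniques listed above; the choice of n = 4 and the simple whitespace tokenization are assumptions for illustration:

def char_ngrams(text: str, n: int = 4) -> list:
    """Index terms as overlapping character n-grams instead of stemmed words;
    useful when no stemmer or word segmenter exists for a language,
    and a cheap way of matching parts of German compounds."""
    grams = []
    for token in text.lower().split():
        padded = f"_{token}_"              # mark word boundaries
        if len(padded) <= n:
            grams.append(padded)
        else:
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

print(char_ngrams("Industrie Wettbewerbsfähigkeit"))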

CLEF 2001 Techniques Tested
Cross-language strategies included:
• integration of methods (MT, corpora and MRDs)
• pivot language to translate from L1 to L2 (DE to FR, SP, IT via EN)
• n-gram based techniques to match untranslatable words
• pre- and post-translation pseudo-relevance feedback (query expanded with frequently co-occurring terms; sketched below)
• vector-based semantic analysis (query expanded with semantically similar terms)
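A rough sketch of the pseudo-relevance feedback idea: treat the top-ranked documents as if they were relevant and expand the query with their most frequent terms (real systems used more elaborate term weighting; this is a simplification):

from collections import Counter

def expand_query(query_terms, top_docs, k_terms=5):
    """Pseudo-relevance feedback: assume the top-ranked documents are relevant,
    pick the terms that occur in most of them (excluding the original query
    terms), and add them to the query. `top_docs` is a list of token lists."""
    counts = Counter()
    for doc in top_docs:
        counts.update(set(doc))            # document frequency of each term
    expansion = [t for t, _ in counts.most_common()
                 if t not in query_terms][:k_terms]
    return list(query_terms) + expansion

Applied before translation it enriches the source-language query; applied after translation it helps recover terms lost or distorted in translation.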

CLEF 2001 Techniques Tested
• Different strategies were tried for merging the results retrieved from the separate language collections
• This still remains an unsolved problem
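One of the merging strategies that can be tried is score normalization per language before interleaving; the sketch below uses min-max normalization and is only one possible approach (round-robin and raw-score merging are alternatives), not the method of any particular CLEF participant:

def merge_by_normalized_score(result_lists):
    """Merge per-language result lists into one ranking.
    Each list is [(doc_id, score), ...]; scores are min-max normalized
    within each list so that different engines and collections become
    roughly comparable before the lists are interleaved."""
    merged = []
    for results in result_lists:
        if not results:
            continue
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        merged.extend((doc_id, (s - lo) / span) for doc_id, s in results)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)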

CLEF 2001 Workshop
• Results of the CLEF 2001 campaign were presented at the workshop, 3-4 September 2001, Darmstadt, Germany
• 50 researchers and system developers from academia and industry participated
• Working Notes containing preliminary reports and statistics on the CLEF 2001 experiments were distributed

CLEF 2001 vs. CLEF 2000
• Most participants were back
• Less MT, more corpus-based translation
• People really started to try each other's ideas and methods: corpus-based approaches (parallel web texts, alignments), n-grams, combination approaches

"Effect" of CLEF
• Many more European groups
• Dramatic increase in work on stemming/decompounding (for languages other than English)
• Work on mining the web for parallel texts
• Work on merging (breakthrough still missing?)
• Work on combination approaches

CLEF 2002
Accompanying Measure under the IST programme (Contract No. IST-...), October 2001
CLEF Consortium: IEI-CNR, Pisa; ELRA/ELDA, Paris; Eurospider, Zurich; UNED, Madrid; NIST, USA; IZ Sozialwissenschaften, Bonn
Associated members: University of Hildesheim, University of Twente, University of Tampere (?)

CLEF 2002 Task Description
Similar to CLEF 2001:
• multilingual information retrieval
• bilingual IR (not to English!)
• monolingual (non-English) IR
• domain-specific IR
• interactive track
Plus a feasibility study for a spoken document track (within DELOS; results reported at CLEF)
Possible coordination with AMARYLLIS

CLEF 2002 Schedule
• Call for participation: November 2001
• Document release: 1 February 2002
• Topic release: 1 April 2002
• Runs received: 15 June 2002
• Results communicated: 1 August 2002
• Papers for Working Notes: 1 September 2002
• Workshop: September 2002

Evaluation: Summing Up
• system evaluation is not a competition to find the best system
• evaluation provides an opportunity to test, tune, and compare approaches in order to improve system performance
• an evaluation campaign creates a community interested in examining the same issues and comparing ideas and experiences

Cross-Language Evaluation Forum
For further information, see the CLEF web site or contact: Carol Peters, IEI-CNR