Cross-Language Retrieval INST 734 Module 11 Doug Oard.

Presentation transcript:

Cross-Language Retrieval INST 734 Module 11 Doug Oard

Agenda
→ CLIR
Dictionary-Based CLIR
Corpus-Based CLIR
Interactive CLIR

[Figures not recovered. Sources: Ethnologue (1999); International Monetary Fund (2014)]

Multilingual Information Access
Multilingual document
– Document containing more than one language
Multilingual collection
– Collection of documents in different languages
Multilingual IR system
– Can retrieve from a multilingual collection
Cross-language IR (CLIR) system
– Query in one language finds document in another

Who Needs Cross-Language IR?
Polyglots: users who can read more than one language
– Convenience: build a good query just once
– Capability: query in most fluent language
Monolingual users
– If translations can be provided
– If text is used to search for images, music, …
– If it suffices to know that a document exists

One Approach: Multilingual Thesaurus
Build a cross-cultural knowledge structure
– Build it from scratch
– Translate an existing thesaurus
– Merge monolingual thesauri
Assign descriptors to each content item
– By design, descriptors are “interlingual”
Create “lead-in vocabulary” in each language

Another Approach: Free-Text CLIR
[Diagram; box labels: Chinese Query, Language Identification, English Term Selection, Chinese Term Selection, Cross-Language Retrieval, Monolingual Chinese Retrieval; output: a scored ranked list (values lost in extraction, one score reads 0.48)]

Evidence for Language Identification
Metadata
– Included in HTTP and HTML
Word-scale features
– Which stopword list gets the most hits?
Subword features
– Character n-gram statistics
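
The word-scale and subword evidence above can be sketched in a few lines of Python. This is only an illustration, not the method used in the course: the tiny stopword lists, the function names, and the simple trigram-overlap score are all assumptions chosen to keep the example short. Real systems use far larger stopword lists and properly trained n-gram models.

```python
from collections import Counter

# Tiny illustrative stopword lists (assumption: real lists are much longer).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "zu"},
    "es": {"el", "la", "y", "de", "que", "en", "los"},
}


def identify_by_stopwords(text):
    """Word-scale evidence: pick the language whose stopword list gets the most hits."""
    tokens = text.lower().split()
    hits = {lang: sum(tok in words for tok in tokens)
            for lang, words in STOPWORDS.items()}
    return max(hits, key=hits.get)


def char_ngrams(text, n=3):
    """Count character n-grams, padding with spaces so word edges are captured."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_overlap(profile, text, n=3):
    """Fraction of the text's n-grams that also occur in a language profile."""
    grams = char_ngrams(text, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    return sum(count for gram, count in grams.items() if gram in profile) / total


def identify_by_ngrams(text, training, n=3):
    """Subword evidence: pick the language whose n-gram profile covers the text best."""
    profiles = {lang: char_ngrams(sample, n) for lang, sample in training.items()}
    return max(profiles, key=lambda lang: ngram_overlap(profiles[lang], text, n))
```

For example, `identify_by_stopwords("the cat is in the house")` counts four hits against the English list and none against the others. Ties are broken by dictionary order, which a real system would want to handle more carefully.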

Merging Ranked Lists
Types of Evidence
– Rank
– Score
Evidence Combination
– Weighted round robin
– Score combination
Parameter tuning
– Condition-based
– Query-based
[Figure: separate EN and DE ranked lists (ranks 1 to 1000) merged into one ranked list that interleaves DE and EN documents]
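
The two evidence-combination strategies named above can be sketched as follows. This is a hedged illustration rather than the slide's exact procedure: the function names and the list labels are hypothetical, and the min-max normalization used before combining scores is an assumption (scores from different retrieval runs are rarely comparable without some normalization, but other schemes exist).

```python
def weighted_round_robin(lists, weights):
    """Rank-only merging: each round takes weights[name] documents from each list."""
    positions = {name: 0 for name in lists}
    merged, seen = [], set()
    while any(positions[name] < len(docs) for name, docs in lists.items()):
        for name, docs in lists.items():
            # Take at least one item per round so the loop always makes progress.
            for _ in range(max(1, weights.get(name, 1))):
                if positions[name] < len(docs):
                    doc = docs[positions[name]]
                    positions[name] += 1
                    if doc not in seen:
                        seen.add(doc)
                        merged.append(doc)
    return merged


def merge_by_score(scored_lists, weights):
    """Score-based merging: min-max normalize each list, apply a weight, then sort."""
    merged = []
    for name, results in scored_lists.items():
        scores = [score for _, score in results]
        if not scores:
            continue
        low, high = min(scores), max(scores)
        span = (high - low) or 1.0  # avoid division by zero when all scores are equal
        weight = weights.get(name, 1.0)
        for doc, score in results:
            merged.append((doc, weight * (score - low) / span))
    return sorted(merged, key=lambda pair: pair[1], reverse=True)
```

The per-list weights are the parameters that the slide's "parameter tuning" bullet refers to; they could be fixed per language pair (condition-based) or adjusted per query (query-based).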

Query-Language CLIR
[Diagram; box labels: English queries, Translation System, Chinese Document Collection, English Document Collection, Retrieval Engine, Results; user actions: select, examine]

Example (Modular) Document Translation
Select a single query language
Translate every document into that language
Perform monolingual retrieval
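
The three steps above can be sketched as a toy pipeline. Everything here is an assumption made for illustration: a small Spanish-to-English glossary stands in for a real translation system, retrieval is plain term overlap, and all names (`TOY_GLOSSARY`, `translate_document`, `build_index`, `search`) are hypothetical.

```python
# Toy Spanish-to-English glossary standing in for a translation system (assumption).
TOY_GLOSSARY = {
    "gato": "cat", "perro": "dog", "negro": "black",
    "casa": "house", "el": "the", "la": "the", "en": "in",
}


def translate_document(text, glossary):
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(glossary.get(tok, tok) for tok in text.lower().split())


def build_index(docs, glossary):
    """Translate every document into the query language, then index its terms."""
    return {doc_id: set(translate_document(text, glossary).split())
            for doc_id, text in docs.items()}


def search(query, index):
    """Monolingual retrieval: rank documents by the number of shared query terms."""
    query_terms = set(query.lower().split())
    scored = [(doc_id, len(query_terms & terms)) for doc_id, terms in index.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, score in ranked if score > 0]
```

Because translation happens once at indexing time, the translated text can be cached and shown to the user later, at the cost of translating the whole collection up front.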

Document-Language CLIR
[Diagram; box labels: English queries, Translation System, Chinese queries, Chinese Document Collection, Chinese documents, Retrieval Engine, Results; user actions: select, examine]

Which Approach to Use?
“Document translation” (query-language CLIR)
– Good choice when all queries are in one language
– Cached translations can support user interaction
“Query translation” (document-language CLIR)
– Good choice when all documents are in one language
– Commonly used for CLIR experiments
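
For contrast with document translation, query translation can be sketched as below. This is a minimal illustration under stated assumptions: the toy English-to-Spanish dictionary, the choice to keep every listed translation of a term, and the names `TOY_BILINGUAL_DICT`, `translate_query`, and `search_untranslated` are all hypothetical.

```python
# Toy English-to-Spanish dictionary; a term may have several translations (assumption).
TOY_BILINGUAL_DICT = {
    "cat": ["gato"],
    "dog": ["perro"],
    "house": ["casa", "hogar"],
    "black": ["negro"],
}


def translate_query(query, dictionary):
    """Replace each query term with all of its translations; keep unknown terms as-is."""
    terms = set()
    for tok in query.lower().split():
        terms.update(dictionary.get(tok, [tok]))
    return terms


def search_untranslated(query, docs, dictionary):
    """Match the translated query against documents left in their original language."""
    query_terms = translate_query(query, dictionary)
    scored = [(doc_id, len(query_terms & set(text.lower().split())))
              for doc_id, text in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, score in ranked if score > 0]
```

Only the short query is translated here, so the collection needs no preprocessing, which is one reason this setup is convenient for experiments.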

Agenda
CLIR
→ Dictionary-Based CLIR
Corpus-Based CLIR
Interactive CLIR