Browsing by phrases: terminological information in interactive multilingual text retrieval Anselmo Peñas, Julio Gonzalo and Felisa Verdejo NLP Group, Dpto.

Presentation transcript:

Browsing by phrases: terminological information in interactive multilingual text retrieval Anselmo Peñas, Julio Gonzalo and Felisa Verdejo NLP Group, Dpto. Lenguajes y Sistemas Informáticos, Distance Learning University of Spain (UNED) Joint Conference on Digital Libraries 2001 Roanoke, VA

Goals
To bridge the gap between users’ vocabulary and collection terminology, even across languages
Without the need for thesaurus construction
Robust and efficient integration of NLP resources and tools
–Semantic network: EuroWordNet
–Tokeniser
–Morphological analyser
–POS tagger
–Shallow parser

Approach
Perform Automatic Terminology Extraction to provide:
–At indexing time: criteria for adding a controlled set of phrases to the index
–At query time: term browsing, to navigate through the terminology and access the documents from complex terms

Approach
The task: to retrieve terminology
–Lexical compounds are retrieved from mono-lexical terms
Requires
–A phrase indexing level
–Query expansion
–Query translation
Phrasal information is used to reduce noise when expanding and translating (co-occurrence of words in the same phrase)
[Diagram labels: Lemma, Document, Phrase]
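The phrase indexing level can be pictured as two lookup tables: one from each lemma to the phrases that contain it, and one from each phrase to the documents where it occurs. The following is a minimal sketch under that assumption; the class name `PhraseIndex` and its methods are hypothetical and are not taken from the slides.

```python
from collections import defaultdict


class PhraseIndex:
    """Toy phrase-level index (hypothetical structure, for illustration only)."""

    def __init__(self):
        # lemma -> set of phrases (tuples of lemmas) that contain it
        self.lemma_to_phrases = defaultdict(set)
        # phrase -> set of document identifiers where the phrase occurs
        self.phrase_to_docs = defaultdict(set)

    def add(self, doc_id, phrase_lemmas):
        """Register one occurrence of a phrase in a document."""
        phrase = tuple(phrase_lemmas)
        self.phrase_to_docs[phrase].add(doc_id)
        for lemma in phrase:
            self.lemma_to_phrases[lemma].add(phrase)

    def compounds_for(self, lemma):
        """Retrieve the lexical compounds that contain a mono-lexical term."""
        return sorted(self.lemma_to_phrases.get(lemma, set()))

    def documents_for(self, phrase_lemmas):
        """Access the documents that contain a given phrase."""
        return sorted(self.phrase_to_docs.get(tuple(phrase_lemmas), set()))
```

With such an index, a single query word leads to the compounds that contain it, and each compound leads to its documents, which is the navigation path that term browsing relies on.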

Terminology Extraction and Indexing
Processing
–Tokenising, lemmatising, tagging
–Shallow parsing (syntactic pattern recognition)
Results
–Terminological phrases for each language
–Term frequency
–Document frequency
–Component lemmas
Patterns for Spanish and Catalan: N N A N [A] Prep N [A] N [A] Prep Art N [A] N [A] Prep V N [A] Prep V N [A]
Patterns for English: A N [N] N N [N] A A N N A N N Prep N
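Syntactic pattern recognition over tagged text can be sketched as a sliding match of tag sequences. The example below assumes the input is already tokenised, lemmatised and tagged as `(lemma, tag)` pairs. The three patterns in `PATTERNS` are illustrative only and are not meant to reproduce the pattern inventory on the slide; the function names are hypothetical.

```python
from collections import Counter

# Illustrative tag patterns only (assumption), written in the slide's tag notation.
PATTERNS = [("A", "N"), ("N", "N"), ("N", "Prep", "N")]


def extract_phrases(tagged_doc, patterns=PATTERNS):
    """Return every lemma sequence whose tags match one of the patterns.

    tagged_doc is a list of (lemma, tag) pairs for one document.
    """
    tags = [tag for _, tag in tagged_doc]
    found = []
    for start in range(len(tagged_doc)):
        for pattern in patterns:
            end = start + len(pattern)
            if tuple(tags[start:end]) == pattern:
                # The phrase is stored as the tuple of its component lemmas.
                found.append(tuple(lemma for lemma, _ in tagged_doc[start:end]))
    return found


def collect_statistics(corpus, patterns=PATTERNS):
    """Compute term frequency and document frequency for each extracted phrase.

    corpus maps a document identifier to its tagged document.
    """
    term_freq = Counter()
    doc_freq = Counter()
    for tagged_doc in corpus.values():
        phrases = extract_phrases(tagged_doc, patterns)
        term_freq.update(phrases)       # every occurrence counts
        doc_freq.update(set(phrases))   # each document counts once per phrase
    return term_freq, doc_freq
```

Storing each phrase as a tuple of lemmas keeps the component lemmas available for the phrase index without a separate lookup.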

Query Expansion and Translation
[Figure: expansion and translation of the Spanish query words Tratados, Prohibición, Pruebas and Nucleares]
Tratados
–Expansion: acuerdo, capitulación, concertación, convenio, cuidar, pacto, manejar, procesar
–Translation: accord, discourse, handle, manage, pact, process, treat, treatise, treaty
Prohibición
–Expansion: embargo, entredicho, interdicción, interdicto, proscripción
–Translation: ban, interdiction, prohibition, proscription
Pruebas
–Expansion: cata, catadura, degustación, ensayo, escandallo, experimento, gustación, muestreo, tanteo
–Translation: demonstrate, establish, exhibit, experiment, experimentation, fall, fitting, indicate, point, present, proof, prove, run, sample, sampling, shew, show, taste, test, trial, try
Nucleares
–Expansion: nuclear
–Translation: nuclear
Noisy candidates highlighted in the figure: fitting interdiction manage? Nuclear taste proscription process?
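One way to read the use of phrasal information for noise reduction is as a co-occurrence filter: a candidate expansion or translation is kept only if it appears in an indexed phrase together with a candidate for a different query word. The slides do not spell out the exact rule, so the sketch below is an assumption, and `filter_by_phrase_cooccurrence` is a hypothetical name.

```python
def filter_by_phrase_cooccurrence(candidates_per_word, indexed_phrases):
    """Keep candidates that share an indexed phrase with another word's candidate.

    candidates_per_word maps each query word to a set of candidate lemmas
    (its expansions or translations). indexed_phrases is an iterable of
    lemma tuples taken from the phrase index. The filtering rule itself is
    an assumption made for this sketch.
    """
    kept = {word: set() for word in candidates_per_word}
    for phrase in indexed_phrases:
        lemmas = set(phrase)
        # Candidates of each query word that occur in this phrase.
        hits = {
            word: candidates & lemmas
            for word, candidates in candidates_per_word.items()
        }
        matched_words = [word for word, found in hits.items() if found]
        distinct_lemmas = set().union(*hits.values())
        # Require two different query words AND two different lemmas, so a
        # single lemma shared by two words does not count as co-occurrence.
        if len(matched_words) >= 2 and len(distinct_lemmas) >= 2:
            for word in matched_words:
                kept[word] |= hits[word]
    return kept
```

For example, with candidates `test`, `taste`, `trial` and `fitting` for Pruebas and `nuclear` for Nucleares, an indexed phrase `("nuclear", "test")` retains `test` and `nuclear`, while `taste` and `fitting` are dropped unless some other indexed phrase supports them.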

[Screenshot labels: Query in Spanish; Hierarchy of terms (Catalan, English, Spanish); Ranking of documents]

[Diagram labels: QUERY; RECONSULT WITH PHRASE; EXPLORE PHRASE; EXPLORE DOCUMENT]

Evaluation

                                   All queries   1-word queries   >1-word queries
First action after QUERY
  DOC                              40.70%        45.49%           37.30%
  PHRASE                           51.14%        45.65%           55.05%
  RECONSULT                        8.141%        8.846%           7.640%
Last action before finishing the session with explore DOC
  QUERY                            48.74%        53.38%           45.15%
  PHRASE                           42.95%        40.85%           44.57%
  RECONSULT                        8.306%        5.764%           10.27%

1523 sessions with interaction
An average of 5.11 actions per session
Explore phrase is used in 65.13%
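Percentages such as those for the first action after QUERY can be derived from per-session action logs. The sketch below assumes each session is a list of action names that starts with `QUERY`, and that only sessions with at least one further action count as sessions with interaction. The log format and the function name `first_action_shares` are assumptions, not the authors' evaluation procedure.

```python
from collections import Counter


def first_action_shares(sessions):
    """Percentage of sessions per action taken right after the initial QUERY.

    sessions is a list of action-name lists, e.g. ["QUERY", "PHRASE", "DOC"].
    Sessions that contain only the query are ignored (no interaction).
    """
    counts = Counter(
        session[1]
        for session in sessions
        if len(session) > 1 and session[0] == "QUERY"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {action: 100.0 * n / total for action, n in counts.items()}
```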

Conclusions
Development of a search engine based on terminology extraction
–Using terminological phrases as a middle ground between free searching and thesaurus-guided searching
–Without the need for thesaurus construction
–Bridging the distance between the terms used in the query and the terminology used in the collection (even in different languages)
Users appreciate phrasal information for document selection
–Phrases give higher expectations of relevance than Google’s ranking
–WTB (Website Term Browser) phrasal information can substantially complement the document ranking provided by the search engines