Browsing by phrases: terminological information in interactive multilingual text retrieval Anselmo Peñas, Julio Gonzalo and Felisa Verdejo NLP Group, Dpto.

1 Browsing by phrases: terminological information in interactive multilingual text retrieval
Anselmo Peñas, Julio Gonzalo and Felisa Verdejo
NLP Group, Dpto. Lenguajes y Sistemas Informáticos, Distance Learning University of Spain (UNED)
Joint Conference on Digital Libraries 2001, Roanoke, VA

2 Goals
– To bridge the gap between users' vocabulary and the collection's terminology, even across languages
– Without the need to construct thesauri
– Robust and efficient integration of NLP resources and tools:
   – Semantic network: EuroWordNet
   – Tokeniser
   – Morphological analyser
   – POS tagger
   – Shallow parser

3 Approach
Perform automatic terminology extraction to provide:
– At indexing time: criteria for adding a controlled set of phrases to the index
– At query time: term browsing, to navigate through the terminology and access the documents from complex terms

4 Approach
The task: to retrieve terminology
– Lexical compounds are retrieved from mono-lexical (single-word) terms
This requires:
– A phrase indexing level
– Query expansion
– Query translation
Phrasal information (the co-occurrence of words in the same phrase) is used to reduce noise when expanding and translating.
[Diagram: indexing levels Lemma, Document and Phrase]
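
The phrase indexing level can be pictured as a two-step lookup: from a lemma to the indexed phrases that contain it, and from a phrase to the documents in which it occurs. The sketch below illustrates that idea under this assumption only; the class `PhraseIndex` and its methods are hypothetical names, not the system's actual implementation, and the phrases and documents are made up.

```python
from collections import defaultdict


class PhraseIndex:
    """Hypothetical two-level index: lemma -> phrases, phrase -> documents."""

    def __init__(self):
        self.lemma_to_phrases = defaultdict(set)
        self.phrase_to_docs = defaultdict(set)

    def add(self, doc_id, phrase):
        """Register one terminological phrase (a tuple of lemmas) found in doc_id."""
        phrase = tuple(phrase)
        self.phrase_to_docs[phrase].add(doc_id)
        for lemma in phrase:
            self.lemma_to_phrases[lemma].add(phrase)

    def phrases_for(self, lemmas):
        """Phrases that contain every given lemma (e.g. the words of a short query)."""
        candidate_sets = [self.lemma_to_phrases.get(lemma, set()) for lemma in lemmas]
        if not candidate_sets:
            return set()
        return set.intersection(*candidate_sets)

    def docs_for(self, phrase):
        """Documents in which the phrase was indexed."""
        return self.phrase_to_docs.get(tuple(phrase), set())


# Usage with made-up documents and phrases
index = PhraseIndex()
index.add("doc1", ("nuclear", "test", "ban"))
index.add("doc2", ("nuclear", "test"))
index.add("doc3", ("test", "tube"))
print(sorted(index.phrases_for(["nuclear", "test"])))
print(sorted(index.docs_for(("nuclear", "test"))))
```

Starting from single lemmas and surfacing the multi-word phrases that contain them is what lets a one-word query lead the user to complex terms before any document is opened.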

5 Terminology Extraction and Indexing
Processing:
– Tokenising, lemmatising, tagging
– Shallow parsing (syntactic pattern recognition)
Results:
– Terminological phrases for each language
– Term frequency
– Document frequency
– Component lemmas
Patterns for Spanish and Catalan: N N A N [A] Prep N [A] N [A] Prep Art N [A] N [A] Prep V N [A] Prep V N [A]
Patterns for English: A N [N]; N N [N]; A A N; N A N; N Prep N
(N = noun, A = adjective, Prep = preposition, Art = article, V = verb; elements in [ ] are optional)
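
Pattern-based extraction over tagged text can be sketched as follows. This is a minimal illustration assuming the text has already been tokenised, lemmatised and tagged with the slide's tag names, and that `[X]` marks an optional element; only three of the English patterns are used, all overlapping matches are kept, and no frequency threshold is applied. The functions `expand`, `extract` and `term_statistics` are hypothetical, not the authors' shallow parser.

```python
from collections import Counter
from itertools import product


def expand(pattern):
    """Expand a pattern with optional [X] slots, e.g. 'A N [N]' -> {('A','N'), ('A','N','N')}."""
    slots = []
    for tag in pattern.split():
        if tag.startswith("[") and tag.endswith("]"):
            slots.append((None, tag[1:-1]))  # optional slot: absent or present
        else:
            slots.append((tag,))
    return {tuple(t for t in combo if t is not None) for combo in product(*slots)}


# English patterns copied from the slide (N = noun, A = adjective, Prep = preposition)
PATTERNS = set().union(*(expand(p) for p in ["A N [N]", "N N [N]", "N Prep N"]))


def extract(tagged):
    """Return the lemma sequences of all spans whose tag sequence matches a pattern.

    tagged: list of (lemma, tag) pairs for one document. Overlapping and nested
    matches are all kept; no frequency threshold is applied here.
    """
    terms = []
    for start in range(len(tagged)):
        for seq in PATTERNS:
            end = start + len(seq)
            if end <= len(tagged) and tuple(tag for _, tag in tagged[start:end]) == seq:
                terms.append(tuple(lemma for lemma, _ in tagged[start:end]))
    return terms


def term_statistics(docs):
    """docs: {doc_id: tagged tokens}. Return (term frequency, document frequency)."""
    tf, df = Counter(), Counter()
    for tagged in docs.values():
        terms = extract(tagged)
        tf.update(terms)       # every occurrence counts
        df.update(set(terms))  # each document counts once per term
    return tf, df


# Usage with hand-tagged toy documents
docs = {
    "doc1": [("nuclear", "A"), ("test", "N"), ("ban", "N")],
    "doc2": [("nuclear", "A"), ("test", "N"), ("in", "Prep"), ("desert", "N")],
}
tf, df = term_statistics(docs)
print(tf[("nuclear", "test")], df[("nuclear", "test")])
```

Each extracted term is stored as a tuple of lemmas, so its component lemmas come for free, matching the slide's list of results (phrase, term frequency, document frequency, component lemmas).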

6 Query Expansion and Translation
[Diagram: each Spanish query word is expanded with its synonyms (Expansion) and mapped to English words (Translation)]
– Prohibición: expansion embargo, entredicho, interdicción, interdicto, proscripción; translation ban, interdiction, prohibition, proscription
– Pruebas: expansion cata, catadura, degustación, ensayo, escandallo, experimento, gustación, muestreo, tanteo; translation demonstrate, establish, exhibit, experiment, experimentation, fall, fitting, indicate, point, present, proof, prove, run, sample, sampling, shew, show, taste, test, trial, try
– Nucleares: translation nuclear
– Tratados: expansion acuerdo, capitulación, concertación, convenio, cuidar, pacto, manejar, procesar; translation accord, discourse, handle, manage, pact, process, treat, treatise, treaty
Combining the alternatives word by word yields noisy translations such as "nuclear fitting interdiction manage?" or "nuclear taste proscription process?".
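
Word-by-word translation multiplies the candidates, while the approach uses the co-occurrence of words in the same phrase to cut that noise. A minimal sketch of such a filter, assuming a list of phrases already extracted from the target-language collection, keeps a combination of translations only if all its words appear together in at least one of those phrases. The function name and the English phrases are hypothetical; the translation candidates are a subset of the slide's list for "pruebas" and "nucleares".

```python
from itertools import product


def translate_with_phrases(candidates, collection_phrases):
    """Keep only translation combinations that co-occur inside one collection phrase.

    candidates: one list of target-language lemmas per source query word.
    collection_phrases: phrases (tuples of lemmas) extracted from the target collection.
    Word order is ignored because Spanish noun-adjective order differs from English.
    Returns {combination: [phrases that contain all of its words]}.
    """
    phrase_sets = [(phrase, frozenset(phrase)) for phrase in collection_phrases]
    kept = {}
    for combo in product(*candidates):
        words = frozenset(combo)
        matches = [phrase for phrase, lemmas in phrase_sets if words <= lemmas]
        if matches:
            kept[combo] = matches
    return kept


# Usage: translations of "pruebas" and "nucleares" (subset from the slide),
# filtered against made-up phrases from an English collection.
candidates = [["test", "trial", "taste", "fitting", "sample"], ["nuclear"]]
english_phrases = [("nuclear", "test"), ("nuclear", "test", "ban"), ("nuclear", "fuel")]
for combo, phrases in translate_with_phrases(candidates, english_phrases).items():
    print(combo, "->", phrases)
```

Here "nuclear taste" and "nuclear fitting" are discarded because no collection phrase contains both words, while "test" + "nuclear" survives and points to the phrases the user can browse.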

7 [Screenshot: query in Spanish; hierarchy of terms in Catalan, English and Spanish; ranking of documents]
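
The screenshot shows the extracted terms organised as a hierarchy of terms, but the transcript does not say how that hierarchy is built. One possible way, sketched here purely as an assumption, is to nest each phrase under the longest shorter phrase it contains as a contiguous run of lemmas. The functions `contains`, `build_hierarchy` and `print_tree` are hypothetical, and the phrases are made up.

```python
def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run of lemmas inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def build_hierarchy(phrases):
    """Map each phrase to a parent: the longest other phrase it contains, or None.

    Ties between equally long sub-phrases go to the one listed first.
    """
    parent = {}
    for phrase in phrases:
        subs = [q for q in phrases if len(q) < len(phrase) and contains(phrase, q)]
        parent[phrase] = max(subs, key=len) if subs else None
    return parent


def print_tree(parent, node=None, depth=0):
    """Print phrases indented under their parents, roots first."""
    for phrase, par in parent.items():
        if par == node:
            print("  " * depth + " ".join(phrase))
            print_tree(parent, phrase, depth + 1)


phrases = [
    ("nuclear", "test"),
    ("nuclear", "test", "ban"),
    ("nuclear", "test", "ban", "treaty"),
    ("test", "ban"),
]
print_tree(build_hierarchy(phrases))
```

Giving each phrase a single parent keeps the display a tree; a phrase such as "nuclear test ban" really has two sub-phrases ("nuclear test" and "test ban"), so a fuller design might allow several parents.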

8 [Diagram of user actions: QUERY, RECONSULT WITH PHRASE, EXPLORE PHRASE, EXPLORE DOCUMENT]

9 Evaluation
                  All queries   1-word queries   >1-word queries
First action after QUERY:
  DOC             40.70%        45.49%           37.30%
  PHRASE          51.14%        45.65%           55.05%
  RECONSULT       8.141%        8.846%           7.640%
Last action before finishing the session with explore DOC:
  QUERY           48.74%        53.38%           45.15%
  PHRASE          42.95%        40.85%           44.57%
  RECONSULT       8.306%        5.764%           10.27%
1523 sessions with interaction, with an average of 5.11 actions per session; explore phrase is used in 65.13%.
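
Each block of the table is a distribution over logged user actions, so each block sums to roughly 100%. Assuming every session is logged as an ordered list of action labels (a hypothetical format; the transcript does not describe the logs), and reading "last action before finishing the session with explore DOC" as the action just before a session's final DOC, the two distributions could be computed as below. The sessions are toy data and do not reproduce the study's figures.

```python
from collections import Counter


def percentages(counter):
    """Turn raw counts into percentages of their total."""
    total = sum(counter.values())
    return {action: 100 * n / total for action, n in counter.items()} if total else {}


def first_action_after(sessions, trigger="QUERY"):
    """Distribution of the action that immediately follows each `trigger` action."""
    counts = Counter()
    for actions in sessions:
        for prev, nxt in zip(actions, actions[1:]):
            if prev == trigger:
                counts[nxt] += 1
    return percentages(counts)


def action_before_final(sessions, final="DOC"):
    """Distribution of the action preceding the last one, for sessions ending in `final`."""
    counts = Counter(actions[-2] for actions in sessions
                     if len(actions) >= 2 and actions[-1] == final)
    return percentages(counts)


# Toy sessions (made up, not the study's logs); labels follow the slide's table
sessions = [
    ["QUERY", "PHRASE", "DOC"],
    ["QUERY", "DOC"],
    ["QUERY", "RECONSULT", "QUERY", "PHRASE", "DOC"],
]
print(first_action_after(sessions))   # share of DOC / PHRASE / RECONSULT after QUERY
print(action_before_final(sessions))  # share of QUERY / PHRASE / RECONSULT before the final DOC
print(sum(map(len, sessions)) / len(sessions))  # mean actions per session
```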

10 Conclusions
Development of a search engine based on terminology extraction:
– Using terminological phrases as an intermediate approach between free searching and thesaurus-guided searching
– Without the need to construct a thesaurus
– Bridging the distance between the terms used in the query and the terminology used in the collection (even in different languages)
Users appreciate phrasal information for document selection:
– Phrases give higher expectations of relevance than Google's ranking
– WTB phrasal information can substantially complement the document ranking provided by search engines

