SINAI-GIR A Multilingual Geographical IR System University of Jaén (Spain) José Manuel Perea Ortega CLEF 2008, 18 September, Aarhus (Denmark) Computer.


1 SINAI-GIR: A Multilingual Geographical IR System
José Manuel Perea Ortega
Computer Science Department, University of Jaén (Spain)
CLEF 2008, 18 September, Aarhus (Denmark)

2 Introduction
Preliminary work of SINAI in GeoCLEF:
– 2006: query expansion using gazetteers and a thesaurus [García-Vega et al., 2007]
– 2007: filtering documents based on manual rules [Perea-Ortega et al., 2007]
GeoCLEF 2008:
– Filtering documents using new manual rules and new approaches (query reformulation, keyword and hyponym extraction, query geo-expansion)
GeoCLEF 2008, Aarhus

3 SINAI-GIR System overview
[Diagram of the system architecture. Labeled elements: Multilingual Query, TRANSLATOR, English Query (Q), QUERY ANALYZER, Q1, Q2, Q3, English collection, Collection Preprocessing subsystem, GeoNames, IR Subsystem, Documents retrieved, Keywords and geo-information extracted, VALIDATOR, Final Re-Ranked Documents retrieved.]

4 SINAI-GIR System overview: Translator
– Translates the queries from other languages into English
– Uses SINTRAM (SINai TRAnslation Module) [García-Cumbreras et al., 2007]
– Works with different online machine translators

5 SINAI-GIR System overview: Collection Preprocessing subsystem
– Preprocessing: stemming, stopword removal, POS tagging
– Toponyms are extracted (NER)
– Two indexes are generated: Locations and Keywords

6 SINAI-GIR System overview: Query Analyzer
– Query preprocessing: stemming, stopword removal, removal of irrelevant information
– Toponyms are extracted (NER)
– Spatial relations finder based on manual rules
– Query reformulation based on POS tagging and the query parsing subtask
– Geo-expansion using a gazetteer
– Keyword/hyponym detection
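The geo-expansion step can be pictured with a small sketch. It is only an illustration: the in-memory dictionary is a hypothetical stand-in for a gazetteer such as GeoNames, and the function name `geo_expand` and its behavior (adding contained places to the query) are assumptions, not the SINAI-GIR implementation.

```python
# Hypothetical toy gazetteer: region -> places it contains (stand-in for GeoNames).
TOY_GAZETTEER = {
    "andalusia": ["jaén", "seville", "granada"],
    "denmark": ["aarhus", "copenhagen"],
}

def geo_expand(query_terms, toponyms, gazetteer=TOY_GAZETTEER):
    """Return the query terms plus the places contained in each detected toponym."""
    expanded = list(query_terms)
    for toponym in toponyms:
        # Unknown toponyms simply add nothing to the query.
        for place in gazetteer.get(toponym.lower(), []):
            if place not in expanded:
                expanded.append(place)
    return expanded

print(geo_expand(["floods", "denmark"], ["Denmark"]))
```

A real system would also need to decide how much weight the added place names get relative to the original terms, which this sketch leaves out.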

7 SINAI-GIR System overview: IR Subsystem
– Lemur as the indexing and search engine
– Okapi with pseudo-relevance feedback (PRF) as the weighting function
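For readers unfamiliar with Okapi weighting, the following is a minimal sketch of Okapi BM25 scoring over tokenized documents. It does not use Lemur and omits the PRF step; the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, and the non-negative IDF variant are assumptions chosen for illustration, not the settings used in the experiments.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency of each distinct query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [["flood", "denmark"], ["election", "spain"], ["flood", "flood", "spain"]]
print(bm25_scores(["flood"], docs))
```

Pseudo-relevance feedback would add a second pass: terms from the top-ranked documents of this first pass are appended to the query before scoring again.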

8 SINAI-GIR System overview: Validator
– Filters the list of documents retrieved by the IR subsystem, applying different manual rules and using the geographical data detected in the query
– Re-ranks the documents using predefined weights for each rule and the keywords/hyponyms detected in the query
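The filter-then-re-rank idea can be sketched as follows. The slides do not list the actual manual rules or weights, so everything specific here is hypothetical: the single rule (a document must mention at least one query place), the weights `place_weight` and `keyword_weight`, and the document layout.

```python
def filter_and_rerank(docs, query_places, keywords,
                      place_weight=0.5, keyword_weight=0.25):
    """Drop documents with no query place, then boost and re-sort the rest.

    docs: list of dicts with 'id', 'score', 'places' (set) and 'terms' (set).
    The rule and both weights are illustrative assumptions.
    """
    reranked = []
    for doc in docs:
        matched_places = doc["places"] & query_places
        if not matched_places:
            continue  # filtering rule: no geographic overlap with the query
        matched_keywords = doc["terms"] & keywords
        new_score = (doc["score"]
                     + place_weight * len(matched_places)
                     + keyword_weight * len(matched_keywords))
        reranked.append((doc["id"], new_score))
    # Highest adjusted score first.
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)

docs = [
    {"id": "d1", "score": 2.0, "places": {"aarhus"}, "terms": {"port"}},
    {"id": "d2", "score": 3.0, "places": {"madrid"}, "terms": {"flood"}},
    {"id": "d3", "score": 1.8, "places": {"aarhus", "copenhagen"}, "terms": {"flood"}},
]
print(filter_and_rerank(docs, {"aarhus", "copenhagen"}, {"flood"}))
```

In this toy run the document with the highest retrieval score (d2) is discarded by the filter, which illustrates how a hard geographic filter can override the content-based ranking.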

9 Experiments description
SINAI participated in the monolingual and bilingual tasks with a total of 15 experiments:
– MONO-EN: 9 experiments
– BILI-X2EN: 6 experiments
Combining the content of the topic labels: TD or TDN
Baseline: Q1 without applying any filtering or re-ranking process
Other experiments:
– Filtering and re-ranking of the fused list of documents retrieved by Q1, Q2 and Q3
– Using keywords and/or hyponyms in the re-ranking process

10 MONO-EN results
[Results figure, with the baseline marked]
– Best result: baseline (no filtering and no re-ranking)
– In some filtering experiments the use of keywords improves the results
– Best results using only the TD topic labels

11 BILI-X2EN results
[Results figure, with the baseline marked]
– Best result: baseline (no filtering and no re-ranking) with Portuguese topics
– Best results using only the TD topic labels

12 Conclusions
– The baseline experiment seems to work well because we include the geo-information in the retrieval process
– The filtering of documents does not seem to work well because we include the geo-information in the query and we are re-ranking documents which may not be relevant with respect to their content
– The use of keywords for re-ranking the retrieved documents could be interesting because in some experiments it improves the results obtained without them
– Query reformulation could also be interesting because for some topics it retrieves valid documents which are not retrieved with the default query

13 TextMESS at GeoCLEF 2008
– Spanish TextMESS project (Intelligent, Interactive and Multilingual Text Mining based on Human Language Technologies): joint participation by the Polytechnic University of Valencia and the University of Jaén (SINAI)
– Method employed: merging algorithm based on a fuzzy Borda voting scheme, taking as input the two document lists returned by the two systems
– Second best result in the monolingual English task
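To give a feel for Borda-style list merging, here is a minimal sketch of the classic (non-fuzzy) Borda count applied to ranked lists of document ids. It is a deliberately simpler swap-in, not the fuzzy Borda scheme used by TextMESS: the fuzzy variant additionally takes the strength of each preference into account, which this sketch does not model. The function name `borda_merge` and the tie-breaking rule are assumptions.

```python
from collections import defaultdict

def borda_merge(*ranked_lists):
    """Merge ranked lists of document ids with a classic Borda count."""
    points = defaultdict(int)
    for ranking in ranked_lists:
        n = len(ranking)
        for position, doc_id in enumerate(ranking):
            # The top item of an n-item list earns n points, the last earns 1.
            points[doc_id] += n - position
    # Sort by points (descending), breaking ties by document id for determinism.
    return sorted(points, key=lambda d: (-points[d], d))

print(borda_merge(["a", "b", "c"], ["b", "c", "d"]))
```

Documents that appear near the top of both input lists rise in the merged ranking, while documents returned by only one system keep the points from that list alone.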

14 Thank you
sinai.ujaen.es

15 References
– García-Vega, Manuel and García-Cumbreras, Miguel A. and Ureña-López, L.A. and Perea-Ortega, José M. GEOUJA System. The first participation of the University of Jaén at GEOCLEF [...]. In LNCS, volume 4730, pages [...]. Springer-Verlag, [...].
– Perea-Ortega, José M. and García-Cumbreras, Miguel A. and García-Vega, Manuel and Montejo-Ráez, Arturo. GEOUJA System. University of Jaén at GEOCLEF [...]. In Proceedings of the Cross Language Evaluation Forum (CLEF 2007), page 52, [...].
– García-Cumbreras, Miguel A. and Ureña-López, L. Alfonso and Martínez-Santiago, Fernando and Perea-Ortega, José M. BRUJA System. The University of Jaén at the Spanish task of [...]. In LNCS, volume 4730, pages [...]. Springer-Verlag, [...].

