1
Research Paper Presentation – CS572 Summer 2011
Presented by Donghee Sung
Paper by Paul Clough (University of Sheffield, Western Bank)
2
SPIRIT: Spatial awareness for information systems, e.g.
- transport timetables
- routing systems for motorists
- map-based web sites
- location-based services
Key part: extraction and use of geospatial information
3
Criteria: speed, reliability, flexibility, multilingualism
Geo-Parsing:
- Identifying geographic references
- Gazetteer lookup, with context rules to filter out common-usage words and personal names
Geo-Coding:
- Assigning spatial coordinates
- Based on information from a geographic resource
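The two steps above can be illustrated with a minimal sketch. It assumes a toy in-memory gazetteer; the `GAZETTEER` table, its approximate coordinates, and the function names `geo_parse` and `geo_code` are hypothetical and are not taken from the paper or from SPIRIT.

```python
# Toy gazetteer: lower-cased place name -> approximate (latitude, longitude).
# The entries are illustrative assumptions, not data from the paper.
GAZETTEER = {
    "sheffield": (53.38, -1.47),
    "london": (51.51, -0.13),
    "reading": (51.45, -0.97),
}


def geo_parse(text):
    """Geo-parsing: return the tokens that match a gazetteer entry."""
    tokens = [token.strip(".,;:!?") for token in text.split()]
    return [token for token in tokens if token.lower() in GAZETTEER]


def geo_code(place):
    """Geo-coding: return the coordinates stored for a place name."""
    return GAZETTEER[place.lower()]


candidates = geo_parse("Trains from Sheffield to London run hourly.")
coded = {name: geo_code(name) for name in candidates}
print(coded)
```

A plain lookup like this cannot tell the town "Reading" from the verb "reading", which is why the slide pairs the gazetteer with context rules.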
5
SPIRIT: SPatially-Aware Information Retrieval on the InterneT
A search engine to find documents and datasets on the web relating to places or regions
6
Existing web search facilities are poor at finding information related to a particular location.
- Vicinity: find other places within a radius; www.somewherenear.com
- Yellow pages services: find a specific place or post code
- Buyukkokten: associated the admin's IP with a telephone area code
- Stanford Research Institute: proposed '.geo', with cells defined by latitude and longitude
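A "find other places within a radius" query can be sketched with the haversine great-circle distance. This is only an illustration of the idea; the `PLACES` table, its approximate coordinates, and the function names are assumptions and do not describe how any of the services named above work.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate (latitude, longitude) pairs, chosen for illustration only.
PLACES = {
    "Sheffield": (53.38, -1.47),
    "Rotherham": (53.43, -1.36),
    "London": (51.51, -0.13),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(centre, radius_km):
    """Names of all other places no further than radius_km from centre."""
    origin = PLACES[centre]
    return sorted(
        name
        for name, position in PLACES.items()
        if name != centre and haversine_km(origin, position) <= radius_km
    )


nearby = within_radius("Sheffield", 20)
print(nearby)
```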
7
Resources relating to a place:
- may not be found
- may not be places nearby
- may have another name
Major shortcoming: cannot recognize
- alternative names
- modern/historical variants
- informal names
- names of contained places
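One way to picture the missing capability is a lookup table that maps variant names to a canonical place, plus a containment relation. The sketch below is an assumption made for illustration; the tables `ALTERNATIVE_NAMES` and `CONTAINED_IN` and both function names are hypothetical and do not come from the paper.

```python
# Variant name (lower-cased) -> canonical place name. Illustrative entries only.
ALTERNATIVE_NAMES = {
    "eboracum": "York",             # historical (Roman) name
    "hull": "Kingston upon Hull",   # informal short form
    "the big smoke": "London",      # informal nickname
}

# Contained place -> containing place.
CONTAINED_IN = {"Westminster": "London"}


def canonical_name(name):
    """Resolve an alternative name; unknown names are returned unchanged."""
    cleaned = name.strip()
    return ALTERNATIVE_NAMES.get(cleaned.lower(), cleaned)


def expand_with_contained(place):
    """Return the place followed by the places it contains."""
    children = sorted(child for child, parent in CONTAINED_IN.items() if parent == place)
    return [place] + children


print(canonical_name("Eboracum"))
print(expand_with_contained("London"))
```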
8
SPIRIT Project
- Query expansion / relevance ranking procedures
- Machine learning techniques: extraction of geographical context, generating metadata
- Multi-modal user interface: textual input, interactive map feedback
- Spatial indices for web collections
9
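Sources of Spatial Data
- TGN (Getty Thesaurus of Geographic Names), OS (Ordnance Survey), SABE (Seamless Administrative Boundaries of Europe)
- The SPIRIT collection, a large web collection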
13
Tokenization Issues Stop-words Named-Entity Recognition (NER) Gazetteers
15
Named-Entity Recognition (NER)
Processing a text and identifying phrases that belong to particular categories of Named Entities (NE): people, organizations, locations, etc.
16
Tokenization Procedure
1) Tokenize on whitespace: @words = split(/\s+/, $sentence); (Perl regular expression)
   "Isn't it ashame." -> Isn't / it / ashame.
2) Stemming / case conversion: isn't / it / asham
3) Removing stop-words
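The three steps can be sketched end to end as follows. Python's `re.split` stands in for the Perl `split` on the slide. The suffix stripper in `normalize` is a toy stand-in for a real stemmer, and the `STOP_WORDS` set is an illustrative assumption; neither is the method used in the paper.

```python
import re

# Illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"it", "the", "a", "of"}


def tokenize(sentence):
    """Step 1: split on runs of whitespace."""
    return re.split(r"\s+", sentence.strip())


def normalize(token):
    """Step 2: lower-case, drop surrounding punctuation, strip one toy suffix."""
    token = token.lower().strip(".,;:!?\"")
    for suffix in ("ing", "ers", "er", "s", "e"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def remove_stop_words(tokens):
    """Step 3: drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


tokens = tokenize("Isn't it ashame.")
stems = [normalize(token) for token in tokens]
kept = remove_stop_words(stems)
print(tokens, stems, kept)
```

On the slide's example sentence, the first two steps reproduce the slide's output; a real system would swap in an established stemmer rather than the suffix list used here.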
17
Default settings in indexing and retrieval:
- Case sensitivity: Off
- Stop-word removal: Off
- Stemming: Off
Stop-word removal / stemming -> reduce the size of index files
But stop-words and stemming can be useful:
- Stop-words: 'in', 'inside', or 'of'
- Stemming: "London" from "London" & "Londoner"
18
Filtering candidate locations using context rules to remove:
- stop-words
- references to people and organizations
- links to emails/URLs
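A minimal sketch of this kind of filtering is shown below. The specific rules (a title before the name marks a person, a company suffix after it marks an organization, a lower-case common word is ordinary usage) and all word lists are assumptions made for illustration; the slides do not spell out the actual rules.

```python
import re

# All word lists below are hypothetical examples.
GAZETTEER = {"lincoln", "london", "reading", "sheffield"}
COMMON_WORDS = {"reading"}                       # place names that are also ordinary words
PERSON_TITLES = {"mr", "mrs", "ms", "dr", "prof"}
ORG_SUFFIXES = {"ltd", "plc", "inc"}
URL_OR_EMAIL = re.compile(r"^(https?://|www\.)|@", re.IGNORECASE)


def clean(token):
    """Lower-case a token and drop surrounding punctuation."""
    return token.strip(".,;:!?()\"'").lower()


def filter_candidates(text):
    """Return gazetteer matches that survive the context rules."""
    tokens = text.split()
    kept = []
    for i, token in enumerate(tokens):
        for word in re.findall(r"[A-Za-z]+", token):
            if word.lower() not in GAZETTEER:
                continue
            previous_token = clean(tokens[i - 1]) if i > 0 else ""
            next_token = clean(tokens[i + 1]) if i + 1 < len(tokens) else ""
            if URL_OR_EMAIL.search(token):
                continue  # inside a URL or e-mail address
            if word.islower() and word in COMMON_WORDS:
                continue  # common usage, e.g. "reading" a book
            if previous_token in PERSON_TITLES:
                continue  # looks like a person, e.g. "Dr. Lincoln"
            if next_token in ORG_SUFFIXES:
                continue  # looks like an organization, e.g. "Reading Ltd."
            kept.append(word)
    return kept


sample = ("Dr. Lincoln enjoys reading in Sheffield, see www.london-tours.example "
          "or mail info@reading.example about Reading Ltd.")
locations = filter_candidates(sample)
print(locations)
```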
19
- The geo-parsing method could be improved by enhancing the gazetteer matching and filtering.
- False hits could be reduced by generating a better list of stop-words and using further context rules.
- The need to create rules would be alleviated by generating further context rules from features using machine learning.