Research Paper Presentation – CS572 Summer 2011 Presented by Donghee Sung Paper by Paul Clough (University of Sheffield Western Bank)

1 Research Paper Presentation – CS572 Summer 2011 Presented by Donghee Sung Paper by Paul Clough (University of Sheffield Western Bank)

2  SPIRIT: adding spatial awareness to information systems, e.g. transport timetables, routing systems for motorists, map-based web sites, and location-based services. Key part: extraction and use of geospatial information.

3  Criteria: speed, reliability, flexibility, multilingualism.  Geo-parsing: - Identifying geographic references - Gazetteer lookup with context rules to filter out common-usage words and personal names.  Geo-coding: - Assigning spatial coordinates - Based on information from a geographic resource.
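
The two steps on this slide can be sketched in Python as follows. This is a minimal illustration, not the SPIRIT implementation: the gazetteer contents, the approximate coordinates, and the function names `geo_parse` and `geo_code` are all assumptions made for the example. The only context rule shown is an exact-case match, so that the common-usage word "reading" is not confused with the town "Reading".

```python
import re

# Hypothetical mini-gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Sheffield": (53.38, -1.47),
    "Reading": (51.45, -0.97),
}


def geo_parse(text):
    """Geo-parsing: return tokens that match a gazetteer entry.

    The exact-case comparison acts as a crude context rule: lower-case
    "reading" (a common-usage word) does not match the entry "Reading".
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    return [token for token in tokens if token in GAZETTEER]


def geo_code(places):
    """Geo-coding: attach the gazetteer coordinates to each place name."""
    return [(place, GAZETTEER[place]) for place in places]


places = geo_parse("She was reading about Sheffield and Reading.")
print(geo_code(places))
```

A real system would need far richer rules, since a capitalised "Reading" at the start of a sentence would still be a false hit here.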

5  SPIRIT: SPatially-Aware Information Retrieval on the InterneT. A search engine to find documents and datasets on the web relating to places or regions.

6  Existing web search facilities are poor at finding information related to a particular location. Vicinity: finds other places within a radius (www.somewherenear.com). Yellow pages services: find a specific place or post code. Buyukkokten: associated the administrator's IP address with a telephone area code. Stanford Research Institute: proposed a '.geo' domain with cells defined by latitude and longitude.
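
A radius search of the kind attributed to Vicinity above can be sketched with the haversine great-circle distance. This is only an assumed approach for illustration; the slides do not say how any of these services computed distance. The place table, its approximate coordinates, and the names `haversine_km` and `places_within` are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical place table with approximate (latitude, longitude) values.
PLACES = {
    "Sheffield": (53.38, -1.47),
    "Leeds": (53.80, -1.55),
    "London": (51.51, -0.13),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def places_within(centre, radius_km):
    """Names of all other known places within radius_km of centre, sorted."""
    origin = PLACES[centre]
    return sorted(
        name
        for name, coords in PLACES.items()
        if name != centre and haversine_km(origin, coords) <= radius_km
    )


print(places_within("Sheffield", 60))
```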

7  Resources relating to a place may not be found; places nearby may not be found; a place may have another name.  Major shortcoming: cannot recognize alternative names, modern/historical variants, informal names, or contained place names.
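
One way to address the alternative-name shortcoming is to map every known variant onto a canonical place name before searching. The sketch below assumes a small hand-built alias table; the table, the set of canonical names, and the function `canonical_place` are hypothetical and are not taken from the paper.

```python
# Hypothetical set of canonical place names.
CANONICAL = {"London", "Birmingham"}

# Hypothetical alias table: lower-cased variant -> canonical place name.
ALIASES = {
    "londinium": "London",    # historical variant
    "brum": "Birmingham",     # informal name
    "westminster": "London",  # contained place, mapped to its containing city
}


def canonical_place(name):
    """Return the canonical name for a place or variant, or None if unknown."""
    key = name.strip().lower()
    for place in CANONICAL:
        if place.lower() == key:
            return place
    return ALIASES.get(key)


for query in ("Londinium", "Brum", "Westminster", "Atlantis"):
    print(query, "->", canonical_place(query))
```

Mapping a contained place to its container loses detail, so a fuller design would probably keep both the original and the containing place.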

8  SPIRIT Project: query expansion / relevance ranking procedures; machine learning techniques (extraction of geographical context, generating metadata); multi-modal user interface (textual input, interactive map feedback); spatial indices for web collections.

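9  Sources of spatial data: TGN (Getty Thesaurus of Geographic Names), OS (Ordnance Survey), SABE (Seamless Administrative Boundaries of Europe).  The SPIRIT collection: a large web collection.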

13  Tokenization issues: stop-words, Named-Entity Recognition (NER), gazetteers

15  Named-Entity Recognition (NER): processing a text and assigning the Named Entities (NEs) it contains to particular categories, e.g. People, Organization, Location.

16  Tokenization procedure: 1) Tokenize on whitespace: @words = split(/\s+/, $sentence); (Perl regular expression) "Isn't it ashame." -> Isn't / it / ashame. 2) Stemming / case conversion: isn't / it / asham 3) Removing stop-words
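
The three steps above can be sketched in Python, used here as a stand-in for the Perl one-liner on the slide. The stop-word list and the `crude_stem` function are assumptions: `crude_stem` is a very rough suffix stripper chosen only to reproduce the slide's example, not a real stemmer such as Porter's.

```python
# Assumed, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"it", "is", "the", "a", "of"}


def tokenize(sentence):
    """Step 1: split on runs of whitespace, roughly like Perl's split(/\\s+/, ...)."""
    return sentence.split()


def crude_stem(word):
    """Step 2 helper: strip surrounding punctuation and one common suffix."""
    word = word.strip(".,;:!?\"'")
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(sentence):
    """Steps 1-3: tokenize, lower-case and stem, then drop stop-words."""
    stems = [crude_stem(token.lower()) for token in tokenize(sentence)]
    return [stem for stem in stems if stem and stem not in STOP_WORDS]


print(tokenize("Isn't it ashame."))   # tokens after step 1
print(normalise("Isn't it ashame."))  # terms after steps 2 and 3
```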

17  Default settings in indexing and retrieval: - Case sensitivity: Off - Stop-word removal: Off - Stemming: Off. Stop-word removal / stemming -> reduce the size of index files. But both can matter: stop-words such as 'in', 'inside', or 'of' can be useful; stemming gives "London" from both "London" and "Londoner".
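
The trade-off on this slide can be illustrated with a small sketch that builds the set of index terms under different settings. The defaults mirror the slide (case sensitivity, stop-word removal, and stemming all off); the stop-word list, the `toy_stem` suffix stripper, and the function `index_terms` are hypothetical.

```python
# Assumed stop-word list; it includes the spatial prepositions from the slide.
STOP_WORDS = {"in", "inside", "of", "the", "a"}


def toy_stem(term):
    """Hypothetical suffix stripper, standing in for a real stemmer."""
    for suffix in ("ers", "er", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 4:
            return term[: -len(suffix)]
    return term


def index_terms(text, case_sensitive=False, remove_stop_words=False, stem=False):
    """Return the set of distinct terms that would be written to the index."""
    terms = text.split()
    if not case_sensitive:
        terms = [term.lower() for term in terms]
    if remove_stop_words:
        terms = [term for term in terms if term.lower() not in STOP_WORDS]
    if stem:
        terms = [toy_stem(term) for term in terms]
    return set(terms)


text = "hotels in London for a Londoner"
default_index = index_terms(text)
reduced_index = index_terms(text, remove_stop_words=True, stem=True)
print(len(default_index), sorted(default_index))
print(len(reduced_index), sorted(reduced_index))
```

With removal and stemming switched on the term set is smaller, but the preposition "in", which signals a spatial relation in the query, is no longer available.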

18  Filtering candidate locations: context rules are used to remove stop-words, references to people and organizations, and links to emails/URLs.
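
A minimal sketch of this kind of filtering is shown below. The specific rules are assumptions chosen to illustrate each category on the slide: a gazetteer stop-word list, a personal title before the candidate, a company suffix after it, and a pattern for emails/URLs. The rules used in the paper are not given in the slides, and all names here are hypothetical.

```python
import re

# Hypothetical gazetteer and rule tables.
GAZETTEER = {"Sheffield", "London", "Bath", "Reading"}
GAZETTEER_STOP_WORDS = {"Bath", "Reading"}  # place names that are also common words
PERSON_TITLES = {"Mr", "Mrs", "Ms", "Dr"}
ORG_SUFFIXES = {"Ltd", "Inc", "plc"}
URL_OR_EMAIL = re.compile(r"https?://|www\.|\S+@\S+")


def filter_locations(text):
    """Return gazetteer matches that survive the assumed context rules."""
    tokens = text.split()
    kept = []
    for i, raw in enumerate(tokens):
        if URL_OR_EMAIL.search(raw):
            continue  # rule: ignore anything inside an email address or URL
        word = raw.strip(".,;:!?()\"'")
        if word not in GAZETTEER or word in GAZETTEER_STOP_WORDS:
            continue  # rule: not a place name, or a stop-listed one
        prev_word = tokens[i - 1].strip(".,") if i > 0 else ""
        next_word = tokens[i + 1].strip(".,") if i + 1 < len(tokens) else ""
        if prev_word in PERSON_TITLES:
            continue  # rule: "Dr London" is a person
        if next_word in ORG_SUFFIXES:
            continue  # rule: "London Ltd" is an organization
        kept.append(word)
    return kept


sample = ("Dr London emailed info@sheffield.example about Bath "
          "and London Ltd before visiting Sheffield.")
print(filter_locations(sample))
```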

19  The geo-parsing method could be improved by enhancing the gazetteer matching and filtering.  False hits could be reduced by generating a better list of stop-words and using further context rules.  The need to create rules by hand could be alleviated by generating further context rules from features using machine learning.

20  [3] Jones, C.B., Purves, R., Ruas, A., Sanderson, M., Sester, M., van Kreveld, M.J. and Weibel, R. (2002) Spatial information retrieval and geographical ontologies: an overview of the SPIRIT project. In SIGIR'02, Tampere, Finland, 387-388. [6] Joho, H. and Sanderson, M. (2004) The SPIRIT collection: an overview of a large web collection. In SIGIR Forum, 38(2), 57-61. [8] Mikheev, A., Moens, M. and Grover, C. (1999) Named Entity recognition without gazetteers. In Proceedings of the Annual Meeting of the European Association for Computational Linguistics EACL'99, Bergen, Norway, 1-8. Spatially-Aware Information Retrieval on the Internet - A Working Searching System

