Presentation is loading. Please wait.

Presentation is loading. Please wait.

Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies Database Laboratory University of A Coruña A Coruña,

Similar presentations


Presentation on theme: "Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies Database Laboratory University of A Coruña A Coruña,"— Presentation transcript:

1 Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies Database Laboratory University of A Coruña A Coruña, Spain Miguel R. Luaces, Ángeles S. Places, Francisco J. Rodríguez, Diego Seco

2 20th October, 2008Barcelona - SeCoGIS 20082/29 Motivation The Database Laboratory at the University of A Coruña works in two very active research fields: — Geographic Information Systems (GIS) EIEL Project (http://www.dicoruna.es/webeiel)http://www.dicoruna.es/webeiel — Information Retrieval (IR) Galician Virtual Library (http://bvg.udc.es)http://bvg.udc.es GISIR GIR Retrieval of geographically and thematically relevant documents in response to a query of the form

3 20th October, 2008Barcelona - SeCoGIS 20083/29 Motivation Many of the documents stored in digital libraries and document databases include geographic references — Example: “…the hurricane touched land at Veracruz…” Few index structures or retrieval algorithms take into account these geographic references Some proposals have appeared recently. But… There are some specific particularities of geographic space that are not taken into account by these proposals: — Hierarchical nature of geographic space — Topological relationships between the geographic objects

4 20th October, 2008Barcelona - SeCoGIS 20084/29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

5 20th October, 2008Barcelona - SeCoGIS 20085/29 Related Work Many index structures and techniques have been proposed in each area: — IR: Inverted Index Geographic references are normal words — GIS: R-Tree These structures do not take into consideration the hierarchy of space The combination of both types of indexes (SPIRIT project): — Text-First — Geo-First They do not take into account the relationships between the geographic objects that they are indexing An Ontology can describe the specific characteristics of geographic space: — Ontologies are used in query expansion, relevance rankings, and web resource annotation Nobody has ever tried to combine it with other types of indexes to have a hybrid structure

6 20th October, 2008Barcelona - SeCoGIS 20086/29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

7 20th October, 2008Barcelona - SeCoGIS 20087/29 Architecture Administration User InterfaceQuery User Interface Geographic Information Retrieval Module Geometry Supplier Service Gazetteer Service WMS Query Evaluation Service Document Abstraction Index constructionIndex Structure Geographic Space Ontology Service Continent Country Administrative Region Populated Place  contains Ontology Europe Spain Galicia Coruña Ontology Instance

8 20th October, 2008Barcelona - SeCoGIS 20088/29 Architecture Document storage workflow: — Keyword extraction Classic IR techniques (removing stopwords, stemmers) — Build the index structure Natural Language Processing Techniques Gazetteer Service Geographic Space Ontology Service Processing services: — Query solving service — Web Map Service (WMS) — Geographic information retrieval module User Interface: — Administration (manage the document collection) — Query

9 20th October, 2008Barcelona - SeCoGIS 20089/29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

10 20th October, 2008Barcelona - SeCoGIS 200810/29 Index Structure …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Inverted Index …… Germany Spain …… Place Name Hash Table DocIds2,3,… Europe Asia … Spain Germany MadridGalicia … … Coruña

11 20th October, 2008Barcelona - SeCoGIS 200811/29 Index Structure Based on a spatial ontology — It models both the vocabulary and the spatial structure of places for purposes of information retrieval Tree composed by nodes that represent place names — Each node: Keyword (a place name) Bounding box Document identifiers Children nodes — These nodes are connected by means of inclusion relationships — If the list of children nodes exceeds a threshold, an R-Tree is used Auxiliary structures: — Place name hash table — Traditional Inverted Index

12 20th October, 2008Barcelona - SeCoGIS 200812/29 Index Structure Advantages: — All textual queries and spatial queries can be efficiently processed — Queries combining textual and spatial aspects are supported — Updates and optimizations in each index are handled independently Drawbacks: — The tree that supports the structure is possibly unbalanced — The structure is static

13 20th October, 2008Barcelona - SeCoGIS 200813/29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

14 20th October, 2008Barcelona - SeCoGIS 200814/29 Supported Query Types Pure textual queries — “Retrieve all documents where the words hotel and sea appear” — How do we solve it? Inverted Index Pure spatial queries — “Retrieve all documents that refer to the following geographic area” — How do we solve it? Descend the structure + refine the result The same algorithm that is used with spatial indexes

15 20th October, 2008Barcelona - SeCoGIS 200815/29 Supported Query Types Textual queries over a geographic area — “Retrieve all documents with the word hotel that refer to the following area” — How do we solve it? Example — Geographic references can be given using place names

16 20th October, 2008Barcelona - SeCoGIS 200816/29 Supported Query Types …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Inverted Index Europe Asia … Spain Portugal MadridGalicia … … Index Structure Text Result Spatial Result Query Result …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Text Result1,3,7,8,12,… Spatial Result Query Result Text Result1,3,7,8,12,… Spatial Result12,14,… Query Result Query Window Europe Portugal Spain Galicia Coruña … DocIds12,14,… Text Result1,3,7,8,12,… Spatial Result12,14,… Query Result12,… Text Result1,3,7,8,12,… Spatial Result12,14,… Query Result12,…

17 20th October, 2008Barcelona - SeCoGIS 200817/29 Supported Query Types Textual queries with place names — “Retrieve all documents with the word hotel that refer to Spain” — How do we solve it? Example — We save some time by avoiding a tree traversal

18 20th October, 2008Barcelona - SeCoGIS 200818/29 Supported Query Types …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Inverted Index Europe Asia … Spain Germany MadridGalicia … … Index Structure …… Spain Germany …… Place Name Hash Table Text Result Spatial Result Query Result DocIds2,3,… DocIds5,7,… DocIds12,14,… …… hotel1,3,7,8,12,… sea3,5,6,9,10,… …… Text Result1,3,7,8,12,… Spatial Result2,3,5,7,12,14,… Query Result3,7,12,… Text Result1,3,7,8,12,… Spatial Result2,3,5,7,12,14,… Query Result Text Result1,3,7,8,12,… Spatial Result2,3,5,7,… Query Result Text Result1,3,7,8,12,… Spatial Result Query Result Text Result1,3,7,8,12,… Spatial Result2,3,… Query Result …… Spain Germany …… Spain GaliciaMadrid Text Result1,3,7,8,12,… Spatial Result2,3,5,7,12,14,… Query Result3,7,12,…

19 20th October, 2008Barcelona - SeCoGIS 200819/29 Supported Query Types Another improvement: QUERY EXPANSION — “retrieve all documents that refer to Spain” — How do we solve it? The Query Evaluation Service discovers that Spain is a geographic reference The Place Name Hash Table locates quickly the internal node that represents the geographic object Spain All the documents associated to this node are part of the result All the documents associated to the subtree are part of the result — The result contains not only those documents that include the term Spain, but also all documents that contain the name of a geographic object included in Spain

20 20th October, 2008Barcelona - SeCoGIS 200820/29 Demo

21 20th October, 2008Barcelona - SeCoGIS 200821/29 Contents Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

22 20th October, 2008Barcelona - SeCoGIS 200822/29 Experiments Document collections FT-91FT-94 # documents5,36871,489 # textual terms64,711251,057 # place names candidates23,550299,661 # documents with candidates4,65260,823 # documents with place names4,18254,899 # place names167,7742,274,146 # spatial nodes21,28264,843

23 20th October, 2008Barcelona - SeCoGIS 200823/29 Experiments Randomly generated pure spatial queries Ontology-based index versus R-Tree [FT-91] Query area0.001%0.01%0.1%1% Ontology-based index0.0130.0170.0520.360 R-Tree0.0100.0160.0570.370

24 20th October, 2008Barcelona - SeCoGIS 200824/29 Experiments Ontology-based index versus R-Tree (zones of high document density) [FT-91] Query area0.001%0.01%0.1%1% Ontology-based index0.030.111.059.84 R-Tree0.070.221.6412.85 Ontology-based index versus R-Tree (zones of low document density) [FT-91] Query area0.001%0.01%0.1%1% Ontology-based index0.020.030.090.4 R-Tree0.020.030.070.2

25 20th October, 2008Barcelona - SeCoGIS 200825/29 Experiments Improved R-Tree — List of documents for each location Query area0.001%0.01%0.1%1% Ontology-based index0.0130.0170.0520.360 Improved R-Tree0.0070.0080.0220.145 Ontology-based index versus Improved R-Tree [FT-91] Ontology-based index versus Improved R-Tree [FT-94] Query area0.001%0.01%0.1%1% Ontology-based index0.0160.0360.2233.017 Improved R-Tree0.0070.0170.0971.056

26 20th October, 2008Barcelona - SeCoGIS 200826/29 Contents Introduction Motivation Related Work Architecture Index Structure Supported Query Types Experiments Conclusions and Future Work

27 20th October, 2008Barcelona - SeCoGIS 200827/29 Conclusions We have defined an architecture for Geographic Information Retrieval systems It takes into account both textual and geographic references in the documents This is achieved by a new index structure that combines an inverted index, a spatial index and an ontology Traditional queries and new types of queries can be solved

28 20th October, 2008Barcelona - SeCoGIS 200828/29 Future Work Evaluate the performance of the index structure Define algorithms to rank the retrieved documents Improve the disambiguation process of place names Explore the use of different ontologies Include other types of spatial relationships (e.g., adjacency)

29 Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies Database Laboratory University of A Coruña A Coruña, Spain Miguel R. Luaces, Ángeles S. Places, Francisco J. Rodríguez, Diego Seco  Contact: luaces@udc.es


Download ppt "Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies Database Laboratory University of A Coruña A Coruña,"

Similar presentations


Ads by Google