1 Spatial Query Broker in a Grid Environment. Author: Wladimir S. Meyer. Advisors: Jano M. Souza, Milton R. Ramirez. Federal University of Rio de Janeiro, PESC (Programa de Engenharia de Sistemas e Computação).
2 Outline: Motivation and Goal; The Problem; Related Work; The Proposal; SQB Architecture; Preliminary Tests; Remarks.
4 Motivation. The adoption of GIS is growing quickly, helped by improved channel bandwidth, and the interactions between data producers and consumers are becoming more frequent, complex and dynamic. Some hot points in these relationships: a huge amount of data spread over many different geographic locations; the complexity of spatial data; demand for sophisticated services delivered over the web; the high price that shared resources (CPU time, storage space, ...) may have in some federations; integration problems (many levels of heterogeneity). Distributed spatial operations, and methods to improve their efficiency, play an important role in this context. Many works address spatial operations in a centralized context, but few address a distributed one. The Grid computing paradigm aggregates many characteristics that can improve the execution of distributed spatial operations.
5 Goal. This work aims at improving the efficiency of the distributed spatial join by means of an architecture that allows non-specialized computers to be allocated to the execution of the operation, reducing the overall response time. The spatial join was chosen because it is a very common operation in GIS and has a high processing cost. The architecture also offers the conditions to experiment with new algorithms (filter/refine, scheduler, ...).
7 The Problem. How can a spatial join be carried out over a pool of data providers that share a huge amount of spatial data, while keeping the response time below a limit set by some quality criterion? The data fragmentation may be spatial and/or thematic (i.e. a hybrid schema), and there are local spatial indexes on each dataset. This scenario could be illustrated by a pool of regional governmental agencies responsible for producing cartographic data, which offer query services over their data by means of the Internet. Themes are related to: transport, hydrography, infrastructure, ...
9 Related Work. Many important works in spatial query processing are related to the filter/refine strategy. Some of them are mentioned below: multi-step processing of spatial joins (Brinkhoff et al.); raster signatures in spatial joins, 4CRS (Zimbrão et al.); multi-step processing with remote indexes, MR2 (Ramirez and Souza). On the other hand, the execution of the query plan in a distributed context may emphasize parallelism as a way to reduce the overall response time: MR2 (Ramirez); Grid Greedy Node (Porto et al.); OGSA-DQP (Smith et al.). Some of these strategies need a scheduler module, which should guarantee an adequate load balance among the selected local SDBMSs.
11 The Proposal. In this work, the grid's ability to offer resources on demand is used to reduce the overall response time of distributed spatial join operations in databases. The parallelism in previous works involves only those nodes that store the spatial data mentioned in the query. Our proposal is to also involve generic computational resources in the most expensive step of the filter/refine strategy: the exact geometry processing. [Figure: multi-step filter/refine strategy. Dataset 1 and Dataset 2 feed the MBR join, followed by geometric filtering and then exact processing, which produces the results. Filtering step: SDBMSs. Exact processing step: generic computational resources.]
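The two-step idea on this slide can be illustrated with a minimal sketch. It is not the SQB implementation: the function names are hypothetical, the geometric filtering (4CRS) stage is omitted, polygons are assumed to be convex so that a separating-axis test can stand in for the exact geometry processing, and the MBR join is a naive nested loop rather than an index-based join.

```python
from itertools import product

def mbr(poly):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of [(x, y), ...]."""
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)

def mbrs_intersect(a, b):
    """Cheap filter test on two rectangles."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def convex_overlap(p, q):
    """Exact test for CONVEX polygons via the separating axis theorem.
    Polygons that merely touch are reported as not overlapping."""
    for poly in (p, q):
        n = len(poly)
        for i in range(n):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
            ax, ay = y1 - y2, x2 - x1  # normal of the current edge
            proj_p = [ax * x + ay * y for x, y in p]
            proj_q = [ax * x + ay * y for x, y in q]
            if max(proj_p) <= min(proj_q) or max(proj_q) <= min(proj_p):
                return False  # found an axis that separates p and q
    return True

def filter_step(r, s):
    """MBR join: candidate index pairs whose rectangles intersect."""
    r_boxes = [mbr(p) for p in r]
    s_boxes = [mbr(p) for p in s]
    return [(i, j)
            for i, j in product(range(len(r)), range(len(s)))
            if mbrs_intersect(r_boxes[i], s_boxes[j])]

def refine_step(r, s, candidates):
    """Exact geometry processing, the step the proposal moves to generic CEs."""
    return [(i, j) for i, j in candidates if convex_overlap(r[i], s[j])]

# Hypothetical sample data: one triangle in R, three in S.
R = [[(0, 0), (4, 0), (0, 4)]]
S = [[(3, 3), (5, 3), (3, 5)],       # MBRs intersect, geometries do not
     [(1, 1), (2, 1), (1, 2)],       # true overlap
     [(10, 10), (11, 10), (10, 11)]]  # rejected by the MBR filter
candidates = filter_step(R, S)
results = refine_step(R, S, candidates)
```

Only the candidate pairs that survive the cheap rectangle test reach the expensive exact test, which is why that last step is the natural one to hand to additional machines.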
12 The Proposal. The following picture gives an overview of the context. [Figure labels: a component that receives the global query and checks the user rights; auxiliary services; specialized CEs; generic CEs; meta-schedulers.]
13 The Proposal. A specialized meta-scheduler, named Spatial Query Broker (SQB), is proposed to deal with all spatial query processing, in a way similar to conventional Resource Brokers in grid environments.
Item               | SQB       | OGSA-DQP  | GridWay      | WMS
Unit of work       | Query     | Query     | Job          | Job
App domain         | Databases | Databases | Generic jobs | Generic jobs
Dynamic scheduling | Yes       | No        | Yes          | No
Spatial queries?   | Yes       | No        | -            | -
Use generic nodes? | Yes       | No        | -            | -
15 SQB Architecture. The SQB is composed of modules with the following roles: managing all data flow and the sequence of events; analyzing and simplifying the query; finding the data providers that store the needed data and acquiring the status of the CEs; selecting the SDBMSs and managing the filtering steps; managing the exact geometry step over the CEs; interfacing with the CEs, submitting and monitoring tasks; delivering information about resources and data partitioning. [Figure label: resources shared by organizations.]
16 SQB Architecture. [Figure: steps managed by the optimizer. Labels: SDBMS CE1 and SDBMS CE2, holding region r of Theme 1 (T1) and region r of Theme 2 (T2); MBRs + 4CRS; MBR + geometric filtering; Execution Monitor, which receives the inconclusive pairs and some positive hits (ids + number of vertices).]
17 SQB Architecture. The Execution Monitor builds two queues to store the inconclusive pairs before delivering them to the CEs. One queue is shared among the faster CEs and the other among the slower ones. The total number of vertices of a pair is adopted as an indicator of the complexity of its processing. A throughput indicator is collected from the CEs beforehand and registered in the information server (MDS). It isn't necessary to sort the pairs.
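A minimal sketch of the two-queue idea follows. The slide does not say how CEs are classified as faster or slower, nor how pairs are assigned to a queue, so both rules here are assumptions: CEs are split around the median throughput, and pairs whose total vertex count reaches a hypothetical `vertex_threshold` go to the queue served by the faster CEs. All names are illustrative.

```python
from collections import deque
from statistics import median

def split_ces(throughput):
    """Split CEs into faster/slower groups around the median throughput.
    throughput: {ce_name: indicator value previously read from the MDS}."""
    cut = median(throughput.values())
    fast = [ce for ce, t in throughput.items() if t >= cut]
    slow = [ce for ce, t in throughput.items() if t < cut]
    return fast, slow

def build_queues(pairs, vertex_threshold):
    """pairs: iterable of (id_r, id_s, total_vertices).
    One pass, no sorting: each pair goes straight to one of the two queues."""
    fast_q, slow_q = deque(), deque()
    for pair in pairs:
        target = fast_q if pair[2] >= vertex_threshold else slow_q
        target.append(pair)
    return fast_q, slow_q

# Hypothetical values, for illustration only.
throughput = {"ce1": 120.0, "ce2": 40.0, "ce3": 90.0}
pairs = [(1, 7, 30), (2, 9, 400), (3, 8, 12)]
fast_ces, slow_ces = split_ces(throughput)
fast_q, slow_q = build_queues(pairs, vertex_threshold=100)
```

Each CE in a group would then pull the next pair from its group's queue whenever it becomes idle, which balances load without ordering the pairs first.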
20 Preliminary Tests. Although a prototype is still under construction, a few tests were run with synthetic spatial datasets consisting of polygons, to obtain some relative parameters to guide our work on spatial joins between polygons (overlap predicate). The spatial join operations were performed on servers that hold both datasets, indexed with R-trees. The original datasets were partitioned into four and nine regular parts, and the response time (RT) was measured in each situation: $RT = T_{MSG} \cdot \#messages + T_{TX} \cdot \#bytes + T_{CPU} + T_{I/O}$. Objects that cross partition boundaries were replicated in the datasets involved (they weren't split). The tests were executed in three situations: the whole query at once in a single SDBMS; the query over the same region broken into four parts and executed by four identical machines; the query over the same region broken into nine parts and executed by nine identical machines.
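The partitioning described on this slide can be sketched as follows, assuming a k x k regular grid over a known extent (k = 2 gives four parts, k = 3 gives nine) and assuming that an object is assigned to every cell its MBR touches. The function names and sample data are hypothetical; the slides do not describe the actual partitioning code.

```python
def cells_for(box, extent, k):
    """Cells (col, row) of a k x k regular grid overlapped by an MBR.
    box and extent are (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    w, h = (xmax - xmin) / k, (ymax - ymin) / k

    def idx(v, lo, size):
        # Clamp so that objects on the outer border stay inside the grid.
        return min(k - 1, max(0, int((v - lo) / size)))

    c0, c1 = idx(box[0], xmin, w), idx(box[2], xmin, w)
    r0, r1 = idx(box[1], ymin, h), idx(box[3], ymin, h)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]

def partition(objects, extent, k):
    """objects: {object_id: mbr}. Objects crossing a boundary are
    replicated into every cell they touch; they are not split."""
    parts = {}
    for oid, box in objects.items():
        for cell in cells_for(box, extent, k):
            parts.setdefault(cell, []).append(oid)
    return parts

def merge_results(per_cell_pairs):
    """Remove result pairs reported more than once because of replication."""
    return sorted(set(pair for pairs in per_cell_pairs for pair in pairs))

# Hypothetical data: "b" crosses both grid boundaries of a 2 x 2 grid.
objects = {"a": (1, 1, 2, 2), "b": (4, 4, 6, 6)}
parts = partition(objects, extent=(0, 0, 10, 10), k=2)
merged = merge_results([[("a", "b")], [("a", "b"), ("c", "d")]])
```

Replication keeps each partition self-contained, at the price of a final pass that discards duplicate result pairs.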
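22 Preliminary Tests. $RT = T_{MSG} \cdot \#messages + T_{TX} \cdot \#bytes + T_{CPU} + T_{I/O}$, where the first two terms are the communication cost (Comm) and the last two are the CPU and I/O costs. The chart also adds a term $T_{remove\ replicas}$. This operation is CPU bound, and the communication cost has a low impact on the final response time. * Communication cost based on a 256 kbps bandwidth. [Chart values: 10992, 11009.]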
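The response-time formula can be turned into a small calculator. This is only a sketch: every numeric input in the example is hypothetical rather than a measurement from the tests, and 256 kbps is assumed to mean 256,000 bits per second.

```python
def response_time(n_messages, n_bytes, t_msg, bandwidth_bps,
                  t_cpu, t_io, t_remove_replicas=0.0):
    """RT = T_MSG * #messages + T_TX * #bytes + T_CPU + T_I/O
    (+ T_remove_replicas in the partitioned cases).
    Times are in seconds; T_TX * #bytes is derived from the bandwidth."""
    transfer = n_bytes * 8 / bandwidth_bps  # T_TX * #bytes
    comm = t_msg * n_messages + transfer
    return comm + t_cpu + t_io + t_remove_replicas

# Hypothetical inputs, chosen only to show the relative weight of each term.
rt = response_time(n_messages=10, n_bytes=64_000, t_msg=0.05,
                   bandwidth_bps=256_000, t_cpu=30.0, t_io=5.0,
                   t_remove_replicas=0.5)
```

With these made-up inputs the communication part is 2.5 s out of 38 s, which mirrors the kind of CPU-bound profile the slide reports.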
23 Preliminary Tests. [Chart: x-axis is the number of servers (1, 4, 9).] The processing cost and the communication cost tend to reach the same order of magnitude as the number of servers increases. The superlinear speedup means, in this case, that the computational resources available on a single machine were insufficient to reach a good response time.
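For reference, the usual definition of speedup behind the remark on this slide, with $T(n)$ denoting the response time measured with $n$ servers (the symbols $S$ and $T$ are notation introduced here, not taken from the slides):

```latex
S(n) = \frac{T(1)}{T(n)}, \qquad \text{superlinear speedup} \iff S(n) > n
```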
24 Test conditions. The preliminary tests were executed under the following conditions: Spatial database: Secondo. Grid middleware: Globus GT4. Datasets: two datasets composed of 10060 indexed triangles. Hardware: Sempron 2800, 1 GB RAM, 80 GB HD. OS: Fedora Linux. The overall architecture is under construction and is based on web services (WSRF).
26 Remarks. This work presents an architecture based on grid infrastructure, tailored to cover some needs of a distributed geographic information system. The focus was on offering a strategy to execute spatial queries over spatial databases managed by several organizations gathered in a federation. The filter/refine approach was adopted, and an attempt was made to use pre-existing spatial indexes on the datasets. A global ID structure must be proposed in order to: easily reduce the repeated processing of objects that cross boundaries after the filtering step (avoiding moving them unnecessarily to the CEs); isolate the processing in the SQB from local IDs, improving scalability. Next steps: specify new cost models to help the optimizer and the scheduler take the dynamics of the environment into account; research the scheduling process in order to improve the reliability of the architecture; compare the response time of a join executed over a benchmark dataset with that of the same join executed in similar distributed environments.
27 References
1. Adzigogov, L., Soldatos, J., and Polymenakos, L. (2005). "EMPEROR: An OGSA Grid Meta-Scheduler based on Dynamic Resource." Journal of Grid Computing, 3, 19-37.
2. Afgan, E. (2004). "Role of the Resource Broker in the Grid." ACM, Huntsville, Alabama, USA.
3. Andretto, P. et al. (2004). "Practical approaches to Grid workload and resource management in the EGEE project."
4. Azevedo, L. G., Monteiro, R. S., Zimbrão, G., and Souza, J. M. (2004). "Approximate Spatial Query Processing Using Raster Signature."
5. Brinkhoff, T., Kriegel, H., and Seeger, B. (1993). "Efficient Processing of Spatial Joins Using R-Trees." In: Proceedings of the 1993 ACM SIGMOD, Washington, DC.
6. Brinkhoff, T., Kriegel, H., and Schneider, R. (1994). "Multi-Step Processing of Spatial Joins." Washington, DC, USA, 237-246.
7. Buyya, R., and Venugopal, S. (2004). "The Gridbus Toolkit for Service Oriented Grid and Utility Computing: An Overview and Status Report."
8. Câmara, G., and Queiroz, G. (2002). "GeoBR: Intercâmbio Sintático e Semântico de Dados Espaciais."
9. Di, L., Chen, A., Yang, W., and Zhao, P. (2003). "The Integration of Grid Technology with OGC Web Services (OWS) in NWGISS for NASA EOS Data."
10. EGEE. (2006). "GLite - Installation and Configuration Guide v 3.0 (rev 2)." European Union.
11. Egenhofer, M. J., and Herring, J. R. (1994). "Categorizing Binary Topological Relations Between Regions, Lines and Point in Geographical Databases." NCGIA.
12. "Globus Toolkit 4." (2005). www.gridbus.org/escience/051205GlobusTutorialeScience.ppt, July/2006.
13. Foster, I., and Kesselman, C. (1999). "Computational grids." The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann.
14. Foster, I., Kesselman, C., and Tuecke, S. (2001). "The Anatomy of the Grid Enabling Scalable Virtual Organizations." Lecture Notes in Computer Science, 2150.
15. Gustafson, J. L. (1990). "Fixed Time, Tiered Memory, and Superlinear Speedup."
28 References
16. GridWay Team. (2006). "GridWay 5 Documentation: User Guide." Madrid, Spain, Universidad Complutense de Madrid.
17. Güting, R. H., Behr, T., Almeida, V., Ding, Z., Hoffmann, F., and Spiekermann, M. (2004). "Secondo: An Extensible DBMS Architecture and Prototype." Hagen, Germany, Fernuniversität Hagen.
18. Hanssen, G. (2005). "The Filter/Refine Strategy: A Study on the Land-Use Resource Dataset in Norway."
19. Ilya, Z., Memon, A., Petropoulos, M., and Baru, C. (2003). "Online Querying of Heterogeneous Distributed Spatial Data on a Grid." Brno, Cz, 813-823.
20. Kang, M.-S., and Choy, Y.-C. (2002). "Deploying parallel spatial join algorithm for network environment." IEEE, 177-181.
21. Meyer, W. S., and Souza, J. M. (2006). "Overlapped Regions with Distributed Spatial Databases in a Grid Environment." Rio de Janeiro, Brazil.
22. Meyer, W. S., Souza, J. M., and Ramirez, M. R. (2005). "Secondo-grid: An Infrastructure to Study Spatial Databases in Computational Grids." Campos do Jordão, SP, Brazil.
23. Mondal, A., Goda, K., and Kitsuregawa, M. (2003). "Effective Load-Balancing via Migration and Replication in Spatial Grids." Lecture Notes in Computer Science, 2736, 202-211.
24. Özsu, M. T., and Valduriez, P. (2001). "Principles of Distributed Database Systems." Prentice-Hall.
25. Porto, F., Silva, V. F. V., Dutra, M. L., and Schulze, B. (2005). "An adaptive distributed query processing grid service." Trondheim, Norway.
26. Ramirez, M. R. (2001). "Spatial Distributed Query Processing." Rio de Janeiro, RJ, COPPE/UFRJ.
27. Smith, J., Gounaris, A., Watson, P., Paton, N. W., Fernandes, A. A. A., and Sakellariou, R. (2002). "Distributed Query Processing on the Grid."
28. "OGSA-DQP 3.1 User's Documentation." (2006). http://www.ogsadai.org.uk/documentation/ogsa-dqp_3.1/, July/2006.
29. Venugopal, S., Buyya, R., and Winton, L. (2004). "A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids."
30. Zimbrão, G., and Souza, J. M. (1998). "A Raster Approximation for the Processing of Spatial Joins." New York, USA, 558-569.