
1 OLAP Query Processing in Grids
Nelson Kotowski, Federal University of Rio de Janeiro, Brazil
Alexandre A. B. Lima, University of Grande Rio, Brazil
Esther Pacitti, Patrick Valduriez, INRIA and University of Nantes, France
Marta Mattoso, Federal University of Rio de Janeiro, Brazil
DMG 2007

2 Agenda
- OLAP in Grids
- Database clusters
- GParGRES
- Preliminary experimental results
- Conclusion

3 OLAP using Grids
Problem
- How to fulfill OLAP needs within the current grid software infrastructure?
  - Grid services?
  - Adapting database cluster techniques to grids?
[Grid figure courtesy of Peter Kacsuk and Gergely Sipos]

4 Using Database Clusters in Grids
- A sequential “black-box” DBMS runs at each node
- It is based on database replication
- The middleware coordinates parallel query execution
- Applications and databases are easily migrated from sequential environments
- Both inter- and intra-query parallelism can be exploited
[Figure: clients, middleware, and DBMS instances on a PC cluster]

5 Inter-query Parallelism
[Figure: queries Q1, Q2, Q3 and Q4 distributed across DBMS nodes 1 to 4]
- Improves overall system throughput
- Good for OLTP applications
- Not adequate for OLAP

6 Intra-query Parallelism
[Figure: query Q1 split into four subqueries, one per DBMS node (Node 1 to Node 4)]
- Reduces individual query execution time
- Required for high-performance OLAP
- Virtual Partitioning
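The idea behind virtual partitioning can be sketched in a few lines of Python: every node holds a full replica, and each one is asked to process only a slice of a key range. This is an illustrative sketch, not ParGRES code. The function name `virtual_partitions`, the half-open interval convention, and the equal-size split are all assumptions.

```python
def virtual_partitions(min_key, max_key, n_nodes):
    """Split the half-open key interval [min_key, max_key) into n_nodes
    contiguous, non-overlapping sub-intervals of near-equal size."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(max_key - min_key, n_nodes)
    ranges = []
    low = min_key
    for i in range(n_nodes):
        # The first `extra` partitions take one additional key each.
        high = low + base + (1 if i < extra else 0)
        ranges.append((low, high))
        low = high
    return ranges


# Hypothetical key bounds: keys 1..10 spread over 4 nodes.
example = virtual_partitions(1, 11, 4)
```

Each `(low, high)` pair would become the bounds of one subquery, so no data has to be physically moved or repartitioned.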

7 ParGRES
- Database cluster middleware developed by our research group
- Optimized for OLAP support
- Provides inter- and intra-query parallelism
- Offers high performance for heavy-weight query processing over large databases
  - using inexpensive components
  - in a non-intrusive way
    - making no changes to database applications
    - keeping the same DBMS
    - keeping the same logical database schema
- Shows super-linear speedup

8 GParGRES

9 GParGRES: a Database Grid Middleware
Middleware that provides
- Transparent access to distributed databases in a grid
- Intra-query parallelism during heavy-weight query processing
Based on ParGRES
- Assumes that grid nodes are PC clusters running ParGRES instances
Intra-query parallelism is achieved through virtual partitioning
Two levels of query splitting
- Grid-level splitting: implemented by GParGRES
- Node-level splitting: implemented by ParGRES
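The two levels of splitting can be pictured as one range division applied twice: first across clusters, then across the nodes inside each cluster. The Python sketch below only illustrates that structure. The names `split_range` and `two_level_split`, the cluster labels, and the equal-size split at the grid level are assumptions; the slides do not say how GParGRES sizes each cluster's share.

```python
def split_range(low, high, parts):
    """Split the half-open interval [low, high) into `parts` contiguous sub-ranges."""
    step, extra = divmod(high - low, parts)
    out = []
    start = low
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        out.append((start, end))
        start = end
    return out


def two_level_split(low, high, nodes_per_cluster):
    """Grid level: one sub-range per cluster.
    Node level: each cluster's sub-range is divided among its own nodes."""
    grid_ranges = split_range(low, high, len(nodes_per_cluster))
    plan = {}
    for (cluster, n_nodes), (c_low, c_high) in zip(nodes_per_cluster.items(), grid_ranges):
        plan[cluster] = split_range(c_low, c_high, n_nodes)
    return plan


# Hypothetical grid: two clusters with 2 and 4 nodes, keys 0..99.
plan = two_level_split(0, 100, {"cluster_a": 2, "cluster_b": 4})
```

A real deployment might weight the grid-level split by cluster size or speed. Equal shares are used here only to keep the example short.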

10 GParGRES: Architecture

11 GParGRES: Architecture
RS: concentrates metadata concerning GParGRES services, such as the state of each FS and DQS instance, and ParGRES execution in the nodes

12 GParGRES: Architecture
FS: GParGRES entry point, responsible for creating new instances of DQS

13 GParGRES: Architecture
DQS: manages global query execution. It receives the query and splits it into subqueries, using virtual partitioning to implement intra-query parallelism. It also performs final result composition

14 GParGRES: Architecture
Grid Local Query Service (GLQS): local component responsible for receiving subqueries from DQS and passing them to the local ParGRES instance

15 GParGRES: Architecture

16 GParGRES: a Database Grid Middleware

17 GParGRES: a Database Grid Middleware

18 GParGRES: a Database Grid Middleware

19 GParGRES: a Database Grid Middleware

20 GParGRES: a Database Grid Middleware
select o_orderpriority, count(*)
from orders
where o_orderdate >= date ' '
group by o_orderpriority;

21 GParGRES: a Database Grid Middleware
create table temp_result_1 (
  o_orderpriority varchar(2),
  order_count integer);

22 GParGRES: a Database Grid Middleware
select o_orderpriority, count(*)
from orders
where o_orderdate >= date ' '
  and o_orderkey >= ? and o_orderkey < ?
group by o_orderpriority;
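The two `?` placeholders on `o_orderkey` are where each virtual partition's bounds go. As a rough Python sketch of that binding step, the snippet below pairs the subquery text with one parameter tuple per key range. It is not the GParGRES implementation. The helper name `bind_subqueries` is hypothetical, and the date bound is passed as a third parameter only because the literal is not shown in the transcript.

```python
# Subquery template adapted from the slide. The date bound is a parameter
# here (an assumption), since the slide's date literal is not recoverable.
SUBQUERY_TEMPLATE = (
    "select o_orderpriority, count(*) "
    "from orders "
    "where o_orderdate >= ? and o_orderkey >= ? and o_orderkey < ? "
    "group by o_orderpriority;"
)


def bind_subqueries(start_date, key_ranges):
    """Return one (sql, params) pair per virtual partition."""
    return [(SUBQUERY_TEMPLATE, (start_date, low, high)) for low, high in key_ranges]


# Hypothetical date and key ranges, chosen only for illustration.
subqueries = bind_subqueries("1995-01-01", [(1, 1001), (1001, 2001), (2001, 3001)])
```

Every subquery has the same text and differs only in its parameters, so a node can prepare the statement once and reuse it for each range it is given.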

23 GParGRES: a Database Grid Middleware

24 GParGRES: a Database Grid Middleware

25 GParGRES: a Database Grid Middleware

26 GParGRES: a Database Grid Middleware
insert into temp_result_1 values (?,?);

27 GParGRES: a Database Grid Middleware
select o_orderpriority, sum(order_count)
from temp_result_1
group by o_orderpriority;
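The sequence on the preceding slides (range-restricted subqueries, partial results inserted into `temp_result_1`, then a `sum` over the partial counts) can be tried end to end on a toy table. The Python sketch below makes several assumptions: it uses SQLite from the standard library in place of PostgreSQL, runs the subqueries one after another on a single connection instead of in parallel on separate nodes, omits the date predicate, and uses made-up rows. The function name `run_partitioned_count` is hypothetical.

```python
import sqlite3


def run_partitioned_count(conn, key_ranges):
    """Run one count subquery per key range, store the partial results in
    temp_result_1, and compose the final answer by summing the partial counts."""
    cur = conn.cursor()
    cur.execute(
        "create temp table temp_result_1 "
        "(o_orderpriority varchar(2), order_count integer)"
    )
    for low, high in key_ranges:
        partial = cur.execute(
            "select o_orderpriority, count(*) from orders "
            "where o_orderkey >= ? and o_orderkey < ? "
            "group by o_orderpriority",
            (low, high),
        ).fetchall()
        cur.executemany("insert into temp_result_1 values (?, ?)", partial)
    composed = cur.execute(
        "select o_orderpriority, sum(order_count) from temp_result_1 "
        "group by o_orderpriority order by o_orderpriority"
    ).fetchall()
    cur.execute("drop table temp_result_1")
    return composed


# Toy data: (o_orderkey, o_orderpriority). Values are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (o_orderkey integer primary key, o_orderpriority varchar(2))"
)
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, "1"), (2, "2"), (3, "1"), (4, "3"), (5, "2"), (6, "1"), (7, "1"), (8, "3")],
)

# Reference answer from the unpartitioned query.
direct = conn.execute(
    "select o_orderpriority, count(*) from orders "
    "group by o_orderpriority order by o_orderpriority"
).fetchall()

# Same answer composed from three virtual partitions.
composed = run_partitioned_count(conn, [(1, 4), (4, 7), (7, 9)])
```

The composition query uses `sum` and not `count` because each row of `temp_result_1` already holds a partial count. Other aggregates need their own composition rule; an average, for example, would have to be rebuilt from partial sums and partial counts.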

28 GParGRES: a Database Grid Middleware

29 GParGRES: Preliminary Experimental Results
A preliminary GParGRES prototype has been implemented in Java
- Simple versions of DQS and GLQS (using ParGRES components) were implemented
Experimental Setup
- Two clusters from Grid’5000
  - Parasol cluster: 64 nodes, each with 2 Opteron 2.2 GHz CPUs, 2 GB RAM and 73 GB HD
  - Paraquad cluster: 64 nodes, each with 2 dual-core Xeon 2.33 GHz CPUs, 4 GB RAM and 160 GB HD
- Kadeploy
  - Generates customized images of operating systems and applications
- PostgreSQL
- ParGRES
- TPC-H database and queries
  - SF = 1

30 GParGRES: Preliminary Experimental Results (cont.)
Two kinds of experiments
- Isolated clusters
- Mixed configuration

31 GParGRES: Preliminary Experimental Results (cont.)
Isolated cluster - Parasol

32 GParGRES: Preliminary Experimental Results (cont.)
Isolated cluster - Paraquad

33 GParGRES: Preliminary Experimental Results (cont.)
Mixed configuration

34 GParGRES – Implementation Issues
Goals
- To implement all components as grid services
- WSRF-compliant components: RS, FS and GLQS
When running in a grid managed by Globus Toolkit 4, RS can be implemented by the Web Service Monitoring and Discovery Service (WS MDS)
Techniques employed in OGSA-DAI will help implement some components (e.g. FS)

35 Related Work
OGSA-DAI
- Open Grid Services Architecture - Data Access and Integration
OGSA-DQP
- Open Grid Services Architecture - Distributed Query Processing
New data models for grid warehouses
- Wehrle et al. propose a data model for distributing and querying a data warehouse in computing grids
  - The warehouse is formed by data “chunks”
  - Special structures are needed (e.g. X-Tree)

36 Conclusion
GParGRES is a grid service for OLAP query processing
- It provides transparent inter- and intra-query processing with
  - No need for application migration
  - No need for database schema migration
  - DBMS independence
GParGRES exploits successful techniques implemented in ParGRES
Two levels of query splitting
- Grid-level splitting: implemented by GParGRES
- Node-level splitting: implemented by ParGRES
Components are WSRF-compliant, easing compatibility with existing grid solutions
Preliminary results obtained in Grid’5000 show good performance

37 Future Work
- Integration with OGSA-DAI
- Support for partial database replication
- Support for top-k queries
  - Extension of best position algorithms

38 A different view of the Grid
Kandinsky, the Grid, 1923, Albertina Museum, Vienna
Thanks!
DMG 2007

