Distributed and Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence
Kalin Kanov, Department of Computer Science, Johns Hopkins University

1 Distributed and Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence
Kalin Kanov, Department of Computer Science, Johns Hopkins University

2 Streaming Evaluation Method
Linear data requirements of the computation allow for:
– Incremental evaluation
– Streaming over the data
– Concurrent evaluation of batch queries
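Because every result is a weighted sum of stored values, the contribution of each block of data can be added to a running total and the block discarded. The following minimal sketch illustrates that idea under simplifying assumptions: a hypothetical 1-D field stored in fixed-size "atoms", and each query given as a hypothetical mapping from voxel index to weight (for example, an interpolation stencil). ATOM_SIZE, evaluate_streaming, and evaluate_direct are illustrative names, not the system's actual code.

```python
from collections import defaultdict

# Hypothetical layout: a 1-D field split into atoms of ATOM_SIZE contiguous voxels.
ATOM_SIZE = 4


def atoms(field, atom_size=ATOM_SIZE):
    """Yield (start_index, values) for each atom, in storage order."""
    for start in range(0, len(field), atom_size):
        yield start, field[start:start + atom_size]


def evaluate_streaming(field, queries, atom_size=ATOM_SIZE):
    """Evaluate all queries (voxel -> weight maps) in a single pass over the atoms.

    Each atom is read once; its partial contribution is added to every query
    whose stencil touches it, so a query spanning several atoms is built up
    incrementally.
    """
    # Index the work by atom so each atom is visited only once.
    by_atom = defaultdict(list)
    for q, stencil in enumerate(queries):
        for voxel, weight in stencil.items():
            by_atom[voxel // atom_size].append((q, voxel, weight))

    results = [0.0] * len(queries)
    for start, values in atoms(field, atom_size):
        for q, voxel, weight in by_atom.get(start // atom_size, ()):
            results[q] += weight * values[voxel - start]
    return results


def evaluate_direct(field, queries):
    """Reference: evaluate each query independently, one at a time."""
    return [sum(w * field[v] for v, w in stencil.items()) for stencil in queries]


if __name__ == "__main__":
    field = [float(i * i) for i in range(16)]
    queries = [{2: 0.5, 3: 0.5}, {3: 0.25, 4: 0.5, 5: 0.25}, {15: 1.0}]
    print(evaluate_streaming(field, queries))  # [6.5, 16.5, 225.0]
```

The second query straddles atoms 0 and 1, so its value is assembled from two partial sums; the streaming and direct results agree because addition of the weighted terms is order-independent up to rounding.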

3 Motivation
Heavy database usage slows down the service by a factor of 10 to 20
Query evaluation techniques adapted from simulation code do not access data coherently
Substantial storage overhead is incurred to localize each computation
95% of queries perform Lagrange polynomial interpolation

4 Turbulence Database Cluster

5 MHD Database
Stores the velocity, magnetic field, magnetic vector potential, and pressure fields
– 10 attributes, 4 bytes each
– 1024 time-steps over a 1024³ grid
– 40 TB total size
To reduce the total amount of I/O:
– Smaller atoms (4³ voxels)
– No replication
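The 40 TB figure follows directly from the listed parameters. This short sketch reproduces the arithmetic; the split of the 10 attributes into 3 + 3 + 3 + 1 components is an assumption consistent with the listed fields, and the function names are illustrative only.

```python
# Back-of-the-envelope sizing for the MHD database described above.
# Assumption: 10 attributes = velocity (3) + magnetic field (3)
#             + magnetic vector potential (3) + pressure (1).
ATTRIBUTES = 10
BYTES_PER_VALUE = 4    # single precision
GRID_EDGE = 1024       # 1024^3 spatial grid
TIME_STEPS = 1024
ATOM_EDGE = 4          # atoms are 4^3-voxel cubes


def total_bytes():
    """Raw size of all attributes over all time-steps, with no replication."""
    return ATTRIBUTES * BYTES_PER_VALUE * GRID_EDGE**3 * TIME_STEPS


def voxels_per_atom():
    return ATOM_EDGE**3


def atoms_per_time_step():
    """Atoms needed to tile the grid at one time-step (edge divides evenly)."""
    return (GRID_EDGE // ATOM_EDGE) ** 3


if __name__ == "__main__":
    print(total_bytes() / 2**40)   # 40.0 (TiB)
    print(voxels_per_atom())       # 64
    print(atoms_per_time_step())   # 16777216
```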

6 Lagrange Polynomial Interpolation
[Figure: interpolation formula, with terms labeled "Lagrange coefficients" and "Data"]
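The slide's formula did not survive extraction. The following hedged sketch uses standard tensor-product Lagrange interpolation, which may differ in detail from the slide. The value at a point (x, y, z) is the sum over an n×n×n stencil of l_i(x) l_j(y) l_k(z) u(x_i, y_j, z_k). Here l_i(x) = Π_{m≠i} (x − x_m)/(x_i − x_m) are the Lagrange coefficients and u are the data values. The code assumes unit grid spacing, a field indexable as field[i][j][k], and a stencil lying entirely inside the grid (no periodic wrap-around). The function names are hypothetical.

```python
import math


def lagrange_weights(x, nodes):
    """Lagrange basis values l_i(x) for the given interpolation nodes."""
    weights = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for m, xm in enumerate(nodes):
            if m != i:
                w *= (x - xm) / (xi - xm)
        weights.append(w)
    return weights


def stencil_nodes(x, order):
    """Integer grid nodes of an `order`-point stencil centred on x (unit spacing)."""
    base = math.floor(x) - order // 2 + 1
    return list(range(base, base + order))


def interpolate3d(field, point, order=4):
    """Tensor-product Lagrange interpolation of field[i][j][k] at `point`.

    Computes sum_{i,j,k} lx[i] * ly[j] * lz[k] * field[xi][yj][zk]:
    per-axis Lagrange coefficients multiplied by the stencil's data values.
    """
    nodes = [stencil_nodes(c, order) for c in point]
    lx, ly, lz = (lagrange_weights(c, n) for c, n in zip(point, nodes))
    nx, ny, nz = nodes
    total = 0.0
    for a, xi in zip(lx, nx):
        for b, yj in zip(ly, ny):
            for c, zk in zip(lz, nz):
                total += a * b * c * field[xi][yj][zk]
    return total
```

An order-n stencil reproduces any polynomial of degree at most n − 1 in each coordinate exactly, which is a convenient correctness check. For example, with order 4, a field sampled from 1 + 2x + 3yz − z² + x³ is interpolated without error.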

7 Processing a Batch Query

8 Additional Optimizations
Concurrently compute the values that are stored together
Iterate over the data in the appropriate order
Compute the Lagrange coefficients with the procedures described by Purser and Leslie*
*R. J. Purser and L. M. Leslie. An Efficient Interpolation Procedure for High-Order Three-Dimensional Semi-Lagrangian Models. Monthly Weather Review, 119:2492ff., 1991.
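Purser and Leslie's procedure is not reproduced here. As a hedged illustration of the kind of saving available, the sketch below uses a common trick for uniformly spaced nodes. The denominators Π_{m≠i}(x_i − x_m) depend only on the stencil width, so they can be precomputed once. All numerators Π_{m≠i}(t − m) can then be formed with prefix and suffix products in O(n) per axis instead of O(n²). The names are hypothetical, and t is the query coordinate measured from the first stencil node in grid units.

```python
import math


def uniform_denominators(order):
    """Precompute prod_{m != i} (i - m) for nodes 0..order-1 (unit spacing).

    These depend only on the stencil width, so they are computed once and
    reused for every query point.
    """
    return [math.prod(i - m for m in range(order) if m != i) for i in range(order)]


def fast_lagrange_weights(t, denominators):
    """Lagrange weights at offset t from the first node, in O(order) time.

    numerator_i = prod_{m < i} (t - m) * prod_{m > i} (t - m), taken from
    prefix and suffix products rather than a fresh product for each i.
    """
    n = len(denominators)
    diffs = [t - m for m in range(n)]
    prefix = [1.0] * (n + 1)          # prefix[i] = diffs[0] * ... * diffs[i-1]
    for m in range(n):
        prefix[m + 1] = prefix[m] * diffs[m]
    suffix = [1.0] * (n + 1)          # suffix[i] = diffs[i] * ... * diffs[n-1]
    for m in range(n - 1, -1, -1):
        suffix[m] = suffix[m + 1] * diffs[m]
    return [prefix[i] * suffix[i + 1] / denominators[i] for i in range(n)]
```

Because no division by (t − m) occurs, the weights stay well defined when the query falls exactly on a grid node. For a 3-D query, the function would be called once per axis with the same precomputed denominators.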

9 Experimental Evaluation
Random workloads:
– across the entire cube space
– over a 128³ subset of the entire space
Workload derived from the usage log of the Turbulence Database cluster
Compared with:
– Direct methods of evaluation
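The following hypothetical sketch shows how the two random workloads could be generated and how they differ in data locality. It draws uniformly random query points over a cube and counts the distinct 4³ atoms that contain them. Placing the 128³ subset at the origin is an assumption, as are the function names. Points confined to a 128³ region fall in roughly (128/4)³ = 32,768 distinct atoms at most, so many more queries share each atom than in the whole-cube workload.

```python
import random


def random_points(count, lo, hi, seed=None):
    """Uniformly distributed query points in the cube [lo, hi]^3, in grid units.

    random.uniform may return the upper bound because of rounding, hence the
    closed interval.
    """
    rng = random.Random(seed)
    return [tuple(rng.uniform(lo, hi) for _ in range(3)) for _ in range(count)]


def containing_atoms(points, atom_edge=4):
    """Distinct atoms (integer atom coordinates) that contain the given points."""
    return {tuple(int(c // atom_edge) for c in p) for p in points}


if __name__ == "__main__":
    whole_cube = random_points(10_000, 0.0, 1024.0, seed=1)
    subset = random_points(10_000, 0.0, 128.0, seed=1)  # subset at the origin (assumption)
    print(len(containing_atoms(whole_cube)), len(containing_atoms(subset)))
```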

10 Setup
Experimental version of the MHD database
– ~300 time-steps of the velocity field of the MHD DNS (direct numerical simulation)
– Two 2.33 GHz dual quad-core servers running Windows Server 2003 and SQL Server 2008, with 8 GB of memory
– Data tables striped across 7 disks

11

12 Questions/Comments

