Distributed and Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence Kalin Kanov Department of Computer Science Johns Hopkins University

Presentation transcript:

Distributed and Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence Kalin Kanov Department of Computer Science Johns Hopkins University

Streaming Evaluation Method
Linear data requirements of the computation allow for:
- Incremental evaluation
- Streaming over the data
- Concurrent evaluation of batch queries
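The idea on this slide can be illustrated with a minimal sketch. It is not the implementation from the talk: it assumes a 1-D grid split into fixed-size atoms, and it models each query as a set of (grid index, weight) terms whose result is a weighted sum. The names `stream_batch`, `ATOM` and `needed` are hypothetical. Because each result is linear in the data, every atom can be visited once, in storage order, and its contribution added to the partial sums of all queries that touch it.

```python
from collections import defaultdict

ATOM = 4  # assumed atom length in grid points (1-D stand-in for a 3-D atom)


def stream_batch(atoms, queries):
    """Evaluate a batch of linear queries in one pass over the atoms.

    atoms:   iterable of (atom_index, values) pairs in storage order,
             where values holds ATOM consecutive grid values.
    queries: list of dicts mapping grid index -> weight; the result of
             a query is sum(weight * value) over its terms.
    """
    # Build a lookup: atom index -> terms of every query that needs it.
    needed = defaultdict(list)
    for query_id, terms in enumerate(queries):
        for grid_index, weight in terms.items():
            needed[grid_index // ATOM].append((query_id, grid_index % ATOM, weight))

    # Stream over the data once; each atom updates the partial sums
    # of all queries that reference it (incremental evaluation).
    results = [0.0] * len(queries)
    for atom_index, values in atoms:
        for query_id, offset, weight in needed.get(atom_index, ()):
            results[query_id] += weight * values[offset]
    return results


# Usage: 16 grid values stored as 4 atoms, three queries evaluated together.
data = [float(i * i) for i in range(16)]
atoms = [(a, data[a * ATOM:(a + 1) * ATOM]) for a in range(4)]
queries = [{3: 0.5, 4: 0.5}, {4: 0.25, 5: 0.75}, {10: 1.0}]
batch_results = stream_batch(atoms, queries)
```

In this sketch the first two queries both need atom 1, and it is visited only once for both of them, which is the kind of sharing that concurrent evaluation of a batch makes possible.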

Motivation
- Heavy DB usage slows down the service by a factor of 10 to 20
- Query evaluation techniques adapted from simulation code do not access data coherently
- Substantial storage overhead incurred to localize each computation
- 95% of queries perform Lagrange Polynomial interpolation

Turbulence Database Cluster

MHD Database
Stores velocity, magnetic field, magnetic vector potential and pressure fields:
- 10 attributes, 4 bytes each
- 1024 time-steps over a grid
- 40TB total size
In order to reduce total amount of I/O:
- Smaller atoms (4^3 voxels)
- No replication
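The size figures on this slide can be cross-checked with a short calculation. This is only a sketch: it assumes that "40TB" means 40 x 2^40 bytes and that all 10 attributes are stored for every grid point at every time-step. The grid size is not stated in the transcript, so the code derives the number of grid points per time-step that these figures would imply rather than asserting what the slide said.

```python
ATTRIBUTES = 10            # from the slide
BYTES_PER_ATTRIBUTE = 4    # from the slide
TIME_STEPS = 1024          # from the slide
TOTAL_BYTES = 40 * 2**40   # assumption: "40TB" read as 40 * 2^40 bytes

# Bytes stored for one grid point at one time-step.
bytes_per_point = ATTRIBUTES * BYTES_PER_ATTRIBUTE

# Grid points per time-step implied by the stated total size.
points_per_step = TOTAL_BYTES // (bytes_per_point * TIME_STEPS)

# Side length if the grid is a cube (an assumption, not stated here).
side = round(points_per_step ** (1 / 3))

# Number of voxels in one 4^3 atom.
voxels_per_atom = 4**3

print(bytes_per_point, points_per_step, side, voxels_per_atom)
```

Under these assumptions the stated numbers are consistent with a cubic grid of 1024 points per side.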

Lagrange Polynomial Interpolation
(Figure labels: Lagrange coefficients, Data)
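The formula for this slide did not survive in the transcript. As a reminder of the standard technique, the sketch below computes 1-D Lagrange weights and combines them as a tensor product, so the interpolated value is the sum over i, j, k of l_i(x) * l_j(y) * l_k(z) * f(x_i, y_j, z_k). The 4-point stencil, the function names and the test field are assumptions made for illustration; the interpolation orders used by the database are not given here.

```python
def lagrange_weights(nodes, x):
    """Return weights l_i(x) so that p(x) = sum_i l_i(x) * f(nodes[i])."""
    weights = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                w *= (x - xj) / (xi - xj)
        weights.append(w)
    return weights


def interpolate_3d(field, nodes_x, nodes_y, nodes_z, point):
    """Tensor-product Lagrange interpolation of field[i][j][k] at point."""
    x, y, z = point
    lx = lagrange_weights(nodes_x, x)
    ly = lagrange_weights(nodes_y, y)
    lz = lagrange_weights(nodes_z, z)
    total = 0.0
    # The result is a weighted sum of the stored values: linear in the data.
    for i, wx in enumerate(lx):
        for j, wy in enumerate(ly):
            for k, wz in enumerate(lz):
                total += wx * wy * wz * field[i][j][k]
    return total


# Usage: a 4x4x4 stencil sampled from f(x, y, z) = x + 2y + 3z.
nodes = [0.0, 1.0, 2.0, 3.0]
field = [[[x + 2 * y + 3 * z for z in nodes] for y in nodes] for x in nodes]
value = interpolate_3d(field, nodes, nodes, nodes, (1.5, 0.5, 2.25))
```

The weights depend only on the query position, not on the stored values, so the result is a linear combination of grid values and can be accumulated piece by piece as the data is read.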

Processing a Batch Query

Additional Optimizations
- Concurrently process the computations on values that are stored together
- Iterate in the appropriate order
- Compute the Lagrange coefficients with the procedures described by Purser and Leslie*
*R. J. Purser and L. M. Leslie. An Efficient Interpolation Procedure for High-Order Three-Dimensional Semi-Lagrangian Models. Monthly Weather Review, 119:2492–+, 1991.

Experimental Evaluation
Random workloads:
- across the entire cube space
- a subset of the entire space
Workload derived from the usage log of the Turbulence Database cluster
Compare with:
- Direct methods of evaluation

Setup
Experimental version of the MHD database:
- ~300 timesteps of the velocity fields of the MHD DNS
- Two 2.33 GHz dual quad-core Windows 2003 servers with SQL Server 2008 and 8GB of memory
- Data tables striped across 7 disks

Questions/Comments