JAWS: Job-Aware Workload Scheduling for the Exploration of Turbulence Simulations
Johns Hopkins University: Xiaodan Wang, Eric Perlman, Randal Burns, Tamas Budavari, Charles Meneveau, Alexander Szalay
Purdue University: Tanu Malik

JAWS: Job-Aware Workload Scheduling
The Problem
Ensure high throughput for concurrent accesses to petascale scientific datasets.
Turbulence Database Cluster – a new approach to data exploration:
– Traditionally, turbulence dynamics are analyzed on the fly during the simulation, which puts large simulations out of reach for many scientists
– Stores the complete space-time history of a direct numerical simulation (DNS)
– Exploration proceeds by querying the stored simulation result
– 27 TB of velocity and pressure data on a regular grid
– Available to the wide community over the Web

JAWS: Job-Aware Workload Scheduling
Pitfalls of Success
Enables a new class of applications:
– Iterative exploration over large regions of space-time
– Correlate, mine, and extract at petabyte scale
Heavily used, with data-intensive queries:
– 50,275,005,460 points queried
– Hundreds of thousands of queries per month
– I/O-bound queries (79-88% of time spent loading data)
– Queries scan large portions of the DB, lasting hours to days
A single user can occupy the entire system for hours.

JAWS: Job-Aware Workload Scheduling
Addressing I/O Challenges
I/O contention and congestion arise from concurrent use.
Significant data reuse between queries:
– Many large queries access the same data (e.g. particles may cluster in turbulence structures)
– This lends itself to batch scheduling

JAWS: Job-Aware Workload Scheduling
A Batch Scheduling Approach
Co-schedule queries that access the same data:
– Eliminates redundant accesses to disk
– Amortizes I/O cost over multiple queries
Produces a job-aware schedule for queries with data dependencies:
– Trades off arrival order against throughput
– Scales with workload saturation
– Up to 4x improvement in throughput

JAWS: Job-Aware Workload Scheduling
Architecture
A universal addressing scheme is used for partitioning, addressing, and scheduling.
Data organization:
– 64³ atoms (8 MB each)
– Morton order index
– Spatial and temporal partitioning
JAWS scheduling runs at each node.
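The Morton (Z-order) index linearizes the 3D grid so that spatially close atoms receive nearby keys. A minimal sketch of how such a key might be derived (the function names, bit width, and coordinate-to-atom mapping are illustrative assumptions, not the cluster's actual code):

```python
def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of (x, y, z) into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)      # x bits at positions 0, 3, 6, ...
        key |= ((y >> i) & 1) << (3 * i + 1)  # y bits at positions 1, 4, 7, ...
        key |= ((z >> i) & 1) << (3 * i + 2)  # z bits at positions 2, 5, 8, ...
    return key

def atom_key(x: int, y: int, z: int, timestep: int, atom_side: int = 64):
    """Map a grid point to its data atom: partition temporally by timestep
    and spatially by the Morton key of its 64^3-aligned atom coordinates."""
    return (timestep, morton3d(x // atom_side, y // atom_side, z // atom_side))

# Example: grid point (130, 70, 500) at timestep 12 falls in atom (2, 1, 7).
print(atom_key(130, 70, 500, timestep=12))
```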

JAWS: Job-Aware Workload Scheduling
LifeRaft: Data-Driven Batch Scheduling
– Decompose queries into sub-queries based on the data they access
– Co-schedule sub-queries to amortize I/O
– Evaluate data atoms based on a utility metric: the amount of contention (queries per data atom), the age (queuing time) of the oldest query (which preserves arrival order), and a tunable parameter that balances contention against age
(Diagram: queries Q1–Q3 decompose into sub-queries over data atoms R1–R4; sub-queries that share an atom are co-scheduled by the batch scheduler, which returns results per query.)
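A minimal sketch of such a utility score, assuming a simple linear blend (the weight `alpha`, the `SubQuery` type, and the exact formula are assumptions; the deployed metric may combine or normalize the terms differently):

```python
import time
from dataclasses import dataclass

@dataclass
class SubQuery:
    job_id: int
    arrival_time: float  # seconds since epoch

def atom_utility(pending: list, alpha: float = 0.5, now: float = None) -> float:
    """Score a data atom for scheduling: blend contention (number of queued
    sub-queries touching the atom) with the age of the oldest one.
    alpha near 1 favors throughput; alpha near 0 favors arrival order."""
    if now is None:
        now = time.time()
    contention = len(pending)
    oldest_age = max(now - q.arrival_time for q in pending)
    return alpha * contention + (1.0 - alpha) * oldest_age

# The scheduler would pick the pending atom with the highest utility.
queue = [SubQuery(1, time.time() - 30.0), SubQuery(2, time.time() - 5.0)]
print(atom_utility(queue, alpha=0.8))
```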

JAWS: Job-Aware Workload Scheduling
A Case for Job-Aware Scheduling
Job-awareness yields additional I/O savings:
– Greedy LifeRaft misses data sharing between jobs
– Incorporating data dependencies identifies that redundancy
(Diagram: execution timelines of three jobs under LifeRaft vs. JAWS; JAWS aligns the jobs' accesses to the shared atoms R2, R3, and R4 so that each atom is read once.)

JAWS: Job-Aware Workload Scheduling
JAWS: Poly-Time Greedy Algorithm
Each job is modeled as an ordered sequence of sub-queries over data atoms, e.g. j1: R2, R4, R5, R1; j2: R3, R4, R2, R6; j3: R4, R5, R1, R6.
– Precedence edge: subsequent queries in a job must wait for their predecessors
– Gating edge: queries that share data are evaluated at the same time
The scheduler evaluates queries in the graph from left to right.
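A sketch of the graph structure such a scheduler could walk (the `JobGraph` class, its method names, and the rule that a gated group runs only when every member is ready are illustrative assumptions):

```python
from collections import defaultdict

class JobGraph:
    """Sub-queries as nodes. Precedence edges order sub-queries within a job;
    gating edges tie sub-queries from different jobs that share a data atom,
    so the atom is read once and shared."""

    def __init__(self):
        self.nodes = set()
        self.preds = defaultdict(set)  # node -> predecessors within its job
        self.gated = defaultdict(set)  # node -> partners co-scheduled with it

    def add_precedence(self, earlier, later):
        self.nodes |= {earlier, later}
        self.preds[later].add(earlier)

    def add_gating(self, a, b):
        self.nodes |= {a, b}
        self.gated[a].add(b)
        self.gated[b].add(a)

    def runnable(self, done: set) -> set:
        """A node is ready once its predecessors are done; a gated node runs
        only when each of its partners is also ready (or already done)."""
        ready = {n for n in self.nodes - done if self.preds[n] <= done}
        return {n for n in ready if self.gated[n] <= (ready | done)}
```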

JAWS: Job-Aware Workload Scheduling
JAWS: Poly-Time Greedy Algorithm
Dynamic programming phase: identify data sharing between job pairs.
– A DP based on the Needleman-Wunsch alignment algorithm is run for every pair of jobs
– It maximizes the alignment score (i.e. data sharing): 1 if two queries exhibit data sharing and are co-scheduled, 0 otherwise
– Complexity: O(n²m²) for n jobs of m sub-queries each
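A sketch of the pairwise alignment, assuming the score counts order-preserving pairs of data-sharing sub-queries and gaps cost nothing (`align_jobs` and the sharing predicate are illustrative; a traceback over the same table would recover which pairs to gate):

```python
def align_jobs(job_a, job_b, shares) -> int:
    """Needleman-Wunsch-style DP: dp[i][j] is the maximum number of
    co-schedulable (data-sharing) pairs between the first i sub-queries
    of job_a and the first j sub-queries of job_b."""
    n, m = len(job_a), len(job_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1 if shares(job_a[i - 1], job_b[j - 1]) else 0
            dp[i][j] = max(dp[i - 1][j - 1] + match,  # pair (or skip) both
                           dp[i - 1][j],              # gap in job_b
                           dp[i][j - 1])              # gap in job_a
    return dp[n][m]

# Sub-queries named by the atom they touch, as in the example above.
j1 = ["R2", "R4", "R5", "R1"]
j2 = ["R3", "R4", "R2", "R6"]
print(align_jobs(j1, j2, lambda a, b: a == b))  # 1: only R4 aligns in order
```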

JAWS: Job-Aware Workload Scheduling
JAWS: Poly-Time Greedy Algorithm
Merge phase: merge the pairwise DP solutions.
– Sort job pairs by their number of gating edges
– Greedily merge gating edges between pairs of jobs
– Complexity: O(n³m²); in practice the graphs are sparse, with up to ~3,000 edges
(Diagram: the pairwise alignments of j1, j2, and j3 are merged step by step into a single gated schedule.)
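A much-simplified sketch of the merge, reusing the `JobGraph` sketch above; the conflict test here is a stand-in (the actual algorithm must also reject edges that would contradict precedence constraints or earlier merges):

```python
def merge_pairwise(pairwise: dict) -> "JobGraph":
    """Greedily fold pairwise alignment results into one global schedule.
    pairwise maps a job pair to its list of gating edges, each edge being
    a pair of sub-query identifiers; edge-rich pairs are merged first."""
    graph = JobGraph()
    for _, edges in sorted(pairwise.items(), key=lambda kv: -len(kv[1])):
        for a, b in edges:
            # Stand-in conflict rule: accept an edge only if neither
            # endpoint is already gated elsewhere.
            if not graph.gated[a] and not graph.gated[b]:
                graph.add_gating(a, b)
    return graph
```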

JAWS: Job-Aware Workload Scheduling
JAWS: Scheduling Example
Three jobs j1, j2, j3; no caching; a single region is evaluated at a time:
j1: R1, R2, R4, R5; j2: R2, R3, R4, R6; j3: R1, R4, R5, R6
– Time 1: j1 and j3 are gated on R1, which is read once and shared
– Time 2: j1 and j2 share R2
– Time 3: j2 reads R3 alone (j1 and j3 wait at their gating edges on R4)
– Time 4: all three jobs share R4
– Time 5: j1 and j3 share R5
– Time 6: j2 and j3 share R6
JAWS finishes in 6 time steps; in comparison, LifeRaft requires time 8.

JAWS: Job-Aware Workload Scheduling
Additional Optimizations
Two-level scheduling:
– Exploits locality of reference
– Groups and evaluates multiple data atoms at once
Adaptive starvation resistance:
– Trades off query throughput against response time
– Adjusts incrementally with workload saturation (i.e. query arrival rate), as sketched below
Coordinates cache replacement with scheduling.
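A sketch of how the starvation-resistance knob might adapt, nudging the contention/age weight from the utility sketch above as saturation changes (the step size and the rate comparison are assumptions):

```python
def adapt_alpha(alpha: float, arrival_rate: float, service_rate: float,
                step: float = 0.05) -> float:
    """When queries arrive faster than they complete, lean toward contention
    (throughput); when the system drains, lean toward age (response time)."""
    if arrival_rate > service_rate:
        return min(1.0, alpha + step)
    return max(0.0, alpha - step)
```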

JAWS: Job-Aware Workload Scheduling
Experimental Setup
800 GB sample DB: 31 time steps (0.062 sec of simulation time).
Workload:
– 8 million queries (11/ /2009), 83k unique jobs
– 63% of jobs persist between 1 and 30 minutes
– 88% of jobs access data from one time step; 3% iterate over 0.2 sec of simulation time (10% of the DB)
– Experiments use a 50k-query trace (1k jobs) from the week of 07/20/2009
Algorithms compared:
– NoShare: queries in arrival order with no I/O sharing
– LifeRaft1 (arrival order) and LifeRaft2 (contention order)
– JAWS1: JAWS without job awareness
– JAWS2: includes all optimizations

JAWS: Job-Aware Workload Scheduling
Query Throughput
– 3x improvement overall
– 30% from job-awareness
– 12% from two-level scheduling
– 22% from query reordering

JAWS: Job-Aware Workload Scheduling
Sensitivity to Workload Saturation
– JAWS2 scales with the workload
– The gap between NoShare and LifeRaft is insensitive to saturation changes
– JAWS2 keeps response time low and adapts to workload saturation

JAWS: Job-Aware Workload Scheduling
Future Directions
Quality-of-service guarantees:
– Supporting interactive queries
– Bounding completion time in proportion to query size
Declarative-style interfaces for job optimizations:
– Explicitly link related queries
– Pre-declare the time and space of interest
– Pre-packaged operators that iterate over space/time inside the DB
Job-awareness is crucial for scientific workloads:
– Alleviates I/O contention across jobs
– Up to 4x increase in throughput
– Scales with the workload

JAWS: Job-Aware Workload Scheduling Questions?

JAWS: Job-Aware Workload Scheduling
Sensitivity to Batch Size k
– Small k fails to exploit locality of reference in the computation
– Large k impacts cache reuse and conforms less to the workload

JAWS: Job-Aware Workload Scheduling
Sensitivity to Cache Replacement
Compared with SQL Server's LRU-K based replacement:
– Workload knowledge improves the cache hit rate modestly
– URC and SLRU improve performance by 16% and 4%, respectively
– These are low-overhead optimizations for data-intensive queries