Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications Piyush Shivam, Shivnath Babu, Jeffrey Chase Duke University.

Presentation transcript:

Networked Computing Utility
A network of clusters or grid sites
[Figure: Sites A, B, C with clusters C1, C2, C3; a task scheduler; a task workflow]
Each site is a pool of heterogeneous resources
Jobs are task workflows
Challenge: choose good resource assignments for the jobs

Example: Assigning Resources to Run Tasks
A workflow with a single task
Task input data at Site A
Execution plan ≡ Resource assignment
Plans (columns: CPU, Storage): P1: Site A; P2: Site B, Site A; P3: Site B
[Figure: Sites A, B, C with clusters C1, C2, C3; a home file server; plans P1, P2, P3]

Plan Selection Problem
Task workflow → Plan enumeration → Cost → Choose best plan
Plans (columns: CPU, Storage, Cost): P1: Site A, cost T1; P2: Site B, Site A, cost T2; …
Cost: Plan execution time
Challenge: Need cost models to estimate plan execution time
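The selection step on this slide can be sketched in a few lines of Python. This is only an illustration: the plan encoding, the `choose_best_plan` helper, and the cost numbers are assumptions made up for the example, not part of NIMO. P1 is assumed here to use Site A for both CPU and storage.

```python
from typing import Callable, Dict, List

Plan = Dict[str, str]  # hypothetical encoding: {"cpu": site, "storage": site}


def choose_best_plan(plans: List[Plan], cost: Callable[[Plan], float]) -> Plan:
    """Return the plan with the smallest estimated execution time."""
    return min(plans, key=cost)


# Made-up cost estimates in seconds, for illustration only.
example_costs = {
    ("Site A", "Site A"): 120.0,  # assumed P1: compute and data at Site A
    ("Site B", "Site A"): 95.0,   # P2: compute at Site B, data at Site A
}
plans = [{"cpu": cpu, "storage": storage} for (cpu, storage) in example_costs]
best = choose_best_plan(
    plans, lambda p: example_costs[(p["cpu"], p["storage"])]
)
```

In practice the `cost` callable would be the learned cost model rather than a lookup table.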

Generating Cost Models is Hard
Non-declarative
– Scientific workflow tasks are usually scripts (MATLAB, Perl)
– Such tasks are not database operators like join or select
– Hence: task is a black box with no prior knowledge
Heterogeneous resources
– Computational grid setting
– Performance varies a lot across resource assignments
Data dependency
– Performance can vary significantly based on properties of input data & parameters to scripts

Problem Setting
Scientific workflows at DSCR (Duke Shared Cluster Resource)
Important scientific workflows are run repeatedly
– Opportunity to observe & learn task behavior
– Better plan selection for subsequent runs
Sequential scientific workflows
– Each task runs on a single node
– >90% of workflows at DSCR are sequential

NIMO System: NonInvasive Modeling for Optimization
NIMO learns cost models for task workflows
– End-to-end cost models: incorporate properties of tasks, resources, & data
– Non-invasive: no changes to tasks
– Automated and active: automatically collects training data for learning cost models
[Figure: NIMO alongside the scheduler for Sites A, B, C (clusters C1, C2, C3)]

NIMO Fills a Gap
WorkFlow Management Systems (WFMSs)
– WFMSs use database technology for managing all aspects of scientific workflows [Liu ’04, Shankar ’05]
Batch scheduling systems
– Knowledge of plan execution time is assumed for optimizing resource assignments [Casanova ’00, Phan ’05, Kelly ’03]
NIMO generates cost models for these systems

Roadmap
Cost models
NIMO: active learning of cost models
Experimental evaluation
Related work
Conclusions
Future work

Cost Model
[Figure: the cost model for a task maps a resource assignment and input data to the task execution time]
Total workflow execution time can be derived using the cost models for individual tasks

Task Cost Model
$T = D \times (O_a + O_n + O_d)$
T: exec. time; D: total data
Occupancy: average time spent per unit of data
– $O_a$ (compute occupancy): compute phase (compute resource busy)
– $O_s$ (stall occupancy): stall phase (compute resource stalled on I/O), $O_s = O_n + O_d$
– $O_n$ (network occupancy), $O_d$ (storage occupancy)

Cost Model
[Figure: the cost model maps a resource assignment and input data to the task execution time; labels: resource profile, data profile, task profile]
$T = D \times (O_a + O_n + O_d)$
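As a small worked instance of the formula above, the following Python sketch evaluates T from D and the three occupancies. The function name and the numbers are illustrative assumptions; the slides do not give concrete values.

```python
def predict_execution_time(total_data: float, o_a: float, o_n: float, o_d: float) -> float:
    """T = D * (O_a + O_n + O_d); each occupancy is time per unit of data."""
    return total_data * (o_a + o_n + o_d)


# Illustrative numbers only: 1000 data units, occupancies in seconds per unit.
t = predict_execution_time(1000.0, o_a=0.05, o_n=0.02, o_d=0.01)
```

With these made-up inputs the stall share of the run is (0.02 + 0.01) / 0.08, so a faster CPU alone would shorten only part of the execution time.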

Learning Cost Models Learning the cost model = Learning profiles + Learning predictors

Statistical Learning of Predictors
Independent variables: resource profile, data profile
Dependent variables
Ex: Learn each predictor as a regression model from the training data
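To make "learn each predictor as a regression model" concrete, here is a minimal sketch using ordinary least squares on a single attribute. It is an assumption for illustration: the slides do not say which regression form NIMO uses, and the attribute (network latency), the target (network occupancy), and the sample values are invented.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training samples: network latency (ms) -> observed network occupancy.
latency_ms = [0.0, 5.0, 10.0, 15.0]
occupancy = [1.0, 2.0, 3.0, 4.0]
slope, intercept = fit_line(latency_ms, occupancy)


def predict(latency: float) -> float:
    """Predicted occupancy for a latency value not in the training set."""
    return slope * latency + intercept
```

A real predictor would take several profile attributes as inputs; the single-attribute case is used here only to keep the arithmetic visible.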

Challenges in Learning
Cost of sample acquisition
Coverage of system operating range
Curse of dimensionality
– Suppose: 10 profile attributes × 10 values per attribute, and 5 minutes for a task run (sample)
– If we sample 1% of the space and build the cost model: 951 years!
[Figure: accuracy of current best model vs. elapsed time, comparing passive learning with active & accelerated learning; upper bound: best accuracy possible]
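The 951-year figure follows directly from the numbers on the slide. The short Python check below reproduces it, assuming 365-day years and one task run at a time; the variable names are chosen for the example.

```python
attributes = 10
values_per_attribute = 10
minutes_per_run = 5
sampled_percent = 1

total_points = values_per_attribute ** attributes   # 10**10 combinations
runs = total_points * sampled_percent // 100        # 10**8 task runs
total_minutes = runs * minutes_per_run              # 5 * 10**8 minutes
years = total_minutes / (60 * 24 * 365)

print(round(years))  # 951
```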

Active (and Accelerated) Learning
Which predictors are important?
Which profile attributes should each predictor have?
What values to consider for each profile attribute during training?
[Figure labels: resource profile, data profile]

NIMO System
[Figure: NIMO workbench. Labels: task profiler, resource profiler, run standard benchmarks, data profiler, WAN emulator (nistnet), training set database, active & accel. learning, scheduler, Sites A, B, C with clusters C1, C2, C3]

Active Learning Algorithm Initialization While( ) { }

Active Learning Algorithm
Initialization
While( ) {
  Pick a new assignment
  Run task on chosen assignment
  Relearn predictors
}
Relearn Predictors
Relearn predictors with the new set of training samples
Compute current prediction error of each predictor
– Fixed test set
– Cross-validation
[Figure: training-sample values (10ms, 256M, 1GHz; 1G, 512MB)]

Active Learning Algorithm
Initialization
While( ) {
  Choose a predictor to refine
  Choose attributes for the predictor
  Choose attribute values for the run
  Run task on chosen assignment
  Relearn predictors
}
Predictor Choice
Predictors: f_a, f_n, f_d, f_D
Order predictors + Traverse this order
– Ex: relevance-based order (Plackett-Burman)
– Ex: choose predictor with current max. error

Active Learning Algorithm
Initialization
While( ) {
  Choose a predictor to refine
  Choose attributes for the predictor
  Choose attribute values for the run
  Run task on chosen assignment
  Relearn predictors
}
Attribute Choice
Each predictor takes profile attributes as input
Not all attributes are equally relevant
Order attributes + Traverse this order

Active Learning Algorithm
Initialization
While( ) {
  Choose a predictor to refine
  Choose attributes for the predictor
  Choose attribute values for the run
  Run task on chosen assignment
  Relearn predictors
}
Value Choice
Cover the operating range of attributes
Expose main interactions with other attributes
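The loop on these slides can be sketched as follows. This is a deliberately reduced illustration, not NIMO's algorithm: it assumes a single predictor with a single attribute, a line fit as the regression model, a fixed test set for the error, and a "farthest from sampled values" rule as one way to cover the operating range. The stopping condition is also an assumption, since the slide leaves the `While( )` condition blank. All names and the toy task are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Sample = Tuple[float, float]  # (attribute value, observed execution time)


def fit_line(samples: List[Sample]) -> Callable[[float], float]:
    """Relearn step: least-squares line through the training samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples)
    slope = sum((x - mx) * (y - my) for x, y in samples) / sxx
    return lambda x: my + slope * (x - mx)


def pct_error(model: Callable[[float], float], test_set: List[Sample]) -> float:
    """Mean absolute % error on a fixed test set."""
    return 100.0 * sum(abs(model(x) - y) / abs(y) for x, y in test_set) / len(test_set)


def active_learning(candidates: Iterable[float], run_task: Callable[[float], float],
                    test_set: List[Sample], target_error: float, max_runs: int):
    remaining = sorted(candidates)
    samples: List[Sample] = []
    # Initialization (assumed): sample both ends of the operating range.
    for x in (remaining[0], remaining[-1]):
        remaining.remove(x)
        samples.append((x, run_task(x)))
    model = fit_line(samples)
    # Assumed stop rule: error target reached, run budget spent, or no values left.
    while remaining and len(samples) < max_runs and pct_error(model, test_set) > target_error:
        # Value choice: the candidate farthest from every value sampled so far.
        x = max(remaining, key=lambda c: min(abs(c - s) for s, _ in samples))
        remaining.remove(x)
        samples.append((x, run_task(x)))  # run task on chosen assignment
        model = fit_line(samples)         # relearn predictor
    return model, samples


def toy_task(x: float) -> float:
    """Invented stand-in for a real task run."""
    return 60.0 / x + 5.0


test_set = [(x, toy_task(x)) for x in (2.5, 6.5)]
model, samples = active_learning(range(1, 10), toy_task, test_set,
                                 target_error=1.0, max_runs=4)
```

Because the toy task is not linear, the line fit never reaches the 1% target here and the loop stops on the run budget; the point of the sketch is the order of the steps, not the model quality.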

Experimental Results
Biomedical workflows (from DSCR)
– BLAST, fMRI, NAMD, CardioWave
– Single task workflows
Plan space in the heterogeneous networked utility
– 5 CPU speeds, 6 network latencies, 5 memory sizes
– 5 × 6 × 5 = 150 resource plans
Goal: Converge quickly to a fairly-accurate cost model
– We use regression models for the predictors
– Model validation details in previous work (ICAC 2005)
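The size of the plan space is a plain Cartesian product, which the sketch below enumerates. The slide gives only the counts (5, 6, 5); the specific speeds, latencies, and memory sizes here are placeholders invented for the example.

```python
from itertools import product

# Hypothetical attribute values; only the counts come from the slide.
cpu_speeds_mhz = [450, 800, 1000, 1400, 2000]
network_latencies_ms = [0, 2, 5, 10, 15, 20]
memory_sizes_mb = [256, 512, 1024, 2048, 4096]

# Every (CPU speed, network latency, memory size) combination is one resource plan.
resource_plans = list(product(cpu_speeds_mhz, network_latencies_ms, memory_sizes_mb))

print(len(resource_plans))  # 150
```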

Performance Summary
Error: Mean absolute % error in predicted execution time
A separate test set for evaluating the error
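The error metric named on this slide can be written out as below. This is the standard definition of mean absolute percentage error; the function name and the sample execution times are assumptions for illustration, not results from the talk.

```python
from typing import Sequence


def mean_absolute_percentage_error(actual: Sequence[float],
                                   predicted: Sequence[float]) -> float:
    """Average of |predicted - actual| / |actual|, expressed in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up execution times (seconds) from a held-out test set.
actual_times = [100.0, 200.0, 400.0]
predicted_times = [110.0, 180.0, 400.0]
error = mean_absolute_percentage_error(actual_times, predicted_times)
```

Evaluating on samples that were not used for training, as the slide notes, keeps the reported error from being optimistic.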

BLAST Application: Predictor Choice

BLAST Application: Attribute Choice

Related Work
Workflow Management Systems (WFMSs)
– [Shankar ’05, Liu ’04, etc.]
Performance prediction in scientific applications
– [Carrington ’05, Rosti ’02, etc.]
Learning cost models using statistical techniques
– [Zhang ’05, Zhu ’96, etc.]
NIMO is end-to-end, noninvasive, and active (acquires model learning data automatically)

Conclusions
NIMO:
– Learns cost models for scientific workflows
– Noninvasive and end-to-end
– Active and accelerated learning: learns accurate cost models quickly
– Fills a gap in Workflow Management Systems

Future Work
NIMO + SHIRAKO
– A policy-based resource-leasing system that can slice-and-dice virtualized resources
NIMO + Fa
– Processing system-management queries (e.g., root-cause diagnosis, forecasting performance problems, capacity planning)
[Figure: NIMO alongside the scheduler for Sites A, B, C (clusters C1, C2, C3)]

Backup Slides for Explanation

See Paper for Details of Steps
Each algorithm step has sub-algorithms
Example: Choosing the predictor to refine in current step
– Goal: learn most relevant predictors first
– Static vs. dynamic ordering
Static:
– Define total order: a priori or using estimates of influence (Plackett-Burman)
– Traverse the order: round-robin vs. improvement-threshold-based
Dynamic: choose the predictor with maximum current prediction error
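The ordering policies listed above can be sketched as three small selection functions. These are illustrative readings of the slide, not the paper's sub-algorithms: in particular, the improvement-threshold rule (move to the next predictor once the last refinement improved the error by less than a threshold) is an assumed interpretation, and all names and numbers are hypothetical.

```python
from typing import Dict, List


def choose_dynamic(errors: Dict[str, float]) -> str:
    """Dynamic ordering: refine the predictor with the maximum current error."""
    return max(errors, key=errors.get)


def choose_round_robin(order: List[str], step: int) -> str:
    """Static ordering, round-robin traversal: cycle through a fixed total order."""
    return order[step % len(order)]


def advance_on_threshold(order: List[str], position: int,
                         last_improvement: float, threshold: float) -> int:
    """Static ordering, threshold traversal (assumed rule): stay on the current
    predictor while it keeps improving, otherwise move to the next one."""
    if last_improvement < threshold:
        return (position + 1) % len(order)
    return position


# Made-up current prediction errors (%) for the four predictors.
errors = {"f_a": 12.0, "f_n": 30.5, "f_d": 8.0, "f_D": 3.0}
static_order = ["f_a", "f_n", "f_d", "f_D"]  # e.g. from a relevance estimate

next_dynamic = choose_dynamic(errors)
next_round_robin = choose_round_robin(static_order, step=5)
next_position = advance_on_threshold(static_order, 0, last_improvement=0.5, threshold=1.0)
```

The dynamic rule needs an up-to-date error for every predictor at each step, while the static rules only need the order to be computed once.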

Active and Accelerated Learning

Latency hiding

Saturation