GHS: A Performance Prediction and Task Scheduling System for Grid Computing Xian-He Sun Department of Computer Science Illinois Institute of Technology SC/APART Nov. 22, 2002

Outline Introduction Concept and challenge The Grid Harvest Service (GHS) System –Design methodology –Measurement system –Scheduling algorithms –Experimental testing Conclusion Scalable Computing Software Laboratory

Introduction Parallel Processing –Two or more working entities cooperate toward a common goal for better performance Grid Computing –Use distributed resources as a unified computing platform for better performance New Challenges of Grid Computing –Heterogeneous systems, non-dedicated environments, relatively large data access delay

Degradations of Parallel Processing Unbalanced Workload Communication Delay Overhead Increases with the Ensemble Size

Degradations of Grid Computing Unbalanced Computing Power and Workload Shared Computing and Communication Resource Uncertainty, Heterogeneity, and Overhead Increases with the Ensemble Size

Performance Evaluation (Improving performance is the goal) Performance Measurement –Metric, Parameter Performance Prediction –Model, Application-Resource, Scheduling Performance Diagnosis/Optimization –Post-execution, algorithm improvement, architecture improvement, state-of-the-art

Parallel Performance Metrics (Run-time is the dominant metric) Run-Time (Execution Time) Speed: mflops, mips, cpi Efficiency: throughput Speedup Parallel Efficiency Scalability: The ability to maintain performance gain when system and problem size increase Others: portability, programming ability, etc.
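For concreteness, a minimal sketch of the two derived metrics above, speedup and parallel efficiency, computed from measured run times (the numbers are made up for illustration):

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T1 / Tp: serial run time over parallel run time."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, p):
    """Parallel efficiency E = S / p; 1.0 means ideal scaling on p processors."""
    return speedup(t_serial, t_parallel) / p

# Example: a 100 s serial job finishes in 16 s on 8 processors.
print(speedup(100.0, 16.0))                 # 6.25
print(parallel_efficiency(100.0, 16.0, 8))  # 0.78125
```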

Parallel Performance Models (Predicting run-time is the dominant goal) PRAM (parallel random-access model) –EREW, CREW, CRCW BSP (bulk synchronous parallel) Model –Supersteps, phase parallel model Alpha and Beta Model –Comm. startup time, data transfer time per byte Scalable Computing Model –Scalable speedup, scalability LogP Model –L: latency, o: overhead, g: gap, P: the number of processors Others
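As a quick illustration of the alpha-beta model named above: the time to send an n-byte message is modeled as t(n) = α + βn, where α is the communication startup time and β is the per-byte transfer time. A minimal sketch, with parameter values that are assumptions for illustration rather than measurements:

```python
def comm_time(n_bytes, alpha=50e-6, beta=1e-9):
    """Alpha-beta model: t(n) = alpha + beta * n.
    alpha: per-message startup time (s); beta: per-byte transfer time (s).
    The default values are illustrative, not measured on any real network."""
    return alpha + beta * n_bytes

# Time to move a 1 MB message under these assumed parameters:
print(comm_time(1 << 20))  # ~0.0011 s
```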

Research Projects and Tools Parallel Processing –Paradyn, W3 (why, when, and where) –TAU, tuning and analysis utilities –Pablo, Prophesy, SCALEA, SCALA, etc –for dedicated systems – instrumentation, post-execution analysis, visualization, prediction, application performance, I/O performance

Research Projects and Tools Grid Computing –NWS (Network Weather Service) monitors and forecasts resource performance –RPS (Resource Prediction System) predicts CPU availability of a Unix system –AppLeS (Application-Level Scheduler) an application-level scheduler extended to non-dedicated environments based on NWS –Short-term system-level prediction

Do We Need New Metric for Computation Grid? –???? New Model for Computation Grid? –Yes –Application-level performance prediction New Model for other Technical Advances? –Yes –Data access in hierarchical memory systems

The Grid Harvest Service (GHS) System A long-term application-level performance prediction and scheduling system for non-dedicated (Grid) environments A new prediction model derived through probability analysis and simulation Non-intrusive measurement and scheduling algorithms Implementation and testing Sun/Wu 02

Performance Model (Gong, Sun, Watson, 02) Remote jobs have low priority Local job arrival and service times are modeled based on extensive monitoring and observation [Figure: remote-task timeline with parameters w, s(k), and t]

Prediction Formula The conditional utilization $U_k(S) \mid S_k > 0$ is approximated by a Gamma distribution Arrivals of local jobs follow a Poisson distribution with rate $\lambda$ The execution time of the owner's job follows a general distribution with mean $\mu$ and standard deviation $\sigma$ Simulation shows the distribution of the local service rate approaches a known distribution
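The slide's closed-form expressions did not survive transcription. As a rough illustration of the underlying idea only (predicting remote-task completion time on a machine whose owner's jobs preempt it), here is a minimal Monte Carlo sketch; the exponential local service times, the strict-preemption policy, and all names are assumptions for illustration, not GHS's actual model:

```python
import random

def simulate_completion(work, speed, arrival_rate, mean_local, trials=10000):
    """Monte Carlo estimate of remote-task completion time on one
    non-dedicated machine. Illustrative assumptions:
      work         -- remote task demand in CPU-seconds
      speed        -- machine speed when idle (CPU-seconds per second)
      arrival_rate -- Poisson arrival rate of local (owner) jobs, per second
      mean_local   -- mean service time of a local job (exponential here)
    The remote task runs at low priority: it makes no progress while a
    local job is in service."""
    total = 0.0
    for _ in range(trials):
        elapsed = 0.0
        remaining = work / speed            # dedicated CPU time still needed
        while remaining > 0.0:
            gap = random.expovariate(arrival_rate)  # time to next local arrival
            if gap >= remaining:            # task finishes before the arrival
                elapsed += remaining
                remaining = 0.0
            else:                           # preempted by a local job
                elapsed += gap + random.expovariate(1.0 / mean_local)
                remaining -= gap
        total += elapsed
    return total / trials

# A 600 CPU-second task; local jobs arrive every ~100 s, each ~30 s long.
print(simulate_completion(600.0, 1.0, 0.01, 30.0))
```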

Prediction Formula Parallel task completion time: $T = \max_{1 \le k \le m} T_k$, the maximum over the $m$ sub-task completion times Homogeneous parallel task completion time: the special case where all machines are identical Mean-time balancing partition: divide the workload so that every machine has the same expected completion time
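A minimal sketch of a mean-time balancing partition, assuming each machine has a single effective (utilization-adjusted) speed $e_k$: shares are chosen proportional to $e_k$, so every machine's expected completion time $W_k / e_k$ is equal. The names and the proportional rule are illustrative, not GHS's exact formula:

```python
def mean_time_balance(total_work, effective_speeds):
    """Split total_work across machines so expected completion times
    are equal: W_k proportional to effective speed e_k, which gives
    W_k / e_k = total_work / sum(e) on every machine."""
    s = sum(effective_speeds)
    return [total_work * e / s for e in effective_speeds]

# Three machines whose effective speeds (raw speed x availability) differ:
shares = mean_time_balance(1000.0, [4.0, 1.0, 2.5])
print(shares)  # [533.33..., 133.33..., 333.33...]
```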

Measurement Methodology For a parameter with population mean $\mu$ and standard deviation $\sigma$, a confidence interval for the population mean is $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$ The smallest sample size $n$ with a desired confidence level and a required relative accuracy $r$ is $n = \left( z_{1-\alpha/2}\,\sigma / (r\,\bar{x}) \right)^2$
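A small sketch of the sample-size rule above, using the standard normal quantile (1.96 corresponds to 95% confidence). This is the textbook formula; GHS's exact variant may differ:

```python
import math

def required_samples(std_dev, sample_mean, rel_accuracy, z=1.96):
    """Smallest n so the confidence-interval half-width,
    z * std_dev / sqrt(n), is within rel_accuracy * sample_mean.
    z = 1.96 gives a 95% confidence level."""
    n = (z * std_dev / (rel_accuracy * sample_mean)) ** 2
    return math.ceil(n)

# Utilization samples with mean 0.4, std dev 0.1, 5% accuracy at 95%:
print(required_samples(0.1, 0.4, 0.05))  # 97
```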

Measurement and Prediction of Parameters Utilization Job Arrival Standard Deviation of Service Rate Least-Intrusive Measurement

Select the previous $m$ days in the system measurement history;
For each day $d_i$: compute the mean of $U_i$, the set of utilization values measured during the target time interval of day $d_i$;
End For
Select a continuous time interval immediately preceding the prediction point and calculate the mean of the utilization values measured during that interval;
Output the combined prediction while both estimates stay within the required accuracy
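The transcript dropped this algorithm's notation, so here is a hedged Python sketch of the general idea only: blend a time-of-day historical mean with a recent-window mean. The equal-weight combination rule and all names are assumptions, not GHS's published scheme:

```python
from statistics import mean

def predict_utilization(history_by_day, recent_samples):
    """Estimate near-future CPU utilization of a machine.
    history_by_day -- list of lists; history_by_day[i] holds the samples
                      taken during the same time-of-day interval on the
                      i-th previous day.
    recent_samples -- samples from the interval just before now.
    Returns a blend of the daily-pattern mean and the recent mean
    (equal weighting here is an illustrative assumption)."""
    daily_mean = mean(mean(day) for day in history_by_day if day)
    recent_mean = mean(recent_samples)
    return 0.5 * daily_mean + 0.5 * recent_mean

# Two prior days of samples for this hour, plus the last few minutes:
print(predict_utilization([[0.2, 0.3, 0.25], [0.35, 0.3]], [0.4, 0.45]))
```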

Scheduling Algorithm: Scheduling with a Given Number of Sub-tasks
List a set of lightly loaded machines;
List all possible machine sets of the given size;
For each machine set:
  Use mean-time balancing partition to partition the task;
  Use the prediction formula to calculate the mean and coefficient of variation of the completion time;
  If this set improves on the best found so far, record it;
End For
Assign the parallel task to the best machine set;
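A hedged sketch of this exhaustive step, reusing the mean_time_balance function from the earlier sketch: enumerate candidate subsets of size p, partition by mean-time balancing, and score each set by its predicted completion time. The scoring below uses only the mean (the slide also tracks the coefficient of variation); all names and the effective-speed model are illustrative:

```python
from itertools import combinations

def schedule_fixed_p(total_work, machines, p, mean_time_balance):
    """Pick the best subset of p machines for a parallel task.
    machines -- dict of name -> effective speed (illustrative model).
    Scores a subset by its balanced expected completion time,
    total_work / sum(speeds); lower is better."""
    best_set, best_time = None, float("inf")
    for subset in combinations(machines, p):
        speeds = [machines[m] for m in subset]
        shares = mean_time_balance(total_work, speeds)
        t = shares[0] / speeds[0]   # balanced: identical on every machine
        if t < best_time:
            best_set, best_time = subset, t
    return best_set, best_time

pool = {"node1": 4.0, "node2": 1.0, "node3": 2.5, "node4": 3.0}
print(schedule_fixed_p(1000.0, pool, 2, mean_time_balance))
```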

Optimal Scheduling Algorithm
List a set of lightly loaded machines;
While an untried number of sub-tasks remains, do
  Run Scheduling with a Given Number of Sub-tasks;
  If the predicted completion time improves on the best found so far, record it; End If
End while
Assign the parallel task to the best machine set.
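As sketched, the optimal variant simply sweeps every sub-task count; a minimal wrapper over the previous (illustrative) sketch:

```python
def schedule_optimal(total_work, machines, mean_time_balance):
    """Try every sub-task count p and keep the best predicted time."""
    best = (None, float("inf"))
    for p in range(1, len(machines) + 1):
        subset, t = schedule_fixed_p(total_work, machines, p,
                                     mean_time_balance)
        if t < best[1]:
            best = (subset, t)
    return best

print(schedule_optimal(1000.0, pool, mean_time_balance))
```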

Heuristic Scheduling Algorithm
List a set of lightly loaded machines;
Sort the machines in decreasing order of effective computing power;
Use the task ratio to find the upper limit q on the number of machines;
Use bi-section search to find the p in [1, q] whose predicted completion time is minimum
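A hedged sketch of the search step, assuming (as the heuristic implicitly does) that predicted completion time is unimodal in the machine count p; the ternary-style narrowing below is a stand-in for the slide's bi-section search, and the toy predictor is made up for illustration:

```python
def bisect_best_p(predict, q):
    """Find p in [1, q] minimizing predict(p), assuming predict is
    unimodal (decreases, then increases) in p."""
    lo, hi = 1, q
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if predict(m1) < predict(m2):
            hi = m2 - 1          # minimum cannot lie above m2
        else:
            lo = m1 + 1          # minimum cannot lie below m1
    return min(range(lo, hi + 1), key=predict)

# Toy unimodal predictor: per-machine work shrinks, overhead grows.
print(bisect_best_p(lambda p: 1000.0 / p + 15.0 * p, 32))  # 8
```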

Embedded in Grid Run-time System

Experimental Testing Application-level prediction: remote task completion time on a single machine

Prediction of parallel task completion time Prediction for a multi-processor with a local scheduler

Partition and Scheduling Comparison of three partition approaches

Performance Gain with Scheduling Execution time with different scheduling strategies

Cost and Gain Measurement cost decreases when the system is steady

[Figure: the calculation time of the prediction component; x-axis: node number, y-axis: time (s)]

The GHS System A good example and a success story –Performance modeling –Parameter measurement and prediction schemes –Application-level performance prediction –Partition and scheduling It has its limitations too –Communication and data access delay

What We Know, What We Do Not We know there is no deterministic prediction in a non-deterministic shared environment. We do not know how to reach a fuzzy engineering solution: heuristic algorithms, rules of thumb, stochastic methods, AI, data mining, statistics, innovative methods, etc.

Conclusion Application-level Performance Evaluation –Code-machine combinations versus machine, algorithm, or algorithm-machine alone New Requirements under New Environments We know we are making progress. We do not know if we can keep up with the pace of technology improvement