Workshop on HPC in India Grid Middleware for High Performance Computing Sathish Vadhiyar Grid Applications Research Lab (GARL) Supercomputer Education.

Presentation transcript:

Workshop on HPC in India
Grid Middleware for High Performance Computing
Sathish Vadhiyar
Grid Applications Research Lab (GARL)
Supercomputer Education and Research Centre (SERC)
Indian Institute of Science (IISc), Bangalore
ATIP 1st Workshop on HPC in SC-09

Grid Applications Research Lab
- Grid and parallel computing, with primary focus on:
  - developing grid applications,
  - building strategies for checkpointing, migration, rescheduling, and fault tolerance for parallel applications on grid systems, and
  - performance modeling of parallel applications on grids

Motivation
- Developing solutions for the deployment and use of large-scale scientific applications on grids
- This will enable exploration of large problems and long-running applications

Grid Applications: Climate Modeling
- Enable efficient execution of long-running climate modeling simulations on grid systems, with the objective of solving climate science problems
- Community Climate System Model (CCSM): a multi-component global general circulation model
- Analyzed the benefits of executing different components, with checkpointing and rescheduling, in different batch systems of a grid using a novel execution model
[Figure: CCSM]

Grid Applications: Climate Modeling, General Idea (IJHPCA, FGCS)
- Job submission to a batch system incurs queue waiting time
- Waiting time depends on processor requirements
- How about decomposing a job into small subjobs with small processor requirements and submitting the subjobs to multiple batch systems of a grid?
- Efficiency depends on effective system utilization using checkpointing, migration, and rescheduling
- Leads to a 55% average increase in throughput
[Figure: Novel Execution Model]
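The decomposition idea above can be illustrated with a toy calculation. This is only a sketch under loud assumptions: the linear wait-time model, its constants, and the rule that the run starts once the slowest queue grants its subjob are all hypothetical and are not taken from the slides or the cited papers.

```python
def queue_wait(procs, base=0.5, per_proc=0.05):
    """Hypothetical model: queue wait (hours) grows linearly with requested processors."""
    return base + per_proc * procs


def single_job_wait(total_procs):
    """Wait for one large job submitted to a single batch system."""
    return queue_wait(total_procs)


def decomposed_wait(subjob_procs):
    """Wait when each subjob goes to its own batch system.

    Assumption: the queues are waited on concurrently, so the overall
    wait is that of the slowest queue.
    """
    return max(queue_wait(p) for p in subjob_procs)


if __name__ == "__main__":
    print(single_job_wait(128))               # one 128-processor request
    print(decomposed_wait([64, 32, 16, 16]))  # same total, four smaller requests
```

Under this toy model, the four smaller requests wait only as long as the 64-processor one, which is the intuition behind the slide's question; real queue behavior depends on site policy and load.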

Grid Applications: DNA Sequence Evolutions (JPDC, eScience 2009)
- Predicting future sequences in an evolutionary tree is important for drug discovery, pharmaceutical research, and disease control
- An ancestor sequence can transform into a progeny sequence in many different ways
- Formulated as a search-space exploration problem; used computational grids to explore the huge space of possible mutations
- Used popular mutations to predict future evolutionary paths
- Performed predictions for HIV sequences and other protein sequences
- 40% better than random methods
[Figures: Master-Worker Architecture for Analyzing Mutations; 40% Better Predictions]
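A master-worker search over mutations might look roughly like the following. This is a minimal sketch, not the architecture from the slides: the four-letter alphabet, the restriction to single-point mutations, the `popular` lookup table, and the scoring rule are all hypothetical, and a local thread pool stands in for grid workers.

```python
from concurrent.futures import ThreadPoolExecutor

ALPHABET = "ACGT"  # assumption: nucleotide sequences


def single_point_mutants(seq):
    """Enumerate every sequence that differs from seq at exactly one position."""
    for i, old in enumerate(seq):
        for new in ALPHABET:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def score(candidate, popular):
    """Hypothetical worker task: count positions carrying a 'popular' mutation."""
    return sum(1 for i, base in enumerate(candidate) if popular.get(i) == base)


def master(seq, popular, workers=4, top=3):
    """Master: generate candidates, farm scoring out to workers, rank the results."""
    candidates = list(single_point_mutants(seq))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda c: score(c, popular), candidates))
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(zip(scores, candidates), key=lambda sc: (-sc[0], sc[1]))
    return [c for _, c in ranked[:top]]


if __name__ == "__main__":
    print(master("AAAA", {1: "C"}, top=1))
```

The master only enumerates and ranks; all scoring happens in the workers, so the same split would carry over to remote workers if the pool were replaced by a grid job submission layer.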

Rescheduling
- Application execution must adapt to grid resource and application dynamics
- SRS: a checkpointing library for malleable applications
- Can allow processor reconfiguration between migrations
- Supports different data distributions, storage infrastructure, active migration, and fault tolerance

Rescheduling Strategies
- Given a parallel application consisting of multiple phases and a set of resources, the problem is to derive a rescheduling plan: where to execute the different phases, and when to migrate or reschedule
- To find {I_1, I_2, …, I_Lopt} such that [formula not preserved] is minimized, where L_opt is the number of intervals, t_i is the predicted execution time of interval i, and rcost is the rescheduling cost
- Developed 3 novel algorithms for deriving a rescheduling plan: incremental algorithm, division heuristic, and genetic algorithm
[Figure: application phases grouped into Interval 1 (t_1), Interval 2 (t_2), Interval 3 (t_3), …, Interval i (t_i), each mapped to a cluster; panel: Division heuristic]
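To make the planning problem concrete, here is a small exhaustive baseline. It is not one of the three algorithms named above, and because the objective did not survive in the transcript, the cost used here is an assumption: the sum of the predicted interval times plus one `rcost` per migration between consecutive intervals. The per-phase, per-cluster time table is likewise a hypothetical input.

```python
def best_plan(phase_times, rcost):
    """Dynamic program over contiguous groupings of phases into intervals.

    phase_times[p][c]: predicted time of phase p on cluster c (hypothetical input).
    Assumed objective: sum of interval times + rcost for each reschedule.
    Returns (total_cost, plan), plan = [(first_phase, last_phase, cluster), ...].
    """
    n = len(phase_times)
    n_clusters = len(phase_times[0])
    # best[k] holds the cheapest cost and plan for running phases 0..k-1.
    best = [(float("inf"), [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            # Run phases start..end-1 as one interval on its cheapest cluster.
            t, c = min(
                (sum(phase_times[p][c] for p in range(start, end)), c)
                for c in range(n_clusters)
            )
            penalty = rcost if start > 0 else 0.0  # no migration before the first interval
            cost = best[start][0] + penalty + t
            if cost < best[end][0]:
                best[end] = (cost, best[start][1] + [(start, end - 1, c)])
    return best[n]


if __name__ == "__main__":
    times = [[1, 5], [1, 5], [6, 1]]  # 3 phases, 2 clusters
    print(best_plan(times, rcost=1))
    print(best_plan(times, rcost=10))
```

With a cheap migration the plan moves the last phase to the second cluster; with an expensive one it stays on a single cluster throughout, which is the trade-off a rescheduling plan has to balance.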

Rescheduling Strategies
- Performed experiments with five large-scale multi-phase parallel applications: molecular dynamics, n-body simulations, astrophysical gas dynamics, crack propagation, and electromagnetics

Rescheduling Method    Time (hours)
Incremental            6.8
Division               6.58
Genetic                5.97
Single Schedule        68.77

Huge benefits due to rescheduling

Performance Modeling (JPDC, CPE)
- It is imperative to automatically derive "knowledge" (performance characteristics) of applications
- This knowledge can be used for effective mapping of applications to resources
- Built techniques for automatically deriving performance model functions that predict execution costs of parallel applications on grids
- First effort to deal with load changes during application execution
- Less than 30% modeling error, the best reported for non-dedicated systems
- Also developed novel scheduling algorithms that use the model functions
- These generate 80% better schedules than existing approaches
[Figures: Performance Model Accuracy for Parallel QR; Scheduling Results by Scheduling Method, with Box Elimination (BE) [red bars] 50-80% more efficient]

Grid Middleware
- Created a grid middleware for parallel multi-phase applications with rescheduling capabilities
- Successfully ran multi-phase applications on a grid consisting of multiple batch and interactive clusters at two geographically distributed sites
- Also created a grid middleware for multi-component applications that coordinates the execution of the components on the different systems
[Figures: Grid Middleware for Multi-Phase Applications; Grid Middleware for Multi-Component Applications]

Other Research
- Checkpointing interval selection
  - For efficient execution in the presence of failures
  - A Markov model consisting of 3 kinds of states for performance prediction
  - Extensive simulations with 9-year real supercomputer failure traces on 8 parallel systems, 3 rescheduling policies, and 3 parallel applications
  - Our model's checkpointing intervals lead to a high amount of useful work by the applications in the presence of failures
- Compiler-aided checkpointing instrumentation
  - A source-to-source precompiler for automatic insertion of checkpointing calls
  - Performs live-variable analysis for determining data, and uses wrappers for finding data sizes
  - Can handle parallel applications with block distribution (molecular dynamics)
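For a feel of what checkpoint interval selection trades off, the sketch below uses Young's classical first-order approximation rather than the Markov model described above, whose details are not given in the slides. The checkpoint cost and mean time between failures are hypothetical inputs, and the overhead formula is the usual first-order estimate that ignores restart time.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's approximation: interval = sqrt(2 * checkpoint cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """First-order fraction of time lost: checkpointing cost plus expected rework.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : on average half an interval is redone per failure
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    c, m = 2.0, 100.0  # hypothetical: 2-minute checkpoints, 100-minute MTBF
    t = young_interval(c, m)
    print(t, overhead_fraction(t, c, m))
```

Checkpointing more often than this wastes time writing state, and checkpointing less often wastes time recomputing after a failure; a trace-driven model such as the one on the slide refines that balance with observed failure behavior.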

Summary
- Primary endeavor: to aid scientific advancement in different domain areas using grid systems
- Grid research in two different application areas resulted in significant application benefits from using grids
- Contributed novel scheduling and rescheduling algorithms, performance modeling strategies, and robust grid middleware for use by the scientific community

Areas of Collaboration
- Scalability of large-scale and petascale applications
- Fault tolerance in high-performance systems
- Setting up Indo-US grids
- Grid middleware collaborations