Evaluation of Agent Teamwork High Performance Distributed Computing Middleware. Solomon Lane Agent Teamwork Research Assistant October 2006 – March 2007.

Presentation transcript:

Job Dispatch and Termination Performance: Agent Teamwork vs. Globus/OpenPBS
Framework Execution Performance: Agent Teamwork vs. MPIJava

Terminology

Grid vs. Cluster
A computing grid is commonly distinguished from a computing cluster by the geographic distance between its members. A cluster is a group of computers in the same room or building, connected to the same physical network, while the members of a grid can be located anywhere and may be connected over several different networks.

Platform
I define an HPDC platform as software that provides infrastructure and scheduling services. Infrastructure services include authentication and authorization, job submission, and file transfer for job deployment. Scheduling services include dynamic resource identification and allocation, scheduling policies, and coordination of job execution.

Framework
I define a framework as a related set of software libraries used to write software in a particular programming model. The Single Program Multiple Data (SPMD) programming model is commonly used to achieve data-level parallelism in HPDC. MPIJava is a Java implementation of the Message Passing Interface standard, which provides a framework for programming in the SPMD model.

Agent Teamwork
AgentTeamwork is a mobile-agent-based job coordination system that targets a mixture of computing nodes, some directly connected to the public Internet, and others simply clustered in a private IP domain but not managed by a commodity job scheduler. [1]

Globus Toolkit
The Globus Toolkit is an open source software toolkit used for building Grid systems and applications. [2]

OpenPBS
OpenPBS is the original version of the Portable Batch System. It is a flexible batch queueing system developed for NASA in the early to mid-1990s. [3]
The purpose of the OpenPBS system is to provide additional controls over initiating or scheduling execution of batch jobs, and to allow routing of those jobs between different hosts. [4]

Message Passing Interface (MPI)
MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementors, and users. MPI was designed for high performance on both massively parallel machines and workstation clusters. [5]

MPICH-G2
MPICH-G2 is a grid-enabled implementation of the MPI v1.1 standard. Using services from the Globus Toolkit (e.g., job startup, security), MPICH-G2 allows you to couple multiple machines, potentially of different architectures, to run MPI applications. [6]

MPIJava
mpiJava is an object-oriented Java interface to the standard Message Passing Interface (MPI). [7]

[1] Fault-Tolerant Job Execution over Multi-Clusters Using Mobile Agents, Munehiro Fukuda, gca07.pdf
Overview of the OpenPBS
[5] What is MPI
[6] What is MPICH-G

Overview
My goal as a research assistant was to evaluate Agent Teamwork's "Job Dispatch & Termination" and "Framework" performance against a contemporary alternative.

Job Dispatch & Termination Evaluation: I built a reference platform to compare Agent Teamwork against by integrating the Globus Toolkit with the OpenPBS scheduler and the MPICH-G2 MPI framework.

Framework Function Evaluation: To evaluate framework performance, I wrote three benchmark programs in both the Agent Teamwork MPI framework and the MPIJava framework and compared their runtimes.

Reference Platform Overview
To run a job, you generate a job definition file using the Resource Specification Language (RSL) and submit it, along with your user certificate, using globusrun. The GRAM client submits the job to a gatekeeper on the cluster head, which uses the GSI to authenticate and authorize the job submission. The gatekeeper then starts a job manager, which issues a callback to the GRAM client to connect standard error and standard output back to the client. The job manager then submits the job details to the PBS Server. The PBS Scheduler selects appropriate nodes from the cluster and transfers the executable to the PBS MOM on the cluster nodes. The PBS MOM launches the application. Applications are written in the MPICH-G2 framework, which uses the grid infrastructure to coordinate the parallel execution.

Results
These graphs compare job dispatch & termination time when submitting a test program to different numbers of cluster nodes in either a depth-first or breadth-first distribution.
Agent Teamwork's job dispatch and termination performance was comparable with the reference platform in the depth-first distribution, and Agent Teamwork outperformed the reference platform with a large number of nodes in the breadth-first distribution.

Framework Results
Currently, two of the Agent Teamwork versions of the benchmark programs cannot be run across the clusters due to outstanding bugs in the framework. One of the benchmark programs, Wave2D, was able to run on a limited number of nodes. The accompanying graphs show these partial results, which indicate that the Agent Teamwork version is at least one order of magnitude slower than MPIJava. At this point, however, framework debugging is ongoing.

The Clusters
The following tables describe the hardware that was used. There were a total of 66 machines divided into two clusters.
Medusa Cluster: a 32-node cluster for research use

Head node:
  specification                                outbound
  1.8GHz Xeon x2, 512MB memory, and 70GB HD    100Mbps

Computing nodes:
  #nodes   specification                               inbound
           GHz Xeon, 512MB memory, and 36GB HD         1Gbps
  8        2.8GHz Xeon, 512MB memory, and 60GB HD      2Gbps

Phoebe Cluster: a 32-node cluster for instructional use

Head node:
  specification                                outbound
  1.5 memory, and 40GB HD                      100Mbps

Computing nodes:
  #nodes   specification                               inbound
           GHz Xeon, 512MB memory, and 30GB HD         100Mbps
           GHz Xeon, 512MB memory, and 30GB HD         1Gbps
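
The first step of the reference platform's submission path, generating a job definition in RSL, can be sketched roughly as follows. This is a minimal illustration in Python, not the job file used in the evaluation. The helper names `rsl_value` and `build_rsl`, the executable path, and the node count are all hypothetical. The sketch assumes the common single-request RSL form `&(attribute=value)...` with the `executable`, `count`, `jobtype`, and `arguments` attributes, and it assumes that a double quote inside a quoted value is escaped by doubling it.

```python
def rsl_value(value):
    """Render one RSL value, quoting anything that is not a plain token."""
    text = str(value)
    safe = set(
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_-./"
    )
    if text and all(ch in safe for ch in text):
        return text
    # Assumption: RSL escapes a double quote inside a quoted string by doubling it.
    return '"' + text.replace('"', '""') + '"'


def build_rsl(executable, count, job_type="mpi", arguments=()):
    """Build a single-request RSL string: &(executable=...)(count=...)..."""
    relations = [
        ("executable", rsl_value(executable)),
        ("count", rsl_value(count)),
        ("jobtype", rsl_value(job_type)),
    ]
    if arguments:
        # Multiple argument values are separated by spaces inside one relation.
        joined = " ".join(rsl_value(arg) for arg in arguments)
        relations.append(("arguments", joined))
    return "&" + "".join(f"({name}={value})" for name, value in relations)


if __name__ == "__main__":
    # Hypothetical test program and node count, for illustration only.
    print(build_rsl("/home/user/test_program", 16, arguments=["100", "out dir"]))
```

The resulting string would be saved to a job definition file and handed to globusrun together with the user's credentials. Everything after that point (gatekeeper, job manager, PBS Server, PBS MOM) happens on the cluster side and is not modeled here.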