HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing Arvind Gopu, Douglas Grover, David Hart, Richard Repasky, Joseph Rinkovsky, Steve.

Slides:



Advertisements
Similar presentations
Current methods for negotiating firewalls for the Condor ® system Bruce Beckles (University of Cambridge Computing Service) Se-Chang Son (University of.
Advertisements

Computer networks Fundamentals of Information Technology Session 6.
Copyright GeneGo CONFIDENTIAL »« MetaCore TM (System requirements and installation) Systems Biology for Drug Discovery.
Setting up of condor scheduler on computing cluster Raman Sehgal NPD-BARC.
NSF Site Visit HYDRA Using Windows Desktop Systems in Distributed Parallel Computing.
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 6 2/13/2015.
Dinker Batra CLUSTERING Categories of Clusters. Dinker Batra Introduction A computer cluster is a group of linked computers, working together closely.
Module 5: Configuring Access for Remote Clients and Networks.
Dr. David Wallom Use of Condor in our Campus Grid and the University September 2004.
Windows HPC Server 2008 Presented by Frank Chism Windows and Condor: Co-Existence and Interoperation.
IT:Network:Applications VIRTUAL DESKTOP INFRASTRUCTURE.
Introduction to windows operating system i
Keeping Everyone Happy: Serving Windows Users in a Macintosh Environment Laurie Sutch, University of Michigan.
IBIS System: Requirements and Components Lois M. Haggard Office of Public Health Assessment.
Windows Server MIS 424 Professor Sandvig. Overview Role of servers Performance Requirements Server Hardware Software Windows Server IIS.
ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 22, 2011assignprelim.1 Assignment Preliminaries ITCS 6010/8010 Spring 2011.
Cyberaide Virtual Appliance: On-demand Deploying Middleware for Cyberinfrastructure Tobias Kurze, Lizhe Wang, Gregor von Laszewski, Jie Tao, Marcel Kunze,
Chapter 4 Operating Systems and File Management. 4 Chapter 4: Operating Systems and File Management 2 Chapter Contents  Section A: Operating System Basics.
Web Servers Web server software is a product that works with the operating system The server computer can run more than one software product such as .
STRATEGIES INVOLVED IN REMOTE COMPUTATION
Chapter 7: Using Windows Servers to Share Information.
1 COMPSCI 110 Operating Systems Who - Introductions How - Policies and Administrative Details Why - Objectives and Expectations What - Our Topic: Operating.
Connecting OurGrid & GridSAM A Short Overview. Content Goals OurGrid: architecture overview OurGrid: short overview GridSAM: short overview GridSAM: example.
1 Web Server Administration Chapter 1 The Basics of Server and Web Server Administration.
 What is an operating system? What is an operating system?  Where does the OS fit in? Where does the OS fit in?  Services provided by an OS Services.
March 3rd, 2006 Chen Peng, Lilly System Biology1 Cluster and SGE.
SUSE Linux Enterprise Desktop Administration Chapter 12 Administer Printing.
Grids and Portals for VLAB Marlon Pierce Community Grids Lab Indiana University.
Distributed Systems: Concepts and Design Chapter 1 Pages
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
Grid Computing I CONDOR.
Topaz : A GridFTP extension to Firefox M. Taufer, R. Zamudio, D. Catarino, K. Bhatia, B. Stearn University of Texas at El Paso San Diego Supercomputer.
17-April-2007 High Performance Computing Basics April 17, 2007 Dr. David J. Haglin.
Loosely Coupled Parallelism: Clusters. Context We have studied older archictures for loosely coupled parallelism, such as mesh’s, hypercubes etc, which.
Grid MP at ISIS Tom Griffin, ISIS Facility. Introduction About ISIS Why Grid MP? About Grid MP Examples The future.
Evaluation of Agent Teamwork High Performance Distributed Computing Middleware. Solomon Lane Agent Teamwork Research Assistant October 2006 – March 2007.
Example: Sorting on Distributed Computing Environment Apr 20,
HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing Arvind Gopu, Douglas Grover, David Hart, Richard Repasky, Joseph Rinkovsky, Steve.
Remote Access Using Citrix Presentation Server December 6, 2006 Matthew Granger IT665.
Condor: High-throughput Computing From Clusters to Grid Computing P. Kacsuk – M. Livny MTA SYTAKI – Univ. of Wisconsin-Madison
NGS Innovation Forum, Manchester4 th November 2008 Condor and the NGS John Kewley NGS Support Centre Manager.
Hands-On Microsoft Windows Server Implementing Microsoft Internet Information Services Microsoft Internet Information Services (IIS) –Software included.
OS Services And Networking Support Juan Wang Qi Pan Department of Computer Science Southeastern University August 1999.
Derek Wright Computer Sciences Department University of Wisconsin-Madison MPI Scheduling in Condor: An.
Processes Introduction to Operating Systems: Module 3.
Chapter 1 Computer Abstractions and Technology. Chapter 1 — Computer Abstractions and Technology — 2 The Computer Revolution Progress in computer technology.
SMBL and Blast Joe Rinkovsky Unix Systems Support Group Indiana University.
Derek Wright Computer Sciences Department University of Wisconsin-Madison Condor and MPI Paradyn/Condor.
Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services.
CMPF124 Basic Skills For Knowledge Workers Chapter 1 – Part 1 Introduction To Windows Operating Systems.
I NTRODUCTION TO N ETWORK A DMINISTRATION. W HAT IS A N ETWORK ? A network is a group of computers connected to each other to share information. Networks.
Introduction TO Network Administration
I NTRODUCTION TO N ETWORK A DMINISTRATION. W HAT IS A N ETWORK ? A network is a group of computers connected to each other to share information. Networks.
INTRODUCTION TO GRID & CLOUD COMPUTING U. Jhashuva 1 Asst. Professor Dept. of CSE.
Creating Grid Resources for Undergraduate Coursework John N. Huffman Brown University Richard Repasky Indiana University Joseph Rinkovsky Indiana University.
A Web Based Job Submission System for a Physics Computing Cluster David Jones IOP Particle Physics 2004 Birmingham 1.
Application Layer Functionality and Protocols Abdul Hadi Alaidi
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CLOUD COMPUTING
2. OPERATING SYSTEM 2.1 Operating System Function
VIRTUALizing the lab Richard Toeniskoetter
Grid Computing.
SUBMITTED BY: NAIMISHYA ATRI(7TH SEM) IT BRANCH
PHP / MySQL Introduction
University of Technology
Chapter 1: Introduction
Chapter 3: Windows7 Part 1.
From Prototype to Production Grid
Internet Protocols IP: Internet Protocol
APACHE WEB SERVER.
IS 4506 Configuring the FTP Service
Presentation transcript:

HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing Arvind Gopu, Douglas Grover, David Hart, Richard Repasky, Joseph Rinkovsky, Steve Simms, Adam Sweeny, Peng Wang University Information Technology Services Indiana University

SC'05, Seattle, WA Problem Description Turn Windows desktop systems at IUB student labs into a scientific resource systems, 3 year replacement cycle 1.5 Teraflops Fast ethernet or better Harvest idle cycles.

SC'05, Seattle, WA Constraints Systems dedicated to students using desktop office applications — not parallel scientific computing Microsoft Windows environment Daily software rebuild Systems dedicated to students using desktop office applications — not parallel scientific computing Microsoft Windows environment Daily software rebuild

SC'05, Seattle, WA What could these systems be used for? Many small computations and a few small messages Master-worker Parameter studies Monte Carlo

SC'05, Seattle, WA Assembling small ephemeral resources Different parallel libraries have constraints of some form or the other MPI not designed to handle ephemeral resources

SC'05, Seattle, WA Solution Simple Message Brokering Library (SMBL) Limited replacement for MPI Process and Port Manager (PPM) … Plus … Condor NT Job management Web portal Job submission

SC'05, Seattle, WA The Big Picture We’ll discuss each part in more detail next… The shaded box indicates components hosted on multiple desktop computers

SC'05, Seattle, WA Portal Creates and submits Condor files, handles data files Apache based PHP web interface  (IU users)  (Teragrid users)

SC'05, Seattle, WA Condor Condor for Windows NT/2000/XP “Vanilla universe” -- no support for check- pointing or parallelism Provides: Security Match-making Fair sharing File transfer Job submission, suspension, preemption, restart

SC'05, Seattle, WA SMBL In charge of message delivery for each parallel session Client library implements selected MPI-like calls Both server and client library based on TCP socket abstraction

SC'05, Seattle, WA SMBL (Contd … ) Managing Temporary Workers SMBL server maintains a dynamic pool of client process connections Worker job manager hides details of ephemeral workers at the application level Porting from MPI is fairly straight forward

SC'05, Seattle, WA Process and Port Manager (PPM) Assigns port/host to each parallel session Starts the SMBL server and application processes on demand Directs workers to their servers

SC'05, Seattle, WA Once again … the big picture The shaded box indicates components hosted on multiple desktop computers

SC'05, Seattle, WA System Layout PPM, SMBL server and web portal running on Linux server -- Dual Intel Xeon 1.7 GHz, 2 GB memory and GigE inter-connect STC Windows worker machines -- Combination of different OS (Windows 2000/XP) and network inter-connect speeds (GigE/100 Mbps/10 Mbps)

SC'05, Seattle, WA Applications FastDNAml-p Parallel application, master-worker model, small granularity of work Provides generic interface for parallel communication library (MPI, PVM, SMBL) Reliability built in: Foreman process copes with delayed or lost workers Blast Meme

SC'05, Seattle, WA Portal

SC'05, Seattle, WA Applications – FastDNAml

SC'05, Seattle, WA FastDNAml-p Performance

SC'05, Seattle, WA Other Applications – Parallel MEME

SC'05, Seattle, WA Other Applications – BLAST

SC'05, Seattle, WA Utilization of Idle Cycles Red: total owner Blue: total idle Green: total Condor

SC'05, Seattle, WA Recent Development Hydra cluster Teragrid’ized! (Oct 2005) Allow TG users to use resource Virtual Host based solution – two different URLs for IU and Teragrid users Teragrid users authenticate against different Kerberos server (PSC) Still to-do Usage accounting

SC'05, Seattle, WA Work in Progress/Future Direction Once again … Teragrid’ization of Hydra cluster Usage Accounting – Report usage byTeragrid users New Portal – JSR 168 compliant, certificate based authentication capability Range of applications – Virtual machines, so forth

SC'05, Seattle, WA Summary Large parallel computing facility created at very low cost SMBL parallel message passing library that can deal with ephemeral resources PPM port broker that can handle multiple parallel sessions SMBL (Open Source) Home –

SC'05, Seattle, WA Links and References Hydra Portal (IU users) (Teragrid users) SMBL home page – Condor home page -- IU Teragrid home page – Parallel FastDNAml – Blast -- Meme --