Open Source Cluster Application Resources (OSCAR)
Presented by Stephen L. Scott, Thomas Naughton, and Geoffroy Vallée
Network and Cluster Computing, Computer Science and Mathematics Division

OSCAR: Open Source Cluster Application Resources
- A snapshot of the best-known methods for building, programming, and using clusters
- A consortium of academic, research, and industry members

Over five years of OSCAR
- January 2000: Concept first discussed
- April 2000: Organizational meeting
  - Cluster assembly is time consuming and repetitive
  - A toolkit to automate the process would be valuable
  - Leverage the wealth of open source components
- April 2001: First public release
- November 2006: OSCAR v5.0 released at SC06

What does OSCAR do?
- Wizard-based cluster software installation
  - Operating system
  - Cluster environment
- Automatically configures cluster components
- Increases consistency among cluster builds
- Reduces the time to build / install a cluster
- Reduces the need for expertise

Design goals
- Modular meta-package system / API: "OSCAR Packages"
  - Keep it simple for package authors
- Open source, to foster reuse and community participation
  - Fosters "spin-offs" that reuse the OSCAR framework
- Leverage "best practices" whenever possible
  - Native package systems and existing distributions
  - Management, system, and application software
- Keep the interface simple
  - Provide basic operations for cluster software and node administration
- Enable others to re-use and extend the system as a deployment tool
  - Extensibility for new software and projects
- Reduce the overhead of cluster management

OSCAR overview
- Framework for cluster management
  - Simplifies installation, configuration, and operation
  - Reduces the time and learning curve for a cluster build
  - Requires: a pre-installed head node with a supported Linux distribution
  - Thereafter: a wizard guides the user through setup and installation of the entire cluster
- Package-based framework
  - Content: software + configuration, tests, documentation
  - Types:
    - Core: SIS, C3, Switcher, ODA, OPD, APItest, support libraries
    - Non-core: selected and third-party (PVM, LAM/MPI, Torque/Maui, ...)
  - Access: repositories accessible via OPD/OPDer

OSCAR packages
- A simple way to wrap software and configuration
  - "Do you offer package Foo version X?"
- Basic design goals
  - Keep it simple for package authors
  - Modular packaging (each package is self contained)
  - Timely releases/updates
- Leverage RPM + a meta file + scripts, tests, docs, ...
  - Recently extended to better support RPMs, Debs, etc.
- Repositories for downloading via OPD/OPDer
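To make the package idea concrete, here is a small Python sketch that inspects a hypothetical OSCAR-style package directory; the directory layout and the config.xml field names are illustrative assumptions, not the actual OSCAR package schema.

```python
# Hypothetical sketch: inspect an OSCAR-style package directory.
# The layout and meta-file fields below are illustrative assumptions,
# not the real OSCAR package schema.
import os
import xml.etree.ElementTree as ET

def describe_package(pkg_dir):
    """Report name/version and payload dirs for a package (e.g. package Foo version X)."""
    meta = ET.parse(os.path.join(pkg_dir, "config.xml")).getroot()
    name = meta.findtext("name", default="unknown")
    version = meta.findtext("version", default="unknown")
    print(f"Package: {name} {version}")
    for subdir in ("scripts", "testing", "doc"):   # typical payload areas
        path = os.path.join(pkg_dir, subdir)
        if os.path.isdir(path):
            print(f"  {subdir}/: {sorted(os.listdir(path))}")

if __name__ == "__main__":
    describe_package("packages/foo")   # hypothetical package location
```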

OSCAR Cluster Installation Wizard (screenshot): Start, Steps 1 through 8, Done, with a cluster deployment monitor.

OSCAR components
- Core infrastructure / management
  - System Installation Suite (SIS), Cluster Command & Control (C3), Env-Switcher
  - OSCAR DAtabase (ODA), OSCAR Package Downloader (OPD)
- Administration / configuration
  - SIS, C3, OPIUM, Kernel-Picker, and cluster services (dhcp, nfs, ntp, ...)
  - Security: Pfilter, OpenSSH
- HPC services / tools
  - Parallel libraries: MPICH, LAM/MPI, PVM, Open MPI
  - OpenPBS/MAUI, Torque, SGE
  - HDF5
  - Ganglia, Clumon
  - Other third-party OSCAR packages

C3 Power Tools
- Command-line interface for cluster system administration and parallel user tools
- Parallel execution: cexec
  - Execute across a single cluster or multiple clusters at the same time
- Scatter/gather operations: cpush / cget
  - Distribute or fetch files for any node(s) or cluster(s)
- Used throughout OSCAR as the mechanism for cluster-wide operations
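To make the C3 workflow concrete, here is a minimal Python sketch that shells out to the cexec and cpush commands named above. It assumes C3 is installed with a cluster already defined in its configuration, and the exact options accepted may vary between C3 versions.

```python
# Minimal sketch: drive the C3 tools from Python via subprocess.
# Assumes C3 is installed and a default cluster is configured;
# the invocations shown are illustrative and may vary by C3 version.
import subprocess

def cexec(command):
    """Run a shell command on every node of the default cluster."""
    return subprocess.run(["cexec", command], check=True)

def cpush(local_path, remote_path=None):
    """Distribute a file to all nodes (cget fetches in the other direction)."""
    args = ["cpush", local_path]
    if remote_path:
        args.append(remote_path)
    return subprocess.run(args, check=True)

if __name__ == "__main__":
    cpush("/etc/hosts")   # push an updated hosts file cluster-wide
    cexec("uptime")       # then check the load on every node
```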

Highlights of OSCAR v5.0
- Improved distribution and architecture support
- Node installation monitor
- New network setup options and other GUI enhancements
- "Use Your Own Kernel" (UYOK) for SystemImager
- New OSCAR packages: Open MPI, SC3, SGE, YUME, NetbootMgr
- Supported platforms
  - x86: fc4, fc5, mdv2006, rhel4, suse10.0 (tentative)
  - x86_64: fc4, fc5, rhel4
- In progress
  - Diskless OSCAR & LiveCD
  - OSCAR on Debian (OoD)
  - OSCAR Command Line Interface (CLI)
  - Native package management / network installation

OSCAR: proven scalability
Selected machines registered at the OSCAR website, based on data taken on 11/2/2006 (sources: OSCAR Cluster Registration Page, ORNL OIC User Guide):
- Endeavor: 232 nodes with 928 CPUs
- ORNL OIC: 440 nodes with 880 CPUs
- McKenzie: 264 nodes with 528 CPUs
- SUN-CLUSTER: 128 nodes with 512 CPUs
- Cacau: 205 nodes with 410 CPUs
- Barossa: 184 nodes with 368 CPUs
- Smalley: 66 nodes with 264 CPUs
- OSCAR-SSC: 130 nodes with 130 CPUs

More OSCAR information
- Open Cluster Group home page: oscar.OpenClusterGroup.org
- Development page: svn.oscar.openclustergroup.org/trac/oscar
- OSCAR mailing lists
- OSCAR Symposium
Research supported by the Mathematics, Information and Computational Sciences Office, Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.

OSCAR "flavors": HA-OSCAR, NEC's OSCAR-Pro, SSS-OSCAR, SSI-OSCAR

HA-OSCAR: RAS management for HPC clusters (self-awareness)
- The first known field-grade open source HA Beowulf cluster release
- Self-configuring multi-head Beowulf system
- Combines HA and HPC clustering techniques to enable critical HPC infrastructure
- Services: active / hot standby
- Self-healing with 3-5 second automatic failover time
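As a conceptual illustration of the active / hot-standby idea, the toy monitor below watches a primary head node and promotes the standby when it stops responding. This is only a sketch: the hostname, the ping-based health check, and the takeover step are placeholders, not HA-OSCAR's actual failover mechanism.

```python
# Toy active/hot-standby monitor. NOT HA-OSCAR's implementation:
# the host name, ping health check, and takeover step are placeholders
# chosen only to show the control flow behind automatic failover.
import subprocess
import time

PRIMARY = "head-primary"   # hypothetical primary head node
POLL_SECONDS = 3           # roughly in line with a 3-5 s failover target

def primary_alive():
    """Health check: a single ping with a short timeout."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", PRIMARY],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def take_over():
    """Placeholder for promoting the standby (claim the service IP, start services)."""
    print("Primary unreachable: standby taking over head-node services")

if __name__ == "__main__":
    while primary_alive():
        time.sleep(POLL_SECONDS)
    take_over()
```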

NEC's OSCAR-Pro
Presented at the OSCAR'06 keynote by Erich Focht (NEC)
- Commercial enhancements
- Leverages the open source tool
- Joined the project / contributions to the OSCAR core
- Additions integrated when applicable
- Feedback and direction based on user needs

Scalable Systems Software
Problems:
- Computer centers use incompatible, ad hoc sets of system tools
- Tools are not designed to scale to multi-teraflop systems
- Work is duplicated in trying to scale the tools
- System growth vs. administrator growth
Goals:
- Define standard interfaces for system components
- Create scalable, standardized management tools
- (Subsequently) reduce costs and improve efficiency at computer centers
Participants:
- DOE labs: ORNL, ANL, LBNL, PNNL, SNL, LANL, Ames
- Academics: NCSA, PSC, SDSC
- Industry: IBM, Cray, Intel, SGI

SSS project overview
- Map out the functional areas
  - Schedulers, job managers
  - System monitors
  - Accounting and user management
  - Checkpoint/restart
  - Build and configure systems
- Standardize the system interfaces
  - Open forum of universities, labs, and industry representatives
  - Define component interfaces in XML
  - Develop a communication infrastructure

Components written in any mixture of C, C++, Java, Perl, and Python can be integrated into the Scalable Systems Software suite.
(Diagram) Suite components connected through standard XML interfaces: service directory, event manager, meta services (meta scheduler, meta monitor, meta manager), scheduler, job queue manager, node state manager, node configuration and build manager, system and job monitor, accounting, allocation management, usage reports, process manager, checkpoint restart, hardware infrastructure manager, authentication, communication.
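The point of the diagram is that components written in different languages interoperate through standard XML interfaces rather than language-specific APIs. The snippet below builds an illustrative XML registration message such as a component might send to the service directory; the element and attribute names are invented for illustration and are not the project's published message formats.

```python
# Illustrative only: build an XML message a component might exchange with a
# service directory. Element and attribute names are invented here and do not
# reflect the SSS project's published interface definitions.
import xml.etree.ElementTree as ET

def register_message(component, host, port):
    """Serialize a hypothetical component-registration request as XML text."""
    root = ET.Element("register")
    ET.SubElement(root, "component").text = component
    ET.SubElement(root, "endpoint", host=host, port=str(port))
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # A Python-based monitor or a C-based scheduler could both emit this form.
    print(register_message("system-monitor", "node001", 7625))
```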

SSS-OSCAR components
- Bamboo: queue/job manager
- BLCR: Berkeley checkpoint/restart
- Gold: accounting and allocation management system
- MAUI-SSS: job scheduler
- SSSLib: SSS communication library (includes SD, EM, PM, BCM, NSM, NWI)
- Warehouse: distributed system monitor
- MPD2: MPI process manager
- LAM/MPI (w/ BLCR): checkpoint/restart-enabled MPI
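As an example of how the checkpoint/restart piece is used, the sketch below drives BLCR's command-line utilities from Python. cr_run, cr_checkpoint, and cr_restart are BLCR's standard tools, but treat the specific options and the context-file naming used here as assumptions to verify against your BLCR installation.

```python
# Sketch: checkpoint and restart a job with BLCR's command-line tools.
# cr_run / cr_checkpoint / cr_restart are BLCR utilities; the exact options
# and context-file naming below are assumptions to verify locally.
import subprocess
import time

def run_with_checkpoint(cmd, checkpoint_after=10):
    """Start a program under cr_run, checkpoint and stop it, then restart it."""
    proc = subprocess.Popen(["cr_run"] + cmd)
    time.sleep(checkpoint_after)

    context_file = f"context.{proc.pid}"
    # Take a checkpoint of the running process, asking it to terminate afterwards.
    subprocess.run(
        ["cr_checkpoint", "--term", "-f", context_file, str(proc.pid)],
        check=True,
    )
    proc.wait()

    # Later (possibly on another node), resume from the saved context file.
    subprocess.run(["cr_restart", context_file], check=True)

if __name__ == "__main__":
    run_with_checkpoint(["./long_running_job"])   # hypothetical application
```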

Single System Image Open Source Application Resources (SSI-OSCAR)
- Easy to use thanks to SSI systems
  - SMP illusion
  - High performance
  - Fault tolerance
- Easy to manage thanks to OSCAR
  - Automatic cluster install/update

Contacts
- Thomas Naughton, Network and Cluster Computing, Computer Science and Mathematics Division, (865)
- Stephen L. Scott, Network and Cluster Computing, Computer Science and Mathematics Division, (865)
- Geoffroy Vallée, Network and Cluster Computing, Computer Science and Mathematics Division, (865)