Cluster and Grid Computing
Pittsburgh Supercomputing Center
John Kochmar, J. Ray Scott, (Derek Simmel), (Jason Sommerfield)

Pittsburgh Supercomputing Center – Who We Are
Cooperative effort of:
– Carnegie Mellon University
– University of Pittsburgh
– Westinghouse Electric
A research department of Carnegie Mellon
Offices in Mellon Institute, Oakland
– On the CMU campus
– Adjacent to the University of Pittsburgh campus

Westinghouse Electric Company Energy Center, Monroeville, PA

Agenda
– HPC Clusters
– Large Scale Clusters
– Commodity Clusters
– Cluster Software
– Grid Computing

TOP500 Benchmark
Completed October 1, 2001
[Timeline figure: TCS milestones from May 2000 through the August–October 2001 benchmark runs.]

Three Systems in the Top 500
– HP AlphaServer SC ES40 "TCSINI": ranked 246, with … GFlops Linpack performance
– Cray T3E900 "Jaromir": ranked 182, with 341 GFlops Linpack performance
– HP AlphaServer SC ES45 "LeMieux": ranked 6, with … TFlops Linpack performance; top academic system

Cluster Node Count
Rank  Installation Site                          Nodes
 1    Earth Simulator Center                       640
 2    Los Alamos National Laboratory              1024
 3    Los Alamos National Laboratory              1024
 4    Lawrence Livermore National Laboratory       512
 5    Lawrence Livermore National Laboratory       128
 6    Pittsburgh Supercomputing Center             750
 7    Commissariat a l'Energie Atomique            680
 8    Forecast Systems Laboratory - NOAA           768
 9    HPCx                                          40
10    National Center for Atmospheric Research      40

One Year of Production lemieux.psc.edu

It's Really All About Applications
– Single CPU with common data stream
– Large shared-memory jobs
– Multi-CPU jobs
…but let's talk systems!

HPC Systems Architectures

HPC Systems
– Larger SMPs
– MPP – Massively Parallel Machines
– Non-Uniform Memory Access (NUMA) machines
– Clusters of smaller machines

Larger SMPs
Pros:
– Use existing technology and management techniques
– Maintain the parallelization paradigm (threading)
– It's what users really want!
Cons:
– Cache coherency gets difficult
– Increased resource contention
– Pin counts add up
– Increased incremental cost
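To make the threading point concrete, here is a minimal sketch (not from the original slides) of the shared-memory model that SMPs preserve: every thread sees the same address space, so parallelizing a sum needs synchronization but no explicit data movement. The thread count and array size are arbitrary illustrative values.

```c
/* Minimal shared-memory (SMP threading) sketch: all threads share one
 * address space, so only the final accumulation needs synchronization.
 * NTHREADS and N are arbitrary illustrative values. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000

static double data[N];
static double total = 0.0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *partial_sum(void *arg) {
    long id = (long)arg;
    double local = 0.0;
    /* Each thread sums its contiguous slice of the shared array. */
    for (long i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
        local += data[i];
    pthread_mutex_lock(&lock);    /* serialize only the final update */
    total += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (long i = 0; i < N; i++)
        data[i] = 1.0;
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, partial_sum, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);
    printf("total = %f (expected %d)\n", total, N);
    return 0;
}
```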

HPC Clusters
Rationale:
– If one box can't do it, maybe 10 can…
– Commodity hardware is advancing rapidly
– Potentially far less costly than a single larger system
– Big systems are only so big

HPC Clusters
Central issues:
– Management of multiple systems
– Performance: within each node, and across the interconnect
– Effects on parallel programming methodology (varying communication characteristics)
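Because interconnect latency and bandwidth vary so much between clusters, codes are often restructured to overlap communication with computation. The following is an illustrative sketch, not PSC code: a ring exchange using non-blocking MPI calls so that interior work can proceed while messages are in flight.

```c
/* Hypothetical sketch: post non-blocking sends/receives for boundary
 * data and compute on interior data while messages are in flight,
 * hiding some of the interconnect latency. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    double halo_out, halo_in = 0.0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* ring neighbors */
    int left  = (rank - 1 + size) % size;

    halo_out = (double)rank;
    MPI_Irecv(&halo_in, 1, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&halo_out, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... compute on interior data here, overlapping communication ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("rank %d received %g from rank %d\n", rank, halo_in, left);
    MPI_Finalize();
    return 0;
}
```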

The Next Contender?
– CPU: 128-bit CPU
– System clock frequency: … MHz
– Main memory: 32 MB direct RDRAM
– Embedded cache VRAM: 4 MB
– I/O processor
– CD-ROM and DVD-ROM

Why not let everyone play?

What's a Cluster? Base Hardware
Commodity nodes
– Single, dual, quad, ???
– Intel, AMD
– Switch port cost vs. CPU
Interconnect
– Bandwidth
– Latency
Storage
– Node local
– Shared filesystem

Terascale Computing System – Hardware Summary
– 750 ES45 compute nodes: 3000 EV68 CPUs @ 1 GHz, 6 Tflop
– 3 TB memory
– 41 TB node disk, ~90 GB/s
– Multi-rail fat-tree network (Quadrics)
– Redundant interactive nodes
– Redundant monitor/control
– WAN/LAN accessible
– File servers: 30 TB, ~32 GB/s
– Mass store buffer disk, ~150 TB
– Parallel visualization
– ETF coupled
[Diagram: Quadrics network and control LAN linking compute nodes, file servers (/tmp, /usr), interactive nodes, and WAN/LAN.]

Compute Nodes
AlphaServer ES45
– 5 nodes per cabinet
– 3 local disks/node

Row upon row…

PSC/HP Grid Alliance
A strategic alliance to demonstrate the potential of the National Science Foundation's Extensible TeraGrid.
16-node HP Itanium2/Linux cluster.
Through this collaboration, PSC and HP expect to further the TeraGrid goals of enabling scalable, open-source, commodity computing on IA64/Linux to address real-world problems.

What's a Cluster? Base Hardware
Commodity nodes
– Single, dual, quad, ???
– Switch port cost vs. CPU
Interconnect
– Bandwidth
– Latency
Storage
– Node local
– Shared filesystem

Cluster Interconnect – Low End
10/100 Mbit Ethernet
– Very cheap
– Slow, with high latency
Gigabit Ethernet
– The sweet spot
– Especially with: channel bonding, jumbo frames
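Bandwidth and latency claims like these are usually checked with a ping-pong microbenchmark. Below is a rough sketch, assuming two MPI ranks on different nodes; the message size and repetition count are arbitrary.

```c
/* Rough ping-pong sketch: rank 0 bounces a message off rank 1 and
 * derives one-way time and bandwidth from the round-trip total.
 * Run with small messages for latency, large ones for bandwidth. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    const int reps = 1000;
    const int nbytes = 1 << 20;             /* 1 MB payload */
    char *buf = malloc(nbytes);
    int rank;

    memset(buf, 0, nbytes);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double dt = MPI_Wtime() - t0;

    if (rank == 0)
        printf("one-way time %.1f us, bandwidth %.1f MB/s\n",
               dt / (2.0 * reps) * 1e6,
               (2.0 * reps * nbytes) / dt / 1e6);
    free(buf);
    MPI_Finalize();
    return 0;
}
```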

Cluster Interconnect, cont. – Mid-Range
Myrinet
– High speed with good (not great) latency
– High port count switches
– Well adopted and supported in the cluster community
Infiniband
– Emerging
– Should be inexpensive and pervasive

Cluster Interconnect, cont. – Outta Sight!
Quadrics Elan
– Very high performance: great speed, spectacular latency
– Software: RMS, QsNet
– Becoming more commodity

[Diagram: Quadrics federated switch, built from 16-way switch components – 8*(16-way) stages feeding 64U64D switches (13 of them for TCS); the 4096- and 8192-way versions are the same design, only bigger.]

Overhead Cables

Wiring: Quadrics
– Fully wired switch cabinet (1 of 24)
– Wires run up & down

What's a Cluster? Base Hardware
Commodity nodes
– Single, dual, quad, ???
– Switch port cost vs. CPU
Interconnect
– Bandwidth
– Latency
Storage
– Node local
– Shared filesystem

Commodity Cache Servers
Linux
Custom software
– libtcom/tcsiod
– Coherency manager (SLASH)
Special purpose: DASP
– Connection to outside
– Multi-protocol: *ftp, SRB, Globus
3Ware SCSI/ATA disk controllers

What's a Cluster? System Software
– Installation
– Replication
– Consistency
– Parallel file system
– Resource management
– Job control

Installation Replication Consistency

[Diagram: job management on the PSC Terascale Computing System. Users submit jobs to queues managed by batch job management software; the Simon scheduler implements TCS scheduling practices on top of PBS/RMS, driving job invocation, process distribution, and execution control on the compute nodes (with checkpoint/restart, CPR); tcscomm and tcscopy/hsm move data among compute nodes, user file servers, HSM, and PSC/NSF visualization nodes; supporting services include a usage accounting database, monitoring, node event management, a call tracking and field service database, and user notification.]

Monitoring Non-Contiguous Scheduling

What's a Cluster? Application Support
Parallel execution
– MPI
– Shared memory
– Other…: Portals, Global Arrays
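As a minimal illustration of MPI-style parallel execution (not from the slides): the same binary runs on every node, each rank computes locally, and the ranks cooperate through explicit message passing, here a global reduction.

```c
/* Minimal SPMD example of cluster parallel execution with MPI: the
 * same program runs on every node, and ranks cooperate through
 * explicit message passing (here, a global sum reduction). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)(rank + 1);      /* each rank's contribution */
    double sum = 0.0;
    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks, sum of contributions = %g\n", size, sum);
    MPI_Finalize();
    return 0;
}
```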

Building Your Cluster
Pre-built
– PSSC – Chemistry
– Tempest
Roll your own
– Campus resources
– Web
Use PSC
– Rich Raymond

Open Source Cluster Application Resources (OSCAR)
– Cluster on a CD – automates the cluster install process
– Wizard driven; nodes are built over the network
– <= 64-node clusters for initial target
– Works on PC commodity components
– RedHat based (for now)
– Components: open source and BSD-style license
– NCSA Cluster in a Box base
NPACI Rocks
Enable application scientists to build and manage their own resources
– Hardware cost is not the problem
– System administrators cost money, and do not scale
– Software can replace much of the day-to-day grind of system administration
Train the next generation of users on loosely coupled parallel machines
– Current price-performance leader for HPC
– Users will be ready to step up to NPACI (or other) resources when needed
Rocks scales to Top500-sized resources
– Experiment on small clusters
– Build your own supercomputer with the same software! ("scary technology")

GriPhyN and European DataGrid
[Diagram: interactive user tools (production teams, individual investigators, other users) feed virtual data tools, request planning and scheduling tools, and request execution management tools, which sit on resource management services, security and policy services, and other Grid services, all running over distributed resources (code, storage, computers, and network) and raw data sources. Illustration courtesy C. Catlett, © 2001 Global Grid Forum.]

Extensible Terascale Facility - ETF "TeraGrid"

Grid Building Blocks
Middleware: hardware and software infrastructure to enable access to computational resources
Services:
– Security
– Information services
– Resource discovery / location
– Resource management
– Fault tolerance / detection

Thank You lemieux.psc.edu