Presentation on theme: "Microsoft Keyboard. Cluster and Grid Computing, Pittsburgh Supercomputing Center. John Kochmar, J. Ray Scott (Derek Simmel) (Jason Sommerfield)" — Presentation transcript:

1 Microsoft Keyboard

2 Cluster and Grid Computing
Pittsburgh Supercomputing Center
John Kochmar, J. Ray Scott
(Derek Simmel) (Jason Sommerfield)

3 Pittsburgh Supercomputing Center: Who We Are
Cooperative effort of
–Carnegie Mellon University
–University of Pittsburgh
–Westinghouse Electric
Research Department of Carnegie Mellon
Offices in Mellon Institute, Oakland
–On CMU campus
–Adjacent to University of Pittsburgh campus

4 Westinghouse Electric Company Energy Center, Monroeville, PA

5 Agenda
HPC Clusters
Large Scale Clusters
Commodity Clusters
Cluster Software
Grid Computing

6 TOP500 Benchmark Completed October 1, 2001
[Timeline graphic: system milestones from May 1999 through August-October 2001]

7 Three Systems in the Top 500
HP AlphaServer SC ES40 "TCSINI": ranked 246 with 263.6 GFlops Linpack performance
Cray T3E900 "Jaromir": ranked 182 with 341 GFlops Linpack performance
HP AlphaServer SC ES45 "LeMieux": ranked 6 with 4.463 TFlops Linpack performance (top academic system)

8 Cluster Node Count
Rank  Installation Site                          Nodes
 1    Earth Simulator Center                       640
 2    Los Alamos National Laboratory              1024
 3    Los Alamos National Laboratory              1024
 4    Lawrence Livermore National Laboratory       512
 5    Lawrence Livermore National Laboratory       128
 6    Pittsburgh Supercomputing Center             750
 7    Commissariat a l'Energie Atomique            680
 8    Forecast Systems Laboratory - NOAA           768
 9    HPCx                                          40
10    National Center for Atmospheric Research      40

9 One Year of Production lemieux.psc.edu

10 It's Really All About Applications
Single CPU with common data stream
–seti@home
Large Shared Memory Jobs
Multi-CPU Jobs
…but let's talk systems!

11 HPC Systems Architectures

12 HPC Systems
Larger SMPs
MPP - Massively Parallel Machines
Non-Uniform Memory Access (NUMA) machines
Clusters of smaller machines

13 Larger SMPs
Pros:
–Use existing technology and management techniques
–Maintain parallelization paradigm (threading; see the sketch below)
–It's what users really want!
Cons:
–Cache coherency gets difficult
–Increased resource contention
–Pin counts add up
–Increased incremental cost
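The "parallelization paradigm (threading)" bullet is the shared-memory model a large SMP preserves. Below is a minimal sketch of that model in C using OpenMP; OpenMP itself, the file name, array size, and compile flag are illustrative assumptions, not anything the slides prescribe.

```c
/* Shared-memory threading on an SMP node: every thread sees the same
 * array, so parallelizing is a matter of annotating loops.
 * Illustrative sketch only; OpenMP and the sizes are assumptions.
 * Build (GCC): gcc -O2 -fopenmp smp_sum.c -o smp_sum */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const long n = 10 * 1000 * 1000;
    double *a = malloc(n * sizeof *a);
    double sum = 0.0;

    /* Threads divide the iteration space; 'a' lives in shared memory. */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] = (double)i;

    /* Reduction across threads, still against the shared array. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i];

    printf("max threads = %d, sum = %.0f\n", omp_get_max_threads(), sum);
    free(a);
    return 0;
}
```

The point of the sketch is the "pro" on the slide: the same code runs on one CPU or many, because every thread addresses the same memory.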

14 HPC Clusters
Rationale
–If one box can't do it, maybe 10 can…
–Commodity hardware is advancing rapidly
–Potentially far less costly than a single larger system
–Big systems are only so big

15 HPC Clusters
Central Issues
–Management of multiple systems
–Performance
  Within each node
  Interconnections
–Effects on parallel programming methodology
  Varying communication characteristics (see the ping-pong sketch below)
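The "varying communication characteristics" point is usually quantified with a ping-pong test between two nodes: small messages expose the interconnect's latency, large messages its bandwidth. The sketch below is a generic MPI ping-pong written only for illustration; the message sizes and iteration count are arbitrary assumptions, not PSC benchmark code.

```c
/* Generic MPI ping-pong sketch: rank 0 and rank 1 bounce messages of
 * increasing size, exposing the interconnect's latency (small messages)
 * and bandwidth (large messages).  Illustration only. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, iters = 1000;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 ranks, e.g. mpirun -np 2\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    for (int bytes = 1; bytes <= (1 << 20); bytes *= 16) {
        char *buf = malloc(bytes);
        memset(buf, 0, bytes);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double rtt = (MPI_Wtime() - t0) / iters;   /* average round-trip */

        if (rank == 0)
            printf("%8d bytes  %10.2f us round-trip  %8.2f MB/s\n",
                   bytes, rtt * 1e6, 2.0 * bytes / rtt / 1e6);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```

Running the same curve over the interconnects discussed later (Ethernet, Myrinet, Quadrics) is what makes the differences in latency and bandwidth concrete.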

16 The Next Contender?
CPU: 128-bit
System Clock Frequency: 294.912 MHz
Main Memory: 32MB direct RDRAM
Embedded Cache VRAM: 4MB
I/O Processor
CD-ROM and DVD-ROM

17 Why not let everyone play?

18 What's a Cluster? Base Hardware
Commodity Nodes
–Single, Dual, Quad, ???
–Intel, AMD
–Switch port cost vs CPU
Interconnect
–Bandwidth
–Latency
Storage
–Node local
–Shared filesystem

19 Terascale Computing System: Hardware Summary
750 ES45 Compute Nodes
3000 EV68 CPUs @ 1 GHz, 6 Tflop
3 TB memory
41 TB node disk, ~90 GB/s
Multi-rail fat-tree network
Redundant Interactive nodes
Redundant monitor/ctrl, WAN/LAN accessible
File servers: 30 TB, ~32 GB/s
Mass Store buffer disk, ~150 TB
Parallel visualization
ETF coupled
[System diagram: Quadrics network, Control LAN, Compute Nodes, File Servers (/tmp, /usr), WAN/LAN, Interactive nodes]

20 Compute Nodes AlphaServer ES45 –5 nodes per cabinet –3 local disks /node

21 Row upon row…

22 PSC/HP Grid Alliance
A strategic alliance to demonstrate the potential of the National Science Foundation's Extensible TeraGrid
16-node HP Itanium2/Linux cluster
Through this collaboration, PSC and HP expect to further the TeraGrid goals of enabling scalable, open source, commodity computing on IA64/Linux to address real-world problems

23 What's a Cluster? Base Hardware
Commodity Nodes
–Single, Dual, Quad, ???
–Switch port cost vs CPU
Interconnect
–Bandwidth
–Latency
Storage
–Node local
–Shared filesystem

24

25 Cluster Interconnect: Low End
10/100 Mbit Ethernet
–Very cheap
–Slow with high latency
Gigabit Ethernet
–Sweet spot
–Especially with:
  Channel bonding
  Jumbo frames

26 Cluster Interconnect, cont.: Mid-Range
Myrinet
–http://www.myrinet.com/
–High speed with good (not great) latency
–High port count switches
–Well adopted and supported in the cluster community
Infiniband
–Emerging
–Should be inexpensive and pervasive

27 Cluster Interconnect, cont.: Outta Sight!
Quadrics Elan
–http://www.quadrics.com/
–Very high performance
  Great speed
  Spectacular latency
–Software
  RMS
  QSNET
–Becoming more commodity

28

29 Federated Switch
[Switch diagram: 512-1024-way federated switch (4096- and 8192-way are the same design, just bigger), built from 8*(16-way) switches and 8-16 64U64D stages (13 for TCS)]

30 Overhead Cables

31 Wiring: Quadrics
–Fully wired switch cabinet
–1 of 24
–Wires up & down

32 What's a Cluster? Base Hardware
Commodity Nodes
–Single, Dual, Quad, ???
–Switch port cost vs CPU
Interconnect
–Bandwidth
–Latency
Storage
–Node local
–Shared filesystem

33

34

35 Commodity Cache Servers
Linux
Custom Software
–libtcom/tcsiod
–Coherency Manager (SLASH)
Special Purpose DASP
–Connection to Outside
–Multi-Protocol
  *ftp
  SRB
  Globus
3Ware SCSI/ATA Disk Controllers

36 What's a Cluster? System Software
Installation
Replication
Consistency
Parallel File System
Resource Management
Job Control

37 Installation Replication Consistency

38 PSC Terascale Computing System: Job Management Software
[Architecture diagram: users submit jobs to batch queues (Simon scheduler, TCS scheduling practices, PBS/RMS); job invocation feeds a usage accounting database and monitoring; process distribution, execution and control on Compute Nodes and PSC/NSF Visualization Nodes via tcscomm; checkpoint/restart (CPR) with requeue; tcscopy/hsm to user file servers and HSM; node event management tied to a call tracking and field service database with user notification]

39

40

41 Monitoring Non-Contiguous Scheduling

42

43

44 What's a Cluster? Application Support
Parallel Execution
MPI (see the sketch below)
–http://www.mpi-forum.org/
Shared Memory
Other…
–Portals
–Global Arrays
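To ground the MPI bullet above, here is a minimal, self-contained MPI program in C. It is a generic illustration of message-passing parallel execution, not code from the PSC systems, and the program name and values are made up.

```c
/* Minimal MPI example of message-passing parallel execution: every rank
 * contributes one value and MPI_Allreduce combines them over the cluster
 * interconnect.  Generic illustration only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = rank;      /* this rank's contribution */
    int global = 0;

    /* Collective operation: every rank ends up with the global sum. */
    MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks, sum of ranks = %d\n", size, global);

    MPI_Finalize();
    return 0;
}
```

Built with mpicc and launched with mpirun -np <ranks>, each rank typically maps to one CPU on a cluster node, which is the model the rest of the deck's cluster software (schedulers, interconnects, file systems) exists to serve.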

45 Building Your Cluster
Pre-Built
–PSSC – Chemistry
–Tempest
Roll-your-Own
–Campus Resources
–Web
Use PSC
–Rich Raymond (raymond@psc.edu)
–http://www.psc.edu/homepage_files/state_funding.html

46 Open Source Cluster Application Resources (OSCAR)
Cluster on a CD – automates the cluster install process
–Wizard driven
–Nodes are built over the network
–OSCAR: <= 64 node clusters for initial target
–Works on PC commodity components
–RedHat based (for now)
–Components: open source and BSD-style license
–NCSA Cluster in a Box base
–www.oscar.sourceforge.net
Rocks
–Enable application scientists to build and manage their own resources
  Hardware cost is not the problem
  System administrators cost money, and do not scale
  Software can replace much of the day-to-day grind of system administration
–Train the next generation of users on loosely coupled parallel machines
  Current price-performance leader for HPC
  Users will be ready to step up to NPACI (or other) resources when needed
–Rocks scales to Top500-sized resources
  Experiment on small clusters
  Build your own supercomputer with the same software!
–www.rockscluster.org
–"scary technology"

47

48 GriPhyN and European DataGrid
[Architecture diagram: Production Team, Individual Investigator and Other Users work through Interactive User Tools; Virtual Data Tools, Request Planning and Scheduling Tools, and Request Execution Management Tools sit on Resource Management Services, Security and Policy Services, and Other Grid Services; transforms, raw data sources and distributed resources (code, storage, computers, and network) underneath. Illustration courtesy C. Catlett, ©2001 Global Grid Forum]

49

50 Extensible Terascale Facility - ETF "TeraGrid"

51 Grid Building Blocks
Middleware: hardware and software infrastructure to enable access to computational resources
Services:
–Security
–Information Services
–Resource Discovery / Location
–Resource Management
–Fault Tolerance / Detection

52 www.globus.org

53 Thank You lemieux.psc.edu

