
Slide 1: SOS7: "Machines Already Operational"
NSF's Terascale Computing System
SOS-7, March 4-6, 2003
Mike Levine, PSC

Slide 2: Outline
  • Overview of TCS, the US-NSF's Terascale Computing System.
  • Answering 3 questions:
    - Is your machine living up to performance expectations? …
    - What is the MTBI? …
    - What is the primary complaint, if any, from users?
  • [See also PSC web pages & Rolf's info.]

Slide 3: Q1: Performance
  • Computational and communications performance is very good!
    - Alpha processors & ES45 servers: very good.
    - Quadrics bandwidth & latency: very good.
    - ~74% of peak on Linpack; >76% on LSMS (see the check below).
  • Disk I/O needs more work.
  • This has been a very easy "port" for most users.
    - Easier than some Cray-to-Cray upgrades.
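
As a rough sanity check (my arithmetic, not from the slide), the ~74% Linpack figure against the 6 Tf peak quoted later in the deck corresponds to roughly 4.4 Tf sustained:

```python
# Back-of-the-envelope check. Assumption: the 6 Tf figure from the
# compute-node and summary slides is the relevant peak for Linpack.
peak_tflops = 6.0            # 3000 EV68 CPUs x 2 Gflop/s each
linpack_fraction = 0.74      # "~74% of peak on Linpack"
lsms_fraction = 0.76         # ">76% on LSMS"

print(f"Linpack: ~{peak_tflops * linpack_fraction:.1f} Tflop/s sustained")
print(f"LSMS:    >{peak_tflops * lsms_fraction:.1f} Tflop/s sustained")
```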

Slide 4: Q2: MTBI (Monthly Average)
  • Compare with the theoretical prediction of 12 hrs.
  • Expect further improvement (fixing systematic problems).
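
A minimal sketch of where a 12 hr full-system prediction can come from, assuming the usual series-system model (independent node failures, interrupt rates add); the per-node MTBF below is back-solved from the slide's 12 hr figure and ~750 nodes, not a number from the talk:

```python
# Series-system MTBI model (assumptions: independent node failures;
# the 9,000 hr per-node MTBF is back-solved, not a quoted figure).
nodes = 750                 # ES45 compute nodes
node_mtbf_hours = 9_000     # ~12 hr * 750 nodes, i.e. roughly a year per node

system_mtbi_hours = node_mtbf_hours / nodes
print(f"Predicted full-system MTBI: {system_mtbi_hours:.0f} hours")  # ~12
```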

Slide 5: Time Lost to Unscheduled Events
  • Purple: nodes requiring cleanup.
  • Worst case is ~3%.

Slide 6: Q3: Complaints
  • #1: "I need more time" (not a complaint about performance).
    - Actual usage is >80% of wall clock.
    - Some structural improvements are still in progress.
    - Not a whole lot more is possible!
  • Work needed on:
    - Rogue OS activity. [recall Prof. Kale's comment]
    - MPI & global reduction libraries. [ditto]
    - System debugging and fragility.
    - I/O performance.
      - We have delayed full disk deployment to avoid data corruption & instabilities.
    - Node cleanup.
      - We detect and hold out problem nodes until staff clean them (see the sketch below).
  • All in all, the users have been VERY pleased. [ditto]
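
The slide does not say how problem nodes are detected and held out; the sketch below is only an illustration of that kind of sweep, with hypothetical health checks and a hypothetical drain_node() hook, not PSC's actual RMS tooling:

```python
# Illustrative sketch only: hold suspect nodes out of scheduling until
# staff clean them. node_health() and drain_node() are hypothetical
# placeholders, not the tools actually used on TCS.
SUSPECT_SIGNS = ("orphan_processes", "ecc_errors", "full_local_disk")

def node_health(node: str) -> dict:
    """Placeholder: would query per-node monitoring data in a real system."""
    return {sign: 0 for sign in SUSPECT_SIGNS}

def drain_node(node: str) -> None:
    """Placeholder: mark the node unavailable to the batch system."""
    print(f"holding {node} out of scheduling pending cleanup")

def sweep(nodes: list[str]) -> None:
    """Drain any node showing a suspect sign."""
    for node in nodes:
        health = node_health(node)
        if any(health[sign] > 0 for sign in SUSPECT_SIGNS):
            drain_node(node)

sweep([f"node{i:03d}" for i in range(750)])
```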

Slide 7: Full Machine Job
  • This system is capable of doing big science.

Slide 8: TCS (Terascale Computing System) & ETF
  • Sponsored by the U.S. National Science Foundation.
  • Serving the "very high end" of US academic computational science and engineering.
    - Designed to be used, as a whole, on single problems (recall the full-machine job).
    - Full range of scientific and engineering applications.
    - Compaq AlphaServer SC hardware and software technology.
    - TCS-1: in general production since April 2002.
  • #6 in the Top 500 (largest open facility in the world, Nov 2001).
  • Integrated into the PACI program (Partnerships for Advanced Computational Infrastructure).
    - DTF project to build and integrate multiple systems: NCSA, SDSC, Caltech, Argonne; multi-lambda, transcontinental interconnect.
    - ETF, aka TeraGrid (Extensible Terascale Facility), integrates TCS with DTF to form a heterogeneous, extensible scientific/engineering cyberinfrastructure Grid.

Slide 9: Infrastructure: PSC TCS machine room (at Westinghouse)
  (Did not require a new building; just a pipe & wire upgrade; not maxed out.)
  • ~8,000 ft² existing room (16 yrs old).
  • ~2,500 ft² in use.

Slide 10: Floor Layout (Full System: Physical Structure)
  • Geometrical constraints are invariant between the US & Japan.

Slide 11: Compute Nodes (Terascale Computing System)
  • 750 ES45 4-CPU servers, +13 inline spares (+2 login nodes).
  • 4 EV68 CPUs per node @ 1 GHz = 2 Gf each [6 Tf total].
  • 4 GB memory per node [3.0 TB total].
  • 3 × 18.2 GB local disks per node [41 TB total]:
    - System
    - User temporary
    - Fast snapshots [~90 GB/s aggregate]
  • Tru64 Unix.
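
The bracketed totals follow directly from the per-node figures; a quick arithmetic check (no assumptions beyond the numbers on this slide):

```python
# Aggregates implied by the per-node figures above.
nodes = 750
cpus = nodes * 4                        # 3000 EV68 CPUs
peak_tflops = cpus * 2 / 1000           # 2 Gflop/s per CPU -> 6 Tf
memory_tb = nodes * 4 / 1000            # 4 GB per node -> 3.0 TB
disk_tb = nodes * 3 * 18.2 / 1000       # 3 x 18.2 GB per node -> ~41 TB

print(cpus, "CPUs,", peak_tflops, "Tf,", memory_tb, "TB memory,",
      round(disk_tb, 1), "TB local disk")
```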

Slide 12: ES45 Nodes
  • 5 nodes per cabinet.
  • 3 local disks per node.

Slide 13: Quadrics Network (Terascale Computing System)
  • 2 "rails":
    - Higher bandwidth (~250 MB/s/rail).
    - Lower latency (2.5 µs put latency).
  • 1 NIC per node per rail.
  • Federated switch (per rail): "fat tree", bisection bandwidth ~0.2 TB/s.
  • User virtual-memory mapped.
  • Hardware retry.
  • Heterogeneous (Alpha Tru64 & Linux, Intel Linux).
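
Per-node injection bandwidth and the per-rail bandwidth-delay product follow from these figures, assuming both rails are driven concurrently (the usual point of a multi-rail configuration):

```python
# Derived Quadrics numbers, from the per-rail figures on this slide.
rails = 2
bw_per_rail_mb_s = 250        # ~250 MB/s per rail
put_latency_us = 2.5          # 2.5 microsecond put latency

injection_mb_s = rails * bw_per_rail_mb_s        # ~500 MB/s per node
bdp_bytes = bw_per_rail_mb_s * put_latency_us    # MB/s * us = bytes
print(f"~{injection_mb_s} MB/s injection per node; "
      f"~{bdp_bytes:.0f} B in flight per rail at the put latency")
```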

Slide 14: Central Switch Assembly
  • 20 cabinets in the center.
  • Minimizes the maximum internode distance.
  • 3 out of 4 rows shown.
  • 21st LL switch is outside (not shown).

Slide 15: Quadrics Wiring Overhead (view towards ceiling)

Slide 16: Management & Control (Terascale Computing System)
  [Diagram: Compute Nodes, Quadrics, Control LAN]
  • Quadrics switch control: internal SBC & Ethernet.
  • "Insight Manager" on PCs: dedicated systems for cluster/node monitoring & control; RMS database.
  • Ethernet & serial link.

Slide 17: Interactive Nodes (Terascale Computing System)
  [Diagram: Compute Nodes, Quadrics, Control LAN, WAN/LAN, Interactive, /usr]
  • Dedicated: 2 × ES45, plus 8 on compute nodes (shared-function nodes).
  • User access.
  • Gigabit Ethernet to WAN.
  • Quadrics connected.
  • /usr & indexed store (ISMS).

Slide 18: File Servers (Terascale Computing System)
  [Diagram: Compute Nodes, Quadrics, Control LAN, File Servers, /tmp, WAN/LAN, Interactive, /usr]
  • 64 servers, on compute nodes.
  • 0.47 TB per server [30 TB total].
  • ~500 MB/s per server [~32 GB/s aggregate].
  • Temporary user storage; direct I/O; /tmp.
  • [Each server has 24 disks on 8 SCSI chains on 4 controllers to sustain full drive bandwidth.]
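
As on the compute-node slide, the bracketed aggregates are just the per-server figures scaled by 64 servers:

```python
# File-server aggregates implied by the per-server figures above.
servers = 64
capacity_tb = servers * 0.47     # ~30 TB total
bandwidth_gb_s = servers * 0.5   # ~500 MB/s each -> ~32 GB/s aggregate
spindles = servers * 24          # 24 disks per server -> 1536 drives

print(f"{capacity_tb:.0f} TB, ~{bandwidth_gb_s:.0f} GB/s, {spindles} disks")
```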

Slide 19: Terascale Computing System Summary
  [Diagram: Compute Nodes, Quadrics, Control LAN, File Servers, /tmp, WAN/LAN, Interactive, /usr]
  • 750+ ES45 compute nodes.
  • 3000 EV68 CPUs @ 1 GHz; 6 Tf.
  • 3 TB memory.
  • 41 TB node disk, ~90 GB/s.
  • Multi-rail fat-tree network.
  • Redundant monitoring/control.
  • WAN/LAN accessible.
  • File servers: 30 TB, ~32 GB/s.
  • Buffer disk store, ~150 TB.
  • Parallel visualization.
  • Mass store, ~1 TB/hr, >1 PB.
  • ETF coupled (heterogeneous).

Slide 20: Visualization
  [Diagram: TCS, Application Gateways, Viz, Buffer Disk; 340 GB/s (1520Q), 4.5 GB/s (20Q), 3.6 GB/s (16Q)]
  • Intel/Linux; newest software.
  • ~16 nodes.
  • Parallel rendering; HW/SW compositing.
  • Quadrics connected.
  • Image output → Web pages; WAN coupled.

Slide 21: Buffer Disk & HSM
  [Diagram: TCS, Application Gateways, Viz, Buffer Disk; 340 GB/s (1520Q), 4.5 GB/s (20Q), 3.6 GB/s (16Q); HSM (LSCi), >360 MB/s to tape, archive disk, WAN/LAN & SDSC]
  • Quadrics coupled (~225 MB/s/link; see the check below).
  • Intermediate between TCS & HSM.
  • Independently managed.
  • Private transport from TCS.
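
The diagram bandwidths are consistent with the ~225 MB/s/link figure times the quoted link counts (reading "1520Q", "20Q", and "16Q" as numbers of Quadrics links, which is my interpretation of the labels):

```python
# Check: each diagram bandwidth ~ link count x ~225 MB/s per Quadrics link.
mb_per_link = 225
for links in (1520, 20, 16):
    print(f"{links}Q x {mb_per_link} MB/s ~ {links * mb_per_link / 1000:.1f} GB/s")
# -> ~342.0, ~4.5, ~3.6 GB/s, matching the 340 / 4.5 / 3.6 GB/s labels.
```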

Slide 22: Application Gateways
  [Diagram: TCS, Application Gateways, Viz, Buffer Disk; 340 GB/s (1520Q), 4.5 GB/s (20Q), 3.6 GB/s (16Q)]
  • Quadrics coupled (~225 MB/s/link).
  • Coupled to the ETF backbone by multiple GigE links @ 30 Gb/s.
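
For scale, 30 Gb/s of aggregated GigE is about 3.75 GB/s, the same order as the 3.6-4.5 GB/s Quadrics attachments in the diagram (assuming ~1 Gb/s per GigE link and ignoring protocol overhead):

```python
# ETF backbone attachment: multiple GigE links totalling 30 Gb/s.
backbone_gbit_s = 30
backbone_gbyte_s = backbone_gbit_s / 8   # ~3.75 GB/s
print(f"~{backbone_gbyte_s:.2f} GB/s to the ETF backbone, "
      "comparable to the 3.6-4.5 GB/s Quadrics attachments shown")
```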

Slide 23: The Front Row
  • Yes, those are Pittsburgh sports' colors.

