Planned Machines: ASCI Purple, ALC and M&IC MCR Presented to SOS7 Mark Seager 925-423-3141 ICCD ADH for Advanced Technology Lawrence Livermore.

1 Planned Machines: ASCI Purple, ALC and M&IC MCR Presented to SOS7 Mark Seager seager@llnl.gov 925-423-3141 ICCD ADH for Advanced Technology Lawrence Livermore National Laboratory This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.

2 Q1: What is unique in structure and function of your machine?
• Purple’s unique structure is fat SMPs with 16 rails of Federation interconnect
• MCR+ALC’s unique structure is the shared global file system
• However, the most important point is that applications are highly mobile between Purple, MCR+ALC, White, Q and other clusters of SMP systems…

3 Purple’s unique structure is fat SMPs with 16 rails of interconnect
Purple System
• 100 TF/s peak; 30-45 TF/s delivered on sPPM+UMT2000
• 50 TB memory, 2.0 PB of disk @ 108 GB/s delivered
• 197 x 64-way Armada SMPs, each with 16 Federation links
• 4 login/network nodes for login/NFS
• 8x10 Gb/s for parallel FTP on each login node
• All external networking is 1-10 Gb/s Ethernet
• Clustered I/O services for the cluster-wide file system
• Fibre Channel 2 I/O attach does not extend
Programming/Usage Model
• Application launch over all compute nodes, up to 8,192 tasks
• 1 MPI task/CPU and shared memory, full 64-bit support
• Scalable MPI (MPI_allreduce, buffer space)
• Likely usage: multiple MPI tasks/node with 4-16 OpenMP threads per MPI task
• Single STDIO interface
• Parallel I/O to a single file, or multiple serial I/O (1 file/MPI task)
[System diagram: 191 parallel batch/interactive/visualization nodes; system data and control networks; I/O nodes; four NFS/login network nodes; 16 Federation links per SMP in four switch planes; Fibre Channel 2 I/O network]
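The usage model above (1 MPI task per CPU, or several MPI tasks per node with 4-16 OpenMP threads each, up to 8,192 tasks) implies a trade-off between threads per task and the largest job that fits. The sketch below works through that arithmetic. The node size, the 191 batch nodes and the 8,192-task limit come from the slide. The function name `layout` is hypothetical, and two further points are assumptions: that one job can use every batch node, and that the thread count divides the 64 CPUs evenly.

```python
# Hypothetical sketch: largest hybrid MPI/OpenMP job that fits on Purple-style nodes.
# Taken from the slide: 64-way SMP nodes, 191 batch nodes, 8,192-task launch limit.
# Assumed for illustration: one job gets every batch node, and threads/task divides 64.
CPUS_PER_NODE = 64
BATCH_NODES = 191
MAX_TASKS = 8192


def layout(threads_per_task: int) -> tuple:
    """Return (tasks_per_node, nodes_used, total_mpi_tasks)."""
    if threads_per_task < 1 or CPUS_PER_NODE % threads_per_task != 0:
        raise ValueError("threads per task must evenly divide the 64 CPUs of a node")
    tasks_per_node = CPUS_PER_NODE // threads_per_task
    # Use as many nodes as the launch limit allows, but never more than exist.
    nodes_used = min(BATCH_NODES, MAX_TASKS // tasks_per_node)
    return tasks_per_node, nodes_used, nodes_used * tasks_per_node


for threads in (1, 4, 8, 16):
    per_node, nodes, tasks = layout(threads)
    print(f"{threads:2d} threads/task: {per_node:2d} tasks/node x {nodes} nodes = {tasks} MPI tasks")
```

Under these assumptions, one MPI task per CPU reaches the 8,192-task limit on 128 nodes. With 4 or more threads per task, the node count becomes the binding constraint instead.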

4 Unique feature of ALC+MCR is the Lustre Lite shared file system†
† Cluster-wide file system leverages DOE/NNSA ASCI PathForward Open Source Lustre development
[System diagram: two QsNet Elan3 clusters (100BaseT control network) sharing a single Lustre file system. One cluster has 1,116 P4 compute nodes on a 1,152-port QsNet Elan3 switch (10x96D32U+4x96D32U); the other has 924 P4 compute nodes on a 960-port QsNet Elan3 switch (10x96D32U+4x80D48U). Also shown: 2 login nodes with 4 Gb-Enet, 2 service nodes (shown on each cluster), 2 metadata (MDS, fail-over) servers, 32 gateway (GW) nodes delivering 140 MB/s of Lustre I/O over 2x1GbE, and aggregated OSTs forming a single Lustre file system behind GbEnet federated switches]
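The gateway figures on this slide support a quick back-of-the-envelope check of the aggregate Lustre I/O rate. This is a sketch only. It assumes that the 140 MB/s figure is per gateway node, that the 32 gateways are not otherwise bottlenecked, and that "raw" means the 1 Gb/s line rate in decimal megabytes with protocol overhead ignored.

```python
# Illustrative arithmetic for the Lustre gateway tier on the slide:
# 32 gateway nodes, each assumed to deliver about 140 MB/s over 2x1GbE.
GATEWAYS = 32
DELIVERED_MB_S = 140  # per gateway (assumption: the slide's figure is per node)
LINKS_PER_GW = 2
GBE_MB_S = 1000 / 8  # raw 1 Gb/s line rate in MB/s (decimal units), ignoring protocol overhead

aggregate_mb_s = GATEWAYS * DELIVERED_MB_S     # total delivered through all gateways
raw_per_gw_mb_s = LINKS_PER_GW * GBE_MB_S      # raw capacity of 2x1GbE per gateway
efficiency = DELIVERED_MB_S / raw_per_gw_mb_s  # fraction of raw line rate actually delivered

print(f"aggregate: {aggregate_mb_s} MB/s (~{aggregate_mb_s / 1000:.2f} GB/s)")
print(f"per-gateway efficiency vs. raw 2x1GbE: {efficiency:.0%}")
```

Under those assumptions, the gateway tier moves roughly 4.5 GB/s in aggregate, with each gateway running at about 56% of its raw Ethernet capacity.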

5 Q2: What characterizes your applications? Examples are: intensities of message passing, memory utilization, computing, I/O, and data.
• Applications are characterized as multi-physics package simulations
• All applications are compute/communications intensive
• Each package pushes the performance envelope along a different dimension
– Some packages are MPI latency dominated
– Some packages are MPI bandwidth dominated
– Memory bandwidth is a critical factor, but expensive memory subsystems don’t perform much better than commodity ones…
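One common way to make the latency-dominated versus bandwidth-dominated distinction concrete is the latency-bandwidth ("alpha-beta" or Hockney) model. It models message time as T(n) = alpha + n/beta, where n is the message size in bytes. The slide does not name this model, so the sketch below applies it as an outside tool. The alpha and beta values are hypothetical placeholders, not measurements of any LLNL interconnect. Messages smaller than alpha*beta bytes spend more time in latency than in transfer.

```python
# Latency-bandwidth (alpha-beta / Hockney) model: T(n) = alpha + n / beta.
# alpha (seconds) and beta (bytes/second) are hypothetical placeholders,
# not measured values for any LLNL interconnect.


def message_time(n_bytes: float, alpha: float, beta: float) -> float:
    """Modeled time to send one message of n_bytes."""
    return alpha + n_bytes / beta


def crossover_bytes(alpha: float, beta: float) -> float:
    """Message size at which the latency and bandwidth terms are equal."""
    return alpha * beta


def regime(n_bytes: float, alpha: float, beta: float) -> str:
    """Classify a message size as latency- or bandwidth-dominated under the model."""
    if n_bytes < crossover_bytes(alpha, beta):
        return "latency-dominated"
    return "bandwidth-dominated"


ALPHA = 5e-6  # hypothetical 5 microsecond per-message latency
BETA = 300e6  # hypothetical 300 MB/s point-to-point bandwidth

for n in (8, 1_000, 1_000_000):
    print(n, regime(n, ALPHA, BETA), message_time(n, ALPHA, BETA))
```

Under this model, a package that mostly exchanges small messages (for example, the scalars in reductions) gains more from lower latency. A package that mostly exchanges large halo buffers gains more from higher bandwidth.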

6 Q3: What prior experience guided you to this choice?
• Mission and applications
• Budgets
• Politics
• Delivered performance
• Balanced risk and cost performance

7 Strategic Approach: straddle multiple curves to balance risk and opportunity of new disruptive technologies
• Three complementary curves…
1. Delivers to today’s stockpile’s demanding needs
– Production environment
– For “must have” deliverables now
2. Delivers transition for next generation
– “Near production” environment
– Provides cycles for science
– Provides cycles for stockpile
– Leading to next-generation production systems
– These are the capacity systems in a strategic capacity/capability mix
3. Delivers affordable path to petaFLOP/s
– Research environment, leading transition to petaflop systems?
– Are there other paths to a breakthrough regime by 2006-7?
[Figure: performance vs. time from Today to FY05, showing technology curves for Mainframes (RIP); vendor-integrated SMP clusters (IBM SP, HP SC); IA32/IA64/AMD + Linux clusters; and cell-based systems (IBM BG/L). Cost annotations: $10M/TF (White), $7M/TF (Q), $2M/TF (Purple C), $1.2M/TF (MCR), $500K/TF, $170K/TF. Caption: “Straddle strategy for stability and preeminence.” Note: any given technology curve is ultimately limited by Moore’s Law.]
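The $/TF annotations in the figure show how quickly price-performance falls from one technology curve to the next. A simple division makes the gap concrete. The sketch below uses only the four figures the slide attributes to named systems. The $500K/TF and $170K/TF figures are not tied to a specific system on the slide, so they are left out. The $10M budget is a hypothetical round number chosen for illustration.

```python
# $/TF figures as quoted on the slide for named systems.
COST_PER_TF = {
    "White": 10_000_000,
    "Q": 7_000_000,
    "Purple C": 2_000_000,
    "MCR": 1_200_000,
}


def tf_for_budget(budget_dollars: float, dollars_per_tf: float) -> float:
    """Teraflop/s purchasable for a given budget at a given price point."""
    return budget_dollars / dollars_per_tf


BUDGET = 10_000_000  # hypothetical $10M budget, for illustration only

# List systems from most to least expensive per TF.
for name, price in sorted(COST_PER_TF.items(), key=lambda kv: -kv[1]):
    print(f"{name:>8}: {tf_for_budget(BUDGET, price):5.2f} TF/s for $10M")
```

At the quoted price points, the same hypothetical budget buys roughly 1 TF/s at White’s rate and a little over 8 TF/s at MCR’s rate.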

8 Q4. Other than your own machine, for your needs what are the best and worst machines? And, why?
• Clusters of SMPs with a full node OS make system administration and programming much easier, but scalability is an issue
• Vectors suck
– 10x potential speed-up from vectorization on Cray Y-MP class machines yielded only a 1.5-2x delivered performance boost on stockpile codes
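The slide does not explain the gap between the 10x potential speed-up and the 1.5-2x delivered speed-up. One standard way to read it is Amdahl’s law, S = 1 / ((1 - f) + f/s), where f is the fraction of runtime that is vectorized and s = 10 is the speed-up on that fraction. This interpretation is an assumption, not the presenter’s stated analysis. Inverting the formula estimates how much of each code must have been vectorized to produce the observed boost.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of runtime is accelerated by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


def fraction_needed(overall: float, s: float) -> float:
    """Invert Amdahl's law: fraction f that must be sped up by s to reach `overall`."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


# Assumption: vectorized regions ran 10x faster and everything else was unchanged.
for target in (1.5, 2.0):
    f = fraction_needed(target, 10.0)
    print(f"{target}x overall from a 10x vector speedup implies f = {f:.2f}")
```

Under this reading, a 1.5-2x boost corresponds to only about 37-56% of runtime being vectorized. The remaining scalar work caps the delivered gain regardless of vector hardware speed.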

