Accelerated Strategic Computing Initiative (ASCI)
A large, complex, multifaceted, highly integrated research and development effort created by the US Department of Energy (DOE)
Goal: deploy a 1 TFLOPS system by the end of 1996, a 10 TFLOPS system by 1999, and a 100 TFLOPS system by 2002, all designed at similar cost
First phase: built by Intel, also known as ASCI Option Red or the Intel TFLOPS supercomputer
Physical Features
Occupies 1,600 sq. ft of floor space (excluding the network resources, tertiary storage, and other supporting hardware)
Uses 9,216 Intel Pentium Pro processors in over 4,500 nodes
596 Gbytes of RAM; nodes connected through a 38x32x2 mesh
Two independent 1 Tbyte disk systems
Disks can be switched so the machine can be used for both classified and unclassified computing
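The figures on this slide can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes every node holds two processors (stated for Eagle nodes later in the deck, assumed here for all nodes) and that RAM is spread evenly across nodes. The names are hypothetical.

```python
# Figures quoted on the slide.
MESH_DIMS = (38, 32, 2)   # 38 x 32 x 2 mesh
PROCESSORS = 9216         # Intel Pentium Pro processors
RAM_GBYTES = 596          # total RAM

# Assumption (not stated for every node type): two processors per node.
PROCESSORS_PER_NODE = 2


def mesh_positions(dims):
    """Number of positions in a mesh with the given dimensions."""
    total = 1
    for d in dims:
        total *= d
    return total


positions = mesh_positions(MESH_DIMS)
nodes = PROCESSORS // PROCESSORS_PER_NODE
# Assumption: RAM is divided evenly across nodes.
ram_per_node_mbytes = RAM_GBYTES * 1024 / nodes

print(f"mesh positions: {positions}")
print(f"nodes implied by processor count: {nodes}")
print(f"average RAM per node: {ram_per_node_mbytes:.0f} Mbytes")
```

Under these assumptions the processor count implies 4,608 nodes, which agrees with the slide's "over 4,500 nodes".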
ASCI TOPS Hardware
Massively parallel processor (MPP): a distributed-memory, MIMD, message-passing supercomputer
All aspects of the system are scalable, including aggregate communication bandwidth, the number of compute nodes, the amount of main memory, disk storage, and I/O bandwidth
Organized into four partitions: compute, service, system, and I/O
Partitions
Service partition - provides an integrated, scalable host that supports interactive users, application development, and system administration
I/O partition - supports the scalable file system and network services
System partition - supports system Reliability, Availability, and Serviceability (RAS) services
Compute partition - contains the nodes optimized for floating-point performance and is where parallel applications execute
SYSTEM BLOCK DIAGRAM
Logical system block diagram for the ASCI Option Red supercomputer. The system uses a split-plane mesh topology and has four partitions: System, Service, I/O, and Compute. Two kinds of node boards are used: the Eagle board and the Kestrel board. The operator's console (the SPS station) is connected to an independent Ethernet network that ties together patch support boards on each card cage.
ASCI SYSTEM PLAN
Schematic diagram of the ASCI Option Red supercomputer as it will be installed at Sandia National Laboratories in Albuquerque, NM. The cabinets near each end labeled with an X are the disconnect cabinets used to isolate one end or the other. Each end of the computer has its own I/O subsystem (the group of 5 cabinets at the bottom and the left) and its own SPS station (next to the I/O cabinets). The lines show the SCSI cables connecting the I/O nodes to the I/O cabinets. The curved line at the top of the page shows the windowed wall of the room where the machine operators will sit. The black square in the middle of the room is a support post.
PENTIUM PRO PROCESSOR
Combines CISC and RISC: x86 (CISC) instructions are translated internally into RISC-like micro-operations
Peak floating-point rate of 200 MFLOPS at 200 MHz
Peak multiply rate of 100 MFLOPS at 200 MHz
Includes separate on-chip L1 data and instruction caches (8 Kbytes each) and an L2 cache (256 Kbytes)
EAGLE BOARD
Eagle boards are the node boards used in the I/O and system partitions
Each node includes two 200 MHz Pentium Pro processors
The two processors support two on-board PCI interfaces, each providing 133 MB/sec of I/O bandwidth
Each Eagle board provides enough processing capacity and throughput to support a wide variety of high-performance I/O devices
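The 133 MB/sec figure matches the peak rate of a conventional 32-bit, 33 MHz PCI bus. The sketch below shows that arithmetic and the combined figure for both interfaces. The bus width and clock are assumptions, since the slide gives only the bandwidth, and the result is a peak rather than a sustained rate.

```python
# From the slide: two PCI interfaces per Eagle node, 133 MB/sec each.
PCI_INTERFACES_PER_NODE = 2
QUOTED_MB_PER_SEC = 133

# Assumptions (not stated on the slide): 32-bit bus at 33.33 MHz.
PCI_BUS_WIDTH_BYTES = 4
PCI_CLOCK_MHZ = 33.33


def pci_peak_mb_per_sec(width_bytes, clock_mhz):
    """Peak PCI transfer rate: one bus-width transfer per clock cycle."""
    return width_bytes * clock_mhz


derived = pci_peak_mb_per_sec(PCI_BUS_WIDTH_BYTES, PCI_CLOCK_MHZ)
aggregate = PCI_INTERFACES_PER_NODE * QUOTED_MB_PER_SEC

print(f"derived per-interface peak: {derived:.1f} MB/sec")
print(f"aggregate peak per node: {aggregate} MB/sec")
```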
EAGLE BOARD
The ASCI Option Red supercomputer I/O node (Eagle board). The NIC connects to the MRC on the backplane through the ICF link.
KESTREL BOARD
Kestrel boards are used in the compute and service partitions
Each Kestrel board holds two compute nodes
The nodes are connected through their network interface chips (NICs), with one of the NICs connecting to a mesh router chip (MRC) on the backplane
Each node on the Kestrel board includes its own boot support (flash ROM and simple I/O devices) through a PCI bridge on its local bus
KESTREL BOARD
The ASCI Option Red supercomputer Kestrel board. This board includes two compute nodes daisy-chained together through their NICs. One of the NICs connects to the MRC on the backplane through the ICF link.
INTERCONNECTION FACILITY
The interconnection facility uses a dual-plane mesh to provide better aggregate bandwidth and to support routing around mesh failures
It uses two custom components, the NIC and the MRC
Mesh Router Chip (MRC) - sits on the system backplane and routes messages across the machine
Network Interface Chip (NIC) - resides on each node and provides an interface between the node's memory bus and the MRC
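To illustrate how a second mesh plane can help route around a failed router, here is a minimal sketch. It is not the MRC's actual routing algorithm, which the slides do not describe. It assumes simple dimension-order (X then Y) routing within a plane and assumes a message can be sent on either plane. All function names are hypothetical.

```python
def xy_route(src, dst):
    """Dimension-order path from src to dst: move along X first, then Y."""
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] >= x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    step = 1 if dst[1] >= y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


def route_with_failover(src, dst, failed):
    """Return (plane, path) for the first plane whose path avoids failures.

    `failed` is a set of (x, y, plane) router positions that are down.
    Assumption: a message may be injected into either plane.
    Returns None if the path is blocked on both planes.
    """
    path = xy_route(src, dst)
    for plane in (0, 1):
        if all((x, y, plane) not in failed for x, y in path):
            return plane, path
    return None


# A failed router at (1, 0) on plane 0 pushes the message onto plane 1.
print(route_with_failover((0, 0), (2, 1), failed={(1, 0, 0)}))
```

A real router would also need to handle failures on both planes, for example by taking a different path within a plane. This sketch only reports that no route was found.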
INTERCONNECTION FACILITY
ASCI Option Red supercomputer two-plane Interconnection Facility (ICF). The red squares on each node board are the Network Interface Chips (NICs), while the black squares on the dual backplanes are the Mesh Router Chips (MRCs). Bandwidth figures are given for NIC-MRC and MRC-MRC communication. Bidirectional bandwidths are given on the left side of the figure, while unidirectional bandwidths are given on the right side. In both cases, sustainable (as opposed to peak) numbers are given in parentheses.
OPERATING SYSTEM
Uses two different operating systems for different parts of the machine
The service, I/O, and system partitions run Intel's distributed version of UNIX, developed for the Paragon XP/S supercomputer
The compute partition runs Cougar, a lightweight kernel (LWK)
Cougar is based on the Puma operating system developed at Sandia National Laboratories and the University of New Mexico
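The partition, board, and operating system assignments described across these slides can be summarized in a small lookup table. This is only a sketch for orientation: the class, field, and key names are hypothetical, and the OS labels are shorthand for the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    operating_system: str
    node_board: str


# Assignments as stated on the slides; labels are shorthand.
PARTITIONS = {
    p.name: p
    for p in (
        Partition("compute", "Cougar", "Kestrel"),
        Partition("service", "UNIX", "Kestrel"),
        Partition("io", "UNIX", "Eagle"),
        Partition("system", "UNIX", "Eagle"),
    )
}


def partitions_running(os_name):
    """Names of the partitions that run the given operating system."""
    return sorted(
        p.name for p in PARTITIONS.values() if p.operating_system == os_name
    )


print(partitions_running("UNIX"))
print(partitions_running("Cougar"))
```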
APPLICATIONS
Provides computational and simulation capabilities that help scientists understand aging weapons, predict when components will have to be replaced, and evaluate the implications of changes in materials and fabrication processes
Achieves higher-resolution, higher-fidelity, three-dimensional physics and full-system modeling capabilities to reduce reliance on empirical judgments
ASCI Option Red / Intel TFLOPS supercomputer: Performance Goals
Deliver a sustained TFLOPS on MP-LINPACK before the end of 1996
Run a yet-to-be-defined ASCI application using all memory and all nodes by June 1997
PERFORMANCE
Set the MP-LINPACK record at 1.06 TFLOPS using just 7,624 Pentium Pro processors
In June 1997, when the full machine was installed, it broke its own record by running MP-LINPACK at 1.34 TFLOPS
The system has a peak performance of 1.8 TFLOPS
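The 1.8 TFLOPS peak follows from the per-processor peak quoted on the Pentium Pro slide: 9,216 processors at 200 MFLOPS each. The sketch below reproduces that figure and estimates the fraction of peak each MP-LINPACK run achieved. It assumes the June 1997 run used all 9,216 processors, which the slide implies but does not state.

```python
# Figures quoted on the slides.
PEAK_MFLOPS_PER_PROCESSOR = 200   # Pentium Pro at 200 MHz
TOTAL_PROCESSORS = 9216
PARTIAL_RUN_PROCESSORS = 7624     # as quoted for the 1.06 TFLOPS run


def peak_tflops(processors):
    """Theoretical peak in TFLOPS for a given processor count."""
    return processors * PEAK_MFLOPS_PER_PROCESSOR / 1_000_000


def efficiency(measured_tflops, processors):
    """Fraction of theoretical peak achieved by a measured result."""
    return measured_tflops / peak_tflops(processors)


full_peak = peak_tflops(TOTAL_PROCESSORS)
# Assumption: the 1.34 TFLOPS run used all 9,216 processors.
full_eff = efficiency(1.34, TOTAL_PROCESSORS)
partial_eff = efficiency(1.06, PARTIAL_RUN_PROCESSORS)

print(f"full-machine peak: {full_peak:.2f} TFLOPS")
print(f"full-machine MP-LINPACK efficiency: {full_eff:.1%}")
print(f"partial-machine MP-LINPACK efficiency: {partial_eff:.1%}")
```

With these inputs the computed peak is about 1.84 TFLOPS, in line with the quoted 1.8 TFLOPS, and both runs come out at roughly 70% of peak.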
Noteworthy for several reasons
World's first TOPS supercomputer
I/O, memory, compute nodes, and communication are scalable to an extreme degree
Standard parallel interfaces make it simple to port parallel applications to this system
Uses two operating systems to make the computer familiar to the user (UNIX) and non-intrusive for the scalable application (Cougar)
Makes use of Commercial Commodity Off The Shelf (CCOTS) technology to maintain affordability
ON-GOING COMPUTING ELEMENT
ASCI Blue Pacific supercomputer
ASCI Blue Mountain supercomputer
ASCI White supercomputer (IBM)
ASCI Q
ASCI Purple (IBM)