An Introduction to the TeraGrid
Jeffrey P. Gardner, Pittsburgh Supercomputing Center

2 National Science Foundation TeraGrid
The world's largest collection of supercomputers

3 Pittsburgh Supercomputing Center
Founded in 1986. Joint venture between Carnegie Mellon University, the University of Pittsburgh, and Westinghouse Electric Co. Funded by several federal agencies as well as private industries. Main source of support is the National Science Foundation.

4 Pittsburgh Supercomputing Center
PSC is the third-largest NSF-sponsored supercomputing center, BUT we provide over 60% of the computer time used by NSF research, AND PSC most recently had the most powerful supercomputer in the world (for unclassified research).

5 Pittsburgh Supercomputing Center
The Terascale Computing System (TCS) at the Pittsburgh Supercomputing Center
SCALE: 3000 processors
SIZE: 1 basketball court
COMPUTING POWER: 6 TeraFlops (6 trillion floating point operations per second)
Will do in 3 hours what a PC will do in a year.
Upon entering production in October 2001, the TCS was the most powerful computer in the world for unclassified research.
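The "3 hours versus a year" comparison can be checked with back-of-the-envelope arithmetic. The sketch below assumes a 365-day year and uses the 6 TeraFlops figure quoted for the TCS; the PC speed it prints is only what the comparison implies, not a number given in the slides.

```python
# Rough check of "TCS does in 3 hours what a PC does in a year".
# Assumptions: a 365-day year, and the 6 TeraFlops rate quoted for the TCS.
HOURS_PER_YEAR = 365 * 24   # 8760 hours
TCS_FLOPS = 6e12            # 6 trillion floating point operations per second
TCS_HOURS = 3

# How many times faster the TCS must be for the comparison to hold.
speedup = HOURS_PER_YEAR / TCS_HOURS

# PC rate implied by that speedup (derived, not stated in the slides).
implied_pc_flops = TCS_FLOPS / speedup

print(f"speedup: {speedup:.0f}x")
print(f"implied PC rate: {implied_pc_flops / 1e9:.2f} GigaFlops")
```

An implied PC rate of a few GigaFlops is plausible for a desktop machine of that period, so the slide's comparison is at least self-consistent.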

6 Pittsburgh Supercomputing Center
The Terascale Computing System (TCS) at the Pittsburgh Supercomputing Center
HEAT GENERATED: 2.5 million BTUs (169 lbs of coal per hour)
AIR CONDITIONING: 900 gallons of water per minute (375 room air conditioners)
BOOT TIME: ~3 hours

7 Pittsburgh Supercomputing Center
Boulder, CO

8 NCSA: National Center for Supercomputing Applications
The TeraGrid cluster "Mercury" at NCSA
SCALE: 1774 processors
ARCHITECTURE: Intel Itanium2
COMPUTING POWER: 10 TeraFlops

9 TACC: Texas Advanced Computing Center
The TeraGrid cluster "LoneStar" at TACC
SCALE: 1024 processors
ARCHITECTURE: Intel Xeon
COMPUTING POWER: 6 TeraFlops

10 Before the TeraGrid: Supercomputing “The Old Fashioned way”
Each supercomputer center was its own independent entity. Users applied for time at a specific supercomputer center. Each center supplied its own compute resources, archival resources, accounting, and user support.

11 The TeraGrid Strategy
Creating a unified user environment…
  Single user support resources
  Single authentication point
  Common software functionality
  Common job management infrastructure
  Globally-accessible data storage
…across heterogeneous resources
  7+ computing architectures
  5+ visualization resources
  diverse storage technologies
Create a unified national HPC infrastructure that is both heterogeneous and extensible

12 The TeraGrid Strategy
A major paradigm shift for HPC resource providers
Make NSF resources useful to a wider community
TeraGrid Resource Partners
  Strength through uniformity!
  Strength through diversity!

13 TeraGrid Components
Compute hardware
  Intel/Linux clusters
  Alpha SMP clusters
  IBM POWER3 and POWER4 clusters
  SGI Altix SMPs
  SUN visualization systems
  Cray XT3 (PSC July 20)
  IBM Blue Gene/L (SDSC Oct 1)

14 TeraGrid Components
Large-scale storage systems
  hundreds of terabytes for secondary storage
Very high-speed network backbone (40Gb/s)
  bandwidth for rich interaction and tight coupling
Grid middleware
  Globus, data management, …
Next-generation applications

15 Building a System of Unprecedented Scale
40+ teraflops compute
1+ petabyte online storage
10-40Gb/s networking

16 TeraGrid Resources
Sites: ANL/UC, Caltech CACR, IU, NCSA, ORNL, PSC, Purdue, SDSC, TACC
Compute Resources: Itanium2 (0.5 TF) IA-32 (0.8 TF) (0.2 TF) (2.0 TF) (10 TF) SGI SMP (6.5 TF) (0.3 TF) XT3 TCS (6 TF) Marvel Hetero (1.7 TF) (4.4 TF) Power4 (1.1 TF) (6.3 TF) Sun (Vis)
Online Storage: 20 TB, 155 TB, 32 TB, 600 TB, 1 TB, 150 TB, 540 TB, 50 TB
Mass Storage: 1.2 PB, 3 PB, 2.4 PB, 6 PB, 2 PB
Data Collections: Yes
Visualization Instruments
Network (Gb/s, Hub): 30 CHI LA 10 ATL
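As a rough cross-check, the figures listed for slide 16 can be totalled and compared with the "40+ teraflops" and "1+ petabyte" headline numbers on slide 15. The sketch below copies the values in the order they appear and assumes 1 PB = 1000 TB. The XT3 entry appears to carry no TF figure in the listing, so it is left out of the compute total.

```python
# Values copied in order from the slide 16 listing.
compute_tf = [0.5, 0.8, 0.2, 2.0, 10, 6.5, 0.3, 6, 1.7, 4.4, 1.1, 6.3]
online_tb = [20, 155, 32, 600, 1, 150, 540, 50]
mass_pb = [1.2, 3, 2.4, 6, 2]

total_tf = sum(compute_tf)               # XT3 not included: no figure listed
total_online_pb = sum(online_tb) / 1000  # assumption: 1 PB = 1000 TB
total_mass_pb = sum(mass_pb)

print(f"compute:        {total_tf:.1f} TF")
print(f"online storage: {total_online_pb:.2f} PB")
print(f"mass storage:   {total_mass_pb:.1f} PB")
```

The listed compute figures sum to just under 40 TF, so the "40+" headline is consistent once the XT3 is counted, and the online storage total is above one petabyte.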

17 “Grid-Like” Usage Scenarios Currently Enabled by the TeraGrid
"Traditional" massively parallel jobs
  Tightly-coupled interprocessor communication
  storing vast amounts of data remotely
  remote visualization
Thousands of independent jobs
  Automatically scheduled amongst many TeraGrid machines
  Use data from a distributed data collection
Multi-site parallel jobs
  Compute upon many TeraGrid sites simultaneously
TeraGrid is working to enable more!

18 Allocations Policies
Any US researcher can request an allocation
Policies/procedures posted at:
Online proposal submission

19 Allocations Policies
Different levels of review for different size allocations
DAC: "Development Allocation Committee"
  up to 30,000 Service Units ("SUs", 1 SU ≈ 1 CPU hour)
  only a one-paragraph abstract required
  must focus on developing an MRAC or NRAC application
  accepted continuously!
MRAC: "Medium Resource Allocation Committee"
  <200,000 SUs/year
  reviewed every 3 months
  next deadline July 15, 2005 (then October 21)
NRAC: "National Resource Allocation Committee"
  >200,000 SUs/year
  reviewed every 6 months
  next deadline July 15, 2005 (then January 2006)
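The size thresholds above can be read as a simple lookup from requested SUs to review committee. The sketch below is illustrative only: the function name is hypothetical, it picks a committee by request size alone, and it assumes a request of exactly 200,000 SUs goes to NRAC, since the slide only gives "<200,000" and ">200,000".

```python
def review_committee(requested_sus: int) -> str:
    """Return the committee that would review a request of this size.

    Assumptions (not from the slides): size is the only criterion, and a
    request of exactly 200,000 SUs is treated as NRAC.
    """
    if requested_sus <= 0:
        raise ValueError("requested_sus must be positive")
    if requested_sus <= 30_000:
        return "DAC"    # development allocation, accepted continuously
    if requested_sus < 200_000:
        return "MRAC"   # reviewed every 3 months
    return "NRAC"       # reviewed every 6 months


for sus in (10_000, 150_000, 1_000_000):
    print(sus, review_committee(sus))
```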

20 Accounts and Account Management
Once a project is approved, the PI can add any number of users by filling out a simple online form
User account creation usually takes 2-3 weeks
TG accounts are created on ALL TG systems for every user
  a single US mail packet arrives for the user
  accounts and usage are synched through a centralized database

21 Roaming and Specific Allocations
R-Type: "roaming" allocations
  can be used on any TG resource
  usage debited to a single (global) allocation of resource maintained in a central database
S-Type: "specific" allocations
  can only be used on the specified resource
  (All S-only awards come with 30,000 roaming SUs to encourage roaming usage of TG)
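The difference between the two allocation types can be sketched as a small accounting model. Everything here is hypothetical: the class, the method, and the resource labels are made up for illustration, and the rule that a job draws on the matching specific balance before the roaming one is an assumption, not something the slides state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Project:
    # Roaming SUs: one global balance, usable on any TG resource.
    roaming_sus: float = 0.0
    # Specific SUs: separate balances, each tied to one named resource.
    specific_sus: Dict[str, float] = field(default_factory=dict)

    def charge(self, resource: str, sus: float) -> None:
        """Debit a job's usage (assumed order: specific first, then roaming)."""
        from_specific = min(sus, self.specific_sus.get(resource, 0.0))
        remainder = sus - from_specific
        if remainder > self.roaming_sus:
            raise ValueError("insufficient allocation")
        if from_specific:
            self.specific_sus[resource] -= from_specific
        self.roaming_sus -= remainder


# An S-type award on one machine plus the 30,000 roaming SUs that come with it.
project = Project(roaming_sus=30_000, specific_sus={"PSC TCS": 100_000})
project.charge("PSC TCS", 1_000)     # covered by the specific balance
project.charge("NCSA Mercury", 500)  # no specific balance there, so it roams
print(project)
```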

22 Useful links
TeraGrid website
Policies/procedures posted at:
TeraGrid user information overview
Summary of TG Resources
Summary of machines with links to site-specific user guides (just click on the name of each site)

