1 PDSF and the Alvarez Clusters
Presented by Shane Canon, NERSC/PDSF (canon@nersc.gov)

2 NERSC Hardware
National Energy Research Scientific Computing Center (http://www.nersc.gov): one of the nation's top unclassified computing resources, funded by the DOE for over 25 years with the mission of providing computing and network services for research. NERSC is located at Lawrence Berkeley Laboratory in Berkeley, CA (http://www.lbl.gov).
High Performance Computing Resources (http://hpcf.nersc.gov):
- IBM SP cluster: 2000+ processors, 1.2+ TB RAM, 20+ TB cluster filesystem
- Cray T3E: 692 processors, 177 GB RAM
- Cray PVP: 64 processors, 3 GW RAM
- PDSF: 160 compute nodes, 281 processors, 7.5 TB disk space
- HPSS: 6 StorageTek silos, 880 TB of near-line and offline storage, soon to be expanded to a full petabyte

3 NERSC Facilities
New Oakland Scientific Facility:
- 20,000 sq. ft. data center
- 24x7 operations team
- OC48 (2.5 Gbit/sec) connection to LBL/ESnet
- Options on a 24,000 sq. ft. expansion

4 NERSC Internet Access
ESnet headquarters (http://www.es.net/):
- Provides leading-edge networking to DOE researchers
- Backbone has an OC12 (622 Mbit/sec) connection to CERN
- Backbone connects key DOE sites
- Headquartered at Lawrence Berkeley; the location assures prompt response

5 Cluster Design
- Embarrassingly parallel workloads
- Commodity networking
- Commodity parts
- Buy "at the knee" of the price/performance curve (see the toy sketch after this list)
- No modeling
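As a toy illustration of the "buy at the knee" guideline, the sketch below walks a price/performance table and stops at the last option whose marginal cost per unit of performance is still acceptable. Nothing in it reflects actual PDSF procurement; the clock speeds, prices, and budget threshold are invented.

    # Toy "buy at the knee" helper: hypothetical prices and clock speeds only.
    # Walk the options in increasing performance and stop when the marginal
    # dollars per extra MHz exceed a budget threshold.
    options = [          # (clock in MHz, price in USD), sorted by clock
        (700,  180),
        (866,  240),
        (933,  330),
        (1000, 520),     # well past the knee: small speedup, large price jump
    ]

    def knee(opts, max_dollars_per_mhz=1.0):
        """Return the last option whose marginal cost per extra MHz stays under budget."""
        best = opts[0]
        for prev, cur in zip(opts, opts[1:]):
            extra_perf = cur[0] - prev[0]
            extra_cost = cur[1] - prev[1]
            if extra_cost / extra_perf > max_dollars_per_mhz:
                break                    # marginal $/MHz too high: stop at 'best'
            best = cur
        return best

    print(knee(options))                 # -> (866, 240) with these made-up numbers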

6 Issues with Cluster Configuration
- Maintaining consistency
- Scalability
  - System
  - Human
- Adaptability/flexibility
- Community tools

7 Cluster Configuration: Present
Installation:
- Home-grown (nfsroot/tar image)
Configuration management:
- Rsync/RPM (a minimal rsync sketch follows this slide)
- Cfengine
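A minimal sketch of the rsync-based consistency push named above. The node names, paths, and the choice of wrapping rsync from Python are illustrative assumptions; the slide only names rsync/RPM and cfengine, and the actual PDSF scripts are not shown here.

    #!/usr/bin/env python
    # Hedged sketch: mirror a reference configuration tree onto every compute
    # node with rsync over ssh. Hostnames and paths are hypothetical, not the
    # actual PDSF layout; the site also used RPM and cfengine, not shown here.
    import subprocess

    NODES = [f"pdsf{i:03d}" for i in range(1, 5)]    # hypothetical node names
    REF_TREE = "/admin/config/etc/"                  # hypothetical reference copy

    def push_config(node):
        """Push the reference /etc fragments to one node, deleting strays."""
        cmd = ["rsync", "-a", "--delete", "-e", "ssh",
               REF_TREE, f"root@{node}:/etc/"]
        return subprocess.call(cmd)                  # 0 means the node is in sync

    if __name__ == "__main__":
        bad = [n for n in NODES if push_config(n) != 0]
        if bad:
            print("out-of-sync or unreachable nodes: " + ", ".join(bad))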

8 Cluster Configuration: Future
Installation:
- Kickstart (or SystemImager/SystemInstaller)
Configuration management:
- RPM
- Cfengine
- Database (a sketch of database-driven install fragments follows this slide)
Resource management:
- Integrate with configuration management
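The future plan above only names its pieces (kickstart installs, a configuration database, cfengine, RPM). As one possible reading of how a node database could drive installation, the sketch below renders a per-node kickstart fragment from a small in-memory table; the table contents, package lists, and exact kickstart directives are assumptions for illustration, and directive syntax varies by kickstart version.

    # Hedged sketch: derive a per-node kickstart fragment from a "node database",
    # so installation and configuration management share one source of truth.
    # The database is just a dict here, and the kickstart directives are a
    # minimal illustrative subset; nothing below is the actual PDSF tooling.
    NODE_DB = {
        "pdsf001": {"ip": "10.0.0.1", "role": "compute"},   # hypothetical entries
        "pdsf002": {"ip": "10.0.0.2", "role": "compute"},
    }

    ROLE_PACKAGES = {"compute": ["cfengine", "rsync"]}       # illustrative mapping

    def kickstart_fragment(host):
        """Return a minimal kickstart snippet for one host."""
        rec = NODE_DB[host]
        lines = [
            f"network --bootproto=static --ip={rec['ip']} --hostname={host}",
            "%packages",
            *ROLE_PACKAGES[rec["role"]],
            "%end",   # terminator used by newer kickstart versions
        ]
        return "\n".join(lines)

    print(kickstart_fragment("pdsf001"))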

9 NERSC Staff
NERSC and LBL have dedicated, experienced staff in the fields of high performance computing, Grid computing, and mass storage.
Researchers:
- Will Johnston, Head of Distributed Systems Dept., Grid researcher (http://www.itg.lbl.gov/); project manager for the NASA Information Power Grid (http://www.nas.nasa.gov/IPG)
- Arie Shoshani, Head of Scientific Data Management (http://gizmo.lbl.gov/DM.html); researches mass storage issues related to scientific computing
- Doug Olson, project coordinator for the Particle Physics Data Grid (http://www.ppdg.net/); coordinator for STAR computing at PDSF
- Dave Quarrie, Chief Software Architect, ATLAS (http://www.nersc.gov/aboutnersc/bios/henpbios.html)
- Craig Tull, offline software framework/control; coordinator for ATLAS computing at PDSF
NERSC High Performance Computing Department (http://www.nersc.gov/aboutnersc/hpcd.html):
- Advanced Systems Group evaluates and vets hardware/software for production computing (4 FTE)
- Computing Systems Group manages the infrastructure for computing (9 FTE)
- Computer Operations & Support provides 24x7x365 support (14 FTE)
- Networking and Security Group provides networking and security (3 FTE)
- Mass Storage Group manages the near-line and off-line storage facilities (5 FTE)

10 PDSF & STAR
PDSF has been working with the STAR collaboration since 1998 (http://www.star.bnl.gov/).
- Data collection occurs at Brookhaven, and DSTs are sent to NERSC
- PDSF is the primary offsite computing facility for STAR
- The collaboration carries out DST analysis and simulations at PDSF
- STAR has 37 collaborating institutions (too many for arrows!)

11 PDSF Philosophy
PDSF is a Linux cluster built from commodity hardware and open source software (http://pdsf.nersc.gov).
- Our mission is to provide the most effective distributed computer cluster possible that is suitable for experimental HENP applications
- The PDSF acronym came from the SSC lab in 1995, along with the original equipment
- Architecture tuned for "embarrassingly parallel" applications
- Uses LSF 4.1 for batch scheduling (a bsub submission sketch follows this slide)
- AFS access, and access to HPSS for mass storage
- High speed (Gigabit Ethernet) access to the HPSS system
- One of several Linux clusters at LBL:
  - The Alvarez cluster has a similar architecture, but supports a Myrinet cluster interconnect
  - The NERSC PC Cluster project by the Future Technology Group is an experimental cluster (http://www.nersc.gov/research/FTG/index.html)
  - A genome cluster at LBL supports research into the fruit fly genome
- 152 compute nodes, 281 processors, 7.5 TB of storage
- Cluster uptime for the year 2000 was > 98%; for the most recently measured period (January 2001), cluster utilization for batch jobs was 78%
- The cluster has had zero downtime due to security issues
- PDSF and NERSC have a track record of solid security balanced with unobtrusive practices
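A minimal sketch of handing an embarrassingly parallel workload to LSF via bsub, as referenced in the list above. The queue name, log paths, and per-file analysis script are hypothetical; the slide only states that LSF 4.1 handles batch scheduling.

    # Hedged sketch: submit one independent LSF batch job per input file.
    # The queue name, paths, and wrapper script are made up; only the use of
    # LSF's bsub for batch scheduling comes from the slide.
    import subprocess

    INPUT_FILES = ["dst_0001.root", "dst_0002.root", "dst_0003.root"]  # hypothetical

    def submit(dst_file):
        """Queue one analysis task; each job is independent (embarrassingly parallel)."""
        cmd = [
            "bsub",
            "-q", "star_prod",               # hypothetical queue name
            "-o", f"logs/{dst_file}.out",    # file for the job's stdout/stderr
            "./analyze_dst.sh", dst_file,    # hypothetical per-file analysis script
        ]
        subprocess.check_call(cmd)

    for f in INPUT_FILES:
        submit(f)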

12 More About PDSF
PDSF uses a common resource pool for all projects.
- PDSF supports multiple experiments: STAR, ATLAS, BaBar, D0, AMANDA, E871, E895, E896, and CDF
- Multiple projects have access to the computing resources, and the installed software supports all experiments
- The actual level of access is determined by the batch scheduler using fair-share rules (a toy fair-share calculation follows this slide)
- Each project's investment goes into purchasing hardware and support infrastructure for the entire cluster
- A common configuration decreases management overhead, lowers administration complexity, and increases the availability of usable computing resources
- Commodity Intel hardware keeps us vendor neutral and lowers the cost to all of our users
- Low cost and easy access to hardware make it possible to update configurations relatively quickly to support new computing requirements
- Because the physical resources available always exceed any individual contributor's investment, there is usually some excess capacity for sudden peaks in usage, and always a buffer to absorb sudden hardware failures
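The fair-share point above is a policy description; the toy calculation below is not LSF's actual fair-share algorithm. It only illustrates the idea that a project's dynamic priority falls as its recent usage grows relative to its configured share, and all shares and CPU-hour figures are invented.

    # Toy fair-share illustration (not LSF's real algorithm): each experiment has
    # a configured share of the pool, and its dynamic priority drops as its recent
    # usage exceeds that share. All shares and usage numbers are invented.
    SHARES = {"STAR": 0.50, "ATLAS": 0.20, "AMANDA": 0.15, "other": 0.15}
    RECENT_CPU_HOURS = {"STAR": 900.0, "ATLAS": 100.0, "AMANDA": 40.0, "other": 10.0}

    def priorities(shares, usage):
        """Higher configured share and lower recent consumption -> higher priority."""
        total = sum(usage.values()) or 1.0
        return {grp: shares[grp] / (usage[grp] / total + 1e-9) for grp in shares}

    ranked = sorted(priorities(SHARES, RECENT_CPU_HOURS).items(),
                    key=lambda kv: kv[1], reverse=True)
    for group, prio in ranked:
        print(f"{group:7s} priority {prio:8.2f}")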

