
NCCS User Forum September 14, 2010

Agenda – September 14, 2010
Welcome & Introduction (Phil Webster, CISTO Chief)
Current System Status (Fred Reitz, NCCS Operations Manager)
User Services Update (Tyler Simon, NCCS User Services Group)
SCU7 and Other NCCS Systems Updates (Dan Duffy, NCCS Lead Architect)
Earth System Grid Data Node Update
Questions & Wrap-Up (Phil Webster)
Breakout Session: Footprints Tutorial (Max Guillen, NCCS User Services)

NCCS computational machines are very busy (at capacity) with your science work.
–Nehalem node utilization is routinely more than 95% on workdays.
Help is coming: SCU7 will double the peak capacity of Discover.

Accomplishments
Systems
–Discover SCU7 will double Discover's peak TFLOPS.
–Dirac archive: disk nearly quadrupled; migration to new distributed/parallel servers in October.
–Collaboration with Intel to scale Intel MPI beyond 7,700 cores.
Services
–Earth System Grid Data Node for NCCS users' IPCC contributions.
–iRODS-based Data Management System prototype.
–Users' web access to the NCCS "Footprints" trouble-ticket system.

SMD Compute Allocation Period Announcement
Call for computing allocations for the period starting November 1, 2010:
"The Science Mission Directorate (SMD) will select from requests submitted to the e-Books online system by September 20 for 1-year allocation awards beginning November 1. Any current projects set to expire on October 31 must have a request in e-Books to be considered for renewal."
–PI applications will specify "core-hours," as before, but resources will be awarded using new System Billing Units (SBUs, under development with NAS).
–Allocation award letters will be stated in both units for information purposes.
–When the change begins, all awards will be converted to the new units in the accounting system.
–A notification will be sent to all users before the change to charging via SBUs begins.
Please contact NCCS User Services with any questions.
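As a rough illustration of the units change described above (not official NCCS policy), the sketch below converts a core-hour award into SBUs using hypothetical per-architecture weights; the real factors were still under development with NAS at the time and are not shown here.

```python
# Hypothetical illustration of converting core-hour awards to System Billing
# Units (SBUs). The weights below are invented for this sketch only.

HYPOTHETICAL_SBU_FACTOR = {
    "westmere": 1.00,    # assumed baseline architecture (illustrative only)
    "nehalem": 0.95,     # assumed relative weight (illustrative only)
    "harpertown": 0.80,  # assumed relative weight (illustrative only)
}

def core_hours_to_sbus(core_hours: float, architecture: str) -> float:
    """Convert a core-hour award to SBUs using the hypothetical factors."""
    return core_hours * HYPOTHETICAL_SBU_FACTOR[architecture]

if __name__ == "__main__":
    award = 500_000  # example core-hour award
    print(f"{award:,} Nehalem core-hours ~= "
          f"{core_hours_to_sbus(award, 'nehalem'):,.0f} SBUs (illustrative)")
```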

NCCS Computing Capacity Evolution (chart)


Operations and Maintenance
NASA Center for Climate Simulation (NCCS) Project
Fred Reitz, September 14, 2010

Accomplishments
Discover
–Provided GRIP mission support; still in mission-support mode.
–SCU7 preparations: installed initial Intel Xeon "Westmere" hardware in the Test and Development System (TDS); received SCU7 computing node hardware (159 TF peak); took delivery of power and cooling equipment and SCU7 cooling doors.
–Implemented new I/O monitoring tools.
–Provided resources to the Intel MPI Development Team to test scaling at large core counts and investigate performance problems with Intel MPI libraries.
–Installed two NVIDIA GPGPUs in the Discover TDS and the production Discover cluster.
Mass Storage
–Installed new distributed DMF hardware in preparation for the upcoming archive upgrade.
–Significantly reconfigured the DMF tape and disk Storage Area Networks in preparation for migration to the new distributed DMF servers and disk array.
Dataportal
–Created a separate GPFS cluster for ESG nodes.
–Installed new Dataportal database hardware.
–Installed a new Dataportal disk array.
–Replaced a network switch with a newer model.

Discover Total CPU Consumption, Past 12 Months (CPU Hours) (chart)

Discover Utilization, Past 12 Months (Percentage) (chart; annotation: SCU6 added)

Discover Workload Distribution and Utilization by Architecture – August 2010 (chart)

Discover CPU Consumption by Group, Past 12 Months (chart)

Discover CPU Consumption by Top Queues, Past 12 Months (chart)

Discover Job Analysis by Job Size – August 2010 (chart)

Discover Job Analysis by Queue – August 2010 (chart)

Discover Availability, Past 12 Months (chart)
Scheduled maintenance, June–August:
–26 June – 8.5 hours: corrected DDN5 cabling error
–21 July – 12 hours: GPFS, OFED, and disk controller firmware upgrades
Unscheduled outages, June–August:
–4 June – 3 hours, 35 minutes: internal SCU5 network flapping
–10 June – 7 hours, 57 minutes: borgmg management server hang
–25 June – 1 hour, 49 minutes: DDN5 disk controller cabling issue
–21 July – 50 minutes: extended maintenance window
–23 July – 2 hours, 15 minutes: internal SCU5 network flapping
–26 August – 20 minutes: internal SCU5 network flapping
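As a quick sanity check on the outage figures above (an illustrative calculation, not an official NCCS availability number), the snippet below sums the listed unscheduled outages for June–August and expresses them as an availability percentage over that three-month window. The 92-day window length is an assumption.

```python
# Illustrative availability estimate from the unscheduled outages listed above.
# Official NCCS availability figures may use different windows and exclusions.

unscheduled_outages_minutes = [
    3 * 60 + 35,   # 4 June  - internal SCU5 network flapping
    7 * 60 + 57,   # 10 June - borgmg management server hang
    1 * 60 + 49,   # 25 June - DDN5 disk controller cabling issue
    50,            # 21 July - extended maintenance window
    2 * 60 + 15,   # 23 July - internal SCU5 network flapping
    20,            # 26 Aug  - internal SCU5 network flapping
]

window_hours = 92 * 24                     # June 1 through August 31 (assumed)
downtime_hours = sum(unscheduled_outages_minutes) / 60.0
availability = 100.0 * (1 - downtime_hours / window_hours)

print(f"Unscheduled downtime: {downtime_hours:.1f} hours")
print(f"Approximate unscheduled availability: {availability:.2f}%")
```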

Archive Data Stored
–4 PB added over the last 12 months, versus 4 PB in the previous 20 months (10/1/07 to 6/1/09).
–NCCS pays an SGI license fee based on data stored (15 PB at this time).
–Please remove any data you know you do not need.

NCCS Network Availability, Past 12 Months (chart)

Issues
1. User jobs exhibiting high file open/close activity adversely affect other users' work.
–GPFS is designed for larger files and streaming I/O.
–NCCS is working with IBM and monitoring the system.
–Please work with NCCS and SIVO if your application exhibits high file open/close activity or uses many small files; a minimal batching sketch follows this list.
2. Jobs take longer to begin execution.
–SCU7 will help.
3. Users running nodes out of memory can cause GPFS hangs.
–Improved monitoring has reduced GPFS hangs.
–Please work with NCCS and SIVO if your application runs nodes out of memory.
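To illustrate the small-file guidance in item 1 above, here is a minimal, hypothetical sketch (not NCCS-provided code) that replaces a write-one-file-per-timestep pattern with a single appended output file, which cuts file open/close traffic on a parallel file system such as GPFS. The file names and record layout are invented for illustration.

```python
# Hypothetical example: instead of opening and closing one small file per
# timestep (many metadata operations on GPFS), append all records to a single
# file opened once.

import numpy as np

N_STEPS = 1000
FIELD_SIZE = 10_000

# Anti-pattern: one open/close per timestep (heavy metadata load).
# for step in range(N_STEPS):
#     np.save(f"diag_{step:05d}.npy", np.random.rand(FIELD_SIZE))

# Preferred pattern: open once, append binary records, close once.
with open("diagnostics.bin", "wb") as out:
    for step in range(N_STEPS):
        field = np.random.rand(FIELD_SIZE)   # stand-in for model output
        out.write(field.tobytes())           # one stream, no extra open/close

print(f"Wrote {N_STEPS} records to a single file instead of {N_STEPS} files.")
```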

Future Enhancements
Data Exploration Theater
–Performance and functionality improvements
Discover
–PBS v10
–SLES11
–SCU7
Dataportal
–iRODS
–Additional storage
–New database servers
Mass Storage
–Platform upgrade
–Additional storage

Archive (Dirac) Upgrades
Dirac currently runs on a 5-year-old system.
Upgrades:
–New DMF server to replace the Bx2
–Parallel data movers for tape
–Two login nodes for interactive access to archive data
–Two NFS nodes for improved NFS access to archive data from Discover
–All systems commodity Xeon based
–Upgrade of primary disk cache to 600 TB raw – complete
–Upgrade of the disk cache for DMF databases to improve serviceability and data access
What does this mean to users?
–A much higher-performing archive
–An outage for moving the DMF databases between systems (please be patient)
Timeline: currently scheduled for October 12 and 13.


User Services
NASA Center for Climate Simulation (NCCS) Project
Tyler Simon, September 14, 2010

NCCS User Services
–Online ticketing
–Compiler/MPI baselines
–Performance concerns: I/O (compute/archive), job throughput, architectural behavior (Dempsey, Woodcrest, Harpertown, Nehalem)
–Tools and benchmarking


10 Years of NCCS Highlights
–Halem – HP/Alpha cluster, 18th fastest computer in the world
–Support for the IPCC AR4 modeling runs by GISS – Nobel Prize
–Large SGI shared-memory systems at GSFC
–Introduction of the Data Portal system and services: WMS, MAP, Cloud Library
–Discover – first production InfiniBand commodity cluster in NASA HEC
–Advancement of data services with the Data Management System (DMS) and the Earth System Grid (ESG)
–10 GbE NCCS LAN; 10 GbE between HEC centers
–Discover upgrades support a 3 km global-resolution model – the highest resolution ever run
–Introduction of the Dali analysis environment, with interactive access to large data sets
–Climate Data Theater – new tool for scientific analysis and visualization

Growth of Computing Cores Over the Last 10 Years (chart)
–Basically flat for about 5 years.
–Increase of 10x in cores over the past 5 years – from 3,000 to 30,000.

Growth of Peak Computing Capability Over the Last 10 Years (chart)
–Basically flat for about 5 years.
–Increase of 30x in peak computing capability over the past 5 years – from 10 TF to over 300 TF.

Representative IB and GPFS Architecture (diagram)
–Base Unit: 512 Dempsey cores (3.2 GHz)
–SCU1: 1,024 Woodcrest cores (2.66 GHz)
–SCU2: 1,024 Woodcrest cores (2.66 GHz)
–SCU3: 2,048 Harpertown cores (2.5 GHz)
–SCU4: 2,048 Harpertown cores (2.5 GHz)
–SCU5: 4,096 Nehalem cores (2.8 GHz)
–SCU6: 4,096 Nehalem cores (2.8 GHz)
–SCU7: 14,400 Westmere cores (2.8 GHz)
–24 DDR IB uplinks to each unit
–20 GPFS I/O nodes per group: 16 NSD (data), 4 MDS (metadata)
–Analysis nodes are direct-connect GPFS nodes
–Data file systems: DataDirect Networks S2A9500, S2A9550, S2A9900
–Metadata file systems: IBM/Engenio DS4700
–In the diagram, each circle represents a 288-port DDR IB switch; the triangle represents a 2-to-1 QDR IB switch fabric (SCU7, units 7a–7e)

FY10 SCU7 Discover Cluster Augmentation
SCU7 will augment Discover's computing nodes with Intel Xeon "Westmere" processors within the existing SCU5 and SCU6 InfiniBand communication fabric:
–Dual-socket, hex-core processors
–12 MB L3 cache, 50% larger than Nehalem's
–24 GB of RAM per node
–Quad Data Rate (QDR) InfiniBand
Initial performance of the GEOS-5 cubed-sphere benchmark showed a slight improvement on Westmere processors compared to Nehalem (the test compared use of 4 cores per socket on both Nehalem and Westmere). The system has been delivered.
Intel Xeon 5600 processor series ("Westmere") vs. Nehalem:
–Nehalem: 2.8 GHz, 4 cores/socket, 8 MB L3 cache
–Westmere: 2.8 GHz, 6 cores/socket, 12 MB L3 cache
Discover's SCU7 will augment the computing capacity available for high-resolution GEOS-5 cubed-sphere runs and for GISS and GMAO simulations in support of the IPCC Fifth Assessment Report.

More SCU7 Details and Timeline
SCU7 is made up of 5 units of 240 nodes each.
–Full QDR bi-section bandwidth (32 Gbps) within each unit of 240 nodes (2,880 cores)
–24 DDR uplinks (320 Gbps total) between each SCU7 unit and the top-level switch
To be available for general use by early December.
(Diagram legend: each blue circle represents 256 nodes, 2,048 cores; each triangle represents 240 nodes, 2,880 cores; 20 GPFS I/O nodes: 16 NSD (data), 4 MDS (metadata).)
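As a back-of-the-envelope check (illustrative arithmetic, not an official NCCS figure), SCU7's peak floating-point capability can be estimated from the node counts and per-node specifications above, assuming 4 double-precision flops per cycle per Westmere core:

```python
# Illustrative peak-FLOPS estimate for SCU7. The flops-per-cycle value is an
# assumption for Westmere-class cores; the peak quoted elsewhere in this forum
# is 159 TF, so small differences (e.g., nodes reserved for other purposes)
# are expected.

nodes           = 5 * 240     # 5 units of 240 nodes
cores_per_node  = 2 * 6       # dual-socket, hex-core
clock_hz        = 2.8e9       # 2.8 GHz
flops_per_cycle = 4           # assumed double-precision flops/cycle/core

peak_tflops = nodes * cores_per_node * clock_hz * flops_per_cycle / 1e12
print(f"Estimated SCU7 peak: {peak_tflops:.0f} TFLOPS")   # ~161 TFLOPS
```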


Earth System Grid (ESG) Data Node for IPCC AR5
Earth System Grid purpose: provide data, information, models, analysis and visualization tools, and computational capabilities for the management and analysis of climate resources.
NCCS's initial ESG Data Node is being set up to serve IPCC AR5 climate data (GISS and GMAO contributions). ESG allows the user to search and subset all the IPCC AR5 data by CMIP5 variable name.
GMAO and GISS contributions will be available via the NCCS Data Portal: esg.nccs.nasa.gov
–Access is via the PCMDI ESG Gateway at the CMIP5 link, or at the PCMDI link (below).
NCCS's data architecture for the ESG Data Node:
–AR5 data are exported read-only from their source location to the Data Node.
–Thus scores of TB need not be copied and replicated within the NCCS environment, reducing disk and I/O resource requirements.
Status: published initial GMAO and GISS sample data to Lawrence Livermore National Laboratory's Program for Climate Model Diagnosis and Intercomparison (PCMDI) ESG Gateway, pcmdi3.llnl.gov/esgcet.

Earth System Grid and NCCS Data Portal: Live Access Server (LAS) for Data Analysis
NCCS staff have also provided early access to Live Access Server (LAS) analysis services for AR5 and other climate data on the NCCS Data Portal.
–LAS is to be part of the ESG Data Node distribution, but is not yet included in the current distribution.
–LAS is OPeNDAP-aware and allows users to plot, animate, subset, and compare geo-referenced scientific (e.g., climate) data from distributed locations.
The Live Access Server is available under esg.nccs.nasa.gov and, as a proof of concept, provides access to much of the MERRA data stored at the GES DISC as well as some YOTC data. Data under LAS are organized by project, 2D, and 3D for easy navigation.
Future plans include offering THREDDS to provide download, OPeNDAP, and NetCDF subsetting.
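Because LAS and the planned THREDDS services are OPeNDAP-aware, data they serve can typically be subset remotely from analysis scripts. The sketch below is a generic, hypothetical example (the dataset URL and variable name are placeholders, not actual NCCS endpoints) of opening an OPeNDAP dataset with the netCDF4 Python library and transferring only a small slice.

```python
# Hypothetical OPeNDAP subsetting example. The URL and variable name are
# placeholders for illustration; consult the NCCS Data Portal
# (esg.nccs.nasa.gov) for actual dataset endpoints.

from netCDF4 import Dataset

OPENDAP_URL = "http://example.invalid/opendap/some_climate_dataset"  # placeholder

ds = Dataset(OPENDAP_URL)            # opens the remote dataset lazily
var = ds.variables["t2m"]            # placeholder variable name
print(var.dimensions, var.shape)

# Only the requested slice is transferred over the network, not the whole file.
subset = var[0, 100:110, 200:210]    # first time step, small lat/lon window
print(subset.mean())

ds.close()
```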


Questions and Wrap-Up
Contact information:
–NCCS User Services
–Lynn Parnell, HPC Lead
–Phil Webster, CISTO Chief
