Aim High…Fly, Fight, Win
NWP Transition from AIX to Linux: Lessons Learned
Dan Sedlacek, AFWA Chief Engineer, AFWA A5/8
14 MAR 2011

Overview
- Introduction
- AFWA Architecture
- Applications Run on HPC
- Original NWP Environment
- Linux Configuration
- TCO Comparison
- Lessons Learned
- Future Linux Plans
- Summary

Introduction
- AFWA has a long history with an AIX HPC environment
- Air Force Weather environment:
  - Worldwide, 24x7x365 support for systems, weather data, and products
  - Headquarters, Operational Weather Squadrons (OWS), Combat Weather Teams (CWTs), and the Climatological Center (14th WS)
  - 600+ systems across 4 distinct security enclaves
  - 16 million+ lines of code, ~1,000 software applications supported
- As model resolutions improve and processing requirements soar, AFWA's requirements for NWP processing capability have increased dramatically
- SEMS (the in-house support contractor) performed a study evaluating IBM, HP, and Cray; Red Hat Linux on HP hardware was selected
- Transitioning from IBM/AIX to HP/Linux has resulted in significant savings in Total Cost of Ownership (TCO)

AFWA Architecture (Unclassified Only)

Applications Run on HPC
- Run regional models:
  - WRF
  - WRF-Chem
  - CDFS II (future)
  - Dust
  - LIS
- Run global UM ensembles
- Model post-processing
- Misc. space products

Original NWP Environment (Unclassified)

"Free" Hardware Adventure
- In 2008, AFWA evaluated JVN (available from the HPC Modernization Office, HPCMO):
  - 1,024 compute nodes
  - 36 racks of equipment
  - 589 kW power requirement
  - 161 tons of cooling
- The "free" hardware was not cost-effective
- SEMS performed a study to evaluate alternatives; new hardware was more cost-effective:
  - Less space
  - Less power
  - Less cooling
  - More FLOPS
  - Lower TCO
- Decision made to pursue a Linux HPC solution
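A rough feel for why the "free" JVN hardware still carried real operating cost: a minimal sketch of the annual electricity bill for the quoted 589 kW load, assuming continuous operation and a hypothetical $0.10/kWh utility rate (both assumptions, not figures from the slides; cooling would add further cost).

```python
# Rough annual power-cost estimate for the JVN hardware (sketch only).
# The 589 kW figure comes from the slide; the utility rate and 24x7 duty
# cycle are assumptions for illustration.

POWER_KW = 589            # compute power draw quoted on the slide
HOURS_PER_YEAR = 24 * 365
RATE_PER_KWH = 0.10       # assumed electricity rate, USD/kWh (hypothetical)

annual_kwh = POWER_KW * HOURS_PER_YEAR
annual_cost = annual_kwh * RATE_PER_KWH

print(f"Annual energy: {annual_kwh:,.0f} kWh")
print(f"Annual electricity cost (compute only): ${annual_cost:,.0f}")
```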

AFWA Unclassified HPC Configuration

Linux Configuration (Prod 8/DC3)
- OS: Red Hat Enterprise Linux (RHEL) 5.3
- File system: Lustre
- Disk: 50 TB
- I/O bandwidth: 900 Mb/s throughput
- Chipset: (2) 2.53 GHz Intel Nehalem E5540 quad-core CPUs per node
- Compute blades: 128
- Cores/memory: 1,024 cores, 3 GB per core
- Processing capacity: 10 TeraFlops (production)
- Test and development system (DC3): 5 TeraFlops
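The 10 TeraFlops figure can be cross-checked against the node specs above. A minimal back-of-the-envelope sketch, assuming 4 double-precision FLOPs per core per clock as the Nehalem-era peak rate (an assumption about the peak-rate convention, not a figure from the slide):

```python
# Back-of-the-envelope peak performance check for the production cluster.
# Blade count, sockets, core count, clock, and memory per core are from the
# slide; the 4 FLOPs/cycle/core figure is an assumed Nehalem peak rate.

blades = 128
sockets_per_blade = 2
cores_per_socket = 4          # Xeon E5540 is quad-core
clock_ghz = 2.53
flops_per_cycle = 4           # assumed double-precision peak for Nehalem
mem_per_core_gb = 3

cores = blades * sockets_per_blade * cores_per_socket
peak_tflops = cores * clock_ghz * flops_per_cycle / 1000.0
total_mem_tb = cores * mem_per_core_gb / 1024.0

print(f"Total cores:  {cores}")                   # 1024, matches the slide
print(f"Peak:         {peak_tflops:.1f} TFlops")  # ~10.4, consistent with "10 TeraFlops"
print(f"Total memory: {total_mem_tb:.1f} TB")
```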

TCO Comparison
- Original 10 TeraFlops of IBM/AIX HPC, O&M (non-labor): $1.4M
  - Nominally $133K per TeraFlop for the IBM/AIX HPC
- Annual projected O&M cost for Linux (now totaling 24 TeraFlops): $1M
  - Conservatively, $30K per TeraFlop for the HP/Linux HPC
- Bottom line: the Linux HPC solution represented a significant savings
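The comparison reduces to annual O&M dollars per delivered TeraFlop. A minimal sketch of that arithmetic using the round figures above; the slide's quoted per-TeraFlop numbers are labeled nominal and conservative, so they differ somewhat from this straight division:

```python
# Annual (non-labor) O&M cost per TeraFlop, using the round figures above.
# The slide quotes "nominally $133K" and "conservatively $30K" per TeraFlop,
# so its values reflect rounding/assumptions beyond this straight division.

def cost_per_tflop(annual_om_dollars: float, tflops: float) -> float:
    """Annual O&M dollars per TeraFlop of capacity."""
    return annual_om_dollars / tflops

aix_per_tflop = cost_per_tflop(1_400_000, 10)    # IBM/AIX baseline
linux_per_tflop = cost_per_tflop(1_000_000, 24)  # HP/Linux, 24 TFlops today

print(f"IBM/AIX:  ${aix_per_tflop:,.0f} per TeraFlop per year")
print(f"HP/Linux: ${linux_per_tflop:,.0f} per TeraFlop per year")
print(f"Ratio:    {aix_per_tflop / linux_per_tflop:.1f}x cheaper on Linux")
```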

Lessons Learned
- Not all "free" hardware is desirable (JVN)
- Differences between Linux and AIX compilers (minor, but require modifications)
- Significant tuning differences between AIX and Linux
- File system configurations are significantly different (Lustre/IBRIX vs. GPFS)
- Job scheduler differences had to be worked through (IBM LoadLeveler vs. Platform LSF)
- Full reduction of TCO doesn't occur until support for the previous OS is no longer required
- So far, Linux has proven to be a reliable and cost-effective OS for NWP
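To illustrate the scheduler porting effort, here is a toy sketch mapping a few common LoadLeveler job keywords to rough Platform LSF (#BSUB) equivalents. It is illustrative only, not AFWA's migration tooling, and real job scripts need more than keyword swaps:

```python
# Toy illustration of the LoadLeveler -> LSF translation effort (not AFWA's
# actual tooling). Maps a few common LoadLeveler keywords onto rough Platform
# LSF (#BSUB) equivalents; real scripts also need time-limit format changes,
# task geometry, environment handling, etc.

LL_TO_BSUB = {
    "job_name": "-J",          # job name
    "class": "-q",             # LoadLeveler class ~ LSF queue
    "output": "-o",            # stdout file
    "error": "-e",             # stderr file
    "total_tasks": "-n",       # MPI task count (approximate mapping)
    "wall_clock_limit": "-W",  # note: LSF expects HH:MM, not HH:MM:SS
}

def translate(loadleveler_directives: dict[str, str]) -> list[str]:
    """Return rough #BSUB lines for the LoadLeveler keywords we know about."""
    lines = []
    for key, value in loadleveler_directives.items():
        flag = LL_TO_BSUB.get(key)
        if flag is not None:
            lines.append(f"#BSUB {flag} {value}")
    return lines

# Example: a minimal hypothetical WRF job description
example = {"job_name": "wrf_run", "class": "prod", "total_tasks": "256",
           "wall_clock_limit": "02:00", "output": "wrf.%J.out", "error": "wrf.%J.err"}
print("\n".join(translate(example)))
```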

Future Linux Plans
- A …-core Linux cluster is being planned for delivery in August 2011
- Represents 51 TeraFlops of additional capability
- Total HPC capacity by end of 2011: > 90 TeraFlops
- Total phase-out of the IBM/AIX HPC environment

Summary
- Total Cost of Ownership is complex:
  - Initial costs
  - Transition costs
  - Facility costs
  - Support costs
- Linux does scale well
- Linux is a viable and cost-effective HPC platform
- Transitioning from IBM/AIX to HP/Linux has resulted in significant TCO savings