Erich Strohmaier, Berkeley Lab & TOP500
Wu Feng, Virginia Tech & Green500
with Natalie Bates, EE HPC Working Group
Nic Dube, HP & The Green Grid
Craig Steffen, NCSA
and others on the Compute System Metrics Team

SC11 BoF; November 2011; Seattle

 Context
  – Power consumption and facility costs of HPC are increasing.
  – “Can only improve what you can measure.”
 What is needed?
  – Converge on a common basis for:
      METHODOLOGIES
      WORKLOADS
      METRICS
    for energy-efficient supercomputing, so that we can make progress towards solutions.

 Collaboration between TOP500, Green500, The Green Grid, and the EE HPC WG
 Evaluate and improve methodologies and metrics, and drive towards convergence on workloads
 Form a basis for evaluating the energy efficiency of individual systems, product lines, architectures, and vendors
 Target the architecture-design and procurement decision-making processes

 Measure the behavior of key system components, including compute, memory, interconnect fabric, storage, and external I/O
 Workloads and metrics might address several components at the same time
 Phased implementation planned

 Leverage well-established benchmarks
  – Must exercise the HPC system to the fullest capability possible
 Use High Performance LINPACK (HPL) for exercising the (mostly) compute sub-system
 Use RandomAccess (Giga Updates Per Second, or GUPS) for exercising the memory sub-system (?)
 Need to identify workloads for exercising other sub-systems

 HPL and RandomAccess measurement methodologies are well established
 Green500 & TOP500 power-measurement methodologies
  – Similar, but not identical, methodologies
 Issues/concerns with power-measurement methodology
  – Variation in start/stop times as well as sampling rates
  – Node-, rack-, or system-level measurements
  – What to include in the measurement (e.g., integrated cooling)
 Need to increase vendor and/or supercomputing-center power-measurement reporting
  – June 2010 Green500 List: 56% of the list were submitted/measured numbers; the other 44% were derived by Green500

 The current power-measurement methodology is very flexible, but this compromises consistency between submissions
 Proposal: keep the flexibility, but keep track of the rules used and the quality of the power measurement
 3 tiers of power-measurement quality
  – Sampling rate: more measurements mean higher quality
  – Completeness of what is being measured: more of the system translates to higher quality
  – Common rules for start/stop times

Metric and Methodology

General Definitions:
– metric: a basis for comparison; a reference point against which other things can be evaluated; a measure
– methodology: the system of methods followed in a particular discipline; a way of doing something, especially a systematic way (usually in steps); a measurement procedure

Specific Instantiations:
– Metric: average power consumed while running the workload. Others: peak instantaneous power, total energy, etc.
– Methodology: time fraction and granularity of the run, fraction of the system measured, sub-systems included, how power/energy data is recorded, how average power is computed, and what data is required for reporting. Others: environmental conditions, measurement-tool requirements, etc.

Power/Energy Measurement Methodology (Current Proposal)

Level 1: Aspect 1: 20% of run, 1 average power measurement. Aspect 2: (larger of) 1/64 of machine or 1 kW.
Level 2: Aspect 1: 100% of run, at least 100 average power measurements. Aspect 2: (larger of) 1/8 of machine or 10 kW.
Level 3: Aspect 1: 100% of run, at least 100 running total-energy measurements. Aspect 2: whole machine.
Aspect 3 (Subsystems Measured, all levels): [Y] Compute nodes, [ ] Interconnect network, [ ] Storage, [ ] Storage network, [ ] Login/Head nodes
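The levels above lend themselves to a simple check. Below is a minimal sketch of how a submission might be classified; the `Submission` fields and the function are hypothetical, and only the numeric thresholds come from the proposal.

```python
# Illustrative encoding of the three measurement-quality levels from the
# table above. Field and function names are hypothetical; only the
# numeric thresholds come from the proposal.
from dataclasses import dataclass

@dataclass
class Submission:
    run_fraction_measured: float  # fraction of the HPL run covered (0..1)
    num_measurements: int         # power/energy measurements reported
    machine_fraction: float       # fraction of the machine metered (0..1)
    measured_kw: float            # power of the metered portion, in kW
    whole_machine: bool           # True if the entire machine was metered
    cumulative_energy: bool       # True if meters report running total energy

def quality_level(s: Submission) -> int:
    """Return the highest quality level (1-3) a submission meets, else 0."""
    if (s.cumulative_energy and s.whole_machine
            and s.run_fraction_measured >= 1.0 and s.num_measurements >= 100):
        return 3
    if (s.run_fraction_measured >= 1.0 and s.num_measurements >= 100
            # "(larger of) 1/8 of machine or 10 kW" => meet both bounds
            and s.machine_fraction >= 1/8 and s.measured_kw >= 10.0):
        return 2
    if (s.run_fraction_measured >= 0.20 and s.num_measurements >= 1
            # "(larger of) 1/64 of machine or 1 kW" => meet both bounds
            and s.machine_fraction >= 1/64 and s.measured_kw >= 1.0):
        return 1
    return 0
```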

Why Report 100 Measurements?

TOP500 requires the HPL output for performance verification. The Power/Energy Measurement Methodology likewise defines the data to be reported for power-consumption verification:
– The data is used to verify the computation of average power, and is matched against the total runtime reported by the workload.

Experimental power data from LLNL Muir
– Lawrence Livermore National Lab “Muir” cluster: Dell Xanadu 3 cluster, Xeon X Ghz, QLogic InfiniBand QDR, 1242 nodes
– The LLNL data center is instrumented with power and energy meters at the panel-board level
– The data that follows is from a special HPL run made just for power data (not optimized for performance)
– The data is total machine power every 1 second for the entire run, including start and end idle time

HPL Power Profile on Muir (non-tuned)
[Figure: whole-machine power trace over the full HPL run; “Idle” marks the flat regions before and after the run]

How Is the Data Used to Compute Average Power?
– L1: Average power is the recorded average power.
– L2: Average power is the numerical average of the instantaneous power measurements within the run window. The power data is used to verify the start and stop times of the run window against the benchmark-reported time.
– L3: Average power is the total energy at the end of the run window minus the total energy at the beginning, divided by the total runtime. The energy data is used to determine the beginning and end of the run window and to verify them against the benchmark-reported time.
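A minimal sketch of the L2 and L3 computations (L1 is simply the meter's own reported average), assuming hypothetical inputs: `power_w`, a list of instantaneous power samples (W) inside the run window, and `energy_j`, a list of running total-energy readings (J) over the same window:

```python
# Sketch of the L2 and L3 average-power computations described above.
# Input names and units are assumptions, not part of the methodology text.

def avg_power_level2(power_w: list[float]) -> float:
    """L2: numerical average of instantaneous power samples in the run window."""
    return sum(power_w) / len(power_w)

def avg_power_level3(energy_j: list[float], runtime_s: float) -> float:
    """L3: (total energy at window end - total energy at window start) / runtime."""
    return (energy_j[-1] - energy_j[0]) / runtime_s
```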

Aspect 1 Time Fraction: Defining Run “Beginning”
[Figure: power trace at the start of the run, with the HPL timing start marked]

Aspect 1 Time Fraction: Defining Run End
[Figure: power trace at the end of the run, with the HPL timing end marked]
The run length obtained this way agrees with the HPL-reported runtime to within ~2 seconds.
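The slides do not spell out the exact detection rule; a plausible sketch, assuming 1 Hz samples and an idle-power estimate taken from the head of the trace, is:

```python
# Hypothetical run-window detection on a 1 Hz whole-machine power trace.
# The threshold rule (10% above idle) is an assumption for illustration;
# the slides only state that the power data is used to locate and verify
# the HPL start and stop times.

def run_window(power_w: list[float], idle_samples: int = 60,
               margin: float = 1.10) -> tuple[int, int]:
    """Return (start, end) sample indices where power exceeds idle * margin."""
    idle = sum(power_w[:idle_samples]) / idle_samples  # idle-power estimate
    above = [i for i, p in enumerate(power_w) if p > idle * margin]
    return above[0], above[-1]
```

On the Muir data, the window length recovered from the power trace matches the HPL-reported runtime to within ~2 seconds.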

Aspect 1 Time Fraction: Why the Whole Run Is Required for Higher Quality Levels
[Figure: power over the full HPL run]
Power drops from ~390 kW to ~360 kW over the run, a variation of almost 8% ((390 − 360)/390 ≈ 7.7%). An average taken over only 20% of the run can therefore misrepresent whole-run power.

Aspect 1 Granularity: Why Is So Much Data Required for Reporting?
– L2: Instantaneous power data is the result of interference between the sampling, the supply switching, and the input AC. High-resolution sampling maximizes capture of these features.
– L3: Integrated total-energy measurements have effectively infinite granularity: all power transients are taken into account at the device level.

Aspect 1 Granularity Requirement Examples: Sometimes Wide Variations, Sometimes Not
[Figure: example power traces, one with wide sample-to-sample variation and one without]

Aspect 1 Granularity Requirement: Highly Periodic Structure: Sampling Artifact
[Figure: power trace showing a highly periodic structure caused by a sampling artifact]

Aspect 1 Granularity Requirement: True Power Variations Over the Run
[Figure: power trace showing genuine power variations over the run]

Aspect 2 Machine Fraction: Why Are Larger Machine Fractions Preferred?
– Measuring power/energy for a larger machine fraction decreases random power-supply and measurement fluctuations
– Larger fractions reduce the effect of unusual (e.g., low/high power) nodes
– A larger machine fraction reduces the magnification of instrument error by extrapolation to the whole machine (see the sketch below)
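A toy calculation (all numbers hypothetical) of the third point: a fixed meter error grows by the extrapolation factor when scaled to the whole machine:

```python
# Hypothetical illustration of error magnification by extrapolation:
# a fixed +/-0.5 kW meter error scaled up to the whole machine.
meter_error_kw = 0.5
for fraction in (1/64, 1/8, 1.0):
    whole_machine_error = meter_error_kw / fraction
    print(f"measured fraction {fraction:.4f}: "
          f"extrapolated error +/-{whole_machine_error:.1f} kW")
# fraction 1/64 -> +/-32.0 kW; 1/8 -> +/-4.0 kW; whole machine -> +/-0.5 kW
```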

Aspect 3: Sub-system List
– No particular subsystem is required for any level
– Subsystem inclusion is closely tied to the details of power distribution, machine construction, and power instrumentation
– Reporting which subsystems are included in the power measurement is required, both to increase information and to inform future requirements

 Continued focus on improving the power-measurement methodology
 Power/Energy Measurement Methodology working paper ready for broader review, effective Nov ’11
 “Quality level” implementation targeted for the June 2012 Green500 and TOP500 lists

 Identify workloads for exercising other sub-systems, e.g., storage, I/O
 Still need to decide upon the exact metric
  – Classes of systems (e.g., Top50, Little500)
  – Multiple metrics or a single index
 Energy and power measurements
  – Average power, total energy, max instantaneous power
  – Include non-workload consumption: idle time, job launch and shutdown, etc.
 Environmental conditions?

 To gather community momentum and support
 To solicit your feedback
 To ask you to participate by submitting power measurements for the TOP500 and Green500 lists

Energy Efficient HPC Working Group (EE HPC WG)
- Driving energy-conservation measures and energy-efficient design in high performance computing
- Demonstrating leadership in energy efficiency as in computing performance
- A forum for sharing of information (peer-to-peer exchange) and collective action

System Power Measurement Methodology (Possible Proposed Changes)

Level 1: Aspect 1: 20% of run, 1 average power measurement. Aspect 2: (larger of) 1/64 of machine or 1 kW.
Level 2: Aspect 1: instantaneous power measurements every 1 second over the whole run. Aspect 2: (larger of) 1/8 of machine or 10 kW.
Level 3: Aspect 1: 100% of run, at least 100 running total-energy measurements. Aspect 2: whole machine.
Aspect 3 (Subsystems Measured, all levels): [Y] Compute nodes, [ ] Interconnect network, [ ] Storage, [ ] Storage network, [ ] Login/Head nodes

Today’s focus

 “Data Center” Measurements
  – Power Usage Effectiveness (PUE)
    PUE = Total Facility Energy / IT Energy
  – Energy Reuse Effectiveness (ERE)
    ERE = (Total Facility Energy − Reused Energy) / IT Energy

AND

 “Compute System” Measurements
  – Workload/Productivity Metrics
    Useful Work / Energy Consumed
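A worked example with hypothetical numbers (not from the slides), just to show how the two ratios behave:

```python
# Hypothetical facility numbers for illustration only.
total_facility_energy = 1_500_000  # kWh drawn by the whole facility
it_energy             = 1_000_000  # kWh delivered to the IT equipment
reused_energy         =   200_000  # kWh of waste heat exported for reuse

pue = total_facility_energy / it_energy                    # = 1.5
ere = (total_facility_energy - reused_energy) / it_energy  # = 1.3
print(f"PUE = {pue:.2f}, ERE = {ere:.2f}")
```

Energy reuse lowers ERE below PUE; with no reuse the two coincide.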

 Stakeholders
  – HPC computer-system designers and procurement decision makers, including users, data-center and facilities designers/managers
 Purpose
  – Unite the community behind energy-efficiency metric(s) for HPC systems that form the basis for comparing and evaluating individual systems, product lines, architectures, and vendors

 A basis for comparison
 A reference point against which other things can be evaluated
 A measure

 Granular enough
  – Individual components
  – Analyzed in manageable chunks
  – Assigned to specific parties for improvement
 Intuitive, obvious, and clear
 Scientifically accurate and used precisely
 Sufficiently flexible to respond to new technology developments
 Vendor-neutral
 Inexpensive and worthwhile

Stanley, J.R., Brill, K.G., and Koomey, J., “Four Metrics Define Data Center ‘Greenness’,” Uptime Institute White Paper.

 The application or benchmark software designed to exercise the HPC system to the fullest capability possible

 Use workload-based metrics to represent HPC energy efficiency
  – Use workload-based performance for the numerator and the power measured during the workload run for the denominator (see the sketch below)
 Examples
  – Green500 “FLOPS per Watt”
  – SPEC FP / measured Watt
  – Green Grid “Productivity Proxies”
 Still need to decide upon the exact metric
  – Classes of systems (e.g., Top50, Little500)
  – Multiple metrics or a single index
  – Idle as well as full stress
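A minimal sketch of the Green500-style FLOPS-per-Watt computation under the methodology above; the input numbers are hypothetical placeholders:

```python
# Green500-style "FLOPS per Watt": workload performance over average power
# measured during the run. Inputs are hypothetical, for illustration only.
rmax_gflops = 50_000.0  # HPL Rmax, in GFLOP/s
avg_power_w = 25_000.0  # average power during the HPL run, in watts

mflops_per_watt = rmax_gflops * 1_000.0 / avg_power_w
print(f"{mflops_per_watt:.0f} MFLOPS/W")  # 2000 MFLOPS/W
```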

 The system of methods followed
 A way of doing something, especially a systematic way; implies an orderly, logical arrangement (usually in steps)
 A measurement procedure