1 Hackystat and the DARPA High Productivity Computing Systems Program Philip Johnson University of Hawaii

2 Overview of HPCS

3 High Productivity Computing Systems
Goal: Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010).
Impact:
Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X.
Programmability (time-for-idea-to-first-solution): reduce cost and time of developing application solutions.
Portability (transparency): insulate research and operational application software from the system.
Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, and programming errors.
HPCS Program Focus Areas
Mission: Provide a focused research and development program, creating new generations of high-end programming environments, software tools, architectures, and hardware components in order to realize a new vision of high-end computing: high productivity computing systems (HPCS). Address the issues of low efficiency, scalability, software tools and environments, and growing physical constraints. Fill the high-end computing gap between today's late-80's-based High Performance Computing (HPC) technology and the promise of quantum computing. Provide economically viable high productivity computing systems for the national security and industrial user communities, with the following design attributes, in the latter part of this decade:
Performance: Improve the computational efficiency and performance of critical national security applications.
Productivity: Reduce the cost of developing, operating, and maintaining HPCS application solutions.
Portability: Insulate research and operational HPCS application software from system specifics.
Robustness: Deliver improved reliability to HPCS users and reduce risk of malicious activities.
Background: High performance computing is at a critical juncture. Over the past three decades, this important technology area has provided crucial superior computational capability for many important national security applications. Government research, including substantial DoD investments, has enabled major advances in computing, contributing to the U.S. dominance of the world computer market. Unfortunately, current trends in commercial high performance computing, future complementary metal oxide semiconductor (CMOS) technology challenges, and emerging threats are creating technology gaps that threaten continued U.S. superiority in important national security applications. As reported in recent DoD studies, there is a national security requirement for high productivity computing systems. Without government R&D and participation, high-end computing will be available only through commodity manufacturers primarily focused on mass-market consumer and business needs. This solution would be ineffective for important national security applications. The HPCS program will significantly contribute to DoD and industry information superiority in the following critical application areas: operational weather and ocean forecasting; planning exercises related to analysis of the dispersion of airborne contaminants; cryptanalysis; weapons (warheads and penetrators); survivability/stealth design; intelligence/surveillance/reconnaissance systems; and virtual manufacturing/failure analysis of large aircraft, ships, and …
Applications: Intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling, and biotechnology.
Fill the critical technology and capability gap between today's (late-80's) HPC technology and the future (quantum/bio computing).

4 Vision: From "Double Raw Performance Every 18 Months" to "Double Value Every 18 Months"
Vision: Focus on the lost dimension of HPC: "User & System Efficiency and Productivity."
1980's technology: parallel vector systems (vector). Moore's Law: double raw performance every 18 months. Commodity HPCs: tightly coupled parallel systems.
To meet the pressing IT imperatives of the future, significant development of IT capabilities must be accomplished. The technical areas in which development must occur are layered in the notional architecture shown here. At the bottom of the architecture are the sensors, platforms, weapons, etc. They are the sensors and effectors that the IT world uses to interface with the physical world. The continued development and enhancement of these systems will not be a part of the IT development, but as they are improved the IT world will need to change to take full advantage of these systems' capabilities.
The layer immediately above the physical systems we call the "pervasive computing foundations." It will develop the communications, networks, computing devices, and storage capabilities to implement a foundation that is much faster, more reliable, easier to manage, and more ubiquitous than today's. Since it is recognized that almost all DoD and related systems in the future will be distributed, and indeed often widely dispersed, heterogeneous, and with a large number of nodes, a distributed processing infrastructure layer is critical to the operation of future IT systems. Agent-based computing paradigms will provide an increasingly important capability for systems and system components to interoperate, especially when they have not been designed up front to do so. In these distributed environments, the allocation of resources is vital, whether these resources be computing, storage, personnel, or information.
Warfighters will ultimately interact with applications, which are not an ITO charter, but these applications can significantly benefit from a layer of application enablers, which may be "above" or "below" the applications in this notional architecture. These layers include capabilities to enable human-computer collaboration, the integration of large amounts of disparate information, the management of the knowledge that is essential to the applications, and support for decision making, especially under the uncertainties that are always present in warfare. Spanning these technical area layers are two additional information technology areas: the architecture and design of complex, distributed information systems, and the security that designs in and builds in trust to applications. Both of these areas are more process-oriented than the product-oriented technical layers, but nevertheless will be vitally important to the development of advanced information systems in the future.
New goal: double value every 18 months. 2010: high-end computing solutions. Fill the high-end computing technology and capability gap for critical national security missions.

5 HPCS Technical Considerations
Architecture types (from vector supercomputer to commodity HPC): parallel vector, custom vector, microprocessor, symmetric multiprocessors, distributed shared memory, massively parallel processors, commodity clusters, and grids.
Communication/programming models: shared-memory multi-processing, distributed-memory multi-computing ("MPI"), and scalable vector.
HPCS focus: tailorable, balanced solutions spanning performance characterization & precision, programming models, hardware technology, software, and system architecture.
Single point design solutions are no longer acceptable.

6 HPCS Program Phases I - III
Program timeline (fiscal years 02–10): Phase I – Industry Concept Study; Phase II – R&D (Phase II readiness reviews); Phase III – Full Scale Development (Phase III readiness review).
Activities and milestones across the phases include application analysis, performance assessment, requirements and metrics, technology assessments, early academia and early metrics and benchmarks, research prototypes and pilot systems, concept reviews, the system design review, PDR, and DDR, following an industry evolutionary development cycle with industry procurements as critical program milestones and leading to HPCS capabilities or products (software, tools, platforms).

7 Application Analysis / Performance Assessment
Activity flow (inputs → application analysis → benchmarks & metrics → impacts):
Inputs: DDR&E & IHEC mission analysis, mission partners (DOD, DOE NNSA, NSA, NRO), and HPCS technology drivers.
Application analysis: HPCS applications (cryptanalysis; signal and image processing; operational weather; nuclear stockpile stewardship; etc.), mission work flows, compact applications, and common critical kernels.
Benchmarks & metrics: productivity as the ratio of utility to cost; metrics for development time (cost) and execution time (cost); implicit factors.
Impacts: defined system requirements and characteristics, mission-specific roadmaps, and improved mission capability.
Participants: Cray, IBM, Sun, and DARPA. (HPCS program motivation.)

8 Workflow Priorities & Goals
Implicit productivity factors by workflow (Perf. / Prog. / Port. / Robust.): Researcher: High; Enterprise: High, High, High, High; Production: High, High.
Mission needs drive system requirements, and workflows define the scope of customer priorities. Activity and Purpose benchmarks will be used to measure productivity. The HPCS goal is to add value to each workflow: increase productivity while increasing problem size (productivity vs. problem size, from workstation to cluster to HPCS, for the Researcher, Enterprise, and Production workflows).

9 Productivity Framework Overview
Phase I: Define framework & scope petascale requirements. Phase II: Implement framework & perform design assessments. Phase III: Transition to an HPC procurement-quality framework with acceptance-level tests.
Framework elements: value metrics (execution, development); run evaluation experiments; workflows (Production, Enterprise, Researcher); benchmarks (Activity, Purpose); preliminary multilevel system models & prototypes, leading to final multilevel system models & SN001. Participants: HPCS vendors, HPCS FFRDC & government R&D partners, mission agencies, and a commercial or nonprofit productivity sponsor.
Productivity framework: Phase 1, definition; Phase 2, implementation; Phase 3, transition. HPCS needs to develop a procurement-quality assessment methodology that will be the basis of HPC procurements.

10 HPCS Phase II Teams
Industry (PIs: Elnozahy, Rulifson, Smith): IBM, Sun, Cray. Goal: provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010).
Productivity Team (Lincoln lead; PI: Kepner, MIT Lincoln Laboratory; PIs: Lucas, Basili, Benson & Snavely, Koester, Vetter, Lusk, Post, Bailey, Gilbert, Edelman, Ahalt, Mitchell): Lincoln, ISI, UMD, UCSD, MITRE, LLNL, LANL, ANL, LBL, UCSB, LCS, OSU (Ohio State), CodeSourcery. Goal: develop a procurement-quality assessment methodology that will be the basis of HPC procurements.

11 Motivation: Metrics Drive Designs
"You get what you measure."
Execution time (example): current metrics favor caches and pipelines, so systems are ill-suited to applications with low spatial locality and low temporal locality. The slide plots applications on spatial-vs-temporal-locality axes: Table Toy (GUPS; intelligence), large FFTs (reconnaissance), adaptive multi-physics (weapons design, vehicle design, weather), and Top500 Linpack Rmax, with HPCS targeting the tradeoff space beyond current practice.
Development time (example): no metrics are widely used, and least-common-denominator standards are difficult to use and difficult to optimize. A companion chart plots language expressiveness against language performance: Matlab/Python, UPC/CAF, C/Fortran with MPI/OpenMP, SIMD/DMA, StreamsAdd, and assembly/VHDL, with HPCS aiming at high-performance high-level languages.
Goal: allow quantitative tradeoffs between execution time and development time.
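The Table Toy (GUPS) workload mentioned above is a useful concrete picture of why cache-and-pipeline-oriented metrics mislead: its updates hit essentially random memory locations. Below is a minimal serial sketch of a RandomAccess-style update loop (table size and update count are illustrative assumptions, and this is not the official HPC Challenge code):

```java
import java.util.Random;

/** Minimal serial sketch of a GUPS-style (RandomAccess) kernel. Updates land at
    random table locations, so there is almost no spatial or temporal locality
    for caches to exploit. Sizes below are illustrative assumptions. */
public class GupsSketch {
    public static void main(String[] args) {
        final int logSize = 20;                 // 2^20-entry table (assumption)
        final int size = 1 << logSize;
        final long[] table = new long[size];
        for (int i = 0; i < size; i++) table[i] = i;

        final long updates = 4L * size;         // update count (assumption)
        final Random rng = new Random(42);
        final long start = System.nanoTime();
        for (long u = 0; u < updates; u++) {
            long r = rng.nextLong();
            int idx = (int) (r & (size - 1));   // random index defeats spatial locality
            table[idx] ^= r;                    // read-modify-write of a "cold" location
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Giga-updates per second (approx): %.4f%n", updates / seconds / 1e9);
    }
}
```

Measured against a metric like Linpack Rmax, a kernel like this looks terrible on conventional systems, which is exactly the point of the slide: the metric you optimize for shapes the machine you get.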

12 Phase 1: Productivity Framework
Work flows and Activity & Purpose benchmarks drive an actual system or model (through a common modeling interface), yielding productivity metrics: productivity as the ratio of utility to cost, with execution time and development time as the main cost components.
Execution time (cost) system parameters (examples): BW bytes/flop (balance), memory latency, memory size, processor flop/cycle, processor integer op/cycle, bisection BW, size (ft3), power/rack, facility operation, …
Development time (cost) parameters (examples): code size, restart time (reliability), code optimization time, …
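Read literally, the framework's central quantity is a ratio. As a hedged illustration in my own notation (not one of the program's specific prototype models), it can be written as:

```latex
% Illustrative reading of "Productivity = Utility / Cost"; notation is mine, not an official HPCS model.
\Psi \;=\; \frac{\text{Utility}}{\text{Cost}}
    \;\approx\; \frac{U(T_{\text{solution}})}{C_{\text{development}} + C_{\text{execution}}}
```

Here U is the mission utility of obtaining a solution (typically decreasing as time-to-solution grows), and the two cost terms are driven by the development-time and execution-time parameters listed above.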

13 Phase 2: Implementation
Phase 2 performers populate the framework: MITRE, ISI, LBL, Lincoln, HPCMO, LANL & mission partners; Activity & Purpose benchmarks (Lincoln, OSU, CodeSourcery); performance analysis (ISI, LLNL & UCSD); metrics analysis of current and new codes (Lincoln, UMD & mission partners); university experiments (MIT, UCSB, UCSD, UMD, USC); ISI, LLNL & UCSD; ANL & Pmodels group.
As in Phase 1, work flows and benchmarks drive an actual system or model through a common modeling interface (with Exe and Dev interfaces), yielding productivity as the ratio of utility to cost. Execution time (cost) system parameters (examples): BW bytes/flop (balance), memory latency, memory size, processor flop/cycle, processor integer op/cycle, bisection BW, size (ft3), power/rack, facility operation, … Development time (cost) parameters (examples): code size, restart time (reliability), code optimization time, …
Contains Proprietary Information - For Government Use Only

14 HPCS Mission Work Flows
Each mission work flow has an overall cycle and a development cycle:
Researcher: overall cycle of theory and experiment with design, simulation, and visualization (days to hours); development cycle of design, code, test, and prototyping (hours to minutes).
Enterprise: overall cycle driven by an observe-orient-decide-act loop (months to days); development cycle of porting legacy software, design, code, prototyping, test, optimize, and scale (months to days).
Production: overall cycle of design, initial product development, code development, initial evaluation, operation, and maintenance (years to months), with hours-to-minutes response time; development includes test and port, scale, optimize.
HPCS productivity factors (Performance, Programmability, Portability, and Robustness) are very closely coupled with each work flow.

15 HPC Workflow SW Technologies
Many technologies target specific pieces of the production workflow, which runs from workstation to supercomputer through stages such as algorithm development, spec, design, code, test, port, scale, optimize, and run. We need to quantify workflows (stages and % of time spent) and to measure technology impact on stages; a small sketch of the time-per-stage idea follows.
Technology categories span operating systems, compilers, libraries, tools, and problem solving environments, in both mainstream and HPC software: Linux, RT Linux, Matlab, Java, C++, OpenMP, F90, UPC, Coarray, ATLAS, BLAS, FFTW, PETE, PAPI, VSIPL, ||VSIPL++, CORBA, MPI, DRI, UML, Globus, TotalView, CCA, ESMF, POOMA, PVL.
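To make "stages and % time spent" concrete, here is a minimal sketch that aggregates time per stage from a timestamped log of stage transitions; the stage names, timestamps, and log format are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch: given timestamped workflow-stage transitions (hypothetical data and
    format), compute the percentage of total time spent in each stage. */
public class StageTimeSketch {
    public static void main(String[] args) {
        // Minutes since start at which each stage was entered; the final entry marks the end.
        long[] enteredAt = {0, 55, 180, 400, 460};
        String[] stage   = {"spec", "design/code/test", "port/scale/optimize", "run", "done"};

        Map<String, Long> minutesInStage = new LinkedHashMap<>();
        for (int i = 0; i < stage.length - 1; i++) {
            minutesInStage.merge(stage[i], enteredAt[i + 1] - enteredAt[i], Long::sum);
        }
        long total = enteredAt[enteredAt.length - 1] - enteredAt[0];
        minutesInStage.forEach((name, minutes) ->
            System.out.printf("%-20s %5.1f%%%n", name, 100.0 * minutes / total));
    }
}
```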

16 Prototype Productivity Models
Special model with work estimator (Sterling); efficiency and power (Kennedy, Koelbel, Schreiber); utility (Snir); productivity factor based (Kepner); CoCoMo II (software engineering community); least action (Numrich); time-to-solution (Kogge).
HPCS has triggered groundbreaking activity in understanding HPC productivity: a community focused on quantifiable productivity, with potential for broad impact.

17 Example Existing Code Analysis
Analysis of existing codes is used to test metrics and identify important trends in productivity and performance.

18 Example Experiment Results (N=1)
Same application (image filtering), same programmer, different languages/libraries (Matlab, C++, C): Matlab, BLAS, BLAS/OpenMP, BLAS/MPI*, PVL/BLAS/MPI*, MatlabMPI, and pMatlab* (*estimated). The chart plots performance (speedup x efficiency) against development time (lines of code), grouping the variants into single processor, shared memory, distributed memory, current practice, and research categories.
Controlled experiments can potentially measure the impact of different technologies and quantify development-time and execution-time tradeoffs.

19 Summary
Goal is to develop an acquisition-quality framework for HPC systems that includes development time and execution time. Have assembled a team that will develop models, analyze existing HPC codes, develop tools, and conduct HPC development-time and execution-time experiments.
Measures of success: acceptance by users, vendors, and the acquisition community; ability to quantitatively explain HPC rules of thumb ("OpenMP is easier than MPI, but doesn't scale as high", "UPC/CAF is easier than OpenMP", "Matlab is easier than Fortran, but isn't as fast"); ability to predict the impact of new technologies.

20 Example Development Time Experiment
Goal: Quantify development time vs. execution time tradeoffs of different parallel programming models: message passing (MPI), threaded (OpenMP), and array (UPC, Co-Array Fortran).
Setting: senior/first-year grad class in parallel computing (MIT/BU, Berkeley/NERSC, CMU/PSC, UMD/?, …).
Timeline: Month 1: intro to parallel programming. Month 2: implement serial version of a compact app. Month 3: implement parallel version.
Metrics: development time (from logs), SLOCs, function points, …; execution time, scalability, comp/comm, speedup, …
Analysis: development time vs. execution time of the different models; performance relative to an expert implementation; size relative to an expert implementation.
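For a sense of the derived numbers such an experiment would report, here is a minimal sketch (all measurements are made-up placeholders) computing speedup, parallel efficiency, and code size relative to an expert implementation:

```java
/** Minimal sketch of derived experiment metrics. All inputs are hypothetical placeholders,
    not data from the actual classroom studies. */
public class ExperimentMetricsSketch {
    public static void main(String[] args) {
        double serialSeconds = 480.0;    // student's serial compact-app run time (assumption)
        double parallelSeconds = 75.0;   // student's parallel run time on 8 processors (assumption)
        int processors = 8;
        int studentSloc = 650;           // student source lines of code (assumption)
        int expertSloc = 400;            // expert reference implementation (assumption)

        double speedup = serialSeconds / parallelSeconds;
        double efficiency = speedup / processors;
        double relativeSize = (double) studentSloc / expertSloc;

        System.out.printf("Speedup: %.2fx  Efficiency: %.1f%%  Size vs. expert: %.2fx%n",
                speedup, 100.0 * efficiency, relativeSize);
    }
}
```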

21 Hackystat in HPCS

22 About Hackystat
Five years old: I wrote the first LOC during the first week of May 2001. Current size: 320,562 LOC (not all mine); ~5 active developers; open source (GPL).
General application areas: Education (teaching measurement in SE); Research (Test Driven Design, Software Project Telemetry, HPCS); Industry (project management). Hackystat has inspired a startup: 6th Sense Analytics.

23 Goals for Hackystat-HPCS
Support automated collection of useful low-level data for a wide variety of platforms, organizations, and application areas. Make Hackystat low-level data accessible in a standard XML format for analysis by other tools. Provide workflow and other analyses over low-level data collected by Hackystat and other tools to support: discovery of development bottlenecks; insight into the impact of tool/language/library choice for specific applications/organizations.

24 Pilot Study, Spring 2006
Goal: Explore issues involved in workflow analysis using Hackystat and students.
Experimental conditions (which were challenging): undergraduate HPC seminar; 6 students total, of whom 3 did the assignment and 1 collected data; 1 week duration; a Gauss-Seidel iteration problem, written in C using the PThreads library, on a cluster.
As a pilot study, it was successful.
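For context on the assignment itself, the core computation is a Gauss-Seidel sweep over a grid. The sketch below shows only the serial numerical kernel, written in Java purely for illustration (the students worked in C with PThreads on a cluster; grid size, boundary values, and tolerance are assumptions):

```java
/** Serial sketch of a Gauss-Seidel sweep for a 2-D Laplace-style problem.
    Illustrative only: the pilot-study assignment was in C with PThreads. */
public class GaussSeidelSketch {
    public static void main(String[] args) {
        int n = 64;                                   // interior grid size (assumption)
        double[][] u = new double[n + 2][n + 2];
        for (int i = 0; i < n + 2; i++) {             // fixed boundary values (assumption)
            u[i][0] = 1.0;
            u[i][n + 1] = 1.0;
        }

        double tol = 1e-4;
        double maxDelta;
        do {
            maxDelta = 0.0;
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= n; j++) {
                    double updated = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]);
                    maxDelta = Math.max(maxDelta, Math.abs(updated - u[i][j]));
                    u[i][j] = updated;                // in-place update uses freshly computed neighbors
                }
            }
        } while (maxDelta > tol);
        System.out.printf("Converged; center value = %.4f%n", u[n / 2][n / 2]);
    }
}
```

Parallelizing the in-place sweep is what makes the assignment interesting: each update depends on neighbors computed in the same iteration, so students must partition the grid and coordinate threads.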

25 Data Collection: Sensors
Sensors for Emacs and Vim captured editing activities. A sensor for CUTest captured testing activities. A sensor for the shell captured command line activities. A custom makefile provided compilation, testing, and execution targets, each instrumented with sensors.
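As an illustration of what this kind of instrumentation can look like, here is a hedged sketch of a command-line sensor wrapper; the class name, output file, and XML record shape are hypothetical, and this is not the actual Hackystat sensor API:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.time.Instant;

/** Hypothetical command-line sensor wrapper (not the real Hackystat sensor API):
    runs a tool such as "make test" and appends a timestamped XML event record
    that downstream analyses could consume. */
public class ShellSensorSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        String command = String.join(" ", args);                       // e.g. "make test"
        Instant start = Instant.now();
        int exitCode = new ProcessBuilder(args).inheritIO().start().waitFor();
        Instant end = Instant.now();

        String record = String.format(
            "<Event tool=\"shell\" command=\"%s\" start=\"%s\" end=\"%s\" exit=\"%d\"/>%n",
            command, start, end, exitCode);
        try (FileWriter out = new FileWriter("sensordata.xml", true)) { // hypothetical log file
            out.write(record);
        }
    }
}
```

Invoked as, say, "java ShellSensorSketch make test", it records the compile/test/run activity without changing what the developer types beyond the wrapper itself.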

26 Example data: Editor activities

27 Example data: Testing

28 Example data: File Metrics

29 Example data: Shell Logger

30 Data Analysis: Workflow States
Our goal was to see if we could automatically infer the following developer workflow states: serial coding, parallel coding, validation/verification, debugging, and optimization. (A sketch of one possible inference pass follows.)
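One possible shape for that inference pass, sketched with hypothetical event fields and rules (these are not the actual Hackystat types or the heuristics described on the following slides, just an illustration of the idea):

```java
/** Hedged sketch of mapping low-level sensor events to candidate workflow states.
    Event fields, tool names, and rules are hypothetical illustrations. */
public class WorkflowInferenceSketch {
    enum State { SERIAL_CODING, PARALLEL_CODING, TESTING, DEBUGGING, OPTIMIZATION, UNKNOWN }

    /** Simplified sensor event: the tool that fired, the file involved, and a
        parallel-construct count (e.g. from SCLC-style analysis). */
    record Event(String tool, String file, int parallelConstructCount) {}

    static State classify(Event e) {
        switch (e.tool()) {
            case "editor":                                // Emacs/Vim edit event
                return e.parallelConstructCount() > 0
                        ? State.PARALLEL_CODING
                        : State.SERIAL_CODING;
            case "cutest": return State.TESTING;          // unit-test invocation
            case "run":    return State.OPTIMIZATION;     // repeated executions suggest tuning
            default:       return State.UNKNOWN;          // debugging remains hard to infer
        }
    }

    public static void main(String[] args) {
        Event[] stream = {
            new Event("editor", "solver.c", 0),
            new Event("editor", "solver.c", 3),
            new Event("cutest", "test_solver.c", 0),
            new Event("run", "solver", 0),
        };
        for (Event e : stream) System.out.println(e.file() + " -> " + classify(e));
    }
}
```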

31 Workflow State Detection: Serial coding
We defined the "serial coding" state as the editing of a file not containing any parallel constructs, such as MPI, OpenMP, or PThread calls. We determine this through the MakeFile, which runs SCLC over the program at compile time and collects Hackystat FileMetric data that provides counts of parallel constructs. We were able to identify the Serial Coding state if the MakeFile was used consistently.

32 Workflow State Detection: Parallel Coding
We defined the "parallel coding" state as the editing of a file containing a parallel construct (MPI, OpenMP, PThread call). Similarly to serial coding, we get the data required to infer this phase using a MakeFile that runs SCLC and collects FileMetric data. We were able to identify the parallel coding state if the MakeFile was used consistently.

33 Workflow State Detection: Testing
We defined the "testing" state as the invocation of unit tests to determine the functional correctness of the program. Students were provided with test cases and the CUTest framework to test their program. We were able to infer the Testing state if CUTest was used consistently.

34 Workflow State Detection: Debugging
We have not yet been able to generate satisfactory heuristics to infer the "debugging" state from our data. Students did not use a debugging tool that would have allowed instrumentation with a sensor. Data for the UMD heuristics, such as the presence of "printf" statements, was not collected by SCLC. Debugging is also entwined with testing.

35 Workflow State Detection: Optimization
We have not yet been able to generate satisfactory heuristics to infer the "optimization" state from our data. Students did not use a performance analysis tool that would have allowed instrumentation with a sensor. Repeated command line invocation of the program could potentially identify the activity as "optimization".
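One hedged heuristic along these lines: flag a window of shell events as "optimization" when the same program is run repeatedly with relatively little editing in between. The event shape and thresholds below are illustrative assumptions:

```java
import java.util.List;

/** Hedged sketch of an "optimization" heuristic: repeated command-line invocations of
    the same program with few intervening edits. Thresholds and event shape are assumptions. */
public class OptimizationHeuristicSketch {
    record ShellEvent(long minute, String command) {}

    static boolean looksLikeOptimization(List<ShellEvent> window, String program, int minRuns) {
        long runs = window.stream()
                          .filter(e -> e.command().startsWith("./" + program)).count();
        long edits = window.stream()
                           .filter(e -> e.command().startsWith("vim")
                                     || e.command().startsWith("emacs")).count();
        return runs >= minRuns && runs > edits;   // mostly re-running, not rewriting
    }

    public static void main(String[] args) {
        List<ShellEvent> window = List.of(
            new ShellEvent(0, "./gauss 4"), new ShellEvent(2, "./gauss 8"),
            new ShellEvent(5, "vim gauss.c"), new ShellEvent(9, "./gauss 8"));
        System.out.println("optimization? " + looksLikeOptimization(window, "gauss", 3));
    }
}
```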

36 Insights from the pilot study, 1
Automatic inference of these workflow states in a student setting requires: Consistent use of MakeFile (or some other mechanism to invoke SCLC consistently) to infer serial coding and parallel coding workflow states. Consistent use of an instrumented debugging tool to infer the debugging workflow state. Consistent use of an "execute" MakeFile target (and/or an instrumented performance analysis tool) to infer the optimization workflow state.

37 Insights from the pilot study, 2
Ironically, it may be easier to infer workflow states in industrial settings than in classroom settings! Industrial settings are more likely to use a wider variety of tools, which could be instrumented to provide better insight into development activities. Large scale programming leads inexorably to consistent use of MakeFiles (or similar scripts), which should simplify state inference.

38 Insights from the pilot study, 3
Are we defining the right set of workflow states? For example, the "debugging" phase seems difficult to distinguish as a distinct state. Do we really need to infer "debugging" as a distinct activity? Workflow inference heuristics appear to be highly contextual, depending upon the language, toolset, organization, and application. (This is not a bug; it is just reality. We will probably need to enable each mission partner (MP) to develop heuristics that work for them.)

39 Next steps
Graduate HPC classes at UH: the instructor (Henri Casanova) has agreed to participate with UMD and UH/Hackystat in data collection and analysis. Bigger assignments, more sophisticated students, and hopefully a larger class!
Workflow Inference System for Hackystat (WISH): support export of raw data to other tools; support import of raw data from other tools; provide a high-level rule-based inference mechanism to support organization-specific heuristics for workflow state identification (a sketch of such a rule hook appears below).
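A hedged sketch of what an organization-specific, rule-based inference hook could look like; the interface and names are hypothetical, and WISH's actual design may differ:

```java
import java.util.List;
import java.util.Optional;

/** Hedged sketch of a rule-based workflow-inference hook in the spirit of WISH.
    Interface and names are hypothetical; the real system may differ. */
public class WishRuleSketch {
    record SensorEvent(String tool, String detail) {}

    /** An organization-specific rule maps a window of events to a workflow state, if it matches. */
    interface WorkflowRule {
        Optional<String> infer(List<SensorEvent> window);
    }

    public static void main(String[] args) {
        // Example rule a mission partner might register: CUTest activity implies "testing".
        WorkflowRule testingRule = window -> {
            if (window.stream().anyMatch(e -> e.tool().equals("cutest"))) {
                return Optional.of("testing");
            }
            return Optional.empty();
        };

        List<SensorEvent> window = List.of(
            new SensorEvent("editor", "solver.c"),
            new SensorEvent("cutest", "test_solver.c"));
        System.out.println(testingRule.infer(window).orElse("unknown"));
    }
}
```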

