
1 Working Group 1 Enabling Technologies Chair: Sheila Vaidya Vice Chair: Stu Feldman

2 WG 1 – Enabling Technologies Charter Charter –Establish the basic technologies that may provide the foundation for important advances in HEC capability, and determine the critical tasks required before the end of this decade to realize their potential. Such technologies include hardware devices or components and the basic software approaches and components needed to realize advanced HEC capabilities. Chair –Sheila Vaidya, Lawrence Livermore National Laboratory Vice-Chair –Stuart Feldman, IBM

3 WG 1 – Enabling Technologies Guidelines and Questions As input to HECRTF charge (1a), please provide information about key technologies that must be advanced to strengthen the foundation for developing new generations of HEC systems. Include discussion of promising novel hardware and software technologies with potential pay-off for HEC Provide brief technology maturity roadmaps and investments, with discussion of costs to develop these technologies Discuss technology dependencies and risks (for example, does the roadmap depend on technologies yet to be developed?) Example topics: –semiconductors, memory (e.g. MRAM), networks (e.g. optical), packaging/cooling, novel logic devices (e.g. RSFQ), alternative computing models

4 Working Group Participants Kamal Abdali, NSF; Fernand Bedard, NSA; Herbert Bennett, NIST; Ivo Bolsens, Xilinx; Jon Boyens, DOC; Bob Brodersen, UC Berkeley; Yolanda Comedy, IBM; Loring Craymer, JPL; Bronis R. de Supinski, LLNL; Martin Deneroff, SGI; Stuart Feldman, IBM (Vice-Chair); Sue Fratkin, CASC; David Fuller, JNIC/Raytheon; Gary Hughes, NSA; Tyce McLarty, LLNL; Kevin Martin, Georgia Tech; Virginia Moore, NCO/ITRD; Ahmed Sameh, Purdue; John Spargo, Northrop Grumman; William Thigpen, NASA; Sheila Vaidya, LLNL (Chair); Uzi Vishkin, U. Maryland; Steven Wallach, Chiaro

5 Timescales 0-5 years –Suitable for deployment in high-end systems within next 5 years Implies that the technology has been tried and tested in a systems context Requires additional investment beyond commercial industry 5-10 years –Suitable for deployment in high-end systems in 10 years Implies that the component has been studied and feasibility shown Requires system embodiment and growing investment 10+ years –New research, not yet reduced to practice Usefulness in systems not yet demonstrated

6 Interconnects Passive 0-5 –Optical networking –Serial optical interface 5-10 –High-density optical networking –Optical packet switching 10+ –Scalability (node density, bandwidth) Active 0-5 –Electronic cross-bar switch –Network processing on board 5-10 –Data Vortex –Superconducting cross-bar switch 10+

7 Power/Thermal Management, Packaging 0-5 –Optimization for power efficiency –2.5-D packaging –Liquid cooling (e.g., spray) 5-10 –3-D packaging and cooling (microchannel) –Active temperature response 10+ –Higher scalability concepts (improving OPS/W)

8 Single Chip Architecture 0-5 –Power-efficient designs –System on Chip; Processor-in-Memory –Reconfigurable circuits –Fine-grained irregular parallel computing 5-10 –Adaptive architecture –Optical clock distribution –Asynchronous designs 10+

9 Memory Main Memory 0-5 –Optimized memory hierarchy –Smart memory controllers 5-10 –3-D memory (e.g., MRAM) 10+ –Nanoelectronics –Molecular electronics Storage & I/O 0-5 –Object-based storage –Remote DMA –I/O controllers (MPI, etc.) 5-10 –Software for “cluster” storage access to MRAM, holographic, MEMS, STM, E-beam 10+ –Spectral hole burning –Molecular electronics

10 Device Technologies 0-5 –Silicon on Insulator, SiGe, mixed III-V devices –Integrated electro-optic and high-speed electronics 5-10 –Low-temperature CMOS –Superconducting - RSFQ 10+ –Nanotechnologies –Spintronics

11 Algorithms, SW-HW Tools 0-5 –Compiler innovations for new architectures –Tools for robustness (e.g., delay, fault tolerance) –Low-overhead coordination mechanisms –Performance monitors –Sparse matrix innovations 5-10 –Very High Level Language hardware support –Real-time performance monitoring and feedback –PRAM (Parallel Random Access Machine model) 10+ –Ideas too numerous to select

12 Generic Needs Sharing –NNIN-like consortia (National Nanotechnology Infrastructure Network) –Custom hardware production –Intellectual Property policies (open?) Tools for –Design for Testability –Physical design –Testing and Verification –Simulation –Programmability

13 High-Impact Themes 0-5 –Show value of HEC solutions to the commercial sector –Facilitate sharing and collaboration across HEC community –Technology Power/thermal management Optical networking 5-10 –Long-term consistent investment in HEC –Technology 3-D Packaging New devices (MRAM, MEMS, RSFQ) Power/thermal management & Optical – Ongoing 10+ years –Continued research for HEC

14 Working Group 2 COTS-Based Architecture Chair: Walt Brooks Vice Chair: Steve Reinhardt

15 WG2 – Architecture: COTS-based Charter Charter –Determine the capability roadmap of anticipated COTS-based HEC system architectures through the end of the decade. Identify those critical hardware and software technology and architecture developments required both to sustain continued growth and to enhance user support. Chair –Walt Brooks, NASA Ames Research Center Vice-Chair –Steve Reinhardt, SGI

16 WG2 – Architecture: COTS-based Guidelines and Questions Identify opportunities and challenges for anticipated COTS-based HEC systems architectures through the decade and determine their capability roadmap. Include alternative execution models, support mechanisms, local element and system structures, and system engineering factors to accelerate the rate of sustained performance gain (time to solution), performance to cost, programmability, and robustness. Identify those critical hardware and software technology and architecture developments required both to sustain continued growth and to enhance user support. Example topics: –microprocessors, memory, wire and optical networks, packaging, cooling, power distribution, reliability, maintenance, cost, size

17 Working Group Participants Steve Reinhardt (Co-Chair); Bill Kramer (L); Don Dossa; Dick Hildebrandt; Greg Lindahl; Tom McWilliams; Curt Janssen; Erik DeBenedictis; Walt Brooks (Chair); Rob Schreiber (L); Yuefan Deng; Steven Gottlieb; Charles Lefurgy; John Ziebarth; Stephen Wheat; Guang R. Gao; Burton Smith

18 Assumptions/Definitions Definition of “COTS-based” –Using systems originally intended for enterprise or individual use –Building blocks: commodity processors, commodity memory, and commodity disks –Somebody else builds the hardware, and you have limited influence over it –Examples In: Red Storm, Blue Planet, Altix Out: X1, Origin, SX-6 Givens –Massive disk storage (object stores) –Fast wires (SERDES-driven) –Heterogeneous systems (processors)

19 Primary Technical Findings Improve memory bandwidth –We have to be patient in the short term; for the next 2-3 years the die has been cast –Sustained memory bandwidth is not increasing fast enough –Judicious investment in the COTS vendors could affect 2008 designs Improve the interconnects (“connecting to the interconnect”) –Easier to influence than memory bandwidth –Connecting through I/O is too slow; we need to connect to the CPU at memory-equivalent speeds One example is HyperTransport, which offers memory-grade bandwidth and a well-defined interface; others are under development Provide the ability to build heterogeneous COTS-based systems –E.g., FPGAs, ASICs, … in the fabric FPGAs allow tightly coupled research on emerging execution models and architectural ideas without going to a foundry Must have software that makes FPGAs easy to program

20 Technology Influence [Chart: levels of the system stack – Board, Board/Components, Nodes/Frames, I/O Interconnect, CPU/Chips – plotted against cost to design and lead time. Direct influence ranges from roughly $0.2M and 1 year at the board level to $5-100M and 4-6 years at the chip level; indirect influence runs from $10M to $300-1,000M with 10-15 year lead times.]

21 Programmatic Approaches Develop a government-wide coordinated method for direct influence with the vendors to make design changes –Less influence with COTS manufacturers, more with COTS-based vendors –Recognize that the commercial market is the primary driver for COTS “Go in” early Develop joint government research objectives; go to vendors with a short, focused list of HEC priorities –Where possible, find common interests with the industries that drive the commodity market –“Software”: we may have more influence Fund long-term research –Academic research must have access to systems at scale in order to do relevant research –Strategy for moving university research into the market Government must be an early adopter –Risk sharing with emerging systems

22 Software Issues Not clear that these are part of our charter, but we want to be sure they are handled –Scaling “Linux” to 1000s of processors Administered at full scale for capability computing –Scalable file systems –Compiler work needed to keep pace –Managing open source Coordinating release implementation Open-source multi-vendor approach: OS, languages, libraries, debuggers, … –The overhead of MPI is going to swamp the interconnect and hamper scaling Need a lower-overhead approach to message passing

23 Parallel Computing Parallel computing is (now) the path to speed People think the problem is solved, but it’s not Need new benchmarks that expose the true performance of COTS If the government is willing to invest early, even at the chip level, there is the potential to influence design in a way that makes scaling “commodity” systems easier Parallel computers need to be much more general purpose than they are today –More useful, easier to use, and better balanced –Continued growth of computing may depend on it –To get significantly more performance, we must treat parallel computing as first class –COTS processors especially will be influenced only by a generally applicable approach

24 Themes From White Papers Broad themes –Exploit commodity: one system doesn’t fit all applications; for a specific family of codes, commodity can be a good solution –Unique topology and algorithmic approaches allow exploitation of current technology Novel uses of current technology (overlap with Panel 3) –RCM technology: FPGAs are faster and lower power with multiple units; hybrid FPGA-core keeps the traditional processor on chip with logic units; need hardware architects for RCM; apps suitable for RCM; RCM is about ease of programming –Streaming technology utilizing commercial chips –Fine-grained multithreading Supporting technology (overlap with Panel 1) –Self-managing, self-aware systems –MRAM, EUVL, micro-channel cooling –Power-aware computing –High-end interconnect and scalable file systems –High-performance interconnect technology, optical and others, that can scale to large systems –Systems software that scales up gracefully to enormous processor counts with reliability, efficiency, and ease of use –There is a natural layering of technologies involved in a high-performance machine: the basic silicon, the cell boards and shared-memory nodes, the cluster interconnect, the racks, the cooling, the OS kernel, the added OS services, the runtime libraries, the compilers and languages, the application libraries.

25 Relevant White Papers 18 of the 64/80 papers have some relevance to our topic 6 10 12 16 17 31 33 39 45 46 47 50 65 68 72 75 80

26 Working Group 3: Custom-Based Architectures Chair: Peter Kogge Vice Chair: Thomas Sterling

27 WG3 – Architecture: Custom based Charter Charter –Identify opportunities and challenges for innovative HEC system architectures, including alternative execution models, support mechanisms, local element and system structures, and system engineering factors to accelerate rate of sustained performance gain (time to solution), performance to cost, programmability, and robustness. Establish a roadmap of advanced-concept alternative architectures likely to deliver dramatic improvements to user applications through the end of the decade. Specify those critical developments achievable through custom design necessary to realize their potential. Chair –Peter Kogge, Notre Dame Vice-Chair –Thomas Sterling, California Institute of Technology & Jet Propulsion Laboratory

28 WG3 – Architecture: Custom based Guidelines and Questions Present driver requirements and opportunities for innovative architectures demanding custom design Identify key research opportunities in advanced concepts for HEC architecture Determine research and development challenges to promising HEC architecture strategies. Project brief roadmap of potential developments and impact through the end of the decade. Specify impact and requirements of future architectures on system software and programming environments. Example topics: –System-on-a-chip (SOC), Processor-in-memory (PIM), streaming, vectors, multithreading, smart networks, execution models, efficiency factors, resource management, memory consistency, synchronization

29 Working Group Participants Duncan Buell, U. So. Carolina; George Cotter, NSA; William Dally, Stanford Univ.; James Davenport, BNL; Jack Dennis, MIT; Mootaz Elnozahy, IBM; Bill Feiereisen, LANL; Michael Henesey, SRC Computers; David Fuller, JNIC; David Kahaner, ATIP; Peter Kogge, U. Notre Dame; Norm Kreisman, DOE; Grant Miller, NCO; Jose Munoz, NNSA; Steve Scott, Cray; Vason Srini, UC Berkeley; Thomas Sterling, Caltech/JPL; Gus Uht, U. RI; Keith Underwood, SNL; John Wawrzynek, UC Berkeley

30 Charter (from Charge) Identify opportunities & challenges for innovative HEC system architectures, including –alternative execution models, –support mechanisms, –local element and system structures, –and system engineering factors to accelerate –rate of sustained performance gain (time to solution), –performance to cost, –programmability, –and robustness. Establish roadmap of advanced-concept alternative architectures likely to deliver dramatic improvements to user applications through the end of the decade. Specify those critical developments achievable through custom design necessary to realize their potential.

31 Original Guidelines and Questions Present driver requirements and opportunities for innovative architectures demanding custom design Identify key research opportunities in advanced concepts for HEC architecture Determine research and development challenges to promising HEC architecture strategies. Project brief roadmap of potential developments and impact through the end of the decade. Specify impact and requirements of future architectures on system software and programming environments. (new) What role should/do universities play in developments in this area

32 Outline What is Custom Architecture (CA) Endgame Objectives, Benefits, & Challenges Fundamental Opportunities Delivered by CA Road Map Summary Findings Difficult fundamental challenges Roles of Universities

33 What Is Custom Architecture? Major components designed explicitly and system balanced for support of scalable, highly parallel HEC systems Exploits performance opportunities afforded by device technologies through innovative structures Addresses sources of performance degradation (inefficiencies) through specialty hardware and software mechanisms Enables higher HEC programming productivity through enhanced execution models Should incorporate COTS components where useful without sacrificing performance

34 Endgame Objectives Enable solution of –Problems we can’t solve now –And larger versions of ones we can solve now Base economic model: provides 10-100X ops per lifecycle dollar AT SCALE –vs. the inefficiencies of COTS Significant reduction in the real cost of programming –Focus on sustained performance, not peak

35 Strategic Benefits Promotes architecture diversity Performance: ops & bandwidth over COTS –Peak: 10X-100X through FPU proliferation –Memory bandwidth 10X-100X through network and signaling technology –Focus on sustainable high efficiency –Dynamic latency hiding –High system bandwidth and low latency –Low overhead Enhanced programmability –Reduced barriers to performance tuning –Enables use of programming models that simplify programming and eliminate sources of errors Scalability –Exploits parallelism at all levels Cost, size, and power –High compute density

36 Challenges To Custom Small market and limited opportunity to exploit economy of scale Development lead time Incompatibility with standard ISAs Difficulty of porting legacy codes Training of users in new execution models Unproven in the field Need to develop new software infrastructure Less frequent technology refresh Lack of vendor interest in leading edge small volumes

37 Fundamental Technical Opportunities Enabled by CA Enhanced Locality – Increasing Computation/Communication Demand Exceptional global bandwidth Architectures that enable utilization of global bandwidth Execution models that enable compiler/programmer to use the above

38 Enhanced Locality – Increasing Computation/Communication Demand Mechanisms Spatial computation via reconfigurable logic Streams that capture physical locality by observing temporal locality Vectors – scalability and locality microarchitecture enhancements PIM – capture spatial locality via high bandwidth local memory (low latency) Deep and explicit register & memory hierarchies –With software management of hierarchies Technologies Chip stacking to increase local B/W

39 Providing Exceptional Global Bandwidth Mechanisms: High radix networks Non-blocking, bufferless topologies Hardware congestion control Compiler scheduled routing Technologies: High speed signaling (system-oriented) –Optical, electrical, heterogeneous (e.g. VCSEL) Optical switching & routing High bandwidth memory device, high density Notes: Routing & flow control are nearing optimal

40 Architectures that Enable Use of Global Bandwidth Note: This addresses providing the traffic stream to utilize the enhanced network Stream and Vectors Multi-threading (SMT) Global shared memory (a communication overhead reducer) Low overhead message passing Augmenting microprocessors to enhance additional requests (T3E, Impulse) Prefetch mechanisms

41 Execution Models Note: A good model should: –Expose parallelism to compiler & system s/w –Provide an explicit performance cost model for key operations –Not constrain the ability to achieve high performance –Be easy to program Spatial direct-mapped hardware Resource flow Streams Flat vs. distributed memory (UMA/NUMA vs. message passing) New memory semantics CAF and UPC are a good first step Low-overhead synchronization mechanisms PIM-enabled: traveling threads, message-driven, active pages, ...

42 Roadmap: When to Expect CA Deployment 5 Years or less –Must have relatively mature support s/w (and/or “friendly users”) 5-10 years –Still open research issues in tools & system s/w –Approaching 10 years if requires mind set change in applications programmers 10-15 years: –After 2015 all that’s left in silicon is architecture

43 Roadmap - 5 Year Period Significant research prototype examples –Berkeley Emulation Engine: $0.4M/TF by 2004 on Immersed Boundary method codes –QCDOC: $1M/TF by 2004 –Merrimac Streaming: $40K/TF by 2006 –Note: several companies are developing custom architecture roadmaps

44 Roadmap - 5 Years or Less Technologies Ready for Insertion High-bandwidth network technology can be inserted –No software changes SMT: will be ubiquitous within 5 years –But will vendors emphasize single-thread performance instead of supporting increased parallelism? Spatial direct-mapped approach

45 Roadmap - 5 to 10 Years All prior prototypes could be expanded to reach PF sustained at competitive recurring $ Industry is targeting sustained Petaflops –If properly funded Need to encourage transfer of research results Virtually all of prior technology opportunities will be deployable –Drastic changes to programming will limit adoption

46 Roadmap: 10-15 Years Silicon scaling at sunset –Circuit, packaging, architecture, and software opportunities remain Need to start looking now at architectures that mesh with end of silicon roadmap and non-silicon technologies –Continue exponential scaling of performance –Radically different timing/RAS considerations –Spin out: how to use faulty silicon

47 Findings Significant CA-driven opportunities for enhanced Performance/Programmability –10-100X potential above COTS at the same time Multiple, CA-driven innovations identified for near & medium term –Near term: multiple proof of concept –Medium term: deployment @ petaflops scale Above potential will not materialize in current funding culture

48 Findings (2) No one side of the community can realize opportunities of future Custom Architecture: –Strong peer-peer partnering needed between industry, national labs, & academia –Restart pipeline of HEC & parallel-oriented grad students & faculty Creativity in system S/W & programming environments must support, track, & reflect creativity in HEC architecture

49 Findings (3) Need to start now preparing for end of Moore’s Law and transition into new technologies –If done right, potential for significant trickle back to silicon

50 Fundamentally Difficult Challenges Technical Newer applications for HEC OS geared specifically to highly scaled systems How to design HEC for upgradability High latency and low bandwidth of memory chips and systems File systems Reliability with unreliable components at large scale Fundamentally parallel ISAs

51 Fundamentally Difficult Challenges Cultural Instilling change into programming model Software inertia How should HEC be viewed –As a service vs product I/O, SAN, Storage systems for HEC How to define requirements

52 Universities As A Critical Resource Provide innovative concepts and long-term vision Provide students Keep the research pipeline full Good at early simulations and prototype tools Students no longer commonly exposed to massive parallelism Parallel computing architecture students in significant decline, as are those interested in HEC Difficult to roll leading-edge chips, but universities are the only place for 1st-generation prototypes of novel concepts Don’t do well at attacking the hard problems of moving beyond a 1st prototype, or at productizing Soft money makes it hard to keep teams together

53 Working Group 4: Runtime and Operating Systems Chair: Rick Stevens Vice Chair: Ron Brightwell

54 WG 4– Runtime and OS Charter Charter –Establish baseline capabilities required in the operating systems for projected HEC systems scaled to the end of this decade and determine the critical advances that must be undertaken to meet these goals. Examine the potential, expanded role of low-level runtime system components in support of alternative system architectures. Chair –Rick Stevens, Argonne National Laboratory Vice-Chair –Ron Brightwell, Sandia National Laboratory

55 WG 4– Runtime and OS Guidelines and Questions Establish principal functional requirements of operating systems for HEC systems of the end of the decade Identify current limitations of OS software and determine initiatives required to address them Discuss role of open source software for HEC community needs and issues associated with development/maintenance/use of open source Examine future role of runtime system software in the management/use of HEC systems containing from thousands to millions of nodes. Example topics: –file systems, open source software, Linux, job and task scheduling, security, grid-interoperable, memory management, fault tolerance, checkpoint/restart, synchronization, runtime, I/O systems

56 Working Group Participants Ron Brightwell; Neil Pundit; Jeff Brown; Lee Ward; Gary Grider; Ron Minnich; Leslie Hart; DK Panda; Thuc Hoang; Bob Ballance; Barney Maccabe; Wes Felter; Keshav Pingali; Deborah Crawford; Asaph Zemach; Dan Reed; Rick Stevens

57 Our Charge Establish Principal Functional Requirements of OS/runtime for systems for the end of the decade systems Assumptions: –Systems with 100K-1M nodes (fuzzy notion of node) +-order of magnitude (SMPs, etc.) –COTS and custom targets included Role of Open Source in enabling progress Formulate critical recommendations on research objectives to address the requirements

58 Critical Topics Operating System and Runtime APIs High-Performance Hardware Abstraction Scalable Resource Management File Systems and Data Management Parallel I/O and External Networks Fault Management Configuration Management OS Portability and Development Productivity Programming Model Support Security OS and Systems Software Development Test beds Role of Open Source

59 Recurring Themes Limitations of UNIX Blending of OS and runtime models Coupling apps and OS via feedback mechanisms Performance transparency (visibility) Minimalism and enabling applications access to HW Desire for more hardware support for OS functions “Clusters” are the current OS/runtime targets Lack of full-scale test beds limiting progress OS “people” need to be involved in design decisions

60 OS APIs (e.g. POSIX) Findings: –POSIX APIs not adequate for future systems Lack of performance transparency Global state assumed in POSIX semantics Recommendations: –Determine a subset of POSIX APIs suitable for high-performance computing at scale –New API development addressing scalability and performance transparency –Explicitly support research in developing non-POSIX-compatible interfaces

61 Hardware Abstractions Findings: –HAs needed for portability and improved resource management Remove dependence on physical configurations –Virtual processor abstractions (e.g. MPI processes) Virtualization to improve resource management –Virtual PIMs, etc. for improved programming model support Recommendations: –Research to determine the right candidates for virtualization –Develop low-overhead mechanisms for enabling abstraction –Make abstraction layers visible and optional where needed

62 Scalable Resource Management Findings: –Resource allocation and scheduling at the system and node level are critical for large-scale HEC systems –Memory hierarchy management will become increasingly important –Dynamic process creation and dynamic resource management increasingly important –Systems most likely to be space-shared –OS support required for management of shared resources (network, I/O, etc.)

63 Scalable Resource Management Recommendations: –Investigate new models for resource management Enabling user applications to have as much control of low-level resource management as needed Compute-node model –Minimal runtime; the app can bring as much or as little OS with it Systems/services nodes –Need more OS services to manage shared resources I/O systems and fabric need to be managed –Explore cooperative services model Some runtime and traditional OS combined into a cooperative scheme; offload services not considered critical for HEC –Increase the potential use of dynamic resource management at all levels

64 Data Management and File Systems Findings: –The POSIX model for I/O is incompatible with future systems –The passive file system model may also not be compatible with requirements for future systems Recommendations: –Develop an alternative (to POSIX) API for file system –Investigate scalable authentication and authorization schemes for data –Research scalable schemes for handling file systems (data management) metadata –Consider moving processing into the I/O paths (storage devices)

65 Parallel and Network I/O Findings: –I/O channels will be highly parallel and shared (multiple users/jobs) –External network and grid interconnects will be highly parallel –The OS will need to manage I/O and network connections as a shared resource (even in space shared systems) Recommendations: –Develop new scalable approaches to supporting I/O and network interfaces (near term) –Consider integrating I/O and network interface protocols (medium term) –Develop HEC appropriate system interfaces to grid services (long term)

66 Fault Management Findings: –Fault management is increasingly critical for HEC systems –The performance impacts of fault detection and management may be significant and unexpected –Automatic fault recovery may not be appropriate in some cases –Fault prediction will become increasingly critical Recommendations: –Efficient schemes for fault detection and prediction What can be done in hardware? –Improved runtime handling (graceful degradation) of faults –Investigate integration of fault management, diagnostics with advanced configuration management –Autonomic computing ideas relevant here

67 Configuration Management Findings: –Scalability of management tools needs to be improved Manage to a provable state (database driven management) –Support interrupted firmware/software update cycles (surviving partial updates) –New models of configuration (away from file based systems) may be important directions for the future Recommendations: –New models for systems configuration needed –Scalability research (scale invariance, abstractions) –Develop interruptible update schemes (steal from database technologies) –Fall back, fall forward –Automatic local consistency

68 OS Portability Findings: –Improving OS portability and OS/runtime code reuse will improve OS development productivity Device drivers (abstractions) Shared code base and modular software technology Recommendations: –Develop new requirements for device driver interfaces Support unification where possible and where performance permits –Consider developing a common runtime execution software platform –Research toward improving use of modularization and components in OS/runtime development

69 OS Security for HEC Systems Findings: –Current (nearly 30-year-old) Unix security model has significant limitations –Multi-level security (Orange Book-like) may be a requirement for some HEC systems –Current Unix security model is deeply coupled to current OS semantics and limits scalability in many cases Recommendations: –Active resource models Rootless, UID-less, etc. –EROS, Plan 9 models possible starting points –Fund research explicitly different from UNIX

70 Programming Model Support in OS Findings: –MPI has productivity limitations, but is the current standard for portable programming, need to push beyond MPI –UPC and CAF considered good candidates for improving productivity and probably should be targets for improved OS support Recommendations: –Determine OS level support needed for UPC and CAF and accelerate support for these (near term) –Performance and productivity tool support (debuggers, performance tools, etc.)

71 Testbeds for Runtime and OS Findings: –Lack of full-scale test beds has slowed research in scalable OS and systems software –Test beds need to be configured to support aggressive testing and development Recommendations: –Establish one or more full-scale (1,000s of nodes) test beds for the runtime, OS, and systems software research communities –Make test beds available to university, laboratory, and commercial developers

72 The Role of Open Source Findings: –Open source model for licensing and sharing of software valuable for HEC OS and runtime development –Open source (open community) development model may not be appropriate for HEC OS development –The open source contract model may prove useful (Lustre model) Recommendations: –Encourage use of open source to increase leverage in OS development –Consider creating and funding an Institute for HEC OS/runtime open source development and maintenance (keeping the HEC community in control of key software systems)

73 Working Group 5 Programming Environments and Tools Chair: Dennis Gannon Vice Chair: Rich Hirsh

74 WG5 – Programming Environments and Tools Charter Charter –Address programming environments for both existing legacy codes and alternative programming models to maintain continuity of current practices, while also enabling advances in software development, debugging, performance tuning, maintenance, interoperability and robustness. Establish key strategies and initiatives required to improve time to solution and ensure the viability and sustainability of applying HEC systems by the end of the decade. Chair –Dennis Gannon, Indiana University Vice-Chair –Rich Hirsh, NSF

75 WG5 – Programming Environments and Tools Guidelines and Questions Assume two possible paths to future programming environments: –incremental evolution of existing programming languages and tools consistent with portability of legacy codes –innovative programming models that dramatically advance user productivity and system efficiency/performance Specify requirements of programming environments and programmer training consistent with incremental evolution, including legacy applications Identify required attributes and opportunities of innovative programming methodologies for future HEC systems Determine key initiatives to improve productivity and reduce time-to-solution along both paths to future programming environments Example topics: –Programming models, portability, debugging, performance tuning, compilers

76 Charter Address programming environments for both existing legacy codes and alternative programming models to maintain continuity of current practices, while also enabling advances in software development, debugging, performance tuning, maintenance, interoperability and robustness. Establish key strategies and initiatives required to improve time to solution and ensure the viability and sustainability of applying HEC systems by the end of the decade.

77 Guidelines Assume two possible paths to future programming environments: –incremental evolution of existing programming languages and tools consistent with portability of legacy codes –innovative programming models that dramatically advance user productivity and system efficiency/performance Specify requirements of programming environments and programmer training consistent with incremental evolution, including legacy applications Identify required attributes and opportunities of innovative programming methodologies for future HEC systems Determine key initiatives to improve productivity and reduce time-to-solution along both paths to future programming environments

78 Key Findings Revitalizing evolutionary progress requires a dramatically increased investment in –Improving the quality/availability/usability of software development lifecycle tools –Building interoperable libraries and component/application frameworks that simplify the development of HEC applications Revitalizing basic research in revolutionary HEC programming technology to improve time-to-solution: –Higher Level programming models for HEC software developers that improve productivity –Research on the hardware/software boundary to improve HEC application performance

79 The Strategy Need an attitude change about software funding for HEC. –Software is a major cost component for all modern complex technologies. Mission-critical and basic-research HEC software is not provided by industry –Need federally funded management and coordination of the development of high-end software tools. –Funding is needed for Basic research and software prototypes Technology transfer: –moving successful research prototypes into real production-quality software. –Structural changes are needed to support sustained engineering Software capitalization program Institute for HEC advanced software development and support. –Could be a cooperative effort among industry, labs, and universities.

80 The Strategy A new approach is needed to education for HEC. –A national curriculum is needed for high performance computing. –Continuing education and building interdisciplinary science research. –A national HEC testbed for education and research

81 The State of the Art in HEC Programming Languages (used in legacy software) –A blend of traditional scientific programming languages and scripting languages plus parallel communication libraries and parallel extensions: (Fortran 66-95, C++, C, Python, Matlab) + MPI + OpenMP/threads, HPF Programming Models in current use –Traditional serial programming –Global address space or partitioned memory space (MPI on Linux clusters) –SPMD vs MPMD

82 The Evolutionary Path Forward Already Exists For Languages –Co-array Fortran, UPC, Adaptive MPI, specialized C++ template libraries For Models –Automatic parallelization of whole-program serial legacy codes is no longer considered sufficient, but it is important for code generation for procedure bodies on modern processors. –Multi-paradigm parallel programming is a desirable goal and within reach

83 Short Term Needs There is clearly very slow progress in evolving HEC software practices toward new languages and programming models. The rest of the software industry is moving much faster. –What is the problem? –Scientists/engineers continue to use the old approaches because they are still perceived as the shortest path to the goal … a running code. In the short term, we need –A major initiative to improve the software design, debugging, testing, and maintenance environment for HEC systems

84 The Components of a Solution Our applications are rapidly evolving into multi-language, multi-disciplinary, multi-paradigm software systems –High end computing has been shut out of a revolution in software tools When tools have been available, centers can’t afford to buy them. Scientific programmers are not trained in software engineering. –For example, industrial-quality build, configure, and testing tools are not available for HEC applications/languages. We need portability of software maintenance tools across HEC platforms.

85 The Components of a Solution We need a rapid evolution of all language processing tools –Extensible standards are needed: examples include language object file formats and compiler intermediate forms. –Want complete interoperability of all software lifecycle tools. Performance analysis should be part of every step of the life cycle of a parallel program –Feedback from program execution can drive automatic analysis and optimization.

86 The Evolution of HEC Software Libraries The increasing complexity of scientific software (multi-disciplinary, multi-paradigm) has other side effects –Libraries are an essential way to encapsulate algorithmic complexity, but Parallel libraries are often difficult to compose because of low-level conflicts over resources. Libraries often require low-level flat interfaces. We need a mechanism to exchange more complex and interesting data structures. Software component technology and domain-specific application frameworks are one solution

87 Components and Application Frameworks Provide an approach to factoring legacy code into reusable components that can be flexibly composed. –Resources are managed by the framework while components encapsulate algorithmic functionality Provide polymorphism and evolvability. Abstract the hardware/software boundary and allow better language independence/interoperability. Testing/validation is made easier. Can ensure components can be trusted. May enable a marketplace of software libraries and components for HEC systems. However, there is no free lunch. –It may be faster to build a reliable application from reusable components, but will it have performance scalability? Initial results indicate the answer is yes.

88 Revolutionary Approaches: The Long-Range View We still have problems getting efficiency out of large-scale cluster architectures. A long-range program of research is needed to explore –New programming models for HEC systems –Scientific languages of the future: The scientist does not think about concurrency but rather about science. Expressing concurrency as the natural parallelism in the problem. –Integrating a locality model into the problem can be the real challenge Languages built from first principles to support the appropriate abstractions for scalable parallel scientific codes (e.g. ZPL).

89 New Abstractions for Parallel Program Models Approaches that promote automatic resource management. Integration of user-domain abstractions into compilation. –Extensible Compilers Telescoping languages –application level languages transformed into high level parallel languages transformed into … Locality may be part of new programming models. To be able to publish and discover algorithms. Automatic generation of missing components. Integrating persistence into programming model. Better support for transactional interactions in applications

90 Programming Abstractions cont. Better separation of data structure and algorithms. Programming by contract –Quality of service, performance and correctness Integration of declarative and procedural programming Roundtrip engineering/model driven software –Reengineering: Specification to design and back Type systems that have better support for architectural properties.

91 Research on the Hardware/Software Boundary Instruction set architecture –Performance counters, interaction with VM and the memory hierarchy Open bidirectional APIs between hardware and software Programming methodology for reconfigurable hardware will be a significant challenge. Changing memory consistency models depending on applications.

92 Research on the Hardware/Software Boundary Predictability (scheduling, fine-grained timing, memory mapping) is essential for scalable optimization. Fault Tolerance/awareness –For systems with millions of processors, the applications/runtime/OS will need to be aware of and have mechanisms to deal with faults. –Need mechanisms to identify and deal with faults at every level. –Develop programming models that better support non-determinism (including desirable but boundedly-incorrect results).
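As a hedged illustration of the application-level fault handling this finding calls for (not from the slides; the loop body, state layout, and use of an in-memory buffer are all invented for the sketch), the simplest mechanism is checkpoint/restart:

```python
import io
import pickle

# Sketch: an iterative "application" that saves restartable state each
# step, so a fault costs at most one step of lost work.
def run(steps, state=None, checkpoint=None):
    state = state if state is not None else {"step": 0, "acc": 0}
    while state["step"] < steps:
        state["acc"] += state["step"]        # stand-in for the science per step
        state["step"] += 1
        if checkpoint is not None:
            checkpoint.seek(0)
            pickle.dump(state, checkpoint)   # persist restartable state
    return state

buf = io.BytesIO()                           # a real code would use a file
final = run(10, checkpoint=buf)

# simulate recovery after a fault: reload the last checkpoint
buf.seek(0)
restored = pickle.load(buf)
assert restored == final
```

A production HEC runtime would checkpoint to parallel storage and coordinate across processes; the point here is only that the restartable-state mechanism lives at the application/runtime level, as the slide suggests.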

93 Hardware/Software Boundary: Memory Hierarchy There are limits to what we can do with legacy code that has bad memory locality problems. Software needs better control of data-structure-to-hierarchy layout. New solutions: –Cache-aware/cache-oblivious algorithms –Need more research on the role of virtual memory or file caching. –Threads can be used to hide latency. –New ways to think about data structures. First-class support for hierarchical data structures. –Streaming models –Integration of persistence and aggressive use of temporal locality. –Separation of algorithm and data structure, i.e. generic programming. –Support from system software/hardware to control aspects of the memory hierarchy.
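To make the cache-aware bullet concrete, here is an illustrative sketch (not from the slides, and in Python rather than a production HEC language): a blocked matrix transpose, the classic cache-aware access pattern. The block size B is a tunable assumption, not a recommended value.

```python
# Naive transpose: strides through one operand column-wise, which on a
# real machine touches a new cache line on almost every access.
def transpose_naive(a, n):
    return [[a[j][i] for j in range(n)] for i in range(n)]

# Blocked (tiled) transpose: processes one BxB tile at a time so both
# the source and destination tiles stay cache-resident.
def transpose_blocked(a, n, B=32):
    t = [[0] * n for _ in range(n)]
    for ii in range(0, n, B):
        for jj in range(0, n, B):
            for i in range(ii, min(ii + B, n)):
                for j in range(jj, min(jj + B, n)):
                    t[j][i] = a[i][j]
    return t
```

In Fortran or C the same tiling pattern gives directly measurable speedups; Python only shows the structure of the transformation.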

94 Best Practices and Education Education is crucial for the effective use of HEC systems. –Apps are more interdisciplinary Requires interdisciplinary teams of people: –Drives need for better software engineering. –Application scientists do not need to be experts in parallel programming. Multi-disciplinary teams include computer scientists. –Students need to be motivated to learn that performance is fun. Updated curriculum to use HEC systems. Educators/students need access to HEC systems –Need to increase support for student fellowships in HEC.

95 Working Group 6 Performance Modeling, Metrics and Specifications Chair: David Bailey Vice Chair: Allan Snavely

96 WG6 – Performance Modeling, Metrics, and Specifications Charter Charter –Establish objectives of future performance metrics and measurement techniques to characterize system value and productivity to users and institutions. Identify strategies for evaluation including benchmarking of existing and proposed systems in support of user applications. Determine parameters for specification of system attributes and properties. Chair –David Bailey, Lawrence Berkeley National Laboratory Vice-Chair –Allan Snavely, UC San Diego

97 WG6 – Performance Modeling, Metrics, and Specifications Guidelines and Questions As input to HECRTF charge (2c), provide information about the types of system design specifications needed to effectively meet various application domain requirements. Examine current state and value of performance modeling, metrics for HEC and recommend key extensions Analyze performance-based procurement specifications for HEC that lead to appropriately balanced systems. Recommend initiatives needed to overcome current limitations in this area. Example topics: –Metrics, time to solution, measurement and modeling methods, benchmarking, specification parameters, time to solution and relationship to fault-tolerance

98 Working Group Participants David Bailey Stan Ahalt Stephen Ashby Rupak Biswas Patrick Bohrer Carleton DeTar Jack Dongarra Ahmed Sameh Brent Gorda Adolfy Hoisie Sally McKee David Nelson Allan Snavely Jeffrey Vetter Theresa Windus Patrick Worley and others

99 Charter Establish objectives of future performance metrics and measurement techniques to characterize system value and productivity to users and institutions. Identify strategies for evaluation including benchmarking of existing and proposed systems in support of user applications. Determine parameters for specification of system attributes and properties.

100 Fundamental Metrics Best single overriding metric: time to solution. Time to solution includes: Execution time. Time spent in batch queues. System background interrupts and other overhead. Time lost due to scheduling inefficiencies, downtime. Programming time, debugging and tuning time. Pre-processing: grid generation, problem definition, etc. Post-processing: output data management, visualization, etc. System balance depends on workload – there is no one formula for all applications.
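A minimal arithmetic sketch of the slide's definition, with entirely hypothetical hours, shows why execution time alone is a poor metric:

```python
# Hypothetical figures for illustration only: time to solution as the
# slide defines it is the sum of far more than raw execution time.
components_hours = {
    "execution": 120,
    "batch queue wait": 40,
    "system overhead / downtime": 10,
    "programming, debugging, tuning": 300,
    "pre-processing (grid generation)": 60,
    "post-processing (visualization)": 50,
}
time_to_solution = sum(components_hours.values())
print(time_to_solution)  # 580 hours; execution is roughly a fifth of it
```

With these invented numbers, a 2x faster machine that does nothing for programming or pre/post-processing shaves only about 10% off the total.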

101 Related Factors Programming time and difficulty –Must be better understood (and reduced). –Identify key factors affecting development time. –Identify HPC-relevant techniques from software engineering. –Closely connected to research in programming models and languages. System-level efficiency –Some metrics exist (e.g., ESP). Performance stability Grid generation, problem definition, etc. –For some applications, this step requires more effort than the computational step. –No good metrics exist at present.

102 Current Best Practice for Procurements Characterize machines via micro-benchmarks and synthetic benchmarks run on available machines. –Numerous general specifications. –Results on some standard benchmarks. –Results on application benchmarks (different for each procurement). Identify and track applications of interest. –Use modeling to characterize performance. –Validate models on the largest available system of that kind. Treat selection as an optimization problem with constraints, including performance, dollars, floor space, and power. –This step is not standardized; it is currently ad hoc. This approach is inadequate for selecting systems 10x or more beyond systems in use at a given point in time.

103 Toward Performance-Based System Selection Procurements or other system selections should not be based on any single figure of merit. Can various agencies converge on a reference set of discipline-specific benchmark applications? On a set of micro-benchmarks? How can we better handle intellectual property and classified code issues in procurements? Accurate performance modeling holds the best promise for simplifying procurement benchmarking.

104 Performance Modeling Goals: A set of low-level basic system metrics, plus a solid methodology for accurately projecting the performance of a specific high-level application program on a specific high-end system. Challenges: –Current approaches require significant skill and expertise. –Current approaches require large amounts of run time. –Fast, nearly automatic, easy-to-use schemes are needed. Benefits: –Architecture research –Procurements –Vendors –Users

105 Potential Modeling Impact Influence architecture early in the design cycle Improve applications development –Use modeling in the entire lifecycle of an application, including algorithmic selection, code development, software engineering, deployment, tuning. Impact assessment –Project new science enabled by a proposed petaflop system. Research needed in: –Novel approaches to performance modeling: analytical, statistical, kernels and benchmarks, synthetic programs. –How to deal with exploding quantity of performance data on systems with 10,000+ CPUs. –Online reduction of trace data.
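One minimal analytical model of the kind this research agenda asks for, with invented machine parameters (1 Tflop/s peak, 0.25 TB/s memory bandwidth) and invented kernel counts, is a roofline-style lower bound:

```python
# Sketch of an analytical performance model: a kernel's time is bounded
# below by whichever of compute and memory traffic takes longer.
def projected_time(flops, bytes_moved, peak_flops, mem_bw):
    return max(flops / peak_flops, bytes_moved / mem_bw)

# hypothetical kernel: 8e12 flops, 4e12 bytes of memory traffic,
# on a hypothetical 1 Tflop/s, 0.25 TB/s machine
t = projected_time(8e12, 4e12, peak_flops=1e12, mem_bw=0.25e12)
print(t)  # 16.0 s: memory-bound (16 s of traffic vs 8 s of compute)
```

Real modeling efforts layer latency, network, and contention terms on top of this, but even the two-term bound already distinguishes compute-bound from bandwidth-bound applications, which is exactly the "balance" question procurement keeps running into.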

106 System Simulation Salishan Conference, Apr. 2003: “Computational scientists have become quite expert in using high-end computers to model everything except the systems they run on.” Research in the parallel discrete event simulation (PDES) field now makes it possible to: Develop a modular open-source system simulation facility, to be used by researchers and vendors. –Prime application: modeling very large-scale inter-processor networks. Need to work with vendors to resolve potential intellectual property issues.

107 Tools and Standards Characterized workloads from different agencies –Establishing common set of low-level micro-benchmarks predictive of performance. –In-depth characterization of applications incorporated in a common performance modeling framework. –Enables comparability of models and cooperative sharing of workload requirements. A standardized simulation framework for modeling and predicting performance of future machines. Diagnostic tools to reveal factors affecting performance on existing machines. Intelligent, visualization-based facilities to locate “hot spots” and other performance anomalies.

108 Performance Tuning Self-tuning library software: FFTW, ATLAS, LAPACK. Near-term (1-5 yrs): –Extend to numerous other scientific libraries. Mid-term (5-10 yrs): –Develop prototype pre-processor tools that can extend this technology to ordinary user-written codes. Long-term (10-15 yrs): –Incorporate this technology into compilers. Example from history (vectorization): –Step 1: Completely manual, explicit vectorization –Step 2: Semi-automatic vectorization, using directives –Step 3: Generate both scalar and vector code, selected with run-time analysis
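A toy sketch of the FFTW/ATLAS self-tuning idea (the transpose kernel and the candidate block sizes are illustrative assumptions, not anything from these libraries): time the kernel at several configurations and keep the fastest on the machine at hand.

```python
import time

# Blocked transpose used here only as a tunable stand-in kernel.
def transpose_blocked(a, n, B):
    t = [[0] * n for _ in range(n)]
    for ii in range(0, n, B):
        for jj in range(0, n, B):
            for i in range(ii, min(ii + B, n)):
                for j in range(jj, min(jj + B, n)):
                    t[j][i] = a[i][j]
    return t

def autotune(n=128, candidates=(8, 16, 32, 64)):
    """Empirically pick the fastest block size on THIS machine."""
    a = [[i * n + j for j in range(n)] for i in range(n)]
    timings = {}
    for B in candidates:
        start = time.perf_counter()
        transpose_blocked(a, n, B)
        timings[B] = time.perf_counter() - start
    return min(timings, key=timings.get)

best = autotune()
```

The real libraries run such searches at install time over far richer parameter spaces (unrolling, register blocking, instruction schedules) and cache the winner; the principle of search-plus-measure is the same.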

109 Working Group 7 Application-Driven System Requirements Chair: Mike Norman Vice Chair: John Van Rosendale

110 WG7 – Application-driven System Requirements Charter Charter –Identify major classes of applications likely to dominate HEC system usage by the end of the decade. Determine machine properties (floating point performance, memory, interconnect performance, I/O capability and mass storage capacity) needed to enable major progress in each of the classes of applications. Discuss the impact of system architecture on applications. Determine the software tools needed to enable application development and support for execution. Consider the user support attributes including ease of use required to enable effective use of HEC systems. Chair –Mike Norman, University of California at San Diego Vice-Chair –John Van Rosendale, DOE

111 WG7 – Application-driven System Requirements Guidelines and Questions Identify major classes of applications likely to dominate use of HEC systems in the coming decade, and determine the scale of resources needed to make important progress. For each class indicate the major hardware, software and algorithmic challenges. Determine the range of critical systems parameters needed to make major progress on the applications that have been identified. Indicate the extent to which system architecture affects productivity for these applications. Identify key user environment requirements, including code development and performance analysis tools, staff support, mass storage facilities, and networks. Example topics: –applications, algorithms, hardware and software requirements, user support

112 Discipline Coverage Lattice Gauge Theory Accelerator Physics Magnetic Fusion Chemistry and Environmental Cleanup Bio-molecules and Bio-Systems Materials Science and Nanoscience Astrophysics and Cosmology Earth Sciences Aviation

113 FINDING #1

114 Top Challenges Achieving high sustained performance on complex applications is becoming more and more difficult Building and maintaining complex applications Managing the data tsunami (input and output) Integrating multi-scale (space and time), multi-disciplinary simulations

115 Multi-Scale Simulation in Nanoscience Maciej Gutowski, WP 001

116 Question 1 Identify major classes of applications likely to dominate use of HEC systems in the coming decade, and determine the scale of resources needed to make important progress. For each class indicate the major hardware, software and algorithmic challenges. [Figure: spatial scales spanning 1 cm to 10^27 cm]

117 Question 2 Determine the range of critical systems parameters needed to make major progress on the applications that have been identified. Indicate the extent to which system architecture affects productivity for these applications.

118 Question 3 Identify key user environment requirements, including code development and performance analysis tools, staff support, mass storage facilities, and networks.

119 Findings: HW [1] 100x current sustained performance needed now in many disciplines to reach concrete objectives A spectrum of architectures is needed to meet varying application requirements –Customizable COTS an emerging reality –Closer coupling of application developers with computer designers needed The time dimension is sequential and difficult to parallelize; ultrafast processors and new algorithms are required. –fusion, climate simulation, biomolecular, astrophysics: multiscale problems in general

120 Findings: HW [2] Thousands of CPUs are useful with present codes and algorithms; reservations about 10,000 (scalability and reliability) –Some applications can effectively exploit thousands of CPUs only by allowing problem size to grow (weak scaling) Memory bandwidth and latency seem to be a universal issue Communication fabric latency/bandwidth is a critical issue: applications vary greatly in their communications needs
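The weak-scaling caveat can be quantified with the standard Amdahl and Gustafson formulas (the 5% serial fraction below is an arbitrary example, not a figure from these findings):

```python
# s = serial fraction of the work, n = number of processors.
def amdahl_speedup(s, n):
    """Fixed problem size (strong scaling): serial fraction caps speedup."""
    return 1.0 / (s + (1.0 - s) / n)

def gustafson_speedup(s, n):
    """Problem size grows with n (weak scaling): scaled speedup."""
    return n - s * (n - 1)

print(round(amdahl_speedup(0.05, 1000), 1))     # 19.6
print(round(gustafson_speedup(0.05, 1000), 2))  # 950.05
```

With a 5% serial fraction, 1000 CPUs deliver a strong-scaling speedup of only about 20x, but a weak-scaling speedup near 950x, which is why "exploit 1000s of CPUs only by allowing problem size to grow" is the observed behavior.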

121 Findings: Software The SW model of single-programmer monolithic codes is running out of steam – need to switch to a team-based approach (à la SciDAC) –scientists, application developers, applied mathematicians, computer scientists –modern SW practices for rapid response Multi-scale and/or multi-disciplinary integration is a social as well as a technical challenge –new team structures and new mechanisms to support collaboration are needed –intellectual effort is distributed, not centralized

122 Findings: User Environment Emerging data management challenge in all sciences; e.g., bio-sciences Massive shared memory architectures for data analysis/assimilation/mining –TBs/day (NCAR/GFDL, NERSC, DOE Genome to Life, HEP) –sequential ingest/analysis codes –I/O-centric architectures HEC visualization environments à la DOE Data Corridors

123 Strategy and Policy [1] HEC has become essential to the advancement of many fields of science & engineering US scientific leadership in jeopardy without increased and balanced investment in HEC hardware and wetware (i.e., people) 100x increase of current sustained performance needed now to maintain scientific leadership

124 Strategy and Policy [2] A spectrum of architectures is needed to meet varying application requirements New institutional structures needed for disciplinary computational science teams (research facility model) –An integrated answer to Question 3

125 Facilities Analogy [Diagram: the Spallation Neutron Source (SNS), a national user facility whose end stations (neutron reflectometer, high-resolution triple-axis, small-angle scattering, ultra-high-vacuum sample station) serve materials-science users, drawn as an analogy for HPC facilities (NERSC, ORNL-CCS, PSC) serving domain-specific research networks (nano-magnetism, strongly correlated materials, fusion, QCD, polymer science) through Collaborative Research Teams (Magnetism, Correlation, Microstructure, Fusion, QCD CRTs) of materials scientists, mathematicians, and computer scientists, supported by standards-based toolkits, an open-source repository, workshops, and education materials; an arrow marks the direction of competition.]

126 Working Group 8 Procurement, Accessibility and Cost of Ownership Chair: Frank Thames Vice-Chair: Jim Kasdorf

127 WG8 – Procurement, Accessibility, and Cost of Ownership Charter Charter –Explore the principal factors affecting acquisition and operation of HEC systems through the end of this decade. Identify those improvements required in procurement methods and means of user allocation and access. Determine the major factors contributing to the cost of ownership of the HEC system over its lifetime. Identify impact of procurement strategy including benchmarks on sustained availability of systems. Chair –Frank Thames, NASA Vice-Chair –Jim Kasdorf, Pittsburgh Supercomputing Center

128 WG8 – Procurement, Accessibility, and Cost of Ownership Guidelines and Questions Evaluate the implications of the virtuous infrastructure cycle, i.e., the relationship among advanced development, deployment, and procurement in shaping research, development, and procurement of HEC systems. As input to HECRTF charge (3c), provide information about total cost of ownership beyond procurement cost, including space, maintenance, utilities, upgradeability, etc. As input to HECRTF charge (3) overall, provide information about how the Federal government can improve the processes of procuring and providing access to HEC systems and tools Example topics: –procurement, requirements specification, user infrastructure, remote access, allocation policies, security, power and cooling costs, maintenance costs, reliability and support

129 Working Group Participants Frank Thames Jim Kasdorf Bill Turnbull Gary Wohl Candace Culhane James Tomkins Charles W. Hayes Sander Lee Charles Slocomb Christopher Jehn Matt Leininger Mark Seager Gary Walter Graciela Narcho Dale Spangenberg Thomas Zacharia Gene Bal Per Nyberg Scott Studham Rene Copeland Paul Muzio Phil Webster Steve Perry Cray Henry Tom Page

130 WG8 Paper Presentations Per Nyberg: Total Cost of Ownership Matt Leininger: A Capacity First Strategy to U.S. HEC Steve Perry: Improving the Process of Procuring HEC Systems Scott Studham: Best Practices for the Procurement of High Performance Computers by the Federal Government

131 Total Cost of Ownership Procurement of Capital Assets –Hardware –Acquisition cost (FTE) –Cost of money for LTOPS –Software licenses Maintenance of Capital Assets Services (Workforce dominated; will inflate yearly) –Application support/porting –System administration –Operations –Security

132 Total Cost of Ownership Facility –Site Preparation –HVAC –Electrical power –Maintenance –Initial construction –Floor space Networks: Local and WAN Training Miscellaneous –Residual value of equipment –Disposal of assets –Insurance

133 Total Cost of Ownership Can “Lost Opportunity” cost be quantified? –Lost research opportunities –Lower productivity due to lack of tools –Codes not optimized for the architecture –Etc. Replacement cost of human resources Difficulty in valuing system software as it impacts productivity (development and production), versus the quantitative methods available to measure hardware performance

134 Total Cost of Ownership Other Considerations –If costs are to include end-to-end services Output analysis must be added (e.g., visualization) Mass Storage Application Development –Some architectures are harder to program (ASCI: 4-6 years application development; application lifetime: 10-20 years) –H/W architectures last 3-4 years → applications must last over multiple architectures

135 Total Cost of Ownership – Bottom Line Consider ALL applicable factors Some are not obvious Develop a comprehensive cost candidate list

136 Procurement Requirements Specification Evaluation Criteria Improving the Process Contract Type Other Considerations

137 Procurement Requirements Specification –Elucidate the fundamental science requirement –Emphasize quantifiable functional requirements –Exploit economies of scale –Application development environment –Make optimum use of contract options and modifications –Maximize the use of technical partnerships –Consider flexible delivery dates where applicable (increases vendor flexibility)

138 Procurement (Continued) Requirements Specification (Continued) –Be careful about “mandatory” requirements → prioritize or weight them –Be aware of specifications which may limit competition –Avoid “over-specifying” requirements for advanced systems –Fundamental differences in specification depending on the intended use of the system (natural tension between capacity vs capability, and general tool vs specific research tool)

139 Procurement (Continued) Evaluation Criteria –For options on long-term contracts, projected “speedup” of applications –Total Cost of Ownership –Use “Real Benchmarks” Be careful not to water down benchmarks too much On the other hand, don’t push so hard that some vendors can’t afford it Other approaches needed for future advanced systems –Use Best Value –Risks

140 Procurement (Continued) Improving the Process –Ensure users are heavily involved in the process Eases vendor risk mitigation Users have “decision proximity” –Non-disclosures required by vendors hamstring government personnel after award –Maintain communications between vendors and customers during the acquisition cycle without compromising fairness

141 Procurement (Continued) Improving the Process (Continued) –Consider DARPA HPCS Process for “Advanced Systems” Multiple down-selects R&D like Leads to production system at end –Attempt to maintain acquisition schedule adherence

142 Procurement (Continued) Contract Type –Consider Cost Plus contracts for new technology systems or those with inherent risks (e.g., development contracts) –Leverage existing contracts that fit what you want to do Other Considerations –Don’t have a single acquisition for ALL HEC in government → leads to “Ivory Tower” syndrome and a disconnect from users Bottom line: don’t over-centralize

143 Procurement (Continued) Other Considerations (Continued) –Inconsistencies in the way acquisition regulations are implemented can lead to inefficiencies (vendor issue) –Practices that would revitalize the HEC industry What size of market is needed? At least several hundred million dollars per year per vendor Recognize that HEC vendors must make an acceptable return to survive and invest

144 Accessibility Key Issue: Funding in the requiring agency to purchase computational capabilities from other sources There are many valid vehicles for interagency agreements to provide accessibility (e.g., interagency MOUs) Suggested Process: the DOE Office of Science and NSF process – open scientific merit review on a project-by-project basis Current large sources would add x% capability to supply computational capabilities to smaller agencies Implementation suggestion: Consider providing a single POC for agencies for HEC access (NCO?)

