Technology Drivers
– Traditional HPC application drivers
    – OS noise, resource monitoring and management, memory footprint
    – Complexity of resources to be managed
– New and evolving programming models
    – Shifting emphasis from managing cycles to managing data
    – Programming models require more access to resource management decisions
    – Hybrid/mixed programming models (composing applications)
– Node and memory structures
    – On-node RAM, DRAM, Flash
    – Stacked memory (performance implications for different access patterns)
    – Explicit cache/hierarchy management
    – On-node interconnect
    – Heterogeneous cores
    – On-node power management
– Global structures
    – Global address space
    – Integration of collectives, especially synchronization
– Resilience (soft errors and damaged cores)
– HPC OS sustainability
– Increasing importance and complexity of resource management
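OS noise, the first driver above, is commonly quantified with a "selfish detour" or fixed-work-quantum style microbenchmark. This is a loop that does nothing but read a clock and record every gap that is longer than it should be.

The Python sketch below shows the method under stated assumptions:
- The function name detect_detours is illustrative and does not come from the slides.
- The 20 µs threshold is an arbitrary choice and also does not come from the slides.
- Interpreter overhead makes this far coarser than the C versions normally run on HPC nodes. Treat it as a demonstration of the technique, not a measurement tool.

```python
import time


def detect_detours(duration_s: float = 0.05,
                   threshold_ns: int = 20_000) -> list[tuple[int, int]]:
    """Spin on a monotonic clock and record gaps longer than threshold_ns.

    Each detour is returned as (offset from the start of the run in ns,
    gap length in ns). A long gap means something other than this loop ran
    on the core (an interrupt, a daemon, the scheduler): a candidate
    OS-noise event. The threshold is an illustrative assumption.
    """
    detours = []
    start = last = time.perf_counter_ns()
    end = start + int(duration_s * 1e9)
    while last < end:
        now = time.perf_counter_ns()
        gap = now - last
        if gap > threshold_ns:
            detours.append((last - start, gap))
        last = now
    return detours


if __name__ == "__main__":
    events = detect_detours(duration_s=0.5)
    lost_ms = sum(gap for _, gap in events) / 1e6
    print(f"{len(events)} detours, {lost_ms:.3f} ms lost out of 500 ms")
```

On a real system the interesting output is the distribution of gap lengths and their periodicity, since periodic detours point at timer ticks or daemons.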

Alternate R&D Strategies
– Evolve an existing OS
    – Linux, Plan 9, IBM CNK, Kitten
– Start with an empty Emacs buffer (write a new OS from scratch)
– Steal components from existing operating systems
– Partitioning resources
    – Independent management within a partition
    – Composability
– Collective/global OS
    – Global address space?
– It's time to define the winner
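The partitioning strategy can be pictured with a toy model. A node's cores and memory are split into disjoint partitions, and each partition names the OS that manages it.

This is a hypothetical sketch. The Partition type, the partition_node function and the manager labels are invented for illustration. The code only checks that partitions do not overlap and that they fit on the node, which is the precondition for managing each one independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A hypothetical slice of one node, managed by its own OS instance."""
    name: str
    cores: frozenset[int]
    memory_mb: int
    manager: str  # illustrative label, e.g. "Linux" or "lightweight kernel"


def partition_node(total_cores: int, total_memory_mb: int,
                   partitions: list[Partition]) -> dict[int, str]:
    """Validate that partitions are disjoint and fit on the node.

    Returns a map from core id to the name of the partition that owns it.
    Raises ValueError if a core id is out of range, a core is claimed by two
    partitions, or the partitions over-commit node memory.
    """
    owner: dict[int, str] = {}
    for p in partitions:
        for core in sorted(p.cores):
            if not 0 <= core < total_cores:
                raise ValueError(f"core {core} does not exist on this node")
            if core in owner:
                raise ValueError(f"core {core} is claimed by {owner[core]} and {p.name}")
            owner[core] = p.name
    if sum(p.memory_mb for p in partitions) > total_memory_mb:
        raise ValueError("partitions over-commit node memory")
    return owner


if __name__ == "__main__":
    service = Partition("service", frozenset(range(0, 2)), 4096, "Linux")
    compute = Partition("compute", frozenset(range(2, 16)), 60000, "lightweight kernel")
    layout = partition_node(16, 65536, [service, compute])
    print({name: sum(1 for v in layout.values() if v == name)
           for name in ("service", "compute")})
```

Composability in this model amounts to the validator accepting any combination of partitions that respects the disjointness and capacity constraints.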

Research Agenda
– HPC Community OS
    – Define basic structure
    – Individual groups work on components
– Expose management of critical resources
– Simulation to evaluate scalability of resource management strategies
– Enable co-design of hardware to support resource management
– Define and implement OS mechanisms that will enable global, autonomic runtime systems

Priority Research Direction: Community OS Framework for HPC Systems
Key challenges
    1. HPC applications have unique resource management needs (e.g., memory layout)
    2. Anticipated rapid evolution/revolution in architectures and programming models
    3. Limited ability to innovate in existing commodity operating systems
    4. Sustainability of an HPC OS is difficult
Summary of research direction
    1. Develop an OS framework specific to the needs of HPC
    2. Open system architecture that exposes the management of critical resources
    3. Empower developers of libraries and runtime systems
Potential impact on software component
    1. Context for individual innovation and contribution
    2. Common foundation for libraries and runtime environments
Potential impact on usability, capability, and breadth of community
    1. This will enable full access to hardware resources
    2. Timeframe: 2-3 years

Priority Research Direction: Scalable System Simulation
Key challenges
    1. Inability to conduct "apples-to-apples" comparisons in scalable resource management
    2. Evolution/revolution in new systems
    3. Wide variety of existing simulators
Summary of research direction
    1. Develop a scalable, full-system simulation capability
    2. Address multi-scale challenges
    3. Adapt techniques that have been used in other branches of computational science
    4. Develop common interfaces between simulators
Potential impact on software component
    1. Ability to evaluate resource management mechanisms and policies at scale
    2. Enable architecture/OS co-design
Potential impact on usability, capability, and breadth of community
    1. Critical for the OS research/development community
    2. Important for the runtime community
    3. Timeframe: 2-4 years
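The sketch below is a small example of the "apples-to-apples" policy comparison that a system simulator is meant to enable. It estimates the cost of OS noise in a bulk-synchronous application under two hypothetical policies:
- Uncoordinated noise: each node is interrupted independently.
- Co-scheduled noise: all nodes take their interruptions in the same iteration.

Assumptions:
- Every iteration ends in a global synchronization, so an iteration is as slow as its slowest node.
- The function names and parameters are illustrative and not taken from the slides.
- A Monte Carlo estimate is paired with the closed form P(at least one of N nodes hit) = 1 - (1 - p)^N.

```python
import random


def mean_iteration_time(nodes: int, iters: int, work: float, p: float,
                        detour: float, coordinated: bool, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean time per bulk-synchronous iteration.

    With probability p a node suffers one OS detour of length `detour` in an
    iteration. coordinated=True models co-scheduling: all nodes take their
    detour together. Otherwise each node is hit independently.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iters):
        if coordinated:
            hit = rng.random() < p
        else:
            # The global synchronization waits for the slowest node,
            # so a single hit on any node delays the whole iteration.
            hit = any(rng.random() < p for _ in range(nodes))
        total += work + (detour if hit else 0.0)
    return total / iters


def expected_iteration_time(nodes: int, work: float, p: float,
                            detour: float, coordinated: bool) -> float:
    """Closed form of the same model."""
    p_hit = p if coordinated else 1 - (1 - p) ** nodes
    return work + detour * p_hit


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        un = expected_iteration_time(n, 1.0, 0.001, 0.1, coordinated=False)
        co = expected_iteration_time(n, 1.0, 0.001, 0.1, coordinated=True)
        print(f"{n:>6} nodes: uncoordinated {un:.5f}, co-scheduled {co:.5f}")
```

Take work = 1.0, detour = 0.1 and p = 0.001:
- On one node the slowdown is about 0.01%.
- On 10,000 uncoordinated nodes nearly every iteration is hit (1 - 0.999^10000 ≈ 0.99995), so the slowdown approaches 10%.
- The co-scheduled policy stays at 0.01% regardless of node count.

Effects like this only become visible when resource management policies are evaluated at scale.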

Priority Research Direction: Open System APIs
Key challenges
    1. Communication management
    2. Thread management
    3. Memory management
    4. Power management
    5. Resilience (fault/failure isolation and management)
Summary of research direction
    1. Develop community-based APIs to expose critical resources
    2. Develop prototype runtime environments for common programming models
Potential impact on software component
    1. Provides a fixed point (hourglass principle) for innovation both in API implementations and in runtime implementations
    2. Differentiation based on performance, not functionality
Potential impact on usability, capability, and breadth of community
    1. Critical for supporting the development of new programming models
    2. Critical for enabling the development of new architectures
    3. Timeframe: 3-8 years
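The hourglass principle can be sketched with one resource, memory:
- A narrow API forms the waist.
- OS implementations sit below it.
- Runtime policies above it are written only against the API.

Everything here is hypothetical. MemoryAPI, AccountingMemoryOS, place_array and the "stacked"/"dram" memory kinds are invented for illustration. Communication, thread, power and resilience APIs would presumably follow the same pattern.

```python
from abc import ABC, abstractmethod


class MemoryAPI(ABC):
    """Hypothetical narrow waist: the only memory calls a runtime may use."""

    @abstractmethod
    def allocate(self, nbytes: int, kind: str) -> int:
        """Reserve nbytes of a memory kind (e.g. "stacked", "dram"); return a handle."""

    @abstractmethod
    def release(self, handle: int) -> None:
        """Return a previously allocated region."""


class AccountingMemoryOS(MemoryAPI):
    """One possible OS-side implementation: tracks free capacity per memory kind."""

    def __init__(self, capacity: dict[str, int]):
        self.free = dict(capacity)
        self.regions: dict[int, tuple[str, int]] = {}
        self.next_handle = 1

    def allocate(self, nbytes: int, kind: str) -> int:
        if self.free.get(kind, 0) < nbytes:
            raise MemoryError(f"not enough {kind} memory")
        self.free[kind] -= nbytes
        handle = self.next_handle
        self.next_handle += 1
        self.regions[handle] = (kind, nbytes)
        return handle

    def release(self, handle: int) -> None:
        kind, nbytes = self.regions.pop(handle)
        self.free[kind] += nbytes


def place_array(mem: MemoryAPI, nbytes: int) -> tuple[int, str]:
    """Runtime policy written only against MemoryAPI: prefer stacked memory, fall back to DRAM."""
    for kind in ("stacked", "dram"):
        try:
            return mem.allocate(nbytes, kind), kind
        except MemoryError:
            continue
    raise MemoryError("no memory kind can hold the request")
```

Because place_array sees only MemoryAPI, a different OS implementation could replace AccountingMemoryOS without changing the runtime. One example would be an implementation that maps regions onto real stacked memory. The implementations would then differ in performance rather than functionality.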

4.1 Operating Systems: roadmap figure
Milestones shown in the figure:
– A Community HPC OS
– Next Generation Interconnect API
– Community OS Framework
– Robust, Scalable System Simulation
– APIs for energy management
– API for node resilience
– Autonomic runtime systems
– Runtime environments enabled
– Prototype implementation of OS Framework