Parallel Objects: Virtualization & In-Process Components


Parallel Objects: Virtualization & In-Process Components
Orion Sky Lawlor, Univ. of Illinois at Urbana-Champaign
POHLL-2002

Introduction

Parallel programming is hard:
- Communication takes time
  Message startup cost
  Bandwidth & contention
- Synchronization, race conditions
- Parallelism breaks abstractions
  Data structures get flattened
  Control is handed off between modules

Harder than serial programming.

Motivation

Parallel applications are either:
- Embarrassingly parallel
  Trivial, 1 RA-week effort
  E.g. Monte Carlo, parameter sweep, SETI@home
  Communication totally irrelevant to performance

Motivation

Parallel applications are either:
- Embarrassingly parallel
- Excruciatingly parallel
  Massive, 1+ RA-year effort
  E.g. "pure" MPI codes of ≥10k lines
  Communication and synchronization totally determine performance

Motivation

Parallel applications are either:
- Embarrassingly parallel
- Excruciatingly parallel
- "We'll be done in 6 months…"
  Several parallel libraries, codes, and groups; dynamic and adaptive
  E.g. multiphysics simulation

Serial Solution: Abstract!

Build layers of software:
- High-level: libc, C++ STL, …
- Mid-level: OS kernel
  Silently schedules processes
  Keeps the CPU busy even when some processes block
  Allows a process to ignore other processes
- Low-level: assembler

Parallel Solution: Abstract!

The middle layers are missing:
- High-level: ScaLAPACK, POOMA, …
- Mid-level: ? kernel
  Silently schedules components
  Keeps the CPU busy even when some components block
  Allows a component to ignore other components
- Low-level: MPI

The missing middle layer:
- Provides dynamic computation/communication overlap, even across separate modules
- Handles inter-module handoff
- Pipelines communication
- Improves cache utilization (smaller components)
- Provides a nice layer for advanced features, like process migration

Examples: Multiprogramming

Examples: Pipelining

Middle Layer: Implementation
- Real OS processes/threads
  Robust, reliable, already implemented
  High performance penalty
  No parallel features (migration!)
- Converse/Charm++
  In-process components: efficient
  Piles of advanced features
  AMPI, an MPI interface to Charm++
  Application frameworks

Charm++
- Parallel library for object-oriented C++ applications
- Messaging via method calls on communication "proxy" objects
- Methods called by the scheduler; the system determines who runs next
- Multiple objects per processor
- Object migration fully supported, even with broadcasts and reductions

Mapping Work to Processors [figure: system implementation vs. user view]

AMPI
- MPI interface, implemented on Charm++
- Multiple "virtual processors" per physical processor
- Implemented as user-level threads, with very fast context switching
- MPI_Recv blocks only the virtual processor, not the physical one
- All the benefits of Charm++

Application Frameworks
- Domain-specific interfaces: unstructured grids, structured grids, particle-in-cell
- Provide a natural interface for application scientists (Fortran!)
- "Encapsulate" communication
- Built on Charm++; the most popular interfaces to Charm++

Charm++ Features: Migration
- Automatic load balancing
  Balances load by migrating objects
  Application-independent
  Built-in data collection (CPU, network)
  Pluggable "strategy" modules
- Adaptive job scheduler
  Shrinks/expands a parallel job by migrating objects
  Dramatic utilization improvement

Examples: Load Balancing [figure: 1. adaptive refinement; 2. load balancer invoked; 3. chunks migrated]

Examples: Expanding Job

Examples: Virtualization

Conclusions
- Parallel applications need something like a "kernel"
  A neutral party to mediate CPU use
  Significant utilization gains
- Easy to put good tools in the kernel
  Work migration support
  Load balancing
- Consider using Charm++: http://charm.cs.uiuc.edu/