Programming Models for Massively Parallel Computation: The Chaotic State of the Art

Jack B. Dennis, MIT Computer Science and Artificial Intelligence Laboratory

What Is a Programming Model?

Application code, program libraries, OS services, and compilers sit above the interface defined by the programming model; below that interface lie the runtime software (resource management) and the hardware: processors, memory, and networks.

How Should On-Chip Memory Be Managed?

[Slide figure: processors on a multicore chip sharing local memory, with further memory levels beyond the chip]

A: Scratchpad (managed by user software)
B: Cache (managed by hardware)
C: Virtual (managed by OS / runtime software)
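
A minimal sketch of option A, with hypothetical names and sizes: user software must stage each tile into the scratchpad and back out explicitly, which is precisely the burden that options B and C shift to hardware or system software.

    #include <string.h>

    #define TILE 256                    /* hypothetical scratchpad capacity */
    static float scratch[TILE];         /* stands in for on-chip scratchpad */

    /* Scale one tile: stage in, compute locally, stage out. */
    void scale_tile(float *global_data, float s)
    {
        memcpy(scratch, global_data, sizeof scratch);  /* explicit copy-in  */
        for (int i = 0; i < TILE; i++)
            scratch[i] *= s;                           /* fast local access */
        memcpy(global_data, scratch, sizeof scratch);  /* explicit copy-out */
    }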

System Architecture

- Many multicore chips
- Memory hierarchy

Goals: performance, energy efficiency, and programmability (composability, reuse, portability).

Parallel Programming Models

Popular models: OpenMP, Intel TBB, Cilk
- limited to shared-memory system configurations

MPI: the Message Passing Interface
- intended for distributed-memory systems
- (weakly) scalable
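
For concreteness, a minimal OpenMP sketch of the shared-memory style these models share: every thread reads and writes one common address space, which is exactly why the approach does not extend to distributed-memory machines.

    #include <omp.h>

    /* Shared-memory parallel loop: all threads see the same arrays. */
    void saxpy(int n, float a, const float *x, float *y)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }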

The MPI Programming Model

Ideal view: an arbitrary interconnection of program modules communicating via message passing.

Practical view: the Bulk Synchronous Parallel (BSP) model of Valiant. Why? Efficient utilization of processors and memory.

MPI has become the de facto programming model for high-performance computing.
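
A minimal sketch of MPI used in the practical, BSP style: each rank computes on its local data, then all ranks take part in a collective exchange that doubles as the superstep barrier. The local compute routine is a placeholder.

    #include <mpi.h>

    static void compute_locally(double *a, int n)   /* placeholder for local work */
    {
        for (int i = 0; i < n; i++)
            a[i] *= 0.5;
    }

    /* One BSP superstep per iteration: local compute, then a collective
       exchange that also synchronizes all ranks before the next step.   */
    void bsp_loop(int steps, double *local, int n)
    {
        for (int s = 0; s < steps; s++) {
            compute_locally(local, n);
            MPI_Allreduce(MPI_IN_PLACE, local, n,
                          MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        }
    }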

The Big Question

Users of high-performance computing are demanding systems designed to support applications with changing needs for:
- data and program objects
- memory and processing resources

How can a massively parallel system be designed to provide dynamic resource management and still deliver high performance and energy efficiency?

Codelets and Data Blocks I

The codelet is the unit for scheduling processing resources:
- It contains a block of instructions executed to completion.
- It is activated by the availability of its input data objects.
- It signals successor codelets when its results are ready.
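
A minimal sketch of the firing rule just described, with hypothetical names: each codelet carries a count of inputs not yet available, becomes schedulable when that count reaches zero, and decrements its successors' counts when it completes.

    #include <stdatomic.h>

    typedef struct codelet Codelet;
    struct codelet {
        atomic_int missing;          /* inputs not yet available; initialized */
        void (*body)(Codelet *);     /* to the number of inputs. Body runs to */
        Codelet *succ[4];            /* completion and never blocks.          */
        int nsucc;                   /* fixed fan-out keeps the sketch small  */
    };

    extern void enqueue(Codelet *c); /* assumed scheduler hook */

    /* Signal that one input of c is ready; fire c when all have arrived. */
    void signal_input(Codelet *c)
    {
        if (atomic_fetch_sub(&c->missing, 1) == 1)
            enqueue(c);
    }

    /* Run a codelet to completion, then signal its successors. */
    void run(Codelet *c)
    {
        c->body(c);
        for (int i = 0; i < c->nsucc; i++)
            signal_input(c->succ[i]);
    }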

Codelets and Data Blocks II

The data block is the unit for dynamic memory allocation.

Achieving flexibility of resource management:
- unique global identifiers for codelets and data blocks
- a system-supported virtual address space of global identifiers
- system-supported creation and recovery of data blocks (garbage collection)
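
What the system-supported name space might look like, as a sketch with hypothetical names: 64-bit global identifiers resolved to block locations through a runtime-owned translation table.

    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t Gid;              /* unique global identifier     */

    typedef struct {
        int    node;                   /* which memory holds the block */
        size_t offset;                 /* location within that memory  */
    } BlockEntry;

    /* Runtime-owned translation table from GID to block location. A real
       system would distribute and hash this; a flat array keeps the
       sketch small. */
    enum { MAX_BLOCKS = 1 << 20 };
    static BlockEntry table[MAX_BLOCKS];

    BlockEntry *resolve(Gid g)         /* GID -> current block location */
    {
        return &table[g % MAX_BLOCKS];
    }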

Some Ideas

Write-once data blocks:
- no cache-consistency issues
- enables reference-count garbage collection

Fixed-size data blocks:
- simplifies memory management

With these additions, the programming model becomes pure Lisp (Scheme) with large cells.
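
A sketch of how write-once, fixed-size blocks simplify the runtime (names hypothetical): sealed contents never change, so any cached copy stays valid, and because a block can only reference blocks created before it, the reference graph is acyclic and reference counting alone recovers all garbage.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        uint32_t refs;                 /* live references to this block     */
        int      sealed;               /* set once the single write is done */
        uint64_t words[32];            /* fixed size simplifies allocation  */
    } Block;

    Block *block_new(void)             /* fixed-size allocation */
    {
        Block *b = calloc(1, sizeof *b);
        b->refs = 1;
        return b;
    }

    /* Write-once: fill b->words, then seal. Sealed contents never change,
       so cached copies can never become stale.                            */
    void seal(Block *b)   { b->sealed = 1; }

    void retain(Block *b) { b->refs++; }

    /* A write-once block can only reference older blocks, so the reference
       graph is acyclic and counting alone reclaims every dead block.      */
    void release(Block *b)
    {
        if (--b->refs == 0)
            free(b);
    }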

Resilience

A crucial future problem as feature sizes shrink:
- increasing susceptibility to "soft" failures (cosmic rays and latent radiation)
- more intermittent faults from manufacturing variability

Dynamic resource management and the use of codelets make it easier to add fault-detect/retry mechanisms that mask faults at the codelet level.
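
One way such a mechanism could look, as a sketch under the codelet model's assumptions: a codelet's inputs are write-once and it has no side effects beyond its outputs, so re-executing it after a detected fault is always safe. The fault-detection hook is a hypothetical placeholder.

    #include <stdbool.h>

    extern bool fault_detected(void);          /* assumed: ECC trap, checksum, ... */
    extern void run_codelet(void *in, void *out);

    /* Retry a codelet until it completes cleanly. Safe because its inputs
       are immutable and its only effects are its outputs.                 */
    void run_with_retry(void *in, void *out, int max_tries)
    {
        for (int t = 0; t < max_tries; t++) {
            run_codelet(in, out);
            if (!fault_detected())
                return;                        /* result accepted */
        }
        /* escalate after repeated failures, e.g. remap to another core */
    }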