4/27/2000 A Framework for Evaluating Programming Models for Embedded CMP Systems Niraj Shah Mel Tsai CS252 Final Project.


CS252 Final Project (Spring 2000)

Overview
- Motivation
- Target Architectures
- Programming Model
- Software Environment
- Applications
- Preliminary Results and Conclusions
- Future Work

Motivation
- Embedded multiprocessor systems are different from their general-purpose (GP) counterparts
- Interprocess communication can be very cheap
- The communication architecture is tailored to the application
- It is desirable not to have a heavy OS or a large library to handle communication
- Efficiently programming these systems in a high-level language (HLL) is an absolute necessity
- How do we evaluate the machine abstraction that is presented to the programmer?

Target Architectures
- Instructions to perform communication operations
- Some simplifying assumptions
[Figure: block diagram of a processing element (PE) with boxes labeled Instruction Cache, FU, SFU, Register File, and Memory System. Figure credit on the slide: "Thanks Scott".]

Programming Model
- Language specification is a simplified subset of MPI
- Single Program Multiple Data (SPMD) execution model
- Separate address spaces for each process
- Bind each process to a distinct PE
- Communication primitives: blocking/non-blocking sends and receives
- MPI programming model: MPI_Send(*data_location, data_length, type, destination_PE, tag_identifier, MPI_COMM_WORLD);
- Mescal programming model: Mescal_Send(data_length, *data_location, destination_PE);
- How do we evaluate the programming model?
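To make the reduced send interface concrete, here is a minimal single-threaded sketch in C. It keeps the three-argument order of the Mescal_Send call shown on the slide, which drops the type, tag, and communicator arguments of the MPI call. Everything else is an assumption for illustration only: the names `mescal_send_sketch` and `mescal_recv_sketch`, the one-slot mailbox per PE, the constants `NUM_PES` and `MAILBOX_BYTES`, and the choice to return -1 where real hardware would presumably stall. It is not the project's implementation.

```c
#include <stddef.h>
#include <string.h>

#define NUM_PES 4         /* assumed number of processing elements */
#define MAILBOX_BYTES 64  /* assumed capacity of one PE's receive slot */

/* Hypothetical one-slot mailbox standing in for a PE's receive port. */
struct mailbox {
    unsigned char data[MAILBOX_BYTES];
    size_t length;
    int full;
};

static struct mailbox mailboxes[NUM_PES];

static int valid_pe(int pe) { return pe >= 0 && pe < NUM_PES; }

/* Same argument order as the Mescal_Send call on the slide.
 * Returns 0 on success, or -1 if an argument is invalid or the
 * destination slot is still occupied (a blocking send would wait here). */
int mescal_send_sketch(size_t data_length, const void *data_location,
                       int destination_pe)
{
    if (!valid_pe(destination_pe) || data_location == NULL ||
        data_length > MAILBOX_BYTES)
        return -1;

    struct mailbox *mb = &mailboxes[destination_pe];
    if (mb->full)
        return -1;

    memcpy(mb->data, data_location, data_length);
    mb->length = data_length;
    mb->full = 1;
    return 0;
}

/* Hypothetical matching receive for PE `my_pe`.
 * Returns the message length, or -1 if nothing is pending or the
 * caller's buffer (max_length bytes) is too small. */
long mescal_recv_sketch(size_t max_length, void *data_location, int my_pe)
{
    if (!valid_pe(my_pe) || data_location == NULL)
        return -1;

    struct mailbox *mb = &mailboxes[my_pe];
    if (!mb->full || mb->length > max_length)
        return -1;

    memcpy(data_location, mb->data, mb->length);
    mb->full = 0;  /* free the slot for the next sender */
    return (long)mb->length;
}
```

A caller on one PE would invoke `mescal_send_sketch(sizeof value, &value, 2)` and the process bound to PE 2 would later call `mescal_recv_sketch(sizeof value, &value, 2)`. No datatype, tag, or communicator appears at either end.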

Software Environment
- Augmented the IMPACT framework (single PE) to target CMPs
- Compiler
  - Generates optimized code for each PE
  - Understands our programming model
  - Generates code to use our hardware

Trace Simulator
[Figure: two tool-flow diagrams. The single-PE flow has boxes labeled *.c + probes, gcc, "probed" executable, input data, trace data, *.X_im_p, machine description, emulator generator, simulator, and simulation data. The multiprocessor flow has boxes labeled *.c + MPI + probes, MPI C compiler, "probed" executable, trace data, *.X_im_p, machine description, emulator generator, MP simulator, and simulation data.]

Application - JPEG
- JPEG encode/decode
[Figure: five boxes (a splitter, three encode/decode stages, and a combiner) labeled process 1 through process 5.]

Application – Network Routing
- Based on the MIT Click Modular Router
- Translated to C (from C++) by the MESCAL team: CRACK (Click Rapidly Adapted to C-Kode)
- Built router kernel from CRACK "Elements"

Parallelized CRACK
[Figure: the CRACK elements InfiniteSource, CheckIPHeader, GetIPAddress, LookupIPRoute, and Port 1 … Port n, mapped onto processes 1 through 5 and annotated with idle cycle times. Values legible in the transcript: 48.8%, 49.3%, 0%, and 88%; one further value survives only as "%".]
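One way to reason about idle cycle times in a pipeline like this is a steady-state model: if each process handles one packet per step and the slowest process sets the pace, a process that needs fewer cycles per packet sits idle for the remainder. The C sketch below computes that fraction. The model, the function name `pipeline_idle_fractions`, and any inputs fed to it are assumptions for illustration. The slide's percentages came from the authors' simulator, not from this formula.

```c
#include <stddef.h>

/* Steady-state idle fraction per stage of a linear pipeline, assuming the
 * slowest stage (the bottleneck) sets the rate for every other stage.
 * cycles_per_item[i] is the assumed cost of stage i for one packet.
 * Writes 1 - cost/bottleneck into idle_out[i].
 * Returns 0 on success, -1 on invalid input. */
int pipeline_idle_fractions(const double *cycles_per_item, size_t n,
                            double *idle_out)
{
    if (cycles_per_item == NULL || idle_out == NULL || n == 0)
        return -1;

    double bottleneck = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (cycles_per_item[i] <= 0.0)
            return -1;  /* a stage must cost something */
        if (cycles_per_item[i] > bottleneck)
            bottleneck = cycles_per_item[i];
    }

    for (size_t i = 0; i < n; i++)
        idle_out[i] = 1.0 - cycles_per_item[i] / bottleneck;

    return 0;
}
```

Under this model a stage that costs half as much as the bottleneck idles half the time, and the bottleneck itself never idles. That is why a badly balanced mapping shows large idle percentages on the lightly loaded processes.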

Preliminary Conclusions
- Need a scheme to better parallelize (load-balance) applications
- Need a way of overlapping computation and communication (i.e., non-blocking primitives)
- An extensible framework is useful for exploring different programming models
- It allows for quantitative analysis of the effect of communication primitives
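The benefit of overlapping computation and communication can be estimated with a back-of-the-envelope model. The C sketch below compares a process that computes and then performs a blocking send for each item with one whose non-blocking send of item i proceeds while item i+1 is being computed. The function names and the fixed per-item costs are assumptions for illustration, and the model ignores contention and buffering limits. It is not the project's simulator.

```c
/* Total cycles when each send blocks: compute, then communicate, in turn. */
unsigned long blocking_cycles(unsigned long items, unsigned long compute,
                              unsigned long comm)
{
    return items * (compute + comm);
}

/* Total cycles with ideal overlap: the first compute and the last send are
 * exposed, and every step in between costs the larger of the two. */
unsigned long overlapped_cycles(unsigned long items, unsigned long compute,
                                unsigned long comm)
{
    if (items == 0)
        return 0;

    unsigned long step = compute > comm ? compute : comm;
    return compute + (items - 1) * step + comm;
}
```

With 4 items, 10 cycles of computation, and 6 cycles of communication per item, the blocking version takes 64 cycles and the overlapped version 46. For a single item the two are equal, since there is nothing to overlap with.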

Future Work
- Get more detailed numbers from parallelized CRACK
- Implement non-blocking sends and receives
- Map multiple processes to a single PE
- Performance evaluation of different programming models for an application set
- Support dynamic process creation
- Incorporate microarchitectural simulation of communication instructions