A Common Machine Language for Communication-Exposed Architectures Bill Thies, Michal Karczmarek, Michael Gordon, David Maze and Saman Amarasinghe MIT Laboratory.

Presentation transcript:

A Common Machine Language for Communication-Exposed Architectures
Bill Thies, Michal Karczmarek, Michael Gordon, David Maze and Saman Amarasinghe
MIT Laboratory for Computer Science
HPCA Work-in-Progress Session, February 2002

Language Designers Have Been Ignoring Architects

Back in The Good Old Days…
Architecture: simple von Neumann
“Common Machine Language”: C
–Abstracts away idiosyncratic differences: instruction set, pipeline depth, cache configuration, register layout
–Exposes common properties: program counter, arithmetic instructions, monolithic memory
–Efficient implementations on many machines
–Portable: everyone uses it

Programming Language Evolution

Languages Have Not Kept Up
[Figure labels: C, von Neumann machine, modern architecture]
Two choices:
–Develop cool architecture with complicated, ad-hoc language
–Bend over backwards to support old languages like C/C++

Evidence: Superscalars
Huge effort into improving performance of sequential instruction stream
Complexity has grown unmanageable
Even with 1 billion transistors on a chip, what more can be done?
Techniques: renaming, out-of-order execution, pipelining, speculative execution, prefetching, branch prediction, value prediction

A New Era of Architectures
Facing new design parameters
–Transistors are in excess
–Wire delays will dominate
“Communication-exposed” architectures
–Explicitly parallel hardware
–Compiler-controlled communication
–e.g. RAW, Smart Memories, TRIPS, Imagine, the Grid Processor, Blue Gene

A New Common Machine Language
Should expose shared properties:
–Explicit parallelism (multiple program counters)
–Regular communication patterns
–Distributed memory banks
–No global clock
Should hide small differences:
–Granularity of computation elements
–Topology of network interconnect
–Interface to memory units
C does not qualify!

The StreamIt Language
A high-level language for communication-exposed architectures
Computation is expressed as a hierarchical composition of independent filters
Features:
–High-bandwidth channels
–Low-bandwidth messaging
–Re-initialization
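
The idea of composing independent filters hierarchically can be sketched in plain Python. This is not StreamIt syntax: the names `Filter`, `Pipeline`, and `run` are hypothetical, and a `deque` stands in for a high-bandwidth channel. Low-bandwidth messaging and re-initialization are not modeled, and each stage runs to completion on a batch rather than streaming.

```python
from collections import deque


class Filter:
    """An independent unit of computation with a fixed pop rate.

    `work` consumes `pop` input items and returns a list of output items.
    (Hypothetical sketch; not the StreamIt API.)
    """

    def __init__(self, work, pop=1):
        self.work = work
        self.pop = pop

    def run(self, items):
        inp = deque(items)   # input channel (FIFO)
        out = []             # output channel
        # Fire only while a full firing's worth of input is available.
        while len(inp) >= self.pop:
            args = [inp.popleft() for _ in range(self.pop)]
            out.extend(self.work(*args))
        return out


class Pipeline:
    """A hierarchical composition: children may be Filters or Pipelines."""

    def __init__(self, *children):
        self.children = children

    def run(self, items):
        for child in self.children:
            items = child.run(items)
        return items


scale = Filter(lambda x: [2 * x])                # 1 item in, 1 item out
pair_sum = Filter(lambda a, b: [a + b], pop=2)   # 2 items in, 1 item out
program = Pipeline(scale, Pipeline(pair_sum))    # nested composition

result = program.run([1, 2, 3, 4])
print(result)
```

Because each filter only touches its own input and output channels, the same program description could in principle be mapped onto one processor or many without changing the filters themselves.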

The StreamIt Compiler
We have a compiler for a uniprocessor
–Performs comparably to C++ runtime system
Working on a backend for RAW
–Fission and fusion transformations
–Many optimizations in progress
Goal: High-performance, portable language for communication-exposed architectures
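
As a rough illustration of the two transformations named above: fusion merges neighboring filters into one (coarser granularity), and fission splits a stateless filter into parallel copies (finer granularity). The sketch below treats a filter as a plain function on one item. The names `fuse`, `fiss`, and `run_pipeline`, and the round-robin split and join, are illustrative assumptions and not the compiler's implementation.

```python
def fuse(f, g):
    """Fusion: combine two pipelined filters into a single filter."""
    return lambda x: g(f(x))


def fiss(f, ways):
    """Fission: split a stateless filter into `ways` independent copies.

    Items are dealt round-robin to the copies and joined round-robin,
    so the output order matches the original filter's. The copies could
    run on separate processing elements; here they run one after another.
    """
    def run(items):
        items = list(items)
        # Split: copy k receives items k, k + ways, k + 2*ways, ...
        lanes = [[f(x) for x in items[k::ways]] for k in range(ways)]
        # Join: take one item from each lane in turn.
        return [lanes[i % ways][i // ways] for i in range(len(items))]
    return run


def run_pipeline(filters, items):
    """Reference behavior: apply each filter to the whole stream in order."""
    for f in filters:
        items = [f(x) for x in items]
    return items


def add_one(x):
    return x + 1


def square(x):
    return x * x


data = [1, 2, 3, 4, 5]

reference = run_pipeline([add_one, square], data)
fused = run_pipeline([fuse(add_one, square)], data)
fissed = fiss(fuse(add_one, square), ways=2)(data)
print(reference, fused, fissed)
```

Both transformations preserve the output stream, which is what lets a backend adjust the granularity of the stream graph to the number of processing elements on the target. Fission as sketched here is only valid for filters that keep no state between items.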

For more information, see:
Thank you!