Microprocessor Performance, Phase II Yale Patt The University of Texas at Austin STAMATIS Symposium TU-Delft September 28, 2007.

Slides:



Advertisements
Similar presentations
The Implications of Multi-core. What I want to do today Given that everyone is heralding Multi-core –Is it really the Holy Grail? –Will it cure cancer?
Advertisements

CS 6290 Instruction Level Parallelism. Instruction Level Parallelism (ILP) Basic idea: Execute several instructions in parallel We already do pipelining…
Avenues for Research The Microarchitecture of Future Microprocessors.
Pentium microprocessors CAS 133 – Basic Computer Skills/MS Office CIS 120 – Computer Concepts I Russ Erdman.
Microarchitecture is Dead AND We must have a multicore interface that does not require programmers to understand what is going on underneath.
Yale Patt The University of Texas at Austin World University Presidents’ Symposium University of Belgrade April 4, 2009 Future Microprocessors: What must.
Yale Patt The University of Texas at Austin Universidade de Brasilia Brasilia, DF -- Brazil August 12, 2009 Future Microprocessors: Multi-core, Multi-nonsense,
RISC vs CISC Yuan Wei Bin Huang Amit K. Naidu. Introduction - RISC and CISC Boundaries have blurred. Modern CPUs Utilize features of both. The Manufacturing.
Computer Architecture & Organization
Development of a track trigger based on parallel architectures Felice Pantaleo PH-CMG-CO (University of Hamburg) Felice Pantaleo PH-CMG-CO (University.
ECE 109 / CSCI 255 What’s next.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Multicore & Parallel Processing P&H Chapter ,
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon, Dec 5, 2005 Topic: Intro to Multiprocessors and Thread-Level Parallelism.
Instruction Level Parallelism (ILP) Colin Stevens.
ISBN Chapter 1 Preliminaries. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.1-2 Chapter 1 Topics Motivation Programming Domains.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
Csci4203/ece43631 Review Quiz. 1)It is less expensive 2)It is usually faster 3)Its average CPI is smaller 4)It allows a faster clock rate 5)It has a simpler.
ISBN Lecture 01 Preliminaries. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.1-2 Lecture 01 Topics Motivation Programming.
ISBN Chapter 1 Topics Motivation Programming Domains Language Evaluation Criteria Influences on Language Design Language Categories Language.
1 Chapter 4 The Central Processing Unit and Memory.
Mechanisms: Run-time and Compile-time. Outline Agents of Evolution Run-time Branch prediction Trace Cache SMT, SSMT The memory problem (L2 misses) Compile-time.
1 Layers of Computer Science, ISA and uArch Alexander Titov 20 September 2014.
Semiconductor Memory 1970 Fairchild Size of a single core –i.e. 1 bit of magnetic core storage Holds 256 bits Non-destructive read Much faster than core.
2007 Sept 06SYSC 2001* - Fall SYSC2001-Ch1.ppt1 Computer Architecture & Organization  Instruction set, number of bits used for data representation,
1 The Performance Potential for Single Application Heterogeneous Systems Henry Wong* and Tor M. Aamodt § *University of Toronto § University of British.
To Flip or Not to Flip David Kaeli Northeastern University Boston, MA.
1 Lecture 1: CS/ECE 3810 Introduction Today’s topics:  Why computer organization is important  Logistics  Modern trends.
Led the WWII research group that broke the code for the Enigma machine proposed a simple abstract universal machine model for defining computability devised.
My major mantra: We Must Break the Layers. Algorithm Program ISA (Instruction Set Arch)‏ Microarchitecture Circuits Problem Electrons.
Yale Patt The University of Texas at Austin University of California, Irvine March 2, 2012 High Performance in the Multi-core Era: The role of the Transformation.
Η Μικροαρχιτεκτονική Πέθανε. Ζήτω η Μικροαρχιτεκτονική! Η Μικροαρχιτεκτονική Πέθανε. Ζήτω η Μικροαρχιτεκτονική! Christos Kozyrakis Stanford University.
CAPS project-team Compilation et Architectures pour Processeurs Superscalaires et Spécialisés.
3/15/2002CSE Final Remarks Concluding Remarks SOAP.
Microprocessor Microarchitecture Instruction Fetch Lynn Choi Dept. Of Computer and Electronics Engineering.
RISC By Ryan Aldana. Agenda Brief Overview of RISC and CISC Features of RISC Instruction Pipeline Register Windowing and renaming Data Conflicts Branch.
Parallelism: A Serious Goal or a Silly Mantra (some half-thought-out ideas)
Rethinking Computer Architecture Wen-mei Hwu University of Illinois, Urbana-Champaign Celebrating September 19, 2014.
Introduction to Computer Organization
Alpha Supplement CS 740 Oct. 14, 1998
Processor Level Parallelism. Improving the Pipeline Pipelined processor – Ideal speedup = num stages – Branches / conflicts mean limited returns after.
CS- 492 : Distributed system & Parallel Processing Lecture 7: Sun: 15/5/1435 Foundations of designing parallel algorithms and shared memory models Lecturer/
A Common Machine Language for Communication-Exposed Architectures Bill Thies, Michal Karczmarek, Michael Gordon, David Maze and Saman Amarasinghe MIT Laboratory.
Computer Architecture Lecture 26 Past and Future Ralph Grishman November 2015 NYU.
Multi-core, Mega-nonsense. Will multicore cure cancer? Given that multicore is a reality –…and we have quickly jumped from one core to 2 to 4 to 8 –It.
3/12/2013Computer Engg, IIT(BHU)1 CONCEPTS-1. Pipelining Pipelining is used to increase the speed of processing It uses temporal parallelism In pipelining,
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 1: Overview of High Performance Processors * Jeremy R. Johnson Wed. Sept. 27,
High Performance Computing1 High Performance Computing (CS 680) Lecture 2a: Overview of High Performance Processors * Jeremy R. Johnson *This lecture was.
CC311 Computer Architecture Chapter 1 Computer Abstraction & Technology.
15-740/ Computer Architecture Lecture 2: ISA, Tradeoffs, Performance Prof. Onur Mutlu Carnegie Mellon University.
Chapter 5 Memory Hierarchy Design. 2 Many Levels in Memory Hierarchy Pipeline registers Register file 1st-level cache (on-chip) 2nd-level cache (on same.
Spring 2003CSE P5481 WaveScalar and the WaveCache Steven Swanson Ken Michelson Mark Oskin Tom Anderson Susan Eggers University of Washington.
CS 352H: Computer Systems Architecture
??? ple r B Amulya Sai EDM14b005 What is simple scalar?? Simple scalar is an open source computer architecture simulator developed by Todd.
Computer Architecture Education: Black Boxes Proved Harmful
15-740/ Computer Architecture Lecture 3: Performance
A Common Machine Language for Communication-Exposed Architectures
CS161 – Design and Architecture of Computer Systems
5.2 Eleven Advanced Optimizations of Cache Performance
Hyperthreading Technology
EE 193: Parallel Computing
The University of Texas at Austin
Introduction, Focus, Overview
Morgan Kaufmann Publishers
Levels of Parallelism within a Single Processor
Coe818 Advanced Computer Architecture
The University of Texas at Austin
Overview Prof. Eric Rotenberg
Mattan Erez The University of Texas at Austin
Introduction, Focus, Overview
IA-64 Vincent D. Capaccio.
Presentation transcript:

Microprocessor Performance, Phase II Yale Patt The University of Texas at Austin STAMATIS Symposium TU-Delft September 28, 2007

Algorithm Program ISA (Instruction Set Arch) Microarchitecture Circuits Problem Electrons

Up to Now (Phase I) Maintain the artificial walls between layers Keeps the abstraction layers secure –Makes for a better comfort zone (Mostly) Improve the Microarchitecture –Pipelining –Caches –Branch Prediction, Speculative Execution –Out-of-order Execution –Trace Cache

Up to Now (Phase I) BUT, even now: Makes for a poorly utilized chip Future process technology won’t allow it –Too many cores (bandwidth problem) –Too many transistors (power, energy problem)

The Answer: Break the Layers (We already have in limited cases) Pragmas in the Language The algorithm, the compiler, & the microarchitecture –The Alpha experiment The Refrigerator X + Superscalar

Examples Compiler, Microarchitecture –Multiple levels of cache –Block-structured ISA –Part by compiler, part by uarch –Fast track, slow track Algorithm, Compiler, Microarchitecture –X + superscalar – the Refrigerator –Niagara X / Pentium Y Microarchitecture, Circuits –Verification Hooks –Internal fault tolerance

Problems: Computer People work within their layers Too few understand outside their layer Multiple cores: people think sequential

Yale Patt The University of Texas at Austin STAMATIS Symposium TU-Delft September 28, 2007 Microprocessor Performance, Phase II The Future of Computing: Education

At least two problems

Problem 1: “Abstraction” Misunderstood Taxi to the airport The Scheme Chip (Deeper understanding) Sorting (choices) Microsoft developers (Deeper understanding) Wireless networks (Layers revisited)

Problem 2: Thinking in Parallel is Hard Perhaps: Thinking is Hard What if the Programmer understood shared memory, and Synchronizing Primitives –Would it matter? Some simple programs for freshmen –Pipelining (aka Streaming) –Factorial –Parallel Search

Top-down vs. Bottom-up –Learning vs. Designing Applications can drive Microarchitecture –IF we can speak the same languages Thousands of cores, Special function units –Ability to power on/off under program control –Algorithm, Compiler, Microarchitecture talking to each other We have an Education Problem We have an Opportunity

On Education Object-oriented FIRST does not work –Students do not get it –Memorizing isn’t Learning (or, Understanding) Motivated bottom up –Students build on what they already know –Continually raises the level of Abstraction Don’t be afraid to work the student hard –Students can digest serious meat –Students won’t complain if they are learning No substitute for: Design, Debug, and Fix …by themselves

Algorithm Program ISA (Instruction Set Arch) Microarchitecture Circuits Problem Electrons