Alternative Computing Technologies


Alternative Computing Technologies CS 8803 ACT Spring 2014 Hadi Esmaeilzadeh hadi@cc.gatech.edu Georgia Institute of Technology

Hadi Esmaeilzadeh From Khoy, Iran

PhD in CSE, University of Washington Doug Burger and Luis Ceze 2013 William Chan Memorial Dissertation Award MSc in CS, The University of Texas at Austin MSc and BSc in ECE, University of Tehran

Research: ACT Lab Alternative Computing Technologies General-purpose approximate computing Bridging neuromorphic and von Neumann models of computing Analog computing System design for online machine learning System design for perpetual devices General-purpose approximate computing Algorithmic transformations Programming models Architecture design Acceleration

Agenda Course organization Who is Hadi Why alternative computing technologies How we became an industry of new possibilities Why we might become an industry of replacement Possible alternative computing technologies Quiz # 1

Objective Explore cutting-edge research on new and alternative paradigms of computing Empower you with higher-order critical thinking Improve your technical writing and speaking Innovate in alternative computing technologies

Format Seminar course Mostly your presentations (I will only lecture three times) Reading papers Critiquing and discussing the papers Brainstorming about new ideas Developing new technologies

Grading rubric
Component            Fraction
Class Presentation   30%
Class Participation  10%
Critiques            25%
Final Project        35%

Class presentation Objective: Communicate and analyze ideas 4 points: Clearly presenting the key ideas 1 point: Clear, well-organized slides 5 points: Stimulating interesting discussion 1 point bonus

Class participation You have to say something interesting! By 9pm the night before, two comments/questions Your new ideas Critical questions about methodologies and conclusions Why will the paper be cited What you learned Main insights from the papers

Critiques Objective: developing higher-order critical thinking Summary (a quarter of a page) Strengths (1-3 sentences) Weaknesses (1-3 sentences) Analysis I (1 paragraph) Analysis II (1 paragraph) Please read “The Task of the Referee” by Alan Jay Smith

Reading material for writing critiques The Task of the Referee, Alan Jay Smith Style: The Basics of Clarity and Grace, Joseph M. Williams

Final project Groups of two Options: Implementing a new idea; Extending an existing paper; Re-implementing a paper; Surveying at least ten papers Evaluation: Implementation; Writing; Oral presentation

Prerequisites Understand a subset of: VLSI Circuits; Computer architecture; Programming Languages; Machine learning Do: Programming

Agenda Why alternative computing technologies Who Hadi is Course organization Why alternative computing technologies How we became an industry of new possibilities Why we might become an industry of replacement Possible alternative computing technologies Quiz # 1

What has made computing pervasive? What is the backbone of the computing industry? I would like to start by asking a question!

Programmability Networking

What makes computers programmable?

von Neumann architecture General-purpose processors Components Memory (RAM) Central processing unit (CPU) Control unit Arithmetic logic unit (ALU) Input/output system Memory stores program and data Program instructions execute sequentially
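
To make the two properties on this slide concrete (one memory holds both program and data, and instructions execute sequentially), here is a minimal sketch of a toy stored-program machine. The four-instruction set (LOAD, ADD, STORE, HALT) and the memory layout are hypothetical and chosen only for illustration; no real processor is modeled.

```python
# Toy stored-program machine: a single memory holds instructions and data.
# The instruction set (LOAD, ADD, STORE, HALT) is hypothetical.

def run(memory):
    """Fetch, decode, and execute instructions sequentially until HALT."""
    pc = 0    # program counter, owned by the control unit
    acc = 0   # accumulator register, used by the ALU
    while True:
        opcode, operand = memory[pc]   # fetch the next instruction from memory
        pc += 1                        # instructions execute in order
        if opcode == "LOAD":
            acc = memory[operand]      # read data from the same memory
        elif opcode == "ADD":
            acc = acc + memory[operand]   # ALU operation
        elif opcode == "STORE":
            memory[operand] = acc      # write the result back to memory
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")

# Addresses 0-3 hold the program; addresses 4-6 hold data.
program = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2,   # address 4
    3,   # address 5
    0,   # address 6: the result is stored here
]

result = run(list(program))
print(result[6])
```

Changing the tuples in memory changes what the machine computes without touching `run`, which is the sense in which the stored program makes the machine programmable.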

Programmability versus Efficiency

Programmability versus Efficiency General-Purpose Processors FPGAs ASICs GPUs SIMD Units

What is the difference between the computing industry and the paper towel industry?

Industry of replacement? Industry of new possibilities (1971 to 2013) The difference is that we buy paper towels when we run out of them, and we buy new computers because they get better! We are an industry of new possibilities and not an industry of replacement!

Can we continue being an industry of new possibilities? Personalized healthcare Virtual reality Real-time translators

Agenda How we became an industry of new possibilities Who Hadi is Course organization Why alternative computing technologies How we became an industry of new possibilities Why we might become an industry of replacement Possible alternative computing technologies Quiz # 1

Moore’s Law Or, how we became an industry of new possibilities Every 2 Years: Double the number of transistors. Build higher performance general-purpose processors. Make the transistors available to the masses. Increase performance (1.8×↑). Lower the cost of computing (1.8×↓). But somebody needs to make those transistors available to the rest of the community. All of this sounds nice and dandy, but what is the catch?
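
As a rough illustration of how quickly "double every 2 years" compounds, the sketch below counts the doublings between 1971 and 2013, the two endpoints shown on the earlier timeline slide. The function names are mine, and the clean two-year cadence is an idealization of the slide's statement rather than measured data.

```python
def doublings(start_year, end_year, years_per_doubling=2):
    """Number of complete doubling periods between two years."""
    return (end_year - start_year) // years_per_doubling

def growth_factor(start_year, end_year, years_per_doubling=2):
    """Multiplicative growth in transistor count under idealized doubling."""
    return 2 ** doublings(start_year, end_year, years_per_doubling)

n = doublings(1971, 2013)
factor = growth_factor(1971, 2013)
print(n, factor)   # 21 doublings, a factor of 2**21
```

Twenty-one doublings is a factor of about two million, which is why even a small slowdown in the cadence matters so much over a decade.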

What is the catch? Powering the transistors without melting the chip. The catch is powering an exponentially increasing number of transistors without melting the chip down. As this graph shows, even though we have been doubling the number of transistors every two years, chip power consumption has been increasing much more slowly, and we have actually already hit the chip power budget limits.

Dennard scaling: Double the transistors; scale their power down. Transistor: 2D Voltage-Controlled Switch. Dimensions, Voltage, Doping Concentrations: ×0.7. Area 0.5×↓, Capacitance 0.7×↓, Frequency 1.4×↑. Power = Capacitance × Frequency × Voltage², so Power 0.5×↓. In the mid-2000s, Dennard scaling broke.
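
The 0.5× power figure follows directly from the formula on this slide. A small sketch of the arithmetic, using the slide's rounded per-generation factors (the function name is mine):

```python
def dynamic_power_scale(capacitance, frequency, voltage):
    """Scaling factor for Power = Capacitance x Frequency x Voltage^2."""
    return capacitance * frequency * voltage ** 2

# Per-generation factors from the slide.
area_scale = 0.7 * 0.7   # both dimensions shrink by 0.7, so area is about 0.5x
power_scale = dynamic_power_scale(capacitance=0.7, frequency=1.4, voltage=0.7)

# Power per unit area stays roughly constant: each transistor uses about half
# the power and occupies about half the area.
power_density_scale = power_scale / area_scale

print(round(area_scale, 2), round(power_scale, 2), round(power_density_scale, 2))
```

Because the power density factor comes out close to 1, twice as many transistors fit in the same area at roughly the same total chip power, which is the property the next slides show breaking down.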

Dennard scaling broke: Double the transistors; still scale their power down. Transistor: 2D Voltage-Controlled Switch. Dimensions, Voltage, Doping Concentrations: ×0.7. Area 0.5×↓, Capacitance 0.7×↓, Frequency 1.4×↑. Power = Capacitance × Frequency × Voltage²; Power 0.5×↓. So we had to do something in architecture to deal with the power problem! Let’s lower the frequency and make the cores wimpier. Before I tell you how these events overlap with the evolution of processors, let me tell you about a side effect of this breakdown.

Dark silicon: If you cannot power them, why bother making them? Area 0.5×↓. Dark silicon: the fraction of transistors that need to be powered off at all times due to power constraints. Transistors that cannot be powered on at all times have low utility and cannot be transformed into performance easily! Initially, it is OK to have a bunch of accelerators, but it is not a scalable solution.
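
A back-of-the-envelope sketch of why broken Dennard scaling leads to dark silicon under a fixed power budget. The key assumption, which is mine and not stated on the slide, is that supply voltage stops scaling while capacitance and frequency keep scaling as before. This is a deliberate simplification and not the model from the ISCA 2011 paper discussed later.

```python
def chip_power_growth(transistor_growth, capacitance, frequency, voltage):
    """Growth in total chip power: transistor count x per-transistor power."""
    return transistor_growth * capacitance * frequency * voltage ** 2

def dark_fraction(power_growth):
    """Fraction of transistors that must stay off to fit a fixed power budget."""
    return max(0.0, 1.0 - 1.0 / power_growth)

# One generation with ideal Dennard scaling: voltage scales by 0.7.
ideal = chip_power_growth(2.0, capacitance=0.7, frequency=1.4, voltage=0.7)

# One generation with voltage held constant (assumption for illustration).
broken = chip_power_growth(2.0, capacitance=0.7, frequency=1.4, voltage=1.0)

print(round(ideal, 2), round(dark_fraction(ideal), 2))
print(round(broken, 2), round(dark_fraction(broken), 2))
```

Under these assumptions the ideal case stays inside the budget, while the constant-voltage case nearly doubles chip power, so close to half the transistors could not run at full frequency. Real designs trade frequency and core size instead, which is what the models in the following slides explore.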

Looking back: Evolution of processors. Single-core Era (1971 to 2003): 740 kHz to 3.4 GHz. Dennard scaling broke. Multicore Era (2004 to 2013): 3.5 GHz. The general consensus was that by exploiting parallelism in the applications and increasing the number of cores, we could overcome the transistor scaling trends and continue scaling for many more generations! There were diminishing returns, but if the power ... We didn’t have the power to spend a ... We have shown how to ... With small cores, but many of them, you can exploit task-level parallelism! There is enormous value in improving the performance of single cores and the performance of multicores. Our community has largely moved to multicore, and right now an enormous fraction of the research community is invested in multicore research. A larger fraction of the community ought to be investigating other paths.

Are multicores a long-term solution or just a stopgap?

Agenda Why we might become an industry of replacement Who Hadi is Course organization Why alternative computing technologies How we became an industry of new possibilities Why we might become an industry of replacement Possible alternative computing technologies Quiz # 1

Modeling future multicores: Quantify the severity of the problem. Predict the performance of best-case multicores from 45 nm to 8 nm, with parallel benchmarks and a fixed power and area budget. We wanted to quantify the severity of the power problem and see how the trends at the transistor level affect the performance gains that we get from multicores. Transistor Scaling Model, Single-Core Scaling Model, Multicore Scaling Model. Esmaeilzadeh, Blem, St. Amant, Sankaralingam, Burger, “Dark Silicon and the End of Multicore Scaling,” ISCA 2011

Transistor scaling model: From 45 nm to 8 nm
Model                                        Area     Power    Speed
Historical Scaling [Dennard, 1974]           32× ↓             5.7× ↑
Optimistic Scaling Model [ITRS, 2010]        32× ↓    8.3× ↓   3.9× ↑
Conservative Scaling Model [VLSI-DAT, 2010]  32× ↓    4.5× ↓   1.3× ↑
Our transistor scaling models define the area, power, and speed scaling factors for transistors from 45 nm down to 8 nm. To avoid any bias, we used two sources that predict how the physical properties of transistors change in the future to derive these factors! One source is the International Technology Roadmap for Semiconductors (ITRS), which sets goals and expectations for the future process technology nodes and is optimistic! The other source is a published work by Shekhar Borkar that takes a more measured approach in its predictions. As you can see, ITRS is significantly more optimistic than the conservative model, so we have a good balance in our predictions! Before I move on to the single-core scaling model, I want to draw your attention to a very important phenomenon called Dark Silicon. As you can see, the power of the transistors is not scaling at the same rate as their area, so in future technology nodes we may not be able to power up all the transistors that we put on the chip! Therefore, we define Dark Silicon to be the percentage of transistors that need to be powered off at all times due to power constraints! Keep this in mind. We will come back to it! Now, let’s see how we build the single-core model!

Single-core model (45 nm): Power-Performance and Area-Performance Pareto Optimal Frontiers. The single-core model is a search space that allows our modeling framework to find the best cores to put in the multicore chip to best utilize our power or area budgets! Here, I am showing you a design space. On the x dimension, you have a single core’s performance measured in SPECmark! That is what we architects use to measure the performance of single cores! And on the y dimension, you have the power that the core will consume to provide that level of performance! For example, given a fixed power budget, if your application is very, very parallel, it is better to use a low-power core so that you can put many of them on the chip and still meet your budget. But if your application has only a little bit of parallelism, then you would rather spend your budget on a more powerful core! To derive this model, we first populated this design space with real measurements from modern processors. Then we derived the power-performance Pareto optimal frontier!
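
To illustrate what deriving a power-performance Pareto optimal frontier means, here is a minimal sketch. The sample points are made up for illustration; they are not the measurements used in the study, and the function name is mine.

```python
def pareto_frontier(points):
    """Keep the (performance, power) points that no other point dominates.

    A point is dominated when some other point offers at least the same
    performance for no more power, and is strictly better in one of the two.
    """
    frontier = []
    lowest_power_so_far = float("inf")
    # Visit the fastest cores first; among equal performance, cheapest first.
    for perf, power in sorted(points, key=lambda p: (-p[0], p[1])):
        # A slower core belongs on the frontier only if it uses strictly
        # less power than every faster (or equally fast) core seen so far.
        if power < lowest_power_so_far:
            frontier.append((perf, power))
            lowest_power_so_far = power
    return sorted(frontier)

# Hypothetical (performance, power in W) measurements.
measured = [(5, 4), (10, 9), (10, 12), (15, 20), (12, 22), (20, 35), (8, 10)]
frontier = pareto_frontier(measured)
print(frontier)
```

The points that are dropped, such as (12, 22), cost more power than a faster alternative, so a search over multicore designs never needs to consider them.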

Single-core scaling model From 45 nm to 8 nm Transistor Speed Scaling Factor Transistor Power Scaling Factor Single-core Scaling Model: Single-core Model × Transistor Scaling Model

Multicore scaling model From 45 nm to 8 nm Exhaustive search of multicore design space (Examine 800 design points for every technology node) Single Core Search Space (Scaled Area and Power Pareto Frontiers) Application Characteristics (% Parallel, % Memory Accesses) Constraints (Area and Power Budget) Microarchitectural Features (Cache and Memory Latencies, CPI, Memory Bandwidth) Multicore Topology (Symmetric, Asymmetric, Dynamic, Composable) Multicore Organization: CPU-Like, GPU-Like (# of HW Threads, Cache Sizes)

Multicore model (Amdahl’s Law)
\begin{align*}
\text{Speedup} &= \frac{1}{\dfrac{1 - f_{\text{Parallel}}}{\text{Serial Speedup}} + \dfrac{f_{\text{Parallel}}}{\text{Parallel Speedup}}} \\
\text{Serial Speedup} &= 1 \times \text{Core Performance} \\
\text{Parallel Speedup} &= N \times \text{Core Performance}
\end{align*}
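
The three formulas on this slide translate directly into a few lines of code. This is a sketch of the symmetric case only, where all N cores have the same performance; the function and parameter names are mine.

```python
def multicore_speedup(f_parallel, n_cores, core_performance):
    """Amdahl's Law as written on the slide, for N identical cores."""
    serial_speedup = 1 * core_performance
    parallel_speedup = n_cores * core_performance
    return 1.0 / ((1.0 - f_parallel) / serial_speedup
                  + f_parallel / parallel_speedup)

# 90% parallel code on 16 baseline cores.
print(multicore_speedup(f_parallel=0.9, n_cores=16, core_performance=1.0))

# Fully parallel code on 8 cores that are each twice as fast as the baseline.
print(multicore_speedup(f_parallel=1.0, n_cores=8, core_performance=2.0))
```

Even with 16 cores, the 10% serial portion caps the first case well below 16×, which is why the serial performance of each core still matters in the multicore model.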

Dark silicon 40%

Evaluation Setup Applications: 12 PARSEC parallel benchmarks Baseline: The best multicore design available at 45 nm Constraints: Derived from the best multicore design at 45 nm Fixed Power Budget: 125 W Fixed Area Budget: 111 mm²
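
With the power and area budgets from this slide, the exhaustive search of the multicore design space can be sketched for the simplest case: a symmetric topology with Amdahl's Law as the performance model. The three candidate cores and their numbers are hypothetical placeholders for points on the Pareto frontier; the actual study examines many more design points and also models caches, memory bandwidth, and other topologies.

```python
import math

POWER_BUDGET_W = 125.0    # fixed power budget from the slide
AREA_BUDGET_MM2 = 111.0   # fixed area budget from the slide

# Hypothetical cores: name -> (performance, power in W, area in mm^2).
CORES = {
    "small": (1.0, 2.5, 2.0),
    "medium": (2.0, 10.0, 8.0),
    "big": (4.0, 40.0, 30.0),
}

def amdahl_speedup(f_parallel, n_cores, core_performance):
    """Amdahl's Law for n identical cores."""
    serial = core_performance
    parallel = n_cores * core_performance
    return 1.0 / ((1.0 - f_parallel) / serial + f_parallel / parallel)

def best_symmetric_design(f_parallel):
    """Try every core type and return (name, core count, speedup) of the best."""
    best = None
    for name, (perf, power, area) in CORES.items():
        # The tighter of the two budgets limits how many cores fit.
        n_cores = min(math.floor(POWER_BUDGET_W / power),
                      math.floor(AREA_BUDGET_MM2 / area))
        if n_cores < 1:
            continue
        speedup = amdahl_speedup(f_parallel, n_cores, perf)
        if best is None or speedup > best[2]:
            best = (name, n_cores, speedup)
    return best

print(best_symmetric_design(0.99))   # highly parallel application
print(best_symmetric_design(0.50))   # application with little parallelism
```

With these made-up cores, the highly parallel case picks many small cores and the mostly serial case picks a few big ones, matching the intuition given for the single-core model.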

Projected multicore speedup over 10 years, from 45 nm to 8 nm: 18× (historical target), 7.9× (ITRS), 3.7× (conservative). Dark silicon across the technology nodes (45 nm, 32 nm, 22 nm, 16 nm, 11 nm, 8 nm): 1%, 17%, 36%, 40%, 51%.
The goal is not to use more transistors; the goal here is to get more performance out of silicon! Dark silicon is not fundamentally bad, but it precludes you from getting performance from previous approaches. If we are right and the current directions are not going to provide enough performance gains, then what are some alternative paths forward? When we cannot justify the cost of developing new technology nodes, that means the end of Moore’s Law!
To answer this question, we conducted a study and projected the performance of multicores in the future technology nodes. I refer you to our ISCA 2011 paper for the details of modeling and experiments! We are going to look at the speedup benefits of multicores over ten years, from 45 nm to 8 nm. We are going to set an aspirational goal, which is improving performance by 18× over ten years. This is the historical trend we saw prior to 45 nm. To perform these projections, we used highly parallel benchmarks! Let’s look at the first projection, which is based on ITRS. ITRS is an industry consortium that sets goals and targets for transistor scaling and is optimistic. As you can see, instead of 18×, we will achieve less than 8× over ten years. Let’s look at the second projection, which is based on a published work by Shekhar Borkar that takes a more measured approach toward predicting transistor scaling. Shekhar is an Intel Fellow! As you can see, instead of 18×, we will achieve less than 4× over ten years. The reality could be worse, since we were optimistic in our projections and used highly parallel benchmarks. Let’s look at the transistor utilization. These results show that multicore scaling is not likely to sustain the historical performance scaling trend. By the time we get to 8 nm, half of the chip needs to be powered off at all times. This is when the economics of scaling get into trouble, and the slim benefits do not justify paying the cost of developing a new technology node. That is why the end of Moore’s Law will be economic, not due to reaching the physical atomic limits!
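
To put the three ten-year figures on a common footing, the sketch below converts each total speedup into an average per-generation rate, assuming five technology steps (45, 32, 22, 16, 11, 8 nm). Treating the steps as evenly spaced in performance is my simplification, not something the slides state.

```python
def per_step_rate(total_speedup, steps):
    """Constant per-step factor that compounds to the total speedup."""
    return total_speedup ** (1.0 / steps)

nodes_nm = [45, 32, 22, 16, 11, 8]
generations = len(nodes_nm) - 1   # five steps from 45 nm to 8 nm

projections = {
    "historical target": 18.0,
    "ITRS projection": 7.9,
    "conservative projection": 3.7,
}

rates = {name: per_step_rate(total, generations)
         for name, total in projections.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:.2f}x per generation")
```

The historical target works out to roughly 1.78× per generation, close to the 1.8× performance increase quoted on the Moore’s Law slide, while the two projections come to roughly 1.5× and 1.3×.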

Industry of replacement? Multicores are likely to be a stopgap Not likely to continue the historical trends Do not overcome the transistor scaling trends The performance gap is significantly large Radical departures from conventional approaches are necessary Extract more performance and efficiency from silicon while preserving programmability Explore other sources of computing

Agenda Possible alternative computing technologies Who Hadi is Course organization Why alternative computing technologies How we became an industry of new possibilities Why we might become an industry of replacement Possible alternative computing technologies Quiz # 1

Alternative computing technologies Approximate Computing Analog Computing Biological Computing Neuromorphic Computing Human-based Computing Stochastic Computing Perpetual Computing I told you the problem; now I can rely on my colleagues to solve it! That is easy for me but not particularly interesting! By you! This is the path forward that I have been exploring in my thesis!

Approximate computing Embracing error Relax the abstraction of near-perfect accuracy in general-purpose computing Allow errors to happen in the computation Run faster Run more efficiently This sounds crazy, but let me show some applications where having some amount of error in the computation is entirely acceptable!

Now let me tell you, what I really want to do with these applications

New landscape of computing Personalized and targeted computing

Classes of approximate applications Programs with analog inputs: sensors, scene reconstruction Programs with analog outputs: multimedia Programs with multiple possible answers: web search, machine learning Convergent programs: gradient descent, big data analytics I think there are four classes of approximate applications: programs that take analog inputs, programs that produce analog outputs, programs that have multiple possible answers, and convergent programs! And this actually makes up a large body of very important applications!
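
As a small illustration of why convergent programs tolerate approximation, the sketch below runs gradient descent on a toy quadratic and stops at two different tolerances. The objective, learning rate, and tolerances are arbitrary choices of mine, used only to show the trade-off between iterations and accuracy.

```python
def gradient_descent(tolerance, learning_rate=0.1, start=0.0, max_steps=10_000):
    """Minimize f(x) = (x - 3)^2 and stop once the gradient is small enough."""
    x = start
    steps = 0
    while steps < max_steps:
        gradient = 2.0 * (x - 3.0)
        if abs(gradient) < tolerance:
            break   # good enough for this tolerance: terminate early
        x -= learning_rate * gradient
        steps += 1
    return x, steps

x_loose, steps_loose = gradient_descent(tolerance=1e-2)
x_tight, steps_tight = gradient_descent(tolerance=1e-6)

print(x_loose, steps_loose)
print(x_tight, steps_tight)
```

The loose tolerance stops after far fewer iterations and still lands within about 0.005 of the true minimum at x = 3; whether that error is acceptable depends on the application.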

Adding a third dimension Embracing Error Now let’s see what I mean by approximate computing! Error

A fertile ground for innovation We can do a lot of things (talk about Truffle)! Joke: You can imagine an extreme case where the error is 100% and the system is just giving random outputs. In this case, we will have infinite speedup and near-zero energy dissipation. However, for now, I am not aiming that high!

Approximate computing techniques Same Model From Model to Model Sampling Loop perforation (MIT) Compression Sage (Michigan) Early termination Green (MSR) Replacement Lower voltage Truffle (Rice, UW) von Neumann to Neural NPUs (UW, GaTech)
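
To give a feel for the sampling family of techniques, here is a minimal sketch of the idea behind loop perforation: skip a fixed share of loop iterations and accept a slightly different answer. This is only my illustration of the concept on a toy reduction, not the MIT system's implementation, and the data is synthetic.

```python
def mean_exact(values):
    """Visit every element."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

def mean_perforated(values, stride):
    """Visit only every `stride`-th element and average what was visited."""
    total = 0.0
    count = 0
    for i in range(0, len(values), stride):
        total += values[i]
        count += 1
    return total / count

# Synthetic data: the values 0..99 repeated 100 times.
data = [float(i % 100) for i in range(10_000)]

exact = mean_exact(data)
approx = mean_perforated(data, stride=4)
relative_error = abs(exact - approx) / exact

print(exact, approx, round(relative_error, 3))
```

On this data the perforated loop does a quarter of the work and is off by about 3%. The error depends heavily on the data and on which iterations are skipped, which is why such techniques need some way to bound or monitor quality.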

Analog Computing Computing with Physics http://youtu.be/dAyDi1aa40E

Agenda Quiz # 1 Who Hadi is Course organization Why alternative computing technologies How we became an industry of new possibilities Why we might become an industry of replacement Possible alternative computing technologies Quiz # 1

Alternative Computing Technologies (ACT) Lab