On The Energy Efficiency of Computation Mihai Budiu CMU CS CALCM Seminar Feb 17, 2004 Note: this version fixes some errors in the ASH performance graphs.

Presentation transcript:

On The Energy Efficiency of Computation Mihai Budiu CMU CS CALCM Seminar Feb 17, 2004 Note: this version fixes some errors in the ASH performance graphs shown

2 Presentation Setup main( ) { signal(SIGINT, welcome); while (slides( ) && time( )) { talk( ); } }

3 Why Do We Care? Toasted CPU: about 2 sec after removing cooler. (Tom's Hardware Guide)

4 Power and Power Density Data from Fred Pollack, Intel, MICRO 32. Assuming constant die size, no power management.

5 Power Density Distribution Chip surface. Data from Fred Pollack, Intel, MICRO 32.

6 Outline: Introduction; Power and Energy Efficiency (data from Bob Brodersen, Berkeley wireless group); Synchronous Hardware Efficiency; Asynchronous Hardware Efficiency; ASH Efficiency; Conclusions

7 Energy Efficiency Metric How much computing can we do... with a finite energy source?

8 Some Arithmetic

9 Energy and Power Efficiency The energy efficiency metric for energy-constrained applications (OP/nJ) is the same as the thermal (power) metric when maximizing throughput (MOPS/mW). Joule: OP/nJ = Watt: MOPS/mW
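The equality of the two metrics on slide 9 follows from unit conversion alone. A short worked derivation, using only the standard SI prefixes (mega = 10^6, milli = 10^-3, nano = 10^-9) and 1 W = 1 J/s:

```latex
\[
1\ \frac{\text{MOPS}}{\text{mW}}
  = \frac{10^{6}\ \text{op/s}}{10^{-3}\ \text{J/s}}
  = 10^{9}\ \frac{\text{op}}{\text{J}}
  = 1\ \frac{\text{op}}{\text{nJ}}
\]
```

The seconds cancel, so a throughput-per-power figure and an operations-per-energy figure are the same number.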

10 ISSCC Chips (.18 µm-.25 µm) Columns: # | Year | Description. Categories: Microprocessors, DSPs, Dedicated. Entries as extracted: 1 1997 S/ | Graphics; 2 2000 PPC (SOI) | Multimedia; 3 1999 G | Multimedia; 4 2000 G | Mpg decoder; 5 2000 Alpha | Multimedia; 6 1998 P | Encryption Processor; 7 1998 Alpha | Hearing Aid Processor; 8 1999 PPC | FIR for Disk Read Head; 9 1998 StrongArm | MPEG Encoder; Comm | a Baseband

11 Energy Efficiency (MOPS/mW or OP/nJ) 3 orders of magnitude!

12 Outline: Introduction; Power and Energy Efficiency; Synchronous Hardware Efficiency; Asynchronous Hardware Efficiency; ASH Efficiency; Conclusions

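13 Explaining the Difference Operations per second: $\text{MOPS} = f_{clk} \times N_{op}$, where $N_{op}$ is the number of operations per clock. Power: $P_{chip} = A_{chip} \times C_{sw} \times V_{dd}^2 \times f_{clk}$, where $C_{sw}$ is the normalized switched capacitance. Efficiency: $\text{MOPS}/P_{chip} = (f_{clk} \times N_{op}) / (A_{chip} \times C_{sw} \times V_{dd}^2 \times f_{clk}) = 1/(A_{op} \times C_{sw} \times V_{dd}^2)$, where $A_{op} = A_{chip}/N_{op}$ is the chip area per operation.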
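The model on slide 13 can be checked numerically. The sketch below is an illustration, not part of the talk: the function names and all input values are hypothetical, and the units are chosen (area in mm², switched capacitance in nF/mm², frequency in MHz) so that power comes out in mW. It shows that the clock frequency cancels out of the efficiency and that efficiency scales as 1/V_dd².

```python
import math

def chip_power_mw(a_chip_mm2, c_sw_nf_per_mm2, v_dd, f_clk_mhz):
    """P_chip = A_chip * C_sw * V_dd^2 * f_clk (nF * MHz * V^2 = mW)."""
    return a_chip_mm2 * c_sw_nf_per_mm2 * v_dd ** 2 * f_clk_mhz

def mops(f_clk_mhz, n_op):
    """MOPS = f_clk * N_op, with f_clk in MHz."""
    return f_clk_mhz * n_op

def efficiency_direct(a_chip_mm2, n_op, c_sw_nf_per_mm2, v_dd, f_clk_mhz):
    """MOPS/mW computed as throughput divided by power."""
    power = chip_power_mw(a_chip_mm2, c_sw_nf_per_mm2, v_dd, f_clk_mhz)
    return mops(f_clk_mhz, n_op) / power

def efficiency_closed_form(a_op_mm2, c_sw_nf_per_mm2, v_dd):
    """MOPS/mW from the simplified form 1 / (A_op * C_sw * V_dd^2)."""
    return 1.0 / (a_op_mm2 * c_sw_nf_per_mm2 * v_dd ** 2)

# Hypothetical chip: 80 mm^2, 16 ops/clock, 0.05 nF/mm^2, 2 V supply.
A_CHIP, N_OP, C_SW, V_DD = 80.0, 16, 0.05, 2.0

for f_clk in (50.0, 100.0):
    eff = efficiency_direct(A_CHIP, N_OP, C_SW, V_DD, f_clk)
    print(f"f_clk = {f_clk:5.1f} MHz -> {eff:.3f} MOPS/mW")

# Halving the supply voltage quadruples efficiency in this model.
ratio = (efficiency_closed_form(A_CHIP / N_OP, C_SW, V_DD / 2)
         / efficiency_closed_form(A_CHIP / N_OP, C_SW, V_DD))
print(f"efficiency gain from halving V_dd: {ratio:.1f}x")
```

The model ignores leakage and assumes the chip still meets timing at the lower voltage, so the gain from halving V_dd is an upper bound within this simplification.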

14 Supply Voltage, $V_{dd}$ $\text{MOPS}/P_{chip} = 1/(A_{op} \times C_{sw} \times V_{dd}^2)$

15 Normalized Switched Capacitance, $C_{sw}$ $\text{MOPS}/P_{chip} = 1/(A_{op} \times C_{sw} \times V_{dd}^2)$ 3x

16 Area per operation, $A_{op}$ $A_{op} = A_{chip}/N_{op}$ $\text{MOPS}/P_{chip} = 1/(A_{op} \times C_{sw} \times V_{dd}^2)$ AHA!

17 Focusing In PPC NEC DSP a

18 µP: MOPS/mW = 0.13 Useful arithmetic. $N_{op} = 2$ (two ways); $f_{clock}$ = 450 MHz ⇒ 900 MIPS; $A_{op} = A_{chip}/2$ = 42 mm²; Power = 7 Watts

19 DSP: MOPS/mW = 7 4 processors × 4 ops each; $N_{op} = 16$; $f_{clock}$ = 50 MHz ⇒ 800 MOPS; $A_{op} = A_{chip}/16$ = 5.3 mm²; Power = 110 mW

20 Dedicated Design: MOPS/mW = 200 $N_{op} = 96$; $f_{clock}$ = 25 MHz ⇒ 2400 MOPS; $A_{op}$ = 5.4 mm²/96 = .15 mm²; Power = 12 mW. Complex MAC = 8 ops. Fully parallel mapping of adaptive correlator algorithm.
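The headline efficiencies on slides 18 to 20 can be reproduced from the operations per clock, clock frequency, and power quoted on each slide. A minimal sketch (the dictionary layout and function names are illustrative choices, not from the talk):

```python
# Figures as quoted on slides 18-20: ops per clock, clock in MHz, power in mW.
designs = {
    "microprocessor": {"n_op": 2, "f_clk_mhz": 450, "power_mw": 7000},
    "DSP": {"n_op": 16, "f_clk_mhz": 50, "power_mw": 110},
    "dedicated": {"n_op": 96, "f_clk_mhz": 25, "power_mw": 12},
}

def mops(design):
    """Throughput: MOPS = f_clk (MHz) * N_op."""
    return design["f_clk_mhz"] * design["n_op"]

def mops_per_mw(design):
    """Energy efficiency in MOPS/mW, numerically equal to OP/nJ."""
    return mops(design) / design["power_mw"]

for name, design in designs.items():
    print(f"{name:>14}: {mops(design):5d} MOPS, "
          f"{mops_per_mw(design):7.2f} MOPS/mW")

# Gap between the dedicated design and the microprocessor.
gap = mops_per_mw(designs["dedicated"]) / mops_per_mw(designs["microprocessor"])
print(f"dedicated vs microprocessor: about {gap:.0f}x")
```

The three designs deliver throughput within a factor of three of each other, while their power differs by a factor of several hundred; that is where the roughly three orders of magnitude in efficiency come from.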

21 Memory is More Power-Efficient Hint: use on-chip caches

22 Energy Distribution in µP useful (includes local clock)

23 Efficiency and Performance $V_{dd}$ ↑ ⇒ $f_{clock}$ ↑, MOPS ↑, Power ↑, MOPS/mW ↓. Better metric: Energy × delay (roughly independent of $V_{dd}$)

24 Efficiency and Technology [Figure: MOPS/mW versus feature size (µm) for hardwired designs, DSPs, and microprocessors. Source: T. Claasen, ISSCC 1999]

25 How Low Can You Go? Energy required to compute is ZERO if computation is quasistatic... and no information is destroyed (reversible). Ops/nJ → ∞ (Rolf Landauer)
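The zero-energy limit on slide 25 applies only to reversible computation. For contrast, Landauer's principle puts a floor of kT ln 2 on the energy dissipated when one bit of information is erased. The sketch below evaluates that floor; the 300 K temperature is an assumption chosen for illustration and does not come from the talk, and the function names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact value in the 2019 SI definition

def landauer_limit_joules(temp_k=300.0):
    """Minimum energy dissipated when erasing one bit: k * T * ln 2."""
    return BOLTZMANN_J_PER_K * temp_k * math.log(2)

def max_bit_erasures_per_nj(temp_k=300.0):
    """Upper bound on irreversible bit erasures per nanojoule."""
    return 1e-9 / landauer_limit_joules(temp_k)

print(f"kT ln 2 at 300 K: {landauer_limit_joules():.3e} J")
print(f"bit erasures per nJ: {max_bit_erasures_per_nj():.3e}")
```

Under this assumption the bound is on the order of 10^11 bit erasures per nJ, many orders of magnitude beyond the 200 MOPS/mW of the dedicated design on slide 20.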

26 Outline: Introduction; Power and Energy Efficiency; Synchronous Hardware Efficiency; Asynchronous Hardware Efficiency; ASH Efficiency; Conclusions

27 Lutonium Performance Asynchronous microcontroller, designed and implemented at Caltech (Alain Martin). 0.18 µm technology; 1.8V supply, 0.4V/0.5V $V_{th}$; 200 MIPS; 1.8 ops/nJ; DSP-like.

28 Efficiency and Supply Voltage

29 Async Processor Breakdown useful

30 Outline: Introduction; Power and Energy Efficiency; Synchronous Hardware Efficiency; Asynchronous Hardware Efficiency; ASH Efficiency; Conclusions

31 Application-Specific Hardware C code Compiler for Application Specific Hardware Asynchronous Circuits Memory

32 Tool-Flow C → CASH core → Verilog back-end → Synopsys, Cadence P/R → ASIC. 180nm std. cell library, 2V, ~1999 technology. Mediabench kernels (1 hot function/benchmark). Memory

33 Caveat Memory we model this part accurately optimistic speed model, no power accounting

34 ASH Performance

35 ASH vs 600MHz CPU

36 ASH Area minimal RISC core

37 Normalized Area many C macros

38 ASH Energy Efficiency

39 All Together Now

40 Conclusions: Performance comes at a price. Energy efficiency is expressed in ops/nJ or MOPS/mW. Dedicated hardware is more power-efficient than microprocessors. ASH efficiency is competitive with dedicated hardware.