Published by Osbaldo Allard
Modified over 3 years ago
Low Power Design Essentials ©2008, Jan M. Rabaey and Dejan Marković
Chapter 5: Optimizing Power @ Design Time (Architectures, Algorithms and Systems)
Low Power Design Essentials ©2008 5.2 Chapter Outline
– The architecture/system trade-off space
– Concurrency improves energy-efficiency
– Exploring alternative topologies
– Removing inefficiency
– The cost of flexibility
Low Power Design Essentials ©2008 5.3 Motivation
Optimizations at the architecture or system level can enable more effective power minimization at the circuit level (while maintaining performance), such as:
– Enabling a reduction in supply voltage
– Reducing the effective switching capacitance for a given function (physical capacitance, activity)
– Reducing the switching rates
– Reducing leakage
Optimizations at higher abstraction levels tend to have greater potential impact:
– While circuit techniques may yield improvements in the 10-50% range, architecture and algorithm optimizations have reported orders-of-magnitude power reduction
Low Power Design Essentials ©2008 5.4 Lessons Learned from Circuit Optimization: Circuit Optimization Limited in Range [Ref: D. Markovic, JSSC’04]
Case study: tree adder. Result of joint (V_DD, V_TH, W) optimization:
– 65% of energy saved without delay penalty
– 25% smaller delay without energy cost
Need higher-level optimizations for larger gain.
[Figure: normalized energy E/E_ref versus normalized delay D/D_ref, marking the 65% energy saving and the 25% delay reduction. Reference point: minimum delay at nominal V_DD, V_TH]
Low Power Design Essentials ©2008 5.5 The Design Abstraction Stack
Levels, from top to bottom: System/Application, Software, (Micro-)Architecture, Logic/RT, Circuit, Device.
Increasing Return-on-Investment (ROI) at higher levels of the stack.
[Figure annotation: Chapter 4]
Low Power Design Essentials ©2008 5.6 Expanding the Playing Field
Architecture and system transformations and optimizations reshape the E-D curves:
– (1) Removing inefficiencies
– (2) Alternative topologies
– (3) Discrete options
[Figure: three energy (E) versus delay (D) plots, one per case]
Low Power Design Essentials ©2008 5.7 Reducing the Supply Voltage (while maintaining performance) [A. Chandrakasan, JSSC’92]
Concurrency: trading off clock frequency versus area to reduce power.
Consider the following reference design:
[Figure: datapath with registers R and logic blocks F1, F2, clocked at f_ref]
– R: register; F1, F2: combinational logic blocks (adders, ALUs, etc.)
– C_ref: average switching capacitance
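Low Power Design Essentials ©2008 5.8 A Parallel Implementation
[Figure: two copies of the reference datapath (registers R, logic blocks F1, F2), each clocked at f_ref/2]
– Running slower reduces the required supply voltage
– Yields a quadratic reduction in power
– The doubled switching capacitance and the halved clock frequency almost cancel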
Low Power Design Essentials ©2008 5.9 Example: 90nm Technology
Assuming ov_par = 7.5%
[Figure: normalized propagation delay t_p (1 to 5) versus normalized V_DD (0.5 to 1)]
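The parallel example can be explored with a small numerical sketch. The delay model below (an alpha-power law with `vth = 0.3` and `alpha = 1.5`, both normalized to the nominal supply) is an assumption chosen for illustration and is not the 90nm data plotted on the slide; only the 7.5% overhead comes from the text. All function names are hypothetical.

```python
def norm_delay(v, vth=0.3, alpha=1.5):
    """Gate delay at supply v, relative to the delay at v = 1.0 (assumed alpha-power law)."""
    d = lambda x: x / (x - vth) ** alpha
    return d(v) / d(1.0)


def voltage_for_delay(target, vth=0.3, alpha=1.5):
    """Bisection for the lowest normalized supply whose delay does not exceed target."""
    lo, hi = vth + 1e-6, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if norm_delay(mid, vth, alpha) > target:
            lo = mid   # too slow: the supply must go up
        else:
            hi = mid   # fast enough: try a lower supply
    return hi


def parallel_power(n=2, overhead=0.075):
    """Normalized power of n parallel units, each running n times slower.

    Capacitance grows by n * (1 + overhead) and frequency drops by n,
    so only the overhead and the squared supply voltage remain.
    """
    v = voltage_for_delay(n)
    return v, (1 + overhead) * v ** 2


if __name__ == "__main__":
    v, p = parallel_power()
    print(f"V_DD = {v:.2f} (norm.), P = {p:.2f} (norm.)")
```

Under these assumed parameters, a twofold delay budget allows a substantially lower supply, and the quadratic dependence on V_DD more than pays for the 7.5% overhead.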
Low Power Design Essentials ©2008 5.10 A Pipelined Implementation
[Figure: reference datapath with additional pipeline registers R, clocked at f_ref]
– Assuming ov_pipe = 10%
– Shallower logic reduces the required supply voltage (this example assumes equal V_DD for the parallel and pipelined designs)
Low Power Design Essentials ©2008 5.11 Increasing use of Concurrency Saturates
– Can combine parallelism and pipelining to drive V_DD down
– But, close to the process threshold, the overhead of excessive concurrency starts to dominate
[Figure: normalized power (0.1 to 1) versus concurrency (2 to 16), assuming constant % overhead]
Low Power Design Essentials ©2008 5.12 Increasing use of Concurrency Saturates
[Figure: power P versus V_DD at fixed throughput. Concurrency increases from the nominal design (no concurrency) down to a minimum P_min, beyond which overhead + leakage dominate]
– Only option: reduce V_TH as well!
– But: must consider leakage …
Low Power Design Essentials ©2008 5.13 Mapping into the Energy-Delay Space [Ref: D. Markovic, JSSC’04] © IEEE 2004
[Figure: energy per operation E_Op versus delay (= 1/throughput) for the nominal design and for increasing levels of parallelism N = 2, 3, 4, 5; a fixed-throughput line and the optimum energy-delay point are marked]
– For each level of performance, there is an optimum amount of concurrency
– Concurrency is only energy-optimal if the requested throughput is larger than that of the optimal operation point of the nominal function
Low Power Design Essentials ©2008 5.14 What if the Required Throughput is Below Minimum? (that is, at no concurrency)
Introduce time-multiplexing!
[Figure: reference design versus time-multiplexed design; blocks A with data rates f and 2f]
– Absorb unused time slack by increasing clock frequency (and voltage …)
– Again comes with some area and capacitance overhead!
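Low Power Design Essentials ©2008 5.15 Concurrency and Multiplexing Combined
Data for 64-b ALU.
[Figure: E_Op versus delay, with curves labeled by degree of time-multiplexing (1/2 to 1/5) and of parallelism (2 to 5) around the reference A = A_ref; area ranges from small (time-mux) to large (parallelism); Max and D_target are marked]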
Low Power Design Essentials ©2008 5.16 Some Energy-Inspired Design Guidelines
For maximum performance
– Maximize use of concurrency at the cost of area
For given performance
– Optimal amount of concurrency for minimum energy
For given energy
– Least amount of concurrency that meets performance goals
For minimum energy
– Solution with minimum overhead (that is, direct mapping between function and architecture)
Low Power Design Essentials ©2008 5.17 Concepts Slowly Embraced in Late 90’s [Ref: R. Subramanyan, Tampere’99]
[Figure: transistors/chip (memory and microprocessor/DSP) and normalized processor speed and computational efficiency [mA/MIP] versus year, 1960 to 2010, on logarithmic axes]
Low Power Design Essentials ©2008 5.18 Confirmed by Actual Processors … [Courtesy: J. DeVale and B. Black, Intel, ‘05 ]
Low Power Design Essentials ©2008 5.19 And Finally Accepted in the 00’s [Ref: S. Chou, ISSCC’05]
[Figure: processor performance for a constant power envelope (log scale, 1 to 100) versus year (2000, 2004, 2008+), comparing single core with dual/many core; annotated gains of 3x and 10x]
Low Power Design Essentials ©2008 5.20 Fully Accepted in 00’s [© Xilinx, Intel, AMD, IBM, NTT]
Examples: Xilinx Virtex-4; IBM/Sony Cell Processor; Intel Montecito; AMD DualCore; NTT video codec (4 Tensilica cores); UCB Pleiades (ARM plus heterogeneous reconfigurable fabric)
Low Power Design Essentials ©2008 5.21 The Quest for Concurrency
Amdahl’s Law
[Figure: curves for Serial = 0%, Serial = 6.7% and Serial = 20%]
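Amdahl's Law itself is compact enough to evaluate directly. The sketch below uses the three serial fractions named on the slide; the core counts are arbitrary illustration values and the function name is hypothetical.

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Speedup of a workload whose serial part cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


if __name__ == "__main__":
    for serial in (0.0, 0.067, 0.20):
        row = [round(amdahl_speedup(serial, n), 2) for n in (1, 4, 16, 64)]
        print(f"serial = {serial:.3f}: {row}")
```

For any nonzero serial fraction s the speedup is bounded by 1/s, no matter how many cores are added.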
Low Power Design Essentials ©2008 5.22 The Quest for Concurrency [Courtesy: S. Borkar, Intel, 2006]
13mm, 100W, 48MB cache, 4B transistors, in 22nm

Core size   Cores   Single-core performance (relative)
Large       12      1
Med         48      0.5
Small       144     0.3

[Figure: relative performance (0 to 1.2) for large, medium and small cores]
Low Power Design Essentials ©2008 5.23 Manipulating Concurrency Through Transformations [Ref: A. Chandrakasan, TCAD’95; D. Markovic, JSSC’07]
– Loop folding / unfolding
– Algebraic transformations
– Others: loop retiming, loop pipelining, (de)-interleaving, …
[Figure: folded and unfolded arrangements of function blocks F]
Low Power Design Essentials ©2008 5.24 Example: Visualizing MPEG-4 encoder Parallelism [Courtesy: W.M. Hwu, Illinois] Concurrent Compilers to Pick Up the Pace
Low Power Design Essentials ©2008 5.25 Single transformation Combined transformations [Courtesy: W.M. Hwu, Illinois] Impact of Code Transformations
Low Power Design Essentials ©2008 5.26 Choosing Between Alternative Topologies
– Multiple computational topologies exist for a single function F, e.g. adders, ALUs, multipliers, dividers, goniometric functions
– Each topology comes with its own optimal E-D curve
– The absolute optimal E-D curve for function F is obtained by plotting the composite of these curves: unarguably the best possible implementation of F for a given E or D and technology
[Figure: E versus D curves for topologies F’ and F”, and their composite]
Low Power Design Essentials ©2008 5.27 Adder Example: Static versus Domino [Ref: R. Zlatanovici, ESSCIRC’03]
64-bit CLA adders; 130 nm CMOS; R2: radix 2; R4: radix 4
– Static adders are low power but slow
– Dynamic adders are the preferred choice for higher performance
– Higher radix orders improve efficiency overall
[Figure: energy [pJ] versus delay [FO4] for Static R2, Domino R2, Domino R4 and Compound Domino R2]
Low Power Design Essentials ©2008 5.28 Adder Example: Static CLA versus Ling [Ref: R. Zlatanovici, ESSCIRC’03] © IEEE 2003
Conventional CLA
– Higher stack in first stage
– Simple sum precompute
Ling CLA
– Lower stack in first stage
– Complex sum precompute
– Higher speed
[Figure: energy [pJ] versus delay [FO4] for R2 CLA, R2 Ling, R4 CLA and R4 Ling]
Low Power Design Essentials ©2008 5.29 Improving Computational Efficiency
Implementations for a given function may be inefficient and can often be replaced with more efficient versions without penalty in energy or delay.
Inefficiencies arise from:
– Over-dimensioning or over-design
– Generality of function
– Design methodologies
– Limited design time
– Need for flexibility, re-use and programmability
[Figure: energy (E) versus delay (D)]
Low Power Design Essentials ©2008 5.30 Improving Computational Efficiency
Some simple guidelines:
Match computation and architecture
– Dedicated solutions superior by far
Preserve locality present in algorithm
– Getting data from far away is expensive
Exploit signal statistics
– Correlated data contains fewer transitions than random data
Energy on demand
– Only spend energy when truly needed
Low Power Design Essentials ©2008 5.31 Matching Computation and Architecture
Choice of computational architecture can have major impact on energy efficiency (see further).
Example: compute y = A∙x² + B∙x + C
[Figure: two alternative datapath structures for the same computation]
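The two datapaths drawn on the slide did not survive extraction. As an assumption, the sketch below contrasts one common pair of structures for this polynomial: the direct form and the factored (Horner) form. The function names are hypothetical.

```python
def direct_form(a, b, c, x):
    """Direct mapping: three multiplications and two additions.

    x * x and b * x are independent and can run concurrently in hardware.
    """
    x_squared = x * x
    return a * x_squared + b * x + c


def horner_form(a, b, c, x):
    """Factored form (A*x + B)*x + C: two multiplications and two additions.

    Fewer operators, but every operation depends on the previous one.
    """
    return (a * x + b) * x + c


if __name__ == "__main__":
    print(direct_form(3, 2, 1, 5), horner_form(3, 2, 1, 5))
```

Both forms compute the same value, so the choice between them is purely one of switched capacitance, area and critical-path length, which is the kind of trade-off the slide points at.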
Low Power Design Essentials ©2008 5.32 Matching Computation and Architecture
Example: word-length optimization
– Most algorithms (wireless, multimedia) are simulated and developed in floating point
– Programmable architectures operate on a fixed word length (16, 32, 64 bit)
– This creates a substantial amount of switching overhead for many computations
– Careful selection of word length leads to substantial power reduction
[Figure: datapath with quantizers]
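A minimal sketch of the word-length trade-off: quantize floating-point samples to a fixed number of fractional bits and observe how the worst-case error shrinks as bits are added. The sample values and the `quantize` helper are hypothetical; a real flow would compare against an application-level error budget.

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point rounding)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


# Hypothetical signal samples in the range (-1, 1).
samples = [0.1 * k - 0.95 for k in range(20)]

if __name__ == "__main__":
    for bits in (4, 8, 12, 16):
        worst = max(abs(x - quantize(x, bits)) for x in samples)
        print(f"{bits:2d} fractional bits: worst-case error {worst:.2e}")
```

The rounding error is bounded by half a least-significant bit, so each bit removed doubles the error bound while shrinking the datapath; the smallest word length that still meets the error budget is the one worth building.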
Low Power Design Essentials ©2008 5.33 Design Example: SVD Processor for MIMO [Ref: D. Markovic, JSSC’07] © IEEE 2007
MIMO channel: matrix H, decomposed as H = U ∙ Σ ∙ V†
[Figure: Tx array (x) and Rx array (y) linked by a 1st path labeled 1 and a 2nd path labeled 0.6. Signal chain: encoding & modulation, V at the transmitter, channel H, U† at the receiver producing z'1 … z'4, demodulation & decoding]
Low Power Design Essentials ©2008 5.34 SVD: Optimization Techniques [Ref: D. Markovic, JSSC’07] © IEEE 2007
[Figure: energy versus clock period and area versus clock period, tracing the design from initial synthesis at max V_DD to the target speed through word-length optimization, sizing, interleaving + folding and V_DD scaling; annotated gains: 40%, 20%, 30%, 7x, 36x]
Low Power Design Essentials ©2008 5.35 Energy-Area-Delay Tradeoff in SVD [Ref: D. Markovic, JSSC’07] © IEEE 2007
Impact of combined optimizations:
– Folding, interleaving, sizing, word length, voltage scaling
– 64x area & 16x energy reduction compared to direct mapping
[Figure: energy, delay and area from initial synthesis of the 16b design to the final design, with steps for word-size, sizing, V_DD scaling, optimal V_DD and W, interleaving (13.8x) and folding (2.6x); other annotations: 40%, 30%, 30%, 20%]
Low Power Design Essentials ©2008 5.36 Power/Area Optimal 4x4 SVD Chip [Ref: D. Markovic, JSSC’07] © IEEE 2007
34 mW, 3.5 mm², 2.1 GOPS/mW @ V_DD = 0.4V
– 2.1 GOPS/mW: 100 MHz clock, 70 GOPS, power = 34 mW
– 20 GOPS/mm²: 3.5 mm², 70 GOPS
[Figure: comparison with ISSCC chips, area efficiency (GOPS/mm²) versus energy efficiency (GOPS/mW) on logarithmic axes; points labeled SVD, 1998 18-6, 2000 4-2, 1999 15-5, 1998 7-6, 2004 18-5, 2000 14-8, 1998 18-3, 2000 14-5]
Low Power Design Essentials ©2008 5.37 Locality of Reference
Fetching data and instructions from local rather than global resources reduces access cost (interconnect, access energy).
Prime example: memory hierarchy
– Register files, caches, instruction loop buffers, memory partitioning, distributed memory
[Figure: μP (PC, instructions) fetching directly from main memory (slow, expensive) versus through a cache memory (small, fast, efficient)]
Low Power Design Essentials ©2008 5.38 Locality of Reference
(Hardware) instruction loop buffer
– On first iteration, code is cached in the loop buffer
– Fetched from the loop buffer on subsequent iterations
– Popular feature in DSPs
[Figure: IMEM, loop buffer, processor core, PC, IC + LC]
Low Power Design Essentials ©2008 5.39 Software Optimizations Crucial [Ref: H. De Man, ISSCC’05] © IEEE 2005
Improved temporal locality of data
[Figure: reference code with Loop1 and Loop2 passed through the compiler, yielding a restructured Loop1 and Loop2’]
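The slide's actual loops are not recoverable from the figure residue. As a stand-in, the sketch below shows loop fusion, one well-known transformation that improves temporal locality of data; the functions and the arithmetic inside them are hypothetical.

```python
def two_pass(pixels):
    """Reference code: Loop1 writes a full temporary array, Loop2 reads it back later."""
    scaled = [2 * p for p in pixels]    # Loop1: every element goes out to memory
    return [s + 1 for s in scaled]      # Loop2: every element is fetched again


def fused(pixels):
    """Transformed code: each intermediate value is consumed right after it is produced."""
    out = []
    for p in pixels:
        s = 2 * p           # produced ...
        out.append(s + 1)   # ... and used immediately, so it can stay in a register
    return out


if __name__ == "__main__":
    data = list(range(8))
    print(two_pass(data) == fused(data))
```

The results are identical; what changes is the distance between producing and consuming each intermediate value, and with it how often data must travel to and from a large, expensive memory.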
Low Power Design Essentials ©2008 5.40 VGA quality MPEG 4 on 1.6 GHz Pentium M [Ref: H. De Man, ISSCC’05] Software Optimizations – Example © IEEE 2005
Low Power Design Essentials ©2008 5.41 Exploiting Signal Statistics [Courtesy: A. Chandrakasan]
– Sequential data very often displays temporal correlation
– Temporally uncorrelated data maximizes transitions
– Preserving correlations (= avoiding time sharing) is a good idea
[Figure: counters Cntr1 and Cntr2 sharing a multiplexed bus (mbus) versus each driving its own bus (bus2)]
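The effect of time sharing on switching activity can be counted directly. The sketch below assumes an 8-bit up-counter and an 8-bit down-counter as the two sources (the slide does not say what the counters count) and compares the bit transitions on two dedicated buses against one time-multiplexed bus. The names are hypothetical.

```python
def transitions(words):
    """Total number of bit flips between consecutive words on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


N = 256
cntr1 = list(range(N))               # assumed up-counter: 0, 1, 2, ...
cntr2 = list(range(N - 1, -1, -1))   # assumed down-counter: 255, 254, ...

# Dedicated buses: each bus only ever sees its own, highly correlated sequence.
dedicated = transitions(cntr1) + transitions(cntr2)

# Time-multiplexed bus: the two sequences are interleaved word by word.
muxed_words = [w for pair in zip(cntr1, cntr2) for w in pair]
muxed = transitions(muxed_words)

if __name__ == "__main__":
    print("dedicated buses:", dedicated, "bit transitions")
    print("multiplexed bus:", muxed, "bit transitions")
```

Interleaving destroys the correlation between consecutive words, so the shared bus toggles several times more bits while carrying exactly the same data.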
Low Power Design Essentials ©2008 5.42 30% reduction in signal activity Exploiting Signal Statistics
Low Power Design Essentials ©2008 5.43 Programmable solutions very attractive –Shorter time to market –High reuse –Field updates (reprogramming) But come at a large efficiency cost –Energy/function and throughput-latency/function substantially higher than dedicated implementation How to combine flexibility and efficiency? –Simple versus complex processors –Stepping away from “completely flexible” to “somewhat dedicated” –Concurrency versus clock frequency –Novel architectural solutions such as reconfiguration The Cost of Flexibility
Low Power Design Essentials ©2008 5.44 The Cost of Flexibility
[Figure: E-D curves for dedicated and programmable implementations, as the number of applications (# Apps) grows from 1 to N]
Low Power Design Essentials ©2008 5.45 The Cost of Flexibility [Ref: J. Rabaey, Tampere’99]
[Figure: energy efficiency in MOPS/mW (or MIPS/mW), 0.1 to 1000 on a logarithmic axis, versus flexibility (coverage)]
– Embedded processors: SA110, 0.4 MIPS/mW
– ASIPs, DSPs: DSP, 3 MOPS/mW
– Reconfigurable processor/logic: Pleiades, 10-80 MOPS/mW
– Dedicated HW
Approximately three orders of magnitude in inefficiency from general-purpose to dedicated! Benchmark numbers @ 1999
Low Power Design Essentials ©2008 5.46 The Cost of Flexibility – Evolution [Ref: T. Claasen, ISSCC’99; H. De Man, ISSCC’05]
[Figure: power efficiency PE (GOPS/Watt, 0.001 to 1000 on a logarithmic axis) versus feature size (μm); labels: 32 bit IPE, GP microprocessor, muxed data paths, IS computing, reconfigurable // computing; points: mpu, asip-dsp, cg, fg]
Low Power Design Essentials ©2008 5.47 The Cost of Flexibility – Example [Ref: N. Zhang, PhD’01]
Least-Mean-Square Pilot Correlators for CDMA (1.67 MSymbols data rate)
Complexity: 300 Mmult/sec and 360 Mmac/sec
ASIC implementation: 1.2-2.4 GOP @ 12 mW
Architecture comparison, single correlator:

Type                 Power     Area
Commercial DSP       460 mW    1100 mm²
Configurable Proc.   18 mW     5.5 mm²
Dedicated            3 mW      1.5 mm²
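Low Power Design Essentials ©2008 5.48 The Architectural Choices
– General-purpose processor: ALU, memory M, cache M$, bus
– Application-specific processor: memories M1, M2
– Dedicated accelerators: μP, memory M and accelerators AC1, AC2 on a bus
– Reconfigurable processor: μP, memory M and reconfigurable units RC1, RC2, RC3 on a bus, linked by a reconfigurable network RN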
Low Power Design Essentials ©2008 5.49 Best explored using Energy-Delay curves For each proposed architecture and parameters set, determine average energy-delay over library of benchmark examples Modern computer-aided design tools allow for quick synthesis and analysis –Leads to fair comparison Example: Subliminal Project - University of Michigan –Explores processor architecture over the following parameters: Depth and number of pipeline stages; Memory: Von Neumann or Harvard; ALU Width(8/16/32); With or without explicit register file Simple versus Complex Processors?
Low Power Design Essentials ©2008 5.50 Pareto analysis over 19 processors Simple versus Complex Processors [Ref: D. Blaauw, ISCA’05]
Low Power Design Essentials ©2008 5.51 Tailor processor to be efficient for sub-set of applications –Memory architecture, interconnect structure, computational units, instructions Digital-signal processors best known example –Special memory architecture provides locality –Datapath optimized for vector-multiplication (originally) Examples now available in many other areas (graphics, security, control, etc) Application-Specific Processors
Low Power Design Essentials ©2008 5.52 Example 1: DSPs
– The first type of application-specific processor to become popular
– Initially mostly for performance, but energy benefit now also recognized
– Key properties: dedicated memory architecture (multiple data memories), data path specialized for specific functions such as vector multiplies and FFTs
– Over time: introduction of more and more concurrency (VLIW)
[Figure: DSP datapath with RamX (N, 16), RamY (N, 16), Mult (16, 16), Acc (40) and ALU (40)]
Low Power Design Essentials ©2008 5.53 DSPs Deliver Improving Energy-Efficiency [Ref: G. Frantz, TI]
Energy efficiency of DSPs doubles every 18 months (“Gene’s Law”), but…
[Figure: DSP power dissipation trends, mW/MMACs (1,000 down to 0.00001 on a logarithmic axis) versus year (1982 to 2010), showing the Gene’s Law trend line and DSP power data]
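The doubling rule translates into a simple exponential. The sketch below only evaluates the stated 18-month doubling period; it does not reproduce the data points of the chart, and the function name is hypothetical.

```python
def efficiency_gain(years, doubling_months=18):
    """Improvement factor after `years` if efficiency doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    for years in (1.5, 3, 10, 20):
        print(f"{years:>4} years: {efficiency_gain(years):,.0f}x")
```

A doubling every 18 months compounds to roughly a hundredfold gain per decade, which is why the trend appears as a straight line on the chart's logarithmic axis.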
Low Power Design Essentials ©2008 5.54 Performances of DSP Processors [Ref: G. Frantz, TI]

DSP Proc.      1982     1992     2002     2012 (?)
Techno (nm)    3000     800      180      20
# Gates        50K      500K     5G       50G
V_DD (V)       5.0               1.0      0.2
GHz            0.020    0.08     0.5      10
MIPS           5        40       5K       50K
MIPS / W       4        80       10K      1G
mW / MIPS      250      12.5     0.1      0.001
Low Power Design Essentials ©2008 5.55 [Courtesy: C. Rowen, Tensilica’01] Combines spatial and temporal processing Core processor with extendible instruction set Application Specific Instruction Processors (ASIP)
Low Power Design Essentials ©2008 5.56 4 extra instructions 1700 additional gates No cycle time impact Code size reduction Impact of adding special instructions [Courtesy: C. Rowen, Tensilica’01] Advantage of Application Specific Processors
Low Power Design Essentials ©2008 5.57 Optimizing Energy in Video [Courtesy: C. Rowen, Tensilica’07]
Diamond 388VDO Video Processor: top-level block diagram
ISA extensions to support Context-adaptive Binary Arithmetic Coding (CABAC) in H.264 decoding*

                Unaugmented core    ISA-extended core
CABAC cycles    710 Mcycles/sec     13 Mcycles/sec
Energy/sec      164 mJ              5 mJ

Area cost for CABAC ISA extensions: 20 Kgates
* 5 Mbps H.264 MP stream with MBAFF enabled, at D1 resolution
Low Power Design Essentials ©2008 5.58 Hardware Accelerators [Ref: OMAP Platform, TI]
Frequently executed functions are implemented as dedicated modules and executed as co-processors.
– Opportunities: network processing, MPEG encode/decode, speech, wireless interfaces
– Advantage: energy-efficiency of custom implementation
– Disadvantage: area overhead
Example: computational core of the Texas Instruments OMAP™ 2420 platform
– ARM11; TMS320C55 DSP; 2D/3D graphics accelerator; imaging video accelerator; security accelerator (SHA-1, DES, AES, PKA, secure WDT); timers, interrupt controllers; shared memory controller, DMA
Low Power Design Essentials ©2008 5.59 Hardware Accelerators [Courtesy: S. Borkar, Intel’05]
Example: networking coprocessor
– TCP Offload Engine: 2.23 mm × 3.54 mm, 260K transistors
[Figure: MIPS (10² to 10⁶ on a logarithmic axis) versus year (1995 to 2015), comparing GP MIPS @ 75W with APS MIPS @ ~2W]
Low Power Design Essentials ©2008 5.60 (Re)configurable Processors [Ref: H. Zhang, JSSC’00]
“Programming in space”
– Create dedicated co-processors by reconfiguring the interconnect between dedicated computational modules
– Efficiency of hardwired accelerators, but increased flexibility and reuse (smaller area)
[Figure: μP with configuration bus, configurable interconnect, arithmetic modules and configurable logic]
Low Power Design Essentials ©2008 5.61 Programming in Space versus Time
Example: co-variance matrix computation

    for (i = 1; i <= L; i++)
        for (k = i; k <= L; k++)
            phi[i][k] = phi[i-1][k-1]
                      + in[NP-i] * in[NP-k]
                      - in[NA-1-i] * in[NA-1-k];

– Programming in time: the code segment runs on the embedded processor
– Programming in space: the computation is mapped onto AddrGen and MEM (in), AddrGen and MEM (phi), MPY and ALU modules
Low Power Design Essentials ©2008 5.62 Example: Reconfigurable Processor for Wireless [Ref: H. Zhang, JSSC’00]
VCELP coder for wireless (1 mW @ 250 nm CMOS)
Low Power Design Essentials ©2008 5.63 Results of VCELP Voice Coder [Ref: H. Zhang et al., JSSC’00]
– 79.7% of VCELP code maps onto the reconfigurable datapath
– Compared to a state-of-the-art 17 mW DSP
[Figure: VCELP code breakdown and VCELP energy breakdown]
Low Power Design Essentials ©2008 5.64 Example: Sony Virtual Mobile Engine (VME) [Ref: K. Seno, HotChips’04]
– Dynamic reconfigurable vector engine
– Reconfigured on the fly
– One-cycle context switch
– Coarse-grain heterogeneous type
– Native 24-bit data width
– Max clock frequency 166 MHz
– Deployed in portable music and game players
Other examples: ADRES, Cluster, CoolDSP, SiliconHive
Low Power Design Essentials ©2008 5.65 Remember: Amdahl’s Law Still Holds
– Effectiveness of alternative architectures (ASIP, accelerator, reconfigurable) is determined by the amount of code spawned from the general-purpose (GP) processor
– Mostly effective for repetitive kernels
– The 80-20 rule typically seems to apply
– Transformations can help to improve effectiveness
– Most important: code development and algorithm selection that encourage concurrency
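The same reasoning can be applied to energy. The sketch below is an Amdahl-style bound under a simplifying assumption: a fraction of the original energy is spent in kernels that move to a more efficient engine, and the rest stays on the general-purpose processor unchanged. The 80% fraction echoes the 80-20 rule; the gain values are hypothetical.

```python
def normalized_energy(offloaded_fraction, gain):
    """Energy after offloading, relative to running everything on the GP processor.

    offloaded_fraction: share of the original energy spent in offloaded kernels
    gain: energy-efficiency improvement of the accelerator over the GP processor
    """
    return (1.0 - offloaded_fraction) + offloaded_fraction / gain


if __name__ == "__main__":
    for gain in (10, 100, 1000):
        e = normalized_energy(0.8, gain)
        print(f"gain {gain:4d}x on 80% of the work: total energy {e:.3f}")
```

Even an arbitrarily efficient accelerator cannot push the total below the share that remains on the general-purpose processor, which is why mapping more of the code onto the efficient engine matters more than perfecting the engine itself.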
Low Power Design Essentials ©2008 5.66 Bringing It All Together [Ref: Nexperia, NXP]
Domain-specific platforms combine multiple computational concepts to optimize flexibility, performance and energy-efficiency.
Example: NXP Nexperia platform for multimedia applications
[Figure: DVP system silicon with a MIPS™ CPU (PRxxxx, D$, I$) and a TriMedia™ CPU (TM-xxxx, D$, I$), each on its own PI bus with device I/P blocks, both connected through the MMI and the DVP memory bus to SDRAM]
Low Power Design Essentials ©2008 5.67 Example: A programmable HDTV media processor Combines VLIW DSP with configurable media co-processors [Ref: Nexperia, NXP] A Heterogeneous Platform Configurable accelerator for image filtering
Low Power Design Essentials ©2008 5.68 Combines “enhanced ARM processor”, multiple accelerator processors, I/O modules and sophisticated interconnect network OMAP Platform for Wireless [Ref: OMAP, TI]
Low Power Design Essentials ©2008 5.69 Summary and Perspectives
– Architectural and algorithmic optimization can lead to drastic improvements in energy-efficiency
– Concurrency is an effective means to improve throughput at fixed energy or reduce energy for fixed throughput
– Energy-efficient architectures specialize the implementation of often recurring instructions or functions
Low Power Design Essentials ©2008 5.70 References
Theses:
M. Potkonjak, “Algorithms for high level synthesis: resource utilization based approach,” PhD thesis, UC Berkeley, 1991.
N. Zhang, “Algorithm/Architecture Co-Design for Wireless Communication Systems,” PhD thesis, UC Berkeley, 2001.
Articles:
D. Blaauw, B. Zhai, “Energy Efficient Design for Subthreshold Supply Voltage Operation,” IEEE International Symposium on Circuits and Systems (ISCAS), April 2006.
S. Borkar, “Design challenges of technology scaling,” IEEE Micro, vol. 19, no. 4, pp. 23-29, July-Aug. 1999.
A.P. Chandrakasan, S. Sheng, R.W. Brodersen, “Low-power CMOS digital design,” IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 473-84, April 1992.
A. Chandrakasan, M. Potkonjak, J. Rabaey and R. Brodersen, “Optimizing Power using Transformations,” IEEE Transactions on Computer Aided Design, vol. 14, no. 1, pp. 12-31, Jan. 1995.
S. Chou, “Integration and innovation in the nanoelectronics era,” Keynote presentation, Digest of Technical Papers Solid-State Circuits Conference (ISSCC05), pp. 36-41, February 2005.
T. Claasen, “High speed: not the only way to exploit the intrinsic computational power of silicon,” Keynote presentation, Digest of Technical Papers Solid-State Circuits Conference (ISSCC99), pp. 22–25, February 1999.
H. De Man, “Ambient intelligence: gigascale dreams and nanoscale realities,” Keynote presentation, Digest of Technical Papers International Solid-State Circuits Conference (ISSCC '05), pp. 29–35, February 2005.
G. Frantz, http://blogs.ti.com/2006/06/23/what-moore-didn%e2%80%99t-tell-us-about-ics/
K. Keutzer, S. Malik, R. Newton, J. Rabaey and A. Sangiovanni-Vincentelli, “System Level Design: Orthogonalization of Concerns and Platform-Based Design,” IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems, vol. 19, no. 12, pp. 1523-43, Dec. 2000.
Low Power Design Essentials ©2008 5.71 References
Articles (cntd):
T. Kuroda, T. Sakurai, “Overview of low-power ULSI circuit techniques,” IEICE Trans. on Electronics, vol. E78-C, no. 4, pp. 334-344, April 1995.
D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.
D. Markovic, B. Nikolic, R.W. Brodersen, “Power and Area Minimization for Multidimensional Signal Processing,” IEEE Journal of Solid-State Circuits, vol. 42, no. 4, pp. 922-934, April 2007.
Nexperia, NXP Semiconductors, http://www.nxp.com/products/nexperia/about/index.html
OMAP, Texas Instruments, http://focus.ti.com/general/docs/wtbu/wtbugencontent.tsp?templateId=6123&navigationId=11988&contentId=4638
J. Rabaey, “System-on-a-Chip – A Case for Heterogeneous Architectures,” Invited Presentation, Wireless Technology Seminar, Tampere, May 1999. Also in HotChips’2000.
K. Seno, “A 90nm embedded DRAM single chip LSI with a 3D graphics, H.264 codec engine, and a reconfigurable processor,” HotChips 2004.
R. Subramanyan, “Reconfigurable Digital Communications Systems on a Chip,” Invited Presentation, Wireless Technology Seminar, Tampere, May 1999.
H. Zhang, V. Prabhu, V. George, M. Wan, M. Benes, A. Abnous, J. Rabaey, “A 1V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications,” IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1697-1704, Nov. 2000. (also ISSCC 2000)
R. Zlatanovici, B. Nikolic, “Power-Performance Optimal 64-bit Carry-Lookahead Adders,” in Proc. European Solid-State Circuits Conf. (ESSCIRC), pp. 321-324, Sept. 2003.