We think you have liked this presentation. If you wish to download it, please recommend it to your friends in any social system. Share buttons are a little bit lower. Thank you!
Presentation is loading. Please wait.
Published byIzabella Allender
Modified over 2 years ago
High Performance Embedded Computing © 2007 Elsevier Chapter 2, part 3: CPUs High Performance Embedded Computing Wayne Wolf
© 2006 Elsevier Topics Bus encoding. Security-oriented architectures. CPU simulation. Configurable processors.
© 2006 Elsevier Bus encoding Encode information on bus to reduce toggles and dynamic energy consumption. Count energy consumption by toggle counts. Bus encoding is invisible to rest of architecture. Some schemes transmit side information about encoding. memCPUencdec encoded bus side information
© 2006 Elsevier Bus-invert coding Stan and Burleson: take advantage of correlation between successive bus values. Choose sending true or complement form of bus values to minimize toggles. Can break bus into fields and apply bus-invert coding to each field.
© 2006 Elsevier Working zone encoding Mussoll et al.: working-zone encoding divides address bus into working zones. Address in a working zone is sent as an offset from the base in a one-hot code. [Mus98] © 1998 IEEE
© 2006 Elsevier Address bus encoding Benini et al: cluster correlated address bits, encode clusters, use combinational logic to encode/decode. Compute correlation coefficients of transition variables to determine clusters:
© 2006 Elsevier Benini et al. results [Ben98] © 1998 IEEE
© 2006 Elsevier Dictionary-based encoding Considers correlations both within a word and between successive words. ENS(x,y) is 0 if both lines stay the same, 1 if one changes, 2 if both change. Energy model includes transitions (ET) and interwire (EI).
© 2006 Elsevier Lv et al. pattern frequency results [Lv03] © 2003 IEEE
© 2006 Elsevier Lv et al. dictionary-based architecture [Lv03] © 2003 IEEE
© 2006 Elsevier Lv et al. energy savings [Lv03] © 2003 IEEE
© 2006 Elsevier Security-oriented architectures A variety of attacks: Typical desktop/server attacks, such as Trojan horses and viruses. Physical access allows side channel attacks. Cryptographic instruction sets have been developed for several architectures. Embedded systems architecture must add protection for side effects, consider energy consumption.
© 2006 Elsevier Smart cards Used to identify holder, carry money, etc. Self-programmable one-chip microcomputer architecture: Allows processor to change code or data. Memory is divided into two sections. Registers allow program in one section to modify the other section without interfering with the executing program.
© 2006 Elsevier Secure architectures MIPS, ARM offer security extensions, including encryption instructions, memory management, etc. SAFE-OPS embeds a watermark into code using register assignment. FPGA accelerator checks the validity of the watermark during execution.
© 2006 Elsevier Power attacks Kocher et al.: Adversary can observe power consumption at pins and deduce data, instructions within CPU. Yang et al.: Dynamic voltage/frequency scaling (DVFS) can be used as a countermeasure. [Yan05] © 2005 ACM Press
© 2006 Elsevier CPU simulation Performance vs. energy/power simulation. Temporal accuracy. Trace vs. execution. Simulation vs. direct execution.
© 2006 Elsevier Engblom embedded vs. SPEC comparison [Eng99b] © 1999 ACM Press
© 2006 Elsevier Trace-based analysis Instrumentation generates side information. PC-sampling checks PC value during execution. Can measure control flow, memory accesses.
© 2006 Elsevier Microarchitecture-modeling simulators Varying levels of detail: Instruction scheduler is not cycle-accurate. Cycle timers are cycle-accurate. Can simulate for performance or energy/power. Typically written in general-purpose programming language, not hardware description language.
© 2006 Elsevier PC sampling Example: Unix prof. Interrupts are used to sample PC periodically. Must run on the platform. Doesnt provide complete trace. Subject to sampling problems: undersampling, periodicity problems.
© 2006 Elsevier Call graph report Main100% f1 f2 --- f137% g1 g2 --- f223% g3 g4 Cumulative execution time
© 2006 Elsevier Program instrumentation Example: dinero. Modify the program to write trace information. Track entry into basic blocks. Requires editing object files. Provides complete trace.
© 2006 Elsevier Cycle-accurate simulator Models the microarchitecture. Simulating one instruction requires executing routines for instruction decode, etc. Models pipeline state. Microarchitectural registers are exposed to the simulator. reg IR PC I-box
© 2006 Elsevier Trace-based vs. execution-based Trace-based: Gather trace first, then generate timing information. Basic timing information is simpler to generate. Full timing information may require regenerating information from the original execution. Requires owning the platform. Execution-based: Simulator fully executes the instruction. Requires a more complex simulator. Requires explicit knowledge of the microarchitecture, not just instruction execution times.
© 2006 Elsevier Sources of timing information Data book tables: Time of individual instructions. Penalties for various hazards. Microarchitecture: Depends from the structure of machine. Derived from execution of the instruction in the microarchitecture.
© 2006 Elsevier Levels of detail in simulation Instruction schedulers: Models availability of microarchitectural resources. May not capture all interactions. Cycle timers: Models full microarchitecture. Most accurate, requires exact model of the microarchitecture.
© 2006 Elsevier Modular simulators Model instructions through a description file. Drives assembler, basic behavioral simulation. Assemble a simulation program from code modules. Can add your own code.
© 2006 Elsevier Early approaches to power modeling Instruction macromodels: ADD = 1 w, JMP = 2 w, etc. Data-dependent models: Based on data value statistics. Transition-based models.
© 2006 Elsevier Power simulation Model capacitance in the processor. Keep track of activity in the processor. Requires full simulation. Activity determines capacitive charge/discharge, which determines power consumption.
© 2006 Elsevier SimplePower simulator Cycle-accurate simulator. SimpleScalar-style cycle-accurate simulator. Transition-based power analysis. Estimates energy of data path, memory, and busses on every clock cycle.
© 2006 Elsevier RTL power estimation interface A power estimator is required for each functional unit modeled in the simulator. Functional interface makes the simulator more modular. Power estimator takes same arguments as the performance simulation module.
© 2006 Elsevier Switch capacitance tables Model functional units such as ALU, register files, multiplexers, etc. Capture technology-dependent capacitance of the unit. Two types of model: Bit-independent: each bit is independent, model is one bit wide. Bit-dependent: bits interact (as in adder), model must be multiple bits. Analytical models used for memories. Adder model is built from sub-model for adder slice.
© 2006 Elsevier Wattch power simulator Built on top of SimpleScalar. Adds parameterized power models for the functional units.
© 2006 Elsevier Array model Analytical model: Decoder. Wordline drive. Bitline discharge. Sense amp output. Register file word line capacitance: C diff (word line driver) + C gate (cell access)*n bit_lines + C metal * Word_line_length
© 2006 Elsevier Bus, function unit models Bus model based upon length of bus, capacitance of bus lines. Models for ALUs, etc. based upon transistion models.
© 2006 Elsevier Clock network power model Clock is a major power sink in modern designs. Major elements of the clock power model: Global clock lines. Global drivers. Loads on the clock network. Must handle gated clocks.
© 2006 Elsevier Automated CPU design Customize aspects of CPU for application: Instruction set. Memory system. Busses and I/O. Tools help design and implement custom CPUs. FPGAs make it easier to implement custom CPUs. Application-specific instruction processor (ASIP) has custom instruction set. Configurable processor is generated by a tool set.M
© 2006 Elsevier Types of customization New instructions: operations, operands, remove unused instructions. Specialized pipelines. Specialized memory hierarchy. Busses and peripherals.
© 2006 Elsevier Techniques Architecture optimization tools help choose the instruction set and microarchitecture. Configuration tools implement the microarchitecture (and perhaps compiler). Early example: MIMOLA analyzed programs, created microarchitecture and instructions, synthesized logic.
© 2006 Elsevier CPU configuration process
© 2006 Elsevier Tensilica configuration options © 2004 Tensilica
© 2006 Elsevier Tensilica EEMBC comparison © 2004 Tensilica
© 2006 Elsevier Tensilica energy consumption by subsystem © 2006 Tensilica
© 2006 Elsevier Toshiba MePcore
© 2006 Elsevier LISA language [Hof01] © 2001 IEEE
© 2006 Elsevier LISA descriptions and generation Memory model includes registers and other memories. Uses clause binds operations to hardware. Timing specified by PIPELINE, IN, ACTIVATION, ENTITY. Generates hierarchical VHDL design.
© 2006 Elsevier PEAS-III Synthesis driven by: Architectural parameters such as number of pipeline stages. Declaration of function units. Instruction format definitions. Interrupt conditions and timing. Micro-operations for instructions and interrupts. Generates both simulation and synthesis models in VHDL.
© 2006 Elsevier Instruction set synthesis Generate instruction set from application program, other requirements. Sun et al. analyzed design space for simple BYTESWAP() program. [Sun04] © 2004 IEEE
© 2006 Elsevier Holmer and Despain 1% rule---dont add instruction unless it improves performance by 1%. Objective function (C = # cycles, I = # instruction types, S = # instructions in program): 100 ln C + I 100 ln C + 20 ln S + I Used microcode compaction algorithms to find instructions.
© 2006 Elsevier Other techniques Huang and Despain used simulated annealing to search the design space. Moves could reschedule micro-op, exchange micro-ops, insert or delete time steps. Kastner et al. used clustering to generate instructions. Clusters must cover the program graph.
© 2006 Elsevier Biswas et al. Biswas et al. used Kernighan-Lin partitioning to find instructions. Argue against large functions. [Bis05] © 2005 IEEE Computer Society
© 2006 Elsevier Complex function definition Atasu et al. try to combine many operations into an instruction: Disjoint operator graphs. Multi-output instructions. Operator graph must be convex---value cannot leave, then re-enter the instruction. [Ata03] © 2003 ACM Press
© 2006 Elsevier Other techniques Pozzi and Ienne extract large instructions, generating multi-cycle operations and multiple memory ports. Tensilica Xpres compiler designs instruction sets: operator fusion, vector, etc.
© 2006 Elsevier Limited-precision arithmetic Fang et al. used affine arithmetic to analyze numerical characteristics of algorithms. Mahlke synthesize variable bit-width architectures given bit-width requirements. Cluster operations to find a small number of distinct bit widths. [Mah01] © 2001 IEEE
High Performance Embedded Computing © 2007 Elsevier Lecture 8: Embedded Processor Issues Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
© 2004 Wayne Wolf Topics Goals of program performance anlaysis. Worst-case execution time analysis. Execution-based timing analysis.
1 Wattch: A Framework for Architecture- Level Power Analysis and Optimizations Author: D. Brooks, V.Tiwari and M. Martonosi Reviewer: Junxia Ma University.
Computer Organization CDA 3103 Dr. Hassan Foroosh Dept. of Computer Science UCF © Copyright Hassan Foroosh 2002.
Making FPGAs a Cost-Effective Computing Architecture Tom VanCourt Yongfeng Gu Martin Herbordt Boston University BOSTON UNIVERSITY.
Processor Structure and Function Chapter8:. CPU Structure CPU must: Fetch instructions –Read instruction from memory Interpret instructions –Instruction.
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
Extensible Processors. 2 ASIP Gain performance by: Specialized hardware for the whole application (ASIC). − Almost no flexibility. −High cost. Use.
CAD for Physical Design of VLSI Circuits 1. Course Outline by Topics Circuit Partitioning Floorplanning Placement Global / Detailed Routing Technology.
Section 10: Advanced Topics 1 M. Balakrishnan Dept. of Comp. Sci. & Engg. I.I.T. Delhi.
High Performance Embedded Computing © 2007 Elsevier Chapter 7, part 3: Hardware/Software Co-Design High Performance Embedded Computing Wayne Wolf.
Network II.5 simulator.. Overview of network simulator Simulation package for modeling and simulating computer networks and communication systems Can.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Control Unit.
Lecture 15 Microarchitecture Level: Level 1. Microarchitecture Level The level above digital logic level. Job: to implement the ISA level above it. The.
Computer Organization CS224 Fall 2012 Lesson 22. The Big Picture The Five Classic Components of a Computer Chapter 4 Topic: Processor Design Control.
Computer Organization and Architecture William Stallings 8th Edition Chapter 3 Top Level View of Computer Function and Interconnection.
Modern VLSI Design 4e: Chapter 8 Copyright 2008 Wayne Wolf Topics High-level synthesis. Architectures for low power. GALS design.
1 Chapter 2: Operating-System Structures Services Interface provided to users & programmers –System calls (programmer access) –User level access to system.
Intro to CS Chapt 2 Data Manipualtion 1 Data Manipulation How is data manipulated inside a computer? –How is data input? –How is it stored? –How is it.
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
Topics covered: CPU Architecture Computer Organization.
Computer architecture Lecture 11: Reduced Instruction Set Computers Piotr Bilski.
Speculative Software Management of Datapath-width for Energy Optimization G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin IRISA, Campus de Beaulieu
Question What technology differentiates the different stages a computer had gone through from generation 1 to present?
Chapter XI Reduced Instruction Set Computing (RISC) CS 147 Li-Chuan Fang.
ECOE 560 Design Methodologies and Tools for Software/Hardware Systems Spring 2004 Serdar Taşıran.
Tuan Tran. What is CISC? CISC stands for Complex Instruction Set Computer. CISC are chips that are easy to program and which make efficient use of memory.
COMP3221: Microprocessors and Embedded Systems Lecture 2: Instruction Set Architecture (ISA) Lecturer: Hui Wu Session.
Logic Synthesis – 3 Optimization Ahmed Hemani Sources: Synopsys Documentation.
1 Power estimation in the algorithmic and register-transfer level September 25, 2006 Chong-Min Kyung.
An Introduction Chapter Chapter 1 Introduction2 Computer Systems Programmable machines Hardware + Software (program) HardwareProgram.
High Performance Embedded Computing © 2007 Elsevier Lecture 10: Code Generation Embedded Computing Systems Michael Schulte Based on slides and textbook.
EEE226 MICROPROCESSORBY DR. ZAINI ABDUL HALIM School of Electrical & Electronic Engineering USM.
6-1 Chapter 6 - Datapath and Control Department of Information Technology, Radford University ITEC 352 Computer Organization Principles of Computer Architecture.
FPGA-Based System Design: Chapter 6 Copyright 2004 Prentice Hall PTR Topics n Low power design. n Pipelining.
Configurable, reconfigurable, and run-time reconfigurable computing.
COMPUTER ORGANIZATION AND ASSEMBLY LANGUAGE Lecture 1 & 2 Introduction and Basics Course Instructor: Aisha Danish.
Gary MarsdenSlide 1University of Cape Town Chapter 5 - The Processor Machine Performance factors –Instruction Count, Clock cycle time, Clock cycles per.
Microprogrammed Control Chapter11:. Two methods for generating the control signals are: 1)Hardwired control o Sequential logic circuit that generates.
Topics covered: CPU Architecture CSE 243: Introduction to Computer Architecture and Hardware/Software Interface.
Educational Computer Architecture Experimentation Tool Dr. Abdelhafid Bouhraoua.
Aug. 24, 2007ELEC 5200/6200 Project1 Computer Design Project ELEC 5200/6200-Computer Architecture and Design Fall 2007 Vishwani D. Agrawal James J.Danaher.
Architecture Support for OS CSCI 444/544 Operating Systems Fall 2008.
Gary MarsdenSlide 1University of Cape Town Computer Architecture – Introduction Andrew Hutchinson & Gary Marsden (me) ( ) September 2003.
Chapter 16 Control Unit Implemntation. A Basic Computer Model.
CISC. What is it? CISC - Complex Instruction Set Computer CISC is a design philosophy that: 1) uses microcode instruction sets 2) uses larger.
Henry Hexmoor1 Chapter 10- Control units We introduced the basic structure of a control unit, and translated assembly instructions into a binary representation.
Automated Design of Custom Architecture Tulika Mitra
© 2017 SlidePlayer.com Inc. All rights reserved.