Published by Sonny Harbach
Lecture 8: Embedded Processor Issues
Embedded Computing Systems
Mikko Lipasti, adapted from M. Schulte. Based on slides and textbook from Wayne Wolf.
High Performance Embedded Computing © 2007 Elsevier
© 2006 Elsevier
Topics
Bus encoding. Security-oriented architectures. CPU simulation. Configurable processors.
Bus encoding
Encode information on the bus to reduce toggles and dynamic energy consumption. Energy consumption is estimated from toggle counts. Bus encoding is invisible to the rest of the architecture. Some schemes transmit side information about the encoding.
[Figure labels: mem, CPU, enc, dec, encoded bus, side information]
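Counting toggles as a proxy for bus energy can be sketched in a few lines. This is a minimal illustration, not code from the lecture; the function name `toggle_count` and the sample trace are assumptions made for the example.

```python
def toggle_count(words):
    """Total number of bit transitions between successive bus words.

    The XOR of two consecutive words has a 1 in every bit position
    that changed, so its population count is the number of toggles.
    """
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# Hypothetical 4-bit bus trace, used only for illustration.
trace = [0b0000, 0b1111, 0b1110, 0b0001]
print(toggle_count(trace))  # prints 9 (4 + 1 + 4 toggles)
```

Any encoding scheme can then be compared against the unencoded bus by running the same count over the encoded word sequence, including any side-information lines.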
Bus-invert coding
Stan and Burleson: take advantage of correlation between successive bus values. Send either the true or the complemented form of each bus value, whichever minimizes toggles. Why might this approach work well? The bus can be broken into fields, with bus-invert coding applied to each field. How might the bus be divided?
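A simplified sketch of bus-invert coding follows. It assumes one extra invert line as side information, an initial bus state of zero, and a comparison on the data lines only; the function names are hypothetical and the details may differ from Stan and Burleson's formulation.

```python
def popcount(x):
    return bin(x).count("1")


def bus_invert_encode(words, width):
    """Return a list of (value_on_bus, invert_bit) pairs."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the data lines
    encoded = []
    for w in words:
        w &= mask
        # If more than half the data lines would toggle, send the complement.
        if popcount(w ^ prev) > width // 2:
            sent, invert = ~w & mask, 1
        else:
            sent, invert = w, 0
        encoded.append((sent, invert))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (value_on_bus, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(~sent & mask) if invert else sent for sent, invert in encoded]


words = [0x00, 0xFF, 0x01, 0xFE]
encoded = bus_invert_encode(words, 8)
print(encoded)  # prints [(0, 0), (0, 1), (1, 0), (1, 1)]
print(bus_invert_decode(encoded, 8) == words)  # prints True
```

In this trace the data lines toggle only once in total, at the cost of toggles on the invert line. Applying the same functions to each field of a partitioned bus gives the per-field variant mentioned on the slide.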
Working-zone encoding
Mussoll et al.: used to encode address buses. Relies on the observation that most of a program's execution time is spent in a small range of addresses. Divides addresses into sets called working zones. An address in a working zone is sent as an offset from the zone's base in a one-hot code. Why is a one-hot code used? Addresses that are not in a working zone are sent in full. Compared to bus-invert coding, what would you expect to be the advantages and disadvantages of this approach?
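The hit/miss decision in working-zone encoding might look like the sketch below. It is deliberately simplified: the zone bases are fixed, the zone replacement policy is omitted, and the message tuples, `zone_size`, and function names are assumptions for illustration rather than details of Mussoll et al.'s scheme.

```python
def wze_encode(addr, zone_bases, zone_size):
    """Encode one address against a fixed list of working-zone bases.

    Returns ("hit", zone_index, one_hot_offset) when the address falls in
    a zone, otherwise ("miss", addr) with the full address.
    """
    for i, base in enumerate(zone_bases):
        offset = addr - base
        if 0 <= offset < zone_size:
            return ("hit", i, 1 << offset)  # exactly one bit set
    return ("miss", addr)


def wze_decode(msg, zone_bases):
    """Invert wze_encode given the same zone bases."""
    if msg[0] == "hit":
        _, i, one_hot = msg
        return zone_bases[i] + one_hot.bit_length() - 1
    return msg[1]


bases = [0x1000, 0x8000]  # hypothetical zone bases
msg = wze_encode(0x1003, bases, zone_size=8)
print(msg)                           # prints ('hit', 0, 8)
print(hex(wze_decode(msg, bases)))   # prints 0x1003
```

Because a one-hot word has a single bit set, moving between two offsets in the same zone changes at most two lines, which is one way to approach the slide's question.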
Address bus encoding
Benini et al.: cluster correlated address bits and then encode the clusters. Compute correlation coefficients of transition variables to determine the clusters. Clusters must not become too large, since this can increase the encode/decode logic. Use logic synthesis to design encoders and decoders for each cluster.
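The correlation step can be sketched as follows. The slide does not define the transition variable, so this example assumes a signed one (+1 for a rising bit, -1 for a falling bit, 0 for no change) and uses an ordinary Pearson correlation; both choices, and the function names, are assumptions for illustration.

```python
from math import sqrt


def transition_vars(trace, bit):
    """Assumed transition variable per step: +1 rising, -1 falling, 0 unchanged."""
    vals = [(w >> bit) & 1 for w in trace]
    return [b - a for a, b in zip(vals, vals[1:])]


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant or empty."""
    n = len(xs)
    if n == 0:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Hypothetical 2-bit address trace in which both bits always toggle together.
trace = [0b00, 0b11, 0b00, 0b11]
c = correlation(transition_vars(trace, 0), transition_vars(trace, 1))
print(round(c, 6))  # prints 1.0
```

Bit pairs whose coefficient exceeds some threshold would then be candidates for the same cluster; the threshold and the clustering procedure itself are not specified on the slide.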
Benini et al. results
What important tradeoffs of the address encoding technique are not shown in the results table?
[Table not reproduced: [Ben98] © 1998 IEEE]
Dictionary-based encoding
Takes advantage of the observation that many values are repeated on buses. Divides the bus into three parts. Only the upper bits of the bus are stored in the dictionary; they are matched against the dictionary entry selected by the index part. When the upper bits match, they are put in a high-Z state and only the remaining bits are sent; otherwise all bits are sent.
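A behavioral sketch of the dictionary lookup is given below. The class name, the message tuples, and the name `low` for the part of the word below the index are assumptions for illustration; the high-Z state is modeled simply by not transmitting the upper bits on a hit.

```python
class DictionaryCodec:
    """One end of the bus. Sender and receiver each keep an identical table."""

    def __init__(self, index_bits, low_bits):
        self.index_bits = index_bits
        self.low_bits = low_bits
        self.table = [None] * (1 << index_bits)

    def _split(self, word):
        low = word & ((1 << self.low_bits) - 1)
        index = (word >> self.low_bits) & ((1 << self.index_bits) - 1)
        upper = word >> (self.low_bits + self.index_bits)
        return upper, index, low

    def encode(self, word):
        upper, index, low = self._split(word)
        if self.table[index] == upper:
            return ("hit", index, low)   # upper lines would be left in high-Z
        self.table[index] = upper        # update the dictionary on a miss
        return ("miss", word)            # all bits are sent

    def decode(self, msg):
        if msg[0] == "hit":
            _, index, low = msg
            upper = self.table[index]
            shift = self.index_bits + self.low_bits
            return (upper << shift) | (index << self.low_bits) | low
        word = msg[1]
        upper, index, _ = self._split(word)
        self.table[index] = upper        # mirror the sender's update
        return word


sender, receiver = DictionaryCodec(4, 8), DictionaryCodec(4, 8)
for w in [0xABCD12, 0xABCD34]:
    msg = sender.encode(w)
    print(msg[0], hex(receiver.decode(msg)))
# prints "miss 0xabcd12" then "hit 0xabcd34"
```

The two tables stay synchronized because both ends update the same entry on every miss, so no extra table traffic is needed in this model.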
Lv et al. dictionary-based architecture [figure, [Lv03] © 2003 IEEE]
Lv et al. energy savings [figure, [Lv03] © 2003 IEEE]
Security-oriented architectures
There are a variety of security attacks: typical desktop/server attacks, such as Trojan horses and viruses, and side channel attacks, which physical access makes possible. Cryptographic instruction sets have been developed for several architectures. Embedded systems architectures must add protection against side effects and must consider energy consumption.
Secure architectures
SmartMIPS and ARM SecureCore offer security extensions, including encryption instructions, specialized memory management units, etc.
SAFE-OPS is designed to protect against software modification. The compiler embeds a watermark into the code based on register assignment. An FPGA accelerator checks the validity of the watermark during execution.
Power attacks
Kocher et al.: an adversary can observe power consumption at the pins and deduce the data and instructions within the CPU. Yang et al.: dynamic voltage/frequency scaling (DVFS) can be used as a countermeasure. [Yan05] © 2005 ACM Press
CPU simulation
Performance vs. energy/power simulation. Temporal accuracy. Trace vs. execution. Simulation vs. direct execution.
Simulate using appropriate benchmarks for embedded systems. Don't use SPEC CPU benchmarks! Embedded benchmarks include EEMBC, MediaBench, and MiBench. Benchmarks often should be domain-specific.
Trace-based analysis
Instrumentation generates side information. PC sampling checks the PC value during execution. Can measure control flow and memory accesses.
Program counter (PC) sampling
Example: Unix prof. Interrupts are used to sample the PC periodically. Must run on the platform. Doesn't provide a complete trace. Subject to sampling problems: undersampling, periodicity problems. Generates a call-graph report that indicates the percentage of execution time spent in each function.
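Turning PC samples into a per-function time estimate can be sketched as below. This only illustrates the post-processing step, not how prof works internally; the symbol table, the sample values, and the function name `profile` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def profile(samples, symbols):
    """Estimate the percentage of time per function from sampled PC values.

    symbols: list of (start_address, name), sorted by start address.
    Each sample is attributed to the last function starting at or below it.
    """
    if not samples:
        return {}
    starts = [start for start, _ in symbols]
    counts = Counter()
    for pc in samples:
        i = bisect_right(starts, pc) - 1
        counts[symbols[i][1] if i >= 0 else "<unknown>"] += 1
    return {name: 100.0 * n / len(samples) for name, n in counts.items()}


symbols = [(0x100, "main"), (0x200, "fir")]   # hypothetical symbol table
samples = [0x104, 0x210, 0x220, 0x230]        # hypothetical PC samples
print(profile(samples, symbols))  # prints {'main': 25.0, 'fir': 75.0}
```

With only four samples the estimate is coarse, which is the undersampling problem the slide mentions; a sampling period that lines up with a loop period would bias the counts in a similar way.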
Program instrumentation
Example: dinero. Modify the program to write trace information. Track entry into basic blocks. Requires editing object files. Provides a complete trace.
Microarchitecture-modeling simulators
Varying levels of detail: an instruction scheduler is not cycle-accurate; a cycle timer is cycle-accurate. Can simulate for performance or energy/power. Typically written in a general-purpose programming language (e.g., C), not a hardware description language.
Cycle-accurate simulator
Models the microarchitecture. Simulating one instruction requires executing routines for instruction fetch, decode, execute, etc. Models pipeline state. Microarchitectural registers are exposed to the simulator.
[Figure labels: reg, IR, PC, I-box]
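The structure described above can be illustrated with a toy three-stage model. Everything here is an assumption made for the sketch: the two-instruction ISA (`addi`, `halt`), the latch names, and the absence of branches, stalls, and memory. Real simulators are typically written in C, as noted earlier; Python is used only to keep the example short.

```python
class ToyPipelineSim:
    """Toy fetch/decode/execute model with the PC, IR, and latches exposed."""

    def __init__(self, program):
        self.program = program   # must end with ("halt",), or run() never returns
        self.pc = 0              # program counter
        self.ir = None           # fetch/decode latch (instruction register)
        self.decoded = None      # decode/execute latch
        self.regs = [0] * 4
        self.cycles = 0
        self.halted = False

    def execute(self):
        if self.decoded is None:
            return               # pipeline bubble
        op, *args = self.decoded
        if op == "addi":
            rd, rs, imm = args
            self.regs[rd] = self.regs[rs] + imm
        elif op == "halt":
            self.halted = True

    def decode(self):
        self.decoded = self.ir

    def fetch(self):
        if self.pc < len(self.program):
            self.ir = self.program[self.pc]
            self.pc += 1
        else:
            self.ir = None

    def step(self):
        # Evaluate stages back to front so each one sees last cycle's latch.
        self.execute()
        self.decode()
        self.fetch()
        self.cycles += 1

    def run(self):
        while not self.halted:
            self.step()
        return self.cycles


sim = ToyPipelineSim([("addi", 1, 0, 5), ("addi", 2, 1, 7), ("halt",)])
print(sim.run(), sim.regs)  # prints 5 [0, 5, 12, 0]
```

Three instructions take five cycles because two cycles are spent filling the pipeline, a timing effect that a purely functional instruction-level simulator would not report.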
Trace-based vs. execution-based
Trace-based: gather the trace first, then generate timing information. Basic timing information is simpler to generate. Full timing information may require regenerating information from the original execution. Requires owning the platform.
Execution-based: the simulator fully executes each instruction. Requires a more complex simulator. Requires explicit knowledge of the microarchitecture, not just instruction execution times.
Power simulation
Model capacitance in the processor. Keep track of activity in the processor. Requires full simulation. Activity determines capacitive charge/discharge, which determines power consumption.
CPU power simulators include SimplePower and Wattch for embedded general-purpose processors, and Trimaran with EPIC Explorer for embedded VLIW.
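The activity-to-power step can be sketched with the standard switching-energy relation, in which each transition of a node with capacitance C at supply voltage V dissipates about C·V²/2. The unit names, capacitance values, and function names below are invented for illustration and are not taken from any of the simulators named above.

```python
def dynamic_energy(toggles, capacitance, vdd):
    """Energy in joules dissipated by the recorded switching activity.

    toggles:     dict of unit name -> number of transitions observed
    capacitance: dict of unit name -> switched capacitance per transition (F)
    vdd:         supply voltage (V)
    """
    return sum(0.5 * capacitance[u] * vdd ** 2 * n for u, n in toggles.items())


def average_power(toggles, capacitance, vdd, cycles, clock_hz):
    """Average power in watts over a run lasting `cycles` clock cycles."""
    run_time = cycles / clock_hz
    return dynamic_energy(toggles, capacitance, vdd) / run_time


# Hypothetical activity counts gathered by a full simulation.
toggles = {"alu": 1000, "regfile": 2000}
capacitance = {"alu": 2e-12, "regfile": 1e-12}
print(dynamic_energy(toggles, capacitance, vdd=1.0))
print(average_power(toggles, capacitance, vdd=1.0, cycles=1000, clock_hz=1e9))
```

This accounts only for dynamic switching energy; leakage and short-circuit current are ignored in the sketch.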
Automated CPU design
Customize aspects of the CPU for the application: instruction set; functional units; memory system (including register files); buses, I/O, and peripherals. Tools help design and implement custom CPUs. FPGAs make it easier to implement custom CPUs. An application-specific instruction processor (ASIP) has a custom instruction set. A configurable processor is generated by a tool set.
Techniques
Architecture optimization tools help choose the instruction set and microarchitecture. Configuration tools implement the microarchitecture (and perhaps the compiler). Early example: MIMOLA, which analyzed programs, created the microarchitecture and instructions, and synthesized logic.
CPU configuration process [figure]
Tensilica configuration options [figure, © 2004 Tensilica]
Tensilica EEMBC comparison [figure, © 2004 Tensilica]
Tensilica energy consumption by subsystem [figure, © 2006 Tensilica]
Toshiba MePcore [figure]
LISA language [figure, [Hof01] © 2001 IEEE]
LISA descriptions and generation
The memory model includes registers and other memories. The uses clause binds operations to hardware. Timing is specified by PIPELINE, IN, ACTIVATION, and ENTITY. Generates a hierarchical VHDL design.
PEAS-III
Synthesis is driven by: architectural parameters such as the number of pipeline stages; declarations of function units; instruction format definitions; interrupt conditions and timing; micro-operations for instructions and interrupts. Generates both simulation and synthesis models in VHDL.
Instruction set synthesis
Generate the instruction set from the application program and other requirements. Sun et al. analyzed the design space for a simple BYTESWAP() program. [Sun04] © 2004 IEEE
Complex function definition
Atasu et al. try to combine many operations into one instruction: disjoint operator graphs; multi-output instructions. The operator graph must be convex: a value cannot leave, then re-enter, the instruction. The textbook discusses several other approaches. [Ata03] © 2003 ACM Press
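The convexity condition can be checked with a short graph search. The sketch assumes the dataflow graph is given as a successor dictionary; the function name `is_convex` and the example graph are hypothetical, and this is not claimed to be the algorithm Atasu et al. use.

```python
def is_convex(successors, subset):
    """True if no path leaves `subset` and later re-enters it.

    successors: dict mapping each node to an iterable of its successor nodes.
    subset:     candidate set of operations for one custom instruction.
    """
    subset = set(subset)
    # Start from every outside node that consumes a value produced inside.
    stack = [v for u in subset for v in successors.get(u, ()) if v not in subset]
    seen = set()
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        for v in successors.get(u, ()):
            if v in subset:
                return False     # a value left the subset and came back
            stack.append(v)
    return True


# Hypothetical operator graph: a feeds b and c, and b feeds c.
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
print(is_convex(graph, {"a", "c"}))  # prints False (path a -> b -> c leaves and re-enters)
print(is_convex(graph, {"a", "b"}))  # prints True
```

A non-convex candidate cannot be issued as a single instruction, because one of its inputs would depend on an outside operation that itself needs one of the instruction's outputs.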
Limited-precision arithmetic
Fang et al. used affine arithmetic to analyze the numerical characteristics of algorithms. Mahlke et al. synthesize variable bit-width architectures given bit-width requirements, clustering operations to find a small number of distinct bit widths. What advantages and disadvantages might this approach have? [Mah01] © 2001 IEEE