A First-step Towards an Architecture Tuning Methodology for Low Power Greg Stitt, Frank Vahid*, Tony Givargis Dept. of Computer Science & Engineering University.


1 A First-step Towards an Architecture Tuning Methodology for Low Power
Greg Stitt, Frank Vahid*, Tony Givargis
Dept. of Computer Science & Engineering, University of California, Riverside
*also with the Center for Embedded Computer Systems, UC Irvine
Roman Lysecky
Department of IP Management, Conexant, Newport Beach
This work was supported by the National Science Foundation under grants CCR-9811164 and CCR-9876006, and by a Design Automation Conference graduate scholarship.

2 Introduction: advent of cores
- In the past, board-level embedded systems were built using discrete ICs
- Today, single-IC systems are increasingly being built using IPs (intellectual property), a.k.a. “cores”
  - Hard core: layout
  - Firm core: structure (HDL)
  - Soft core: synthesizable behavior (HDL)
- “System-on-a-chip” (SOC)
[Figure: a board with separate Processor, Memory, and Peripheral ICs, next to a single IC holding Processor, Mem, and Peripheral IP cores taken from a core library (PeripheralA, PeripheralB, ProcessorX)]

3 Introduction: embedded systems
- SOCs implementing an embedded system have a unique feature: each implements a particular application
- Thus, the processor may execute a single fixed program that never changes
  - Unlike desktop systems, which execute a variety of programs
  - Examples: digital camera, automobile cruise-controller
- We can exploit this fixed-program feature
  - For example, by using mask-programmed ROM
  - But much more can be done

4 Introduction: architecture tuning
- Architecture tuning: a way to exploit the fixed-program feature of embedded systems
  - First, do architecture design for the particular application
  - Then, “tune” the core-based system architecture to the particular application program, before IC fabrication
- Goals: better performance, power, size
[Figure: flow from the core library (PeripheralA, PeripheralB, ProcessorX) and the fixed program, through architecture design (HDL) and architecture tuning (HDL, tuned cores), to fabrication of an IC containing program, processor, and peripheral]

5 Introduction: architecture tuning
Examples of tuning optimizations
- Memory hierarchy: no cache, L1 cache, L1+L2 cache
- Cache organization: size, associativity, line size
- Bus structure, data/address encoding
- Microprocessor optimizations
  - Internal small-loop table
  - Controller partitioning
  - Datapath shortcuts
  - Register file copies

6 Introduction: Tuning is a special case of Y-Chart iteration
- Philips/TriMedia approach of simultaneously developing architecture and its applications
[Figure: Y-chart with boxes Architecture, Applications, Mapping, Analysis, and Numbers; one part is marked “Our focus”]

7 Problem description
- Focus of this work: tuning a microcontroller to its program
  - Goal is reduced power without performance loss
- Restrict tuning to maintain exact instruction set compatibility
  - No instructions may be added or deleted
  - Thus, no modification to software development environment
  - Also, no problems with porting software to/from other versions of the microcontroller
  - Instruction set incompatibility can be a show stopper

8 Previous work
- Application-specific instruction-set processors [Fisher99]
  - Customize a microprocessor to its application(s), e.g., Tensilica
  - Customized instruction-set, requiring customized tools
- Tuning compiler to architecture [Tiwari et al 94]
- Architectural description languages to inform compiler of architecture features [Halambi et al 99]
- Tuning cache and cache/bus organization to application [Givargis et al 99]

9 Tuning environment
- Currently for the 8051 microcontroller
  - Starts from VHDL synthesizable model of 8051 (soft core)
  - Uses Synopsys synthesis, simulation and power analysis
  - Uses 8051 instruction-set simulator
  - Uses numerous scripts
- Goal of the environment
  - Understand how power is being consumed for a particular application, so that modifications to the architecture (or application) can be made to minimize that power
- Three main tools
  - Architectural view
  - Instruction-set view
  - Program/data memory view

10 Tuning environment: architectural view tool
[Figure: tool flow. Microprocessor soft core -> RT-synthesizer -> microprocessor structure; program binary -> ROM generator -> ROM entity; both feed the simulator and power analyzer, which produces “flat” power data; the structural hierarchical power data translator and xdu display turn that into the per-module report below]

Module     Power (mW)
ROM        1.04
ALU        1.62
RAM        1.42
CTRL       2.69
DECODER    0.07
Total      7.66
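The translation from "flat" to hierarchical power data on this slide can be sketched as follows. This is only an illustration: the record format (a slash-separated instance path plus a power value in mW), the function name `roll_up`, and the split of each module into sub-cells are assumptions, not the format of the Synopsys power analyzer or the authors' translator. Only the ALU, RAM, and CTRL totals come from the slide.

```python
from collections import defaultdict

def roll_up(flat_power, depth=1):
    """Sum per-cell power (mW) into the module named by the first
    `depth` components of each hierarchical instance path."""
    totals = defaultdict(float)
    for path, mw in flat_power:
        module = "/".join(path.split("/")[:depth])
        totals[module] += mw
    return dict(totals)

# Hypothetical flat records; the sub-cell names and the way each module's
# power is split are invented so that the totals match the slide.
flat = [
    ("ALU/adder", 1.00),
    ("ALU/logic", 0.62),
    ("RAM/cells", 1.42),
    ("CTRL/fsm", 2.00),
    ("CTRL/regs", 0.69),
]

by_module = roll_up(flat)
for name, mw in sorted(by_module.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {mw:.2f} mW")
```

Raising `depth` would give a finer-grained view of the same data, which is the kind of drill-down a hierarchical display suggests.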

11 Tuning environment: instruction-set view tool
[Figure: tool flow. For each instruction (1, 2, 3, ...), binaries that exercise that instruction go through the ROM generator to give a ROM entity; the ROM entity and the microprocessor structure feed the simulator and power analyzer, which produces flat power data for that instruction; the power data collector, structural power data translator, and xdu display combine the results into the table below]

Instruction  Power (mW)
ADDC_1       7.340834
ADD_1        7.350741
ANL_1        6.631394
CLR_1        3.76228
CPL_1        5.481627
DA           5.28897
DEC_1        5.368807
DIV          7.716592
INC_1        4.662862
MOVC_1       6.078014
MOVC_2       5.021021
MOV_1        5.577664
MOV_2        6.164267
MUL          5.522886
NOP          4.900275
ORL_1        6.954121
POP          8.103867
PUSH         8.7116

12 Tuning environment: program/data memory view tool
[Figure: tool flow. The program binary runs on the instruction-set simulator; its output and the per-instruction power data feed the program hierarchy power translator and xdu display, giving program/data memory access frequencies and power]

Addr   Ins    Freq  Pwr      Freq*Pwr
00000  LJMP   1     0        0
00003  MOV_9  108   5.46067  589.752
00005  MOV_9  108   5.46067  589.752
00007  MOV_9  108   5.46067  589.752
00009  MOV_9  108   5.46067  589.752
00011  RET    108   0        0
00012  MOV_9  27    5.46067  147.438
00014  MOV_9  27    5.46067  147.438
00016  MOV_9  27    5.46067  147.438
00018  MOV_9  27    5.46067  147.438
00020  MOV_4  27    4.83507  130.547
00022  LCALL  27    0        0

Addr   Purpose  Accesses
00128  P0       1311
00129  SP       70317
00130  DPL      31189
00131  DPH      7977
00144  P1       161
00208  PSW      413527
00224  ACC      360949
00240  B        2598
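The Freq*Pwr column above is the execution count from the instruction-set simulator multiplied by the per-instruction power figure. A minimal sketch of that step, assuming simple dictionaries as inputs (the function name `weight_by_frequency` and the data layout are hypothetical, not the authors' tool interface):

```python
def weight_by_frequency(freq_by_addr, instr_at, power_mw):
    """Return (addr, instruction, freq, power, freq*power) rows.

    Instructions without a power entry are given 0.0. The slide lists 0
    for LJMP, RET and LCALL without saying why, so this default is an
    assumption of the sketch.
    """
    rows = []
    for addr, freq in sorted(freq_by_addr.items()):
        ins = instr_at[addr]
        pwr = power_mw.get(ins, 0.0)
        rows.append((addr, ins, freq, pwr, freq * pwr))
    return rows

# A few rows taken from the slide.
freq_by_addr = {0: 1, 3: 108, 12: 27, 20: 27}
instr_at = {0: "LJMP", 3: "MOV_9", 12: "MOV_9", 20: "MOV_4"}
power_mw = {"MOV_9": 5.46067, "MOV_4": 4.83507}

rows = weight_by_frequency(freq_by_addr, instr_at, power_mw)
for addr, ins, freq, pwr, weighted in rows:
    print(f"{addr:05d} {ins:6s} {freq:4d} {pwr:8.5f} {weighted:9.3f}")
```

Sorting the rows by the last column would put the instructions that contribute most to program power at the top.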

13 Tuning environment
Inputs: program binary, microprocessor core
- Program/data memory view tool (seconds): program power data
- Architectural view tool (1 hour): architecture power data
- Instruction-set power view tool (1 day): instruction-set power data

14 Design flow using the tuning environment
[Flowchart: run program/data memory view tool, run architecture view tool, run instruction-set view tool. Satisfied? Yes: DONE. No: change application or change architecture, then run the tools again]
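The loop in this design flow can be written down as a short driver. This is a sketch only: `tune`, `run_views`, `satisfied`, and `propose_change` are hypothetical names, and the toy stand-ins below replace the real view tools and the designer's manual changes. The numbers 7.67 and 7.27 are borrowed from the sample optimization slide purely as toy values.

```python
def tune(design, run_views, satisfied, propose_change, max_iters=10):
    """Run the view tools, check the result, and change the design
    until the designer is satisfied or max_iters is reached."""
    history = []
    for _ in range(max_iters):
        report = run_views(design)      # the three view tools
        history.append(report)
        if satisfied(report):           # "Satisfied? Yes" -> DONE
            break
        design = propose_change(design, report)  # change application or architecture
    return design, history

# Toy stand-ins: the "design" is just its total power in mW.
def run_views(design):
    return dict(design)

def satisfied(report):
    return report["total_mw"] <= 7.3

def propose_change(design, report):
    return {"total_mw": round(design["total_mw"] - 0.4, 2)}

final, history = tune({"total_mw": 7.67}, run_views, satisfied, propose_change)
print(final, len(history))
```

In the real flow the change step is a manual edit to the VHDL model or the program, so the loop body would wrap tool invocations rather than arithmetic.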

15 Sample tuning optimization
- Observation
  - RAM consumes much power
  - Address 224 accessed frequently
- Possible tuning optimization
  - Replace this RAM location by a register inside the CTRL module
- Steps
  - Modify VHDL model
  - Run all three view tools
- Results
  - Power reduction: 7.67 to 7.27 mW

Module     Power (mW)
ROM        1.04
ALU        1.62
RAM        1.42
CTRL       2.69
DECODER    0.07
Total      7.66

Addr   Purpose  Accesses
00128  P0       1311
00129  SP       70317
00130  DPL      31189
00131  DPH      7977
00144  P1       161
00208  PSW      413527
00224  ACC      360949
00240  B        2598
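The observation step on this slide amounts to ranking data-memory addresses by access count, and the result can be restated as a relative saving. A small sketch using the numbers from the slide (the helper names `hottest` and `percent_saving` are illustrative, not part of the authors' environment):

```python
# Access counts per data-memory address, copied from the slide.
accesses = {
    128: 1311, 129: 70317, 130: 31189, 131: 7977,
    144: 161, 208: 413527, 224: 360949, 240: 2598,
}

def hottest(accesses, n=2):
    """Return the n most frequently accessed addresses, hottest first."""
    return sorted(accesses, key=accesses.get, reverse=True)[:n]

def percent_saving(before_mw, after_mw):
    """Relative power reduction in percent."""
    return 100.0 * (before_mw - after_mw) / before_mw

candidates = hottest(accesses)
saving = percent_saving(7.67, 7.27)
print(candidates, f"{saving:.1f}%")
```

The ranking only suggests candidates. In the table, PSW (address 208) is accessed even more often than ACC (address 224), which is the location the slide chooses to move into a register.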

16 Some recent data
Applied the tuning environment to a particular application
- Converted two frequently-accessed RAM locations to registers
  - 15% total power savings
- Introduced datapath shortcuts for the two most common register-to-register moves of the application, thus bypassing the ALU
  - 10% total power savings
- Partitioned the controller into two, with one small controller implementing the frequently-executed instructions
  - 10-15% power savings, but we expect much more if we do a better job partitioning the design

17 Conclusions
- Described an environment for tuning a microprocessor to its application for low power
  - Full instruction set compatibility
  - Multiple views help find power hogs
  - Fully automated
- Focus is now on developing tuning optimizations
  - Controller partitioning, small-loop table, datapath shortcuts, register-file copies, etc.
  - Investigate possibility of automating tuning optimizations, develop more general tuning methodology
- Environment for the 8051 is available on the web: http://www.cs.ucr.edu/~dalton

