Enhancing Embedded Processors with Specific Instruction Set Extensions for Network Applications A. Chormoviti, N. Vassiliadis, G. Theodoridis, S. Nikolaidis.


1 Enhancing Embedded Processors with Specific Instruction Set Extensions for Network Applications
A. Chormoviti, N. Vassiliadis, G. Theodoridis, S. Nikolaidis
Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
nivas@physics.auth.gr

2 Outline (Aristotle University of Thessaloniki, IDAACS ’05)
Motivation
Scope
Benchmarking Suite
Design Flow
Implementation
Experimental Results

3 Motivation
Great expansion of network applications
High performance demands
Requirements for flexibility to support new protocols and future applications
Fast time-to-market requirements
While ASICs lack flexibility and GPPs are prohibitively expensive in terms of energy-performance, ASIPs exploit special characteristics of the application domain to meet the desired specifications

4 Scope
Design an ASIP for network applications based on low-cost enhancement of an existing processor
Follow a methodology for the implementation of the ASIP from a hardware-software perspective
Use the NetBench benchmarking suite for network applications as a vehicle for representing the application domain

5 NetBench benchmarking suite
A benchmarking suite for network applications containing a large variety of tasks
Four kernels of the suite were used with a set of typical stimuli as inputs
– CRC: CRC-32 checksum calculation
– DRR: Deficit round robin (DRR) scheduling
– ROUTE: Table lookup implementation along with Internet checksum
– URL: URL-based switching
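To make the CRC kernel concrete, here is a minimal sketch of a bit-at-a-time CRC-32. It follows the commonly used reflected CRC-32 definition (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF); it is not NetBench's source code, and the function name `crc32_bitwise` is only illustrative. The inner loop is a chain of shifts and XORs, which is the kind of work a fused Shift+XOR instruction targets.

```python
def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 using the reflected polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right by one bit; XOR in the polynomial
            # whenever the bit shifted out was 1.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(crc32_bitwise(b"123456789")))
```

Production implementations are usually table-driven (one lookup per byte), but the shift and XOR steps remain on the critical path.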

6 ASIP Design Flow
A RISC, MIPS-like machine is used as the base processor
The application is described in C/C++
The instruction set is extended with special instructions in order to increase performance and reduce power consumption

7 ASIP Design Flow – Pruning
Network algorithm
– Huge source code size
– Only a small part of the code is responsible for power and time consumption
Pruning
– Simulation, profiling, and analysis are performed
– The crucial parts of the code are identified
– Undesired parts of the code are hidden

8 ASIP Design Flow – Assembly Generation
GNU GCC is used as the compiler
The compiler is cross-configured for the target architecture (MIPS)
The pruned code is used
Assembly code is generated

9 ASIP Design Flow – Analysis
The assembly code is analyzed with the GNU tools
Static analysis
– The code is parsed
– Basic blocks are identified
Dynamic analysis
– The code is simulated
– Basic blocks are weighted with execution frequencies
– Frequently executed instructions are identified
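The static and dynamic analysis steps can be sketched roughly as follows. This is an illustrative toy, not the authors' tooling: the instruction tuple format, the `BRANCHES` set, and the trace of executed instruction indices are all assumptions, and MIPS branch delay slots are ignored for simplicity. A basic block starts at the first instruction, at every branch target, and at every instruction that follows a branch; its weight is the number of times its first instruction appears in the execution trace.

```python
# Hypothetical subset of control-transfer opcodes, for illustration only.
BRANCHES = {"beq", "bne", "j", "jal", "jr"}


def find_basic_blocks(program):
    """Static analysis: program is a list of (label, opcode, target) tuples.

    Returns a list of (start, end) index pairs, end exclusive.
    """
    label_index = {label: i for i, (label, _, _) in enumerate(program) if label}
    leaders = {0} if program else set()
    for i, (_, opcode, target) in enumerate(program):
        if opcode in BRANCHES:
            if target in label_index:
                leaders.add(label_index[target])  # branch target starts a block
            if i + 1 < len(program):
                leaders.add(i + 1)  # instruction after a branch starts a block
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    return list(zip(starts, ends))


def weight_blocks(blocks, trace):
    """Dynamic analysis: count how often each block's first instruction runs."""
    by_start = {start: (start, end) for start, end in blocks}
    counts = {block: 0 for block in blocks}
    for pc in trace:
        if pc in by_start:
            counts[by_start[pc]] += 1
    return counts


if __name__ == "__main__":
    program = [
        (None, "addiu", None),
        ("loop", "lw", None),
        (None, "addu", None),
        (None, "addiu", None),
        (None, "bne", "loop"),
        (None, "sw", None),
    ]
    trace = [0] + [1, 2, 3, 4] * 3 + [5]  # loop body executed three times
    blocks = find_basic_blocks(program)
    print(weight_blocks(blocks, trace))
```

Multiplying each block's weight by its instruction counts gives the per-instruction execution frequencies used in the next step.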

10 ASIP Design Flow – Instruction Generation
Frequently executed instructions are considered
Instructions are reordered to form patterns
The ISA is extended with new complex instructions
Hardware modifications for the support of the new ISA are performed
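One simple way to find candidate patterns is to count adjacent opcode pairs inside each basic block, weighted by how often the block executes. The sketch below assumes that approach; the slides do not state the exact selection procedure, and the reordering step they mention is not modeled here. The function name and the input layout are hypothetical.

```python
from collections import Counter


def count_weighted_pairs(block_opcodes, block_weights):
    """Count adjacent opcode pairs, weighted by block execution frequency.

    block_opcodes: list of opcode lists, one per basic block.
    block_weights: execution count of each block, in the same order.
    """
    pairs = Counter()
    for opcodes, weight in zip(block_opcodes, block_weights):
        for first, second in zip(opcodes, opcodes[1:]):
            pairs[(first, second)] += weight
    return pairs


if __name__ == "__main__":
    blocks = [["lw", "addu", "addiu", "bne"], ["addiu", "sw"]]
    weights = [3, 1]
    for pair, count in count_weighted_pairs(blocks, weights).most_common():
        print(pair, count)
```

Pairs with high weighted counts, such as an increment followed by a branch, are the natural candidates for a fused instruction.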

11 ASIP Design Flow – Code Generation/Evaluation
Code generation
– Identified patterns are substituted by the newly defined instructions in the application assembly code
A hardware model (VHDL) is constructed and synthesized
Evaluation
– Execution cycles
– Clock speed
– Power consumption
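The substitution step can be pictured as a peephole pass over the assembly. The sketch below is a simplified assumption of how such a pass might look: the mnemonic `inc_bne` and the operand layout are hypothetical, and a real tool would also have to check data dependencies and make sure no branch targets the second instruction of a fused pair.

```python
def fuse_pairs(instructions, fusions):
    """Replace adjacent opcode pairs with a single fused instruction.

    instructions: list of (opcode, operands) tuples, operands being a tuple.
    fusions: dict mapping (opcode1, opcode2) to a new opcode name.
    """
    out = []
    i = 0
    while i < len(instructions):
        if i + 1 < len(instructions):
            (op1, args1), (op2, args2) = instructions[i], instructions[i + 1]
            fused = fusions.get((op1, op2))
            if fused is not None:
                # Emit one instruction carrying the operands of both originals.
                out.append((fused, args1 + args2))
                i += 2
                continue
        out.append(instructions[i])
        i += 1
    return out


if __name__ == "__main__":
    code = [
        ("lw", ("t0", "0(a0)")),
        ("addiu", ("t1", "t1", "1")),
        ("bne", ("t1", "t2", "loop")),
    ]
    print(fuse_pairs(code, {("addiu", "bne"): "inc_bne"}))
```

Each fused pair removes one instruction fetch from the hot path, which is where the cycle reductions reported later come from.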

12 Instruction Set Extensions
Cycle reduction (%)
New instruction         CRC    DRR    ROUTE   URL    Avg.
Inc;Branch              8.0    0.5    3.0     3.2    3.7
DEC;Branch              0.0    0.3    8.0     3.8    3.0
CMP;Branch              7.7    15.1   12.1    8.2    10.8
LDR;Jump                0.0    2.7    0.0            0.7
ADD;Jump                0.0                   4.9    1.3
ADD;LW                  0.0    18.6   13.0    10.6   10.5
AND;LW                  7.7    0.0                   1.9
ADD;SW                  0.0    1.6    11.4    3.2    4.0
Shift+XOR               15.3   0.0                   3.8
Delay slots reduction   8.2    11.3   12.2    10.1   10.5
9 new instructions
– 5 control flow instructions
– 3 addressing modes
– 1 pure computation
Delay slot reduction mechanism
Cycle reduction up to 18.6%

13 Hardware Modifications
Addition of a shifter for Shift+ALU operations
Enhancement of the Control Flow Unit with “Increment/Decrement and Branch” capability
Control logic to reduce the delay slots for branch operations
New addressing modes combining addition/AND with Load/Store operations
Only slight hardware overhead and no performance degradation were introduced

14 Experimental Results
The ARM7TDMI and MIPS-like processors were considered for comparison with the designed ASIP
The hardware models of MIPS and ASIP were synthesized in an STM 0.13um process
For the ARM7TDMI core, information was taken from the datasheets
                               ARM7TDMI   MIPS-like   ASIP
Pipeline stages                3          5           5
Clock speed (0.13um process)   133 MHz    253 MHz

15 Performance Results
Cycles (10^6)
App.    ARM7      MIPS     ASIP
CRC     17.13     9.62     5.18
URL     1692.25   689.70   431.62
DRR     15.71     11.57    5.73
ROUTE   92.14     62.70    26.99
Cycle-accurate simulations were performed
– The ARMulator was used for the ARM core
– The VHDL model was used for the MIPS and ASIP cores
Significant performance improvements are achieved
– 80% avg. compared to ARM7
– 50% avg. compared to MIPS
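The cycle counts on this slide can be turned into per-application reductions and execution times with a few lines of arithmetic. The sketch below uses the cycle figures from the slide and the clock speeds from slide 14. The ASIP clock of 253 MHz is an assumption, based on the statement that the extensions introduced no performance degradation; the slides do not say how the quoted averages were computed, so treat the results as a rough cross-check only.

```python
# Cycle counts in millions, taken from the slide.
CYCLES_M = {
    "CRC":   {"ARM7": 17.13,   "MIPS": 9.62,   "ASIP": 5.18},
    "URL":   {"ARM7": 1692.25, "MIPS": 689.70, "ASIP": 431.62},
    "DRR":   {"ARM7": 15.71,   "MIPS": 11.57,  "ASIP": 5.73},
    "ROUTE": {"ARM7": 92.14,   "MIPS": 62.70,  "ASIP": 26.99},
}

# Clock speeds in MHz; the ASIP value is an assumption (same as the MIPS base).
CLOCK_MHZ = {"ARM7": 133.0, "MIPS": 253.0, "ASIP": 253.0}


def avg_cycle_reduction(baseline):
    """Average percentage of cycles saved by the ASIP versus a baseline core."""
    reductions = [
        100.0 * (1.0 - row["ASIP"] / row[baseline]) for row in CYCLES_M.values()
    ]
    return sum(reductions) / len(reductions)


def exec_time_s(app, core):
    """Execution time in seconds: (cycles * 1e6) / (MHz * 1e6)."""
    return CYCLES_M[app][core] / CLOCK_MHZ[core]


if __name__ == "__main__":
    for core in ("ARM7", "MIPS"):
        print(f"avg cycle reduction vs {core}: {avg_cycle_reduction(core):.1f}%")
    for app in CYCLES_M:
        print(app, f"{exec_time_s(app, 'ASIP'):.4f} s on the ASIP")
```

Because the ARM7 runs at a lower clock than the other two cores, comparing execution times rather than raw cycles widens the gap in the ASIP's favor.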

16 Energy Results
Power (uW)*
App.    ARM7    MIPS    ASIP
CRC     355     270     145
URL     34329   21114   12127
DRR     326     325     161
ROUTE   2175    1761    758
Major source of energy consumption for embedded processors: instruction memory accesses
0.13um single-port SRAM memory models were used
Significant energy reductions are achieved
– 60% avg. compared to ARM7TDMI
– 50% avg. compared to MIPS
* Models obtained from Dolphin Integration Embedded Memory Generator
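As a quick cross-check of the quoted averages, the per-application reductions can be computed from the table values and averaged. This is a sketch assuming a plain arithmetic mean over the four kernels; the slides do not state which averaging was used, and the helper name `average_reduction` is illustrative.

```python
# Power figures in uW, taken from the slide.
POWER_UW = {
    "CRC":   {"ARM7": 355,   "MIPS": 270,   "ASIP": 145},
    "URL":   {"ARM7": 34329, "MIPS": 21114, "ASIP": 12127},
    "DRR":   {"ARM7": 326,   "MIPS": 325,   "ASIP": 161},
    "ROUTE": {"ARM7": 2175,  "MIPS": 1761,  "ASIP": 758},
}


def average_reduction(table, baseline, target="ASIP"):
    """Mean percentage reduction of `target` relative to `baseline`."""
    reductions = [
        100.0 * (1.0 - row[target] / row[baseline]) for row in table.values()
    ]
    return sum(reductions) / len(reductions)


if __name__ == "__main__":
    for core in ("ARM7", "MIPS"):
        print(f"avg reduction vs {core}: {average_reduction(POWER_UW, core):.1f}%")
```

Under that assumption the means come out close to the 60% and 50% figures quoted on the slide.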

17 Conclusions
An ASIP for network applications was designed following a simple design flow
The ASIP was designed with small enhancements of a popular processor
Experimental results show that significant speedups and power savings can be achieved with these small enhancements
– 50% avg. performance and energy consumption improvements compared to the base processor
Improvements come with small development cost

18 Thank You
Questions?

