A Decompression Architecture for Low Power Embedded Systems Lekatsas, H.; Henkel, J.; Wolf, W.; Proceedings of the 2000 International Conference on Computer Design (ICCD 2000), IEEE.


1 A Decompression Architecture for Low Power Embedded Systems Lekatsas, H.; Henkel, J.; Wolf, W.; Proceedings of the 2000 International Conference on Computer Design (ICCD 2000), IEEE Presenter: Yi-hsin Tseng Date: 11/06/2007

2 Outline Introduction & motivation Code Compression Architecture Decompression Engine Design Experimental results Conclusion & contributions of the paper Our project Relation to CSE520 Q & A

3 Introduction & motivation

4 For embedded systems Embedded-system architectures are growing more complex, while available memory space remains small. A smaller executable can also indirectly reduce the chip's…  Size  Weight  Power consumption

5 Why code compression/decompression? Compress the instruction segment of the executable running on the embedded system…  to reduce memory requirements and bus-transaction overhead Compression  Decompression

6 Related work on compressed instructions A logarithmic-based compression scheme maps 32-bit instructions to fixed, smaller-width compressed instructions.  (That system measures memory area only) Frequently appearing instructions are compressed to 8 bits.  (fixed lengths of 8 or 32 bits)

7 The compression method in this paper Gives comprehensive results for the whole system, including the  buses  memories (main memory and cache)  decompression unit  CPU

8 Code Compression Architecture

9 Architecture in this system (post-cache) Why post-cache? -Increases the effective cache size -Improves instruction bandwidth

10 Code Compression Architecture Instructions are compressed with SAMC  (Semiadaptive Markov Compression) Instructions are divided into 4 groups  based on the SPARC architecture  a short 3-bit code is prepended to each compressed instruction
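The slide says only that a 3-bit code precedes each compressed instruction; the concrete tag values below are assumptions for illustration. A minimal sketch of building such a tagged code word:

```c
#include <stdint.h>

/* Hypothetical 3-bit group tags; the paper states only that a 3-bit
 * code precedes each compressed instruction, not the exact values. */
enum group_tag {
    TAG_IMMEDIATE = 0, /* group 1: instructions with immediates    */
    TAG_BRANCH    = 1, /* group 2: branch instructions             */
    TAG_NO_IMM    = 2, /* group 3: instructions with no immediates */
    TAG_UNCOMP    = 3  /* group 4: left uncompressed               */
};

/* Prepend the 3-bit tag to an n-bit compressed payload, returning the
 * (3 + n)-bit code word in the low bits of a 64-bit container. */
static uint64_t tag_codeword(enum group_tag tag, uint64_t payload,
                             int payload_bits)
{
    uint64_t mask = (payload_bits >= 64) ? ~0ULL
                                         : ((1ULL << payload_bits) - 1);
    return ((uint64_t)tag << payload_bits) | (payload & mask);
}
```

On decompression, the engine reads the leading 3 bits first and dispatches the rest of the bit stream to the matching group decoder.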

11 4 Groups of Instructions Group 1  instructions with immediates Ex: sub %i1, 2, %g3 ; set 5000, %g2 Group 2  branch instructions Ex: be, bne, bl, bg,... Group 3  instructions with no immediates Ex: add %o1,%o2,%g3 ; st %g1,[%o2] Group 4  Instructions that are left uncompressed
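The slide's grouping can be sketched as a classifier over SPARC v8 instruction words. The field positions (op at bits 31:30, op2 at bits 24:22, the immediate bit i at bit 13) follow the SPARC v8 encoding; which instructions the compressor actually leaves uncompressed (group 4) is its own choice, so that case is not decided here.

```c
#include <stdint.h>

enum igroup { G1_IMMEDIATE = 1, G2_BRANCH, G3_NO_IMMEDIATE };

/* Rough classifier for the slide's groups 1-3 over SPARC v8 words. */
static enum igroup classify(uint32_t insn)
{
    uint32_t op  = insn >> 30;          /* primary opcode field */
    uint32_t op2 = (insn >> 22) & 0x7;  /* format-2 sub-opcode  */

    if (op == 0 && (op2 == 2 || op2 == 6 || op2 == 7))
        return G2_BRANCH;               /* Bicc / FBfcc / CBccc */
    if (op == 2 || op == 3)             /* arithmetic, load/store */
        return (insn & (1u << 13)) ? G1_IMMEDIATE : G3_NO_IMMEDIATE;
    return G1_IMMEDIATE;                /* SETHI and CALL carry immediates */
}
```

For example, `add %o1,%o2,%g3` (op=2, i=0) lands in group 3, while `sub %i1, 2, %g3` (op=2, i=1) lands in group 1.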

12 Decompression Engine Design ( Approach)

13 The key idea is… Present an architecture for embedded systems that decompresses offline-compressed instructions at runtime  to reduce power consumption  with a performance improvement (in most cases)

14 Pipelined Design

15 Pipelined Design (cont'd)

16 Pipelined Design – group 1 (stage 1) Index the decoding table Input compressed instructions Forward instructions

17 Pipelined Design – group 1 (stage 2)

18 Pipelined Design – group 1 (stage 3)

19 Pipelined Design – group 1 (stage 4)

20 Pipelined Design – group 2 branch instructions (stage 1)

21 Pipelined Design – group 2 branch instructions (stage 2)

22 Pipelined Design – group 2 branch instructions (stage 3)

23 Pipelined Design – group 2 branch instructions (stage 4)

24 Pipelined Design – group 3 instructions with no immediates (stage 1) 256-entry table No-immediate instructions may appear in pairs -> a pair is compressed into one byte, and each table entry holds the uncompressed pair (64 bits). The 8-bit byte serves as the index into the table.
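A minimal sketch of the group-3 fast dictionary described on this slide: a pair of frequent no-immediate instructions is compressed to a single byte, which indexes a 256-entry table of 64-bit entries (two 32-bit SPARC instructions). The table contents here are illustrative, not taken from the paper.

```c
#include <stdint.h>

/* 256-entry fast dictionary: each entry packs a pair of 32-bit
 * uncompressed instructions into one 64-bit word. */
static uint64_t dict[256];

/* Expand one compressed byte back into the two original instructions. */
static void decompress_pair(uint8_t index, uint32_t out[2])
{
    uint64_t entry = dict[index];
    out[0] = (uint32_t)(entry >> 32); /* first instruction of the pair  */
    out[1] = (uint32_t)entry;         /* second instruction of the pair */
}
```

Because the lookup is a single indexed read, this path needs no bit-serial Markov decoding, which is what makes it "fast" relative to the SAMC path.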

25 Pipelined Design – group 3 instructions with no immediates (stage 2)

26 Pipelined Design – group 3 instructions with no immediates (stage 3)

27 Pipelined Design – group 3 instructions with no immediates (stage 4)

28 Pipelined Design – group 4 uncompressed instructions

29 Experimental results

30 Different applications are used:  an algorithm for computing 3D vectors for a motion picture ("i3d")  a complete MPEG-II encoder ("mpeg")  a smoothing algorithm for digital images ("smo")  a trick animation algorithm ("trick") A simulation tool written in C obtains performance data for the decompression engine

31 Experimental results (cont'd) The decompression engine is application-specific.  for each application, a decoding table and a fast dictionary table are built that decompress that particular application only.

32 Experimental results for energy and performance

33 Why worse performance on smo with a 512-byte instruction cache? - smo does not require much memory (it executes in tight loops). - It generates very few misses at this cache size. (The compressed architecture therefore cannot help an already near-perfect hit ratio, and the slowdown from the decompression engine prevails)

34 Conclusion & contributions of the paper This paper designs an instruction decompression engine as a soft IP core for low-power embedded systems. Applications run faster than on systems without code compression (due to improved cache performance). Power consumption is lower (due to the smaller memory footprint of the executable and the smaller number of memory accesses)

35 Relation to CSE520 Improves system performance and power consumption by using a pipelined architecture in the system. A different architecture design for lower power consumption in embedded systems. Smaller caches perform better with the compressed architecture; larger caches perform better with the uncompressed architecture.  Cache hit ratio

36 Our project Goal:  Improve the efficiency of power management in embedded multicore systems Idea:  Use different power modes within a given power budget under a global power-management policy (from Jun Shen's presentation)  Use the SAMC algorithm and this decompression architecture as another factor to simulate (this paper) How?  With the SimpleScalar tool set: try a simple function first, then try the different power modes

37 Thank you! Q & A

38 Backup Slides

39 Critique The decompression engine will slow down the system if the cache generates very few misses at a given cache size.

40 Post-cache & pre-cache Pre-cache: the instructions stored in the I-cache are decompressed. Post-cache: the instructions stored in the I-cache are still compressed.

41 Problems for the post-cache architecture Memory relocation  Compression changes the instruction locations in memory, so branch targets must be remapped. In the pre-cache architecture:  Decompression is done before instructions are fetched into the I-cache, so addresses in the I-cache need no fixing.
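One common fix for this relocation problem in post-cache designs is a line address table (LAT) that maps the start of each uncompressed cache line to its byte offset in the compressed image, so the engine can translate a branch target at runtime. The table layout and line size below are assumptions for illustration, not details from the paper.

```c
#include <stdint.h>

#define LINE_BYTES 32u   /* assumed uncompressed cache-line size */
#define NUM_LINES  1024u /* assumed program size: 1024 lines     */

/* lat[i] = byte offset of line i's code in the compressed image. */
static uint32_t lat[NUM_LINES];

/* Map an uncompressed instruction address to the compressed-image
 * offset where its cache line starts. The remaining within-line
 * offset is resolved by decoding forward from that point. */
static uint32_t translate(uint32_t uncompressed_addr)
{
    return lat[uncompressed_addr / LINE_BYTES];
}
```

The table costs extra memory and one lookup per taken branch, which is part of the trade-off between the pre-cache and post-cache placements.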

42 SPARC Instruction Set Instruction groups  load/store (ld, st,...) Move data from memory to a register / Move data from a register to memory  integer arithmetic (add, sub,...) Arithmetic operations on data in registers  bit-wise logical (and, or, xor,...) Logical operations on data in registers  bit-wise shift (sll, srl,...) Shift bits of data in registers  integer branch (be, bne, bl, bg,...)  trap (ta, te,...)  control transfer (call, save,...)  floating point (ldf, stf, fadds, fsubs,...)  floating point branch (fbe, fbne, fbl, fbg,...)

43 SPARC Instruction Example

