Instruction-Level Parallelism for Low-Power Embedded Processors. January 23, 2001. Presented by Anup Gangwar.

Slide 1. Instruction-Level Parallelism for Low-Power Embedded Processors. January 23, 2001. Presented by Anup Gangwar

Slide 2. Introduction (Embedded Systems Group, IIT Delhi)
• Need for high-performance, low-power processors
• Synergistic hardware-compiler design for EPIC- or VLIW-like architectures
• A new variable instruction length scheme
• Full predication support in hardware

Slide 3. Outline
• Instruction-Level Parallelism
• Power Consumption in VLSI Circuits
• A Look at Available Mobile and DSP Processors
• High-Level Evaluation of a Low-Power VLIW Processor
• The DEVIL Low-Power Processor
• A Step Towards Predicated Execution
• Conclusion

Slide 4. ILP: Concepts and Limitations
• Data dependences
    - Flow dependence, or RAW (read after write)
    - Anti dependence, or WAR (write after read)
    - Output dependence, or WAW (write after write)
• Reduction of critical path
• Control dependences
• Resource conflicts
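
The three data dependences listed on this slide can be sketched as a small classifier. This is a minimal illustration, not code from the talk: the `Instr` record and the register names are hypothetical, and each instruction is assumed to write one destination register and read a few source registers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: one destination, some sources."""
    dest: str
    srcs: tuple


def dependences(first: Instr, second: Instr) -> set:
    """Return the data dependences of `second` on the earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # flow: second reads what first wrote
    if second.dest in first.srcs:
        found.add("WAR")  # anti: second overwrites what first read
    if first.dest == second.dest:
        found.add("WAW")  # output: both write the same register
    return found


i1 = Instr("r1", ("r2", "r3"))  # r1 = r2 + r3
i2 = Instr("r4", ("r1", "r5"))  # r4 = r1 * r5
i3 = Instr("r2", ("r6",))       # r2 = r6

print(dependences(i1, i2))  # {'RAW'}
print(dependences(i1, i3))  # {'WAR'}
```

Only RAW is a true dependence; WAR and WAW arise from register reuse and can in principle be removed by renaming.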

Slide 6. Achieving ILP: Pipelining
• Control dependencies affect pipelined execution
• Data dependencies affect pipelined execution
• Resource conflicts affect pipelined execution

Slide 7. Achieving ILP: Superscalar Architectures
• In-order issue with in-order completion
• In-order issue with out-of-order completion
• Out-of-order issue with out-of-order completion

Slide 11. Achieving ILP: VLIW Processors
• Lower circuit overhead than superscalar processors
• Limited number of resources
• Explicit insertion of NOPs increases code size
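
The code-size cost of explicit NOPs can be sketched as follows. The issue width of 4 and the operation names are assumptions chosen for illustration; the slide does not give a specific machine configuration.

```python
ISSUE_WIDTH = 4  # assumed number of slots per long instruction word


def pad_bundles(schedule, width=ISSUE_WIDTH):
    """Pad each cycle's operations with explicit NOPs up to the issue width."""
    bundles = []
    for ops in schedule:
        if len(ops) > width:
            raise ValueError("more operations than issue slots")
        bundles.append(list(ops) + ["nop"] * (width - len(ops)))
    return bundles


# Hypothetical schedule: the operations the compiler placed in each cycle.
schedule = [["add", "mul", "ld"], ["sub"], ["st", "br"]]
bundles = pad_bundles(schedule)

total_slots = sum(len(b) for b in bundles)
nop_slots = sum(b.count("nop") for b in bundles)
print(nop_slots, total_slots)  # 6 12
```

In this made-up schedule half of the stored slots are NOPs, which is the overhead the later NOP-elimination slides target.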

Slide 13. Extracting ILP: Basic Block Scheduling
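
The transcript keeps only the title of this slide. As a rough sketch of what scheduling within one basic block involves, the following greedy list scheduler packs operations into cycles while respecting dependences. Unit latency, the program-order priority, and the operation names are all assumptions; the talk's actual scheduler is not described.

```python
def list_schedule(ops, deps, width):
    """Greedy list scheduling of one basic block.

    ops   : operation names in program order
    deps  : dict mapping an operation to the set of operations it must follow
    width : operations that may issue per cycle (unit latency assumed)
    """
    done = set()
    remaining = list(ops)
    cycles = []
    while remaining:
        # An operation is ready once all its predecessors issued in earlier cycles.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependences")
        issued = ready[:width]
        cycles.append(issued)
        done.update(issued)
        remaining = [op for op in remaining if op not in issued]
    return cycles


ops = ["a", "b", "c", "d", "e"]
deps = {"c": {"a", "b"}, "d": {"c"}, "e": {"a"}}
print(list_schedule(ops, deps, width=2))  # [['a', 'b'], ['c', 'e'], ['d']]
```

Five operations finish in three cycles here; the chain a, c, d is the critical path that bounds the schedule length.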

Slide 14. Extracting ILP: Superblock Scheduling

Slide 15. Extracting ILP: Predicated Execution
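
Predicated execution replaces a branch with operations guarded by a predicate (if-conversion). The sketch below models that idea in plain Python; the function names are hypothetical and the loop only imitates hardware that squashes operations whose predicate is false.

```python
def branchy(a, b):
    # Control dependence: which assignment runs depends on the branch outcome.
    if a > b:
        x = a - b
    else:
        x = b - a
    return x


def predicated(a, b):
    # If-converted form: one compare sets a predicate and its complement,
    # both guarded operations issue, and only the one whose predicate is
    # true commits its result.
    p = a > b
    not_p = not p
    x = None
    for pred, value in ((p, a - b), (not_p, b - a)):
        if pred:  # models squashing of a false-predicated operation
            x = value
    return x


print(branchy(3, 5), predicated(3, 5))  # 2 2
```

The predicated version has no branch to mispredict, at the cost of issuing operations from both paths.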

Slide 16. Power Consumption in CMOS Circuits: Parallelism for Energy Efficiency
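
Only the slide title survives in the transcript. The usual first-order model of CMOS switching power is alpha * C * Vdd^2 * f, and the standard argument for parallelism is that duplicated hardware can run at a lower clock, which permits a lower supply voltage. The sketch below works through that arithmetic; the voltages, the two-way duplication, and the neglect of routing and multiplexing overhead are illustrative assumptions, not figures from the talk.

```python
def dynamic_power(alpha, cap, vdd, freq):
    """First-order CMOS switching power: alpha * C * Vdd^2 * f."""
    return alpha * cap * vdd ** 2 * freq


# Baseline: one unit at full clock and supply (illustrative values).
baseline = dynamic_power(alpha=0.5, cap=1e-9, vdd=3.3, freq=100e6)

# Two parallel units: twice the switched capacitance, half the clock,
# and an assumed lower supply that still meets the relaxed timing.
parallel = dynamic_power(alpha=0.5, cap=2e-9, vdd=2.0, freq=50e6)

ratio = parallel / baseline
print(round(ratio, 2))  # 0.37
```

Throughput is the same in both cases; the saving comes entirely from the quadratic dependence on supply voltage.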

Slide 18. Available Mobile and VLIW Processors
• The ARM Family
    - The ARM7 Generation
    - The StrongARM
    - The ARM Thumb Option
    - The ARM Piccolo Option
    - The ARM9 and ARM10

Slide 19. Available Mobile and VLIW Processors
• The Motorola M-Core
• The LSI TinyRisc
• The Hitachi SuperH Family
• VLIW Processors
    - The Motorola-Lucent Star*Core
    - The Philips TriMedia
    - The HP/Intel IA-64

Slide 20. High-Level Evaluation of a Low-Power VLIW Processor
• Energy consumption distribution

Slide 21. High-Level Evaluation of a Low-Power VLIW Processor
• NOP elimination in a VLIW processor
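
The transcript does not say how NOPs are eliminated. One generic approach, shown here purely as an assumed illustration and not as the scheme evaluated in the talk, stores a per-bundle slot mask plus only the real operations, and re-inserts the NOPs at fetch time.

```python
def compress(bundle):
    """Return (mask, ops): bit i of mask is set when slot i holds a real operation."""
    mask = 0
    ops = []
    for slot, op in enumerate(bundle):
        if op != "nop":
            mask |= 1 << slot
            ops.append(op)
    return mask, ops


def expand(mask, ops, width):
    """Rebuild the full-width bundle, re-inserting NOPs where the mask bit is clear."""
    it = iter(ops)
    return [next(it) if mask & (1 << slot) else "nop" for slot in range(width)]


bundle = ["add", "nop", "ld", "nop"]  # hypothetical 4-slot long instruction
mask, ops = compress(bundle)
print(bin(mask), ops)  # 0b101 ['add', 'ld']
print(expand(mask, ops, len(bundle)) == bundle)  # True
```

Stored code shrinks from four slots to two operations plus a small mask, and fewer words need to be fetched from code memory.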

Slide 22. High-Level Evaluation of a Low-Power VLIW Processor
• Speed-up comparison

Slide 23. High-Level Evaluation of a Low-Power VLIW Processor
• Energy comparison

Slide 24. High-Level Evaluation of a Low-Power VLIW Processor
• Energy-delay product comparison
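
The energy-delay product (EDP) is the energy a program consumes multiplied by its execution time, so it penalizes designs that save energy only by running slowly. The comparison chart itself is not in the transcript; the numbers below are made-up values for two hypothetical designs, used only to show the calculation.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP: lower is better, balancing energy against execution time."""
    return energy_joules * delay_seconds


# Illustrative values only, not measurements from the talk.
design_a = energy_delay_product(energy_joules=2.0e-3, delay_seconds=10e-3)
design_b = energy_delay_product(energy_joules=1.5e-3, delay_seconds=12e-3)

# B is slower than A but has the lower EDP because its energy saving outweighs the delay.
print(design_b < design_a)  # True
```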

Slide 25. The DEVIL Low-Power Processor
• Complexity in VLIW architectures
    - Hardware duplication
• FUs and number of registers as well as ports
• Number of FUs versus type of FU
• Number of FUs versus available ILP

Slide 26. The DEVIL Low-Power Processor: Code Memory

Slide 27. The DEVIL Low-Power Processor

Slide 28. The DEVIL Low-Power Processor
• Instruction fetch mechanism

Slide 29. The DEVIL Low-Power Processor
• Branch prediction mechanism

Slide 30. The DEVIL Low-Power Processor
• Performance with and without superscalar optimizations

Slide 31. The DEVIL Low-Power Processor
• Effect of superscalar optimization on code size

Slide 32. The DEVIL Low-Power Processor
• Effect of NOP elimination on code size

Slide 33. The DEVIL Low-Power Processor
• Effect of NOP elimination on the number of accesses to code memory

Slide 34. The DEVIL Low-Power Processor
• Effect of instruction fetch mechanism on code size

Slide 35. The DEVIL Low-Power Processor
• Code size comparison with existing mobile processors

Slide 36. A Step Towards Predicated Execution
• Compiler techniques for reducing predicate code size
    - Reduction of number of control instructions
    - Predicate promotion and instruction merging
    - Instruction reduction for advanced code generation

Slide 37. A Step Towards Predicated Execution: Reduction of Number of Control Instructions

Slide 38. A Step Towards Predicated Execution: Predicate Promotion and Instruction Merging

Slide 39. A Step Towards Predicated Execution
• Introducing predication support into the processor
    - Effect of full predication on code size
    - Predication code size and execution characteristics
    - Prefix-based predication

Slide 40. A Step Towards Predicated Execution
• Relative number of predicated instructions

Slide 41. A Step Towards Predicated Execution
• Code expansion considering predication

Slide 42. A Step Towards Predicated Execution
• Code reductions due to predicated execution

Slide 43. Conclusions
• A synergistic hardware-compiler approach for low-power processors
• A new VLIW architecture to reduce the increase in code size
• A prefix-based predicated execution architecture framework

