Application of Binary Translation to Java Reconfigurable Architectures Antonio Carlos S. Beck Filho Luigi Carro Instituto.


1 Application of Binary Translation to Java Reconfigurable Architectures Antonio Carlos S. Beck Filho caco@inf.ufrgs.br Luigi Carro carro@inf.ufrgs.br Instituto de Informática - GME Universidade Federal do Rio Grande do Sul

3 / 24 Introduction: The embedded system market is expanding. More performance is required.

4 / 24 Introduction: Moreover… The complexity of these embedded systems is increasing as well. Shorter design cycle. Battery dependent.

5 / 24 Introduction: These embedded systems are adopting Java. Devices with Java, such as cellular phones and PDAs: 176 million in 2001, 721 million in 2006 [1]. 80% of cellular phones will support Java [2]. 10 times more embedded system developers than general-purpose software ones by the year 2010 [3].
[1] D. Takahashi, “Java Chips Make a Comeback”, Red Herring, 2001.
[2] G. Lawton, “Moving Java into Mobile Phones”, Computer, vol. 35, n. 6, 2002, pp. 17-20.
[3] R.W. Atherton, “Moving Java to the Factory”, IEEE Spectrum, 1998, pp. 18-23.

6 / 24 Introduction: The Java Language... Object oriented: modeling, programming, validation. Widely spread. Safe. Small ROM memory size (CISC). Multiplatform.

11 / 24 Motivation: How to increase the performance with low power consumption? Using a reconfigurable array! Special tools and compilers are needed! No software portability! And the design cycle?

12 / 24 Outline: Java processors. Using Binary Translation with reconfigurable arrays. The reconfigurable array. Results (area, performance, power consumption). Conclusions and Future Work.

13 / 24 Femtojava Low-Power

14 / 24 Femtojava Low-Power: Five stages: Instruction Fetch, Decoder, Operand Fetch, Execution, Write Back.

15 / 24 Femtojava Low-Power (Instruction Fetch): It has an instruction queue 9 bytes long to handle variable-size instructions (example: IADD).

16 / 24 Femtojava Low-Power (Decoder): Responsible for generating the microOPs and for checking data dependences (example: IADD, 11011…).

18 / 24 Femtojava Low-Power (Operand Fetch): It has a register bank with two ports. The stack and local variable storage are implemented in this register file. This allows comparisons with RISC machines! Example: stack contents (top first) 4, 2, 7, 8, 3, 9; POP fetches 2 and 4.

19 / 24 Femtojava Low-Power (Execution): Six functional units: multiplier, ALU, shifter, constant generator, branch and LD/ST. Example: 2 + 4 = 6.

20 / 24 Femtojava Low-Power (Write Back): Write the results back to the stack or local variable storage. Example: stack contents (top first) 6, 7, 8, 3, 9.
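The stack example carried through the pipeline slides (4 and 2 popped, 6 written back) can be sketched in a few lines. This is only a toy model under assumptions: the class `RegisterFileStack` and the function `iadd` are hypothetical names, and the real Femtojava register file is not described in enough detail here to claim this matches it.

```python
class RegisterFileStack:
    """Toy operand stack kept in a register file (hypothetical model)."""

    def __init__(self, values_top_first):
        # Registers hold the stack bottom-first; the last entry is the top.
        self.regs = list(reversed(values_top_first))

    def pop(self):
        return self.regs.pop()

    def push(self, value):
        self.regs.append(value)

    def top_first(self):
        return list(reversed(self.regs))


def iadd(stack):
    """Operand fetch pops two values, execution adds, write back pushes."""
    a = stack.pop()
    b = stack.pop()
    stack.push(a + b)


stack = RegisterFileStack([4, 2, 7, 8, 3, 9])
iadd(stack)
print(stack.top_first())  # [6, 7, 8, 3, 9]
```

The final stack matches the one shown on the write-back slide.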

21 / 24 VLIW Architecture (Instruction Fetch): The VLIW packet has a variable size. In this case, a VLIW packet can hold 1 or 2 instructions. With 2 instructions/VLIW packet: Instruction 1, Instruction 2.

22 / 24 VLIW Architecture (Decoder): Two decoders, Decoder 1 and Decoder 2, handle Instruction 1 (11011…) and Instruction 2 (11011…). Decoder 2 doesn't support method calls and returns.

23 / 24 VLIW Architecture (Operand Fetch): Each flow has its own operand stack. The local variable pool of the method is shared. No mechanism is necessary for communication among the flows! Register Bank 1: operand stack and local variable pool. Register Bank 2: operand stack.

24 / 24 VLIW Architecture (Execution): Six functional units: multiplier, ALU, shifter, constant generator, branch and LD/ST. They are replicated in each flow.

25 / 24 VLIW Architecture (Write Back): Write the results back to the operand stack of each flow OR to the local variable storage of the 1st register bank.
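As a rough illustration of the variable-size packet idea, the sketch below pairs the instructions of two independent flows into packets of one or two instructions. Everything here is an assumption made for illustration: the function `make_packets`, the `CALL_RETURN` set, and the rule that flow 1 is the longer flow are not taken from the slides, which only state that packets hold 1 or 2 instructions and that Decoder 2 does not support method calls and returns.

```python
from itertools import zip_longest

# Hypothetical set of JVM bytecodes that the second decoder cannot handle.
CALL_RETURN = {"invokestatic", "invokevirtual", "ireturn", "return"}


def make_packets(flow1, flow2):
    """Pair instructions of two independent flows into packets of size 1 or 2."""
    if len(flow2) > len(flow1):
        raise ValueError("this sketch assumes flow 1 is the longer flow")
    if any(op in CALL_RETURN for op in flow2):
        raise ValueError("flow 2 must not contain method calls or returns")
    packets = []
    for first, second in zip_longest(flow1, flow2):
        # A packet carries the second slot only while flow 2 has work left.
        packets.append((first,) if second is None else (first, second))
    return packets


flow1 = ["bipush", "bipush", "imul", "bipush", "bipush", "ishl", "iadd", "istore"]
flow2 = ["bipush", "bipush", "imul"]
for packet in make_packets(flow1, flow2):
    print(packet)
```

With these inputs the first three packets carry two instructions and the remaining five carry one.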

26 / 24 Why use a reconfigurable array? Hypothesis: substituting a sequence of instructions with a combinational circuit saves power (we lose area). Let us see the multiplication algorithm example: $TC_{alg} = n \cdot (T_{PFF} + n \cdot T + T_{set})$ and $TC_{CC} = n \cdot n \cdot T$ (very pessimistic).
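A small worked example makes the comparison concrete. The function name `cycle_times` and the delay values are assumptions chosen only for illustration; `t_gate` stands in for the per-stage delay term T whose subscript is not legible in the transcript, and `t_pff` and `t_set` stand for the two register-related terms of the sequential formula.

```python
def cycle_times(n, t_pff, t_gate, t_set):
    """Evaluate both formulas from the slide for an n-bit multiplication."""
    tc_alg = n * (t_pff + n * t_gate + t_set)  # sequential algorithm
    tc_cc = n * n * t_gate                     # combinational circuit
    return tc_alg, tc_cc


# Hypothetical unit delays, not measurements from the presentation.
tc_alg, tc_cc = cycle_times(n=8, t_pff=2, t_gate=1, t_set=1)
print(tc_alg, tc_cc)  # 88 64
```

The difference between the two expressions is n * (t_pff + t_set), so under these formulas the combinational version is never slower, even with the pessimistic n * n term.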

27 / 24 The Binary Translation: BT takes a binary code and produces another binary for a different machine. BT advantages when used with reconfiguration: one can detect parallelism and reconfigure the array at run-time; no need for special tools or compilers anymore; we solve the software-compatibility problem.

28 / 24 The Binary Translation: How does it work? Observe the bytecodes, looking for frequently executed sequences. Save such a sequence in a special cache. When this sequence of instructions is found again, the array is reconfigured and set as the active functional unit.
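The detect, cache, and reuse steps can be sketched as follows. This is a software analogy under assumptions, not the hardware described in the talk: the class `SequenceDetector`, the `threshold` parameter, and the choice of keying the cache by the start address of a sequence are all hypothetical.

```python
from collections import Counter


class SequenceDetector:
    """Toy model: count sequences by start address and cache the hot ones."""

    def __init__(self, threshold=1):
        self.threshold = threshold  # executions needed before caching
        self.seen = Counter()
        self.cache = {}             # start address -> cached sequence

    def observe(self, pc, sequence):
        """Return where the sequence runs this time: 'array' or 'pipeline'."""
        if pc in self.cache:
            # Sequence found again: the array is used as functional unit.
            return "array"
        self.seen[pc] += 1
        if self.seen[pc] >= self.threshold:
            self.cache[pc] = tuple(sequence)
        return "pipeline"


detector = SequenceDetector()
loop_body = ["bipush", "bipush", "imul"]
print(detector.observe(0x10, loop_body))  # pipeline
print(detector.observe(0x10, loop_body))  # array
```

Raising `threshold` models a detector that only caches sequences after several executions.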

29 / 24 Bytecodes Detection: Considering these bytecodes: bipush 10, bipush 5, imul, bipush 3, bipush 4, ishl, iadd, istore, bipush 6, bipush 7, imul

33 / 24 Bytecodes Detection: The instructions depend on each other!

35 / 24 Bytecodes Detection: These two blocks are independent!

36 / 24 Bytecodes Detection: Operand Block 1: First Sequence. Operand Block 2: Second Sequence.
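One way to see why the eleven bytecodes above fall into two blocks is to track the operand stack depth: a block can be closed whenever the depth returns to zero, since nothing is left on the stack for later instructions to consume. This is a sketch under that assumption only; the slides do not say how the detector actually finds block boundaries, and `split_blocks` and `STACK_EFFECT` are hypothetical names.

```python
# Net operand-stack effect of each bytecode used in the example.
STACK_EFFECT = {"bipush": +1, "imul": -1, "ishl": -1, "iadd": -1, "istore": -1}


def split_blocks(bytecodes):
    """Split a bytecode list wherever the operand stack becomes empty."""
    blocks, current, depth = [], [], 0
    for op, arg in bytecodes:
        current.append((op, arg))
        depth += STACK_EFFECT[op]
        if depth == 0:
            blocks.append(current)
            current = []
    if current:  # trailing instructions that leave a value on the stack
        blocks.append(current)
    return blocks


program = [
    ("bipush", 10), ("bipush", 5), ("imul", None),
    ("bipush", 3), ("bipush", 4), ("ishl", None),
    ("iadd", None), ("istore", None),
    ("bipush", 6), ("bipush", 7), ("imul", None),
]
for block in split_blocks(program):
    print([op for op, _ in block])
```

Under this rule the first block ends at `istore` (eight instructions) and the second block is the final `bipush 6, bipush 7, imul`. A real detector would also have to check dependences through local variables, which this sketch ignores.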

38 / 24 The Reconfigurable Array: The array is coarse-grain. This allows a great number of sequences to be saved in the cache, and the reconfiguration is fast. It is formed by one or more basic cells, each with one multiplier and a sequence of seven sets of basic functional units.

39 / 24 General Overview: Detector Unit, Reconfiguration Cache, Array.

40 / 24 CACO-PS (Cycle Accurate COnfigurable Power Simulator): Power simulator based on the switching activity: $P_d = \alpha \cdot f_c \cdot C \cdot V_{dd}^2$. The result is given in number of gate capacitances that switch.
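The dynamic power formula is easy to explore numerically. The sketch below is only an illustration of the formula itself, not of CACO-PS; the function name `dynamic_power` and all numeric values are assumptions.

```python
def dynamic_power(alpha, f_clk, capacitance, vdd):
    """Dynamic power: switching activity * clock frequency * capacitance * Vdd^2."""
    return alpha * f_clk * capacitance * vdd ** 2


# Hypothetical operating points: same circuit, supply voltage halved.
p_high = dynamic_power(alpha=0.5, f_clk=1e6, capacitance=1e-12, vdd=2.0)
p_low = dynamic_power(alpha=0.5, f_clk=1e6, capacitance=1e-12, vdd=1.0)
print(p_low / p_high)  # 0.25
```

Because the supply voltage enters squared, halving it cuts dynamic power to a quarter when the other factors are unchanged.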

41 / 24 Results: A set of algorithms was executed on the architectures: Sin Calculation; Bubble Sort, Select Sort, Quick Sort (10 and 100 elements); Binary Search, Sequential Search; IMDCT (plus three unrolled versions); Floating Point Sums emulation; full MP3 player.

50 / 24 Performance (chart annotations): The same number of different sequences of instructions. Parallelism exposed by loop unrolling. No more parallelism available! There is room for improvement! Compare these two and you can save reconfiguration memory.

51 / 24 Energy in memory accesses

52 / 24 Energy in the cores

53 / 24 Total Energy Consumption

54 / 24 Area (VLIW 2)

55 / 24 Final Results (VLIW 2)

56 / 24 Conclusions: With BT, a reconfigurable array and Java we achieve at the same time: the Java concept of write once, run everywhere; software portability for different machines; performance; low energy consumption thanks to combinational circuits and parallelism (we can still reduce Vdd); HW upgrades with SW compatibility.

57 / 24 Future Works (I): Use Binary Translation with CMP. At run-time, detect which core is best suited to execute the software at a given time.

58 / 24 Future Works (II): Implement the BT and reconfigurable array in traditional RISC machines. What are the differences in implementation?

59 / 24 Questions? carro@inf.ufrgs.br caco@inf.ufrgs.br The end... ?

