A Bus Architecture for Crosstalk Elimination in High Performance Processor Design Wen-Wen Hsieh Advisor : Ting Ting Hwang.


1 A Bus Architecture for Crosstalk Elimination in High Performance Processor Design Wen-Wen Hsieh Advisor : Ting Ting Hwang

2 Outline Introduction; Motivation and Observation; The Proposed Bus Architecture; Experimental Results; Conclusion

3 Introduction Crosstalk is the effect caused by coupling capacitances. Crosstalk causes additional delay and power consumption, and can make a circuit produce incorrect results. The crosstalk effect becomes much more serious on long on-chip buses.

4 Crosstalk Type Crosstalk is classified into 4 types [Duan2001]: 1C, 2C, 3C, and 4C. [Figure: for each type, transitions on wires W j-1, W j, W j+1 between cycles T i-1 and T i]
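The classification can be made concrete with a small sketch. It assumes the model commonly used with this classification: each wire's transition is +1 (rising), -1 (falling), or 0 (quiet), and a switching victim wire sees an effective coupling of |dv - dl| + |dv - dr| units of C from its left and right neighbours. The function name `crosstalk_class` and the treatment of a quiet victim as class 0 are assumptions for illustration, not taken from the slides.

```python
def crosstalk_class(prev, curr):
    """Return the crosstalk class (0..4, in units of C) seen by the middle wire.

    prev and curr are (left, victim, right) bit triples for two consecutive cycles.
    """
    d_left, d_victim, d_right = (c - p for p, c in zip(prev, curr))
    if d_victim == 0:
        # Assumption: a quiet victim has no transition to delay.
        return 0
    return abs(d_victim - d_left) + abs(d_victim - d_right)


print(crosstalk_class((1, 0, 1), (0, 1, 0)))  # both neighbours switch against the victim: 4
print(crosstalk_class((0, 0, 1), (0, 1, 0)))  # one neighbour opposite, one quiet: 3
print(crosstalk_class((0, 0, 0), (1, 1, 1)))  # all three switch together: 0
```

Under this model the 3C and 4C cases are exactly those in which at least one neighbour switches in the opposite direction to the victim.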

5 Delay with / without Crosstalk Delay comparison for a 10 mm bus in a 100 nm process [Duan2001]. [Figure: delay in time (ps)]

6 Bit Ratio of 3C and 4C The 3C and 4C types of crosstalk cause a serious delay penalty but account for only a small portion of the total transmitted data. [Table columns: benchmark, bits of instruction, bits of 3C and 4C, ratio of 3C and 4C (%); rows: multiply, update, convolution, dot_product, fir2dim, fir, iir_nsection, matrix, lms]
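A ratio of this kind could be measured along the following lines. This is only a sketch: the slides do not say how the 3C and 4C bits were counted, so the counting rule here (one count per wire per transfer, a wire at the edge of the bus having no coupling on its missing side) and the name `severe_bit_ratio` are assumptions.

```python
def severe_bit_ratio(words, width):
    """Fraction of wire-transfers that fall in class 3C or 4C.

    words is the sequence of values driven on a bus of `width` wires.
    """
    severe = total = 0
    for prev, curr in zip(words, words[1:]):
        # Per-wire transition: +1 rising, -1 falling, 0 quiet.
        delta = [((curr >> i) & 1) - ((prev >> i) & 1) for i in range(width)]
        for i, d in enumerate(delta):
            total += 1
            if d == 0:
                continue
            # Assumption: a missing neighbour at the bus edge adds no coupling.
            lower = abs(d - delta[i - 1]) if i > 0 else 0
            upper = abs(d - delta[i + 1]) if i + 1 < width else 0
            if lower + upper >= 3:
                severe += 1
    return severe / total if total else 0.0


# Two transfers on a 4-wire bus; only the two inner wires of the first
# transfer see both neighbours switching against them.
print(severe_bit_ratio([0b0101, 0b1010, 0b1010], width=4))  # 0.25
```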

7 Fetch Rate and Commit Rate In a superscalar architecture, the instruction fetch rate on the bus is much higher than the instruction commit rate. [Figure: commit rate from 0% to 100% for multiply, dot_product, update, matrix, convolution, iir_nsection, fir, lms, fir2dim; average commit rate 36.03%]

8 Basic Architecture [Figure: block diagram of Memory, de-assembler, bus, assembler, Prefetch unit, and Processor, with width labels b and b+n]

9 Bus Structure bus width = 128, channel number = 4, channel size = 32. [Figure: the bus between Memory and the Prefetch unit is divided into channel 1 to channel 4, carrying data T,1 to data T,4]

10 An Example at Cycle t bus width = 128, channel number = 4, channel size = 32. Data sent at cycle t-1 are recorded. [Figure: Memory sends segments to the Prefetch unit over channel 1 to channel 4; each candidate segment (data t,1, data t,2, data t,3, ...) is checked for crosstalk against the recorded data t-1,1 to data t-1,4, and NOP segments appear alongside data t,1 and data t,2]
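One plausible reading of this example is sketched below. The slides do not spell out the rule, so several things are assumed and named hypothetically: segments are kept in order, a channel carries a NOP whenever the next pending segment would cause 3C or 4C crosstalk against the word recorded for that channel, each channel is checked in isolation, and `NOP` is just a placeholder marker because the NOP bit pattern is not given.

```python
NOP = None  # placeholder marker; the slides do not give the NOP bit pattern


def has_severe_crosstalk(prev, curr, width):
    """True if moving a channel from prev to curr puts any wire in class 3C or 4C."""
    delta = [((curr >> i) & 1) - ((prev >> i) & 1) for i in range(width)]
    for i, d in enumerate(delta):
        if d == 0:
            continue
        # Assumption: wires at the channel edge have no coupling on the missing side.
        lower = abs(d - delta[i - 1]) if i > 0 else 0
        upper = abs(d - delta[i + 1]) if i + 1 < width else 0
        if lower + upper >= 3:
            return True
    return False


def schedule_cycle(recorded, pending, width=32):
    """Assign pending segments to channels for one cycle.

    recorded: word last sent on each channel (cycle t-1).
    pending:  segments waiting to be sent, in order.
    Returns (what each channel carries this cycle, segments still pending).
    """
    pending = list(pending)
    sent = []
    for prev in recorded:
        if pending and not has_severe_crosstalk(prev, pending[0], width):
            sent.append(pending.pop(0))
        else:
            sent.append(NOP)  # defer the segment to the next channel or cycle
    return sent, pending


# Two 4-wire channels: the first segment clashes with channel 1 and moves to channel 2.
sent, left_over = schedule_cycle([0b0101, 0b0000], [0b1010, 0b1111], width=4)
print(sent, left_over)  # [None, 10] [15]
```

Deferred segments are the source of extra cycles in this sketch, which is the kind of cost the cycle count overhead slide quantifies.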

11 Separation Bits Purposes: crosstalk elimination between adjacent data segments; distinguishing a data segment from a NOP segment. [Figure: separation bits placed between data t,i and data t,i+1, marked as causing no crosstalk whether the neighbouring segment is data or NOP]

12 Crosstalk Free Connection [Figure: connection between data t,i and data t,i+1]

13 Crosstalk Free Connection

14 Crosstalk Free Cyclic Any pair in the crosstalk free cyclic incurs no crosstalk.

15 Data Segment Combination Two cases: data t,i is REAL DATA; data t,i is NOP. [Figure: data t,i followed by data t,i+1 in each case]

16 Separation Bits Assignment Two cases: data t,i is REAL DATA; data t,i is NOP. [Figure: data t,i and data t,i+1 in each case, with separation bits labelled 1 0 and 0 0]

17 De-Assembler Architecture (the same diagram repeats through slide 22) Components: NOP, four reg, cross-detector, Sel_logic, four MUX1, four MUX2, separation unit. Data fields: data 1 [127:96], data 2 [95:64], data 3 [63:32], data 4 [31:0]. Bus fields: channel 1 [134:103], separation bits [102:101], channel 2 [100:69], separation bits [68:67], channel 3 [66:35], separation bits [34:33], channel 4 [32:1], separation bits [0].
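The bit ranges on the de-assembler diagram describe a 135-wire frame: four 32-bit channels plus 2 + 2 + 2 + 1 = 7 separation bits. The sketch below packs and unpacks such a frame as a Python integer. The field positions come from the slide; the function names are hypothetical, and the separation-bit values are treated as opaque because their encoding is not recoverable from the transcript.

```python
# (low bit, width) of each field, taken from the bit ranges on the slide.
CHANNEL_FIELDS = [(103, 32), (69, 32), (35, 32), (1, 32)]
SEPARATION_FIELDS = [(101, 2), (67, 2), (33, 2), (0, 1)]
FRAME_WIDTH = 135  # 128 data wires + 7 separation wires


def pack_frame(channels, separations):
    """Place four channel words and four separation fields into one frame."""
    frame = 0
    fields = list(zip(channels, CHANNEL_FIELDS)) + list(zip(separations, SEPARATION_FIELDS))
    for value, (low, width) in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value:#x} does not fit in {width} bits")
        frame |= value << low
    return frame


def unpack_frame(frame):
    """Recover the channel words and separation fields from a frame."""
    def extract(low, width):
        return (frame >> low) & ((1 << width) - 1)

    channels = [extract(low, width) for low, width in CHANNEL_FIELDS]
    separations = [extract(low, width) for low, width in SEPARATION_FIELDS]
    return channels, separations


channels = [0xDEADBEEF, 0x01234567, 0x89ABCDEF, 0xFFFFFFFF]
separations = [0b10, 0b00, 0b01, 0b1]  # arbitrary illustrative values
frame = pack_frame(channels, separations)
assert unpack_frame(frame) == (channels, separations)
assert frame.bit_length() <= FRAME_WIDTH
```

The fields tile bits 0 to 134 without gaps or overlaps, which matches the 7 extra wires quoted for the 128-bit bus with channel size 32.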

23 Assembler Architecture Components: Prefetch unit (buffer queue), MUX 1 to MUX 4, DSel_logic. Bus fields: channel 1 [134:103], separation bits [102:101], channel 2 [100:69], separation bits [68:67], channel 3 [66:35], separation bits [34:33], channel 4 [32:1], separation bits [0].

24 Performance Improvement [Table: technology (100 nm, 70 nm) by bus length (10 mm, 15 mm, 20 mm); rows: 0C, 1C, 2C, 3C, 4C, deassembler, assembler, improvement ratio (%)]

25 Extra Wires Number Comparison The number of extra wires is compared with Victor's work [Victor2001]. [Table: by bus width and channel size; columns: Ours, Victor's theoretical, Victor's practical]

26 Cycle Count Overhead Ratio (one column per channel size)
benchmark          size 1   size 2   size 3   size 4
complex_multiply   0.17%    0.04%    0.09%    0.26%
complex_update     0.13%    0.04%    0.17%    0.50%
Convolution        0.13%    0.32%    0.06%    0.28%
dot_product        0.08%    0        0.13%    0.21%
Fir2dim            0.03%    0.08%    0.06%    0.18%
Fir                0.11%    0.01%    0.02%    0.08%
iir_Nsection       0.14%    0.11%    0.06%    0.37%
iir_1section       0.17%    0.08%    0.13%    0.43%
Lms                0.09%    0.07%    0.12%    0.15%
Matrix             0.01%    0.01%    0.06%    0.02%
Matrix1x3          0.14%    0.07%    0.11%    0.18%
n_complex_update   0.05%    0%       0.07%    0.21%
n_real_update      0.08%    0.02%    0.08%    0.28%
real_update        0.05%    0.88%    0.18%    0.39%
average            0.10%    0.12%    0.09%    0.25%

27 Conclusion A novel bus structure to eliminate 3C and 4C crosstalk. % performance improvement ratio in the best case. Only 7 extra wires, compared with 85 in [Victor2001].

28 Appendix The area overhead for a 128-bit bus width with channel size 32. [Table: area type; columns: Ours, Victor's; rows: logic circuit deassembler / encoder (gate count, area (μm²)), # storage element (bits) 1280, logic circuit assembler / decoder (gate count, area (μm²)), # extra wires (bits): 7 (Ours), 85 (Victor's)]

29 Appendix The overall improvement on bus transmission.

