Guihai Yan, Yinhe Han, Xiaowei Li, and Hui Liu


1 BAT: Performance-Driven Crosstalk Mitigation Based on Bus-grouping Asynchronous Transmission
Guihai Yan, Yinhe Han, Xiaowei Li, and Hui Liu Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences

2 Outline Introduction Proposed BAT Scheme Implementation of BAT
Experimental Results Conclusions 2019/1/2

3 Introduction Technology improvement: lower voltage, higher frequency, higher transistor density, smaller feature size.
Q: What are the implications for bus wires?

4 Introduction This kind of capacitance is dominant

5 Introduction Crosstalk
[Figure: crosstalk example with wires Y and Z; 0.25um, length = 100um, fan-out = 2; speed: 1.8x slower]

6 Crosstalk Factor [P.P. Sotiriadis et al, 2001]
[Figure: crosstalk-insensitive vs. crosstalk-sensitive transitions]

7 Introduction As technology advances, the impact of crosstalk gets worse: the aspect ratio gets bigger and the bus width gets wider. Bus transmission is likely to encounter severe crosstalk delay. Q: How to alleviate the effects of crosstalk delay on bus transmission?

8 The conventional approaches
Codec [B. Victor et al, 01] [P.P. Sotiriadis et al, 01]. Pros: relatively low bandwidth overhead, but at least 47%. Cons: the codec algorithm is hard to construct for large bus widths.
Shield: passive shield, active shield [H. Kaul et al, 02] [R. Arunachalam et al, 03]. Pros: high performance. Cons: area-hungry, usually 100% area overhead.

9 Several new approaches
Delay-line bus [M. Ghoneima et al, 04]. Pros: nearly zero bandwidth overhead. Cons: very complicated synchronization; lack of scalability.
Variable cycle transmission [L. Li et al, 04]. Pros: low area overhead; high performance for relatively narrow buses. Cons: due to the “Cask Effect”, it is likely to fail for wide buses (64 bits or wider).

10 Variable cycle transmission (DYN)
Suppose a transition between two patterns.
If the two patterns are { } → { }: Transition: {–, ↓, ↑, ↑, ↓, ↑, –, ↓}; Delay vector: {1, 3, 2, 2, 4, 3, 1, 1}
If { } → { }: Transition: {–, ↓, ↑, –, ↓, ↑, –, ↓}; Delay vector: {1, 3, 3, 1, 3, 3, 1, 1}
Q: What if the bus gets wider? The probability that a delay of “4” occurs tends to get higher, which makes DYN less efficient.
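The two delay vectors on this slide can be reproduced with a short sketch. It assumes a crosstalk-factor model that the slide does not spell out: a quiet wire needs 1 cycle, and a switching wire needs as many cycles as the sum, over its neighbours, of the absolute difference between its own transition and the neighbour's (minimum 1). The -1/0/+1 encoding and the function names are illustrative, not taken from the presentation.

```python
def delay_vector(transitions):
    """Per-wire delay in cycles under the assumed crosstalk-factor model.

    transitions: list with +1 (rising), -1 (falling) or 0 (no transition).
    """
    n = len(transitions)
    delays = []
    for i, d in enumerate(transitions):
        if d == 0:
            delays.append(1)  # a quiet wire is assumed to need one cycle
            continue
        factor = 0
        if i > 0:
            factor += abs(d - transitions[i - 1])  # left neighbour
        if i < n - 1:
            factor += abs(d - transitions[i + 1])  # right neighbour
        delays.append(max(1, factor))
    return delays


def dyn_cycles(transitions):
    """DYN waits for the slowest wire of the whole bus."""
    return max(delay_vector(transitions))


first = [0, -1, +1, +1, -1, +1, 0, -1]   # {–, ↓, ↑, ↑, ↓, ↑, –, ↓}
second = [0, -1, +1, 0, -1, +1, 0, -1]   # {–, ↓, ↑, –, ↓, ↑, –, ↓}

print(delay_vector(first), dyn_cycles(first))    # [1, 3, 2, 2, 4, 3, 1, 1] 4
print(delay_vector(second), dyn_cycles(second))  # [1, 3, 3, 1, 3, 3, 1, 1] 3
```

Under this model a single wire whose two neighbours both switch in the opposite direction forces the whole bus to take 4 cycles, which is why wider buses are more likely to hit the worst case.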

11 Proposed BAT scheme We propose the BAT scheme by extending the variable cycle transmission (DYN) scheme. What is BAT?

12 BAT scheme
[Figure: without grouping, all transitions are crosstalk sensitive; with grouping, not all sub-transitions are crosstalk sensitive, so the sub-buses transmit asynchronously]

13 BAT How to group the bus into sub-buses? [Figure: alternative groupings] Q: Which one is best?
It depends on the crosstalk factor distribution (that is, it is application-specific).
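The effect of grouping can be illustrated with a minimal sketch. It assumes that per-wire delay vectors are already known, that each sub-bus only waits for its own slowest wire, and that the receiver can buffer an unlimited number of words. It also ignores how the grouping lines change the crosstalk at group boundaries. The delay vectors and function names are illustrative.

```python
def dyn_total_cycles(delay_vectors):
    """Ungrouped bus: every pattern waits for the slowest wire of the bus."""
    return sum(max(v) for v in delay_vectors)


def bat_total_cycles(delay_vectors, group_sizes):
    """Grouped bus: each sub-bus advances at its own pace (unlimited buffer)."""
    totals = [0] * len(group_sizes)
    for v in delay_vectors:
        start = 0
        for g, size in enumerate(group_sizes):
            totals[g] += max(v[start:start + size])
            start += size
    # The transfer ends when the slowest sub-bus has sent its last pattern.
    return max(totals)


# Four illustrative delay vectors for an 8-bit bus.
vectors = [
    [1, 3, 2, 2, 4, 3, 1, 1],
    [1, 3, 3, 1, 3, 3, 1, 1],
    [2, 4, 3, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 3, 4, 2],
]

print(dyn_total_cycles(vectors))          # 15
print(bat_total_cycles(vectors, [4, 4]))  # 12
```

The gain comes from patterns whose slow wires fall into one group only, so how much a given grouping helps depends on where the high crosstalk factors tend to occur.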

14 Crosstalk Factor Distribution
Instruction bus vs. data bus. Grouping according to CF locality: unequal grouping or equal grouping.

15 Implementation of BAT [Figure: grouping lines and valid-indicating lines]

16 Differential Counter Cluster Synchronizing Mechanism
C(i, j) is a bi-directional counter with range –L to +L (L: buffer length). ‘OF’ is short for ‘OverFlow’, ‘UF’ is short for ‘UnderFlow’, and ‘+’ means logical OR.
Hold the i-th sub-bus if and only if {OF(C(i, 1)) + OF(C(i, 2)) + · · · + OF(C(i, i−1)) + OF(C(i, i+1)) + · · · + OF(C(i, n))} is true.
Hold the j-th sub-bus if and only if {UF(C(1, j)) + UF(C(2, j)) + · · · + UF(C(j−1, j)) + UF(C(j+1, j)) + · · · + UF(C(n, j))} is true.
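The two hold rules can be written as a small sketch. It assumes that OF(C) is true when a counter has reached +L and UF(C) is true when it has reached –L. The slide does not say how the counters are updated, so the counter values below are illustrative only, and the function name is hypothetical.

```python
def hold_signals(counters, buffer_len):
    """Return one hold flag per sub-bus from the counter matrix C(i, j).

    counters[i][j] is the value of C(i, j); the diagonal is ignored.
    Assumption: overflow means C >= +L, underflow means C <= -L.
    """
    n = len(counters)
    hold = [False] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if counters[i][j] >= buffer_len:   # OF(C(i, j)) holds sub-bus i
                hold[i] = True
            if counters[i][j] <= -buffer_len:  # UF(C(i, j)) holds sub-bus j
                hold[j] = True
    return hold


L = 4
counters = [
    [0, 4, 1],
    [0, 0, -4],
    [0, 0, 0],
]
print(hold_signals(counters, L))  # [True, False, True]
```

Here C(1, 2) has overflowed, so the first sub-bus is held, and C(2, 3) has underflowed, so the third sub-bus is held.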

17 DAS Scheme (merge the grouping lines and valid-indicating lines)
Delay Line Active Shield [Figure: simultaneous switch vs. delayed switch]

18 DAS Scheme Delay Active Shield. Skew: T/2
Reuse the data-valid indicating line as the grouping line to reduce wire overhead.

19 Experiment SimpleScalar 3.0, SPEC CPU2000 benchmarks
On-chip buses:
Instruction bus: instruction buffer to L1 I$
Data bus: datapath to L1 D$
Compared against:
ORI (original conservative approach): 4 cycles/pattern
DYN (variable cycle transmission): 1~4 cycles/pattern
CDC (codec approaches): 2 cycles/pattern

20 Results /1 BAT applied to 64-bit instruction bus
Compared with the ORI, DYN and Codec approaches, the average performance improvement of the BAT scheme with the 4-group configuration is 55.3%, 30.4% and 10.5%, respectively.

21 Results /2 BAT applied to 32-bit data bus
Compared with the DYN scheme, BAT still gains a 12.5% performance improvement on average with the 4-group configuration.

22 Overhead Analysis /1 Wire routing overhead and avg. cycle/pattern
64-bit bus, 4-group configuration
Approach   Normalized Area   Avg. cycle/pattern
ORI        100               4
DYN        103               2.57
CDC        145               2
PSD        199
ASD        201               1
BAT        113               1.79
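As a consistency check, the improvement figures quoted for the 64-bit instruction bus agree with the relative reduction in average cycles per pattern listed on this slide. The sketch below only reproduces that arithmetic. Whether the authors computed their percentages this way is an assumption, and the dictionary keys are illustrative labels.

```python
# Average cycles per pattern as listed on the slide (64-bit bus, 4-group BAT).
avg_cycles = {"ORI": 4.0, "DYN": 2.57, "Codec": 2.0, "BAT": 1.79}


def improvement_percent(baseline, new):
    """Relative reduction in cycles per pattern, in percent."""
    return (baseline - new) / baseline * 100.0


for name in ("ORI", "DYN", "Codec"):
    gain = improvement_percent(avg_cycles[name], avg_cycles["BAT"])
    print(f"BAT vs {name}: {gain:.2f}%")
```

The three values come out close to 55.3%, 30.4% and 10.5%, and the normalized area of 113 for BAT corresponds to the 13% routing overhead.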

23 Overhead Analysis /2 How much buffer is sufficient to synchronize data reception? [Figure: buffer size vs. avg. cycle/pattern] A buffer of 4 to 8 words is optimal.
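Why a small buffer can already be enough can be seen in a toy model. It assumes a sub-bus may start pattern t only once every sub-bus has finished pattern t − L, where L is the buffer length in words. This is a simplification made for illustration, not the authors' simulator, and the per-group delays below are made up.

```python
def total_cycles(group_delays, buffer_len):
    """Cycles needed to send all patterns with a bounded lead between sub-buses.

    group_delays[t][g]: cycles sub-bus g needs for pattern t (illustrative).
    buffer_len: maximum number of patterns a sub-bus may run ahead (>= 1).
    """
    if buffer_len < 1:
        raise ValueError("buffer_len must be at least 1")
    n_groups = len(group_delays[0])
    finish = []  # finish[t][g]: cycle at which sub-bus g completes pattern t
    for t, delays in enumerate(group_delays):
        prev = finish[t - 1] if t > 0 else [0] * n_groups
        # Wait until every sub-bus has completed pattern t - buffer_len.
        gate = max(finish[t - buffer_len]) if t >= buffer_len else 0
        finish.append([max(prev[g], gate) + delays[g] for g in range(n_groups)])
    return max(finish[-1])


# Per-pattern delays of two sub-buses (illustrative values).
delays = [(3, 4), (3, 3), (4, 1), (1, 4)]

print(total_cycles(delays, 1))  # 15: lock-step, same as an ungrouped bus
print(total_cycles(delays, 2))  # 12
print(total_cycles(delays, 8))  # 12: a larger buffer gives no further gain here
```

In this toy case the benefit saturates after a few words of buffering, which is the kind of diminishing return the slide's chart reports.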

24 Conclusions
Proposed the Bus-grouping Asynchronous Transmission (BAT) scheme.
Optimized BAT using the locality of the crosstalk factor (CF) distribution.
Proposed the DCC synchronizing mechanism and the DAS scheme.
Improved performance by 30+% and 10+% compared with the DYN and Codec approaches, at the cost of 13% routing overhead when applied to a 64-bit bus.

25 Thanks for your attention!

