Published by Chaim Spink. Modified over 2 years ago.

1
Exploiting Crosstalk to Speed up On-chip Buses
Chunjie Duan, Ericsson Wireless, Boulder
Sunil P Khatri, University of Colorado, Boulder

2
Outline
- Introduction
- Classification of cross-talk types
- The story so far
- Eliminating 3C and 4C sequences
- Eliminating 4C sequences
- Eliminating 2C sequences
- Eliminating 1C sequences
- Experimental results
- Conclusions

3
Introduction
- Verified cross-talk trends
- Accurate 3-D capacitance extraction
- Delay variation 2.47:1 (200 μm wires, 10X drivers, 0.1 μm technology)
[Figure: deep sub-micron process wire cross-section with dimensions s, t, w; wires labeled a and v coupled through capacitances C_I and C_L]

4
Cross-talk vs Bus Data Pattern
- When λ ~ 0.1 μm, r = C_I/C_L ~ 10 (metal 4)
- Effective total capacitance depends on the bus data sequence
- Best case: 0·C_I
- Worst case: 4·C_I
[Figure: example bus transitions annotated with C_total = 0·C_I, 2·C_I and 4·C_I]

5
Classification of Cross-talk
- 4·C sequence
- 3·C sequence
- 2·C sequence
- 1·C sequence
- 0·C sequence
[Figure: example transition for each class]
- Forbidden patterns: 010 and 101
- Maximum bus data rate depends on the total capacitance seen by any bit
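The classification can be made concrete with a small sketch. It assumes the common model in which a switching wire j sees (|Δ_j − Δ_{j−1}| + |Δ_j − Δ_{j+1}|)·C_I, where Δ is +1 for a rising wire, −1 for a falling wire and 0 for a quiet wire. The slides only showed example patterns, so the function name `crosstalk_class`, the handling of edge wires (one neighbor only) and the choice to ignore quiet wires are all assumptions made for illustration.

```python
def crosstalk_class(prev: str, curr: str) -> int:
    """Return k such that the worst switching wire sees k*C_I for prev -> curr.

    Assumed model: each neighbor contributes |delta_j - delta_neighbor|.
    """
    if len(prev) != len(curr):
        raise ValueError("vectors must have the same width")
    delta = [int(c) - int(p) for p, c in zip(prev, curr)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue  # a quiet wire has no transition delay to classify
        k = 0
        if j > 0:
            k += abs(d - delta[j - 1])
        if j < len(delta) - 1:
            k += abs(d - delta[j + 1])
        worst = max(worst, k)
    return worst


# One example per class, from 4C down to 0C.
for prev, curr in [("010", "101"), ("010", "100"), ("000", "010"),
                   ("011", "111"), ("000", "111")]:
    print(prev, "->", curr, ":", crosstalk_class(prev, curr), "C")
```

Under this model the five example transitions fall into the 4C, 3C, 2C, 1C and 0C classes respectively, and both the 4C and the 3C examples involve a forbidden pattern.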

6
Previous work – Eliminating 3C & 4C Sequences
- Simple approach: shielding
- No 3C/4C sequences, but the bus width is doubled
- Theorem: if no forbidden patterns are allowed on the bus, then no 3C or 4C sequences occur
- Proof: see "Analysis and Avoidance of Cross-talk in Buses", Duan, Tirumala, Khatri (Hot Interconnects, August 2001)
- So we simply encode the data on the bus to get rid of the forbidden patterns
- Recurrence equation for asymptotic bus overhead
- CODEC implementation to demonstrate practicality
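The link between forbidden patterns and 3C/4C sequences can be checked by brute force on small buses. The sketch below is not the proof from the cited paper; it assumes the same per-wire model as before (each neighbor contributes |Δ_j − Δ_neighbor|·C_I to a switching wire), and the helper names are hypothetical.

```python
from itertools import product


def crosstalk_class(prev: str, curr: str) -> int:
    """Worst-case k (in units of C_I) over all switching wires (assumed model)."""
    delta = [int(c) - int(p) for p, c in zip(prev, curr)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue
        k = 0
        if j > 0:
            k += abs(d - delta[j - 1])
        if j < len(delta) - 1:
            k += abs(d - delta[j + 1])
        worst = max(worst, k)
    return worst


def has_forbidden_pattern(vector: str) -> bool:
    """True if any three adjacent bits read 010 or 101."""
    return "010" in vector or "101" in vector


def worst_class_without_forbidden(width: int) -> int:
    """Largest crosstalk class between any two forbidden-pattern-free vectors."""
    legal = ["".join(bits) for bits in product("01", repeat=width)]
    legal = [v for v in legal if not has_forbidden_pattern(v)]
    return max(crosstalk_class(p, c) for p in legal for c in legal)


for width in range(3, 9):
    print(width, "wires: worst class", worst_class_without_forbidden(width), "C")
```

For these widths the worst class stays at 2C, which is consistent with the theorem under the assumed model.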

7
Eliminating 3C & 4C sequences
- 44% asymptotic overhead
- Look-up table: straightforward and can achieve the minimum overhead (44%), but not practical
- Our implementation
- 62.5% overhead (higher than the minimum)
- Modular and straightforward
- Break the bus into 4-bit groups
- Encode each group independently (4 bits -> 5 bits)
- Additional logic handles forbidden patterns across group boundaries
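The 44% figure can be reproduced numerically. The sketch below counts forbidden-pattern-free vectors by enumeration and checks that the counts obey a Fibonacci-style recurrence; treating that as the slide's "recurrence equation" is an assumption, since the equation itself is not shown. Overhead is taken as extra wires per data bit.

```python
import math
from itertools import product


def count_legal(width: int) -> int:
    """Number of width-bit vectors containing neither 010 nor 101."""
    count = 0
    for bits in product("01", repeat=width):
        vector = "".join(bits)
        if "010" not in vector and "101" not in vector:
            count += 1
    return count


counts = {k: count_legal(k) for k in range(1, 15)}

# Assumed recurrence: T(k) = T(k-1) + T(k-2), verified here by enumeration.
for k in range(3, 15):
    assert counts[k] == counts[k - 1] + counts[k - 2]

# A Fibonacci-style count grows by the golden ratio per added wire, so each
# wire carries log2(phi) data bits in the limit.
phi = (1 + math.sqrt(5)) / 2
asymptotic_overhead = 1 / math.log2(phi) - 1
print(f"asymptotic overhead: {asymptotic_overhead:.1%}")
```

The printed limit is about 44%, matching the asymptotic overhead quoted on the slide.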

8
Previous Work - Eliminating 4C sequences
- Less aggressive: eliminating 4C sequences only
- Less overhead (33%)
- Simpler algorithm:
- Divide the bus into 3-bit groups
- When a 4C sequence occurs, complement the group data
- Insert a group complement indicator
- Special handling for across-group 4C sequences (see paper for details)
- Examples: 101 001 -> 010 010; 1010 0010 -> 1011 0100

9
CODEC Results
- Compare waveforms with and without coding
- Random input sequence
[Figure: two test setups with blocks labeled random sequence, encoder, driver, receiver, decoder, recovered sequence]
- Encoder/decoder delay ~250 ps (memoryless)
- Max data rate more than 2X compared to the scheme with no encoding
- Speedup is data pattern independent

10
CODEC Results … 2
- Bus length: 5 mm, 10 mm or 20 mm
- Driver strength: 30X, 60X and 120X of minimum

11
Further Speedup Possible?
- Can we exploit crosstalk to further speed up the bus?
- Eliminate 2C sequences
- Eliminate 1C sequences
- Simulation shows that eliminating 2C sequences results in a speedup of 2X to 4X over eliminating 3C/4C sequences
- Note that we seek memoryless CODEC-based techniques
- Let's look at eliminating 2C and 1C sequences next…

12
Eliminating 2C sequences
- How to guarantee a 2C free sequence?
- Find a vector clique such that any pair of elements in this clique only exhibits 1C transitions between them
- For an n-bit bus, we need a k-bit encoded bus (k > n) such that the new bus has a 2C free clique of cardinality greater than or equal to 2^n
- The solution is memoryless (no need to remember the last transmitted word)
- Fast and simple CODEC implementation
- We have an inductive method to construct 2C free cliques
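The clique condition is easy to test directly. The sketch below checks every pair of vectors in a candidate set, again under the assumed per-wire model (each neighbor contributes |Δ_j − Δ_neighbor|·C_I to a switching wire). The function names are hypothetical, and the example set is the 3-bit clique that appears in the construction example later in the talk.

```python
from itertools import combinations


def crosstalk_class(prev: str, curr: str) -> int:
    """Worst-case k (in units of C_I) over all switching wires (assumed model)."""
    delta = [int(c) - int(p) for p, c in zip(prev, curr)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue
        k = 0
        if j > 0:
            k += abs(d - delta[j - 1])
        if j < len(delta) - 1:
            k += abs(d - delta[j + 1])
        worst = max(worst, k)
    return worst


def is_2c_free_clique(vectors) -> bool:
    """True if every pair of vectors is at most a 1C transition apart."""
    return all(crosstalk_class(u, v) <= 1 for u, v in combinations(vectors, 2))


def data_bits(vectors) -> int:
    """Whole data bits a memoryless code over this clique can carry."""
    return len(vectors).bit_length() - 1  # floor(log2(size))


c3 = ["000", "100", "001", "111"]
print(is_2c_free_clique(c3), data_bits(c3))   # 3 wires carrying 2 data bits
print(is_2c_free_clique(c3 + ["010"]))        # 010 breaks the clique
```

Because any codeword may follow any other without exceeding 1C, the encoder needs no knowledge of the previously transmitted word, which is what makes the scheme memoryless.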

13
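Constructing 2C free Cliques
- Inductive method: extends a known clique C_n = {v}
- Let v = v_1 ... v_n
- First set C_{n+1} = {}, and for each v in C_n set C_{n+1} <= C_{n+1} U {v v_n} (v with its last bit repeated)
- Definition: the 0-extended subset of C_{n+1} is: [equation not recovered]
- Definition: the 1-extended subset of C_{n+1} is: [equation not recovered]
- Constructing [set name not recovered]: create a new vector [not recovered], and add the vector unless there exists a vector in S_1 such that [conditions not recovered]
- Constructing [set name not recovered]: similar to [not recovered]
- Finally [equation not recovered]
- Theorem: both sets of the previous step are 2C free cliques. Proof: see paper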

14
Constructing 2C free Cliques … 2
Some observations about the construction:
(a) Vectors ending with 01 and vectors ending with 10 cannot co-exist in C_n
(b) The first n bits of any vector of C_{n+1} are the same as some vector of C_n, and the last two bits are 00 or 11. In other words, C_{n+1} is at least as large as C_n
(c) Because of (a), vectors ending with 011 and vectors ending with 100 cannot both appear in the same clique C_{n+1}
(d) So we can construct vectors of C_{n+1} ending in 001 or 110 by appending 1 to vectors ending with 00, or appending 0 to vectors ending with 11. However, we cannot have both

15
Constructing 2C free Cliques … 3
Consider the construction of C_4 from C_3:
- C_3: 000, 100, 001, 111
- Last bit repeated: 0000, 1000, 0011, 1111
- Last bit complemented (candidates): 0001, 1001, 0010, 1110
- Resulting C_4: 0000, 1000, 0011, 1110, 1111
A quadratic number of tests is required as described above. We can do better…
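The example above can be replayed with a naive pairwise check. The sketch below is a greedy stand-in for the construction, not the authors' algorithm: it repeats the last bit of every vector, then admits each complemented-last-bit candidate only if it is at most 1C away from everything accepted so far. It relies on the assumed per-wire model (each neighbor contributes |Δ_j − Δ_neighbor|·C_I), and `extend_clique_greedy` is a hypothetical name.

```python
def crosstalk_class(prev: str, curr: str) -> int:
    """Worst-case k (in units of C_I) over all switching wires (assumed model)."""
    delta = [int(c) - int(p) for p, c in zip(prev, curr)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue
        k = 0
        if j > 0:
            k += abs(d - delta[j - 1])
        if j < len(delta) - 1:
            k += abs(d - delta[j + 1])
        worst = max(worst, k)
    return worst


def extend_clique_greedy(clique):
    """Grow an n-bit 2C free clique into an (n+1)-bit one by pairwise testing."""
    # Repeating the last bit never adds crosstalk, so these are always kept.
    extended = [v + v[-1] for v in clique]
    # Candidates with the last bit complemented must be tested one by one.
    candidates = [v + ("1" if v[-1] == "0" else "0") for v in clique]
    for cand in candidates:
        if all(crosstalk_class(cand, kept) <= 1 for kept in extended):
            extended.append(cand)
    return extended


c3 = ["000", "100", "001", "111"]
c4 = extend_clique_greedy(c3)
print(sorted(c4))  # ['0000', '0011', '1000', '1110', '1111']
```

Each candidate is compared against every accepted vector, which is the quadratic cost the slide refers to. The greedy result also depends on candidate order, so it is not guaranteed to be maximal in general.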

16
Clique Extension Algorithm
Constructing C_{n+1} from C_n using the 0-extended subset (a similar algorithm applies when we use the 1-extended subset):
- Append 0 to n-bit vectors ending with 0
- Append 1 to n-bit vectors ending with 1
- (since we use the 0-extended subset of C_{n+1})
- If there is no n-bit vector ending with 01, append 1 to vectors ending with 00
- If there is no n-bit vector ending with 11, append 1 to vectors ending with 10
- The new clique has no vectors ending with 10

17
Clique Extension Algorithm … 2
- Simply perform both versions of the clique extension algorithm
- Select the result according to the rule: [equation not recovered]
- Some values of clique sizes:

N    Clique size
3    4
4    5
5    7
6    9
7    10
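The clique sizes translate into wire overhead with a little arithmetic. The sketch below assumes that a clique of size s carries floor(log2 s) whole data bits and that overhead means extra wires per data bit; the slides do not state the exact definition, so both are assumptions, and `overhead` is a hypothetical helper.

```python
# Clique sizes from the table: number of wires -> clique size.
clique_sizes = {3: 4, 4: 5, 5: 7, 6: 9, 7: 10}


def overhead(wires: int, clique_size: int) -> float:
    """Extra wires per data bit, assuming only whole data bits are used."""
    data_bits = clique_size.bit_length() - 1  # floor(log2(clique_size))
    return (wires - data_bits) / data_bits


for wires, size in clique_sizes.items():
    bits = size.bit_length() - 1
    print(f"{wires} wires, clique size {size}: "
          f"{bits} data bits, overhead {overhead(wires, size):.0%}")
```

With these assumptions a 3-wire section carries 2 data bits at 50% overhead, which illustrates why small bus sections can be attractive.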

18
Area Overhead Trends
- Asymptotic overhead is 146%
- Lower for smaller bus sizes
- Suggests partitioning the bus into smaller sections

19
1C free Configurations
- 1C free sequences have the least delay (typically 50% of 2C free sequences)
- Just send any data bit multiple times (3, 5, …)
- No encoder/decoder needed (no extra CODEC delay)
- Simulation shows it is the fastest compared with other techniques of similar area overhead:
- 3x (or 5x) separation between wires
- Widening the trace (3x): smaller R, bigger C
[Figure: wire groups labeled A B C, repeated three times]
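The effect of sending each bit three times can be illustrated with the same assumed per-wire model (each neighbor contributes |Δ_j − Δ_neighbor|·C_I to a switching wire). The sketch assumes the receiver samples only the middle wire of each 3-wire group, with the outer copies acting as shields that switch with it; that reading, and the helper names, are assumptions.

```python
from itertools import product


def wire_classes(prev: str, curr: str):
    """Per-wire k (in units of C_I); quiet wires report 0 (assumed model)."""
    delta = [int(c) - int(p) for p, c in zip(prev, curr)]
    classes = []
    for j, d in enumerate(delta):
        k = 0
        if d != 0:
            if j > 0:
                k += abs(d - delta[j - 1])
            if j < len(delta) - 1:
                k += abs(d - delta[j + 1])
        classes.append(k)
    return classes


def repeat_bits(data: str, copies: int = 3) -> str:
    """Send every data bit on `copies` adjacent wires."""
    return "".join(bit * copies for bit in data)


words = ["".join(bits) for bits in product("01", repeat=3)]
worst_middle = 0
worst_outer = 0
for prev in words:
    for curr in words:
        ks = wire_classes(repeat_bits(prev), repeat_bits(curr))
        worst_middle = max(worst_middle, max(ks[1::3]))          # sampled wires
        worst_outer = max(worst_outer, max(ks[0::3] + ks[2::3]))  # shield wires
print("middle wires:", worst_middle, "C; outer wires:", worst_outer, "C")
```

In this idealized model the middle wires never see coupling capacitance, while the outer wires of neighboring groups can still see up to 2C.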

20
Bus configurations for 1C delay
- We simulated the delay of several different bus configurations
- Different configurations yield different delay and area trade-offs
- A: 3-wire group, fixed spacing within group, variable spacing between groups
- B: similar to A but with a ground shield between groups
- C: no shielding wires; wire sizes and spacing are varied
- D: 5-wire group, fixed spacing within group, variable spacing between groups; largest overhead
[Figure: wire layouts for configurations A to D, showing the fixed spacing w and the variable spacing]

21
1C free Configurations
- Circuit parameters are extracted using SPACE3D
- Bus simulations
- CODEC was not modeled
- Spice3f5, 0.1 μm BPTM model
- Transmission line with inter-wire coupling
- Quantify the actual delay of 1C free bus vector sequences for the 4 configurations described
- 20 mm wire, 30X driver (IDEAL 1C free delay 153 ps, 3C free delay 793 ps)

22
Delays for 1C free Configurations
- Configuration C has a significantly larger delay than the others (3X), since it is essentially a 3C free configuration (it has no shielding)
- All other configurations show up to 2.5X speedup over the 3C free bus
- For all configurations, the actual delays are larger than the IDEAL 0C delay
- This is caused by skew on the outer shielding wires
- Transitions of the dynamic shields of any wire are slightly misaligned
- Verified by intentionally skewing the delay on signals

23
Conclusions
- Inter-wire capacitance is increasingly significant for DSM VLSI bus delays
- We have developed an array of CODECs to trade off bus area overhead against delay
- 4C free = 33%
- 3C free = 62%
- 2C free = 146% (asymptotic), up to 4X to 6X faster
- Inductive algorithm for 2C free clique construction
- Simulated several 1C free configurations for area overhead and delays (no CODECs)
- 1C free techniques are not as fast as expected

24
Thank You!
