Advanced Analysis, Design, and Measurement Techniques for Multi-Gb/s Data Links Frank O’Mahony Bryan Casper Circuit Research.


2 Advanced Analysis, Design, and Measurement Techniques for Multi-Gb/s Data Links Frank O’Mahony (frank.o’mahony@intel.com) Bryan Casper Circuit Research Lab, Intel Hillsboro, OR

3 2 Outline
Overview of I/O trends
System-level link modeling
–Worst-case data eye
–Statistical data eye
–Design example: 20Gb/s link
On-die measurement techniques

4 3 Chip-to-Chip Signaling Trends
Decade   Speeds      Transceiver features
1980’s   >10Mb/s     Inverter out, inverter in
1990’s   >100Mb/s    Termination; source-synchronous clk.
2000’s   >1Gb/s      Pt-to-pt serial streams; pre-emphasis equalization
Future   >10Gb/s     Adaptive equalization; advanced low-power clk.; alternate channel materials
[Figure: channel models (lumped capacitance, transmission line, lossy transmission line) and a link block diagram with transmit filter, channel h(t), noise, linear equalizer, sampler, slicer, and CDR]

5 4 CMOS transceiver data rates
[Figure: plot of link rate vs. year, with technology-limited and power/channel-limited regions marked. Courtesy of Prof. Ken Yang, UCLA]

6 5 Power Density Increases Exponentially!
[Figure: power density (W/cm², log scale from 1 to 1000) vs. process technology node (1.5 μm down to 0.07 μm) for the i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, and Pentium® 4, with hot plate, nuclear reactor, and rocket nozzle reference levels and the max power density envelope]


8 7 Teraflops Research Chip
100 million transistors ● 80 tiles ● 275 mm²
First tera-scale programmable silicon:
–Teraflops performance
–Tile design approach
–On-die mesh network
–Power-aware capability
Tera-scale many-core μP’s will drive aggregate I/O rates aggressively

9 8 Power efficiency and process technology
Process scaling enables lower-power data links
Channel characteristics can limit achievable power efficiency
[Figure courtesy of Prof. Ken Yang, UCLA]

10 9 I/O Data Rate and Power Efficiency
[Figure: power efficiency (mW/Gb/s, 0 to 60) vs. data rate (Gb/s, 0 to 20) for published links: BNV ISSCC’06, J. Wong VLSI’03, Prete ISSCC’07, and R. Palmer ISSCC’07; point labels 7.5, 11.7, 9.6, and 2.2]

11 10 Designing power-efficient multi-Gb/s links
Accurate system-level link modeling
–Careful statistical accounting of all noise sources: ISI, crosstalk, voltage noise, and timing noise
Power-efficient I/O system implementation
–Design within the BW of the process technology
–Better channel characteristics enable lower power
–Immunity to variation and to deterministic and random noise comes at a power cost
On-die calibration and measurement
–Calibration can significantly reduce power
–Measurement is necessary to close the modeling loop

12 11 System-level link modeling
1. Empirical calculation
–Use random data
2. Peak distortion analysis
–Analytical calculation of worst-case eye
3. Statistical ISI analysis
–Analytical calculation of BER eye

13 12 Traditional method of signaling analysis and validation
Most chip-to-chip signaling links considered in the past used simple binary NRZ modulation.
These links had a low symbol rate and little channel memory.
Transient simulation using a few random data vectors was sufficient to accurately characterize the eye.

14 13 Motivation for behavioral link analysis
Simulated eye can be optimistic
–Won’t capture worst-case ISI, especially for channels with long memory
Characterizes impact of deterministic and random noise sources
–For low bit error rates (BER), very unlikely noise conditions must be considered
Nearly exact statistical analysis reduces need for excess design margins
Fast evaluation of various link architectures without designing complete circuits
–e.g. various equalizers can be traded off easily

15 14 Properties of a Linear Time-invariant System
Frequency response (e.g. S-parameters, S21) converts to the impulse response via FFT
Convolution
Superposition

16 15 LTI property: Convolution
[Figure: the Tx symbol (mirrored) convolved with the impulse response gives the pulse response]
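
The convolution property can be sketched in a few lines. This is a minimal illustration, not the authors' tool: the impulse response values, the 4-samples-per-UI grid, and the ideal rectangular Tx symbol are all assumptions chosen for readability.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Assumed sampling grid: 4 samples per unit interval (UI).
SAMPLES_PER_UI = 4

# Hypothetical channel impulse response (arbitrary units), not measured data.
impulse_response = [0.0, 0.1, 0.3, 0.25, 0.15, 0.1, 0.05, 0.03, 0.02]

# Assumed Tx symbol: an ideal rectangular pulse exactly one UI wide.
tx_symbol = [1.0] * SAMPLES_PER_UI

# Pulse response = impulse response convolved with the Tx symbol.
pulse_response = convolve(impulse_response, tx_symbol)
```

Because the assumed symbol is a unit-height rectangle, the pulse response is a running sum of the impulse response over one UI, which is why it is wider and smoother than the impulse response itself.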

17 16 LTI property: Superposition
[Figure: Tx symbol …000010000000… in, pulse response out]

18 17 LTI property: Superposition of symbols
[Figure: Tx symbol …000010011100… in, response to pattern 100111 out]

19 18 LTI property: Superposition of coupled symbols
[Figure: Tx symbol …000010000000… in, FEXT pulse response out]

20 19 LTI property: Superposition of coupled symbols
[Figure: Tx symbol …000011111100… in, FEXT response out]

21 20 LTI property: Superposition of coupled symbols
[Figure: FEXT response to Tx symbol …000011111100…]

22 21 LTI property: Superposition of coupled symbols
[Figure: insertion loss response to Tx symbol …000010011100…]

23 22 LTI property: Superposition of coupled symbols
[Figure: insertion loss response to Tx symbol …000010011100… plus FEXT response to Tx symbol …000011111100… gives the composite response]
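
Superposition of through and coupled symbols can be sketched as follows. The two bit patterns (100111 on the victim, 111111 on the aggressor) come from the slides; the UI-spaced pulse response values and the unipolar 0/1 signal levels are assumptions made for illustration.

```python
def superpose(bits, pulse):
    """Sum one copy of the UI-spaced pulse response per transmitted 1,
    each copy delayed by the bit's position (0s contribute nothing)."""
    out = [0.0] * (len(bits) + len(pulse) - 1)
    for k, bit in enumerate(bits):
        if bit:
            for n, p in enumerate(pulse):
                out[k + n] += p
    return out

# Hypothetical UI-spaced pulse responses (one sample per bit time).
through_pulse = [0.1, 0.6, 0.2, 0.05]   # insertion-loss (victim) path
fext_pulse = [0.0, 0.05, -0.04, 0.01]   # far-end crosstalk path

victim_bits = [1, 0, 0, 1, 1, 1]        # pattern 100111 from the slides
aggressor_bits = [1, 1, 1, 1, 1, 1]     # pattern 111111 from the slides

through = superpose(victim_bits, through_pulse)
fext = superpose(aggressor_bits, fext_pulse)

# Composite response: the two contributions simply add at the receiver.
composite = [a + b for a, b in zip(through, fext)]
```

The same function handles any number of aggressors: compute one response per coupled lane and add them sample by sample.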

24 23 Worst-case eye calculation
Eye diagrams are generally calculated empirically
–Convolve random data with the pulse response of the channel
–The pulse response is derived by convolving the impulse response with the transmitted symbol
For eye diagrams to represent the worst case, a large set of random data must be used
–Low probability of hitting worst-case data transitions
–Computationally inefficient
An analytical method of producing the worst-case eye diagram exists
–Computationally efficient algorithm

25 24 Differential S Parameters

26 25 Eye diagram (100 bits @5Gb/s)

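27 26 Eye diagram (1000 bits @5Gb/s)
[Figure legend: random data eye (100 bits), random data eye (1000 bits)]

28 27 Sample pulse response
[Figure: pulse response with precursor, cursor, and postcursor samples marked; positive (ISI+) and negative (ISI-) contributions labeled]

29 28 Step response
[Figure: response to the pattern 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1]

30 29 Worst-case 0
[Figure: response to the pattern 0 1 1 0 1 0 0 1 0 0 0 0 0]

31 30 Worst-case 1
[Figure: response to the pattern 0 1 0 1 1 0 0 0 0 0]

32 31 How to find worst-case patterns
[Figure: bit sequence 1 1 0 1 0 0 1 0 0 1 0 1 1 0, annotated with the worst-case 0 and worst-case 1 patterns]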

33 32 Worst-case Received Voltage Difference (RVD)
[Figure: pulse response sample values, extracted as 16-34222]
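
A minimal sketch of peak distortion analysis at a single sampling phase is given below. It assumes unipolar 0/1 signaling and UI-spaced pulse response samples; the tap values and the function name `peak_distortion` are illustrative assumptions, not values or code from the presentation.

```python
def peak_distortion(pulse, cursor_index):
    """Worst-case received levels for a 1 and a 0 at one sampling phase.

    Assumes unipolar signaling: a transmitted 1 adds its tap, a 0 adds nothing.
    """
    cursor = pulse[cursor_index]
    isi = [p for i, p in enumerate(pulse) if i != cursor_index]

    # A 1 is lowest when every negative ISI tap is excited and no positive one.
    worst_one = cursor + sum(p for p in isi if p < 0)
    # A 0 is highest when every positive ISI tap is excited and no negative one.
    worst_zero = sum(p for p in isi if p > 0)

    # Per-tap bit assignments that produce those two levels.
    bits_for_one = [1 if i == cursor_index else int(p < 0)
                    for i, p in enumerate(pulse)]
    bits_for_zero = [0 if i == cursor_index else int(p > 0)
                     for i, p in enumerate(pulse)]
    return worst_one, worst_zero, bits_for_one, bits_for_zero

# Hypothetical pulse response: one precursor, a cursor of 16, five postcursors.
pulse = [2, 16, -3, 4, 2, 2, -1]
worst_one, worst_zero, bits_for_one, bits_for_zero = peak_distortion(pulse, 1)

# Worst-case received voltage difference (eye opening) at this phase.
worst_case_rvd = worst_one - worst_zero
```

With these assumed taps the two worst-case patterns are bitwise complements of each other, and the opening equals the cursor minus the sum of the ISI magnitudes. Repeating the calculation at each sampling phase across the UI traces out the worst-case eye shape.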

34 33 5Gb/s Pulse Response

35 34 5Gb/s Response due to worst-case data pattern
[Figure: responses for the worst-case 0 and worst-case 1 patterns]

36 35 Worst-case data response
[Figure: worst-case 1 response compared with a lone 1]

37 36 5Gb/s WC eye shape
[Figure: worst-case eye with precursor, cursor, and postcursor regions marked]

38 37 WC eye vs random data eye
[Figure: WC eye shape overlaid on the 1000-symbol and 100-symbol random data eyes]

39 38 What is a BER distribution eye?
[Figure: BER distribution eye, sample voltage (V) vs. sample time (sec), with a legend giving BER as 10^x; the marked sample time and sample reference give BER = 10^-10]

40 39 What is a BER distribution eye?
[Figure: same BER distribution eye; the marked sample time and sample reference give BER = 10^-5]

41 40 What is a BER distribution eye?
[Figure: same BER distribution eye; the marked sample time and sample reference give BER = 10^-1]

42 41 BER distribution vs Worst-case eye
[Figure: BER distribution eye with the worst-case eye edges overlaid; legend shows BER as 10^x]

43 42 BER distribution eye calculation
Calculation method is based on pulse response shape
Assumption: equal probability of 1 or 0
Determine probability density function (pdf) of ISI
–In contrast to determining peak value of ISI
More computationally intensive than peak distortion analysis

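44 43 BER eye calculation example (no ISI)
[Figure: pulse response with a cursor of 9 and all other samples 0]

45 44 PDF of the cursor (when sending a 1)
[Figure: PDF of cursor for a 1, probability 1 at 9; PDF of ISI, probability 1 at 0]

46 45 PDF of a 1
[Figure: convolving the PDF of cursor for a 1 (at 9) with the PDF of ISI (at 0) gives the PDF of a 1 (at 9)]

47 46 Cumulative Distribution Function (CDF) of a 1
[Figure: PDF of a 1 (at 9) and the resulting CDF of a 1]

48 47 BER distribution eye (when sampling a 1)
[Figure: CDF of a 1 mapped onto the eye between 0 and 9; legend (BER) runs from 1 to 0]

49 48 PDF of a 0
[Figure: convolving the PDF of cursor for a 0 (at 0) with the PDF of ISI (at 0) gives the PDF of a 0 (at 0)]

50 49 Cumulative Distribution Function (CDF) of a 0
[Figure: PDF of a 0 (at 0) and the resulting CDF of a 0]

51 50 BER distribution eye (when sampling a 0)
[Figure: CDF of a 0 mapped onto the eye between 0 and 9; legend (BER) runs from 1 to 0]

52 51 BER distribution eye
[Figure: CDF of a 1 and CDF of a 0, each weighted by p=0.5, combined into the CDF of a 1 or 0; reference levels are marked BER=0.5, BER=0, BER=0, and BER=0.5; legend (BER) runs from 0.5 to 0]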

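53 52 BER eye calculation example (w/ ISI)
[Figure: pulse response sample values, extracted as 16-34222]

54 53 1st precursor ISI PDF
50% chance of a 1, 50% chance of a 0
[Figure: PDF of 1st precursor ISI, probability 0.5 at 0 and at 2]

55 54 1st postcursor ISI PDF
50% chance of a 1, 50% chance of a 0
[Figure: PDF of 1st postcursor ISI, probability 0.5 at 0 and at -3]

56 55 2nd postcursor ISI PDF
50% chance of a 1, 50% chance of a 0
[Figure: PDF of 2nd postcursor ISI, probability 0.5 at 0 and at 4]
And so on...

57 56 PDF all ISI
Convolve individual PDFs
[Figure: the 1st precursor PDF (0, 2) convolved with the 1st postcursor PDF (0, -3) gives a result with p=0.25 per value; convolving that with the 2nd postcursor PDF (0, 4) gives a result with p=0.125 per value]
And so on...

58 57 PDF all ISI
[Figure: PDF of all ISI over values from -4 to 10, in units of p=1/64]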
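
The convolve-the-individual-PDFs step can be sketched as below. The first three ISI taps (2, -3, 4) appear in the slides; the remaining three taps and the helper names are assumptions, chosen so that six taps give the 1/64 probability unit and the -4 to 10 range shown above. Exact fractions are used so the probabilities can be checked by hand.

```python
from collections import defaultdict
from fractions import Fraction

def convolve_pdfs(pdf_a, pdf_b):
    """Convolve two discrete PDFs stored as {value: probability} dicts."""
    out = defaultdict(Fraction)
    for va, pa in pdf_a.items():
        for vb, pb in pdf_b.items():
            out[va + vb] += pa * pb
    return dict(out)

def tap_pdf(tap):
    """Two-point PDF of one ISI tap: 50% chance of a 0 (adds nothing),
    50% chance of a 1 (adds the tap value)."""
    pdf = defaultdict(Fraction)
    pdf[0] += Fraction(1, 2)
    pdf[tap] += Fraction(1, 2)
    return dict(pdf)

def isi_pdf(isi_taps):
    """PDF of the total ISI: start from 'no ISI' and fold in one tap at a time."""
    pdf = {0: Fraction(1)}
    for tap in isi_taps:
        pdf = convolve_pdfs(pdf, tap_pdf(tap))
    return pdf

# First three taps follow the slides; the last three are assumed.
isi_taps = [2, -3, 4, 2, 2, -1]

after_three_taps = isi_pdf(isi_taps[:3])   # eight outcomes, p = 1/8 each
pdf_all_isi = isi_pdf(isi_taps)            # multiples of p = 1/64
```

Each added tap doubles the number of equally likely bit combinations, but values that coincide merge into one bin, so the PDF stays compact even for long channel memory.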

59 58 PDF of the cursor (when sending a 1)
[Figure: PDF of cursor for a 1, probability 1 at 16]

60 59 PDF of a 1
[Figure: the PDF of cursor for a 1 (at 16) convolved with the PDF of ISI (-4 to 10) gives the PDF of a 1 (12 to 26)]

61 60 Cumulative Distribution Function (CDF) of a 1
[Figure: PDF of a 1 (12 to 26) and the resulting CDF of a 1]

62 61 BER distribution eye (when sampling a 1)
[Figure: CDF of a 1 mapped onto the eye, with axis marks at 16 and 22; reference levels are marked BER=0, BER=0.3, BER=0.9, and BER=1; legend (BER) runs from 1 to 0]

63 62 BER distribution eye (when sampling a 0)
[Figure: CDF of a 0 mapped onto the eye, with axis marks at 16 and 22; reference levels are marked BER=1, BER=0.9, BER=0.3, and BER=0; legend (BER) runs from 1 to 0]

64 63 BER distribution eye
[Figure: CDF of 1 and CDF of 0 combined into the CDF of a 0 or 1; reference levels are marked BER=0.5, BER=0.25, BER=0, BER=0.25, and BER=0.5; legend (BER) runs from 0.5 to 0]
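
The step from PDFs to a BER at a given sample reference can be sketched as follows. The cursor of 16 and the first three ISI taps follow the slides; the other taps, the function names, and the convention that a level exactly equal to the reference is not counted as an error are assumptions. The level PDF is built by enumerating all equally likely bit combinations, which gives the same result as convolving the per-tap PDFs.

```python
from fractions import Fraction
from itertools import product

def level_pdf(cursor, isi_taps):
    """PDF of the received level: the cursor contribution plus every equally
    likely combination of ISI taps (same result as convolving per-tap PDFs)."""
    weight = Fraction(1, 2 ** len(isi_taps))
    pdf = {}
    for bits in product((0, 1), repeat=len(isi_taps)):
        level = cursor + sum(b * t for b, t in zip(bits, isi_taps))
        pdf[level] = pdf.get(level, Fraction(0)) + weight
    return pdf

def bit_error_rate(reference, pdf_one, pdf_zero):
    """BER at one sample reference, assuming 1s and 0s are equally likely.

    A 1 is in error when its level falls below the reference; a 0 is in
    error when its level rises above it. Ties are counted as correct
    (an assumption of this sketch).
    """
    p_one_low = sum(p for v, p in pdf_one.items() if v < reference)
    p_zero_high = sum(p for v, p in pdf_zero.items() if v > reference)
    return Fraction(1, 2) * (p_one_low + p_zero_high)

isi_taps = [2, -3, 4, 2, 2, -1]        # last three taps are assumed
pdf_one = level_pdf(16, isi_taps)      # levels 12 to 26 when sending a 1
pdf_zero = level_pdf(0, isi_taps)      # levels -4 to 10 when sending a 0

# One column of the BER distribution eye: BER versus sample reference.
ber_column = {ref: bit_error_rate(ref, pdf_one, pdf_zero)
              for ref in range(-6, 29)}
```

Far outside the eye the BER saturates at 0.5, because one of the two symbols is always misread, which matches the 0.5 to 0 legend range on the combined-eye slides. Repeating the column at each sampling phase fills in the full BER distribution eye.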

65 64 Handling Tx jitter in link analysis
Jitter is amplified over lossy channels
–Byproduct of frequency-dependent delay and loss
–Must be accounted for in analytical model
Discussion of these methods is beyond the scope of this presentation
The following primary authors have published techniques to analyze Tx jitter: Balamurugan, Hanumolu, Sanders, Stojanovic, Casper
[Figure: Tx driving an Rx through a lossy channel]

66 65 Signaling analysis summary
Accurate link analysis enables balanced link design
–Low power
–High performance
Three types of link analysis
–Empirical: inexact, optimistic, time consuming
–Peak distortion: uses LTI to find worst-case eye, can be pessimistic
–Behavioral/statistical: exact channel modeling using LTI and behavioral models of circuit blocks
Tx jitter is a special case that must be handled for better behavioral accuracy

67 66 Design example: 20Gb/s data link “Bonneville”
Goals
–Achieve highest performance link using 90nm CMOS
  20Gb/s target across a desktop channel
  10Gb/s target across a server channel
–Power < 20mW/Gb/s
–Small area (300um by 300um for Rx and Tx)
–Forwarded and embedded clock architectures

68 67 Channel Insertion Loss
Microprocessor/Chipset (μP/CS) channel:
–7” FR4 microstrip
–Non-interleaved routing
–FEXT only
–Tx pad cap = 0.4pF
–Rx pad cap = 0.1pF
–Sockets on Tx
[Figure: insertion loss (0dB to -80dB) vs. frequency (0GHz to 15GHz) for the μP/CS and clean BP channels, with a diagram of the chipset, CPU, and socket]

69 68 Channel loss and Equalization
Channel loss distorts and attenuates signal
Develop low loss materials
Compensate for channel distortion: equalization
–Transmitter pre-emphasis
–Receiver linear equalizer
–Decision Feedback Equalizer
[Figure: channel response vs. frequency, received magnitude (dB, -50 to 50) over 0 to 9 GHz, showing the non-equalized channel, the targeted filter (equalizer) response, and the equalized channel response]

70 69 Equalization overview – Rx DFE
Non-linear
–DFE
Linear
–Continuous-time
  Transversal filter
  High-pass: passive; active (capacitive degeneration, L peaking)
–Discrete-time
  Rx ADC & FIR
  Rx analog FIR
  Tx pre-emphasis
[Figure: DFE block diagram with feedback taps c1 to c4 summed into the input]

71 70 Equalization overview – Tx Preemphasis
[Figure: Tx pre-emphasis block diagram: data passes through unit delays (Δ) into a DAC with 6-bit coefficients C-1[5:0], C0[5:0], and C1[5:0]]

72 71 Equalization overview – CTLE
[Figure: CTLE circuit schematic with bias input]

73 72 [Figure: achievable data rate (10Gb/s, 20Gb/s, and 30Gb/s regions) on the μP/CS channel as a function of Tx FIR taps, DFE taps, and DFE tap start, with a 1st-order CTLE and with no CTLE; measured data using similar assumptions is overlaid]

74 73 Bonneville architecture
[Figure: Tx with 4-tap LE (pre-emphasis), 20Gb/s link, Rx with 2nd-order CTLE, 5GHz clock, and phase gen.]
Measurement results:
–20Gb/s across uP channel
–15Gb/s across server channel
–12mW/Gb/s power efficiency
–Measured data rate matched link modeling results within 10%

75 74 Data link measurements
Some data link blocks are straightforward to characterize with external measurement equipment
–Examples: data Tx (50Ω); DC currents and voltages (averaging); recovered data (after sampling), e.g. bit error rate tester (BERT)
Other measurements are extremely difficult to perform with external measurement equipment
–Examples: clock jitter (>5GHz), especially high-frequency jitter; sampled data eye; data Rx sensitivity
Built-in self test (BIST) and self-calibration are required for high-volume testing of data links
–Examples: automatic clock-data deskew; adaptive equalization
On-die measurement capability is nearly essential in multi-Gb/s data links
–Closes the loop for link design
–Enables BIST and calibration

77 76 Bonneville on-die measurement
[Figure: Bonneville architecture (Tx with 4-tap LE (pre-emphasis), 20Gb/s link, Rx with 2nd-order CTLE, 5GHz clock, and phase gen.) with offset control, error counter, and test logic added]

80 79 On-die scope test capabilities
3 modes:
–Characterize circuits
–Waveform capture
–BER eye diagrams

84 83 RX Input-referred Noise
Measurement:
–Sweep calibrated digital offset to generate CDF, counting 1’s and 0’s
–Generate noise CDF/PDF for Rx
[Figure: Rx comparator with V test (DC) applied at its inputs, offset ctrl, test control, and a PDF counter. Plots vs. offset [V]: V test (DC) + noise between the all-1’s and all-0’s regions, Prob{‘1’} (CDF), and the input-referred (IR) noise PDF]
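
The offset-sweep noise measurement can be mimicked with a small simulation. This is a sketch under stated assumptions, not the on-die implementation: the comparator is modeled as reading a 1 whenever the swept offset exceeds the noisy input (so Prob{'1'} rises with offset and is the noise CDF), the noise is Gaussian with an assumed 1.3 mV sigma, and the step size, trial count, and function names are invented for illustration.

```python
import math
import random

def sweep_offset(offsets_mv, v_test_mv, sigma_mv, trials, rng):
    """Fraction of comparator decisions that read '1' at each offset.

    Assumed polarity: the comparator reads 1 when the offset exceeds the
    noisy input, so the returned curve is the CDF of (V_test + noise).
    """
    prob_one = []
    for offset in offsets_mv:
        ones = sum(1 for _ in range(trials)
                   if v_test_mv + rng.gauss(0.0, sigma_mv) < offset)
        prob_one.append(ones / trials)
    return prob_one

def cdf_to_pdf(offsets_mv, cdf):
    """Difference the CDF to get the probability mass in each offset step."""
    mids = [(a + b) / 2 for a, b in zip(offsets_mv, offsets_mv[1:])]
    mass = [b - a for a, b in zip(cdf, cdf[1:])]
    return mids, mass

def sigma_from_pdf(mids, mass):
    """RMS noise estimated from the second moment of the measured PDF."""
    total = sum(mass)
    mean = sum(x * m for x, m in zip(mids, mass)) / total
    var = sum(m * (x - mean) ** 2 for x, m in zip(mids, mass)) / total
    return math.sqrt(max(var, 0.0))

rng = random.Random(0)                          # fixed seed for repeatability
offsets_mv = [i * 0.5 for i in range(-12, 13)]  # -6 mV to +6 mV in 0.5 mV steps
cdf = sweep_offset(offsets_mv, v_test_mv=0.0, sigma_mv=1.3, trials=2000, rng=rng)
mids, mass = cdf_to_pdf(offsets_mv, cdf)
sigma_est_mv = sigma_from_pdf(mids, mass)
```

The estimate should land close to the 1.3 mV used to generate the data. On real hardware the offset steps themselves need calibration before the recovered sigma can be trusted in absolute terms.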

85 84 Rx input noise (no PSN, no offset)
σ noise: 1.3mV; det. noise: 0mVp-p

86 85 Rx input noise (200MHz PSN, no offset)
σ noise: 1.1mV; det. noise: ~1mVp-p

87 86 Rx input noise (200MHz PSN, 85mV offset)
σ noise: 1mV; det. noise: 16mVp-p

88 87 Rx PSRR (200MHz) Noise floor

92 91 On-die waveform capture
Sample periodic signal:
–Voltage: Equivalent-time A/D using comparator offset
–Time: Equivalent-time A/D using interpolator offset

93 92 Wave capture, Rx eq
[Figure: captured waveform; link configuration: Tx, Rx Eq, Rx]

94 93 Wave capture, Tx eq
[Figure: captured waveform; link configuration: Tx Eq, Tx, Rx]

95 94 Wave capture, Tx+Rx eq
[Figure: captured waveform; link configuration: Tx Eq, Tx, Rx Eq, Rx]

97 96 BER eye diagram
Characterize BER at various sampling points:
–Voltage: Vary comparator offset
–Time: Vary interpolator offset
[Figure: pass/fail map over the sampling points]

98 97 BER eye diagram
[Figure: number of errors at each sampling point]

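99 98 Rx Equalization (CTLE)
Data rate: 17.5Gb/s. Channel: 7” Desktop.
[Figure: measured BER eye; link configuration: Tx, Rx Eq, Rx]

100 99 Tx Equalization (Pre-emphasis)
Data rate: 17.5Gb/s. Channel: 7” Desktop.
[Figure: measured BER eye; link configuration: Tx Eq, Tx, Rx]

101 100 Tx + Rx Equalization
Data rate: 17.5Gb/s. Channel: 7” Desktop.
[Figure: measured BER eye; link configuration: Tx Eq, Tx, Rx Eq, Rx]

102 101 Tx + Rx Equalization, no Rx offset trim
Data rate: 17.5Gb/s. Channel: 7” Desktop.
[Figure: measured BER eye; link configuration: Tx Eq, Tx, Rx Eq, Rx]

103 102 Tx + Rx Eq, 10% Tx PSN @ 200MHz
Data rate: 17.5Gb/s. Channel: 7” Desktop.
[Figure: measured BER eye; link configuration: Tx Eq, Tx, Rx Eq, Rx]

104 103 Tx + Rx Eq, 10% Rx PSN @ 200MHz
Data rate: 17.5Gb/s. Channel: 7” Desktop.
[Figure: measured BER eye; link configuration: Tx Eq, Tx, Rx Eq, Rx]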

105 104 Measurement summary
On-die link measurements close the design loop and enable link self test and adaptation
–Example: BER eye
On-die measurements can add significantly less noise than off-die measurements
–Example: Clock-data jitter measurement
However, calibration of the on-die circuits is still required for absolute accuracy
–Examples: Voltage offsets, phase interpolators
In some cases, such as where averaging is possible, off-die measurements are still very useful.

106 105 Overall summary
Tera-scale many-core μP’s will drive aggregate I/O rates aggressively
–Power budget will constrain link design space
Power efficiency depends strongly on process technology and channel
High-performance and low-power link design requires accurate system level tools
–Tools are in place with areas for improvement
On-die link measurement capabilities close design loop and enable link self-test and adaptation
Acknowledgements: Ganesh Balamurugan, James Jaussi, Joe Kennedy, Mozhgan Mansuri, Randy Mooney, Shekhar Borkar

