ECE 679: Digital Systems Engineering. Patrick Chiang. Office Hours: 1-2PM Mon-Thurs, GLSN 100
Class Introductions: Who am I? Who are you?
Class Basics: 4 Homeworks (20%) (groups of 2); Midterm (40%); Final Project (40%): 4-page IEEE report and 10-minute presentation (groups of 2). Guest lecture (Dr. Frank O’Mahony), Intel Research Labs (May 4th). Intel Field Trip (June 7th), TBD: presentations of the 1-2 best project reports
Class Homework: Skim Dally/Poulton “Digital Systems Engineering”, Chapter 3. Skim overview paper: http://mos.stanford.edu/papers/mh_micro_98.pdf Homework includes running Stat Eye (www.stateye.org) with Oregon State Matlab (eecs.oregonstate.edu/it). Problem Set #1: rlc files in ~pchiang/hspice (rlc_spice_deck; rlc.rlc); Spice models in ~pchiang/hspice/process_files/ (130nm to 22nm); Simulator lang = spice; Spectre models: DEFINE gpdk090 /nfs/guille/analog/c/cdsmgr/process/gpdk090_v3.8/libs.cdb/gpdk090
What does this mean for analog designers? Ever build an ADC? Ever wonder what to do with the digital bits? 8-16 bits @ 100MHz, 200MHz, 400MHz; goes to vector analyzer. Why does this clock rate not increase? What really is this output doing? Where is it going? (Figure labels: Analog; Fs = 600MHz)
Brief Summary: Introduction to the area. Why serial links are important. What are the current technology trends/limitations?
4Gb/s Low Power, Area Efficient Serial Links. High-speed I/Os: interconnection between different chips (figure: IBM Processor, CPU, Memory, from/to other subsystems, e.g. backplane). Transmitter Equalization; Receiver Offset Cancellation. Signal path: Transmitter Output, Router Backplane (1m, FR4), Receiver Input. Eye diagrams: 4Gb/s Transmitter Output; 4Gb/s Transmitter Output, 1m; 4Gb/s Transmitter Output, Equalized. Test chips: 2000 0.25um Testchip; 2001 0.25um Testchip. References: Ming-Ju E. Lee, William J. Dally, John W. Poulton, Patrick Chiang, Stephen F. Greenwood. An 84-mW 4Gb/s Clock and Data Recovery Circuit for Serial Link Applications. VLSI Circuits Symposium, Kyoto, Japan, June 2001, pp. 149-152. Ming-Ju E. Lee, William Dally, Patrick Chiang. Low-Power Area-Efficient High-Speed I/O Circuit Techniques. IEEE Journal of Solid-State Circuits, November 2000, Vol. 35, No. 11, pp. 1591-1599.
Scaling Serial Links: From 4Gb/s to 20Gb/s. Thesis: develop a 20Gb/s serial link. Area: 500um x 500um. Power: 200mW/link. 1 bit time = 1 FO4. Focus on timing uncertainty, not channel (independent vector). Timing uncertainty becomes the KEY issue. Eye diagrams (voltage vs. time): 4Gb/s, 250ps bit time; 20Gb/s, 50ps bit time
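The 250ps and 50ps bit times on the eye diagrams are the reciprocal of the data rate. A minimal Python sketch of that arithmetic follows. The function names and the 10ps timing-uncertainty value are illustrative assumptions, not figures from the slides.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Bit time (one unit interval) in picoseconds for a data rate in Gb/s."""
    # 1 / (rate in Gb/s) is the bit time in ns; multiply by 1000 for ps.
    return 1000.0 / data_rate_gbps


def uncertainty_fraction_of_ui(uncertainty_ps: float, data_rate_gbps: float) -> float:
    """Timing uncertainty expressed as a fraction of one bit time."""
    return uncertainty_ps / unit_interval_ps(data_rate_gbps)


if __name__ == "__main__":
    ASSUMED_UNCERTAINTY_PS = 10.0  # hypothetical value, for illustration only
    for rate in (4.0, 20.0):
        ui = unit_interval_ps(rate)
        frac = uncertainty_fraction_of_ui(ASSUMED_UNCERTAINTY_PS, rate)
        print(f"{rate:g} Gb/s: bit time {ui:.0f} ps, "
              f"{ASSUMED_UNCERTAINTY_PS:g} ps is {frac:.0%} of the bit time")
```

The same absolute timing uncertainty takes five times as much of the eye at 20Gb/s as at 4Gb/s, which is why timing becomes the key issue as the link scales.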
Transmitter Block Diagram: no post-PLL clock buffers. Dotted lines around the different circuit components (PLL, muxing, etc.). Clocks are differential clocks.
Test Chip: UMC 1.2V, 0.13um CMOS (single Vt). Die size 700um x 1.15mm. 50 Ohm pad termination using wafer probes. Die photo labels: Test Interface 10GHz PLL PRBS Check Test Structures Phase Interpolators RX DLL TX Clock Recovery Transmitter Muxing PRBS Gen (dimensions 700um, 1.1mm). Notes: Our test chip was fabricated in National Semiconductor’s quarter-micron CMOS technology. The die is 2.6mm by 1.4mm and uses a 52-pin impedance-controlled package donated by Vitesse Corporation. The active area of the transceiver circuits is 0.31mm^2.
PLL Measurements: jitter limited by 1.25GHz input reference clock (HP 8133A input clock: 1.2ps RMS, 8.9ps pk-pk). Open-loop VCO phase noise @ 1MHz: -97dBc/Hz; 10GHz jitter (RMS): 0.97ps; 10GHz jitter (pk-pk): 8.0ps; PLL power: 38.6mW; VCO power: 6mW; tuning range: 1.14-1.31. Figure panels: Power Spectrum; Q=10 Jitter; Q=5 Jitter
Eye Diagram: data rate = 19.2Gb/s; jitter 2.2ps RMS, 15.6ps pk-pk. Voltage ripple caused by lack of current source at differential pair tail node. Notes: Seen here are the phase step values across the entire range. The average phase resolution should be 15.6ps, so the interpolation steps shown are very accurate. Note that every 9th phase has a phase interpolation step lower than the average of 15.6ps, which is expected, since these are the redundant steps. You can also see that not every 9th phase value is consistently small: for example, phases 18 and 36 don’t show as small a phase step as phases 9 and 27. This is due to a layout error: asymmetric clock loading causes different capacitive coupling for different transitions (different phase differences due to different delay amounts in the DLL itself).
High Speed Transmitter Comparisons: “A 250mW Full-Rate 10Gb/s Transceiver Core in 90nm CMOS using a Tri-State Binary PD with 100ps Gated Digital Output”, T. Masuda et al., ISSCC 2007. A full-rate 10Gb/s transceiver core employing a tri-state binary PD with 100ps gated digital output is implemented in a 90nm CMOS process. Direct drive from the VCO is utilized to eliminate the 10GHz clock buffer current. The RX exhibits a recovered jitter of 906fs (rms) and an input sensitivity of 5.9mV. The TX generates a jitter of 5mUI (rms). The chip consumes 250mW.
Conventional Serial Link Receivers. Block diagram: 20Gb/s data in, pre-amp, multiphase PLL; clocks ck[0] to ck[3]; data outputs D[0] to D[3]. Conventional architectures also use a multi-phase PLL, so we have the same problems at the receiver: static phase offset and power supply sensitivity.
2nd Generation Transmitter: Equalizing Path. 2-tap equalizer implemented to compensate for channel losses. Achieves 50ps analog delay with CML buffers (analog delay, but replica bias…)
Fabrication: Test Chip. ST Microelectronics 0.13um test chip (2006); 307mW / transceiver; 0.46mm^2; 20mV input sensitivity. Transmitter: 450um x 350um. Receiver: 500um x 600um. First 0.13um
Results (all results single-ended). 20Gb/s, ideal channel: 80mV, 43ps. 20Gb/s, -6.5dB @ 10GHz channel: 33mV, 37ps
Results (cont’d). 20Gb/s, ideal channel with α=0.37: 72mV, 36.4ps. 20Gb/s, -6.5dB @ 10GHz channel with α=0.37: 62mV, 35ps
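The α=0.37 label is presumably the weight of the second tap of the 2-tap transmit equalizer. As a rough illustration of what such an equalizer does, the sketch below applies a generic de-emphasis filter, y[n] = x[n] - α·x[n-1], to a ±1 bit stream. The filter form, the lack of output normalization, the idle-line start condition, and the function name are all assumptions for illustration; the slides don't give the exact tap definition used on the test chip.

```python
from typing import List


def two_tap_equalize(bits: List[int], alpha: float) -> List[float]:
    """Apply y[n] = x[n] - alpha * x[n-1] to bits mapped to +1/-1 symbols.

    Assumed generic 2-tap FIR form, not the test chip's documented circuit.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out: List[float] = []
    prev = 0.0  # assumption: no signal on the line before the first bit
    for s in symbols:
        out.append(s - alpha * prev)
        prev = s
    return out


if __name__ == "__main__":
    pattern = [0, 0, 1, 1, 1, 0]
    for bit, level in zip(pattern, two_tap_equalize(pattern, alpha=0.37)):
        print(bit, f"{level:+.2f}")
```

With this form, a bit that follows a transition is driven at a larger amplitude (1 + α) and a repeated bit at a smaller one (1 - α). The extra high-frequency drive is what offsets a low-pass channel such as the -6.5dB @ 10GHz case.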
Rationale for Multi-cores. Next generation computing: multi-core processing, i.e. multiple, parallel DSPs (i.e. MACs). Why can’t we achieve faster frequencies? Wire delays don’t scale like transistors. Power increases exponentially (when pushing process technology). Timing margins are degraded by variability, power supply noise, and digital crosstalk. NOTE: more independent threads require more memory bandwidth. (Intel, 80 Cores, ISSCC 2007)
Research: Explore Parallel Serial Links. Serial links exhibit the same characteristics: channel losses get worse; power consumption increases significantly with bandwidth; timing precision is limited by static phase offset (process variation), power-supply induced jitter, and interchannel crosstalk. Serial links also need to push for high amounts of parallelism. How is this different from conventional link design? Channel equalization becomes more difficult; adjacent channel crosstalk; difficult channel estimation problem (power, flexibility, data-rate, equalizer design, channel, distance). Amortize clock power over multiple links. Distributed resonant clocking of analog/mixed-signal front-ends
Problem of IO: 2500 pins / 2 = 1250 differential pairs (about 1200). Assume 10Gb/s per link: about 12 Tb/s of bandwidth. At 100mW per 10Gb/s link (i.e. 10mW per Gb/s of bandwidth): about 120W of I/O power
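The pin-budget estimate can be reproduced with a few lines of Python. This is a sketch under stated assumptions: two pins per differential link, every pin used for signaling, and an energy cost expressed in mW per Gb/s. The function and parameter names are illustrative.

```python
from typing import Tuple


def io_budget(total_pins: int,
              rate_gbps_per_link: float,
              mw_per_gbps: float,
              pins_per_link: int = 2) -> Tuple[int, float, float]:
    """Return (links, aggregate bandwidth in Tb/s, total I/O power in W)."""
    links = total_pins // pins_per_link  # assumes every pin carries signal
    bandwidth_gbps = links * rate_gbps_per_link
    power_w = bandwidth_gbps * mw_per_gbps / 1000.0  # mW -> W
    return links, bandwidth_gbps / 1000.0, power_w


if __name__ == "__main__":
    # 10 mW per Gb/s is the same as 100 mW per 10 Gb/s link.
    links, tbps, watts = io_budget(2500, rate_gbps_per_link=10.0, mw_per_gbps=10.0)
    print(f"{links} links, {tbps:.1f} Tb/s, {watts:.0f} W")
```

With 2500 pins the exact count is 1250 links, which the slide rounds to about 1200. At 10mW per Gb/s the total comes to 125W, in line with the roughly 120W on the slide. A literal 100mW per Gb/s would give about 1.25kW instead, so the 120W figure implies 100mW per link.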
Stateye: Playing, Fun with Stat-Eye. Homework examples: 5Gb/s -> 10Gb/s; worse channels; worse timing jitter
Next Time: Telegrapher’s equation; channel models; reflection coefficients; skin effect; dielectric constant; vias