High-Speed and Low-Power On-Chip Global Link Using Continuous-Time Linear Equalizer
Yulei Zhang (1), James F. Buckwalter (1), and Chung-Kuan Cheng (2)
(1) Dept. of ECE, (2) Dept. of CSE, UC San Diego, La Jolla, CA
19th Conference on Electrical Performance of Electronic Packaging and Systems, Oct. 25, 2010, Austin, USA
Outline
- Introduction
- Equalized On-Chip Global Link
  - Overall structure
  - Basic working principle
- Driver Design for On-Chip Transmission Line
  - Guideline for tapered CML driver
  - Driver design example
- Continuous-Time Linear Equalizer (CTLE) Design
  - CTLE modeling
  - CTLE design example
- Driver-Receiver Co-Design for Low Energy per Bit
  - Methodology
  - Overall link design example
- Conclusion
Research Motivation
- Global interconnect planning becomes a challenge in ultra-deep sub-micron (UDSM) processes
  - Performance gap between global wires and logic gates
  - Conventional buffer insertion incurs a large extra power overhead
- Uninterrupted wire configurations are used to tackle on-chip global communication
  - On-chip T-lines to reduce interconnect power
  - Equalization to improve bandwidth
- State of the art [Kim2009]: 2 Gb/s/um, < 1 pJ/b signaling over a 10 mm global wire in 90 nm CMOS
Our Contributions
- Contributions
  - A novel equalized on-chip T-line structure for global communication: tapered CML driver + CTLE receiver
  - Accurate small-signal modeling of the CTLE receiver to improve optimization quality
  - A design methodology for driver-wire-receiver co-optimization that reduces the total energy per bit
- Results of our design (45 nm)
  - 20 Gb/s signaling over a 10 mm, 2.2 um-pitch on-chip T-line
  - 11 ps/mm latency and 0.2 pJ/b energy per bit
Equalized On-Chip Global Link: Overall Structure
- Tapered current-mode logic (CML) driver
- Terminated differential on-chip T-line
- Continuous-time linear equalizer (CTLE) receiver
- Sense-amplifier-based latch
Basic Working Principle
- Tapered CML driver
  - Provides low-swing differential signals to drive the T-line
  - Design parameters: taper factor u, number of stages N, fan-out X, final-stage tail current ISS, driver output resistance RS
- T-line
  - Differential wire with power/ground (P/G) shielding
  - Design parameters: geometry (width, pitch) and termination resistance RT
- CTLE receiver
  - Recovers the signal and improves eye quality
  - Design parameters: load resistance RL, source-degeneration resistance RD and capacitance CD, overdrive voltage Vod
- Sense-amplifier-based latch
  - Synchronizes the signal and converts it back to digital levels
Tapered CML Driver Design
- Output swing constraint
- Design guideline [Tsuchiya2006, Heydari2004]
  - Begin from the final stage: for a given output swing VSW, optimize the output resistance RS together with RT to increase the eye opening
  - Transistor size: taper factor u = 2.7 for delay reduction
  - Number of stages: each preceding stage is designed backward by scaling with the factor u
- Design variables: output resistance RS, tail current ISS, transistor width W
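The backward sizing rule above can be sketched as a short helper. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `taper_cml_chain` and the limit `i_first_max` (the largest tail current the first stage may draw) are hypothetical, and scaling transistor width together with tail current is an assumption. The default u = 2.7 is the slide's value, close to e, the textbook delay-optimal taper for a buffer chain when self-loading is ignored.

```python
import math


def taper_cml_chain(iss_final, w_final, i_first_max, u=2.7):
    """Size a tapered CML driver chain backward from the final stage.

    Hypothetical helper (not from the slides): every earlier stage is the
    next stage scaled down by the taper factor u, tail current and transistor
    width together, and the first stage may draw at most i_first_max.
    Returns [(tail_current, width), ...] from the first stage to the last.
    """
    ratio = iss_final / i_first_max
    # Extra stages needed so that iss_final / u**(N - 1) <= i_first_max
    extra = math.ceil(math.log(ratio) / math.log(u)) if ratio > 1 else 0
    n_stages = 1 + extra
    chain = []
    for k in range(n_stages):
        scale = u ** (n_stages - 1 - k)  # the first stage is scaled down the most
        chain.append((iss_final / scale, w_final / scale))
    return chain


# Example with placeholder values (mA and um are arbitrary unit choices here)
for i_tail, width in taper_cml_chain(iss_final=2.0, w_final=20.0, i_first_max=0.3):
    print(f"I_tail = {i_tail:.3f} mA, W = {width:.2f} um")
```

Because ISS and W of the final stage are fixed first (together with RS and RT for the target swing), the stage count falls out of the taper factor rather than being chosen independently.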
CML Driver Study with Loaded T-line
- Assumes a 45 nm 1P11M CMOS process
- T-line built on M9 with M1 as the reference; T = 1.2 um, H = 3.5 um (fixed)
- Width W and spacing S are optimized for eye opening
Figure: change of the eye opening with width for a fixed 2 um pitch
Figure: change of the eye opening with pitch for equal width/spacing
CML Driver Design Example
- Experimental observations
  - The optimal eye occurs when width = spacing
  - The eye opening improves with larger pitch
- Design methodology
  - Choose the minimum pitch that satisfies the wire-end eye-opening requirement
- Design example
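The pitch-selection rule can be written as a one-pass filter over a pitch sweep. This is a sketch only: `choose_min_pitch` is a hypothetical name, the (pitch, eye) pairs are assumed to come from the designer's own SPICE sweeps at width = spacing, and taking pitch = width + spacing is an assumption about how pitch is defined.

```python
def choose_min_pitch(sweep, eye_required_mv):
    """Return the smallest pitch whose wire-end eye meets the requirement.

    sweep: iterable of (pitch_um, eye_mv) pairs from the designer's own
    simulations with width = spacing. Assumes pitch = width + spacing, so
    width = spacing = pitch / 2. Returns (pitch, width, spacing) in um,
    or None if no simulated pitch qualifies.
    """
    feasible = [pitch for pitch, eye in sweep if eye >= eye_required_mv]
    if not feasible:
        return None
    pitch = min(feasible)
    return pitch, pitch / 2, pitch / 2
```

Returning None rather than the widest simulated pitch makes an unmet requirement explicit, so the sweep range can be extended instead of silently accepting a failing geometry.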
Accurate CTLE Modeling
- Design variables: RL, RD, CD, Vod (set through transistor size) [Hanumolu2005]
- Small-signal circuit used to derive H(s)
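For comparison with the accurate model, a common textbook form of the source-degenerated CTLE transfer function, which ignores rds and parasitics, is H(s) = gm*RL*(1 + s*RD*CD) / ((1 + gm*RD/2 + s*RD*CD)(1 + s*RL*CL)). The sketch below evaluates that simple form. It is not the proposed accurate model. The function names, the transconductance gm (standing in for the Vod and size dependence), and the output load capacitance CL are assumptions introduced here, and RD and CD are assumed to sit between the two source nodes of the differential pair.

```python
import math


def ctle_simple(f_hz, gm, rl, cl, rd, cd):
    """Textbook source-degenerated CTLE transfer function (no rds, no parasitics).

    H(s) = gm*RL*(1 + s*RD*CD) / ((1 + gm*RD/2 + s*RD*CD) * (1 + s*RL*CL))
    Assumes RD and CD sit between the two source nodes of the differential
    pair; CL (output load capacitance) is a hypothetical extra parameter.
    """
    s = 2j * math.pi * f_hz
    num = gm * rl * (1 + s * rd * cd)
    den = (1 + gm * rd / 2 + s * rd * cd) * (1 + s * rl * cl)
    return num / den


def ctle_summary(gm, rl, cl, rd, cd):
    """Zero, poles, DC gain, and peaking ratio of the simple model.

    peaking = ideal high-frequency gain gm*RL divided by the DC gain; the
    actual peak approaches it only when f_pole1 << f_pole2.
    """
    peaking = 1 + gm * rd / 2
    return {
        "f_zero_hz": 1 / (2 * math.pi * rd * cd),
        "f_pole1_hz": peaking / (2 * math.pi * rd * cd),
        "f_pole2_hz": 1 / (2 * math.pi * rl * cl),
        "dc_gain": gm * rl / peaking,
        "peaking": peaking,
    }
```

The zero at 1/(2*pi*RD*CD) and the first pole, higher by the factor 1 + gm*RD/2, set the equalization boost; the accurate model adds the rds and parasitic terms that this form leaves out.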
CTLE Modeling Validation
- < 10% correlation error
- > 20% eye-opening increase
- Test case: 10 mm wire, 16 mV eye at the wire end
Figure legend:
- Blue lines: simple model, ignoring rds and parasitics
- Red line: model including rds only
- Black line: proposed accurate model
CTLE Design Example
- Observations from the CTLE study
  - The eye opening improves as the power constraint is relaxed, but tends to saturate
- Design example
  - Based on the pre-optimized CML driver + T-line design
  - Eye opening improved by 4X after the CTLE
Driver-Receiver Co-Design Methodology
- Optimize driver, wire, and receiver together, using Veye/Power as the cost function
- Use the pre-designed CML driver, T-line, and CTLE as the initial solution
- Optimization flow
  - Driver-to-receiver step-response generation based on SPICE simulation and CTLE modeling
  - Eye-opening estimation based on the step response
  - SQP-based nonlinear optimization over the variables [ISS, RT, RL, RD, CD, Vod]
- Performance comparison
  - Option A: driver/receiver independent design
  - Option B: low-power driver/receiver co-design
Low Energy-per-Bit Optimization Flow
1. Initial solution: pre-designed CML driver and pre-designed CTLE receiver
2. Driver-receiver co-design: change the variables [ISS, RT, RL, RD, CD, Vod]
3. Co-design cost-function estimation (cost function Veye/Power):
   - SPICE-generated T-line step response
   - Receiver step response using CTLE modeling
   - Step-response-based eye estimation
4. Internal SQP (sequential quadratic programming) routine to generate the best solution
5. Output: the best set of design variables in terms of overall energy per bit
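The eye-estimation and cost steps of this flow can be sketched with peak-distortion analysis, one common way to bound the eye from a step response; the slides do not state that this exact method was used. The names `pulse_from_step`, `eye_height`, and `co_design_cost` are hypothetical, the step response is assumed to be sampled uniformly with an integer number of samples per unit interval (UI), and it is assumed to be measured from its pre-transition level so it is zero before the edge.

```python
def pulse_from_step(step, samples_per_ui):
    """Single-bit pulse response: p[n] = s[n] - s[n - samples_per_ui]."""
    return [
        step[n] - (step[n - samples_per_ui] if n >= samples_per_ui else 0.0)
        for n in range(len(step))
    ]


def eye_height(step, samples_per_ui):
    """Worst-case (peak-distortion) vertical eye opening from a step response.

    `step` is the sampled response to one full-swing transition, measured
    from its pre-transition level. For each candidate sampling instant m,
    the main cursor is p[m] and the ISI is every other UI-spaced sample of p;
    the worst-case eye is p[m] minus the sum of |ISI|. Returns the best
    value over all m.
    """
    p = pulse_from_step(step, samples_per_ui)
    best = float("-inf")
    for m in range(len(p)):
        cursors = p[m % samples_per_ui::samples_per_ui]
        isi = sum(abs(c) for c in cursors) - abs(p[m])
        best = max(best, p[m] - isi)
    return best


def co_design_cost(step, samples_per_ui, power_w):
    """Veye/Power figure of merit (larger is better)."""
    return eye_height(step, samples_per_ui) / power_w
```

The outer SQP loop is not shown. In Python, SciPy's `scipy.optimize.minimize(..., method="SLSQP")` is one available SQP-type solver; it minimizes, so it would be given the negative of `co_design_cost` as a function of [ISS, RT, RL, RD, CD, Vod].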
Simulated Eye Diagrams
Figure: Methodology A, driver/receiver separate design
Figure: Methodology B, driver/receiver co-design for low power
Summary of Performance Comparison
(A: driver/receiver separate design; B: driver/receiver co-design for low power)

Parameter                  Methodology A   Methodology B
RS (ohm)                   47              148
RT (ohm)                   94              1100
RL (ohm)                   440             890
RD (ohm)                   110             1430
CD (fF)                    680             150
Vod (mV)                   60              58
Eye opening at CTLE (mV)   91              113
Power consumption (mW)     8.1             3.8

Note: the co-design methodology uses much larger driver and termination resistances to reduce power, which closes the eye at the driver output and at the wire end. The final eye is recovered by fully utilizing the CTLE.
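Energy per bit follows directly from the power figures, since mW divided by Gb/s gives pJ/b. A minimal sketch, assuming the tabulated power is the total link power at the 20 Gb/s data rate quoted for this design; `energy_per_bit_pj` is a hypothetical helper name.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ/b: (1e-3 J/s) / (1e9 b/s) = 1e-12 J/b."""
    return power_mw / rate_gbps


RATE_GBPS = 20.0  # assumed: the link data rate quoted for this design
for name, power_mw in [("Methodology A", 8.1), ("Methodology B", 3.8)]:
    print(f"{name}: {energy_per_bit_pj(power_mw, RATE_GBPS):.3f} pJ/b")
```

Under that assumption, Methodology B works out to 0.19 pJ/b, consistent with the 0.2 pJ/b reported for the design, versus about 0.41 pJ/b for Methodology A.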
Conclusion
- We propose a novel equalized on-chip global link using a CML driver and a CTLE receiver
- An accurate CTLE model achieves < 10% correlation error and improves the quality of eye-opening optimization
- Our design achieves 20 Gb/s signaling over a 10 mm, 2.2 um-pitch on-chip T-line with 11 ps/mm latency and 0.2 pJ/b energy per bit
Thank You! Q & A