Presentation on theme: "6 February 2014 www.c2s2.org Design at the Top of the Semiconductor Foodchain: How Manufacturing Challenges Below 90nm Impact Circuits & Systems Rob A."— Presentation transcript:
Design at the Top of the Semiconductor Foodchain: How Manufacturing Challenges Below 90nm Impact Circuits & Systems. Rob A. Rutenbar, Director, MARCO Center for Circuit & System Solutions; Professor of ECE, Carnegie Mellon.
About This Talk. Some background on the MARCO FCRP program: what all these acronyms are, and why you might want to know about them. A brief look at work in the C2S2 Circuits Center: how mfg challenges at highly scaled nodes percolate up the foodchain, what gives circuit designers nightmares, and what we're doing about it.
C2S2, MARCO, FCRP, etc.: A Little Bit of Background and Context
FCRP: Focus Center Research Program. "The focus center program is designed to create a nationwide multi-university network of research centers that will keep the United States and U.S. semiconductor firms at the front of the global microelectronics revolution." Craig R. Barrett, President and CEO, Intel; (Former) Chair, Semiconductor Technology Council. Vision: national research centers in semiconductor technology; multiple-university teams & large-scale efforts (~$10M/center/year); long-range research horizon; focus on discovery, where evolutionary R&D may not find solutions.
Focus Center Research Program: Timeline. First centers chartered in 1999; five centers in operation today. Total program currently ~$25M/year. [Timeline figure labels: Aug recompete, restart; Systems, Interconnect, Circuits, Devices, Materials]
MARCO: Microelectronics Advanced Research Corp. MARCO coordinates FCRP centers, funding, and industry/govt interfaces. [Organization chart labels: Centers, Funding, MARCO, Governing Council, Management, Universities (~30 schools), US DOD]
FCRP Centers: Designed To Target Entire Semiconductor Foodchain. [Foodchain diagram labels: physics structures materials devices circuits logic / architecture system software application HW/SW integrated products] Center goals: pushing CMOS to its limits, and beyond; containing the growing cost of complexity; driving down cost of design & verification; containing latency & power of interconnect; overcoming the tyranny of kT/q.
C2S2: Center for Circuit & System Solutions. C2S2 core competency: circuits. Technology scaling impacts; analog, digital, RF, MEMS ckts; some photonics, too; assoc. design tools & methodologies. Logistics: CMU is lead school; now 12 universities; ~47** faculty, 67 grad students. [Foodchain levels shown: devices, circuits, logic / architecture]
The C2S2 Research Team. [Panels: Executive team, Research team] Rob Rutenbar (CMU, Director); Bob Brodersen (Berkeley); Mark Horowitz (Stanford); Wen-Mei Hwu (Illinois); Larry Pileggi (CMU); Teresa Meng (Stanford); Charles Sodini (MIT); Art Davidson (CMU, Exec Dir).
C2S2: Doing Circuits in Highly Scaled Technologies. How will we do circuit design with tomorrow's different, difficult devices? Coping with scaling. [Foodchain levels shown: devices, circuits, logic / architecture]
…and, how do we deal with the conscientious objectors…?
C2S2: Doing Reluctant Circuits at Scaled Nodes. How will we approach circuits that don't want to scale? Circuits that prefer (for $$, or for performance) a different technology platform? [Figure: supply voltage (V) vs. technology node (nm), down to 10nm, with digital and analog ranges; source ITRS-03. Foodchain levels shown: devices, circuits, logic / architecture]
About This Talk. Some background on the MARCO FCRP program: what all these acronyms are, and why you might want to know about them. A brief look at work in the C2S2 Circuits Center: how mfg challenges at highly scaled nodes percolate up the foodchain, what gives circuit designers nightmares, and what we're doing about it.
Two Different Scaling-Related Problems. What about delay? Past expectation: the next process node is faster, and we rely on this for new designs. New problems: yes, it's faster… but at chip scale, the size of the logic hurts, and the speed of light is a big limiter. What about mfg variability? Past expectation: the next process node is worse… but we're smart, we'll manage. New problems: it's a lot worse than the last node; we cannot pretend it's deterministic; we cannot just look at a few corners.
View from the Top: Circuits & Systems. [Diagram labels: Fundamental circuits, Basic blocks, Architectures, Materials & structures, Systems, Devices & wires] What's happening with circuits & systems, and with CAD & methodology, to help design in scaled technologies?
Let's Look at Delay … Unfortunate fact: despite a century of physics funding, c has not budged, not even 1 m/s! [Figure label: ~6 ps]
Delay: Two Different Flavors for Wires. Local wires: ~constant complexity, span a constant # of gates; they get shorter with scaling. Global wires: ~constant length; they don't get shorter with scaling (that's why they're global …).
Delay: Global Wire Trends. Optimally buffered global wires that span 5mm (roughly ¼ die) see a 30x-40x delay penalty over nine process generations. Cannot contract global communications (they're global …). Mid-layer metals: 30x delay increase. Upper-layer metals: 40x delay increase. Courtesy Mark Horowitz & Ron Ho, Stanford.
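As a back-of-the-envelope reading of the 30x-40x figure, the sketch below computes the per-generation growth it implies. It assumes the penalty compounds by the same factor at each of the nine generations, which the slide does not state; the function name is illustrative.

```python
# Assumption (not from the slide): the total delay penalty compounds
# geometrically, i.e. total = factor ** generations.

def per_generation_factor(total_penalty: float, generations: int) -> float:
    """Return the constant per-generation multiplier implied by a total penalty."""
    return total_penalty ** (1.0 / generations)

GENERATIONS = 9  # "nine process generations" from the slide

mid_layer = per_generation_factor(30.0, GENERATIONS)    # mid-layer metals, 30x
upper_layer = per_generation_factor(40.0, GENERATIONS)  # upper-layer metals, 40x

print(f"mid-layer:   {mid_layer:.2f}x per generation")
print(f"upper-layer: {upper_layer:.2f}x per generation")
```

Under that assumption, global-wire delay grows by roughly 1.5x at every node, while local wires keep getting shorter.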
Consequence: More MHz Not Necessarily Better Anymore. This is SpecInt/MHz, a measure of CPU performance / clock speed. Curve is flattening; more MHz isn't paying off anymore… Courtesy Mark Horowitz, Stanford.
Consequence: 1 Chip ≠ 1 CPU Going Forward. Recent big Intel news: no more single-core CPUs; two high-profile designs abruptly canceled; future is multiple CPUs on a single chip. "Intel Corp. has cancelled its single-core processor development efforts… and will move to dual-core designs across the mobile, desktop, and server markets…"
Idea: More CPUs Instead of More MHz. Global wires: bad even with buffering or multi-cycle delays. Local wires: still OK, still manageable. Make most of the wires local. Can still clock a small CPU fast. Use parallelism smarter. [Figure labels: CPU, CPU, Memory]
Next Problem: Manufacturing Variability
Manufacturing Variability: A Little History. Historically, how have we coped? By hiding as much as possible: behind logic/memory libraries, behind circuit and shapes rules, behind design methodologies. [Figure labels: Library abstractions; Qualification & characterization; Design rules…]
At Nanoscale: Predictability ~ (Chip Variability)^-1. ASIC library abstraction broken: it doesn't hide the details anymore as we scale below ~65nm. Local printability problems. Global effects. Demise of context-free layout design rules. Correlated random variations hit ckt level. [Figure labels: Cu thickness distrib; Cu thickness histogram]
So, How Do We Cope With Growing Variability? Three broad kinds of solutions. Model it accurately, manage it early, inside CAD tools. Minimize it aggressively, via smarter ASIC chip architectures. Measure it on the fly, calibrate for it, like analog has had to do.
Model It: Pulling Statistics Up into CAD. Statistical interconnect delay analysis: R, L, C parameters are statistical, based on mfg variations in BEOL fab; R, L, C parameters are correlated; correlations are both local and global; want distribution of delay at outputs. Statistical static timing analysis: gate delays are statistical; signal arrival times are statistical; gates and signals are correlated; correlations are both local and global; want distribution of delay at output.
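To make "signal arrival times are statistical" concrete, here is a minimal sketch of one well-known textbook approach: model each arrival time as a Gaussian, add gate delays as Gaussians, and approximate the max of two correlated arrivals with Clark's moment-matching formulas. This is an illustrative choice, not necessarily the method used in the C2S2 work; the class and function names and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    mean: float  # e.g. picoseconds
    var: float   # variance, e.g. ps^2


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def add(a: Gaussian, b: Gaussian, cov: float = 0.0) -> Gaussian:
    """Sum of two (possibly correlated) Gaussians, e.g. arrival + gate delay."""
    return Gaussian(a.mean + b.mean, a.var + b.var + 2.0 * cov)


def stat_max(a: Gaussian, b: Gaussian, rho: float = 0.0) -> Gaussian:
    """Clark's approximation: match mean and variance of max(a, b)."""
    s1, s2 = math.sqrt(a.var), math.sqrt(b.var)
    theta_sq = a.var + b.var - 2.0 * rho * s1 * s2
    if theta_sq <= 1e-18:
        # The two inputs differ only by a constant: the larger mean wins.
        return a if a.mean >= b.mean else b
    theta = math.sqrt(theta_sq)
    alpha = (a.mean - b.mean) / theta
    m1 = a.mean * _cdf(alpha) + b.mean * _cdf(-alpha) + theta * _pdf(alpha)
    m2 = ((a.mean ** 2 + a.var) * _cdf(alpha)
          + (b.mean ** 2 + b.var) * _cdf(-alpha)
          + (a.mean + b.mean) * theta * _pdf(alpha))
    return Gaussian(m1, max(m2 - m1 * m1, 0.0))


# Hypothetical two-input gate: output arrival = max(input arrivals) + gate delay.
in_a = Gaussian(mean=100.0, var=25.0)
in_b = Gaussian(mean=95.0, var=100.0)
gate = Gaussian(mean=20.0, var=4.0)
out = add(stat_max(in_a, in_b, rho=0.5), gate)
print(f"output arrival: mean={out.mean:.2f} ps, sigma={math.sqrt(out.var):.2f} ps")
```

The result of the max is not exactly Gaussian, so this is an approximation; its appeal is that a whole timing graph can be traversed once with distributions instead of many times with sampled scalars.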
Key Ideas: Direct Manipulation of the Statistics. Wrong way: Monte Carlo trials with existing CAD analysis tools; we cannot afford the time to run 1000s of randomly parameterized samples. Right way: pull the statistics up directly into the analysis engines; represent key circuit/interconnect quantities in a statistical form. [Figure labels: 1 2 N / N]
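The contrast between the two approaches can be sketched on a toy problem: the delay-like product tau = R * C of one resistance and one capacitance. The sketch assumes R and C are independent Gaussians (a simplification; the talk stresses that real parameters are correlated), and all numeric values are hypothetical. For independent variables the mean and variance of the product have exact closed forms, so the "direct" route needs no sampling at all.

```python
import math
import random

# Hypothetical parameter statistics (independent Gaussians, an assumption).
MU_R, SIGMA_R = 100.0, 5.0   # ohms
MU_C, SIGMA_C = 2.0, 0.1     # picofarads, so R * C is in picoseconds


def analytic_moments():
    """Exact mean and standard deviation of R * C for independent R and C."""
    mean = MU_R * MU_C
    var = (MU_R * SIGMA_C) ** 2 + (MU_C * SIGMA_R) ** 2 + (SIGMA_R * SIGMA_C) ** 2
    return mean, math.sqrt(var)


def monte_carlo_moments(n: int, seed: int = 0):
    """Estimate the same two moments from n random samples."""
    rng = random.Random(seed)
    samples = [rng.gauss(MU_R, SIGMA_R) * rng.gauss(MU_C, SIGMA_C) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, math.sqrt(var)


a_mean, a_std = analytic_moments()
mc_mean, mc_std = monte_carlo_moments(20_000)
print(f"analytic:    mean={a_mean:.2f} ps, sigma={a_std:.2f} ps")
print(f"Monte Carlo: mean={mc_mean:.2f} ps, sigma={mc_std:.2f} ps")
```

The Monte Carlo estimate of the mean carries a standard error of about sigma / sqrt(N), so each additional digit of accuracy costs roughly 100x more samples, and in a real flow every sample is a full circuit analysis.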
Example: Interval-Valued Interconnect Models. Each parameter is a range, not a scalar, now. 1. Represent all statistical quantities as correlated intervals. 2. Recast the numerical recipes for linear model order reduction to use intervals instead of scalars. 3. Result is interval poles/zeros. 4. Transform back to delay distrib. [Figure labels: zeros, poles; pole/zero complex plane; root loci histograms; delay %]
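One standard way to represent "correlated intervals" is affine arithmetic, where each quantity is a center value plus a weighted sum of shared noise symbols, each ranging over [-1, 1]. The sketch below assumes that representation for illustration only (the slide does not name the formalism), and it covers only step 1 on a toy R * C product, not model order reduction. The `Affine` class, the symbol names, and the numbers are all hypothetical.

```python
from itertools import count

_fresh_ids = count()  # source of unique ids for newly created noise symbols


class Affine:
    """x = center + sum(coef_i * eps_i), with every eps_i in [-1, 1]."""

    def __init__(self, center, terms=None):
        self.center = float(center)
        self.terms = dict(terms or {})  # noise-symbol name -> coefficient

    def radius(self):
        return sum(abs(c) for c in self.terms.values())

    def interval(self):
        r = self.radius()
        return (self.center - r, self.center + r)

    def _combine(self, other, sign):
        terms = dict(self.terms)
        for name, coef in other.terms.items():
            terms[name] = terms.get(name, 0.0) + sign * coef
        return Affine(self.center + sign * other.center, terms)

    def __add__(self, other):
        return self._combine(other, +1.0)

    def __sub__(self, other):
        return self._combine(other, -1.0)

    def __mul__(self, other):
        # First-order terms keep their shared symbols (and so their correlation).
        terms = {}
        for name in set(self.terms) | set(other.terms):
            terms[name] = (self.center * other.terms.get(name, 0.0)
                           + other.center * self.terms.get(name, 0.0))
        # The second-order remainder is bounded conservatively by a fresh symbol.
        terms[f"nl{next(_fresh_ids)}"] = self.radius() * other.radius()
        return Affine(self.center * other.center, terms)


# R and C share a "global" variation symbol and each has its own local one.
R = Affine(100.0, {"global": 5.0, "local_R": 2.0})   # ohms
C = Affine(2.0, {"global": 0.1, "local_C": 0.05})    # picofarads
tau = R * C                                          # picoseconds

print("tau range:", tau.interval())
print("R - R range:", (R - R).interval())
```

Because the shared symbols cancel, `R - R` collapses to exactly zero, whereas plain interval arithmetic on [93, 107] would return [-14, 14]. That retained correlation is what keeps the intervals from blowing up through a long chain of numerical operations.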
Ex: 123-elem RLC Wire, 5%-Global 30%-Local Variation. Two plots, each for an 8th-order reduction with 4 dominant poles: Monte Carlo results and interval-valued predictions. Plots show perspective view of complex plane (bottom & right axes) with pole histograms shown shaded blue (left axis, 10,000 interconnect samples). Courtesy James D. Ma, CMU.
Delay PDF and CDF of the Same Example. Very early research result, but accuracy is promising, and speedup is currently ~20X over simplistic Monte Carlo. [Plots: PDF of Interconnect Delay; CDF of Interconnect Delay; curves: Full Monte Carlo, Interval-valued predictor] Courtesy James D. Ma, CMU.
Minimize It Attack: Make Variability Small … Starting from basic mfg processes, from shapes-level layout, thru circuits, thru logic, thru interconnect arch: extremely regular. Tries to minimize impact of short, medium, long range mfg variations. Of course, it also breaks all our design tools and flows… [Figure labels: Yesterday's designs; A regular fabric for tomorrow; Or, maybe this pattern…?]
Example: CMU VPGA Architecture. VPGA: Via Patterned Gate Array. Uses only 4 masks to define total application-specific interconnect. Logic tiles, and the interconnect, are totally regularized chip-wide. Like gate array but better, and informed by ~20 years of FPGAs. Example: replace FPGA switchblock of devices with 8 mask-config vias. Goal: minimize variations (e.g., CMP) at all length scales on chip; make logic and interconnect very predictable for designers. [Figure: Cu thickness distribution]
Example: Manufacturability of VPGA BEOL. Reduced CMP effects: copper dishing < 40Å; post-CMP copper thickness variation is less than 2-3%. Highly promising as a manufacturable ASIC replacement structure. [Plots: M4 Density of CMU VPGA FPU; Plated Thickness (M4); Cu Dishing (M4); Oxide Erosion (M4); Final Post-CMP Cu Thickness (M4)] Courtesy Duane Boning (MIT) & Larry Pileggi (CMU).
Measure It: Circuits to Measure & Adapt. With scaling, not only are transistors getting worse, … but the neighborhoods they live in are getting noisier. What we worry about: behavior of chip-scale interconnects like clock and power distribution; ability to predict worst-case behavior for robust design; ability to understand data-driven noise problems, and to reduce them. Big idea: some things you can just design for up front; increasingly, we may need to add circuits that measure & adapt on the fly.
Ex: Supply Noise Measurement Circuits. To measure autocorrelation, just need 2 samplers with fine timing control. Sampling switches are the only component required to have high bandwidth. High-resolution, on-chip ADCs minimize additional noise and allow measurement circuits to hook up to the scan chain. VCO-based ADC: VCO acts as a V-to-f converter; clock edge count gives a digital estimate of f. Averaging improves noise tolerance. Calibration relaxes linearity and offset requirements.
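The VCO-based ADC idea can be illustrated with a toy behavioral model: the sampled voltage sets an oscillator frequency, edges are counted over a fixed window, and the count is mapped back to volts. Everything here is an assumption for illustration: the linear V-to-f law, the constants, and the way averaging is modeled (windows whose starting phase is staggered evenly across one VCO cycle, so only quantization error is shown, not circuit noise).

```python
import math

# Hypothetical VCO and counter parameters (not from the talk).
F0_HZ = 1.0e9        # free-running frequency at 0 V
K_HZ_PER_V = 2.0e9   # VCO gain
WINDOW_S = 100e-9    # counting window


def vco_frequency(v: float) -> float:
    """Ideal linear voltage-to-frequency law (an assumption)."""
    return F0_HZ + K_HZ_PER_V * v


def edge_count(v: float, phase: float = 0.0) -> int:
    """Whole VCO cycles seen in one window; phase in [0, 1) is the starting fraction of a cycle."""
    return math.floor(vco_frequency(v) * WINDOW_S + phase)


def count_to_voltage(count: float) -> float:
    """Invert the V-to-f law using the frequency estimate count / window."""
    return (count / WINDOW_S - F0_HZ) / K_HZ_PER_V


def measure(v: float, windows: int = 1) -> float:
    """Average the counts of several windows with evenly staggered start phases."""
    counts = [edge_count(v, phase=k / windows) for k in range(windows)]
    return count_to_voltage(sum(counts) / windows)


v_true = 1.2037                              # volts, hypothetical supply sample
lsb_v = 1.0 / (K_HZ_PER_V * WINDOW_S)        # voltage step of a single count
print(f"one count corresponds to {lsb_v * 1e3:.2f} mV")
print(f"single window error: {abs(measure(v_true) - v_true) * 1e3:.3f} mV")
print(f"16-window error:     {abs(measure(v_true, 16) - v_true) * 1e3:.3f} mV")
```

A longer window or more averaged windows buys resolution at the cost of measurement time. In this model the mapping from count to volts is known exactly; on silicon, F0 and the gain would have to come from calibration, which is why the slide notes that calibration relaxes the linearity and offset requirements.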
Result: 10Gb/s Rambus Link Measurement. Rambus 0.13µm design; demonstration of concept. Noise floor < 300 µV rms. Measured Vdd and Vdd Analog. Measurements verify cyclostationarity: 1GHz noise at t2, but not at t1! Link runs at 1GHz for this data rate; high link activity at t2, relatively quiet at t1. Noise injected from ASIC core. [Plots: PSD(t1), PSD(t2)] This is still 130nm, but we think the idea holds as we scale aggressively. We also put some of these ckts on a next-gen Itanium chip. (Courtesy E. Alon, V. Stojanovic, Mark Horowitz, Stanford)
Analog Too: Calibrate & Adapt (…or Die). Example: massively parallel ADCs. Thousands of small ADCs; DSP combines data adaptively; ignore (give low weight to) faulty ckts. Idea: massive time-interleaving. Relaxes speed reqt of each path; allows device bias in the optimum power efficiency/gain region, at the knee of weak inversion. In design: 12b 600 MS/s self-calibrated ADC in 0.18µm, 128 channels, 100mW. Next: 8b 20GS/s with 1000 channels. Looks promising as a scalable ADC architecture for below 90nm. (Courtesy H-S Lee, MIT)
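The way time-interleaving relaxes the per-path speed requirement is simple arithmetic: each channel only has to convert every N-th sample. The sketch below works the two design points quoted on the slide and shows the round-robin split and merge of a sample stream. It assumes ideal, perfectly matched channels and round-robin ordering; the function names are illustrative, and the adaptive DSP weighting of faulty channels is not modeled.

```python
def per_channel_rate(aggregate_sps: float, channels: int) -> float:
    """Sample rate each sub-ADC must sustain in an ideal interleaved array."""
    return aggregate_sps / channels


def deinterleave(samples, channels):
    """Round-robin split: sample k goes to channel k % channels."""
    return [samples[c::channels] for c in range(channels)]


def interleave(streams):
    """Merge per-channel streams back into the original time order."""
    n = len(streams)
    out = [None] * sum(len(s) for s in streams)
    for c, stream in enumerate(streams):
        out[c::n] = stream
    return out


# Design points quoted on the slide.
rate_now = per_channel_rate(600e6, 128)    # 12b, 600 MS/s, 128 channels
rate_next = per_channel_rate(20e9, 1000)   # 8b, 20 GS/s, 1000 channels
print(f"current design: {rate_now / 1e6:.4f} MS/s per channel")
print(f"next design:    {rate_next / 1e6:.1f} MS/s per channel")

# Round trip on a short hypothetical sample stream.
stream = list(range(10))
assert interleave(deinterleave(stream, 4)) == stream
```

Each path therefore runs at a few MS/s to a few tens of MS/s even though the array delivers hundreds of MS/s or more, which is what allows the slow, power-efficient device biasing the slide describes.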
Summary: FCRP Innovating Across Whole Foodchain. Exotic interconnect; novel devices; radical architectures; radical new ckts / tools.
A Lot More Work To Do At Top of Foodchain (no shortage of problems, even up here, in the clouds). New system architectures: for CPUs and for ASICs, to overcome wire delay & mfg variation limitations. New circuits & design methodologies: CAD tools that understand and optimize statistical models of interconnect; circuits that measure interconnect problems and try to adapt to them.
Acknowledgements. Many participants in the MARCO Focus Center for Circuit & System Solutions (C2S2) provided material for this talk; I want to acknowledge them here. More info on all these projects at (URL not preserved). Carnegie Mellon: Prof. Larry Pileggi, Prof. Andrzej Strojwas, James D. Ma. MIT: Prof. Duane Boning, Prof. H.-S. Harry Lee. Stanford: Prof. Mark Horowitz, E. Alon, Ron Ho, V. Stojanovic.