A High-Speed & High-Capacity Single-Chip Copper Crossbar John Damiano, Bruce Duewer, Alan Glaser, Toby Schaffer, John Wilson, and Paul Franzon North Carolina.

Presentation transcript:

A High-Speed & High-Capacity Single-Chip Copper Crossbar
John Damiano, Bruce Duewer, Alan Glaser, Toby Schaffer, John Wilson, and Paul Franzon
North Carolina State University

A crossbar consists of numerous input and output lines and, once programmed, provides arbitrary and simultaneous connection of any input to any output. The crossbar is an essential part of many circuits that require multi-channel signal switching, such as ATM switches, specialized VLIW video signal processors, and many DSPs.

Why a Crossbar for the Copper Challenge?

The need for high-speed switching technology is growing as designs grow faster, especially with the advent of SOC technology. Crossbar circuits inherently contain long, heavily loaded interconnects and are therefore representative of a family of designs that includes SRAM, DRAM, and logic cache memory. The crossbar is simple and efficient enough to demonstrate directly the advantages offered by advanced interconnect.

The crossbar circuit demonstrates that copper interconnect provides strong performance enhancements in a state-of-the-art circuit, while design trade-offs make copper technology attractive for embedded applications. Speed and latency are improved through the use of copper interconnect. We have also demonstrated that copper interconnect allows the use of a smaller crossbar cell, offering higher performance with a substantially smaller die size. High performance and smaller die size make copper technology particularly attractive for future design solutions.

Full Report available on the web at

Features of the Copper Crossbar

Efficient programming - fully programmable using the input and write enable lines, with programming performed column by column. The crossbar design is non-blocking, i.e. any input can be sent to any output, and the crossbar can operate in broadcast mode.

Reset - instantly writes a "0" to all cells and clears all outputs. All output lines remain low until programmed. Reset is performed prior to re-programming or pre-configuration.

Pre-configure - allows instant programming of any of several common I/O configurations within a single write cycle (< 3 ns). Our circuit features two built-in configurations: corner turn and broadcast mode (illustrated at right in the Pre-Configurations figure). Pre-configured input/output mappings improve testability and can be modified to fit the needs of a specific application.

Cell Design

The crossbar cell is designed around a latch that stores one memory bit (cell schematic, shown at left). An I/O connection is created by writing a '1' to a single memory bit within each output column. Each memory bit is written by holding the selected input high while strobing the chosen output line's write_enable line (crossbar programming, shown at right). The stored memory bit is one input to a 2-input AND gate and therefore determines which input is passed to the output line: a stored value of '0' holds the cell output low, while a stored value of '1' passes the input to the cell output. All crossbar cell AND outputs for a given output column are combined through an OR tree, and the output of each tree constitutes a single output line.

Simulation Results

Figure panels: copper (left), aluminum (right).

The copper crossbar (left) functions for square-wave input signals up to f = 2.67 GHz, while the aluminum crossbar (right) functions up to f = 2.0 GHz. Use of copper interconnect improved the maximum data rate by more than 30%, from 4.0 Gb/s to 5.33 Gb/s.

Total delay through the copper crossbar (left) is 370 ps versus 425 ps for the aluminum crossbar (right), a reduction of 55 ps (about 13% of the aluminum delay; equivalently, the aluminum crossbar is about 15% slower). Delays through the input lines, the crossbar cell, and the OR tree were all lower using copper interconnect.

Figure (above and below): tracing 2.0 GHz signals through the copper (left) and aluminum (right) crossbar. Signal integrity is improved for the circuit using copper interconnect, with better OR tree performance the most notable feature. These performance advantages result from copper interconnect's lower capacitive load while maintaining low resistance.

Cell Size Advantages

Copper's lower resistivity and better electromigration resistance allow for interconnect scaling (and improved performance) while maintaining a high current density in narrow lines. An added benefit exists for embedded applications. Achieving the performance benefits of copper using aluminum interconnect for an arrayed circuit would require some or all of the following: (1) use of larger drivers within the crossbar cell; (2) use of wider interconnect to ensure that reliability specs are met; (3) increasing cell size to reduce coupling capacitance between I/O lines.

Modifications required to achieve equivalent performance with aluminum interconnect would increase cell size substantially. Modifying the crossbar cell to make its RC equivalent to that of the copper cell would increase cell (and circuit) area by 64%. Moreover, this figure does not include the further changes to aluminum linewidth required to compensate for the larger cell size and the consequently longer interconnects. The impact of this advantage alone (significant die size reduction together with improved performance, as demonstrated by the copper crossbar) cannot be overstated for SOC or embedded applications. This factor aligns copper process technology with the design technology of the future.

Interconnect Strategy

Our interconnect strategy is illustrated at right; the M3 and M5 layers are of particular interest. M3 pitch is as large as possible given the number of output interconnects required and the cell size. The interconnects for the final gates in the OR tree are the longest and therefore present heavy loads. The single input line per cell was placed on M5 to minimize its resistance. Interdigitated ground lines shield the signal lines, reducing the likelihood of crosstalk and of delay problems introduced by self-inductance.

The capacitive load on the M3 and M5 interconnect (the output and input stages, respectively) limits the maximum input signal frequency and has the greatest impact on performance, especially for the M3 lines, where line capacitance dominates load capacitance.
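The cell design and programming scheme described in this poster (a latch per cell, a 2-input AND gate, and an OR tree per output column) can be illustrated with a small behavioral model. This is only a logic-level sketch, not the authors' circuit: the class name `CrossbarModel` and its methods are hypothetical, and it assumes that strobing a column's write_enable makes every latch in that column capture the level on its own input line. Timing, interconnect, and the corner-turn pre-configuration (whose mapping is not specified here) are not modeled.

```python
from typing import List


class CrossbarModel:
    """Logic-level sketch of a crossbar built from latch + AND cells (hypothetical names)."""

    def __init__(self, n_inputs: int, n_outputs: int) -> None:
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        # bits[i][j] is the latch in the cell joining input i to output column j.
        self.bits: List[List[int]] = [[0] * n_outputs for _ in range(n_inputs)]

    def reset(self) -> None:
        """Write 0 to every cell, so all outputs stay low until programmed."""
        for row in self.bits:
            for j in range(self.n_outputs):
                row[j] = 0

    def write_column(self, output: int, levels: List[int]) -> None:
        """Strobe write_enable for one output column.

        Assumption: each latch in the column captures the level on its input line.
        """
        if len(levels) != self.n_inputs:
            raise ValueError("one level per input line is required")
        for i, level in enumerate(levels):
            self.bits[i][output] = 1 if level else 0

    def connect(self, source: int, output: int) -> None:
        """Hold the selected input high and strobe the chosen column."""
        levels = [1 if i == source else 0 for i in range(self.n_inputs)]
        self.write_column(output, levels)

    def broadcast(self, source: int) -> None:
        """Connect one input to every output, programming column by column."""
        for output in range(self.n_outputs):
            self.connect(source, output)

    def route(self, inputs: List[int]) -> List[int]:
        """Each output is the OR of (stored bit AND input) over its column."""
        outputs = []
        for j in range(self.n_outputs):
            and_terms = [
                self.bits[i][j] & (1 if inputs[i] else 0)
                for i in range(self.n_inputs)
            ]
            outputs.append(1 if any(and_terms) else 0)
        return outputs


xbar = CrossbarModel(4, 4)
xbar.connect(source=2, output=0)  # input 2 drives output 0
xbar.connect(source=0, output=3)  # input 0 drives output 3
print(xbar.route([1, 0, 0, 0]))   # [0, 0, 0, 1]
```

Unprogrammed columns hold all zeros, so their outputs stay low, which matches the reset behavior described above. Because each column stores its own bit pattern, any input can reach any output, and the same input can drive several outputs at once.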