Tezzaron Semiconductor Device Summary Fully functional devices demonstrating a wide variety of applications Good yield –90% process, 96% device Demonstrated.


1 Tezzaron Semiconductor Device Summary
Fully functional devices demonstrating a wide variety of applications
Good yield
–90% process, 96% device
Demonstrated alignment
–Repeatable ~0.3 micron
High interconnect density
–10,000 to 170,000 per sqmm
Positive thermal cycle testing
–>100,000 device cycles
–65 to 150C, 15 minute soak
Good correlation with models and simulations
Demonstration of tools
Demonstrated faster, lower power, higher density

2 Tezzaron Semiconductor Considerations

3 Tezzaron Semiconductor Wafer to Wafer - Best Fit
Memory
–DRAM
–PCRAM, FERAM, MRAM
FPGA
Sensors
Processors
–Short wires
–Heat, heat, heat

4 Tezzaron Semiconductor 3D Interconnect Characteristics

                          SuperVia TM    Gen II         Face to Face
Size                      4.0µ X 4.0µ    1.2µ X 1.2µ    1.7µ X 1.7µ (0.75µ X 0.75µ)
Minimum Pitch             6.08µ          <4µ            2.4µ (1.46µ)
Feedthrough Capacitance   7fF            2-3fF          <<
Series Resistance         <0.25Ω         <0.35Ω         <
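The minimum pitch figures on this slide can be related to the interconnect density quoted in the Device Summary (10,000 to 170,000 per sqmm). The following is only a rough sketch: it assumes the pitch values are in microns and that vias sit on a uniform square grid, neither of which the slide states. The function names are illustrative.

```python
import math

def density_per_sqmm(pitch_um):
    """Vias per square millimetre for a square grid at the given pitch (microns).

    Assumption: one via per pitch-by-pitch cell, 1000 microns per millimetre.
    """
    per_mm = 1000.0 / pitch_um
    return per_mm ** 2

def pitch_um_for_density(density):
    """Inverse of density_per_sqmm: grid pitch (microns) for a given density."""
    return 1000.0 / math.sqrt(density)

# Pitch values as quoted on the slide (units assumed to be microns).
for label, pitch in [("SuperVia", 6.08), ("Face to Face", 2.4)]:
    print(f"{label}: {density_per_sqmm(pitch):,.0f} per sqmm")
```

Under these assumptions, the 2.4 micron face-to-face pitch works out to a little over 170,000 connections per sqmm, which is in line with the upper density figure quoted in the summary.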

5 Tezzaron Semiconductor Parameters
10um Z dimension increments
–5-15um thickness
Low R
Moderate C
Repair & Redundancy
–It’s still per sqmm!
Pitch
–0.5um limit
How many layers?
–2 to 5, current horizon

6 Tezzaron Semiconductor HEAT!!!
Modeling
–What modeling, more data, more testing required
What we know….
–32W/sqmm, Structurally sound
–<5W easy rules
–~15W/100sqmm cliff
–>150W possible
–>500W liquid cooling

7 Tezzaron Semiconductor
Even with innovations like DDR II* and QDR*, inadequate memory speed – the so-called “Memory Wall” – is still the primary obstacle to system performance [i]; it undermines most of the speed improvements of today’s processors [ii]. In spite of gains in bus speed, high memory latency causes processors to wait for data; 2003 statistics show that individual processors in high-performance systems and servers spend 65-95% of their time idly waiting for either memory or I/O [iii].

[i] N.R. Mahapatra and B. Venkatrao, “The Processor-Memory Bottleneck: Problems and Solutions,” Association for Applied Computing Crossroads 5, no. 3 (1999) [e-journal].
[ii] Anthony Cataldo, “MPU designers target memory to battle bottlenecks,” EETimes, 19 October 2001.
[iii] Sally McKee, “Perspectives on The Memory Wall Problem,” accessed online Sept at; Jack Dongarra, “Getting the Performance out of High Performance Computing,” accessed online Sept at
Graph: J. Dongarra, U. of Tennessee

8 Tezzaron Semiconductor Commodity Memory … FLAT!

9 Tezzaron Semiconductor A Poster Child for the Productivity Crisis

Sparse Matrix Operations        Particle Physics, Weapons Dev.              5.9% efficiency
Finite Element Analysis         Weather & Ocean Forecasting                 7.1% efficiency
Large Matrix Manipulation       Engineering Design of Complex Structures    8.4% efficiency
Memory Intensive Calculations   Cryptanalysis                               < 3.0% efficiency
I/O BW to Processing Ratio      Radar, Sonar, Imaging Sensors               12% efficiency

ASCI Q
PCs are 15 to 25% Efficient

10 Tezzaron Semiconductor Linear! Comparable When Processor Limited

11 Tezzaron Semiconductor Log! X When Memory Limited

12 Tezzaron Semiconductor Where Does the Bandwidth Go? When Costs are Grouped by Bandwidth, Memory Bandwidth is 80% of the Cost of a Cray X1 Class Machine

13 Tezzaron Semiconductor OK, access to main memory is glacial Solution: On Chip Cache

14 Tezzaron Semiconductor Cache: THE Driver of Processor Die Area
[Figure labels: $4227, $1980, $TBD; 130 nm, 90 nm]

15 Tezzaron Semiconductor Deciding What to Bring “On Chip”

16 Tezzaron Semiconductor The Good The Bad The Ugly

17 Tezzaron Semiconductor On Chip / Off Chip Power

Operation                        Energy
32-bit ALU operation             5 pJ
32-bit register read             10 pJ
Read 32 bits from 8K RAM         50 pJ
Move 32 bits across 10mm chip    100 pJ
Move 32 bits off chip            1300 to 1900 pJ

Calculations using a 130nm process operating at a core voltage of 1.2V (Source: Bill Dally, Stanford)
Prefetch/Cache Overhead and Off Chip Memory Access are key Power Issues
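To put the energy figures on this slide in proportion, a small sketch can compute how much more energy an off-chip move costs than each on-chip operation. The numbers are the ones quoted on the slide; the dictionary keys and function name are illustrative, not from the presentation.

```python
# Energy per operation in picojoules, as quoted on the slide
# (130nm process, 1.2V core voltage).
ENERGY_PJ = {
    "32-bit ALU operation": 5,
    "32-bit register read": 10,
    "Read 32 bits from 8K RAM": 50,
    "Move 32 bits across 10mm chip": 100,
}
OFF_CHIP_PJ = (1300, 1900)  # quoted range for moving 32 bits off chip

def off_chip_ratio(on_chip_pj):
    """Return the (low, high) ratio of off-chip move energy to an on-chip cost."""
    low, high = OFF_CHIP_PJ
    return low / on_chip_pj, high / on_chip_pj

for name, pj in ENERGY_PJ.items():
    low, high = off_chip_ratio(pj)
    print(f"{name}: off-chip move is {low:.0f}x to {high:.0f}x the energy")
```

By these figures, moving 32 bits off chip costs between 13 and 19 times the energy of moving them across a 10mm die, and a few hundred times the energy of the ALU operation itself.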

18 Tezzaron Semiconductor On Chip / Off Chip Latency

                   Madison 6M             POWER4+                POWER5
Frequency (GHz)
L2 Latency         5 cycles, 3.3 ns       12 cycles, 7.1 ns      13 cycles, 6.8 ns
L3 Latency         14 cycles, 9.3 ns      123 cycles, 72.3 ns    87 cycles, 45.8 ns
Memory Latency     ~224 cycles, ~149 ns   351 cycles, 206 ns     220 cycles, 116 ns
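The frequency values for this slide did not survive in the transcript, but each latency is given in both cycles and nanoseconds, so a clock rate can be inferred as cycles divided by nanoseconds. The sketch below does only that arithmetic; the results are estimates derived from the quoted pairs, not figures taken from the slide, and the names are illustrative.

```python
# (cycles, nanoseconds) pairs as quoted on the slide.
LATENCY = {
    "Madison 6M": {"L2": (5, 3.3), "L3": (14, 9.3), "Memory": (224, 149)},
    "POWER4+": {"L2": (12, 7.1), "L3": (123, 72.3), "Memory": (351, 206)},
    "POWER5": {"L2": (13, 6.8), "L3": (87, 45.8), "Memory": (220, 116)},
}

def implied_ghz(cycles, ns):
    """Cycles per nanosecond is numerically the clock rate in GHz."""
    return cycles / ns

def mean_implied_ghz(levels):
    """Average the clock rate implied by each (cycles, ns) pair for one CPU."""
    rates = [implied_ghz(cycles, ns) for cycles, ns in levels.values()]
    return sum(rates) / len(rates)

def memory_to_l2_slowdown(levels):
    """How many times slower main memory is than L2, in wall-clock time."""
    return levels["Memory"][1] / levels["L2"][1]

for cpu, levels in LATENCY.items():
    print(f"{cpu}: ~{mean_implied_ghz(levels):.1f} GHz implied, "
          f"memory ~{memory_to_l2_slowdown(levels):.0f}x slower than L2")
```

The three pairs for each processor agree with one another to within about one percent, which suggests the cycle and nanosecond columns were measured at a single clock rate per part.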

19 Tezzaron Semiconductor 3D Heterogeneous Integration

BEFORE: 2D IC “All or Nothing”
Single Die ~ 430 mm²
Wafer Cost ~ $6,000
Low yield ~ 15%, ~ 10 parts per wafer
memory costs ~ $44/MB
Only Memory Directly Compatible with Logic (virtually no choice!)
Intel Photo used as proxy

AFTER: 3D IC
Rendering of 3D IC
Maps to memory die array
Maps to logic only die
128MB not 9MB
memory costs ~ $1.50/MB → $0.44/MB
14x increase in memory density
4X Logic Cost Reduction
29x → 100x memory cost reduction (choice!)
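The headline ratios on this slide can be checked against its own dollar and capacity figures. This is a minimal sketch of that arithmetic, assuming the $44/MB figure belongs to the 2D case and the $1.50/MB and $0.44/MB figures to the 3D case; the constant and function names are illustrative.

```python
# Figures as quoted on the slide.
COST_2D_PER_MB = 44.00            # on-die memory in the 2D "all or nothing" case
COST_3D_PER_MB = (1.50, 0.44)     # quoted range for the 3D stacked case
MEMORY_2D_MB = 9
MEMORY_3D_MB = 128

def reduction_factor(before, after):
    """How many times smaller `after` is than `before`."""
    return before / after

cost_reductions = [reduction_factor(COST_2D_PER_MB, c) for c in COST_3D_PER_MB]
density_gain = MEMORY_3D_MB / MEMORY_2D_MB

print(f"memory cost reduction: {cost_reductions[0]:.0f}x to {cost_reductions[1]:.0f}x")
print(f"memory capacity gain: {density_gain:.0f}x")
```

The computed values round to the 29x, 100x, and 14x figures the slide quotes.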

20 Tezzaron Semiconductor Octopus L3 Cache DRAM
1Gb-4Gb
Down to 5ns latency
1GHz Max clock rate
Minimum Timing - tRCD=1, tCYC=4, tPRE=0, tCL=2
Programmable 8 port by 256 bit architecture
Programmable burst length 4 to 256
Programmable port width 32 to 256 bits
Exposed or hidden refresh options
DDR 2000MT Max
>200GB/s sustained bandwidth, closed page mode, BL=4
512GB/s peak bandwidth
>25TB/s peak on-board transfer rate
1.0V to 1.7V I/O
1.4V to 1.6V Core
Internally ECC protected, Dynamic self-repair
115C die full function operating temperature
65 sqmm die footprint
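The 512GB/s peak figure follows from the port configuration listed on this slide. The sketch below assumes "DDR 2000MT" means 2000 mega-transfers per second on every data bit of every port, and that all 8 ports run at the full 256-bit width; the function name is illustrative.

```python
def peak_bandwidth_gb_s(ports, port_width_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s (decimal gigabytes).

    Assumption: every port moves port_width_bits per transfer,
    and all ports transfer simultaneously at the stated rate.
    """
    bytes_per_transfer = ports * port_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000

peak = peak_bandwidth_gb_s(ports=8, port_width_bits=256, mega_transfers_per_s=2000)
sustained_floor = 200  # ">200GB/s sustained" as quoted, closed page mode, BL=4

print(f"peak: {peak:.0f} GB/s")
print(f"sustained is at least {sustained_floor / peak:.0%} of peak")
```

With the narrowest programmable port width of 32 bits, the same formula gives one eighth of the peak figure, which shows how directly the bandwidth depends on port width.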

21 Tezzaron Semiconductor The Demo!

