
3D Interconnect: Architectural Challenges and Opportunities UC SANTA BARBARA Tim Sherwood.


1 3D Interconnect: Architectural Challenges and Opportunities. Tim Sherwood, UC Santa Barbara

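2 The Role of Architecture: the stack runs from software (Applications, Runtime System) through Architecture to hardware (Circuit, Device, Package). Architecture mediates between demands from above (battery life, performance, programmability) and constraints from below, where 3D integration brings noise, thermal, and yield constraints.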

3 Lab Overview: Intrusion Detection System; Server Farm; Processor; Core; Caches, etc.; Prototype Acceleration Primitives; High-Speed Programmable Routers; Intrusion Detection and Prevention; Adaptive Hardware Profiling Engines; Integrated On-Chip Memory Hierarchy; Software-Defined Wireless Access Point; Reconfigurable Security on FPGAs; High-Throughput MEMS Controllers.

4 Lab Overview (repeat of slide 3)

5 Potential for Impact from 3D: the projects from slide 3, each annotated with the 3D opportunity it could exploit: 3D specialization, 3D bandwidth, 3D integration for latency, 3D integration for mixed signal, and 3D integration for mixed technology.

6 Potential for Impact from 3D (repeat of slide 5)

7 Presented Works: Shashidhar Mysore, Banit Agrawal, Sheng-Chih Lin, Navin Srivastava, Kaustav Banerjee, and Timothy Sherwood. "Introspective 3D Chips." Proceedings of the Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), San Jose, CA, October 2006. Gian Luca Loi, Banit Agrawal, Navin Srivastava, Sheng-Chih Lin, Timothy Sherwood, and Kaustav Banerjee. "A Thermally-Aware Performance Analysis of Vertically Integrated (3-D) Processor-Memory Hierarchy." Proceedings of the 43rd Design Automation Conference (DAC), San Francisco, CA, June 2006.

8 Two Specific Opportunities: 1) 3D Integration for Performance: bring memory closer to those that use it; more bandwidth and lower latency; tricky system-level tradeoffs. 2) 3D Integration for Specialization: integration offers a unique specialization opportunity; decouple commodity from niche. The ramifications of any radical change require a careful evaluation that considers all the parameters.

9 A Simple Performance “Ecosystem” [diagram linking temperature, package, total power, dynamic power, voltage (V), utilized area, communication, parallelism, frequency, leakage, the application, the OS or runtime, feedback, and performance]. Even this simple picture has no multicore, no spatial variance, no temporal variance, and no metrics of cost, error, or yield.

10 Two Specific Opportunities (repeat of slide 8)

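11 Basic Savings in 3D: the same total area (4 units) split over 1, 2, or 4 layers; distance is corner to corner, and L is the cost of one inter-layer hop.
1 layer: area per layer 4, distance √8 ≈ 2.8, bandwidth √8 ≈ 2.8
2 layers: area per layer 2, distance √4 = 2 + 1L, bandwidth 2√4 = 4
4 layers: area per layer 1, distance √2 ≈ 1.4 + 3L, bandwidth 4√2 ≈ 5.7
On-chip latency improves, and bandwidth could improve even more. But what about real wires? What about applications? What about temperature?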
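The numbers above follow from splitting a fixed total area A over k square layers: each layer has footprint A/k and diagonal √(2A/k), the worst-case path crosses one layer diagonally and then climbs k − 1 inter-layer hops of cost L, and the bandwidth column equals k times that diagonal. A minimal Python sketch under those assumptions; the function name is illustrative, and reading bandwidth as one diagonal cut per layer is an interpretation chosen because it reproduces the slide's figures:

```python
import math


def stack_metrics(total_area: float, layers: int) -> tuple[float, int, float]:
    """Return (planar distance, inter-layer hops, relative bandwidth).

    Assumes the total area is split evenly over `layers` square layers.
    """
    footprint = total_area / layers       # area of one layer
    diagonal = math.sqrt(2 * footprint)   # corner-to-corner span of a square layer
    hops = layers - 1                     # vertical hops, each costing L
    bandwidth = layers * diagonal         # one diagonal cut through every layer
    return diagonal, hops, bandwidth


if __name__ == "__main__":
    for k in (1, 2, 4):
        dist, hops, bw = stack_metrics(4.0, k)
        print(f"{k} layer(s): distance {dist:.2f} + {hops}L, bandwidth {bw:.2f}")
```

For A = 4 this prints 2.83, 2.00 + 1L, and 1.41 + 3L for distance, and 2.83, 4.00, and 5.66 for bandwidth. In general the bandwidth grows as √(2Ak), while the planar distance shrinks as √(2A/k) at the price of k − 1 vertical hops.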

12 Example Technology Node Banerjee et al. IEEE 2001

13 3D Wire Delay [plot: delay (s, 0 to 1.4 × 10⁻¹¹) versus wire length L (160 to 800 µm) for a vertical via model and a horizontal line model, each treated as a distributed RC delay]
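Both curves are distributed RC delays, which grow with the square of wire length. A minimal Python sketch using the first-order (Elmore) estimate r·c·L²/2 for a uniform line, ignoring driver and load; the per-micrometre resistance and capacitance values are hypothetical and are not the parameters behind the plot:

```python
def distributed_rc_delay(r_per_um: float, c_per_um: float, length_um: float) -> float:
    """Elmore delay (seconds) of a uniform distributed RC line: r * c * L^2 / 2.

    r_per_um is in ohms per micrometre, c_per_um in farads per micrometre.
    Driver resistance and load capacitance are ignored.
    """
    return 0.5 * r_per_um * c_per_um * length_um ** 2


if __name__ == "__main__":
    # Hypothetical per-unit parasitics, chosen only for illustration.
    R_PER_UM = 0.08      # ohm / um
    C_PER_UM = 0.2e-15   # F / um
    for length in range(160, 801, 160):
        delay = distributed_rc_delay(R_PER_UM, C_PER_UM, length)
        print(f"L = {length:4d} um -> {delay:.2e} s")
```

Doubling the length quadruples the delay, which is why replacing a long horizontal route with a short vertical one can pay off even if the via has worse per-unit parasitics.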

14 A “Typical” 2D System Design: CPU core, L1 I-cache, L1 D-cache, L2 unified cache, memory controller, and DRAM, with L2-to-main-memory traffic carried by an external bus on the board. This path is the memory bottleneck.

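15 A 3D Memory System: the CPU core, L1 I-cache, L1 D-cache, and L2 unified cache sit on layers 1 and 2, joined by an L1-to-L2 vertical interlayer bus; layers 3 to 18 hold stacked three-dimensional main memory, reached through an L2-to-main-memory vertical interlayer bus. Bus widths range from 8 bytes to 128 bytes and bus frequencies from 200 MHz to 2 GHz.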
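The width and frequency ranges bound the peak bandwidth of the vertical memory bus. A minimal Python sketch, assuming one transfer per bus cycle and no protocol overhead (the function name is illustrative):

```python
def peak_bandwidth(width_bytes: int, freq_hz: int) -> int:
    """Peak bus bandwidth in bytes/s, assuming one transfer per bus cycle."""
    return width_bytes * freq_hz


if __name__ == "__main__":
    widths = [8, 16, 32, 64, 128]             # bytes, as on the slide
    freqs = [200_000_000, 2_000_000_000]      # 200 MHz and 2 GHz
    for f in freqs:
        row = ", ".join(f"{w} B: {peak_bandwidth(w, f) / 1e9:.1f} GB/s" for w in widths)
        print(f"{f / 1e6:.0f} MHz -> {row}")
```

At the corners of the range this gives 1.6 GB/s (8 bytes at 200 MHz) and 256 GB/s (128 bytes at 2 GHz), a 160× spread.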

16 System-Level Simulation
Simulator: Sim-Alpha; processor: Alpha 21264; benchmarks: mcf, parser, and twolf with MinneSPEC reduced inputs.
Main-memory accesses per instruction: mcf 1.7%, parser 0.258472%, twolf 0.00062%.
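The access rates suggest why mcf is the benchmark most exposed to the memory system. A rough, hypothetical way to see it is to multiply each rate by a main-memory latency to get stall cycles per instruction. The latency below is made up, and the model ignores overlap between outstanding misses (which an out-of-order core such as the 21264 exploits), so it is an upper bound rather than a prediction:

```python
# Main-memory accesses per instruction, converted from the percentages on the slide.
MEM_ACCESSES_PER_INSTR = {
    "mcf": 0.017,          # 1.7%
    "parser": 0.00258472,  # 0.258472%
    "twolf": 0.0000062,    # 0.00062%
}


def memory_stall_cpi(benchmark: str, latency_cycles: float) -> float:
    """Extra cycles per instruction spent waiting on main memory.

    Assumes every access stalls for the full latency (no overlap between misses).
    """
    return MEM_ACCESSES_PER_INSTR[benchmark] * latency_cycles


if __name__ == "__main__":
    LATENCY = 200  # hypothetical main-memory latency in core cycles
    for name in MEM_ACCESSES_PER_INSTR:
        print(f"{name:7s} stall CPI ~ {memory_stall_cpi(name, LATENCY):.4f}")
```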

17 Effect of Bus Width and Frequency: mcf [plot: execution time (s, 0 to 7) versus L2 cache size (10 to 10,000 KB) for 3-D buses 8, 16, 32, 64, and 128 bytes wide and a 2-D bus 8 bytes wide]. Only a few vias are required.

18 Effect of Clock Frequency: mcf

19 Effect of Clock Frequency: parser

20 Effect of Clock Frequency: twolf

21 An Example Memory System

22 Self-Consistent Thermal Modeling
1. Insert the initial values of leakage and dynamic power for each layer.
2. Calculate the first thermal profile.
3. Based on the previous thermal profile, calculate the new power dissipation, accounting for I_on decreasing and I_leakage increasing with temperature.
4. Calculate the new temperature profile.
5. Is it convergent? If yes, finish; if no, return to step 3.
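The flowchart is a fixed-point iteration: power depends on temperature through leakage, and temperature depends on power through the thermal resistance of the stack. A minimal Python sketch, assuming a 1-D series thermal-resistance chain with layer 0 on the heat sink and an exponential leakage-versus-temperature law. All resistances, powers, and the coefficient beta are hypothetical, and the I_on term from the flowchart is left out:

```python
import math


def layer_temperatures(powers, r_sink, r_layer, t_amb):
    """Steady-state temperature (K) of each layer in a 1-D series thermal chain.

    Layer 0 sits on the heat sink; all heat generated in layer i and above
    flows down through the thermal resistance beneath layer i.
    """
    temps = []
    t = t_amb
    for i in range(len(powers)):
        r = r_sink if i == 0 else r_layer
        t += r * sum(powers[i:])
        temps.append(t)
    return temps


def leakage_power(p_leak_ref, temp, t_ref=300.0, beta=0.02):
    """Hypothetical model: leakage grows exponentially with temperature."""
    return p_leak_ref * math.exp(beta * (temp - t_ref))


def self_consistent_profile(dynamic, leak_ref, r_sink=0.5, r_layer=0.1,
                            t_amb=300.0, tol=1e-6, max_iter=100):
    """Iterate power -> temperature -> power until the profile stops changing."""
    # The I_on-versus-temperature effect from the flowchart is omitted here.
    temps = [t_amb] * len(dynamic)  # first pass uses the initial leakage values
    for _ in range(max_iter):
        powers = [d + leakage_power(l, t) for d, l, t in zip(dynamic, leak_ref, temps)]
        new_temps = layer_temperatures(powers, r_sink, r_layer, t_amb)
        if max(abs(a - b) for a, b in zip(new_temps, temps)) < tol:
            return new_temps, powers
        temps = new_temps
    raise RuntimeError("did not converge (possible thermal runaway)")


if __name__ == "__main__":
    # Hypothetical stack: a processor layer on the sink, two memory layers above it.
    temps, powers = self_consistent_profile([20.0, 3.0, 3.0], [5.0, 1.0, 1.0])
    for i, (t, p) in enumerate(zip(temps, powers)):
        print(f"layer {i}: {p:.2f} W, {t:.2f} K")
```

The RuntimeError branch matters: if leakage rises faster with temperature than the package can remove heat, the iteration diverges, which corresponds to thermal runaway rather than a modeling bug.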

23 3D Thermally-Aware Performance Analysis: mcf [plot: execution time per instruction (1 to 3) and maximum chip temperature (330 to 400 K) for 2-D and 3-D designs, with the temperature constraint and the minimum execution time in 2-D marked]

24 3D Thermally-Aware Performance Analysis: twolf [plot: execution time per instruction (0.3 to 1.1) and maximum chip temperature (330 to 390 K) versus frequency (600 to 3000 MHz) for 2-D and 3-D designs, with the temperature constraint, the maximum frequency it allows, and the minimum execution time in 3-D marked]
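Both plots weigh the same tradeoff: stacking memory shortens memory stalls, but the stack runs hotter, so a temperature constraint may cap the clock frequency. A minimal Python sketch of that selection, assuming a linear power-versus-frequency model and a single lumped thermal resistance; every parameter value is hypothetical and none is taken from the plotted results:

```python
def exec_time_per_instr(freq_hz, core_cpi, mem_time_s):
    """Seconds per instruction: core time scales with 1/f, memory stall time does not."""
    return core_cpi / freq_hz + mem_time_s


def max_chip_temp(freq_hz, t_amb, r_th, p_static, c_dyn):
    """Hypothetical model: power = p_static + c_dyn * f, temp = t_amb + r_th * power."""
    return t_amb + r_th * (p_static + c_dyn * freq_hz)


def best_frequency(freqs_hz, t_limit, core_cpi, mem_time_s, t_amb, r_th, p_static, c_dyn):
    """Fastest-executing frequency whose steady-state temperature meets the limit."""
    feasible = [f for f in freqs_hz
                if max_chip_temp(f, t_amb, r_th, p_static, c_dyn) <= t_limit]
    if not feasible:
        return None
    best = min(feasible, key=lambda f: exec_time_per_instr(f, core_cpi, mem_time_s))
    return best, exec_time_per_instr(best, core_cpi, mem_time_s)


if __name__ == "__main__":
    freqs = [600_000_000 + 400_000_000 * i for i in range(7)]  # 600 MHz .. 3 GHz
    common = dict(t_limit=360.0, core_cpi=1.0, t_amb=318.0, p_static=10.0, c_dyn=20e-9)
    # Made-up contrast: 3-D has shorter memory stalls but a higher thermal resistance.
    print("2-D:", best_frequency(freqs, mem_time_s=2.0e-9, r_th=0.5, **common))
    print("3-D:", best_frequency(freqs, mem_time_s=0.5e-9, r_th=0.8, **common))
```

With these made-up parameters the 2-D design can run at 3 GHz while the 3-D design is capped at 1.8 GHz, yet the 3-D design still has the lower time per instruction because its memory stall time is shorter. Whether that holds for a real design depends on the actual thermal and memory parameters.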

25 3D Memory Integration. Many unaccounted-for effects: the effect of multiple cores and memory banks; spatial variation; temporal variation (thermal load balancing). All of these are intimately tied to the integration method and packaging. How to manage them: architecture and software will be increasingly involved; variation must be exposed to higher levels; there is huge demand for “models”, “sensors”, and “knobs”; thermal, packaging, application, and architecture effects are all tangled; we need to build models that capture all of these aspects; and the models need to be “self-consistent”.

26 Two Specific Opportunities (repeat of slide 8)

27 3D Integration for Introspection. Complex interactions across levels of abstraction make debugging, optimizing, securing, and analysis in general difficult. The first requirement is visibility: not just data capture, but the ability to assemble a cohesive picture of system interactions and correlate between them in a sound and non-intrusive manner. The hardware/software boundary is uniquely situated to piece this picture together from low-level events. What would the programmer's wish list look like?

28 To Integrated Monitoring Hardware [figure: processor floorplan (L1_BPU, Decode, Trace Cache Top, L2_BPU, Bus Control, MOB, ITLB, Trace Cache Bottom, DTLB, L1 Cache Top, L2 Cache, L1 Cache Bottom, FP Exec, UROM, FP Reg, Alloc, Rename, Instr Q1, Sched, Instr Q2, Int Reg, Retire, Int Exec, Mem Ctl) with signals routed to monitoring hardware]. What programmers want: 32-bit memory address, 32-bit memory value, 10-bit opcodes, 2 × 5-bit register names, 2 × 32-bit register values, 10 bits of “status” (3x, 4x). Everything: 1892 bits per cycle, about 1 terabyte/s at 4 GHz.
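The headline figure is bits per cycle times clock rate, divided by eight. A one-function Python check (the function name is illustrative):

```python
def monitor_bandwidth_bytes_per_s(bits_per_cycle: int, freq_hz: int) -> float:
    """Raw trace bandwidth if every monitored bit is exported every cycle."""
    return bits_per_cycle * freq_hz / 8


if __name__ == "__main__":
    bw = monitor_bandwidth_bytes_per_s(1892, 4_000_000_000)
    print(f"{bw / 1e12:.3f} TB/s")  # 0.946 TB/s, i.e. roughly 1 terabyte per second
```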

29 Why Programmers Can’t Have It. Interconnect is not free: huge cross-chip buses (OptBuf 285 µm; 20,000 buffers). Analysis is not free: significant processing is required. Added heat has an extra cost: a $15 budget for cooling, spent on functionality used by developers. [figure: the same processor floorplan routed to integrated monitoring hardware]
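Reading “OptBuf 285um” as an optimal repeater spacing of 285 µm, the buffer count scales as the number of monitored wires times the route length divided by that spacing. A minimal Python sketch; the 3 mm route length is a hypothetical value chosen for illustration, and with the 1892 bits per cycle from slide 28 it lands in the same range as the 20,000 buffers quoted above:

```python
import math


def repeater_count(num_wires: int, route_length_um: float, spacing_um: float = 285.0) -> int:
    """Repeaters needed to drive num_wires signals over route_length_um.

    Each wire is cut into segments no longer than spacing_um, with one
    repeater between consecutive segments (the source driver is not counted).
    """
    per_wire = max(math.ceil(route_length_um / spacing_um) - 1, 0)
    return num_wires * per_wire


if __name__ == "__main__":
    # 1892 monitored bits over a hypothetical 3 mm cross-chip route.
    print(repeater_count(1892, 3000.0))  # 18920
```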

30 Cake + Eating It Too. We need a way to provide cheap (or high-margin) hardware to the masses, with no paying for developer functionality, while giving developers the powerful analysis they crave: seeing everything at execution rate. Provide “snap-on” functionality for developers: a separate chip for the analysis engine, hooked only onto “developer” systems. The idea is not limited to development systems: security, error correction, confidentiality, accelerators, and more. 3D integration offers the potential.

31 Thermal Impact

32 Conclusion: Opportunities + Challenges. 3D integration for performance: bring memory closer to those that use it; more bandwidth and lower latency; few vias are required for a big impact; tricky system-level tradeoffs. 3D integration for specialization: integration offers a unique specialization opportunity; it requires rethinking the integration process; decouple commodity from niche. Challenges: cross-layer models, from application to package; cross-layer optimization, both static and dynamic; thermal management is everybody's problem.

33 http://www.cs.ucsb.edu/~arch/ NSF CNS 0524771, NSF CCF 0702798, NSF CCF 0448654


