Galaxy: High-Performance Energy-Efficient Multi-Chip Architectures Using Photonic Interconnects Nikos Hardavellas – Parallel Architecture Group.

1 Galaxy: High-Performance Energy-Efficient Multi-Chip Architectures Using Photonic Interconnects. Nikos Hardavellas, PARAG@N – Parallel Architecture Group, Northwestern University. Team: Y. Demir, P. Yan, S. Song, J. Kim, G. Memik

2 Chip Power Scaling. Chip power does not scale [Azizi 2010]. © Hardavellas

3 Voltage Scaling Has Slowed. In the last decade: 13x more transistors but only 30% lower voltage. Cannot run all transistors fast enough.

4 Pin Bandwidth Scaling [TU Berlin]. Cannot feed cores with data fast enough to keep them busy.

5 Data Scaling. SPEC and TPC dataset growth: faster than Moore's Law. Same trends in scientific and personal computing. Large Hadron Collider, March 2011: 1.6 PB of data (Tier-1). Large Synoptic Survey Telescope: 30 TB/night, 2x Sloan Digital Sky Surveys per day. Sloan: more data than the entire history of astronomy before it. More data → more computing power needed to process it.

6 Galaxy: Optically-Connected Disintegrated Processors. Physical constraints limit single-chip designs: area, yield, power, bandwidth. Multi-chip designs break free of these limitations: processor disintegration, macro-chip integration. [Pan, WINDS 2010]

7 Outline: Introduction; Background; Galaxy Architecture; Experimental Methodology; Results; Sensitivity Studies; Single-Chip Comparisons (Processor Disintegration); Multi-Chip Comparisons (Macrochip Integration); Thermal Modeling; Conclude; Overview of Other Research

8 Nanophotonic Components. Figure labels: off-chip laser source; coupler; resonant modulators; resonant detectors; Ge-doped waveguide. Selective: couple optical energy of a specific wavelength.

9 Modulation and Detection. Figure labels: 11010101, 10001011. DWDM with 16-64 wavelengths; 5-20 μm waveguide pitch; 10 Gbps per link; bandwidth density of 8 Tbps/mm or more.
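
The bandwidth-density figure on this slide can be checked with a short calculation. The sketch below assumes that "10 Gbps per link" means 10 Gbps per wavelength and that waveguides are laid out side by side at the stated pitch; the function name and parameters are illustrative, not taken from the talk.

```python
def bandwidth_density_tbps_per_mm(wavelengths, gbps_per_wavelength, pitch_um):
    """Aggregate bandwidth crossing 1 mm of chip edge, in Tbps.

    Assumes one waveguide every `pitch_um` micrometers and
    `wavelengths` DWDM channels per waveguide (an assumption).
    """
    waveguides_per_mm = 1000.0 / pitch_um
    gbps_per_mm = wavelengths * gbps_per_wavelength * waveguides_per_mm
    return gbps_per_mm / 1000.0  # Gbps -> Tbps


# Most conservative corner of the slide's ranges: 16 wavelengths, 20 um pitch.
low = bandwidth_density_tbps_per_mm(16, 10, 20)
# Most aggressive corner: 64 wavelengths, 5 um pitch.
high = bandwidth_density_tbps_per_mm(64, 10, 5)
print(low, high)
```

Under these assumptions the conservative corner gives 8 Tbps/mm, which is consistent with the slide's "8 Tbps/mm or more".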

11 Galaxy Architecture

12 Routing Example

13 Galaxy Architecture

14 Galaxy MWSR Optical Crossbar. MWSR (multiple-writer, single-reader) is more energy-efficient than SWMR (single-writer, multiple-reader) at this scale. MWSR avoids a broadcast bus but requires arbitration.

15 Token-Based Arbitration. Token arbitration takes 8 cycles on average (5 chiplets).
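
To make the idea of token arbitration on an MWSR channel concrete, here is a minimal sketch. It is not the Galaxy arbiter: it assumes a single token that circulates in one direction among the writers, an uncontended channel, and a hypothetical per-hop latency (`hop_cycles`). It does not attempt to reproduce the 8-cycle average reported on the slide.

```python
class TokenChannel:
    """One MWSR channel: many writers, one reader, one circulating token."""

    def __init__(self, num_writers, hop_cycles=1):
        self.num_writers = num_writers
        self.hop_cycles = hop_cycles  # hypothetical latency per hop
        self.token_at = 0             # writer currently holding the token

    def acquire(self, writer):
        """Return cycles the writer waits for the token, then move it there."""
        hops = (writer - self.token_at) % self.num_writers
        self.token_at = writer
        return hops * self.hop_cycles


def mean_wait(num_writers, hop_cycles=1):
    """Average wait over all (token position, requesting writer) pairs."""
    total = 0
    for start in range(num_writers):
        for writer in range(num_writers):
            channel = TokenChannel(num_writers, hop_cycles)
            channel.token_at = start
            total += channel.acquire(writer)
    return total / (num_writers ** 2)


print(mean_wait(5))  # average hops to wait with 5 writers, 1 cycle per hop
```

Only the writer holding the token may modulate the channel, which is what removes the need for a broadcast bus at the cost of the waiting time computed here.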

16 Dense Off-Chip Coupling. Dense optical fiber array [Lee, OSA/OFC/NFOEC 2010]: ~3.8 dB loss, 8 Tbps/mm demonstrated. Misalignment loss <1 dB. Loss comparable to optical proximity couplers.

17 Nanophotonic Parameters

19 Architectural Parameters

20 Modeling Infrastructure. Figure labels: 3D-stack model; SimFlex sampling, 95% confidence; photonic-layer ring heating.

22 Load-Latency Curves. 16 tokens provide optimal buffer depth.

23 Laser Power Sensitivity to Optical Parameters. Panels: Coupler Loss; Off-Ring Loss; Waveguide & Filter Drop Loss; Modulator Insertion Loss. Laser power is highly sensitive to coupler loss and insensitive to the other losses.
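
As background for reading the sensitivity plots, the sketch below shows how an optical loss expressed in dB translates into the extra laser power needed to deliver the same power at the detector. This is the standard decibel relation only; it does not model which losses dominate in Galaxy. The loss budget is hypothetical except for the ~3.8 dB coupler loss quoted on the Dense Off-Chip Coupling slide.

```python
def laser_power_factor(total_loss_db):
    """Multiplier on laser power needed to overcome `total_loss_db` of loss."""
    return 10 ** (total_loss_db / 10.0)


def total_loss_db(losses_db):
    """Losses along one optical path add up when expressed in dB."""
    return sum(losses_db.values())


# Hypothetical budget: only the coupler figure comes from the slides.
budget_db = {
    "coupler": 3.8,               # ~3.8 dB fiber-array coupler loss (slide 16)
    "everything_else": 2.0,       # placeholder value, an assumption
}

print(laser_power_factor(budget_db["coupler"]))     # coupler alone
print(laser_power_factor(total_loss_db(budget_db)))  # whole hypothetical path
```

Because the relation is exponential, every component whose loss is large in dB, or that is crossed more than once on a path, has an outsized effect on the required laser power.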

24 Sensitivity to Fiber Density. 116 mm² chiplets: 43 mm along the chip edge, enough room for 172 fibers at 250 μm pitch. 128 fibers: within 3% of max performance.
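
The fiber count on this slide follows from simple geometry. The sketch below assumes a square chiplet with fibers attached along its whole perimeter, which is one reading of "43 mm along the chip edge"; the helper names are illustrative.

```python
import math


def edge_length_mm(area_mm2):
    """Perimeter of a square die with the given area (square shape assumed)."""
    return 4 * math.sqrt(area_mm2)


def max_fibers(edge_mm, pitch_um):
    """Number of fibers that fit along `edge_mm` at the given pitch."""
    return int(edge_mm * 1000 // pitch_um)


perimeter = edge_length_mm(116)        # about 43 mm for a 116 mm^2 chiplet
print(round(perimeter, 1))
print(max_fibers(43, 250))             # fibers at 250 um pitch along 43 mm
```

Under these assumptions, a 116 mm² chiplet has roughly 43 mm of edge, and 43 mm at 250 μm pitch holds 172 fibers, so the 128-fiber configuration leaves spare edge length.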

26 Performance Against Unlimited Designs. Unlimited power: max speed of the design, irrespective of temperature. Mesh_20MC & Corona_20MC: also unlimited bandwidth (20 MCs per chip, 5x more pins). Galaxy matches the performance of unlimited designs.

27 Performance Against Realistic Designs. Realistic: within power and bandwidth envelopes. Galaxy chiplets stay within 66.2 °C, so the chiplets run at max speed. Galaxy: 2.2x speedup on average (3.4x max).

28 Energy-Delay Product. Cool chiplets minimize leakage. Galaxy: 2.4x-2.8x smaller EDP on average (6.8x max).

30 Comparison Against Multi-Chip Alternatives

31 Comparison Against Multi-Chip Alternatives. Fiber Galaxy: 2.5x speedup over Oracle Macrochip (6.8x max).

32 Tapered vs. Optical Proximity Couplers. 6x less laser power than Oracle Macrochip with demonstrated couplers.

34 80-core 5-chiplet Galaxy Thermal CFD Modeling. 8 cm spacing allows cooling with cheap passive heatsinks. Figure label: 88.2 °C.

35 9-chiplet Dense Array (Oracle Macrochip). Tight arrangement points to a liquid cooling requirement. Figure label: 249 °C.

36 9-chiplet Galaxy 2D. Cooling 9 chiplets with passive heatsinks. Figure label: 110 °C.

37 9-chiplet Galaxy 3D. Flexible fibers allow the virtual chip to break free of 2D planar designs. Figure label: 83.6 °C.

38 Galaxy Summary. Virtual chips with the performance of unlimited designs. Breaks free of typical physical constraints: large aggregate area; improved yield (break-even point: 60% yield for photonics); Tb/s/mm bandwidth density; pushes back the power wall. Processor disintegration: 2.2x avg. speedup (3.4x max); 2.4x-2.8x smaller avg. EDP (6.8x max). Macrochip integration: 2.5x speedup over Oracle Macrochip (6.8x max); 6x more power-efficient links.

40 Energy is Shaping the IT Industry. #1 of the Grand Challenges for Humanity in the Next 50 Years [Smalley Institute for Nanoscale Research and Technology, Rice U.]. Computing worldwide: ~408 TWh in 2010 [Gartner]. Datacenter energy consumption in the US: ~150 TWh in 2011 [EPA]; 3.8% of domestic power generation, $15B. CO₂-equivalent emissions ≈ airline industry (2%). Carbon footprint of the world's data centers ≈ Czech Republic. Exascale @ 20 MW: 200x lower energy/instr. (2 nJ → 10 pJ); 3% of the output of an average nuclear plant! 10% annual growth in installed computers worldwide [Gartner]. Exponential increase in energy consumption.

41 Overall Focus: Energy-Efficient Computing. Where does energy go? Integer add: 0.5 pJ; FP-FMA: 50 pJ. Data movement: 1200 pJ across a 400 mm² chip, 16000 pJ to memory. Elastic caches: minimize data transfers by adapting caches to workload demands [ISCA09, IEEEMicro10, DATE12]. Processing: ~1500 pJ to schedule the operation. SeaFire: specialized computing on dark silicon to eliminate general-purpose computing's overheads [IEEEMicro11, USENIX-Login11]. Circuits: wide voltage guardbands. Low voltages, process variation → timing errors → computing errors. Elastic fidelity: allow errors in select code/data segments to save energy while maintaining a fidelity contract with the user [CoRR abs/1111.4279]. Chips are fundamentally limited by physical constraints; need to break free. Galaxy: processor disintegration/macrochip integration using photonic interconnects [WINDS10].
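
The per-operation energy figures on this slide are easier to compare as ratios. The sketch below uses only the numbers quoted on the slide; the dictionary keys and the choice of the FP-FMA as the baseline are illustrative assumptions.

```python
# Energy per event in picojoules, as quoted on the slide.
ENERGY_PJ = {
    "int_add": 0.5,
    "fp_fma": 50.0,
    "cross_chip_move": 1200.0,   # data movement across a 400 mm^2 chip
    "memory_access": 16000.0,
    "schedule_op": 1500.0,       # approximate ("~1500 pJ") on the slide
}


def overhead_ratio(cost, baseline="fp_fma"):
    """How many times more energy `cost` takes than the baseline operation."""
    return ENERGY_PJ[cost] / ENERGY_PJ[baseline]


for name in ("cross_chip_move", "schedule_op", "memory_access"):
    print(name, overhead_ratio(name))
```

With these figures, moving an operand across the chip costs 24x a floating-point FMA, scheduling the operation about 30x, and a memory access 320x, which is the motivation the slide gives for targeting data movement and processing overheads rather than the arithmetic itself.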

42 Thank You!

43 Overcoming Data Movement and Processing Overheads. Elastic caches: adapt the cache to workload demands. Significant energy is spent on data movement and coherence requests. Co-locate data, metadata, and computation. Decouple address from placement location. Capitalize on existing OS events → simplify hardware. Cut on-chip interconnect traffic by half. SeaFire: specialized computing on dark silicon. Repurpose dark silicon to implement specialized cores. The application cherry-picks a few cores; the rest of the chip is powered off. Vast unused area → many specialized cores → likely to find good matches. 12x lower energy (conservative).

44 Overcoming Voltage Guardbands. Elastic fidelity: selectively trade accuracy for energy. We don't always need 100% accuracy, but HW always provides it. Language constructs specify the required fidelity for code/data segments. Steer computation to execution/storage units with appropriate fidelity and lower voltage. 35% lower energy. Figure labels: no errors; 10% errors.

