1 Design of a Scalable Nanophotonic Interconnect for Future Multicores
Avinash K. Kodi and Randy W. Morris, Jr.
Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701
E-mail: email@example.com, firstname.lastname@example.org
ACM/IEEE Symposium on Architectures for Networking and Communications Systems, Princeton, New Jersey, October 19-20, 2009
3 Chip Multi-Processor
Multicores have arrived
- Future processors will comprise hundreds to thousands of cores
- Intel Tera-FLOPS: 80 cores, 65 nm, 2007 [1]
- SPARC processor: 16 cores, 65 nm, 2008 [2]
- IBM Cell processor: 8 cores, 90 nm, 2004 [3]
References
1. Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar, "A 5-GHz mesh interconnect for a teraFLOPS processor," IEEE Micro, pp. 51-61, September/October 2007.
2. G. Konstadinidis et al., "Architecture and physical implementation of a third generation 65 nm, 16 cores, 32 thread chip-multithreading SPARC processor," IEEE Journal of Solid-State Circuits, no. 1, p. 717, January 2009.
3. The Cell project at IBM Research, http://www.research.ibm.com/cell/home.html
5 Power Dissipation
- Tile power, Intel Tera-FLOPS (65 nm) [2]: the figure highlights a 28% share
- Recent NSF-sponsored workshop on On-Chip Interconnection Networks [1]: power consumption of NoCs implemented with current techniques exceeds expected needs by a factor of 10
Potential solutions
- Nanophotonics
- Wireless/RF
- 3D stacking
References
1. J. D. Owens, W. J. Dally, R. Ho, D. N. Jayasimha, S. W. Keckler, and L. S. Peh, "Research Challenges for On-Chip Interconnection Networks," IEEE Micro, vol. 27, no. 5, pp. 96-108, September-October 2007.
2. Y. Hoskote, "A 5-GHz Mesh Interconnect for a Teraflops Processor," IEEE Computer Society, 2007, pp. 51-61.
6 Why use Nanophotonics?
- CMOS compatible
- Low power (0.1 mW)
- Small footprint (~10 µm)
- High bandwidth (~10 Gbps)
- Low latency (10.45 ps/mm)
References
1. M. Lipson, "Compact Electro-Optic Modulators on a Silicon Chip," IEEE J. Sel. Top. Quant., Vol. 12, No. 6, Nov.-Dec. 2006, p. 1520-6.
2. M. Lipson, "Guiding, Modulating and Emitting Light on Silicon - Challenges and Opportunities," IEEE Journal of Lightwave Technologies, Vol. 23, No. 12, 12 December 2005 (invited).
7 Optical Interconnect
Figure: on-chip optical link spanning the optical layer and the electronics layer. Labeled components: off-chip laser, on-chip modulator, transmission medium, photodetector, TIA, buffer chain, limiting amplifier, driver for electronics.
- On-chip modulator: Mach-Zehnder modulator or micro-ring resonator
- Transmission medium: free space or waveguide (polymer or silicon)
- Photodetectors: GaAs, III-V materials, Ge-on-SOI (silicon-on-insulator)
8 Micro-ring Resonators
Resonant wavelength ($\lambda_0$):
$$m \lambda_0 = n_{\mathrm{eff}} \cdot 2\pi R$$
- $m$: an integer
- $n_{\mathrm{eff}}$: effective refractive index
- $R$: radius of the ring resonator
Figure: ring resonators with n+/p+/n+ doped regions and control voltage V_R, shown in the V_OFF and V_ON states, with Input Port 0, Output Port 0, and Output Port 1.
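The resonance condition on this slide, read as m·λ0 = n_eff·2πR, can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the example values (n_eff = 2.5, R = 5 µm, a 1500-1600 nm band) are illustrative assumptions rather than figures from the presentation.

```python
import math


def resonant_wavelength_nm(n_eff: float, radius_um: float, m: int) -> float:
    """Resonant wavelength in nm from m * lambda_0 = n_eff * 2 * pi * R."""
    if m <= 0:
        raise ValueError("mode number m must be a positive integer")
    optical_path_nm = n_eff * 2.0 * math.pi * radius_um * 1000.0
    return optical_path_nm / m


def modes_in_band(n_eff: float, radius_um: float, lo_nm: float, hi_nm: float):
    """Return (m, lambda_0) pairs whose resonance lies inside [lo_nm, hi_nm]."""
    optical_path_nm = n_eff * 2.0 * math.pi * radius_um * 1000.0
    m_min = math.ceil(optical_path_nm / hi_nm)   # longest wavelength, smallest m
    m_max = math.floor(optical_path_nm / lo_nm)  # shortest wavelength, largest m
    return [(m, optical_path_nm / m) for m in range(m_min, m_max + 1)]


if __name__ == "__main__":
    # Assumed example values: n_eff = 2.5 and R = 5 um are placeholders.
    for m, lam in modes_in_band(2.5, 5.0, 1500.0, 1600.0):
        print(f"m = {m}: lambda_0 = {lam:.1f} nm")
```

Because m must be an integer, a ring of fixed radius supports a comb of resonances. Changing n_eff, which is what the applied voltage V_R does in the figure, shifts every resonance in that comb.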
9 Electrical Interconnect
Figure (RC link): repeated wire with labels C_p, C_0, r_s, R, C, l_opt, s_opt.
- R: wire resistance per unit length
- C: wire capacitance per unit length
- C_p: inverter output capacitance
- C_0: inverter input capacitance
- r_s: inverter resistance
- s_opt: inverter size
- l_opt: wire distance
10 ITRS 2007 Transistor & Link Parameters
Electrical link device parameters for various VLSI technologies

Device                90 nm    65 nm    45 nm    32 nm    22 nm
f_clk                 3.088    4.7      5.875    7.344    9.18
R                     122      220      312      382      455
C                     170      165      160      155      150
C_p                   1        0.9      0.8      0.712    0.544
C_0                   0.5      0.45     0.4      0.356    0.272
R_s                   1890     2200     3500     4700     6900
S_opt                 72.5     60.5     66.9     73.1     91.4
I_offn (nA/micron)    50       70       100      150      220

Rows whose column boundaries did not survive in this transcript: V_dd 188.8.131.52.8; L_opt 0.450.3184.108.40.206; I_shortckt (nA/micron) 65100

- Wire delay increases due to the RC constant
- The I_offn and I_shortckt current parameters increase
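The S_opt row of this table can be cross-checked against the other rows. The sketch below assumes the standard repeater-sizing expression s_opt = sqrt(R_s·C / (R·C_0)); the slide does not state which formula was used, so this is an assumption, and the function name is hypothetical. The table gives no units, so the check relies only on R and C sharing a length unit and C and C_0 sharing a capacitance unit.

```python
import math

# Values as read from the slide's table (units are not stated there).
NODES = ["90 nm", "65 nm", "45 nm", "32 nm", "22 nm"]
R_WIRE = [122, 220, 312, 382, 455]          # wire resistance per length
C_WIRE = [170, 165, 160, 155, 150]          # wire capacitance per length
C_0 = [0.5, 0.45, 0.4, 0.356, 0.272]        # inverter input capacitance
R_S = [1890, 2200, 3500, 4700, 6900]        # inverter resistance
S_OPT_TABLE = [72.5, 60.5, 66.9, 73.1, 91.4]


def repeater_size(r_s: float, c_wire: float, r_wire: float, c_0: float) -> float:
    """Assumed sizing rule: s_opt = sqrt(r_s * c_wire / (r_wire * c_0))."""
    return math.sqrt((r_s * c_wire) / (r_wire * c_0))


def computed_sizes():
    """Recompute S_opt for every technology node in the table."""
    return [repeater_size(rs, c, r, c0)
            for rs, c, r, c0 in zip(R_S, C_WIRE, R_WIRE, C_0)]


if __name__ == "__main__":
    for node, calc, tab in zip(NODES, computed_sizes(), S_OPT_TABLE):
        print(f"{node}: computed {calc:.2f}, table {tab}")
```

Under this assumption the recomputed sizes agree with the tabulated S_opt values to within about 0.1 at every node, which also supports the column boundaries used for the R, C, C_0, and R_s rows.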
11 Waveguide & Receiver

Waveguide    Pitch (um)    Propagation Time (ps/mm)    Optical Loss (dB/cm)
Si           5.5           10.45                       1.3
Polymer      20            4.93                        1.0

Receiver              Power (mW/Gbps)    Area (mm^2)
Si-CMOS-Amplifier     1.1                0.02625
80 nm CMOS            2.5                0.0625
SiGe BiCMOS           24.5               1.07

References
- N. Kirman et al., "Leveraging Optical Technology in Future Bus-based Chip Multiprocessors," 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006, Vol. 9, Iss. 13, Dec. 2006, pg. 492 - 50
- S. Koester et al., "Ge-on-SOI-Detector/Si-CMOS-Amplifier Receivers for High-Performance Optical-Communication Applications," Journal of Lightwave Technology, Vol. 25, No. 1, January 2007
- C. Kromer et al., "A 100-mW 4X10 Gb/s Transceiver in 80-nm CMOS for High-Density Optical Interconnects," IEEE Journal of Solid-State Circuits, Vol. 40, No. 12, December 2005
- D. Kuchta et al., "120-Gb/s VCSEL-based parallel-optical interconnect and custom 120-Gb/s testing station," Journal of Lightwave Technology, Vol. 22, No. 9, pp. 2200-2212, Sept. 2004
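The waveguide and receiver figures on this slide combine into a simple per-link budget. The sketch below is an illustration only: it assumes the propagation time is per millimetre (consistent with the 10.45 ps/mm quoted on the "Why use Nanophotonics?" slide), that loss and receiver power scale linearly with length and data rate, and it uses a hypothetical function name `link_budget`.

```python
# Values copied from the slide's two tables.
WAVEGUIDES = {
    "Si":      {"pitch_um": 5.5,  "delay_ps_per_mm": 10.45, "loss_db_per_cm": 1.3},
    "Polymer": {"pitch_um": 20.0, "delay_ps_per_mm": 4.93,  "loss_db_per_cm": 1.0},
}
RECEIVER_MW_PER_GBPS = {
    "Si-CMOS-Amplifier": 1.1,
    "80 nm CMOS": 2.5,
    "SiGe BiCMOS": 24.5,
}


def link_budget(waveguide: str, receiver: str, length_mm: float, rate_gbps: float) -> dict:
    """Propagation delay, optical loss, and receiver power for one link."""
    wg = WAVEGUIDES[waveguide]
    return {
        "delay_ps": wg["delay_ps_per_mm"] * length_mm,
        "loss_db": wg["loss_db_per_cm"] * length_mm / 10.0,  # 10 mm per cm
        "receiver_mw": RECEIVER_MW_PER_GBPS[receiver] * rate_gbps,
    }


if __name__ == "__main__":
    # A 5 mm silicon waveguide at 10 Gbps with the Si-CMOS-Amplifier receiver.
    print(link_budget("Si", "Si-CMOS-Amplifier", 5.0, 10.0))
```

The polymer waveguide is faster and slightly less lossy per unit length, but its 20 um pitch takes roughly four times the area per channel of the silicon waveguide's 5.5 um pitch.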
12 Electrical/Optical Comparison
Figure: power-delay product at various technology nodes for a 5 mm link.
- Optics becomes more advantageous at 52 nm for global interconnects and at 45 nm for semi-global interconnects
13 Critical Length
Figure axis: core-to-core distance.
- The critical length is the distance at which the optical link becomes more advantageous than the electrical link
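The idea of a critical length can be sketched as a crossover between two cost curves. The model below is a loud simplification and not the authors' analysis: it compares delay only, treats both links as linear in length, and uses hypothetical placeholder numbers for the electrical delay per millimetre and the fixed electro-optic conversion overhead. Only the 10.45 ps/mm optical figure comes from the presentation.

```python
from typing import Optional


def critical_length_mm(conversion_overhead_ps: float,
                       electrical_ps_per_mm: float,
                       optical_ps_per_mm: float) -> Optional[float]:
    """Length at which optical delay equals electrical delay.

    Assumed model (illustrative only):
        electrical delay = electrical_ps_per_mm * L
        optical delay    = conversion_overhead_ps + optical_ps_per_mm * L
    Returns None when the optical link never catches up.
    """
    slope_gap = electrical_ps_per_mm - optical_ps_per_mm
    if slope_gap <= 0:
        return None
    return conversion_overhead_ps / slope_gap


if __name__ == "__main__":
    # 100 ps overhead and 40 ps/mm electrical delay are hypothetical placeholders;
    # 10.45 ps/mm is the silicon waveguide figure from the slides.
    print(critical_length_mm(100.0, 40.0, 10.45))
```

Links shorter than the returned length favor the electrical wire in this model, and longer links favor optics. The comparison in the presentation uses power-delay product rather than delay alone, so the same crossover search would apply to that metric with the appropriate cost functions substituted.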
14 Why PROPEL?
- Related work: Corona (ISCA 2008), Circuit-switch (IEEE Transaction 2008), Shared-bus (Micro 2006)
- Reduce hardware complexity
  - Currently proposed nanophotonic networks use a large number of optical components
- Nanophotonics for communication (links) and electronics for switching
  - No optical arbitration required
  - Balance between cheaper electronics and more costly optics
- Scalable network design
19 Need for E-PROPEL
- Related work: Corona (ISCA 2008), Processor-DRAM (HOT Interconnects 2008), Firefly (ISCA 2009)
- Issues with the 256-core version of PROPEL: crossbar size (15×15), area (waveguides), power dissipation
- Advantages of E-PROPEL: non-blocking crossbar, multiple roots (fat tree), fewer components than PROPEL
20 E-PROPEL Design
- Combine 4 PROPELs with nanophotonic crossbars
Figure: Cluster 0, Cluster 1, Cluster 2, and Cluster 3 connected through non-blocking optical crossbars.
- RE-PROPEL: only the top and bottom tiles are connected
27 Power Dissipation Evaluation
- Buffers (8.06 mW) [1]
- Xbar (8.66 mW) [2]
- Electrical links (44 mW) [3]
- Modulator (0.1 mW/Gb) [4]
- TIA/Amplifier (1.1 mW/Gb) [5]
References
1, 2. B. Grot, J. Hestness, S. W. Keckler, and O. Mutlu, "Express cube topologies for on-chip interconnects," in Proceedings of the 15th International Symposium on High Performance Computer Architecture, February 2009, pp. 163-174.
3. Y. Pan, P. Kumar, J. Kim, G. Memik, Y. Zhang, and A. Choudhary, "Firefly: Illuminating future network-on-chip with nanophotonics," in Proceedings of the 36th Annual International Symposium on Computer Architecture, 2009.
4. Q. Xu, S. Manipatruni, B. Schmidt, J. Shakya, and M. Lipson, "12.5 Gbit/s carrier-injection-based silicon micro-ring silicon modulators," Optics Express: The International Electronic Journal of Optics, vol. 15, no. 2, January 2007.
5. S. J. Koester, C. L. Schow, L. Schares, and G. Dehlinger, "Ge-on-SOI-detector/Si-CMOS-amplifier receivers for high-performance optical-communication applications," Journal of Lightwave Technology, vol. 25, no. 1, pp. 46-57, January 2007.
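The component figures on this slide can be combined into a rough per-link estimate. This is a sketch under stated assumptions, not the evaluation methodology of the presentation: it reads "mW/Gb" as milliwatts per Gbps, scales modulator and receiver power linearly with data rate and wavelength count, ignores off-chip laser power, and uses hypothetical function names.

```python
# Component figures from the slide.
BUFFER_MW = 8.06
XBAR_MW = 8.66
ELECTRICAL_LINK_MW = 44.0
MODULATOR_MW_PER_GBPS = 0.1
RECEIVER_MW_PER_GBPS = 1.1   # TIA/amplifier


def optical_link_power_mw(rate_gbps: float, wavelengths: int = 1) -> float:
    """Modulator plus receiver power for one optical link (laser excluded)."""
    per_wavelength = (MODULATOR_MW_PER_GBPS + RECEIVER_MW_PER_GBPS) * rate_gbps
    return per_wavelength * wavelengths


def router_power_mw() -> float:
    """Electrical router power as the sum of buffer and crossbar figures."""
    return BUFFER_MW + XBAR_MW


if __name__ == "__main__":
    print(f"optical link at 10 Gbps: {optical_link_power_mw(10.0):.1f} mW")
    print(f"router (buffers + xbar): {router_power_mw():.2f} mW")
    print(f"electrical link figure:  {ELECTRICAL_LINK_MW:.1f} mW")
```

The slide does not say what bandwidth or length the 44 mW electrical link figure corresponds to, so the two link numbers printed here should not be read as a like-for-like comparison.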
28 Uniform Traffic
Figure panels: Throughput, Latency.
Throughput
- 25% increase in performance over Mesh
- Over 2× increase in performance over Circuit-switch, Cmesh and Shared-bus
29 Throughput: Synthetic Traffic Traces
- 50% increase over mesh for bit-reversal, matrix transpose, and perfect shuffle
30 Power Dissipation: Synthetic Traffic
- PROPEL decreases power consumption by a factor of 5
31 Splash-2 Speed-up
- PROPEL speeds up LU, Ocean, Radix, Water, FMM and Barnes by a factor of 2
- FFT, Radiosity and Raytrace have a speed-up of about 1.5×
32 Splash-2 Power Dissipation
- PROPEL decreases power consumption by a factor of 10
33 E-PROPEL Throughput
- E-PROPEL throughput is similar to PROPEL except for Uniform, Matrix Transpose, and Perfect Shuffle
- RE-PROPEL only slightly decreases performance relative to E-PROPEL
- E-PROPEL improves performance by 2× over mesh
34 E-PROPEL Power
- E-PROPEL and RE-PROPEL reduce power dissipation by a factor of 3
35 Conclusion
- PROPEL and E-PROPEL are both low-power, high-bandwidth NoCs for future many-core processors
- PROPEL and E-PROPEL use electronics for packet switching and optics for inter-router communication, allowing for a reduction in electrical and optical components
- PROPEL and E-PROPEL outperform and dissipate less power than well-known network topologies
- Future work: incorporate an adaptive routing technique to balance the load across the entire network