Photonic On-Chip Networks for Performance-Energy Optimized Off-Chip Memory Access Gilbert Hendry, Johnnie Chan, Daniel Brunina, Luca Carloni, Keren Bergman Lightwave Research Laboratory, Columbia University, New York, NY
Motivation The memory gap warrants a paradigm shift in how we move information to and from storage and computing elements [www.OpenSparc.net] [Exascale Report, 2008] Lightwave Research Lab, Columbia University 9/17/2018
Main Premise Current memory subsystem technology and packaging are not well-suited to future trends: Networks on chip Growing cache sizes Growing bandwidth requirements Growing pin counts
SDRAM context DIMMs controlled fully in parallel, sharing access on data and address buses Many wires/pins Matched signal paths (for delay) DIMMs made for short, random accesses [Figure: memory controller — lately on chip [Intel] — driving multiple DIMMs]
Future SDRAM context Example: Tilera TILE64
SDRAM DIMM Anatomy [Figure: DRAM_DIMM → ranks → SDRAM devices (DRAM_Chip) → banks (usually 8) → DRAM cell arrays, each chip with row/column decoders, sense amps, I/O, and control; shared addr/cntrl and data lines]
Memory Access in an Electronic NoC Messages are packetized; packet size is determined by router buffer depth [Figure: message crossing the chip boundary through NoC routers to the memory controller] End result: the DRAM burst length is dictated by the packet size, which limits burst length and results in short, random accesses
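A minimal sketch of the constraint above: the data that fits in one network packet caps the DRAM burst. The header width and per-beat bus width below are assumptions for illustration, not figures from the talk.

```python
# Sketch (assumed header/bus widths): how NoC packet size caps DRAM burst length.
def max_burst_length(packet_bits: int, header_bits: int, dram_bus_bits: int) -> int:
    """Beats of DRAM data that fit in one network packet's payload."""
    payload = packet_bits - header_bits
    return payload // dram_bus_bits

# 256 b packet (as in the electronic NoC setup), hypothetical 32 b header,
# 64 b of DRAM data per beat. DDR3's native burst is 8 beats, so a
# packet-limited burst of 3 beats forces short, inefficient accesses.
print(max_burst_length(256, 32, 64))  # → 3
```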
Memory Control Complex DRAM control: scheduling accesses around open/closed rows, precharging, refreshing, and data/control bus usage [DRAMsim, UMD]
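A minimal sketch of the row-state bookkeeping this controller complexity comes from: a read to an already-open row issues immediately, while a row conflict forces a precharge and re-activate first. The class and command names are illustrative, not DRAMsim's API.

```python
# Sketch (illustrative names) of per-bank open-row tracking in a DRAM controller.
class Bank:
    def __init__(self):
        self.open_row = None  # no row currently open

    def commands_for_read(self, row: int) -> list:
        if self.open_row == row:         # row hit: data is in the sense amps
            return ["READ"]
        cmds = []
        if self.open_row is not None:    # row conflict: close the open row first
            cmds.append("PRECHARGE")
        cmds += ["ACTIVATE", "READ"]     # open the target row, then read
        self.open_row = row
        return cmds

bank = Bank()
print(bank.commands_for_read(5))  # → ['ACTIVATE', 'READ']
print(bank.commands_for_read(5))  # → ['READ']
print(bank.commands_for_read(9))  # → ['PRECHARGE', 'ACTIVATE', 'READ']
```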
Experimental Setup – Electronic NoC System: 2 cm × 2 cm chip, 8×8 electronic mesh of 5-port electronic routers, 28 DRAM access points (MCs), 2 DIMMs per DRAM AP Routers: 1 kb input buffers (per VC), 4 virtual channels, 256 b packet size, 128 b channels, 32 nm tech point (ORION), normal Vt, Vdd = 1.0 V, freq = 2.5 GHz Traffic: random core–DRAM access point pairs, random read/write, uniform message sizes, Poisson arrivals at 1 µs DRAM: modeled cycle-accurately with DRAMsim [Univ. MD], DDR3 (10-10-10) @ 1333 MT/s, 8 chips per DIMM, 8 banks per chip, 2 ranks
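For reference, the DDR3 "(10-10-10) @ 1333 MT/s" shorthand above converts to absolute time as follows: 1333 MT/s at double data rate implies a ~667 MHz clock, so each of the tCL/tRCD/tRP timings is about 15 ns.

```python
# Sketch: converting the DDR3 (10-10-10) @ 1333 MT/s timings into nanoseconds.
def cycles_to_ns(cycles: int, transfer_rate_mts: float) -> float:
    clock_mhz = transfer_rate_mts / 2       # DDR: two transfers per clock edge pair
    return cycles * 1000.0 / clock_mhz      # clock period in ns × cycle count

for name in ("tCL", "tRCD", "tRP"):         # the "10-10-10" triplet
    print(name, round(cycles_to_ns(10, 1333), 1), "ns")  # ≈ 15.0 ns each
```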
Experiment Results [Figure: results plot; achieved bandwidth 269 Gb/s]
Current
Goal: Optically Integrated Memory [Figure: DIMM with an on-module optical transceiver connected by optical fiber; Vdd and Gnd connections]
Advantages of Photonics Decoupled energy–distance relationship No long traces to drive and sync with the clock DRAM chips can run faster Lower power Fewer pins on the DIMM module and going into the chip Eventually required by packaging constraints Waveguides achieve dramatically higher density through WDM DRAM can be arbitrarily distant – fiber is low loss
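The WDM density point can be made concrete: a single waveguide carries one channel per wavelength, so aggregate bandwidth scales linearly with the wavelength count. The 10 Gb/s per-wavelength rate below is an assumption for illustration; the 13-wavelength count is taken from the photonic setup later in the talk.

```python
# Sketch (assumed per-λ rate): aggregate bandwidth of one waveguide under WDM.
def waveguide_bandwidth_gbps(wavelengths: int, rate_per_lambda_gbps: float) -> float:
    # Each wavelength is an independent channel sharing the same waveguide.
    return wavelengths * rate_per_lambda_gbps

# 13 wavelengths at an assumed 10 Gb/s each:
print(waveguide_bandwidth_gbps(13, 10.0))  # → 130.0 (Gb/s on a single waveguide)
```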
Hybrid Circuit-Switched Photonic Network [Figures: broadband 1×2 switch [Cornell, 2008]; broadband 2×2 switch transmission [Shacham, NOCS ’07]]
Hybrid Circuit-Switched Photonic Network Photonic Transmission Electronic Control Compute
Hybrid Circuit-Switched Photonic Network International Symposium on Networks-on-Chip
Hybrid Circuit-Switched Photonic Network Goal: establish circuit paths from anywhere on the network to any DRAM access point [Bergman, HPEC ’07]
Photonic DRAM Access [Figure: processor/cache with network interface and processor gateway on the electronic network; a memory gateway at the chip boundary connects over fiber / PCB waveguide to photonic + electronic DIMMs; a photonic switch and modulators sit alongside the memory controller, which generates the memory control commands] Modulators are needed to send commands to the DRAM
Memory Transaction 1) Read or write request is initiated from a local or remote processor and travels on the electronic network 2) Processor gateway forwards it to the memory gateway 3) Memory gateway receives the request [Figure: request path from the processor gateway across the chip boundary to the memory gateway and its DIMMs]
Memory READ Transaction 4) MC receives READ command 5) Switch is set up from modulators to DIMM, and from DIMM to network 6) Path setup travels back to the receiving processor; a path ACK returns when the path is set up 7) Row/column addresses sent to the DIMM optically 8) Read data returned optically 9) Path torn down; the MC knows how long this will take Commands can be sent and data returned within the same switch setup [Figure: numbered steps overlaid on the photonic switch, modulators, and control blocks]
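The READ sequence above can be sketched as a sum of per-step delays. Every latency value below is a hypothetical placeholder (only the DRAM tRCD + tCL term follows from the ~15 ns DDR3 timings in the setup); the point is the structure of the end-to-end path, not the numbers.

```python
# Sketch (hypothetical per-step latencies) of the READ transaction sequence.
READ_STEPS_NS = [
    ("MC receives READ command",                      2.0),
    ("switch setup: modulators→DIMM, DIMM→network",   1.0),
    ("path setup / ACK back to processor",            4.0),
    ("row/col addresses sent optically",              1.0),
    ("DRAM access (tRCD + tCL ≈ 15 + 15 ns)",        30.0),
    ("read data returned optically",                  4.0),
]

total = sum(ns for _, ns in READ_STEPS_NS)
print(f"estimated read latency ≈ {total} ns")  # → 42.0 ns under these assumptions
```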
Memory WRITE Transaction 4) MC receives WRITE command, which also acts as a path setup from the processor to the memory gateway 5) Switch is set up from modulators to DIMM 6) Row/column addresses sent to the DIMM 7) Switch is set up from network to DIMM 8) Path ACK sent back to the processor 9) Data transmitted optically to the DIMM 10) Path torn down from the processor after the data is transmitted The controller must alternate DRAM commands with incoming write data [Figure: numbered steps overlaid on the photonic switch, modulators, and control blocks]
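The write-side constraint — the controller alternating DRAM commands with incoming optical data — can be sketched as a simple interleaved schedule. The command names and chunk granularity are illustrative assumptions.

```python
# Sketch (illustrative granularity): interleaving DRAM commands with write data.
from itertools import chain, zip_longest

commands = ["ACTIVATE", "WRITE", "WRITE"]   # one command slot per burst chunk
data_bursts = ["D0", "D1", "D2", "D3"]      # write data arriving optically

# Alternate command slots with data slots on the DIMM interface:
schedule = [x for x in chain.from_iterable(zip_longest(commands, data_bursts)) if x]
print(schedule)  # → ['ACTIVATE', 'D0', 'WRITE', 'D1', 'WRITE', 'D2', 'D3']
```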
Optical Circuit Memory (OCM) Anatomy [Figure: DRAM_OpticalTransceiver with detector bank and modulator bank (Nλ each), DLL, latches, mux, clk, and drivers; 25 addr/cntrl lines and 64 data lines to the DRAM chips; fiber coupling OR waveguide coupling, plus VDD/Gnd] Packet format: λ, bank ID, burst length, row address, column address, data; timing: tRCD, tCL All chips are accessed in parallel
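A sketch of packing the packet fields above into the transceiver's 25 address/control bits. The individual field widths below are hypothetical (the slide only gives the 25-bit total); only the field names come from the packet format shown.

```python
# Sketch (hypothetical field widths summing to the 25 addr/cntrl bits).
FIELDS = [("bank", 3), ("burst_len", 4), ("row", 13), ("col", 5)]  # 25 bits total

def pack(**values) -> int:
    """Pack named fields MSB-first into one addr/cntrl word."""
    word = 0
    for name, width in FIELDS:
        v = values[name]
        assert v < (1 << width), f"{name} overflows {width} bits"
        word = (word << width) | v
    return word

print(hex(pack(bank=5, burst_len=8, row=0x1ABC, col=17)))
```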
Advantages of Photonics Decoupled energy–distance relationship No long traces to drive and sync with the clock DRAM chips can run faster Lower power Fewer pins on the DIMM module and going into the chip Eventually required by packaging constraints Waveguides achieve dramatically higher density through WDM DRAM can be arbitrarily distant – fiber is low loss Simplified memory control logic – no contending accesses; contention is handled by path setup Accesses are optimized for large streams of data
Experimental Setup – Photonic System: 2 cm × 2 cm chip, 8×8 photonic torus of photonic torus tiles, 28 DRAM access points (MCs), 2 DIMMs per DRAM AP Routers: 256 b buffers, 32 b packet size, 32 b channels, 32 nm tech point (ORION), high Vt, Vdd = 0.8 V, freq = 1 GHz Photonics: 13 λ Traffic: random core–DRAM access point pairs, random read/write, uniform message sizes, Poisson arrivals at 1 µs DRAM: modeled with our event-driven DRAM model, DDR3 (10-10-10) @ 1600 MT/s, 8 chips per DIMM, 8 banks per chip
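Both setups use the same traffic model: random core→DRAM-AP pairs with random read/write and Poisson arrivals at a 1 µs mean inter-arrival time. A minimal generator for that model (core/AP counts from the setups; the seed and function name are ours):

```python
# Sketch of the experiments' traffic model: random core→AP pairs, random
# read/write, Poisson arrivals with a 1 µs mean inter-arrival time.
import random

random.seed(0)                 # fixed seed for repeatable runs
CORES, DRAM_APS = 64, 28       # 8×8 network, 28 DRAM access points

def next_request(now_us: float):
    arrival = now_us + random.expovariate(1.0)   # exponential gaps, mean 1 µs
    core = random.randrange(CORES)               # random source core
    ap = random.randrange(DRAM_APS)              # random DRAM access point
    op = random.choice(["READ", "WRITE"])        # random read/write
    return arrival, core, ap, op

t = 0.0
for _ in range(3):
    t, core, ap, op = next_request(t)
    print(f"t={t:.2f}us core{core} -> AP{ap} {op}")
```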
Performance Comparison
Experiment #2 Random Statically Mapped Address Space
Results
Network Energy Comparison Electronic mesh: power = 13.3 W Photonic torus: power = 0.42 W; total power = 2.53 W (including laser power) Note: network power only (no DRAM or MC power)
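The ratio implied by these figures, using the photonic total including laser power (still network power only, excluding DRAM and MC):

```python
# Sketch: network-power reduction implied by the comparison above.
electronic_mesh_w = 13.3
photonic_total_w = 2.53   # photonic torus, including laser power

ratio = electronic_mesh_w / photonic_total_w
print(f"{ratio:.1f}x lower network power")  # → 5.3x lower network power
```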
Summary Extending a photonic network to include access to DRAM is attractive for several reasons: Circuit switching allows large burst lengths and simplified memory control, for increased bandwidth Energy-efficient end-to-end transmission Alleviates pin-count constraints with high-density waveguides PhotoMAN