Silicon Nanophotonic Network-On-Chip Using TDM Arbitration Gilbert Hendry – Columbia University Johnnie Chan, Shoaib Kamil, Lenny Oliker, John Shalf, Luca P. Carloni, Keren Bergman
Why Photonics? Photonics changes the rules for bandwidth, energy, and distance. OPTICS: Modulate/receive a high-bandwidth data stream once per communication event. A broadband switch routes the entire multi-wavelength stream. Off-chip BW = on-chip BW for nearly the same power. ELECTRONICS: Buffer, receive, and re-transmit at every router. Each bus lane is routed independently (P ∝ N_LANES). Off-chip BW is pin-limited and power hungry. [Figure: TX/RX placement along the optical and electronic paths]
Silicon Photonic Integration Cornell, 2009 Cornell, 2005 Sandia, 2008 Ghent, 2007 Columbia, 2008
Photonic Networks-on-Chip: Corona [U. of Wisconsin, HP], Photonic Clos [MIT], Photonic Torus [Columbia]
Ring Resonators Modulator/filter Broadband λ λ
Circuit-switched P-NoCs [Figure labels: electronic control (0 V / 1 V across the n-region and p-region); thermal control (ohmic heater); transmission versus injected wavelengths, with off-resonance and on-resonance profiles; source S and destination D]
Circuit-switched P-NoCs. Pros: Energy-efficient end-to-end transmission; high bandwidth through WDM; electronic network still available for small control messages*; network-level support for secure regions. Cons: Path setup latency; path setup contention (no fairness); longer paths block more; head-of-line blocking at gateways. * [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]
Head of Line Blocking. External Concentration* [Figure labels: cores; network IF with serialization, deserialization, drivers, receivers, and Tx/Rx; electronic crossbar; control router to/from the control plane (bidirectional electronic channel); 5-port photonic switch to/from the data plane (bidirectional waveguide)] * [P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]
TDM Arbitration [Figure: time axis divided into Time slot 0, Time slot 1, …, Time slot T, with labels t0 through tC-1]
Synchronous Gateway/Control. Time slot ≈ 10 ns. TDM sync clock ≈ 100 MHz.
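The two figures on this slide are consistent with one time slot per sync-clock period (1 / 10 ns = 100 MHz). The sketch below only illustrates that arithmetic, together with how a gateway could derive the active slot from a shared time base. The schedule length `NUM_SLOTS` and both function names are hypothetical, not taken from the talk.

```python
SLOT_NS = 10     # approximate time-slot length quoted on the slide
NUM_SLOTS = 8    # hypothetical schedule length T, for illustration only


def sync_clock_hz(slot_ns):
    """Clock frequency whose period equals one time slot."""
    return 1e9 / slot_ns


def current_slot(time_ns, slot_ns=SLOT_NS, num_slots=NUM_SLOTS):
    """Index of the slot active at time_ns, assuming the schedule repeats."""
    return (time_ns // slot_ns) % num_slots


print(sync_clock_hz(SLOT_NS))   # frequency in Hz for a 10 ns slot
print(current_slot(25))         # slot active 25 ns after the schedule starts
```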
Nonblocking Network Scheduling Time slot 0 Time slot 1 Time slot 2 Required time slots = N-1
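One way to see why N-1 slots suffice on a nonblocking network is a rotating schedule: in slot t, every source s sends to node (s + t + 1) mod N, so each slot is a permutation with no source or destination used twice. This is a minimal sketch of that well-known construction, not necessarily the schedule shown on the slide; the function name is an assumption.

```python
def round_robin_schedule(n):
    """Return n-1 slots; slot t holds the pairs (src, (src + t + 1) % n)."""
    return [[(s, (s + t + 1) % n) for s in range(n)] for t in range(n - 1)]


schedule = round_robin_schedule(4)
for t, slot in enumerate(schedule):
    print("Time slot", t, slot)
```

Every ordered source-destination pair appears exactly once across the N-1 slots, which is the full-coverage property the later slides aim for on a blocking topology.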
However… Nonblocking topology is difficult to implement because of insertion loss. [M. Petracca et al. IEEE Micro, 2008] * [J. Chan et al. Architectural Exploration of Chip-Scale Photonic Interconnection Network Designs Using Physical-Layer Analysis. JLT, May 2010]
Scheduling Time Slots. Problem: Blocking network; full coverage; minimize time slots (most communications per slot). Constraints: Source contention; destination contention; topology contention. Note: this schedule targets full coverage, although schedules specialized to a communication pattern would be better.
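The three constraints can be expressed as a validity check on a single time slot. The sketch below assumes a square mesh of `width` × `width` nodes numbered row by row, X-then-Y routing, and that two circuits conflict when they share a directed link. Those modeling choices and all names are illustrative assumptions, not details confirmed by the slides.

```python
def xy_path_links(src, dst, width):
    """Directed mesh links used by X-then-Y routing from src to dst."""
    x, y = src % width, src // width
    dx, dy = dst % width, dst // width
    links = []
    while x != dx:                      # travel along X first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                      # then along Y
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def slot_is_valid(comms, width):
    """True if no two (src, dst) pairs in comms contend with each other."""
    sources, dests, used = set(), set(), set()
    for src, dst in comms:
        if src in sources or dst in dests:       # source/destination contention
            return False
        links = xy_path_links(src, dst, width)
        if any(link in used for link in links):  # topology contention
            return False
        sources.add(src)
        dests.add(dst)
        used.update(links)
    return True


print(slot_is_valid([(0, 3), (4, 7)], 4))   # paths in different rows
print(slot_is_valid([(0, 3), (1, 2)], 4))   # paths share a link in row 0
```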
Solution: Genetic Search. Genetic search is an established technique; here it is used to solve the slot-scheduling problem. Overall flow: Initialization → Population (size P) → Selection (down to size ps × P) → Reproduction (back to P) → Mutation (still P). A communication is a source-destination pair. Initial schedule: Slot 0: c0; Slot 1: c1; …; Slot N²: cN². Evolved schedule: Slot 0: c0, c5, c7, c8; Slot 1: c23, c6, c58; …; Slot T: c42, c65, c1. Fitness = 1/(number of time slots).
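The representation, fitness, and selection step described on this slide can be sketched as follows. A schedule is a list of time slots, each slot a list of communications. The initial schedule places one communication per slot, and selection keeps the fittest fraction `ps` of the population. Excluding self-pairs and shuffling the initial order are assumptions made for the example, and the function names are hypothetical.

```python
import random


def initial_schedule(num_nodes, rng):
    """One communication (src, dst) per time slot, in random order."""
    comms = [(s, d) for s in range(num_nodes)
             for d in range(num_nodes) if s != d]
    rng.shuffle(comms)
    return [[c] for c in comms]


def fitness(schedule):
    """Fitness = 1 / (number of time slots), as stated on the slide."""
    return 1.0 / len(schedule)


def select(population, ps):
    """Keep the fittest fraction ps of the population (at least one)."""
    keep = max(1, int(ps * len(population)))
    return sorted(population, key=fitness, reverse=True)[:keep]


rng = random.Random(0)
population = [initial_schedule(4, rng) for _ in range(6)]
survivors = select(population, 0.5)
print(len(survivors), fitness(survivors[0]))
```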
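Reproduction: Birds and Bees [Figure: two parent schedules, each a list of time slots (first parent: c0, c3, c60, c19 | c27, c4 | c100, c71, c9 | … | c1, c17, c23; second parent: c12, c2, c1, c60 | c100, c82, c9 | c0 | … | c89, c56, c16, c63), combined into a child C whose first slots are c0, c3, c60, c19 and c12, c2, c1, c60]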
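The slide does not spell out the reproduction operator, so the following is only one plausible sketch: the child takes slots alternately from the two parents and drops any communication it has already scheduled. Because a subset of a contention-free slot is still contention-free, and every communication appears in both parents, the child stays valid and keeps full coverage. The function name and the alternation order are assumptions.

```python
from itertools import zip_longest


def crossover(parent_a, parent_b):
    """Interleave the parents' slots, keeping each communication once."""
    child, seen = [], set()
    for slot_a, slot_b in zip_longest(parent_a, parent_b, fillvalue=()):
        for slot in (slot_a, slot_b):
            kept = [c for c in slot if c not in seen]
            if kept:                 # skip slots that became empty
                child.append(kept)
                seen.update(kept)
    return child


a = [["c0"], ["c1"], ["c2"]]
b = [["c2", "c1"], ["c0"]]
print(crossover(a, b))
```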
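Mutation: Secret of the Ooze [Figure: mutation of a schedule S. Before: c0, c3, c60, c19 | c27, c4 | c100, c71, c9 | … | c1, c17, c23. The slot holding c100, c71, c9 is dissolved and its communications move to other slots. After: c0, c3, c60, c19, c9 | c27, c4, c100 | … | c1, c17, c23, c71]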
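Read from the figure, the mutation dissolves one time slot and moves each of its communications into another slot that can accept it. The sketch below follows that reading under simplifying assumptions: the victim slot is chosen at random, and `compatible` checks only source and destination contention (a topology check could be passed in through `fits`). All names are hypothetical.

```python
import random


def compatible(slot, comm):
    """True if comm shares no source and no destination with the slot."""
    src, dst = comm
    return all(src != s and dst != d for s, d in slot)


def mutate(schedule, rng, fits=compatible):
    """Dissolve one random slot and redistribute its communications."""
    if len(schedule) < 2:
        return [list(slot) for slot in schedule]
    victim = rng.randrange(len(schedule))
    result = [list(slot) for i, slot in enumerate(schedule) if i != victim]
    for comm in schedule[victim]:
        targets = [slot for slot in result if fits(slot, comm)]
        if targets:
            rng.choice(targets).append(comm)
        else:
            result.append([comm])    # no room anywhere: open a new slot
    return result


before = [[(0, 1)], [(1, 2)], [(2, 0)]]
after = mutate(before, random.Random(1))
print(len(before), "->", len(after), "slots")
```

The input schedule is copied rather than modified, so a mutated individual can be discarded without affecting its parent.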
Schedule Results. Pop size = 50, mutation prob = 0.8. [Plots: 16-node, 36-node]
Implementation: Photonic Switch. 200µm rings. Total switch size = 1.4 mm × 1.4 mm. No S->W, S->E, N->W, N->E paths (X-then-Y routing).
Implementation: Switch Control. Width of LUT = 12 (number of rings). Length of LUT = T (number of time slots). Overhead (area, power) is small.
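A lookup table of this shape can be modeled as T words of 12 bits, one bit per ring, indexed by the current time slot. The sketch assumes a set bit means the ring is switched on-resonance and numbers the rings 0 to 11. Both conventions, and the function names, are illustrative assumptions rather than details from the talk.

```python
NUM_RINGS = 12   # LUT width from the slide


def build_lut(rings_on_per_slot):
    """Pack, for each time slot, the set of active rings into one 12-bit word."""
    lut = []
    for rings_on in rings_on_per_slot:
        word = 0
        for ring in rings_on:
            if not 0 <= ring < NUM_RINGS:
                raise ValueError("ring index out of range: %r" % ring)
            word |= 1 << ring
        lut.append(word)
    return lut


def ring_bits(lut, slot):
    """Per-ring control bits for a slot; the schedule repeats every T slots."""
    word = lut[slot % len(lut)]
    return [(word >> r) & 1 for r in range(NUM_RINGS)]


lut = build_lut([{0, 3}, set(), {11}])   # hypothetical 3-slot schedule
print(ring_bits(lut, 0))
```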
Implementation: Network Gateway 1. Send request 2. Grant, set x-bar and transmit to serializer 3. Receive, deserialize 4. Store in temp buffer, request to core
Simulation Setup. PhoenixSim*: photonic and electronic network simulator. 64 cores. Networks: E-mesh, P-mesh, P-TDM. Traffic: random (32 B, 1 kB, and 32 kB messages) and scientific application traces. * [Chan et al. PhoenixSim: A Simulator for Physical-Layer Analysis of Chip-Scale Photonic Interconnection Networks. In DATE 2010]
Results – Random Traffic [Plots for 32 B, 1 kB, and 32 kB messages]
Results – Scientific Applications

Benchmark   Num Phases   Num Messages   Total Size (MB)   Avg Msg Size (B)
Cactus      2            285            7.3               25600
GTC         (not given)  63             8.1               129796
MADbench    195          15414          86.5              5613
PARATEC     34           126059         5.4               43.3

Speaker note: higher is better. Possibly show an efficiency graph (1/et).
Conclusion. TDM implements fairness. TDM improves network utilization. Genetic search is useful for finding a full-coverage static schedule. Future Work: Scaling gracefully*; reducing time slots*; dynamic scheduling. Contact: gilbert@ee.columbia.edu * [Hendry et al. Time-Division-Multiplexed Arbitration in Silicon Nanophotonic Networks-on-Chip for High Perf. CMPs. In JPDC, Jan 2011]