Published by Morris Allen Parrish. Modified over 8 years ago.
1
Silicon Nanophotonic Network-On-Chip Using TDM Arbitration
Gilbert Hendry – Columbia University Johnnie Chan, Shoaib Kamil, Lenny Oliker, John Shalf, Luca P. Carloni, Keren Bergman
2
Why Photonics?
Photonics changes the rules for Bandwidth, Energy, and Distance.
OPTICS:
Modulate/receive high bandwidth data stream once per communication event.
Broadband switch routes entire multi-wavelength stream.
Off-chip BW = On-chip BW for nearly same power.
ELECTRONICS:
Buffer, receive and re-transmit at every router.
Each bus lane routed independently. (P ∝ N_LANES)
Off-chip BW is pin-limited and power hungry.
3
Silicon Photonic Integration
Cornell, 2009; Cornell, 2005; Sandia, 2008; Ghent, 2007; Columbia, 2008
4
Photonic Networks-on-Chip
Corona [U. of Wisconsin, HP]
Photonic Clos [MIT]
Photonic Torus [Columbia]
5
Ring Resonators
[Figure labels: Modulator/filter, Broadband, λ]
6
Circuit-switched P-NoCs
[Figure labels: 0V, 1V, n-region, p-region, Electronic Control, Ohmic Heater, Thermal Control, Transmission, Injected Wavelengths, Off-resonance profile, On-resonance profile, S, D]
7
Circuit-switched P-NoCs
Pros:
Energy-efficient end-to-end transmission
High bandwidth through WDM
Electronic network still available for small control messages*
Network-level support for secure regions
Cons:
Path setup latency
Path setup contention (no fairness)
Longer paths block more
Head-of-line blocking at gateways
* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]
8
Head of Line Blocking
[Figure labels: Core, Network IF, Electronic Crossbar, Serialization, Drivers, Deserialization, Receivers, Tx/Rx, Control Router, To/From Control plane, Bidirectional Electronic Channel, 5-port photonic switch, To/From Data plane, Bidirectional Waveguide, External Concentration*]
* [P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]
9
TDM Arbitration
[Figure: Time slot 0, Time slot 1, …, Time slot T; labels t0, t1, t2, t3, t4, …, tC-3, tC-2, tC-1]
10
Synchronous Gateway/Control
Time slot ~ 10 ns
TDM sync clock ~ 100 MHz
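The two figures agree if the sync clock ticks once per time slot. That one-tick-per-slot relationship is an assumption here; the slide only gives the two approximate values.

```latex
f_{\text{sync}} = \frac{1}{T_{\text{slot}}} \approx \frac{1}{10\,\text{ns}} = 100\,\text{MHz}
```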
11
Nonblocking Network Scheduling
[Figure panels: Time slot 0, Time slot 1, Time slot 2]
Required time slots = N-1
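One way to see why N-1 slots are enough in a nonblocking network is a cyclic-shift schedule: in slot k, every node sends to the node k+1 positions ahead of it. The sketch below is only an illustration; this construction and the function name `nonblocking_schedule` are assumptions, not necessarily the schedule drawn on the slide.

```python
def nonblocking_schedule(n):
    """Return n-1 slots; in slot k each source sends to (src + k + 1) % n.

    Assumes a nonblocking network, so only source and destination
    contention matter. Each slot is a list of (src, dst) pairs.
    """
    return [[(src, (src + k + 1) % n) for src in range(n)]
            for k in range(n - 1)]


slots = nonblocking_schedule(4)
for k, slot in enumerate(slots):
    print(f"Time slot {k}: {slot}")
```

Every node appears once as a source and once as a destination in each slot, and every source-destination pair appears in exactly one slot.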
12
However…
Nonblocking topology is difficult to implement because of insertion loss [M. Petracca et al. IEEE Micro, 2008]
* [J. Chan et al. Architectural Exploration of Chip-Scale Photonic Interconnection Network Designs Using Physical-Layer Analysis. JLT, May 2010]
13
Scheduling Time Slots
Blocking Network
Problem:
Full coverage
Minimize time slots (most communications per slot)
Constraints:
Source contention
Destination contention
Topology contention
This work schedules for full coverage, though schedules specialized to a communication pattern would be better.
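The three constraints can be checked per time slot. The sketch below is a minimal illustration under stated assumptions: nodes sit on a row-major 2D mesh, routes are X-then-Y, and topology contention is modeled as two communications sharing a directed link. The helper names `xy_links` and `slot_is_valid` are hypothetical, and the real switch-level blocking rules may differ.

```python
def xy_links(src, dst, width):
    """Directed links used by an X-then-Y route on a mesh `width` nodes wide."""
    x, y = src % width, src // width
    dx, dy = dst % width, dst // width
    links = []
    while x != dx:                      # travel along X first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                      # then along Y
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def slot_is_valid(comms, width):
    """True if no two (src, dst) pairs share a source, a destination, or a link."""
    sources, dests, used = set(), set(), set()
    for src, dst in comms:
        links = xy_links(src, dst, width)
        if src in sources or dst in dests or any(l in used for l in links):
            return False
        sources.add(src)
        dests.add(dst)
        used.update(links)
    return True


print(slot_is_valid([(0, 3), (4, 7)], width=4))  # different rows
print(slot_is_valid([(0, 3), (1, 2)], width=4))  # both use link (1,0)->(2,0)
```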
14
Solution: Genetic Search
Genetic search is an established technique; here it is used to solve this problem. This is the overall flow:
Initialization
Population (size P)
Selection (down to size ps x P)
Reproduction (back to P)
Mutation (still P)
[Figure: population members labeled S]
A communication is a source-destination pair.
Slot 0: c0; Slot 1: c1; …; Slot N^2: cN^2
Slot 0: c0, c5, c7, c8; Slot 1: c23, c6, c58; …; Slot T: c42, c65, c1
Fitness = 1/(number of time slots)
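The flow can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation. It assumes that individuals start with one communication per slot, that only source and destination contention are checked, that reproduction clones survivors instead of performing crossover, and that mutation dissolves one slot and refits its communications elsewhere. The survival fraction and generation count are made-up defaults, and all function names are hypothetical.

```python
import random


def conflicts(slot, comm):
    # Source/destination contention only; topology contention is left out.
    return any(s == comm[0] or d == comm[1] for s, d in slot)


def fitness(schedule):
    return 1.0 / len(schedule)          # fewer time slots is fitter


def initial_individual(comms, rng):
    order = list(comms)
    rng.shuffle(order)
    return [[c] for c in order]         # one communication per slot


def mutate(schedule, rng):
    # Assumed operator: dissolve one slot, refit its communications.
    schedule = [slot[:] for slot in schedule]
    victim = schedule.pop(rng.randrange(len(schedule)))
    for comm in victim:
        target = next((s for s in schedule if not conflicts(s, comm)), None)
        if target is None:
            schedule.append([comm])
        else:
            target.append(comm)
    return schedule


def genetic_search(comms, pop_size=50, survive_frac=0.5, mutation_prob=0.8,
                   generations=200, seed=0):
    rng = random.Random(seed)
    population = [initial_individual(comms, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest ps x P individuals.
        population.sort(key=fitness, reverse=True)
        survivors = population[:max(1, int(survive_frac * pop_size))]
        # Reproduction: refill to P by cloning survivors (no crossover here).
        clones = [rng.choice(survivors) for _ in range(pop_size - len(survivors))]
        # Mutation: population size stays P.
        population = [mutate(ind, rng) if rng.random() < mutation_prob else ind
                      for ind in survivors + clones]
    return max(population, key=fitness)


n = 4
comms = [(s, d) for s in range(n) for d in range(n) if s != d]
best = genetic_search(comms)
print(len(best), "time slots")
```

Because the mutation only ever merges communications into existing slots or reuses the slot it freed, a schedule never grows, so the search can only hold or reduce the slot count over generations.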
15
Reproduction: Birds and Bees
[Figure: example schedules and a schedule labeled C. Slot contents shown: c0, c3, c60, c19 / c12, c2, c1, c60 / c27, c4 / c100, c82, c9 / c100, c71, c9 / c0 / … / c1, c17, c23 / c89, c56, c16, c63. Under C: c0, c3, c60, c19 / c12, c2, c1, c60]
16
Mutation: Secret of the Ooze
[Figure: a schedule S shown twice. First: c0, c3, c60, c19 / c100 / c27, c4 / c71 / c100, c71, c9 / c9 / … / c1, c17, c23. Second: c0, c3, c60, c19, c9 / c100 / c27, c4, c100 / c71 / c9 / … / c1, c17, c23, c71]
17
Schedule Results
Population size = 50
Mutation probability = 0.8
[Figure panels: 16-node, 36-node]
18
Implementation: Photonic Switch
200 µm rings
Total switch size = 1.4 mm x 1.4 mm
No S->W, S->E, N->W, N->E paths (X-then-Y routing)
19
Implementation: Switch Control
Width of LUT = 12 (number of rings)
Length of LUT = T (number of time slots)
Overhead (area, power) is small.
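A lookup table of this shape can be sketched as one 12-bit word per time slot. The sketch assumes one on/off bit per ring and a schedule that wraps around after T slots; neither detail is stated on the slide, and the names `build_lut` and `ring_is_on` are hypothetical.

```python
NUM_RINGS = 12  # width of the LUT, from the slide


def build_lut(ring_states_per_slot):
    """Pack each slot's ring states into a 12-bit word; one word per time slot."""
    lut = []
    for states in ring_states_per_slot:
        if len(states) != NUM_RINGS:
            raise ValueError("each slot needs one state per ring")
        word = 0
        for ring, on in enumerate(states):
            if on:
                word |= 1 << ring       # bit position = ring index
        lut.append(word)
    return lut


def ring_is_on(lut, time_slot, ring):
    """Read one ring's state; the schedule is assumed to repeat every T slots."""
    return bool((lut[time_slot % len(lut)] >> ring) & 1)


# Two-slot schedule: ring 0 on in slot 0, ring 11 on in slot 1.
lut = build_lut([[True] + [False] * 11, [False] * 11 + [True]])
print([format(word, "012b") for word in lut])
```

Total storage under these assumptions is 12 x T bits per switch.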
20
Implementation: Network Gateway
1. Send request
2. Grant, set x-bar and transmit to serializer
3. Receive, deserialize
4. Store in temp buffer, request to core
21
Simulation Setup
PhoenixSim*: photonic and electronic network simulator
64 cores
Networks: E-mesh, P-mesh, P-TDM
Traffic:
Random: 32B, 1kB, 32kB messages
Scientific application traces
* [Chan et al. PhoenixSim: A Simulator for Physical-Layer Analysis of Chip-Scale Photonic Interconnection Networks. In DATE 2010]
22
Results – Random Traffic
32B
23
Results – Random Traffic
32B 1kB
24
Results – Random Traffic
32B 1kB 32kB
25
Results – Scientific Applications
Benchmark   Num Phases   Num Messages   Total Size (MB)   Avg Msg Size (B)
Cactus      2            285            7.3               25600
GTC                      63             8.1               129796
MADbench    195          15414          86.5              5613
PARATEC     34           126059         5.4               43.3
Higher is better.
26
Conclusion
TDM implements fairness
TDM improves network utilization
Genetic Search useful for finding full-coverage static schedule
Future Work:
Scaling gracefully*
Reducing time slots*
Dynamic scheduling
Contact:
* [Hendry et al. Time-Division-Multiplexed Arbitration in Silicon Nanophotonic Networks-on-Chip for High Perf. CMPs. In JPDC, Jan 2011]