
Energy-Efficient Time-Division Multiplexed Hybrid-Switched NoC for Heterogeneous Multicore Systems Jieming Yin *, Pingqiang Zhou +, Sachin S. Sapatnekar.


1 Energy-Efficient Time-Division Multiplexed Hybrid-Switched NoC for Heterogeneous Multicore Systems. Jieming Yin*, Pingqiang Zhou+, Sachin S. Sapatnekar* and Antonia Zhai*. *University of Minnesota, Twin Cities, USA; +ShanghaiTech University, China. 28th IEEE International Parallel & Distributed Processing Symposium.

2 Heterogeneous Multicore System. [Figure: CPU and GPU cores, L2 cache, and memory (MEM) connected by an interconnection network.]

3 On-chip Traffic Characteristics. CPU traffic pattern: erratic, random, latency-sensitive; switching mechanism: packet switching. GPU traffic pattern: streaming, dedicated, throughput-intensive; switching mechanism: circuit switching. NoCs must handle different traffic differently.

4 Packet Switching vs. Circuit Switching: Performance Perspective. [Figure: timing diagrams from the source node through intermediate nodes 1 to 3 to the destination node. Packet-switched: data only, with the network delay made up of router pipeline and link traversal stages. Circuit-switched: setup and ack (setup delay) followed by data (network delay), with router pipeline and link traversal stages.]

5 Packet Switching vs. Circuit Switching: Energy Perspective. [Figure: packet-switched vs. circuit-switched energy, highlighting allocation and arbitration.] Circuit-switched NoC: potentially energy efficient for certain traffic patterns.

6 Packet Switching or Circuit Switching? Packet switching: flexible and scalable, but costly in latency and energy. Circuit switching: better latency and energy, but requires path setup and maintenance. [Figure: traffic classified by frequency (regular or erratic) and destination (fixed or random), with regions labeled packet switching and circuit switching.] NoC with both packet and circuit switching?

7 Multi-plane vs. Single-plane. Multi-plane: independent packet-switched (PS) and circuit-switched (CS) planes, which increases hardware requirements and lowers resource utilization. Single-plane (PS+CS): packet and circuit switching share the same communication fabric. How can packet and circuit switching share the same fabric?

8 Space-Division Multiplexing (SDM), PS+CS. Physically divide a channel into sub-channels. [Figure: a channel shared by A, B, C, and D, split into sub-channels of 4 bits, 2 bits, and 1 bit.] Prior work: K. Lusala et al., IJRC 2012; S. Secchi et al., DSD 2008; A. K. Lusala, ReCoSoC 2011; M. Modarressi et al., DATE 2009. SDM suffers from a packet serialization problem.
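
The serialization problem can be illustrated with a little arithmetic. The sketch below assumes a hypothetical 64-bit packet (the slides give no packet size) and compares the sub-channel widths from the figure against a full 8-bit channel; `serialization_cycles` is an illustrative helper, not something from the talk.

```python
def serialization_cycles(packet_bits: int, width_bits: int) -> int:
    """Cycles needed to push a packet through a channel of the given width."""
    # Ceiling division: a partially filled last transfer still costs a cycle.
    return (packet_bits + width_bits - 1) // width_bits


PACKET_BITS = 64  # hypothetical packet size, chosen only for illustration

if __name__ == "__main__":
    # 8 bits = full channel; 4, 2, 1 bits = SDM sub-channels from the figure.
    for width in (8, 4, 2, 1):
        cycles = serialization_cycles(PACKET_BITS, width)
        print(f"{width}-bit channel: {cycles} cycles per packet")
```

The narrower the sub-channel, the more cycles a single packet occupies, which is the cost SDM pays for sharing the channel in space.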

9 Time-Division Multiplexing (TDM), PS+CS. [Figure: the full 8-bit channel is shared by A, B, C, and D over time: slot 0 to D, slot 1 to C, slots 2 and 3 to B, slots 4 to 7 to A.] We propose a TDM-based hybrid-switched NoC!
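
The slot assignment in the figure can be written down as a small schedule. This is a minimal sketch, assuming the schedule repeats every 8 cycles and that each slot carries the full 8-bit channel width; the function names are illustrative only.

```python
from collections import Counter

# Slot owners for slots 0..7, read off the figure.
SCHEDULE = ["D", "C", "B", "B", "A", "A", "A", "A"]
CHANNEL_BITS = 8  # each slot uses the whole channel width


def owner_at(cycle: int) -> str:
    """Owner of the channel in a given cycle (schedule assumed periodic)."""
    return SCHEDULE[cycle % len(SCHEDULE)]


def bits_per_period() -> dict:
    """Bits each owner can send in one full pass over the schedule."""
    return {owner: n * CHANNEL_BITS for owner, n in Counter(SCHEDULE).items()}


def wait_for_slot(owner: str, cycle: int) -> int:
    """Cycles an owner must wait, starting at `cycle`, for its next slot."""
    for delay in range(len(SCHEDULE)):
        if owner_at(cycle + delay) == owner:
            return delay
    raise ValueError(f"{owner} has no slot in the schedule")


if __name__ == "__main__":
    print(bits_per_period())
    print(wait_for_slot("D", 1))
```

Unlike SDM, every transfer uses the full channel width; the sharing cost shows up as waiting time for the next owned slot instead.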

10 Outline. Introduction; Design: TDM-based Hybrid-switching NoC; Optimizations for Hybrid Switching; Conclusion.

11 Hybrid-switched Router. [Figure: router with inputs 1 to n, each with packet-switched and circuit-switched paths and a slot table; routing logic, VC allocator, SW allocator, crossbar, and outputs 1 to n. Packet-switched pipeline: BW/RC, VA, SA, ST. Circuit-switched pipeline: HP, ST.]

12 Circuit-switched Path Setup. [Figure: routers R0 to R5; a circuit-switched (CS) path across R0, R1, R2, and R3 with time slots t0 to t7.] Set up the path before transmission. Setup messages are sent through the packet-switched network. Acknowledge the source upon successful setup. Keep time-slot assignments in slot tables.
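
The setup steps above can be sketched as a hop-by-hop reservation. The slide does not say how slots at successive routers relate, so this sketch assumes a flit advances one router per slot (slot t at one hop, t+1 at the next, modulo 8) and that a failed setup releases what it reserved; `setup_path` and the `reserved` map are hypothetical names.

```python
from typing import Dict, List, Set

NUM_SLOTS = 8  # t0..t7, as in the figure


def setup_path(reserved: Dict[str, Set[int]], path: List[str], start_slot: int) -> bool:
    """Try to reserve one slot per router along `path`.

    Returns True (the "ack") on success. On a conflict, every slot reserved
    so far is released and False is returned.
    Assumption: the slot index advances by one per hop.
    """
    done = []
    for hop, router in enumerate(path):
        slot = (start_slot + hop) % NUM_SLOTS
        if slot in reserved[router]:
            # Conflict: roll back the partial reservation.
            for r, s in done:
                reserved[r].discard(s)
            return False
        reserved[router].add(slot)
        done.append((router, slot))
    return True


if __name__ == "__main__":
    tables = {r: set() for r in ("R0", "R1", "R2", "R3")}
    print(setup_path(tables, ["R0", "R1", "R2", "R3"], start_slot=0))
    print(tables)
```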

13 Slot Table Configuration Walkthrough. [Figure: four snapshots (① to ④) of the slot table for inputs in_1 and in_2, slots s0 to s3, with v and out fields.] Setup 1 (succeed): in_1 → out_4, slot_id = 2, duration = 2. Setup 2 (fail): in_1 → out_3, slot_id = 3, duration = 1. Teardown 1: in_1 → out_4, slot_id = 2, duration = 2.
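
The walkthrough can be replayed with a small model of the slot table. This is a sketch under stated assumptions: a reservation covers `duration` consecutive slots starting at `slot_id` (wrapping around the table), and a setup fails if any of those slots is already taken for that input or if another input already drives the same output in that slot. Under that reading, setup 2 fails because slot 3 of in_1 is still held by setup 1. The `SlotTable` class is hypothetical, not the authors' hardware design.

```python
from typing import Dict, List, Optional


class SlotTable:
    """Per-input table mapping each time slot to a reserved output (or None)."""

    def __init__(self, inputs: List[str], num_slots: int = 4) -> None:
        self.num_slots = num_slots
        self.entries: Dict[str, List[Optional[str]]] = {
            port: [None] * num_slots for port in inputs
        }

    def _slots(self, slot_id: int, duration: int) -> List[int]:
        # Assumption: reservations wrap around the end of the table.
        return [(slot_id + i) % self.num_slots for i in range(duration)]

    def setup(self, in_port: str, out_port: str, slot_id: int, duration: int) -> bool:
        if duration > self.num_slots:
            return False
        slots = self._slots(slot_id, duration)
        for s in slots:
            if self.entries[in_port][s] is not None:
                return False  # input already reserved in this slot
            if any(row[s] == out_port for row in self.entries.values()):
                return False  # output already driven by another input
        for s in slots:
            self.entries[in_port][s] = out_port
        return True

    def teardown(self, in_port: str, out_port: str, slot_id: int, duration: int) -> None:
        for s in self._slots(slot_id, duration):
            if self.entries[in_port][s] == out_port:
                self.entries[in_port][s] = None


if __name__ == "__main__":
    table = SlotTable(["in_1", "in_2"])
    print(table.setup("in_1", "out_4", slot_id=2, duration=2))  # setup 1
    print(table.setup("in_1", "out_3", slot_id=3, duration=1))  # setup 2
    table.teardown("in_1", "out_4", slot_id=2, duration=2)      # teardown 1
    print(table.entries)
```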

14 Slot Table Size. Smaller slot table: less energy overhead, shorter packet waiting time, coarser-grain multiplexing. Larger slot table: more energy overhead, longer packet waiting time, finer-grain multiplexing. [Figure: slot table with active and inactive entries, shown in its initial state (after reset) and after more requests.] Slot table size should be adjusted dynamically.

15 Circuit-Switched Path Exclusiveness. [Figure: slot table (slots s0, s1, ...; v and out fields; entries out_3 (PS), out_2 (PS), out_1) sending configuration signals to the crossbar, alongside the SW allocator.] The crossbar must be configured before a circuit-switched flit's arrival. The time slot is wasted if no circuit-switched flit is present. Reserved slots are exclusively occupied by circuit-switched paths.

16 Time-slot Stealing. [Figure labels: slot table, decoder, line address, valid, v and out fields, configuration signals, crossbar, VC allocator, SW allocator, CS flit enable from upstream router.] Enable path reuse between packet- and circuit-switched data paths.
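
One plausible reading of time-slot stealing is a simple priority rule per output and per slot. In the sketch below, `cs_flit_present` stands in for the "CS flit enable" signal from the upstream router; the function and its return values are hypothetical and only illustrate the idea that a reserved but idle slot can be handed to packet-switched traffic.

```python
from typing import Optional


def select_output_user(
    reserved_for_cs: bool,
    cs_flit_present: bool,
    ps_request: bool,
    stealing_enabled: bool = True,
) -> Optional[str]:
    """Decide who uses a crossbar output in the current time slot.

    Returns "CS", "PS", or None (slot unused).
    """
    if reserved_for_cs and cs_flit_present:
        return "CS"  # the circuit-switched flit always wins its own slot
    if reserved_for_cs and not stealing_enabled:
        return None  # exclusive reservation: the idle slot is wasted
    # Slot is free, or reserved but idle with stealing enabled.
    return "PS" if ps_request else None


if __name__ == "__main__":
    # Reserved slot, no CS flit arriving, a packet-switched flit is waiting.
    print(select_output_user(True, False, True, stealing_enabled=False))
    print(select_output_user(True, False, True, stealing_enabled=True))
```

With stealing disabled the model reproduces the wasted slot from the previous slide; with stealing enabled the same slot is reused by a packet-switched flit.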

17 Hybrid-switched Network. Path setup: endpoint selection targets frequent communication pairs; route selection uses adaptive routing, where the routing decision is based on the utilization of slot tables in neighbor routers. Switching decision: made by referring to packet slack (J. Yin et al., ISLPED 2012).

18 Full System Evaluation Platform. [Figure: tiles, each a CPU core, GPU SM, L2 cache, or MC, attached to a router (R).] Benchmarks: CPU: ammp, applu, art, equake, gafort, mgrid, swim, wupwise. GPU: blackscholes, lps, lib, nn, hotspot, pathfinder, sto.

19 Performance Evaluation. [Figure: CPU performance ↑ 0.3%; GPU performance ↑ 4.1%.] GPU performance is improved; CPU performance impact is negligible.

20 Network Energy Evaluation. [Figure: 6.3% saving.]

21 Overall: Basic Hybrid-switched NoC. [Figure: CPU speedup, GPU speedup, network energy.] 0.3% CPU performance improvement; 4.1% GPU performance improvement; 6.3% network energy reduction. Can we do better?

22 Outline. Introduction; Design: TDM-based Hybrid-switching NoC; Optimizations for Hybrid Switching; Conclusion.

23 Opportunity: Low Path Utilization. Circuit-switched paths are underutilized: a large number of circuit-switched paths overlap, and the paths are not fully utilized, which wastes on-chip resources (slot tables). [Figure: overlapped paths.]

24 Optimization: Path Sharing. [Figure: two panels. Hitchhiker-sharing: a circuit-switched path and hitchhiker-sharing sources. Vicinity-sharing: a circuit-switched path and vicinity-sharing destinations.] Enable path reuse among circuit-switched data paths.

25 Performance Evaluation. [Figure: CPU ↑ 0.3% (basic hybrid switching), ↑ 0.2% (with path sharing); GPU ↑ 4.1% (basic hybrid switching), ↑ 3.7% (with path sharing).]

26 Network Energy Evaluation. [Figure: 6.3% saving (basic hybrid switching); 9.0% saving (with path sharing).] Can we do EVEN better?

27 Opportunity: Lower Buffer Pressure. Percentage of flits that are circuit-switched (the remainder are packet-switched):

GPU benchmark    Circuit-switched flits (%)
Blackscholes     55.7
Hotspot          29.1
Lib              34.4
Lps              55.0
Nn               38.9
Pathfinder       49.1
Sto              18.5

Observation: Circuit switching diverts on-chip traffic, alleviating the buffer pressure on packet-switched data paths.

28 Optimization: Aggressive Power-gating. Circuit switching some of the packets alleviates buffer pressure and facilitates more aggressive power gating. [Figure: input port with packet-switched and circuit-switched paths and a slot table; elements marked active or inactive.] Reduce dynamic and leakage power dissipation.

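29 Performance Evaluation. [Figure: CPU ↑ 0.3% (basic hybrid switching), ↑ 0.2% (with path sharing), ↓ 1.6% (with aggressive power gating); GPU ↑ 4.1%, ↑ 3.7%, ↑ 2.6% for the same three configurations.]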

30 Network Energy Evaluation. [Figure: 6.3% saving (basic hybrid switching); 9.0% saving (with path sharing); 17.1% saving (with aggressive power gating).] Energy saving is significant.

31 Overall. [Figure: CPU speedup, GPU speedup, network energy.] 1.6% CPU performance degradation; 2.6% GPU performance improvement; 17.1% network energy reduction.

32 Conclusion. TDM-based hybrid-switched network: TDM is an efficient way to enable on-chip resource sharing. The hybrid-switched NoC handles different traffic differently. Evaluated for performance, energy efficiency, and scalability (in paper).

