Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim

Presentation on theme: "Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim"— Presentation transcript:

1 Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
Department of Computer Science and Engineering, Texas A&M University

2 Multi-Core Wave & Networks-On-Chip
Uniprocessors hit the power wall.
Multi-processors provide high performance at a lower power budget.
Shared-bus architecture has limited scalability.
Networks-On-Chip (NOCs) orchestrate chip-wide communication for future many-core processors.
MIT Raw (0.18um, 300MHz): 16-core chip, four 4x4 mesh networks
Intel Polaris (65nm, 4GHz): 80-core chip, 8x10 mesh network
First, let's look at two changes in our processor design.
Lei Wang - NOCS 2009

3 Challenges in On-Chip Communication
High performance: Low communication latency is critical for high system performance.
Bandwidth efficiency: Well-designed routing algorithms provide high network throughput.
Power and area constraints: Simple topologies and slim routers reduce communication power consumption and save chip area.
Efficient multicast support: Cache coherence protocols rely heavily on multicast or broadcast communication.
We propose a bandwidth-efficient routing scheme for multicast communication in NOCs with low latency and power consumption.

4 Prior Work in Multicast Communication
Routing Evaluation Criteria for Multicast Communication [Ni93]: multicast in multicomputer systems
Tree-based Multicast Routing for DSM Multiprocessor [Torrellas96]: short-message multicast in DSM systems
Virtual Circuit Tree Multicasting for NOCs [Lipasti08]: demonstrates the necessity of on-chip multicasting and proposes table-based multicast routing
Region-based Multicast for CMPs [Duato08]: multicast routing for irregular topologies in CMPs

5 Outline
Motivation
Multicast Router Design
  State-of-art Unicast Router Architecture
  Replication Schemes
  Destination List Management
Recursive Partitioning Multicast (RPM)
  Network Partitioning
  Routing Rules
  Example
  Deadlock Avoidance
Evaluation
Conclusion

6 Different Bandwidth Usage Example
[Figure: two 4x4 mesh diagrams with source and destination nodes marked, comparing a left and a right multicast path]
Left path requires 11 link traversals, 12 buffer writes, 15 buffer reads, and 15 crossbar traversals.
Right path requires 5 link traversals, 6 buffer writes, 10 buffer reads, and 10 crossbar traversals.
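The savings implied by the two paths can be worked out directly from the counts on this slide. The short Python sketch below does only that arithmetic; the dictionary keys and the helper name `reduction` are illustrative and not taken from the slides.

```python
# Resource counts for the two multicast paths, copied from the slide.
left_path = {"link_traversals": 11, "buffer_writes": 12,
             "buffer_reads": 15, "crossbar_traversals": 15}
right_path = {"link_traversals": 5, "buffer_writes": 6,
              "buffer_reads": 10, "crossbar_traversals": 10}


def reduction(before: int, after: int) -> float:
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


# Per-resource saving of the right path relative to the left path.
savings = {name: reduction(left_path[name], right_path[name])
           for name in left_path}

for name, frac in savings.items():
    print(f"{name}: {left_path[name]} -> {right_path[name]} ({frac:.1%} fewer)")
```

On these numbers the right path needs roughly half the link traversals and buffer writes, and a third fewer buffer reads and crossbar traversals.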

7 State-of-Art Wormhole Unicast Router
[Figure: pipeline diagram repeated for each hop: RC, VA, SA, ST in the router, then LT on the link]
RC: Route Computation; VA: VC Allocation; SA: Switch Allocation; ST: Switch Traversal; LT: Link Traversal

8 What do we need in a Multicast Router?
Packet Replication
  Synchronous Replication
  Asynchronous Replication
Destination List Management
  All-destination Encoding
  Bit String Encoding
  Multiple-region Broadcast Encoding
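Of the destination list options above, bit string encoding is the simplest to illustrate: one bit per node, set when that node is a destination. The following is a minimal sketch, assuming a 16-node (4x4) mesh with node IDs 0 to 15; the function names `encode_destinations` and `decode_destinations` are hypothetical and do not come from the slides.

```python
NUM_NODES = 16  # assumption: 4x4 mesh, node IDs 0..15


def encode_destinations(dests, num_nodes=NUM_NODES):
    """Pack a collection of destination node IDs into an integer bit string."""
    bits = 0
    for node in dests:
        if not 0 <= node < num_nodes:
            raise ValueError(f"node {node} out of range")
        bits |= 1 << node  # bit i set means node i is a destination
    return bits


def decode_destinations(bits, num_nodes=NUM_NODES):
    """Recover the sorted list of destination node IDs from the bit string."""
    return [node for node in range(num_nodes) if (bits >> node) & 1]


header = encode_destinations([3, 5, 10])
print(format(header, "016b"))        # one character per node, node 15 first
print(decode_destinations(header))
```

The header size is fixed by the network size rather than by the number of destinations, which is the usual trade-off of this encoding.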

9 Synchronous Replication
[Figure: timing diagram over cycles showing head (H), middle (M), and tail (T) flits moving from input ports 0-3 to output ports 0-3]
Packet replication happens at the Switch Traversal stage.

10 Asynchronous Replication
[Figure: timing diagram over cycles showing head (H), middle (M), and tail (T) flits moving from input ports 0-3 to output ports 0-3]

11 Network Partitioning
[Figure: network partitioned around the source node along the N, W, E, S directions]
Eight Parts
Three Parts (5, 6, 7); Three Parts (0, 1, 7); Three Parts (3, 4, 5); Three Parts (1, 2, 3)
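The partitioning step can be sketched as classifying each destination by where it lies relative to the source. The sketch below assumes a mesh addressed by (row, col) with row 0 at the top, and it assumes the part numbering 0 = northeast, 1 = north, then counterclockwise through 7 = east. That numbering is inferred from the three-part groups on the slide (a source in the bottom-left corner would then see only parts 0, 1, 7) and is not confirmed by the transcript. `part_of` and `partition` are hypothetical names.

```python
from collections import defaultdict


def part_of(src, dst):
    """Return the part (0..7) containing dst relative to src, or None if dst == src.

    Coordinates are (row, col) with row 0 at the top of the mesh.
    Assumed numbering: 0=NE, 1=N, 2=NW, 3=W, 4=SW, 5=S, 6=SE, 7=E.
    """
    d_row = dst[0] - src[0]   # negative: dst is north of src
    d_col = dst[1] - src[1]   # negative: dst is west of src
    if d_row == 0 and d_col == 0:
        return None
    if d_row < 0:
        return 0 if d_col > 0 else 1 if d_col == 0 else 2
    if d_row == 0:
        return 7 if d_col > 0 else 3
    return 6 if d_col > 0 else 5 if d_col == 0 else 4


def partition(src, dests):
    """Group destinations by part; only non-empty parts appear in the result."""
    parts = defaultdict(list)
    for dst in dests:
        p = part_of(src, dst)
        if p is not None:
            parts[p].append(dst)
    return dict(parts)


# A source in the bottom-left corner of a 4x4 mesh only sees parts 0, 1, 7.
all_nodes = [(r, c) for r in range(4) for c in range(4)]
print(sorted(partition((3, 0), all_nodes)))   # prints [0, 1, 7]
```

Under the same assumptions, the other three corners yield (1, 2, 3), (3, 4, 5), and (5, 6, 7), matching the groups listed on the slide, while an interior source yields all eight parts.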

12 Basic Routing Rules
North: top right corner.
West: top left corner.
South: bottom left corner.
East: bottom right corner.
[Figure: mesh with source and destination nodes, showing the N, W, E, S output directions used to reach each region]
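Read as a mapping from regions to output ports, the basic rules say that each diagonal region is served by one fixed port: North covers the top right, West the top left, South the bottom left, and East the bottom right. The sketch below encodes that reading. It additionally assumes that destinations lying directly north, west, south, or east of the current node use the port of the same name, which the slide does not spell out. Coordinates are (row, col) with row 0 at the top, and `region`, `PORT_FOR_REGION`, and `output_ports` are hypothetical names.

```python
def region(cur, dst):
    """Name the region of dst relative to cur, e.g. 'NE' or 'W'; '' if equal."""
    vertical = "N" if dst[0] < cur[0] else "S" if dst[0] > cur[0] else ""
    horizontal = "E" if dst[1] > cur[1] else "W" if dst[1] < cur[1] else ""
    return vertical + horizontal


# Diagonal entries follow the slide; the straight-line entries are an assumption.
PORT_FOR_REGION = {
    "NE": "N", "N": "N",
    "NW": "W", "W": "W",
    "SW": "S", "S": "S",
    "SE": "E", "E": "E",
}


def output_ports(cur, dests):
    """Map each needed output port to the destinations replicated toward it."""
    ports = {}
    for dst in dests:
        reg = region(cur, dst)
        # A destination equal to the current node leaves through the ejection port.
        port = "EJECT" if reg == "" else PORT_FOR_REGION[reg]
        ports.setdefault(port, []).append(dst)
    return ports


print(output_ports((2, 1), [(0, 3), (0, 0), (3, 0), (2, 1), (3, 3)]))
```

Each key in the result corresponds to one copy of the packet, carrying only the destinations assigned to that port.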

13 Optimized Routing Rules
[Figure: mesh with source and destination nodes; one scenario is marked "Deadlock!!!"]

14 RPM Example-step 1
[Figure legend: Multicast Packet, Source, Destination, Partitioning]

15 RPM Example-step 2
[Figure legend: Multicast Packet, Source, Destination, Partitioning, Ejection]

16 RPM Example-step 3
[Figure legend: Multicast Packet, Source, Destination, Partitioning]

17 RPM Example-step 4
[Figure legend: Multicast Packet, Source, Destination, Partitioning; three ejections and middle (M) flits shown]

18 RPM Example-step 5
[Figure legend: Multicast Packet, Source, Destination, Partitioning; one ejection and middle (M) flits shown]

19 Deadlock Avoidance
RPM has no turn restrictions, which can introduce deadlock.
We use Virtual Networks (VNs) to avoid deadlock.
Two VNs share the same physical network.
The virtual channels of each port are divided equally between the two virtual networks.
The virtual network ID (0 or 1) of each packet is decided at the source.
[Figure: two copies of the 4x4 mesh with numbered nodes, labeled Virtual Network 0 and Virtual Network 1]
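The equal split of virtual channels between the two virtual networks can be illustrated with a small sketch. It assumes 4 VCs per port (the value listed in the evaluation setup) and gives the lower half of the VC indices to VN 0 and the upper half to VN 1; that particular assignment and the name `vcs_for_vn` are assumptions. How the source chooses the VN ID is not described in this transcript, so the sketch takes it as an input.

```python
def vcs_for_vn(vn_id, vcs_per_port=4, num_vns=2):
    """Return the VC indices at a port usable by a packet in virtual network vn_id."""
    if vcs_per_port % num_vns != 0:
        raise ValueError("VCs must divide equally among virtual networks")
    if not 0 <= vn_id < num_vns:
        raise ValueError("unknown virtual network id")
    share = vcs_per_port // num_vns      # VCs owned by each virtual network
    start = vn_id * share                # assumed: contiguous index ranges
    return list(range(start, start + share))


print(vcs_for_vn(0))   # prints [0, 1]
print(vcs_for_vn(1))   # prints [2, 3]
```

Because the two index ranges never overlap, a packet in one virtual network can never hold or wait on a buffer belonging to the other.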

20 Evaluation Methodology
Performance Model: Cycle-accurate network simulator
  Models all router pipeline stages in detail
  Highly parameterized
Power Model: Orion with both dynamic and leakage power models
Network configuration
  Topology: 8×8 Mesh (6×6 Mesh, 10×10 Mesh, 16×16 Mesh)
  Routing: RPM
  VC/Port: 4
  VC Depth
  Packet Length (flits)
  Unicast Traffic Pattern: Uniform Random (Bit Complement, Transpose)
  Multicast Packet Portion: 10% (5%, 20%, 40%, 80%)
  Multicast Destination Number: 0-16 (uniformly distributed)

21 Uniform Random Traffic
[Figure: results plots with 50% and 40% annotations]
Latency improves by around 50% before network saturation.
Network throughput is extended by 40%.

22 Link Utilization
[Figure: link utilization plot with 33% and 45% annotations]
Under low workload, RPM reduces link utilization by 33%.
Under high workload, RPM reduces link utilization by 45%.

23 Dynamic Power Consumption
[Figure: dynamic power plot with 50% and 40% annotations]

24 Scalability Study-Network Size
[Figure: network size scalability plot with an "Over 50%" annotation]

25 Scalability Study-Multicast Traffic Portion

26 Scalability Study-Destination Number

27 Conclusion
Propose a new multicast routing algorithm, Recursive Partitioning Multicast (RPM)
  Bandwidth-efficient and scalable
Performance improvement
  Up to 50% latency reduction
  33% link utilization reduction
Power savings
  Up to 40% total dynamic power savings
  25% crossbar and link power savings

28 Thank you!

29 Backup

30 Hardware Implementation of Routing logic

31 Bit Complement Traffic

32 Transpose Traffic

