Presenter: Min-Yu Lo 2015/10/19 Asit K. Mishra, N. Vijaykrishnan, Chita R. Das Computer Architecture (ISCA), 2011 38th Annual International Symposium on.

Citation: 6

Network-on-chip (NoC) has become a critical shared resource in the emerging Chip Multiprocessor (CMP) era. Most prior NoC designs have used the same type of router across the entire network. While this homogeneous network design eases the burden on a network designer, partitioning the resources equally among all routers across the network does not lead to optimal resource usage, and hence, affects the performance-power envelope. In this work, we propose to apportion the resources in an NoC to leverage the non-uniformity in network resource demand. Our proposal includes partitioning the network resources, specifically buffers and links, in an optimal manner.

This approach results in redistributing resources such that routers that require more resources are allocated more buffers and wider links compared to routers demanding fewer resources. This results in a novel heterogeneous network, called HeteroNoC, which is composed of two types of routers: (1) small power-efficient routers and (2) big high-performance routers. We evaluate a number of heterogeneous network configurations, composed of big and small routers, and show that giving more resources to routers along the diagonals in a mesh network provides maximum benefits in terms of performance and power. We also show the potential benefits of the HeteroNoC design by (1) co-evaluating it with memory controllers and (2) configuring it with an asymmetric CMP consisting of heterogeneous cores.

The mesh architecture is composed of:
 Routers and cores
The router architecture is composed of:
 Crossbar, buffers, SA (switch allocator)

In a traditional NoC mesh architecture, is it reasonable to use the same router design across the whole network? We need to look at which parts of the mesh have higher utilization (buffer utilization, link utilization).
[Figure labels: Central, Diagonal]

Introduction
What can we do?
 Redistribute the network resources (the resulting design is called HeteroNoC).
 How many big and small routers are needed?
 Where should they be placed to get the maximum benefit (performance, power)?
How else can a HeteroNoC design be leveraged to achieve better performance/power in a CMP?
 Combine HeteroNoC with re-placement of the memory controllers.

HeteroNoC buffer redistribution
 The number of VCs affects power consumption and performance.
 Small router (2 VCs)
 Big router (6 VCs)
HeteroNoC link redistribution
 Flit width 192b -> 128b (flit width significantly affects router power).
 Wide link: 256b
 Narrow link: 128b
HeteroNoC combined link and buffer redistribution
 Two 128b flits can be combined and sent simultaneously over the wider link.

The total number of VCs and the network bisection bandwidth are kept the same in the baseline and heterogeneous networks.
 VCs (virtual channels):
。 4 baseline routers (VCs = 4 x 3) = 3 small routers + 1 big router (VCs = 2 x 3 + 6)
 Bandwidth (link resources):
。 W(homo) x N = W(hetero) x N(narrow) + 2W(hetero) x N(wide)
Reduction in power consumption (N(s) is the number of small routers in an N x N mesh)
 0.67 x N^2 >= 0.3 x N(s) + 1.19 x (N^2 - N(s))  (N(s) > 38)
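The constraints on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the mesh size N = 8 used in the usage note is an assumption (the slide does not state it), and it is assumed that narrow and wide links together still number N across the bisection. The VC counts, link widths and the coefficients 0.67, 0.3 and 1.19 are taken from the slides.

```python
def vc_budget_matches(baseline_vcs=3, small_vcs=2, big_vcs=6,
                      group=4, big_per_group=1):
    """Check that a group of routers keeps the same total VC count
    after replacing baseline routers with small and big ones."""
    baseline_total = group * baseline_vcs
    hetero_total = (group - big_per_group) * small_vcs + big_per_group * big_vcs
    return baseline_total == hetero_total


def wide_links_needed(n_links, w_homo=192, w_hetero=128):
    """Solve W_homo*N = W_hetero*N_narrow + 2*W_hetero*N_wide
    under the assumption N_narrow + N_wide = N."""
    # Substituting N_narrow = N - N_wide gives
    # N_wide = N * (W_homo - W_hetero) / W_hetero.
    return n_links * (w_homo - w_hetero) / w_hetero


def min_small_routers(n, p_base=0.67, p_small=0.3, p_big=1.19):
    """Smallest (possibly fractional) N_s that satisfies
    p_base*N^2 >= p_small*N_s + p_big*(N^2 - N_s)."""
    return n * n * (p_big - p_base) / (p_big - p_small)
```

With the assumed N = 8, `wide_links_needed(8)` gives 4 wide links (so 4 narrow ones), and `min_small_routers(8)` comes out a little above 37, which is in line with the threshold of 38 quoted on the slide.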

Repositioning the resources
 Buffer-only redistribution (+B).
 Combined buffer and link redistribution (+BL).
Three placement layouts for the big routers
 Center layout
 Row 2_5 layout
 Diagonal layout
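To make the three layout names concrete, here is a minimal sketch that marks big-router positions on a mesh. Everything specific in it is an assumption for illustration: the 8x8 mesh size, the zero-based reading of "Row 2_5" as rows 2 and 5, the 4x4 block used for "Center", and the function names. The slides do not give exact coordinates.

```python
def big_router_positions(layout, n=8):
    """Return the set of (row, col) coordinates that hold big routers.
    Coordinates are zero-based; the exact placements are assumed."""
    if layout == "diagonal":
        # Both diagonals of the mesh.
        return {(i, i) for i in range(n)} | {(i, n - 1 - i) for i in range(n)}
    if layout == "center":
        # A square block in the middle of the mesh.
        lo, hi = n // 4, n - n // 4
        return {(r, c) for r in range(lo, hi) for c in range(lo, hi)}
    if layout == "row2_5":
        # Two full rows.
        return {(r, c) for r in (2, 5) for c in range(n)}
    raise ValueError(f"unknown layout: {layout}")


def render(layout, n=8):
    """Draw the mesh as text: 'B' for a big router, 's' for a small one."""
    big = big_router_positions(layout, n)
    return "\n".join(
        "".join("B" if (r, c) in big else "s" for c in range(n))
        for r in range(n)
    )
```

Under these assumptions each layout places 16 big routers and leaves 48 small ones, so the three layouts differ only in where the big routers sit, not in how many there are.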

In the HeteroNoC architecture with combined buffer and link re-distribution, we use both 128b and 256b links with flit size being 128b. When communication takes place between a small and a big router (or two big routers) between which a 256b link exists, two 128b flits can be combined to be simultaneously sent over the wider link.
[Figure panels: baseline; small router to small router; two small routers to big router; big router to big router; big router to small router]
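The effect of combining flits on a wide link can be sketched as a simple cycle count. This is a simplified model, not the router's actual logic: it assumes whole 128b flits are packed side by side whenever the link is wide enough, and it ignores arbitration, credit availability, and whether two flits are actually ready in the same cycle. The function name is hypothetical.

```python
import math


def link_cycles(num_flits, link_width_bits, flit_width_bits=128):
    """Cycles needed to push num_flits over one link when as many
    whole flits as fit are sent side by side each cycle."""
    flits_per_cycle = link_width_bits // flit_width_bits
    if flits_per_cycle < 1:
        raise ValueError("link is narrower than one flit")
    return math.ceil(num_flits / flits_per_cycle)
```

For a 6-flit packet this gives 6 cycles on a 128b link and 3 cycles on a 256b link; an odd flit count leaves half of the wide link unused in the last cycle.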

Impact on the buffer read/write stage (left figure)
 The primary overhead comes from adding a second layer of smaller muxes.
Impact on the SA stage (right figure)
 The area overhead of the additional arbiters is around 2.5% of the router area (obtained from Synopsys synthesis).

Experimental methods
 Synthetic traffic
。 Network-only analysis
 Standard application benchmarks
。 System-level analysis
。 IPC (instructions per cycle) improvement
Performance/power improvement
 Average latency
 Average power consumption
 Network throughput
 Latency breakdown
 Power breakdown

Three placement layouts are evaluated
 Diagonal (best layout)
 Center (middle layout)
 Row 2_5 (poorest layout)
Buffer-only redistribution
Combined buffer and link redistribution

Reasons for the latency reduction
 Big routers are placed where traffic is most congested.
 Two flits are combined and transmitted simultaneously over the wider link.
Reasons for the power reduction
 Buffer reduction (33%)
 Crossbar power (reduced flit size)
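One plausible reading of the 33% figure, offered here as an assumption rather than something the slides state, is that it follows from the flit width change alone: if the total VC count and the buffer depth per VC stay the same, buffer storage scales with flit width, and going from 192b to 128b flits removes a third of the bits. The function name is hypothetical.

```python
def buffer_bit_reduction(old_flit_bits=192, new_flit_bits=128):
    """Fractional reduction in buffer storage when only the flit width
    changes (VC count and buffer depth assumed unchanged)."""
    return 1 - new_flit_bits / old_flit_bits
```

With the widths from the slides this evaluates to one third, i.e. roughly 33%.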

[Figures: results for commercial applications and PARSEC applications]

Diagonal+BL gives the best results, with 12% and 10% average improvements in IPC. The design not only uses 33% fewer buffers and saves network power, but also improves IPC.

Co-evaluation with memory controllers
 Reduces the request-response latency on a cache miss.
 Reduces the latency of requests to the memory controllers.

Conclusion
 This paper proposes HeteroNoC, a heterogeneous network composed of big and small routers, designed by redistributing buffers and link bandwidth.
 The diagonal layout performs significantly better than the traditional homogeneous network under a variety of traffic patterns.
My comment
 The paper proposes a good idea for improving NoCs, and it is very simple to implement.

Redesign of the network router and link architecture
 Small power-efficient routers.
 Big high-performance routers.
 Narrow link (width = 128 bits).
 Wide link (width = 256 bits).
Repositioning the resources
 Buffer-only redistribution.
 Combined buffer and link redistribution.

Network-only analysis
 UR (uniform random) pattern.
 NN (nearest neighbor) pattern.
Use another network topology: torus.
Standard application benchmarks

What is the problem?
 High single-threaded performance is needed when thread parallelism is low.
 High throughput is needed when thread parallelism is high.
Modify the routing algorithm
 Use table-based routing (for packets to/from a large core).

(1) Should we design a heterogeneous NoC with two types (small and big) of routers?
(2) If yes, how many such big/small routers do we need and how do we redistribute the buffer and link width between these routers without changing the original bisection width and buffer resources?
(3) Is there an optimal placement of big routers that would maximize the performance and power benefits compared to the baseline homogeneous mesh?
(4) How else can a HeteroNoC design be leveraged to achieve better performance/power in a CMP?

Network-only analysis (synthetic)
 UR (uniform random) pattern
 NN (nearest neighbor) traffic pattern (worse results)
 Bit-complement traffic pattern
 Self-similar traffic pattern
Use another network topology: torus.
Standard application benchmarks
 Network analysis
 IPC (instructions per cycle)
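For readers unfamiliar with the synthetic patterns named above, the sketch below shows how destinations are commonly chosen for three of them. These are the usual textbook definitions, not necessarily the exact ones used in the paper's simulator; the function names, the row-major node numbering, and the choice of a random adjacent node for nearest neighbor are assumptions. Self-similar traffic is omitted because it describes injection timing rather than a destination mapping.

```python
import random


def uniform_random(src, num_nodes, rng):
    """Pick any node other than src with equal probability."""
    dst = rng.randrange(num_nodes - 1)
    return dst if dst < src else dst + 1


def bit_complement(src, num_nodes):
    """Destination is the bitwise complement of the source id.
    Requires num_nodes to be a power of two."""
    if num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of two")
    return ~src & (num_nodes - 1)


def nearest_neighbor(src, n, rng):
    """Pick one of the mesh neighbors of src on an n x n mesh
    (nodes numbered row-major, no wraparound)."""
    r, c = divmod(src, n)
    candidates = [
        (r + dr, c + dc)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
        if 0 <= r + dr < n and 0 <= c + dc < n
    ]
    nr, nc = rng.choice(candidates)
    return nr * n + nc
```

On a 64-node mesh, bit-complement sends node 0 to node 63, so most packets cross the center of the mesh, while nearest-neighbor traffic stays one hop away from its source.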
