
1 Quality of Service in Network on Chip. Isask'har (Zigi) Walter. Supervised by: Prof. Israel Cidon, Prof. Ran Ginosar and Dr. Avinoam Kolodny

2 Outline (February 2008, NoC Seminar)
- Network on Chip (NoC) and QNoC
- Capacity Allocation (joint work with Zvika Guz)
- Hot Modules in Wormhole NoCs
- Summary

3 System on Chip (SoC) Interconnect
- Explosion in the number of modules on a single chip
- Networks are replacing system buses: low area, low power, better scalability, higher parallelism, spatial reuse, unicast

4 QNoC Architecture
- Grid topology
- Packet-switched
- XY routing
- Service levels
- Wormhole hop-to-hop flow control
[figure: a grid of modules connected by routers and links]
E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, "QoS Architecture and Design Process for Cost-Effective Network on Chip", Journal of Systems Architecture, 2004

5 Wormhole Flow-Control
- Flit-based communication
- SL: service level (0/1/2/3)
- Type: Head/Body/Tail flit
- The destination appears only in the header flit
- Every flit must include a Type field
[figure: a packet split into flits D0..D7; each flit carries TYPE and SL fields, and the head flit also carries the destination]
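To make the flit format concrete, here is a minimal Python sketch of the fields described above. The field widths, the service-level names, and the packet-splitting helper are illustrative assumptions, not the actual QNoC encoding:

```python
from dataclasses import dataclass
from typing import Optional

# Flit types and service levels, following the slide (names are assumed):
HEAD, BODY, TAIL = 0, 1, 2

@dataclass
class Flit:
    """One wormhole flit: every flit carries Type and SL;
    only the head flit carries the destination address."""
    ftype: int                  # HEAD / BODY / TAIL
    sl: int                     # service level 0..3
    dest: Optional[int] = None  # destination, present in the head flit only
    payload: bytes = b""

def make_packet(dest, sl, data, flit_bytes=2):
    """Split a payload into flits; the destination goes in the head flit only."""
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)] or [b""]
    flits = [Flit(HEAD, sl, dest, chunks[0])]
    flits += [Flit(BODY, sl, None, c) for c in chunks[1:-1]]
    if len(chunks) > 1:
        flits.append(Flit(TAIL, sl, None, chunks[-1]))
    return flits
```

A six-byte payload with two-byte flits yields a head, one body, and a tail flit, all tagged with the same service level.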

6 Wormhole Routing
- Suits on-chip interconnect well: small number of buffers, low latency
- Virtual channels allow concurrent flit transmission on the same link; flits of different packets are locally labeled

7 Quality of Service in QNoC
- Defined by throughput and latency requirements, e.g. interrupts, real-time traffic, block transfers
- Implemented using separate buffers (service levels) and a static priority policy
- Requirements should be met at low cost, via design parameters and run-time mechanisms

8 QNoC Design Flow: define inter-module traffic → place modules → allocate link capacities → verify QoS and cost

9 QNoC Design Flow: allocating link capacities is the critical step. Too low a capacity results in poor QoS; too high a capacity wastes power and area.

10 Use Existing Algorithms...?
- Efficient algorithms exist for store-and-forward networks
- These algorithms are useless for wormhole networks, as they ignore inter-link dependencies

11 Our Approach
- An analytical model to forecast QoS
- A capacity allocation algorithm that exploits the model
Z. Guz, I. Walter, E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, "Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip", Design, Automation and Test in Europe (DATE), 2006
Z. Guz, I. Walter, E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, "Network Delays and Link Capacities in Application-Specific Wormhole NoCs", VLSI Design, 2007

12 Delay Analysis – Goal
- Replace extensive simulations with an analytical model that forecasts QoS
- Approximate per-flow latencies, given: network topology, communication demands, link capacities
[figure: two flows s1→d1 and s2→d2 sharing links in a mesh]

13 Delay Analysis – Prior work 1/4
- Many wormhole analyses exist, but they don't fit because they assume symmetrical communication demands, no virtual channels, and identical link capacities
- They generally calculate the delay of an "average flow"; a per-flow analysis is needed

14 Delay Analysis – Prior work 2/4
H. Sarbazi-Azad, A. Khonsari and M. Ould-Khaoua, "Performance Analysis of Deterministic Routing in Wormhole K-ary n-cubes with Virtual-Channels", Journal of Interconnection Networks, 2002

15 Delay Analysis – Prior work 3/4
These models approximate the delay of an "average flow".
H. Sarbazi-Azad, A. Khonsari and M. Ould-Khaoua, "Performance Analysis of Deterministic Routing in Wormhole K-ary n-cubes with Virtual-Channels", Journal of Interconnection Networks, 2002

16 Delay Analysis – Prior work 4/4
S. Loucif and M. Ould-Khaoua, "Modeling Latency in Deterministic Wormhole-Routed Hypercubes under Hot-Spot Traffic", The Journal of Supercomputing, 2004

17 Wormhole Delay Analysis
[diagram: network topology, communication demands and link capacities go in; per-flow latencies come out]

18 Delay Analysis – Basics
- Focus on long packets
- Packet transmission can be divided into two separate phases: path acquisition and flit transmission
- For simplicity, we assume "enough" VCs on every link, so path acquisition time is negligible

19 Main Observation
The delivery resembles a pipeline pass: flits stream from the source interface to the destination interface through the routers.

20 Packet Delivery Time
The delivery time of long packets is dominated by the slowest link on the path, determined by its transmission rate and by link sharing.
[figure: a low-capacity link on the IP1→IP2 path throttles the whole worm]

21 Packet Delivery Time
The same dominance holds when the slowest link is slow because it is shared with another flow (IP3) rather than because of low capacity.

22 Analysis Basics
- Per link, determine the flow's effective bandwidth
- Account for the interleaving of flits from different flows

23 Single-Hop Flow, no Sharing
t_i^j = L_f / C_j
- t_i^j: mean time to deliver a flit of flow i over link j [sec]
- C_j: capacity of link j [bits/sec]
- L_f: flit length [bits/flit]

24 The Effect of Sharing
Heuristics are used to model the "flit interleaving delay" a flow suffers on each link of its path.
H. Sarbazi-Azad, A. Khonsari and M. Ould-Khaoua, "Performance Analysis of Deterministic Routing in Wormhole K-ary n-cubes with Virtual-Channels", Journal of Interconnection Networks, 2002

25 Single-Hop Flow, with Sharing
t_i^j = L_f / (C_j - L_f * λ_j^(-i))
- t_i^j: mean time to deliver a flit of flow i over link j [sec]
- C_j: capacity of link j [bits/sec]
- L_f: flit length [bits/flit]
- λ_j^(-i): total flit injection rate of all flows sharing link j, except flow i [flits/sec]
The term L_f * λ_j^(-i) is the bandwidth used by the other flows on link j; with no sharing it vanishes and the formula reduces to the previous slide.
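The per-flit delay formula can be sketched in Python (plain-text names stand in for the slide's symbols; setting the sharing rate to zero recovers the no-sharing case of slide 23):

```python
def flit_delay(Lf, Cj, lam_others=0.0):
    """Mean time [sec] to deliver one flit of flow i over link j.

    Lf:         flit length [bits/flit]
    Cj:         capacity of link j [bits/sec]
    lam_others: total flit injection rate of all flows sharing
                link j except flow i [flits/sec]
    """
    residual = Cj - Lf * lam_others  # capacity left after the other flows' bandwidth
    if residual <= 0:
        raise ValueError("link j is oversubscribed by the sharing flows")
    return Lf / residual

# 16-bit flits on a 1.6 Gbit/s link: 10 ns per flit when alone,
# 20 ns when the other flows inject 50 Mflits/s.
```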

26 The Convoy Effect
- Consider inter-link dependencies: wormhole backpressure and traffic jams down the road
- Account for all subsequent hops: the basic per-link delay is weighted by the remaining distance

27 Total Packet Transmission Time
The slowest link dominates the transmission time: multiply the per-flit delay of the weakest link on the path by the packet size M [flits/packet].
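The slide's rule (slowest link dominates, scaled by packet size) can be sketched as follows; the function names and the (capacity, sharing-rate) path representation are assumptions:

```python
def packet_time(M, Lf, path):
    """Approximate delivery time [sec] of a long packet of M flits.

    path: one (Cj, lam_others) pair per link on the route, where Cj is
    the link capacity [bits/sec] and lam_others is the flit rate of the
    other flows sharing it [flits/sec].  The link with the largest
    per-flit delay dominates: T ~= M * max_j Lf / (Cj - Lf * lam_others_j).
    """
    per_flit = [Lf / (C - Lf * lam) for C, lam in path]
    return M * max(per_flit)
```

For a 100-flit packet of 16-bit flits crossing one idle and one shared 1.6 Gbit/s link, the shared link (20 ns per flit) sets the total at about 2 microseconds.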

28 Source Queuing
And finally, add the queuing time a packet spends at its source before entering the network to obtain the total end-to-end delay.

29 Analysis Validation
The analytical model was validated against simulations with different link capacities and different communication demands.
[figure: analysis vs. simulation latency as a function of normalized load and utilization]

30 Per-Flow Validation Example

31 Capacity Allocation Problem
Use the delay analysis to solve an optimization problem. Given the system topology and routing, and each flow's bandwidth (f_i) and delay bound (T_i^REQ): minimize the total link capacity such that every flow's delay meets its bound (T_i ≤ T_i^REQ).

32 Capacity Allocation Algorithm
A greedy, iterative algorithm: for each source-destination pair, use the delay model to identify the most delay-sensitive link and increase its capacity; repeat until all delay requirements are met.
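A toy sketch of the greedy loop. It uses a deliberately simplified delay model (slowest link only, no sharing or convoy terms), and the data layout, capacity step, and "most sensitive = currently slowest" heuristic are assumptions for illustration, not the paper's exact algorithm:

```python
def allocate_capacities(flows, links, Lf=16, M=100, step=1e9):
    """Greedy capacity allocation sketch.

    flows: list of dicts {'path': [link ids], 'Treq': delay bound [sec]}
    links: dict link id -> initial capacity [bits/sec], grown in place
    """
    def delay(f):
        # simplified per-flow delay: the slowest link dominates
        return M * max(Lf / links[j] for j in f['path'])

    done = False
    while not done:
        done = True
        for f in flows:
            if delay(f) > f['Treq']:
                # most delay-sensitive link = currently slowest on the path
                worst = min(f['path'], key=lambda j: links[j])
                links[worst] += step
                done = False
    return links
```

Each iteration strictly increases some capacity, so every flow's delay monotonically decreases and the loop terminates once all bounds are met.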

33 Capacity Allocation – Example #1
- A simple 4-by-4 system with a uniform traffic pattern and uniform requirements
- "Classic" design: 74.4 Gbit/sec; using the delay model and algorithm: 69 Gbit/sec
- Total capacity reduced by 7%
[figure: link capacities of the 4-by-4 mesh before and after optimization]

34 A More Realistic Case

35 DVD Decoder – Results
- A SoC-like system with specific traffic demands and delay requirements
- "Classic" design: 41.8 Gbit/sec; using the algorithm: 28.7 Gbit/sec
- Total capacity reduced by 30%
[figure: link capacities before and after optimization]

36 Cost Reduction by Slack Elimination

37 Results – Flow Latencies

38 Example #3 – VOPD Application
- Video Object Plane Decoder
- "Classic" design: 640 Gbit/sec; using the algorithm: 369 Gbit/sec
- Total capacity reduced by 40%

39 Summary – Capacity Allocation
- A simple analytical model capturing multiple VCs, different link capacities, and different communication demands
- An allocation algorithm that reduces network cost

40 Future Work
- Extensions: finite number of VCs, analytical delay modeling, allocation algorithm
- New applications: core placement, topology selection, routing

41 Outline
- NoC and QNoC
- Capacity Allocation (joint work with Zvika Guz)
- Hot Modules in QNoC
- Summary

42 Hot Modules
- The NoC is designed and dimensioned to meet QoS requirements: buffer sizing, routing, router arbitration, link capacities, ...
- NoC designers cannot tune everything: modules typically have limited capacity
- Highly demanded, bandwidth-limited modules create edge bottlenecks; in a SoC they are often known in advance (off-chip DRAM, on-chip special-purpose processors)
- System performance is strongly affected, even if the NoC itself has infinite bandwidth

43 Hot Module (HM) in NoC
- Wormhole, best-effort (BE) NoC
- At high hot-module utilization, multiple worms "get stuck" in the network
- Two problems arise: system performance and source fairness

44 Problem #1: the Hot Module Affects the System
The HM is not a local problem: traffic not destined to the HM suffers too.
[figure: worms heading to the HM (IP1) block links used by IP2 and IP3 traffic]

45 Problem #2: Source Fairness
Multiple locally fair decisions do not add up to global fairness: the limited, expensive HM resource isn't fairly shared.

46 Saturation (Un)Fairness
A saturated router divides the available bandwidth equally between its inputs. Merge after merge, a distant source may receive less than 1% of the HM bandwidth.
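The "less than 1%" figure follows from repeated equal splits: each saturated router on the route halves the remaining share, so a flow that merges n times keeps (1/2)^n of the hot-module bandwidth. The merge count used below is illustrative, not taken from the slide's exact topology:

```python
def farthest_share(n_merges):
    """Bandwidth fraction left to a flow after n_merges saturated
    routers, each splitting its output equally between two inputs."""
    return 0.5 ** n_merges

# Seven equal splits already leave under 1% of the HM bandwidth.
```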

47 Blocked Output Ports...

48 Related Work
- Hotspot solutions have been comprehensively studied over the last two decades (e.g. Pfister and Norton 1985; Duato et al. 2005)
- Classically, solutions are categorized by policy: avoidance-based (frequently impossible), detection-based (requires threshold tuning), prevention-based (overhead during light load)
- And by implementation: central arbitration, router-based, end-to-end flow control. Router-based solutions seem to draw the most attention.

49 Router-Based Solutions
- Solving hotspots inside the routers: virtual circuits, fair queuing, dedicated queues, deflective routing, packet combining, packet dropping, backpressure (credit- or rate-based), and more
- Routers can(?) detect congested periods; this is easier in store-and-forward networks
[figure: router with input buffers, crossbar (X-Bar), output buffers]

50 Router-Based Solutions
QNoC routers are simple: fast, power- and area-efficient, with a few buffers, efficient routing, a simple arbitration policy, and no per-flow state memory.

51 Related Work
A few end-to-end solutions do exist, but they are stop-and-wait based, do not prevent hotspot effects, and do not address the fairness problem. Examples:
- "Self-Tuned Congestion Control for Multiprocessor Networks", M. Thottethodi, A. R. Lebeck and S. Mukherjee, HPCA 2000
- "A New Scalable and Cost-Effective Congestion Management Strategy for Lossless Multistage Interconnection Networks", J. Duato, I. Johnson, J. Flich, F. Naven, P. Garcia and T. Nachiondo, HPCA 2005

52 Our Approach
- The problem is not caused by the NoC, but by a congested end-point
- The solution should address the root cause, not the symptoms
- Utilize the existing NoC infrastructure
- Solve both problems, simply and efficiently

53 Hot Module Congestion
During congested periods, sources should not inject packets towards the HM: they will experience increased delay anyway, and it is better to wait at the source than inside the network. Keep the routers unmodified!

54-56 HM Allocation Control Basics
[figure, three animation steps: a source's network interface sends a credit request over the NoC to the Allocation Controller attached to the hot module (IP2); the controller replies; only then does the source inject its data packets]

57 HM Control Packets
- The HM controller receives all requests and can employ any scheduling policy
- Control packets are sent on a high service level, bypassing (blocked) data packets!
- Credit request packet fields: Source, Dest., Req. Credit; credit reply packet fields: Source, Dest., Credit
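The two control-packet layouts on the slide can be sketched as plain records (Python dataclasses; the field names are paraphrases of the slide's labels):

```python
from dataclasses import dataclass

@dataclass
class CreditRequest:
    """Sent by a source to the HM allocation controller on a high
    service level, bypassing blocked data packets."""
    source: int      # requesting module
    dest: int        # the hot module
    req_credit: int  # amount of credit requested

@dataclass
class CreditReply:
    """Grant sent back by the allocation controller."""
    source: int      # the granted source
    dest: int
    credit: int      # granted quota
```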

58 QNoC Router
[figure: a five-port QNoC router (input ports #1-#5, output ports #1-#5) within the mesh]

59 Enhanced Request Packet
The request may include additional data as needed: the payload's priority, deadline, expiration time, etc.
[figure: credit request packet with optional fields: Priority, Deadline, Expiration]

60 HM Allocation Controller
- Components: requests decoder, pending-requests table (SRC, size, priority, deadline, expiration, ...), local arbiter, reply encoder, and an optional HM access controller
- The HM allocation controller is customized according to the system's requirements
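A behavioral sketch of the controller. The grant order here is FIFO over the pending-requests table, standing in for the slide's local arbiter (slide 63 uses round-robin), and the outstanding-credit accounting is an assumption for illustration:

```python
from collections import deque

class HMAllocationController:
    """Queue credit requests and grant them in arrival order,
    never exceeding the hot module's capacity, expressed here
    as a cap on outstanding credits."""

    def __init__(self, max_outstanding):
        self.pending = deque()        # the pending-requests table
        self.outstanding = 0
        self.max_outstanding = max_outstanding

    def request(self, source, credit):
        """Decode an incoming credit request into the table."""
        self.pending.append((source, credit))

    def grant_next(self):
        """Grant the oldest pending request if capacity remains;
        returns (source, credit) or None."""
        if self.pending and self.outstanding + self.pending[0][1] <= self.max_outstanding:
            src, cr = self.pending.popleft()
            self.outstanding += cr
            return (src, cr)
        return None

    def release(self, credit):
        """The HM finished serving this quota; free the credits."""
        self.outstanding -= credit
```

With a cap of 10, two requests for 6 credits each are served one at a time: the second is granted only after the first quota is released.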

61 Further Enhancements
- Short packets are not negotiated
- A source's quota is slowly self-refreshing
- The mechanism is turned off when the network is not congested
- Crediting modules ahead of time hides the request-grant latency during light-load periods

62 Not Classic Flow-Control
- Flow control protects the destination's buffer: a pair-wise protocol
- HM access regulation protects the system: a many-to-one protocol

63 Results – Synthetic Scenario
- Hotspot traffic: all-to-one traffic with all-to-all background traffic
- High network capacity, limited hot-module bandwidth
- HM controller arbitration: round-robin

64 System Performance
With regulation, average packet latency improves markedly; the slide marks 10x and 30x reductions relative to the unregulated network.
[figure: average packet latency vs. load, with and without regulation]

65 Hot vs. non-Hot Module Traffic
Using regulation, the latency of traffic not destined to the HM is drastically reduced; the slide marks a 40x reduction.
[figure: average packet latency of HM and background traffic, with and without regulation]

66 Source Fairness
[figure: per-source results in a 4-by-4 mesh (sources 1-16); without regulation, source #16 fares far worse than source #5; with regulation the two are comparable]

67 Fairness in Saturated Network
- Hot-module utilization: 99.99% without allocation control; 98.32% with allocation control
- Simulation of a 4-by-4 system; data packet length: 200 flits, control packet length: 2 flits

68 MPEG-4 Decoder
- A real SoC with an over-provisioned NoC and two hot modules: SDRAM (25% of all traffic) and SRAM2 (22% of all traffic)
- Other modules: VU, AU, MED CPU, RAST, SRAM1, IDCT, ADSP, UP SAMP, BAB, RISC

69 Results – MPEG-4 Decoder
At 80% load: a 2x latency reduction over all traffic, and up to an 8x reduction in the HM/non-HM traffic breakdown.
[figure: latency of all traffic, and of the HM/non-HM traffic breakdown]

70 The HMs are Better Utilized
- Without regulation, the hot modules are only 60% utilized: traffic to one HM blocks the traffic to the other!
- Without allocation control there are significant differences in bandwidth among the flows destined to HM1 (from sources 1-4, 9-11) and to HM2 (from sources 8, 10-12); with allocation control the shares even out

71 Hot-Module Placement

72 Future Work
- Dynamically designated hot modules
- Other scheduling policies at the hot-module controller
- Single vs. multiple control modules for multiple HMs
- Effect of placement

73 Summary
- Hot modules are common in real SoCs
- Hot modules ruin system performance and are not fairly shared, even in NoCs with infinite capacity; the network intensifies the problem, but can also provide the tools for resolving it
- A simple mechanism achieves dramatic improvement, completely eliminating the HM effects
Hot-Modules, Cool NoCs!

74 Thank you! Questions? zigi@tx.technion.ac.il Hot-Modules, Cool NoCs! QNoC Research Group

