Slide 1: Quality of Service in Network on Chip
Isask'har (Zigi) Walter
Supervised by: Prof. Israel Cidon, Prof. Ran Ginosar and Dr. Avinoam Kolodny
Slide 2: Outline
- Network on Chip (NoC) and QNoC
- Capacity Allocation (joint work with Zvika Guz)
- Hot Modules in Wormhole NoCs
- Summary
Slide 3: System on Chip (SoC) Interconnect
- Explosion in the number of modules on a single chip
- Networks are replacing system busses: low area, low power, better scalability, higher parallelism, spatial reuse, unicast
Slide 4: QNoC Architecture
- Grid topology, packet-switched
- XY routing
- Service levels
- Wormhole hop-to-hop flow control
[Figure: grid of routers (R) connected by links, with a module attached to each router]
E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, "QoS Architecture and Design Process for Cost-Effective Network on Chip", Journal of Systems Architecture, 2004
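XY routing is simple enough to sketch concretely. A minimal Python sketch of one routing decision (the port names and coordinate conventions are my assumptions, not the QNoC implementation):

```python
def xy_route(cur, dst):
    """One XY-routing decision at a router.

    cur, dst: (x, y) grid coordinates of the current router and of the
    packet's destination. Returns the output port to take.
    """
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "EAST"    # first correct the X coordinate...
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"   # ...then the Y coordinate
    if dy < cy:
        return "SOUTH"
    return "LOCAL"       # arrived: deliver to the attached module
```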
Slide 5: Wormhole Flow-Control
- Flit-based communication: a packet is split into flits (a header flit, data flits D0..D7, a tail flit)
- Each flit must include a Type field (Head/Body/Tail) and an SL field (service level 0/1/2/3)
- The destination appears only in the header flit
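As a concrete illustration of the flit format described above, a minimal sketch in Python (field names and the coordinate pair are my assumptions, not the QNoC encoding):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class FlitType(Enum):
    HEAD = 0
    BODY = 1
    TAIL = 2

@dataclass
class Flit:
    type: FlitType                          # every flit carries its type
    sl: int                                 # service level, 0..3
    payload: int                            # data bits of this flit
    dest: Optional[Tuple[int, int]] = None  # (x, y); present only in HEAD
```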
Slide 6: Wormhole Routing
- Suits on-chip interconnect well: a small number of buffers, low latency
- Virtual channels allow flits of different packets to be transmitted concurrently on the same link; flits are locally labeled with their channel
Slide 7: Quality of Service in QNoC
- QoS is defined by throughput and latency requirements, e.g. interrupts, real time, block transfers (spanning low-latency to high-bandwidth traffic)
- Implemented using separate buffers (service levels) and a static priority policy
- Requirements should be met at low cost: design parameters and run-time mechanisms
Slide 8: QNoC Design Flow
- Define inter-module traffic
- Place modules
- Allocate link capacities
- Verify QoS and cost
Slide 9: QNoC Design Flow (cont.)
- Capacity allocation is the critical step: too little capacity results in poor QoS; too much wastes power and area
Slide 10: Use Existing Algorithms…?
- Efficient capacity-allocation algorithms exist for store-and-forward networks
- These algorithms are useless for wormhole networks, as they ignore inter-link dependencies
Slide 11: Our Approach
- An analytical model to forecast QoS
- A capacity-allocation algorithm that exploits the model
Z. Guz, I. Walter, E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, "Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip", Design, Automation and Test in Europe (DATE), 2006
Z. Guz, I. Walter, E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, "Network Delays and Link Capacities in Application-Specific Wormhole NoCs", VLSI Design, 2007
Slide 12: Delay Analysis – Goal
- Replace extensive simulations with an analytical model that forecasts QoS
- Approximate per-flow latencies, given the network topology, the communication demands, and the link capacities
[Figure: two flows (s1→d1, s2→d2) sharing links in a mesh]
Slide 13: Delay Analysis – Prior Work 1/4
- Though many wormhole analyses exist, they don't fit because they assume symmetrical communication demands, no virtual channels, and identical link capacities
- Generally, they calculate the delay of an "average flow"; a per-flow analysis is needed
Slide 14: Delay Analysis – Prior Work 2/4
H. Sarbazi-Azad, A. Khonsari and M. Ould-Khaoua, "Performance Analysis of Deterministic Routing in Wormhole K-ary n-cubes with Virtual-Channels", Journal of Interconnection Networks, 2002
Slide 15: Delay Analysis – Prior Work 3/4
- These analyses approximate the delay of an "average flow" (Sarbazi-Azad et al. 2002, above)
Slide 16: Delay Analysis – Prior Work 4/4
S. Loucif and M. Ould-Khaoua, "Modeling Latency in Deterministic Wormhole-Routed Hypercubes under Hot-Spot Traffic", The Journal of Supercomputing, 2004
Slide 17: Wormhole Delay Analysis
- Inputs: network topology, communication demands, link capacities
- Output: per-flow latencies
Slide 18: Delay Analysis – Basics
- Focus on long packets
- Packet transmission can be divided into two separate phases: path acquisition and flit transmission
- For simplicity, assume "enough" VCs on every link, so the path-acquisition time is negligible
Slide 19: Main Observation
- Packet delivery resembles a pipeline pass: flits stream hop by hop from the source interface (IP1) to the destination interface (IP2)
Slide 20: Packet Delivery Time
- The delivery time of long packets is dominated by the slowest link on the path, as determined by its transmission rate and its degree of link sharing
[Figure: a low-capacity link on the path throttles the entire worm]
Slide 21: Packet Delivery Time (cont.)
[Figure: the same effect when the bottleneck link is shared with another flow (IP3)]
Slide 22: Analysis Basics
- Per link: determine the flow's effective bandwidth on that link
- Account for flit interleaving with the other flows sharing the link
[Figure: flits of two flows interleaved over time on one link]
Slide 23: Single-Hop Flow, No Sharing
$$t_{ij} = \frac{L_f}{c_j}$$
where $t_{ij}$ is the mean time to deliver a flit of flow $i$ over link $j$ [sec], $c_j$ is the capacity of link $j$ [bits/sec], and $L_f$ is the flit length [bits/flit].
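As a worked example with assumed numbers (not from the slides): for $L_f = 16$ bits and $c_j = 1.6\,\mathrm{Gbit/s}$,
$$t_{ij} = \frac{16}{1.6 \times 10^{9}} = 10\,\mathrm{ns\ per\ flit}.$$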
Slide 24: The Effect of Sharing
- Prior work uses heuristics to model the "flit interleaving delay" a flow suffers at each link on its path
H. Sarbazi-Azad, A. Khonsari and M. Ould-Khaoua, "Performance Analysis of Deterministic Routing in Wormhole K-ary n-cubes with Virtual-Channels", Journal of Interconnection Networks, 2002
Slide 25: Single-Hop Flow, With Sharing
$$t_{ij} = \frac{L_f}{c_j - L_f \cdot \lambda_{j \setminus i}}$$
where $t_{ij}$ is the mean time to deliver a flit of flow $i$ over link $j$, $c_j$ is the capacity of link $j$ [bits/sec], $L_f$ is the flit length [bits/flit], and $\lambda_{j \setminus i}$ is the total flit injection rate of all flows sharing link $j$ except flow $i$ [flits/sec]. The term $L_f \cdot \lambda_{j \setminus i}$ is the bandwidth used by the other flows on link $j$.
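Continuing the assumed numbers above: if the other flows on link $j$ inject $\lambda_{j \setminus i} = 50 \times 10^{6}$ flits/sec of 16-bit flits, they consume $0.8\,\mathrm{Gbit/s}$, so
$$t_{ij} = \frac{16}{1.6 \times 10^{9} - 0.8 \times 10^{9}} = 20\,\mathrm{ns\ per\ flit},$$
twice the unshared value.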
Slide 26: The Convoy Effect
- Consider inter-link dependencies: wormhole backpressure propagates traffic jams down the road back to earlier links, raising their effective load
- A flow's delay on a link therefore accounts for all subsequent hops, with the basic delay weighted by distance
Slide 27: Total Packet Transmission Time
- The slowest link dominates the transmission time:
$$T_i \approx M \cdot \max_{j \in \mathrm{path}(i)} t_{ij}$$
where $M$ is the packet size [flits/packet] and the maximum accounts for the weakest link on the path.
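Putting slides 23-27 together, a minimal Python sketch of the per-packet delivery-time estimate under the stated simplifications (enough VCs, no convoy weighting; all function and variable names are mine, not the papers'):

```python
def flit_time(c_j, L_f, lam_other=0.0):
    """Mean time [sec] to push one flit of a flow across link j.

    c_j:       link capacity [bits/sec]
    L_f:       flit length [bits/flit]
    lam_other: total flit injection rate of the *other* flows
               sharing link j [flits/sec]
    """
    effective_bw = c_j - L_f * lam_other   # bandwidth left for this flow
    assert effective_bw > 0, "link j is oversubscribed"
    return L_f / effective_bw

def packet_delivery_time(path_links, L_f, M):
    """Delivery time of an M-flit packet: the slowest link dominates.

    path_links: list of (capacity, other_flows_rate) pairs along the path
    """
    slowest = max(flit_time(c, L_f, lam) for c, lam in path_links)
    return M * slowest

# Example with made-up numbers: 3-hop path, 16-bit flits, 200-flit packet
path = [(1.6e9, 0.0), (0.8e9, 25e6), (1.6e9, 10e6)]
print(packet_delivery_time(path, L_f=16, M=200))  # -> 8e-06 sec
```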
Slide 28: Source Queuing
- Finally, a packet may also queue at its source before entering the network; the total per-flow latency adds this source-queuing delay to the packet transmission time
Slide 29: Analysis Validation
- The analytical model was validated against simulations, with different link capacities and different communication demands
[Plot: analysis vs. simulation, latency as a function of normalized load/utilization]
Slide 30: Per-Flow Validation Example
[Figure: per-flow latencies, analysis vs. simulation]
Slide 31: Capacity Allocation Problem
- Use the delay analysis to solve an optimization problem
- Given the system topology and routing, and each flow's bandwidth ($f_i$) and delay bound ($T_i^{REQ}$), minimize the total link capacity such that every flow meets its bound:
$$\min \sum_j c_j \quad \text{s.t.} \quad T_i \le T_i^{REQ} \;\; \forall i$$
Slide 32: Capacity Allocation Algorithm
- A greedy, iterative algorithm (sketched below): for each source-destination pair, use the delay model to identify the most sensitive link and increase its capacity; repeat until the delay requirements are met
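A sketch of this greedy loop in Python, with the delay model plugged in as a callable (the structure follows the slide; parameter names and the fixed increment are my assumptions, and termination assumes the requirements are feasible):

```python
def allocate_capacities(flows, capacities, delay_of, step=0.1e9):
    """Greedy capacity allocation.

    flows:      list of flows, each with .links (links on its path)
                and .t_req (delay bound [sec])
    capacities: dict link -> capacity [bits/sec], modified in place
    delay_of:   callable(flow, capacities) -> estimated delay [sec],
                e.g. the wormhole delay model sketched above
    step:       capacity increment per iteration [bits/sec]
    """
    while True:
        violated = [f for f in flows if delay_of(f, capacities) > f.t_req]
        if not violated:
            return capacities
        flow = violated[0]

        def gain(link):
            # Most sensitive link: the one whose increase helps most.
            trial = dict(capacities)
            trial[link] += step
            return delay_of(flow, capacities) - delay_of(flow, trial)

        best = max(flow.links, key=gain)
        capacities[best] += step
```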
Slide 33: Capacity Allocation – Example #1
- A simple 4-by-4 system with a uniform traffic pattern and uniform requirements
- "Classic" design: 74.4 Gbit/sec; using the delay model and algorithm: 69 Gbit/sec
- Total capacity reduced by 7%
[Figure: 4x4 grid (nodes 00..33), link capacities before and after optimization]
Slide 34: A More Realistic Case
Slide 35: DVD Decoder – Results
- A SoC-like system with specific traffic demands and delay requirements
- "Classic" design: 41.8 Gbit/sec; using the algorithm: 28.7 Gbit/sec
- Total capacity reduced by 30%
[Figure: 3x4 grid (nodes 00..23), link capacities before and after optimization]
Slide 36: Cost Reduction by Slack Elimination
Slide 37: Results – Flow Latencies
Slide 38: Example #3 – VOPD Application
- Video Object Plane Decoder
- "Classic" design: 640 Gbit/sec; using the algorithm: 369 Gbit/sec
- Total capacity reduced by 40%
Slide 39: Summary – Capacity Allocation
- A simple analytical model capturing multiple VCs, heterogeneous link capacities, and heterogeneous communication demands
- An allocation algorithm that reduces network cost
Slide 40: Future Work
- Extensions: a finite number of VCs, the analytical delay model, the allocation algorithm
- New applications: core placement, topology selection, routing
Slide 41: Outline
- NoC and QNoC
- Capacity Allocation (joint work with Zvika Guz)
- Hot Modules in QNoC
- Summary
Slide 42: Hot-Modules
- A NoC is designed and dimensioned to meet QoS requirements: buffer sizing, routing, router arbitration, link capacities, …
- NoC designers cannot tune everything: the modules themselves typically have limited capacity
- Highly demanded, bandwidth-limited modules create bottlenecks at the network's edge; in a SoC these are often known in advance (an off-chip DRAM, an on-chip special-purpose processor)
- System performance is strongly affected, even if the NoC has infinite bandwidth
Slide 43: Hot Module (HM) in NoC
- In a wormhole, best-effort (BE) NoC, at high hot-module utilization multiple worms "get stuck" in the network
- Two problems arise: system performance and source fairness
[Figure: worms converging on the hot module's interface]
Slide 44: Problem #1 – The Hot Module Affects the System
- The HM is not a local problem: traffic not destined for the HM suffers too!
[Figure: a flow between IP2 and IP3 blocked behind worms headed for the HM (IP1)]
Slide 45: Problem #2 – Source Fairness
- Multiple locally fair arbitration decisions do not add up to global fairness
- The limited, expensive HM resource isn't fairly shared among sources
Slide 46: Saturation (Un)Fairness
- A saturated router divides the available bandwidth equally between its inputs
- Along a chain of routers leading to the HM, each merge halves the remote source's share, so a source several arbitration stages away can receive less than 1% of the HM bandwidth!
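As a back-of-the-envelope consequence (my arithmetic, under the assumption that each router on the chain merges the upstream traffic with one equally loaded local input), a source $k$ arbitration stages away from the HM receives
$$BW_k \approx 2^{-k} \cdot BW_{HM}, \qquad k = 7 \;\Rightarrow\; BW_7 \approx 0.78\% \cdot BW_{HM} < 1\%.$$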
Slide 47: Blocked Output Ports…
[Figure: router output ports blocked by stuck worms]
Slide 48: Related Work
- Hotspot solutions have been studied comprehensively over the last two decades (e.g., Pfister and Norton 1985; Duato et al. 2005)
- Classically, solutions are categorized by their policy: avoidance-based (frequently impossible), detection-based (requires threshold tuning), or prevention-based (overhead during light load)
- And by their implementation: central arbitration, router-based (this class seems to draw the most attention), or end-to-end flow control
Slide 49: Router-Based Solutions
- Solving hotspots in the routers: virtual circuits, fair queuing, dedicated queues, deflective routing, packet combining, packet dropping, backpressure (credit- or rate-based), and more
- Routers can(?) detect congested periods; this is easier in store-and-forward networks
[Figure: router with input buffers, crossbar (X-Bar), and output buffers]
Slide 50: Router-Based Solutions (cont.)
- QNoC routers are deliberately simple: fast, power- and area-efficient, with a few buffers, efficient routing, a simple arbitration policy, and no per-flow state memory
[Figure: the same simple router structure]
Slide 51: Related Work (cont.)
- Examples: "Self-Tuned Congestion Control for Multiprocessor Networks", M. Thottethodi, A. R. Lebeck and S. Mukherjee, HPCA 2000; "A New Scalable and Cost-Effective Congestion Management Strategy for Lossless Multistage Interconnection Networks", J. Duato, I. Johnson, J. Flich, F. Naven, P. Garcia and T. Nachiondo, HPCA 2005
- A few end-to-end solutions do exist, but they are stop-and-wait based, do not prevent the hotspot's effects, and do not address the fairness problem
Slide 52: Our Approach
- The problem is not caused by the NoC but by a congested end-point, so the solution should address the root cause, not the symptoms
- Utilize the existing NoC infrastructure
- Solve both problems (performance and fairness) simply and efficiently
Slide 53: Hot Module Congestion
- During congested periods, sources should not inject packets towards the HM: those packets will experience increased delay anyway, and it is better to wait at the source than inside the network
- Keep the routers unmodified!
Slides 54-56: HM Allocation Control – Basics
[Figure sequence: IP modules attached to the NoC, the hot module (IP2) fronted by an Allocation Controller at its interface, and control traffic exchanged between a source and the controller before the data is sent]
Slide 57: HM Control Packets
- The HM controller receives all requests and can employ any scheduling policy
- Control packets are sent on a high service level, bypassing (blocked) data packets!
- Credit request packet fields: Dest., Source, Req. Credit; credit reply packet fields: Dest., Source, Credit
Slide 58: QNoC Router
[Figure: a QNoC router with five input ports and five output ports, one pair facing the attached module]
Slide 59: Enhanced Request Packet
- The request may include additional data as needed: the payload's priority, deadline, expiration time, etc.
- Credit request packet fields: Dest., Source, Req. Credit, plus optional fields (Priority, Deadline, Expiration, …); see the sketch below
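A sketch of the two control-packet formats from slides 57 and 59 as Python dataclasses (field types and defaults are my assumptions; the wire encoding is not specified here):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CreditRequest:
    dest: int                         # the HM's allocation controller
    source: int                       # the requesting module
    req_credit: int                   # credits (flits) requested
    # Optional fields of the enhanced request (slide 59):
    priority: Optional[int] = None
    deadline: Optional[int] = None    # e.g. cycle by which data must arrive
    expiration: Optional[int] = None  # cycle after which the request is void

@dataclass
class CreditReply:
    dest: int                         # the requesting module
    source: int                       # the allocation controller
    credit: int                       # credits granted
```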
Slide 60: HM Allocation Controller
- Structure: a Requests Decoder feeding a Pending Requests Table (SRC, Size, Priority, Deadline, Expiration, …), a Local Arbiter, a Reply Encoder, and an optional HM Access Controller
- The HM Allocation Controller is customized according to the system's requirements; a minimal behavioral sketch follows
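A minimal sketch of the controller's behavior, reusing the CreditRequest/CreditReply classes sketched above: a pending-requests queue and a round-robin arbiter with a per-round credit budget (the budget abstraction and all naming are mine, not the authors' design):

```python
from collections import deque

class AllocationController:
    """Grants HM credits to pending requesters, round-robin."""

    def __init__(self, hm_credits_per_round):
        self.pending = deque()                # Pending Requests Table
        self.budget = hm_credits_per_round    # credits grantable per round

    def on_request(self, req):                # Requests Decoder
        self.pending.append(req)

    def arbitrate(self):                      # Local Arbiter + Reply Encoder
        replies, budget = [], self.budget
        while self.pending and budget > 0:
            req = self.pending.popleft()
            grant = min(req.req_credit, budget)
            budget -= grant
            replies.append(CreditReply(dest=req.source,
                                       source=0,   # controller's own address (assumed)
                                       credit=grant))
            if grant < req.req_credit:        # re-queue the remainder
                req.req_credit -= grant
                self.pending.append(req)
        return replies
```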
Slide 61: Further Enhancements
- Short packets are not negotiated
- Each source's quota is slowly self-refreshing (see the sketch below)
- The mechanism is turned off when the network is not congested
- Crediting modules ahead of time hides the request-grant latency during light-load periods
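The "slowly self-refreshing quota" behaves like a token bucket; a minimal sketch under that interpretation (the rate and cap are placeholders, not values from the talk):

```python
class SelfRefreshingQuota:
    """Per-source credit quota that refills at a slow, fixed rate."""

    def __init__(self, cap=8, refill_every=1000):
        self.cap = cap
        self.refill_every = refill_every   # cycles between refills
        self.tokens = cap
        self._last = 0

    def refresh(self, now):
        earned = (now - self._last) // self.refill_every
        if earned:
            self.tokens = min(self.cap, self.tokens + earned)
            self._last += earned * self.refill_every

    def try_send(self, now, flits=1):
        """Spend quota on a short packet; False means negotiate credits."""
        self.refresh(now)
        if self.tokens >= flits:
            self.tokens -= flits
            return True
        return False
```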
Slide 62: Not Classic Flow-Control
- Flow control protects the destination's buffers: a pair-wise protocol
- HM access regulation protects the whole system: a many-to-one protocol
Slide 63: Results – Synthetic Scenario
- Hotspot traffic: all-to-one traffic with all-to-all background traffic
- High network capacity, limited hot-module bandwidth
- HM controller arbitration: round-robin
[Figure: 4x4 mesh with one module designated as the HM]
Slide 64: System Performance
[Plot: average packet latency vs. load, with and without regulation; regulation improves latency by factors of roughly x10 to x30]
Slide 65: Hot vs. Non-Hot Module Traffic
- Using regulation, non-HM traffic latency is drastically reduced (up to about x40)
[Plot: average packet latency of HM traffic and background traffic, with and without regulation]
Slide 66: Source Fairness
[Plot: sources #5 and #16 of the 4x4 grid (sources numbered 1..16), with and without regulation]
Slide 67: Fairness in a Saturated Network
- Hot-module utilization: 99.99% without regulation, 98.32% with regulation
- Simulation of a 4-by-4 system; data packets: 200 flits; control packets: 2 flits
[Chart: per-source share of HM bandwidth, without vs. with allocation control]
Slide 68: MPEG-4 Decoder
- A real SoC with an over-provisioned NoC and two hot modules: the SDRAM (25% of all traffic) and SRAM2 (22% of all traffic)
[Figure: MPEG-4 decoder floorplan with modules VU, AU, MED CPU, RAST, SDRAM, SRAM1, SRAM2, IDCT, ADSP, UP SAMP, BAB, RISC]
Slide 69: Results – MPEG-4 Decoder
- At 80% load: a x2 latency reduction over all traffic, and a x8 reduction in the HM/non-HM traffic breakdown
[Plots: latency vs. load for all traffic, and broken down into HM and non-HM traffic]
Slide 70: The HMs Are Better Utilized
- Without regulation, the hot modules are only 60% utilized: traffic to one HM blocks the traffic to the other!
- Without allocation control, the flows destined for each HM show significant differences in bandwidth
[Chart: per-flow bandwidth of flows destined for HM1 (from sources 1, 2, 3, 4, 9, 10, 11) and HM2 (from sources 8, 10, 11, 12), without vs. with allocation control]
Slide 71: Hot-Module Placement
Slide 72: Future Work
- Dynamically designated hot modules
- Other scheduling policies at the hot-module controller
- Single vs. multiple control modules for multiple HMs
- The effect of placement
Slide 73: Summary
- Hot modules are common in real SoCs
- Hot modules ruin system performance and are not fairly shared, even in NoCs with infinite capacity; the network intensifies the problem, but it can also provide the tools for resolving it
- A simple mechanism achieves a dramatic improvement, completely eliminating the HM effects
- Hot Modules, Cool NoCs!
Slide 74: Thank You! Questions?
zigi@tx.technion.ac.il
QNoC Research Group