
Slide 1: QoS Support
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute

Slide 2: What is QoS?
- "Better performance," as described by a set of parameters or measured by a set of metrics.
- Generic parameters: bandwidth; delay and delay-jitter; packet loss rate (or loss probability).
- Transport/application-specific parameters: timeouts; percentage of "important" packets lost.

Slide 3: What is QoS? (contd.)
- These parameters can be measured at several granularities: "micro" flow, aggregate flow, or population.
- QoS is considered "better" if (a) more parameters can be specified, and (b) QoS can be specified at a fine granularity.
- QoS vs. CoS: CoS maps micro-flows to classes and may perform optional per-class resource reservation.
- QoS spectrum: from Best Effort to Leased Line.

Slide 4: Example QoS
- Bandwidth: r Mbps over a time T, with burstiness b.
- Delay: worst-case.
- Loss: worst-case or statistical.
[Figure: a (b, r, R) token-bucket regulator and the resulting arrival curve, with burst height b*R/(R-r), long-term slope r, delay bound d, and buffer B_a at service rate r_a.]

Slide 5: Fundamental Problems
- In a FIFO service discipline, the performance seen by one flow is convoluted with the arrivals of packets from all other flows!
- You can't get QoS with a "free-for-all."
- We need scheduling disciplines that "isolate" a flow's performance from the arrival rates of background traffic.
[Figure: a shared buffer of size B served FIFO vs. served by a scheduling discipline.]

Slide 6: Fundamental Problems (contd.)
- Conservation Law (Kleinrock): sum_i rho(i) * W_q(i) = constant, where rho(i) is flow i's utilization and W_q(i) its mean queueing delay.
- Irrespective of the scheduling discipline chosen, the weighted average backlog (delay) is constant and the average bandwidth is constant.
- It is a zero-sum game: improving one class comes at the expense of others, so we need to "set aside" resources for premium services.

Slide 7: QoS Big Picture: Control/Data Planes

Slide 8: Example: Integrated Services (IntServ)
- An architecture for providing QoS guarantees in IP networks to individual application sessions.
- Relies on resource reservation: routers must maintain state about allocated resources (e.g., the guaranteed rate g) and respond to new call-setup requests.

Slide 9: Call Admission
- Routers admit calls based on the call's R-spec (requested service/reservation) and T-spec (traffic characterization), and on the resources currently allocated at the routers to other calls.

Slide 10: Token Bucket
- Characterized by three parameters (b, r, R):
  - b – token (bucket) depth
  - r – average arrival rate
  - R – maximum arrival rate (e.g., R = link capacity)
- A bit is transmitted only when a token is available; transmitting a bit consumes exactly one token. (See the sketch below.)
[Figure: tokens arrive at r per second into a bucket of depth b; the regulator's output is at most R bps, and its arrival curve rises at slope R up to height b*R/(R-r), then at slope r.]
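To make the bucket mechanics concrete, here is a minimal token-bucket sketch in Python. It is not from the slides: the class name and the numbers in the demo are made up, and the peak rate R would normally be enforced by a second, shallower bucket, which is omitted here.

```python
# A minimal token-bucket regulator sketch (illustrative; parameter names b and r
# follow the slide's notation). Tokens accrue at rate r up to depth b; a packet
# of `size` bits conforms only when enough tokens are available.
import time

class TokenBucket:
    def __init__(self, b, r):
        self.b = b          # bucket depth (tokens)
        self.r = r          # token fill rate (tokens/second)
        self.tokens = b     # start with a full bucket
        self.last = time.monotonic()

    def allow(self, size):
        """Return True and consume tokens if a packet of `size` bits conforms."""
        now = time.monotonic()
        self.tokens = min(self.b, self.tokens + (now - self.last) * self.r)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False

bucket = TokenBucket(b=1500 * 8, r=1_000_000)   # 1 Mbps average, 1500-byte burst
print(bucket.allow(1200 * 8))                    # conforms -> True
```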

Slide 11: Per-hop Reservation
- Given the token-bucket parameters (b, r, R) and a per-hop delay target d, allocate bandwidth r_a and buffer space B_a at the hop so that the delay d is guaranteed. (A worked bound is sketched below.)
[Figure: the arrival curve (burst b, slope r) against a service line of slope r_a; d is the maximum horizontal gap and B_a the maximum vertical gap.]
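For reference, the bounds the figure alludes to can be written out. These are the standard network-calculus results for a (b, r, R)-regulated flow served at a constant rate r_a, stated here as a hedged aside rather than copied from the slide.

```latex
% Worst-case per-hop bounds for a (b, r, R) token-bucket flow served at a
% constant rate r_a, with r <= r_a <= R (standard network-calculus results).
\[
  d \;\le\; \frac{b}{r_a}\cdot\frac{R - r_a}{R - r},
  \qquad
  B_a \;\ge\; b\cdot\frac{R - r_a}{R - r}.
\]
% Sanity check: if r_a = R the burst is served at line rate and d = 0;
% if r_a = r the delay approaches b/r, i.e., the full burst drained at the
% average rate, and the buffer must hold the whole burst b.
```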

Slide 12: Mechanisms: Queuing/Scheduling
- Use a few bits in the header to indicate which queue (class) a packet goes into (also branded as CoS).
- High-paying users are classified into high-priority queues, which may also be less populated, giving lower delay and a lower likelihood of packet drop.
- Ideas: priority, round-robin, classification, aggregation, ...
[Figure: traffic sources with different payment levels ($ to $$$$$$) mapped to traffic classes A, B, and C.]

Slide 13: Mechanisms: Buffer Management/Priority Drop
- Ideas: packet marking, queue thresholds, differential dropping, buffer assignments.
[Figure: queue thresholds determining when to drop only BLUE packets vs. both RED and BLUE packets.]

Slide 14: Classification

Slide 15: Why Classification? Providing Value-Added Services
Some examples:
- Differentiated services: regard traffic from Autonomous System #33 as 'platinum grade'.
- Access Control Lists: deny udp host 194.72.72.33 194.72.6.64 0.0.0.15 eq snmp.
- Committed Access Rate: rate-limit WWW traffic from sub-interface #739 to 10 Mbps.
- Policy-based Routing: route all voice traffic through the ATM network.

Slide 16: Packet Classification
[Figure: an incoming packet's header is matched against a classifier (policy database) of predicate -> action rules; packet classification selects the action, which the forwarding engine applies.]

Slide 17: Multi-field Packet Classification
Given a classifier with N rules, find the action associated with the highest-priority rule matching an incoming packet. (A brute-force reference sketch follows.)
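As a point of reference, a brute-force classifier is just a priority-ordered linear scan. The sketch below is illustrative Python with made-up rule names and a made-up five-field layout; it only shows the matching semantics that the faster schemes cited on the later slides must reproduce.

```python
# Brute-force multi-field classification: rules listed in decreasing priority;
# each field is either a (lo, hi) range or None for a wildcard. Illustrative only.
rules = [
    ("r1", [(0x80100000, 0x8010FFFF), None, (17, 17), None, (161, 161)], "deny"),
    ("r2", [None, None, (6, 6), None, (80, 80)], "rate-limit"),
    ("default", [None, None, None, None, None], "permit"),
]

def classify(packet_fields, rules):
    """Return the (name, action) of the first (highest-priority) matching rule."""
    for name, ranges, action in rules:
        if all(r is None or r[0] <= f <= r[1] for f, r in zip(packet_fields, ranges)):
            return name, action
    return None, "no-match"

# Fields: src addr, dst addr, protocol, src port, dst port (as integers).
print(classify([0x80100203, 0x0A000001, 17, 4444, 161], rules))  # ('r1', 'deny')
```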

Slide 18: Prefix Matching: a 1-D Range Problem
[Figure: the address line from 0 to 2^32 - 1, with nested prefixes 128.9/16, 128.9.16/20, 128.9.176/20, 128.9.19/24, 128.9.25/24, and a lookup address 128.9.16.14.]
Most specific route = "longest matching prefix." (A small lookup sketch follows.)
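A tiny brute-force longest-prefix match over the figure's prefixes, as an illustrative Python sketch; the next-hop labels A-E are made up, and production routers use tries, range trees, or TCAMs instead of a linear scan.

```python
# Longest-prefix match by brute force over the prefixes in the slide's figure.
import ipaddress

prefixes = {
    "128.9.0.0/16": "A",
    "128.9.16.0/20": "B",
    "128.9.176.0/20": "C",
    "128.9.19.0/24": "D",
    "128.9.25.0/24": "E",
}

def longest_prefix_match(addr, table):
    """Return the next hop of the longest prefix containing addr, or None."""
    best_len, best_hop = -1, None
    ip = ipaddress.ip_address(addr)
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    return best_hop

print(longest_prefix_match("128.9.16.14", prefixes))  # "B": matches /16 and /20, /20 wins
```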

Slide 19: Classification as a 2-D Geometry Problem
[Figure: rules R1-R7 drawn as rectangles in the (Field #1, Field #2) plane; packets P1 and P2 are points to be located. Each rule row has columns Field #1, Field #2, Data, e.g., (128.16.46.23, *) or (144.24/16, 64/24).]

Slide 20: Packet Classification References
- T. V. Lakshman and D. Stiliadis, "High speed policy based packet forwarding using efficient multi-dimensional range matching," SIGCOMM 1998, pp. 191-202.
- V. Srinivasan, S. Suri, G. Varghese, and M. Waldvogel, "Fast and scalable layer 4 switching," SIGCOMM 1998, pp. 203-214.
- V. Srinivasan, G. Varghese, and S. Suri, "Fast packet classification using tuple space search," SIGCOMM 1999.
- P. Gupta and N. McKeown, "Packet classification using hierarchical intelligent cuttings," Hot Interconnects VII, 1999.
- P. Gupta and N. McKeown, "Packet classification on multiple fields," SIGCOMM 1999.

Slide 21: Proposed Schemes

Slide 22: Proposed Schemes (contd.)

Slide 23: Proposed Schemes (contd.)

Slide 24: Scheduling

Slide 25: Output Scheduling
The output scheduler allocates output bandwidth and controls packet delay.
[Figure: queues feeding a scheduler at the output link.]

Slide 26: Output Scheduling (contd.)
[Figure: the same arrivals served by FIFO vs. by Fair Queueing.]

Slide 27: Motivation: the Parekh-Gallager Theorem
- Let a connection be allocated weights at each WFQ scheduler along its path, so that the least bandwidth it is allocated is g.
- Let it be leaky-bucket regulated such that the number of bits sent in any interval [t1, t2] is <= g(t2 - t1) + sigma.
- Let the connection pass through K schedulers, where the k-th scheduler has rate r(k).
- Let the largest packet size in the network be P. (The resulting end-to-end delay bound is recalled below.)
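The slide lists the hypotheses; the conclusion, recalled here from the standard statement of the Parekh-Gallager theorem rather than from this deck, bounds the worst-case end-to-end queueing delay.

```latex
% Parekh-Gallager end-to-end delay bound (standard statement, hedged):
\[
  D^{*} \;\le\; \frac{\sigma}{g}
        \;+\; \frac{(K-1)\,P}{g}
        \;+\; \sum_{k=1}^{K} \frac{P}{r(k)}.
\]
% sigma/g is the time to drain the burst at the guaranteed rate g; the second
% term accounts for packetization at the K-1 intermediate hops; the sum is the
% store-and-forward transmission time of one maximum-size packet per hop.
```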

Slide 28: Motivation
- FIFO is natural but gives poor QoS: bursty flows increase delays for others, so delays cannot be guaranteed.
- We need round-robin-style scheduling of packets: Fair Queueing; Weighted Fair Queueing; Generalized Processor Sharing.

Slide 29: Scheduling: Requirements
An ideal scheduling discipline:
- is easy to implement (VLSI space, execution time);
- is fair (max-min fairness);
- provides performance bounds (deterministic or statistical, at micro-flow or aggregate-flow granularity);
- allows easy admission-control decisions (to decide whether a new flow can be admitted).

Slide 30: Choices 1: Priority
- A packet is served from a given priority level only if no packets exist at higher levels (multilevel priority with exhaustive service).
- The highest level gets the lowest delay.
- Watch out for starvation!
- Usually priority levels map to delay classes.
[Figure: priority levels for low-bandwidth urgent messages, realtime traffic, and non-realtime traffic.]

Slide 31: Scheduling Policies: Choices #1
- Priority Queuing: classes have different priorities; a packet's class may depend on explicit marking or other header information, e.g., IP source or destination address, TCP port numbers, etc.
- Transmit a packet from the highest-priority class with a non-empty queue. Problem: starvation. (A small sketch follows.)
- Preemptive and non-preemptive versions exist.
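A minimal non-preemptive strict-priority scheduler sketch in Python (illustrative; the class and queue names are made up). It also makes the starvation problem visible: lower queues are served only while every higher-priority queue is empty.

```python
# Strict-priority scheduling: queue 0 is highest priority.
from collections import deque

class PriorityScheduler:
    def __init__(self, num_classes):
        self.queues = [deque() for _ in range(num_classes)]

    def enqueue(self, priority, packet):
        self.queues[priority].append(packet)

    def dequeue(self):
        """Pick the head of the highest-priority non-empty queue."""
        for q in self.queues:
            if q:
                return q.popleft()
        return None

sched = PriorityScheduler(3)
sched.enqueue(2, "bulk-1")
sched.enqueue(0, "voice-1")
sched.enqueue(1, "video-1")
print(sched.dequeue(), sched.dequeue(), sched.dequeue())  # voice-1 video-1 bulk-1
```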

Slide 32: Scheduling Policies (more)
- Round Robin: scan the class queues, serving one packet from each class that has a non-empty queue.

Slide 33: Choices 2: Work-Conserving vs. Non-Work-Conserving
- A work-conserving discipline is never idle when packets await service.
- So why bother with non-work-conserving disciplines?

Slide 34: Non-Work-Conserving Disciplines
- Key conceptual idea: delay a packet until it becomes eligible.
- Reduces delay-jitter, so fewer buffers are needed in the network.
- How to choose the eligibility time?
  - Rate-jitter regulator: bounds the maximum outgoing rate.
  - Delay-jitter regulator: compensates for variable delay at the previous hop.

Slide 35: Do We Need Non-Work-Conservation?
- Delay-jitter can be removed at an endpoint instead, but in-network regulation also reduces the size of switch buffers.
- It increases mean delay, which is not a problem for playback applications.
- It wastes bandwidth, but the idle time can serve best-effort packets instead.
- It always punishes a misbehaving source; you can't have it both ways.
- Bottom line: not too bad; implementation cost may be the biggest problem.

Slide 36: Choices 3: Degree of Aggregation
- More aggregation means less state, which is cheaper: smaller VLSI, less to advertise.
- BUT: less individualization.
- Solution: aggregate into classes whose members have the same performance requirement, accepting that there is no protection within a class.

Slide 37: Choices 4: Service Within a Priority Level
- Serve in order of arrival (FCFS) or in order of a service tag.
- Service tags allow the queue to be reordered arbitrarily, but the queue must be sorted, which can be expensive.
- FCFS: bandwidth hogs win (no protection), and there is no guarantee on delays.
- Service tags: with an appropriate choice, both protection and delay bounds are possible (e.g., combined with differential buffer management and packet drop).

Slide 38: Weighted Round Robin
- Serve a packet from each non-empty queue in turn.
- Unfair if packets have different lengths or weights are not equal.
- Different weights, fixed packet size: serve more than one packet per visit, after normalizing to obtain integer weights.
- Different weights, variable-size packets: normalize the weights by the mean packet size.
  - E.g., weights {0.5, 0.75, 1.0} and mean packet sizes {50, 500, 1500}: normalized weights are {0.5/50, 0.75/500, 1.0/1500} = {0.01, 0.0015, 0.000666}; scaling to integers gives {60, 9, 4} packets per round (reproduced in the sketch below).
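The normalization in the last bullet can be reproduced mechanically. This small Python sketch is illustrative only; it uses exact fractions and scales by the least common multiple of the denominators, which happens to give the slide's {60, 9, 4}.

```python
# Normalize WRR weights by mean packet size, then scale to integers.
from fractions import Fraction
from functools import reduce
from math import gcd

weights = [Fraction(1, 2), Fraction(3, 4), Fraction(1, 1)]   # 0.5, 0.75, 1.0
mean_pkt = [50, 500, 1500]

per_byte = [w / m for w, m in zip(weights, mean_pkt)]        # 1/100, 3/2000, 1/1500
common = reduce(lambda a, b: a * b // gcd(a, b), (f.denominator for f in per_byte))
packets_per_round = [int(f * common) for f in per_byte]
print(packets_per_round)                                      # [60, 9, 4]
```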

Slide 39: Problems with Weighted Round Robin
- With variable-size packets and different weights, the mean packet size must be known in advance.
- Can be unfair for long periods of time. Example:
  - A T3 trunk (45 Mbps) carries 500 connections, each with mean packet length 500 bytes; 250 connections have weight 1 and 250 have weight 10.
  - Each packet takes 500 * 8 / 45 Mbps = 88.8 microseconds to transmit.
  - One round serves 250 * 1 + 250 * 10 = 2750 packets, so the round time is 2750 * 88.8 us = 244.2 ms.

Slide 40: Generalized Processor Sharing (GPS)
- Assume a fluid model of traffic.
- Visit each non-empty queue in turn (round robin) and serve an infinitesimal amount from each.
- Leads to "max-min" fairness.
- GPS is unimplementable: we cannot serve infinitesimals, only packets.

Slide 41: Fair Queuing (FQ)
- Idea: serve packets in the order in which they would have finished transmission in the fluid-flow (bit-by-bit) system.
- This maps the bit-by-bit schedule onto a packet transmission schedule.
- At any given time, transmit the packet with the lowest finish tag F_i. (A finish-tag sketch follows.)
- Variation: Weighted Fair Queuing (WFQ).
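A compact fair-queueing sketch in Python. One hedge up front: to keep the code short it uses the self-clocked approximation of virtual time (the finish tag of the packet in service) rather than true GPS virtual time, and the flow names, weights, and packet sizes are made up.

```python
# Fair queueing by finish tags: F = max(F_prev_of_flow, V) + length / weight,
# where V is approximated by the finish tag of the packet last put in service.
import heapq

class FairQueue:
    def __init__(self):
        self.last_finish = {}   # flow -> finish tag of its latest packet
        self.v = 0.0            # approximate virtual time
        self.heap = []          # (finish_tag, seq, flow, length)
        self.seq = 0            # tie-breaker for equal tags

    def enqueue(self, flow, length, weight=1.0):
        start = max(self.last_finish.get(flow, 0.0), self.v)
        finish = start + length / weight
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        """Transmit the queued packet with the smallest finish tag."""
        if not self.heap:
            return None
        finish, _, flow, length = heapq.heappop(self.heap)
        self.v = finish
        return flow, length

fq = FairQueue()
fq.enqueue("A", 1000); fq.enqueue("A", 1000)   # a burst from flow A
fq.enqueue("B", 500)                           # a short packet from flow B
print([fq.dequeue()[0] for _ in range(3)])     # ['B', 'A', 'A']: B is protected
```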

Slide 42: FQ Example
[Figure: a packet with F=10 arrives on Flow 1 while Flow 2 is transmitting; queued packets with F=2, F=5, F=8 are sent in tag order, then the F=10 packet. The packet currently being transmitted cannot be preempted.]

Slide 43: WFQ: Practical Considerations
- For every packet, the scheduler needs to (a) classify it into the right flow queue, maintaining a linked list per flow, and (b) schedule it for departure.
- The complexity of each step is O(log [# of flows]).
- The first cost is hard to overcome (classification, studied earlier); the second can be overcome by Deficit Round Robin (DRR).

Slide 44: Deficit Round Robin (DRR)
[Figure: per-flow queues of variable-size packets; each flow's deficit counter grows by a quantum per round and packets are sent while the deficit lasts.]
- A good approximation of FQ, and much simpler to implement. (A sketch follows.)
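A Deficit Round Robin sketch in Python; it is illustrative, and the quantum and packet sizes below are made up rather than taken from the figure.

```python
# DRR: each active flow gets `quantum` bytes of credit per round and may send
# packets as long as its accumulated deficit covers the head-of-line packet.
from collections import deque

def drr(queues, quantum, rounds):
    """queues: dict flow -> deque of packet sizes. Returns the send order."""
    deficit = {f: 0 for f in queues}
    sent = []
    for _ in range(rounds):
        for flow, q in queues.items():
            if not q:
                deficit[flow] = 0          # an empty flow keeps no credit
                continue
            deficit[flow] += quantum
            while q and q[0] <= deficit[flow]:
                size = q.popleft()
                deficit[flow] -= size
                sent.append((flow, size))
    return sent

queues = {"A": deque([700, 250, 400]), "B": deque([500, 500]), "C": deque([100])}
print(drr(queues, quantum=500, rounds=3))
# Round 1: A (deficit 500) cannot send its 700-byte packet; B sends 500; C sends 100.
# Round 2: A (deficit 1000) sends 700 and 250; B sends its last 500.
# Round 3: A (deficit 550) sends 400.
```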

Slide 45: WFQ Problems
- To get a delay bound, we need to pick g: the lower the delay bound, the larger g needs to be; a large g excludes more competitors from the link, and g can be very large, in some cases 80 times the peak rate!
- Sources must be leaky-bucket regulated, but choosing the leaky-bucket parameters is problematic.
- WFQ couples delay and bandwidth allocations: low delay requires allocating more bandwidth, which wastes bandwidth for low-bandwidth, low-delay sources.

Slide 46: Delay-Earliest Due Date (Delay-EDD)
- Earliest due date: the packet with the earliest deadline is selected.
- Delay-EDD prescribes how to assign deadlines to packets.
- A source is required to send no faster than its peak rate, and bandwidth at the scheduler is reserved at that peak rate.
- Deadline = expected arrival time + delay bound; if a source sends faster than its contract, the delay bound does not apply.
- Each packet gets a hard delay bound; the delay bound is independent of the bandwidth requirement, but reservation is at the connection's peak rate.
- Implementation requires per-connection state and a priority queue.

Slide 47: Rate-Controlled Scheduling
- A class of disciplines with two components: a regulator and a scheduler.
- Incoming packets are placed in the regulator, where they wait to become eligible, and are then handed to the scheduler.
- The regulator shapes the traffic; the scheduler provides performance guarantees.
- Considered impractical; interest has waned along with the broader decline of QoS deployment.

Slide 48: Examples
- Recall: a rate-jitter regulator bounds the maximum outgoing rate; a delay-jitter regulator compensates for variable delay at the previous hop.
- Rate-jitter regulator + FIFO: similar to Delay-EDD.
- Rate-jitter regulator + multi-priority FIFO: gives both bandwidth and delay guarantees (RCSP).
- Delay-jitter regulator + EDD: gives bandwidth, delay, and delay-jitter bounds (Jitter-EDD).

Slide 49: Stateful Solution Complexity
- Data path: per-flow classification, per-flow buffer management, per-flow scheduling.
- Control path: install and maintain per-flow state for both the data and control paths.
[Figure: classifier, per-flow buffer management, and scheduler in front of the output interface, with per-flow state for flows 1..n.]

Slide 50: Differentiated Services Model
- Edge routers: traffic conditioning (policing, marking, dropping) and SLA negotiation; they set the DS byte in the IP header based on the negotiated service and the observed traffic.
- Interior routers: traffic classification and forwarding (a near-stateless core!); they use the DS byte as an index into the forwarding table.
[Figure: ingress edge router, interior routers, egress edge router.]

Slide 51: Diffserv Architecture
- Edge router: per-flow traffic management; marks packets as in-profile or out-of-profile against an (r, b) token-bucket profile. (A small marker sketch follows.)
- Core router: per-class traffic management; buffering and scheduling based on the marking applied at the edge, with preference given to in-profile packets (Assured Forwarding scheduling).
[Figure: a token-bucket (r, b) marker at the edge feeding per-class scheduling in the core.]
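A minimal two-color edge-marker sketch in Python, illustrative only: the (r, b) profile and packet sizes are made up. It shows the edge behavior described above, where packets within the token-bucket profile are marked in-profile and the rest out-of-profile so the core can drop them preferentially.

```python
# Two-color marking against an (r, b) token-bucket profile.
def mark(packets, r, b):
    """packets: list of (arrival_time_s, size_bits). Returns (time, color) list."""
    tokens, last_t, out = b, 0.0, []
    for t, size in packets:
        tokens = min(b, tokens + (t - last_t) * r)   # refill since last arrival
        last_t = t
        if tokens >= size:
            tokens -= size
            out.append((t, "in"))
        else:
            out.append((t, "out"))
    return out

pkts = [(0.00, 12000), (0.01, 12000), (0.02, 12000), (0.50, 12000)]
print(mark(pkts, r=20_000, b=24_000))
# The first two packets drain the burst allowance and are marked 'in'; the
# later ones exceed the profile (the bucket has only partly refilled) -> 'out'.
```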

Slide 52: DiffServ: Implementation
- Classify flows into classes and maintain only per-class queues, with FIFO within each class.
- This avoids the "curse of dimensionality" of per-flow state.

Slide 53: DiffServ
- A framework for providing differentiated QoS:
  - set Type of Service (ToS) bits in packet headers, classifying packets into classes;
  - routers maintain per-class queues;
  - traffic is conditioned at network edges to conform to class requirements.
- Queue management may still be needed inside the network.

Slide 54: Network Processors (NPUs)
(Slides from Raj Yavatkar, raj.yavatkar@intel.com)

Slide 55: CPUs vs. NPUs
- What makes a CPU appealing for a PC:
  - Flexibility: supports many applications.
  - Time to market: allows quick introduction of new applications.
  - Future proof: supports as-yet-unthought-of applications.
- No one would consider using fixed-function ASICs for a PC.

Slide 56: Why NPUs Seem Like a Good Idea
- What makes an NPU appealing:
  - Time to market: saves the 18 months of building an ASIC; code re-use.
  - Flexibility: protocols and standards change.
  - Future proof: new protocols emerge.
  - Less risk: bugs are more easily fixed in software.
- Surely no one would consider using fixed-function ASICs for new networking equipment?

Slide 57: The Other Side of the NPU Debate
- Jack of all trades, master of none.
- NPUs are difficult to program.
- NPUs inevitably consume more power, run more slowly, and cost more than an ASIC.
- Programming them requires domain expertise; why would a networking vendor educate its suppliers?
- They are designed for computation rather than memory-intensive operations.

Slide 58: NPU Characteristics
- NPUs try hard to hide memory latency.
- Conventional caching doesn't work for packet data: there are roughly equal numbers of reads and writes and no temporal or spatial locality, and cache misses lose throughput, confuse schedulers, and break pipelines.
- It is therefore common to use multiple processors, each with multiple hardware contexts.

Slide 59: Network Processors: Load Balancing
[Figure: a dispatch stage in front of a pool of CPUs, each with a cache and off-chip memory, plus dedicated hardware support, e.g., for lookups.]
Incoming packets are dispatched to:
1. an idle processor, or
2. the processor dedicated to packets of this flow (to prevent mis-sequencing), or
3. a special-purpose processor for the flow, e.g., security, transcoding, or application-level processing.

Slide 60: Network Processors: Pipelining
[Figure: a pipeline of CPUs, each with a cache and off-chip memory, plus dedicated hardware support, e.g., for lookups.]
Processing is broken down into (hopefully balanced) steps; each processor performs one step of the processing.

Slide 61: NPUs and Memory
- Packet processing is all about getting packets into and out of the chip and memory; computation is a side issue.
- Memory speed is everything: speed matters more than size.

Slide 62: NPUs and Memory (contd.)
[Figure: the many memory types an NPU touches: buffer memory, lookup tables, counters, schedule state, classification tables, program data, instruction code.]
A typical NPU or packet processor has 8-64 CPUs, 12 memory interfaces, and 2000 pins.

Slide 63: Intel IXP Network Processors
- Microengines: RISC processors optimized for packet processing, with hardware support for multi-threading; they implement the fast path.
- Embedded StrongARM/XScale: runs an embedded OS and handles exception tasks; it implements the slow path / control plane.
[Figure: microengines ME 1..ME n and the StrongARM control processor, connected to SRAM, DRAM, and the media/fabric interface.]

Slide 64: NPU Building Blocks: Processors

Slide 65: Division of Functions

Slide 66: NPU Building Blocks: Memory

Slide 67: Memory Scaling

Slide 68: Memory Types

Slide 69: NPU Building Blocks: CAM and Ternary CAM
[Figure: CAM operation, and Ternary CAM (T-CAM) operation.]

Slide 70: Memory Caching vs. CAM
[Figure: a cache, looked up by address, vs. a Content Addressable Memory (CAM), searched by content.]

Slide 71: Ternary CAMs
[Figure: a T-CAM as an associative memory of (value, mask) pairs feeding a priority encoder that selects the next hop. Example entries:]
  Value      Mask             Next hop
  10.0.0.0   255.0.0.0        R1
  10.1.0.0   255.255.0.0      R2
  10.1.1.0   255.255.255.0    R3
  10.1.3.0   255.255.255.0    R4
  10.1.3.1   255.255.255.255  R4
Using T-CAMs for classification: [figure continues].

Slide 72: IXP: A Building Block for Network Systems
- Example: IXP2800
  - 16 microengines + XScale core
  - Up to 1.4 GHz ME speed
  - 8 hardware threads per ME
  - 4K control store per ME
  - Multi-level memory hierarchy
  - Multiple inter-processor communication channels
- NPU vs. GPU tradeoffs:
  - Reduced core complexity
  - No hardware caching
  - Simpler instructions => shallow pipelines
  - Multiple cores with hardware multi-threading per chip
[Figure: the 16-microengine (MEv2 1-16) array with the Intel XScale core, RDRAM and QDR SRAM controllers, scratch memory, hash unit, PCI, media switch fabric interface, and per-engine memory, CAM, and signals interconnect.]

Slide 73: IXP2800 Features
- Half-duplex OC-192 / 10 Gb/s Ethernet network processor.
- XScale core: 700 MHz (half the ME clock); 32 KB instruction cache / 32 KB data cache.
- Media / switch fabric interface: 2 x 16-bit LVDS transmit and receive; configured as CSIX-L2 or SPI-4.
- PCI interface: 64-bit / 66 MHz interface for control; 3 DMA channels.
- QDR interface (with parity): four 36-bit SRAM channels (QDR or co-processor); Network Processor Forum LookAside-1 standard interface; using a "clamshell" topology, both memory and a co-processor can be instantiated on the same channel.
- RDRAM interface: three independent Direct Rambus DRAM interfaces; supports 4 banks or 16 interleaved banks; supports 16/32-byte bursts.

Slide 74: Hardware Features to Ease Packet Processing
- Ring buffers: for inter-block communication and synchronization (producer-consumer paradigm).
- Next-neighbor registers and signaling: allow single-cycle transfer of context to the next logical microengine, dramatically improving performance; simple, easy transfer of state.
- Distributed data caching within each microengine: allows all threads to keep processing even when multiple threads are accessing the same data.

Slide 75: XScale Core Processor
- Compliant with the ARM V5TE architecture: supports ARM's Thumb instructions and the Digital Signal Processing (DSP) enhancements to the instruction set.
- Intel improved the internal pipeline to improve the memory-latency-hiding abilities of the core.
- Does not implement the floating-point instructions of the ARM V5 instruction set.

Slide 76: Microengines: RISC Processors
- The IXP2800 has 16 microengines, organized into 4 clusters (4 MEs per cluster).
- The ME instruction set is specifically tuned for processing network data; the control store holds 4K 40-bit instructions.
- Instructions go through a six-stage pipeline and take one cycle to execute on average.
- Each ME has eight hardware-assisted threads of execution and can be configured to use either all eight threads or only four.
- A non-preemptive hardware thread arbiter swaps between threads in round-robin order.

Slide 77: MicroEngine v2
[Figure: microengine block diagram: 4K-instruction control store; two banks of 128 GPRs; 640-word local memory; 128-entry next-neighbor, S-transfer, and D-transfer register banks (in and out); local CSRs; CRC unit; timers and timestamp; a 16-entry CAM with LRU logic; and a 32-bit execution datapath (add, shift, logical, multiply, find-first-bit, CRC, pseudo-random number).]

Slide 78: Registers Available to Each ME
- Four types of registers: general purpose, SRAM transfer, DRAM transfer, and next-neighbor (NN).
- 256 32-bit GPRs: can be accessed in thread-local or absolute mode.
- 256 32-bit SRAM transfer registers: used to read/write all functional units on the IXP2xxx except the DRAM.
- 256 32-bit DRAM transfer registers: divided equally into read-only and write-only; used exclusively for communication between the MEs and the DRAM.
- Benefit of separating transfer registers from GPRs: the ME can continue processing with GPRs while other functional units read and write the transfer registers.

Slide 79: Different Types of Memory

  Memory           Logical width (bytes)  Size (bytes)  Approx. unloaded latency (cycles)  Special notes
  Local to ME      4                       2560          3                                   Indexed addressing, post incr/decr
  On-chip scratch  4                       16K           60                                  Atomic ops; 16 rings with atomic get/put
  SRAM             4                       256M          150                                 Atomic ops; 64-element queue array
  DRAM             8                       2G            300                                 Direct path to/from MSF

Slide 80: IXA Software Framework
[Figure: the XScale core runs control-plane protocol stacks, the Control Plane PDK, core components (Core Component Library), and the Resource Manager Library, written in C/C++; the microengine pipeline runs microblocks built from the Microblock, Utility, Protocol, and Hardware Abstraction libraries, written in Microengine C; external processors sit above the control plane.]

Slide 81: Microengine C Compiler
- C language constructs: basic types, pointers, bit fields.
- Aggregates: structs, unions, arrays.
- In-line assembly code support.

Slide 82: What Is a Microblock?
- Data-plane packet processing on the microengines is divided into logical functions called microblocks; these are coarse-grained and stateful.
- Examples: 5-tuple classification, IPv4 forwarding, NAT.
- Several microblocks running on a microengine thread can be combined into a microblock group.
- A microblock group has a dispatch loop that defines the dataflow for packets between microblocks; the group runs on each thread of one or more microengines. (A schematic dispatch loop is sketched below.)
- Microblocks can send packets to, and receive packets from, an associated XScale core component.
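A schematic dispatch loop, written as illustrative Python rather than Microengine C. The block names and packet fields are made up and this is not Intel's actual API; it only shows how a dispatch loop fixes the dataflow between microblocks in a group.

```python
# Three toy "microblocks" and a dispatch loop that wires them together.
def classify(pkt):
    return {**pkt, "class": "web" if pkt["dport"] == 80 else "other"}

def nat(pkt):
    return {**pkt, "src": "192.0.2.1"}          # rewrite the source address

def ipv4_forward(pkt):
    return {**pkt, "out_port": 1 if pkt["class"] == "web" else 2}

def dispatch_loop(source):
    """Drive each packet through the microblock group in a fixed order."""
    for pkt in source:
        pkt = classify(pkt)
        pkt = nat(pkt)
        pkt = ipv4_forward(pkt)
        yield pkt                                # hand off to transmit / next stage

packets = [{"src": "10.0.0.5", "dport": 80}, {"src": "10.0.0.9", "dport": 53}]
for out in dispatch_loop(packets):
    print(out)
```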

Slide 83: Core Components and Microblocks
[Figure: core components on the XScale core (built on the core libraries, Core Component Library, and Resource Manager Library) paired with microblocks on the microengines (built from the Microblock Library); blocks are either user-written code or Intel/3rd-party blocks.]

Slide 84: Debate About Network Processors
- The nail (packet processing) has these characteristics:
  1. Stream processing.
  2. Multiple flows.
  3. Most processing is on the header, not the data.
  4. Two sets of data: packets and context.
  5. Packets have no temporal locality, and special spatial locality.
  6. Context has temporal and spatial locality.
- The hammer (a conventional cached processor) has these characteristics:
  1. Shared in/out bus.
  2. Optimized for data with spatial and temporal locality.
  3. Optimized for register accesses.
[Figure: packet header/data and per-flow context flowing through a processor with data cache(s).]

