
ECE 526 – Network Processing Systems Design: Network Processor Architecture and Scalability. Chapters 13 and 14, D. E. Comer.


1 ECE 526 – Network Processing Systems Design
Network Processor Architecture and Scalability
Chapters 13 and 14: D. E. Comer

2 NP Architectures (Ning Weng, ECE 526)
Last class:
─ Key requirements of network processors: flexibility and scalability
─ Optimized instruction sets and parallel processing using multiprocessors
This class:
─ Internal organization of an NP: computation, storage, and communication; operating support; content addressable memory (CAM)
─ NP scaling issues

3 NP Architectures
NP architecture characteristics
─ Computation: processor hierarchy; special-purpose functional units
─ Storage: memory hierarchy; content addressable memory (CAM)
─ Communication: internal buses; external interfaces
─ Operation support: concurrent/parallel execution support; programming models; dispatch mechanisms

4 Processor Functionality

5 Processor Pyramid

6 Packet Flow through the Hierarchy
Accommodating tasks of different complexity and frequency
─ Low level: simple, frequent processing
─ High level: occasional, complex processing
Computation scales via
─ Faster processors
─ More concurrent threads
─ More processors
─ More processor types

7 Memory Hierarchy
Different memory technologies are used to balance performance, cost, and area.
Conventional approach:
─ Registers + cache + off-chip DRAM
─ Exploits locality, both temporal and spatial
─ Optimized for the average case
─ Transparent to the programmer
Network processors:
─ Registers, scratchpad, control store, onboard RAM, CAM/TCAM, SRAM, and SDRAM
─ Specialized for network processing applications, which exhibit little temporal locality
─ Explicit to the application developer: more difficult to program, but more control
─ The memory hierarchy is not cached but used explicitly

8 Memory Technology
Characterized by access latency and area
─ SRAM: 2–10 ns, 4–6 transistors per cell
─ DRAM: 50–70 ns, 1 or 3 transistors per cell
What data should be stored where?
─ Instruction data
─ Packet data: header, payload, and meta-data
─ Temporary data: data structures allocated on the stack
─ Application data: persistent data, e.g., routing table, rule file

9 Memory Size Example
Consider a network system that processes IP datagrams. Assume the system executes 5,000 instructions per packet, each instruction occupies 4 bytes, 10% of instructions access a 4-byte value in memory, each datagram consists of 1,500 bytes, a lookup examines 10 four-byte values in an IP routing table on average, and each datagram arrives and leaves in an Ethernet frame. Compute the total number of memory locations accessed to process one datagram. Assume no memory caching.
─ Instruction memory:
─ Packet memory:
─ Application memory:
─ Temporary memory:
Total:
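One way to fill in the blanks, as a sketch rather than the official solution: assume each 4-byte word access counts as one memory location, the datagram is copied into packet memory once on arrival and read out once on departure one word at a time, and the 10% data accesses hit temporary (stack) memory.

```python
INSTRUCTIONS_PER_PACKET = 5_000
DATAGRAM_BYTES = 1_500
WORD_BYTES = 4

# Instruction memory: one fetch per instruction executed.
instruction_accesses = INSTRUCTIONS_PER_PACKET                 # 5,000

# Packet memory: the datagram is written in on arrival and read out on
# departure, one 4-byte word at a time (an assumption; byte-wide access
# would quadruple this figure).
packet_accesses = 2 * DATAGRAM_BYTES // WORD_BYTES             # 750

# Application memory: the routing-table lookup examines 10 four-byte values.
application_accesses = 10

# Temporary memory: 10% of instructions access a 4-byte value, assumed to
# live on the stack.
temporary_accesses = INSTRUCTIONS_PER_PACKET // 10             # 500

total = (instruction_accesses + packet_accesses
         + application_accesses + temporary_accesses)
print(total)  # 6260 memory locations per datagram
```

Under these assumptions the total is 6,260 accesses per datagram; different accounting choices (byte vs. word accesses, whether the payload crosses memory twice) change the figure but not the method.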

10 Memory Scaling
Memory access time: raw access speed
─ Technology dependent
─ Important for random access
Memory bandwidth
─ Important for overall system performance
─ Scales with multiple ports, multiple banks, and a wider bus
─ Limited by pin count and packaging cost

11 Content Addressable Memory
Does not use an address to locate content; a CAM takes the content itself as the input to a query
Organized as an array of slots
Combines two mechanisms
─ Random access storage
─ Exact-match pattern search
Rapid search enabled by parallel hardware

12 Lookup using Conventional CAM
Given
─ A pattern for which to search, known as the key
CAM returns
─ The first slot that matches the key, or
─ All slots that match the key
Algorithm
for each slot do {
    if (key == slot)
        declare key matches slot;
    else
        declare key does not match slot;
}
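The hardware tests every slot in parallel; modeled sequentially in software, the algorithm looks like this (the `CAM` class and its method names are illustrative, not from the slides):

```python
class CAM:
    """Software model of a conventional (binary) CAM."""

    def __init__(self, slots):
        self.slots = slots  # list of stored patterns

    def lookup_first(self, key):
        """Return the index of the first slot matching the key, or None."""
        for i, slot in enumerate(self.slots):
            if key == slot:  # exact match over the entire slot
                return i
        return None

    def lookup_all(self, key):
        """Return the indices of every slot matching the key."""
        return [i for i, slot in enumerate(self.slots) if key == slot]


cam = CAM([0b1010, 0b0110, 0b1010])
print(cam.lookup_first(0b1010))  # 0
print(cam.lookup_all(0b1010))    # [0, 2]
```

The two methods correspond to the two return conventions on the slide: first-match and all-matches.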

13 Ternary CAM (TCAM)
Regular CAM
─ Binary values: 0 and 1
─ Requires the key to match the entire content of a slot
─ Not flexible
TCAM
─ Ternary values: 0, 1, and don't care
─ Implemented by masking entries
Well suited to flow classification in network processors

14 TCAM Lookup
Each slot has a bit mask
Hardware uses the mask to decide which bits to test
Algorithm
for each slot do {
    if ((key & mask) == (slot & mask))
        declare key matches slot;
    else
        declare key does not match slot;
}
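The masked comparison can be modeled in software as follows; each slot carries its own mask, where mask bits set to 1 are tested and bits set to 0 are don't-cares (a minimal sketch, not hardware-accurate):

```python
def tcam_lookup(key, slots):
    """Return the index of the first TCAM slot matching the key, or None.

    Each slot is a (value, mask) pair; mask bits set to 1 are tested,
    mask bits set to 0 are don't-cares.
    """
    for i, (value, mask) in enumerate(slots):
        if (key & mask) == (value & mask):
            return i
    return None


# Slot 0 matches any key whose top two bits are 10 (lower bits don't care);
# slot 1 requires an exact match on all four bits.
slots = [(0b1000, 0b1100), (0b0110, 0b1111)]
print(tcam_lookup(0b1011, slots))  # 0
print(tcam_lookup(0b0110, slots))  # 1
```

Setting every mask bit to 1 degenerates to the exact-match behavior of a conventional CAM.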

15 Partial Matching using TCAM
The key matched slot 1
The packet belongs to flow ID 00.02
Here the "additional information" is stored in each slot

16 Classification using TCAM
For flexibility, the "additional information" can be stored in a separate memory
Steps:
─ Extract values from fields in the headers
─ Concatenate the values into a contiguous string
─ Use the string as a key for the TCAM lookup
─ The matching slot stores the classification
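These steps can be sketched end to end, classifying on protocol number and destination port; the key layout, rule set, and flow names below are illustrative assumptions, not from the slides:

```python
def classify(proto, dst_port, tcam):
    """Classify a packet by TCAM lookup on (protocol, destination port).

    Key layout (assumed for this example): 8-bit protocol number followed
    by a 16-bit port. Each TCAM entry is (value, mask, flow_id); flow_id
    plays the role of the "additional information" stored with the slot.
    """
    key = (proto << 16) | dst_port          # form a contiguous key
    for value, mask, flow_id in tcam:
        if (key & mask) == (value & mask):  # ternary match
            return flow_id
    return None                             # no matching rule


# Rule 1: TCP (proto 6) to port 80 -> "web"; rule 2: any TCP -> "tcp-other".
tcam = [
    (6 << 16 | 80, 0xFFFFFF, "web"),
    (6 << 16,      0xFF0000, "tcp-other"),
]
print(classify(6, 80, tcam))   # web
print(classify(6, 443, tcam))  # tcp-other
```

Because slots are searched in order, more specific rules are placed before broader ones, which is how real TCAM rule tables express priority.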

17 Communication
Internal interfaces: channels between processing elements and memories
─ Internal bus
─ Hardware FIFO: sequential access
─ Transfer registers: random access
─ Onboard shared memory: shared random access
External interfaces
─ Memory interfaces: access to larger off-chip memory
─ Direct I/O interfaces: e.g., access to link interfaces
─ Bus interfaces: access to other devices, e.g., a control CPU
─ Switching-fabric interface: access to the switching fabric; several standards exist (e.g., CSIX from the NP Forum)

18 Communication Cost Example
Consider a second-generation network system that forwards IP datagrams. The system has 16 interfaces, each connected to an OC-192 line (data rate 10 Gbps), and the 16 interfaces are interconnected by a shared communication channel. Packet sizes range from 40 bytes to 1,500 bytes. What aggregate bandwidth is needed on the communication channel under each design scenario?
─ Every bit of a packet transfers through the shared communication channel.
─ Only a 4-byte packet memory address transfers through the shared communication channel.
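A hedged sketch of the arithmetic, assuming each packet (or address) crosses the shared channel once and that minimum-size packets give the worst case for per-packet overhead:

```python
PORTS = 16
LINE_RATE_BPS = 10e9       # OC-192: 10 Gbps per interface
MIN_PACKET_BITS = 40 * 8   # minimum-size packet, worst case for scenario 2
ADDRESS_BITS = 4 * 8       # 4-byte packet memory address

# Scenario 1: every bit of every packet crosses the shared channel once.
# (If each packet crosses twice, ingress and egress, double this figure.)
full_packet_bw = PORTS * LINE_RATE_BPS
print(full_packet_bw / 1e9)  # 160.0 Gbps

# Scenario 2: only a 4-byte address crosses per packet; the worst case is
# a stream of minimum-size (40-byte) packets on every line.
packets_per_sec = PORTS * LINE_RATE_BPS / MIN_PACKET_BITS  # 500 million pps
address_bw = packets_per_sec * ADDRESS_BITS
print(address_bw / 1e9)      # 16.0 Gbps
```

Passing only addresses cuts the required channel bandwidth by a factor of ten even in the worst case, which is why second-generation designs keep packet payloads in memory and move references instead.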

19 NP Operating Support
Programming model: interrupt/event-based vs. thread-based
Support for parallel and concurrent execution
Dispatch mechanism: how threads are initiated

20 Summary
NP scaling is achieved by
─ Heterogeneous multiprocessors structured hierarchically
─ Mixed memory technologies explicitly available to the programmer
─ Different communication mechanisms
─ Operating support, which is important for high system performance
NP scaling is limited by
─ Physical space: chip area (less than 400 mm²)
─ Pin limits and packaging technology
─ Power consumption and heat dissipation

21 For Next Class and Reminders
Read Comer: chapters 15 and 16
Homework solution online by Friday
Midterm: 10/6
Project
─ Topic finalized 10/5 (group leaders email me)
─ Proposal presentations 10/22

