Paper Review Building a Robust Software-based Router Using Network Processors.


ABSTRACT
Need for more services → software-based routers
Router: IXP1200 network processor development board + PC
Forwards 3.47 Mpps (minimum-size packets), or 1.77 Gbps aggregate
Hierarchical architecture:
Guarantees line-speed forwarding of simple packets
Extra capacity for exceptional packets on the P3 (310 Kpps, with 1510 cycles for each)

INTRODUCTION
Most network processors use parallelism.
IXP1200: 6 MicroEngines (MEs), each supporting up to 4 hardware contexts.
Router with a data plane (MEs) and a control plane (P3).
Processor hierarchy:
OSPF, updating routing tables, … [more cycles]
Packets that miss in the cache
Minimum packet processing, forwarding, … [fewer cycles]

ARCHITECTURE-Software
Components: classifier, forwarder, scheduler, input queue
Two default forwarders:
Minimal IP forwarding (fast path)
Full IP protocol (including IP options)
Two main attributes:
Explicit support for adding new forwarders at run time
Does not specify where in the processor hierarchy a forwarder runs

ARCHITECTURE-Hardware
IXP1200 evaluation system (200 MHz):
32 MB DRAM (64-bit, 100 MHz)
2 MB SRAM (32-bit, 100 MHz)
4 KB on-chip scratch memory
64-bit, 66 MHz IX bus
Ethernet ports (8 × 100 Mbps + 2 × 1 Gbps)
32-bit, 100 MHz PCI bus
4 KB ISTORE for each ME
4 KB I-cache for the StrongARM
A pair of FIFOs (16 slots × 64 bytes each)
DRAM rate = 6.4 Gbps
Send/receive bandwidth = 2 × (8 × 100 Mbps + 2 × 1 Gbps) = 5.6 Gbps
IX bus capacity = 4 Gbps
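The bandwidth figures on this slide follow from bus width times clock rate. A minimal sketch of that arithmetic, where the helper name `bus_gbps` is an illustrative assumption and not something from the paper:

```python
def bus_gbps(width_bits: int, clock_mhz: float) -> float:
    """Peak bus rate in Gbps: width in bits times clock in MHz, scaled to Gbps."""
    return width_bits * clock_mhz / 1000.0

dram_gbps = bus_gbps(64, 100)    # 64-bit DRAM interface at 100 MHz
ix_bus_gbps = bus_gbps(64, 66)   # 64-bit IX bus at 66 MHz

# Ports: 8 x 100 Mbps plus 2 x 1 Gbps, counted once for send and once for receive.
port_gbps = 2 * (8 * 0.1 + 2 * 1.0)

print(f"DRAM {dram_gbps:.1f} Gbps, IX bus {ix_bus_gbps:.2f} Gbps, ports {port_gbps:.1f} Gbps")
```

The raw IX bus product is 4.224 Gbps, which the slide rounds to 4 Gbps. Either way, the combined send/receive port bandwidth (5.6 Gbps) is larger than what the IX bus can carry.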

Forwarding Pipeline
The common unit is the 64-byte MAC-packet (MP).
The MAC breaks each packet into MPs and tags each one as the first, intermediate, last, or only MP of the packet.
Slots are allocated to the MACs; the input FIFO is drained and the output FIFO is filled.
Can the MEs move an MP from the input FIFO to the output FIFO in a single step?
Two-stage pipeline:
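The MP tagging described above can be sketched in a few lines. This is only an illustration of the first/intermediate/last/only rule; the function name `split_into_mps` and the string tags are assumptions, and on the real board the MAC hardware does this work.

```python
MP_SIZE = 64  # bytes per MAC-packet (MP), as stated on the slide

def split_into_mps(packet: bytes) -> list[tuple[str, bytes]]:
    """Split a packet into 64-byte MPs, each tagged by its position in the packet."""
    chunks = [packet[i:i + MP_SIZE] for i in range(0, len(packet), MP_SIZE)]
    if len(chunks) == 1:
        return [("only", chunks[0])]
    tagged = []
    for idx, chunk in enumerate(chunks):
        if idx == 0:
            tag = "first"
        elif idx == len(chunks) - 1:
            tag = "last"
        else:
            tag = "intermediate"
        tagged.append((tag, chunk))
    return tagged

# A minimum-size 64-byte packet is a single MP; a 200-byte packet needs four.
print([tag for tag, _ in split_into_mps(bytes(64))])
print([tag for tag, _ in split_into_mps(bytes(200))])
```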

Input Processing
INPUT_LOOP:
    acquire_input_mutex()
    if (!port_rdy(p)) goto INPUT_LOOP
    load IN_FIFO[c]
    release_input_mutex()
    mp_addr = calculate_mp_addr()
    copy reg_mp_data ← IN_FIFO[c]
    state = protocol_processing(reg_mp_data)
    copy reg_mp_data → DRAM[mp_addr]
    if (at_start_of_packet(state))
        enqueue(state, state.queue)
    goto INPUT_LOOP
protocol_processing for IP:
Validate the header
Update the TTL
Recompute the checksum
Set the source and destination MAC addresses
Determine the destination queue
Strict FIFO slot and context binding
Minimal forwarder: one-cycle hardware hash
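The header steps listed for IP (validate, update TTL, recompute checksum) can be illustrated with a short sketch. It uses the standard IPv4 one's-complement header checksum and recomputes it from scratch for clarity; the function names are assumptions, and the slides do not say whether the MicroEngine code recomputes or incrementally updates the checksum.

```python
import struct

def ip_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, then complemented."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_header(header: bytes) -> bytes:
    """Validate the checksum, decrement the TTL, and write a fresh checksum."""
    if ip_checksum(header) != 0:             # a valid header sums to zero
        raise ValueError("bad header checksum")
    hdr = bytearray(header)
    if hdr[8] <= 1:                          # TTL is byte 8 of the IPv4 header
        raise ValueError("TTL expired")
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"                 # zero the checksum field before summing
    struct.pack_into("!H", hdr, 10, ip_checksum(bytes(hdr)))
    return bytes(hdr)

# Example 20-byte header with TTL 0x40 and a valid checksum (0xb861).
example = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
forwarded = forward_header(example)
print(forwarded.hex())
```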

Scheduling & Buffering
A queue that is serviced by the StrongARM
Statically allocates a set of contexts to run the input loop: 16 input contexts
Token passing (a hardware signaling mechanism) serializes DMA access.
Buffer scheduling:
16 MB of DRAM (8192 buffers of 2 KB each), consumed in a circular fashion
A shared state variable
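Circular consumption of the buffer pool can be sketched as a single shared index that wraps around. The class name, the zero base address, and the absence of any free operation are assumptions made for illustration; the slide only states the pool size and that one shared state variable is used.

```python
NUM_BUFFERS = 8192          # from the slide
BUFFER_SIZE = 2 * 1024      # 2 KB per buffer, from the slide
DRAM_BASE = 0               # hypothetical base address of the buffer region

class CircularBufferPool:
    """Hands out fixed-size buffers in order, wrapping after the last one."""

    def __init__(self) -> None:
        self.next_index = 0  # the shared state variable

    def allocate(self) -> int:
        """Return the address of the next buffer and advance the shared index."""
        addr = DRAM_BASE + self.next_index * BUFFER_SIZE
        self.next_index = (self.next_index + 1) % NUM_BUFFERS
        return addr

pool = CircularBufferPool()
print(pool.allocate(), pool.allocate())
print(NUM_BUFFERS * BUFFER_SIZE // (1024 * 1024), "MB in total")
```

In this sketch a buffer is simply reused once the index wraps, so a packet has to leave the router before 8192 later allocations have occurred.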

Output Processing
OUTPUT_LOOP:
    acquire_output_mutex()
    release_output_mutex()
    if (finished_last_packet)
        qid = select_queue()
        state = dequeue(qid)
        mp_addr = first_mp(state)
    else
        mp_addr = next_mp(state)
    fifo_addr = calculate_fifo_addr()
    copy DRAM[mp_addr] → OUT_FIFO[fifo_addr]
    enable OUT_FIFO[fifo_addr]
    finished_last_packet = at_end_of_packet(state)
    goto OUTPUT_LOOP
select_queue() picks a non-empty queue from that port's queues (scheduling).

Queuing
Queues are assigned statically to output contexts.
An output context keeps its queues in 16 registers, not in scratch memory.
Multiple queues: which one is served next? Decided by prioritizing the queues.
Queues: circular arrays of 32-bit entries in SRAM.
Contention:
1. Use mutexes.
2. Have a queue for each input at each output → single priority level
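One simple way to read "prioritizing queues" is strict priority: always serve the highest-priority queue that has something in it. The sketch below assumes that a lower index means higher priority and models the queues as Python deques; neither detail comes from the slides.

```python
from collections import deque
from typing import Optional, Sequence

def select_queue(queues: Sequence[deque]) -> Optional[int]:
    """Return the index of the highest-priority non-empty queue, or None if all are empty."""
    for qid, queue in enumerate(queues):   # index 0 is treated as highest priority
        if queue:
            return qid
    return None

# Three queues for one port; only the two lower-priority ones hold packets.
port_queues = [deque(), deque(["pkt-a"]), deque(["pkt-b", "pkt-c"])]
qid = select_queue(port_queues)
if qid is not None:
    print(qid, port_queues[qid].popleft())
```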

Queuing [cont]
I.2 + O.1
I.2 + O.3: maximum flexibility
I.1 + O.3: slower rate

Evaluation
For one MP:
280 cycles for register operations
180 (DRAM) + 90 (SRAM) + 160 (scratch) = 430 cycles for memory
Sum = 710 cycles = 3550 ns (at 200 MHz)
3.47 Mpps → one packet completes every 288 ns
Result: 3550 / 288 ≈ 12, so the system works on about 12 packets in parallel
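The arithmetic on this slide can be checked directly. The sketch below uses only the numbers given in the slides; the one added assumption is that a minimum-size packet is 64 bytes (one MP), which is what makes the 1.77 Gbps aggregate figure from the abstract come out.

```python
CLOCK_MHZ = 200
register_cycles = 280
memory_cycles = 180 + 90 + 160            # DRAM + SRAM + scratch
total_cycles = register_cycles + memory_cycles

ns_per_cycle = 1000 / CLOCK_MHZ           # 5 ns per cycle at 200 MHz
latency_ns = total_cycles * ns_per_cycle  # time spent on one MP

rate_mpps = 3.47
interarrival_ns = 1000 / rate_mpps        # one packet completes this often
in_flight = latency_ns / interarrival_ns  # packets being worked on at once

# Assumption: minimum-size packets are 64 bytes.
aggregate_gbps = rate_mpps * 64 * 8 / 1000

print(total_cycles, latency_ns, round(interarrival_ns), round(in_flight, 1))
print(round(aggregate_gbps, 2))
```

Because one MP takes far longer than the per-packet budget, the forwarding rate depends on keeping roughly a dozen packets in progress across the MicroEngine contexts.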

Switching Paths
Path A: forwards packets at the maximum rate of 3.47 Mpps
Path B: forwards packets at 526 Kpps
Path C: forwards packets at 534 Kpps (500 cycles per packet)
The StrongARM is involved too.
No additional tasks for the MEs.
PRIORITY

StrongARM
OS on the StrongARM:
1. Acts as a bridge that forwards packets to the P3
2. Supports a small collection of local forwarders
Simple priority scheme: gives priority to packets being passed to the P3 over packets that are to be processed locally.
Deciding which forwarders to run here is complicated:
It supports the Pentium
It shares resources with the MEs and can act like them

Virtual Router Processor
The MEs are statically split between two tasks:
A router infrastructure (RI) that is able to forward minimum-sized packets
A virtual router processor (VRP) that runs additional code on behalf of each packet
protocol_processing runs on this abstract machine.

Interfacing & Implementation
The StrongARM interacts with the MEs:
fid = install(key, fwdr, size, where)
remove(fid)
data = getdata(fid)
setdata(fid, data)
install: installs the forwarder fwdr for packets that match key, with the specified flow size; where indicates the processor.
Key: (src addr, src port, dst addr, dst port)
Where:
ME: loaded from the StrongARM into the ME's ISTORE
SA: loaded into DRAM
PE: loaded into the Pentium jump table
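To make the shape of this four-call interface concrete, here is a toy in-memory model. It is not the router's implementation: the class `ForwarderTable` is hypothetical, nothing is actually loaded onto a processor, and it assumes that `size` is the number of bytes of per-forwarder state read and written by `getdata` and `setdata`, which the slides do not spell out.

```python
from dataclasses import dataclass
from itertools import count

VALID_LEVELS = {"ME", "SA", "PE"}   # the three values of `where` named on the slide

@dataclass
class Entry:
    key: tuple      # (src addr, src port, dst addr, dst port)
    fwdr: object    # the forwarder code; opaque in this model
    size: int       # assumed: bytes of state attached to the forwarder
    where: str
    data: bytes

class ForwarderTable:
    """Toy model of install / remove / getdata / setdata."""

    def __init__(self) -> None:
        self._entries: dict[int, Entry] = {}
        self._ids = count(1)

    def install(self, key: tuple, fwdr: object, size: int, where: str) -> int:
        if where not in VALID_LEVELS:
            raise ValueError(f"unknown processor level: {where}")
        fid = next(self._ids)
        self._entries[fid] = Entry(key, fwdr, size, where, bytes(size))
        return fid

    def remove(self, fid: int) -> None:
        del self._entries[fid]

    def getdata(self, fid: int) -> bytes:
        return self._entries[fid].data

    def setdata(self, fid: int, data: bytes) -> None:
        entry = self._entries[fid]
        if len(data) != entry.size:
            raise ValueError("data does not match the installed state size")
        entry.data = bytes(data)

table = ForwarderTable()
flow_key = ("10.0.0.1", 1234, "10.0.0.2", 80)
fid = table.install(flow_key, fwdr="count_packets", size=4, where="ME")
table.setdata(fid, (7).to_bytes(4, "big"))
print(int.from_bytes(table.getdata(fid), "big"))
table.remove(fid)
```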

Interfacing
Some data forwarders:

Conclusions
Shows how to program the processor hierarchy with a fixed forwarding infrastructure that fully exploits the parallelism available on the IXP1200 MicroEngines.
Demonstrates how new functionality can be injected into all three levels of the processor hierarchy.
Statically partitions the processing capacity of the MicroEngines into a fixed routing infrastructure and a programmable VRP.
Can be used in many designs.
