1 A Deficit Round Robin 20MB/s Layer 2 Switch
Muraleedhara Navada, Francois Labonte

2 Fairness in Switches
How can bandwidth be allocated fairly at the output link?
  – A simple FIFO favors greedy flows
Separate flows into FIFOs at the output
  – Bit-by-bit fair queuing
  – Weighted Fair Queuing allows a different weight for each flow
  – Packetized Weighted Fair Queuing (aka PGPS) calculates a departure time for each packet
[Figure: Output Queued Switch, Round-Robin bit-by-bit allocation; packets labeled 50, 100, 50, 150]
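
The bit-by-bit round robin named above can be illustrated with a small simulation. This is only a sketch, not the presenters' design: the function name `bit_by_bit_finish_times` and the data layout are assumptions made for illustration. Each backlogged flow is served one unit per round, and the time at which each packet completes is recorded.

```python
def bit_by_bit_finish_times(queues):
    """Serve one unit per backlogged flow per round (idealized fair queuing).

    queues: dict mapping a flow name to a list of positive packet lengths.
    Returns a list of (finish_time, flow, packet_index) in completion order.
    """
    remaining = {flow: list(pkts) for flow, pkts in queues.items()}
    index = {flow: 0 for flow in queues}
    clock = 0
    finished = []
    while any(remaining.values()):
        for flow in queues:              # fixed round-robin order
            if not remaining[flow]:
                continue                 # idle flows consume no link time
            remaining[flow][0] -= 1      # send one unit of the head packet
            clock += 1
            if remaining[flow][0] == 0:  # head packet fully sent
                remaining[flow].pop(0)
                finished.append((clock, flow, index[flow]))
                index[flow] += 1
    return finished


print(bit_by_bit_finish_times({"a": [3], "b": [1, 1]}))
```

With one 3-unit packet on flow `a` and two 1-unit packets on flow `b`, both of `b`'s packets finish before `a`'s single packet, because the link is shared evenly while both flows are backlogged.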

3 Deficit Round Robin
Packetized Weighted Fair Queuing is complicated to implement
Deficit Round Robin keeps track of credits for each flow
  – A flow sends according to its credits
  – Credits are added according to the flow's weight
  – Essentially PWFQ at a coarser level
[Figure: per-flow credits over time, for packets labeled 50, 100, 150 and credits of 75]
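
The credit mechanism above can be sketched in its textbook form (the deficit-counter scheme of Shreedhar and Varghese). This is a software illustration under assumed names (`drr_schedule`, `quantum`), not the hardware described in these slides. On each visit a backlogged flow gains its quantum of credits and sends head packets while it has enough credits; a flow that empties its queue forfeits leftover credits.

```python
from collections import deque


def drr_schedule(queues, quantum, rounds):
    """Run `rounds` passes of Deficit Round Robin.

    queues:  dict flow -> list of packet lengths waiting to be sent
    quantum: dict flow -> credits added per visit (the flow's weight)
    Returns the list of (flow, length) in transmission order.
    """
    backlog = {flow: deque(pkts) for flow, pkts in queues.items()}
    deficit = {flow: 0 for flow in queues}
    sent = []
    for _ in range(rounds):
        for flow in queues:
            q = backlog[flow]
            if not q:
                continue                      # skip idle flows
            deficit[flow] += quantum[flow]    # add credits by weight
            while q and q[0] <= deficit[flow]:
                length = q.popleft()
                deficit[flow] -= length       # spend credits on the packet
                sent.append((flow, length))
            if not q:
                deficit[flow] = 0             # no banking credits while idle
    return sent


order = drr_schedule({"a": [150], "b": [50, 100]}, {"a": 75, "b": 75}, rounds=2)
print(order)
```

In this run flow `a` cannot send its 150-unit packet in the first round (75 credits), so it carries the credits over and sends in the second round, while flow `b` sends 50 first and 100 once its carried-over credits suffice.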

4 NetFPGA System
8-port 10MB/s duplex Ethernet
Control FPGA (CFPGA) handles the physical interface (MAC)
Our design targets both User FPGAs (UFPGA)
[Figure: block diagram with CFPGA, UFPGA0, UFPGA1, 1MB SRAM, 10MB/s Ethernet]

5 Design Considerations
4 MACs behind each of the 8 ports
Each flow is a unique (Source Address, Destination Address) pair
  – ~1024 flows
Split across FPGAs
  – Each UFPGA reads incoming packets from a different set of ports (0-3 and 4-7)
  – Tradeoff between memory storage and fairness across all flows
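
A flow defined as a (source, destination) address pair can be mapped to a small integer ID with a lookup table. The sketch below is an assumption-laden illustration: the class `FlowTable`, its `capacity` default of 512 (one FPGA's share of the flows), and the overflow behavior are hypothetical, not taken from the design. One reading of the ~1024 figure is 32 addresses (4 MACs x 8 ports) giving 32 x 32 ordered pairs, but the slides do not state this.

```python
class FlowTable:
    """Assign a dense flow ID to each distinct (SA, DA) pair (hypothetical)."""

    def __init__(self, capacity=512):
        self.capacity = capacity   # assumed per-FPGA limit
        self.ids = {}              # (sa, da) -> flow ID

    def flow_id(self, sa, da):
        key = (sa, da)
        if key not in self.ids:
            if len(self.ids) >= self.capacity:
                raise OverflowError("flow table full")
            self.ids[key] = len(self.ids)  # next unused ID
        return self.ids[key]


table = FlowTable()
print(table.flow_id("00:00:00:00:00:01", "00:00:00:00:00:02"))
print(table.flow_id("00:00:00:00:00:02", "00:00:00:00:00:01"))  # reverse direction is a separate flow
```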

6 Memory Buffer Allocation
Static partitioning of the 1MB SRAM across 512 flows gives 2 kbytes per flow, which is less than 2 maximum-size packets
Need more dynamic allocation
  – Segments: a smaller size means less fragmentation, but more pointer and list handling overhead; 128 bytes was chosen
  – Keep a list of free segments
  – Save on-chip only the pointers to the head and tail of each flow
[Figure: linked list of packet segments P1 to P6]
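
The segment scheme above can be sketched as a pool of fixed-size segments chained by next pointers, with a free list and a head and tail pointer per flow. This is a minimal model under assumptions: the class `SegmentPool` and its method names are hypothetical, and it tracks only pointers, not packet data or packet boundaries. The constants follow the slide: 1 MB / 128 bytes gives 8192 segments, and 1 MB / 512 flows gives 2048 bytes per flow under static partitioning.

```python
SEGMENT_BYTES = 128
NUM_SEGMENTS = (1 << 20) // SEGMENT_BYTES   # 1 MB SRAM -> 8192 segments


class SegmentPool:
    """Linked-list buffer allocator (illustrative sketch only)."""

    def __init__(self, num_segments=NUM_SEGMENTS):
        # next_ptr[i] is the segment that follows i in whichever list holds i.
        self.next_ptr = list(range(1, num_segments)) + [None]
        self.free_head = 0 if num_segments else None
        self.free_count = num_segments
        self.head = {}   # flow -> first segment (kept "on-chip")
        self.tail = {}   # flow -> last segment  (kept "on-chip")

    def enqueue(self, flow, length):
        """Append enough segments to hold `length` bytes; return the count."""
        needed = -(-length // SEGMENT_BYTES)          # ceiling division
        if needed > self.free_count:
            raise MemoryError("not enough free segments")
        for _ in range(needed):
            seg = self.free_head                      # pop from the free list
            self.free_head = self.next_ptr[seg]
            self.free_count -= 1
            self.next_ptr[seg] = None
            if flow in self.tail:
                self.next_ptr[self.tail[flow]] = seg  # link after current tail
            else:
                self.head[flow] = seg                 # first segment of flow
            self.tail[flow] = seg
        return needed

    def dequeue_segment(self, flow):
        """Release the flow's head segment back to the free list."""
        seg = self.head[flow]
        nxt = self.next_ptr[seg]
        if nxt is None:
            del self.head[flow], self.tail[flow]      # flow is now empty
        else:
            self.head[flow] = nxt
        self.next_ptr[seg] = self.free_head           # push onto the free list
        self.free_head = seg
        self.free_count += 1
        return seg


pool = SegmentPool()
print(pool.enqueue("flow0", 1518))   # segments used by a 1518-byte frame
```

A 1518-byte frame occupies 12 segments here, so at most 127 bytes are wasted per packet, whereas static partitioning would reserve 2048 bytes per flow regardless of use.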

7 MAC Address Learning
Instead of being told which MAC addresses belong to which port, learn them from the source address
  – Note that our split-FPGA design (each FPGA reads from different ports) requires the FPGAs to communicate the MACs they learn to each other
When the destination MAC has not been learned yet, broadcast (send to all other ports)
  – So MAC learning implies broadcast capability
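
The learn-then-forward behavior above can be sketched as a table from MAC address to port. The class `MacTable` and its method names are hypothetical, and the sketch leaves out details a real bridge needs (entry aging, filtering frames whose destination is on the arrival port, and the sharing of learned addresses between the two FPGAs).

```python
class MacTable:
    """Learn source addresses; flood when the destination is unknown."""

    def __init__(self, num_ports=8):
        self.num_ports = num_ports
        self.port_of = {}           # MAC address -> port it was last seen on

    def forward(self, sa, da, in_port):
        """Return the list of output ports for a frame arriving on in_port."""
        self.port_of[sa] = in_port                  # learn from the source
        out_port = self.port_of.get(da)
        if out_port is None:
            # Destination not learned yet: send to all other ports.
            return [p for p in range(self.num_ports) if p != in_port]
        return [out_port]


table = MacTable(num_ports=4)
print(table.forward("aa", "bb", in_port=0))   # "bb" unknown: flood
print(table.forward("bb", "aa", in_port=2))   # "aa" was learned on port 0
print(table.forward("aa", "bb", in_port=0))   # "bb" is now known on port 2
```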

8 Read Operation
[Figure: block diagram. Blocks: Master Control, Packet Memory Manager, MAC Learning, Flow Assignment, DRR Engine, Control Handler, 1 MB SRAM, CFPGA Interface. Signals: DA, SA; Flow ID; Flow Tail; Length, ptr; Read, port; Share SA]

9 Write Operation
[Figure: block diagram. Blocks: Master Control, Packet Memory Manager, MAC Learning, Flow Assignment, DRR Engine, Control Handler, 1 MB SRAM, CFPGA Interface. Signals: Head, length; Next head, length, latency; Write, port; Port REQ; Port GNT; Data Ready]

10 DRR Engine
How to handle 512 flows and stay work conserving:
  – Only one flow active at any time
  – DRR allocation happens on dequeuing
  – FIFOs contain the next flow to be serviced for each port
Statistics per flow
  – Weight
  – Latency
  – Bytes sent
  – Packets sent
  – Packets active
[Figure: flow data SRAM (512 x 160 bits) feeding Port 0 FIFO through Port 7 FIFO]
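
One way to read the per-port FIFO idea is as the active-flow list of textbook DRR: the FIFO holds the IDs of backlogged flows, credits are added when a flow is dequeued for service, and a flow that is still backlogged goes back to the tail. The sketch below models a single output port under that reading. The names `Flow`, `PortScheduler`, and `service_next` are hypothetical, latency tracking is omitted, and the actual hardware may differ.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Flow:
    weight: int                      # credits added per service turn
    credits: int = 0
    bytes_sent: int = 0              # per-flow statistics
    packets_sent: int = 0
    packets: deque = field(default_factory=deque)   # queued packet lengths


class PortScheduler:
    """One output port: a FIFO of flow IDs waiting for service."""

    def __init__(self):
        self.flows = {}
        self.active = deque()        # next flows to be serviced

    def enqueue(self, flow_id, length, weight=1500):
        flow = self.flows.setdefault(flow_id, Flow(weight))
        if not flow.packets:
            self.active.append(flow_id)   # flow becomes backlogged
        flow.packets.append(length)

    def service_next(self):
        """Serve the flow at the FIFO head; return the packet lengths sent."""
        if not self.active:
            return []
        flow_id = self.active.popleft()
        flow = self.flows[flow_id]
        flow.credits += flow.weight       # allocation happens on dequeue
        sent = []
        while flow.packets and flow.packets[0] <= flow.credits:
            length = flow.packets.popleft()
            flow.credits -= length
            flow.bytes_sent += length
            flow.packets_sent += 1
            sent.append(length)
        if flow.packets:
            self.active.append(flow_id)   # still backlogged: back of the FIFO
        else:
            flow.credits = 0              # idle flows keep no credits
        return sent


port = PortScheduler()
port.enqueue(1, 150, weight=75)
port.enqueue(2, 50, weight=75)
port.enqueue(2, 100, weight=75)
while port.active:
    print(port.service_next())
```

Because only backlogged flows sit in the FIFO, the scheduler never scans idle flows, which is what keeps the per-packet work independent of the number of flows. A turn that returns an empty list (not enough credits yet) would simply be followed by the next turn.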

11 Conclusion
A Deficit Round Robin switch with 1k flows has been implemented
Provides dynamic memory buffer allocation, MAC learning and broadcast
Parallel design split across 2 chips
Gathers statistics on flows
