John DeHart and Mike Wilson SPP V2 Router Design.


1 John DeHart and Mike Wilson SPP V2 Router Design

2 2 - Mike Wilson - 10/14/2015 Revision History
3 June 2008
  »Initial release, presentation
25 June 2008
  »Updates on feedback from presentation

3 SPP Versions
SPP Version 0:
  »What we used for SIGCOMM Paper
SPP Version 1:
  »Bare minimum we would need to release something to PlanetLab Users
SPP Version 2:
  »What we would REALLY like to release to PlanetLab users.

4 Objectives for SPP-NPE version 2
Deal with constraints imposed by switch
  »can send to only one NPU; can receive from only one NPU
  »split processing across NPUs
    parsing, lookup on one; queueing on other
Provide more resources for slice-specific processing
Decouple QM schedulers from links
  »collection of largely independent schedulers
  »may use several to send to the same link
    e.g. separate rate classes (1-10M, 10-100M, 100-1000M)
    optionally adjust scheduler rates dynamically
Provide support for multicast
  »requires addition of next-hop IP address after queueing
Enable single slice to operate at 10 Gb/s
Support “slow” code options
  »Use separate rate classes to limit rate to slow code options
  »LCI QMs for Parse, NPUB QMs for HdrFmt

5 SPP Version 2 System Architecture
[Figure: system architecture with the Default Data Path highlighted. Components shown: LC 7010 Blade (RTM with 1×10 Gb/s or 10×1 Gb/s, LC Ingress, LC Egress, FIC, SPI Switch), Switch Blade, GPE Blade, and NPE 7010 Blade (FIC, SPI Switch, NPUA, NPUB). Pipeline stages shown: Decap, Parse, Lookup, AddShim, Copy, QM, HdrFormat.]

6 SPP Version 2 System Architecture
[Figure: same system architecture as slide 5, with the Fast-Path Data path highlighted.]

7 SPP Version 2 System Architecture
[Figure: same system architecture as slide 5, with the Exception Data Path and Local Delivery highlighted.]

8 NPE Version 2 Block Diagram
[Figure: block diagram of the two NPUs.
  NPUA: RxA (2 ME), Decap (1 ME), Parse (8 ME), LookupA (1 ME), AddShim (1 ME), TxA (2 ME), StatsA (1 ME)
  NPUB: RxB (2 ME), LookupB & Copy (2 ME), Queue Manager (4 MEs), HdrFmt/SubEncap (4 MEs), Scr2NN/Freelist (1 ME), TxB (2 ME), StatsB (1 ME)
  Memories shown: TCAM and SRAM (including banks labeled SRAM/0 and SRAM/3)
  Both NPUs attach to an SPI Switch, which connects to the Switch Blade and GPE.]

9 NPE Version 2 Block Diagram
[Figure: same block diagram as slide 8, without the SRAM bank numbers.]

10 PlanetLab NPE Input Frame from LC
Ethernet Header:
  »DstAddr: MAC address of NPE
  »SrcAddr: MAC address of LC
  »VLAN: One VLAN per MR (MR == Slice)
    Only use lower 12 bits of VLAN Tag
IP Header:
  »Dst Addr: IP address of this node
    How many IP addresses can a node have?
  »Src Addr: IP address of previous hop
  »Protocol: UDP
UDP Header:
  »Dst Port: Identifies input tunnel
  »Src Port: with IP Src Addr identifies sending entity
[Figure: frame layout, marked at 8-byte boundaries, assuming no IP Options]
  Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
  IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
  UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
  UDP Payload (MN Packet)
  Ethernet Trailer: PAD (nB), CRC (4B)

11 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data formats.]
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)
  Port (4b), Reserved (12b), Eth. Frame Len (16b)

12 RxA
No change from V1

13 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data formats.]
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b) (shown twice)
  Port (4b), Reserved (12b), Eth. Frame Len (16b)
  Rx UDP DPort (16b), Slice ID (VLAN) (16b)
  MN Frm Offset (16b), MN Frm Length (16b)
  Rx IP SAddr (32b)
  Reserved (12b), Rx UDP SPort (16b), Code (4b)
  Slice Data Ptr (32b)

14 Decap
Inputs:
  »Packet from RxA
Outputs:
  »Meta-frame (handle, offset and length)
  »Slice ID (VLAN tag)
    …or is this lower 12b of VLAN tag and lower 4b of RX DA in?
  »Metainterface (Rx Saddr, Rx Sport, Rx Dport)
  »Code Option (4b, only 16 available)
  »Slice data pointer
Initialization:
  »VLAN table
Functionality:
  »Read VLAN tag from DRAM, determine correct code option.
  »Validate packet. Drop invalid, unmatched packets.
    IP Options for NPE dropped in LC, should never arrive here!
  »Enqueue valid packets to SRAM ring.
  »Update stats
Status:
  »Change dl_sink from NN to SRAM.
  »No longer need to update buffer descriptor.
    …except for min-sized packets, which RxA does not update fully (pkt len)

15 VLAN table
[Table: columns are VLAN, code_opt, slice_data_ptr, slice_data_size. Cell values as extracted: 0000 1000 ………… 0x0aa1 ………… 0x7ff000 … The slice_data_ptr points to a region labeled SD data, P data.]
code_option = 0 implies invalid slice
  »“on switch” for a slice in the data plane
SD data is currently only counters
64B slice data
Only use lower 12b of VLAN tag (4096 VLANs)
Only changes from V1:
  »No longer need all data on NPUA, drop HF data, per-slice buffer limits
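A minimal sketch of the VLAN table lookup described on this slide, written in Python for readability rather than microengine code. The names VlanEntry, make_vlan_table and decap_lookup are hypothetical, and the example entry values are made up for illustration. Only the two rules stated on the slide are modeled: the table is indexed by the lower 12 bits of the VLAN tag, and code_opt = 0 marks an invalid slice.

```python
from dataclasses import dataclass
from typing import List, Optional

VLAN_MASK = 0x0FFF  # only the lower 12 bits of the VLAN tag are used (4096 VLANs)


@dataclass
class VlanEntry:
    code_opt: int         # 0 means the slice is invalid (switched off)
    slice_data_ptr: int   # address of the per-slice data
    slice_data_size: int  # size of the per-slice data in bytes


def make_vlan_table() -> List[VlanEntry]:
    """Build a 4096-entry table with every slice initially invalid."""
    return [VlanEntry(0, 0, 0) for _ in range(VLAN_MASK + 1)]


def decap_lookup(table: List[VlanEntry], vlan_tag: int) -> Optional[VlanEntry]:
    """Return the entry for this VLAN tag, or None if the packet should be dropped."""
    entry = table[vlan_tag & VLAN_MASK]
    if entry.code_opt == 0:
        return None  # invalid slice: Decap drops the packet
    return entry


# Hypothetical example entry, not taken from the slide.
table = make_vlan_table()
table[0x0AA] = VlanEntry(code_opt=1, slice_data_ptr=0x7FF000, slice_data_size=64)
```

Because the upper four tag bits are masked off, a tag of 0xF0AA resolves to the same entry as 0x0AA.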

16 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data formats.]
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b) (shown twice)
  Rx UDP DPort (16b), Slice ID (VLAN) (16b)
  MN Frm Offset (16b), MN Frm Length (16b)
  Rx IP SAddr (32b)
  Reserved (12b), Rx UDP SPort (16b), Code (4b)
  Slice Data Ptr (32b)
  Lookup Key[143-112]: Type (1b) / Slice ID (15b) / Rx UDP DPort (16b)
  Lookup Key[111-80]: DA (32b)
  Lookup Key[79-48]: SA (32b)
  Lookup Key[47-16]: Ports (32b)
  Lookup Key[15-0]: Proto/TCP_Flags (16b), with Exception Bits (12b) and Code (4b)
  MN Frm Length (16b), MN Frm Offset (16b)

17 Parse
Inputs:
  »Meta-frame (handle, offset and length)
  »Slice ID (VLAN tag)
  »Tunnel ID (Rx Saddr, Rx Sport, Rx Dport)
  »Code Option (4b, only 16 available)
  »Slice data pointer
Outputs:
  »Meta-frame (handle, offset and length)
  »Lookup key (Includes slice ID, Rx UDP dport)
    Change to include lower 4b of RX DA in; shave VLAN bits for the SliceID.
  »Code Option (4b, only 16 available)
  »Exception bits (MN-specific)
Initialization:
  »Slice Data
Functionality:
  »Slice-specific processing:
    Parse meta-frame.
    Extract lookup key.
    Raise any relevant exceptions.
    Can pass slice data to HdrFmt in bytes 16..30 of packet. (0..15 are reserved for AddShim)
  »Substrate processing:
    Add substrate-specific information to lookup key (32b: Lookup type, Slice ID, Rx UDP dport)
Status:
  »Change to multi-ME synchronization
  »Read, write to SRAM rings
  »No longer need all V1 outputs; some have been removed and the rest compacted. (This change is optional, but may remove a memory access)
    Removed: Slice data pointer, Rx UDP sport, Rx IP Saddr
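A sketch of how the 144-bit lookup key could be assembled from the fields named in the block diagram annotations (Lookup Key[143-112] down to Lookup Key[15-0]). The function name pack_lookup_key is hypothetical. Two details are assumptions not stated on the slides: the source port is placed above the destination port inside the 32b Ports word, and the protocol occupies the upper byte of the 16b Proto/TCP_Flags field.

```python
def pack_lookup_key(lookup_type, slice_id, rx_udp_dport,
                    da, sa, sport, dport, proto, tcp_flags):
    """Pack the lookup key fields into one 144-bit integer."""
    # Substrate word, key bits [143-112]: Type (1b) / Slice ID (15b) / Rx UDP DPort (16b)
    substrate = (((lookup_type & 0x1) << 31)
                 | ((slice_id & 0x7FFF) << 16)
                 | (rx_udp_dport & 0xFFFF))
    # Ports word, key bits [47-16]. Assumption: source port in the upper half.
    ports = ((sport & 0xFFFF) << 16) | (dport & 0xFFFF)
    # Key bits [15-0]. Assumption: protocol in the upper byte, TCP flags in the lower.
    proto_flags = ((proto & 0xFF) << 8) | (tcp_flags & 0xFF)
    return ((substrate << 112)
            | ((da & 0xFFFFFFFF) << 80)   # key bits [111-80]: DA
            | ((sa & 0xFFFFFFFF) << 48)   # key bits [79-48]: SA
            | (ports << 16)
            | proto_flags)


# Hypothetical example values.
key = pack_lookup_key(lookup_type=1, slice_id=0x0AA, rx_udp_dport=5000,
                      da=0x0A000001, sa=0x0A000002,
                      sport=1234, dport=80, proto=6, tcp_flags=0x02)
```

Slice-specific Parse code would supply the lower 112 bits; the substrate adds the top 32 bits, matching the split between slice-specific and substrate processing on this slide.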

18 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data formats.]
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b) (shown twice)
  Lookup Key[143-112]: Type (1b) / Slice ID (15b) / Rx UDP DPort (16b)
  Lookup Key[111-80]: DA (32b)
  Lookup Key[79-48]: SA (32b)
  Lookup Key[47-16]: Ports (32b)
  Lookup Key[15-0]: Proto/TCP_Flags (16b), with Exception Bits (12b) and Code (4b)
  MN Frm Length (16b), MN Frm Offset (16b)
  Result Index (32b)
  Exception Bits (12b), Slice ID (VLAN) (16b), Code (4b)
  Rsvd (16b), Stats Index (16b)

19 LookupA
Inputs:
  »Meta-frame (handle, offset and length)
  »Lookup key (Includes slice ID, Rx UDP dport)
  »Slice ID (VLAN tag)
  »Code Option (4b, only 16 available)
Outputs:
  »Meta-frame (handle, offset and length)
  »Lookup Result (Index into SRAM table on NPUB)
    32b is overkill; some of these bits are reserved.
  »Slice ID (VLAN tag)
  »Code Option (4b, only 16 available)
  »Exception bits (from Parse)
  »Stats Index (from TCAM)
Initialization:
  »Filters set in TCAM by control
Functionality:
  »Look up key in TCAM
  »On miss, drop the packet
Status:
  »Local Delivery is now handled at LookupB in SRAM table
  »Lookup result is now just a 32b index
  »No longer need all V1 input/outputs; some have been removed and the rest compacted. (This change is optional)

20 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data formats.]
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b) (shown twice)
  Result Index (32b)
  Exception Bits (12b), Slice ID (VLAN) (16b), Code (4b)
  MN Frm Length (16b), MN Frm Offset (16b)
  Rsvd (16b), Stats Index (16b)

21 AddShim
Inputs:
  »Meta-frame (handle, offset and length)
  »Lookup Result (Index into SRAM table on NPUB)
  »Slice ID (VLAN tag)
  »Code Option (4b, only 16 available)
  »Exception bits (from Parse)
  »Stats Index (from TCAM)
Outputs:
  »Shim Packet (buffer handle)
    Buffer descriptor contains updated offset and length, if needed
Initialization:
  »None.
Functionality:
  »Prepend shim header to preserve packet annotations across NPUs
  »Overwrite the existing Ethernet header (up to 18B) with:
    Slice ID (16b)
    Code Option (4b)
    Exception Bits (12b)
    MN Frame Offset (16b)
    Result Index (32b)
    Stats Index (16b) [This is the same on NPUA, NPUB]
    32B for opaque slice data.
      Ø Proper memory alignment required
      Ø This is written by Parse, not AddShim!
Status:
  »New. Stub version is written.

22 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data format.]
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

23 TxA
Sends shim packet to NPUB.
Unmodified 10 Gbps Tx 2×ME.

24 SPP Version 2 NPUA to NPUB Frame
SHIM (16B)
  »Slice ID (16b)
  »Code Option (4b)
  »Exception Bits (12b)
  »Result Index (32b)
  »Stats Index (16b)
  »Offset of MN Packet (16b)
  »Memory Alignment Padding (4B)
IP Header, UDP Header may be overwritten by:
  »opaque slice data, written in Parse
[Figure: frame layout, marked at 8-byte boundaries, assuming no IP Options]
  SHIM (16B), Type=IP (2B)
  IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
  UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
  UDP Payload (MN Packet)
  Ethernet Trailer: PAD (nB), CRC (4B)
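A sketch of packing and unpacking the 16-byte shim listed on this slide, using Python's struct module. The field order follows the bullet list above; the big-endian byte order, the placement of the Code Option in the upper 4 bits of its 16b word, and the names pack_shim and unpack_shim are assumptions made for illustration.

```python
import struct

# Slice ID (16b) | Code Option (4b) + Exception Bits (12b) | Result Index (32b)
# | Stats Index (16b) | MN packet offset (16b) | 4 bytes of alignment padding.
# Assumption: big-endian, fields in the order listed on the slide.
SHIM_FORMAT = ">HHIHH4x"
SHIM_LEN = struct.calcsize(SHIM_FORMAT)  # 16 bytes


def pack_shim(slice_id, code_option, exception_bits,
              result_index, stats_index, mn_offset):
    """Build the 16-byte shim that AddShim writes over the Ethernet header."""
    code_exc = ((code_option & 0xF) << 12) | (exception_bits & 0xFFF)
    return struct.pack(SHIM_FORMAT,
                       slice_id & 0xFFFF,
                       code_exc,
                       result_index & 0xFFFFFFFF,
                       stats_index & 0xFFFF,
                       mn_offset & 0xFFFF)


def unpack_shim(data):
    """Recover the shim fields on the receiving NPU."""
    slice_id, code_exc, result_index, stats_index, mn_offset = struct.unpack(
        SHIM_FORMAT, data[:SHIM_LEN])
    return {
        "slice_id": slice_id,
        "code_option": code_exc >> 12,
        "exception_bits": code_exc & 0xFFF,
        "result_index": result_index,
        "stats_index": stats_index,
        "mn_offset": mn_offset,
    }


# Hypothetical example values.
shim = pack_shim(0x0AA, 3, 0x005, 0x1234, 7, 42)
fields = unpack_shim(shim)
```

The field widths add up to 12 bytes, so the 4 bytes of padding are what bring the shim to the 16B stated on the slide.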

25 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data formats.]
  Buffer Handle (24b), Reserved (8b)
  Port (4b), Reserved (12b), Eth. Frame Len (16b)

26 RxB
No change from V1

27 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data formats.]
  Buffer Handle (24b), Reserved (8b) (shown twice)
  Port (4b), Reserved (12b), Eth. Frame Len (16b)
  Frame Length (16b), Stats Index (16b)
  Reserved (12b), PerSchedQID (15b), Sch (3b), QM (2b)

28 LookupB/Copy
Inputs:
  »Shim packet (buffer handle, frame length)
Outputs:
  »Packet (buffer handle, frame length)
  »QueueID (QM, Scheduler, Queue ID)
  »Stats Index
Initialization:
  »ResultTable
Functionality (Overview)
  »Copy shim header into buffer descriptor
  »Look up routing information from result index
  »If multicast, make the copies
  »Enqueue to correct QM (from ResultTable)

29 LookupB/Copy – Code Sketch
if not currently processing mcast packet
    read packet from SRAM ring
    extract shim
    load ResultTable value
    fill buffer descriptor
    if unicast
        if per-slice packet limit permits
            update per-slice packet count
            write to SRAM ring for correct QM (by qmschedID in result table value)
        else
            drop buffer
    else
        start mcast processing
        if per-slice packet limit permits
            update per-slice packet count
            fetch first header buffer descriptor
            if payload length ≠ 0
                write ref count into payload descriptor
            else
                drop payload buffer
        else
            drop buffer
            finish mcast processing
else (currently processing buffer, have empty header buffer handle)
    fill header buffer descriptor
    only chain if payload buffer is not empty
    if still making copies
        fetch next header buffer descriptor
    else
        finish mcast processing
    write current header buffer handle to SRAM ring for correct QM (by qmschedID)
signal next ME
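The code sketch on this slide can be modeled in a simplified, runnable form. The version below is an illustration under stated assumptions, not the microengine implementation: it makes all multicast copies in one call instead of one copy per loop iteration, it omits the zero-length-payload case, and every name in it (lookup_b_copy, the "copies" list in each result entry, the dictionary-based descriptors) is hypothetical.

```python
from collections import deque


def lookup_b_copy(shim, result_table, slice_pkt_count, slice_pkt_limit, qm_rings):
    """Enqueue one packet (or its multicast copies); return the number of buffers enqueued."""
    entry = result_table[shim["result_index"]]
    slice_id = shim["slice_id"]

    # Per-slice packet limit: drop the buffer if the slice is at its limit.
    if slice_pkt_count.get(slice_id, 0) >= slice_pkt_limit:
        return 0
    slice_pkt_count[slice_id] = slice_pkt_count.get(slice_id, 0) + 1

    copies = entry["copies"]  # list of (qm_id, qid); one element means unicast
    payload = {"handle": shim["buffer_handle"], "ref_cnt": len(copies)}

    if len(copies) == 1:
        # Unicast: the payload buffer itself goes to the ring of the chosen QM.
        qm_id, qid = copies[0]
        qm_rings[qm_id].append((payload, qid))
        return 1

    # Multicast: one header buffer per copy, each chained to the shared payload,
    # whose reference count was set to the fanout above.
    for copy_num, (qm_id, qid) in enumerate(copies):
        header = {"copy": copy_num, "payload": payload}
        qm_rings[qm_id].append((header, qid))
    return len(copies)


# Hypothetical example: two QMs, one unicast entry and one 3-way multicast entry.
rings = {0: deque(), 1: deque()}
results = {7: {"copies": [(0, 100)]},
           8: {"copies": [(0, 101), (1, 102), (1, 103)]}}
counts = {}
sent_unicast = lookup_b_copy({"result_index": 7, "slice_id": 1, "buffer_handle": 0xA},
                             results, counts, 2, rings)
sent_mcast = lookup_b_copy({"result_index": 8, "slice_id": 1, "buffer_handle": 0xB},
                           results, counts, 2, rings)
sent_over_limit = lookup_b_copy({"result_index": 7, "slice_id": 1, "buffer_handle": 0xC},
                                results, counts, 2, rings)
```

In this model the per-slice count is updated once per incoming packet, not once per copy, which follows the order of steps in the sketch above.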

30 ResultTable – Unicast
Data needed to enqueue, rewrite packet:
  »QID
    QMID, SchedID, QID (20b) (Lookup Result)
  »Src MI:
    IP Saddr (32b) (Per SchedID Table)
    UDP Sport (16b) (Lookup Result)
  »Tunnel Next Hop
    IP DAddr (32b) (Lookup Result)
    UDP DPort (16b) (Lookup Result)
  »Chassis Addressing
    Ethernet Dst MAC (48b) (Per SchedID Table)
  »Slice Specific Lookup Result Data (?) (Lookup Result)
Ethernet Src MAC
  »Should be constant across all pkts.
Results Entry: QID (20b), IP DAddr (32b), UDP DPort (16b), UDP SPort (16b), HFIndex (16b)
Per Sched Entry: IP SAddr (32b), Eth DA (48b)
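A sketch of the 20b QID made up of QMID, SchedID and a per-scheduler queue ID. The widths (QM 2b, Sch 3b, PerSchedQID 15b) come from the ring format shown on the block diagram; the bit positions chosen here, with the QM in the top bits, are an assumption, and pack_qid and unpack_qid are hypothetical names.

```python
def pack_qid(qm, sched, per_sched_qid):
    """Combine QM (2b), scheduler (3b) and per-scheduler queue ID (15b) into a 20b QID."""
    # Assumption: QM in bits [19-18], scheduler in bits [17-15], queue ID in bits [14-0].
    return ((qm & 0x3) << 18) | ((sched & 0x7) << 15) | (per_sched_qid & 0x7FFF)


def unpack_qid(qid):
    """Split a 20b QID back into (qm, sched, per_sched_qid)."""
    return (qid >> 18) & 0x3, (qid >> 15) & 0x7, qid & 0x7FFF


# Hypothetical example: QM 2, scheduler 5, queue 1234.
qid = pack_qid(2, 5, 1234)
```

With this split, LookupB/Copy needs only the top two bits to choose the QM ring, and the QM uses the remaining 18 bits to select a scheduler and queue.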

31 ResultTable – Multicast
Fanout gives the number of copies (0..15)
Data needed per copy on NPUB:
  »QID
    QMID, SchedID, QID (20b) (Lookup Result)
  »Src MI:
    IP Saddr (32b) (Per SchedID Table)
    UDP Sport (16b) (Lookup Result)
  »Tunnel Next Hop
    IP DAddr (32b) (Lookup Result)
    UDP DPort (16b) (Lookup Result)
  »Chassis Addressing
    Ethernet Dst MAC (48b) (Per SchedID Table)
  »Slice Specific Lookup Result Data (?) (Lookup Result)
Ethernet Src MAC
  »Should be constant across all pkts.
Support Multicast but optimize for Unicast
Results Entry: Fanout (4b), then ×16 of: QID (20b), IP DAddr (32b), UDP DPort (16b), UDP SPort (16b), HFIndex (16b)
Per Sched Entry: IP SAddr (32b), Eth DA (48b)

32 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data formats.]
  Buffer Handle (24b), Reserved (8b)
  Frame Length (16b), Stats Index (16b)
  Reserved (12b), PerSchedQID (15b), Sch (3b), QM (2b)
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

33 QM
No change from V1
  »Incorporates recent change to limit queues by #pkts
Some changes in how control allocates bandwidth
  »Need to ensure that slow HdrFmt blocks can’t tie up the system
  »Currently looking at worst-case engineering (everyone runs at slowest block speed)

34 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data format.]
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b) (shown twice)

35 HdrFmt / SubEncap
Inputs:
  »Buffer Handle
  »Remaining inputs from Buffer Descriptor:
    Multicast or Unicast (from buffer_next)
    Frame length, offset
    HFIndex (index into HFTable, a slice-specific table)
    QMSchedID (for per-sched lookup in ResultTable)
Outputs:
  »Packet (buffer handle)
    Buffer descriptor contains updated offset and length
Initialization:
  »HFTable, containing slice-specific data. For IPv4, this contains next-hop information (for both multicast and unicast traffic).
Functionality:
  »Substrate level: read buffer descriptor and pass frame offset, length, HFIndex, mcast/ucast to slice-specific HdrFmt
  »Slice level: arbitrary processing. For IPv4, this writes the next-hop information. Returns new offset, length of frame.
  »Substrate level: Encapsulate for output tunnel (from ResultTable)
Status:
  »Significant re-write at substrate level
  »Slice-specific code should change very little (add multicast support)

36 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data format.]
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b) (shown twice)

37 Scr2NN/FreelistMgr
Inputs:
  »Buffer Handle (possibly chained)
Outputs:
  »Buffer Handle (possibly chained)
Initialization:
  »None
Functionality:
  »Combines Freelist Manager with Scr2NN glue
  »FM: Read from scratch ring. Free buffers, correctly handling chained buffers and reference counts.
  »Scr2NN: Read from Scratch, write to NN.
Status:
  »Both blocks exist, but combining them is not straightforward.
    Open question: how should we prioritize among these tasks?
    The author should ensure that no deadlock is possible. (TxB writes to FM; if FM ring is full, TxB stalls. If Scr2NN is writing to TxB, it stalls. Gridlock.)
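A sketch of what "free buffers, correctly handling chained buffers and reference counts" might look like for the Freelist Manager. It assumes a simple model in which each descriptor carries a ref_cnt and a buffer_next link, and in which a multicast payload buffer is shared by several header buffers. The names free_chain, descriptors and freelist are hypothetical.

```python
def free_chain(handle, descriptors, freelist):
    """Free a (possibly chained) buffer; return how many buffers went back to the freelist."""
    freed = 0
    while handle is not None:
        desc = descriptors[handle]
        desc["ref_cnt"] -= 1
        if desc["ref_cnt"] > 0:
            # Still referenced by another multicast copy: leave this buffer,
            # and everything chained behind it, allocated.
            break
        freelist.append(handle)
        freed += 1
        handle = desc["buffer_next"]
    return freed


# Hypothetical example: header buffers 1 and 2 both chain to shared payload buffer 3.
descriptors = {
    1: {"ref_cnt": 1, "buffer_next": 3},
    2: {"ref_cnt": 1, "buffer_next": 3},
    3: {"ref_cnt": 2, "buffer_next": None},
}
freelist = []
first = free_chain(1, descriptors, freelist)   # frees header 1 only
second = free_chain(2, descriptors, freelist)  # frees header 2 and the payload
```

The payload returns to the freelist only when the last copy has been transmitted, which is why the reference count is written into the payload descriptor when the copies are made.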

38 NPE Version 2 Block Diagram
[Figure: block diagram repeated from slide 8, annotated with the following data format.]
  Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

39 TxB
Must support chained buffers
  »Multicast uses header buffers and payload buffers
  »Headers are slice-specific; we can’t rely on known, static lengths as we did in ONL.
Sends header from one buffer, payload from chained buffer.
  »Can TX do this? Comments in the code seem to imply that chained (non-SOP) buffers must start at offset 0. Our payloads usually won’t.
    According to DZar, this will probably take some TX modification, but there’s no reason why it won’t work. Might have a performance penalty, of course….

40 SPP V2 SideB SRAM Buffer Descriptor
HFIndex is an index into the HFTable. For IPv4, this provides Next Hop information.
ResultIndex is used to get tunnel header info from the ResultTable
[Figure: buffer descriptor layout, long words LW0 to LW7. Fields shown, grouped here by 32b width:]
  Buffer_Next (32b)
  Buffer_Size (16b), Offset (16b)
  Packet_Size (16b), Free_list 0000 (4b), Reserved (4b), Ref_Cnt (8b)
  Stats Index (16b), Reserved (4b), Slice ID (xsid) (12b)
  ResultIndex (32b)
  MR Exception Bits (16b), HFIndex (16b)
  MR Bits (optional) (32b)
  Packet_Next (32b)

41 Design Questions
Small hole for abuse in HdrFmt
  »QM rate limits on payload length
  »HdrFmt (after QM) can vastly increase packet length
  »Should the LookupB table give the padding size for each entry? Enforced in SubEncap?
  »ANSWER: No, we will resort to our control of HdrFmt to force it to behave. (We write all of the code options right now.)
What are the best places to update stats on NPUB?
  »ANSWER: Post-Q only
Is there any remaining reason that NPUB would need the source tunnel information?
  »ANSWER: No. If a code option needs it, put it into opaque slice data.
Still working out remaining data areas.

42 Extra Slides
The rest of the slides are old or for extra information

43 Questions/Issues
4/28/08:
  »How many code options? Limit of 16?
  »To handle slow Code Options:
    LCI Queues would control traffic to Fast/Slow Parse Code
      Ø Classes of code options defined by how long their Parse code takes.
      Ø Scheduler assigned to a class of code option.
    NPE Queues would control traffic to Fast/Slow HF Code
    LCE Queues control the output rate to Interfaces.
  »Multicast Problems:
    Impact of multicast traffic overloading Lookup/Copy and becoming a bottleneck.
  »Rx on SideB, can it use SRAM output ring?
    All our other 10G Rx’s have NN output ring.
  »Option for HF to send out additional pkts?
  »How to pass MR and substrate hdrs to TxB?
    Through Ring or through Hdr Buffer associated with Hdr Buffer descriptor.
      Ø If the latter then what are the constraints in Tx for buffer chaining?

44 Meeting Notes
1/15/08:
  »QM: Add Pkt count to Queue Params, change limit from QLen to PktCount
  »Add Per Slice Pkt limit to NPUA and NPUB
  »Limit Fanout to 16
  »MCast: Control will allocate all 16 entries for a multicast result entry, result entry will be typed as multicast or unicast and will not transition from one to the other.
  »What happens to pkts in queues when there is a route change that sends that flow’s pkts to a different interface and queue? Pkt ordering problems?

45 NPE Version 2 Block Diagram
[Figure: earlier version of the block diagram.
  NPUA: RxA (2 ME), Decap, Parse, LookupA, AddShim (8 MEs), TxA (2 ME), Stats (1 ME)
  NPUB: RxB (2 ME), LookupB & Copy (2 ME), Queue Manager (4 MEs), HdrFmt (4 MEs), TxB (2 ME), Stats (1 ME)
  Memories shown: TCAM and SRAM
  Both NPUs attach to an SPI Switch, which connects to the Switch Blade and GPE.]
Annotations:
  Lookup produces resultIndx, statsIndx
  slice#, resultIndx, etc, passed in shim
  Lookup yields fanout, list of QiDs; copy to queues, adding copy#; (slice#, resultIndx remain in packet buffer)
  use slice# to select slice to format packet; use resultIndx to get next-hop
  for unicast, resultIndx replaced by QiD; allowing output side to skip lookup
  flow control?

46 Questions/Issues
Where are exit and entry points for packets sent to and from the GPE for exception processing?
  »Parse (NPUA) and LookupA (NPUA) are where most exceptions are generated:
    IP Options
    No Route
    Etc.
  »HdrFormat (NPUB) is where we do Ethernet header processing
What needs to be in the SHIM going from NPUA to NPUB?
  »ResultIndex (32b)
  »Exception Bits (12b)
  »StatsIndex (16b)
  »Slice# (12b)
  »???
Will we support multi-copy in a way similar to the ONL Router?
How big can the fanout be?
  »How many QIDs need to be stored with the LookupB Result?
    Is there some encoding for the QIDs that can take into account support for multicast and the copy#?
    For example:
      Ø Multicast QID (20b)
        –Multicast (1b): 1
        –Copy# (4b)
        –PerMulticast QID (15b): One PerMulticast QID allocated for each Multicast
      Ø Unicast QID (20b)
        –Unicast (1b): 0
        –QID (19b)
Are there timing/synchronization issues with adding, deleting or changing lookup entries between the two NPUs’ databases?
Do we need flow control between TxA and RxB?
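The example QID encoding proposed on this slide can be sketched as follows. The field widths are those given above; placing the multicast flag in the most significant bit, followed by the copy number, is an assumption, and the function names are hypothetical.

```python
MCAST_FLAG = 1 << 19  # assumption: the 1b multicast/unicast flag is the top bit of the 20b QID


def encode_unicast_qid(qid):
    """Unicast QID: flag 0, then a 19b queue ID."""
    return qid & 0x7FFFF


def encode_multicast_qid(copy_num, per_mcast_qid):
    """Multicast QID: flag 1, Copy# (4b), PerMulticast QID (15b)."""
    return MCAST_FLAG | ((copy_num & 0xF) << 15) | (per_mcast_qid & 0x7FFF)


def decode_qid(qid):
    """Return ('multicast', copy_num, per_mcast_qid) or ('unicast', qid)."""
    if qid & MCAST_FLAG:
        return ("multicast", (qid >> 15) & 0xF, qid & 0x7FFF)
    return ("unicast", qid & 0x7FFFF)


# Hypothetical example values.
m = encode_multicast_qid(copy_num=9, per_mcast_qid=300)
u = encode_unicast_qid(4321)
```

Under this encoding a single PerMulticast QID covers all copies of one multicast, and the 4b copy number caps the fanout at 16.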

47 NPE Version 2 Block Diagram
[Figure: earlier version of the block diagram, repeated from slide 45, with the annotation “flow control?”.]
NPUA:
  »RxA: Same as Version 0
  »TxA: New 10Gb/s
  »Decap: Same as Version 0
  »Parse: Same as Version 0
    New code options?
  »LookupA: Results will be different from Version 0
  »AddShim: New

48 NPE Version 2 Block Diagram
[Figure: earlier version of the block diagram, repeated from slide 45, with the annotation “flow control?”.]
NPUB:
  »RxB: Same as Version 0
  »TxB: New 10Gb/s with L2 Header coming in on input ring?
  »LookupB: New
  »Copy: New, may be able to use some code from ONL Copy
  »QM: New, decoupled from Links
  »HF: New, may use some code from Version 0

49 NPE Version 2 Block Diagram
[Figure: earlier version of the block diagram, repeated from slide 45, with the annotation “flow control?”. Stats blocks are labeled StatsA (1 ME) and StatsB (1 ME). Additional blocks shown: FreeList MgrA (1 ME), FreeList MgrB (1 ME), Scr2NN (1 ME), Sram2NN (1 ME).]
NPUB has 17 MEs currently spec’ed

50 SPP V2: MR Specific Code
Where does the MR Specific Code reside in V2:
  »Parse
  »HdrFormat
What about LookupA and LookupB?
  »Lookup is a “service” provided to the MRs by the Substrate.
  »No MR specific code needed in LookupA or LookupB
What about SideA AddShim?
  »The Exception bits that go in the shim are MR Specific but they should be passed to AddShim and it will write them into the Shim.
  »No MR Specific code needed in AddShim.
What about SideB Copy?
  »Is there anything MR specific about setting up multiple copies of a packet?
    There shouldn’t be.
    We will have the Copy block allocate a new hdr buffer descriptor and link it to the existing data buffer descriptor and take care of reference counts.
    The actual building of the new header(s) for the copies will be left to HF.
  »No MR Specific code needed in Copy.

51 SPP V2: Hdr Format
Lots of changes for HF:
  »Move behind QM
  »More general:
    Support multiple source IP Addresses
    General support for Tunnels
      Ø Eventually different kinds of tunnels (UDP/IP, GRE, …)?
  »Support for Multicast
    Dealing with header buffer descriptors
    Reading Fanout table
  »Substrate portion of HF will need to do Decap type table lookup
    Slice ID → (Code Option, Slice Memory Pointer, Slice Memory Size)
HF gets a buffer descriptor from the QM
  »The Substrate portion of HF must determine:
    Code Option (8b)
    Slice ID (12b)
    Location of Next Hop information (20b - 32b)
      Ø LD vs. FWD?
    Stats Index (16b)
      Ø Should HF do this or QM?
  »The MR portion of HF must determine:
    Exception bits (16b)
Let’s put all of the above data in the Buf Desc
  »LookupB/Copy will need to write it there based on what comes across from SideA in the shim

52 SPP V2: Result
We need to be much more general in our support for Tunnels, Interfaces, MetaInterfaces, and Next Hops.
SideB Result:
  »Interface
    IP SAddr (32b)
    Eth MAC DAddr (48b) (LC, GPE1, GPE2, …, GPEn)
    SchedulerId (8b): which QM should handle pkt
  »TxMI:
    IP Sport (16b)
  »TxNextHop:
    IP DAddr (32b)
    IP DPort (16b)

53 Data Areas
Where are the tables and what data is transmitted from SideA to SideB?
  SideA Tables
  Shim between SideA and SideB
  SideB Tables

54 Pkt Processing Data and Tables
SideA:
  »MR/Slice Table:
    Generated by Control
    Used by:
      Ø Substrate Decap to retrieve a MR/Slice’s parameters
    Indexed by SliceId == VLAN
    Contains:
      –Code option
      –Slice Memory ptr
      –Slice Memory size
      –???
  »TCAM:
    Generated by Control
    Used by:
      Ø LookupA
    Contains:
      Ø Key:
      Ø Result:

55 Data Areas
Shim between SideA and SideB
  »Written to DRAM Buffer to be sent from SideA to SideB
  »Contains:
    resultIndex (32b):
      Ø Generated by Control
      Ø Result of TCAM lookup on SideA
      Ø Translates into an SRAM Address on SideB
    exceptionBits (12b)
      Ø Generated by SideA Parse/Lookup
      Ø Used by:
        –SideB HF
    statsIndex (16b)
      Ø Generated by Control
      Ø Result of TCAM lookup on SideA
      Ø Used by:
        –SideA Lookup/AddShim to increment counters
        –SideB Lookup/Copy to increment PreQ Cntrs (or perhaps SideA is the PreQ cntrs)
        –SideB HF or QM to increment PostQ Cntrs
    sliceId (12b)
      Ø Generated by Control
      Ø Result of Decap read of Ethernet hdr (VLAN)
      Ø Used by:
        –???
    codeOption (4b)
    Slice Memory Ptr (32b)

56 Data Areas
SideB
  »Data Buffer Descriptor
  »Hdr Buffer Descriptor
    Used for multi-copy packets
    SPP V2 may require Tx to handle multi-buffer packets.
      Ø It is unclear if we can cleanly do the same thing that we do with ONL where HF passes the Ethernet header to Tx.
      Ø We may also need to have support for MR specific per copy data
  »Results Table
    Generated by Control
    Used by:
      Ø LookupB/Copy
      Ø HF
        –Should HF get its per copy info from here as well?
    Contains:
      Ø Fanout (if fanout is > 1 we can overload some of the following fields with a pointer into a Fanout table)
      Ø QID
      Ø InterfaceId
      Ø TxMI Id
        –Probably doesn’t help to make it an index into a table for UDP Tunnels since UDP Port is 16 bits
        –But for tunnels other than UDP tunnels it may help?
      Ø TX NextHop Id
        –Index into a table of Tunnel Next Hops

57 Data Areas (continued)
SideB (continued)
  »Fanout Table
    Generated by Control
    Used by:
      Ø LookupB/Copy
      Ø HF
    Contains:
      Ø QID[Fanout]
      Ø InterfaceId
      Ø TxMI Id
      Ø Tx Next Hop ID[Fanout]
    Implementation Choices:
      Ø One contiguous block of memory
        –Fixed size or variable sized
      Ø Chained with one set of values per entry
      Ø Chained with N (N=4?) sets of values per entry

