David M. Zar Block Design Review: PlanetLab Line Card Header Format.


1 David M. Zar dzar@wustl.edu http://www.arl.wustl.edu/projects/techX Block Design Review: PlanetLab Line Card Header Format

2 2 - David M. Zar - 3/8/2016 Revision History
10/31/06 (DMZ):
»Initial Draft
11/04/06 (DMZ):
»Updates for performance issues

3 Line Card Centric Overview
[Block diagram. Ingress blocks: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx. Egress blocks: Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx. Also shown: SWITCH, Port Splitter.]
Port Splitter (Ingress and Egress):
»Accepts packets on a NN ring
»Routes packets based on the physical destination port number
    Ports 0-4 go to QM1 on a scratch ring
    Ports 5-9 go to QM2 on a scratch ring
»Measured delay is about 120 cycles, including memory latency
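The Port Splitter rule above can be sketched in a few lines. This is an illustrative Python model only, not the microengine code; the function name `select_qm` and the string return values are assumptions made for the example.

```python
# Hypothetical model of the Port Splitter routing rule from the slide.
QM1_PORTS = range(0, 5)   # physical destination ports 0-4
QM2_PORTS = range(5, 10)  # physical destination ports 5-9


def select_qm(port: int) -> str:
    """Return the queue manager that serves a physical destination port."""
    if port in QM1_PORTS:
        return "QM1"
    if port in QM2_PORTS:
        return "QM2"
    # The slide only describes ports 0-9; rejecting anything else is an assumption.
    raise ValueError(f"port {port} is outside the 0-9 range")


for p in (0, 4, 5, 9):
    print(p, select_qm(p))
```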

4 Ingress Header Format

5 Ingress Header Format
Microengine Usage
»One microengine
»Eight identical threads
»NN ring input from Lookup
»NN ring output to Port Splitter
Main functions:
»Using data from Lookup, modify packet header in DRAM for proper routing to PE:
    Destination MAC address
    Ø First five bytes are same as source MAC address
    Source MAC address
    Ø Address of this LC
    VLAN tag
»Adjust pre-queue stats counters
»Format input data for QM
    QID
    Port Number
    Ethernet Frame Length

6 LC Ingress Functional Blocks
[Block diagram: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx]
Possible Input Packet Formats
VLAN-tagged frame:
    Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
    IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
    UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
    UDP Payload (MN Packet)
    Ethernet Trailer: PAD (nB), CRC (4B)
Untagged frame:
    Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=IP (2B)
    IP Header, UDP Header, UDP Payload, PAD and CRC as in the VLAN-tagged frame
Output Packet Format
VLAN-tagged frame:
    Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
    IP Header, UDP Header, UDP Payload, PAD and CRC as in the VLAN-tagged input frame
Hdr Format input data: Buf Handle (32b), IP Pkt Length (16b), QID (20b), VLAN (16b), Stats Index (16b), DAddr (8b), Port (4b), Reserved (8b), Eth Hdr Len (8b)
Hdr Format output data: Stats Index (16b), Buffer Handle (32b), Frame Length (16b), QID (20b), Rsv (4b), Port (4b), Rsv (4b)

7 MAC Address and VLAN Tag (Ingress)
The source MAC address is fixed and set at boot time (_WU_get_mac_address)
The destination MAC address will only differ in the last byte, and this byte is obtained from the Lookup data.
The VLAN tag is obtained from the Lookup data.
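A minimal Python sketch of the ingress header rewrite described on this slide follows. It is not the `_WU_write_vlan_header` code; the function name and argument names are hypothetical, and the field order is the standard 802.1Q layout (destination MAC, source MAC, TPID 0x8100, VLAN tag, EtherType 0x0800).

```python
import struct


def build_ingress_eth_header(lc_mac: bytes, dst_last_byte: int, vlan_tag: int) -> bytes:
    """Build the 18-byte VLAN Ethernet header for an ingress packet.

    lc_mac        -- this line card's 6-byte MAC address (the source MAC)
    dst_last_byte -- last byte of the destination MAC, taken from Lookup data
    vlan_tag      -- 16-bit VLAN tag, taken from Lookup data
    """
    if len(lc_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # Destination MAC shares the first five bytes with the source MAC.
    dst_mac = lc_mac[:5] + bytes([dst_last_byte & 0xFF])
    # 0x8100 is the 802.1Q TPID; 0x0800 is the EtherType for IPv4.
    return dst_mac + lc_mac + struct.pack("!HHH", 0x8100, vlan_tag & 0xFFFF, 0x0800)


header = build_ingress_eth_header(bytes.fromhex("010203040a0b"), 0x02, 0x0002)
print(header.hex())
```

The MAC address, last byte, and VLAN value used here are the ones visible in the output frame of the Ingress Validation example.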

8 Stats/Counters (Ingress/Egress)
The Stats Index is obtained from the Lookup Data
The pre-queue packet and byte counters are updated (_WU_update_counters)
»Packet counter is incremented (atomic SRAM)
»Byte count is incremented by the number of bytes in the entire Ethernet frame (_WU_get_enet_frame_length).
    Frame_length = IP_pkt_len + 18
    Ø 18 is the VLAN Ethernet header length
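The counter update can be illustrated with a small Python model. The dictionary standing in for the SRAM counter table, and the names `enet_frame_length` and `update_counters`, are assumptions for this sketch; only the `+ 18` rule and the per-packet increment come from the slide.

```python
VLAN_ETH_HEADER_LEN = 18  # 6B dst MAC + 6B src MAC + 2B TPID + 2B VLAN + 2B EtherType


def enet_frame_length(ip_pkt_len: int) -> int:
    """Frame_length = IP_pkt_len + 18, as stated on the slide."""
    return ip_pkt_len + VLAN_ETH_HEADER_LEN


def update_counters(counters: dict, stats_index: int, ip_pkt_len: int) -> dict:
    """Increment the pre-queue packet and byte counters for one packet."""
    entry = counters.setdefault(stats_index, {"pkts": 0, "bytes": 0})
    entry["pkts"] += 1
    entry["bytes"] += enet_frame_length(ip_pkt_len)
    return entry


counters = {}
update_counters(counters, stats_index=7, ip_pkt_len=56)
update_counters(counters, stats_index=7, ip_pkt_len=56)
print(counters[7])
```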

9 QM Data Formatting (Ingress and Egress)
QID is extracted from Lookup data
Port number is extracted from Lookup data
Total Ethernet frame length is passed to QM
Stats index is passed on for post-queue counters
QM data fields: Stats Index (16b), Buffer Handle (32b), Frame Length (16b), QID (20b), Rsv (4b), Port (4b), Rsv (4b)
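The seven fields total 96 bits, so they fit in three 32-bit words. The sketch below packs them in Python. The field widths come from the slide, but the assignment of fields to words and the bit positions within each word are assumptions made for illustration; the slide transcript does not preserve the actual layout.

```python
def pack_qm_data(buf_handle: int, qid: int, port: int,
                 stats_index: int, frame_len: int) -> tuple:
    """Pack QM data into three 32-bit words (assumed layout, see comments)."""
    word0 = buf_handle & 0xFFFFFFFF                               # Buffer Handle (32b)
    word1 = ((qid & 0xFFFFF) << 12) | ((port & 0xF) << 4)         # QID(20) Rsv(4) Port(4) Rsv(4)
    word2 = ((stats_index & 0xFFFF) << 16) | (frame_len & 0xFFFF)  # Stats Index(16) Frame Length(16)
    return word0, word1, word2


def unpack_qm_data(words: tuple) -> dict:
    """Inverse of pack_qm_data under the same assumed layout."""
    word0, word1, word2 = words
    return {
        "buf_handle": word0,
        "qid": (word1 >> 12) & 0xFFFFF,
        "port": (word1 >> 4) & 0xF,
        "stats_index": (word2 >> 16) & 0xFFFF,
        "frame_len": word2 & 0xFFFF,
    }


words = pack_qm_data(0x12345678, qid=0xABCDE, port=0x9, stats_index=0x0042, frame_len=74)
print([f"{w:08x}" for w in words])
```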

10 Ingress HF Block Diagram
[Block diagram: init; dl_source() (wait for prev ctx, NN Dequeue, signal next ctx); _WU_get_enet_frame_length; _WU_write_vlan_header; _WU_update_counters; _WU_update_buffer_descriptor; dl_sink() (wait for prev ctx, NN Enqueue, signal next ctx)]
Memory access annotations:
»DRAM: 4|5 4B writes, Cycles: 26
»SRAM: 1 read 1 write, Cycles: 10
»SRAM: 3 writes, Cycles: 12
Other cycle annotations: 10, 5, 2, 1, 17, 16
Total cycles: 33+66=99
Budget: 1400 MHz/(10 Gb/s/(8*90)) = 100.8 => 100 cycles
Measured Latency: 745
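The budget arithmetic on this slide can be checked with a short Python calculation. The reading of the constant 90 as bytes per packet is an assumption (the slide gives only the number), and the function name is hypothetical.

```python
def cycle_budget(clock_hz: float, line_rate_bps: float, bytes_per_packet: int) -> float:
    """Cycles available per packet at a given clock and line rate."""
    packets_per_second = line_rate_bps / (8 * bytes_per_packet)
    return clock_hz / packets_per_second


# 1400 MHz clock, 10 Gb/s line rate, 90 (assumed: bytes per packet).
budget = cycle_budget(1400e6, 10e9, 90)

# Cycle annotations listed on the slide; they sum to the stated total.
cycle_annotations = [26, 10, 12, 10, 5, 2, 1, 17, 16]
total_cycles = sum(cycle_annotations)

print(f"budget = {budget:.1f} cycles, used = {total_cycles} cycles")
```

With these numbers the block uses 99 of roughly 100 available cycles, which is the single cycle of headroom mentioned under Future Work.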

11 Ingress Validation
Send in non-tunneled packets and check that the output packets are our internal, tunneled packets.
»Worked during development but not tested in integrated system at this point.
Send in tunneled packets and check that the output packets are our internal, tunneled packets.
»Example input (the bracketed CRC is stripped by RX):
    01020304 05060708 090a0b0c 81000aaa 08004500 00380000 0000ff11 3a61c0a8 0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11 3a7dc0a8 0001c0a8 00020001 00020008 7e87 [6d7e d5be]
»Resulting output:
    01020304 0a020102 03040a0b 81000002 08004500 00380000 0000ff11 3a61c0a8 0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11 3a7dc0a8 0001c0a8 00020001 00020008 7e87
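The example frames can be checked mechanically. The Python sketch below parses both frames (with the stripped CRC omitted) and tests three properties stated elsewhere in the deck: the destination and source MAC share their first five bytes, the frame length equals the IP length plus 18, and everything after the VLAN tag is unchanged. The helper name `parse_vlan_frame` is hypothetical.

```python
import struct

# Frames copied from the validation example; the 4-byte CRC stripped by RX is omitted.
INPUT_HEX = (
    "01020304 05060708 090a0b0c 81000aaa 08004500 00380000 0000ff11 3a61c0a8 "
    "0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11 3a7dc0a8 0001c0a8 "
    "00020001 00020008 7e87"
)
OUTPUT_HEX = (
    "01020304 0a020102 03040a0b 81000002 08004500 00380000 0000ff11 3a61c0a8 "
    "0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11 3a7dc0a8 0001c0a8 "
    "00020001 00020008 7e87"
)


def parse_vlan_frame(frame: bytes) -> dict:
    """Extract the Ethernet/VLAN fields and the outer IP total length."""
    tpid, tci, ethertype = struct.unpack("!HHH", frame[12:18])
    # The IP header starts at byte 18; total length is at bytes 2-3 of that header.
    ip_total_len = struct.unpack("!H", frame[20:22])[0]
    return {
        "dst": frame[0:6],
        "src": frame[6:12],
        "tpid": tpid,
        "vlan_id": tci & 0x0FFF,
        "ethertype": ethertype,
        "ip_total_len": ip_total_len,
    }


frame_in = bytes.fromhex(INPUT_HEX.replace(" ", ""))
frame_out = bytes.fromhex(OUTPUT_HEX.replace(" ", ""))
out = parse_vlan_frame(frame_out)

checks = {
    "dst and src MAC share first five bytes": out["dst"][:5] == out["src"][:5],
    "frame length is IP length + 18": len(frame_out) == out["ip_total_len"] + 18,
    "bytes after the VLAN tag are unchanged": frame_in[16:] == frame_out[16:],
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```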

12 Egress Header Format

13 Egress Header Format
Microengine Usage
»One microengine
»Eight identical threads
»NN ring input from Lookup
»NN ring output to Port Splitter
Main functions:
»Using data from Lookup, modify packet header in DRAM for proper routing to Switch:
    Destination MAC address
    Ø First five bytes are same as source MAC address
    Ø Destination MAC address is looked up based on IP address from lookup
    Source MAC address
    Ø Address of this LC
    VLAN tag
»Adjust pre-queue stats counters
»Format input data for QM
    QID
    Port Number
    Ethernet Frame Length

14 LC Egress Functional Blocks
[Block diagram: Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx, SWITCH]
Input Packet Format
VLAN-tagged frame:
    Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
    IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
    UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
    UDP Payload (MN Packet)
    Ethernet Trailer: PAD (nB), CRC (4B)
Output Packet Format
VLAN-tagged frame with the same field layout as the input packet format
Hdr Format input data: Buf Handle (32b), IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b), VLAN (12b), QID (20b), Rsvd (4b), Port (4b), Rsvd (4b), Stats Index (16b), Rsvd (4b), IP DAddr (32b)
Hdr Format output data: Ethernet Frame Length (16b), Buffer Handle (32b), Stats Index (16b), QID (20b), Rsv (4b), Port (4b), Rsv (4b)

15 MAC Address and VLAN Tag (Egress)
The source MAC address is fixed and set at boot time (_WU_get_mac_address)
The destination MAC address will only differ in the last nibble, and this nibble is obtained from the Lookup data.
»_WU_ip_lookup will take 32 bits from the destination IP address and use the local CAM to obtain the least significant 4 bits of the MAC address.
»The CAM state bits are used for this, which is why only 4 bits of data are returned
The VLAN tag is obtained from the Lookup data.
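A rough Python model of the egress lookup is shown below. A dictionary stands in for the microengine CAM (32-bit tag mapped to 4 state bits); the function name, the use of `None` on a CAM miss, and the example values are assumptions, since the slides do not describe miss handling.

```python
from typing import Dict, Optional


def egress_dest_mac(cam: Dict[int, int], dst_ip: int, lc_mac: bytes) -> Optional[bytes]:
    """Derive the destination MAC from the LC MAC and a CAM lookup on the IP address.

    cam    -- model of the local CAM: 32-bit destination IP -> 4-bit state value
    dst_ip -- 32-bit destination IP address from the Lookup data
    lc_mac -- this line card's 6-byte MAC address
    """
    nibble = cam.get(dst_ip & 0xFFFFFFFF)
    if nibble is None:
        return None  # Miss behaviour is not given in the slides; this is a placeholder.
    # Only the least significant 4 bits of the MAC address are replaced.
    last_byte = (lc_mac[5] & 0xF0) | (nibble & 0x0F)
    return lc_mac[:5] + bytes([last_byte])


cam = {0xC0A80002: 0x2}  # hypothetical entry: 192.168.0.2 -> nibble 0x2
mac = egress_dest_mac(cam, 0xC0A80002, bytes.fromhex("010203040a0b"))
print(mac.hex() if mac else "miss")
```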

16 Egress HF Block Diagram
[Block diagram: init; dl_source() (wait for prev ctx, NN Dequeue, signal next ctx); _WU_get_enet_frame_length; _WU_ip_lookup; _WU_write_vlan_header; _WU_update_counters; _WU_update_buffer_descriptor; dl_sink() (wait for prev ctx, NN Enqueue, signal next ctx)]
Memory access annotations:
»DRAM: 1 4B read, 4 4B writes, Cycles: 32
»SRAM: 1 add 1 incr, Cycles: 6
»SRAM: 3 writes, Cycles: 10
Other cycle annotations: 10, 2, 1
Total cycles: 65
Measured Latency * : ~660

17 Egress Validation
Send in our internal, tunneled packets and check that the output packets are our valid IP, tunneled packets.
»For the PlanetLab demo, there are no non-tunneled output packets
Check packet and byte counters for valid updates
Check CAM for proper initialization (data watch)

18 HF Initialization (Ingress/Egress)
All memory locations defined in dl_system.h:
»Base address for HF
    LC[I/E]_HF_SRAM_INIT_BASE
    Ø MAC_ADDR_HI32
    Ø MAC_ADDR_LO16
»Pre-Queue Counters
    LC[I/E]_LU_COUNTERS_SRAM_INIT_BASE
    Ø LC[I/E]_LU_PRE_Q_PKT_CNT_OFFSET – offset into counters structure for packet counter
    Ø LC[I/E]_LU_PRE_Q_BYTE_CNT_OFFSET – offset into counters structure for byte counter
Thread 0 waits for signal from rx
For Egress, the CAM is filled (_WU_hfe_initialize_ip_lookup) with data from LCE_HF_SRAM_INIT_BASE + 8; each entry is 64 bits: cam_entry (32b), RSVD (28b), MAC_DEST (4b)
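The 64-bit CAM initialization entry can be decoded as sketched below. The field widths are from the slide; placing cam_entry in the most significant 32 bits and MAC_DEST in the least significant 4 bits follows the order in which the fields are listed and is an assumption. Both function names are hypothetical.

```python
def parse_cam_init_entry(entry: int) -> tuple:
    """Split a 64-bit init entry into (cam_entry, mac_dest).

    Assumed layout, most significant bits first:
    cam_entry (32b) | RSVD (28b) | MAC_DEST (4b)
    """
    cam_entry = (entry >> 32) & 0xFFFFFFFF
    mac_dest = entry & 0xF
    return cam_entry, mac_dest


def load_cam(entries: list) -> dict:
    """Build a tag -> nibble table from a list of 64-bit init entries."""
    return dict(parse_cam_init_entry(e) for e in entries)


table = load_cam([0xC0A80002_00000002, 0xC0A80003_00000007])
print({f"{tag:08x}": nibble for tag, nibble in table.items()})
```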

19 File Locations (Ingress and Egress)
Main code
»Applications/LC_Ingress/src/hdr_format/PL/hdr_format.uc
»Applications/LC_Egress/src/hdr_format/PL/hdr_format.uc
Library
»library/DataPlane/hdr_format_util.uc

20 Required Includes (Ingress and Egress)
Files
»build/PL/dispatch_loop/dl_system.h
    memory locations
»IXA_SDK_4.0/src/library/microblocks_library/
    dl_meta – for metadata macros
»IXA_SDK_4.0/src/library/dataplane_library/
    dram – for DRAM read/write macros
    sram – for SRAM read/write/add/incr macros
    xbuf – for transfer buffer macros

21 Performance Issues

22 Ingress Performance Anomalies
These stalls are in various SRAM and DRAM accesses – the command FIFO is FULL!

23 Ingress Anomalies (Explanation)

24 Ingress Anomalies (Explanation)
These bus arbiters are shared across all memory interfaces
The SRAM Controllers have a command FIFO

25 Ingress/Egress SRAM Issues
It seems that using atomic ADD/INCR instructions is expensive at the SRAM controller.
If I remove them and instead read the SRAM, do the add myself, and write the SRAM, this is quicker and consumes less of the SRAM controller's time, and thus the command queue never backs up.
In this new design, more instructions are executed, but there may be a few I could eliminate by optimizing the code.
No stalling in the WU microblocks (well, QM does, and RX and TX still do, but these look normal).

26 Ingress/Egress Performance
~99 CPU cycles
~745 cycles latency
Expected performance
»Should have no trouble going at 10 Gb/s but does…
Simulated performance (as of 11/06/2006)
»~10 Gb/s
»With all other microengines in place (i.e. real simulation)

27 Future Work

28 Ingress/Egress Future Work
Determine source of I/O stalls
Update Stubs projects for validation of Ingress/Egress blocks (done for Ingress)
Extend both blocks for all possible packet formats
»Ingress – inputs
»Egress – outputs
Possible instruction optimization to give a little headroom (99 cycles out of 100).
Currently, design will not work for standard IPv4 packets; PlanetLab VLAN packets are OK.

