Brandon Heller Block Design Review: Substrate Decap and IPv4 Parse.


1 Brandon Heller bdh4@cec.wustl.edu http://www.arl.wustl.edu/projects/techX Block Design Review: Substrate Decap and IPv4 Parse

2 2 - Brandon Heller - 1/19/2016 Revision History
9/26/06 (BDH):
  » Released
9/28/06 (BDH):
  » SD now at 5 Gbps+

3 Contents
[Diagram of pipeline blocks: Lookup, Rx, Tx, QM, Parse, Header Format, Substr Decap; slide taken from PlanetLab_Design.ppt]
For SD and Parse:
  » overview
  » block diagram
  » memory usage
  » code locations
  » test procedures
Performance analysis
  » Unexpected interactions
  » Future work

4 Substrate Decap

5 Substrate Decap
[Diagram of pipeline blocks: Lookup, Rx, Tx, QM, Parse, Header Format, Substr Decap; slide taken from PlanetLab_Design.ppt]
Main functions:
  » validate & consume Ethernet header
  » look up code_option and slice_data_ptr based on VLAN tag
  » validate & consume substrate UDP/IP headers
  » pass relevant fields to IPv4 parse
Single code path
NN (next-neighbor) communication
Uses 8 threads
Name change from Demux

6 IPv4 MR Functional Blocks
[Diagram of pipeline blocks: Lookup, Rx, Tx, QM, Parse, Header Format, Substr Decap; slide taken from PlanetLab_Design.ppt]
SD input fields: Buf Handle (32b), Eth. Frame Len (16b), Reserved (8b), Port (8b)
Packet layout:
  » Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
  » IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
  » UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
  » UDP Payload (MN Packet)
  » Ethernet Trailer: PAD (nB), CRC (4B)
SD output fields (word order not recoverable from the transcript): Buf Handle (32b), Slice Data Ptr (32b), Slice ID (VLAN) (16b), Code (4b), Reserved (12b), MN Frm Offset (16b), MN Frm Length (16b), Rx IP SAddr (32b), Rx UDP SPort (16b), Rx UDP DPort (16b)

7 Ethernet Validation
No alignment necessary
Counters kept in non-VLAN-specific region
Tests for
  » invalid Ethernet packet length
  » non-VLAN tag protocol ID
  » non-locally-addressed packet
  » unrecognized VLAN
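The four Ethernet tests listed above could be sketched in portable C roughly as follows. This is an illustration under stated assumptions, not the microengine code in substrate_decap.c: the names `sd_validate_eth` and `sd_eth_result_t`, the `code_opt` array argument, and the choice of "shorter than the 18B VLAN header" as the length test are all hypothetical. The test order simply follows the order on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_VLAN_HDR_LEN 18u     /* DstAddr 6 + SrcAddr 6 + TPID 2 + TCI 2 + Type 2 */
#define ETH_TPID_8021Q   0x8100u /* IEEE 802.1Q tag protocol ID */

/* Hypothetical result codes, one per test on the slide. */
typedef enum {
    SD_ETH_OK = 0,
    SD_ETH_DROP_LEN,   /* invalid Ethernet packet length */
    SD_ETH_DROP_TPID,  /* non-VLAN tag protocol ID */
    SD_ETH_DROP_DST,   /* non-locally-addressed packet */
    SD_ETH_DROP_VLAN   /* unrecognized VLAN */
} sd_eth_result_t;

/* code_opt is an assumed 4096-entry table; 0 means the VLAN has no active slice. */
sd_eth_result_t sd_validate_eth(const uint8_t *frame, size_t frame_len,
                                const uint8_t local_mac[6],
                                const uint8_t code_opt[4096],
                                uint16_t *vlan_out)
{
    assert(frame != NULL && local_mac != NULL);
    assert(code_opt != NULL && vlan_out != NULL);

    if (frame_len < ETH_VLAN_HDR_LEN)
        return SD_ETH_DROP_LEN;

    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != ETH_TPID_8021Q)
        return SD_ETH_DROP_TPID;

    if (memcmp(frame, local_mac, 6) != 0)
        return SD_ETH_DROP_DST;

    /* The VLAN ID is the low 12 bits of the tag control field. */
    uint16_t vlan = (uint16_t)(((frame[14] << 8) | frame[15]) & 0x0FFF);
    if (code_opt[vlan] == 0)
        return SD_ETH_DROP_VLAN;

    *vlan_out = vlan;
    return SD_ETH_OK;
}
```

Each non-OK result would map to one drop counter in the non-VLAN-specific counter region.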

8 VLAN Table
VLAN     code_opt   slice_data_ptr
0        0          0
1        0          0
…        …          …
0xaaa    1          (points to slice data: … SD data, P data, HF data …)
…        …          …
0xfff    0          0
code_option = 0 implies invalid slice
  » "on switch" for a slice in the data plane
SD data is currently only counters
64B slice data
SRAM space for all 4096 VLANs
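A minimal C sketch of the table described above, assuming each entry is two 32-bit words and that slice data regions are laid out contiguously at 64B per VLAN. The type `vlan_entry_t`, the functions `vlan_enable` and `vlan_lookup`, and the base-plus-offset pointer rule are hypothetical; the slides only state the field names, the meaning of code_option = 0, and the 64B and 4096 figures.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_COUNT      4096u  /* one entry per 12-bit VLAN ID */
#define SLICE_DATA_SIZE 64u    /* bytes of slice data per VLAN, per the slide */

typedef struct {
    uint32_t code_option;     /* 0 = invalid slice (switched off) */
    uint32_t slice_data_ptr;  /* address of this slice's 64B data region */
} vlan_entry_t;

/* Turn a slice on: store its code option and point it at its data region.
 * The contiguous base + vlan * 64 layout is an assumption. */
void vlan_enable(vlan_entry_t *table, uint32_t slice_data_base,
                 uint16_t vlan, uint32_t code_option)
{
    assert(table != 0 && vlan < VLAN_COUNT && code_option != 0);
    table[vlan].code_option = code_option;
    table[vlan].slice_data_ptr = slice_data_base + (uint32_t)vlan * SLICE_DATA_SIZE;
}

/* Returns 1 and fills *out when the VLAN maps to an enabled slice, else 0. */
int vlan_lookup(const vlan_entry_t *table, uint16_t vlan, vlan_entry_t *out)
{
    assert(table != 0 && out != 0);
    if (vlan >= VLAN_COUNT || table[vlan].code_option == 0)
        return 0;
    *out = table[vlan];
    return 1;
}
```

With this shape, setting code_option to a nonzero value is the only step needed to switch a slice on in the data plane, and a lookup costs two 4B reads.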

9 Substrate UDP/IP Validation
Header checks per RFC1812:
  » IP ver other than 4
  » invalid header length
  » length too small
  » IP len doesn't match Enet-deduced IP len
  » UDP len doesn't match IP-deduced UDP len
NOTE: need to check Ethernet length, to ensure that padded 64B packets are using the correct length
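The length cross-checks above, including the padded-frame note, might look like the following in plain C. This is a sketch under assumptions: `sd_validate_udp_ip`, its result codes, and the `frame_is_padded` flag are hypothetical, and treating the Ethernet-deduced length as an upper bound for padded minimum-size frames is one possible reading of the NOTE, not something the slides spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SD_IP_OK = 0,
    SD_IP_DROP_VER,          /* IP ver other than 4 */
    SD_IP_DROP_HDRLEN,       /* invalid header length */
    SD_IP_DROP_LEN_SMALL,    /* length too small */
    SD_IP_DROP_LEN_MISMATCH, /* IP len doesn't match Enet-deduced IP len */
    SD_IP_DROP_UDP_LEN       /* UDP len doesn't match IP-deduced UDP len */
} sd_ip_result_t;

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* ip points at the substrate IP header. eth_ip_len is the number of bytes
 * between the Ethernet header and the FCS. frame_is_padded is nonzero for a
 * minimum-size frame, where trailing pad bytes are legal. */
sd_ip_result_t sd_validate_udp_ip(const uint8_t *ip, size_t eth_ip_len,
                                  int frame_is_padded)
{
    assert(ip != NULL);
    if (eth_ip_len < 20)
        return SD_IP_DROP_LEN_SMALL;
    if ((ip[0] >> 4) != 4)
        return SD_IP_DROP_VER;

    size_t hdr_len = (size_t)(ip[0] & 0x0F) * 4;
    if (hdr_len < 20)
        return SD_IP_DROP_HDRLEN;

    size_t ip_len = be16(ip + 2);
    if (ip_len < hdr_len + 8)            /* must hold IP and UDP headers */
        return SD_IP_DROP_LEN_SMALL;

    /* Padded frames may carry extra bytes; all others must match exactly. */
    if (ip_len > eth_ip_len || (ip_len < eth_ip_len && !frame_is_padded))
        return SD_IP_DROP_LEN_MISMATCH;

    size_t udp_len = be16(ip + hdr_len + 4);
    if (udp_len != ip_len - hdr_len)
        return SD_IP_DROP_UDP_LEN;

    return SD_IP_OK;
}
```

The UDP length field is read only after the IP length has been bounded by the Ethernet-deduced length, so the read stays inside the received frame.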

10 SD Block Diagram
[Block diagram; recoverable labels follow]
Functions: dl_source(), substrate_decap(), dl_sink()
substrate_decap() steps: Read Eth/IP Hdrs, Validate Ethernet, Read VLAN table, Validate IP, Read UDP hdr, Validate UDP, Prepare ring data
Dispatch-loop steps: init signal; Wait for prev ctx, Signal next ctx, NN Dequeue; Wait for prev ctx, Signal next ctx, NN Enqueue
mem access:
  » DRAM: 5 8B reads
  » SRAM: 2 4B reads
  » DRAM: 2 8B reads
  » add one 4B SRAM increment per counter (none currently for common case)

11 File locations (in …/IPv4_MR/)
Code
  » src/substrate_decap/PL/substrate_decap.[c,h]
  » src/dispatch_loop/PL/substrate_decap_dl.[c,h]
  » src/dispatch_loop/PL/dl_source.[c,h]
    dl_source() and dl_sink() functions
    adds ordered thread synchronization if the following are defined:
      Ø DL_ORDERED
      Ø FIRST_ORDERED_ME
      Ø LAST_ORDERED_ME
  » src/IXP2XXX_book/Chapter09/ordered_signal.[c,h]
    functions for ordered thread synchronization
  » src/dispatch_loop/PL/nn_rings.[c,h]
    functions for enqueuing and dequeuing NN ring data
Data formats
  » src/PL/ipv4_common.h
    IP and UDP structure definitions
  » src/PL/substrate_common.h
    Ethernet VLAN structure definitions
  » src/dispatch_loop/PL/ring_formats.h
    ring data struct defs
  » build/PL/dispatch_loop/dl_system.h
    memory locations

12 Required Includes
Files
  » IXA_SDK_4.0\microengineC\src\intrinsic.c
  » IXA_SDK_4.0\microengineC\src\rtl.c
Directories
  » IXA_SDK_4.0\src\library\microblocks_library\microc\
  » IXA_SDK_4.0\MicroengineC\include\..\..\..\..\
  » IXA_SDK_4.0\src\library\dataplane_library\microc\
These are required to gain access to the buffer libraries and intrinsic functions!

13 SD Initialization
All memory locations defined in dl_system.h, incl:
  » locations for MAC address
    IPV4_SD_MAC_ADDR_HI32
    IPV4_SD_MAC_ADDR_LO16
  » non-VLAN-specific counters
    IPV4_SD_COUNTERS_BASE
    IPV4_SD_COUNTERS_SIZE
  » VLAN table
    IPV4_SD_VLAN_CODE_OPT_TABLE_x (BASE, SIZE, ENTRY_SIZE)
  » VLAN-specific memory
    SLICE_DATA_TABLE_x (BASE, SIZE, ENTRY_SIZE, ENTRY_TOTAL)
    IPV4_SD_SLICE_DATA_ENTRY_OFFSET
At least one slice must be initialized to send packets
  » Call init_slice() from system_init.ind
  » Currently 0xaaa initialized by default
  » All counters zeroed
SD caches MAC address in registers
Thread 0 waits for signal from rx

14 Substrate Decap Validation
All validation tests done with 1 thread and substrate_decap_tests.tcs
  » Ethernet validation/counter tests
    invalid Ethernet packet length
    non-VLAN tag protocol ID
    non-locally-addressed packet
    unrecognized VLAN
  » UDP/IP validation/counter tests
    IP ver other than 4
    invalid header length
    length too small
    IP len doesn't match Enet-deduced IP len
    UDP len doesn't match IP-deduced UDP len
  » Watched counters for proper number of increments
Fully valid packet: vlan_ip_udp_ip_udp/tcp (speed_test_all_valid.tcs)
  » Verified all fields of output ring data were as expected
  » Single-thread plus 8-thread
Hardware testing
  » Uses Fred's sp++ utility with a logged trace of the above packets
  » Observed exact same behavior as in simulation

15 SD Other
Bugs
  » substrate IP proto not checked, should correspond to UDP
Untested
  » buffer drops
Data Structures
  » substrate_decap_vlan_table_entry_t
  » substrate_decap_stats_t
  » substrate_decap_vlan_stats_t
  » vlan_ip_header
    ipv4_header_struct
    vlan_header_struct
  » udp_header
Performance
  » coming later

16 IPv4 Parse

17 IPv4 Parse
[Diagram of pipeline blocks: Lookup, Rx, Tx, QM, Parse, Header Format, Substr Decap; slide taken from PlanetLab_Design.ppt]
Main functions
  » Read/align IP header
  » Validate and consume IP header (per RFC1812 5.2.2)
  » Update IP header
    Dec TTL
    Recalc IP checksum
    Write updated checksum to DRAM
  » Read/align L4 (UDP/TCP/other) header
  » Mark exceptions for Header Format
  » Extract fields for Lookup

18 IPv4 MR Functional Blocks
[Diagram of pipeline blocks: Lookup, Rx, Tx, QM, Parse, Header Format, DeMux]
IPv4 Exception Bits
  » Bit 0: TTL = 0 or 1
  » Bit 1: Options
Parse input fields (word order not recoverable from the transcript): Buf Handle (32b), Slice Data Ptr (32b), Slice ID (VLAN) (16b), Code (4b), Reserved (12b), MN Frm Offset (16b), MN Frm Length (16b), Rx IP SAddr (32b), Rx UDP SPort (16b), Rx UDP DPort (16b)
Parse output fields:
  » Buf Handle (32b)
  » IP Pkt Offset (16b), IP Pkt Length (16b)
  » Lookup Key[143-112] Slice ID/Rx UDP DPort (32b)
  » Lookup Key[111-80] DA (32b)
  » Lookup Key[79-48] SA (32b)
  » Lookup Key[47-16] Ports (32b)
  » Lookup Key[15-0] Proto/TCP_Flags (16b), Exception Bits (12b), L Flags (4b)
  » Slice Data Ptr (32b)
  » Reserved (28b), Code (4b)
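As a small illustration of the two exception bits defined above, the following C sketch derives them from a raw IPv4 header. The macro and function names are hypothetical; only the bit meanings come from the slide. "Options" is detected here by a header length field greater than 5 words, which is the standard IPv4 indication that options are present.

```c
#include <assert.h>
#include <stdint.h>

#define IPV4_EXC_TTL     (1u << 0)  /* Bit 0: TTL = 0 or 1 */
#define IPV4_EXC_OPTIONS (1u << 1)  /* Bit 1: IP options present */

/* Returns the exception bits for the IPv4 header at ip.
 * Only bits 0 and 1 of the 12-bit field are defined on the slide. */
uint16_t ipv4_exception_bits(const uint8_t *ip)
{
    assert(ip != 0);
    uint16_t bits = 0;
    if (ip[8] <= 1)               /* TTL is byte 8 of the header */
        bits |= IPV4_EXC_TTL;
    if ((ip[0] & 0x0F) > 5)       /* IHL above 5 words means options */
        bits |= IPV4_EXC_OPTIONS;
    return bits;
}
```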

19 IPv4 Internal Header Formats
Header fields: Zeros (4b), Type (6b), Len (6b), Type Dependent Data (8B)
Type-dependent fields shown: Rx UDP DPort (2B), Tx UDP DPort (2B), Tx UDP SPort (2B), Tx IP DAddr (4B)
Source | Category | Type bit field | Reason | Internal Hdr | RMPE Action
Ingress LC | Normal Fwd | | | None | Classify and fwd
GPE | No Classify (w/ FwdKey**) | [0] | Original pkt, reinjected to data path | Rx UDP DPort + FwdKey | Perform substrate lookup to resolve LCAddr, port and QID
GPE | Classify (w/o FwdKey) | [1] | ICMP or local traffic | Rx UDP DPort | Classify and fwd
4 bits at start discriminate between IPv4 and internal headers
For more details see planetlab_IPv4_MR_parse_hdr_format.ppt in bdh4\techx\IPv4_MR_shared
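The 4-bit discriminator can be illustrated with a short C sketch: an IPv4 header starts with version nibble 4, while an internal header starts with four zero bits. The function names are hypothetical, and the bit positions assumed for Type and Len (Zeros in bits 15-12, Type in bits 11-6, Len in bits 5-0 of the first big-endian 16-bit word) are a guess from the field widths on the slide, not a confirmed layout.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { HDR_IPV4, HDR_INTERNAL, HDR_UNKNOWN } hdr_kind_t;

/* Classify the header that starts at p from its first 4 bits. */
hdr_kind_t classify_header(const uint8_t *p)
{
    assert(p != 0);
    unsigned nibble = p[0] >> 4;
    if (nibble == 4)
        return HDR_IPV4;      /* IP version 4 */
    if (nibble == 0)
        return HDR_INTERNAL;  /* Zeros (4b) field of the internal header */
    return HDR_UNKNOWN;
}

/* Assumed layout: zeros[15:12] type[11:6] len[5:0], big-endian. */
unsigned internal_hdr_type(const uint8_t *p)
{
    assert(p != 0);
    unsigned word = ((unsigned)p[0] << 8) | p[1];
    return (word >> 6) & 0x3F;
}

unsigned internal_hdr_len(const uint8_t *p)
{
    assert(p != 0);
    unsigned word = ((unsigned)p[0] << 8) | p[1];
    return word & 0x3F;
}
```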

20 Parse Validation
IPv4_parse_tests.tcs
  » Invalid internal header
    invalid len for internal header type
    internal header type unknown
  » Invalid IPv4 (RFC 1812 checks)
    IP ver other than 4
    invalid header length
    length too small
    SD IP len doesn't match packet IP len
    invalid header checksum
  » IPv4 Exceptions
    options flag set in packet
    TTL equals zero
    TTL equals one
IPv4_parse_valid.tcs
  » Fully valid, no-exceptions packets
    from GPE, classify
    from GPE, non-classify
    ingress, TCP
    ingress, UDP

21 Parse Block Diagram
[Block diagram; recoverable labels follow]
Functions: dl_source(), ipv4_parse(), dl_sink()
ipv4_parse() steps: Read Int Hdr, Handle Internal, Read IP, Validate IP, Checksum, Read L4, Handle L4, Prepare ring data
Dispatch-loop steps: init signal; Wait for prev ctx, Signal next ctx, NN Dequeue; Wait for prev ctx, Signal next ctx, NN Enqueue
mem access:
  » DRAM: 2 8B reads
  » DRAM: 4 8B reads
  » (DRAM: 2 8B reads)
  » add one 4B SRAM increment per counter (none currently for common case)

22 File locations (in …/IPv4_MR/)
Code
  » src/ipv4/PL/ipv4_parse.[c,h]
  » src/dispatch_loop/PL/parse_dl.[c,h]
  » src/parse/PL/parse.[c,h]
  » src/dispatch_loop/PL/dl_source.[c,h]
    dl_source() and dl_sink() functions
    adds ordered thread synchronization if the following are defined:
      Ø DL_ORDERED
      Ø FIRST_ORDERED_ME
      Ø LAST_ORDERED_ME
  » src/IXP2XXX_book/Chapter09/ordered_signal.[c,h]
    functions for ordered thread synchronization
  » src/dispatch_loop/PL/nn_rings.[c,h]
    functions for enqueuing and dequeuing NN ring data
Data formats
  » src/PL/ipv4_common.h
    IP and UDP structure definitions
  » src/dispatch_loop/PL/ring_formats.h
    ring data struct defs
  » build/PL/dispatch_loop/dl_system.h
    memory locations

23 Parse Initialization
All memory locations defined in dl_system.h, incl:
  » VLAN-specific memory
    SLICE_DATA_TABLE_x (BASE, SIZE, ENTRY_SIZE, ENTRY_TOTAL)
    IPV4_PARSE_SLICE_DATA_ENTRY_OFFSET
At least one slice must be initialized to send packets
  » Call init_slice() from system_init.ind
  » Currently 0xaaa initialized by default
  » All counters zeroed

24 Other
Bugs
  » none?
Untested
  » buffer drops
Unimplemented
  » checksum for IP options not handled yet
Data Structures
  » parse_vlan_stats_t
  » ipv4_header_struct
  » udp_header_struct
  » tcp_header_struct
Performance
  » coming next

25 Performance

26 Packet Sizes
Ethernet VLAN Header     18B
Substrate Header
  IPv4 Header            20B
  UDP Header             8B
Metanet Frame
  GPE to MPE             n
  IPv4 Header            20B
  UDP Header             8B
  Payload                n
Ethernet Pad             0
Ethernet FCS             4B
Total                    78B + internal + payload
Ethernet IFS             12B
Total Physical           90B + internal + payload

27 Cycle Budget (min eth packets)
To hit 5 Gb/s rate:
  » 76B per min Ethernet packet (64B min Eth + 12B IFS)
  » 1.4 GHz clock rate
  » 5 Gb/sec * 1B/8b * packet/76B = 8.22 Mp/sec
  » 1.4 Gcycle/sec * 1 sec/8.22 Mp = 170.2 cycles per packet
  » compute budget: 170 cycles
  » latency budget: (threads*170)
    4 threads: 680 cycles
    8 threads: 1360 cycles

28 Cycle Budget (IPv4 MN packets)
To hit 5 Gb/s rate:
  » 90B per min IPv4 MN packet (78B min IPv4 MN + 12B IFS)
  » 1.4 GHz clock rate
  » 5 Gb/sec * 1B/8b * packet/90B = 6.94 Mp/sec
  » 1.4 Gcycle/sec * 1 sec/6.94 Mp = 201.6 cycles per packet
  » compute budget: 201 cycles
  » latency budget: (threads*201)
    4 threads: 804 cycles
    8 threads: 1608 cycles
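The two cycle budgets follow from the same arithmetic, which can be sketched as a pair of small C helpers. The function names are hypothetical; the inputs (1.4 GHz clock, 5 Gb/s line rate, 76B and 90B on the wire) are the ones used on the slides. Without intermediate rounding this gives roughly 170 and 202 cycles per packet, and the slides round down to 170 and 201 for the compute budget.

```c
#include <assert.h>

/* Cycles one microengine can spend per packet while keeping up with
 * line_rate_bps, given bytes_on_wire per packet (including the IFS). */
double cycles_per_packet(double clock_hz, double line_rate_bps,
                         double bytes_on_wire)
{
    assert(clock_hz > 0 && line_rate_bps > 0 && bytes_on_wire > 0);
    double packets_per_sec = line_rate_bps / (8.0 * bytes_on_wire);
    return clock_hz / packets_per_sec;
}

/* With N threads interleaved on one microengine, each packet may be in
 * flight for N compute budgets before the next one must take its place. */
double latency_budget(double compute_budget_cycles, int threads)
{
    assert(compute_budget_cycles > 0 && threads > 0);
    return compute_budget_cycles * threads;
}
```

For example, `latency_budget(170, 8)` reproduces the 1360-cycle figure for 8 threads.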

29 Performance Anomalies
Substrate Decap
Spot the issue! (these issues have since been fixed!)
[Figure; recoverable labels: more DRAM contention, unhidden DRAM latency]

30 Substrate Decap Performance
Optimized common case (ingress, no options)
  » Combined initial header checks
  » No options assumed, so a single DRAM read
153 cycles typical
~650 cycles latency
337 control store instructions
Expected performance
  » (201/153) * 5 Gb/s = ~6.5 Gb/s expected performance
Simulated performance (as of 9/26/2006)
  » >5 Gb/s, but something else slows down 6 Gb/s input

31 SD Optimizations
possible optimizations
  » caching VLAN-to-CodeOption table in Local Memory
  » optimize nn_dequeue_incr() via assembly coding
  » move VLAN counter computation off fast path?
  » use transfer regs directly
    saves 9 cycles
  » remove volatile statements

32 Parse Performance
single-threaded
  » ~380 cycles for computation
  » 1708 cycles latency
  » 556 control store insts
Expected performance
  » (201/380) * 5 Gb/s = <3 Gb/s expected performance
Going to optimize a bit before adding all 8 threads

33 Parse Optimizations
possible optimizations
  » incremental IPv4 checksum update per RFC1624
  » checksum computation in assembler
  » optimized 5LW alignment for IP read
  » combined initial error-check to optimize common case
    reduces branch delays
    slows down exception path
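The incremental update mentioned above can be sketched in portable C using equation 3 of RFC 1624, HC' = ~(~HC + ~m + m'), applied to the 16-bit word that holds TTL and Protocol. This is an illustration, not the parse block's code: the function names `ip_checksum`, `csum_update16`, and `ipv4_dec_ttl` are hypothetical, and the full-header checksum is included only so the result can be cross-checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full one's-complement checksum over len bytes (len must be even).
 * Over a valid header, checksum field included, it returns 0. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL && len % 2 == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') in one's-complement arithmetic. */
uint16_t csum_update16(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
    uint32_t sum = (uint16_t)~hc;
    sum += (uint16_t)~m_old;
    sum += m_new;
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement TTL (byte 8) and patch the checksum (bytes 10-11) in place. */
void ipv4_dec_ttl(uint8_t *hdr)
{
    assert(hdr != NULL && hdr[8] > 0);
    uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    hdr[8]--;
    uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    uint16_t hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
    hc = csum_update16(hc, old_word, new_word);
    hdr[10] = (uint8_t)(hc >> 8);
    hdr[11] = (uint8_t)(hc & 0xFF);
}
```

The update touches three 16-bit values instead of summing the whole header, which is where the cycle savings would come from.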

34 Implementation Status
Parse needs
  » error testing
  » IP options with checksum
  » multithreading
  » drop tests

37 Extra Slides

38 Parse Memory Usage
Memory reads/writes
  » 2 8B DRAM reads: unaligned internal header
  » 2 8B DRAM reads: unaligned internal header + FwdKey
  » 4 8B DRAM reads: unaligned IPv4 header
  » [0,6] DRAM reads: unaligned IPv4 header options
  » 4 8B DRAM reads: unaligned L4 header
  » 1 SRAM increment: per counter
  » 1 DRAM write: updated TTL and checksum

39 Ethernet Validation
First, read packet from memory, guaranteed aligned
Not specific to any VLAN - in separate mem area
For efficiency, can keep counters in LM and update to RAM when a signal is triggered
    typedef struct _substrate_decap_stats_t {
        unsigned int rx;        // received
        unsigned int pass;      // passed to next stage
        unsigned int dropLen;   // invalid Ethernet packet length
        unsigned int dropTPID;  // non-VLAN tag protocol ID
        unsigned int dropDst;   // non-locally-addressed packet
        unsigned int dropVLAN;  // unrecognized VLAN
    } substrate_decap_stats_t;

40 UDP/IP Validation
    typedef struct _substrate_decap_slice_stats_t {
        unsigned int dropIPVer;        // IP ver other than 4
        unsigned int dropHdrLen;       // invalid header length
        unsigned int dropLenSmall;     // length too small
        unsigned int dropLenMismatch;  // IP len doesn't match Enet IP len
        unsigned int dropUDPLen;       // UDP len doesn't match IP UDP len
        unsigned int pass;             // passed to next stage
    } substrate_decap_slice_stats_t;

41 RFC 1812 5.2.2 IP Header Validation
(1) The packet length reported by the Link Layer must be large enough to hold the minimum length legal IP datagram (20 bytes).
(2) The IP checksum must be correct.
(3) The IP version number must be 4. If the version number is not 4 then the packet may be another version of IP, such as IPng or ST-II.
(4) The IP header length field must be large enough to hold the minimum length legal IP datagram (20 bytes = 5 words).
(5) The IP total length field must be large enough to hold the IP datagram header, whose length is specified in the IP header length field.
from http://www.faqs.org/rfcs/rfc1812.html
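A compact C sketch of the five checks quoted above. The function name `rfc1812_check` and its result codes are hypothetical. The checksum test (2) is run last here because it needs a validated header length, and the extra bound of the header length against the link-layer length is a safety addition for this sketch, not part of the RFC list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    IP_OK = 0,
    IP_ERR_LINK_LEN,   /* (1) link-layer length below 20 bytes */
    IP_ERR_CKSUM,      /* (2) header checksum incorrect */
    IP_ERR_VERSION,    /* (3) version is not 4 */
    IP_ERR_HDR_LEN,    /* (4) header length field below 5 words */
    IP_ERR_TOTAL_LEN   /* (5) total length smaller than header length */
} ip_check_t;

ip_check_t rfc1812_check(const uint8_t *ip, size_t link_len)
{
    assert(ip != NULL);
    if (link_len < 20)
        return IP_ERR_LINK_LEN;
    if ((ip[0] >> 4) != 4)
        return IP_ERR_VERSION;

    size_t hlen = (size_t)(ip[0] & 0x0F) * 4;
    if (hlen < 20 || hlen > link_len)   /* second test: bounds safety only */
        return IP_ERR_HDR_LEN;

    size_t total = ((size_t)ip[2] << 8) | ip[3];
    if (total < hlen)
        return IP_ERR_TOTAL_LEN;

    /* One's-complement sum over the header; a valid header folds to 0xFFFF. */
    uint32_t sum = 0;
    for (size_t i = 0; i < hlen; i += 2)
        sum += ((uint32_t)ip[i] << 8) | ip[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    if (sum != 0xFFFF)
        return IP_ERR_CKSUM;

    return IP_OK;
}
```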

