Presentation is loading. Please wait.

Presentation is loading. Please wait.

NetFPGA Spring Camp Day 1

Similar presentations


Presentation on theme: "NetFPGA Spring Camp Day 1"— Presentation transcript:

1 NetFPGA Spring Camp Day 1
Presented by: Andrew W. Moore with Marek Michalski, Neelakandan Manihatty-Bojan, Gianni Antichi Georgina Kalogeridou, Jong Hun Han, Noa Zilberman aided by Yury Audzevich, Dimosthenis Pediaditakis Poznan University of Technology May 20 – 24, 2013

2 Tutorial Outline Background The Base Reference Router
Introduction The NetFPGA Platform The Base Reference Router Motivation: Basic IP review Example: Reference Router running on the NetFPGA Infrastructure Tree Build System Scripts The Life of a Packet Through the NetFPGA Hardware Datapath Interface to software: Exceptions and Host I/O Implementation Module Template Write Crypto NIC using a static key Simulation and Debug Write and Run Simulations for Crypto NIC Concluding Remarks

3 Section I: Motivation

4 NetFPGA = Networked FPGA
A line-rate, flexible, open networking platform for teaching and research

5 NetFPGA consists of… Four elements: NetFPGA board
Tools + reference designs Contributed projects Community NetFPGA 1G Board NetFPGA 10G Board

6 NetFPGA-10G A major upgrade over the 1Gb/s predecessor
State-of-the-art technology

7 10 Gigabit Ethernet 4 SFP+ Cages AEL2005 PHY 10G Support 1G Support
Direct Attach Copper 10GBASE-R Optical Fiber 1G Support 1000BASE-T Copper 1000BASE-X Optical Fiber

8 Others QDRII-SRAM RLDRAM-II PCI Express x8 Expansion Slot 27MB
Storing routing tables, counters and statistics RLDRAM-II 288MB Packet Buffering PCI Express x8 PC Interface Expansion Slot

9 Xilinx Virtex 5 TX240T Optimized for ultra high-bandwidth applications
48 GTX Transceivers 4 hard Tri-mode Ethernet MACs 1 hard PCI Express Endpoint

10 NetFPGA Board Comparison
NetFPGA 1G NetFPGA 10G 4 x 1Gbps Ethernet Ports 4 x 10Gbps Ethernet Ports 4.5 MB ZBT SRAM 64 MB DDR2 SDRAM 27 MB QDRII-SRAM 288 MB RLDRAM-II PCI PCI Express x8 Virtex II-Pro 50 Virtex 5 TX240T

11 Beyond Hardware NetFPGA-10G Board Xilinx EDK based IDE
Reference designs with ARM AXI4 Software (embedded and PC) Public Repository Public Wiki GitHub, User Community MicroBlaze SW PC SW Xilinx EDK Reference Designs AXI4 IPs

12 NetFPGA board Networking Software running on a standard PC
PC with NetFPGA Networking Software running on a standard PC CPU Memory PCIe A hardware accelerator built with a Field Programmable Gate Array driving 10 Gigabit network links FPGA Memory 10GE

13 Tools + Reference Designs
Compile designs Verify designs Interact with hardware Reference designs: Router (HW) Switch (HW) Network Interface Card (HW) Router Kit (SW) SCONE (SW)

14 Contributed Projects Project Contributor OpenFlow switch
Stanford University IPv4 Router Pisa & Cambridge Uni. RLDRAM I/F Cambridge Univerity ARP Reply Encap & DCAP Learning CAM Switch More projects:

15 Community Wiki Documentation Encourage users to contribute Forums
User’s Guide “so you just got your first NetFPGA” Developer’s Guide “so you want to build a …” Encourage users to contribute Forums Support by users for users Active community - 10s-100s of posts/week Incubating the 10G mailing-list into a forum

16 International Community
Over 1,000 users, using 2,000 cards at 150 universities in 40 countries

17 NetFPGA’s Defining Characteristics
Line-Rate Processes back-to-back packets Without dropping packets At full rate of 10 Gigabit Ethernet Links Operating on packet headers For switching, routing, and firewall rules And packet payloads For content processing and intrusion prevention Open-source Hardware Similar to open-source software Full source code available BSD-Style License for 1G and LGPL 2.1 for 10G But harder, because Hardware modules must meeting timing Verilog & VHDL Components have more complex interfaces Hardware designers need high confidence in specification of modules

18 Test-Driven Design Regression tests Example: Internet Router
Have repeatable results Define the supported features Provide clear expectation on functionality Example: Internet Router Drops packets with bad IP checksum Performs Longest Prefix Matching on destination address Forwards IPv4 packets of length bytes Generates ICMP message for packets with TTL <= 1 Defines how packets with IP options or non IPv4 … and dozens more … Every feature is defined by a regression test

19 Who, How, Why Who uses the NetFPGA? How do they use the NetFPGA?
Teachers Students Researchers How do they use the NetFPGA? To run the Router Kit To build modular reference designs IPv4 router 4-port NIC Ethernet switch, … Why do they use the NetFPGA? To measure performance of Internet systems To prototype new networking systems

20 Spring Camp Objectives
Overall picture of NetFPGA How reference designs work How you can work on a project NetFPGA Design Flow Directory Structure, library modules and projects How to utilize contributed projects Interface/Registers How to verify a design (Simulation and Regression Tests) Things to do when you get stuck AND… You can build your own projects!

21 Section II: Network review

22 Internet Protocol (IP)
Data to be transmitted: Data Data IP Hdr Data IP Hdr Data IP Hdr IP packets: Ethernet Frames: Eth Hdr Data IP Hdr Eth Hdr Data IP Hdr Eth Hdr Data IP Hdr

23 Internet Protocol (IP)
Data 16 32 4 1 Options (if any) Destination Address Source Address Header Checksum Protocol TTL Fragment Offset Flags Fragment ID Total Packet Length T.Service HLen Ver 20 bytes Data IP Hdr

24 Basic operation of an IP router
D E F R5 D R5 F R3 E D Next Hop Destination

25 Basic operation of an IP router
D E F R5

26 Forwarding tables 32 bits wide → ~ 4 billion unique address
IP address Naïve approach: One entry per address Entry Destination Port 1 2 232 12 ~ 4 billion entries Improved approach: Group entries to reduce table size Entry Destination Port 1 2 50 12

27 IP addresses as a line Your computer My computer Stanford Berkeley
Asia North America 232-1 All IP addresses Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default)

28 Longest Prefix Match (LPM)
Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) Matching entries: Stanford North America Everywhere Universities Continents Planet Most specific Data To: Stanford

29 Longest Prefix Match (LPM)
Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) Universities Continents Matching entries: North America Everywhere Planet Most specific Data To: Canada

30 Implementing Longest Prefix Match
Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) Searching Most specific FOUND Least specific

31 Basic components of an IP router
Management & CLI Routing Protocols Software Control Plane Routing Table Data Plane per-packet processing Forwarding Table Switching Queuing Hardware

32 IP router components in NetFPGA
Linux SCONE Management & CLI Management & CLI Routing Protocols OR Software Routing Protocols Routing Table Routing Table Router Kit Output Port Lookup Input Arbiter Output Queues Hardware Forwarding Table Switching Queuing

33 Section III: Example I

34 Operational IPv4 router
Java GUI Routing Table Protocols Management & CLI SCONE Software Control Plane Switching Forwarding Table Queuing Reference router Data Plane per-packet processing Hardware

35 Streaming video

36 Streaming video NetFPGA running reference router PC & NetFPGA
(NetFPGA in PC)

37 Video streaming over shortest path
Streaming video Video streaming over shortest path Video client Video server

38 Streaming video Video client Video server

39 Observing the routing tables
Columns: Subnet address Subnet mask Next hop IP Output ports

40 Example 1 (called exercise in the file store)

41 Review NetFPGA as IPv4 router: Reference hardware + SCONE software
Routing protocol discovers topology Demo: Ring topology Traffic flows over shortest path Broken link: automatically route around failure

42 Section III: Example II

43 Buffers in Routers Internal Contention Pipelining Congestion Rx Tx Rx

44 Buffer Sizing Story

45 Using NetFPGA to explore buffer size
Need to reduce buffer size and measure occupancy Alas, not possible in commercial routers So, we will use the NetFPGA instead Objective: Use the NetFPGA to understand how large a buffer we need for a single TCP flow.

46 Reference Router Pipeline
MAC RxQ CPU Input Arbiter Output Port Lookup TxQ Output Queues Five stages Input interfaces Input arbitration Routing decision and packet modification Output queuing Output interfaces Packet-based module interface Pluggable design

47 Extending the Reference Pipeline
MAC RxQ CPU RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ Input Arbiter Output Port Lookup Event Capture Output Queues MAC TxQ CPU TxQ Rate Limiter MAC TxQ CPU TxQ MAC TxQ CPU TxQ MAC TxQ CPU TxQ

48 Enhanced Router Pipeline
MAC RxQ CPU Input Arbiter Output Port Lookup TxQ Output Queues Rate Limiter Event Capture Two modules added Event Capture to capture output queue events (writes, reads, drops) Rate Limiter to create a bottleneck

49 extended reference router
Topology for Exercise 2 Recall: NetFPGA host PC is life-support: power & control So: The host PC may physically route its traffic through the local NetFPGA NetFPGA running extended reference router nf2c2 nf2c1 eth2 eth1 PC & NetFPGA (NetFPGA in PC) Iperf Client Iperf Server

50 Example 2

51 Review NetFPGA as flexible platform:
Reference hardware + SCONE software new modules: event capture and rate-limiting Example 2: Client Router Server topology Observed router with new modules Started tcp transfer, look at queue occupancy Observed queue change in response to TCP ARQ

52 Section IV: Infrastructure

53 Infrastructure Tree structure NetFPGA package contents
Reusable Verilog modules Verification infrastructure Build infrastructure Utilities Software libraries

54 NetFPGA package contents
Projects: HW: router, switch, NIC, buffer sizing router SW: router kit, SCONE Reusable Verilog modules Verification infrastructure: simulate full board with PCI-express + physical interfaces run tests against hardware test data generation libraries (eg. packets) Build infrastructure Utilities: register I/O, packaging, … Software libraries

55 Tree Structure (1) projects (including reference designs)
NetFPGA-10G projects (including reference designs) contrib-projects (contributed user projects) lib (custom and reference IP Cores and software libraries) tools (scripts for running simulations etc.) docs (design documentations and user-guides)

56 Tree Structure (2) hw (hardware logic as IP cores)
lib hw (hardware logic as IP cores) std (reference cores) contrib (contributed cores) xilinx (Xilinx provided cores) sw (core specific software drivers/libraries) std (reference libraries) contrib (contributed libraries) xilinx (Xilinx provided libraries)

57 Tree Structure (3) bitfiles (FPGA executables) hw (EDK based project)
projects/reference_nic bitfiles (FPGA executables) hw (EDK based project) data (contains user constraint files) nf10 (contains files used to run various tools) pcores (contains local hardware peripherals) sw embedded (contains code for microblaze) host (contains code for host communication etc.)

58 Reusable logic (IP cores)
Category IP Core(s) I/O interfaces Ethernet MAC (1G/10G) MDIO PCI Express UART GPIO Output queues BRAM based Output port lookup OpenFlow Switch NIC 1G to 10G ports Learning switch (CAM based) NIC Memory interfaces SRAM Flash Miscellaneous PBS* to AXIS bridge FIFOs RBS** to AXI-LITE bridge * PBS – packet bus stream in NF1G ** RBS – register bus stream in NF1G

59 What is a PCore? A brief detour

60 What’s pcore? IP Core? P Core? Pcore? pcore? - all the same.
“Processor Core” in EDK HDL (Verilog/VHDL) + metadata files ≈ module Examples: 10G MAC SRAM Controller NIC Output port lookup

61 HDL (Verilog) All NetFPGA-10G pcores are AXI-compliant
AXI = Advanced eXtensible Interface Used in ARM-based embedded systems Standard interface to bridge pcores together AXI4/AXI4-Lite: Control and status interface AXI4-Stream: Data path interface Xilinx IPs and tool chains are migrating to AXI

62 Pcore metadata files Help EDK put pcores together
PAO: Peripheral Analyze Order Files that needs synthesizing MPD: Microprocessor Peripheral Definition Defines interface of the pcore (e.g. # of AXI’s) Consult Xilinx UG642 for details

63 Verification Infrastructure (1)
Simulation and Debugging built on industry standard Xilinx “iSim” simulator and “Scapy” Python scripts for stimuli construction and verification

64 Verification Infrastructure (2)
iSim a High Level Description (HDL) simulator performs functional and timing simulations for embedded, VHDL, Verilog and mixed designs Scapy a powerful interactive packet manipulation library for creating “test data” provides primitives for many standard packet formats allows addition of custom formats

65 Build Infrastructure (1)
Based on the design flow of: “Xilinx Embedded Development Kit” “ARM’s AXI4 interface specification”

66 Build Infrastructure (2)
Build/Synthesis collection of shared hardware peripherals cores (pcores) stitched together with “AXI4”, “Lite” and “Stream” buses bitfile generation and verification using Xilinx synthesis and implementation tools XST Translate, MAP, PAR BitGen

67 Build Infrastructure (3)
Register system collates and generates addresses for all the registers and memories in a project uses Xilinx EDK – libgen utility to generate “include” files for software applications libgen

68 Inter-Module Communication
Using AXI-4 Stream (Packets are moved as Stream) Module i Module i+1 TDATA TUSER TSTRB TLAST TVALID TREADY

69 AXI4-Stream AXI4-Stream Description TDATA Data Stream TSTRB
Strobe signal (i.e. byte enable) TVALID Valid Indication TREADY Flow control indication TLAST End of packet/burst indication TUSER Out of band metadata

70 Packet Format TLAST TUSER TSTRB TDATA X 0xFF…F Eth Hdr IP Hdr … 1
X 0xFF…F Eth Hdr IP Hdr 1 0x0…1F Last word

71 TUSER Position Content [15:0] length of the packet in bytes [23:16]
source port: one-hot encoded [31:24] destination port: one-hot encoded [127:32] 6 user defined slots, 16bit each

72 TVALID/TREADY Signal timing
No waiting! Assert TREADY/TVALID whenever appropriate TVALID should not depend on TREADY TVALID The behavior of TREADY varies between modules output queues – when data is available (only when pkt is outbound) OPL – when input fifo is not full (almost always) TREADY

73 Byte ordering In compliance to AXI, NF 10G has a specific byte ordering 1st byte of the TDATA[7:0] 2nd byte of the TDATA[15:8]

74 Section V: Life of a Packet

75 Reference Router Pipeline
MAC RxQ CPU Input Arbiter Output Port Lookup Output Queues Five stages Input Input arbitration Routing decision and packet modification Output queuing Output Packet-based module interface Pluggable design

76 Full System Components
Software PCIe Bus NetFPGA PW-OSPF Java GUI Driver AXI Lite user data path DMA Registers nf0 nf1 nf2 nf3 ioctl MAC TxQ RxQ Ethernet CPU SCONE

77 Life of a Packet through the Hardware
port0 port2 y x IP packet

78 Dst MAC = port 0, Ethertype = IP
MAC Rx Queue MAC Rx Queue IP Hdr: IP Dst: , TTL: 64, Csum:0x3ab4 Eth Hdr: Dst MAC = port 0, Ethertype = IP Data

79 Rx Queue Rx Queue Payload IP Hdr:
IP Dst: , TTL: 64, Csum:0x3ab4 Eth Hdr: Dst MAC = port 0, Ethertype = IP Payload Length, Src port, Dst port, User defined TUSER TDATA

80 Input Arbiter Rx Q 4 Input Arbiter Pkt Rx Q 1 Pkt Rx Q 0 Pkt

81 Output Port Lookup Output Port Lookup Payload IP Hdr:
IP Dst: , TTL: 64, Csum:0x3ab4 Eth Hdr: Dst MAC = port 0, Ethertype = IP Payload Length, Src port, Dst port, User defined

82 Output Port Lookup Output Port Lookup 5- Update output port in TUSER
1- Check input port matches Dst MAC Output Port Lookup 6- Modify MAC Dst and Src addresses 2- Check TTL, checksum IP Hdr: IP Dst: , TTL: 63, Csum:0x3ac2 Eth Hdr: Dst MAC= nextHop , Src MAC = port 4, Ethertype = IP Payload Length, Src port, Dst port, User defined 3- Lookup next hop IP & output port (LPM) 7-Decrement TTL and update checksum 4- Lookup next hop MAC address (ARP) TUSER TDATA

83 Output Queues Output Queues OQ0 OQ2 Pkt OQ4

84 MAC Tx Queue MAC Tx Queue Payload IP Hdr:
IP Dst: , TTL: 63, Csum:0x3ac2 Eth Hdr: Dst MAC= nextHop , Src MAC = port 4, Ethertype = IP Payload Length, Src port, Dst port, User defined

85 Length, Src port, Dst port, User defined
MAC Tx Queue MAC Tx Queue Length, Src port, Dst port, User defined IP Hdr: IP Dst: , TTL: 63, Csum:0x3ac2 Eth Hdr: Dst MAC= nextHop , Src MAC = port 4, Ethertype = IP Payload

86 Exception Packets Example: TTL = 0 or TTL = 1
Packet has to be sent to the CPU Host generates an ICMP packet response Difference starts at the Output Port Lookup

87 Exception Packet Path Software PCI Bus PW-OSPF Java GUI Driver DMA
NetFPGA PW-OSPF Java GUI Driver AXI Lite user data path DMA Registers nf0 nf1 nf2 nf3 ioctl MAC TxQ RxQ Ethernet CPU SCONE

88 Output Port Lookup Output Port Lookup
1- Check input port matches Dst MAC Output Port Lookup 2- Check TTL, checksum – EXCEPTION! IP Hdr: IP Dst: , TTL: 1, Csum:0x3ab4 Eth Hdr: Dst MAC= 0 , Src MAC = port x, Ethertype = IP Payload Length, Src port, Dst port, User defined 3- Add output port module

89 Output Queues Output Queues OQ0 OQ1 OQ2 Pkt OQ4

90 CPU Tx Queue CPU Tx Queue Payload IP Hdr:
IP Dst: , TTL: 1, Csum:0x3ab4 Eth Hdr: Dst MAC= 0 , Src MAC = port x, Ethertype = IP Payload Length, Src port, Dst port, User defined

91 Length, Src port, Dst port, User defined
CPU Tx Queue CPU Tx Queue Length, Src port, Dst port, User defined IP Hdr: IP Dst: , TTL: 1, Csum:0x3ab4 Eth Hdr: Dst MAC= 0 , Src MAC = port x, Ethertype = IP Payload

92 ICMP Packet Packet arrives at the CPU Rx Queue from the PCI-express Bus Same path as a packet from the MAC until it reaches the Output Port Lookup (OPL) The OPL module sees the packet is from the CPU Rx Queue 1 and sets the output port directly to 0 The packet continues on the same path as the non-exception packet to the Output Queues and then MAC Tx queue 0

93 ICMP Packet Path Software PCI Bus PW-OSPF Java GUI Driver DMA SCONE
NetFPGA PW-OSPF Java GUI Driver AXI Lite user data path DMA Registers nf0 nf1 nf2 nf3 ioctl MAC TxQ RxQ Ethernet CPU SCONE

94 NetFPGA-Host Interaction
Linux driver interfaces with hardware Packet interface via standard Linux network stack Register reads/writes via ioctl system call with wrapper functions: rdaxi(int address, unsigned *rd_data); wraxi(int address, unsigned *wr_data); eg: rdaxi(0x7d , &val); Named registers are flakey

95 NetFPGA-Host Interaction
NetFPGA to host packet transfer 1. Packet arrives – forwarding table sends to CPU queue 2. Interrupt notifies driver of packet arrival 3. Driver sets up and initiates DMA transfer PCI Bus

96 NetFPGA-Host Interaction
NetFPGA to host packet transfer (cont.) 4. NetFPGA transfers packet via DMA 5. Interrupt signals completion of DMA PCI Bus 6. Driver passes packet to network stack

97 NetFPGA-Host Interaction
Host to NetFPGA packet transfers 3. Interrupt signals completion of DMA 2. Driver sets up and initiates DMA transfer PCI Bus 1. Software sends packet via network sockets Packet delivered to driver

98 NetFPGA-Host Interaction
Register access 2. Driver performs PCIe memory read/write PCI Bus 1. Software makes ioctl call on network socket ioctl passed to driver

99 Section V: Contributed Projects

100 Running the Router Kit User-space development, 4x10GE line-rate forwarding
Usage #1 OSPF BGP My Protocol user kernel Routing Table CPU Memory PCI-Express “Mirror” FPGA Memory IPv4 Router Fwding Table Packet Buffer 10GbE 10GbE 10GbE 10GbE 10GbE 10GbE 10GbE 10GbE

101 Enhancing Modular Reference Designs
Usage #2 Verilog, System Verilog, VHDL, Bluespec…. NetFPGA Driver PW-OSPF EDA Tools (Xilinx, Mentor, etc.) Design Simulate Synthesize Download CPU Memory Java GUI Front Panel (Extensible) PCI-Express In Q Mgmt IP Lookup L2 Parse L3 Out Q Verilog modules interconnected by FIFO interfaces 10GbE 10GbE FPGA 10GbE 10GbE 10GbE My Block 10GbE Memory 10GbE 10GbE

102 (10GE MAC is soft/replaceable)
Creating new systems Usage #3 Verilog, System Verilog, VHDL, Bluespec…. NetFPGA Driver My Design (10GE MAC is soft/replaceable) EDA Tools (Xilinx, Mentor, etc.) Design Simulate Synthesize Download CPU Memory PCI-Express 10GbE 10GbE FPGA 10GbE 10GbE 10GbE 10GbE Memory 10GbE 10GbE

103 NetThreads, NetThreads-RE, NetTM
Geoff Salmon Monia Ghobadi Yashar Ganjali Martin Labrecque Gregory Steffan ECE Dept. CS Dept. U. of Toronto Efficient multithreaded design Parallel threads deliver performance System Features System is easy to program in C Time to results is very short

104 Soft Processors in FPGAs
Ethernet MAC DDR controller Soft processors: processors in the FPGA fabric User uploads program to soft processor Easier to program software than hardware in the FPGA Could be customized at the instruction level

105 Example 1G Contributed Projects
Contributor OpenFlow switch Stanford University Packet generator NetFlow Probe Brno University NetThreads University of Toronto zFilter (Sp)router Ericsson Traffic Monitor University of Catania DFA UMass Lowell

106 10G Contributions Alongside reference projects we have… Project
Contributor Bluespec switch MIT/SRI International Traffic Monitor University of Pisa NF1G legacy on NF10G Uni Pisa & Uni Cambridge Simple/better DMA core Stanford RAMcloud project Your ideas You? ….

107 Future for NetFPGA 10G Non-reference Projects we know about
High precision capture card 4 10GbE ports, GPS input, 8ns resolution. Traffic generator 4 port 10GbE real-traffic generator OpenFlow native port (Verilog) to OVS

108 How might we use NetFPGA?
Build an accurate, fast, line-rate NetDummy/nistnet element A flexible home-grown monitoring card Evaluate new packet classifiers (and application classifiers, and other neat network apps….) Prototype a full line-rate next-generation Ethernet-type Trying any of Jon Crowcrofts’ ideas (Sourceless IP routing for example) Demonstrate the wonders of Metarouting in a different implementation (dedicated hardware) Provable hardware (using a C# implementation and kiwi with NetFPGA as target h/w) Hardware supporting Virtual Routers Check that some brave new idea actually works e.g. Rate Control Protocol (RCP), Multipath TCP, toolkit for hardware hashing MOOSE implementation IP address anonymization SSL decoding “bump in the wire” Xen specialist nic computational co-processor Distributed computational co-processor IPv6 anything IPv6 – IPv4 gateway (6in4, 4in6, 6over4, 4over6, ….) Netflow v9 reference PSAMP reference IPFIX reference Different driver/buffer interfaces (e.g. PFRING) or “escalators” (from gridprobe) for faster network monitors Firewall reference GPS packet-timestamp things High-Speed Host Bus Adapter reference implementations Infiniband iSCSI Myranet Fiber Channel Smart Disk adapter (presuming a direct-disk interface) Software Defined Radio (SDR) directly on the FPGA (probably UWB only) Routing accelerator Hardware route-reflector Internet exchange route accelerator Hardware channel bonding reference implementation TCP sanitizer Other protocol sanitizer (applications… UDP DCCP, etc.) Full and complete Crypto NIC IPSec endpoint/ VPN appliance VLAN reference implementation metarouting implementation virtual <pick-something> intelligent proxy application embargo-er Layer-4 gateway h/w gateway for VoIP/SIP/skype h/w gateway for video conference spaces security pattern/rules matching Anti-spoof traceback implementations (e.g. BBN stuff) IPtv multicast controller Intelligent IP-enabled device controller (e.g. IP cameras or IP powermeters) DES breaker platform for flexible NIC API evaluations snmp statistics reference implementation sflow (hp) reference implementation trajectory sampling (reference implementation) implementation of zeroconf/netconf configuration language for routers h/w openflow and (simple) NOX controller in one… Network RAID (multicast TCP with redundancy) inline compression hardware accelorator for TOR load-balancer openflow with (netflow, ACL, ….) reference NAT device active measurement kit network discovery tool passive performance measurement active sender control (e.g. performance feedback fed to endpoints for control) Prototype platform for NON-Ethernet or near-Ethernet MACs Optical LAN (no buffers) How might we use NetFPGA? Build an accurate, fast, line-rate NetDummy/nistnet element A flexible home-grown monitoring card Evaluate new packet classifiers (and application classifiers, and other neat network apps….) Prototype a full line-rate next-generation Ethernet-type Trying any of Jon Crowcrofts’ ideas (Sourceless IP routing for example) Demonstrate the wonders of Metarouting in a different implementation (dedicated hardware) Provable hardware (using a C# implementation and kiwi with NetFPGA as target h/w) Hardware supporting Virtual Routers Check that some brave new idea actually works e.g. Rate Control Protocol (RCP), Multipath TCP, toolkit for hardware hashing MOOSE implementation IP address anonymization SSL decoding “bump in the wire” Xen specialist nic computational co-processor Distributed computational co-processor IPv6 anything IPv6 – IPv4 gateway (6in4, 4in6, 6over4, 4over6, ….) Netflow v9 reference PSAMP reference IPFIX reference Different driver/buffer interfaces (e.g. PFRING) or “escalators” (from gridprobe) for faster network monitors Firewall reference GPS packet-timestamp things High-Speed Host Bus Adapter reference implementations Infiniband iSCSI Myranet Fiber Channel Smart Disk adapter (presuming a direct-disk interface) Software Defined Radio (SDR) directly on the FPGA (probably UWB only) Routing accelerator Hardware route-reflector Internet exchange route accelerator Hardware channel bonding reference implementation TCP sanitizer Other protocol sanitizer (applications… UDP DCCP, etc.) Full and complete Crypto NIC IPSec endpoint/ VPN appliance VLAN reference implementation metarouting implementation virtual <pick-something> intelligent proxy application embargo-er Layer-4 gateway h/w gateway for VoIP/SIP/skype h/w gateway for video conference spaces security pattern/rules matching Anti-spoof traceback implementations (e.g. BBN stuff) IPtv multicast controller Intelligent IP-enabled device controller (e.g. IP cameras or IP powermeters) DES breaker platform for flexible NIC API evaluations snmp statistics reference implementation sflow (hp) reference implementation trajectory sampling (reference implementation) implementation of zeroconf/netconf configuration language for routers h/w openflow and (simple) NOX controller in one… Network RAID (multicast TCP with redundancy) inline compression hardware accelorator for TOR load-balancer openflow with (netflow, ACL, ….) reference NAT device active measurement kit network discovery tool passive performance measurement active sender control (e.g. performance feedback fed to endpoints for control) Prototype platform for NON-Ethernet or near-Ethernet MACs Optical LAN (no buffers) Well I’m not sure about you but here is a list I created:

109 How might YOU use NetFPGA?
Build an accurate, fast, line-rate NetDummy/nistnet element A flexible home-grown monitoring card Evaluate new packet classifiers (and application classifiers, and other neat network apps….) Prototype a full line-rate next-generation Ethernet-type Trying any of Jon Crowcrofts’ ideas (Sourceless IP routing for example) Demonstrate the wonders of Metarouting in a different implementation (dedicated hardware) Provable hardware (using a C# implementation and kiwi with NetFPGA as target h/w) Hardware supporting Virtual Routers Check that some brave new idea actually works e.g. Rate Control Protocol (RCP), Multipath TCP, toolkit for hardware hashing MOOSE implementation IP address anonymization SSL decoding “bump in the wire” Xen specialist nic computational co-processor Distributed computational co-processor IPv6 anything IPv6 – IPv4 gateway (6in4, 4in6, 6over4, 4over6, ….) Netflow v9 reference PSAMP reference IPFIX reference Different driver/buffer interfaces (e.g. PFRING) or “escalators” (from gridprobe) for faster network monitors Firewall reference GPS packet-timestamp things High-Speed Host Bus Adapter reference implementations Infiniband iSCSI Myranet Fiber Channel Smart Disk adapter (presuming a direct-disk interface) Software Defined Radio (SDR) directly on the FPGA (probably UWB only) Routing accelerator Hardware route-reflector Internet exchange route accelerator Hardware channel bonding reference implementation TCP sanitizer Other protocol sanitizer (applications… UDP DCCP, etc.) Full and complete Crypto NIC IPSec endpoint/ VPN appliance VLAN reference implementation metarouting implementation virtual <pick-something> intelligent proxy application embargo-er Layer-4 gateway h/w gateway for VoIP/SIP/skype h/w gateway for video conference spaces security pattern/rules matching Anti-spoof traceback implementations (e.g. BBN stuff) IPtv multicast controller Intelligent IP-enabled device controller (e.g. IP cameras or IP powermeters) DES breaker platform for flexible NIC API evaluations snmp statistics reference implementation sflow (hp) reference implementation trajectory sampling (reference implementation) implementation of zeroconf/netconf configuration language for routers h/w openflow and (simple) NOX controller in one… Network RAID (multicast TCP with redundancy) inline compression hardware accelorator for TOR load-balancer openflow with (netflow, ACL, ….) reference NAT device active measurement kit network discovery tool passive performance measurement active sender control (e.g. performance feedback fed to endpoints for control) Prototype platform for NON-Ethernet or near-Ethernet MACs Optical LAN (no buffers)

110 Section VI: Example Project

111 Project: Cryptographic NIC
Implement a network interface card (NIC) that encrypts upon transmission and decrypts upon reception

112 Cryptography XOR function XOR written as: ^ ⊻ ⨁
XOR is commutative: (A ^ B) ^ C = A ^ (B ^ C) A B A ^ B 1 XORing a value with itself always yields 0

113 Cryptography (cont.) Simple cryptography: Explanation:
Generate a secret key Encrypt the message by XORing the message and key Decrypt the ciphertext by XORing with the key Explanation: (M ^ K) ^ K = M ^ (K ^ K) Commutativity = M ^ 0 A ^ A = 0 = M

114 Cryptography (cont.) Example: Message: 00111011 Key: 10110001
Message ^ Key ^ Key:

115 Cryptography (cont.) ⨁ Idea: Implement simple cryptography using XOR
32-bit key Encrypt every word in payload with key Note: XORing with a one-time pad of the same length of the message is secure/uncrackable. See: Header Payload Key

116 Section VII: Implementation

117 Embedded Development Kit (1)
A development environment for building “embedded systems” NetFPGA-10G is largely built upon the “EDK”

118 Embedded Development Kit (2)
Xilinx EDK suite is composed of: Platform Studio (XPS), a top level integrated design tool for “hardware” synthesis , implementation and bitstream generation Software Development Kit (SDK), a development environment for “software application” running on embedded processors like Microblaze

119 Xilinx Platform Studio (1)
An XPS project consists of following: system.xmp top level XPS project file system.mhs microprocessor hardware specification file defines the embedded processor system, and includes bus architecture, peripherals, processor, and connectivity etc. system.ucf user constraint file defines constraints such as timing, area, IO placement etc.

120 Xilinx Platform Studio (2)
To invoke XPS design tool, run: xps <project_root>/hw/system.xmp This will open the project in the XPS graphical user interface open a new terminal cd ~/NetFPGA-10G-live/projects/crypto_nic source /opt/Xilinx/13.4/ISE_DS/settings64.sh xps hw/system.xmp

121 XPS Design Tool (1) Bus Assembly System Assembly IP Catalog

122 XPS Design Tool (2) IP Catalog: contains categorized list of all available peripheral cores Bus Assembly: shows connectivity of various modules over AXI bus System Assembly: provides a complete view of instantiated cores Cores can be added and deleted, in this view

123 XPS Design Tool (2) Port view Port view: provides listing of internal and external ports for each peripheral core

124 XPS Design Tool (3) Address view Address view: defines base and high address value for peripheral cores connected to AXI4 or AXI-LITE bus These values can be changed manually, here

125 Getting started with a new project (1)
Projects: Each design represented by a project Location: NetFPGA-10G-live/projects/<proj_name> ~/NetFPGA-10G-live/projects/crypto_nic Consists of: Verilog source Simulation tests Hardware tests Libraries Optional software

126 Getting started with a new project (2)
Normally: copy an existing project as the starting point Today: pre-created project Missing from pre-created project: Verilog files (with crypto implementation) Simulation tests Hardware tests Custom software

127 Getting started with a new project (3)
MAC RxQ MAC RxQ MAC RxQ MAC RxQ CPU RxQ Typically implement functionality in one or more modules under the top wrapper Input Arbiter Output Port Lookup Crypto Crypto module to encrypt and decrypt packets Output Queues MAC RxQ MAC RxQ MAC RxQ MAC RxQ CPU RxQ 127

128 Getting started with a new project (4)
Shared modules included from netfpga/lib/verilog Generic modules that are re-used in multiple projects Specify shared modules in project’s mhs file Local src modules override shared modules crypto_nic: Local Shared crypto.v Everything else

129 Getting started with a new project (5)
Create crypto.v from module_template.v: Rename project project_template to crypto Rename the local module_template.v to crypto.v Change the module name inside crypto.v (first non-comment line of the file) Add the crypto module to the mhs file Update mpd file Update pao file

130 Running XPS cd ~/NetFPGA-10G-live/projects/crypto_nic xps hw/system.xmp

131 Adding The Crypto Module
Double click the Crypto module in IP catalog It’s under “Project Local Pcores” Select 256 bit data width and click OK

132 Connect Crytpo module In “Bus Interfaces”, click Crypto’s S_AXIS and select Input Output_lookup’s M_AXIS Click Output_queue’s S_AXIS and select Crypto’s M_AXIS

133 Connect Clock and Reset
Click “Ports” Tab Clock -> clock_generator_0::CLKOUT1 Reset -> reset_0::Peripheral_aresetn

134 Implementing the Crypto Module (1)
What do we want to encrypt? IP payload only Plaintext IP header allows routing Content is hidden Encrypt bytes 35 onward Bytes 1-14 – Ethernet header Bytes – IPv4 header (assume no options) Remember AXI byte ordering For simplicity, assume all packets are IPv4 without options

135 Implementing the Crypto Module (2)
State machine (draw on next page): Module headers on each packet Datapath 256-bits wide 34 / 32 is not an integer!  Inside the crypto module Registers in_fifo_empty in_fifo_rd_en data/ctrl valid ready S_AXIS M_AXIS

136 Example packet ... No encryption Partial encryption Full encryption
TUSER (128 bits) TDATA (256 bits) No encryption XXX ETH HDR IP Hdr Partial encryption IP Hdr IP Payload IP Payload ... Full encryption N IP Payload UPDATE if TIME.

137 Crypto Module State Diagram
Hint: We suggest 3 states Detect Packet’s Header

138 Implementing the Crypto Module (3)
Implement your state machine inside crypto.v Use a static key initially Suggested sequence of steps: Create a static key value Constants can be declared in the module with localparam: localparam MY_EXAMPLE = 32’h ; Implement your state machine without modifying the packet Update your state machine to modify the packet by XORing the key and the payload Use eight copies of the key to create a 256-bit value to XOR with data words

139 module_template.v (1) Module port declaration
module module_template #( parameter C_FAMILY = "virtex5", parameter C_M_AXIS_DATA_WIDTH=256, parameter C_S_AXIS_DATA_WIDTH=256, parameter C_M_AXIS_TUSER_WIDTH=128, parameter C_S_AXIS_TUSER_WIDTH=128, ... ) ( // Signals // Local assignments // Registers Module port declaration

140 module_template.v (2) // Modules fallthrough_small_fifo #( .WIDTH(C_S_AXIS_DATA_WIDTH+C_S_AXIS_TUSER_WIDTH +(C_S_AXIS_DATA_WIDTH / 8) +1), .MAX_DEPTH_BITS(2) ) input_fifo ( .din (), // Data in .wr_en (), // Write enable .rd_en (), // Read the next word .dout (), // Data out .full (), //FIFO full indication .nearly_full (), //Almost full indication .empty (), //FIFO empty indication .reset (), //Rest .clk () //Clock ); Packet data dumped in a FIFO. Allows some “decoupling” between input and output.

141 module_template.v (3) Registers section.
// -- AXILITE IPIF axi_lite_ipif_1bar #(. . .) axi_lite_ipif_inst ( ); // -- IPIF REGS ipif_regs #(. . .) ipif_regs_inst // Registers logic Registers section. Ignore for now – we’ll explore this later

142 module_template.v (4) // Logic begin // Default values out_wr_int = 0; in_fifo_rd_en = 0; if (!in_fifo_empty && out_rdy) begin out_wr_int = 1; in_fifo_rd_en = 1; end Combinational logic to read data from the FIFO. (Data is output to output ports.) You’ll want to add your state in this section.

143 More Verilog: Assignments 1
Continuous assignments appear outside processes blocks): assign foo = baz & bar; targets must be declared as wires always “happening” (ie, are concurrent)

144 More Verilog: Assignments 2
Non-blocking assignments appear inside processes blocks) use only in sequential (clocked) processes: clk) begin a <= b; b <= a; end occur in next delta (‘moment’ in simulation time) targets must be declared as regs never clock any process other than with a clock!

145 More Verilog: Assignments 3
Blocking assignments appear inside processes blocks) use only in combinatorial processes: (combinatorial processes are much like continuous assignments) begin a = b; b = a; end occur one after the other (as in sequential langs like C) targets must be declared as regs – even though not a register never use in sequential (clocked) processes!

146 More Verilog: Assignments 3
Blocking assignments appear inside processes blocks) use only in combinatorial processes: (combinatorial processes are much like continuous assignments) begin tmp = a; a = b; b = tmp; end occur one after the other (as in sequential langs like C) targets must be declared as regs – even though not a register never use in sequential (clocked) processes! unlike non-blocking, have to use a temporary signal

147 (hints) Never assign one signal from two processes:
clk) begin foo <= bar; end foo <= quux;

148 (hints) In combinatorial processes: always @(*) begin if (cond) begin
take great care to assign in all possible cases begin if (cond) begin foo = bar; end (latches ‹as opposed to flip-flops› are bad for timing closure)

149 (hints) In combinatorial processes: always @(*) begin if (cond) begin
take great care to assign in all possible cases begin if (cond) begin foo = bar; else foo = quux; end

150 (hints) In combinatorial processes: always @(*) begin foo = quux;
(or assign a default) begin foo = quux; if (cond) begin foo = bar; end

151 Section VIII: Simulation and Debug

152 Testing: Simulation Simulation allows testing without requiring lengthy synthesis process NetFPGA simulation environment allows: Send/receive packets Physical ports and CPU Read/write registers Verify results Simulations run in iSim

153 Testing: Simulation We will simulate the “crypto_nic” design under the “10G simulation framework” We will show you how to create simple packets using scapy transmit and reconcile packets sent over 10G Ethernet and PCIe interfaces

154 Testing: Simulation(2)
Run a simulation to verify changes: cd ~/NetFPGA-10G-live/projects/crypto_nic/hw make sim or make simgui make simgui opens isim *more on this later* Now we can implement the crypto functionality

155 cd ~/NetFPGA-10G-live/projects/crypto_nic/hw vim make_pkts.py
This will create eight simple IP packets for injecting into the system from each port including CPU

156 The destination in this code snippet is the DMA
2 Using the created packets we create the stimulus (input to the simulator) and the expected output from the simulator NB. line 97 deciphers the packets from the device To test without a key – edit ~/NetFPGA-10G-live/projects/crypto_nic/hw/Makefile and set the argument of –k to be 0x0

157 The destinations in this code snippet are the ports
3 Using the created packets we create the stimulus (input to the simulator) and the expected output from the simulator NB. line 97 deciphers the packets from the device To test without a key – edit ~/NetFPGA-10G-live/projects/crypto_nic/hw/Makefile and set the argument of –k to be 0x0

158 The results are in… 4 As expected, two packets are received from PCIe on each 10G interface Total of 32 packets, eight from each 10G interface, are received on the PCIe

159 Running simulation in iSim (1)
To invoke iSim for design simulation, run: cd <project_root>/hw/ make simgui This will open simulation in the iSim graphical user interface

160 Running simulation in iSim (2)
Objects panel Waveform window Instance panel Tcl console

161 Running simulation in iSim (3)
Instance panel: displays process and instance hierarchy Objects panel: displays simulation objects associated with the instance selected in the instance panel Waveform window: displays wave configuration consisting of signals and busses Tcl console: displays simulator generated messages and can executes Tcl commands

162 Simulation gone wild When make sim 1
source /opt/Xilinx/13.4/ISE_DS/settings64.sh 2 XPS% make: *** No rule to make target `___xps/simgen.opt’, needed ….. Go fix the address in ….hw/system.mhs 3 ERROR:EDK – for point-2-point interface connector ‘nf10_crypto_0_m_axis’….. 3a Add the crypto module to the MHS file (edit system.mhs OR use the GUI) 3b Check the connectivity of the module (To each of the OPL and Output Queue modules) 4 ….Cannot find MPD for weird core name …. 5 ERROR: Simluator:778 – Static elaboration of top level Verilog design unit(s) in library work failed edit pao file (~/NetFPGA-10G-live/projects/crypto_nic/hw/pcores/nf10_crypto_v1_00_a/data/nf10_crypto_v2_1_0.pao) For simulation uncomment the first four lines (39-42) comment out the last line here (43) For synthesis uncomment lines 39,40,43 comment out lines 41 and 42 6 if sim finishes but complains that each test passes 2 packets but all tests FAIL – this means your static key is different between your code and your Makefile (argument to make_pkts.py) check the log (verilog hhhhhhhk

163 Simple Verilog error  Lots of errors
When the file is empty…

164 Section IX: Conclusion

165 Acknowledgments NetFPGA Team at Stanford University (Past and Present): Nick McKeown, Glen Gibb, Jad Naous, David Erickson, G. Adam Covington, John W. Lockwood, Jianying Luo, Brandon Heller, Paul Hartke, Neda Beheshti, Sara Bolouki, James Zeng, Jonathan Ellithorpe, Sachidanandan Sambandan, Eric Lo NetFPGA Team at University of Cambridge (Past and Present): Andrew Moore, David Miller, Muhammad Shahbaz, Martin Zadnik Matthew Grosvenor, Gianni Antichi, Neelakandan Manihatty-Bojan, Georgina Kalogeridou, Jong Hun Han, Noa Zilberman All Community members (including but not limited to): Paul Rodman, Kumar Sanghvi, Wojciech A. Koszek, Yahsar Ganjali, Martin Labrecque, Jeff Shafer, Eric Keller , Tatsuya Yabe, Bilal Anwer, Yashar Ganjali, Martin Labrecque Kees Vissers, Michaela Blott, Shep Siegel

166 Acknowledgement Support for the NetFPGA-10G project has been provided by the following companies and institutions Disclaimer: Any opinions, findings, conclusions, or recommendations expressed in these materials do not necessarily reflect the views of any sponsors supporting this project.

167 Thanks to our Sponsors:
Support for the NetFPGA project has been provided by the following companies and institutions Disclaimer: Any opinions, findings, conclusions, or recommendations expressed in these materials do not necessarily reflect the views of the National Science Foundation or of any other sponsors supporting this project. This effort is also sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contract FA C This material is approved for public release, distribution unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.


Download ppt "NetFPGA Spring Camp Day 1"

Similar presentations


Ads by Google