TCPSplitter: A Reconfigurable Hardware Based TCP Flow Monitor
David V. Schuehler
4/19/2002
Outline
- Motivation
- Target Platform
- Design
- Possible Applications
- Results
- Conclusion
MOTIVATION
Why work with TCP?
- Over 85% of Internet traffic is TCP based
- The Internet is growing
- TCP is a proven, reliable transport for data delivery
- Give high-speed active networks the ability to work with TCP flows
Why not implement a full TCP stack in hardware?
- Complex protocol stack
- Several interactions on the client interface (sockets?)
- Difficult to achieve high performance
- Large memories required for reassembly
- Limited number of simultaneous connections
Solution
- Develop a TCP flow monitor: TCPSplitter
- Utilize existing hardware infrastructure (FPX)
- Expand upon Layered Protocol Wrappers
TARGET PLATFORM
Configuration
Washington University Gigabit Switch
FPX Module
FPX Internal Structure
- RAD: Reprogrammable Application Device
  - Xilinx XCV1000E FPGA
  - External SRAM/SDRAM
  - Reprogrammable
- NID: Network Interface Device
  - XCV600E FPGA
  - Controls FPX
  - Programs RAD
  - Forwards traffic
DESIGN
Goals
- High-speed design
- Small FPGA footprint
- Simple client interface
- Support for a large number of flows
Challenges
- Dealing with dropped frames
- Packet reordering
- Maintaining state for a large number of flows
- Developing an efficient implementation
- Processing data at line rates
- Minimizing resource requirements
Assumptions/Limitations
- All frames must flow through the switch
- Frames traversing in the opposite direction are handled as a separate flow
- Frames are processed in order for each flow
TCPSplitter Data Flow
Input Processing
- Flow Classification
- TCP Checksum Engine
- Input State Machine
- Control FIFO
- Frame FIFO
- Output State Machine
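The slide only names a TCP Checksum Engine. As a point of reference, the standard TCP checksum test (a one's-complement sum over the IPv4 pseudo-header and the TCP segment) can be sketched in Python. The function names are illustrative, IPv4 is assumed, and this software model says nothing about how the hardware engine is organized.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum_ok(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Return True if the TCP segment (header + payload) has a valid checksum.

    src_ip and dst_ip are 4-byte IPv4 addresses (IPv4 is an assumption here).
    """
    # Pseudo-header: source, destination, zero byte, protocol 6 (TCP), TCP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    # With the transmitted checksum included, a valid segment sums to 0xFFFF.
    return ones_complement_sum(pseudo + segment) == 0xFFFF
```

A segment that fails this test corresponds to the "invalid TCP checksum" case, which the design drops.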
Layout
Packet Routing Decisions
- Forward to outbound IP stack only
- Forward to both Client App and outbound IP stack
- Discard packet
Packet Routing
- Non-TCP packets -> IP stack
- Invalid TCP checksum -> drop
- TCP SYN packets -> IP stack
- (Seq # < Expected Seq #) -> IP stack
- (Seq # > Expected Seq #) -> drop
- Else -> client AND IP stack
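The routing rules above can be written as a small decision function. This is a minimal Python sketch, not the hardware logic: the names `Route` and `route_packet` are illustrative, the rules are assumed to apply in the order listed, and sequence numbers are compared as plain integers, which ignores 32-bit wraparound.

```python
from enum import Enum


class Route(Enum):
    IP_STACK_ONLY = "forward to outbound IP stack only"
    CLIENT_AND_IP_STACK = "forward to client application and outbound IP stack"
    DROP = "discard packet"


def route_packet(is_tcp, checksum_ok, is_syn, seq, expected_seq):
    """Apply the slide's routing rules, first match wins (ordering assumed)."""
    if not is_tcp:
        return Route.IP_STACK_ONLY       # non-TCP traffic passes through
    if not checksum_ok:
        return Route.DROP                # invalid TCP checksum
    if is_syn:
        return Route.IP_STACK_ONLY       # TCP SYN packets
    if seq < expected_seq:
        return Route.IP_STACK_ONLY       # data the client has already seen
    if seq > expected_seq:
        return Route.DROP                # ahead of the expected sequence number
    return Route.CLIENT_AND_IP_STACK     # in-order data


print(route_packet(True, True, False, seq=1000, expected_seq=1000).value)
```

Only a packet whose sequence number matches the expected value reaches the client, which is consistent with the stated goal of delivering an in-order byte stream per flow.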
Client Interface (signals between TCPSplitter and the Client Application)
- Clock: 1 bit
- Reset: 1 bit
- Data Word: 32 bits
- Data Enable: 1 bit
- Start/End of Data Signals: 4 bits
- Valid Data Bytes: 2 bits
- Flow Identifier: N bits
- Start/End of Flow Signals: 2 bits
- TCA: 1 bit
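To illustrate how little a client needs from this interface, here is a hedged Python model of a byte-counting client. The `ClientWord` and `ByteCounter` names are hypothetical, the exact signal semantics are assumptions (Data Enable is taken to mark a word carrying TCP payload, and Valid Data Bytes is modeled as a plain count because the slide does not give its 2-bit encoding), and clock, reset, TCA, and the start/end of data signals are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClientWord:
    """One clock cycle's worth of client-interface signals (simplified)."""
    data: int                    # 32-bit data word
    data_enable: bool            # assumed: word carries TCP payload
    valid_bytes: int             # assumed: count of valid bytes in this word
    flow_id: int                 # N-bit flow identifier
    start_of_flow: bool = False
    end_of_flow: bool = False


class ByteCounter:
    """Hypothetical client that counts TCP payload bytes per flow."""

    def __init__(self):
        self.active = defaultdict(int)   # flow_id -> bytes seen so far
        self.finished = {}               # flow_id -> final byte count

    def on_word(self, w: ClientWord) -> None:
        if w.start_of_flow:
            self.active[w.flow_id] = 0
        if w.data_enable:
            self.active[w.flow_id] += w.valid_bytes
        if w.end_of_flow:
            self.finished[w.flow_id] = self.active.pop(w.flow_id, 0)


counter = ByteCounter()
counter.on_word(ClientWord(0x48454C4C, True, 4, flow_id=7, start_of_flow=True))
counter.on_word(ClientWord(0x4F000000, True, 1, flow_id=7))
counter.on_word(ClientWord(0, False, 0, flow_id=7, end_of_flow=True))
print(counter.finished)
```

The client keeps no TCP state of its own; it only reacts to the flow identifier and the framing signals it is handed.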
POSSIBLE APPLICATIONS
Possible Application 1: Simultaneous update of multiple active network nodes
Possible Application 2: Dynamic loading of customizable QoS algorithms
Possible Application 3: Monitoring content of all TCP flows for security
RESULTS
Synthesis Results for Xilinx XCV1000E-7
                       | TCPSplitter      | Full Wrappers (Cell + Frame + IP + TCP + Client)
Space/LUTs             | 617 (2%)         | 4954 (20%)
Register bits          | 503 (2%)         | 4933 (20%)
Input processing delay | 7 clock cycles * | 44-68 clock cycles *
* Plus length of packet in 32-bit words
Sample Run (figure; annotated signals: start of frame, byte count, IP payload, TCP payload, end of frame, flow ID)
Current State of Research
- Developed and simulated design
- Handles 256 simultaneous flows
  - 33 bits * 256 entries = 1,056 bytes
- Synthesizes at 74 MHz
- Simple test client counts TCP data bytes
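The 74 MHz clock and the 32-bit data word give a rough upper bound on throughput, and the synthesis table's input processing delay (7 cycles plus the packet length in 32-bit words) gives a per-packet latency. The Python sketch below works both out, assuming one 32-bit word is consumed on every clock cycle, which the slides do not state explicitly; the function names are illustrative.

```python
CLOCK_HZ = 74_000_000        # synthesized clock rate from the slide
WORD_BITS = 32               # width of the data word
FIXED_DELAY_CYCLES = 7       # TCPSplitter input processing delay


def peak_throughput_bps(clock_hz=CLOCK_HZ, word_bits=WORD_BITS):
    """Upper bound on throughput, assuming one word per clock cycle."""
    return clock_hz * word_bits


def input_delay_seconds(packet_bytes, clock_hz=CLOCK_HZ):
    """Input processing delay: fixed cycles plus packet length in 32-bit words."""
    words = -(-packet_bytes // 4)        # ceiling division
    return (FIXED_DELAY_CYCLES + words) / clock_hz


print(f"peak throughput: {peak_throughput_bps() / 1e9:.3f} Gbit/s")
print(f"delay for a 1500-byte packet: {input_delay_seconds(1500) * 1e6:.2f} us")
```

Under that assumption the bound is 74 MHz x 32 bits = 2.368 Gbit/s, comfortably above 1 Gbit/s.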
Future Directions
- Execute design in hardware
- Increase the number of simultaneous flows
  - 262,144 flows require only 1 MByte (+) RAM
- Develop more elaborate client applications
- Improve processing performance
- Implement sliding window - passive solution
- Enhance frame generation utility for simulations
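Both memory figures follow from the 33 bits of state kept per flow. The slides do not break those 33 bits down (a 32-bit expected sequence number plus one flag bit would fit, but that is a guess). The short Python check below reproduces the arithmetic; `flow_table_bytes` is an illustrative name.

```python
BITS_PER_FLOW = 33           # per-flow state, from the slides


def flow_table_bytes(flows, bits_per_flow=BITS_PER_FLOW):
    """Bytes needed to hold per-flow state, rounded up to a whole byte."""
    total_bits = flows * bits_per_flow
    return (total_bits + 7) // 8


for flows in (256, 262_144):
    size = flow_table_bytes(flows)
    print(f"{flows:>7} flows: {size:>9,} bytes ({size / 2**20:.3f} MiB)")
```

For 256 flows this gives 1,056 bytes. For 262,144 flows it gives 1,081,344 bytes, slightly more than 1 MiB, which matches the "1 MByte (+)" on the slide.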
CONCLUSION
Conclusion
- Runs on a reconfigurable hardware platform
- Processes packets at gigabit line rates
- Monitors all TCP flows
- Generates the proper byte stream for each flow
- Requires only minimal memory (33 bits/flow)
- Simple client interface demonstrated
Acknowledgments
- Advisor: Dr. John Lockwood
Questions