CprE / ComS 583 Reconfigurable Computing


1 CprE / ComS 583 Reconfigurable Computing
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University
Lecture #8 – Applications II

2 Recap – Splash 1 Architecture
[Figure: Splash 1 board architecture: VME bus and VSB bus interfaces, input and output FIFOs, and a linear array of 32 FPGAs (F0–F31), each with a local memory (M0–M31).]

3 Recap – Splash 2 Architecture
[Figure: Splash 2 board architecture: FPGAs F0–F16 with local memories M0–M16, chained from the previous board to the next board and connected through a central crossbar switch.]
The crossbar switch takes 9 nibbles in, each of which can be routed to any destination. It can also change between 8 configurations on the fly. Its full reprogrammability was never exploited: the designers never used more than 4 modes, and often only 1.

4 Recap – Splash 2 Architecture
[Figure: Splash 2 system architecture: Sparc-based host, interface board, and array boards connected by the R-Bus, S-Bus, and SIMD bus, with external input and output ports.]
The external I/O ports were probably inspired by the PAM, which allowed direct connections to external devices. The S-Bus is used either as an extension of the SBus line from the host to load programs, or as a return path for data from the array boards. The R-Bus is used in Systolic Mode; the SIMD bus is used in SIMD Mode. The lines on the right of the figure are used to chain the boards together in Systolic Mode using the R-Bus.

5 Recap – Dictionary Search
Shift amount: 7 bits
Hash function steps:
  Clear the hash register
  Input the letters "th": temporary result becomes the result for "th"
  Input the letters "e_": temporary result becomes the result for "the_"
  At each step, XOR the two-character value with the temporary result and the hash function, then rotate the result
Different hash function for each FPGA
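The hashing step above can be sketched in software. A minimal sketch, assuming a 16-bit hash register and a made-up per-FPGA constant (HASH_CONST); only the two-character XOR, the XOR with the per-FPGA hash value, and the 7-bit rotate come from the slide:

```c
#include <stdint.h>

#define HASH_CONST  0x9E37u   /* stand-in for the per-FPGA hash constant */
#define ROTATE_BITS 7         /* shift amount from the slide             */

static uint16_t rotl16(uint16_t v, unsigned n)
{
    return (uint16_t)((v << n) | (v >> (16u - n)));
}

/* Fold one two-character group ("th", "e_", ...) into the hash register. */
uint16_t hash_step(uint16_t hash, char c0, char c1)
{
    uint16_t pair = (uint16_t)(((uint8_t)c0 << 8) | (uint8_t)c1);
    return rotl16(hash ^ pair ^ HASH_CONST, ROTATE_BITS);
}
```

Starting from a cleared register and calling hash_step once per character pair ("th", then "e_") reproduces the sequence of temporary results described above.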

6 Outline
Recap
The Field-Programmable Port Extender (FPX)
  FPX Architecture
  FPX Programming Model
  FPX Applications
    Pattern Matching
    Packet Classification
    Rule Processing

7 Application – Network Processing
Networking applications are well suited for reconfigurable hardware:
  Target signatures change often
  Massive quantities of stream-based data
  Repetitive operations
Connecting up to a realistic networking environment is hard:
  Washington University's experimental setup is one of the best
  Shows the importance of both memory and processing capability
  Numerous experiments performed over the past five years

8 Network Routing with the FPX
FPX modules are distributed across each port of a switch
IP packets (over ATM) enter and depart via the line card
Packet fragments are processed by modules
Advantages:
  New protocols implemented directly in silicon
  Easy to upgrade in the field
The FPX processes packets at the edge of a switch, between the fiber-optic line card and the backplane switch fabric. The FPX sees flows in both the ingress direction (input to the switch) and the egress direction (exit from the switch). The throughput of the packet-processing system must equal or exceed the maximum link speed of the fiber-optic card. The current FPX handles link rates up to OC-48.

9 FPX Hardware Device
The FPX is implemented on a 12-layer circuit board

10 FPX Hardware in a WUGS-20 Switch
Washington University Gigabit Switch (WUGS)
Up to 160 Gbps of bandwidth

11 FPGA-based Router
FPX module contains two FPGAs:
  NID – network interface device
    Performs data queuing
  RAD – reprogrammable application device
    Specialized control sequences

12 Reprogrammable Application Device
Spatial re-use of FPGA resources:
  Modules implemented using FPGA logic
  Module logic can be individually reprogrammed
Shared access to off-chip resources:
  Memory interfaces to SRAM and SDRAM
  Common datapath to send and receive data

13 Architecture of the FPX
RAD:
  Large Xilinx FPGA
  Attaches to SRAM and SDRAM
  Reprogrammable over the network
  Provides two user-defined module interfaces
NID:
  Provides Utopia interfaces between the switch and line card
  Forwards cells to the RAD
  Programs the RAD
The FPX is implemented with two FPGAs, the RAD and the NID. The Reprogrammable Application Device (RAD) contains the user-defined, application-specific modules and connects to the external SRAM and SDRAM. The NID contains the logic to interface with the switch and line card, route traffic flows to the RAD, and reprogram the modules on the RAD.

14 Architecture of the FPX (cont.)

15 FPX SRAM
Provides low latency for fast table lookups
Zero Bus Turnaround (ZBT) allows back-to-back read/write operations every 10 ns
Dual, independent memories
36-bit wide bus
The SRAM memories are pipelined so that they can be fully utilized. With ZBT memory, an access has four cycles of latency. The SRAM is well suited for fast memory lookups.
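As a rough check of these figures: since ZBT allows a new 36-bit read or write every 10 ns cycle, each SRAM can sustain on the order of 100 million table lookups per second; the four-cycle access latency only delays when a result returns, not how often a new address can be issued.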

16 FPX SDRAM
Dual, independent SDRAM memories
64-bit wide, 100 MHz
64 Mb / module: 128 Mb total [expandable]
Burst-based transactions [1–8 word transfers]
Latency of 14 cycles to read/write an 8-word burst
SDRAM provides cost-effective storage of data. A 64-bit wide, pipelined module allows full-throughput queuing of traffic to and from the RAD.
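As a rough throughput check: an 8-word burst moves 8 × 64 = 512 bits in 14 cycles at 100 MHz (140 ns), or roughly 3.7 Gb/s per SDRAM module, comfortably above the OC-48 (about 2.5 Gb/s) link rates mentioned earlier.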

17 Routing Traffic Flows
Traffic flows are routed among:
  Switch
  Line card
  RAD.Switch
  RAD.LineCard
NID functions:
  Check packets for errors
  Process commands (control, status, & reprogramming)
  Implement per-flow forwarding
The NID provides a non-blocking, multi-port switch for forwarding data to the appropriate module as flows enter the system.

18 Typical Flow Configurations
Possible flow configurations:
  Default flow action (bypass)
  Egress processing (per-flow output queueing)
  Ingress processing (IP routing)
  Full RAD processing (packet routing and reassembly)
  Full loopback testing (system test)
  Partial loopback testing (egress flow processing test)
Several configurations of flow routing are possible. By default, the FPX forwards all flows directly between the ingress and egress ports. To process packets only on the egress path, flows are routed from the switch to a RAD module and then from the RAD module to the line card. To process packets on the ingress path, flows are routed from the line card to a RAD module and then from the RAD module to the switch. Using both modules, packets can be processed on both the ingress and egress paths. Modules can also be chained, allowing packets to be processed by multiple modules as they pass through the FPX.

19 Reprogramming Logic
NID programs itself at boot from an EPROM
Switch controller writes the RAD configuration memory to the NID
  The configuration file for the RAD arrives over the network via control cells
Switch controller issues a {full/partial} reconfigure command
NID reads the RAD configuration memory to program the RAD
  Performs complete or partial reprogramming of the RAD

20 FPX Interfaces
Provides:
  A well-defined interface: Utopia-like 32-bit fast data interface; flow control allows back-pressure
  Flow routing: arbitrary permutations of packet flows through ports
  Dynamic reprogrammability: other modules continue to operate even while a new module is being reprogrammed
  Memory access: shared access to SRAM and SDRAM via a request/grant protocol

21 Pattern Matching using the FPX
Use hardware to detect a pattern in data
Modify packet based on match
Pipeline operation to maximize throughput

22 “Hello, World” Module Function

23 Logical Implementation
On a VCI match, append "WORLD" to the payload, producing a new cell.
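A minimal software sketch of this match-and-modify step (the real module is FPGA logic operating on ATM cells); the VCI constant, the cell structure, and the in-place rewrite standing in for building a new cell are all illustrative assumptions:

```c
#include <stdint.h>
#include <string.h>

#define CELL_PAYLOAD 48                 /* ATM cell payload size in bytes */
#define TARGET_VCI   0x32u              /* hypothetical VCI to match      */

typedef struct {
    uint16_t vci;
    uint8_t  payload[CELL_PAYLOAD];
} cell_t;

/* If the cell is on the matched flow and carries "HELLO", write " WORLD"
 * right after it (a stand-in for appending the data in a new cell).      */
int hello_world_module(cell_t *c)
{
    if (c->vci != TARGET_VCI)
        return 0;                       /* not the matched flow: pass through */
    for (int i = 0; i + 11 <= CELL_PAYLOAD; i++) {
        if (memcmp(&c->payload[i], "HELLO", 5) == 0) {
            memcpy(&c->payload[i + 5], " WORLD", 6);
            return 1;
        }
    }
    return 0;
}
```

In hardware, the comparison and rewrite are pipelined over the cell as it streams through, so the module sustains full line rate instead of scanning a buffered copy.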

24 The Wrapper Concept
[Figure: an application core ("App") nested inside successive wrapper layers.]

25 AAL5 Encapsulation
Payload is packed in cells
Padding may be added
64-bit trailer at the end of the cell
Trailer contains a CRC-32
Last-cell indication bit (last bit of the PTI field)
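A minimal software sketch of the framing described above, assuming the standard CRC-32 polynomial and leaving the UU/CPI trailer bytes zero; byte layout details are simplified for illustration:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), for illustration. */
static uint32_t crc32_bitwise(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Pack a payload into an AAL5 PDU: pad so payload + pad + 8-byte trailer
 * fills whole 48-byte cells, store the original length, and close with a
 * CRC-32 over everything before the CRC field.  Returns the PDU length.   */
size_t aal5_encapsulate(const uint8_t *payload, uint16_t len, uint8_t *out)
{
    size_t total = ((len + 8u + 47u) / 48u) * 48u;   /* whole number of cells */
    memcpy(out, payload, len);
    memset(out + len, 0, total - len);               /* padding + UU/CPI = 0  */
    out[total - 6] = (uint8_t)(len >> 8);            /* 16-bit length field   */
    out[total - 5] = (uint8_t)(len & 0xFF);
    uint32_t crc = crc32_bitwise(out, total - 4);
    out[total - 4] = (uint8_t)(crc >> 24);           /* CRC-32 ends the PDU   */
    out[total - 3] = (uint8_t)(crc >> 16);
    out[total - 2] = (uint8_t)(crc >> 8);
    out[total - 1] = (uint8_t)(crc);
    return total;
}
```

The last-cell indication bit lives in the ATM cell header's PTI field rather than in the PDU itself, so it is set by whatever logic segments this PDU into cells.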

26 HelloBob Module
[Figure: HelloBob module block diagram: SRAM interface; cell processor, frame processor, IP processor, and UDP processor stages; UDP "Hello Bob" input and echoed output.]

27 Results: Performance
Operating frequency: 119 MHz
  8.4 ns critical path, well within the 10 ns period of the RAD's clock
  Targeted to the RAD's V1000E-FG680-7
Maximum packet processing rate: 7.1 million packets per second
  (100 MHz) / (14 clocks/cell)
  Circuit handles back-to-back packets
Slice utilization: 0.4% (49 / 12,288 slices)
  Less than one half of one percent of chip resources
The search technique can be adapted to other types of data matching and modification:
  Regular expressions
  Parsing image content
  …

28 CAM-based Packet Matching
Sample packet:
  Source Address = 128.252.5.5 (dotted decimal)
  Destination Address = 141.142.2.2 (dotted decimal)
  Source Port = 4096 (decimal)
  Destination Port = 50 (decimal)
  Protocol = TCP (6)
  Payload = "Consolidate your loans. CALL NOW"
  Payload lists = { General SPAM (0), Save Money SPAM (1) }
  Content Vector = "00000011" (binary) = x"03" (hex)
Flattened 112-bit key (bits 111..0): Content = 03 | Src IP (hex) = 80FC0505 | Dest IP (hex) = 8D8E0202 | Src Port = 1000 | Dest Port = 0050 | Proto = 06

29 Sample Filter
  Source Address = 128.252.0.0 / 16
  Destination Address = 141.142.0.0 / 16
  Source Port = Don't Care
  Destination Port = 50
  Protocol = TCP (6)
  Payload includes general SPAM (List 0)
Filter value: Content = 01 | Src IP (hex) = 80FC0000 | Dest IP (hex) = 8D8E0000 | Src Port = 0000 | Dest Port = 50 | Proto = 06
Filter mask (1 = care, 0 = don't care): Content = 01 | Src IP (hex) = FFFF0000 | Dest IP (hex) = FFFF0000 | Src Port = 0000 | Dest Port = FFFF | Proto = FF
IP packet key: Content = 03 | Src IP (hex) = 80FC0505 | Dest IP (hex) = 8D8E0202 | Src Port = 1000 | Dest Port = 0050 | Proto = 06
DROP the packet: it matches the filter

30 Packet Classifier with FlowID
[Figure: CAM table of N entries, each holding a 112-bit CAM mask, a 112-bit CAM value, and a 16-bit Flow ID. The 112-bit key is formed from bits in the IP header (source address, destination address, source port, destination port, protocol) plus the payload match bits (flow list). Mask matchers and value comparators test the key against every entry, and a priority encoder produces the resulting flow identifier.]
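The lookup above amounts to a ternary match: each entry's mask marks which key bits must equal the stored value, and a priority encoder picks the first matching entry. A minimal software sketch, using a 64-bit key for brevity (the FPX key is 112 bits wide) and hypothetical names:

```c
#include <stdint.h>

typedef struct {
    uint64_t value;     /* CAM VALUE: bits the key must equal         */
    uint64_t mask;      /* CAM MASK: 1 = care, 0 = don't care         */
    uint16_t flow_id;   /* Flow ID returned on a match                */
} cam_entry_t;

/* Return the FlowID of the first (highest-priority) matching entry,
 * or 0 when no entry matches.                                        */
uint16_t cam_lookup(const cam_entry_t *table, int n, uint64_t key)
{
    for (int i = 0; i < n; i++)              /* table order = priority */
        if (((key ^ table[i].value) & table[i].mask) == 0)
            return table[i].flow_id;
    return 0;
}
```

The hardware evaluates every entry in parallel rather than looping, which is what makes the CAM lookup a single-pass operation.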

31 Fast IP Lookup Algorithm
Function: search for the best matching prefix using a trie algorithm
Example prefix table (Prefix, Next Hop): * 01* 4 7 10* 2 110* 9 0001* 1 1011* 00110* 5 01011* 3 1
A trie is a tree-based data structure for storing strings in order to support fast pattern matching. The name "trie" comes from the word "retrieval", since the main application of tries is in information retrieval.
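A minimal software sketch of best-matching-prefix search on a plain binary trie (the FIPL engine performs its trie lookups against off-chip SRAM; the node layout and the next_hop == 0 convention below are illustrative):

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct trie_node {
    struct trie_node *child[2];   /* children for the next address bit 0/1 */
    int next_hop;                 /* 0 = no prefix ends at this node       */
} trie_node;

static trie_node *new_node(void) { return calloc(1, sizeof(trie_node)); }

/* Insert a prefix of prefix_len bits (taken MSB-first from addr). */
void trie_insert(trie_node *root, uint32_t addr, int prefix_len, int next_hop)
{
    trie_node *n = root;
    for (int i = 0; i < prefix_len; i++) {
        int bit = (addr >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = new_node();
        n = n->child[bit];
    }
    n->next_hop = next_hop;
}

/* Walk the trie along the address bits, remembering the last next hop seen:
 * that is the longest (best) matching prefix.                              */
int trie_lookup(const trie_node *root, uint32_t addr)
{
    int best = 0;
    const trie_node *n = root;
    for (int i = 0; i < 32 && n; i++) {
        if (n->next_hop) best = n->next_hop;
        n = n->child[(addr >> (31 - i)) & 1];
    }
    if (n && n->next_hop) best = n->next_hop;
    return best;
}
```

Inserting the table's prefixes and then looking up an address returns the next hop of the longest prefix that matches, with the root entry (*) acting as the default route.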

32 Hardware Implementation in the FPX
[Figure: block diagram of the RAD and NID FPGAs: an SRAM1 interface with request/grant signals, logic to remap VCIs for IP packets and extract IP headers, the IP lookup engine, an on-chip cell store, a packet reassembler backed by SRAM2, and a control cell processor, sitting between the line card (LC) and switch (SW) ports.]

33 Pipelined FIPL Operations
[Figure: pipeline diagram with stages Generate Address, Latch ADDR into SRAM, SRAM D <= M[A], Latch Data into FPGA, and Compute, plotted against time (cycles) and space (parallel lookup units on the FPGA).]
Throughput is optimized by interleaving memory accesses:
  Operate 5 parallel lookups
  t_pipelined_lookup = 550 ns / 5 = 110 ns
  Throughput = 9.1 million packets / second

34 Other Modules Implemented
IPv4 CAM Filter: 104-bit header matching
Fast IP Lookup (FIPL): longest prefix match; MAE-West at 10M pkts/second
Packet Content Scanner: regular expression search
Data Queueing: per-flow queue in SDRAM
IPv6 Tunneling Module: tunnels IPv6 over IPv4
Statistics Module: event counter
Traffic Generator: per-flow mixing
Video Recoder: Motion JPEG
Embedded Processor: KCPSM

35 Summary
Field-Programmable Port Extender (FPX):
  Network-accessible hardware
  Reprogrammable Application Device
Module deployment:
  Modules implement fast processing on data flows
  Network allows arbitrary topologies of distributed systems
Project website

