A Search Memory Substrate for High Throughput and Low Power Packet Processing Sangyeun Cho, Michel Hanna and Rami Melhem Dept. of Computer Science University.

1 A Search Memory Substrate for High Throughput and Low Power Packet Processing Sangyeun Cho, Michel Hanna and Rami Melhem Dept. of Computer Science University of Pittsburgh

2 Background
[Figure: end users, routers, subnets, and ISPs connected through the Internet]

3 Network packet processing tasks
• Packet forwarding
  - Given an IP address
  - Look up a matching prefix in a table (IP table)
  - Make sure the chosen prefix is the longest → LPM (Longest Prefix Matching) requirement
• Rule-based packet filtering
  - Given a set of packet fields (src/dst IP, src/dst port, protocol, …)
  - Look up matching entries in a rule database
• Deep packet inspection
  - Given a string in the packet payload
  - Look up matching entries in a signature database
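The LPM requirement on this slide can be illustrated with a minimal sketch. The table contents, next-hop labels, and the function name `longest_prefix_match` are made up for illustration, and addresses are written as bit strings for readability. This is a plain linear scan, not the method the talk proposes.

```python
def longest_prefix_match(address_bits, table):
    """Return the (prefix, next_hop) pair whose prefix is the longest one
    matching address_bits, or None if no prefix matches."""
    best = None
    for prefix, next_hop in table.items():
        if address_bits.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, next_hop)
    return best


# Hypothetical routing table: prefix bits -> next hop label.
table = {"0": "A", "10": "B", "0110": "C", "01101": "D"}

# "0", "0110" and "01101" all match; LPM picks the longest, "01101".
result = longest_prefix_match("01101110", table)
print(result)
```

The scan touches every entry, which is exactly the cost that tries, TCAMs, and CA-RAM each try to avoid in different ways.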

4 Lookup performance scalability
• Lookup performance must match increasing line speeds
  - For OC-768, up to 104M packets must be processed per second
  - Network traffic has doubled every year [McKeown03]
  - Router capacity doubles every 18 months
• Capacity pressure
  - Routing tables (~200K prefixes in a core router) are growing [RIS]
  - The number of firewall rules is increasing; 100K rules are practical [Baboescu04]
  - IPv6
• Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]
• Two conventional lookup solutions
  - Software methods (tries, hash tables, …)
  - Hardware methods (TCAM, Bloom filter, …)
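The slide does not say how the 104M packets/s figure is derived. One set of assumptions that reproduces it is a line rate of roughly 40 Gb/s for OC-768 and back-to-back 48-byte packets. Both numbers are assumptions for this sketch, not values from the talk.

```python
# Assumptions (not stated on the slide): a line rate of roughly 40 Gb/s
# and a worst-case stream of back-to-back 48-byte packets.
LINE_RATE_BITS_PER_SECOND = 40e9
PACKET_SIZE_BYTES = 48

packets_per_second = LINE_RATE_BITS_PER_SECOND / (PACKET_SIZE_BYTES * 8)
time_per_lookup_ns = 1e9 / packets_per_second

print(round(packets_per_second / 1e6, 1), "M packets/s")
print(round(time_per_lookup_ns, 1), "ns available per packet")
```

Under these assumptions each lookup has to complete, on average, in under ten nanoseconds, which is why the number of dependent memory accesses per lookup matters so much.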

5 IP lookup using a trie
• Consider an IP address:
  [Figure: trie lookup example]
• "Flexibility"
• High memory capacity requirement
• Low memory bandwidth utilization
→ not SCALABLE
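As a rough illustration of the trie approach, here is a minimal binary trie with one node per prefix bit. The class and function names and the sample prefixes are hypothetical. The lookup also counts node visits, since each visit stands for one dependent memory access that returns very little data.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps "0" / "1" to a child TrieNode
        self.next_hop = None  # set only where a prefix ends


def insert(root, prefix, next_hop):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = next_hop


def lookup(root, address_bits):
    """Walk the trie bit by bit, remembering the last prefix seen.
    Returns (next_hop, number_of_nodes_visited_below_the_root)."""
    node = root
    best = root.next_hop
    visits = 0
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        visits += 1
        if node.next_hop is not None:
            best = node.next_hop
    return best, visits


root = TrieNode()
for prefix, hop in [("0", "A"), ("0110", "C"), ("01101", "D")]:
    insert(root, prefix, hop)

print(lookup(root, "01101110"))
```

In this sketch the lookup of `01101110` visits five nodes one after another before it can return, whereas a TCAM or CA-RAM lookup is meant to resolve in a fixed, small number of accesses.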

6 IP lookup using TCAM
• Consider an IP address:
  [Figure: TCAM holding the prefixes 01000*, 01100*, 01101*, 11011*, 0100*, 0110*, 1101*, 10*, 0*; sort before storing; choose the first among the matched]
• High bandwidth, constant-time lookup
• TCAMs are relatively small and expensive
• Power consumption is very high
→ not SCALABLE
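The "sort before storing, choose the first among the matched" rule can be modeled in software as follows. The prefixes are the ones listed on the slide; the search addresses and the function name `tcam_lookup` are made up. A real TCAM compares every entry in the same cycle, so the list comprehension only stands in for that parallel compare.

```python
def tcam_lookup(sorted_entries, address_bits):
    """Model of a TCAM search: every entry is compared against the key,
    then a priority encoder returns the first (topmost) match."""
    match_lines = [address_bits.startswith(p) for p in sorted_entries]
    for index, hit in enumerate(match_lines):
        if hit:
            return sorted_entries[index]
    return None


prefixes = ["0", "10", "0100", "0110", "1101",
            "01000", "01100", "01101", "11011"]

# Sort before storing: longest prefixes first, so that the first match
# is also the longest match.
entries = sorted(prefixes, key=len, reverse=True)

print(tcam_lookup(entries, "01101110"))
```

Because every stored bit takes part in every search, the match logic is active across the whole array on each lookup, which is the source of the power cost noted on the slide.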

7 Recap: Why is TCAM inefficient?
• All bits are involved in matching
• Large embedded match logic
• "Large" means more work in this case

8 CA-RAM: a hybrid approach
• Can we do better than the existing schemes?
  - Flexibility and search performance
  - Exploit optimized RAM designs
  - Hardware approach (software too slow)
• CA-RAM combines hashing w/ hardware parallel matching
• CA-RAM design goals
  - High lookup performance
  - Low power consumption
  - Smaller chip area per stored datum
  - Straightforward system-level integration

9 CA-RAM: Content Addressable RAM
• Separate match logic and memory
• Match logic for a single row, not every row
• Allows the use of dense RAM technology
• Enables highly reconfigurable match logic
• (Keep keys sorted in each row, not in entire array)
[Figure: match logic and memory cells in a conventional CAM/TCAM vs. CA-RAM]

10 Very simple, yet efficient
• Use hashing to store keys in a particular row
• To look up, hash the key and retrieve one row
• Perform matching on the entire row in parallel
• Achieve (full) content addressability w/o paying overhead!
[Figure: key, index generator, rows of stored keys (Key i1, Key i2, …; Key j1, Key j2, …), match processors 1, 2, …]
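The three steps on this slide (hash the key, fetch one row, match the whole row) can be sketched as a small software model. The class name `CARAMModel`, the modulo hash, and the sample keys are placeholders chosen for illustration, not the design evaluated in the talk.

```python
class CARAMModel:
    """Software model of a hashed, row-organized search memory."""

    def __init__(self, num_rows, entries_per_row):
        self.num_rows = num_rows
        self.entries_per_row = entries_per_row
        self.rows = [[] for _ in range(num_rows)]

    def index(self, key):
        # Placeholder index generator; a real design would pick the
        # hash function carefully to spread keys across rows.
        return key % self.num_rows

    def insert(self, key, data):
        row = self.rows[self.index(key)]
        if len(row) >= self.entries_per_row:
            return False  # bucket overflow, not handled in this sketch
        row.append((key, data))
        return True

    def lookup(self, key):
        # One memory access fetches the whole row ...
        row = self.rows[self.index(key)]
        # ... and the match processors compare every entry in it at once
        # (modeled here by a comprehension over the row).
        matches = [data for stored_key, data in row if stored_key == key]
        return matches[0] if matches else None


cam = CARAMModel(num_rows=4, entries_per_row=2)
cam.insert(5, "a")
cam.insert(9, "b")    # 5 and 9 both hash to row 1
print(cam.lookup(9))
```

Only the entries of one row take part in each search, so the match logic can be shared by all rows instead of being replicated per cell.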

11 Pipelined CA-RAM operation
[Figure: search key, index generator, index, memory rows (Key i1-i3, Key j1-j3), match processors 1-3, result]
Step 1: Index generation
Step 2: Memory access
Step 3: Key matching
Step 4: Result forwarding
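A small scheduling sketch shows why pipelining the four stages helps throughput. It assumes each stage takes exactly one cycle and that a new lookup can enter every cycle. Neither assumption is stated on the slide, and the function name is hypothetical.

```python
STAGES = ["index generation", "memory access",
          "key matching", "result forwarding"]


def pipeline_schedule(num_lookups):
    """Map each cycle to the (lookup, stage) pairs active in that cycle,
    assuming one cycle per stage and one new lookup issued per cycle."""
    schedule = {}
    for lookup in range(num_lookups):
        for stage_number, stage in enumerate(STAGES):
            cycle = lookup + stage_number
            schedule.setdefault(cycle, []).append((lookup, stage))
    return schedule


schedule = pipeline_schedule(5)
total_cycles = len(schedule)
print(total_cycles)
print(schedule[1])
```

Under these assumptions, five lookups finish in 5 + 3 cycles rather than 5 × 4, and in steady state one result is produced per cycle.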

12 Dealing w/ bucket overflows
• Careful design of hash function
• Increase bucket size
  - Reduce load factor (α); α = # of occupied entries / # of total entries
  - Trade off space for performance
• Use "chaining"; store overflows in subsequent rows
  - Multiple accesses per lookup
• Use a small overflow CAM, accessed in parallel
  - Similar to popular "victim caching" in computer architecture
• Use two-level hashing and employ multiple CA-RAM banks
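The load factor definition and the overflow-CAM option can be sketched together. The key set, table geometry, and modulo hash below are invented for the example; the overflow list simply stands in for a small CAM searched alongside the main array.

```python
def build_table(keys, num_rows, entries_per_row):
    """Place keys into hashed rows; keys that do not fit in their row
    are spilled to an overflow store (standing in for a small CAM)."""
    rows = [[] for _ in range(num_rows)]
    overflow = []
    for key in keys:
        row = rows[key % num_rows]  # placeholder hash function
        if len(row) < entries_per_row:
            row.append(key)
        else:
            overflow.append(key)
    return rows, overflow


def load_factor(rows, entries_per_row):
    # alpha = number of occupied entries / number of total entries
    occupied = sum(len(row) for row in rows)
    return occupied / (len(rows) * entries_per_row)


def contains(rows, overflow, key):
    # The row and the overflow store are searched together, so a hit
    # in either one still costs a single lookup step in this model.
    return key in rows[key % len(rows)] or key in overflow


rows, overflow = build_table([0, 4, 8, 12, 1, 2, 3],
                             num_rows=4, entries_per_row=2)
alpha = load_factor(rows, entries_per_row=2)
print(alpha, overflow)
```

In this toy case the table is only 62.5% full, yet two keys spill because the hash sends four keys to the same row. That is the trade-off between bucket size, load factor, and hash quality that the slide lists.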

13 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• …

14 Adapting key size
• Adapting key size is straightforward
• Will benefit supporting multiple apps/standards
[Figure: reconfigurable match logic selects key bits for matching over the stored keys (Key i1, Key i2, Key i3; Key j1, Key j2, Key j3) and outputs match information]

15 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• Binary and ternary matching
  - Some apps require ternary matching, some don't
• …

16 Supporting binary/ternary matching
• Developed configurable comparator
• T-matching requires 2 bits / 1 symbol
• Supporting different types of matching in different bit positions is feasible
[Figure: reconfigurable match logic compares the search key against stored keys (Key i1, Key i2; Key j1, Key j2) and masks (Mask i1, Mask j1), considering mask bits or not, and outputs match information]
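One common way to spend two bits per ternary symbol is a value bit plus a "care" mask bit. The sketch below assumes that encoding; the slide shows keys and masks but does not spell out the encoding, so treat the layout and the function name as assumptions.

```python
def ternary_match(search_key, stored_value, care_mask):
    """True if search_key equals stored_value in every bit position
    where care_mask is 1; positions with mask 0 are "don't care"."""
    return ((search_key ^ stored_value) & care_mask) == 0


# The 8-bit ternary entry 0110**** stored as a value and a mask.
value = 0b01100000
mask = 0b11110000

print(ternary_match(0b01101110, value, mask))  # upper four bits agree
print(ternary_match(0b01001110, value, mask))  # differs in a cared-for bit

# Binary (exact) matching is the same comparator with all mask bits
# set, i.e. with the mask effectively ignored.
EXACT = 0b11111111
print(ternary_match(0b01100000, value, EXACT))
```

Since the mask is applied per bit, the same comparator can treat some bit positions as ternary and others as binary, in line with the slide's last bullet.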

17 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• Binary and ternary matching
  - Some apps require ternary matching, some don't
• Storing data and keys in a CA-RAM module
  - Cuts # of memory accesses for IP lookup by half
• …

18 Simult. key matching & data access
• Data access follows TCAM lookup
• CA-RAM supports data embedding
• Cuts memory traffic & latency by half
[Figure: reconfigurable match logic matches the search key against stored keys (Key i1, Key i2; Key j1, Key j2) and bypasses the embedded data (Data i1, Data j1), outputting match information & data]

19 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• Binary and ternary matching
  - Some apps require ternary matching, some don't
• Storing data and keys in a CA-RAM module
  - Cuts # of memory accesses for IP lookup by half
• Providing range checking capabilities
  - Beneficial for rule-based packet filtering

20 Supporting range checking
• (Range checking causes troubles)
• (Entries must be expanded)
• CA-RAM can support range checking efficiently
[Figure: reconfigurable match logic matches the search key against stored keys (Key i1, Key j1) and checks ranges (Range i1, Range j1), outputting match information]
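The parenthetical remarks presumably refer to the usual practice of expanding a range into several prefix entries when only ternary matching is available. The sketch below contrasts that expansion with a direct bounds check. The function names and the 4-bit example range are illustrative only.

```python
def range_to_prefixes(low, high, width):
    """Cover the integer range [low, high] with ternary prefixes of
    `width` bits, using the largest aligned block at each step."""
    prefixes = []
    while low <= high:
        # Largest power-of-two block that is aligned at `low` ...
        size = (low & -low) if low else (1 << width)
        # ... shrunk until it no longer runs past `high`.
        while size > high - low + 1:
            size //= 2
        wild = size.bit_length() - 1  # number of don't-care bits
        fixed = width - wild          # number of fixed leading bits
        head = format(low >> wild, "b").zfill(fixed) if fixed else ""
        prefixes.append(head + "*" * wild)
        low += size
    return prefixes


def in_range(search_key, low, high):
    # Direct range check: one stored entry and two comparisons.
    return low <= search_key <= high


expanded = range_to_prefixes(1, 14, width=4)
print(expanded)
print(in_range(9, 1, 14))
```

The single range [1, 14] needs six prefix entries after expansion, while a comparator that checks bounds directly needs only one stored entry.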

21 Evaluation
• We implemented a CA-RAM design (w/ reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs
• We experimented with real routing tables to estimate the load factor and the average memory accesses per lookup

22 Mapping a large IP routing table
Consider multiple design points:
[Figure: Designs A-F; array configurations shown include 2,048 rows × (32 entries) and 4,096 rows × (64 entries); load factors shown: α = 0.47, 0.40, 0.36, 0.24, 0.36]

23 Mapping a large IP routing table
[Figure: spilled entries and average memory access latency (AMAL) under "uniform" and "skewed" traffic, for α = 0.47, 0.40, 0.36, 0.24, 0.36]
• With a properly chosen α, CA-RAM achieves near-constant AMAL

24 Comparing CA-RAM and TCAM
[Figure: per-cell area (µm²) and power (W) of CA-RAM vs. TCAM; chart labels: 4.5x, 11x, 4.5Mb, 14x, 4x, CMOS]
• CA-RAM area advantage: 4.5x~11x
• CA-RAM power advantage: 4x~14x

25 Conclusions
• Compared w/ software methods
  - Fewer memory accesses; higher lookup performance
• Compared w/ TCAM
  - Higher density, matching that of DRAM → large lookup table
  - Exceeds the speed of TCAM
  - Low power: a critical advantage for cost-effective system design
• Reconfigurability
  - Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
  - Can adopt new standards much more easily, e.g., IPv6

26 Mapping a large IP routing table
• CA-RAM is advantageous over TCAM
[Figure: Design B]

27 CA-RAM components
[Figure: index generator, memory array of stored keys (Key i1, Key i2, …; Key j1, Key j2, …), match processors 1, 2, …, result bus; dimension labels: C bits, 2^R rows, N bits]

