CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications Sangyeun Cho, J. R. Martin, R. Xu, M. H. Hammoud and R. Melhem Dept. of Computer.


1 CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications Sangyeun Cho, J. R. Martin, R. Xu, M. H. Hammoud and R. Melhem Dept. of Computer Science University of Pittsburgh

2 ISPASS 2007 Search ops in applications
• Search (or lookup) operations represent an important common function
• Network packet processing
  - For each arriving packet, determine the output port
  - Given packet information, find a matching classification rule
  - Each lookup can incur many memory accesses
• Speech recognition
  - Searching (e.g., dictionary lookup) takes up ~24% of CPU cycles
• Forthcoming RMS (Recognition, Mining, and Synthesis) apps

3 Search performance and power
• Search performance must match increasing line speeds
  - For OC-768, up to 104M packets must be processed per second
  - Network traffic has doubled every year [McKeown03]
  - Routing tables (~200K prefixes in a core router) are growing [RIS]
  - IPv6
• Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]
• Search in battery-operated devices should be energy-efficient
• Conventional search solutions
  - Software methods (tries, hash tables, …)
  - Hardware methods (CAM, TCAM, …)

4 IP lookup using a trie
• Consider an IP address: 0 1 0 0 0 1 1 0
• Software approach is “flexible”
• High memory capacity requirement
• High memory bandwidth requirement
• Not SCALABLE
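A minimal sketch of a software trie lookup, assuming a one-bit-per-level binary trie and borrowing the prefix list shown on the TCAM slide. The names `TrieNode`, `build_trie`, and `longest_prefix_match` are illustrative and not from the talk. The `visits` counter stands in for memory accesses, which is the cost the slide points at.

```python
class TrieNode:
    """One node of a binary trie; `prefix` is set if a stored prefix ends here."""
    __slots__ = ("children", "prefix")

    def __init__(self):
        self.children = {}
        self.prefix = None


def build_trie(prefixes):
    """Insert each prefix (a string of '0'/'1') bit by bit."""
    root = TrieNode()
    for p in prefixes:
        node = root
        for bit in p:
            node = node.children.setdefault(bit, TrieNode())
        node.prefix = p
    return root


def longest_prefix_match(root, address_bits):
    """Walk the trie along the address; return (longest match, nodes visited)."""
    node, best, visits = root, root.prefix, 1
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        visits += 1  # each level costs one more node (memory) access
        if node.prefix is not None:
            best = node.prefix
    return best, visits


# Prefix list taken from the TCAM slide; address from this slide.
PREFIXES = ["110100", "110101", "110111", "01000", "01100", "01101",
            "11011", "0100", "0110", "1101", "10", "0"]
best, visits = longest_prefix_match(build_trie(PREFIXES), "01000110")
print(best, visits)
```

The number of nodes visited grows with prefix length, so longer addresses such as IPv6 make the per-lookup access count worse in this simple form.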

5 IP lookup using TCAM
• Consider an IP address: 0 1 0 0 0 1 1 0
• TCAM entries (sort before storing; choose the first among the matched):
  110100*, 110101*, 110111*, 01000*, 01100*, 01101*, 11011*, 0100*, 0110*, 1101*, 10*, 0*
• High bandwidth, constant-time lookup
• TCAMs are relatively small and expensive
• Power consumption is very high
• Not SCALABLE
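The "sort before storing, choose the first among the matched" rule can be sketched in software as follows. This is only a model: a real TCAM compares all entries in parallel and a priority encoder picks the first hit, whereas the loop below checks entries one at a time. The helper names and the 8-bit width are assumptions for illustration.

```python
def prefix_to_entry(prefix, width=8):
    """Encode a prefix such as "0100" as (value, mask); mask bit 1 means "care"."""
    n = len(prefix)
    value = int(prefix, 2) << (width - n)
    mask = ((1 << n) - 1) << (width - n)
    return value, mask


def tcam_lookup(entries, key):
    """Return the index of the first matching entry, or None if nothing matches."""
    for index, (value, mask) in enumerate(entries):
        if (key & mask) == value:
            return index
    return None


PREFIXES = ["110100", "110101", "110111", "01000", "01100", "01101",
            "11011", "0100", "0110", "1101", "10", "0"]
# Longest prefixes first, so the first match is also the longest match.
sorted_prefixes = sorted(PREFIXES, key=len, reverse=True)
entries = [prefix_to_entry(p) for p in sorted_prefixes]

hit = tcam_lookup(entries, 0b01000110)
print(sorted_prefixes[hit] + "*")
```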

6 CA-RAM – a hybrid approach
• Can we do better than the existing conventional schemes?
  - CAM-like search performance
  - RAM-like cost and power
• CA-RAM combines hashing w/ hardware parallel matching
• CA-RAM design goals
  - High lookup performance
  - Low power consumption
  - Smaller chip area per stored datum
  - Straightforward system-level integration

7 Talk roadmap
• What is CA-RAM?
• Prototype design
• Case study 1: IP lookup
• Case study 2: Trigram lookup for speech recognition

8 CA-RAM – Content Addressable RAM
• Separate match logic and memory
• Match logic for a single row, not every row
• Allows the use of dense RAM technology
• Enables highly reconfigurable match logic
• Keep keys sorted in each row, not in the entire array
[Figure: match logic and memory cells in a conventional CAM/TCAM vs. CA-RAM]

9 Very simple, yet efficient
• Use hashing to store keys in a particular row
• To look up, hash the search key and retrieve one row
• Perform matching on the entire row in parallel
• Achieve full content addressability w/o paying overhead!
[Figure: search key, index generator, memory rows of keys (Key i1, Key i2, …; Key j1, Key j2, …), match processors 1 and 2]
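The store and lookup steps on this slide can be modeled with a short sketch. It assumes integer keys and a placeholder index function (`key % num_rows`); the talk does not prescribe either, and the class name `CARAM` is only illustrative. The list comprehension in `lookup` stands in for the match processors, which compare every key in the fetched row at once in hardware.

```python
class CARAM:
    """Toy software model of a CA-RAM array: hashed rows of (key, data) slots."""

    def __init__(self, num_rows, slots_per_row, index_fn=None):
        self.num_rows = num_rows
        self.slots_per_row = slots_per_row
        # Placeholder index generator (an assumption, not the talk's hash).
        self.index_fn = index_fn or (lambda key: key % num_rows)
        self.rows = [[] for _ in range(num_rows)]

    def insert(self, key, data):
        """Store the key in the row chosen by the index generator."""
        row = self.rows[self.index_fn(key)]
        if len(row) >= self.slots_per_row:
            return False  # bucket overflow; not handled in this sketch
        row.append((key, data))
        return True

    def lookup(self, key):
        """Fetch one row, then match the search key against all its entries."""
        row = self.rows[self.index_fn(key)]                    # one memory access
        hits = [data for stored, data in row if stored == key]  # parallel in HW
        return hits[0] if hits else None


table = CARAM(num_rows=4, slots_per_row=2)
table.insert(5, "port A")
table.insert(9, "port B")   # lands in the same row as key 5
print(table.lookup(9))
```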

10 Pipelined CA-RAM operation
• Step 1: Index generation
• Step 2: Memory access
• Step 3: Key matching
• Step 4: Result forwarding
[Figure: pipeline diagram with search key, index generator, index, key rows (Key i1 to i3, Key j1 to j3), match processors 1 to 3, and result]

11 Dealing w/ bucket overflows
• Careful design of hash function
• Increase bucket size
  - Reduce load factor (α); α = # of occupied entries / # of total entries
• Use “chaining”; store overflows in subsequent rows
  - Multiple accesses per lookup
• Use a small overflow CAM, accessed in parallel
  - Similar to popular “victim caching”
• Use two-level hashing and employ multiple CA-RAM banks
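As a rough illustration of the load factor definition and of the overflow-CAM option, the sketch below places integer keys into fixed-size rows and spills anything that does not fit into a side list. The side list stands in for the small overflow CAM; the modulo hash, the tiny table size, and all function names are assumptions made for the example.

```python
def build_table(keys, num_rows, slots_per_row):
    """Place keys into hashed rows; keys that do not fit go to the overflow store."""
    rows = [[] for _ in range(num_rows)]
    overflow = []  # stands in for the small overflow CAM
    for key in keys:
        row = rows[key % num_rows]  # placeholder hash (assumption)
        if len(row) < slots_per_row:
            row.append(key)
        else:
            overflow.append(key)
    return rows, overflow


def load_factor(rows, slots_per_row):
    """Load factor = occupied entries / total entries in the main array."""
    occupied = sum(len(row) for row in rows)
    return occupied / (len(rows) * slots_per_row)


def lookup(rows, overflow, key):
    """Check the hashed row and the overflow store (searched in parallel in HW)."""
    return key in rows[key % len(rows)] or key in overflow


rows, overflow = build_table([0, 4, 8, 12, 1, 2], num_rows=4, slots_per_row=2)
print(load_factor(rows, 2), overflow)
```

In this toy run four keys collide on row 0, so two of them spill even though half of the array is empty, which is why the slide lists hash design and bucket size ahead of the overflow mechanisms.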

12 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards

13 Adapting key size
• Adapting key size is straightforward
• Will benefit supporting multiple apps/standards
[Figure: reconfigurable match logic selects key bits for matching over the stored keys (Key i1, Key i2, Key i3, Key j1, Key j2, Key j3) and outputs match information]

14 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• Binary and ternary matching
  - Some apps require ternary matching, some don’t

15 Supporting binary/ternary matching
• Developed a configurable comparator
• Ternary matching requires 2 bits per symbol
• Supporting different types of matching in different bit positions is feasible
[Figure: reconfigurable match logic compares the search key with stored keys (Key i1, Key i2, Key j1, Key j2) and masks (Mask i1, Mask j1), considering mask bits or not, and outputs match information]
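A small sketch of what a configurable binary/ternary comparator does, assuming each stored entry is a (value, mask) pair, which is one way to read "2 bits per symbol": one bit for the value and one for care/don't-care. The function name, the `ternary` flag, and the 8-bit width are illustrative choices, not details from the talk.

```python
def match_row(row, search_key, ternary=True, width=8):
    """Return the match vector for one row: one boolean per stored entry.

    row        -- list of (value, mask) pairs; mask bit 1 means "compare this bit"
    ternary    -- if False, masks are ignored and every bit must match exactly
    """
    all_ones = (1 << width) - 1
    vector = []
    for value, mask in row:
        care = mask if ternary else all_ones
        vector.append(((search_key ^ value) & care) == 0)
    return vector


row = [
    (0b01000000, 0b11111000),  # ternary entry 01000***
    (0b01000110, 0b11111111),  # fully specified entry
]
print(match_row(row, 0b01000110, ternary=True))
print(match_row(row, 0b01000110, ternary=False))
```

Applying a different `care` pattern per bit position would model the slide's last point, mixed matching types within one key.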

16 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• Binary and ternary matching
  - Some apps require ternary matching, some don’t
• Storing data and keys in a CA-RAM module
  - Cuts # of memory accesses for a lookup by half

17 Simult. key matching & data access
• With a TCAM, data access follows the lookup
• CA-RAM supports data embedding
• Cuts memory traffic & latency by half
[Figure: reconfigurable match logic matches the search key against stored keys (Key i1, Key i2, Key j1, Key j2) and bypasses the embedded data (Data i1, Data j1); output is the match result and data]

18 CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
• Adapting key size to apps
  - Same hardware to support multiple apps or standards
• Binary and ternary matching
  - Some apps require ternary matching, some don’t
• Storing data and keys in a CA-RAM module
  - Cuts # of memory accesses for IP lookup by half
• Providing range checking capabilities
  - Beneficial for rule-based packet filtering

19 Supporting range checking
• (Range checking causes troubles)
• (Entries must be expanded)
• CA-RAM can support range checking efficiently
[Figure: reconfigurable match logic matches the key and checks the range for stored entries (Key i1, Range i1, Key j1, Range j1) against the search key, and outputs match information]
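To make the contrast concrete, the sketch below shows both sides under stated assumptions. `range_to_prefixes` illustrates the kind of entry expansion a prefix-only (ternary) store needs to cover a numeric range, using a standard greedy split into aligned power-of-two blocks. `match_with_range` models a comparator that stores the bounds directly, so one entry covers the whole range. The entry layout `(key, lo, hi)` and both function names are hypothetical.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with ternary prefixes of `width` bits."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that starts at lo and is aligned to it.
        size = (lo & -lo) if lo else (1 << width)
        while size > hi - lo + 1:
            size >>= 1
        k = size.bit_length() - 1          # block covers 2**k values
        fixed = width - k                  # number of specified leading bits
        head = format(lo >> k, "b").zfill(fixed) if fixed else ""
        prefixes.append(head + "*" * k)
        lo += size
    return prefixes


def match_with_range(row, search_key, search_value):
    """Match the key exactly and check lo <= value <= hi, one result per entry."""
    return [key == search_key and lo <= search_value <= hi
            for key, lo, hi in row]


print(range_to_prefixes(1, 14, 4))      # several prefix entries for one range
row = [(7, 1, 14), (7, 20, 30)]         # (key, lo, hi): one entry per range
print(match_with_range(row, 7, 9))
```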

20 CA-RAM-based memory subsystem

21 Prototype implementation
• We implemented a prototype CA-RAM slice design (w/ a degree of reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs
• We used a standard cell (0.16 µm) based ASIC design flow

Step                      # cells   Area (µm²)   Delay (ns)
Expand search key           3,804       66,228       (0.89)
Calculate match vector      5,252       10,591         0.95
Decode match vector           899        1,970         1.91
Extract result              6,037       21,775         1.99
Total                      15,992      100,564         4.85

22 Area and power: CA-RAM vs. TCAM
[Figure: cell area (µm²) at 130 nm CMOS, and power (W) for 4.5 Mb at 143 MHz]
• CA-RAM area advantage: 4.5x~11x
• CA-RAM power advantage: 4x~14x

23 Performance: CA-RAM vs. (T)CAM

24 Case study 1: IP lookup

25 Problem description
• Given
  - A set of prefixes (each prefix is associated with an output port number)
  - An IP address
• Find a prefix that matches the input IP address and return the output port number associated with it
  - In the presence of multiple matching prefixes, choose the longest
• Procedure
  - Find a good hash function to distribute prefixes
  - Determine CA-RAM organization

26 Data set and hashing method
• IP core router’s table having 186,760 entries
• Bit selection scheme [Zane et al. ‘03]
  - 98% of prefixes are at least 16 bits long
  - Select hash bits from the first 16 bits (low-order bits)
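One possible reading of the bit selection scheme, as a hedged sketch: take the first (most significant) 16 bits of a 32-bit IPv4 address and use their low-order `index_bits` bits as the row index. The exact bit positions used in the study are not given on the slide, so the choice below is an assumption, as are the function name and the example address. The slide also does not say how the remaining ~2% of prefixes shorter than 16 bits are handled, so the sketch ignores them.

```python
import ipaddress


def row_index(address: str, index_bits: int) -> int:
    """Select `index_bits` hash bits from the low end of the first 16 address bits."""
    addr = int(ipaddress.IPv4Address(address))
    first16 = (addr >> 16) & 0xFFFF
    return first16 & ((1 << index_bits) - 1)


# 11 index bits address 2,048 rows; 12 index bits address 4,096 rows.
print(row_index("10.77.3.4", 11))
print(row_index("10.77.3.4", 12))
```

Because only the first 16 bits feed the index, an address and any stored prefix of at least 16 bits that matches it map to the same row, which is what lets one row fetch find the candidate prefixes.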

27 Shaping CA-RAM
Consider multiple design points:
[Figure: Designs A to F; array shapes of 2,048 rows × (32 entries) and 4,096 rows × (64 entries); load factors α = 0.47, 0.40, 0.36, 0.24, 0.36]

28 Performance
[Figure: spilled entries and average memory access latency (AMAL) under “uniform” and “skewed” traffic, for α = 0.47, 0.40, 0.36, 0.24, 0.36]
• With a properly chosen α, CA-RAM achieves near-constant AMAL

29 Area and power
• CA-RAM advantageous over TCAM
[Figure: relative area or power; Design B]

30 Case study 2: Trigram lookup in speech recognition

31 Problem, data set, and hashing
• Problem
  - Look up a trigram in the trigram database
• Data set
  - A subset of the Sphinx trigram database
  - We picked entries having 13~16 characters
  - Still 5,385,231 entries or 86MB
• Hashing
  - DJB, an efficient string hash function (used in Sphinx)
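The slide names the DJB hash but not the variant. The sketch below assumes the widely known djb2 form (start at 5381, multiply by 33, add each byte), truncated to 32 bits, and reduces it to a row index with a modulo. The row count and the example trigram are hypothetical and are not taken from the study.

```python
def djb2(text: str) -> int:
    """djb2 string hash: h = h * 33 + byte, starting from 5381, kept to 32 bits."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def trigram_row(trigram: str, num_rows: int) -> int:
    """Map a trigram string to a row index in the array."""
    return djb2(trigram) % num_rows


NUM_ROWS = 1 << 16  # hypothetical row count, for illustration only
row = trigram_row("how are you", NUM_ROWS)
print(row)
```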

32 Result

33 Data distribution

34 Area comparison
[Figure: relative area, CAM vs. CA-RAM]

35 CA-RAM conclusions
• Compared w/ software methods
  - Fewer memory accesses; higher lookup performance
• Compared w/ CAM or TCAM
  - Higher density matching that of DRAM, enabling a large lookup table
  - Competitive performance
  - Low power – a critical advantage for cost-effective system design
  - Reconfigurable
    - Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
    - Can adopt new standards much more easily, e.g., IPv6
• Two case studies show the efficacy of the CA-RAM approach
  - 3~5× improvement in area and power, compared with CAM/TCAM

36 CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications Questions?

