
Doctoral Dissertation Proposal: Acceleration of Network Processing Algorithms
Sailesh Kumar
Advisors: Jon Turner, Patrick Crowley
Committee: Roger Chamberlain, John Lockwood, Bob Morley

2 - Sailesh Kumar - 11/24/2015
Focus on Three Network Features
In this proposal, we focus on three network features:
- Packet payload inspection
  » Network security
- Packet header processing
  » Packet forwarding, classification, etc.
- Packet buffering and queuing
  » QoS

3 - Sailesh Kumar - 11/24/2015
Overview of the Presentation
- Packet payload inspection
  » Previous work: D²FA and CD²FA
  » New ideas to implement regular expressions
  » Initial results
- IP lookup
  » Tries and pipelined tries
  » Previous work: CAMP
  » New direction: HEXA
- Hashing used for packet header processing
  » Why do we need better hashing?
  » Previous work: segmented hash
  » New direction: peacock hashing
- Packet buffering and queuing
  » Previous work: multichannel packet buffer, aggregated buffer
  » New direction: DRAM-based buffer, NP-based queuing assist

4 - Sailesh Kumar - 11/24/2015
Delayed Input DFA (D²FA), SIGCOMM'06
- Many transitions in a DFA
  » 256 transitions per state
  » 50+ distinct transitions per state (real-world data sets)
  » Need 50+ words per state
- Can we reduce the number of transitions in a DFA?
[Figure: the DFA for the three rules a+, b+c, c*d, with 4 transitions per state. Looking at state pairs reveals many common transitions; how can we remove them?]

5 - Sailesh Kumar - 11/24/2015
Delayed Input DFA (D²FA), SIGCOMM'06
- Many transitions in a DFA
  » 256 transitions per state
  » 50+ distinct transitions per state (real-world data sets)
  » Need 50+ words per state
- Can we reduce the number of transitions in a DFA?
[Figure: the DFA for the three rules a+, b+c, c*d+ (4 transitions per state), next to an alternative representation with fewer transitions and hence less memory.]

6 - Sailesh Kumar - 11/24/2015
D²FA Operation
[Figure: the example DFA and the corresponding D²FA side by side.]
- Heavy edges are called default transitions.
- Take a default transition whenever a labeled transition is missing.
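The default-transition rule above can be sketched in a few lines. The transition tables here are a made-up toy machine (not the slide's example): each state keeps only a sparse labeled table, and the default edge is followed until a labeled transition for the input symbol is found.

```python
# Toy D2FA sketch: sparse labeled transitions plus default edges.
labeled = {
    0: {"a": 1, "b": 2, "c": 0, "d": 3},  # root keeps a full table
    1: {"a": 1},                          # other states keep few labels
    2: {"b": 2},
    3: {},
}
default = {1: 0, 2: 0, 3: 0}              # heavy edges point at the root

def d2fa_step(state, sym):
    # Follow default transitions until a labeled transition for sym exists.
    while sym not in labeled[state]:
        state = default[state]
    return labeled[state][sym]

def d2fa_run(state, data):
    for ch in data:
        state = d2fa_step(state, ch)
    return state
```

Note that a single input byte may traverse several default edges before a labeled transition is found, which is exactly the extra-memory-access cost discussed on the next slide.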

7 - Sailesh Kumar - 11/24/2015
D²FA versus DFA
- D²FAs are compact but require multiple memory accesses
  » Up to 20x more memory accesses
  » Not desirable in an off-chip architecture
- Can D²FAs match the performance of DFAs?
  » YES!!!!
  » Content-addressed D²FAs (CD²FA)
- CD²FAs require only one memory access per byte
  » Matches the performance of a DFA in a cacheless system
  » In systems with a data cache, CD²FAs are 2-3x faster
- CD²FAs are 10x more compact than DFAs

8 - Sailesh Kumar - 11/24/2015
Introduction to CD²FA, ANCS'06
- How to avoid the multiple memory accesses of D²FAs?
  » Avoid the lookup that decides whether the default path must be taken
  » Avoid the default path traversal
- Solution: assign labels to each state; a state's label contains:
  » Characters for which it has labeled transitions
  » Information about all of its default states
  » Characters for which its default states have labeled transitions
[Figure: content labels for a three-state default chain. Node R (label "all") sits at a known location R; node U (label "cd,R") is found at hash(c,d,R); node V (label "ab,cd,R") is found at hash(a,b,hash(c,d,R)).]
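The content-label addressing above can be sketched as follows. This is a minimal illustration, not the paper's encoding: sha256 and the table size are stand-in choices, and collision-free hashing is assumed (the collision question is taken up on slide 11).

```python
import hashlib

TABLE_SIZE = 1 << 20   # illustrative memory size

def label_addr(chars, default_addr):
    """A state's address is a hash of its labeled characters together
    with its default state's address (assumed collision-free here)."""
    digest = hashlib.sha256(f"{chars}:{default_addr}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % TABLE_SIZE

R = 0                        # root stored at a fixed, well-known address
U = label_addr("cd", R)      # state U, content label "cd,R"
V = label_addr("ab", U)      # state V, content label "ab,cd,R"
```

Given only V's label ("ab", "cd", R), a lookup engine can recompute U's address as hash("cd", R) on the fly, so the successor is fetched in one access without walking the default chain.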

9 - Sailesh Kumar - 11/24/2015
Introduction to CD²FA
[Figure: one CD²FA lookup step. Current state: V (label = ab,cd,R), located at hash(a,b,hash(c,d,R)). Using only the content label, the machine resolves the input character to the next state X (label = pq,lm,Z), located at hash(p,q,hash(l,m,Z)), in a single memory access.]

10 - Sailesh Kumar - 11/24/2015
Construction of CD²FA
- We seek to keep the content labels small
- Twin objectives:
  » Ensure that states have few labeled transitions
  » Ensure that default paths are as short as possible
- Proposed a new heuristic called CRO to construct CD²FAs
  » Details in the ANCS'06 paper
  » With a default path bound of 2 edges, the CRO algorithm constructs up to 10x more space-efficient CD²FAs

11 - Sailesh Kumar - 11/24/2015
Memory Mapping in CD²FA
[Figure: mapping content labels such as hash(a,b,hash(c,d,R)), hash(c,d,R), and hash(p,q,hash(l,m,Z)) into memory. So far we have assumed that hashing is collision-free; in practice two labels may collide.]

12 - Sailesh Kumar - 11/24/2015
Collision-free Memory Mapping
[Figure: four states and 4 memory locations; for each state, edges are added for all possible choices of content label, e.g. hash(abc, ...), hash(def, ...), hash(pqr, ...), hash(lmn, ...).]

13 - Sailesh Kumar - 11/24/2015
Bipartite Graph Matching
- Bipartite graph
  » Left nodes are state content labels
  » Right nodes are memory locations
  » An edge for every choice of content label
  » Map state labels to unique memory locations
  » A perfect matching problem
- With n left and n right nodes
  » Need O(log n) random edges
  » n = 1M implies we need ~20 edges per node
- With slight memory over-provisioning, we can uniquely map state labels with far fewer edges
- In our experiments, we found perfect matchings without memory over-provisioning
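The matching step can be sketched with a plain augmenting-path search; the input dict (state label → candidate memory locations) is hypothetical, and at the 1M-node scale mentioned above a faster algorithm such as Hopcroft-Karp would be the practical choice.

```python
def perfect_matching(choices):
    """Match each state label (left node) to one of its candidate memory
    locations (right nodes); returns None if no perfect matching exists."""
    owner = {}                         # location -> state currently using it

    def augment(state, visited):
        for loc in choices[state]:
            if loc in visited:
                continue
            visited.add(loc)
            # Take a free location, or evict its owner if it can move.
            if loc not in owner or augment(owner[loc], visited):
                owner[loc] = state
                return True
        return False

    for state in choices:
        if not augment(state, set()):
            return None
    return {s: l for l, s in owner.items()}
```

For example, labels {"A": [0, 1], "B": [0], "C": [1, 2]} admit the unique perfect matching A→1, B→0, C→2, while two labels that both hash only to location 0 admit none.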

14 - Sailesh Kumar - 11/24/2015
Reg-ex: New Directions
- Three key problems with traditional DFA-based reg-ex matching:
  » 1. They employ the complete signature to parse input data
    – Even if normal data matches only a small prefix portion
    – Full signature => large DFA
  » 2. There is only one active state of execution and no memory of previous matches
    – Combinations of partial matches require new DFA states
  » 3. Inability to count certain sub-expressions
    – E.g., a{1024} will require 1024 DFA states
- We aim to address each of these problems in the proposed research

15 - Sailesh Kumar - 11/24/2015
Addressing the First Problem
- Divide the processing into a fast path and a slow path
- Split each signature into a prefix and a suffix
  » Employ the signature prefixes in the fast path
  » Upon a match in the fast path, trigger the slow path
  » Appropriate splitting can maintain a low triggering rate
- Benefits:
  » The fast path can employ a composite DFA for all prefixes
    – With small prefixes, the composite DFA remains small
    – Higher parsing rate
  » The slow path uses a separate DFA for each signature
    – No state explosion in the slow path
    – With a low triggering rate, the slow path does not become a bottleneck
  » Reduces per-flow state
    – The fast path uses a composite DFA: one active state per flow
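The fast/slow split can be sketched with Python's `re` engine standing in for the two DFAs. The signatures below are illustrative (taken loosely from the splitting example a few slides later); a real engine would keep scanning for further prefix hits rather than stopping at the first, which is omitted here for brevity.

```python
import re

# Illustrative signatures: name -> (fast-path prefix, full signature).
signatures = {
    "r2": (r"fa",     r"fag[^i]*i[^j]*j"),
    "r3": (r"a[gh]i", r"a[gh]i[^l]*[ae]c"),
}

# Fast path: one composite matcher over all (short) prefixes.
fast_path = re.compile(
    "|".join(f"(?P<{n}>{p})" for n, (p, _) in signatures.items()))
# Slow path: a separate matcher per signature, compiled once.
slow_path = {n: re.compile(full) for n, (_, full) in signatures.items()}

def scan(data):
    m = fast_path.search(data)      # cheap check that is always on
    if m is None:
        return None                 # common case: no prefix matched
    name = m.lastgroup              # which prefix triggered the slow path
    return name if slow_path[name].search(data) else None
```

Most traffic exits at the `m is None` branch after touching only the small composite prefix machine, which is the point of the split.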

16 - Sailesh Kumar - 11/24/2015
Fast and Slow Path Processing
- Here we assume that an ε fraction of the flows is diverted to the slow path
- The fast path stores one DFA state per flow
- The slow path may store multiple active states

17 - Sailesh Kumar - 11/24/2015
Splitting Reg-exes
- Splitting can be performed based upon data traces
- Assign probabilities to NFA states and make a cut so that the cumulative probability of the slow path is low
  r1 = .*[gh]d[^g]*ge
  r2 = .*fag[^i]*i[^j]*j
  r3 = .*a[gh]i[^l]*[ae]c
- Cumulative probability of the slow path = 0.05

18 - Sailesh Kumar - 11/24/2015
Splitting Reg-exes
- The slow path comprises three separate DFAs, one for each signature:
  r1 = .*[gh]d[^g]*ge
  r2 = .*fag[^i]*i[^j]*j
  r3 = .*a[gh]i[^l]*[ae]c
- The fast path contains a composite DFA (14 states) for the prefixes; notice the start state:
  p1 = .*[gh]d[^g]*g
  p2 = .*fa
  p3 = .*a[gh]i

19 - Sailesh Kumar - 11/24/2015
Protection against DoS Attacks
- An attacker can attack such a system by sending data that match the prefixes more often than provisioned
  » The slow path then becomes the bottleneck
- Solution: look at the history to determine whether a flow is an attack flow
  » Compute an anomaly index: a weighted moving average of the number of times a flow has triggered the slow path
  » If a flow has a high anomaly index, send it to a low-rate queue
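The anomaly index can be sketched as an exponentially weighted moving average of slow-path triggers; the smoothing constant and threshold below are assumed values for illustration, not numbers from the proposal.

```python
def update_anomaly(index, triggered, alpha=0.1):
    """Weighted moving average of slow-path triggers (alpha assumed)."""
    return (1.0 - alpha) * index + (alpha if triggered else 0.0)

THRESHOLD = 0.5        # illustrative cutoff for the low-rate queue

attack, normal = 0.0, 0.0
for _ in range(20):
    attack = update_anomaly(attack, True)    # flow that always triggers
    normal = update_anomaly(normal, False)   # flow that never does
```

After n consecutive triggers the index is 1 - 0.9^n, so a persistent attacker climbs toward 1 and crosses the threshold quickly, while a benign flow stays near 0.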

20 - Sailesh Kumar - 11/24/2015 Initial Simulation Results

21 - Sailesh Kumar - 11/24/2015
Addressing the Second Problem
- NFA: compact, but O(n) active states
- DFA: one active state, but state explosion
  » How to avoid state explosion while also keeping the per-flow active-state information small?
- We propose a novel machine called a History-based Finite Automaton (H-FA)
  » Augment a DFA with a history buffer
  » Transitions are taken based on the history buffer contents
  » During certain transitions, items are inserted into or removed from the history buffer
- Claim: a small history buffer is sufficient to avoid state explosion while keeping a single active state

22 - Sailesh Kumar - 11/24/2015
Example of H-FA Construction
[Figure: an example NFA and its DFA.] NFA state 2 is present in 4 DFA states. If we remove NFA state 2 from these DFA states, we are left with just 6 states.

23 - Sailesh Kumar - 11/24/2015
H-FA
[Figure: the same NFA and DFA.] NFA state 2 is present in 4 DFA states; removing it leaves just 6 states. This new machine uses a history flag, in addition to its transitions, to make moves.

24 - Sailesh Kumar - 11/24/2015
H-FA
This new machine uses a history flag, in addition to its transitions, to make moves.
[Figure: H-FA execution on the input "c d a b c", moving among states 0, {1,0}, {3,0}, and {4,0}; the flag is set because of c and reset on the d transition.]

25 - Sailesh Kumar - 11/24/2015
H-FA
- In general, if we maintain a flag for each NFA state that represents a Kleene closure, we can avoid any state explosion
- k closures will require at most k bits in the history buffer
- There are challenges associated with the efficient implementation of conditional transitions
  » We plan to work on these in the proposed research
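A toy H-FA sketch makes the flag mechanism concrete. The signature here is invented for illustration (roughly ".*a[^d]*b"): the history flag stands in for the Kleene-closure NFA state "seen a, and no d since", so the machine keeps one active state plus one bit instead of doubling its state set.

```python
# Toy H-FA: one active state plus a one-bit history buffer.
def hfa_match(data):
    state, flag = "start", False
    for ch in data:
        if ch == "a":
            flag = True            # closure state enters the history
        elif ch == "d":
            flag = False           # closure state leaves the history
        elif ch == "b" and flag:   # conditional transition on (ch, flag)
            state = "accept"
    return state == "accept"
```

With k such closures, the flag becomes a k-bit history buffer consulted and updated on each transition, exactly as the slide claims.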

26 - Sailesh Kumar - 11/24/2015
Addressing the Third Problem
For a signature such as ab[^a]{1024}cdef:
- Replace the flag with a counter
- Replace the flag=1 condition with ctr=1024
- Replace the flag=0 condition with ctr=0
- Increment ctr if ctr>0; reset the counter when it reaches 1024
One of the primary goals of this research is to enable efficient implementation of counter conditions.

27 - Sailesh Kumar - 11/24/2015 Early Results

28 - Sailesh Kumar - 11/24/2015
Overview of the Presentation
- Packet payload inspection
  » Previous work: D²FA and CD²FA
  » New ideas to implement regular expressions
  » Initial results
- IP lookup
  » Tries and pipelined tries
  » Previous work: CAMP
  » New direction: HEXA
- Hashing used for packet header processing
  » Why do we need better hashing?
  » Previous work: segmented hash
  » New direction: peacock hashing
- Packet buffering and queuing
  » Previous work: multichannel packet buffer, aggregated buffer
  » New direction: DRAM-based buffer, NP-based queuing assist

29 - Sailesh Kumar - 11/24/2015
IP Address Lookup
- Routing tables at router input ports contain (prefix, next hop) pairs
- The address in the packet is compared to the stored prefixes, starting at the left
- The prefix that matches the largest number of address bits is the desired match
- The packet is forwarded to the specified next hop
Routing table (prefix → next hop): 1* → 5, 00* → 3, 01* → 5, 0* → 7, 001* → 2, 011* → 3, 1011* → 4

30 - Sailesh Kumar - 11/24/2015
Address Lookup Using Tries
- Prefixes are stored in "alphabetical order" in a tree
- Prefixes are "spelled" out by following a path from the top
  » Green dots mark prefix ends
- To find the best prefix, spell out the address in the tree
- The last green dot marks the longest matching prefix
[Figure: the trie for the table (1* → 5, 00* → 3, 01* → 5, 0* → 7, 001* → 2, 011* → 3, 1011* → 4), with the address 1 0 1 1 0 0 1 0 spelled out.]
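The spell-out-and-remember rule can be sketched directly from the slide's table: descend bit by bit and keep the next hop of the last prefix end ("green dot") passed.

```python
# The slide's routing table: prefix -> next hop.
prefixes = {"1": 5, "00": 3, "01": 5, "0": 7, "001": 2, "011": 3, "1011": 4}

def build_trie(prefixes):
    root = {}
    for p, hop in prefixes.items():
        node = root
        for bit in p:
            node = node.setdefault(bit, {})
        node["hop"] = hop                  # a "green dot": a prefix ends here
    return root

def lookup(trie, addr_bits):
    """Spell the address down the trie; the last prefix end seen wins."""
    node, best = trie, None
    for bit in addr_bits:
        if bit not in node:
            break
        node = node[bit]
        best = node.get("hop", best)
    return best
```

On the slide's address 10110010, the walk passes 1* (hop 5) and ends at 1011* (hop 4), so the longest match forwards to hop 4.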

31 - Sailesh Kumar - 11/24/2015
Pipelined Trie-based IP Lookup
- Tree data structure with prefixes in the leaves (leaf pushing)
- Process the IP address level by level to find the longest match
- Each level is placed in a different stage → overlap multiple packets
[Figure: a pipelined trie holding prefixes P1-P7, e.g. P4 = 10010*.]
- Stages of different size:
  » Require more memory
  » The largest stage becomes the bottleneck

32 - Sailesh Kumar - 11/24/2015
Circular Pipeline, ANCS'06
- Use a circular pipeline and allow requests to enter/exit at any stage
- Mapping:
  » Divide the trie into multiple sub-tries
  » Map each sub-trie with its root starting at a different stage

33 - Sailesh Kumar - 11/24/2015 Mapping in Circular Pipeline

34 - Sailesh Kumar - 11/24/2015
Circular Pipeline
- Benefits:
  » Uniform stage sizes
  » Less memory: no over-provisioning is needed in the face of arbitrary trie shapes
  » Higher throughput

35 - Sailesh Kumar - 11/24/2015
New Direction: HEXA
- HEXA (History-based Encoding, eXecution and Addressing)
  » Challenges the assumption that graph structures must store log2 n-bit pointers to identify successor nodes
- If the labels of the path leading to every node are unique, then these labels can be used to identify the node
  » In a trie, every node has a unique path starting at the root node
  » Thus, the labels along the path become the identifier of the node
  » Note that these labels need not be explicitly stored

36 - Sailesh Kumar - 11/24/2015
Traditional Implementation
There are nine nodes, so we need 4-bit node identifiers. Total memory = 9 × 9 bits (each node: a prefix flag plus two 4-bit child pointers).

Addr  Data
1     0, 2, 3
2     0, 4, 5
3     1, NULL, 6
4     1, NULL, NULL
5     0, 7, 8
6     1, NULL, NULL
7     0, 9, NULL
8     1, NULL, NULL
9

37 - Sailesh Kumar - 11/24/2015
HEXA-based Implementation
- Define the HEXA identifier of a node as the path that leads to it from the root
- Notice that these identifiers are unique
- Thus, they can potentially be mapped to unique memory addresses

38 - Sailesh Kumar - 11/24/2015
HEXA-based Implementation
- Use hashing to map the HEXA identifier to a memory address
- If we have a minimal perfect hash function f (a function that maps elements to unique locations), then we can store the trie as shown below:
  f(010) = 5, f(011) = 3, f(0100) = 6, f(-) = 4, f(0) = 7, f(1) = 9, f(00) = 2, f(01) = 8, f(11) = 1

Addr  Fast path  Prefix
1     1,0,0      P3
2     1,0,0      P2
3     1,0,0      P4
4     0,1,1
5     0,1,0
6     1,0,0      P5
7     0,1,…
8     …,0,1      P1

Here we use only 3 bits per node in the fast path.
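The pointer-free idea above can be sketched as follows. The trie shape is illustrative (not the slide's exact example), and a precomputed dict stands in for the assumed minimal perfect hash f: each memory word stores only two child-presence bits, because a child's address can always be recomputed as f(path + bit).

```python
# Illustrative binary trie, each node named by its HEXA identifier
# (the bit path from the root).
nodes = ["", "0", "1", "00", "01", "10", "001", "011", "0110"]
f = {p: i for i, p in enumerate(nodes)}   # stand-in minimal perfect hash

# Memory word per node: two child-presence bits, no child pointers.
memory = {f[p]: (p + "0" in f, p + "1" in f) for p in nodes}

def walk(bits):
    """Descend as far as the trie allows, returning the HEXA identifier
    (path label) of the last node reached."""
    path = ""
    for b in bits:
        has0, has1 = memory[f[path]]
        if (b == "0" and not has0) or (b == "1" and not has1):
            break
        path += b                 # extend the implicit identifier
    return path
```

The identifier is carried implicitly by the traversal history, which is why the labels "need not be explicitly stored".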

39 - Sailesh Kumar - 11/24/2015
Devising a One-to-one Mapping
- Finding a minimal perfect hash function is difficult
  » A one-to-one mapping is essential for HEXA to work
- Use discriminator bits
  » Append c bits, which we are free to modify, to every HEXA identifier
  » Thus a node has 2^c choices of identifier
  » Notice that these c bits must be stored, so slightly more than 3 bits per node are needed
- With multiple choices of HEXA identifier per node, the problem reduces to a bipartite graph matching problem
  » We need to find a perfect matching in the graph to map nodes to unique memory locations

40 - Sailesh Kumar - 11/24/2015 Devising One-to-one Mapping

41 - Sailesh Kumar - 11/24/2015
Initial Results
- Our initial evaluation suggests that 2-bit discriminators are enough to find a perfect matching
  » Thus 2 bits per node suffice instead of log2 n bits

42 - Sailesh Kumar - 11/24/2015
Initial Results
- Memory comparison to Eatherton's trie
- In the future:
  » Complete evaluation of HEXA-based IP lookup: throughput, die size, and power estimates
  » Extend HEXA to strings and finite automata

43 - Sailesh Kumar - 11/24/2015
Overview of the Presentation
- Packet payload inspection
  » Previous work: D²FA and CD²FA
  » New ideas to implement regular expressions
  » Initial results
- IP lookup
  » Tries and pipelined tries
  » Previous work: CAMP
  » New direction: HEXA
- Hashing used for packet header processing
  » Why do we need better hashing?
  » Previous work: segmented hash
  » New direction: peacock hashing
- Packet buffering and queuing
  » Previous work: multichannel packet buffer, aggregated buffer
  » New direction: DRAM-based buffer, NP-based queuing assist

44 - Sailesh Kumar - 11/24/2015
Hash Tables
- Suppose our hash function gave us the following values:
  » hash("apple") = 5
  » hash("watermelon") = 3
  » hash("grapes") = 8
  » hash("cantaloupe") = 7
  » hash("kiwi") = 0
  » hash("strawberry") = 9
  » hash("mango") = 6
  » hash("banana") = 2
  » hash("honeydew") = 6
- This is called a collision. Now what?

45 - Sailesh Kumar - 11/24/2015
Collision Resolution Policies
- Linear probing
  » Successively search for the first empty subsequent table entry
- Linear chaining
  » Link all collided entries at a bucket as a linked list
- Double hashing
  » Uses a second hash function to successively index the table
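Linear probing, the first policy above, can be sketched in a few lines; the table size is illustrative, and Python's built-in `hash` stands in for the hash function.

```python
class LinearProbeTable:
    """Open addressing with linear probing."""

    def __init__(self, size=8):
        self.slots = [None] * size

    def insert(self, key, value):
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return i
            i = (i + 1) % len(self.slots)   # collision: probe the next slot
        raise RuntimeError("table full")

    def get(self, key):
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            if self.slots[i] is None:       # an empty slot ends the probe run
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        return None
```

The probe loop is exactly the source of the variable lookup time analyzed on the next slide: a key displaced by collisions sits farther from its home slot and costs proportionally more probes.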

46 - Sailesh Kumar - 11/24/2015
Performance Analysis
- Average performance is O(1)
- However, worst-case performance is O(n)
- In fact, the likelihood that a key lies at a distance > 1 is fairly high
  » Keys at distance 2 take twice the time to probe; keys at distance 3 take three times as long
  » So there is a fairly high probability that throughput is one-half to one-third of the peak

47 - Sailesh Kumar - 11/24/2015
Segmented Hashing, ANCS'05
- Uses the power of multiple choices
  » Proposed earlier by Azar et al.
- An N-way segmented hash:
  » Logically divides the hash table array into N equal segments
  » Maps each incoming key onto a bucket in each segment
  » Picks the bucket that is empty or has the fewest keys
[Figure: a 4-way segmented hash table; keys k_i and k_i+1 are each mapped to one candidate bucket per segment.]
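The N-way insertion rule can be sketched as follows; the segment and bucket counts are illustrative, and salted md5 stands in for the per-segment hash functions.

```python
import hashlib

N_SEGMENTS, BUCKETS_PER_SEG = 4, 8     # illustrative sizes

def bucket_of(seg, key):
    """Independent hash per segment (md5 with a per-segment salt)."""
    h = hashlib.md5(f"{seg}:{key}".encode()).hexdigest()
    return int(h, 16) % BUCKETS_PER_SEG

segments = [[0] * BUCKETS_PER_SEG for _ in range(N_SEGMENTS)]  # bucket loads

def insert(key):
    """Probe one candidate bucket per segment and place the key in the
    emptiest one (the power of multiple choices)."""
    loads = [(segments[s][bucket_of(s, key)], s) for s in range(N_SEGMENTS)]
    load, seg = min(loads)
    segments[seg][bucket_of(seg, key)] = load + 1
    return seg, load
```

Choosing the minimum over N independent candidates is what drives the probability of long collision chains toward zero as N grows.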

48 - Sailesh Kumar - 11/24/2015
Segmented Hash Performance
- More segments improve the probabilistic performance
  » With 64 segments, the probability that a key is inserted at distance > 2 is nearly zero, even at 100% load
  » The improvement in average-case performance is still modest

49 - Sailesh Kumar - 11/24/2015
Adding per-Segment Filters
[Figure: each segment gets a small Bloom filter of m_b bits indexed by hash functions h_1 ... h_k. Key k_i can go to any of 3 candidate buckets; we can select any of those segments and insert the key into the corresponding filter.]

50 - Sailesh Kumar - 11/24/2015
Selective Filter Insertion Algorithm
[Figure: key k_i can go to any of 3 candidate buckets; insert it into segment 4, where fewer filter bits are set.]
- Fewer bits set => lower false-positive rate
- With more segments (i.e., more choices), our algorithm sets far fewer bits in the Bloom filters
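The selection rule can be sketched as follows. This is a simplified illustration that keeps only the filter side of the decision (it ignores bucket occupancy); filter sizes are assumed, and sets of bit indices stand in for bit arrays.

```python
import hashlib

K_HASHES, FILTER_BITS, N_SEGMENTS = 3, 64, 4   # illustrative parameters

def filter_bits(key):
    """K Bloom-filter bit positions for a key (salted sha256)."""
    return {int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16)
            % FILTER_BITS for i in range(K_HASHES)}

filters = [set() for _ in range(N_SEGMENTS)]   # set bits per segment filter

def insert(key):
    """Among candidate segments, pick the one where the key would set the
    fewest *new* filter bits: fewer ones => fewer false positives."""
    bits = filter_bits(key)
    new_bits, seg = min((len(bits - f), s) for s, f in enumerate(filters))
    filters[seg] |= bits
    return seg
```

Because a segment whose filter already contains some of the key's bit positions is preferred, the total number of ones grows more slowly than under random placement, which is the source of the lower false-positive rate.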

51 - Sailesh Kumar - 11/24/2015
Problem with Segmented Hash
- The Bloom filter size is proportional to the total number of elements
- An O(1) lookup can be maintained even if we omit the Bloom filter of one segment
  » With many segments of equal size, this omission does not reduce the Bloom filter size by much
- An alternative is to use segments of different sizes and omit the Bloom filter of the largest segment
  » If the largest segment holds, say, 90% of the total memory, this yields a 90% reduction in Bloom filter size
  » Peacock hashing exploits this property

52 - Sailesh Kumar - 11/24/2015
Peacock Hashing
[Figure: keys k_1 ... k_7 from the universe U of keys are hashed through h_1 ... h_5 into segments of geometrically decreasing size.]
- Size of the 1st segment = 1; size of the second segment = c; size of the ith segment = c × the size of the (i−1)st segment
- No element is discarded until the first segment is filled

53 - Sailesh Kumar - 11/24/2015
Peacock Hash
- Use a Bloom filter for every segment but the largest
  » Thus, for c = 10, the Bloom filters are about 10x smaller
- Lookup is straightforward:
  » First consult all Bloom filters
  » If none of them shows membership, look up in the largest segment
  » Otherwise, look up in the segments that show membership
- To enable deletes we need counting Bloom filters, but the counters can be kept in the slow path
- Deletes, however, lead to imbalance in the loading
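The lookup procedure above can be sketched as follows. Dicts and exact sets stand in for real hash segments and Bloom filters, so the false-positive branch is shown but never taken in this sketch; segment counts are illustrative.

```python
small_segments = [{}, {}]            # geometrically smaller backup segments
small_filters = [set(), set()]       # one (idealized) Bloom filter each
main_segment = {}                    # the largest segment: no filter

def peacock_insert(key, value, seg=None):
    if seg is None:
        main_segment[key] = value    # most keys land in the big segment
    else:
        small_segments[seg][key] = value
        small_filters[seg].add(key)  # keep the filter in sync

def peacock_lookup(key):
    hits = [i for i, f in enumerate(small_filters) if key in f]
    if not hits:                     # all filters say no: a single probe
        return main_segment.get(key)
    for i in hits:                   # a filter hit may be a false positive
        if key in small_segments[i]:
            return small_segments[i][key]
    return main_segment.get(key)
```

In the common case no filter fires, so the lookup costs one probe into the unfiltered largest segment, which is why omitting only that segment's filter preserves O(1) lookup.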

54 - Sailesh Kumar - 11/24/2015
Peacock Hash
- A series of deletes and inserts may lead to overflow of the smaller segments
[Figure: discard rate (%) over simulation time (sampling interval 1000) for segments 5 and 6; the discard rate climbs once the second phase begins.]

55 - Sailesh Kumar - 11/24/2015
Peacock Hash
- Following every delete, we perform rebalancing, i.e., search the smaller segments and move elements to larger segments where possible
[Figure: discard rate (%) over simulation time (sampling interval 1000) for segments 5 and 6 with rebalancing enabled, after the second phase begins.]

56 - Sailesh Kumar - 11/24/2015
Issues and Future Directions
- It is not clear how to perform rebalancing efficiently
  » In the previous simulation, we used a brute-force approach that searches the entire segment, giving an O(n) rebalancing cost
- Complicating factors:
  » Collision lengths greater than 1 in some segments
  » The double-hashing collision policy
  » 2-ary hashing may improve efficiency but again complicates rebalancing
- Future research objectives:
  » Develop an efficient rebalancing algorithm
  » Develop Bloom filters that better utilize the power of multiple choices
  » Extend the scheme to memory segments with different bandwidths and access latencies

57 - Sailesh Kumar - 11/24/2015
Overview of the Presentation
- Packet payload inspection
  » Previous work: D²FA and CD²FA
  » New ideas to implement regular expressions
  » Initial results
- IP lookup
  » Tries and pipelined tries
  » Previous work: CAMP
  » New direction: HEXA
- Hashing used for packet header processing
  » Why do we need better hashing?
  » Previous work: segmented hash
  » New direction: peacock hashing
- Packet buffering and queuing
  » Previous work: multichannel packet buffer, aggregated buffer
  » New direction: DRAM-based buffer, NP-based queuing assist

58 - Sailesh Kumar - 11/24/2015
Packet Buffering and Queuing
- The first objective is to extend the multichannel packet buffer architecture to DRAM memories
- We also plan to consider memories with different sizes, bandwidths, and access latencies
  » Extension of: Sailesh Kumar, Patrick Crowley, and Jonathan Turner, "Design of Randomized Multichannel Packet Storage for High Performance Routers", Proceedings of IEEE Symposium on High Performance Interconnects (HotI-13), Stanford, August 17-19, 2005.
- Work on an NP-specific queuing hardware assist
  » Extension of: Sailesh Kumar, John Maschmeyer, and Patrick Crowley, "Queuing Cache: Exploiting Locality to Ameliorate Packet Queue Contention and Serialization", Proceedings of ACM International Conference on Computing Frontiers (ICCF), Ischia, Italy, May 2-5, 2006.

59 - Sailesh Kumar - 11/24/2015
- The proposed research is expected to take one year
- Acknowledgments:
  » Jon Turner
  » Patrick Crowley
  » Michela Becchi
  » Sarang Dharmapurikar
  » John Lockwood
  » Roger Chamberlain
  » Robert Morley
  » Balakrishnan Chandrasekaran
  » Michael Mitzenmacher, Harvard Univ.
  » George Varghese, UCSD
  » Will Eatherton, Cisco
  » John Williams, Cisco

60 - Sailesh Kumar - 11/24/2015
Questions???