Parallel tree search: An algorithmic approach for multi- field packet classification Authors: Derek Pao and Cutson Liu. Publisher: Computer communications.

Parallel tree search: An algorithmic approach for multi- field packet classification Authors: Derek Pao and Cutson Liu. Publisher: Computer communications 2007 Present: Chen-Yu Lin ( 林呈俞 ) Date: Oct, 25, 2007

Outline Introduction Packet classification algorithm Filter decomposition algorithm Architecture of the search engine Performance of the search engine

Introduction Adopt the two-stage classification approach: 1. Determine the best matching source and destination prefix pair. 2. Determine the highest priority matching filter by comparing the remaining field against a short list of candidate rules in parallel. Filter decomposition Reduce the 2-D search problem to a pseudo 1-D problem. The decomposed filters are organized as a height-balanced search tree. This method is scalable to large filter databases and IPv6.

Packet classification algorithm 2-stage classification approach 1. Determine the best matching source-destination prefix pair. 2. Search the list of filters associated with the selected prefix pair for the highest priority filter Convert the 2-D search problem to 1-D problem. Source axis is partitioned into disjoint segments, called bands. The decomposed filters are organized as a height-balanced search tree.

Packet classification algorithm A decomposed filter is represented by a pair of end points. Lower-left corner (LC) Padding 0 to both source & destination prefixes. Lower-right corner (RC) Padding 0 to source prefix and padding 1 to destination prefix. End point definition  5 – tuple ps (pd): extended source (destination) prefix. ls (ld): original length of source (destination) prefix. type: type of end point. Given a source-destination address pair, the search algorithm will Determine the unique matching source band. Find the best matching filter.

Packet classification algorithm Conflicting filters = 0110000001101111

Packet classification algorithm If the LC- and RC-points of a filter are adjacent to each other in the sorted list, then the pair of points is replaced by a “merge” point, whose value is equal to LC- point.

Packet classification algorithm Composite key store in a node is, where the length of two prefixes be ls and ld. An input address pair (S, D) is compared with the composed key.

Packet classification algorithm

External node An external node provides the reference to the best matching filter, if any, when the search operation falls off the tree. The role of the external node is similar to the pre-computed best matching prefix in the range search approach.

Packet classification algorithm

Filter decomposition algorithm Extract all non-wild card source prefixes from the filter table and put them into a list. A prefix whose length is less than the full address length is then replaced by its lower and higher end-point.

Filter decomposition algorithm Let List of end-points regionssource bands 00100000 L 00100000 ~ 00101111 0010* 00110000 L 00110000 ~ 00110111 00110* 00110111 H 00111000 ~ 00111111 00111* 00111111 H

Architecture of the search engine To speed up the search operation is by pipelining Nodes of the search tree are stored in separate memory modules according to their level number. Linear pipeline can achieve optimal throughput Drawback: Incremental updates cannot be supported. Rebalanced a search tree may need to shift up to 75% nodes. IPv4  134 bits IPv6  330 bits

Architecture of the search engine Flag If left (right) flag is set  left (right) pointer = left (right) external node. Otherwise, left (right) pointer = left (right) subtree. The amount of memory to support a 256K nodes search tree is around 4.2MB and 10.3MB for IPv4 and IPv6. The depth of a height-balanced search tree with 256K nodes is no more than 19. There are only 255 nodes in the first eight levels. A tag is assigned to a task when it’s submitted to the search engine.

Architecture of the search engine 255 entries DRAM 32K entries Comparator + fast buffer (SRAM)

Architecture of the search engine Output buffer of PUs L1 and L2 are equipped with double output buffer. Input buffer of Pus L1 have a single slot buffer. L2 equipped with a multiple slots dual-write port buffer (FIFO). Assume the size of the L2 PUs input buffer is B slots. If the number of concurrent tasks is larger than 2B+3  Deadlock happen. To resolve the deadlock problem Limiting the number of tags no more than 2B+3. Implement a deflection routing scheme.

Performance of the search engine Simulation k and M = 8 ACLX-5 with about 46K rules Depth of search tree = 17 L2 PU input buffer size = (B)

Performance of the search engine Best performance Select buffer size of B = 32. Limiting the number of tags to 2B+3

Parallel tree search: An algorithmic approach for multi- field packet classification Authors: Derek Pao and Cutson Liu. Publisher: Computer communications.

Similar presentations

Presentation on theme: "Parallel tree search: An algorithmic approach for multi- field packet classification Authors: Derek Pao and Cutson Liu. Publisher: Computer communications."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Parallel tree search: An algorithmic approach for multi- field packet classification Authors: Derek Pao and Cutson Liu. Publisher: Computer communications.

Similar presentations

Presentation on theme: "Parallel tree search: An algorithmic approach for multi- field packet classification Authors: Derek Pao and Cutson Liu. Publisher: Computer communications."— Presentation transcript:

Similar presentations

About project

Feedback