Succinct Priority Indexing Structures for the Management of Large Priority Queues. Hao Wang and Bill Lin, University of California, San Diego. IEEE IWQoS.


1 Succinct Priority Indexing Structures for the Management of Large Priority Queues
Hao Wang and Bill Lin, University of California, San Diego
IEEE IWQoS 2009, Charleston, South Carolina, July 13-15, 2009

2 Introduction
Priority queues are used in many network applications
 – Per-flow advanced QoS scheduling
 – Management of per-flow DRAM packet buffers
 – Maintenance of per-flow statistics counters for real-time network measurements
Items in a priority queue are kept sorted at all times (e.g. smallest key first)
Common operations: INSERT, FINDMIN, DELETE
Challenge: need to operate at high speeds (e.g. 40+ Gb/s)

3 Introduction
The binary heap is a common structure for priority queues
 – Has O(log_2 n) time complexity, where n is the number of items
 – In fine-grained per-flow scheduling, n can be very large (e.g. 1 million)
 – O(log_2 n) may be too slow for high line rates
Pipelined heaps [Bhagwan, Lin 2000][Ioannou 2001][Wang, Lin 2006]
 – Reduce amortized time complexity to constant time
 – At the expense of O(log_2 n) pipeline stages

4 Introduction
van Emde Boas (vEB) trees
 – Instead of maintaining a priority queue of sorted items, maintain a sorted dictionary of keys
 – In many applications keys are k-bit integers, so the possible keys come from a fixed universe of U = 2^k values
 – Only O(log_2 log_2 U) complexity vs. O(log_2 n) for heaps
Pipelined vEB trees [Wang, Lin 2007]
 – Reduce amortized time complexity to constant time
 – At the expense of O(log_2 log_2 U) pipeline stages

5 This Talk
Propose 3 related Priority Indexing (PI) structures that leverage built-in, hardware-optimized instructions in modern 64-bit x86 processors (both Intel and AMD)
Specifically, given a W = 64 bit word, the instructions BSR (bit-scan-reverse) and BSF (bit-scan-forward) return the positions of the most-significant and least-significant set bits, respectively
[Figure: the word 001011…001000, with BSR marking its most-significant set bit and BSF marking its least-significant set bit]
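The two bit scans can be emulated in a few lines of Python. This is only a sketch of what the instructions compute, not of how the talk's implementation calls them: the function names `bsr` and `bsf` and the short 12-bit test pattern are chosen here for illustration.

```python
W = 64  # word width assumed in the talk

def bsr(word: int) -> int:
    """Position of the most-significant set bit, as x86 BSR would report it."""
    if not 0 < word < (1 << W):
        raise ValueError("word must be a nonzero W-bit value")
    return word.bit_length() - 1

def bsf(word: int) -> int:
    """Position of the least-significant set bit, as x86 BSF would report it."""
    if not 0 < word < (1 << W):
        raise ValueError("word must be a nonzero W-bit value")
    # word & -word isolates the lowest set bit.
    return (word & -word).bit_length() - 1

# Illustrative 12-bit pattern (not the full word from the slide).
word = 0b001011001000
print(bsr(word), bsf(word))  # prints: 9 3
```

Both scans are undefined for an all-zero word on real hardware, which is why the sketch rejects zero instead of returning a position.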

6 This Talk
Finding the most-significant (or least-significant) set bit can also be implemented easily with efficient priority encoder designs in custom hardware

7 Basic Priority Indexing Structure
Essentially a W-way tree.
Maintains a sorted subset S of N elements from a fixed universe of size U = W^h, where N ≤ U.
Each element i of the universe is associated with a bit b_i
Leaf node contains W of the bits b_i: b_i = 1 if element i is in the set
Non-leaf node serves as a summary of its child nodes: a bit in a non-leaf node is set to 1 if the corresponding child node has at least one non-zero bit
[Figure: Data structure of PI with h = 3]

8 Example Operations
TEST(i): Just check b_i.
INSERT(i): Start at leaf, set b_i. Set corresponding bit in parent. Repeat until root.
FINDMIN(): Start at root. Find MSB (most-significant bit) and traverse sub-tree. Repeat until leaf.
DELETE(i): Start at leaf, clear b_i. If word = 0 (no more bits set), clear corresponding bit in parent. Repeat until root.
[Figure: Data structure of PI with h = 3]
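The four operations above can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not the authors' code: the class name `PriorityIndex`, the list-of-levels storage, and the bit layout (slot 0 of each word is its most-significant bit, so that the MSB corresponds to the smallest element, matching the use of MSB in FINDMIN) are all choices made here.

```python
class PriorityIndex:
    """W-way bitmap tree over the universe {0, ..., W**h - 1}."""

    def __init__(self, w: int = 64, h: int = 3):
        self.w, self.h = w, h
        # levels[0] is the root (one word); levels[h-1] holds the leaf words.
        self.levels = [[0] * (w ** d) for d in range(h)]

    def _mask(self, slot: int) -> int:
        # Assumed layout: slot 0 is the most-significant bit of the word.
        return 1 << (self.w - 1 - slot)

    def test(self, i: int) -> bool:
        return bool(self.levels[-1][i // self.w] & self._mask(i % self.w))

    def insert(self, i: int) -> None:
        # Bottom-up: set the leaf bit, then the summary bit in every ancestor.
        for level in reversed(self.levels):
            i, slot = divmod(i, self.w)
            level[i] |= self._mask(slot)

    def delete(self, i: int) -> None:
        # Bottom-up: clear the bit; continue upward only if the word became 0.
        for level in reversed(self.levels):
            i, slot = divmod(i, self.w)
            level[i] &= ~self._mask(slot)
            if level[i]:
                break

    def find_min(self):
        # Top-down: follow the most-significant set bit at every level.
        if not self.levels[0][0]:
            return None
        node = 0
        for level in self.levels:
            msb = level[node].bit_length() - 1  # what BSR would return
            node = node * self.w + (self.w - 1 - msb)
        return node


pi = PriorityIndex(w=4, h=3)  # tiny universe of 4**3 = 64 for readability
for key in (37, 9, 50):
    pi.insert(key)
print(pi.find_min())  # prints: 9
pi.delete(9)
print(pi.find_min())  # prints: 37
```

Each operation touches at most one word per level, so the work is h = log_W U word operations, with one bit scan per level in `find_min`.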

9 Time/Memory Complexity of PI
TEST(i) takes constant time. All other operations take O(log_W U) time, which is asymptotically not as "good" as the O(log_2 log_2 U) time complexity of a van Emde Boas tree
However, for W = 64, PI requires the same number of operations or fewer for U ≤ 64 billion (h ≤ 6), and is much simpler
For a PI of size U, the memory size is only 1.016U bits, i.e. O(U) space
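The 1.016U figure can be checked with a short calculation, assuming the tree stores exactly one W-bit word per node and nothing else (the helper name `pi_memory_bits` is introduced here). Level d holds W^d words, so the total is W + W^2 + ... + W^h bits, which approaches U * W / (W - 1) = (64/63) U for W = 64.

```python
def pi_memory_bits(w: int, h: int) -> int:
    """Total bits in a PI tree: level d (root is d = 0) holds w**d words of w bits."""
    return sum(w ** d * w for d in range(h))

W = 64
for h in range(1, 7):
    U = W ** h
    ratio = pi_memory_bits(W, h) / U
    print(f"h={h}  U={U}  memory/U={ratio:.4f}")
```

For h = 6 the universe is 64^6 = 2^36 elements, which is the "64 billion" bound quoted above, and the overhead of the summary levels stays below 1.6% of U.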

10 Motivation for Modified Structures
PI is fast, but O(log_W U) time may still not be fast enough for high-performance applications
 – Want constant-time operations (issue a new operation every cycle)
But PI cannot be readily pipelined
 – Some PI operations are top-down (e.g. FINDMIN), but others are bottom-up (e.g. DELETE)
Propose 2 modified structures
 – Counting Priority Index (CPI)
 – Pipelined Counting Priority Index (Pipelined CPI)

11 Counting-Priority-Index
In addition to a bit indicating that a child node has at least one bit set, add a counter that keeps track of how many bits in the child node are set.
This enables all operations to be performed top-down.
[Figure: Data structure of CPI with h = 3]

12 Example CPI Operations
FINDMIN(): Start at root. Find MSB (most-significant bit) and traverse sub-tree. Repeat until leaf. Same as before.
DELETE(i): Start at root. Decrement counter. If count = 0, clear bit. Go down corresponding sub-tree. Repeat until leaf.
[Figure: Data structure of CPI with h = 3]
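A top-down DELETE along these lines might look like the following sketch. Several points are assumptions made here rather than details from the slides: each counter is taken to hold the number of stored elements beneath the corresponding child (for a leaf child this equals the number of bits set in it), DELETE and INSERT first check membership with the constant-time TEST so the counters stay exact, and the names `CountingPriorityIndex` and `_path` are hypothetical.

```python
class CountingPriorityIndex:
    """PI tree whose non-leaf nodes also keep one counter per child."""

    def __init__(self, w: int = 64, h: int = 3):
        self.w, self.h = w, h
        self.bits = [[0] * (w ** d) for d in range(h)]
        # counts[d][node][slot]: elements stored beneath child `slot` of `node`.
        self.counts = [[[0] * w for _ in range(w ** d)] for d in range(h - 1)]

    def _mask(self, slot: int) -> int:
        return 1 << (self.w - 1 - slot)  # assumed layout: slot 0 is the MSB

    def _path(self, i: int):
        # (node, slot) pairs for element i, listed from the root to the leaf.
        for d in range(self.h):
            prefix = i // self.w ** (self.h - 1 - d)
            yield divmod(prefix, self.w)

    def test(self, i: int) -> bool:
        node, slot = divmod(i, self.w)
        return bool(self.bits[-1][node] & self._mask(slot))

    def insert(self, i: int) -> None:
        if self.test(i):
            return
        for d, (node, slot) in enumerate(self._path(i)):  # root first
            self.bits[d][node] |= self._mask(slot)
            if d < self.h - 1:
                self.counts[d][node][slot] += 1

    def delete(self, i: int) -> None:
        if not self.test(i):
            return
        for d, (node, slot) in enumerate(self._path(i)):  # root first
            if d < self.h - 1:
                self.counts[d][node][slot] -= 1
                if self.counts[d][node][slot] == 0:
                    self.bits[d][node] &= ~self._mask(slot)
            else:
                self.bits[d][node] &= ~self._mask(slot)  # leaf: clear b_i

    def find_min(self):
        if not self.bits[0][0]:
            return None
        node = 0
        for level in self.bits:
            msb = level[node].bit_length() - 1
            node = node * self.w + (self.w - 1 - msb)
        return node


cpi = CountingPriorityIndex(w=4, h=3)
for key in (9, 10, 37):
    cpi.insert(key)
cpi.delete(9)
print(cpi.find_min())  # prints: 10
```

Because every update visits the levels in root-to-leaf order, the same order as FINDMIN, successive operations could in principle occupy different levels at the same time, which is the property the pipelined variant relies on.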

13 Time/Memory Complexity of CPI
TEST(i) takes constant time. All other operations take O(log_W U) time, same as the basic PI structure.
But all operations are supported in top-down fashion
For a CPI of size U, the memory size is only 1.11U bits, still O(U) space

14 Pipelined Counting-Priority-Index
Reduces amortized time complexity to constant time
At the expense of O(log_W U) pipeline stages
Memory size is also only 1.11U bits, O(U) space
[Figure: Data structure of Pipelined CPI with h = 3]

15 Operations Supported
Operations supported by all 3 priority indexing structures
TEST(i): Test if index i is in set S
INSERT(i): Insert a new index i to set S
DELETE(i): Delete index i from set S
FINDMIN: Find the smallest index in set S
FINDMAX: Find the largest index in set S
EXTRACTMIN: Delete the smallest index in set S
EXTRACTMAX: Delete the largest index in set S
SUCCESSOR(i): Find the successor of index i in set S
PREDECESSOR(i): Find the predecessor of index i in set S
EXTRACTSUCC(i): Delete the successor of index i in set S
EXTRACTPRED(i): Delete the predecessor of index i in set S
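The slides do not spell out how SUCCESSOR is computed, so the following is only one plausible sketch on the basic PI layout: climb from the leaf until some word has a set bit in a later slot, then descend along the first set bit at each level. The helper names `build` and `successor`, the MSB-first slot layout, and the tiny W = 4, h = 3 universe are assumptions made for illustration.

```python
W, H = 4, 3  # small illustrative tree: universe of 4**3 = 64 indices

def build(elements, w=W, h=H):
    """Build the PI levels (root first) holding the given indices."""
    levels = [[0] * (w ** d) for d in range(h)]
    for e in elements:
        for level in reversed(levels):
            e, slot = divmod(e, w)
            level[e] |= 1 << (w - 1 - slot)  # slot 0 is the MSB
    return levels

def successor(levels, i, w=W):
    """Smallest stored index strictly greater than i, or None."""
    leaf = len(levels) - 1
    d = leaf
    node, slot = divmod(i, w)
    # Climb until a word has a set bit in some slot after `slot`.
    while True:
        later = levels[d][node] & ((1 << (w - 1 - slot)) - 1)
        if later:
            break
        if d == 0:
            return None
        d -= 1
        node, slot = divmod(node, w)
    # Descend, always taking the first (most-significant) set bit.
    word = later
    while True:
        slot = w - 1 - (word.bit_length() - 1)
        node = node * w + slot
        if d == leaf:
            return node
        d += 1
        word = levels[d][node]

levels = build([9, 10, 37])
print(successor(levels, 10))  # prints: 37
```

PREDECESSOR would mirror this with the mask selecting earlier slots and a least-significant-bit scan on the way down; the EXTRACT variants follow by deleting the index that was found.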

16 Comparison
PI: time O(log_W U), memory 1.016U, hardware constant
CPI: time O(log_W U), memory 1.11U, hardware constant
Pipelined CPI: time constant, memory 1.11U, hardware O(log_W U)

17 Hardware Complexity of Pipelined CPI
[Figure: Number of pipeline stages in the data structures]

18 Summary
Fast sorting data structures
 – Fast and scalable succinct data structures for the implementation of priority queues
 – The Pipelined CPI supports constant-time priority management operations
 – The hardware complexity is only O(log_W U), with O(U) memory space

19 Thank You

