Published by Jamir Grose. Modified over 3 years ago.

1
Recursive Design of Hardware Priority Queues
Liron Schiff* (TAU)
Joint work with Yehuda Afek (TAU) and Anat Bremler-Barr (IDC)
Supported by European Research Council (ERC) Starting Grant no. 259085

2
Priority Queue (PQ)
Operations: Insert, GetMin

3
Priority Queue Applications
Networking: scheduling packets
–Many flows (1M)
–High rate (100 Mpps)
More applications: scientific simulators, databases
[Figure: numbered items feeding a priority queue (scheduler)]

4
Two Existing Approaches
–Dedicated hardware solutions: non-scalable
–Common software solutions: scalable

5
Two Existing Implementation Approaches

6
Our Approach: The Powering Technique
Merge-sort concept
[Figure labels: Base Priority Queue (BPQ), Sort, Merge]

7
The Powering Technique

8
The Powering Technique
Insert(x) uses Input.
[Figure: Input BPQ and Exit BPQ; items shown: 3]

9
The Powering Technique
Insert(x) uses Input.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3]

10
The Powering Technique
Insert(x) uses Input.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5]

11
The Powering Technique
When Input gets full, move to Exit.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5]

12
The Powering Technique
When Input gets full, move to Exit.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5 4 7 8]

13
The Powering Technique
When Input gets full, move to Exit.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5 4 7 8 1 2 6]

14
The Powering Technique
Get_min() extracts the min of Exit or Input.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5 4 7 8 1 2 6 9; "min" marks the extracted item]

15
The Powering Technique
Get_min() extracts the min of Exit or Input, and we update the Exit (if needed).
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5 4 7 8 1 2 6 9; "min" marks the extracted item]
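
The Insert and Get_min() steps on the preceding slides can be sketched in software as follows. This is only a toy model under stated assumptions, not the authors' hardware design: Python heaps stand in for the two BPQs, the Exit BPQ is assumed to hold the current head of every sorted list kept in RAM, and the class name PoweredPQ and its fields are hypothetical. The Exit capacity is not enforced here.

```python
import heapq


class PoweredPQ:
    """Toy model: small Input BPQ, Exit BPQ of list heads, sorted lists in RAM."""

    def __init__(self, bpq_size):
        self.bpq_size = bpq_size
        self.input_bpq = []   # heap standing in for the hardware Input BPQ
        self.exit_bpq = []    # heap of (head value, list id): assumed Exit BPQ content
        self.ram_lists = {}   # list id -> remaining items, stored descending
        self.next_id = 0

    def insert(self, x):
        heapq.heappush(self.input_bpq, x)
        if len(self.input_bpq) == self.bpq_size:
            # Input is full: move its content out as one sorted list.
            items = sorted(self.input_bpq, reverse=True)
            self.input_bpq = []
            head = items.pop()                    # smallest item of the new list
            self.ram_lists[self.next_id] = items  # the rest waits in RAM
            heapq.heappush(self.exit_bpq, (head, self.next_id))
            self.next_id += 1

    def get_min(self):
        if not self.input_bpq and not self.exit_bpq:
            raise IndexError("get_min from an empty queue")
        use_input = not self.exit_bpq or (
            self.input_bpq and self.input_bpq[0] <= self.exit_bpq[0][0]
        )
        if use_input:
            return heapq.heappop(self.input_bpq)
        value, list_id = heapq.heappop(self.exit_bpq)
        rest = self.ram_lists[list_id]
        if rest:
            # Update the Exit with the next item of the same RAM list.
            heapq.heappush(self.exit_bpq, (rest.pop(), list_id))
        else:
            del self.ram_lists[list_id]
        return value
```

With bpq_size=3 and the ten items shown on the slides, three lists are moved out of Input and the tenth item stays in Input; repeated get_min() calls then return the items in ascending order.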

16
Outline
–Difficulties with the simple idea
–Applying the construction recursively
–Exemplifying on TCAM base units
–Evaluation

17
Two difficulties with the simple idea Input Exit

18
Difficulty 1
Maintaining capacity N, while lists are shrinking
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6 9]

20
Difficulty 1
Maintaining capacity N, while lists are shrinking
We continually merge inactive lists during Insert.
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6 9]

21
Difficulty 1
Maintaining capacity N, while lists are shrinking
We continually merge inactive lists during Insert.
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6 9 10]

22
Difficulty 1
Maintaining capacity N, while lists are shrinking
We continually merge inactive lists during Insert.
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6 9 10 11]

23
Difficulty 1
Maintaining capacity N, while lists are shrinking
We continually merge inactive lists during Insert.
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6, then 9 10 11]
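
One way to picture "continually merge during Insert" is a merge that advances by a bounded number of items on every call, so no single operation pays for a whole merge. The sketch below is an illustration under that assumption only; the class IncrementalMerge and the budget parameter are hypothetical names, and the slides do not say which lists are chosen or how many steps each Insert performs.

```python
class IncrementalMerge:
    """Merges two ascending lists a few items at a time."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.i = 0        # next unread position in a
        self.j = 0        # next unread position in b
        self.out = []     # merged output built so far

    @property
    def done(self):
        return self.i == len(self.a) and self.j == len(self.b)

    def step(self, budget):
        """Move at most `budget` items to the output; return True when finished."""
        for _ in range(budget):
            if self.done:
                break
            take_a = self.j == len(self.b) or (
                self.i < len(self.a) and self.a[self.i] <= self.b[self.j]
            )
            if take_a:
                self.out.append(self.a[self.i])
                self.i += 1
            else:
                self.out.append(self.b[self.j])
                self.j += 1
        return self.done
```

For example, merging the shrunken lists [3, 5] and [4, 7, 8] with a budget of 2 items per Insert finishes after three Inserts.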

24
Difficulty 2
Moving all items from Input to RAM in O(1) time
[Figure: Exit BPQ and Input BPQ]

25
Difficulty 2
Moving all items from Input to RAM in O(1) time
–Use two Input BPQs and switch between them
[Figure: Exit BPQ and two Input BPQs]

26
Difficulty 2
Moving all items from Input to RAM in O(1) time
–Use two Input BPQs and switch between them
[Figure: Exit BPQ and Input BPQ]
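
The switch between two Input BPQs resembles double buffering: the role swap itself is O(1), and the full BPQ is then emptied into RAM a little at a time during later operations. The sketch below covers only the Insert path and rests on assumptions not stated on the slides: one item is drained per Insert, heaps stand in for the BPQs, and DoubleInput, drain_per_op and runs are hypothetical names.

```python
import heapq


class DoubleInput:
    """Insert path only: two Input BPQs that swap roles when one fills up."""

    def __init__(self, size, drain_per_op=1):
        self.size = size
        self.drain_per_op = drain_per_op
        self.active = []     # Input BPQ currently receiving new items
        self.draining = []   # full Input BPQ being emptied into RAM
        self.runs = []       # sorted lists in RAM; the last one may be in progress

    def _drain_some(self):
        # Move a bounded number of items from the draining BPQ to RAM.
        for _ in range(self.drain_per_op):
            if self.draining:
                self.runs[-1].append(heapq.heappop(self.draining))

    def insert(self, x):
        self._drain_some()
        heapq.heappush(self.active, x)
        if len(self.active) == self.size:
            if self.draining:
                raise RuntimeError("previous Input BPQ not yet drained")
            # O(1) role switch: the full BPQ starts draining, the empty one takes over.
            self.active, self.draining = self.draining, self.active
            self.runs.append([])
```

With drain_per_op=1, the draining BPQ is empty again exactly when the active one fills, so the swap never has to wait. A complete design would also have Get_min() look at the draining BPQ, which this sketch omits.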

29
Block Size – Time Tradeoff Merge x

30
Block Size – Time Tradeoff
[Figure labels: Exit BPQ, Input BPQ]

31
Block Size – Time Tradeoff
[Figure labels: Exit BPQ, Input BPQ]

32
Block Size – Time Tradeoff
[Figure labels: Exit BPQ, Input BPQ, Exit BPQ, Input BPQ, Input BPQ]

33
Block Size – Time Tradeoff
[Figure labels: Exit BPQ, Input BPQ, Exit BPQ, Input BPQ (x4)]

34
Block Size – Time Tradeoff
[Figure labels: Exit BPQ, Input BPQ, Exit BPQ, Input BPQ (x4), Insert]

35
Block Size – Time Tradeoff Observation-1: only two queues per recursion level!

36
Block Size – Time Tradeoff A Systolic Array like design: Exit BPQ InputBPQ InputBPQ in

37
Extensions - Tradeoffs
Analysis:
–Two queues of size N/x require only two sub-queues of size N/x^2.
–In each operation we act on all sub-queues.
–For any k ≥ 1 we have a priority queue with […] size […] BPQs and with […] time per lookup and update.

38
Resulting Tradeoffs
[Table: columns Parallel op. Time (Latency), #BPQ Ops. (per op.), #Queues * Size, Recursion Levels; cell values not recoverable]

39
Resulting Tradeoffs
[Table: columns Parallel op. Time (Latency), #BPQ Ops. (per op.), #Queues * Size, Recursion Levels; cell values not recoverable]

40
TCAM example

41
TCAMs
RAM:
[Figure: words 01101010, 00100111, 11011011, 01010110 at addresses 0, 1, 2, ..., m; in/out pair: address 1 and word 00100111]
Content Addressable Memory (CAM):
[Figure: the same words at addresses 0, 1, 2, ..., m; in/out pair: word 00100111 and address 1]

42
Ternary CAMs (TCAMs)
Associative memory chips
Properties:
–Ternary values (0, 1 and *)
–Already used in routers (IP lookup, classification)
–High throughput (300M ops per sec for 1Mb TCAM)
–Latency and costs increase dramatically with size
[Figure: entries 0*10**1*, 00100111, 11***011, 01010110 at addresses 0, 1, 2, ..., m; in: 00100111, out: 0]
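
Ternary matching can be illustrated with a short software model. This is a sketch, not a description of any particular chip: it assumes the usual convention that the lowest-address matching entry wins, the function names ternary_match and tcam_lookup are hypothetical, and the example entries are reconstructed from the figure on the slide above.

```python
def ternary_match(pattern, key):
    """True if every pattern symbol is '*' or equals the key bit at that position."""
    return len(pattern) == len(key) and all(
        p in ("*", k) for p, k in zip(pattern, key)
    )


def tcam_lookup(entries, key):
    """Return the lowest address whose entry matches the key, or None."""
    # A real TCAM compares all entries in parallel; the loop only models the result.
    for address, pattern in enumerate(entries):
        if ternary_match(pattern, key):
            return address
    return None


entries = ["0*10**1*", "00100111", "11***011", "01010110"]
```

Looking up the key 00100111 returns address 0, because the wildcard entry 0*10**1* precedes the exact entry at address 1. That is why entry order matters when patterns overlap.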

43
TCAM based Priority Queue
Panigrahy & Sharma (2003) presented a TCAM based data structure for disjoint ranges (PIDR):
–2 TCAM entries per range
–2 TCAM updates per insertion/deletion
–2 queries per point lookup
Can be used to implement a sorted list:
[Figure: a sorted list a1, ..., a6 stored as a sequence of disjoint ranges, shown before and after inserting an item x between a3 and a4]

44
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003)
Pros:
–O(1) time per queue operation
–O(1) TCAM space per item
Cons:
–O(N) TCAM space
–TCAM space should be managed
[Figure: entries 0*10**1*, 00100111, 11***011, 01010110 at addresses 0, 1, 2, ..., m; in: 00100111, out: 0]

45
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003)
Pros:
–O(1) time per queue operation
–O(1) TCAM space per item
Cons:
–O(N) TCAM space
–TCAM space should be managed
[Figure: entries 00100111, 0*10**1*, 11***011, 01010110 at addresses 0, 1, 2, ..., m; in: 00100111, out: 0]

46
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003)
Three versions:
A. O(1) time but O(w) entries per item (where w is the width of a priority value in bits)
B. O(log w) time
C. Empirical O(1) time but O(w) in the worst case
[Figure label: BPQ]

47
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003)
Our results:
[Table: columns Space (TCAM bits), Time (TCAM ops.), Latency (TCAM ops.); row label "original"; cell values not recoverable]

48
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003)
Our results:
[Table: columns Space (TCAM bits), Time (TCAM ops.), Latency (TCAM ops.); row label "original"; cell values not recoverable]

49
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003)
Three versions:
[Table: columns Space (TCAM bits), Time (TCAM ops.), Latency (TCAM ops.); rows PS[03].V1, PS[03].V2, PS[03].V3; cell values not recoverable]

50
Powering the TCAM BPQ
Using small TCAM-based PQs
–Faster TCAM access
–Feasible even when N is large
Suits backbone routers well
–TCAMs are already used for IP lookup

51
Results for TCAM based PQ

53
Results for TCAM-based PQ
[Figure labels: k=1, k=2; A, B, C]

54
Applying to Shift-Registers
Considering a HW PQ implementation of R. Chandra and O. Sinnen.
[Figure labels: Original, K=1, K=2]

55
Summary
The Powering Technique
–Combines small HW queues and RAM
–Allows space–time tradeoffs
Powering TCAMs
–Smaller TCAMs, shorter operation time
–Matches lower bound for sorting with TCAM
–Also works for shift registers

57
Ensuring Prefix Length order in TCAMs
How can we save n patterns with no prefix order violation?
1. Split the patterns into w sections of size n
–Requires a TCAM of size w*n
[Figure: sections 0, 1, 2, ..., w-1]

58
Ensuring Prefix Length order in TCAMs
How can we save n patterns with no prefix order violation?
1. Split the patterns into w sections of size n
–Requires a TCAM of size w*n
2. PIDR2 - Attach a length indicator to each pattern
–Requires log(w) queries to find the longest pattern
[Figure: Length/Pattern table with rows *****1** / 10011***, ****1*** / 1001****, *****1** / 10010***; query 10010111 with length fields 00001111, 00000011, 00000111]

59
Ensuring Prefix Length order in TCAMs
How can we save n patterns with no prefix order violation?
1. Split the patterns into w sections of size n
–Requires a TCAM of size w*n
2. PIDR2 - Attach a length indicator to each pattern
–Requires log(w) queries to find the longest pattern
3. Maintain Chain Ancestor Order (CAO), an optimized scheme by Shah & Gupta [2001]
–Reported average O(1), but worst case O(w)
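
Option 1 above can be sketched as follows: one section of n slots per prefix length, laid out so that longer prefixes sit at lower addresses and a first-match lookup therefore returns the longest matching prefix. This is a minimal illustration under assumptions of my own: prefix lengths run from 1 to w, a free slot is found by a linear scan, and the class name SectionedTcam is hypothetical.

```python
class SectionedTcam:
    """w sections of n slots each; section 0 holds the longest prefixes."""

    def __init__(self, width, n):
        self.width = width
        self.n = n
        self.slots = [None] * (width * n)   # w*n entries in total

    def insert(self, prefix):
        """Store a bit-string prefix (length 1..width) in its length's section."""
        length = len(prefix)
        if not 1 <= length <= self.width:
            raise ValueError("prefix length out of range")
        start = (self.width - length) * self.n
        for address in range(start, start + self.n):
            if self.slots[address] is None:
                # Pad with wildcards to the full TCAM word width.
                self.slots[address] = prefix + "*" * (self.width - length)
                return address
        raise OverflowError("section for this prefix length is full")

    def lookup(self, key):
        """First match by address, which is the longest matching prefix."""
        for address, pattern in enumerate(self.slots):
            if pattern is not None and all(
                p in ("*", k) for p, k in zip(pattern, key)
            ):
                return address
        return None
```

The cost is the one named on the slide: every length gets n reserved slots, so a TCAM of w*n entries is needed even though only n patterns are stored.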

60
TCAM based Priority Queue
We use two PIDR-lists with size […] TCAMs; each can store […] segments/items using the naïve memory management.
[Figure: Input BPQ and Exit BPQ, with sections 0, 1, 2, ..., w-1]

Fast Updating Algorithms for TCAMs Devavrat Shah Pankaj Gupta IEEE MICRO, Jan.-Feb. 2001.
