
Presentation transcript:

1 Splay trees (Sleator, Tarjan 1983)

2 Motivation Assume you know the access frequencies p_1, p_2, ..., p_n. What is the best static tree? You can find it in O(n log n) time (homework).

3 Approximation (Mehlhorn)

4 Approximation (Mehlhorn)
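The figures above illustrate Mehlhorn's weight-bisection rule: pick as root the key that splits the total access probability most evenly, then recurse on both sides. A runnable sketch of the idea (Python; the function names are mine, and a linear scan picks the split point, so this runs in O(n^2) rather than the O(n log n) achievable with binary search over the prefix sums):

```python
def mehlhorn_tree(keys, probs):
    """Approximately optimal static BST (Mehlhorn's bisection rule):
    the root splits the probability mass of the current range as
    evenly as possible; recurse on both sides."""
    prefix = [0.0]                      # prefix[i] = sum of probs[:i]
    for p in probs:
        prefix.append(prefix[-1] + p)

    def build(lo, hi):                  # tree over keys[lo:hi]
        if lo >= hi:
            return None
        mid = (prefix[lo] + prefix[hi]) / 2.0
        # choose the key whose own probability interval is nearest `mid`
        r = min(range(lo, hi),
                key=lambda i: abs((prefix[i] + prefix[i + 1]) / 2.0 - mid))
        return (keys[r], build(lo, r), build(r + 1, hi))

    return build(0, len(keys))

def inorder(t):
    return [] if t is None else inorder(t[1]) + [t[0]] + inorder(t[2])
```

With uniform weights this degenerates to the balanced tree; with skewed weights the heavy items end up near the root, which is what the approximation bound exploits.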

18 Analysis The sum of the weights of the pieces that correspond to an internal node is no larger than the length of the corresponding interval. An internal node at level i corresponds to an interval of length 1/2^i.

19 Analysis

20 Goal Support the same operations as previous search trees.

21 Highlights Binary; simple; good amortized properties; very elegant; interesting open conjectures -- further and deeper understanding of this data structure is still due.

22 Main idea Try to arrange the tree so that frequently accessed items are near the root. We shall assume that there is an item in every node, including internal nodes; we could change this assumption so that items are only at the leaves.

23 First attempt Move the accessed item to the root by doing single rotations. [Figure: a rotation of the edge (x, y), exchanging x and y and reattaching the subtrees A, B, C.]

24 Move to root (example) [Figure: the accessed item is rotated to the root step by step.]
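The move-to-root heuristic is easy to state in code. A minimal sketch (Python; helper names are my own, one item per node): rotate the accessed node up one level at a time until it becomes the root. Every rotation preserves the in-order sequence, i.e. the BST property.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def bst_insert(root, key):
    """Plain (unbalanced) BST insertion; returns the new node."""
    cur, node = root, Node(key)
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = node
                break
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = node
                break
            cur = cur.right
    node.parent = cur
    return node

def rotate_up(x):
    """Single rotation moving x above its parent (preserves in-order)."""
    p, g = x.parent, x.parent.parent
    if x is p.left:                       # right rotation around (x, p)
        p.left, x.right = x.right, p
        if p.left:
            p.left.parent = p
    else:                                 # left rotation around (x, p)
        p.right, x.left = x.left, p
        if p.right:
            p.right.parent = p
    x.parent, p.parent = g, x
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def move_to_root(x):
    while x.parent is not None:
        rotate_up(x)
    return x

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```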

25 Move to root (analysis) There are arbitrarily long access sequences for which the time per access is Ω(n)! (Homework.)

26 Splaying Do rotations bottom-up along the access path, but in pairs, in a way that depends on the structure of the path. A splay step: [Figure: (1) zig-zig -- x and its parent y are both left (or both right) children; rotate the edge (y, z) first, then the edge (x, y).]

27 Splaying (cont) [Figure: (2) zig-zag -- x is a left child and its parent y a right child (or vice versa); rotate the edge (x, y), then the edge (x, z). (3) zig -- y is the root; rotate the edge (x, y).]

28 Splaying (example) [Figure: splaying at a deep node; the first two splay steps.]

29 Splaying (example cont) [Figure: the remaining splay steps; the accessed node ends at the root.]
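A full splay differs from move-to-root only in how the rotations are paired. A bottom-up sketch (Python; helper names are my own): zig-zig rotates the parent's edge before the node's own edge, zig-zag rotates the node twice, and a final zig handles an odd-length path.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(x):
    """Single rotation moving x above its parent."""
    p, g = x.parent, x.parent.parent
    if x is p.left:
        p.left, x.right = x.right, p
        if p.left:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right:
            p.right.parent = p
    x.parent, p.parent = g, x
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(x):
    """Bottom-up splay of x using the zig / zig-zig / zig-zag cases."""
    while x.parent is not None:
        p = x.parent
        g = p.parent
        if g is None:                              # (3) zig
            rotate_up(x)
        elif (g.left is p) == (p.left is x):       # (1) zig-zig
            rotate_up(p)                           # rotate the (p, g) edge first
            rotate_up(x)
        else:                                      # (2) zig-zag
            rotate_up(x)
            rotate_up(x)
    return x

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Splaying the deepest node of a left path roughly halves the depth of every node on the path; this "path halving" is the structural difference from plain move-to-root and is what makes the amortized bounds work.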

30 Splaying (analysis) Assume each item i has a positive weight w(i), which is arbitrary but fixed. Define the size s(x) of a node x in the tree as the sum of the weights of the items in its subtree. The rank of x: r(x) = log_2(s(x)). We measure the splay time by the number of rotations.

31 Access lemma The amortized time to splay a node x in a tree with root t is at most 3(r(t) - r(x)) + 1 = O(log(s(t)/s(x))). Potential used: the sum of the ranks of the nodes. This lemma has many consequences:

32 Balance theorem Balance Theorem: Accessing m items in an n-node splay tree takes O((m + n) log n) time. Proof:

33 Balance theorem Balance Theorem: Accessing m items in an n-node splay tree takes O((m + n) log n) time. (More consequences after the proof.) Proof. Assign weight 1/n to each item; the total weight is then W = 1. Splaying at any item takes 3 log n + 1 amortized time, and the total potential drop over the sequence is at most n log n.

34 Static optimality theorem For any item i, let q(i) be the total number of times i is accessed. Static optimality theorem: If every item is accessed at least once, then the total access time is O(m + Σ_{i=1}^{n} q(i) log(m/q(i))). This is the optimal average access time up to a constant factor.

35 Static optimality theorem (proof) Static optimality theorem: If every item is accessed at least once, then the total access time is O(m + Σ_{i=1}^{n} q(i) log(m/q(i))). Proof.

36 Static optimality theorem (proof) Static optimality theorem: If every item is accessed at least once, then the total access time is O(m + Σ_{i=1}^{n} q(i) log(m/q(i))). Proof. Assign weight q(i)/m to item i; then W = 1. The amortized time to splay at i is 3 log(m/q(i)) + 1. The maximum potential drop over the sequence is Σ_{i=1}^{n} (log(W) - log(q(i)/m)) = Σ_{i=1}^{n} log(m/q(i)).

37 Proof of the access lemma Proof. Consider a splay step. Let s and s', r and r' denote the size and rank functions just before and just after the step, respectively. We show that the amortized time of a zig step is at most 3(r'(x) - r(x)) + 1, and that the amortized time of a zig-zig or a zig-zag step is at most 3(r'(x) - r(x)). The lemma then follows by summing over all splay steps: the sum telescopes. (The lemma: the amortized time to splay a node x in a tree with root t is at most 3(r(t) - r(x)) + 1 = O(log(s(t)/s(x))).)

38 Proof of the access lemma (cont) [Figure: (3) zig.] amortized time(zig) = 1 + ΔΦ = 1 + r'(x) + r'(y) - r(x) - r(y) ≤ 1 + r'(x) - r(x) (since r'(y) ≤ r(y)) ≤ 1 + 3(r'(x) - r(x))

39 Proof of the access lemma (cont) [Figure: (1) zig-zig.] amortized time(zig-zig) = 2 + ΔΦ = 2 + r'(x) + r'(y) + r'(z) - r(x) - r(y) - r(z) = 2 + r'(y) + r'(z) - r(x) - r(y) (since r'(x) = r(z)) ≤ 2 + r'(x) + r'(z) - 2r(x) (since r'(y) ≤ r'(x) and r(y) ≥ r(x)) ≤ 2r'(x) - r(x) - r'(z) + r'(x) + r'(z) - 2r(x) = 3(r'(x) - r(x)). The last inequality uses 2 ≤ 2r'(x) - r(x) - r'(z): with p = s(x)/s'(x) and q = s'(z)/s'(x) we have p + q ≤ 1, so 2 ≤ -(log(p) + log(q)) = log(s'(x)/s(x)) + log(s'(x)/s'(z)) = (r'(x) - r(x)) + (r'(x) - r'(z)).

40 Proof of the access lemma (cont) [Figure: (2) zig-zag.] Similar. (Do at home.)
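The access lemma can also be checked numerically: with unit weights, Φ = Σ_x log_2 s(x), and for every splay the number of rotations plus the change in Φ must be at most 3(r(t) - r(x)) + 1. A small randomized check (Python; names are my own, reusing the bottom-up splay sketch) should never find a violation, since the inequality is a theorem:

```python
import math
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def bst_insert(root, key):
    cur, node = root, Node(key)
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = node
                break
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = node
                break
            cur = cur.right
    node.parent = cur
    return node

def rotate_up(x):
    p, g = x.parent, x.parent.parent
    if x is p.left:
        p.left, x.right = x.right, p
        if p.left:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right:
            p.right.parent = p
    x.parent, p.parent = g, x
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(x):
    """Splay x to the root; returns the number of rotations performed."""
    rotations = 0
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:                              # zig
            rotate_up(x)
            rotations += 1
        else:
            if (g.left is p) == (p.left is x):     # zig-zig
                rotate_up(p)
            else:                                  # zig-zag
                rotate_up(x)
            rotate_up(x)
            rotations += 2
    return rotations

def subtree_sizes(t, out):
    """Fill out[node] with subtree sizes (unit weights); returns the size."""
    if t is None:
        return 0
    out[t] = 1 + subtree_sizes(t.left, out) + subtree_sizes(t.right, out)
    return out[t]

def check_access_lemma(n=64, accesses=500, seed=1):
    random.seed(seed)
    keys = random.sample(range(10 * n), n)
    root = Node(keys[0])
    nodes = {keys[0]: root}
    for k in keys[1:]:
        nodes[k] = bst_insert(root, k)
    for _ in range(accesses):
        x = nodes[random.choice(keys)]
        s = {}
        subtree_sizes(root, s)
        phi_before = sum(math.log2(v) for v in s.values())
        bound = 3 * (math.log2(s[root]) - math.log2(s[x])) + 1
        rotations = splay(x)
        root = x
        s2 = {}
        subtree_sizes(root, s2)
        phi_after = sum(math.log2(v) for v in s2.values())
        if rotations + (phi_after - phi_before) > bound + 1e-9:
            return False
    return True
```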

41 Intuition

42 Intuition (Cont)

43 Intuition ΔΦ = 0

ΔΦ = log(5) - log(3) + log(1) - log(5) = -log(3)

ΔΦ = log(7) - log(5) + log(1) - log(7) = -log(5)

ΔΦ = log(9) - log(7) + log(1) - log(9) = -log(7)

47 Static optimality theorem (proof) Static optimality theorem: If every item is accessed at least once, then the total access time is O(m + Σ_{i=1}^{n} q(i) log(m/q(i))). Proof.

48 Static optimality theorem (proof) Static optimality theorem: If every item is accessed at least once, then the total access time is O(m + Σ_{i=1}^{n} q(i) log(m/q(i))). Proof. Assign weight q(i)/m to item i; then W = 1. The amortized time to splay at i is 3 log(m/q(i)) + 1. The maximum potential drop over the sequence is Σ_{i=1}^{n} (log(W) - log(q(i)/m)) = Σ_{i=1}^{n} log(m/q(i)).

49 Static finger theorem Suppose all items are numbered from 1 to n in symmetric order, and let the sequence of accessed items be i_1, i_2, ..., i_m. Static finger theorem: Let f be an arbitrary fixed item; the total access time is O(n log(n) + m + Σ_{j=1}^{m} log(|i_j - f| + 1)). Splay trees thus support accesses in the vicinity of any fixed finger as well as finger search trees do.

50 Working set theorem Let t(j), j = 1, ..., m, denote the number of different items accessed since the last access to the item requested at step j (or since the beginning of the sequence). Working set theorem: The total access time is O(n log(n) + m + Σ_{j=1}^{m} log(t(j) + 1)). Proof:

51 Working set theorem Let t(j), j = 1, ..., m, denote the number of different items accessed since the last access to the item requested at step j (or since the beginning of the sequence). Working set theorem: The total access time is O(n log(n) + m + Σ_{j=1}^{m} log(t(j) + 1)). Proof: Assign weights 1, 1/4, 1/9, ..., 1/n^2 to the items in the order of their first access. After an access, say to an item of weight 1/k^2, change the weights so that the accessed item has weight 1 and every item that had weight 1/d^2 for d < k gets weight 1/(d+1)^2. Weight changes after a splay only decrease the potential. The potential is nonpositive and not smaller than about -n log(n).

52 Application: Data Compression via Splay Trees Suppose we want to compress text over some alphabet Σ. Prepare a binary tree containing the items of Σ at its leaves. To encode a symbol x: traverse the path from the root to x, emitting 0 when you go left and 1 when you go right. Then splay at the parent of x and use the new tree to encode the next symbol.

53 Compression via splay trees (example) [Figure: the initial code tree over the alphabet a..h, used to encode the string "aabg".]

54 Compression via splay trees (example) [Figure: encoding the first symbol.]

55 Compression via splay trees (example) [Figure: the restructured tree and the bits emitted so far.]

56 Compression via splay trees (example) [Figure: the tree after the next splay step.]

57 Decoding Decoding is symmetric; the decoder and the encoder must agree on the initial tree.
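A round-trip sketch of the scheme (Python; all names are invented here): symbols sit at the leaves of a code tree, each symbol is encoded by its root-to-leaf path, and after each symbol both sides restructure the tree identically. For brevity this sketch moves the accessed leaf's parent to the root by single rotations instead of doing a genuine splay; any deterministic rule shared by encoder and decoder keeps the two trees in sync.

```python
class CNode:
    def __init__(self, sym=None, left=None, right=None):
        self.sym, self.left, self.right = sym, left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self

def code_tree(alphabet):
    """Balanced code tree with the symbols of `alphabet` at the leaves."""
    if len(alphabet) == 1:
        return CNode(sym=alphabet[0])
    mid = len(alphabet) // 2
    return CNode(left=code_tree(alphabet[:mid]),
                 right=code_tree(alphabet[mid:]))

def rotate_up(x):
    p, g = x.parent, x.parent.parent
    if x is p.left:
        p.left, x.right = x.right, p
        if p.left:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right:
            p.right.parent = p
    x.parent, p.parent = g, x
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def lift(node):
    """Move `node` to the root (simplified stand-in for splaying)."""
    while node.parent is not None:
        rotate_up(node)
    return node

def leaves(t, out):
    if t.sym is not None:
        out[t.sym] = t
    else:
        leaves(t.left, out)
        leaves(t.right, out)

def encode(text, alphabet):
    root, leaf = code_tree(alphabet), {}
    leaves(root, leaf)
    bits = []
    for ch in text:
        x, path = leaf[ch], []
        while x.parent is not None:
            path.append('0' if x is x.parent.left else '1')
            x = x.parent
        bits.extend(reversed(path))
        if leaf[ch].parent is not None:
            root = lift(leaf[ch].parent)    # decoder mirrors this step
    return ''.join(bits)

def decode(bits, alphabet):
    root, out = code_tree(alphabet), []
    t = root
    for b in bits:
        t = t.left if b == '0' else t.right
        if t.sym is not None:
            out.append(t.sym)
            if t.parent is not None:
                root = lift(t.parent)       # same restructuring as the encoder
            t = root
    return ''.join(out)
```

Frequently used symbols drift toward the root, so their codewords shrink: over an 8-symbol alphabet the first 'a' costs 3 bits, and each subsequent 'a' only 1 bit.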

58 Compression via splay trees (analysis) How compact is this compression? Suppose m is the number of characters in the original string. The length of the string we produce is m + (cost of the splays), which by the static optimality theorem is m + O(m + Σ_i q(i) log(m/q(i))) = O(m + Σ_i q(i) log(m/q(i))). Recall that the entropy of the sequence, Σ_i q(i) log(m/q(i)), is a lower bound.

59 Compression via splay trees (analysis) In particular, the length of the Huffman code of the sequence is also at least Σ_i q(i) log(m/q(i)); but to construct it you need to know the frequencies in advance.

60 Compression via splay trees (variations) D. Jones (1988) showed that this technique can be competitive with dynamic Huffman coding (Vitter 1987). He used a variant of splaying called semi-splaying.

61 Semi-splaying [Figure: regular zig-zig vs. semi-splay zig-zig.] In a semi-splay zig-zig step, only the edge (y, z) is rotated, and splaying continues at y rather than at x.

62 Compression via semi-splaying (Jones 1988) Read the codeword from the path. Twist the tree so that the encoded symbol is the leftmost leaf. Semi-splay the leftmost leaf (this eliminates the need for the zig-zag case). While splaying, do semi-rotations rather than rotations.

63 Compression via splay trees (example) [Figure: the initial code tree.]

64 Compression via splay trees (example) [Figure: encoding the first symbol.]

Compression via splay trees (example) [Figure: the tree restructured after each symbol.]

Compression via splay trees (example) [Figure: further encoding steps.]

67 Update operations on splay trees Catenate(T1, T2): Splay T1 at its largest item, say i, and attach T2 as the right child of the (new) root. [Figure.] Amortized time: 3 log(s(T1)/s(i)) + log((s(T1) + s(T2))/s(T1)) + O(1) ≤ 3 log(W/w(i)) + O(1).

68 Update operations on splay trees (cont) split(i, T): Assume i ∈ T. Splay at i, then return the two trees formed by cutting off the right son of i. [Figure.] Amortized time = 3 log(W/w(i)) + O(1).

69 Update operations on splay trees (cont) split(i, T): What if i ∉ T? Splay at the successor or predecessor of i (call them i+ and i-), then return the two trees formed by cutting off the right son of i- or the left son of i+. [Figure.] Amortized time = 3 log(W/min{w(i-), w(i+)}) + O(1).

70 Update operations on splay trees (cont) insert(i, T): Perform split(i, T) ==> T1, T2, and return the tree with i at the root, T1 as its left subtree, and T2 as its right subtree. [Figure.] Amortized time: 3 log((W - w(i))/min{w(i-), w(i+)}) + log(W/w(i)) + O(1).

71 Update operations on splay trees (cont) delete(i, T): Splay at i and then return the catenation of the left and right subtrees. [Figure.] Amortized time: 3 log(W/w(i)) + 3 log((W - w(i))/w(i-)) + O(1).
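The update operations reduce to a splay plus a little pointer surgery. The sketch below (Python; a common compact recursive formulation, not the paper's code) brings the target key, or the last node on its search path, to the root; catenate, split, insert and delete then follow the pictures above. Splaying with an infinite key is used to bring the maximum to the root.

```python
class T:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rot_right(t):
    l = t.left
    t.left, l.right = l.right, t
    return l

def rot_left(t):
    r = t.right
    t.right, r.left = r.left, t
    return r

def splay(t, key):
    """Recursive splay: brings the node with `key` (or the last node on
    its search path) to the root, using zig / zig-zig / zig-zag."""
    if t is None:
        return None
    if key < t.key and t.left is not None:
        if key < t.left.key and t.left.left is not None:      # zig-zig
            t.left.left = splay(t.left.left, key)
            t = rot_right(t)
        elif key > t.left.key and t.left.right is not None:   # zig-zag
            t.left.right = splay(t.left.right, key)
            t.left = rot_left(t.left)
        return rot_right(t)                                   # final zig
    if key > t.key and t.right is not None:
        if key > t.right.key and t.right.right is not None:   # zig-zig
            t.right.right = splay(t.right.right, key)
            t = rot_left(t)
        elif key < t.right.key and t.right.left is not None:  # zig-zag
            t.right.left = splay(t.right.left, key)
            t.right = rot_right(t.right)
        return rot_left(t)                                    # final zig
    return t

INF = float('inf')

def catenate(t1, t2):
    """All keys of t1 must be smaller than all keys of t2."""
    if t1 is None:
        return t2
    t1 = splay(t1, INF)          # the largest item becomes the root
    t1.right = t2
    return t1

def split(t, key):
    """Splay at `key`; cut into (keys <= key, keys > key)."""
    if t is None:
        return None, None
    t = splay(t, key)
    if t.key <= key:
        right, t.right = t.right, None
        return t, right
    left, t.left = t.left, None
    return left, t

def insert(t, key):              # assumes `key` is not already present
    left, right = split(t, key)
    return T(key, left, right)

def delete(t, key):
    t = splay(t, key)
    if t is None or t.key != key:
        return t                 # key absent; tree restructured but unchanged as a set
    return catenate(t.left, t.right)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```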

72 Open problems Self-adjusting form of a,b-trees?

73 Open problems Dynamic optimality conjecture: Consider any sequence of successful accesses on an n-node search tree. Let A be any algorithm that carries out each access by traversing the path from the root to the node containing the accessed item, at a cost of one plus the depth of that node, and that between accesses performs rotations anywhere in the tree, at a cost of one per rotation. Then the total time to perform all these accesses by splaying is no more than O(n) plus a constant times the cost of algorithm A.

74 Open problems Dynamic finger conjecture (now a theorem): The total time to perform m successful accesses on an arbitrary n-node splay tree is O(m + n + Σ_{j=1}^{m-1} log(|i_{j+1} - i_j| + 1)), where the j-th access is to item i_j. A very complicated proof appeared in SICOMP (Cole et al.).

75 Open problems Traversal conjecture: Let T1 and T2 be any two n-node binary search trees containing exactly the same items. Suppose we access the items in T1 one after another using splaying, in the order in which they appear in the preorder of T2. Then the total access time is O(n).

76 Tango trees (Demaine, Harmon, Iacono, Patrascu 2004)

77 Lower bound A reference tree

78 Lower bound [Figure: the reference tree with a node y marked.]

79 Left region of y [Figure.]

80 Right region of y [Figure.]

81 IB(σ, y) = # of alternations between accesses to the left region of y and accesses to the right region of y

82 IB(σ) = Σ_y IB(σ, y)
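IB(σ) is easy to compute for a concrete reference tree. The sketch below (Python; names are mine) uses a balanced reference tree over keys 0..n-1, takes the left region of y to be y's left subtree plus y itself (a convention of this sketch; the choice only affects constants), and counts alternations at every node:

```python
def interleave_bound(n, sigma):
    """Interleave lower bound IB(σ) w.r.t. a fixed balanced reference
    tree over keys 0..n-1.  For each node y, count how often successive
    accesses in y's subtree switch between the left region (left
    subtree plus y) and the right region (right subtree)."""
    last = {}      # last region accessed through node y: 'L' or 'R'
    count = 0

    def access(lo, hi, x):
        nonlocal count
        if lo >= hi:
            return
        mid = (lo + hi) // 2           # key at node y = (lo, hi)
        side = 'L' if x <= mid else 'R'
        y = (lo, hi)
        if y in last and last[y] != side:
            count += 1                 # an alternation at y
        last[y] = side
        if x < mid:
            access(lo, mid, x)
        elif x > mid:
            access(mid + 1, hi, x)

    for x in sigma:
        access(0, n, x)
    return count
```

For example, alternating between the two leaves of a 3-node reference tree forces an alternation at the root on every access after the first.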

83 The lower bound (Wilber 89) OPT(σ) ≥ ½ IB(σ) - n

84 Proof

85 Unique transition point [Figure.]

86 A transition point exists [Figure.]

87 [Figure.]

88 [Figure.]

89 The transition point does not change if it is not touched [Figure.]

90 z is a transition point for only one node [Figure.] y1 and y2 have different transition points if they are unrelated

91 z is a transition point for only one node [Figure.] If z is not in y2's subtree we are OK

92 z is a transition point for only one node [Figure.] Otherwise z is the LCA of every node in y2's subtree, so it is the first among l2 and r2

93 OPT(σ) ≥ ½ IB(σ) - n (proof) Sum, for every y, the number of times the algorithm touches the transition point of y (the deeper of l and r). Let σ_{y,1}, σ_{y,2}, ..., σ_{y,p} be the interleaving accesses through y. We must touch l when we access σ_{y,j} for odd j, and r for even j, so each such access touches the transition point unless it has switched -- but to switch it the algorithm also has to touch it.

94 Tango trees [Figure: the reference tree; each node has a preferred child.]

95 Tango trees (cont) [Figure.] A hierarchy of balanced binary trees, each corresponding to a blue (preferred) path.

96 [Figure.] Each node stores its depth in the blue path and the maximum depth in its subtree.

97 Cut [Figure.] Nodes on the lower part of the path are contiguous in key space; the maximum-depth values can be used to find the "interval" containing them.

98 Split [Figure.]

99 [Figure.]

100 [Figure.] We need some differential encoding of the depths.

101 Catenate [Figure.]

102 Catenate [Figure.]

103 Similarly we can do a join [Figure.]

104 The algorithm [Figure.] Search, and then, bottom-up, cut and join so that the tango tree corresponds to the blue (preferred) edges in the reference tree.

105 Analysis If k edges change from black to blue during an access, the access and the corresponding cuts and joins cost O((k + 1) log log n). Summing over m accesses, the total cost is O((IB(σ) + m) log log n). If m = Ω(n), then by Wilber's bound IB(σ) + m = O(OPT(σ)), so the total cost is O(OPT(σ) log log n): tango trees are O(log log n)-competitive.