Splay trees (Sleator, Tarjan 1983)


Motivation
Assume you know the access frequencies p1, p2, ….
What is the best static tree? You can find it in O(n log n) time (homework).

Approximation (Mehlhorn)
[Figure: the weights 0.2, 0.1, 0.04, 0.1, 0.2, 0.1, 0.26 laid out left to right, with an approximately optimal tree built over them step by step]

Analysis
[Figure: the same weights 0.2, 0.1, 0.04, 0.1, 0.2, 0.1, 0.26 and the tree built over them]
An internal node at level i corresponds to an interval of length $1/2^i$.
The sum of the weights of the pieces that correspond to an internal node is no larger than the length of the corresponding interval.

Main idea
Try to arrange the tree so that frequently used items are near the root.
We shall assume that there is an item in every node, including internal nodes. We can change this assumption so that items are only at the leaves.

First attempt
Move the accessed item to the root by doing rotations.
Rotation (each subtree written as node(left, right)): y(x(A, B), C) <===> x(A, y(B, C))

Move to root (example)
[Figure: a sequence of single rotations on a tree with nodes a to e and subtrees A to F that brings the accessed node to the root]

Move to root (analysis)
There are arbitrarily long access sequences for which the time per access is Ω(n)! (Homework?)
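One such bad sequence can be checked with a small experiment. The sketch below is an illustration, not the homework solution from the lecture: it assumes keys 1..n, repeats the sequential pass 1, 2, ..., n, and counts rotations. The names `move_to_root`, `balanced` and `sequential_pass` are made up for this example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):
    """y(x(A, B), C) -> x(A, y(B, C)); returns the new subtree root x."""
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    """x(A, y(B, C)) -> y(x(A, B), C); returns the new subtree root y."""
    y = x.right
    x.right, y.left = y.left, x
    return y

def move_to_root(root, key):
    """Rotate `key` (assumed present) up to the root; return (root, rotations)."""
    if root.key == key:
        return root, 0
    if key < root.key:
        root.left, count = move_to_root(root.left, key)
        return rotate_right(root), count + 1
    root.right, count = move_to_root(root.right, key)
    return rotate_left(root), count + 1

def balanced(lo, hi):
    """Balanced tree on keys lo..hi, used only as a starting point."""
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, balanced(lo, mid - 1), balanced(mid + 1, hi))

def sequential_pass(root, n):
    """Access 1, 2, ..., n in order; return (root, total rotations)."""
    total = 0
    for key in range(1, n + 1):
        root, count = move_to_root(root, key)
        total += count
    return root, total

n = 50
root, _ = sequential_pass(balanced(1, n), n)   # leaves the left path n, n-1, ..., 1
root, second = sequential_pass(root, n)        # every later pass is just as expensive
print(second / n)                              # average rotations per access
```

After the first pass the tree is a left path, and each further pass restores that path, so the average cost per access stays linear in n no matter how long the sequence runs.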

Splaying
Does rotations bottom-up on the access path, but the rotations are done in pairs, in a way that depends on the structure of the path.
A splay step:
(1) zig-zig: z(y(x(A, B), C), D) ==> x(A, y(B, z(C, D)))

Splaying (cont)
(2) zig-zag: z(y(A, x(B, C)), D) ==> x(y(A, B), z(C, D))
(3) zig (y is the root): y(x(A, B), C) ==> x(A, y(B, C))
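The three splay steps can be sketched with parent pointers as follows. This is a minimal illustration under my own naming (`rotate_up`, `splay`, `left_path` are not from the slides); mirror-image cases are handled by the same code, and the rotation count is returned because the analysis measures splay time in rotations.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(x):
    """Single rotation that lifts x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(x):
    """Splay x to the root; return the number of rotations performed."""
    rotations = 0
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:                            # zig
            rotate_up(x)
            rotations += 1
        elif (g.left is p) == (p.left is x):     # zig-zig: rotate at y first, then at x
            rotate_up(p)
            rotate_up(x)
            rotations += 2
        else:                                    # zig-zag: rotate at x twice
            rotate_up(x)
            rotate_up(x)
            rotations += 2
    return rotations

def left_path(n):
    """Build the path n, n-1, ..., 1 of left children; return (root, bottom)."""
    root = node = Node(n)
    for key in range(n - 1, 0, -1):
        node.left = Node(key)
        node.left.parent = node
        node = node.left
    return root, node

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

root, bottom = left_path(9)
rotations = splay(bottom)      # four zig-zig steps on the path 9, 8, ..., 1
```

On this nine-node path the splayed node ends at the root and the height of the tree drops from 8 to 5, which is the roughly-halving effect that plain move-to-root lacks.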

Splaying (example)
[Figure: a step-by-step splay on a tree with nodes a to i and subtrees A to J]

Splaying (analysis)
Assume each item i has a positive weight w(i) which is arbitrary but fixed.
Define the size s(x) of a node x in the tree as the sum of the weights of the items in its subtree.
The rank of x: $r(x) = \log_2 s(x)$.
Measure the splay time by the number of rotations.

Access lemma
The amortized time to splay a node x in a tree with root t is at most $3(r(t) - r(x)) + 1 = O(\log(s(t)/s(x)))$.
Potential used: the sum of the ranks of the nodes.
This has many consequences:

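Balance theorem
Accessing m items in an n-node splay tree takes $O((m+n)\log n)$ time.
Proof: Assign weight $1/n$ to every item, so $s(t) = 1$ and $s(x) \ge 1/n$ for every node x. By the access lemma the amortized time of each access is at most $3\log n + 1$. Every rank lies between $-\log n$ and 0, so over the whole sequence the potential drops by at most $n\log n$. The total time is therefore at most $m(3\log n + 1) + n\log n = O((m+n)\log n)$.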

Proof of the access lemma
The amortized time to splay a node x in a tree with root t is at most $3(r(t) - r(x)) + 1 = O(\log(s(t)/s(x)))$.
Proof. Consider a splay step. Let s and s', r and r' denote the size and the rank function just before and just after the step, respectively.
We show that the amortized time of a zig step is at most $3(r'(x) - r(x)) + 1$, and that the amortized time of a zig-zig or a zig-zag step is at most $3(r'(x) - r(x))$.
The lemma then follows by summing up the cost of all splay steps: the sum telescopes, and at most one step is a zig.

Proof of the access lemma (cont)
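(3) zig: y(x(A, B), C) ==> x(A, y(B, C))
$$\text{amortized time(zig)} = 1 + \Delta\Phi = 1 + r'(x) + r'(y) - r(x) - r(y) \le 1 + r'(x) - r(x) \le 1 + 3(r'(x) - r(x))$$
The first inequality uses $r'(y) \le r(y)$, the second uses $r'(x) \ge r(x)$.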

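(1) zig-zig: z(y(x(A, B), C), D) ==> x(A, y(B, z(C, D)))
$$\begin{aligned}
\text{amortized time(zig-zig)} &= 2 + \Delta\Phi \\
&= 2 + r'(x) + r'(y) + r'(z) - r(x) - r(y) - r(z) \\
&= 2 + r'(y) + r'(z) - r(x) - r(y) && (r'(x) = r(z)) \\
&\le 2 + r'(x) + r'(z) - 2r(x) && (r'(y) \le r'(x),\ r(y) \ge r(x)) \\
&\le 2r'(x) - r(x) - r'(z) + r'(x) + r'(z) - 2r(x) \\
&= 3(r'(x) - r(x))
\end{aligned}$$
The last inequality uses $2 \le 2r'(x) - r(x) - r'(z)$. To see it, let $p = s(x)/s'(x)$ and $q = s'(z)/s'(x)$. Then $p + q \le 1$, so $\log p + \log q \le -2$ and
$$2 \le -(\log p + \log q) = \log(1/p) + \log(1/q) = \log(s'(x)/s(x)) + \log(s'(x)/s'(z)) = r'(x) - r(x) + r'(x) - r'(z)$$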

(2) zig-zag: z(y(A, x(B, C)), D) ==> x(y(A, B), z(C, D))
Similar. (Do at home.)
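The two step bounds can be sanity-checked numerically before proving them. The sketch below is only a spot check under assumed random weights, not a proof: `wx`, `wy`, `wz` are the (positive) weights of the three nodes, `a`, `b`, `c`, `d` are the total weights of subtrees A to D, and the function names are made up for this example.

```python
import math
import random

def rank(size):
    return math.log2(size)

def zigzig_slack(wx, wy, wz, a, b, c, d):
    """3(r'(x) - r(x)) minus the amortized cost 2 + delta-Phi of a zig-zig step."""
    sx = wx + a + b                  # before: z(y(x(A, B), C), D)
    sy = sx + wy + c
    sz = sy + wz + d
    sz2 = wz + c + d                 # after: x(A, y(B, z(C, D)))
    sy2 = wy + b + sz2
    sx2 = sz                         # x now roots the whole subtree
    delta_phi = rank(sx2) + rank(sy2) + rank(sz2) - rank(sx) - rank(sy) - rank(sz)
    return 3 * (rank(sx2) - rank(sx)) - (2 + delta_phi)

def zigzag_slack(wx, wy, wz, a, b, c, d):
    """Same quantity for a zig-zag step."""
    sx = wx + b + c                  # before: z(y(A, x(B, C)), D)
    sy = wy + a + sx
    sz = wz + sy + d
    sy2 = wy + a + b                 # after: x(y(A, B), z(C, D))
    sz2 = wz + c + d
    sx2 = sz
    delta_phi = rank(sx2) + rank(sy2) + rank(sz2) - rank(sx) - rank(sy) - rank(sz)
    return 3 * (rank(sx2) - rank(sx)) - (2 + delta_phi)

random.seed(0)
worst = float("inf")
for _ in range(1000):
    nodes = [random.uniform(0.01, 10) for _ in range(3)]    # node weights must be positive
    subtrees = [random.uniform(0, 10) for _ in range(4)]    # subtrees may be empty
    worst = min(worst, zigzig_slack(*nodes, *subtrees), zigzag_slack(*nodes, *subtrees))
```

A non-negative `worst` (up to rounding) is what the access lemma predicts for both cases.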

Intuition
[Figure: a path of nine nodes 9, 8, ..., 1 is splayed at its bottom node 1, one zig-zig step at a time; sizes are counted in nodes]
Step 1: ΔΦ = 0
Step 2: ΔΦ = log(5) - log(3) + log(1) - log(5) = -log(3)
Step 3: ΔΦ = log(7) - log(5) + log(1) - log(7) = -log(5)
Step 4: ΔΦ = log(9) - log(7) + log(1) - log(9) = -log(7)

More consequences of the access lemma

Static optimality theorem
For any item i let q(i) be the total number of times i is accessed.
Static optimality theorem: If every item is accessed at least once, then the total access time is
$$O\Big(m + \sum_{i=1}^{n} q(i)\log\frac{m}{q(i)}\Big)$$
Optimal average access time up to a constant factor.
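To get a feel for the size of this bound, the expression inside the O can be evaluated for a concrete access sequence. The helper below is a small illustration; the name `static_optimality_bound` and the use of base-2 logarithms are choices made for this example.

```python
import math
from collections import Counter

def static_optimality_bound(accesses):
    """Return m + sum_i q(i) * log2(m / q(i)) for an access sequence."""
    m = len(accesses)
    q = Counter(accesses)            # q[i] = number of accesses to item i
    return m + sum(count * math.log2(m / count) for count in q.values())

uniform = static_optimality_bound("abcd" * 25)        # every item equally frequent
skewed = static_optimality_bound("a" * 97 + "bcd")    # one item dominates
```

The skewed sequence gets a much smaller bound than the uniform one of the same length, which is the sense in which splay trees adapt to the access frequencies without knowing them.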

Static finger theorem
Suppose all items are numbered from 1 to n in symmetric order. Let the sequence of accessed items be $i_1, i_2, \ldots, i_m$.
Static finger theorem: Let f be an arbitrary fixed item. The total access time is
$$O\Big(n\log n + m + \sum_{j=1}^{m} \log(|i_j - f| + 1)\Big)$$
Splay trees support access within the vicinity of any fixed finger as well as finger search trees do.

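Working set theorem
Let t(j), j = 1, …, m, denote the number of different items accessed before access j since the last access to item $i_j$, or since the beginning of the sequence if this is the first access to $i_j$.
Working set theorem: The total access time is
$$O\Big(n\log n + m + \sum_{j=1}^{m} \log(t(j) + 1)\Big)$$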
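The quantity t(j) is easy to get wrong by one, so here is a direct computation of it. This sketch assumes the convention that the accessed item itself is not counted; the function names are illustrative and the quadratic running time is accepted for clarity.

```python
import math

def working_set_sizes(accesses):
    """t(j) for every access: distinct items seen since the previous access to the same item."""
    last_seen = {}
    sizes = []
    for j, item in enumerate(accesses):
        start = last_seen.get(item, -1) + 1       # 0 if the item was never accessed before
        sizes.append(len(set(accesses[start:j])))
        last_seen[item] = j
    return sizes

def working_set_sum(accesses):
    """The sum of log2(t(j) + 1) that appears in the working set theorem."""
    return sum(math.log2(t + 1) for t in working_set_sizes(accesses))

sizes = working_set_sizes(["a", "b", "c", "a", "a", "b"])
```

For the sequence a, b, c, a, a, b this gives t = 0, 1, 2, 2, 0, 2: repeated accesses to a recently used item contribute almost nothing to the sum.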

Application: Data Compression via Splay Trees
Suppose we want to compress text over some alphabet Σ.
Prepare a binary tree containing the items of Σ at its leaves.
To encode a symbol x: traverse the path from the root to x, emitting 0 when you go left and 1 when you go right.
Splay at the parent of x and use the new tree to encode the next symbol.

Compression via splay trees (example)
[Figure: encoding the text aabg... starting from a balanced tree with leaves a to h. The output grows as 000, then 000 0 10, then 000 0 10 1110, and the tree is splayed after each symbol.]

Decoding
Symmetric. The decoder and the encoder must agree on the initial tree.
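The encoder and decoder described above can be sketched as follows. This is an illustration under several assumptions: the initial tree is a balanced tree over the alphabet in the given order, the alphabet has at least two symbols, and bits are kept as a string of '0' and '1' characters. The class and function names are made up for this example.

```python
class Node:
    def __init__(self, symbol=None, left=None, right=None):
        self.symbol, self.left, self.right, self.parent = symbol, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self

def build(symbols):
    """Balanced tree with the symbols at the leaves (an assumed initial tree)."""
    if len(symbols) == 1:
        return Node(symbols[0])
    mid = len(symbols) // 2
    return Node(left=build(symbols[:mid]), right=build(symbols[mid:]))

def rotate_up(x):
    """Single rotation lifting x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(x):
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:                            # zig
            rotate_up(x)
        elif (g.left is p) == (p.left is x):     # zig-zig
            rotate_up(p)
            rotate_up(x)
        else:                                    # zig-zag
            rotate_up(x)
            rotate_up(x)

class SplayCoder:
    def __init__(self, alphabet):
        self.root = build(alphabet)
        self.leaf, stack = {}, [self.root]
        while stack:                             # index the leaves by symbol
            node = stack.pop()
            if node.symbol is not None:
                self.leaf[node.symbol] = node
            else:
                stack += [node.left, node.right]

    def _restructure(self, leaf):
        """Splay at the parent of the leaf that was just coded."""
        if leaf.parent is not None:
            self.root = leaf.parent
            splay(self.root)

    def encode(self, text):
        out = []
        for symbol in text:
            leaf, path = self.leaf[symbol], []
            node = leaf
            while node.parent is not None:       # collect the path bottom-up
                path.append("0" if node.parent.left is node else "1")
                node = node.parent
            out.append("".join(reversed(path)))
            self._restructure(leaf)
        return "".join(out)

    def decode(self, bits):
        out, node = [], self.root
        for bit in bits:
            node = node.left if bit == "0" else node.right
            if node.symbol is not None:          # reached a leaf: one symbol decoded
                out.append(node.symbol)
                self._restructure(node)
                node = self.root
        return "".join(out)
```

The decoder must start from a fresh `SplayCoder` built over the same alphabet, because both sides rely on performing exactly the same sequence of splays.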

Compression via splay trees (analysis)
How compact is this compression?
Suppose m is the number of characters in the original string.
The length of the string we produce is m + (cost of splays), which by the static optimality theorem is
$$m + O\Big(m + \sum_i q(i)\log\frac{m}{q(i)}\Big) = O\Big(m + \sum_i q(i)\log\frac{m}{q(i)}\Big)$$
Recall that the entropy of the sequence, $\sum_i q(i)\log(m/q(i))$, is a lower bound.

Compression via splay trees (analysis)
In particular, the length of the Huffman code of the sequence is at least $\sum_i q(i)\log(m/q(i))$.
But to construct it you need to know the frequencies in advance.

Compression via splay trees (variations)
D. Jones (88) showed that this technique could be competitive with dynamic Huffman coding (Vitter 87).
He used a variant of splaying called semi-splaying.

Semi-splaying
Regular zig-zig: z(y(x(A, B), C), D) ==> x(A, y(B, z(C, D)))
Semi-splay zig-zig: z(y(x(A, B), C), D) ==> y(x(A, B), z(C, D))
Continue the splay at y rather than at x.
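A sketch of semi-splaying next to the regular rule: only the zig-zig case changes, with one rotation at y and the walk continuing from y. The sketch assumes that the zig and zig-zag cases stay as in regular splaying, which the slide does not show explicitly; the names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(x):
    """Single rotation that lifts x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def semi_splay(x):
    """Semi-splay upward from x; return (new root, rotations)."""
    rotations = 0
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:                            # zig (assumed unchanged)
            rotate_up(x)
            rotations += 1
        elif (g.left is p) == (p.left is x):     # zig-zig: rotate at y only
            rotate_up(p)
            rotations += 1
            x = p                                # continue at y rather than at x
        else:                                    # zig-zag (assumed unchanged)
            rotate_up(x)
            rotate_up(x)
            rotations += 2
    return x, rotations

def left_path(n):
    """Build the path n, n-1, ..., 1 of left children; return (root, bottom)."""
    root = node = Node(n)
    for key in range(n - 1, 0, -1):
        node.left = Node(key)
        node.left.parent = node
        node = node.left
    return root, node

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

_, bottom = left_path(9)
root, rotations = semi_splay(bottom)
```

On the nine-node left path this uses half as many rotations as a full splay, and the accessed node ends at half its original depth instead of at the root.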

Update operations on splay trees
Catenate(T1, T2): (every item of T1 is smaller than every item of T2) Splay T1 at its largest item, say i. Attach T2 as the right child of the root.
Amortized time: $3\log(s(T_1)/s(i)) + 1 + \log\frac{s(T_1) + s(T_2)}{s(T_1)} \le 3\log(W/w(i)) + O(1)$, where W is the total weight of all items.

Update operations on splay trees (cont)
split(i, T): Assume i ∈ T. Splay at i. Return the two trees formed by cutting off the right son of i.
Amortized time = $3\log(W/w(i)) + O(1)$

split(i, T): What if i ∉ T? Splay at the predecessor or the successor of i ($i^-$ or $i^+$). Return the two trees formed by cutting off the right son of $i^-$ or the left son of $i^+$.
Amortized time = $3\log(W/\min\{w(i^-), w(i^+)\}) + O(1)$

insert(i, T): Perform split(i, T) ==> T1, T2. Return the tree with root i, left subtree T1 and right subtree T2.
Amortized time: $3\log\frac{W - w(i)}{\min\{w(i^-), w(i^+)\}} + \log(W/w(i)) + O(1)$

delete(i, T): Splay at i and then return the catenation of the left and right subtrees T1 and T2 of i.
Amortized time: $3\log(W/w(i)) + 3\log\frac{W - w(i)}{w(i^-)} + O(1)$
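The four update operations can be sketched on top of a plain bottom-up splay. This is a minimal illustration with made-up helper names; it assumes distinct keys, that `insert` is called with an absent key, and that `delete` is called with a present key.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(x):
    """Single rotation that lifts x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(x):
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate_up(x)
        elif (g.left is p) == (p.left is x):
            rotate_up(p)
            rotate_up(x)
        else:
            rotate_up(x)
            rotate_up(x)

def last_on_search_path(root, key):
    """The node holding key, or else its predecessor or successor in the tree."""
    node = root
    while True:
        child = node.left if key < node.key else node.right
        if key == node.key or child is None:
            return node
        node = child

def split(root, key):
    """Return (T1, T2): the keys <= key and the keys > key."""
    if root is None:
        return None, None
    node = last_on_search_path(root, key)
    splay(node)
    if node.key <= key:                       # key or its predecessor: cut the right son
        other, node.right = node.right, None
        if other is not None:
            other.parent = None
        return node, other
    other, node.left = node.left, None        # successor: cut the left son
    if other is not None:
        other.parent = None
    return other, node

def catenate(t1, t2):
    """Join two trees where every key of t1 is smaller than every key of t2."""
    if t1 is None:
        return t2
    largest = t1
    while largest.right is not None:
        largest = largest.right
    splay(largest)
    largest.right = t2
    if t2 is not None:
        t2.parent = largest
    return largest

def insert(root, key):
    """Insert key (assumed absent); return the new root."""
    node = Node(key)
    node.left, node.right = split(root, key)
    for child in (node.left, node.right):
        if child is not None:
            child.parent = node
    return node

def delete(root, key):
    """Delete key (assumed present); return the new root."""
    node = last_on_search_path(root, key)
    splay(node)
    for child in (node.left, node.right):
        if child is not None:
            child.parent = None
    return catenate(node.left, node.right)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Every operation is a constant number of splays plus O(1) pointer changes, which is why the amortized bounds above all have the same shape as the access lemma.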

Open problems
Dynamic optimality conjecture:
Consider any sequence of successful accesses on an n-node search tree. Let A be any algorithm that carries out each access by traversing the path from the root to the node containing the accessed item, at a cost of one plus the depth of that node, and that between accesses performs rotations anywhere in the tree, at a cost of one per rotation. Then the total time to perform all these accesses by splaying is no more than O(n) plus a constant times the cost of algorithm A.

Open problems
Dynamic finger conjecture (now a theorem):
The total time to perform m successful accesses on an arbitrary n-node splay tree is
$$O\Big(m + n + \sum_{j=1}^{m-1} \log(|i_{j+1} - i_j| + 1)\Big)$$
where the jth access is to item $i_j$.
A very complicated proof appeared in SICOMP recently (Cole et al.).

Tango trees (Demaine, Harmon, Iacono, Patrascu 2004)

Lower bound
[Figure: a reference tree with a node y; the left region of y and the right region of y are highlighted]
IB(σ, y) = number of alternations between accesses to the left region of y and accesses to the right region of y
$IB(\sigma) = \sum_y IB(\sigma, y)$
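The interleave bound can be computed directly from its definition. The sketch below makes assumptions that the transcript does not spell out: the reference tree is the balanced tree on keys 1..n obtained by always taking the middle key as root, the left region of y is y together with its left subtree, and the right region is its right subtree. The function name is illustrative.

```python
def interleave_bound(sigma, n):
    """IB(sigma) with respect to the balanced reference tree on keys 1..n."""
    def visit(lo, hi, accesses):
        # `accesses` is sigma restricted to the keys lo..hi, i.e. to the subtree of y
        if lo > hi or not accesses:
            return 0
        y = (lo + hi) // 2
        in_left = [a <= y for a in accesses]           # True = left region (includes y)
        alternations = sum(s != t for s, t in zip(in_left, in_left[1:]))
        left = [a for a in accesses if a < y]
        right = [a for a in accesses if a > y]
        return alternations + visit(lo, y - 1, left) + visit(y + 1, hi, right)
    return visit(1, n, list(sigma))

sequential = interleave_bound(range(1, 8), 7)          # accesses 1, 2, ..., 7
zigzag = interleave_bound([1, 7] * 10, 7)              # keeps crossing the root
```

A sequential scan alternates only once at each internal node, while a sequence that jumps back and forth across the root pays for an alternation on almost every access.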

The lower bound (Wilber 89)
OPT(σ) ≥ ½ IB(σ) - n

Unique transition point
[Figure: node y with nodes x and z]

Transition point exists
[Figure: node y with nodes l and r, and nodes x, x1, x2, z, z1, z2, shown in three configurations]

Transition point does not change if not touched
[Figure: node y with nodes l and r, and nodes x, x1, x2, z, z1, z2]

z is a transition point for only one node
[Figure: nodes x, x1, x2, z, z1, z2 and r1]
y1 and y2 have different z's if they are unrelated.
If z is not in y2's subtree we are OK.
Otherwise z is the LCA of everyone in y2's subtree, so it is the first among l2 and r2.

OPT(σ) ≥ ½ IB(σ) - n (proof)
Sum, over every y, the number of times the algorithm touched the transition point of y (the deeper of l and r).
Let $\sigma_{y_1}, \sigma_{y_2}, \sigma_{y_3}, \sigma_{y_4}, \ldots, \sigma_{y_p}$ be the interleaving accesses through y.
We must touch l when we access $\sigma_{y_j}$ for odd j, and r for even j. So the algorithm touches the transition point each time unless the transition point has switched, but to switch it the algorithm also has to touch it.

Tango trees
[Figure: a reference tree with nodes x, y, z]
Each node has a preferred child.

Tango trees (cont)
[Figure: the reference tree with nodes a to f, x, y, z next to the corresponding tango tree]
A hierarchy of balanced binary trees, each corresponding to a blue path.
Each node stores its depth in the blue path and the maximum depth in its subtree.

Cut
[Figure: a balanced tree for one blue path, with depths stored at the nodes, being cut at a given depth]
Nodes on the lower part are contiguous in key space, so we can use the maximum depth values to find the "interval" containing them.

Split
[Figure: two splits isolate the interval]
We need some differential encoding of the depths.

Catenate
[Figure: the remaining pieces are catenated]
Similarly, we can do join.

The algorithm
[Figure: the reference tree with nodes a to f, x, y, z and the corresponding tango tree]
Search, and then cut and join bottom-up so that your tango tree corresponds to the blue edges in the reference tree.

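Analysis
If k edges change from black to blue, the access takes $O((k+1)(1 + \log\log n))$ time.
Sum over m accesses: $O((IB(\sigma) + n + m)(1 + \log\log n))$.
If m = Ω(n): $O(OPT(\sigma)(1 + \log\log n))$.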
