CSE 326: Data Structures Splay Trees

1 CSE 326: Data Structures Splay Trees
James Fogarty Autumn 2007 Lecture 10

2 AVL Trees Revisited
Balance condition: left and right subtrees of every node have heights differing by at most 1
Strong enough: worst-case depth is O(log n)
Easy to maintain: one single or double rotation per update
Guaranteed O(log n) running time for Find, Insert, and Delete; buildTree is Θ(n log n)

3 Single and Double Rotations
(Figure: a single rotation on subtrees X, Y, Z of height h, and a double rotation on subtrees W, X, Y, Z of heights h and h-1.)

4 AVL Trees Revisited What extra info did we maintain in each node?
Where were rotations performed? How did we locate this node?

5 Other Possibilities?
Could use different balance conditions, different ways to maintain balance, different guarantees on running time, …
Why aren't AVL trees perfect? Extra info in every node, complex logic to detect imbalance, recursive bottom-up implementation
Many other balanced BST data structures: Red-Black trees, AA trees, Splay trees, 2-3 trees, B-trees

6 Splay Trees
Blind-adjusting version of AVL trees: why worry about balance? Just rotate anyway!
Amortized time per operation is O(log n)
Worst-case time per operation is O(n), but guaranteed to happen rarely
Insert/Find always rotate the accessed node to the root!
SAT/GRE analogy question: AVL tree is to splay tree as ___________ is to __________ (Answer: leftist heap is to skew heap)

7 Recall: Amortized Complexity
If a sequence of M operations takes O(M f(n)) time, we say the amortized runtime is O(f(n)).
Worst-case time per operation can still be large, say O(n)
Worst-case time for any sequence of M operations is O(M f(n)), so the average time per operation over any sequence is O(f(n))
Amortized complexity is a worst-case guarantee over sequences of operations.

8 Recall: Amortized Complexity
Is the amortized guarantee any weaker than worst-case? Yes: it holds only for sequences, not for individual operations
Is the amortized guarantee any stronger than average-case? Yes: it guarantees there are no bad sequences
Is an average-case guarantee good enough in practice? No: adversarial input, bad day, …
Is the amortized guarantee good enough in practice? Yes: again, no bad sequences

9 The Splay Tree Idea
If you're forced to make a really deep access: since you're down there anyway, fix up a lot of deep nodes!
(Figure: a deep access path through a tree with nodes 10, 17, 5, 2, 9, 3.)

10 Find/Insert in Splay Trees
Find or insert a node k
Splay k to the root using zig-zag, zig-zig, or plain old zig rotations
Why could this be good? It helps the new root k (great if k is accessed again), and it helps many others (great if many other nodes on the path are accessed)

11 Splaying node k to the root: Need to be careful!
One option (that we won't use) is to repeatedly apply AVL single rotations until k becomes the root (see Section for details):
(Figure: single rotations lift k past its ancestors p, q, r, s, carrying subtrees A through F along.)

12 Splaying node k to the root: Need to be careful!
What's bad about this process? r is pushed almost as low as k was
Bad sequence: find(k), find(r), find(…), …
(Figure: after k reaches the root via single rotations, the old ancestors r, q, p remain nearly as deep as k was.)

13 Splay: Zig-Zag*
(Figure: node k with parent p and grandparent g, and subtrees W, X, Y, Z; after the zig-zag, k is on top with p and g as its children.)
This is just an AVL double rotation: it helps all the nodes shown in blue and hurts the ones shown in red.
*Just like an… AVL double rotation
Which nodes improve in depth? k and its original children

14 Splay: Zig-Zig*
(Figure: node k below parent p below grandparent g, with subtrees W, X, Y, Z; after the zig-zig, k is on top, p below it, and g at the bottom.)
(Yes, the original 1985 paper used this terminology!)
This rotation is not the same as the AVL zig-zig case! To implement it with two rotations, do we start by rotating k or rotating p? Rotate p! Rotating k first makes p into k's left child, and then we're hosed; we want k to go up, not p. Then rotate k.
This helps all the nodes in blue and hurts the ones in red, so in some sense one rotation helps and hurts the same number of nodes. But what if we keep rotating? The whole subtree gets helped!
*Is this just two AVL single rotations in a row? Not quite: we rotate g and p, then p and k.
Why does this help? The same number of nodes are helped as hurt, but later rotations help the whole subtree.

15 Special Case for Root: Zig
(Figure: parent p at the root with child k and subtrees X, Y, Z; after the zig, k is the root.)
When we're just one step away from the root, we single rotate. This is the only time we single rotate.
Each zig-zig/zig-zag step helped and hurt an equal number of nodes, but other than this final zig, the splayed node always helped its children. Therefore every node except p, Y, and Z improves (some quite a lot), while p, Y, and Z each might end up one level deeper than before.
Relative depth of p, Y, Z? Down 1 level. Relative depth of everyone else? Much better.
Why not drop zig-zig and just zig all the way up? Zig only helps one child of the splayed node!
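The three cases can be put together into a complete splay routine. Below is a minimal sketch in Python: a compact recursive formulation whose rotation order differs slightly from the hand-drawn bottom-up pictures, but which applies the same zig-zig, zig-zag, and zig cases. The names `Node`, `rotate_left`, `rotate_right`, and `splay` are illustrative choices, not from the lecture.

```python
class Node:
    """A plain BST node: splay trees need no height or balance fields."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def rotate_right(p):
    """Single rotation: p's left child comes up, p goes down-right."""
    k = p.left
    p.left, k.right = k.right, p
    return k

def rotate_left(p):
    """Mirror image: p's right child comes up, p goes down-left."""
    k = p.right
    p.right, k.left = k.left, p
    return k

def splay(t, key):
    """Bring the node holding `key` (or, on a miss, the last node on its
    search path) to the root via zig-zig, zig-zag, and zig steps."""
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t                         # key absent: stop here
        if key < t.left.key:                 # zig-zig: rotate g-p first...
            t.left.left = splay(t.left.left, key)
            t = rotate_right(t)
        elif key > t.left.key:               # zig-zag: AVL-style double rotation
            t.left.right = splay(t.left.right, key)
            if t.left.right is not None:
                t.left = rotate_left(t.left)
        return t if t.left is None else rotate_right(t)  # ...then p-k, or a lone zig
    if t.right is None:
        return t
    if key > t.right.key:                    # zig-zig, mirrored
        t.right.right = splay(t.right.right, key)
        t = rotate_left(t)
    elif key < t.right.key:                  # zig-zag, mirrored
        t.right.left = splay(t.right.left, key)
        if t.right.left is not None:
            t.right = rotate_right(t.right)
    return t if t.right is None else rotate_left(t)
```

Splaying 6 up from the bottom of a chain 1-2-3-4-5-6 makes 6 the root while preserving the in-order sequence; rotations never change the sorted order of the keys.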

16 Splaying Example: Find(6)
(Figure: the tree is a chain with 1 at the root and 6 at depth 5; Find(6) walks to the bottom and applies a zig-zig, rotating 6 above 5 and 4.)
Think of the tree as created by inserting 6, 5, 4, 3, 2, 1: each insert took constant time, so there is a LOT of savings so far to amortize those bad accesses over

17 Still Splaying 6
(Figure: a second zig-zig moves 6 above 3 and 2.)

18 Finally…
(Figure: a final zig rotates 6 over the root 1; 6 is now the root, and the original access path is about half as deep.)

19 Another Splay: Find(4)
(Figure: Find(4) in the resulting tree; a zig-zag rotates 4 above 5 and 3.)

20 Example Splayed Out
(Figure: a second zig-zag makes 4 the root, with 1 and 6 as its children.)

21 But Wait… What happened here? Didn’t two find operations take linear time instead of logarithmic? What about the amortized O(log n) guarantee? That still holds, though we must take into account the previous steps used to create this tree. In fact, a splay tree, by construction, will never look like the example we started with!

22 Why Splaying Helps
If a node n on the access path is at depth d before the splay, it's at about depth d/2 after the splay
Overall, nodes which are low on the access path tend to move closer to the root
Splaying gets amortized O(log n) performance. (Maybe not now, but soon, and for the rest of the operations.)
Recall the idea: we splay a node to the root because, after an expensive access, we are down there anyway, so we fix up the tree on every expensive access.
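The depth-halving claim can be checked empirically. Below is a small self-contained Python sketch: it builds the same kind of degenerate chain as the Find(6) example, then measures depths before and after one expensive access. The `Node`, `splay`, and `depth` helpers are illustrative names of my choosing, and the splay is a compact recursive variant, not the lecture's exact bottom-up procedure.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def _rot_r(p):
    k = p.left; p.left, k.right = k.right, p; return k

def _rot_l(p):
    k = p.right; p.right, k.left = k.left, p; return k

def splay(t, key):
    # Compact recursive splay: zig-zig, zig-zag, and a final zig.
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t
        if key < t.left.key:
            t.left.left = splay(t.left.left, key); t = _rot_r(t)
        elif key > t.left.key:
            t.left.right = splay(t.left.right, key)
            if t.left.right: t.left = _rot_l(t.left)
        return t if t.left is None else _rot_r(t)
    if t.right is None:
        return t
    if key > t.right.key:
        t.right.right = splay(t.right.right, key); t = _rot_l(t)
    elif key < t.right.key:
        t.right.left = splay(t.right.left, key)
        if t.right.left: t.right = _rot_r(t.right)
    return t if t.right is None else _rot_l(t)

def depth(t, key, d=0):
    # Depth of `key` below root t (root is depth 0); None if absent.
    if t is None:
        return None
    if key == t.key:
        return d
    return depth(t.left if key < t.key else t.right, key, d + 1)

# Degenerate chain 1..16, as if built by splay-inserting 16, 15, ..., 1
# (each of those inserts splays a new minimum to the root in O(1)).
root = Node(1); cur = root
for k in range(2, 17):
    cur.right = Node(k); cur = cur.right

deep_before = depth(root, 16)    # 15: the next access costs linear time
root = splay(root, 16)           # one expensive find fixes up the whole path
height_after = max(depth(root, k) for k in range(1, 17))
# After the splay, nodes on the access path are roughly half as deep.
```

The one linear-time find pays for itself: the tree's height drops substantially, so subsequent accesses along that path are cheap.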

23 Practical Benefit of Splaying
No heights to maintain, no imbalance to check for: less storage per node, easier to code
Data accessed once is often soon accessed again: splaying does implicit caching by bringing the data to the root

24 Splay Operations: Find
Find the node in the normal BST manner, then splay the node to the root; if the node is not found, splay what would have been its parent
What if we didn't splay? The amortized guarantee fails! Bad sequence: find(leaf k), find(k), find(k), …
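Find can be sketched directly on top of a splay routine. Below is a self-contained Python sketch; the `Node`/`splay` helpers are an illustrative compact recursive splay with names of my choosing, not the lecture's code.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def _rot_r(p):
    k = p.left; p.left, k.right = k.right, p; return k

def _rot_l(p):
    k = p.right; p.right, k.left = k.left, p; return k

def splay(t, key):
    # Compact recursive splay: zig-zig, zig-zag, and a final zig.
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t
        if key < t.left.key:
            t.left.left = splay(t.left.left, key); t = _rot_r(t)
        elif key > t.left.key:
            t.left.right = splay(t.left.right, key)
            if t.left.right: t.left = _rot_l(t.left)
        return t if t.left is None else _rot_r(t)
    if t.right is None:
        return t
    if key > t.right.key:
        t.right.right = splay(t.right.right, key); t = _rot_l(t)
    elif key < t.right.key:
        t.right.left = splay(t.right.left, key)
        if t.right.left: t.right = _rot_r(t.right)
    return t if t.right is None else _rot_l(t)

def find(t, key):
    """Splay `key` toward the root; return (new_root, found).
    On a miss, the last node touched (the would-be parent) is splayed."""
    t = splay(t, key)
    return t, (t is not None and t.key == key)
```

Note that find always returns a new root: even an unsuccessful search restructures the tree, which is exactly what the slide prescribes.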

25 Splay Operations: Insert
Insert the node in the normal BST manner, then splay the new node to the root
What if we didn't splay? Can we just do a BST insert? NO: then we could do an expensive operation without fixing up the tree. The amortized guarantee fails! Bad sequence: insert(k), find(k), find(k), …
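One common way to code this is an equivalent variant of "BST-insert then splay": splay first, then split the old tree around the new key. Below is a self-contained Python sketch with illustrative names (the `splay` helper is a compact recursive variant, not the lecture's code).

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def _rot_r(p):
    k = p.left; p.left, k.right = k.right, p; return k

def _rot_l(p):
    k = p.right; p.right, k.left = k.left, p; return k

def splay(t, key):
    # Compact recursive splay: zig-zig, zig-zag, and a final zig.
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t
        if key < t.left.key:
            t.left.left = splay(t.left.left, key); t = _rot_r(t)
        elif key > t.left.key:
            t.left.right = splay(t.left.right, key)
            if t.left.right: t.left = _rot_l(t.left)
        return t if t.left is None else _rot_r(t)
    if t.right is None:
        return t
    if key > t.right.key:
        t.right.right = splay(t.right.right, key); t = _rot_l(t)
    elif key < t.right.key:
        t.right.left = splay(t.right.left, key)
        if t.right.left: t.right = _rot_r(t.right)
    return t if t.right is None else _rot_l(t)

def insert(t, key):
    """Splay key's neighborhood to the root, then make the new node the
    root with the two split halves as its children."""
    if t is None:
        return Node(key)
    t = splay(t, key)
    if t.key == key:
        return t                              # key already present
    n = Node(key)
    if key < t.key:
        n.left, n.right, t.left = t.left, t, None
    else:
        n.right, n.left, t.right = t.right, t, None
    return n
```

Either way, the new key ends up at the root, so a subsequent find(k) is cheap, which defeats the bad sequence above.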

26 Splay Operations: Remove
Everything else splayed, so we'd better do that for remove too
To remove k: find(k), which splays k to the root; then delete it. That leaves two subtrees, L (keys < k) and R (keys > k). Now what? How do we put them back together?

27 Join
Join(L, R): given two trees such that (stuff in L) < (stuff in R), merge them: splay the maximum element in L to its root, then attach R as the right child
After the splay, the maximum of L is guaranteed to have no right child, so we just snap R onto that empty right side. Similar to BST delete: find max = find the element with no right child
Question: could we splay on the minimum element in R instead? In a plain BST it was important to alternate between successor and predecessor to avoid unbalanced trees; is that important in a splay tree?
Does this work to join any two trees? No, we need (stuff in L) < (stuff in R)
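Remove can then be sketched as find-then-join. Below is a self-contained Python sketch with illustrative names; the `splay` helper is a compact recursive variant, and using `float('inf')` as a sentinel to mean "splay the maximum" is my shortcut, which assumes numeric keys.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def _rot_r(p):
    k = p.left; p.left, k.right = k.right, p; return k

def _rot_l(p):
    k = p.right; p.right, k.left = k.left, p; return k

def splay(t, key):
    # Compact recursive splay: zig-zig, zig-zag, and a final zig.
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t
        if key < t.left.key:
            t.left.left = splay(t.left.left, key); t = _rot_r(t)
        elif key > t.left.key:
            t.left.right = splay(t.left.right, key)
            if t.left.right: t.left = _rot_l(t.left)
        return t if t.left is None else _rot_r(t)
    if t.right is None:
        return t
    if key > t.right.key:
        t.right.right = splay(t.right.right, key); t = _rot_l(t)
    elif key < t.right.key:
        t.right.left = splay(t.right.left, key)
        if t.right.left: t.right = _rot_r(t.right)
    return t if t.right is None else _rot_l(t)

def join(l, r):
    """Precondition: every key in l < every key in r.
    Splay the max of l to its root (it then has no right child), attach r."""
    if l is None:
        return r
    l = splay(l, float('inf'))   # sentinel: splays the maximum to the root
    l.right = r
    return l

def delete(t, key):
    """Splay key to the root, drop it, and join the two halves."""
    t = splay(t, key)
    if t is None or t.key != key:
        return t                 # key absent; tree is still splayed
    return join(t.left, t.right)
```

A deletion of an absent key is harmless: the search path still gets splayed, and the keys are untouched.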

28 Delete Example: Delete(4)
(Figure: find(4) splays 4 to the root and removes it, leaving L with keys {1, 2} and R with keys {6, 7, 9}; find-max splays 2 to the root of L, and R is attached as 2's right child.)

29 Splay Tree Summary
All operations are in amortized O(log n) time
Splaying can be done top-down; this may be better because it makes only one pass, with no recursion or parent pointers necessary (we didn't cover top-down in class)
Splay trees are very effective search trees: relatively simple, no extra fields required, excellent locality properties
On the topic of locality: what happens to nodes that never get accessed? They rarely get splayed and tend to drop to the bottom of the tree, leaving the top for more valuable stuff. What happens to nodes that are often accessed? They linger near the top of the tree, where they're cheap to find!
Like what other blind-adjusting structure? Skew heaps! (don't need to wait)

30 Splay E
(Figure: splaying E to the root of a tree containing keys A through I; the deep nodes on the access path roughly halve in depth.)

31 Splay E
(Figure: splaying E again from the resulting tree; E returns to the root and the tree reshapes once more.)

