CSE 326: Data Structures Trees


CSE 326: Data Structures Trees Lecture 8: Friday, Jan 24, 2003

Today: Splay Trees Fast both in worst-case amortized analysis and in practice. Used in the NT kernel to keep track of process information! Invented by Sleator and Tarjan (1985). Details: Weiss 4.5 (basic splay trees), 11.5 (amortized analysis), 12.1 (better “top down” implementation). We’ll start by introducing AVL trees. Then, I’d like to spend some time talking about double-tailed distributions and means. Next, we’ll find out what AVL stands for. Finally, you’ll receive a special bonus if we get to it! (Unfortunately, the bonus is AVL tree deletion)

Basic Idea “Blind” rebalancing – no height info kept! Worst-case time per operation is O(n); worst-case amortized time is O(log n). Insert/find always rotates the accessed node to the root! Good locality: the most commonly accessed keys move high in the tree and become easier and easier to find.

Idea: move n to the root by a series of zig-zag and zig-zig rotations, followed by a final single rotation (zig) if necessary. [Figure: a deep access path.] You’re forced to make a really deep access; since you’re down there anyway, fix up a lot of deep nodes!

Zig-Zag [Figure: node n with parent p and grandparent g, and subtrees W, X, Y, Z, before and after; n moves up 2 levels, p and g each move down 1; some subtrees are helped, some unchanged, some hurt.] This is just a double rotation.

Zig-Zig [Figure: node n with parent p and grandparent g, and subtrees W, X, Y, Z, before and after.] Can anyone tell me how to implement this with two rotations? There are two possibilities: start with rotating n, or rotating p? Rotate p! Rotating n first makes p n’s left child, and then we’re hosed. Then rotate n. This helps all the nodes in blue and hurts the ones in red, so in some sense one rotation helps and hurts the same number of nodes. Question: what if we keep rotating? What happens to this whole subtree? It gets helped!
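The "rotate p first" answer above can be sketched as two single rotations in Python (a minimal sketch with a hypothetical Node class; a full bottom-up splay tree would also maintain parent links, omitted here):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(t):
    """Single rotation: promote t.left over t; returns the new subtree root."""
    l = t.left
    t.left, l.right = l.right, t
    return l

def zig_zig_left(g):
    """Left-left zig-zig on n = g.left.left: rotate at the grandparent g
    first (rotating at n first would make p n's child and ruin the shape),
    then rotate again at the promoted parent p."""
    p = rotate_right(g)     # p comes up over g
    return rotate_right(p)  # n comes up over p
```

Applied to the left chain 3 → 2 → 1, this leaves 1 at the root with 2 and then 3 hanging off its right side, exactly the zig-zig shape in the figure.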

Why Splaying Helps Node n and its children are always helped (raised). Except for the last step, nodes that are hurt by a zig-zag or zig-zig are later helped by a rotation higher up the tree! Result: shallow nodes may increase depth by one or two, while helped nodes decrease depth by a large amount. If a node n on the access path is at depth d before the splay, it’s at about depth d/2 after the splay. Exceptions are the root, the child of the root, and the node splayed. Alright, remember what we did on Monday. We learned how to splay a node to the root of a search tree. We decided it would help because we’d do a lot of fixing up if we had an expensive access. That means we fix up the tree on every expensive access.

Splaying Example: Find(6) [Figure: the chain 1–2–3–4–5–6; the first zig-zig rotates 6 two levels up.]

Still Splaying 6 [Figure: a second zig-zig moves 6 further up.]

Almost There, Stay on Target [Figure: a final zig brings 6 to the root.]

Splay Again: Find(4) [Figure: a zig-zag step starts moving 4 up.]

Example Splayed Out [Figure: the zig-zag completes, leaving 4 at the root.]
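The Find(6) walk above can be reproduced with a compact recursive splay (a simplified sketch, not Weiss’s top-down implementation; the Node class and helper names are ours):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(t):
    l = t.left
    t.left, l.right = l.right, t
    return l

def rotate_left(t):
    r = t.right
    t.right, r.left = r.left, t
    return r

def splay(t, key):
    """Bring key (or the last node on its search path) to the root."""
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t
        if key < t.left.key:                      # zig-zig (left-left)
            t.left.left = splay(t.left.left, key)
            t = rotate_right(t)
        elif key > t.left.key:                    # zig-zag (left-right)
            t.left.right = splay(t.left.right, key)
            if t.left.right is not None:
                t.left = rotate_left(t.left)
        return t if t.left is None else rotate_right(t)
    else:
        if t.right is None:
            return t
        if key > t.right.key:                     # zig-zig (right-right)
            t.right.right = splay(t.right.right, key)
            t = rotate_left(t)
        elif key < t.right.key:                   # zig-zag (right-left)
            t.right.left = splay(t.right.left, key)
            if t.right.left is not None:
                t.right = rotate_right(t.right)
        return t if t.right is None else rotate_left(t)

def inorder(t):
    """Inorder traversal, to check the result is still a BST."""
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Splaying 6 in the right-spine chain 1–2–…–6 brings 6 to the root while keeping the keys in BST order; the exact intermediate shapes differ slightly from the slide’s bottom-up steps.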

Locality “Locality” – if an item is accessed, it is likely to be accessed again soon. Why does splaying help here? Assume m ≥ n accesses in a tree of size n. Total worst-case time is O(m log n), i.e., O(log n) amortized per access. Suppose only k distinct items are accessed in the m accesses. Then the time is O(n log n + m log k); compare with O(m log n) for an AVL tree. The n log n term is spent getting those k items near the root; after that, those k items are all at the top of the tree.

Splay Operations: Insert To insert, we could do an ordinary BST insert, but that would not fix up the tree. A BST insert followed by a find (splay)? Better idea: do the splay before the insert! How? What about insert? Ideas? Can we just do a BST insert? NO, because then we could do an expensive operation without fixing up the tree.

Split Split(T, x) creates two BSTs L and R: all elements of T are in either L or R; all elements in L are ≤ x; all elements in R are ≥ x; L and R share no elements. Then how do we do the insert?

Split Split(T, x) creates two BSTs L and R: all elements of T are in either L or R; all elements in L are ≤ x; all elements in R are > x; L and R share no elements. Then how do we do the insert? Insert x as the root, with children L and R.

Splitting in Splay Trees How can we split? We have the splay operation. We can find x, or the parent of where x would be if we were to insert it as an ordinary BST. We can splay x (or that parent) to the root, then break one of the links from the root to a child. How can we implement this? We can splay. We can find x, or where x ought to be, and splay that spot to the root. Now, what do we have? The left subtree is all ≤ x; the right is all ≥ x.

Split [Figure: split(x) splays the spot that could be x, or what would have been the parent of x, to the root of T, then removes one child link: if the root is > x, L is the root’s left subtree (keys < x) and R is the root with its right subtree (keys > x); if the root is ≤ x, the symmetric cut gives L with keys ≤ x and R with keys > x.] So, a split just splays x’s spot to the root, then hacks off one subtree. This code is _very_ pseudo. You should only use it as a general guideline.

Back to Insert Insert(x): split on x, then join the subtrees using x as the root. If we can split on x and produce one subtree smaller and one larger than x, insert is easy! Just split on x. Then hang the left (smaller) subtree on the left of x, and the right (larger) subtree on the right of x. Pretty simple, huh? Are we fixing up deep paths?
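The split-then-insert-as-root idea can be sketched as follows (to keep the sketch self-contained, split is written as a plain recursive BST split; a real splay tree would obtain it by splaying x’s spot to the root and cutting one link, as the previous slides describe):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def split(t, key):
    """Split BST t into (L, R) with all keys in L <= key and all keys in R > key."""
    if t is None:
        return None, None
    if t.key <= key:
        l, r = split(t.right, key)   # t and its left subtree go to L
        t.right = l
        return t, r
    l, r = split(t.left, key)        # t and its right subtree go to R
    t.left = r
    return l, t

def insert(t, key):
    """Insert(x): split on x, then join the pieces using x as the new root."""
    n = Node(key)
    n.left, n.right = split(t, key)
    return n

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

On the slide’s example, inserting 5 into a tree holding {1, 2, 4, 6, 7, 9} leaves 5 at the root with the ≤ 5 keys on its left and the > 5 keys on its right.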

Insert Example: Insert(5) [Figure: split(5) cuts the tree holding 1, 2, 4, 6, 7, 9 into a piece with keys ≤ 5 and a piece with keys > 5; then 5 becomes the root with those pieces as its children.] Let’s do some examples.

Splay Operations: Delete find(x) splays x to the root; delete x, leaving subtrees L (keys < x) and R (keys > x). OK, we’ll do something similar for delete. We know x is in the tree. Find it and bring it to the root. Remove it. Now we have two subtrees. How do we put them back together? Now what?

Join Join(L, R): given two trees such that L < R, merge them. Splay on the maximum element in L, then attach R. The join operation puts two subtrees together, as long as one has smaller keys to begin with. First, splay the max element of L to the root. Now that root is guaranteed to have no right child, right? Just snap R onto that NULL right side of the max.

Delete Completed find(x) splays x to the root; delete x, leaving L (keys < x) and R (keys > x); then Join(L, R) yields T − x. So we just join the two subtrees for delete.

Delete Example: Delete(4) [Figure: find(4) splays 4 to the root, leaving subtrees holding {1, 2} and {6, 7, 9}; remove 4, find the max (2) of the left subtree and bring it to that subtree’s root, then attach the right subtree.]

Splay Trees, Summary Splay trees are arguably the most practical kind of self-balancing trees. If the number of finds is much larger than n, then locality is crucial! Example: word-counting. They also support efficient Split and Join operations, useful for other tasks, e.g., range queries.

Dictionary & Search ADTs Dictionary ADT (aka Map ADT): stores values associated with user-specified keys; keys may be any (homogeneous) comparable type; values may be any (homogeneous) type. Search ADT (aka Set ADT): stores keys only. Dictionaries associate some key with a value, just like a real dictionary (where the key is a word and the value is its definition). In this example, I’ve stored user-IDs associated with descriptions of their coolness level. This is probably the most valuable and widely used ADT we’ll hit. I’ll give you an example in a minute that should firmly entrench this concept.

Dictionary & Search ADTs Operations:
create : → dictionary
insert : dictionary × key × value → dictionary
find : dictionary × key → value
delete : dictionary × key → dictionary
Example: the dictionary holds kim chi → spicy cabbage, kreplach → tasty stuffed dough, kiwi → Australian fruit; insert(kohlrabi, upscale tuber) adds a pair; find(kreplach) returns “kreplach: tasty stuffed dough”.
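The ADT signatures above map directly onto Python’s built-in dict; this is an illustration of the interface only, not the course’s tree-based implementation:

```python
d = {}                                   # create : -> dictionary
d["kim chi"] = "spicy cabbage"           # insert : dictionary x key x value -> dictionary
d["kreplach"] = "tasty stuffed dough"
d["kiwi"] = "Australian fruit"
d["kohlrabi"] = "upscale tuber"          # insert(kohlrabi, upscale tuber)
definition = d["kreplach"]               # find(kreplach)
del d["kiwi"]                            # delete : dictionary x key -> dictionary
```

After these operations, `definition` holds "tasty stuffed dough" and "kiwi" is no longer a key of `d`.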

Dictionary Implementations Arrays: Unsorted Sorted Linked lists BST Random AVL Splay

Dictionary Implementations
           unsorted array   sorted array   linked list    AVL         splay
insert     O(1)             O(n)           O(1)           O(log n)    O(log n) amortized
find       O(n)             O(log n)       O(n)           O(log n)    O(log n) amortized
delete     find + O(1)      O(n)           find + O(1)    O(log n)    O(log n) amortized

The last dictionary we discuss: B-Trees Suppose we want to store the data on disk. A disk access is a lot more expensive than one CPU operation. Example: 1,000,000 entries in the dictionary. An AVL tree requires log(1,000,000) ≈ 20 disk accesses per operation, which is expensive. Idea in B-trees: increase the fan-out, decrease the height. Make 1 node = 1 disk block.

B-Trees Basics All keys are stored at leaves; nonleaf nodes have guidance keys to help the search. Parameter: d = the degree (the book uses the order M = 2d+1). Rules for keys: the root is either a leaf, or has between 1 and 2d keys; all other nodes (except the root) have between d and 2d keys. Rule for the number of children: each node (except leaves) has one more child than keys. Balance rule: the tree is perfectly balanced!

B-Trees Basics [Figure: a non-leaf node with guidance keys 30, 120, 240 routes searches to four children: keys k < 30; 30 ≤ k < 120; 120 ≤ k < 240; 240 ≤ k. A leaf node holds keys 40, 50, 60, each with its record, plus a pointer to the next leaf.] When organized this way, with records at the leaves, the structure is called a B+ tree.

B+Tree Example (d = 2, M = 5): find the key 40. [Figure: at the root, 40 ≤ 80, so follow the left pointer; at the node with keys 20 and 60, 20 < 40 ≤ 60, so follow the middle pointer; at the last internal node, 30 < 40 ≤ 40, leading to the leaf that contains 40 50.]

B+Tree Design How large should d be? Example: key size = 4 bytes, pointer size = 8 bytes, block size = 4096 bytes. One node (2d keys and 2d+1 pointers) must fit in one block: 2d × 4 + (2d+1) × 8 ≤ 4096, which gives d = 170.
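The block-size arithmetic can be checked directly; this back-of-the-envelope helper (our name, not course code) finds the largest d satisfying the inequality:

```python
def max_degree(key_size=4, ptr_size=8, block_size=4096):
    """Largest d such that 2d keys plus 2d+1 pointers fit in one block:
    2d*key_size + (2d+1)*ptr_size <= block_size."""
    d = 0
    while 2 * (d + 1) * key_size + (2 * (d + 1) + 1) * ptr_size <= block_size:
        d += 1
    return d
```

With the slide’s parameters (4-byte keys, 8-byte pointers, 4096-byte blocks) the inequality reduces to 24d + 8 ≤ 4096, so d = 170.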

B+ Trees Depth Assume d = 170. How deep is the B-tree? Depth 0 (just the root) ⇒ at least 170 keys. Depth 1 ⇒ at least 170 + 170·171 ≈ 3·10^4 keys. Depth 2 ⇒ 170 + 170·171 + 170·171^2 ≈ 5·10^6 keys. Depth 3 ⇒ 170 + … + 170·171^3 ≈ 860·10^6 keys. Depth 4 ⇒ 170 + … + 170·171^4 ≈ 147·10^9 keys. Nobody has more keys! With a B-tree we can find any data item with at most 5 disk accesses!

Insertion in a B+ Tree Insert (K, P): find the leaf where K belongs and insert it. If there is no overflow (2d keys or less), halt. If there is overflow (2d+1 keys), split the node and insert the middle key in the parent; if the node is a leaf, keep K3 in the right node too. When the root splits, the new root has 1 key only. [Figure: a node with keys K1 K2 K3 K4 K5 and pointers P0–P5 splits into K1 K2 (with P0 P1 P2) and K4 K5 (with P3 P4 P5), and K3 moves up to the parent.]
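The leaf case of the overflow rule can be sketched on plain sorted key lists (a hypothetical helper; real nodes also carry record pointers, omitted here):

```python
def insert_into_leaf(keys, k, d=2):
    """Insert k into a leaf's sorted key list. Returns (keys, None, None)
    if there is no overflow; on overflow (2d+1 keys) returns
    (left_keys, right_keys, separator), where the middle key stays in the
    right leaf (B+ tree style) and is also pushed up as the separator."""
    keys = sorted(keys + [k])
    if len(keys) <= 2 * d:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]   # middle key kept in the right leaf
    return left, right, right[0]           # right[0] goes up to the parent
```

On the slides’ example with d = 2, inserting 25 into the leaf 20 30 40 50 overflows and splits it into 20 25 and 30 40 50, pushing 30 up into the parent.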

Insertion in a B+ Tree Insert K = 19. [Figure: 19 belongs in the leftmost leaf, 10 15 18.]

Insertion in a B+ Tree After insertion. [Figure: the leftmost leaf now holds 10 15 18 19; with at most 2d = 4 keys there is no overflow, so we halt.]

Insertion in a B+ Tree Now insert 25. [Figure: 25 belongs in the leaf 20 30 40 50.]

Insertion in a B+ Tree After insertion. [Figure: the leaf now holds 20 25 30 40 50.]

Insertion in a B+ Tree But now we have to split! [Figure: the leaf 20 25 30 40 50 holds 2d+1 = 5 keys.]

Insertion in a B+ Tree After the split. [Figure: the leaf splits into 20 25 and 30 40 50; the key 30 goes up, so the parent now reads 20 30 60.]

Deletion from a B+ Tree Delete 30. [Figure: 30 sits in the leaf 30 40 50.]

Deletion from a B+ Tree After deleting 30. [Figure: the leaf becomes 40 50; the parent’s guidance key 30 may be changed to 40, or not.]

Deletion from a B+ Tree Now delete 25. [Figure: 25 sits in the leaf 20 25.]

Deletion from a B+ Tree After deleting 25. [Figure: the leaf 20 underflows (fewer than d keys); we need to rebalance, so rotate a key from the sibling through the parent.]

Deletion from a B+ Tree Now delete 40. [Figure: after the rotation the parent reads 19 30 60; 40 sits in the leaf 40 50.]

Deletion from a B+ Tree After deleting 40. [Figure: the leaf 50 underflows, and its sibling has only d keys, so rotation is not possible; we need to merge nodes.]

Deletion from a B+ Tree Final tree. [Figure: after the merge the separator 30 is removed, the parent reads 19 60, and the leaves hold 10 15 18 / 19 20 / 50 / 60 65 / 80 85 90.]