B/B+ Trees 4.7.

Slides:



Advertisements
Similar presentations
Binary Trees CSC 220. Your Observations (so far data structures) Array –Unordered Add, delete, search –Ordered Linked List –??
Advertisements

Chapter 4: Trees Part II - AVL Tree
AA Trees another alternative to AVL trees. Balanced Binary Search Trees A Binary Search Tree (BST) of N nodes is balanced if height is in O(log N) A balanced.
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 9: BTrees Tyler Robison Summer
Advanced Tree Data Structures Nelson Padua-Perez Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Trees and Red-Black Trees Gordon College Prof. Brinton.
CS 206 Introduction to Computer Science II 12 / 01 / 2008 Instructor: Michael Eckmann.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
1 B-Trees Section AVL (Adelson-Velskii and Landis) Trees AVL tree is binary search tree with balance condition –To ensure depth of the tree is.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
More Trees Multiway Trees and 2-4 Trees. Motivation of Multi-way Trees Main memory vs. disk ◦ Assumptions so far: ◦ We have assumed that we can store.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
B-Trees and Red Black Trees. Binary Trees B Trees spread data all over – Fine for memory – Bad on disks.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
Starting at Binary Trees
1 Trees 4: AVL Trees Section 4.4. Motivation When building a binary search tree, what type of trees would we like? Example: 3, 5, 8, 20, 18, 13, 22 2.
School of Engineering and Computer Science Victoria University of Wellington Copyright: Xiaoying Gao, Peter Andreae, VUW B Trees and B+ Trees COMP 261.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
Week 10 - Friday.  What did we talk about last time?  Graph representations  Adjacency matrix  Adjacency lists  Depth first search.
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
1 Binary Search Trees  Average case and worst case Big O for –insertion –deletion –access  Balance is important. Unbalanced trees give worse than log.
COMP261 Lecture 23 B Trees.
AA Trees.
File Organization and Processing Week 3
B-Trees B-Trees.
Data Structures and Algorithms
Multiway Search Trees Data may not fit into main memory
B-Trees B-Trees.
CS202 - Fundamental Structures of Computer Science II
B+-Trees.
CS202 - Fundamental Structures of Computer Science II
B+-Trees.
B+-Trees.
Week 11 - Friday CS221.
Hashing Exercises.
Tonga Institute of Higher Education
(edited by Nadia Al-Ghreimil)
AVL Trees 11/10/2018 AVL Trees v z AVL Trees.
Data Structures Balanced Trees CSCI
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
Haim Kaplan and Uri Zwick November 2014
Wednesday, April 18, 2018 Announcements… For Today…
CS202 - Fundamental Structures of Computer Science II
Balanced-Trees This presentation shows you the potential problem of unbalanced tree and show two way to fix it This lecture introduces heaps, which are.
CS202 - Fundamental Structures of Computer Science II
B- Trees D. Frey with apologies to Tom Anastasio
Heaps and the Heapsort Heaps and priority queues
B-Tree.
B+-Trees (Part 1).
Balanced-Trees This presentation shows you the potential problem of unbalanced tree and show two way to fix it This lecture introduces heaps, which are.
Algorithms and Data Structures Lecture VIII
2-3-4 Trees Red-Black Trees
CSIT 402 Data Structures II With thanks to TK Prasad
(edited by Nadia Al-Ghreimil)
AVL Trees 2/23/2019 AVL Trees v z AVL Trees.
CS202 - Fundamental Structures of Computer Science II
CSE 373: Data Structures and Algorithms
2-3 Trees Extended tree. Tree in which all empty subtrees are replaced by new nodes that are called external nodes. Original nodes are called internal.
CSE 373: Data Structures and Algorithms
Richard Anderson Spring 2016
B-Trees.
B-Trees.
CS202 - Fundamental Structures of Computer Science II
CS202 - Fundamental Structures of Computer Science II
Presentation transcript:

B/B+ Trees 4.7

AVL limitations Must be in memory If I don’t have enough memory, I have to start doing disk accesses The OS takes care of this Disk is really slow compared to memory

Disk accesses What is the worst case for the number of disk accesses for: Insert Find Delete

AVL Limitations AVL produce balanced trees, this helps keep them short and bushy The more bushy our tree, the better Reduces the height, so reduces number of accesses Can we make our trees shorter and bushier?

AVL Limitations AVL produce balanced trees, this helps keep them short and bushy The more bushy our tree, the better Reduces the height, so reduces number of accesses Can we make our trees shorter and bushier? Yup, just add more children

M-ary trees Binary trees have 1 key and 2 kids M-ary trees have M children Requires M-1 keys What is the height? Log base m of n

AVL limitations If I add more children, then my rotations become very complex. Just by adding a center child, I double the rotations CC CL CR RC LC

More Children How can I maintain balance with more children?

More Children Require every node is at least half full Otherwise degenerates to a binary tree or linked list How can I maintain balance with more children? If every leaf is at the same level, is it balanced? Of course

B/B+ Tree M-ary tree with M>3 Every node at least half full Except the root Data is stored in leaves Internal nodes store M-1 keys, which guide searching All leaves are at the same depth

B+ tree Internal node values No standard, many ways to do it We will use the smallest value in the leaf to the right

B+ Tree Search Similar to binary trees If search is smaller than key, go left If search is larger than the key, look at the next key If there is not a next key, go right

Searching Find 41, 99, 66, 52

B+ Tree Insert Similar to binary trees on the way down Find where to insert When hit a leaf, insert at the leaf Have to check for overflowing nodes on our way back out Also have to update parent keys

Insert Insert O

Insert Continued Insert O After the insert

Insert Insert W

Insert Continued Insert W After the insert

Insert Continued Insert W Update the parent key with the smallest value in the right

Insert Continued Insert X

Insert Continued Insert X After the insert

Overflowing nodes We split into two nodes giving half the values to each Choose a key for them Smallest in the right Insert the key into the parent

Insert Continued Insert X Split the node, choose a key

Insert Continued Insert X Put the key in the parent

Insert Insert T

Insert Continued Insert T After the insert

Insert Continued Insert T Split the node, choose a key

Insert Continued Insert T Put the key in the parent

Overflowing internal nodes We split into two nodes giving have the values to each Choose a key for them Smallest in the right Remove this from the right node Insert the key into the parent

Insert Continued Insert T Split the parent, choose a key

Insert Continued Insert T Put the key in the parent

Overfilled nodes If the root gets overfilled, we split it and add a new node on top as the new root

Delete Find it and delete it Update the keys Now we might have under-filled leaves We may have to merge them If the node we merge with is full, we will just have to split them More practical to steal some values from a sibling If the right-1 is more than half full, take the smallest value If the left-1 is more than half full, take the largest one If neither have enough, merge with the right

Delete Example Delete N

Delete Continued Delete N After the delete, key updated

Delete Delete K

Delete Continued Delete K After the delete, no key update needed

Delete Delete A

Delete Continued Delete A The node is not full enough

Delete Continued Delete A After stealing the D and updating the key

Delete Delete X

Delete Continued Delete X The node is not full enough

Delete Continued Delete X Since there were not enough values to fill two nodes, we merged them We also updated the key

Under-filled internal nodes Do the same as a leaf, steal from a sibling or merge Difference is we drag along the children

Delete Continued Delete X After merging the internal nodes

Delete Delete U

Delete Continued Delete U After removing U and updating the parent key

Delete Continued Delete U After stealing the W and updating the parent key

Empty Root If the root ever becomes empty, remove it and make the child the root

Other Trees Trees are used a lot in Computer Science, there are many more Expression Minimum Spanning Game Red-Black Huffman

Red Black-Tree Popular Alternative to AVL Operations are O(logn)

Red-Black Tree 1) Nodes are colored red or black 2) Root is black 3) Red nodes have black children 4) Every path from a node down to a NULL must have the same number of black nodes in it

Red-Black Trees These rules make our max depth 2* log(n+1) Means operations are logarithmic When we insert/delete, we have to update the tree Must follow the rules Recoloring Rotations

Advantages AVL had to store or re-compute the height When storing, height is an int, which takes 32 bits In a red black, we only need to know the color Since there are only two color options, we only need 1 bit So, we get the balance advantages of the AVL, and reduce overhead Makes it really good for embedded systems

Inner workings Complex and Complicated We don’t usually talk about them If you want to use one, you can find implementations Depending on your libraries, you may already have one

Huffman Tree Built when doing Huffman coding Huffman coding is used to compress data

Huffman trees When we are sending a stream of data, we want to compress it as much as possible Consider text data Letter usage is not even a, e, i, o, u, s, r, t are used much more often than q, z, x, v We want the letters we use a lot to be fewer bits

Huffman Trees Find the unique letters in the message hello world h e l _ w r d

Huffman Trees Get the counts of each letter in the message hello world _ 1 w 1 r 1 d 1

Huffman Trees Combine the two smallest into an internal node Use their sum as the value Repeat until you have everything in one tree

Huffman Trees Assign left edges as 0’s, right edges are 1’s

Huffman Trees Now we follow the tree down and have our codes

Huffman Trees

Huffman Trees Now use the codes to translate the message hello world

Huffman Trees Now use the codes to translate the message hello world 0000 0001 01 01 001 100 101 001 110 01 111 (no spaces, just for your visual) This took 32 bits To represent 11 items, need 4 bits each, so 44 bits Char takes 8 bits, so 88 bits

Huffman Tree Decoding First pass through the letters and frequencies and build the tree on the other side Use this to decode

Huffman Trees 0000 0001 01 01 001 100 101 001 110 01 111

Huffman Trees Gives us a good compression Transmits the fewest bits possible Since we have to pass the letters and frequencies in the beginning, we see no advantages for short messages Don’t get advantages when distribution is even

Huffman Tree Notice that because values are at the leaves, no code is the beginning of another (or the prefix of another) This is a good thing, it means translation is not ambiguous