Symbol Tables and Search Trees CSE 2320 – Algorithms and Data Structures Vassilis Athitsos University of Texas at Arlington 1.

Slides:



Advertisements
Similar presentations
COL 106 Shweta Agrawal, Amit Kumar
Advertisements

COL 106 Shweta Agrawal and Amit Kumar
The Dictionary ADT Definition A dictionary is an ordered or unordered list of key-element pairs, where keys are used to locate elements in the list. Example:
COP 3502: Computer Science I (Note Set #21) Page 1 © Mark Llewellyn COP 3502: Computer Science I Spring 2004 – Note Set 21 – Balancing Binary Trees School.
Comp 122, Spring 2004 Binary Search Trees. btrees - 2 Comp 122, Spring 2004 Binary Trees  Recursive definition 1.An empty tree is a binary tree 2.A node.
Binary Search Trees Azhar Maqsood School of Electrical Engineering and Computer Sciences (SEECS-NUST)
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
1 Heaps & Priority Queues (Walls & Mirrors - Remainder of Chapter 11)
A balanced life is a prefect life.
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
Binary Search Trees Briana B. Morrison Adapted from Alan Eugenio.
Data Structures Topic #9. Today’s Agenda Continue Discussing Trees Examine the algorithm to insert Examine the algorithm to remove Begin discussing efficiency.
BST Data Structure A BST node contains: A BST contains
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Trees and Red-Black Trees Gordon College Prof. Brinton.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Comparing B-trees and AVL-trees Searching a B-tree Insertion in a B-tree.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Hashing General idea: Get a large array
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
1 Joe Meehean.  Important and common problem  Given a collection, determine whether value v is a member  Common variation given a collection of unique.
Binary Trees Chapter 6.
Maps A map is an object that maps keys to values Each key can map to at most one value, and a map cannot contain duplicate keys KeyValue Map Examples Dictionaries:
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
Dictionaries CS 105. L11: Dictionaries Slide 2 Definition The Dictionary Data Structure structure that facilitates searching objects are stored with search.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
Trees and Graphs CSE 2320 – Algorithms and Data Structures Vassilis Athitsos University of Texas at Arlington 1.
Chapter 6 Binary Trees. 6.1 Trees, Binary Trees, and Binary Search Trees Linked lists usually are more flexible than arrays, but it is difficult to use.
Chapter 21 Binary Heap.
Chapter 11 Heap. Overview ● The heap is a special type of binary tree. ● It may be used either as a priority queue or as a tool for sorting.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
B-Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it.
© 2004 Goodrich, Tamassia Binary Search Trees1 CSC 212 Lecture 18: Binary and AVL Trees.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
Dictionaries CS /02/05 L7: Dictionaries Slide 2 Copyright 2005, by the authors of these slides, and Ateneo de Manila University. All rights reserved.
© 2004 Goodrich, Tamassia Trees
CS 61B Data Structures and Programming Methodology Aug 7, 2008 David Sun.
Chapter 7 Trees_Part3 1 SEARCH TREE. Search Trees 2  Two standard search trees:  Binary Search Trees (non-balanced) All items in left sub-tree are less.
1 Multi-Level Indexing and B-Trees. 2 Statement of the Problem When indexes grow too large they have to be stored on secondary storage. However, there.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Binary Search Trees (BSTs) 18 February Binary Search Tree (BST) An important special kind of binary tree is the BST Each node stores some information.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
FALL 2005CENG 213 Data Structures1 Priority Queues (Heaps) Reference: Chapter 7.
Internal and External Sorting External Searching
AVL Trees and Heaps. AVL Trees So far balancing the tree was done globally Basically every node was involved in the balance operation Tree balancing can.
Binary Search Trees1 Chapter 3, Sections 1 and 2: Binary Search Trees AVL Trees   
CS 367 Introduction to Data Structures Lecture 8.
CSE 2331/5331 Topic 8: Binary Search Tree Data structure Operations.
BINARY TREES Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary.
Dictionaries CS 110: Data Structures and Algorithms First Semester,
2 Binary Heaps What if we’re mostly concerned with finding the most relevant data?  A binary heap is a binary tree (2 or fewer subtrees for each node)
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
COMP261 Lecture 23 B Trees.
AA Trees.
Multiway Search Trees Data may not fit into main memory
Hashing Alexandra Stefan.
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
Chapter 10.1 Binary Search Trees
Wednesday, April 18, 2018 Announcements… For Today…
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
Multi-Way Search Trees
Ch. 8 Priority Queues And Heaps
Search Trees CSE 2320 – Algorithms and Data Structures
Mark Redekopp David Kempe
1 Lecture 13 CS2013.
Presentation transcript:

Symbol Tables and Search Trees CSE 2320 – Algorithms and Data Structures Vassilis Athitsos University of Texas at Arlington 1

Symbol Tables - Dictionaries A symbol table is a data structure that allows us to maintain and use an organized set of items. Main operations: – Insert new item. – Search and return an item with a given key. – Delete an item. – Modify an item. – Sort all items. – Find k-th smallest item. 2

Symbol Tables - Dictionaries Similarities and differences compared to priority queues: 3

Symbol Tables - Dictionaries Similarities and differences compared to priority queues: In priority queues we care about: – Insertions. – Finding/deleting the max item efficiently. In symbol tables we care about: – Insertions. – Finding/deleting any item efficiently. 4

Keys and Items Question: what is the difference between a "key" and an "item"? 5

Keys and Items Question: what is the difference between a "key" and an "item"? – An item contains a key, and possibly other pieces of information as well. – The key is just the part of the item that we use for sorting/searching. For example, the item can be a student record and the key can be a student ID. 6

Multiple Keys In actual applications, we oftentimes want to search or sort by different criteria. For example, we may want to search a customer database by ??? 7

Multiple Keys In actual applications, we oftentimes want to search or sort by different criteria. For example, we may want to search a customer database by: – Customer ID. – Last name. – First name. – Phone. – Address. – Shopping history. – … 8

Multiple Keys Accommodating the ability to search by different criteria (i.e., allow multiple keys) is a standard topic in a databases course. However, the general idea is fairly simple: Define a primary key, that is unique. – That is why we all have things such as: – Social Security number. – UTA ID number. – Customer ID number, and so on. Build a main symbol table based on the primary key. For any other field (like address, phone, last name) that we may want to search by, build a separate symbol table, that simply maps values in this field to primary keys. – The "key" for each such separate symbol table is NOT the primary key. 9

Generality of Search by Key In this course we will only discuss searching by a single key. – This is the problem that is relevant for an algorithms course. The previous slides hopefully have convinced you that if you can search by a single key, you can easily accommodate multiple keys as well. – That topic is covered in standard database courses. 10

Overview We will now see some standard ways to implement symbol tables. Some straightforward ways use arrays and lists. – Simple implementations, problematic performance or severe limitations. The most commonly used methods use trees. – Relatively simple implementations (but more complicated than array/list-based implementations). – Good performance. 11

Key-Indexed Search Suppose that: – The keys are distinct positive integers, that are sufficiently small. What exactly we mean by "sufficiently small" will be clarified in a bit. "Distinct" means that no two items share the same key. – We store our items in an array (so, we have an array-based implementation). How would you implement symbol tables in that case? – How would you support insertions, deletions, search? 12

Key-Indexed Search Keys are indices into an array. Initialization??? Insertions??? Deletions??? Search??? 13

Key-Indexed Search Keys are indices into an array. Initialization: set all array entries to null, O(N) time. Insertions, deletions, search: Constant time. Limitations: – Keys must be unique. This can be OK for primary keys, but not for keys such as last names, that are not expected to be unique. – Keys must be small enough so that the array fits in memory. In summary: optimal performance, severe limitations. 14

Unordered Array Implementation Note: this is different than the key-indexed implementation we just talked about. Key idea: just throw items into an array. Implementation and time complexity: – Initialization??? – Insert? – Delete? – Search? 15

Unordered Array Implementation Note: this is different than the key-indexed implementation we just talked about. Key idea: just throw items into an array. Initialization: initialize all entries to null. – Linear time. Insert: place the new item at the end. – Constant time. Delete: remove the item, move all subsequent items to fill in the gap. – Linear time. This is a problem. Search: scan the array, until you find the key you are looking for. – Linear time. This is a problem. 16

Variations Unordered list implementation: – Linear time for deletion and search. Ordered array implementation. – Linear time for insertion and deletion. – Logarithmic time for search: binary search Ordered list implementation. – Linear time for insertion, deletion, search. Filling in the details on these variations is left as an exercise. However, each of these versions requires linear time for at least one of insertion, deletion, search. – We want methods that take at most logarithmic time for insertions, deletions, and searches. 17

Search Trees Preliminary note: "search trees" as a term does NOT refer to a specific implementation of symbol tables. – This is a very common mistake. The term refers to a family of implementations, that may have different properties. We will see soon a specific implementation with good properties, namely trees. 18

Search Trees What all search trees have in common is the implementation of search. Insertions and deletions can differ, and have important implications on overall performance. The main goal is to have insertions and deletions that: – Are efficient (at most logarithmic time). – Leave the tree balanced, to support efficient search (at most logarithmic time). 19

Binary Search Trees Definition: a binary search tree is a binary tree where: Each internal node contains an item. – External nodes (leaves) do not contain items. The item at each node is: – Greater than or equal to all items on the left subtree. – Less than all items in the right subtree. 20

Binary Search Trees Parenthesis: is this a binary tree? According to the definition in the book (that we use in this course), no, because one node has only one child. However, a binary tree can only have an odd number of nodes. What are we supposed to do if the number of items is even? – Hint: look back to the previous definition

Binary Search Trees Parenthesis: is this a binary tree? According to the definition in the book (that we use in this course), no, because one node has only one child. However, a binary tree can only have an odd number of nodes. What are we supposed to do if the number of items is even? We make the convention that items are only stored at internal nodes. Leaves exist, but they do not contain items. To simplify, we will not be showing leaves

Binary Search Trees So, is this a binary tree?

Binary Search Trees So, is this a binary tree? We will make the convention that yes, this is a binary tree whose leaves contain no items and are not shown

Binary Search Trees Definition: a binary search tree is a binary tree where the item at each node is: – Greater than or equal to all items on the left subtree. – Less than all items in the right subtree. How do we implement search? 25

Binary Search Trees Definition: a binary search tree is a binary tree where the item at each node is: – Greater than or equal to all items on the left subtree. – Less than all items in the right subtree. search(tree, key) – if (tree == null) return null – else if (key == tree.item.key) return tree.item – else if (key < tree.item.key) return search(tree.left_child, key) – else return search(tree.right_child, key) 26

Performance of Search Note: so far we have said nothing about how to implement insertions and deletions. Given that, what can we say about the worst-case time complexity of search? 27

Performance of Search Note: so far we have said nothing about how to implement insertions and deletions. Given that, what can we say about the worst-case time complexity of search? A binary tree can be perfectly balanced or maximally unbalanced

Performance of Search Note: so far we have said nothing about how to implement insertions and deletions. Given that, what can we say about the worst-case time complexity of search? Search takes time that is in the worst case linear to the number of items. – This is not very good. Search takes time that is linear to the height of the tree. For balanced trees, search takes time logarithmic to the number of items. – This is good. So, the challenge is to make sure that insertions and deletions leave the tree balanced. 29

Naïve Insertion To insert an item, the simplest approach is to go down the tree until finding a leaf position where it is appropriate to insert the item. Pseudocode ??? 30

Naïve Insertion To insert an item, the simplest approach is to go down the tree until finding a leaf position where it is appropriate to insert the item. insert(tree, item) – if (tree == null) return new tree(item.key) – else if (item.key < tree.item.key) tree.left_child = insert(tree.left_child, item) – else if (item.key > tree.item.key) tree.right_child = insert(tree.right_child, item) – return tree 31

Naïve Insertion To insert an item, the simplest approach is to go down the tree until finding a leaf position where it is appropriate to insert the item. insert(tree, item) – if (tree == null) return new tree(item.key) – else if (item.key < tree.item.key) tree.left_child = insert(tree.left_child, item) – else if (item.key > tree.item.key) tree.right_child = insert(tree.right_child, item) – return tree 32 Why do we use line tree.left_child = insert(tree.left_child, item) instead of line insert(tree.left_child, item)

Naïve Insertion To insert an item, the simplest approach is to go down the tree until finding a leaf position where it is appropriate to insert the item. insert(tree, item) – if (tree == null) return new tree(item.key) – else if (item.key < tree.item.key) tree.left_child = insert(tree.left_child, item) – else if (item.key > tree.item.key) tree.right_child = insert(tree.right_child, item) – return tree 33 Answer: To handle the base case, where we return a new node, and the parent must make this new node a child.

Naïve Insertion Inserting a 39:

Naïve Insertion Inserting a 39:

Naïve Insertion If items are inserted in random order, the resulting trees are reasonably balanced. If items are inserted in ascending order, the resulting tree is maximally imbalanced. We will next see more sophisticated methods, that guarantee that the resulting tree is balanced regardless of the order of insertions. 36

2-3-4 Trees A tree is a tree that either is empty or contains three types of nodes: 2-nodes, which contain: – An item with key K. – A left subtree with keys <= K. – A right subtree with keys > K. 3-nodes, which contain: – Two items with keys K1 and K2, K1 <= K2. – A left subtree with keys <= K1. – A middle subtree with K1 < keys <= K2. – A right subtree with keys > K nodes, which contain: – Three items with keys K1, K2, K3, K1 <= K2 <= K3. – A left subtree with keys <= K1. – A middle-left subtree with K1 < keys <= K2. – A middle-right subtree with K2 < keys <= K3. – A right subtree with keys > K3. For a search tree to be called balanced, all leaves must be at the same distance from the root. We will only consider balanced trees.

Example of Tree

Search in Trees Search in trees is a generalization of search in binary search trees. For simplicity, we assume that all keys are unique. Given a search key, at each node select one of the subtrees by comparing the search key with the 1, 2, or 3 keys that are present at the node. The time is linear to the height of the tree. Since we assume that trees are balanced, search time is logarithmic to the number of items. Question to tackle next: – how to implement insertions and deletions so as to guarantee that, when we start with a balanced tree, the tree remains balanced after the insertion or deletion. 39

Insertion in Trees We follow the same path as if we are searching for the item. A simple approach would be to just insert the item at the end of that path. However, if we insert the item at a new node at the end, the tree is not balanced any more. We need to make sure that the tree remains balanced, so we follow a more complicated approach. 40

Insertion in Trees Given our key K: we follow the same path as in search. On the way: – If we find a 2-node being parent to a 4-node, we transform the pair into a 3-node connected to two 2-nodes. – If we find a 3-node being parent to a 4-node, we transform the pair into a 4-node connected to two 2-nodes. – If the root becomes a 4-node, split it into three 2-nodes. These transformations: – Are local (they only affect the nodes in question). – Do not affect the overall height or balance of the tree (except for splitting a 4-node at the root). This way, when we get to the bottom of the tree, we know that the node we arrived at is not a 4-node, and thus it has room to insert the new item. 41

Transformation Examples If we find a 2-node being parent to a 4-node, we transform the pair into a 3- node connected to two 2-nodes, by pushing up the middle key of the 4-node. If we find a 3-node being parent to a 4-node, we transform the pair into a 4- node connected to two 2-nodes, by pushing up the middle key of the 4-node

Transformation Examples If we find a 2-node being parent to a 4-node, we transform the pair into a 3- node connected to two 2-nodes, by pushing up the middle key of the 4-node. If we find a 3-node being parent to a 4-node, we transform the pair into a 4- node connected to two 2-nodes, by pushing up the middle key of the 4-node

Insertion Example Inserting an item with key 25:

Insertion Example Inserting an item with key 25:

Insertion Example Inserting an item with key 25:

Insertion Example Inserting an item with key 25:

Insertion Example Found a 2-node being parent to a 4-node, we must transform the pair into a 3-node connected to two 2-nodes

Insertion Example Found a 2-node being parent to a 4-node, we must transform the pair into a 3-node connected to two 2-nodes

Insertion Example Reached the bottom. Make insertion of item with key

Insertion Example Next: insert an item with key =

Insertion Example Next: insert an item with key =

Insertion Example Next: insert an item with key =

Insertion Example Next: insert an item with key =

Insertion Example Next: insert an item with key =

Insertion Example Next: insert an item with key =

Insertion Example Next: insert an item with key =

Insertion Example Next: insert an item with key =

Insertion Example Found a 3-node being parent to a 4-node, we must transform the pair into a 4-node connected to two 2-nodes

Insertion Example Found a 3-node being parent to a 4-node, we must transform the pair into a 4-node connected to two 2-nodes

Insertion Example Reached the bottom. Make insertion of item with key

Insertion Example Reached the bottom. Make insertion of item with key

Insertion Example Insert an item with key =

Insertion Example Insert an item with key =

Insertion Example Found a 3-node being parent to a 4-node, we must transform the pair into a 4-node connected to two 2-nodes

Insertion Example Found a 3-node being parent to a 4-node, we must transform the pair into a 4-node connected to two 2-nodes

Insertion Example Root is 4-node, must split

Insertion Example Root is 4-node, must split

Insertion Example Continue with inserting an item with key =

Insertion Example Continue with inserting an item with key =

Insertion Example Continue with inserting an item with key =

Deletion on Trees More complicated. The book does not cover it. We will not cover it. If you care, you can look it up on Wikipedia. 72