Dynamics of Binary Search Trees under batch insertions and deletions with duplicates

Arun Mahendra - Dept. of Math, Physics & Engineering, Tarleton State University
Mentor: Dr. Mircea Agapie

BACKGROUND
The complexity of many operations on Binary Search Trees (BSTs) is proportional to the height of the tree, so height is a crucial performance parameter. In the worst case, it is possible to obtain "skinny" BSTs, whose height is equal or close to the total number of nodes N. Searching such a tree is no better than scanning an unsorted array. If only insertions of random keys are performed in the BST, it can be shown analytically that the average height is approximately 3·log2(N). But if both insertions and deletions are performed (as happens in most real-life applications), the process is not analytically tractable. Empirical evidence indicates that the average height is still proportional to log2(N).

[Figure: a simple binary tree, with the labels Root, Leaves and Siblings]
This is a simple Binary Tree, having only two leaves (terminal nodes) under the Root. Nodes with the same parent are called siblings. All nodes store integers, or other keys (e.g. floating point numbers, strings of text etc.).

[Figure: a larger binary search tree, with its levels labeled Depth = 0, Depth = 1, Depth = 2, Depth = 3]
A more complex Binary Tree, having leaves and internal nodes. For each node, the following property holds: all numbers in the left sub-tree are smaller than (or equal to), and all numbers in the right sub-tree are larger than, the number in the node itself. This is the definition of a BST.

OBJECTIVE
We conduct a systematic study of insertions and deletions in BSTs of various sizes, and investigate the statistics of the height of the tree: average, standard deviation, and coefficient of variation.

METHODS
Each node is assigned the depth property, which shows how many levels down that node is from the root. The root itself has depth zero. The height of the tree is defined as the maximum depth of all its nodes, e.g. the BST in the second figure, whose levels run from depth 0 to depth 3, has height 3.

We used the programming language C for implementation, because of its small overhead, simple syntax, and direct access to pointers. For example, the height of a tree is found through the function maxDepth(), shown below:

    int heightOfTree = 0;                 /* global; reset to zero before each call */

    void maxDepth(node *tree) {
        if (tree) {                       /* tree not empty */
            maxDepth(tree->left);
            heightOfTree = (heightOfTree < tree->depth)
                         ? tree->depth : heightOfTree;
            maxDepth(tree->right);
        }
    }

The function modifies the global variable heightOfTree, which has to be set to zero in the program before maxDepth() is called.

Due to the expected logarithmic behavior of the height, we chose exponentially spaced data points: our trees have 100, 200, 400, 800, 1600, 3200, 6400 and [...] nodes. The trees are subjected to cycles of node deletions followed by the same number of node insertions:
• The initial trees are built by inserting random numbers into an initially empty tree.
• The numbers to be deleted are chosen at random from among the numbers already in the tree.
• The numbers to be inserted are generated at random, using the function rand() from the C standard library. Duplicates are permitted.

RESULTS
To simulate real-life dynamic operation, we allowed 1/3 of the nodes to be deleted and then re-inserted in each cycle, and performed a total of 10,000 cycles for each tree size. When the key chosen for deletion had duplicates, its first occurrence was deleted.

[Figure: Height of BST subjected to 33% fluctuation cycles]
Assuming that the functional relationship between height and number of nodes is of the form H = a + b·log2(N) with unknown coefficients a and b, linear regression allows us to estimate a and b. From our data we find: a = [...], b = 2.2. The theoretical explanation of these numbers is unknown, and it may be the object of further study, but for now this formula is a purely empirical result.

[Figure: Coefficient of variation of height of BST subjected to 33% fluctuation cycles]
The coefficient of variation c is a measure of variability, defined as the ratio of standard deviation to average. We present it because the averages of our distributions vary with tree size; in this context standard deviations cannot be compared directly, but coefficients of variation can, since the standard deviation is scaled by the average.

CONCLUSIONS AND FUTURE WORK
For Binary Search Trees of sizes N between 100 and [...] nodes, and deletion-insertion cycles as described above, the following behaviors have been observed:
• The average height is logarithmic as a function of size.
• The maximum and minimum heights are also logarithmic, with the same slope. In all our experiments, the total range (max - min) was bounded by 8.
• The coefficient of variation of the height distribution is always under 0.14, and decreases as tree size increases, as expected from statistics (the standard deviation of the sampling distribution is the standard deviation of the population divided by √n).
• The empirical law derived from the data is H = [...]·log2(N).

Future work will investigate:
• The impact of "deeper" or more "shallow" cycles.
• The impact of larger numbers of cycles per tree, such that the total number of insertions is of the order of N^2.
• The impact of using average depth instead of maximum depth (height).
• The impact of not allowing duplicate keys.
• The theoretical grounding of the empirical formula derived.

For additional information please contact:
Arun Mahendra, Computer Science program, Tarleton State University
Dr. Mircea Agapie, Dept. of Math, Physics & Engineering, Tarleton State University

An earlier version of this work was presented at the 3rd Annual TAMUS Pathways Student Research Symposium, Kingsville 2005.
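The deletion and re-insertion cycles described in the poster can be sketched in C roughly as follows. This is a minimal illustration and not the authors' program. The names insert, remove_key, height, free_tree, newton_sqrt and simulate are hypothetical. Sending duplicates to the left sub-tree follows the poster's "smaller than (or equal to)" rule, but the choice of the in-order predecessor as replacement for a deleted node with two children is an assumption, as is recomputing the height recursively instead of storing a depth in each node.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int key;
    struct node *left, *right;
} node;

/* Insert a key; duplicates go left, matching "smaller than or equal to". */
node *insert(node *t, int key) {
    if (!t) {
        node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key <= t->key) t->left = insert(t->left, key);
    else               t->right = insert(t->right, key);
    return t;
}

/* Delete the first node holding `key` met on the way down from the root.
   A node with two children takes the key of its in-order predecessor
   (an assumption; the poster does not name the replacement rule). */
node *remove_key(node *t, int key) {
    if (!t) return NULL;
    if (key < t->key)      t->left = remove_key(t->left, key);
    else if (key > t->key) t->right = remove_key(t->right, key);
    else if (!t->left || !t->right) {
        node *child = t->left ? t->left : t->right;
        free(t);
        return child;
    } else {
        node *p = t->left;
        while (p->right) p = p->right;
        t->key = p->key;
        t->left = remove_key(t->left, t->key);
    }
    return t;
}

/* Height in edges: the root has depth 0, an empty tree has height -1. */
int height(const node *t) {
    if (!t) return -1;
    int l = height(t->left), r = height(t->right);
    return 1 + (l > r ? l : r);
}

void free_tree(node *t) {
    if (!t) return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}

/* Newton iteration for a square root, used so the sketch needs no libm. */
double newton_sqrt(double x) {
    if (x <= 0.0) return 0.0;
    double g = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) g = 0.5 * (g + x / g);
    return g;
}

/* Build a tree of n random keys, run `cycles` rounds that delete n/3 random
   existing keys and insert n/3 fresh ones, and report the mean height and
   its coefficient of variation (standard deviation / mean). */
void simulate(int n, int cycles, unsigned seed, double *mean, double *cv) {
    assert(n >= 3 && cycles > 0);
    int *keys = malloc((size_t)n * sizeof *keys);
    assert(keys != NULL);
    node *root = NULL;
    double sum = 0.0, sumsq = 0.0;
    srand(seed);
    for (int i = 0; i < n; i++) {
        keys[i] = rand();
        root = insert(root, keys[i]);
    }
    for (int c = 0; c < cycles; c++) {
        int m = n / 3;
        for (int j = 0; j < m; j++) {          /* pick m distinct slots */
            int i = j + rand() % (n - j);
            int tmp = keys[j]; keys[j] = keys[i]; keys[i] = tmp;
            root = remove_key(root, keys[j]);
        }
        for (int j = 0; j < m; j++) {          /* insert m new random keys */
            keys[j] = rand();
            root = insert(root, keys[j]);
        }
        int h = height(root);
        sum += h;
        sumsq += (double)h * h;
    }
    *mean = sum / cycles;
    double var = sumsq / cycles - *mean * *mean;
    *cv = newton_sqrt(var) / *mean;
    free_tree(root);
    free(keys);
}
```

A call such as simulate(100, 10000, 1u, &mean, &cv) would correspond to the smallest tree size and the cycle count reported in the poster. Its results should not be expected to reproduce the poster's figures exactly, because the deletion rule and the random seed are assumptions of this sketch.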