CSC311: Data Structures Java Review

CSC311: Data Structures Java Review
Fall 2006 CSC311: Data Structures

The Java Virtual Machine (JVM)
Virtual machine -- the logical machine (non-physical machine) that performs machine functions, typically implemented in software on top of a "real" hardware platform and operating system JVM -- the software implementation of a "CPU" designed to run compiled Java code Includes stand-alone Java applications, as well as "applets" that are downloaded and run in Web browsers such as the NetScape Navigator Thanks to the JVM, programs written in Java don't have to be rewritten to run on different computers Need an interpreter to translate JVM code to machine code Fall 2006 CSC311: Data Structures

Class and Object What is a class? What is an object?
A class is corresponding to a concept, e.g. People, Student, Faculty, Staff, Rectangle, Circle, etc In Java, a class is a set of objects with the same behavior What is an object? An object is corresponding an entity, e.g. Jack, Peter In Java, an object is an entity that you can manipulate in your programs (by invoking methods) Each object belongs to a class Object constructions Fall 2006 CSC311: Data Structures

Class Components Components Declaration
Fields – name, type and specifier Methods – signature (prototype) Constructors – object initialization Declaration Variable declaration Object variables w/o initialization Method declaration // method calls Fall 2006 CSC311: Data Structures

Data Types Data types Type conversion
Primitive data types – int, long, short, double, float, boolean, char Classes – String, Object Type conversion Explicit type conversion – casting may cause information lost Implicit type conversion – default no information lost Fall 2006 CSC311: Data Structures

Casting and Autoboxing/Unboxing
Casting: (type) expr Autoboxing Implicit casting Number – Integer, Double, Float, etc Anytime a Number object is expected as a parameter to a method, the corresponding base type can be passed: Number  base type Unboxing Anytime a base type is expected in an expression involving a Number reference, that Number object is changed to the corresponding base type base type  Number Fall 2006 CSC311: Data Structures

Enum Types Declaration Example Constants -- final
modifier enum name {vale_name0, value_name1, …, value_namen-1 Example Public enum Day{Mon, Tue, Wed, Thu, Fri, Sat, Sun}; Constants -- final Fall 2006 CSC311: Data Structures

Statements Assignments – location Decisions Iterations
If-then If-then-else Switch-case Iterations while do-while for Restricted gotos break continue return Boolean expressions and short-circuit evaluation Fall 2006 CSC311: Data Structures

Simple Input and Output
Simple output methods: Built-in static object: System.out An instance of java.io.PrintStream class print(Object) Print(String) print(base_type) Println(String) Simple input methods Built-in special object: System.in Related class: java.util.Scanner Scanner sc = new Scanner(System.in); hasNext() next() hasNextType() nextType() hasNextLine() nextLine() findInLine(String) Fall 2006 CSC311: Data Structures

Designing classes Choosing classes Principles:
A class represents a single concept Concepts from mathematics: Point, Rectangle, Ellipse Concepts from real life: BankAccount, Person, Student Utility classes--no objects, only static methods Math Principles: Responsibilities Independence Behaviors Fall 2006 CSC311: Data Structures

Designing classes Cohesion and Coupling
Cohesive: all methods are closely related to the single concept that the class represents Coupling: A class depends on another if it calls one of its methods High Coupling  many class dependencies Minimize coupling to minimize the impact of interface changes Fall 2006 CSC311: Data Structures

Accessor and Mutator Classes
Accessor: does not change the state of the implicit parameter (e.g. getBalance()) Mutator: changes the state of the implicit parameter (e.g. deposit(double amount ) Rule of thumb: Mutator should return void Immutable class: all methods are accessors (e.g. String) Fall 2006 CSC311: Data Structures

Preconditions/postconditions
The condition that must be met before the method executes Publish preconditions so the caller won't call methods with bad parameters Postcondition The condition that's true after a method has completed If not, the method functioned incorrectly Fall 2006 CSC311: Data Structures

Scope Scope of variable: region of program where the variable can be referred by its name Local variable scope: from definition to end of block Class scope: all methods of the class Overlapping scope: local scope wins over class scope Static scope Static fields Define a field that belongs to a class instead of any object Non-static fields are instance fields, while static fields are class fields Minimize the use of static fields Static methods No implicit parameter Too many static methods are a sign of too little Object-oriented Fall 2006 CSC311: Data Structures

Coding and Pseudo-code
programming in Java IDE Pseudo-code Mixes natural language with standard programming language constructs Constructs: Expressions Method declarations Decision structures Loop structures Array indexing Method calls Method returns comments Fall 2006 CSC311: Data Structures

Object-oriented Design
Goals Robustness Adaptability Reusability Principles Abstraction Encapsulation Modularity Design Patterns Describe a solution to a typical software design problem Some typical design patterns: Position, Adaptor, Iterator, Template method, Composition, Comparator, Amortization, Divide-and-conquer, etc. Fall 2006 CSC311: Data Structures

Inheritance Specialization and generalization (extension)
Specialization – subclass Generalization -- superclass Method overriding Refinement Replacement The keyword this Fall 2006 CSC311: Data Structures

Inheritance Hierarchies
Hierarchies of classes, subclasses, and sub-subclasses are common Similar to concept hierarchies: Person has subclasses like Faculty, Student, Staff; Faculty may have subclasses FullTimeFaculty and PartTimeFaculty; FullTimeFaculty may have TenuredFaculty and TenureTrackFaculty; Student may have subclasses FullTimeStudent and PartTimeStudent; etc. A superclass can have multiple subclasses, but a subclass should have at most ONE superclass Fall 2006 CSC311: Data Structures

Inheritance of Methods
Override method: Supply a different implementation of a method that exists in the superclass Inherit method: Don't supply a new implementation of a method that exists in the superclass Add method: Supply a new method that doesn't exist in the superclass Subclass can not access the private methods of superclass Calling a Superclass Method/constructor Fall 2006 CSC311: Data Structures

Inheritance of Fields Inherit field: All fields from the superclass are automatically inherited Add field: Supply a new field that doesn't exist in the superclass Can't override fields Subclass can not access the private fields of superclass Fall 2006 CSC311: Data Structures

Polymorphism Polymorphism (greek: many shapes): The type of the object determines the method to call Called late binding. Resolved at runtime Different from overloading. Overloading is resolved by the compiler Converting from subclasses to superclass The instanceof method Test whether an object is of a specified class Example: x instanceof String to test if x is a string Fall 2006 CSC311: Data Structures

Access control public – accessible anywhere
private – accessible inside the class protected -- accessible by class, subclasses and package default – no specifier, accessible by package Recommended Access Levels Fields: Always private Exception: public static final constants Methods: public or private Classes: public or package Don't use protected Beware of accidental package access (forgetting public or private) Fall 2006 CSC311: Data Structures

The Object Object The Cosmic Superclass
All classes extend Object by default Most useful methods: String toString(): to convert the object to a string message boolean equals(Object otherObject): to test if the object is equal to the given object otherObject Object clone(): to create a new copy of the object Fall 2006 CSC311: Data Structures

Exceptions Throwing exceptions Catching exceptions
Try-catch-finally block try { … } catch (exception_type1 e1) { … } catch (exception_type2 e2) { … } … finally { … } Fall 2006 CSC311: Data Structures

Interfaces Special classes All methods in an interface are abstract
no implementation All methods in an interface are automatically public An interface doesn't have instance fields An interface doesn’t have constructors Interface implementation Should be implemented by classes Multiple inheritance in Interfaces Fall 2006 CSC311: Data Structures

Abstract Class and Method
Cannot have direct instances (objects) Must be extended by subclass(es) Can implement some methods (main difference from interfaces) Abstract method Must be overridden by subclass(es) Must be inside abstract classes Non-abstract class cannot have abstract method Fall 2006 CSC311: Data Structures

Strong Typing Strong typing Java All variables must be typed
A strong-typing language An object can be viewed as being of various types But declared as being of only one type Fall 2006 CSC311: Data Structures

Generics Casting Generics Widening conversion: to superclass
Narrowing conversion: to subclass Casting exception Casting with interfaces Generics Generic type: a type that is not defined at compilation time, but becomes fully specified at run time Formal type parameters Actual type parameters Fall 2006 CSC311: Data Structures

Generics: Example public class Pair<K, V> { K key; V value;
public void set(K k, V v) { key = k; value = v; } public K getKey() { return key; } public V getValue() { return value; } Public static void main(String[] args) { Pair<String, integer> pair1 = new Pair<String, Integer>(); pair1.set(new String(“height”), new Integer(36)); Pair<String, Double> pair2 = new Pair<String, Double>(); pair2.set(new Student(“A5678”, “Sue”, 19), new Double(9.5)); Fall 2006 CSC311: Data Structures

Arrays A set of data cells with fixed size, each cell holding a data item and all data items being the same type Purpose: to construct a linear structure to hold a given number of elements. Once an array is created, its size is fixed – length Indices starting from 0 Common algorithms on arrays Find max/min values Find average value Linear search Copying array and clone array High-dimensional arrays Fall 2006 CSC311: Data Structures

Array List ArrayList is a class – data type
Purpose: to construct a linear structure to hold a flexible number of elements Methods: size() – returns the number of elements in the ArrayList: a.size() get(<index>) – returns the <index>-th element that is the type of Object should convert the return value to your specific type The <index> starts at 0 set(<index>, <object>) – sets the <index>-th element to <object>, overwrites the old element add(<index>, <object>) – adds <object> before the <index>-th element remove(<index<) – removes the <index>-th element Fall 2006 CSC311: Data Structures

Recursion A recursive computation solves a problem by using the solution of the same problem with simpler input For recursion to terminate, there must be special cases for the simplest inputs The simplest case can be directly solved The recursive cases should call the method itself Fall 2006 CSC311: Data Structures

Chapter 3 Arrays, Linked Lists, and Recursion
Objectives Using Arrays Singly Linked Lists Doubly Linked Lists Circularly Linked Lists Linked-List Sorting Recursion Fall 2006 CSC311: Data Structures

Arrays Declaration Properties Operations Index: 0 through length-1
Length: the number of elements Fixed length Elements could be objects Operations Insertion: add(Object e) Deletion: remove(Object e) remove(int i) Search: find(Object e) Fall 2006 CSC311: Data Structures

Sorting an Array Insertion-sort algorithm Selection-sort algorithm
Merge-sort algorithm Bubble-sort algorithm Quick-sort algorithm Fall 2006 CSC311: Data Structures

java.util Methods for Arrays
Simple methods equals(A, B) fill(A, x) sort(A) toString(A) Fall 2006 CSC311: Data Structures

Pseudo-Random Numbers
Random rand = new Random(); Rand.setSeed(System.currentTimeMills()); …… rand.nextInt(100); // between 0 and 99 Pseudo-random number generator seed Fall 2006 CSC311: Data Structures

Two-Dmensional Arrays
Array of arrays Index Length Reference Assignment of arrays Matrix High-dimension arrays Fall 2006 CSC311: Data Structures

Singly Linked Lists Linked list: a collection of nodes in a linear order Nodes Link – pointing to the next node Data Head – refer to the first node Operations: Insertion addFirst(Object) -- O(1) addLast(Object) -- O(1) Removal removeFirst() -- O(1) removeLast() -- O(1) Search find(Object) -- O(n) Implementation Boundary scenarios: insert into and remove from an empty list Fall 2006 CSC311: Data Structures

Doubly Linked Lists Doubly linked list – a linked list with each node having two links pointing to its previous and next nodes respectively Node: DNode Fields: Data – element Link to previous node – prev Link to next node – next Methods: getElement() getNext() getPrev() setElement(Object) setNext(DNode) setPrev(DNode) Fall 2006 CSC311: Data Structures

Doubly Linked Lists Header and Trailer Sentinels Operations – at ends
Separate header and trailer One header, one trailer Integrated header and trailer One node with prev pointing to the last node and next pointing to the first node Operations – at ends Insertion Removal Operations – in the middle removal Fall 2006 CSC311: Data Structures

Circularly Linked Lists
A linked list without head or tail Traversal means circle through all nodes Cursor: current node Operations: add(Object) – immediately after the cursor remove() – immediately after the cursor advance() – go to the next node Fall 2006 CSC311: Data Structures

Sorting a Linked List Insertion-sort Selection-sort
Using single linked list Start from the first to the current Find the appropriate position and insert Using two linked list Remove the first from the source list Insert into the target list Selection-sort Select the maximum from the original first Insert at the first Select the minimum from the source list Insert at the last to the target list Fall 2006 CSC311: Data Structures

Recursion Pattern Recursion: when a method calls itself
Classic example--the factorial function: n! = 1· 2· 3· ··· · (n-1)· n Recursive definition: As a Java method: // recursive factorial function public static int recursiveFactorial(int n) { if (n == 0) return 1; // basis case else return n * recursiveFactorial(n- 1); // recursive case } Fall 2006 CSC311: Data Structures

Linear Recursion Test for base cases. Recur once.
Begin by testing for a set of base cases (there should be at least one). Every possible chain of recursive calls must eventually reach a base case, and the handling of each base case should not use recursion. Recur once. Perform a single recursive call. (This recursive step may involve a test that decides which of several possible recursive calls to make, but it should ultimately choose to make just one of these calls each time we perform this step.) Define each possible recursive call so that it makes progress towards a base case. Fall 2006 CSC311: Data Structures

A Simple Example of Linear Recursion
Example recursion trace: Algorithm LinearSum(A, n): Input: An integer array A and an integer n  1, such that A has at least n elements Output: The sum of the first n integers in A if n = 1 then return A[0] else return LinearSum(A, n - 1) + A[n - 1] LinearSum ( A , 5 ) 1 2 3 4 call return [ ] = + = 7 6 13 15 20 Fall 2006 CSC311: Data Structures

Reversing an Array Algorithm ReverseArray(A, i, j):
Input: An array A and nonnegative integer indices i and j Output: The reversal of the elements in A starting at index i and ending at j if i < j then Swap A[i] and A[ j] ReverseArray(A, i + 1, j - 1) return Fall 2006 CSC311: Data Structures

Defining Arguments for Recursion
In creating recursive methods, it is important to define the methods in ways that facilitate recursion. This sometimes requires we define additional parameters that are passed to the method. For example, we defined the array reversal method as ReverseArray(A, i, j), not ReverseArray(A). Fall 2006 CSC311: Data Structures

Computing Powers The power function, p(x,n)=xn, can be defined recursively: This leads to an power function that runs in O(n) time (for we make n recursive calls). We can do better than this, however. Fall 2006 CSC311: Data Structures

Recursive Squaring We can derive a more efficient linearly recursive algorithm by using repeated squaring: For example, 24 = 2(4/2)2 = (24/2)2 = (22)2 = 42 = 16 25 = 21+(4/2)2 = 2(24/2)2 = 2(22)2 = 2(42) = 32 26 = 2(6/ 2)2 = (26/2)2 = (23)2 = 82 = 64 27 = 21+(6/2)2 = 2(26/2)2 = 2(23)2 = 2(82) = 128. Fall 2006 CSC311: Data Structures

A Recursive Squaring Method
Algorithm Power(x, n): Input: A number x and integer n = 0 Output: The value xn if n = 0 then return 1 if n is odd then y = Power(x, (n - 1)/ 2) return x · y ·y else y = Power(x, n/ 2) return y · y Fall 2006 CSC311: Data Structures

Analyzing the Recursive Squaring Method
Algorithm Power(x, n): Input: A number x and integer n = 0 Output: The value xn if n = 0 then return 1 if n is odd then y = Power(x, (n - 1)/ 2) return x · y · y else y = Power(x, n/ 2) return y · y Each time we make a recursive call we halve the value of n; hence, we make log n recursive calls. That is, this method runs in O(log n) time. It is important that we used a variable twice here rather than calling the method twice. Fall 2006 CSC311: Data Structures

Tail Recursion Tail recursion occurs when a linearly recursive method makes its recursive call as its last step. The array reversal method is an example. Such methods can be easily converted to non-recursive methods (which saves on some resources). Example: Algorithm IterativeReverseArray(A, i, j ): Input: An array A and nonnegative integer indices i and j Output: The reversal of the elements in A starting at index i and ending at j while i < j do Swap A[i ] and A[ j ] i = i + 1 j = j - 1 return Fall 2006 CSC311: Data Structures

Binary Recursive Method
Binary recursion occurs whenever there are two recursive calls for each non-base case. Example Problem: add all the numbers in an integer array A: Algorithm BinarySum(A, i, n): Input: An array A and integers i and n Output: The sum of the n integers in A starting at index i if n = 1 then return A[i ] return BinarySum(A, i, n/ 2) + BinarySum(A, i + n/ 2, n/ 2) Example trace: 3 , 1 2 4 8 7 6 5 Fall 2006 CSC311: Data Structures

Computing Fibanacci Numbers
Fibonacci numbers are defined recursively: F0 = 0 F1 = 1 Fi = Fi-1 + Fi for i > 1. As a recursive algorithm (first attempt): Algorithm BinaryFib(k): Input: Nonnegative integer k Output: The kth Fibonacci number Fk if k <= 1 then return k else return BinaryFib(k - 1) + BinaryFib(k - 2) Fall 2006 CSC311: Data Structures

Analyzing the Binary Recursion Fibonacci Algorithm
Let nk denote number of recursive calls made by BinaryFib(k). Then n0 = 1 n1 = 1 n2 = n1 + n0 + 1 = = 3 n3 = n2 + n1 + 1 = = 5 n4 = n3 + n2 + 1 = = 9 n5 = n4 + n3 + 1 = = 15 n6 = n5 + n4 + 1 = = 25 n7 = n6 + n5 + 1 = = 41 n8 = n7 + n6 + 1 = = 67. Note that the value at least doubles for every other value of nk. That is, nk > 2k/2. It is exponential! Fall 2006 CSC311: Data Structures

A Better Fibonacci Algorithm
Use linear recursion instead: Algorithm LinearFibonacci(k): Input: A nonnegative integer k Output: Pair of Fibonacci numbers (Fk, Fk-1) if k <= 1 then return (k, 0) else (i, j) = LinearFibonacci(k - 1) return (i +j, i) Runs in O(k) time. Fall 2006 CSC311: Data Structures

Multiple Recursion Motivating example: summation puzzles
pot + pan = bib dog + cat = pig boy + girl = baby Multiple recursion: makes potentially many recursive calls (not just one or two). Fall 2006 CSC311: Data Structures

Algorithm for Multiple Recursion
Algorithm PuzzleSolve(k,S,U): Input: An integer k, sequence S, and set U (the universe of elements to test) Output: An enumeration of all k-length extensions to S using elements in U without repetitions for all e in U do Remove e from U {e is now being used} Add e to the end of S if k = 1 then Test whether S is a configuration that solves the puzzle if S solves the puzzle then return “Solution found: ” S else PuzzleSolve(k - 1, S,U) Add e back to U {e is now unused} Remove e from the end of S Fall 2006 CSC311: Data Structures

Visualizing PuzzleSolve
( 3 , () ,{ a b c } ) Initial call 2 1 ab ac cb ca bc ba abc acb bac bca cab cba Fall 2006 CSC311: Data Structures

Chapter 4 Analysis Tools
Objectives Experiment analysis of algorithms and limitations Theoretical Analysis of algorithms Pseudo-code description of algorithms Big-Oh notations Seven functions Proof techniques Fall 2006 CSC311: Data Structures

Analysis of Algorithms
Input Algorithm Output An algorithm is a step-by-step procedure for solving a problem in a finite amount of time.

Running Time Most algorithms transform input objects into output objects. The running time of an algorithm typically grows with the input size. Average case time is often difficult to determine. We focus on the worst case running time. Easier to analyze Crucial to applications such as games, finance and robotics Fall 2006 CSC311: Data Structures

Experimental Studies Write a program implementing the algorithm
Run the program with inputs of varying size and composition Use a method like System.currentTimeMillis() to get an accurate measure of the actual running time Plot the results Fall 2006 CSC311: Data Structures

Limitations of Experiments
It is necessary to implement the algorithm, which may be difficult Results may not be indicative of the running time on other inputs not included in the experiment. In order to compare two algorithms, the same hardware and software environments must be used Fall 2006 CSC311: Data Structures

Theoretical Analysis Uses a high-level description of the algorithm instead of an implementation Characterizes running time as a function of the input size, n. Takes into account all possible inputs Allows us to evaluate the speed of an algorithm independent of the hardware/software environment Fall 2006 CSC311: Data Structures

Pseudocode Example: find max element of an array
Algorithm arrayMax(A, n) Input array A of n integers Output maximum element of A currentMax  A[0] for i  1 to n  1 do if A[i]  currentMax then currentMax  A[i] return currentMax Example: find max element of an array High-level description of an algorithm More structured than English prose Less detailed than a program Preferred notation for describing algorithms Hides program design issues Fall 2006 CSC311: Data Structures

Pseudocode Details Control flow Method declaration Method call
if … then … [else …] while … do … repeat … until … for … do … Indentation replaces braces Method declaration Algorithm method (arg [, arg…]) Input … Output … Method call var.method (arg [, arg…]) Return value return expression Expressions Assignment (like  in Java) Equality testing (like  in Java) n2 Superscripts and other mathematical formatting allowed Fall 2006 CSC311: Data Structures

The Random Access Machine (RAM) Model
A CPU An potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character 1 2 Memory cells are numbered and accessing any cell in memory takes unit time. Fall 2006 CSC311: Data Structures

Seven Important Functions
Seven functions that often appear in algorithm analysis: Constant  1 Logarithmic  log n Linear  n N-Log-N  n log n Quadratic  n2 Cubic  n3 Exponential  2n In a log-log chart, the slope of the line corresponds to the growth rate of the function Fall 2006 CSC311: Data Structures

Primitive Operations Basic computations performed by an algorithm
Identifiable in pseudocode Largely independent from the programming language Exact definition not important (we will see why later) Assumed to take a constant amount of time in the RAM model Examples: Evaluating an expression Assigning a value to a variable Indexing into an array Calling a method Returning from a method Fall 2006 CSC311: Data Structures

Counting Primitive Operations
By inspecting the pseudocode, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size Algorithm arrayMax(A, n) currentMax  A[0] # operations for i  1 to n  1 do n if A[i]  currentMax then 2(n  1) currentMax  A[i] 2(n  1) { increment counter i } 2(n  1) return currentMax Total 8n  2 Fall 2006 CSC311: Data Structures

Estimating Running Time
Algorithm arrayMax executes 8n  2 primitive operations in the worst case. Define: a = Time taken by the fastest primitive operation b = Time taken by the slowest primitive operation Let T(n) be worst-case time of arrayMax. Then a (8n  2)  T(n)  b(8n  2) Hence, the running time T(n) is bounded by two linear functions Fall 2006 CSC311: Data Structures

Growth Rate of Running Time
Changing the hardware/ software environment Affects T(n) by a constant factor, but Does not alter the growth rate of T(n) The linear growth rate of the running time T(n) is an intrinsic property of algorithm arrayMax Fall 2006 CSC311: Data Structures

Constant Factors The growth rate is not affected by Examples
constant factors or lower-order terms Examples 102n is a linear function 105n n is a quadratic function Fall 2006 CSC311: Data Structures

Big-Oh Notation f(n)  cg(n) for n  n0
Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n0 such that f(n)  cg(n) for n  n0 Example: 2n + 10 is O(n) 2n + 10  cn (c  2) n  10 n  10/(c  2) Pick c = 3 and n0 = 10 Fall 2006 CSC311: Data Structures

Big-Oh Example Example: the function n2 is not O(n) n2  cn n  c
The above inequality cannot be satisfied since c must be a constant Fall 2006 CSC311: Data Structures

More Big-Oh Examples 7n-2 3n3 + 20n2 + 5 3 log n + 5 7n-2 is O(n)
need c > 0 and n0  1 such that 7n-2  c•n for n  n0 this is true for c = 7 and n0 = 1 3n3 + 20n2 + 5 3n3 + 20n2 + 5 is O(n3) need c > 0 and n0  1 such that 3n3 + 20n2 + 5  c•n3 for n  n0 this is true for c = 4 and n0 = 21 3 log n + 5 3 log n + 5 is O(log n) need c > 0 and n0  1 such that 3 log n + 5  c•log n for n  n0 this is true for c = 8 and n0 = 2 Fall 2006 CSC311: Data Structures

Big-Oh and Growth Rate The big-Oh notation gives an upper bound on the growth rate of a function The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n) We can use the big-Oh notation to rank functions according to their growth rate f(n) is O(g(n)) g(n) is O(f(n)) g(n) grows more Yes No f(n) grows more Same growth Fall 2006 CSC311: Data Structures

Big-Oh Rules If is f(n) a polynomial of degree d, then f(n) is O(nd), i.e., Drop lower-order terms Drop constant factors Use the smallest possible class of functions Say “2n is O(n)” instead of “2n is O(n2)” Use the simplest expression of the class Say “3n + 5 is O(n)” instead of “3n + 5 is O(3n)” Fall 2006 CSC311: Data Structures

Asymptotic Algorithm Analysis
The asymptotic analysis of an algorithm determines the running time in big-Oh notation To perform the asymptotic analysis We find the worst-case number of primitive operations executed as a function of the input size We express this function with big-Oh notation Example: We determine that algorithm arrayMax executes at most 8n  2 primitive operations We say that algorithm arrayMax “runs in O(n) time” Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations Fall 2006 CSC311: Data Structures

Computing Prefix Averages
We further illustrate asymptotic analysis with two algorithms for prefix averages The i-th prefix average of an array X is average of the first (i + 1) elements of X: A[i] = (X[0] + X[1] + … + X[i])/(i+1) Computing the array A of prefix averages of another array X has applications to financial analysis Fall 2006 CSC311: Data Structures

Prefix Averages (Quadratic)
The following algorithm computes prefix averages in quadratic time by applying the definition Algorithm prefixAverages1(X, n) Input array X of n integers Output array A of prefix averages of X #operations A  new array of n integers n for i  0 to n  1 do n s  X[0] n for j  1 to i do …+ (n  1) s  s + X[j] …+ (n  1) A[i]  s / (i + 1) n return A Fall 2006 CSC311: Data Structures

Arithmetic Progression
The running time of prefixAverages1 is O( …+ n) The sum of the first n integers is n(n + 1) / 2 There is a simple visual proof of this fact Thus, algorithm prefixAverages1 runs in O(n2) time Fall 2006 CSC311: Data Structures

Prefix Averages (Linear)
The following algorithm computes prefix averages in linear time by keeping a running sum Algorithm prefixAverages2(X, n) Input array X of n integers Output array A of prefix averages of X #operations A  new array of n integers n s  for i  0 to n  1 do n s  s + X[i] n A[i]  s / (i + 1) n return A Algorithm prefixAverages2 runs in O(n) time Fall 2006 CSC311: Data Structures

Math you need to Review Summations Logarithms logb(xy) = logbx + logby
logbxa = alogbx logba = logxa/logxb Exponentials: a(b+c) = aba c abc = (ab)c ab /ac = a(b-c) b = a logab bc = a c*logab Fall 2006 CSC311: Data Structures

Proof Techniques Proof techniques By counterexample Contrapositive
Ex. Let a and b integers. If ab is even, then a is even or b is even Proof: Consider the contrapositive: if a is odd and b is odd. Then you can find out that ab is odd. Contradiction Ex. Let a and b be integers. If ab is odd, then a is odd and b is odd Proof: Consider the opposite of the “then” part. You’ll reach to a contradiction where ab is even Induction Two formats Base case Induction case Loop invariants Fall 2006 CSC311: Data Structures

Relatives of Big-Oh big-Omega
f(n) is (g(n)) if there is a constant c > 0 and an integer constant n0  1 such that f(n)  c•g(n) for n  n0 big-Theta f(n) is (g(n)) if there are constants c’ > 0 and c’’ > 0 and an integer constant n0  1 such that c’•g(n)  f(n)  c’’•g(n) for n  n0 Fall 2006 CSC311: Data Structures

Intuition for Asymptotic Notation
Big-Oh f(n) is O(g(n)) if f(n) is asymptotically less than or equal to g(n) big-Omega f(n) is (g(n)) if f(n) is asymptotically greater than or equal to g(n) big-Theta f(n) is (g(n)) if f(n) is asymptotically equal to g(n) Fall 2006 CSC311: Data Structures

Example Uses of the Relatives of Big-Oh
5n2 is (n2) f(n) is (g(n)) if there is a constant c > 0 and an integer constant n0  1 such that f(n)  c•g(n) for n  n0 let c = 5 and n0 = 1 5n2 is (n) f(n) is (g(n)) if there is a constant c > 0 and an integer constant n0  1 such that f(n)  c•g(n) for n  n0 let c = 1 and n0 = 1 5n2 is (n2) f(n) is (g(n)) if it is (n2) and O(n2). We have already seen the former, for the latter recall that f(n) is O(g(n)) if there is a constant c > 0 and an integer constant n0  1 such that f(n) < c•g(n) for n  n0 Let c = 5 and n0 = 1 Fall 2006 CSC311: Data Structures

Chapter 5 Stacks and Queues
Objectives Stacks and implementations Queues and implementations Double-ended queues and implementation Stacks and Queues CSC311: Data Structures

Abstract Data Types (ADTs)
An abstract data type (ADT) is an abstraction of a data structure An ADT specifies: Data stored Operations on the data Error conditions associated with operations Stacks and Queues CSC311: Data Structures

ADT Example Example: ADT modeling a simple stock trading system
The data stored are buy/sell orders The operations supported are order buy(stock, shares, price) order sell(stock, shares, price) void cancel(order) Error conditions: Buy/sell a nonexistent stock Cancel a nonexistent order Stacks and Queues CSC311: Data Structures

The Stack ADT The Stack ADT stores arbitrary objects
Insertions and deletions follow the last-in first-out scheme Think of a spring-loaded plate dispenser Main stack operations: push(object): inserts an element object pop(): removes and returns the last inserted element Auxiliary stack operations: object top(): returns the last inserted element without removing it integer size(): returns the number of elements stored boolean isEmpty(): indicates whether no elements are stored Stacks and Queues CSC311: Data Structures

Stack Interface in Java
public interface Stack { public int size(); public boolean isEmpty(); public Object top() throws EmptyStackException; public void push(Object o); public Object pop() throws EmptyStackException; } Java interface corresponding to our Stack ADT Requires the definition of class EmptyStackException Different from the built-in Java class java.util.Stack Stacks and Queues CSC311: Data Structures

Exceptions Attempting the execution of an operation of ADT may sometimes cause an error condition, called an exception Exceptions are said to be “thrown” by an operation that cannot be executed In the Stack ADT, operations pop and top cannot be performed if the stack is empty Attempting the execution of pop or top on an empty stack throws an EmptyStackException Stacks and Queues CSC311: Data Structures

Applications of Stacks
Direct applications Page-visited history in a Web browser Undo sequence in a text editor Chain of method calls in the Java Virtual Machine Indirect applications Auxiliary data structure for algorithms Component of other data structures Stacks and Queues CSC311: Data Structures

Method Stack in the JVM The Java Virtual Machine (JVM) keeps track of the chain of active methods with a stack When a method is called, the JVM pushes on the stack a frame containing Local variables and return value Program counter, keeping track of the statement being executed When a method ends, its frame is popped from the stack and control is passed to the method on top of the stack Allows for recursion main() { int i = 5; foo(i); } foo(int j) { int k; k = j+1; bar(k); } bar(int m) { … } bar PC = 1 m = 6 foo PC = 3 j = 5 k = 6 main PC = 2 i = 5 Stacks and Queues CSC311: Data Structures

Array-based Stack Algorithm size() return t + 1 Algorithm pop()
if isEmpty() then throw EmptyStackException else t  t  1 return S[t + 1] A simple way of implementing the Stack ADT uses an array We add elements from left to right A variable keeps track of the index of the top element … S 1 2 t Stacks and Queues CSC311: Data Structures

Array-based Stack (cont.)
The array storing the stack elements may become full A push operation will then throw a FullStackException Limitation of the array-based implementation Not intrinsic to the Stack ADT Algorithm push(o) if t = S.length  1 then throw FullStackException else t  t + 1 S[t]  o S 1 2 t … Stacks and Queues CSC311: Data Structures

Performance and Limitations
Let n be the number of elements in the stack The space used is O(n) Each operation runs in time O(1) Limitations The maximum size of the stack must be defined a priori and cannot be changed Trying to push a new element into a full stack causes an implementation-specific exception Stacks and Queues CSC311: Data Structures

Array-based Stack in Java
public class ArrayStack implements Stack { // holds the stack elements private Object S[ ]; // index to top element private int top = -1; // constructor public ArrayStack(int capacity) { S = new Object[capacity]); } public Object pop() throws EmptyStackException { if isEmpty() throw new EmptyStackException (“Empty stack: cannot pop”); Object temp = S[top]; // facilitates garbage collection S[top] = null; top = top – 1; return temp; } …… Stacks and Queues CSC311: Data Structures

Parentheses Matching Each “(”, “{”, or “[” must be paired with a matching “)”, “}”, or “[” correct: ( )(( )){([( )])} correct: ((( )(( )){([( )])} incorrect: )(( )){([( )])} incorrect: ({[ ])} incorrect: ( Stacks and Queues CSC311: Data Structures

Parentheses Matching Algorithm
Algorithm ParenMatch(X,n): Input: An array X of n tokens, each of which is either a grouping symbol, a variable, an arithmetic operator, or a number Output: true if and only if all the grouping symbols in X match Let S be an empty stack for i=0 to n-1 do if X[i] is an opening grouping symbol then S.push(X[i]) else if X[i] is a closing grouping symbol then if S.isEmpty() then return false {nothing to match with} if S.pop() does not match the type of X[i] then return false {wrong type} return true {every symbol matched} else return false {some symbols were never matched} Stacks and Queues CSC311: Data Structures

HTML Tag Matching The Little Boat
For fully-correct HTML, each <name> should pair with a matching </name> The Little Boat The storm tossed the little boat like a cheap sneaker in an old washing machine. The three drunken fishermen were used to such treatment, of course, but not the tree salesman, who even as a stowaway now felt that he had overpaid for the voyage. 1. Will the salesman die? 2. What color is the boat? 3. And what about Naomi? <body> <center> <h1> The Little Boat </h1> </center> <p> The storm tossed the little boat like a cheap sneaker in an old washing machine. The three drunken fishermen were used to such treatment, of course, but not the tree salesman, who even as a stowaway now felt that he had overpaid for the voyage. </p> <ol> <li> Will the salesman die? </li> <li> What color is the boat? </li> <li> And what about Naomi? </li> </ol> </body> Stacks and Queues CSC311: Data Structures

Tag Matching Algorithm
Is similar to parentheses matching: import java.util.StringTokenizer; import datastructures.Stack; import datastructures.NodeStack; import java.io.*; /** Simpli.ed test of matching tags in an HTML document. */ public class HTML { /** Nested class to store simple HTML tags */ public static class Tag { String name; // The name of this tag boolean opening; // Is true i. this is an opening tag public Tag() { // Default constructor name = ""; opening = false; } public Tag(String nm, boolean op) { // Preferred constructor name = nm; opening = op; /** Is this an opening tag? */ public boolean isOpening() { return opening; } /** Return the name of this tag */ public String getName() {return name; } /** Test if every opening tag has a matching closing tag. */ public boolean isHTMLMatched(Tag[ ] tag) { Stack S = new NodeStack(); // Stack for matching tags for (int i=0; (i<tag.length) && (tag[i] != null); i++) { if (tag[i].isOpening()) S.push(tag[i].getName()); // opening tag; push its name on the stack else { if (S.isEmpty()) // nothing to match return false; if (!((String) S.pop()).equals(tag[i].getName())) // wrong match if (S.isEmpty()) return true; // we matched everything return false; // we have some tags that never were matched Stacks and Queues CSC311: Data Structures

Tag Matching Algorithm, cont.
public final static int CAPACITY = 1000; // Tag array size upper bound /* Parse an HTML document into an array of html tags */ public Tag[ ] parseHTML(BufferedReader r) throws IOException { String line; // a line of text boolean inTag = false ; // true iff we are in a tag Tag[ ] tag = new Tag[CAPACITY]; // our tag array (initially all null) int count = 0 ; // tag counter while ((line = r.readLine()) != null) { // Create a string tokenizer for HTML tags (use < and > as delimiters) StringTokenizer st = new StringTokenizer(line,"<> \t",true); while (st.hasMoreTokens()) { String token = (String) st.nextToken(); if (token.equals("<")) // opening a new HTML tag inTag = true; else if (token.equals(">")) // ending an HTML tag inTag = false; else if (inTag) { // we have a opening or closing HTML tag if ( (token.length() == 0) | | (token.charAt(0) != ’/’) ) tag[count++] = new Tag(token, true); // opening tag else // ending tag tag[count++] = new Tag(token.substring(1), false); // skip the } // Note: we ignore anything not in an HTML tag } return tag; // our array of tags /** Tester method */ public static void main(String[ ] args) throws IOException { BufferedReader stdr; // Standard Input Reader stdr = new BufferedReader(new InputStreamReader(System.in)); HTML tagChecker = new HTML(); if (tagChecker.isHTMLMatched(tagChecker.parseHTML(stdr))) System.out.println("The input file is a matched HTML document."); else System.out.println("The input file is not a matched HTML document."); Stacks and Queues CSC311: Data Structures

Computing Spans We show how to use a stack as an auxiliary data structure in an algorithm Given an an array X, the span S[i] of X[i] is the maximum number of consecutive elements X[j] immediately preceding X[i] and such that X[j]  X[i] Spans have applications to financial analysis E.g., stock at 52-week high X 6 3 4 5 2 1 S Stacks and Queues CSC311: Data Structures

Quadratic Algorithm Algorithm spans1(X, n) Input array X of n integers
Output array S of spans of X # S  new array of n integers n for i  0 to n  1 do n s  n while s  i  X[i - s]  X[i] …+ (n  1) s  s …+ (n  1) S[i]  s n return S Algorithm spans1 runs in O(n2) time Stacks and Queues CSC311: Data Structures

Computing Spans with a Stack
We keep in a stack the indices of the elements visible when “looking back” We scan the array from left to right Let i be the current index We pop indices from the stack until we find index j such that X[i]  X[j] We set S[i]  i - j We push x onto the stack Stacks and Queues CSC311: Data Structures

Linear Algorithm Each index of the array
Is pushed into the stack exactly once Is popped from the stack at most once The statements in the while-loop are executed at most n times Algorithm spans2 runs in O(n) time Algorithm spans2(X, n) # S  new array of n integers n A  new empty stack for i  0 to n  1 do n while (A.isEmpty()  X[A.top()]  X[i] ) do n A.pop() n if A.isEmpty() then n S[i]  i n else S[i]  i - A.top() n A.push(i) n return S Stacks and Queues CSC311: Data Structures

The Queue ADT Auxiliary queue operations: Exceptions
The Queue ADT stores arbitrary objects Insertions and deletions follow the first-in first-out scheme Insertions are at the rear of the queue and removals are at the front of the queue Main queue operations: enqueue(object): inserts an element at the end of the queue object dequeue(): removes and returns the element at the front of the queue Auxiliary queue operations: object front(): returns the element at the front without removing it integer size(): returns the number of elements stored boolean isEmpty(): indicates whether no elements are stored Exceptions Attempting the execution of dequeue or front on an empty queue throws an EmptyQueueException Stacks and Queues CSC311: Data Structures

Queue Example Operation Output Q enqueue(5) – (5) enqueue(3) – (5, 3)
dequeue() 5 (3) enqueue(7) – (3, 7) dequeue() 3 (7) front() 7 (7) dequeue() 7 () dequeue() “error” () isEmpty() true () enqueue(9) – (9) enqueue(7) – (9, 7) size() 2 (9, 7) enqueue(3) – (9, 7, 3) enqueue(5) – (9, 7, 3, 5) dequeue() 9 (7, 3, 5) Stacks and Queues CSC311: Data Structures

Applications of Queues
Direct applications Waiting lists, bureaucracy Access to shared resources (e.g., printer) Multiprogramming Indirect applications Auxiliary data structure for algorithms Component of other data structures Stacks and Queues CSC311: Data Structures

wrapped-around configuration
Array-based Queue Use an array of size N in a circular fashion Two variables keep track of the front and rear f index of the front element r index immediately past the rear element Array location r is kept empty normal configuration Q 1 2 r f wrapped-around configuration Q 1 2 f r Stacks and Queues CSC311: Data Structures

Queue Operations We use the modulo operator (remainder of division)
Algorithm size() return (N - f + r) mod N Algorithm isEmpty() return (f = r) Q 1 2 r f Q 1 2 f r Stacks and Queues CSC311: Data Structures

Queue Operations (cont.)
Algorithm enqueue(o) if size() = N  1 then throw FullQueueException else Q[r]  o r  (r + 1) mod N Operation enqueue throws an exception if the array is full This exception is implementation-dependent Q 1 2 r f Q 1 2 f r Stacks and Queues CSC311: Data Structures

Queue Operations (cont.)
Algorithm dequeue() if isEmpty() then throw EmptyQueueException else o  Q[f] f  (f + 1) mod N return o Operation dequeue throws an exception if the queue is empty This exception is specified in the queue ADT Q 1 2 r f Q 1 2 f r Stacks and Queues CSC311: Data Structures

Queue Interface in Java
Java interface corresponding to our Queue ADT Requires the definition of class EmptyQueueException No corresponding built-in Java class public interface Queue { public int size(); public boolean isEmpty(); public Object front() throws EmptyQueueException; public void enqueue(Object o); public Object dequeue() throws EmptyQueueException; } Stacks and Queues CSC311: Data Structures

Application: Round Robin Schedulers
We can implement a round robin scheduler using a queue, Q, by repeatedly performing the following steps: e = Q.dequeue() Service element e Q.enqueue(e) The Queue Shared Service 1 . Deque the next element 3 Enqueue the serviced element 2 Service the Stacks and Queues CSC311: Data Structures

Chapter 6 Lists and Iterators
Objectives Array Lists and Vectors: ADT and Implementation Node Lists: ADT and Implementations Iterators: ADT and Implementation List ADTs and Collections Design Patterns Adaptor Pattern Position Pattern Iterator Pattern Lists and Iterators CSC311: Data Structures

The Array List (Vector) ADT
The Array List ADT extends the notion of array by storing a sequence of arbitrary objects An element can be accessed, inserted or removed by specifying its rank (number of elements preceding it) An exception is thrown if an incorrect rank is specified (e.g., a negative rank) Main Array List operations: object get(int r): returns the element at rank r without removing it object set(int r, Object o): replace the element at rank with o and return the old element add(int r, Object o): insert a new element o to have rank r object remove (int r): removes and returns the element at rank r Additional operations size() and isEmpty() Lists and Iterators CSC311: Data Structures

Applications of Array Lists
Direct applications Sorted collection of objects (elementary database) Indirect applications Auxiliary data structure for algorithms Component of other data structures Lists and Iterators CSC311: Data Structures

Adaptor Pattern Adaptor Pattern: adapting an existing class to a new class Contains an instance of the existing class as a instance field, and implement each method of the new class by calling the method of the instance field object Adapt Array List ADT to Deque ADT class Deque { private ArrayList al; public Deque() { al = new ArrayList(); } public Object getFirst() { return al.get(0); public void addFirst(Object e) { al.add(0, e); public Object removeFirst() { return al.remove(0); … Lists and Iterators CSC311: Data Structures

Array-based Implementation
Use an array V of size N A variable n keeps track of the size of the array list (number of elements stored) Operation get(r) is implemented in O(1) time by returning V[r] V 1 2 r n Lists and Iterators CSC311: Data Structures

Insertion In operation add(r, o), we need to make room for the new element by shifting forward the n - r elements V[r], …, V[n - 1] In the worst case (r = 0), this takes O(n) time Algorithm add(i, e) for t = n  1 to i do A[t+1]  A[t] A[t]  e n  n+1 V 1 2 n r o Lists and Iterators CSC311: Data Structures

Deletion In operation remove(r), we need to fill the hole left by the removed element by shifting backward the n - r - 1 elements V[r + 1], …, V[n - 1] In the worst case (r = 0), this takes O(n) time Algorithm remove(i) e  A[i] for t = i to n-2 do A[t]  A[t+1] n  n-1 return e V 1 2 n r o Lists and Iterators CSC311: Data Structures

Performance In the array based implementation of an Array List
The space used by the data structure is O(n) size, isEmpty, get and set run in O(1) time add and remove run in O(n) time If we use the array in a circular fashion, add(0) and remove(0) run in O(1) time In an add operation, when the array is full, instead of throwing an exception, we can replace the array with a larger one Lists and Iterators CSC311: Data Structures

Growable Array-based Implementation
Consider an array list implemented with array In a add operation, when the array is full, instead of throwing an exception, we can replace the array with a larger one How large should the new array be? incremental strategy: increase the size by a constant c doubling strategy: double the size Algorithm add(i, o) if n = S.length  1 then A  new array of size … for j  0 to i-1 do A[j]  S[j] A[i] = o; for ji+1 to n do A[j] = S[j-1]; S  A Lists and Iterators CSC311: Data Structures

Comparison of the Strategies
We compare the incremental strategy and the doubling strategy by analyzing the total time T(n) needed to perform a series of n add operations We assume that we start with an empty array list represented by an array of size 1 We call amortized time of an add operation the average time taken by an add over the series of operations, i.e., T(n)/n Lists and Iterators CSC311: Data Structures

Incremental Strategy Analysis
We replace the array k = n/c times The total time T(n) of a series of n push operations is proportional to n + c + 2c + 3c + 4c + … + kc = n + c( … + k) = n + ck(k + 1)/2 Since c is a constant, T(n) is O(n + k2), i.e., O(n2) The amortized time of an add operation is O(n) Lists and Iterators CSC311: Data Structures

Doubling Strategy Analysis
We replace the array k = log2 n times The total time T(n) of a series of n add operations is proportional to n …+ 2k = n + 2k = 2n -1 T(n) is O(n) The amortized time of an add operation is O(1) geometric series 1 2 4 8 Lists and Iterators CSC311: Data Structures

Position ADT The Position ADT models the notion of place within a data structure where a single object is stored It gives a unified view of diverse ways of storing data, such as a cell of an array a node of a linked list Just one method: object element(): returns the element stored at the position Lists and Iterators CSC311: Data Structures

Node List ADT The Node List ADT models a sequence of positions storing arbitrary objects It establishes a before/after relation between positions Generic methods: size(), isEmpty() Accessor methods: first(), last() prev(p), next(p) Update methods: set(p, e) addBefore(p, e), addAftter(p, e), addFirst(e), addLast(e) remove(p) Lists and Iterators CSC311: Data Structures

Doubly Linked List A doubly linked list provides a natural implementation of the Node List ADT Nodes implement Position and store: element link to the previous node link to the next node Special trailer and header nodes prev next elem node trailer header nodes/positions elements Lists and Iterators CSC311: Data Structures

Insertion p A B C p q A B C X p q A B X C
We visualize operation addAfter(p, X), which returns position q p A B C p q A B C X p q A B X C Lists and Iterators CSC311: Data Structures

Insertion Algorithm Algorithm addAfter(p,e): Create a new node v
v.setElement(e) v.setPrev(p) {link v to its predecessor} v.setNext(p.getNext()) {link v to its successor} (p.getNext()).setPrev(v) {link p’s old successor to v} p.setNext(v) {link p to its new successor, v} return v {the position for the element e} Lists and Iterators CSC311: Data Structures

Deletion A B C D p A B C p D A B C
We visualize remove(p), where p = last() A B C D p A B C p D A B C Lists and Iterators CSC311: Data Structures

Deletion Algorithm Algorithm remove(p):
t = p.element {a temporary variable to hold the return value} (p.getPrev()).setNext(p.getNext()) {linking out p} (p.getNext()).setPrev(p.getPrev()) p.setPrev(null) {invalidating the position p} p.setNext(null) return t Lists and Iterators CSC311: Data Structures

Performance In the implementation of the Node List ADT by means of a doubly linked list The space used by a list with n elements is O(n) The space used by each position of the list is O(1) All the operations of the List ADT run in O(1) time Operation element() of the Position ADT runs in O(1) time Lists and Iterators CSC311: Data Structures

Sequence ADT The Sequence ADT is the union of the Array List, Deque, and Node List ADTs Elements accessed by index, or Position Generic methods: size(), isEmpty() Bridge methods: atIndex(r) indexOf(p) Lists and Iterators CSC311: Data Structures

Applications of Sequences
The Sequence ADT is a basic, general-purpose, data structure for storing an ordered collection of elements Direct applications: Generic replacement for stack, queue, vector, or list small database (e.g., address book) Indirect applications: Building block of more complex data structures Lists and Iterators CSC311: Data Structures

Multiple Inheritance in the Sequence ADT
Sequence has three super interfaces: Node List, Array List, and Deque Node List-based methods: first() last() prev(p) next(p) set(p, o) addBefore(p, o) addAfter(p, o) addFirst(o) addLast(o) remove(p) Array List-based methods: get(r) set(r, o) add(r, o) remove(r) Deque-based methods: addFirst(o) addLast(o) removeFirst() removeLast() getFirst() getLast() Lists and Iterators CSC311: Data Structures

Iterator ADT An iterator abstracts the process of scanning through a collection of elements Methods of the Iterator ADT: boolean hasNext() object next() Implementation with an array or singly linked list An iterator is typically associated with an another data structure Two notions of iterator: snapshot: freezes the contents of the data structure at a given time dynamic: follows changes to the data structure Lists and Iterators CSC311: Data Structures

Design Pattern: Iterators
We can augment the Stack, Queue, Vector, List and Sequence ADTs with the following method to iterate all elements: Iterator elements() For ADTs that support the notion of position, such as list and sequences, we can provide a method to iterate all positions: Iterator positions() Lists and Iterators CSC311: Data Structures

Iterators in Java Java.util.Iterator interface
Iterable ADT provides a method: Iterator(): returns an iterator of the elements in the collection For-Each loop Syntax: for (type name: expression) loop_statement; Equivalence: for (Iterator <type> it=expresion.iterator(); it.hasNext();) { Type name = it.next(); } Lists and Iterators CSC311: Data Structures

ListIterator List iterator gives access to elements inside a linked list ListIterator protects the linked list while giving access ListIterator encapsulates a position anywhere in the linked list Lists and Iterators CSC311: Data Structures

Conceptual View of the ListIterator
Think of an iterator as pointing between two links The listIterator method of the LinkedList class gets a list iterator LinkedList list = . . . ListIterator iterator = list.listIterator(); Lists and Iterators CSC311: Data Structures

ListIterator Methods The next method moves the iterator iterator.next(); next throws a NoSuchElementException if you are already past the end of the list hasNext returns true if there is a next element if (iterator.hasNext()) iterator.next(); The next method returns the object of the link that it is passing while iterator.hasNext() { Object obj = iterator.next(); //do something with the object } To move the list position backwards, use: hasPrevious previous Lists and Iterators CSC311: Data Structures

Collections in Java Collection Iterator List ListIterator Map Queue
Set Lists and Iterators CSC311: Data Structures

Chapter 7: Trees Objectives:
Introduce general tree structure and Tree ADT Discuss the depth and height of trees Introduce the tree traversal algorithms Specialize to binary trees Implement binary trees with linked structure and array-list structure Introduce the Template Method Pattern CSC311: Data Structures

What is a Tree In computer science, a tree is an abstract model of a hierarchical structure A tree consists of nodes with a parent-child relation Applications: Organization charts File systems Programming environments Computers”R”Us Sales R&D Manufacturing Laptops Desktops US International Europe Asia Canada Trees CSC311: Data Structures

Tree Terminology subtree Root: node without parent (A)
Internal node: node with at least one child (A, B, C, F) External node (a.k.a. leaf ): node without children (E, I, J, K, G, H, D) Ancestors of a node: parent, grandparent, grand-grandparent, etc. Depth of a node: number of ancestors Height of a tree: maximum depth of any node (3) Descendant of a node: child, grandchild, grand-grandchild, etc. Subtree: tree consisting of a node and its descendants A B D C G H E F I J K subtree Trees CSC311: Data Structures

Tree ADT We use positions to abstract nodes Generic methods:
integer size() boolean isEmpty() Iterator iterator() Iterator positions() Accessor methods: position root() position parent(p) positionIterator children(p) Query methods: boolean isInternal(p) boolean isExternal(p) boolean isRoot(p) Update method: object replace (p, o) Additional update methods may be defined by data structures implementing the Tree ADT Trees CSC311: Data Structures

Depth of a Node The depth of a node v in a tree T is defined as:
Algorithm depth(T, v) Input: Tree T and a node v Output: an integer that is the depth of v in T If v is the root of T then return 0 Else return 1 + depth(T, parent(v)) The depth of a node v in a tree T is defined as: If v is the root, then its depth is 0; Otherwise, its depth is one plus the depth of its parent Trees CSC311: Data Structures

Height of a Node The height of a node v in a tree T is defined as:
Algorithm height(T, v) Input: Tree T and a node v Output: an integer that is the height of v in T If v is an external of T then return 0 Else h0 for each child w of v in T do hmax(h, height(T,w)) return 1+h The height of a node v in a tree T is defined as: If v is an external, then its height is 0; Otherwise, its height is one plus the maximum height of its children Trees CSC311: Data Structures

Features on Height The height of a nonempty tree T is equal to the maximum depth of an external node of T Let T be a tree with n nodes, and let cv denote the number if children of a node v of T. The summing over the vertices in T, vcv=n-1. Trees CSC311: Data Structures

Preorder Traversal Algorithm preOrder(v) visit(v)
A traversal visits the nodes of a tree in a systematic manner In a preorder traversal, a node is visited before its descendants Application: print a structured document Algorithm preOrder(v) visit(v) for each child w of v preorder (w) 1 Make Money Fast! 2 5 9 1. Motivations 2. Methods References 6 7 8 3 4 2.1 Stock Fraud 2.2 Ponzi Scheme 2.3 Bank Robbery 1.1 Greed 1.2 Avidity Trees CSC311: Data Structures

Postorder Traversal Algorithm postOrder(v) for each child w of v
In a postorder traversal, a node is visited after its descendants Application: compute space used by files in a directory and its subdirectories Algorithm postOrder(v) for each child w of v postOrder (w) visit(v) 9 cs16/ 8 3 7 todo.txt 1K homeworks/ programs/ 1 2 4 5 6 h1c.doc 3K h1nc.doc 2K DDR.java 10K Stocks.java 25K Robot.java 20K Trees CSC311: Data Structures

Binary Trees Applications:
arithmetic expressions decision processes searching A binary tree is a tree with the following properties: Each internal node has at most two children (exactly two for proper binary trees) The children of a node are an ordered pair We call the children of an internal node left child and right child Alternative recursive definition: a binary tree is either a tree consisting of a single node, or a tree whose root has an ordered pair of children, each of which is a binary tree A B C F G D E H I Trees CSC311: Data Structures

Arithmetic Expression Tree
Binary tree associated with an arithmetic expression internal nodes: operators external nodes: operands Example: arithmetic expression tree for the expression (2  (a - 1) + (3  b)) +  - 2 a 1 3 b Trees CSC311: Data Structures

Decision Tree Binary tree associated with a decision process
internal nodes: questions with yes/no answer external nodes: decisions Example: dining decision Want a fast meal? Yes No How about coffee? On expense account? Yes No Yes No Starbucks Spike’s Al Forno Café Paragon Trees CSC311: Data Structures

BinaryTree ADT The BinaryTree ADT extends the Tree ADT, i.e., it inherits all the methods of the Tree ADT Additional methods: position left(p) position right(p) boolean hasLeft(p) boolean hasRight(p) Update methods may be defined by data structures implementing the BinaryTree ADT Trees CSC311: Data Structures

Properties of Proper Binary Trees
e = i + 1 n = 2e - 1 h  i h  (n - 1)/2 e  2h h  log2 e h  log2 (n + 1) - 1 Notation n number of nodes e number of external nodes i number of internal nodes h height Trees CSC311: Data Structures

Inorder Traversal Algorithm inOrder(v) if hasLeft (v)
In an inorder traversal a node is visited after its left subtree and before its right subtree Application: draw a binary tree x(v) = inorder rank of v y(v) = depth of v Algorithm inOrder(v) if hasLeft (v) inOrder (left (v)) visit(v) if hasRight (v) inOrder (right (v)) 6 2 8 1 4 7 9 3 5 Trees CSC311: Data Structures

Print Arithmetic Expressions
Algorithm printExpression(v) if hasLeft (v) print(“(’’) inOrder (left(v)) print(v.element ()) if hasRight (v) inOrder (right(v)) print (“)’’) Specialization of an inorder traversal print operand or operator when visiting node print “(“ before traversing left subtree print “)“ after traversing right subtree +  - 2 a 1 3 b ((2  (a - 1)) + (3  b)) Trees CSC311: Data Structures

Evaluate Arithmetic Expressions
Specialization of a postorder traversal recursive method returning the value of a subtree when visiting an internal node, combine the values of the subtrees Algorithm evalExpr(v) if isExternal (v) return v.element () else x  evalExpr(leftChild (v)) y  evalExpr(rightChild (v))   operator stored at v return x  y +  - 2 5 1 3 Trees CSC311: Data Structures

Euler Tour Traversal +   2 - 3 2 5 1
Generic traversal of a binary tree Includes a special cases the preorder, postorder and inorder traversals Walk around the tree and visit each node three times: on the left (preorder) from below (inorder) on the right (postorder) + L  R  B 2 - 3 2 5 1 Trees CSC311: Data Structures

Template Method Pattern
Generic algorithm that can be specialized by redefining certain steps Implemented by means of an abstract Java class Visit methods that can be redefined by subclasses Template method eulerTour Recursively called on the left and right children A Result object with fields leftResult, rightResult and finalResult keeps track of the output of the recursive calls to eulerTour public abstract class EulerTour { protected BinaryTree tree; protected void visitExternal(Position p, Result r) { } protected void visitLeft(Position p, Result r) { } protected void visitBelow(Position p, Result r) { } protected void visitRight(Position p, Result r) { } protected Object eulerTour(Position p) { Result r = new Result(); if tree.isExternal(p) { visitExternal(p, r); } else { visitLeft(p, r); r.leftResult = eulerTour(tree.left(p)); visitBelow(p, r); r.rightResult = eulerTour(tree.right(p)); visitRight(p, r); return r.finalResult; } … Trees CSC311: Data Structures

Specializations of EulerTour
We show how to specialize class EulerTour to evaluate an arithmetic expression Assumptions External nodes store Integer objects Internal nodes store Operator objects supporting method operation (Integer, Integer) public class EvaluateExpression extends EulerTour { protected void visitExternal(Position p, Result r) { r.finalResult = (Integer) p.element(); } protected void visitRight(Position p, Result r) { Operator op = (Operator) p.element(); r.finalResult = op.operation( (Integer) r.leftResult, (Integer) r.rightResult ); } … } Trees CSC311: Data Structures

Linked Structure for Trees
A node is represented by an object storing Element Parent node Sequence of children nodes Node objects implement the Position ADT  B   A D F B A D F   C E C E Trees CSC311: Data Structures

Linked Structure for Binary Trees
A node is represented by an object storing Element Parent node Left child node Right child node Node objects implement the Position ADT  B   A D B A D     C E C E Trees CSC311: Data Structures

Array-Based Representation of Binary Trees
nodes are stored in an array 1 A H G F E D C B J … 2 3 let rank(node) be defined as follows: rank(root) = 1 if node is the left child of parent(node), rank(node) = 2*rank(parent(node)) if node is the right child of parent(node), rank(node) = 2*rank(parent(node))+1 4 5 6 7 10 11 Trees CSC311: Data Structures

Array-Based Representation of Binary Trees: Properties
Let n be the number of nodes of a binary tree T Let pM be the maximum value of rank(v) over all the nodes of T The array size N = pM+1 In the worst case, N = 2n Trees CSC311: Data Structures

Chapter 8: Priority Queues
Objectives: Priority Queue ADT Comparator design pattern Heaps Priority Queue Implementation with List and Heap Adaptable Priority Queues Sorting: Priority Queue-sort Selection-sort Insert-sort Heap-ort CSC311: Data Structures

Priority Queue ADT A priority queue stores a collection of entries
Each entry is a pair (key, value) Main methods of the Priority Queue ADT insert(k, x) inserts an entry with key k and value x removeMin() removes and returns the entry with smallest key Additional methods min() returns, but does not remove, an entry with smallest key size(), isEmpty() Applications: Standby flyers Auctions Stock market Priority Queues CSC311: Data Structures

Total Order Relations Keys in a priority queue can be arbitrary objects on which an order is defined Two distinct entries in a priority queue can have the same key Mathematical concept of total order relation  Reflexive property: x  x Antisymmetric property: x  y  y  x  x = y Transitive property: x  y  y  z  x  z Priority Queues CSC311: Data Structures

Entry ADT An entry in a priority queue is simply a key-value pair
Priority queues store entries to allow for efficient insertion and removal based on keys Methods: key(): returns the key for this entry value(): returns the value associated with this entry As a Java interface: /** * Interface for a key-value * pair entry **/ public interface Entry { public Object key(); public Object value(); } Priority Queues CSC311: Data Structures

Comparator ADT The primary method of the Comparator ADT:
A comparator encapsulates the action of comparing two objects according to a given total order relation A generic priority queue uses an auxiliary comparator The comparator is external to the keys being compared When the priority queue needs to compare two keys, it uses its comparator The primary method of the Comparator ADT: compare(x, y): Returns an integer i such that i < 0 if a < b, i = 0 if a = b, and i > 0 if a > b; an error occurs if a and b cannot be compared. Priority Queues CSC311: Data Structures

Example Comparator Lexicographic comparison of 2-D points:
/** Comparator for 2D points under the standard lexicographic order. */ public class Lexicographic implements Comparator { int xa, ya, xb, yb; public int compare(Object a, Object b) throws ClassCastException { xa = ((Point2D) a).getX(); ya = ((Point2D) a).getY(); xb = ((Point2D) b).getX(); yb = ((Point2D) b).getY(); if (xa != xb) return (xb - xa); else return (yb - ya); } Point objects: /** Class representing a point in the plane with integer coordinates */ public class Point2D { protected int xc, yc; // coordinates public Point2D(int x, int y) { xc = x; yc = y; } public int getX() { return xc; public int getY() { return yc; Priority Queues CSC311: Data Structures

Priority Queue Sorting
We can use a priority queue to sort a set of comparable elements Insert the elements one by one with a series of insert operations Remove the elements in sorted order with a series of removeMin operations The running time of this sorting method depends on the priority queue implementation Algorithm PQ-Sort(S, C) Input sequence S, comparator C for the elements of S Output sequence S sorted in increasing order according to C P  priority queue with comparator C while S.isEmpty () e  S.removeFirst () P.insert (e, 0) while P.isEmpty() e  P.removeMin().key() S.insertLast(e) Priority Queues CSC311: Data Structures

Sequence-based Priority Queue
Implementation with an unsorted list Performance: insert takes O(1) time since we can insert the item at the beginning or end of the sequence removeMin and min take O(n) time since we have to traverse the entire sequence to find the smallest key Implementation with a sorted list Performance: insert takes O(n) time since we have to find the place where to insert the item removeMin and min take O(1) time, since the smallest key is at the beginning 4 5 2 3 1 1 2 3 4 5 Priority Queues CSC311: Data Structures

Selection-Sort Selection-sort is the variation of PQ-sort where the priority queue is implemented with an unsorted sequence Running time of Selection-sort: Inserting the elements into the priority queue with n insert operations takes O(n) time Removing the elements in sorted order from the priority queue with n removeMin operations takes time proportional to …+ n Selection-sort runs in O(n2) time Priority Queues CSC311: Data Structures

Selection-Sort Example
Sequence S Priority Queue P Input: (7,4,8,2,5,3,9) () Phase 1 (a) (4,8,2,5,3,9) (7) (b) (8,2,5,3,9) (7,4) . . . (g) () (7,4,8,2,5,3,9) Phase 2 (a) (2) (7,4,8,5,3,9) (b) (2,3) (7,4,8,5,9) (c) (2,3,4) (7,8,5,9) (d) (2,3,4,5) (7,8,9) (e) (2,3,4,5,7) (8,9) (f) (2,3,4,5,7,8) (9) (g) (2,3,4,5,7,8,9) () Priority Queues CSC311: Data Structures

Insertion-Sort Insertion-sort is the variation of PQ-sort where the priority queue is implemented with a sorted sequence Running time of Insertion-sort: Inserting the elements into the priority queue with n insert operations takes time proportional to …+ n Removing the elements in sorted order from the priority queue with a series of n removeMin operations takes O(n) time Insertion-sort runs in O(n2) time Priority Queues CSC311: Data Structures

Insertion-Sort Example
Sequence S Priority queue P Input: (7,4,8,2,5,3,9) () Phase 1 (a) (4,8,2,5,3,9) (7) (b) (8,2,5,3,9) (4,7) (c) (2,5,3,9) (4,7,8) (d) (5,3,9) (2,4,7,8) (e) (3,9) (2,4,5,7,8) (f) (9) (2,3,4,5,7,8) (g) () (2,3,4,5,7,8,9) Phase 2 (a) (2) (3,4,5,7,8,9) (b) (2,3) (4,5,7,8,9) (g) (2,3,4,5,7,8,9) () Priority Queues CSC311: Data Structures

In-place Insertion-sort
Instead of using an external data structure, we can implement selection-sort and insertion-sort in-place A portion of the input sequence itself serves as the priority queue For in-place insertion-sort We keep sorted the initial portion of the sequence We can use swaps instead of modifying the sequence 5 4 2 3 1 Priority Queues CSC311: Data Structures

Heaps A heap is a binary tree storing keys at its nodes and satisfying the following properties: Heap-Order: for every internal node v other than the root, key(v)  key(parent(v)) Complete Binary Tree: let h be the height of the heap for i = 0, … , h - 1, there are 2i nodes of depth i at depth h - 1, the internal nodes are to the left of the external nodes The last node of a heap is the rightmost node of depth h 2 5 6 9 7 last node Priority Queues CSC311: Data Structures

Height of a Heap Theorem: A heap storing n keys has height O(log n)
Proof: (we apply the complete binary tree property) Let h be the height of a heap storing n keys Since there are 2i keys at depth i = 0, … , h - 1 and at least one key at depth h, we have n  … + 2h Thus, n  2h , i.e., h  log n depth keys 1 1 2 h-1 2h-1 h 1 Priority Queues CSC311: Data Structures

Heaps and Priority Queues
We can use a heap to implement a priority queue We store a (key, element) item at each internal node We keep track of the position of the last node For simplicity, we show only the keys in the pictures (2, Sue) (6, Mark) (5, Pat) (9, Jeff) (7, Anna) Priority Queues CSC311: Data Structures

Insertion into a Heap Method insertItem of the priority queue ADT corresponds to the insertion of a key k to the heap The insertion algorithm consists of three steps Find the insertion node z (the new last node) Store k at z Restore the heap-order property (discussed next) 2 5 6 z 9 7 insertion node 2 5 6 z 9 7 1 Priority Queues CSC311: Data Structures

Upheap After the insertion of a new key k, the heap-order property may be violated Algorithm upheap restores the heap-order property by swapping k along an upward path from the insertion node Upheap terminates when the key k reaches the root or a node whose parent has a key smaller than or equal to k Since a heap has height O(log n), upheap runs in O(log n) time 2 1 5 1 5 2 z z 9 7 6 9 7 6 Priority Queues CSC311: Data Structures

Removal from a Heap Method removeMin of the priority queue ADT corresponds to the removal of the root key from the heap The removal algorithm consists of three steps Replace the root key with the key of the last node w Remove w Restore the heap-order property (discussed next) 2 5 6 w 9 7 last node 7 5 6 w 9 new last node Priority Queues CSC311: Data Structures

Downheap After replacing the root key with the key k of the last node, the heap-order property may be violated Algorithm downheap restores the heap-order property by swapping key k along a downward path from the root Upheap terminates when key k reaches a leaf or a node whose children have keys greater than or equal to k Since a heap has height O(log n), downheap runs in O(log n) time 7 5 5 6 7 6 w w 9 9 Priority Queues CSC311: Data Structures

Updating the Last Node The insertion node can be found by traversing a path of O(log n) nodes Go up until a left child or the root is reached If a left child is reached, go to the right child Go down left until a leaf is reached Similar algorithm for updating the last node after a removal Priority Queues CSC311: Data Structures

Heap-Sort Consider a priority queue with n items implemented by means of a heap the space used is O(n) methods insert and removeMin take O(log n) time methods size, isEmpty, and min take time O(1) time Using a heap-based priority queue, we can sort a sequence of n elements in O(n log n) time The resulting algorithm is called heap-sort Heap-sort is much faster than quadratic sorting algorithms, such as insertion-sort and selection-sort Priority Queues CSC311: Data Structures

Vector-based Heap Implementation
We can represent a heap with n keys by means of a vector of length n + 1 For the node at rank i the left child is at rank 2i the right child is at rank 2i + 1 Links between nodes are not explicitly stored The cell of at rank 0 is not used Operation insert corresponds to inserting at rank n + 1 Operation removeMin corresponds to removing at rank 1 Yields in-place heap-sort 2 5 6 9 7 2 5 6 9 7 1 3 4 Priority Queues CSC311: Data Structures

Merging Two Heaps We are given two heaps and a key k
3 2 We are given two heaps and a key k We create a new heap with the root node storing k and with the two heaps as subtrees We perform downheap to restore the heap-order property 8 5 4 6 7 3 2 8 5 4 6 2 3 4 8 5 7 6 Priority Queues CSC311: Data Structures

Bottom-up Heap Construction
We can construct a heap storing n given keys in using a bottom-up construction with log n phases In phase i, pairs of heaps with 2i -1 keys are merged into heaps with 2i+1-1 keys 2i -1 2i+1-1 Priority Queues CSC311: Data Structures

Example 15 16 12 4 7 6 20 23 25 5 11 27 Priority Queues
CSC311: Data Structures

Example (contd.) 25 5 11 27 16 15 4 12 6 9 23 20 15 4 6 23 16 25 5 12 11 9 27 20 Priority Queues CSC311: Data Structures

Example (contd.) 7 8 15 4 6 23 16 25 5 12 11 9 27 20 4 6 15 5 8 23 16 25 7 12 11 9 27 20 Priority Queues CSC311: Data Structures

Example (end) 10 4 6 15 5 8 23 16 25 7 12 11 9 27 20 4 5 6 15 7 8 23 16 25 10 12 11 9 27 20 Priority Queues CSC311: Data Structures

Analysis We visualize the worst-case time of a downheap with a proxy path that goes first right and then repeatedly goes left until the bottom of the heap (this path may differ from the actual downheap path) Since each node is traversed by at most two proxy paths, the total number of nodes of the proxy paths is O(n) Thus, bottom-up heap construction runs in O(n) time Bottom-up heap construction is faster than n successive insertions and speeds up the first phase of heap-sort Priority Queues CSC311: Data Structures

Adaptable Priority Queues
3 a 5 g 4 e Adaptable Priority Queues CSC311: Data Structures

Motivating Example Suppose we have an online trading system where orders to purchase and sell a given stock are stored in two priority queues (one for sell orders and one for buy orders) as (p,s) entries: The key, p, of an order is the price The value, s, for an entry is the number of shares A buy order (p, s) is executed when a sell order (p’, s’) with price p’<p is added (the execution is complete if s’>s) A sell order (p, s) is executed when a buy order (p’, s’) with price p’>p is added (the execution is complete if s’>s) What if someone wishes to cancel their order before it executes? What if someone wishes to update the price or number of shares for their order? Priority Queues CSC311: Data Structures

Methods of the Adaptable Priority Queue ADT
remove(e): Remove from P and return entry e. replaceKey(e,k): Replace with k and return the key of entry e of P; an error condition occurs if k is invalid (that is, k cannot be compared with other keys). replaceValue(e,x): Replace with x and return the value of entry e of P. Priority Queues CSC311: Data Structures

Example Operation Output P insert(5,A) e1 (5,A)
insert(3,B) e2 (3,B),(5,A) insert(7,C) e3 (3,B),(5,A),(7,C) min() e2 (3,B),(5,A),(7,C) key(e2) 3 (3,B),(5,A),(7,C) remove(e1) e1 (3,B),(7,C) replaceKey(e2,9) 3 (7,C),(9,B) replaceValue(e3,D) C (7,D),(9,B) remove(e2) e2 (7,D) Priority Queues CSC311: Data Structures

Locating Entries In order to implement the operations remove(k), replaceKey(e), and replaceValue(k), we need fast ways of locating an entry e in a priority queue. We can always just search the entire data structure to find an entry e, but there are better ways for locating entries. Priority Queues CSC311: Data Structures

Location-Aware Entries
A locator-aware entry identifies and tracks the location of its (key, value) object within a data structure Intuitive notion: Coat claim check Valet claim ticket Reservation number Main idea: Since entries are created and returned from the data structure itself, it can return location-aware entries, thereby making future updates easier Priority Queues CSC311: Data Structures

List Implementation A location-aware list entry is an object storing
key value position (or rank) of the item in the list In turn, the position (or array cell) stores the entry Back pointers (or ranks) are updated during swaps trailer header nodes/positions 2 c 4 c 5 c 8 c entries Priority Queues CSC311: Data Structures

Heap Implementation A location-aware heap entry is an object storing
key value position of the entry in the underlying heap In turn, each heap position stores an entry Back pointers are updated during entry swaps 2 d 4 a 6 b 8 g 5 e 9 c Priority Queues CSC311: Data Structures

Performance Using location-aware entries we can achieve the following running times (times better than those achievable without location-aware entries are highlighted in red): Method Unsorted List Sorted List Heap size, isEmpty O(1) O(1) O(1) insert O(1) O(n) O(log n) min O(n) O(1) O(1) removeMin O(n) O(1) O(log n) remove O(1) O(1) O(log n) replaceKey O(1) O(n) O(log n) replaceValue O(1) O(1) O(1) Priority Queues CSC311: Data Structures

Chapter 9: Maps and Dictionaries
Objectives: Map ADT Hash tables Hash functions and hash code Compression functions and collisions Dictionary ADT List-based Dictionary Hash table Dictionary Ordered search tables and binary search Skip list CSC311: Data Structures

Maps A map models a searchable collection of key-value entries
The main operations of a map are for searching, inserting, and deleting items Multiple entries with the same key are not allowed Applications: address book student-record database Maps and Dictionaries CSC311: Data Structures

The Map ADT Map ADT methods: size(), isEmpty()
get(k): if the map M has an entry with key k, return its associated value; else, return null put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, return old value associated with k remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null keys(): return an iterator of the keys in M values(): return an iterator of the values in M entries(): return an iterable collection containing all the key-value entries in M Maps and Dictionaries CSC311: Data Structures

Example Operation Output Map isEmpty() true Ø put(5,A) null (5,A)
put(7,B) null (5,A),(7,B) put(2,C) null (5,A),(7,B),(2,C) put(8,D) null (5,A),(7,B),(2,C),(8,D) put(2,E) C (5,A),(7,B),(2,E),(8,D) get(7) B (5,A),(7,B),(2,E),(8,D) get(4) null (5,A),(7,B),(2,E),(8,D) get(2) E (5,A),(7,B),(2,E),(8,D) size() 4 (5,A),(7,B),(2,E),(8,D) remove(5) A (7,B),(2,E),(8,D) remove(2) E (7,B),(8,D) get(2) null (7,B),(8,D) isEmpty() false (7,B),(8,D) Maps and Dictionaries CSC311: Data Structures

Comparison to java.util.Map
Map ADT Methods java.util.Map Methods size() size() isEmpty() isEmpty() get(k) get(k) put(k,v) put(k,v) remove(k) remove(k) keys() keySet() values() valueSet() entries() values() Maps and Dictionaries CSC311: Data Structures

A Simple List-Based Map
We can efficiently implement a map using an unsorted list We store the items of the map in a list S (based on a doubly-linked list), in arbitrary order trailer header nodes/positions entries 9 c 6 5 8 Maps and Dictionaries CSC311: Data Structures

The get(k) Algorithm Algorithm get(k): Input: A key k
Output: a value for key k in M, null if k is not in M B = S.positions() {B is an iterator of the positions in S} while B.hasNext() do p = B.next() fthe next position in Bg if p.element().key() = k then return p.element().value() return null {there is no entry with key equal to k} Maps and Dictionaries CSC311: Data Structures

The put(k,v) Algorithm Algorithm put(k,v):
Input: A key-value pair (k, v) Output: the old value for k in M, null if k is new B = S.positions() while B.hasNext() do p = B.next() if p.element().key() = k then t = p.element().value() B.replace(p,(k,v)) return t {return the old value} S.insertLast((k,v)) n = n + 1 {increment variable storing number of entries} return null {there was no previous entry with key equal to k} Maps and Dictionaries CSC311: Data Structures

The remove(k) Algorithm
Algorithm remove(k): Input: A key k Output: the removed value for k, null if k is not in M B =S.positions() while B.hasNext() do p = B.next() if p.element().key() = k then t = p.element().value() S.remove(p) n = n – 1 {decrement number of entries} return t {return the removed value} return null {there is no entry with key equal to k} Maps and Dictionaries CSC311: Data Structures

Performance of a List-Based Map
put, get and remove take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key The unsorted list implementation is effective only for maps of small size Maps and Dictionaries CSC311: Data Structures

Hash Function and Hash Table
A hash function h maps keys of a given type to integers in a fixed interval [0, N - 1] Example: h(x) = x mod N is a hash function for integer keys The integer h(x) is called the hash value of key x A hash table for a given key type consists of Hash function h Array (called table) of size N When implementing a map with a hash table, the goal is to store item (k, o) at index i = h(k) Maps and Dictionaries CSC311: Data Structures

Example We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x  1 2 3 4 9997 9998 9999 … Maps and Dictionaries CSC311: Data Structures

Hash Functions The hash code is applied first, and the compression function is applied next on the result, i.e., h(x) = h2(h1(x)) The goal of the hash function is to “disperse” the keys in an apparently random way A hash function is usually specified as the composition of two functions: Hash code: h1: keys  integers Compression function: h2: integers  [0, N - 1] Maps and Dictionaries CSC311: Data Structures

Hash Codes Memory address: Integer cast: Component sum:
We reinterpret the memory address of the key object as an integer (default hash code of all Java objects) Good in general, except for numeric and string keys Integer cast: We reinterpret the bits of the key as an integer Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java) Component sum: We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows) Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java) Maps and Dictionaries CSC311: Data Structures

Hash Codes Polynomial accumulation:
We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) a0 a1 … an-1 We evaluate the polynomial p(z) = a0 + a1 z + a2 z2 + … … + an-1zn-1 at a fixed value z, ignoring overflows Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words) Polynomial p(z) can be evaluated in O(n) time using Horner’s rule: The following polynomials are successively computed, each from the previous one in O(1) time p0(z) = an-1 pi (z) = an-i-1 + zpi-1(z) (i = 1, 2, …, n -1) We have p(z) = pn-1(z) Maps and Dictionaries CSC311: Data Structures

Compression Functions
Division: h2 (y) = y mod N The size N of the hash table is usually chosen to be a prime The reason has to do with number theory and is beyond the scope of this course Multiply, Add and Divide (MAD): h2 (y) = (ay + b) mod N a and b are nonnegative integers such that a mod N  0 Otherwise, every integer would map to the same value b Maps and Dictionaries CSC311: Data Structures

Collision Handling Collisions occur when different elements are mapped to the same cell Separate Chaining: let each cell in the table point to a linked list of entries that map there  1 2 3 4 Separate chaining is simple, but requires additional memory outside the table Maps and Dictionaries CSC311: Data Structures

Map Methods with Separate Chaining used for Collisions
Delegate get and put methods to a list-based map at each cell: Algorithm get(k): Output: The value associated with the key k in the map, or null if there is no entry with key equal to k in the map return A[h(k)].get(k) {delegate the get to the list-based map at A[h(k)]} Algorithm put(k,v): Output: If there is an existing entry in our map with key equal to k, then we return its value (replacing it with v); otherwise, we return null t = A[h(k)].put(k,v) {delegate the put to the list-based map at A[h(k)]} if t = null then {k is a new key} n = n + 1 return t Maps and Dictionaries CSC311: Data Structures

Map Methods with Separate Chaining used for Collisions
Delegate the remove method to a list-based map at each cell: Algorithm remove(k): Output: The (removed) value associated with key k in the map, or null if there is no entry with key equal to k in the map t = A[h(k)].remove(k) {delegate the remove to the list-based map at A[h(k)]} if t ≠ null then {k was found} n = n - 1 return t Maps and Dictionaries CSC311: Data Structures

Linear Probing Example: h(x) = x mod 13
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order Open addressing: the colliding item is placed in a different cell of the table Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell Each table cell inspected is referred to as a “probe” Colliding items lump together, causing future collisions to cause a longer sequence of probes 1 2 3 4 5 6 7 8 9 10 11 12 41 18 44 59 32 22 31 73 Maps and Dictionaries CSC311: Data Structures

Search with Linear Probing
Consider a hash table A that uses linear probing get(k) We start at cell h(k) We probe consecutive locations until one of the following occurs An item with key k is found, or An empty cell is found, or N cells have been unsuccessfully probed Algorithm get(k) i  h(k) p  0 repeat c  A[i] if c =  return null else if c.key () = k return c.element() else i  (i + 1) mod N p  p + 1 until p = N Maps and Dictionaries CSC311: Data Structures

Updates with Linear Probing
To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements remove(k) We search for an entry with key k If such an entry (k, o) is found, we replace it with the special item AVAILABLE and we return element o Else, we return null put(k, o) We throw an exception if the table is full We start at cell h(k) We probe consecutive cells until one of the following occurs A cell i is found that is either empty or stores AVAILABLE, or N cells have been unsuccessfully probed We store entry (k, o) in cell i Maps and Dictionaries CSC311: Data Structures

Double Hashing Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series (i + jd(k)) mod N for j = 0, 1, … , N - 1 The secondary hash function d(k) cannot have zero values The table size N must be a prime to allow probing of all the cells Common choice of compression function for the secondary hash function: d2(k) = q - k mod q where q < N q is a prime The possible values for d2(k) are 1, 2, … , q Maps and Dictionaries CSC311: Data Structures

Example of Double Hashing
Consider a hash table storing integer keys that handles collision with double hashing N = 13 h(k) = k mod 13 d(k) = 7 - k mod 7 Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order 1 2 3 4 5 6 7 8 9 10 11 12 31 41 18 32 59 73 22 44 1 2 3 4 5 6 7 8 9 10 11 12 Maps and Dictionaries CSC311: Data Structures

Performance of Hashing
In the worst case, searches, insertions and removals on a hash table take O(n) time The worst case occurs when all the keys inserted into the map collide The load factor a = n/N affects the performance of a hash table Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 - a) The expected running time of all the dictionary ADT operations in a hash table is O(1) In practice, hashing is very fast provided the load factor is not close to 100% Applications of hash tables: small databases compilers browser caches Maps and Dictionaries CSC311: Data Structures

Dictionary ADT Dictionary ADT methods:
find(k): if the dictionary has an entry with key k, returns it, else, returns null findAll(k): returns an iterator of all entries with key k insert(k, o): inserts and returns the entry (k, o) remove(e): remove the entry e from the dictionary entries(): returns an iterator of the entries in the dictionary size(), isEmpty() The dictionary ADT models a searchable collection of key-element entries The main operations of a dictionary are searching, inserting, and deleting items Multiple items with the same key are allowed Applications: word-definition pairs credit card authorizations DNS mapping of host names (e.g., datastructures.net) to internet IP addresses (e.g., ) Maps and Dictionaries CSC311: Data Structures

Example Operation Output Dictionary insert(5,A) (5,A) (5,A)
insert(7,B) (7,B) (5,A),(7,B) insert(2,C) (2,C) (5,A),(7,B),(2,C) insert(8,D) (8,D) (5,A),(7,B),(2,C),(8,D) insert(2,E) (2,E) (5,A),(7,B),(2,C),(8,D),(2,E) find(7) (7,B) (5,A),(7,B),(2,C),(8,D),(2,E) find(4) null (5,A),(7,B),(2,C),(8,D),(2,E) find(2) (2,C) (5,A),(7,B),(2,C),(8,D),(2,E) findAll(2) (2,C),(2,E) (5,A),(7,B),(2,C),(8,D),(2,E) size() 5 (5,A),(7,B),(2,C),(8,D),(2,E) remove(find(5)) (5,A) (7,B),(2,C),(8,D),(2,E) find(5) null (7,B),(2,C),(8,D),(2,E) Maps and Dictionaries CSC311: Data Structures

A List-Based Dictionary
A log file or audit trail is a dictionary implemented by means of an unsorted sequence We store the items of the dictionary in a sequence (based on a doubly-linked list or array), in arbitrary order Performance: insert takes O(1) time since we can insert the new item at the beginning or at the end of the sequence find and remove take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation) Maps and Dictionaries CSC311: Data Structures

The findAll(k) Algorithm
Algorithm findAll(k): Input: A key k Output: An iterator of entries with key equal to k Create an initially-empty list L B = D.entries() while B.hasNext() do e = B.next() if e.key() = k then L.insertLast(e) return L.elements() Maps and Dictionaries CSC311: Data Structures

The insert and remove Methods
Algorithm insert(k,v): Input: A key k and value v Output: The entry (k,v) added to D Create a new entry e = (k,v) S.insertLast(e) {S is unordered} return e Algorithm remove(e): Input: An entry e Output: The removed entry e or null if e was not in D {We don’t assume here that e stores its location in S} B = S.positions() while B.hasNext() do p = B.next() if p.element() = e then S.remove(p) return null {there is no entry e in D} Maps and Dictionaries CSC311: Data Structures

Hash Table Implementation
We can also create a hash-table dictionary implementation. If we use separate chaining to handle collisions, then each operation can be delegated to a list-based dictionary stored at each hash table cell. Maps and Dictionaries CSC311: Data Structures

Binary Search Binary search performs operation find(k) on a dictionary implemented by means of an array-based sequence, sorted by key similar to the high-low game at each step, the number of candidate items is halved terminates after a logarithmic number of steps Example: find(7) 1 3 4 5 7 8 9 11 14 16 18 19 l m h 1 3 4 5 7 8 9 11 14 16 18 19 l m h 1 3 4 5 7 8 9 11 14 16 18 19 l m h 1 3 4 5 7 8 9 11 14 16 18 19 l=m =h Maps and Dictionaries CSC311: Data Structures

Search Table A search table is a dictionary implemented by means of a sorted array We store the items of the dictionary in an array-based sequence, sorted by key We use an external comparator for the keys Performance: find takes O(log n) time, using binary search insert takes O(n) time since in the worst case we have to shift n/2 items to make room for the new item remove takes O(n) time since in the worst case we have to shift n/2 items to compact the items after the removal A search table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations) Maps and Dictionaries CSC311: Data Structures

What is a Skip List A skip list for a set S of distinct (key, element) items is a series of lists S0, S1 , … , Sh such that Each list Si contains the special keys + and - List S0 contains the keys of S in nondecreasing order Each list is a subsequence of the previous one, i.e., S0  S1  …  Sh List Sh contains only the two special keys We show how to use a skip list to implement the dictionary ADT S3 + - S2 - 31 + S1 - 23 31 34 64 + S0 56 64 78 + 31 34 44 - 12 23 26 Maps and Dictionaries CSC311: Data Structures

Search S3 S2 S1 S0 We search for a key x in a a skip list as follows:
We start at the first position of the top list At the current position p, we compare x with y  key(next(p)) x = y: we return element(next(p)) x > y: we “scan forward” x < y: we “drop down” If we try to drop down past the bottom list, we return null Example: search for 78 S3 + - S2 - 31 + S1 - 23 31 34 64 + S0 - 12 23 26 31 34 44 56 64 78 + Maps and Dictionaries CSC311: Data Structures

Randomized Algorithms
A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution It contains statements of the type b  random() if b = 0 do A … else { b = 1} do B … Its running time depends on the outcomes of the coin tosses We analyze the expected running time of a randomized algorithm under the following assumptions the coins are unbiased, and the coin tosses are independent The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs when all the coin tosses give “heads”) We use a randomized algorithm to insert items into a skip list Maps and Dictionaries CSC311: Data Structures

Insertion To insert an entry (x, o) into a skip list, we use a randomized algorithm: We repeatedly toss a coin until we get tails, and we denote with i the number of times the coin came up heads If i  h, we add to the skip list new lists Sh+1, … , Si +1, each containing only the two special keys We search for x in the skip list and find the positions p0, p1 , …, pi of the items with largest key less than x in each list S0, S1, … , Si For j  0, …, i, we insert item (x, o) into list Sj after position pj Example: insert key 15, with i = 2 S3 + - p2 S2 S2 - + + - 15 p1 S1 S1 - 23 + + - 23 15 p0 S0 S0 - 10 23 36 + + - 10 36 23 15 Maps and Dictionaries CSC311: Data Structures

Deletion To remove an entry with key x from a skip list, we proceed as follows: We search for x in the skip list and find the positions p0, p1 , …, pi of the items with key x, where position pj is in list Sj We remove positions p0, p1 , …, pi from the lists S0, S1, … , Si We remove all but one list containing only the two special keys Example: remove key 34 S3 - + p2 S2 S2 - 34 + - + p1 S1 S1 - 23 34 + - 23 + p0 S0 S0 - 12 23 34 45 + - 12 23 45 + Maps and Dictionaries CSC311: Data Structures

Implementation x quad-node
We can implement a skip list with quad-nodes A quad-node stores: entry link to the node prev link to the node next link to the node below link to the node above Also, we define special keys PLUS_INF and MINUS_INF, and we modify the key comparator to handle them quad-node x Maps and Dictionaries CSC311: Data Structures

Space Usage Consider a skip list with n entries
By Fact 1, we insert an entry in list Si with probability 1/2i By Fact 2, the expected size of list Si is n/2i The expected number of nodes used by the skip list is The space used by a skip list depends on the random bits used by each invocation of the insertion algorithm We use the following two basic probabilistic facts: Fact 1: The probability of getting i consecutive heads when flipping a coin is 1/2i Fact 2: If each of n entries is present in a set with probability p, the expected size of the set is np Thus, the expected space usage of a skip list with n items is O(n) Maps and Dictionaries CSC311: Data Structures

Height The running time of the search an insertion algorithms is affected by the height h of the skip list We show that with high probability, a skip list with n items has height O(log n) We use the following additional probabilistic fact: Fact 3: If each of n events has probability p, the probability that at least one event occurs is at most np Consider a skip list with n entires By Fact 1, we insert an entry in list Si with probability 1/2i By Fact 3, the probability that list Si has at least one item is at most n/2i By picking i = 3log n, we have that the probability that S3log n has at least one entry is at most n/23log n = n/n3 = 1/n2 Thus a skip list with n entries has height at most 3log n with probability at least /n2 Maps and Dictionaries CSC311: Data Structures

Search and Update Times
When we scan forward in a list, the destination key does not belong to a higher list A scan-forward step is associated with a former coin toss that gave tails By Fact 4, in each list the expected number of scan-forward steps is 2 Thus, the expected number of scan-forward steps is O(log n) We conclude that a search in a skip list takes O(log n) expected time The analysis of insertion and deletion gives similar results The search time in a skip list is proportional to the number of drop-down steps, plus the number of scan-forward steps The drop-down steps are bounded by the height of the skip list and thus are O(log n) with high probability To analyze the scan-forward steps, we use yet another probabilistic fact: Fact 4: The expected number of coin tosses required in order to get tails is 2 Maps and Dictionaries CSC311: Data Structures

Summary A skip list is a data structure for dictionaries that uses a randomized insertion algorithm In a skip list with n entries The expected space used is O(n) The expected search, insertion and deletion time is O(log n) Using a more complex probabilistic analysis, one can show that these performance bounds also hold with high probability Skip lists are fast and simple to implement in practice Maps and Dictionaries CSC311: Data Structures

Chapter 10: Search Trees Objectives:
Binary Search Trees: Search, update, and implementation AVL Trees: Properties and maintenance 2-4 Trees: Properties and maintenance Red-black Trees: Properties and equivalence to 2-4 Trees CSC311: Data Structures

Ordered Dictionaries Keys are assumed to come from a total order.
New operations: first(): first entry in the dictionary ordering last(): last entry in the dictionary ordering successors(k): iterator of entries with keys greater than or equal to k; increasing order predecessors(k): iterator of entries with keys less than or equal to k; decreasing order Search Trees CSC311: Data Structures

Binary Search Binary search can perform operation find(k) on a dictionary implemented by means of an array-based sequence, sorted by key similar to the high-low game at each step, the number of candidate items is halved terminates after O(log n) steps Example: find(7) 1 3 4 5 7 8 9 11 14 16 18 19 m l h l=m =h Search Trees CSC311: Data Structures

Search Tables A search table is a dictionary implemented by means of a sorted sequence We store the items of the dictionary in an array-based sequence, sorted by key We use an external comparator for the keys Performance: find takes O(log n) time, using binary search insert takes O(n) time since in the worst case we have to shift n/2 items to make room for the new item remove take O(n) time since in the worst case we have to shift n/2 items to compact the items after the removal The lookup table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations) Search Trees CSC311: Data Structures

Binary Search Trees A binary search tree is a binary tree storing keys (or key-value entries) at its internal nodes and satisfying the following property: Let u, v, and w be three nodes such that u is in the left subtree of v and w is in the right subtree of v. We have key(u)  key(v)  key(w) External nodes do not store items An inorder traversal of a binary search trees visits the keys in increasing order 6 9 2 4 1 8 Search Trees CSC311: Data Structures

Search To search for a key k, we trace a downward path starting at the root The next node visited depends on the outcome of the comparison of k with the key of the current node If we reach a leaf, the key is not found and we return nukk Example: find(4): Call TreeSearch(4,root) Algorithm TreeSearch(k, v) if T.isExternal (v) return v if k < key(v) return TreeSearch(k, T.left(v)) else if k = key(v) else { k > key(v) } return TreeSearch(k, T.right(v)) 6 9 2 4 1 8 < > = Search Trees CSC311: Data Structures

Insertion 6 9 2 4 1 8 5 < > w To perform operation inser(k, o), we search for key k (using TreeSearch) Assume k is not already in the tree, and let let w be the leaf reached by the search We insert k at node w and expand w into an internal node Example: insert 5 Search Trees CSC311: Data Structures

Deletion < To perform operation remove(k), we search for key k >
6 9 2 4 1 8 5 v w < > To perform operation remove(k), we search for key k Assume key k is in the tree, and let v be the node storing k If node v has a leaf child w, we remove v and w from the tree with operation removeExternal(w), which removes w and its parent Example: remove 4 Search Trees CSC311: Data Structures

Deletion (cont.) 3 1 8 6 9 5 v w z 2 We consider the case where the key k to be removed is stored at a node v whose children are both internal we find the internal node w that follows v in an inorder traversal we copy key(w) into node v we remove node w and its left child z (which must be a leaf) by means of operation removeExternal(z) Example: remove 3 Search Trees CSC311: Data Structures

Performance Consider a dictionary with n items implemented by means of a binary search tree of height h the space used is O(n) methods find, insert and remove take O(h) time The height h is O(n) in the worst case and O(log n) in the best case Search Trees CSC311: Data Structures

AVL Tree Definition AVL trees are balanced.
An AVL Tree is a binary search tree such that for every internal node v of T, the heights of the children of v can differ by at most 1. An example of an AVL tree where the heights are shown next to the nodes. Search Trees CSC311: Data Structures

Height of an AVL Tree Fact: The height of an AVL tree storing n keys is O(log n). Proof: Let us bound n(h): the minimum number of internal nodes of an AVL tree of height h. We easily see that n(1) = 1 and n(2) = 2 For n > 2, an AVL tree of height h contains the root node, one AVL subtree of height n-1 and another of height n-2. That is, n(h) = 1 + n(h-1) + n(h-2) Knowing n(h-1) > n(h-2), we get n(h) > 2n(h-2). So n(h) > 2n(h-2), n(h) > 4n(h-4), n(h) > 8n(n-6), … (by induction), n(h) > 2in(h-2i) Solving the base case we get: n(h) > 2 h/2-1 Taking logarithms: h < 2log n(h) +2 Thus the height of an AVL tree is O(log n) Search Trees CSC311: Data Structures

Insertion in an AVL Tree
Insertion is as in a binary search tree Always done by expanding an external node. Example: 44 17 78 32 50 88 48 62 44 17 78 32 50 88 48 62 54 c=z a=y b=x w before insertion after insertion Search Trees CSC311: Data Structures

Trinode Restructuring
let (a,b,c) be an inorder listing of x, y, z perform the rotations needed to make b the topmost node of the three case 2: double rotation (a right rotation about c, then a left rotation about a) (other two cases are symmetrical) c=y b=x a=z T0 T1 T2 T3 b=y a=z c=x T0 T1 T2 T3 b=y a=z c=x T0 T1 T2 T3 b=x c=y a=z T0 T1 T2 T3 case 1: single rotation (a left rotation about a) Search Trees CSC311: Data Structures

Insertion Example, continued
unbalanced... 88 44 17 78 32 50 48 62 2 1 3 54 T x y z ...balanced 4 Search Trees CSC311: Data Structures

Restructuring: Single Rotations
Search Trees CSC311: Data Structures

Restructuring: Double Rotations
Search Trees CSC311: Data Structures

Removal in an AVL Tree Removal begins as in a binary search tree, which means the node removed will become an empty external node. Its parent, w, may cause an imbalance. Example: 44 17 78 32 50 88 48 62 54 44 17 62 50 78 48 54 88 before deletion of 32 after deletion Search Trees CSC311: Data Structures

Rebalancing after Removal
Let z be the first unbalanced node encountered while travelling up the tree from w. Also, let y be the child of z with the larger height, and let x be the child of y with the larger height. We perform restructure(x) to restore balance at z. As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached 44 17 78 50 88 48 62 54 w c=x b=y a=z Search Trees CSC311: Data Structures

Running Times for AVL Trees
a single restructure is O(1) using a linked-structure binary tree find is O(log n) height of tree is O(log n), no restructures needed insert is O(log n) initial find is O(log n) Restructuring up the tree, maintaining heights is O(log n) remove is O(log n) Search Trees CSC311: Data Structures

Chapter 10: Search Trees Objectives:
Binary Search Trees: Search, update, and implementation AVL Trees: Properties and maintenance 2-4 Trees: Properties and maintenance Red-black Trees: Properties and equivalence to 2-4 Trees Fall 2006 CSC311: Data Structures

Multi-Way Search Tree A multi-way search tree is an ordered tree such that Each internal node has at least two children and stores d -1 key-element items (ki, oi), where d is the number of children For a node with children v1 v2 … vd storing keys k1 k2 … kd-1 keys in the subtree of v1 are less than k1 keys in the subtree of vi are between ki-1 and ki (i = 2, …, d - 1) keys in the subtree of vd are greater than kd-1 The leaves store no items and serve as placeholders 15 30 Spring 2006 CSC311: Data Structures

Multi-Way Inorder Traversal
We can extend the notion of inorder traversal from binary trees to multi-way search trees Namely, we visit item (ki, oi) of node v between the recursive traversals of the subtrees of v rooted at children vi and vi + 1 An inorder traversal of a multi-way search tree visits the keys in increasing order 8 12 15 2 4 6 10 14 18 30 1 3 5 7 9 11 13 16 19 15 17 Spring 2006 CSC311: Data Structures

Multi-Way Searching Similar to search in a binary search tree
A each internal node with children v1 v2 … vd and keys k1 k2 … kd-1 k = ki (i = 1, …, d - 1): the search terminates successfully k < k1: we continue the search in child v1 ki-1 < k < ki (i = 2, …, d - 1): we continue the search in child vi k > kd-1: we continue the search in child vd Reaching an external node terminates the search unsuccessfully Example: search for 30 15 30 Spring 2006 CSC311: Data Structures

(2,4) Trees A (2,4) tree (also called 2-4 tree or tree) is a multi-way search with the following properties Node-Size Property: every internal node has at most four children Depth Property: all the external nodes have the same depth Depending on the number of children, an internal node of a (2,4) tree is called a 2-node, 3-node or 4-node 2 8 12 18 Spring 2006 CSC311: Data Structures

Height of a (2,4) Tree Theorem: A (2,4) tree storing n items has height O(log n) Proof: Let h be the height of a (2,4) tree with n items Since there are at least 2i items at depth i = 0, … , h - 1 and no items at depth h, we have n  … + 2h-1 = 2h - 1 Thus, h  log (n + 1) Searching in a (2,4) tree with n items takes O(log n) time depth items 1 1 2 h-1 2h-1 h Spring 2006 CSC311: Data Structures

Insertion We insert a new item (k, o) at the parent v of the leaf reached by searching for k We preserve the depth property but We may cause an overflow (i.e., node v may become a 5-node) Example: inserting key 30 causes an overflow 2 8 12 18 v Spring 2006 CSC311: Data Structures

Overflow and Split We handle an overflow at a 5-node v with a split operation: let v1 … v5 be the children of v and k1 … k4 be the keys of v node v is replaced nodes v' and v" v' is a 3-node with keys k1 k2 and children v1 v2 v3 v" is a 2-node with key k4 and children v4 v5 key k3 is inserted into the parent u of v (a new root may be created) The overflow may propagate to the parent node u u u v v' v" 12 18 12 18 27 30 35 v1 v2 v3 v4 v5 v1 v2 v3 v4 v5 Spring 2006 CSC311: Data Structures

Analysis of Insertion Let T be a (2,4) tree with n items
Tree T has O(log n) height Step 1 takes O(log n) time because we visit O(log n) nodes Step 2 takes O(1) time Step 3 takes O(log n) time because each split takes O(1) time and we perform O(log n) splits Thus, an insertion in a (2,4) tree takes O(log n) time Algorithm insert(k, o) 1. Search for key k to locate the insertion node v 2. Add the new entry (k, o) at node v 3. while overflow(v) if isRoot(v) create a new empty root above v v  split(v) Spring 2006 CSC311: Data Structures

Deletion We reduce deletion of an entry to the case where the item is at the node with leaf children Otherwise, we replace the entry with its inorder successor (or, equivalently, with its inorder predecessor) and delete the latter entry Example: to delete key 24, we replace it with 27 (inorder successor) 2 8 12 18 2 8 12 18 Spring 2006 CSC311: Data Structures

Underflow and Fusion u v v' w
Deleting an entry from a node v may cause an underflow, where node v becomes a 1-node with one child and no keys To handle an underflow at node v with parent u, we consider two cases Case 1: the adjacent siblings of v are 2-nodes Fusion operation: we merge v with an adjacent sibling w and move an entry from u to the merged node v' After a fusion, the underflow may propagate to the parent u 9 14 10 u v 9 10 14 v' w Spring 2006 CSC311: Data Structures

Underflow and Transfer
Case 2: an adjacent sibling w of v is a 3-node or a 4-node Transfer operation: 1. we move a child of w to v 2. we move an item from u to v 3. we move an item from w to u After a transfer, no underflow occurs 4 9 6 8 2 u v w 4 8 6 9 Spring 2006 CSC311: Data Structures

Analysis of Deletion Let T be a (2,4) tree with n items
Tree T has O(log n) height In a deletion operation We visit O(log n) nodes to locate the node from which to delete the entry We handle an underflow with a series of O(log n) fusions, followed by at most one transfer Each fusion and transfer takes O(1) time Thus, deleting an item from a (2,4) tree takes O(log n) time Spring 2006 CSC311: Data Structures

Implementing a Dictionary
Comparison of efficient dictionary implementations Search Insert Delete Notes Hash Table 1 expected no ordered dictionary methods simple to implement Skip List log n high prob. randomized insertion (2,4) Tree log n worst-case complex to implement Spring 2006 CSC311: Data Structures

Red-Black Trees A red-black tree can also be defined as a binary search tree that satisfies the following properties: Root Property: the root is black External Property: every leaf is black Internal Property: the children of a red node are black Depth Property: all the leaves have the same black depth 9 4 15 2 6 12 21 7 Spring 2006 CSC311: Data Structures

From (2,4) to Red-Black Trees
A red-black tree is a representation of a (2,4) tree by means of a binary tree whose nodes are colored red or black In comparison with its associated (2,4) tree, a red-black tree has same logarithmic time performance simpler implementation with a single node type 4 3 5 4 5 3 6 OR 3 5 2 7 Spring 2006 CSC311: Data Structures

Height of a Red-Black Tree
Theorem: A red-black tree storing n entries has height O(log n) Proof: The height of a red-black tree is at most twice the height of its associated (2,4) tree, which is O(log n) The search algorithm for a binary search tree is the same as that for a binary search tree By the above theorem, searching in a red-black tree takes O(log n) time Spring 2006 CSC311: Data Structures

Insertion To perform operation insert(k, o), we execute the insertion algorithm for binary search trees and color red the newly inserted node z unless it is the root We preserve the root, external, and depth properties If the parent v of z is black, we also preserve the internal property and we are done Else (v is red ) we have a double red (i.e., a violation of the internal property), which requires a reorganization of the tree Example where the insertion of 4 causes a double red: 6 6 v v 3 8 3 8 z z 4 Spring 2006 CSC311: Data Structures

Remedying a Double Red Consider a double red with child z and parent v, and let w be the sibling of v Case 1: w is black The double red is an incorrect replacement of a 4-node Restructuring: we change the 4-node replacement Case 2: w is red The double red corresponds to an overflow Recoloring: we perform the equivalent of a split 4 4 w v w v 2 7 2 7 z z 6 6 Spring 2006 CSC311: Data Structures

Restructuring A restructuring remedies a child-parent double red when the parent red node has a black sibling It is equivalent to restoring the correct replacement of a 4-node The internal property is restored and the other properties are preserved z 4 6 w v v 2 7 4 7 z w 6 2 Spring 2006 CSC311: Data Structures

Restructuring (cont.) There are four restructuring configurations depending on whether the double red nodes are left or right children 6 4 2 6 2 4 2 6 4 2 6 4 4 2 6 Spring 2006 CSC311: Data Structures

Recoloring A recoloring remedies a child-parent double red when the parent red node has a red sibling The parent v and its sibling w become black and the grandparent u becomes red, unless it is the root It is equivalent to performing a split on a 5-node The double red violation may propagate to the grandparent u 4 4 w v w v 2 7 2 7 z z 6 6 … 4 … 2 6 7 Spring 2006 CSC311: Data Structures

Analysis of Insertion Recall that a red-black tree has O(log n) height
Step 1 takes O(log n) time because we visit O(log n) nodes Step 2 takes O(1) time Step 3 takes O(log n) time because we perform O(log n) recolorings, each taking O(1) time, and at most one restructuring taking O(1) time Thus, an insertion in a red-black tree takes O(log n) time Algorithm insert(k, o) 1. We search for key k to locate the insertion node z 2. We add the new entry (k, o) at node z and color z red 3. while doubleRed(z) if isBlack(sibling(parent(z))) z  restructure(z) return else { sibling(parent(z) is red } z  recolor(z) Spring 2006 CSC311: Data Structures

Deletion To perform operation remove(k), we first execute the deletion algorithm for binary search trees Let v be the internal node removed, w the external node removed, and r the sibling of w If either v of r was red, we color r black and we are done Else (v and r were both black) we color r double black, which is a violation of the internal property requiring a reorganization of the tree Example where the deletion of 8 causes a double black: 6 6 v r 3 8 3 r w 4 4 Spring 2006 CSC311: Data Structures

Remedying a Double Black
The algorithm for remedying a double black node w with sibling y considers three cases Case 1: y is black and has a red child We perform a restructuring, equivalent to a transfer , and we are done Case 2: y is black and its children are both black We perform a recoloring, equivalent to a fusion, which may propagate up the double black violation Case 3: y is red We perform an adjustment, equivalent to choosing a different representation of a 3-node, after which either Case 1 or Case 2 applies Deletion in a red-black tree takes O(log n) time Spring 2006 CSC311: Data Structures

Red-Black Tree Reorganization
Insertion remedy double red Red-black tree action (2,4) tree action result restructuring change of 4-node representation double red removed recoloring split double red removed or propagated up Deletion remedy double black Red-black tree action (2,4) tree action result restructuring transfer double black removed recoloring fusion double black removed or propagated up adjustment change of 3-node representation restructuring or recoloring follows Spring 2006 CSC311: Data Structures

Chapter 13: Graphs I Objectives: Graph ADT: Operations
Graph Implementation: Data structures Graph Traversals: DFS and BFS Directed graphs Weighted graphs Shortest paths Minimum Spanning Trees (MST) CSC311: Data Structures

Graphs PVD ORD SFO LGA HNL LAX DFW MIA A graph is a pair (V, E), where
V is a set of nodes, called vertices E is a collection of pairs of vertices, called edges Vertices and edges are positions and store elements Example: A vertex represents an airport and stores the three-letter airport code An edge represents a flight route between two airports and stores the mileage of the route 849 PVD 1843 ORD 142 SFO 802 LGA 1743 337 1387 HNL 2555 1099 LAX 1233 DFW 1120 MIA Graphs I CSC311: Data Structures

Edge Types flight AA 1206 ORD PVD 849 miles ORD PVD Directed edge
ordered pair of vertices (u,v) first vertex u is the origin second vertex v is the destination e.g., a flight Undirected edge unordered pair of vertices (u,v) e.g., a flight route Directed graph all the edges are directed e.g., route network Undirected graph all the edges are undirected e.g., flight network flight AA 1206 ORD PVD 849 miles ORD PVD Graphs I CSC311: Data Structures

Applications Electronic circuits Transportation networks
Printed circuit board Integrated circuit Transportation networks Highway network Flight network Computer networks Local area network Internet Web Databases Entity-relationship diagram Graphs I CSC311: Data Structures

Terminology X U V W Z Y a c b e d f g h i j
End vertices (or endpoints) of an edge U and V are the endpoints of a Edges incident on a vertex a, d, and b are incident on V Adjacent vertices U and V are adjacent Degree of a vertex X has degree 5 Parallel edges h and i are parallel edges Self-loop j is a self-loop X U V W Z Y a c b e d f g h i j Graphs I CSC311: Data Structures

Terminology (cont.) V a b P1 d U X Z P2 h c e W g f Y Path Simple path
sequence of alternating vertices and edges begins with a vertex ends with a vertex each edge is preceded and followed by its endpoints Simple path path such that all its vertices and edges are distinct Examples P1=(V,b,X,h,Z) is a simple path P2=(U,c,W,e,X,g,Y,f,W,d,V) is a path that is not simple V a b P1 d U X Z P2 h c e W g f Y Graphs I CSC311: Data Structures

Terminology (cont.) V a b d U X Z C2 h e C1 c W g f Y Cycle
circular sequence of alternating vertices and edges each edge is preceded and followed by its endpoints Simple cycle cycle such that all its vertices and edges are distinct Examples C1=(V,b,X,g,Y,f,W,c,U,a,) is a simple cycle C2=(U,c,W,e,X,g,Y,f,W,d,V,a,) is a cycle that is not simple V a b d U X Z C2 h e C1 c W g f Y Graphs I CSC311: Data Structures

Properties Sv deg(v) = 2m Property 1 Notation Property 2
Proof: each edge is counted twice Property 2 In an undirected graph with no self-loops and no multiple edges m  n (n - 1)/2 Proof: each vertex has degree at most (n - 1) What is the bound for a directed graph? Notation n number of vertices m number of edges deg(v) degree of vertex v Example n = 4 m = 6 deg(v) = 3 Graphs I CSC311: Data Structures

Main Methods of the Graph ADT
Vertices and edges are positions store elements Accessor methods endVertices(e): an array of the two endvertices of e opposite(v, e): the vertex opposite of v on e areAdjacent(v, w): true iff v and w are adjacent replace(v, x): replace element at vertex v with x replace(e, x): replace element at edge e with x Update methods insertVertex(o): insert a vertex storing element o insertEdge(v, w, o): insert an edge (v,w) storing element o removeVertex(v): remove vertex v (and its incident edges) removeEdge(e): remove edge e Iterator methods incidentEdges(v): edges incident to v vertices(): all vertices in the graph edges(): all edges in the graph Graphs I CSC311: Data Structures

Edge List Structure Vertex object Edge object Vertex sequence
element reference to position in vertex sequence Edge object origin vertex object destination vertex object reference to position in edge sequence Vertex sequence sequence of vertex objects Edge sequence sequence of edge objects u a c b d v w z u v w z a b c d Graphs I CSC311: Data Structures

Adjacency List Structure
v b Edge list structure Incidence sequence for each vertex sequence of references to edge objects of incident edges Augmented edge objects references to associated positions in incidence sequences of end vertices u w u v w a b Graphs I CSC311: Data Structures

Adjacency Matrix Structure
v b Edge list structure Augmented vertex objects Integer key (index) associated with vertex 2D-array adjacency array Reference to edge object for adjacent vertices Null for non nonadjacent vertices The “old fashioned” version just has 0 for no edge and 1 for edge u w u 1 v 2 w 1 2  a b Graphs I CSC311: Data Structures

Asymptotic Performance
n vertices, m edges no parallel edges no self-loops Bounds are “big-Oh” Edge List Adjacency List Adjacency Matrix Space n + m n2 incidentEdges(v) m deg(v) n areAdjacent (v, w) min(deg(v), deg(w)) 1 insertVertex(o) insertEdge(v, w, o) removeVertex(v) removeEdge(e) Graphs I CSC311: Data Structures

Subgraphs A subgraph S of a graph G is a graph such that
The vertices of S are a subset of the vertices of G The edges of S are a subset of the edges of G A spanning subgraph of G is a subgraph that contains all the vertices of G Subgraph Spanning subgraph Graphs I CSC311: Data Structures

Non connected graph with two connected components
Connectivity A graph is connected if there is a path between every pair of vertices A connected component of a graph G is a maximal connected subgraph of G Connected graph Non connected graph with two connected components Graphs I CSC311: Data Structures

Trees and Forests A (free) tree is an undirected graph T such that
T is connected T has no cycles This definition of tree is different from the one of a rooted tree A forest is an undirected graph without cycles The connected components of a forest are trees Tree Forest Graphs I CSC311: Data Structures

Spanning Trees and Forests
A spanning tree of a connected graph is a spanning subgraph that is a tree A spanning tree is not unique unless the graph is a tree Spanning trees have applications to the design of communication networks A spanning forest of a graph is a spanning subgraph that is a forest Graph Spanning tree Graphs I CSC311: Data Structures

Depth-First Search Depth-first search (DFS) is a general technique for traversing a graph A DFS traversal of a graph G Visits all the vertices and edges of G Determines whether G is connected Computes the connected components of G Computes a spanning forest of G DFS on a graph with n vertices and m edges takes O(n + m ) time DFS can be further extended to solve other graph problems Find and report a path between two given vertices Find a cycle in the graph Depth-first search is to graphs while Euler tour is to binary trees Graphs I CSC311: Data Structures

DFS Algorithm The algorithm uses a mechanism for setting and getting “labels” of vertices and edges Algorithm DFS(G, v) Input graph G and a start vertex v of G Output labeling of the edges of G in the connected component of v as discovery edges and back edges setLabel(v, VISITED) for all e  G.incidentEdges(v) if getLabel(e) = UNEXPLORED w  opposite(v,e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) DFS(G, w) else setLabel(e, BACK) Algorithm DFS(G) Input graph G Output labeling of the edges of G as discovery edges and back edges for all u  G.vertices() setLabel(u, UNEXPLORED) for all e  G.edges() setLabel(e, UNEXPLORED) for all v  G.vertices() if getLabel(v) = UNEXPLORED DFS(G, v) Graphs I CSC311: Data Structures

Example unexplored vertex visited vertex unexplored edge
B A C E A visited vertex unexplored edge discovery edge back edge D B A C E D B A C E Graphs I CSC311: Data Structures

Example (cont.) D B A C E D B A C E D B A C E D B A C E Graphs I
CSC311: Data Structures

Properties of DFS Property 1 Property 2
DFS(G, v) visits all the vertices and edges in the connected component of v Property 2 The discovery edges labeled by DFS(G, v) form a spanning tree of the connected component of v D B A C E Graphs I CSC311: Data Structures

Analysis of DFS Setting/getting a vertex/edge label takes O(1) time
Each vertex is labeled twice once as UNEXPLORED once as VISITED Each edge is labeled twice once as DISCOVERY or BACK Method incidentEdges is called once for each vertex DFS runs in O(n + m) time provided the graph is represented by the adjacency list structure Recall that Sv deg(v) = 2m Graphs I CSC311: Data Structures

Path Finding We can specialize the DFS algorithm to find a path between two given vertices u and z using the template method pattern We call DFS(G, u) with u as the start vertex We use a stack S to keep track of the path between the start vertex and the current vertex As soon as destination vertex z is encountered, we return the path as the contents of the stack Algorithm pathDFS(G, v, z) setLabel(v, VISITED) S.push(v) if v = z return S.elements() for all e  G.incidentEdges(v) if getLabel(e) = UNEXPLORED w  opposite(v,e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) S.push(e) pathDFS(G, w, z) S.pop(e) else setLabel(e, BACK) S.pop(v) Graphs I CSC311: Data Structures

Cycle Finding Algorithm cycleDFS(G, v, z) setLabel(v, VISITED) S.push(v) for all e  G.incidentEdges(v) if getLabel(e) = UNEXPLORED w  opposite(v,e) S.push(e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) cycleDFS(G, w, z) S.pop(e) else T  new empty stack repeat o  S.pop() T.push(o) until o = w return T.elements() S.pop(v) We can specialize the DFS algorithm to find a simple cycle using the template method pattern We use a stack S to keep track of the path between the start vertex and the current vertex As soon as a back edge (v, w) is encountered, we return the cycle as the portion of the stack from the top to vertex w Graphs I CSC311: Data Structures

Breadth-First Search Breadth-first search (BFS) is a general technique for traversing a graph A BFS traversal of a graph G Visits all the vertices and edges of G Determines whether G is connected Computes the connected components of G Computes a spanning forest of G BFS on a graph with n vertices and m edges takes O(n + m ) time BFS can be further extended to solve other graph problems Find and report a path with the minimum number of edges between two given vertices Find a simple cycle, if there is one Graphs I CSC311: Data Structures

BFS Algorithm The algorithm uses a mechanism for setting and getting “labels” of vertices and edges Algorithm BFS(G, s) L0  new empty sequence L0.insertLast(s) setLabel(s, VISITED) i  0 while Li.isEmpty() Li +1  new empty sequence for all v  Li.elements() for all e  G.incidentEdges(v) if getLabel(e) = UNEXPLORED w  opposite(v,e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) setLabel(w, VISITED) Li +1.insertLast(w) else setLabel(e, CROSS) i  i +1 Algorithm BFS(G) Input graph G Output labeling of the edges and partition of the vertices of G for all u  G.vertices() setLabel(u, UNEXPLORED) for all e  G.edges() setLabel(e, UNEXPLORED) for all v  G.vertices() if getLabel(v) = UNEXPLORED BFS(G, v) Graphs I CSC311: Data Structures

Example unexplored vertex visited vertex unexplored edge
C B A E D L0 L1 F A unexplored vertex A visited vertex unexplored edge discovery edge cross edge L0 L0 A A L1 L1 B C D B C D E F E F Graphs I CSC311: Data Structures

Example (cont.) L0 L1 L0 L1 L2 L0 L1 L2 L0 L1 L2 C B A E D F C B A E D
Graphs I CSC311: Data Structures

Example (cont.) L0 L1 L2 L0 L1 L2 L0 L1 L2 C B A E D F A B C D E F C B

Properties Notation Property 1 Property 2 Property 3
Gs: connected component of s Property 1 BFS(G, s) visits all the vertices and edges of Gs Property 2 The discovery edges labeled by BFS(G, s) form a spanning tree Ts of Gs Property 3 For each vertex v in Li The path of Ts from s to v has i edges Every path from s to v in Gs has at least i edges A B C D E F L0 A L1 B C D L2 E F Graphs I CSC311: Data Structures

Analysis Setting/getting a vertex/edge label takes O(1) time
Each vertex is labeled twice once as UNEXPLORED once as VISITED Each edge is labeled twice once as DISCOVERY or CROSS Each vertex is inserted once into a sequence Li Method incidentEdges is called once for each vertex BFS runs in O(n + m) time provided the graph is represented by the adjacency list structure Recall that Sv deg(v) = 2m Graphs I CSC311: Data Structures

Applications Using the template method pattern, we can specialize the BFS traversal of a graph G to solve the following problems in O(n + m) time Compute the connected components of G Compute a spanning forest of G Find a simple cycle in G, or report that G is a forest Given two vertices of G, find a path in G between them with the minimum number of edges, or report that no such path exists Graphs I CSC311: Data Structures

DFS vs. BFS Applications DFS BFS DFS BFS
Spanning forest, connected components, paths, cycles  Shortest paths Biconnected components C B A E D L0 L1 F L2 A B C D E F DFS BFS Graphs I CSC311: Data Structures

DFS vs. BFS (cont.) Back edge (v,w) Cross edge (v,w) DFS BFS
w is an ancestor of v in the tree of discovery edges Cross edge (v,w) w is in the same level as v or in the next level in the tree of discovery edges C B A E D L0 L1 F L2 A B C D E F DFS BFS Graphs I CSC311: Data Structures

Digraphs A digraph is a graph whose edges are all directed
Short for “directed graph” Applications one-way streets flights task scheduling A C E B D Graphs I CSC311: Data Structures

Digraph Properties E A graph G=(V,E) such that D
B D A graph G=(V,E) such that Each edge goes in one direction: Edge (a,b) goes from a to b, but not b to a. If G is simple, m < n*(n-1). If we keep in-edges and out-edges in separate adjacency lists, we can perform listing of in-edges and out-edges in time proportional to their size. Graphs I CSC311: Data Structures

Digraph Application Scheduling: edge (a,b) means task a must be completed before b can be started ics21 ics22 ics23 ics51 ics53 ics52 ics161 ics131 ics141 ics121 ics171 The good life ics151 Graphs I CSC311: Data Structures

Directed DFS We can specialize the traversal algorithms (DFS and BFS) to digraphs by traversing edges only along their direction In the directed DFS algorithm, we have four types of edges discovery edges back edges forward edges cross edges A directed DFS starting a vertex s determines the vertices reachable from s E D C B A Graphs I CSC311: Data Structures

Reachability DFS tree rooted at v: vertices reachable from v via directed paths E D C E D A C F E D A B C F A B Graphs I CSC311: Data Structures

Strong Connectivity Each vertex can reach all other vertices a g c d e
b e f g Graphs I CSC311: Data Structures

Strong Connectivity Algorithm
Pick a vertex v in G. Perform a DFS from v in G. If there’s a w not visited, print “no”. Let G’ be G with edges reversed. Perform a DFS from v in G’. Else, print “yes”. Running time: O(n+m). a G: g c d e b f a G’: g c d e b f Graphs I CSC311: Data Structures

Strongly Connected Components
Maximal subgraphs such that each vertex can reach all other vertices in the subgraph Can also be done in O(n+m) time using DFS, but is more complicated (similar to biconnectivity). a d c b e f g { a , c , g } { f , d , e , b } Graphs I CSC311: Data Structures

Transitive Closure D E Given a digraph G, the transitive closure of G is the digraph G* such that G* has the same vertices as G if G has a directed path from u to v (u  v), G* has a directed edge from u to v The transitive closure provides reachability information about a digraph B G C A D E B C A G* Graphs I CSC311: Data Structures

Computing the Transitive Closure
We can perform DFS starting at each vertex O(n(n+m)) If there's a way to get from A to B and from B to C, then there's a way to get from A to C. Alternatively ... Use dynamic programming: The Floyd-Warshall Algorithm Graphs I CSC311: Data Structures

Floyd-Warshall Transitive Closure
Idea #1: Number the vertices 1, 2, …, n. Idea #2: Consider paths that use only vertices numbered 1, 2, …, k, as intermediate vertices: Uses only vertices numbered 1,…,k (add this edge if it’s not already in) i j Uses only vertices numbered 1,…,k-1 Uses only vertices numbered 1,…,k-1 k Graphs I CSC311: Data Structures

Topological Sorting Number vertices, so that (u,v) in E implies u < v 1 A typical student day wake up 2 3 eat study computer sci. 4 5 nap more c.s. 7 play 8 write c.s. program 6 9 work out make cookies for professors 10 sleep 11 dream about graphs Graphs I CSC311: Data Structures

Algorithm for Topological Sorting
Note: This algorithm is different than the one in Goodrich-Tamassia Running time: O(n + m). How…? Method TopologicalSort(G) H  G // Temporary copy of G n  G.numVertices() while H is not empty do Let v be a vertex with no outgoing edges Label v  n n  n - 1 Remove v from H Graphs I CSC311: Data Structures

Topological Sorting Algorithm using DFS
Simulate the algorithm by using depth-first search O(n+m) time. Algorithm topologicalDFS(G, v) Input graph G and a start vertex v of G Output labeling of the vertices of G in the connected component of v setLabel(v, VISITED) for all e  G.incidentEdges(v) if getLabel(e) = UNEXPLORED w  opposite(v,e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) topologicalDFS(G, w) else {e is a forward or cross edge} Label v with topological number n n  n - 1 Algorithm topologicalDFS(G) Input dag G Output topological ordering of G n  G.numVertices() for all u  G.vertices() setLabel(u, UNEXPLORED) for all e  G.edges() setLabel(e, UNEXPLORED) for all v  G.vertices() if getLabel(v) = UNEXPLORED topologicalDFS(G, v) Graphs I CSC311: Data Structures

Topological Sorting Example

9 Graphs I CSC311: Data Structures

8 9 Graphs I CSC311: Data Structures

7 8 9 Graphs I CSC311: Data Structures

6 7 8 9 Graphs I CSC311: Data Structures

6 5 7 8 9 Graphs I CSC311: Data Structures

4 6 5 7 8 9 Graphs I CSC311: Data Structures

3 4 6 5 7 8 9 Graphs I CSC311: Data Structures

2 3 4 6 5 7 8 9 Graphs I CSC311: Data Structures

2 1 3 4 6 5 7 8 9 Graphs I CSC311: Data Structures

Chapter 13: Graphs II Objectives: Graph ADT: Operations
Graph Implementation: Data structures Graph Traversals: DFS and BFS Directed graphs Weighted graphs Shortest paths Minimum Spanning Trees (MST) CSC311: Data Structures

Weighted Graphs PVD ORD SFO LGA HNL LAX DFW MIA
In a weighted graph, each edge has an associated numerical value, called the weight of the edge Edge weights may represent distances, costs, etc. Example: In a flight route graph, the weight of an edge represents the distance in miles between the endpoint airports 849 PVD 1843 ORD 142 SFO 802 LGA 1743 1205 337 1387 HNL 2555 1099 LAX 1233 DFW 1120 MIA Graphs II CSC311: Data Structures

Shortest Paths PVD ORD SFO LGA HNL LAX DFW MIA
Given a weighted graph and two vertices u and v, we want to find a path of minimum total weight between u and v. Length of a path is the sum of the weights of its edges. Example: Shortest path between Providence and Honolulu Applications Internet packet routing Flight reservations Driving directions 849 PVD 1843 ORD 142 SFO 802 LGA 1743 1205 337 1387 HNL 2555 1099 LAX 1233 DFW 1120 MIA Graphs II CSC311: Data Structures

Shortest Path Properties
Property 1: A subpath of a shortest path is itself a shortest path Property 2: There is a tree of shortest paths from a start vertex to all the other vertices Example: Tree of shortest paths from Providence 849 PVD 1843 ORD 142 SFO 802 LGA 1743 1205 337 1387 HNL 2555 1099 LAX 1233 DFW 1120 MIA Graphs II CSC311: Data Structures

Dijkstra’s Algorithm We grow a “cloud” of vertices, beginning with s and eventually covering all the vertices We store with each vertex v a label d(v) representing the distance of v from s in the subgraph consisting of the cloud and its adjacent vertices At each step We add to the cloud the vertex u outside the cloud with the smallest distance label, d(u) We update the labels of the vertices adjacent to u The distance of a vertex v from a vertex s is the length of a shortest path between s and v Dijkstra’s algorithm computes the distances of all the vertices from a given start vertex s Assumptions: the graph is connected the edges are undirected the edge weights are nonnegative Graphs II CSC311: Data Structures

Edge Relaxation Consider an edge e = (u,z) such that d(z) = 75
u is the vertex most recently added to the cloud z is not in the cloud The relaxation of edge e updates distance d(z) as follows: d(z)  min{d(z),d(u) + weight(e)} d(u) = 50 10 d(z) = 75 e u s z d(u) = 50 10 d(z) = 60 u e s z Graphs II CSC311: Data Structures

Example C B A E D F 3 2 8 5 4 7 1 9 A 4 8 2 8 2 4 7 1 B C D 3 9   2 5 E F A 4 A 4 8 8 2 2 8 2 3 7 2 3 7 1 7 1 B C D B C D 3 9 3 9 5 11 5 8 2 5 2 5 E F E F Graphs II CSC311: Data Structures

Example (cont.) A 4 8 2 7 2 3 7 1 B C D 3 9 5 8 2 5 E F A 4 8 2 7 2 3
A 4 8 2 7 2 3 7 1 B C D 3 9 5 8 2 5 E F A 4 8 2 7 2 3 7 1 B C D 3 9 5 8 2 5 E F Graphs II CSC311: Data Structures

Dijkstra’s Algorithm Algorithm DijkstraDistances(G, s) Q  new heap-based priority queue for all v  G.vertices() if v = s setDistance(v, 0) else setDistance(v, ) l  Q.insert(getDistance(v), v) setLocator(v,l) while Q.isEmpty() u  Q.removeMin() for all e  G.incidentEdges(u) { relax edge e } z  G.opposite(u,e) r  getDistance(u) + weight(e) if r < getDistance(z) setDistance(z,r) Q.replaceKey(getLocator(z),r) A priority queue stores the vertices outside the cloud Key: distance Element: vertex Locator-based methods insert(k,e) returns a locator replaceKey(l,k) changes the key of an item We store two labels with each vertex: Distance (d(v) label) locator in priority queue Graphs II CSC311: Data Structures

Analysis of Dijkstra’s Algorithm
Graph operations Method incidentEdges is called once for each vertex Label operations We set/get the distance and locator labels of vertex z O(deg(z)) times Setting/getting a label takes O(1) time Priority queue operations Each vertex is inserted once into and removed once from the priority queue, where each insertion or removal takes O(log n) time The key of a vertex in the priority queue is modified at most deg(w) times, where each key change takes O(log n) time Dijkstra’s algorithm runs in O((n + m) log n) time provided the graph is represented by the adjacency list structure Recall that Sv deg(v) = 2m The running time can also be expressed as O(m log n) since the graph is connected Graphs II CSC311: Data Structures

Shortest Paths Tree Using the template method pattern, we can extend Dijkstra’s algorithm to return a tree of shortest paths from the start vertex to all other vertices We store with each vertex a third label: parent edge in the shortest path tree In the edge relaxation step, we update the parent label Algorithm DijkstraShortestPathsTree(G, s) … for all v  G.vertices() setParent(v, ) for all e  G.incidentEdges(u) { relax edge e } z  G.opposite(u,e) r  getDistance(u) + weight(e) if r < getDistance(z) setDistance(z,r) setParent(z,e) Q.replaceKey(getLocator(z),r) Graphs II CSC311: Data Structures

Why Dijkstra’s Algorithm Works
Dijkstra’s algorithm is based on the greedy method. It adds vertices by increasing distance. Suppose it didn’t find all shortest distances. Let F be the first wrong vertex the algorithm processed. When the previous node, D, on the true shortest path was considered, its distance was correct. But the edge (D,F) was relaxed at that time! Thus, so long as d(F)>d(D), F’s distance cannot be wrong. That is, there is no wrong vertex. A 8 4 2 7 2 3 7 1 B C D 3 9 5 8 2 5 E F Graphs II CSC311: Data Structures

Why It Doesn’t Work for Negative-Weight Edges
Dijkstra’s algorithm is based on the greedy method. It adds vertices by increasing distance. If a node with a negative incident edge were to be added late to the cloud, it could mess up distances for vertices already in the cloud. A 4 8 6 7 5 4 7 1 B C D -8 5 9 2 5 E F C’s true distance is 1, but it is already in the cloud with d(C)=5! Graphs II CSC311: Data Structures

Bellman-Ford Algorithm
Works even with negative-weight edges Must assume directed edges (for otherwise we would have negative-weight cycles) Iteration i finds all shortest paths that use i edges. Running time: O(nm). Can be extended to detect a negative-weight cycle if it exists How? Algorithm BellmanFord(G, s) for all v  G.vertices() if v = s setDistance(v, 0) else setDistance(v, ) for i  1 to n-1 do for each e  G.edges() { relax edge e } u  G.origin(e) z  G.opposite(u,e) r  getDistance(u) + weight(e) if r < getDistance(z) setDistance(z,r) Graphs II CSC311: Data Structures

Nodes are labeled with their d(v) values
Bellman-Ford Example Nodes are labeled with their d(v) values 4 4 8 8 -2 -2 8 -2 4 7 1 7 1       3 9 3 9 -2 5 -2 5     4 4 8 8 -2 -2 5 7 1 -1 7 1 8 -2 4 5 -2 -1 1 3 9 3 9 6 9 4 -2 5 -2 5   1 9 Graphs II CSC311: Data Structures

DAG-based Algorithm Works even with negative-weight edges
Uses topological order Doesn’t use any fancy data structures Is much faster than Dijkstra’s algorithm Running time: O(n+m). Algorithm DagDistances(G, s) for all v  G.vertices() if v = s setDistance(v, 0) else setDistance(v, ) Perform a topological sort of the vertices for u  1 to n do {in topological order} for each e  G.outEdges(u) { relax edge e } z  G.opposite(u,e) r  getDistance(u) + weight(e) if r < getDistance(z) setDistance(z,r) Graphs II CSC311: Data Structures

Nodes are labeled with their d(v) values
DAG Example Nodes are labeled with their d(v) values 1 1 4 4 8 8 -2 -2 8 -2 4 3 7 2 1 4 3 7 2 1 4       3 9 3 9 -5 5 -5 5     6 5 6 5 1 1 4 4 8 8 -2 -2 5 3 7 2 1 4 -1 3 7 2 1 4 8 -2 4 5 -2 -1 9 3 3 9 1 7 4 -5 5 -5 5   1 7 6 5 6 5 (two steps) Graphs II CSC311: Data Structures

Minimum Spanning Trees
Spanning subgraph Subgraph of a graph G containing all the vertices of G Spanning tree Spanning subgraph that is itself a (free) tree Minimum spanning tree (MST) Spanning tree of a weighted graph with minimum total edge weight Applications Communications networks Transportation networks ORD 10 1 PIT DEN 6 7 9 3 DCA STL 4 8 5 2 DFW ATL Graphs II CSC311: Data Structures

Replacing f with e yields a better spanning tree
Cycle Property 8 4 2 3 6 7 9 e C f Cycle Property: Let T be a minimum spanning tree of a weighted graph G Let e be an edge of G that is not in T and C let be the cycle formed by e with T For every edge f of C, weight(f)  weight(e) Proof: By contradiction If weight(f) > weight(e) we can get a spanning tree of smaller weight by replacing e with f Replacing f with e yields a better spanning tree 8 4 2 3 6 7 9 C e f Graphs II CSC311: Data Structures

Replacing f with e yields another MST
Partition Property Partition Property: Consider a partition of the vertices of G into subsets U and V Let e be an edge of minimum weight across the partition There is a minimum spanning tree of G containing edge e Proof: Let T be an MST of G If T does not contain e, consider the cycle C formed by e with T and let f be an edge of C across the partition By the cycle property, weight(f)  weight(e) Thus, weight(f) = weight(e) We obtain another MST by replacing f with e U V 7 f 4 9 5 2 8 8 3 e 7 Replacing f with e yields another MST U V 7 f 4 9 5 2 8 8 3 e 7 Graphs II CSC311: Data Structures

Kruskal’s Algorithm A priority queue stores the edges outside the cloud Key: weight Element: edge At the end of the algorithm We are left with one cloud that encompasses the MST A tree T which is our MST Algorithm KruskalMST(G) for each vertex V in G do define a Cloud(v) of  {v} let Q be a priority queue. Insert all edges into Q using their weights as the key T   while T has fewer than n-1 edges do edge e = T.removeMin() Let u, v be the endpoints of e if Cloud(v)  Cloud(u) then Add edge e to T Merge Cloud(v) and Cloud(u) return T Graphs II CSC311: Data Structures

Data Structure for Kruskal Algortihm
The algorithm maintains a forest of trees An edge is accepted it if connects distinct trees We need a data structure that maintains a partition, i.e., a collection of disjoint sets, with the operations: -find(u): return the set storing u -union(u,v): replace the sets storing u and v with their union Graphs II CSC311: Data Structures

Representation of a Partition
Each set is stored in a sequence Each element has a reference back to the set operation find(u) takes O(1) time, and returns the set of which u is a member. in operation union(u,v), we move the elements of the smaller set to the sequence of the larger set and update their references the time for operation union(u,v) is min(nu,nv), where nu and nv are the sizes of the sets storing u and v Whenever an element is processed, it goes into a set of size at least double, hence each element is processed at most log n times Graphs II CSC311: Data Structures

Partition-Based Implementation
A partition-based version of Kruskal’s Algorithm performs cloud merges as unions and tests as finds. Algorithm Kruskal(G): Input: A weighted graph G. Output: An MST T for G. Let P be a partition of the vertices of G, where each vertex forms a separate set. Let Q be a priority queue storing the edges of G, sorted by their weights Let T be an initially-empty tree while Q is not empty do (u,v)  Q.removeMinElement() if P.find(u) != P.find(v) then Add (u,v) to T P.union(u,v) return T Running time: O((n+m)log n) Graphs II CSC311: Data Structures

Kruskal Example 2704 BOS 867 849 PVD ORD 187 740 144 1846 JFK 621 184
1258 802 SFO BWI 1391 1464 337 1090 DFW 946 LAX 1235 1121 MIA 2342 Graphs II CSC311: Data Structures

Example Graphs II CSC311: Data Structures

Example JFK BOS MIA ORD LAX DFW SFO BWI PVD 867 2704 187 1258 849 144
740 1391 184 946 1090 1121 2342 1846 621 802 1464 1235 337 Graphs II CSC311: Data Structures

Prim-Jarnik’s Algorithm
Similar to Dijkstra’s algorithm (for a connected graph) We pick an arbitrary vertex s and we grow the MST as a cloud of vertices, starting from s We store with each vertex v a label d(v) = the smallest weight of an edge connecting v to a vertex in the cloud At each step: We add to the cloud the vertex u outside the cloud with the smallest distance label We update the labels of the vertices adjacent to u Graphs II CSC311: Data Structures

Prim-Jarnik’s Algorithm
Algorithm PrimJarnikMST(G) Q  new heap-based priority queue s  a vertex of G for all v  G.vertices() if v = s setDistance(v, 0) else setDistance(v, ) setParent(v, ) l  Q.insert(getDistance(v), v) setLocator(v,l) while Q.isEmpty() u  Q.removeMin() for all e  G.incidentEdges(u) z  G.opposite(u,e) r  weight(e) if r < getDistance(z) setDistance(z,r) setParent(z,e) Q.replaceKey(getLocator(z),r) A priority queue stores the vertices outside the cloud Key: distance Element: vertex Locator-based methods insert(k,e) returns a locator replaceKey(l,k) changes the key of an item We store three labels with each vertex: Distance Parent edge in MST Locator in priority queue Graphs II CSC311: Data Structures

Example  7 7 D 7 D 2 2 B 4 B 4 8 9  5 9  5 5 2 F 2 F C C 8 8 3 3 8 8 E E A 7 A 7 7 7 7 7 7 D 2 7 D 2 B 4 B 4 5 9  5 5 9 4 2 F 5 C 2 F 8 C 8 3 8 3 8 E A E 7 7 A 7 7 Graphs II CSC311: Data Structures

Example (contd.) 7 7 D 2 B 4 9 4 5 5 2 F C 8 3 8 E A 3 7 7 7 D 2 B 4 5
7 D 2 B 4 5 9 4 5 2 F C 8 3 8 E A 3 7 Graphs II CSC311: Data Structures

Analysis Graph operations Label operations Priority queue operations
Method incidentEdges is called once for each vertex Label operations We set/get the distance, parent and locator labels of vertex z O(deg(z)) times Setting/getting a label takes O(1) time Priority queue operations Each vertex is inserted once into and removed once from the priority queue, where each insertion or removal takes O(log n) time The key of a vertex w in the priority queue is modified at most deg(w) times, where each key change takes O(log n) time Prim-Jarnik’s algorithm runs in O((n + m) log n) time provided the graph is represented by the adjacency list structure Recall that Sv deg(v) = 2m The running time is O(m log n) since the graph is connected Graphs II CSC311: Data Structures

CSC311: Data Structures Java Review

Similar presentations

Presentation on theme: "CSC311: Data Structures Java Review"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CSC311: Data Structures Java Review

Similar presentations

Presentation on theme: "CSC311: Data Structures Java Review"— Presentation transcript:

Similar presentations

About project

Feedback