# Data Structures and Algorithms

Course’s slides: Abstract data types, Stacks, Queues, Lists, Heaps, Binary search, Multiplication

Abstract data type A theoretical description of an algorithm, once realized in an application, is strongly affected by computer resources, implementation, and data. Such a theory includes these fundamental concepts: the Abstract Data Type (ADT), also called data type or data structure, as a tool for expressing the operations of algorithms; the computational resources needed to implement the algorithm and test its functionality; and the evaluation of the complexity of algorithms.

What is a Data Type?

- A name for the data type, e.g., “int” for the INTEGER data type.
- A collection of (possible) data items, e.g., integers can have values in the range of $-2^{31}$ to $2^{31} - 1$.
- An associated set of operations on those data items, e.g., arithmetic operations like +, -, *, /, etc.

Abstract data type An abstract data type (ADT) is defined as a mathematical model of the data objects that make up a data type, as well as the functions that operate on these objects (and logical or other relations between objects). An ADT consists of two parts: data objects and operations on those data objects. The term data type refers to the implementation of the mathematical model specified by an ADT. The term data structure refers to a collection of computer variables that are connected in some specific manner. The notion of data type includes basic data types. Basic data types are tied to a programming language.

Example: Integer Set Data Type
ABSTRACT (Theoretical) DATA TYPE: the mathematical set of integers, I.

- Possible data items: -inf, …, -1, 0, 1, …, +inf
- Operations: +, -, *, mod, div, etc.

Actual, Implemented Data Type (available in C++): it’s called “int”.

- Only has a range of $-2^{31}$ to $2^{31} - 1$ for the possible data items (instead of -inf to +inf)
- Has the same arithmetic operations available

What’s the relationship/difference between the ADT and the Actual, Implemented Data Type in C++? The range of possible data items is different.
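The difference in range has a practical consequence: an addition that is always defined for the mathematical integers can leave the range of the implemented type. The sketch below is only an illustration; it assumes a 32-bit integer (written explicitly as `int32_t`), and the function name `add_would_overflow` is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when a + b would fall outside the int32_t range
 * [-2^31, 2^31 - 1], and 0 when the sum is representable.
 * The check itself never performs the overflowing addition. */
int add_would_overflow(int32_t a, int32_t b) {
    if (b > 0 && a > INT32_MAX - b) return 1;   /* sum would exceed 2^31 - 1 */
    if (b < 0 && a < INT32_MIN - b) return 1;   /* sum would drop below -2^31 */
    return 0;
}
```

For the abstract type I the question never arises, because every sum of two integers is again an integer.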

The THREE Essentials…

- ABSTRACT (Theoretical) DATA TYPE: e.g., the mathematical class I in our example.
- Actual IMPLEMENTED DATA TYPE: what you have in C++; for example, “int” in our example.
- INSTANTIATED DATA TYPE – DATA STRUCTURE: e.g., x in int x = 5; (in our example). It stores (or structures) the data item(s). It can be a variable, array, object, etc.; it holds the actual data (e.g., a specific value).

[Diagram labels: ADT – Class DECLARATION (lib.h); Class DEFINITION (lib.cpp); Object (project.cpp)]

Implementation of an ADT
The data structures used in implementations are EITHER already provided in a programming language (primitive or built-in) or are built from the language constructs (user-defined). In either case, successful software design uses data abstraction: Separating the declaration of a data type from its implementation.

Summary of ADT (Abstract or Actual) Data Types have three properties:

- Name
- Possible Data Items
- Operations on those data items

The Data Type declaration goes in the .h (header) file, e.g., the class declaration. The Data Type definitions go in the .cpp (implementation) file, e.g., the class definition.

Stacks Stacks are a special form of collection with LIFO semantics, like a plate stacker.

Two methods:

- int push( Stack s, void *item ); adds item to the top of the stack
- void *pop( Stack s ); removes an item from the top of the stack

Other methods:

- int IsEmpty( Stack s ); /* Return TRUE if empty */
- void *Top( Stack s ); /* Return the item at the top, without deleting it */

Stacks This ADT covers a set of objects as well as the operations performed on these objects:

- Initialize (S): creates the structured space in computer memory needed to hold the objects of S;
- Push (x): inserts x into S;
- Pop: deletes the object that was most recently inserted into the stack;
- Top: returns the object that was most recently inserted into the stack, without deleting it;
- Kill (S): releases the memory occupied by S.

Operations on stack objects obey the LIFO property: Last-In-First-Out. This is a logical constraint, or logical condition. The operations Initialize and Kill are oriented more toward an implementation of this ADT, but they are important in some algorithms and applications too. The stack is a dynamic data set with limited access to its objects.

Stacks - Implementation
Arrays

- Provide a stack capacity to the constructor
- Flexibility is limited, but this matches many real uses
- Capacity is limited by some constraint: memory in your computer, size of the plate stacker, etc.

push, pop methods

- Variants of AddToC…, DeleteFromC…

Linked list also possible

Stack: basically a Collection with special semantics!

Array Stack Implementation
We can use an array of elements as a stack. The top is the index of the next available element in the array. [Figure: an integer top and an array stack of type T[ ]; the occupied slots reference objects of type T and the unused slots are null.]
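A minimal C sketch of this array layout follows, in which `top` is the index of the next free slot. The type and function names (`ArrayStack`, `stack_create`, and so on) are hypothetical and not taken from the slides, and the capacity is assumed to be greater than zero.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void **items;    /* array of item pointers */
    int top;         /* index of the next available slot (= number of items) */
    int capacity;    /* fixed when the stack is created */
} ArrayStack;

/* Allocates a stack with room for `capacity` items; NULL on failure. */
ArrayStack *stack_create(int capacity) {
    ArrayStack *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->items = malloc((size_t)capacity * sizeof *s->items);
    if (s->items == NULL) { free(s); return NULL; }
    s->top = 0;
    s->capacity = capacity;
    return s;
}

int stack_is_empty(const ArrayStack *s) { return s->top == 0; }

/* Returns 1 on success, 0 when the stack is already at capacity. */
int stack_push(ArrayStack *s, void *item) {
    if (s->top == s->capacity) return 0;
    s->items[s->top++] = item;
    return 1;
}

/* Removes and returns the most recently pushed item, or NULL if empty. */
void *stack_pop(ArrayStack *s) {
    if (s->top == 0) return NULL;
    return s->items[--s->top];
}

/* Returns the most recently pushed item without removing it. */
void *stack_top(const ArrayStack *s) {
    if (s->top == 0) return NULL;
    return s->items[s->top - 1];
}

void stack_destroy(ArrayStack *s) {
    free(s->items);
    free(s);
}
```

Because `top` doubles as the element count, push and pop are each a single array access plus an increment or decrement.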

We can use the same LinearNode class that we used for the LinkedSet implementation. We change the attribute name to “top” to have a meaning consistent with a stack. [Figure: an integer count and a LinearNode reference top; each LinearNode holds a next reference and an element of type T, and the next reference of the last node is null.]
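The linked variant can be sketched in C along the same lines. The slide describes a class with `top` and `count` attributes; the struct layout and the function names below (`lstack_init`, `lstack_push`, `lstack_pop`) are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct LinearNode {
    void *element;              /* the stored item */
    struct LinearNode *next;    /* node below this one, or NULL */
} LinearNode;

typedef struct {
    LinearNode *top;   /* most recently pushed node, NULL when empty */
    int count;         /* number of nodes in the stack */
} LinkedStack;

void lstack_init(LinkedStack *s) {
    s->top = NULL;
    s->count = 0;
}

/* Pushes at the head of the list; returns 0 if allocation fails. */
int lstack_push(LinkedStack *s, void *element) {
    LinearNode *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->element = element;
    n->next = s->top;
    s->top = n;
    s->count++;
    return 1;
}

/* Pops from the head of the list; returns NULL when the stack is empty. */
void *lstack_pop(LinkedStack *s) {
    LinearNode *n = s->top;
    if (n == NULL) return NULL;
    void *element = n->element;
    s->top = n->next;
    free(n);
    s->count--;
    return element;
}
```

Unlike the array version, this stack has no fixed capacity; it grows until memory allocation fails.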

The N-Queens Problem Suppose you have 8 chess queens...
...and a chess board We'll start with a description of a problem which involves a bunch of queens from a chess game, and a chess board.

The N-Queens Problem Can the queens be placed on the board so that no two queens are attacking each other ? Some of you may have seen this problem before. The goal is to place all the queens on the board so that none of the queens are attacking each other.

The N-Queens Problem Two queens are not allowed in the same row...
If you play chess, then you know that this forbids two queens from being in the same row...

The N-Queens Problem Two queens are not allowed in the same row, or in the same column... ...or in the same column...

The N-Queens Problem Two queens are not allowed in the same row, or in the same column, or along the same diagonal. As a quick survey, how many of you think that a solution will be possible? In any case, we shall find out, because we will write a program to try to find a solution. As an aside, if the program does discover a solution, we can easily check that the solution is correct. But suppose the program tells us that there is no solution. In that case, there are actually two possibilities to keep in mind: 1. Maybe the problem has no solution. 2. Maybe the problem does have a solution, and the program has a bug! Moral of the story: Always create an independent test to increase the confidence in the correctness of your programs.

The N-Queens Problem The number of queens, and the size of the board, can vary: N queens on a board with N rows and N columns. The program that we write will actually permit a varying number of queens. The number of queens must always equal the size of the chess board. For example, if I have six queens, then the board will be a six by six chess board.

The N-Queens Problem We will write a program which tries to find a way to place N queens on an N x N chess board. At this point, I can give you a demonstration of the program at work. The demonstration uses graphics to display the progress of the program as it searches for a solution. During the demonstration, a student can provide the value of N. With N less than 4, the program is rather boring. But N=4 provides some interest. N=10 takes a few minutes, but it is interesting to watch and the students can try to figure out the algorithm used by the program.

How the program works The program uses a stack to keep track of where each queen is placed. I want to show you the algorithm that the program uses. The technique is called backtracking. The key feature is that a stack is used to keep track of each placement of a queen.

How the program works Each time the program decides to place a queen on the board, the position of the new queen is stored in a record which is placed in the stack. For example, when we place the first queen in the first column of the first row, we record this placement by pushing a record onto the stack. This record contains both the row and column number of the newly placed queen. [Stack: ROW 1, COL 1]

How the program works We also have an integer variable to keep track of how many rows have been filled so far. In addition to the stack, we also keep track of one other item: an integer which tells us how many rows currently have a queen placed. [Stack: ROW 1, COL 1. filled = 1]

How the program works Each time we try to place a new queen in the next row, we start by placing the queen in the first column... When we successfully place a queen in one row, we move to the next row. We always start by trying to place the queen in the first column of the new row. [Stack, top first: ROW 2, COL 1; ROW 1, COL 1. filled = 1]

How the program works ...if there is a conflict with another queen, then we shift the new queen to the next column. But each new placement must be checked for potential conflicts with the previous queen. If there is a conflict, then the newly placed queen is shifted rightward. [Stack, top first: ROW 2, COL 2; ROW 1, COL 1. filled = 1]

How the program works If another conflict occurs, the queen is shifted rightward again. Sometimes another conflict will occur, and the newly placed queen must continue shifting rightward. [Stack, top first: ROW 2, COL 3; ROW 1, COL 1. filled = 1]

How the program works When there are no conflicts, we stop and add one to the value of filled. When the new queen reaches a spot with no conflicts, then the algorithm can move on. In order to move on, we add one to the value of filled... [Stack, top first: ROW 2, COL 3; ROW 1, COL 1. filled = 2]

How the program works Let's look at the third row. The first position we try has a conflict... ...and place a new queen in the first column of the next row. [Stack, top first: ROW 3, COL 1; ROW 2, COL 3; ROW 1, COL 1. filled = 2]

How the program works ...so we shift to column 2. But another conflict arises... In this example, there is a conflict with the placement of the new queen, so we move her rightward to the second column. [Stack, top first: ROW 3, COL 2; ROW 2, COL 3; ROW 1, COL 1. filled = 2]

How the program works ...and we shift to the third column. Yet another conflict arises... Another conflict arises, so we move rightward to the third column. [Stack, top first: ROW 3, COL 3; ROW 2, COL 3; ROW 1, COL 1. filled = 2]

How the program works ...and we shift to column 4. There's still a conflict in column 4, so we try to shift rightward again... Yet another conflict arises, so we move to the fourth column. The key idea is that each time we try a particular location for the new queen, we need to check whether the new location causes conflicts with our previous queens. If so, then we move the new queen to the next possible location. [Stack, top first: ROW 3, COL 4; ROW 2, COL 3; ROW 1, COL 1. filled = 2]

How the program works ...but there's nowhere else to go. Sometimes we run out of possible locations for the new queens. This is where backtracking comes into play. [Stack, top first: ROW 3, COL 4; ROW 2, COL 3; ROW 1, COL 1. filled = 2]

How the program works When we run out of room in a row: pop the stack, reduce filled by 1 and continue working on the previous row. To backtrack, we throw out the new queen altogether, popping the stack, reducing filled by 1, and returning to the previous row. At the previous row, we continue shifting the queen rightward. [Stack, top first: ROW 2, COL 3; ROW 1, COL 1. filled = 1]

How the program works Now we continue working on row 2, shifting the queen to the right. Notice that we continue the previous row from the spot where we left off. The queen shifts from column 3 to column 4. We don't return her back to column 1. It is the use of the stack that lets us easily continue where we left off. The position of this previous queen is recorded in the stack, so we can just move the queen rightward one more position. [Stack, top first: ROW 2, COL 4; ROW 1, COL 1. filled = 1]

How the program works This position has no conflicts, so we can increase filled by 1, and move to row 3. The new position for row 2 has no conflicts, so we can increase filled by 1, and move again to row 3. [Stack, top first: ROW 2, COL 4; ROW 1, COL 1. filled = 2]

How the program works In row 3, we start again at the first column. At the new row, we again start at the first column. So the general rules are: When the algorithm moves forward, it always starts with the first column. But when the algorithm backtracks, it continues wherever it left off. [Stack, top first: ROW 3, COL 1; ROW 2, COL 4; ROW 1, COL 1. filled = 2]

Pseudocode for N-Queens

- Initialize a stack where we can keep track of our decisions.
- Place the first queen, pushing its position onto the stack and setting filled to 0.
- Repeat these steps:
  - if there are no conflicts with the queens...
  - else if there is a conflict and there is room to shift the current queen rightward...
  - else if there is a conflict and there is no room to shift the current queen rightward...

Here’s the pseudocode for implementing the backtrack algorithm. The stack is initialized as an empty stack, and then we place the first queen. After the initialization, we enter a loop with three possible actions at each iteration. We'll look at each action in detail...

Pseudocode for N-Queens

- Repeat these steps:
  - if there are no conflicts with the queens... Increase filled by 1. If filled is now N, then the algorithm is done. Otherwise, move to the next row and place a queen in the first column.

The nicest possibility is when none of the queens have any conflicts. In this case, we can increase filled by 1. If filled is now N, then we are done! But if filled is still less than N, then we can move to the next row and place a queen in the first column. When this new queen is placed, we'll record its position in the stack. Another aside: How do you suppose the program "checks for conflicts"? Hint: It helps if the stack is implemented in a way that permits the program to peek inside and see all of the recorded positions. This "peek inside" operation is often implemented with a stack, although the ability to actually change entries is limited to the usual pushing and popping.

Pseudocode for N-Queens

- Repeat these steps:
  - if there are no conflicts with the queens...
  - else if there is a conflict and there is room to shift the current queen rightward... Move the current queen rightward, adjusting the record on top of the stack to indicate the new position.

The second possibility is that a conflict arises, and the new queen has room to move rightward. In this case, we just move the new queen to the right.

Pseudocode for N-Queens

- Repeat these steps:
  - if there are no conflicts with the queens...
  - else if there is a conflict and there is room to shift the current queen rightward...
  - else if there is a conflict and there is no room to shift the current queen rightward... Backtrack! Keep popping the stack, and reducing filled by 1, until you reach a row where the queen can be shifted rightward. Shift this queen right.

The last possibility is that a conflict exists, but the new queen has run out of room. In this case we backtrack: Pop the stack, Reduce filled by 1. We must keep doing these two steps until we find a row where the queen can be shifted rightward. In other words, until we find a row where the queen is not already at the end. At that point, we shift the queen rightward, and continue the loop. But there is one potential pitfall here!

Pseudocode for N-Queens

- Repeat these steps:
  - if there are no conflicts with the queens...
  - else if there is a conflict and there is room to shift the current queen rightward...
  - else if there is a conflict and there is no room to shift the current queen rightward... Backtrack! Keep popping the stack, and reducing filled by 1, until you reach a row where the queen can be shifted rightward. Shift this queen right.

The potential pitfall: Maybe the stack becomes empty during this popping. What would that indicate? Answer: It means that we backtracked right back to the beginning, and ran out of possible places to place the first queen. In that case, the problem has no solution.
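One possible C rendering of the three-branch loop is sketched below. It is not the program demonstrated in the lecture: the names `Position`, `top_has_conflict`, `solve_n_queens` and the bound `MAX_N` are assumptions for this example, and the stack is a plain array so that the conflict check can "peek inside" at every recorded position.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_N 32   /* arbitrary upper bound chosen for this sketch */

typedef struct {
    int row;   /* 1-based row of a placed queen */
    int col;   /* 1-based column of a placed queen */
} Position;

/* Returns 1 if the queen on top of the stack attacks any queen below it. */
static int top_has_conflict(const Position stack[], int size) {
    const Position *q = &stack[size - 1];
    for (int i = 0; i < size - 1; i++) {
        if (stack[i].col == q->col) return 1;              /* same column */
        if (abs(stack[i].row - q->row) == abs(stack[i].col - q->col))
            return 1;                                      /* same diagonal */
    }
    return 0;   /* rows always differ: one queen per row */
}

/* Tries to place n queens on an n x n board.
 * On success returns 1 and stores in cols[r] the 1-based column of the
 * queen in row r + 1; returns 0 if there is no solution (or n is out of
 * the range supported by this sketch). */
int solve_n_queens(int n, int cols[]) {
    Position stack[MAX_N];
    int size = 0;     /* number of records on the stack */
    int filled = 0;   /* rows whose queen has been found conflict-free */

    if (n < 1 || n > MAX_N) return 0;

    /* Place the first queen: push ROW 1, COL 1. */
    stack[size].row = 1;
    stack[size].col = 1;
    size++;

    for (;;) {
        if (!top_has_conflict(stack, size)) {
            /* No conflict: one more row is filled. */
            filled++;
            if (filled == n) {
                for (int i = 0; i < n; i++) cols[i] = stack[i].col;
                return 1;
            }
            /* Move to the next row, starting in the first column. */
            stack[size].row = filled + 1;
            stack[size].col = 1;
            size++;
        } else if (stack[size - 1].col < n) {
            /* Conflict, but there is room: shift the queen rightward. */
            stack[size - 1].col++;
        } else {
            /* Conflict and no room: backtrack by popping until some
             * queen can still be shifted rightward. */
            do {
                size--;
                filled--;
            } while (size > 0 && stack[size - 1].col == n);
            if (size == 0) return 0;   /* stack emptied: no solution */
            stack[size - 1].col++;
        }
    }
}
```

A caller would pass an array with room for n columns, for example `int cols[8]; solve_n_queens(8, cols);`. The empty-stack check inside the last branch is the pitfall discussed above.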

Stacks - Relevance Stacks appear in computer programs.

- The stack frame is key to call / return in functions and procedures
- The stack frame allows recursive calls
- Call: push stack frame
- Return: pop stack frame

A stack frame holds:

- Function arguments
- Return address
- Local variables

Summary Stacks have many applications. The application which we have shown is called backtracking. The key to backtracking: Each choice is recorded in a stack. When you run out of choices for the current decision, you pop the stack, and continue trying different choices for the previous decision.

Stacks and Queues

- Array Stack Implementation
- Linked Stack Implementation
- Queue Abstract Data Type (ADT)
- Queue ADT Interface
- Queue Design Considerations

Queue Abstract Data Type
A queue is a linear collection where the elements are added to one end and removed from the other end. The processing is first in, first out (FIFO): the first element put on the queue is the first element removed from the queue. Think of a line of people waiting for a bus (the British call that “queuing up”).

A Conceptual View of a Queue
[Figure: elements are added at the rear of the queue (the tail) and removed from the front of the queue (the head).]

Queue Terminology We enqueue an element on a queue to add one. We dequeue an element off a queue to remove one. We can also examine the first element without removing it. We can determine if a queue is empty or not and how many elements it contains (its size). The L&C QueueADT interface supports the above operations and some typical class operations such as toString().

Queue Design Considerations
Although a queue can be empty, there is no concept of it being full, so an implementation must be designed to manage storage space. For the first and dequeue operations on an empty queue, this implementation will throw an exception. Other implementations could return a null value that is equivalent to “nothing to return”.

Queue Design Considerations
No iterator method is provided. That would be inconsistent with restricting access to the first element of the queue. If we need an iterator or other mechanism to access the elements in the middle or at the end of the collection, then a queue is not the appropriate data structure to use.

Queues This ADT covers a set of objects as well as the operations performed on these objects:

- queueinit (Q): creates the structured space in computer memory needed to hold the objects of Q;
- put (x): inserts x into Q;
- get: deletes the object that has been residing in Q the longest;
- head: returns the object that has been residing in Q the longest, without deleting it;
- kill (Q): releases the memory occupied by Q.

Operations on a queue obey the FIFO property: First-In-First-Out. This is a logical constraint, or logical condition. The queue is a dynamic data set with limited access to its objects. An application that illustrates the usage of a queue is queueing system simulation (a system with waiting lines), implemented by using the built-in pointer type.

Queue implementation Just as with stacks, queues can be implemented using arrays or lists. First, let’s consider the implementation using arrays. Define an array for storing the queue elements, and two markers: one pointing to the location of the head of the queue, the other to the first empty space following the tail. When an item is to be added to the queue, a test is made to see if the tail marker points to a valid location; if so, the item is added to the queue and the tail marker is incremented by 1. When an item is to be removed from the queue, a test is made to see if the queue is empty and, if not, the item at the location pointed to by the head marker is retrieved and the head marker is incremented by 1.

Queue implementation This procedure works well until the first time when the tail marker reaches the end of the array. If some removals have occurred during this time, there will be empty space at the beginning of the array. However, because the tail marker points to the end of the array, the queue is thought to be 'full' and no more data can be added. We could shift the data so that the head of the queue returns to the beginning of the array each time this happens, but shifting data is costly in terms of computer time, especially if the data being stored in the array consist of large data objects.

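Queue implementation A remedy that avoids shifting is to treat the array as circular: when a marker reaches the end of the array it wraps around to index 0, so the space freed at the beginning is reused. We may now formalize the algorithms for dealing with queues in a circular array of QSIZE elements.

- Creating an empty queue: set Head = Tail = 0.
- Testing if a queue is empty: is Head == Tail?
- Testing if a queue is full: is (Tail + 1) mod QSIZE == Head? (One array slot is always left unused, so that a full queue can be told apart from an empty one.)
- Adding an item to a queue: if the queue is not full, add the item at location Tail and set Tail = (Tail + 1) mod QSIZE.
- Removing an item from a queue: if the queue is not empty, remove the item from location Head and set Head = (Head + 1) mod QSIZE.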
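These rules translate almost line for line into C. The sketch below keeps the names Head, Tail and QSIZE from the text; the struct and the function names (`Queue`, `queue_put`, `queue_get`, and so on) and the small value of QSIZE are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define QSIZE 4   /* array length; the queue holds at most QSIZE - 1 items */

typedef struct {
    void *items[QSIZE];
    int head;   /* location of the item at the head of the queue */
    int tail;   /* first empty location following the tail */
} Queue;

/* Creating an empty queue: Head = Tail = 0. */
void queue_init(Queue *q) { q->head = q->tail = 0; }

/* Empty when the two markers coincide. */
int queue_is_empty(const Queue *q) { return q->head == q->tail; }

/* Full when advancing Tail would make it collide with Head. */
int queue_is_full(const Queue *q) { return (q->tail + 1) % QSIZE == q->head; }

/* Adds an item at Tail; returns 0 if the queue is full. */
int queue_put(Queue *q, void *item) {
    if (queue_is_full(q)) return 0;
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QSIZE;
    return 1;
}

/* Removes and returns the item at Head; returns NULL if the queue is empty. */
void *queue_get(Queue *q) {
    if (queue_is_empty(q)) return NULL;
    void *item = q->items[q->head];
    q->head = (q->head + 1) % QSIZE;
    return item;
}
```

With QSIZE set to 4, three items fit; after one removal a fourth item is stored at index 3 and the tail marker wraps around to 0 without any data being shifted.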

A list is one of the most fundamental data structures used to store a collection of data items. The importance of the List ADT is that it can be used to implement a wide variety of other ADTs. That is, the LIST ADT often serves as a basic building block in the construction of more complicated ADTs. A list may be defined as a dynamic ordered n-tuple: $L = (l_1, l_2, \ldots, l_n)$

Linked list The use of the term dynamic in this definition is meant to emphasize that the elements in this n-tuple may change over time. Notice that these elements have a linear order that is based upon their position in the list. The first element in the list, $l_1$, is called the head of the list. The last element, $l_n$, is referred to as the tail of the list. The number of elements in a list L is referred to as the length of the list. Thus the empty list, represented by (), has length 0. A list can be homogeneous or heterogeneous.

Linked list The List ADT supports the following operations:

0. Initialize (L). This operation is needed to allocate memory for the list and to give that memory a structure.
1. Insert (L, x, i). Inserts element x at position i of L. If this operation is successful, the boolean value true is returned; otherwise, the boolean value false is returned.
2. Append (L, x). Adds element x to the tail of L, causing the length of the list to become n+1. If this operation is successful, the boolean value true is returned; otherwise, the boolean value false is returned.
3. Retrieve (L, i). Returns the element stored at position i of L, or the null value if position i does not exist.
4. Delete (L, i). Deletes the element stored at position i of L, causing the elements after it to move up one position.
5. Length (L). Returns the length of L.
6. Reset (L). Resets the current position in L to the head (i.e., to position 1) and returns the value 1. If the list is empty, the value 0 is returned.
7. Current (L). Returns the current position in L.
8. Next (L). Increments and returns the current position in L.

Note that only the Insert, Append, Delete, Reset, and Next operations modify the lists to which they are applied. The remaining operations simply query lists in order to obtain information about them.

Linked lists Flexible space use: dynamically allocate space for each element as needed, and include a pointer to the next item. Each node of the linked list contains:

- the data item (an object pointer in our ADT)
- a pointer to the next node

[Figure: a node with Data and Next fields; Data points to the object.]

Linked lists The Collection structure has a pointer to the list head, which is initially NULL. To add the first item:

- Allocate space for the node
- Set its data pointer to the object
- Set Next to NULL
- Set Head to point to the new node

[Figure: Collection with Head pointing to a single node; the node's Data points to the object.]

Linked lists To add the second item:

- Allocate space for the node
- Set its data pointer to the object
- Set Next to the current Head
- Set Head to point to the new node

[Figure: Collection with Head pointing to the new node (Data points to object2), whose Next points to the first node (Data points to object).]

    struct t_node {
        void *item;
        struct t_node *next;   /* pointer to the next node */
    } node;

    typedef struct t_node *Node;

    struct collection {
        Node head;
        ……
    };

    int AddToCollection( Collection c, void *item ) {
        Node new = malloc( sizeof( struct t_node ) );
        new->item = item;
        new->next = c->head;   /* new node points at the old head */
        c->head = new;         /* new node becomes the head */
        return TRUE;
    }

Notes on the code above: struct t_node refers to itself through its next pointer. This is a recursive type definition, and C allows it! Error checking and asserts are omitted for clarity!

Linked lists

- Add time: constant, independent of n
- Search time: worst case n

[Figure: Collection with Head pointing to the node for object2, whose Next points to the node for object.]

Linked lists – Delete Implementation

    void *DeleteFromCollection( Collection c, void *key ) {
        Node n, prev;
        n = prev = c->head;
        while ( n != NULL ) {
            if ( KeyCmp( ItemKey( n->item ), key ) == 0 ) {
                /* unlink n; when n is the head, prev == n and nothing is unlinked */
                prev->next = n->next;
                return n;
            }
            prev = n;
            n = n->next;
        }
        return NULL;   /* key not found */
    }

Minor addition needed to allow for deleting the node at the head of the list! An exercise!

Linked lists - LIFO and FIFO

- Simplest implementation: add to head, which gives Last-In-First-Out (LIFO) semantics
- Modification for First-In-First-Out (FIFO): keep a tail pointer

tail is set in the AddToCollection method if head == NULL.

    struct t_node {
        void *item;
        struct t_node *next;
    } node;

    typedef struct t_node *Node;

    struct collection {
        Node head, tail;
    };
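A FIFO collection built on the head and tail pointers might look like the following sketch. The structure definitions follow the slide; the function names `Enqueue` and `Dequeue`, the `Collection` typedef and the error handling are assumptions added for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;

struct collection {
    Node head, tail;   /* both NULL when the collection is empty */
};
typedef struct collection *Collection;

/* Adds item after the current tail; returns 0 if allocation fails. */
int Enqueue(Collection c, void *item) {
    Node n = malloc(sizeof(struct t_node));
    if (n == NULL) return 0;
    n->item = item;
    n->next = NULL;
    if (c->head == NULL) {
        c->head = n;        /* first item: head and tail are the same node */
    } else {
        c->tail->next = n;  /* link the new node behind the old tail */
    }
    c->tail = n;
    return 1;
}

/* Removes the item at the head; returns NULL if the collection is empty. */
void *Dequeue(Collection c) {
    Node n = c->head;
    if (n == NULL) return NULL;
    void *item = n->item;
    c->head = n->next;
    if (c->head == NULL) c->tail = NULL;   /* the collection became empty */
    free(n);
    return item;
}
```

Both operations take constant time, because neither one has to walk the list to find the end.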

Dynamic set ADT The concept of a set serves as the basis for a wide variety of useful abstract data types. A large number of computer applications involve the manipulation of sets of data elements. Thus, it makes sense to investigate data structures and algorithms that support efficient implementation of various operations on sets. There is an important difference between the mathematical concept of a set and the sets considered in computer science: a set in mathematics is unchanging, while the sets in CS are considered to change over time as data elements are added or deleted. Thus, sets are referred to here as dynamic sets. In addition, we will assume that each element in a dynamic set contains an identifying field called a key, and that a total ordering relationship exists on these keys. It will be assumed that no two elements of a dynamic set contain the same key.

Dynamic set ADT The concept of a dynamic set is to be specified as the DYNAMIC SET ADT, that is, as a collection of data elements, along with the legal operations defined on these data elements. If the DYNAMIC SET ADT is implemented properly, application programmers will be able to use dynamic sets without having to understand their implementation details. The use of ADTs in this manner simplifies design and development, and promotes reusability of software components. The next slide lists the general operations for the DYNAMIC SET ADT. In each of these operations, S represents a specific dynamic set:

Dynamic set ADT

- Search(S, k). Returns the element with key k in S, or the null value if an element with key k is not in S.
- Insert(S, x). Adds element x to S. If this operation is successful, the boolean value true is returned; otherwise, the boolean value false is returned.
- Delete(S, k). Removes the element with key k in S. If this operation is successful, the boolean value true is returned; otherwise, the boolean value false is returned.
- Minimum(S). Returns the element in dynamic set S that has the smallest key value, or the null value if S is empty.
- Maximum(S). Returns the element in S that has the largest key value, or the null value if S is empty.
- Predecessor(S, k). Returns the element in S that has the largest key value less than k, or the null value if no such element exists.
- Successor(S, k). Returns the element in S that has the smallest key value greater than k, or the null value if no such element exists.

Dynamic set ADT In many instances an application will only require the use of a few DYNAMIC SET operations. Some groups of these operations are used so frequently that they are given special names: the ADT that supports Search, Insert, and Delete operations is called the DICTIONARY ADT; the STACK, QUEUE, and PRIORITY QUEUE ADTs are all special types of dynamic sets. The sections that follow describe a variety of data structures that can be used to implement either the DYNAMIC SET ADT, or ADTs that support specific subsets of the DYNAMIC SET ADT operations. Each of the data structures described will be analyzed in order to determine how efficiently it supports the implementation of these operations. In each case, the analysis will be performed in terms of n, the number of data elements stored in the dynamic set.

Generalized queue Stacks and FIFO queues identify items according to the time that they were inserted into the queue. Alternatively, the abstract concepts may be described in terms of a sequential listing of the items in order, with the basic operations being insertion and deletion of items at the beginning and the end of the list:

- if we insert at the end and delete at the end, we get a stack (precisely as in the array implementation);
- if we insert at the beginning and delete at the beginning, we also get a stack (precisely as in the linked-list implementation);
- if we insert at the end and delete at the beginning, we get a FIFO queue (precisely as in the linked-list implementation);
- if we insert at the beginning and delete at the end, we also get a FIFO queue (this option does not correspond to any of the implementations given).

Generalized queue Specifically, pushdown stacks and FIFO queues are special instances of a more general ADT: the generalized queue. Instances of generalized queues differ only in the rule used when items are removed:

- for stacks, the rule is "remove the item that was most recently inserted";
- for FIFO queues, the rule is "remove the item that was least recently inserted";
- there are many other possibilities to consider.

A powerful alternative is the random queue, which uses the rule: "remove a random item".

Generalized queue With a random queue, the algorithm can expect to get any of the items on the queue with equal probability. The operations of a random queue can be implemented:

- in constant time using an array representation (which requires reserving space ahead of time);
- using a linked-list alternative (which is less attractive, however, because implementing both insertion and deletion efficiently is a challenging task).

Random queues can be used as the basis for randomized algorithms, to avoid, with high probability, worst-case performance scenarios.
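The constant-time array representation can be sketched as follows: removal picks a random index and fills the gap with the last item. All names here (`RandomQueue`, `rq_insert`, `rq_remove`, `RQ_CAPACITY`) are hypothetical, items are plain integers for brevity, and `rand() % count` is only approximately uniform, which is a simplification of this example.

```c
#include <assert.h>
#include <stdlib.h>

#define RQ_CAPACITY 16   /* space reserved ahead of time */

typedef struct {
    int items[RQ_CAPACITY];
    int count;               /* number of items currently stored */
} RandomQueue;

void rq_init(RandomQueue *q) { q->count = 0; }

/* Inserts at the end of the array; returns 0 when the queue is full. */
int rq_insert(RandomQueue *q, int item) {
    if (q->count == RQ_CAPACITY) return 0;
    q->items[q->count++] = item;
    return 1;
}

/* Removes a randomly chosen item and stores it in *out.
 * The vacated slot is filled with the last item, so no shifting is
 * needed and the operation takes constant time.
 * Returns 0 when the queue is empty. */
int rq_remove(RandomQueue *q, int *out) {
    if (q->count == 0) return 0;
    int i = rand() % q->count;          /* approximately uniform index */
    *out = q->items[i];
    q->items[i] = q->items[--q->count];
    return 1;
}
```

The order of the remaining items changes after each removal, which is harmless here because a random queue makes no promise about order.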

Generalized queue Building on this point of view, the deque (double-ended queue) ADT may be defined, where both insertion and deletion are allowed at either end. The implementation of a deque is a good programming exercise. The priority queue ADT is another example of a generalized queue. The items in a priority queue have keys and the rule for deletion is: "remove the item with the smallest key". The priority queue ADT is useful in a variety of applications, and the problem of finding efficient implementations for this ADT has been a research goal in computer science for many years.

Heaps and the heapsort

- Heaps and priority queues
- Heap structure and position numbering
- Heap structure property
- Heap ordering property
- Removal of top priority node
- Inserting a new node into the heap
- The heap sort
- Source code for heap sort program

Heaps and priority queues
A heap is a data structure used to implement an efficient priority queue. The idea is to make it efficient to extract the element with the highest priority, that is, the next item in the queue to be processed. We could use a sorted linked list, with O(1) operations to remove the highest priority node and O(N) to insert a node. Using a tree structure makes both operations $O(\log_2 N)$, which avoids the O(N) cost of insertion.

Heap structure and position numbering 1
A heap can be visualised as a binary tree in which every layer is filled from the left. For every layer to be full, the tree would have to have a size exactly equal to $2^n - 1$, e.g. a value for size in the series 1, 3, 7, 15, 31, 63, 127, 255 etc. So to be practical enough to allow for any particular size, a heap has every layer filled except for the bottom layer, which is filled from the left.

Heap structure and position numbering 2

Heap structure and position numbering 3
In the above diagram nodes are labelled based on position, and not their contents. Also note that the left child of each node is numbered node*2 and the right child is numbered node*2+1. The parent of every node is obtained using integer division (throwing away the remainder), so that the parent of node i has position i/2. Because this numbering system makes it very easy to move between nodes and their children or parents, a heap is commonly implemented as an array with element 0 unused.

Heap Properties A heap T storing n keys has height $h = \lceil \log_2(n + 1) \rceil$, which is O(log n)

Heap ordering

Heap Insertion Insert 6

Heap Insertion Add key in next available position

Heap Insertion Begin upheap

Heap Insertion

Heap Insertion Terminate upheap when we reach the root, or when key(child) is greater than key(parent)

Removal of top priority node
The rest of these notes assume a min heap will be used. Removal of the top node creates a hole at the top which is "bubbled" downwards by moving values below it upwards, until the hole is in a position where it can be replaced with the rightmost node from the bottom layer. This process restores the heap ordering property.

Heap Removal Remove element from priority queues? removeMin( )

Heap Removal Begin downheap

Heap Removal

Heap Removal

Heap Removal Terminate downheap when we reach the leaf level, or when key(parent) is less than or equal to key(child) for each child
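Insertion with upheap and removal with downheap can be sketched in C on the array layout described earlier, with element 0 unused, the children of position i at 2i and 2i + 1, and the parent at i/2. The names (`MinHeap`, `heap_insert`, `heap_remove_min`, `HEAP_MAX`) and the use of plain integer keys are assumptions made for this example.

```c
#include <assert.h>

#define HEAP_MAX 64   /* maximum number of keys in this sketch */

typedef struct {
    int keys[HEAP_MAX + 1];   /* keys[1..size] are used; keys[0] is unused */
    int size;
} MinHeap;

void heap_init(MinHeap *h) { h->size = 0; }

/* Inserts key at the next available position and performs upheap.
 * Returns 0 if the heap is full. */
int heap_insert(MinHeap *h, int key) {
    if (h->size == HEAP_MAX) return 0;
    int i = ++h->size;                      /* next available position */
    while (i > 1 && h->keys[i / 2] > key) { /* parent larger than new key */
        h->keys[i] = h->keys[i / 2];        /* move the parent down */
        i /= 2;
    }
    h->keys[i] = key;   /* stops at the root or under a parent <= key */
    return 1;
}

/* Removes the smallest key into *out and performs downheap.
 * Returns 0 if the heap is empty. */
int heap_remove_min(MinHeap *h, int *out) {
    if (h->size == 0) return 0;
    *out = h->keys[1];
    int last = h->keys[h->size--];   /* rightmost node of the bottom layer */
    int hole = 1;                    /* the hole starts at the root */
    while (hole * 2 <= h->size) {
        int child = hole * 2;
        /* choose the smaller of the two children */
        if (child < h->size && h->keys[child + 1] < h->keys[child]) child++;
        if (h->keys[child] >= last) break;  /* `last` fits in the hole */
        h->keys[hole] = h->keys[child];     /* move the child up */
        hole = child;
    }
    h->keys[hole] = last;
    return 1;
}
```

Both loops follow a single path between the root and a leaf, so each operation does work proportional to the height of the heap.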

The heap sort Using a heap to sort data involves performing N insertions followed by N delete min operations as described above. Memory usage will depend upon whether the data already exists in memory or whether the data is on disk. Allocating the array to be used to store the heap will be more efficient if N, the number of records, can be known in advance. Dynamic allocation of the array will then be possible, and this is likely to be preferable to preallocating the array.
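A compact sketch of this scheme in C is shown below: N insertions into a min heap followed by N delete-min operations, with the heap array allocated dynamically once N is known. The function name `heap_sort` and the choice of integer records are assumptions for the example, and it is not the course's own source code.

```c
#include <assert.h>
#include <stdlib.h>

/* Sorts data[0..n-1] into ascending order using a temporary min heap.
 * Returns 1 on success, 0 if the heap array cannot be allocated. */
int heap_sort(int data[], int n) {
    if (n <= 0) return 1;   /* nothing to sort */

    /* N is known in advance, so the heap array is allocated dynamically.
     * Position 0 is unused, hence n + 1 elements. */
    int *heap = malloc((size_t)(n + 1) * sizeof *heap);
    if (heap == NULL) return 0;
    int size = 0;

    /* N insertions, each followed by upheap. */
    for (int k = 0; k < n; k++) {
        int i = ++size;
        while (i > 1 && heap[i / 2] > data[k]) {
            heap[i] = heap[i / 2];
            i /= 2;
        }
        heap[i] = data[k];
    }

    /* N delete-min operations, each followed by downheap. */
    for (int k = 0; k < n; k++) {
        data[k] = heap[1];            /* smallest remaining key */
        int last = heap[size--];
        int hole = 1;
        while (hole * 2 <= size) {
            int child = hole * 2;
            if (child < size && heap[child + 1] < heap[child]) child++;
            if (heap[child] >= last) break;
            heap[hole] = heap[child];
            hole = child;
        }
        heap[hole] = last;
    }

    free(heap);
    return 1;
}
```

This version uses a second array of N + 1 elements; it trades memory for a direct match with the description above.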

Heaps A heap is a binary tree T that stores key-element pairs at its internal nodes. It satisfies two properties:

- MinHeap: key(parent) ≤ key(child) [OR MaxHeap: key(parent) ≥ key(child)]
- all levels are full, except the last one, which is left-filled

What are Heaps Useful for?
To implement priority queues. Priority queue = a queue where all elements have a “priority” associated with them. Remove in a priority queue removes the element with the smallest priority. The two operations are insert and removeMin.

Heap or Not a Heap?

ADT for Min Heap

objects: n > 0 elements organized in a binary tree so that the value in each node is no larger than those in its children

methods:

- Heap Create(max_size) ::= create an empty heap that can hold a maximum of max_size elements
- Boolean HeapFull(heap, n) ::= if (n == max_size) return TRUE else return FALSE
- Heap Insert(heap, item, n) ::= if (!HeapFull(heap, n)) insert item into heap and return the resulting heap else return error
- Boolean HeapEmpty(heap, n) ::= if (n > 0) return FALSE else return TRUE
- Element Delete(heap, n) ::= if (!HeapEmpty(heap, n)) return one instance of the smallest element in the heap and remove it from the heap else return error

Building a Heap First build (n + 1)/2 trivial one-element heaps, then build three-element heaps on top of them.

Building a Heap Use downheap to preserve the order property, then form seven-element heaps.

Building a Heap

Building a Heap
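The bottom-up construction can be sketched in C for a min heap stored in positions 1..n of an array. The leaves (positions n/2 + 1 to n) are the trivial one-element heaps, and each downheap on an internal position merges two smaller heaps under a new root. The names `downheap`, `build_heap` and `is_min_heap` are hypothetical.

```c
#include <assert.h>

/* Restores the min-heap order below position i in keys[1..n]. */
static void downheap(int keys[], int n, int i) {
    int x = keys[i];
    while (i * 2 <= n) {
        int child = i * 2;
        /* choose the smaller of the two children */
        if (child < n && keys[child + 1] < keys[child]) child++;
        if (keys[child] >= x) break;   /* order property holds here */
        keys[i] = keys[child];         /* move the child up */
        i = child;
    }
    keys[i] = x;
}

/* Turns keys[1..n] into a min heap; keys[0] is unused.
 * Positions n/2 + 1 .. n are leaves, so only the internal positions
 * need a downheap, processed from the bottom of the tree upward. */
void build_heap(int keys[], int n) {
    for (int i = n / 2; i >= 1; i--) {
        downheap(keys, n, i);
    }
}

/* Checks the order property: every parent is <= its children. */
int is_min_heap(const int keys[], int n) {
    for (int i = 2; i <= n; i++) {
        if (keys[i / 2] > keys[i]) return 0;
    }
    return 1;
}
```

For example, with `int keys[] = {0, 9, 4, 7, 1, 8, 2, 6, 3, 5};` a call to `build_heap(keys, 9)` rearranges positions 1 to 9 so that `is_min_heap(keys, 9)` holds and the smallest key sits at position 1.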