1
Introduction to Algorithms and Data Structures
A.E. Csallner, Department of Applied Informatics, University of Szeged, Hungary
2
Algorithms and Data Structures I
About algorithms
An algorithm is a finite sequence of finite steps that provides the solution to a given problem.
Properties: finiteness, definiteness, executability
Communication: input, output
3
Structured programming
Design strategies:
Bottom-up: synthesize smaller algorithmic parts into bigger ones
Top-down: formulate the problem and repeatedly break it into smaller and smaller parts
4
Example: shoe a horse (top-down decomposition)
shoe a horse
  a horse has four hooves: shoe a hoof
    need a horseshoe: hammer a horseshoe
    need to fasten the horseshoe to the hoof: drive a cog into a hoof
      need cogs: hammer a cog
5
Basic elements of structured programming
Sequence: series of actions
Selection: branching on a decision
Iteration: conditional repetition
All structured algorithms can be defined using only these three elements (E.W. Dijkstra, 1960s).
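The three elements above can be sketched in a few lines of Python, reusing the deck's horseshoeing example (the function and its return value are illustrative, not from the slides):

```python
# A minimal sketch of the three structured-programming elements.
def shoe_a_horse(hooves=4):
    moves = []
    for hoof in range(hooves):               # iteration: conditional repetition
        moves.append("hammer a horseshoe")   # sequence: series of actions
        moves.append("shoe hoof %d" % (hoof + 1))
    if len(moves) == 2 * hooves:             # selection: branching on a decision
        return moves
    return []
```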
6
Algorithm description methods
An algorithm description method defines an algorithm so that the description code is
unambiguous;
programming-language independent;
still easy to implement;
state-of-the-art.
7
Some possible types of classification:
Age (when the description method was invented)
Purpose (e.g. structural or object-oriented)
Formulation (graphical or text code, etc.)
...
8
Most popular and useful description methods
Flow diagram:
old
not necessarily structured(!)
graphical
very intuitive and easy to use
9
A possible notation of flow diagrams
Circle: marks the START and STOP points of the algorithm.
10
A possible notation of flow diagrams
Rectangle: any action (execution step) can be given here.
11
A possible notation of flow diagrams
Diamond: any yes/no question, with branches labeled yes and no.
12
A possible notation of flow diagrams
An example: START → "Need more horseshoes?" (selection); if yes, "Hammer a horseshoe" and loop back (iteration); if no, "Shoe a hoof" (sequence) → STOP.
13
Most popular and useful description methods
Pseudocode:
old
definitely structured
text based
very easy to implement
14
Properties of a possible pseudocode
Assignment instruction: variable ← expression
Looping constructs as in Pascal:
for-do instruction (counting loop):
  for variable ← initial value to/downto final value
    do body of the loop
15
Properties of a possible pseudocode
while-do instruction (pre-test loop):
  while stay-in test
    do body of the loop
repeat-until instruction (post-test loop):
  repeat body of the loop
  until exit test
16
Properties of a possible pseudocode
Conditional constructs as in Pascal:
if-then-else instruction (else clause is optional):
  if test
    then test-passed clause
    else test-failed clause
Blocks are denoted by indentation.
17
Properties of a possible pseudocode
Object identifiers are references.
The field-of-an-object separator is a dot:
  object.field
  object.method
  object.method(formal parameter list)
The empty reference is NIL.
18
Properties of a possible pseudocode
Arrays are objects.
Parameters are passed by value.
19
Properties of a possible pseudocode
An example:

ShoeAHorse(Hooves)
  hoof ← 1
  while hoof ≤ Hooves.Count           ▹ iteration
    do horseshoe ← HammerAHorseshoe   ▹ sequence
       Hooves[hoof] ← horseshoe
       hoof ← hoof + 1
20
Type algorithms
Algorithm classification by I/O structure:
Sequence → Value
Sequence → Sequence
More sequences → Sequence
Sequence → More sequences
21
Sequence → Value:
sequence calculations (e.g. summation, product of a series, linking elements together, etc.),
decision (e.g. checking whether a sequence contains any element with a given property),
selection (e.g. determining the first element in a sequence with a given property, provided we know that at least one exists),
22
Sequence → Value (continued):
search (e.g. finding a given element),
counting (e.g. counting the elements having a given property),
minimum or maximum search (e.g. finding the least or the largest element).
23
Sequence → Sequence:
selection (e.g. collect the elements of a sequence with a given property),
copying (e.g. copy the elements of a sequence to create a second sequence),
sorting (e.g. arrange the elements into increasing order).
24
More sequences → Sequence:
union (e.g. set union of sequences),
intersection (e.g. set intersection of sequences),
difference (e.g. set difference of sequences),
uniting sorted sequences (merging/combing two ordered sequences).
25
Sequence → More sequences:
filtering (e.g. filtering out the elements of a sequence having given properties).
26
Special algorithms
Iterative algorithm
Consists of two parts:
Initialization (usually initializing data)
Iteration (the repeated part)
27
Recursive algorithms
Basic types:
direct (self-reference)
indirect (mutual references)
Two alternative parts, depending on the base criterion:
Base case (if the problem is small enough)
Recurrences (direct or indirect self-reference)
28
An example of recursive algorithms: Towers of Hanoi
Aim: move n disks from one rod to another, using a third one.
Rules:
one disk is moved at a time;
no disk may be placed on top of a smaller one.
29
Recursive solution of the problem
1st step: move n−1 disks
2nd step: move 1 disk
3rd step: move n−1 disks
30
Pseudocode of the recursive solution

TowersOfHanoi(n,FirstRod,SecondRod,ThirdRod)
1 if n > 0
2   then TowersOfHanoi(n − 1,FirstRod,ThirdRod,SecondRod)
3        write "Move a disk from " FirstRod " to " SecondRod
4        TowersOfHanoi(n − 1,ThirdRod,SecondRod,FirstRod)
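The pseudocode above translates almost line by line into Python; a sketch that collects the moves in a list instead of writing them out:

```python
# Direct transcription of the TowersOfHanoi pseudocode.
def towers_of_hanoi(n, first, second, third, moves=None):
    if moves is None:
        moves = []
    if n > 0:
        towers_of_hanoi(n - 1, first, third, second, moves)   # move n-1 disks aside
        moves.append("Move a disk from %s to %s" % (first, second))
        towers_of_hanoi(n - 1, third, second, first, moves)   # move them on top
    return moves
```

For example, `towers_of_hanoi(3, "A", "B", "C")` produces 7 moves, matching T(n) = 2^n − 1.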
31
Backtracking algorithms
A backtracking algorithm:
is a sequence of systematic trials;
builds a tree of decision branches;
steps back (backtracks) in the tree if no branch at a point is effective.
32
An example of backtracking algorithms
Eight Queens Puzzle: eight chess queens are to be placed on a chessboard so that no two queens attack each other.
33
Pseudocode of the iterative solution

EightQueens
 1 column ← 1
 2 RowInColumn[column] ← 0
 3 repeat
 4   repeat inc(RowInColumn[column])
 5   until IsSafe(column, RowInColumn)
 6   if RowInColumn[column] > 8
 7     then column ← column − 1
 8     else if column < 8
 9            then column ← column + 1
10                 RowInColumn[column] ← 0
11            else draw chessboard
12 until column = 0
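The same backtracking idea can be sketched recursively in Python (a hedged sketch: the slides give an iterative version, and `is_safe` here is an illustrative helper, counting solutions instead of drawing boards):

```python
def is_safe(col, rows):
    """rows[0..col-2] hold queens already placed; the queen in column col is tested."""
    r = rows[col - 1]
    for c in range(1, col):
        # attacked if on the same row or on a diagonal
        if rows[c - 1] == r or abs(rows[c - 1] - r) == col - c:
            return False
    return True

def count_queen_solutions(n=8, col=1, rows=None):
    if rows is None:
        rows = [0] * n
    if col > n:
        return 1                      # a full, valid placement found
    total = 0
    for row in range(1, n + 1):       # systematic trials in this column
        rows[col - 1] = row
        if is_safe(col, rows):
            total += count_queen_solutions(n, col + 1, rows)
    return total                      # all rows exhausted: step back
```

The classic eight-queens puzzle has 92 solutions.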
34
Complexity of algorithms
Questions regarding an algorithm:
Does it solve the problem?
How fast does it solve the problem?
How much storage does it occupy while solving the problem?
The last two are the complexity issues of the algorithm.
35
Elementary storage or time: independent of the size of the input.
Example 1: If an algorithm needs 500 kilobytes to store some internal data, this can be considered elementary.
Example 2: If an algorithm contains a loop whose body is executed 1000 times (regardless of the input), it counts as an elementary algorithmic step.
36
Hence a block of instructions counts as a single elementary step if none of its instructions depends on the size of the input. A looping construct counts as a single elementary step if the number of iterations it executes does not depend on the size of the input and its body is an elementary step.
⇒ shoeing a horse can be considered an elementary step ⇔ it takes constant time (one step) to shoe a horse.
37
The time complexity of an algorithm is a function of the size of the input.
Notation: T(n), where n is the size of the input.
The function T can depend on more than one variable, e.g. T(n,m) if the input of the algorithm is an n⨯m matrix.
38
Example: find the minimum of an array.

Minimum(A)
1 min ← A[1]
2 i ← 1
3 repeat
4   i ← i + 1
5   if A[i] < min
6     then min ← A[i]
7 until i = A.Length
8 return min

(Step counts in the margin: lines 1-2 take 1 step, the loop body runs n − 1 times, line 8 takes 1 step.)
39
Hence T(n) = n (where n = A.Length).
Does this change if line 8 (return min) is counted as an extra step? In other words, is n ≈ n + 1?
It does not change. Proof: n + 1 = (n − 1) + 2 ≈ (n − 1) + 1 = n, since the constant part counts as a single elementary step.
40
This so-called asymptotic behavior can be formulated rigorously in the following way:
We say that f(x) = O(g(x)) (big-O notation) if
(∃C, x₀ > 0) (∀x ≥ x₀) 0 ≤ f(x) ≤ C·g(x).
This means that g is an asymptotic upper bound of f.
41
(Figure: the graphs of f(x), g(x), and C·g(x); beyond x₀, f(x) stays below C·g(x).)
42
The O notation denotes an upper bound. If g is also a lower bound of f, then we say that f(x) = Θ(g(x)) if
(∃c, C, x₀ > 0) (∀x ≥ x₀) 0 ≤ c·g(x) ≤ f(x) ≤ C·g(x).
This means that f asymptotically equals g.
43
(Figure: f(x) runs between c·g(x) and C·g(x) for all x ≥ x₀, where x₀ is the larger of the two thresholds.)
44
What does the asymptotic notation show us?
We have seen T(n) = Θ(n) for the procedure Minimum(A), where n = A.Length.
However, by the definition of Θ: T(n) = Θ(n), T(2n) = Θ(n), T(3n) = Θ(n), ...
Does Minimum not run slower on more data?
45
What does the asymptotic notation show us?
Asymptotic notation shows the tendency:
T(n) = Θ(n²), quadratic tendency:
  n data → a certain amount of time t
  2n data → time ≈ 2²t = 4t
  3n data → time ≈ 3²t = 9t
T(n) = Θ(n), linear tendency:
  n data → a certain amount of time t
  2n data → time ≈ 2t
  3n data → time ≈ 3t
46
Analyzing recursive algorithms
A recursive algorithm leads to a recursive function T.
Example: Towers of Hanoi

TowersOfHanoi(n,FirstRod,SecondRod,ThirdRod)
1 if n > 0
2   then TowersOfHanoi(n − 1,FirstRod,ThirdRod,SecondRod)
3        write "Move a disk from " FirstRod " to " SecondRod
4        TowersOfHanoi(n − 1,ThirdRod,SecondRod,FirstRod)

T(n) = T(n−1) + T(n−1) + 1 = 2T(n−1) + 1
47
T(n) = 2T(n−1) + 1 is a recursive function.
In general it is very difficult (sometimes impossible) to determine the explicit form of an implicit (recursive) formula.
If the algorithm is recursive, the solution can be found using a recursion tree.
48
Recursion tree of TowersOfHanoi:
(Figure: each call spawns two calls on n−1 disks plus one step; the levels contribute 1, 2, 4, ..., 2^(n−1) steps, summing to 2^n − 1.)
49
Time complexity: T(n) = 2^n − 1 = Θ(2^n), i.e. exponential time (very slow).
Example: n = 64 (from the original legend), assuming one disk move per second:
T(n) = 2^64 − 1 ≈ 1.8·10^19 seconds ≈ 3·10^17 minutes ≈ 5.1·10^15 hours ≈ 2.1·10^14 days ≈ 5.8·10^11 years > half a trillion years
50
Different cases
Problem (example): search for a given element in a sequence (array).

LinearSearch(A,w)
1 i ← 0
2 repeat i ← i + 1
3 until A[i] = w or i = A.Length
4 if A[i] = w then return i
5           else return NIL
51
Array: 8 1 3 9 5 6 2
Best case: element wanted: 8; time complexity T(n) = 1 = Θ(1)
Worst case: element wanted: 2; time complexity T(n) = n = Θ(n)
Average case?
52
Average case
The mean value of the time complexities over all possible inputs:
T(n) = (1 + 2 + 3 + 4 + ... + n) / n = n·(n + 1) / (2n) = (n + 1) / 2 = Θ(n)
(the same as in the worst case).
53
Basic data structures
To store a set of data of the same type in a linear structure, two basic solutions exist:
Arrays: a physical sequence in memory
Linked lists: the elements are linked together using links (pointers or indices); each list element holds a key and a link, and the list is accessed through its head.
54
Arrays vs. linked lists
(Table: worst-case time complexity of Search, Insert, Delete, Minimum, Maximum, Successor, and Predecessor on arrays and linked lists; an operation costing O(n) on one structure may cost O(1) on the other.)
55
Doubly linked lists: each element is linked to both its successor and its predecessor.
Dummy head lists: the list starts with a dummy head element holding no key.
Indirection (indirect reference): pointer.key
Double indirection: pointer.link.key
(to be continued...)
56
Array representation of linked lists
The dummy head list X → 18 → 29 → 22 can be stored in two parallel arrays, key and link, with a variable holding the index of the dummy head (3 in the example); each link holds the index of the next element.
Problem: a lot of garbage (the unused cells are not tracked).
57
Garbage collection for array-represented lists
The empty cells are linked into a separate garbage list using the same link array: a garbage variable holds the index of the first free cell, and each free cell's link points to the next free cell.
58
To allocate a place for a new key and use it: the first element of the garbage list is linked out of the garbage and linked into the proper list, with the new key (33 here) stored in it, if necessary.
59
Pseudocode for garbage management

Allocate(link)
1 if link.garbage = 0
2   then return 0
3   else new ← link.garbage
4        link.garbage ← link[link.garbage]
5        return new

Free(index,link)
1 link[index] ← link.garbage
2 link.garbage ← index
60
FindAndDelete for simple linked lists

FindAndDelete(toFind,key,link)
 1 if key[link.head] = toFind                ▹ extra case: the first element is to be deleted
 2   then toDelete ← link.head
 3        link.head ← link[link.head]
 4        Free(toDelete,link)
 5   else toDelete ← link[link.head]
 6        pointer ← link.head                ▹ an additional pointer is needed to step forward
 7        while toDelete ≠ 0 and key[toDelete] ≠ toFind
 8          do pointer ← toDelete
 9             toDelete ← link[toDelete]
10        if toDelete ≠ 0
11          then link[pointer] ← link[toDelete]
12               Free(toDelete,link)
61
Dummy head linked lists (...continued)
FindAndDelete for dummy head linked lists

FindAndDeleteDummy(toFind,key,link)
1 pointer ← link.dummyhead
2 while link[pointer] ≠ 0 and key[link[pointer]] ≠ toFind
3   do pointer ← link[pointer]
4 if link[pointer] ≠ 0
5   then toDelete ← link[pointer]
6        link[pointer] ← link[toDelete]
7        Free(toDelete,link)
62
Stacks and queues
Common properties:
only two operations are defined:
  store a new key (called push and enqueue, respectively)
  extract a key (called pop and dequeue, respectively)
both operations work in constant time.
Different properties:
stacks are LIFO structures;
queues are FIFO (or pipeline) structures.
63
Two erroneous cases:
extraction is attempted on an empty data structure: underflow;
insertion is attempted when there is no more space: overflow.
64
Stack management using arrays
(Example: with a capacity of three, push(8), push(1), push(3) fill the stack, so push(9) causes a stack overflow; after three pops empty it again, a further pop causes a stack underflow.)
65
Stack management using arrays

Push(key,Stack)
1 if Stack.top = Stack.Length
2   then return Overflow error        ▹ stack overflow
3   else Stack.top ← Stack.top + 1
4        Stack[Stack.top] ← key
66
Stack management using arrays

Pop(Stack)
1 if Stack.top = 0
2   then return Underflow error       ▹ stack underflow
3   else Stack.top ← Stack.top − 1
4        return Stack[Stack.top + 1]
67
Queue management using arrays
The array is used as a circular buffer, with indices marking the beginning and the end of the queue.
Empty queue: beginning = n, end = 0.
68
Queue management using arrays

Enqueue(key,Queue)
1 if Queue.beginning = Queue.end
2   then return Overflow error        ▹ queue overflow
3   else if Queue.end = Queue.Length
4          then Queue.end ← 1
5          else Queue.end ← Queue.end + 1
6        Queue[Queue.end] ← key
69
Queue management using arrays

Dequeue(Queue)
 1 if Queue.end = 0
 2   then return Underflow error      ▹ queue underflow
 3   else if Queue.beginning = Queue.Length
 4          then Queue.beginning ← 1
 5          else inc(Queue.beginning)
 6        key ← Queue[Queue.beginning]
 7        if Queue.beginning = Queue.end
 8          then Queue.beginning ← Queue.Length
 9               Queue.end ← 0
10        return key
70
Binary search trees
Linear data structures cannot provide better time complexity than n in some cases.
Idea: let us use another kind of structure.
Solution: rooted trees (especially binary trees) with a special order on the keys ('search trees').
71
A binary tree - notions:
depth (height), levels, root, vertex (node), edge, parent and child, twins (siblings), leaf
72
A binary search tree: for every vertex, all keys in its left subtree are smaller and all keys in its right subtree are greater.
(Example tree: root 28 with children 12 and 30; 12 has children 7 and 21, 21 has children 14 and 26; 30 has right child 49, which has right child 50.)
73
Implementation of binary search trees: each vertex stores a link to its parent, the key (and other data), a link to the left child, and a link to the right child.
74
Binary search tree operations: tree walk
Inorder walk: left subtree → root → right subtree.
Visiting the example tree inorder yields the keys in increasing order: 7, 12, 14, 21, 26, 28, 30, 49, 50.
75
InorderWalk(Tree)
1 if Tree ≠ NIL
2   then InorderWalk(Tree.Left)
3        visit Tree, e.g. check it or list it
4        InorderWalk(Tree.Right)

The so-called preorder and postorder tree walks differ only in the order of lines 2-4:
preorder: root → left → right
postorder: left → right → root
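A Python sketch of InorderWalk; the minimal `Node` class is illustrative, not from the slides:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder_walk(tree, visit):
    if tree is not None:
        inorder_walk(tree.left, visit)
        visit(tree.key)               # the "visit" step of line 3
        inorder_walk(tree.right, visit)
```

On a binary search tree the visited keys come out in increasing order.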
76
Binary search tree operations: tree search
Going left or right by comparing the keys, TreeSearch(14) follows the path 28 → 12 → 21 → 14, while TreeSearch(45) follows 28 → 30 → 49 and fails.
77
TreeSearch(toFind,Tree)
1 while Tree ≠ NIL and Tree.key ≠ toFind
2   do if toFind < Tree.key
3        then Tree ← Tree.Left
4        else Tree ← Tree.Right
5 return Tree
78
Binary search tree operations: insert
TreeInsert(14): follow the search path (28 → 12 → 21) until a NIL link is reached; new vertices are always inserted as leaves.
79
Binary search tree operations: tree minimum and tree maximum
In the example tree the minimum is 7 (the leftmost vertex) and the maximum is 50 (the rightmost vertex).
80
TreeMinimum(Tree)
1 while Tree.Left ≠ NIL
2   do Tree ← Tree.Left
3 return Tree

TreeMaximum(Tree)
1 while Tree.Right ≠ NIL
2   do Tree ← Tree.Right
3 return Tree
81
Binary search tree operations: successor of an element
TreeSuccessor(12): if the element has a right child, the successor is the tree minimum of the right subtree (here 14).
TreeSuccessor(26): if the element has no right child, walk up the parent links until a parent is reached from its left child (here 28).
82
TreeSuccessor(Element)
1 if Element.Right ≠ NIL
2   then return TreeMinimum(Element.Right)
3   else Above ← Element.Parent
4        while Above ≠ NIL and Element = Above.Right
5          do Element ← Above
6             Above ← Above.Parent
7        return Above

Finding the predecessor is similar.
83
Binary search tree operations: delete
Case 1: if the element has no children (e.g. TreeDelete(26)), it is simply removed.
84
Case 2: if the element has only one child (e.g. TreeDelete(30)), the child is linked into the deleted element's place.
85
Case 3: if the element has two children (e.g. TreeDelete(12)), it is substituted by a close key, e.g. its successor (here 14). The successor is the tree minimum of the right subtree, so it has at most one child.
86
The case where Element has no children:

TreeDelete(Element,Tree)
1 if Element.Left = NIL and Element.Right = NIL
2   then if Element.Parent = NIL
3          then Tree ← NIL
4          else if Element = (Element.Parent).Left
5                 then (Element.Parent).Left ← NIL
6                 else (Element.Parent).Right ← NIL
7        Free(Element)
8        return Tree
9- (continued on the next page)
87
The case where Element has only a right child:

(continued from the previous page)
 9 if Element.Left = NIL and Element.Right ≠ NIL
10   then if Element.Parent = NIL
11          then Tree ← Element.Right
12               (Element.Right).Parent ← NIL
13          else (Element.Right).Parent ← Element.Parent
14               if Element = (Element.Parent).Left
15                 then (Element.Parent).Left ← Element.Right
16                 else (Element.Parent).Right ← Element.Right
17        Free(Element)
18        return Tree
19- (continued on the next page)
88
The case where Element has only a left child (very similar to the previous case):

(continued from the previous page)
19 if Element.Left ≠ NIL and Element.Right = NIL
20   then if Element.Parent = NIL
21          then Tree ← Element.Left
22               (Element.Left).Parent ← NIL
23          else (Element.Left).Parent ← Element.Parent
24               if Element = (Element.Parent).Left
25                 then (Element.Parent).Left ← Element.Left
26                 else (Element.Parent).Right ← Element.Left
27        Free(Element)
28        return Tree
29- (continued on the next page)
89
The case where Element has two children:

(continued from the previous page)
29 if Element.Left ≠ NIL and Element.Right ≠ NIL
30   then Substitute ← TreeSuccessor(Element)
31        if Substitute.Right ≠ NIL                        ▹ Substitute is linked out from its place
32          then (Substitute.Right).Parent ← Substitute.Parent
33        if Substitute = (Substitute.Parent).Left
34          then (Substitute.Parent).Left ← Substitute.Right
35          else (Substitute.Parent).Right ← Substitute.Right
36        Substitute.Parent ← Element.Parent               ▹ Substitute is linked into Element's place
37        if Element.Parent = NIL
38          then Tree ← Substitute
39          else if Element = (Element.Parent).Left
40                 then (Element.Parent).Left ← Substitute
41                 else (Element.Parent).Right ← Substitute
42        Substitute.Left ← Element.Left
43        (Substitute.Left).Parent ← Substitute
44        Substitute.Right ← Element.Right
45        (Substitute.Right).Parent ← Substitute
46        Free(Element)
47        return Tree
90
Time complexity of binary search tree operations
T(n) = O(d) for all operations (except the walk), where d denotes the depth of the tree.
The depth of a randomly built binary search tree is d = O(log n).
Hence the time complexity of the search tree operations in the average case is T(n) = O(log n).
91
Binary search
If insert and delete are used rarely, then it is more convenient and faster to use an ordered array instead of a binary search tree.
Faster: the following operations have constant, T(n) = O(1), time complexity: minimum, maximum, successor, predecessor.
Search has the same T(n) = O(log n) time complexity as on binary search trees.
92
Let us search for key 29 in the ordered array 2 3 7 12 29 31 45.
Compare with the central element, 12: since 12 < 29, continue the search in the right half.
93
Compare with the central element of the remaining part, 31: since 29 < 31, continue the search in the left half.
94
The central element of the remaining part is 29: found!
95
This takes O(log n) steps. The result can also be derived as follows: if we halve n elements k times, we get 1 ⇔ n / 2^k = 1 ⇔ k = log₂ n = O(log n).
96
Sorting
Problem: given a set of data from a base set with a given order on it (e.g. numbers, texts), arrange them according to the order of the base set.
Example: 12, 2, 7, 3 → (sorting) → 2, 3, 7, 12
97
Sorting sequences
We sort sequences in lexicographical order: of two sequences, the one with the smaller value at the first position where they differ is the 'smaller'.
Example (texts): "gone" < "good", because n < o in the alphabet.
98
Insertion sort
Principle (example sequence: 14, 8, 69, 22, 75): each element in turn is inserted into its proper place among the already sorted elements before it.
99
Implementation of insertion sort with arrays
Insertion step: the array is divided into a sorted part (e.g. 22, 69, 75) and an unsorted part (e.g. 38, 14); the first element of the unsorted part is inserted into its place in the sorted part.
100
InsertionSort(A)
1 for i ← 2 to A.Length
2   do ins ← A[i]
3      j ← i − 1
4      while j > 0 and ins < A[j]
5        do A[j + 1] ← A[j]
6           j ← j − 1
7      A[j + 1] ← ins
101
Time complexity of insertion sort
Best case: in each step the new element is inserted at the end of the sorted part:
T(n) = 1 + 1 + ... + 1 = n − 1 = Θ(n)
Worst case: in each step the new element is inserted at the beginning of the sorted part:
T(n) = 2 + 3 + ... + n = n(n + 1)/2 − 1 = Θ(n²)
102
Time complexity of insertion sort
Average case: in each step the new element is inserted somewhere in the middle of the sorted part:
T(n) = 2/2 + 3/2 + ... + n/2 = (n(n + 1)/2 − 1) / 2 = Θ(n²)
The same as in the worst case.
103
Another implementation of insertion sort
The input provides elements continually (e.g. from a file or the net).
The sorted part is a linked list into which the elements are inserted one by one.
The time complexity is the same in every case.
104
Another implementation of insertion sort
The linked list implementation delivers an on-line algorithm:
after each step the subproblem is completely solved;
the algorithm does not need the whole input to partially solve the problem.
Cf. an off-line algorithm: the whole input has to be known before the substantive procedure starts.
105
Merge sort
Principle (example: 8, 69, 14, 75, 2, 22, 25, 36): halve the array and sort the parts recursively: (8, 69, 14, 75) → (8, 14, 69, 75) and (2, 22, 25, 36) → (2, 22, 25, 36).
106
Then merge (comb) the sorted parts: 8, 14, 69, 75 and 2, 22, 25, 36 merge into 2, 8, 14, 22, 25, 36, 69, 75 — ready.
107
Time complexity of merge sort
Merge sort is a recursive algorithm, and so is its time complexity function T(n).
What it does:
First it halves the actual (sub)array: O(1)
Then it calls itself for the two halves: 2T(n/2)
Last it merges the two ordered parts: O(n)
Hence T(n) = 2T(n/2) + O(n) = ?
108
Recursion tree of merge sort:
(Figure: each level of the recursion tree costs n in total, and there are log n levels down to the subarrays of size 1, giving n·log n altogether.)
109
The time complexity of merge sort is T(n) = Θ(n·log n).
This worst-case time complexity is optimal among comparison sorts (which use only pairwise comparisons) ⇒ fast.
Unfortunately, merge sort does not sort in place, i.e. it uses auxiliary storage of a size comparable with the input.
110
Heapsort
An array A is called a heap if for all its elements A[i] ≥ A[2i] and A[i] ≥ A[2i + 1].
This property is called the heap property.
It is easier to understand if a binary tree is built from the elements, filling the levels row by row (example array: 45, 27, 34, 20, 23, 31, 18, 19, 3, 14).
111
(Figure: the array 45, 27, 34, 20, 23, 31, 18, 19, 3, 14 drawn as a binary tree, level by level.)
112
With the indices shown (1:45, 2:27, 3:34, 4:20, 5:23, 6:31, 7:18, 8:19, 9:3, 10:14), the heap property turns into a simple parent-child relation in the tree representation: each parent is at least as large as its children.
113
Algorithms and Data Structures I
An important application of heaps is realizing priority queues: A data structure supporting the operations insert maximum (or minimum) extract maximum (or extract minimum) Heapsort Algorithms and Data Structures I
114
Algorithms and Data Structures I
First we have to build a heap from an array. Let us suppose that only the kth element violates the heap property. In this case it is sunk level by level to a place where it fits. In the example k = 1 (the root): Heapsort Algorithms and Data Structures I
115
Algorithms and Data Structures I
Tree: 1:15, 2:37, 3:34, 4:20, 5:23, 6:31, 7:18, 8:19, 9:3, 10:14. k = 1: the key and its children are compared; it is exchanged for the greater child Heapsort Algorithms and Data Structures I
116
Algorithms and Data Structures I
Tree: 1:37, 2:15, 3:34, 4:20, 5:23, 6:31, 7:18, 8:19, 9:3, 10:14. k = 2: the key and its children are compared; it is exchanged for the greater child Heapsort Algorithms and Data Structures I
117
Algorithms and Data Structures I
Tree: 1:37, 2:23, 3:34, 4:20, 5:15, 6:31, 7:18, 8:19, 9:3, 10:14. k = 5: the key and its children are compared; it is the greatest ⇒ ready Heapsort Algorithms and Data Structures I
118
Algorithms and Data Structures I
Sink(k,A)
1 if 2*k ≤ A.HeapSize and A[2*k] > A[k]
2    then greatest ← 2*k
3    else greatest ← k
4 if 2*k + 1 ≤ A.HeapSize and A[2*k + 1] > A[greatest]
5    then greatest ← 2*k + 1
6 if greatest ≠ k
7    then Exchange(A[greatest],A[k])
8         Sink(greatest,A)
Heapsort Algorithms and Data Structures I
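A direct Python transcription of Sink (1-based array with an unused slot 0; the heap size is passed explicitly instead of the A.HeapSize attribute):

```python
def sink(k, a, heap_size):
    """Sink a[k] level by level until the heap property holds below it.

    a uses 1-based indexing: a[0] is unused padding."""
    greatest = 2 * k if 2 * k <= heap_size and a[2 * k] > a[k] else k
    if 2 * k + 1 <= heap_size and a[2 * k + 1] > a[greatest]:
        greatest = 2 * k + 1
    if greatest != k:
        a[greatest], a[k] = a[k], a[greatest]  # exchange with the greater child
        sink(greatest, a, heap_size)

# The example from the slides: the root 15 violates the heap property.
a = [None, 15, 37, 34, 20, 23, 31, 18, 19, 3, 14]
sink(1, a, 10)
# a[1:] → [37, 23, 34, 20, 15, 31, 18, 19, 3, 14]
```

The call chain visits k = 1, then 2, then 5, exactly as in the three example slides.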
119
Algorithms and Data Structures I
To build a heap from an arbitrary array, all elements are mended by sinking them:
BuildHeap(A)
1 A.HeapSize ← A.Length
2 for k ← A.Length / 2 downto 1
3    do Sink(k,A)
A.Length / 2 is the last element of the array that has any children; we are stepping backwards, so every visited element has only descendants which already fulfill the heap property Heapsort Algorithms and Data Structures I
120
Algorithms and Data Structures I
Time complexity of building a heap To sink an element costs O(log n) in the worst case Since n/2 elements have to be sunk, an upper bound for the BuildHeap procedure is T(n) = O(n∙log n) It can be proven that the sharp bound is T(n) = Θ(n) Heapsort Algorithms and Data Structures I
121
Algorithms and Data Structures I
Time complexity of the priority queue operations if the queue is realized using heaps insert: append the new element to the array O(1), sift the new element up O(log n) The time complexity is T(n) = O(log n) Heapsort Algorithms and Data Structures I
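The "sift up" step of insert is not spelled out in the slides' pseudocode; a minimal sketch under the same conventions (max-heap in a 1-based array; the names sift_up and insert are my own):

```python
def sift_up(k, a):
    """Move a freshly appended element toward the root while it beats its parent."""
    while k > 1 and a[k // 2] < a[k]:
        a[k // 2], a[k] = a[k], a[k // 2]  # swap with the parent
        k //= 2

def insert(a, key):
    a.append(key)           # append: O(1)
    sift_up(len(a) - 1, a)  # sift up: O(log n)

heap = [None, 45, 27, 34]   # a[0] is unused padding
insert(heap, 50)
# heap[1:] → [50, 45, 34, 27]
```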
122
Algorithms and Data Structures I
Time complexity of the priority queue operations if the queue is realized using heaps maximum read out the key of the root O(1) The time complexity is T(n) = O(1) Heapsort Algorithms and Data Structures I
123
Algorithms and Data Structures I
Time complexity of the priority queue operations if the queue is realized using heaps extract maximum exchange the root for the array’s last element O(1) extract the last element O(1) sink the root O(logn) The time complexity is T(n) = O(logn) Heapsort Algorithms and Data Structures I
124
Algorithms and Data Structures I
The heapsort algorithm: build a heap Θ(n), then iterate the following n − 1 times, (n−1)∙O(log n) = O(n∙log n): exchange the root for the array’s last element O(1), exclude the heap’s last element from the heap O(1), sink the root O(log n) The time complexity is T(n) = O(n∙log n) Heapsort Algorithms and Data Structures I
125
Algorithms and Data Structures I
HeapSort(A)
1 BuildHeap(A)
2 for k ← A.Length downto 2
3    do Exchange(A[1],A[A.HeapSize])
4       A.HeapSize ← A.HeapSize − 1
5       Sink(1,A)
Heapsort Algorithms and Data Structures I
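The whole algorithm in runnable Python (an iterative sink is inlined so the sketch is self-contained; it follows the pseudocode's behavior, but the names are my own):

```python
def heap_sort(a):
    """In-place heapsort on a 1-based array (a[0] is unused padding)."""
    def sink(k, heap_size):
        while True:
            greatest = k
            for child in (2 * k, 2 * k + 1):
                if child <= heap_size and a[child] > a[greatest]:
                    greatest = child
            if greatest == k:
                return
            a[greatest], a[k] = a[k], a[greatest]
            k = greatest

    n = len(a) - 1
    for k in range(n // 2, 0, -1):     # BuildHeap: Theta(n)
        sink(k, n)
    for heap_size in range(n, 1, -1):  # n - 1 iterations, O(log n) each
        a[1], a[heap_size] = a[heap_size], a[1]  # root <-> last heap element
        sink(1, heap_size - 1)                   # shrink the heap, sink the root

a = [None, 20, 3, 45, 14, 31]
heap_sort(a)
# a[1:] → [3, 14, 20, 31, 45]
```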
126
Algorithms and Data Structures I
Quicksort Principle: start from the array [22, 69, 8, 75, 25, 12, 14, 36]. Rearrange and part the elements so that every key in the first part is smaller than any in the second part. Quicksort Algorithms and Data Structures I
127
Algorithms and Data Structures I
Quicksort Principle: after partitioning, the array becomes [12, 14, 8 | 75, 69, 22, 25, 36]: every key in the first part is smaller than any in the second part. Quicksort Algorithms and Data Structures I
128
Algorithms and Data Structures I
Quicksort Principle: sort each part recursively ([12, 14, 8] → [8, 12, 14] and [75, 69, 22, 25, 36] → [22, 25, 36, 69, 75]); this will result in the whole array being sorted. Quicksort Algorithms and Data Structures I
129
Algorithms and Data Structures I
The partition algorithm: choose any of the keys stored in the array; this will be the so-called pivot key (here 22). Exchange the large elements at the beginning of the array for the small ones at the end of it. Afterwards the first part contains only keys not greater than the pivot key, the second part only keys not less than the pivot key. Quicksort Algorithms and Data Structures I
130
Algorithms and Data Structures I
Partition(A,first,last)
1 left ← first − 1
2 right ← last + 1
3 pivotKey ← A[RandomInteger(first,last)]
4 repeat
5    repeat left ← left + 1
6    until A[left] ≥ pivotKey
7    repeat right ← right − 1
8    until A[right] ≤ pivotKey
9    if left < right
10      then Exchange(A[left],A[right])
11      else return right
12 until false
Quicksort Algorithms and Data Structures I
131
Algorithms and Data Structures I
The time complexity of the partition algorithm is T(n) = Θ(n) because each element is visited exactly once. The sorting is then:
QuickSort(A,first,last)
1 if first < last
2    then border ← Partition(A,first,last)
3         QuickSort(A,first,border)
4         QuickSort(A,border+1,last)
Quicksort Algorithms and Data Structures I
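A Python sketch of the two procedures. One deliberate deviation from the pseudocode: the randomly chosen pivot is first swapped to the front and A[first] is used as the pivot key, which rules out a subtle non-termination of the recursion when the random pivot happens to be the subarray's maximum:

```python
import random

def partition(a, first, last):
    """Hoare-style partition; returns the border index."""
    r = random.randint(first, last)
    a[first], a[r] = a[r], a[first]   # pivot to the front (see note above)
    pivot = a[first]
    left, right = first - 1, last + 1
    while True:
        left += 1
        while a[left] < pivot:        # advance until a[left] >= pivot
            left += 1
        right -= 1
        while a[right] > pivot:       # retreat until a[right] <= pivot
            right -= 1
        if left >= right:
            return right
        a[left], a[right] = a[right], a[left]

def quick_sort(a, first, last):
    if first < last:
        border = partition(a, first, last)
        quick_sort(a, first, border)
        quick_sort(a, border + 1, last)

a = [22, 69, 8, 75, 25, 12, 14, 36]
quick_sort(a, 0, len(a) - 1)
# a → [8, 12, 14, 22, 25, 36, 69, 75]
```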
132
Algorithms and Data Structures I
Quicksort is a divide and conquer algorithm like merge sort; however, its partition can be unbalanced (merge sort always halves the subarray). The time complexity of a divide and conquer algorithm highly depends on the balance of the partition. In the best case the quicksort algorithm halves the subarrays at every step ⇒ T(n) = Θ(n∙log n) Quicksort Algorithms and Data Structures I
133
Algorithms and Data Structures I
Recursion tree of the worst case: every partition cuts off a single element, so the subproblem sizes are n, n − 1, n − 2, ..., 1, and the total cost is n∙(n + 1) / 2 Quicksort Algorithms and Data Structures I
134
Thus, the worst case time complexity of quicksort is T(n) = Θ(n²) The average case time complexity is T(n) = Θ(n∙log n), the same as in the best case! The proof is difficult, but let’s see a special case to understand quicksort better. Quicksort Algorithms and Data Structures I
135
Algorithms and Data Structures I
Let λ be a positive number smaller than 1: 0 < λ < 1 Assumption: the partition algorithm never provides a worse partition ratio than (1− λ) : λ Example 1: Let λ := 0.99 The assumption demands that the partition algorithm does not leave less than 1% as the smaller part. Quicksort Algorithms and Data Structures I
136
Algorithms and Data Structures I
Example 2: Let λ := 1 − 1/10⁹ = 0.999999999 Due to the assumption, if we have at most one billion(!) elements then the assumption is fulfilled for any functioning of the partition algorithm. (Even if it always cuts off only one element from the others). In the following it is assumed for the sake of simplicity that λ ≥ 0.5, i.e. the λ part is always the bigger one. Quicksort Algorithms and Data Structures I
137
Algorithms and Data Structures I
Recursion tree of the λ ratio case: the first level costs n, the next (1 − λ)n + λn = n, and every level sums to at most n; the largest subproblem shrinks like λᵈ∙n with the depth d, so there are O(log n) levels and the total is at most proportional to n∙log n Quicksort Algorithms and Data Structures I
138
Algorithms and Data Structures I
In the special case if none of the parts arising at the partitions are bigger than a given λ ratio (0.5 ≤ λ < 1), the time complexity of quicksort is T(n) = O(n∙logn) The time complexity of quicksort is practically optimal because the number of elements to be sorted is always bounded by a number N (finite storage). Using the value λ = 1 − 1/N it can be proven that quicksort finishes in O(n∙logn) time in every possible case. Quicksort Algorithms and Data Structures I
139
Algorithms and Data Structures I
Greedy algorithms Problem Optimization problem: Let a function f(x) be given. Find an x where f is optimal (minimal or maximal) ‘under given circumstances’ ‘Given circumstances’: An optimization problem is constrained if functional constraints have to be fulfilled such as g(x) ≤ 0 Greedy algorithms Algorithms and Data Structures I
140
Algorithms and Data Structures I
Feasible set: the set of those x values where the given constraints are fulfilled Constrained optimization problem: minimize f(x) subject to g(x) ≤ 0 Greedy algorithms Algorithms and Data Structures I
141
Algorithms and Data Structures I
Example Problem: There is a city A and other cities B1,B2,...,Bn which can be reached from A by bus directly. Find the farthest of these cities where you can travel so that your money suffices. A B1 B2 Bn ... Greedy algorithms Algorithms and Data Structures I
142
Algorithms and Data Structures I
Model: Let x denote any of the cities: x ∊ {B1,B2,...,Bn}, f(x) the distance between A and x, t(x) the price of the bus ticket from A to x, m the money you have, and g(x) = t(x) − m the constraint function. The constrained optimization problem to solve: minimize (− f(x)) s.t. g(x) ≤ 0 Greedy algorithms Algorithms and Data Structures I
143
In general, optimization problems are much more difficult!
However, there is a class of optimization problems which can be solved using a step-by-step simple straightforward principle: greedy algorithms: at each step the same kind of decision is made, striving for a local optimum, and decisions of the past are never revisited. Greedy algorithms Algorithms and Data Structures I
144
Algorithms and Data Structures I
Question:Which problems can be solved using greedy algorithms? Answer: Problems which obey the following two rules: Greedy choice property: If a greedy choice is made first, it can always be completed to achieve an optimal solution to the problem. Optimal substructure property: Any substructure of an optimal solution provides an optimal solution to the adequate subproblem. Greedy algorithms Algorithms and Data Structures I
145
Algorithms and Data Structures I
Counter example Find the shortest route from Szeged to Budapest. The greedy choice property is infringed: You cannot simply choose the closest town first Greedy algorithms Algorithms and Data Structures I
146
Algorithms and Data Structures I
Budapest, Szeged, Deszk: Deszk is the town closest to Szeged but situated in the opposite direction from Budapest Greedy algorithms Algorithms and Data Structures I
147
Algorithms and Data Structures I
Proper example Activity-selection problem: Let’s spend a day watching TV. Aim: Watch as many programs (in full) as you can. Greedy strategy: Watch the program ending first, then, among those you can still watch in full, the one ending first, and so on. Activity-selection problem Algorithms and Data Structures I
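The greedy strategy can be sketched in Python. The TV schedule below is a hypothetical example (the slides' actual programs are not given), chosen so that the optimum is 4:

```python
def select_activities(programs):
    """Greedy activity selection.

    programs: list of (start, end) pairs. Repeatedly take the program that
    ends first among those starting no earlier than the previous one ended."""
    chosen, last_end = [], float("-inf")
    for start, end in sorted(programs, key=lambda p: p[1]):  # sort by ending time
        if start >= last_end:        # it has not begun before the previous one ended
            chosen.append((start, end))
            last_end = end
    return chosen

# Hypothetical schedule (hours of the day):
schedule = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9),
            (6, 10), (8, 11), (8, 12), (2, 13), (12, 14)]
len(select_activities(schedule))  # → 4
```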
148
Let’s sort the programs by their ending times. Include the first one; exclude those which have already begun before it ends; then repeat with the remaining programs. When no more programs are left: ready. The optimum is 4 (TV programs) Activity-selection problem Algorithms and Data Structures I
149
Algorithms and Data Structures I
Check the greedy choice property: The first choice of any optimal solution can be exchanged for the greedy one Activity-selection problem Algorithms and Data Structures I
150
Algorithms and Data Structures I
Check the optimal substructure property: the part of an optimal solution is also optimal for the subproblem. If it were not optimal for the subproblem, the whole solution could be improved by improving the subproblem’s solution Activity-selection problem Algorithms and Data Structures I
151
Algorithms and Data Structures I
Huffman codes Notions C is an alphabet if it is a set of symbols F is a file over C if it is a text built up of the characters of C Huffman codes Algorithms and Data Structures I
152
Algorithms and Data Structures I
Assume we have the following alphabet C = {a, b, c, d, e} Code it with binary codewords of equal length How many bits per codeword do we need at least? 2 are not enough (only four codewords: 00, 01, 10, 11) Build codewords using a 3-bit coding a = 000 b = 001 c = 010 d = 011 e = 100 Huffman codes Algorithms and Data Structures I
153
Algorithms and Data Structures I
Build the binary tree T of the coding a = 000, b = 001, c = 010, d = 011, e = 100: each codeword is a root-to-leaf path in T (0 = left child, 1 = right child) Huffman codes Algorithms and Data Structures I
154
Algorithms and Data Structures I
Further notation For each character c ∈ C its frequency in the file is denoted by f(c) For each character c ∈ C its codeword length is defined by its depth in the coding tree T, dT(c) Hence the length of the file (in bits) equals B(T) = ∑c∈C f(c)∙dT(c) Huffman codes Algorithms and Data Structures I
155
Algorithms and Data Structures I
Problem Let an alphabet C and a file over it be given. Find a coding tree T of the alphabet with minimal B(T) Huffman codes Algorithms and Data Structures I
156
Algorithms and Data Structures I
Example Consider an F file of 20,000 characters over the alphabet C = {a, b, c, d, e} Assume the frequencies of the particular characters in the file are f(a) = 5,000 f(b) = 2,000 f(c) = 6,000 f(d) = 3,000 f(e) = 4,000 Huffman codes Algorithms and Data Structures I
157
Algorithms and Data Structures I
Using the 3-bit coding defined previously, the bit-length of the file equals B(T) = ∑c∈C f(c)∙dT(c) = 5,000∙3 + 2,000∙3 + 6,000∙3 + 3,000∙3 + 4,000∙3 = (5,000 + 2,000 + 6,000 + 3,000 + 4,000)∙3 = 20,000∙3 = 60,000 This is a so-called fixed-length code since dT(x) = dT(y) holds for all x, y ∈ C Huffman codes Algorithms and Data Structures I
158
Algorithms and Data Structures I
The fixed-length code is not always optimal: moving e one level up in the tree shortens its codeword by one bit, so B(T’) = B(T) − f(e)∙1 = 60,000 − 4,000∙1 = 56,000 Huffman codes Algorithms and Data Structures I
159
Algorithms and Data Structures I
Idea Construct a variable-length code, i.e., where the code-lengths for different characters can differ from each other We expect that if more frequent characters get shorter codewords then the resulting file will become shorter Huffman codes Algorithms and Data Structures I
160
Algorithms and Data Structures I
Problem: How do we recognize where a codeword ends and a new one begins? Using delimiters is too “expensive” Solution: Use prefix codes, i.e., codewords none of which is a prefix of another codeword Result: The codewords can be decoded without using delimiters Huffman codes Algorithms and Data Structures I
161
Algorithms and Data Structures I
For instance, if a = 10, b = 010, c = 00, then the code 1000010000010010 can only mean a c b c c a b. However, what if a variable-length code is not prefix-free? Huffman codes Algorithms and Data Structures I
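Decoding a prefix code needs no delimiters: read bits one by one and emit a character as soon as the collected bits form a codeword. A sketch with the prefix code above (a = 10, b = 010, c = 00):

```python
def decode(bits, code):
    """Greedily decode a bit string with a prefix-free code.

    code: dict mapping character -> codeword. Because no codeword is a
    prefix of another, the first match is always the right one."""
    inv = {word: ch for ch, word in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:          # a complete codeword has been collected
            out.append(inv[cur])
            cur = ""
    return "".join(out)

code = {"a": "10", "b": "010", "c": "00"}
decode("1000010000010010", code)  # → "acbccab"
```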
162
Algorithms and Data Structures I
Then if a = 10, b = 100, c = 0 (not prefix-free, since 10 is a prefix of 100), does 100 = b or 100 = a c? An extra delimiter would be needed Huffman codes Algorithms and Data Structures I
163
Algorithms and Data Structures I
Realize the original idea with prefix codes Frequencies: f(a) = 5,000, f(b) = 2,000, f(c) = 6,000, f(d) = 3,000, f(e) = 4,000 Frequent characters should get shorter codewords, e.g., a = 00, c = 01, e = 10 Rare ones can be longer, e.g., b = 110, d = 111 Huffman codes Algorithms and Data Structures I
164
Algorithms and Data Structures I
Question: How can such a coding be done algorithmically? Answer: The Huffman codes provide exactly this solution Huffman codes Algorithms and Data Structures I
165
Algorithms and Data Structures I
The bit-length of the file using this K prefix code is B(K) = ∑c∈C f(c)∙dK(c) = 5,000∙2 + 2,000∙3 + 6,000∙2 + 3,000∙3 + 4,000∙2 = (5,000 + 6,000 + 4,000)∙2 + (2,000 + 3,000)∙3 = 30,000 + 15,000 = 45,000 (cf. the fixed-length code gave 60,000, the improved one 56,000) Huffman codes Algorithms and Data Structures I
166
Algorithms and Data Structures I
The greedy method producing Huffman codes 1. Sort the characters of the alphabet C in increasing order according to their frequency in the file and link them into a list 2. Delete the two leading characters, some x and y, from the list and connect them with a common parent node z. Let f(z) = f(x) + f(y), insert z into the list (keeping it sorted) and repeat step 2 until only one node, the root, remains. Huffman codes Algorithms and Data Structures I
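The two steps can be sketched in Python, with the standard heapq module playing the role of the sorted list (the node layout and names are my own choices; ties may be merged in a different order than on the slides, but the total bit-length of the resulting code is the same):

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build Huffman codewords from a {character: frequency} dict."""
    tick = count()  # tie-breaker so heapq never has to compare tree nodes
    heap = [(f, next(tick), ch) for ch, f in freq.items()]  # the leaves
    heapq.heapify(heap)                                     # step 1: "sorted list"
    while len(heap) > 1:                                    # step 2: merge
        fx, _, x = heapq.heappop(heap)      # the two rarest nodes x and y ...
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(tick), (x, y)))  # ... get a parent z
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):   # inner node: 0 = left child, 1 = right child
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"   # single-character alphabet edge case
    walk(heap[0][2], "")
    return codes

freq = {"a": 5000, "b": 2000, "c": 6000, "d": 3000, "e": 4000}
codes = huffman_codes(freq)
sum(freq[ch] * len(codes[ch]) for ch in freq)  # → 45000 bits, as on the slides
```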
167
Algorithms and Data Structures I
Example (character : frequency in thousands) List: a : 5, b : 2, c : 6, d : 3, e : 4 Huffman codes Algorithms and Data Structures I
168
Algorithms and Data Structures I
Example 1. Sort List: b : 2, d : 3, e : 4, a : 5, c : 6 Huffman codes Algorithms and Data Structures I
169
Algorithms and Data Structures I
Example 2. Merge and rearrange: b (2) and d (3) get a common parent of weight 5 List: e : 4, [5], a : 5, c : 6 Huffman codes Algorithms and Data Structures I
170
Algorithms and Data Structures I
Example 2. Merge and rearrange: e (4) and the node of weight 5 get a common parent of weight 9 List: a : 5, c : 6, [9] Huffman codes Algorithms and Data Structures I
171
Algorithms and Data Structures I
Example 2. Merge and rearrange: a (5) and c (6) get a common parent of weight 11 List: [9], [11] Huffman codes Algorithms and Data Structures I
172
Algorithms and Data Structures I
Example 2. Merge and rearrange: the nodes of weight 9 and 11 get a common parent of weight 20, the root List: [20] Huffman codes Algorithms and Data Structures I
173
Algorithms and Data Structures I
Example Ready: reading off the tree of total weight 20 yields the codewords a = 10, b = 010, c = 11, d = 011, e = 00 Huffman codes Algorithms and Data Structures I
174
Algorithms and Data Structures I
Example Length of the file in bits with a = 10, b = 010, c = 11, d = 011, e = 00 and f(a) = 5,000, f(b) = 2,000, f(c) = 6,000, f(d) = 3,000, f(e) = 4,000: B(H) = ∑c∈C f(c)∙dH(c) = 5,000∙2 + 2,000∙3 + 6,000∙2 + 3,000∙3 + 4,000∙2 = (5,000 + 6,000 + 4,000)∙2 + (2,000 + 3,000)∙3 = 30,000 + 15,000 = 45,000 Huffman codes Algorithms and Data Structures I
175
Algorithms and Data Structures I
Optimality of the Huffman codes Assertion 1. There exists an optimal solution where the two rarest characters are deepest twins in the tree of the coding Assertion 2. Merging two (twin) characters leads to a problem similar to the original one Corollary. The Huffman codes provide an optimal character coding Huffman codes Algorithms and Data Structures I
176
Algorithms and Data Structures I
Proof of Assertion 1 (There exists an optimal solution where the two rarest characters are deepest twins in the tree of the coding): exchanging the two rarest characters with the two deepest twin nodes does not increase the total length Huffman codes Algorithms and Data Structures I
177
Algorithms and Data Structures I
Proof of Assertion 2 (Merging two (twin) characters leads to a problem similar to the original one): replacing twin characters by their merged parent yields a new problem that is smaller than the original one but similar to it Huffman codes Algorithms and Data Structures I
178
Graph representations
Graphs can represent different structures, connections and relations; weighted graphs can represent capacities or actual flow rates (shown as edge weights in the figure) Graphs Algorithms and Data Structures I
179
Algorithms and Data Structures I
Adjacency-matrix: entry [row, column] is 1 if there is an edge leading from ‘row’ to ‘column’ and 0 if there is no such edge Drawback 1: redundant elements (in an undirected graph the matrix is symmetric) Drawback 2: superfluous elements (zeros stored for the missing edges) Graphs Algorithms and Data Structures I
180
Algorithms and Data Structures I
Adjacency-list: every vertex stores the list of its neighbors Optimal storage usage Drawback: slow search operations Graphs Algorithms and Data Structures I
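Both representations side by side in a short Python sketch (the edge list is a hypothetical example, not the graph of the figure):

```python
# A directed graph with 4 vertices, given as a hypothetical edge list.
n = 4
edges = [(1, 2), (1, 4), (2, 3), (4, 3), (3, 1)]

# Adjacency matrix: (n+1) x (n+1) entries, 1-based, row/column 0 unused.
# Entry [u][v] = 1 means there is an edge from u to v; 0 means there is none.
matrix = [[0] * (n + 1) for _ in range(n + 1)]
for u, v in edges:
    matrix[u][v] = 1

# Adjacency list: only the existing edges are stored.
adj = {u: [] for u in range(1, n + 1)}
for u, v in edges:
    adj[u].append(v)

matrix[1][2], matrix[2][1]  # → (1, 0): the direction matters
adj[1]                      # → [2, 4]
```

The matrix always occupies n² entries regardless of how sparse the graph is (the "superfluous elements"), while the list stores exactly one entry per edge.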
181
Single-source shortest path methods
Problem: find the shortest path between two vertices in a graph Source: the starting point (vertex) Single-source shortest path method: an algorithm that finds the shortest paths running out from the source to all vertices of the graph Graphs Algorithms and Data Structures I
182
Algorithms and Data Structures I
Walk a graph: choose an initial vertex as the source visit all vertices starting from the source Graph walk methods: depth-first search breadth-first search Graph walk Algorithms and Data Structures I
183
Algorithms and Data Structures I
Depth-first search: a backtracking algorithm. It goes as far as it can from the source without revisiting any vertex, then backtracks Graph walk Algorithms and Data Structures I
184
Algorithms and Data Structures I
Breadth-first search: like an explosion in a mine. The shockwave reaches the adjacent vertices first, and starts over from them Graph walk Algorithms and Data Structures I
185
Algorithms and Data Structures I
The breadth-first search is not only simpler to implement but it is also the basis for several important graph algorithms (e.g. Dijkstra) Notation in the following pseudocode: A is the adjacency-matrix of the graph s is the source D is an array containing the distances from the source P is an array containing the predecessor along a path Q is the queue containing the unprocessed vertices already reached Graph walk Algorithms and Data Structures I
186
Algorithms and Data Structures I
BreadthFirstSearch(A,s,D,P)
1 for i ← 1 to A.CountRows
2    do P[i] ← 0
3       D[i] ← ∞
4 D[s] ← 0
5 Q.Enqueue(s)
6 repeat
7    v ← Q.Dequeue
8    for j ← 1 to A.CountColumns
9       do if A[v,j] > 0 and D[j] = ∞
10         then D[j] ← D[v] + 1
11              P[j] ← v
12              Q.Enqueue(j)
13 until Q.IsEmpty
Graph walk Algorithms and Data Structures I
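A runnable Python version. One deviation from the pseudocode: the graph is given as an adjacency list instead of an adjacency matrix, which avoids scanning a whole matrix row per vertex:

```python
from collections import deque

def breadth_first_search(adj, s):
    """BFS over an adjacency-list graph; returns the (D, P) dicts.

    D[v] is the distance from s in edges, P[v] the predecessor on a
    shortest path (0 meaning 'none')."""
    D = {v: float("inf") for v in adj}
    P = {v: 0 for v in adj}
    D[s] = 0
    q = deque([s])                      # unprocessed vertices already reached
    while q:
        v = q.popleft()
        for j in adj[v]:
            if D[j] == float("inf"):    # j has not been reached yet
                D[j] = D[v] + 1
                P[j] = v
                q.append(j)
    return D, P

adj = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
D, P = breadth_first_search(adj, 1)
# D → {1: 0, 2: 1, 3: 1, 4: 2, 5: 3}
```

A shortest path to any vertex can be reconstructed by following P back to the source, exactly as described on the next slide.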
187
Algorithms and Data Structures I
The D,P pairs are displayed in the figure: the source (vertex 4) has the pair 0,0; its neighbors have 1,4; farther vertices have pairs such as 2,6 and 3,9. D is the shortest distance from the source; the shortest paths can be reconstructed using P Graph walk Algorithms and Data Structures I
188
Algorithms and Data Structures I
Dijkstra’s algorithm Problem: find the shortest path between two vertices in a weighted graph Idea: extend the breadth-first search to graphs having integer weights: an edge of weight 3 is replaced by a chain of unweighted edges through virtual vertices (total weight = 3∙1 = 3) Dijkstra’s algorithm Algorithms and Data Structures I
189
Algorithms and Data Structures I
Dijkstra(A,s,D,P)
1 for i ← 1 to A.CountRows
2    do P[i] ← 0
3       D[i] ← ∞
4 D[s] ← 0
5 for i ← 1 to A.CountRows
6    do M.Enqueue(i)
7 repeat
8    v ← M.ExtractMinimum
9    for j ← 1 to A.CountColumns
10   do if A[v,j] > 0
11      then if D[j] > D[v] + A[v,j]
12          then D[j] ← D[v] + A[v,j]
13               P[j] ← v
14 until M.IsEmpty
M is a minimum priority queue keyed by D Dijkstra’s algorithm Algorithms and Data Structures I
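A Python sketch using the standard heapq module. Instead of the pseudocode's decrease-key priority queue it uses lazy deletion (stale heap entries are simply skipped), a common implementation choice; the graph is again an adjacency list with edge weights:

```python
import heapq

def dijkstra(adj, s):
    """adj: {u: [(v, weight), ...]}; returns shortest distances D and predecessors P."""
    D = {v: float("inf") for v in adj}
    P = {v: 0 for v in adj}
    D[s] = 0
    heap = [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)      # extract the minimum
        if d > D[v]:                    # stale entry: D[v] was improved later
            continue
        for j, w in adj[v]:
            if D[j] > D[v] + w:         # a shorter path to j through v
                D[j] = D[v] + w
                P[j] = v
                heapq.heappush(heap, (D[j], j))
    return D, P

adj = {1: [(2, 4), (3, 1)], 2: [(4, 1)], 3: [(2, 2), (4, 5)], 4: []}
D, P = dijkstra(adj, 1)
# D → {1: 0, 2: 3, 3: 1, 4: 4}
```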
190
Time complexity of Dijkstra’s algorithm Initialization of D and P: O(n) Building a heap for the priority queue: O(n) Search: the loop executes n times, each time extracting the minimum, O(log n), and checking all neighbors, O(n): n∙O(log n + n) = O(n(log n + n)) = O(n²) Grand total: T(n) = O(n²) Dijkstra’s algorithm Algorithms and Data Structures I