Tonga Institute of Higher Education Design and Analysis of Algorithms IT 254 Lecture 5: Advanced Design Techniques

Other design methods, like divide-and-conquer and randomization, can be applied to many problems. There are newer techniques that are a little more complicated but allow computer scientists to solve harder problems. We will look at two of them. "Greedy" programming is a way to solve optimization problems where you must make a series of choices: you make the best choice available at each step and, by the end, hope to have made the best choice overall. "Amortized analysis" is a tool for analyzing algorithms whose running time changes depending on the input; it lets us find the average, or "amortized," running time of an operation over a sequence of operations.

Greedy Algorithms Optimization problems are a category of problems that focus on finding the best solution. In these problems there can be many possible solutions. Each solution has a value, and the goal is to find the solution with the optimal (maximum or minimum) value. Greedy algorithms are one technique for solving such problems. A greedy algorithm always makes the choice that looks best at the moment it makes it. In mathematical words: it makes the locally optimal choice, hoping that it will lead to the globally optimal solution.

Greedy Algorithm: Activity Selection The first example we will look at is an activity selector. Suppose there is a set S = {1, 2, …, n} of n proposed activities that all need to use a resource, for example a classroom that can only be used by one class at a time. Each activity i has a start time s_i and a finish time f_i with s_i < f_i. The activity-selection problem is to select a maximum-size set of activities that can all use the resource without conflicting. For this problem, we assume the activities are sorted in order of increasing finish time: f_1 ≤ f_2 ≤ f_3 ≤ … ≤ f_n

Activity Selector We can use the following pseudo-code to demonstrate a greedy algorithm that solves the problem. Here A is the set of selected activities and j is the index of the last activity added. Since activities are listed in order of increasing finish time, f_j is always the maximum finish time of any activity in A, that is, f_j = max { f_k : k ∈ A }. Lines 3-4 select the first activity and put it in A. Lines 5-8 add an activity only if its start time is at or after the finish time of the last activity added. This is an O(n) algorithm if the activities are sorted before it runs.
1: Greedy-Selector
2:   n = length of S
3:   A = { 1 }
4:   j = 1
5:   for i = 2 to n
6:     if s_i ≥ f_j
7:       A = A ∪ { i }
8:       j = i
9:   return A
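As an illustration, here is a short Python sketch of the same greedy selector. It assumes the activities are given as (start, finish) pairs already sorted by finish time; the function name and data layout are my own choices, not from the original slides.

def select_activities(activities):
    """Greedy activity selection.

    activities: list of (start, finish) pairs, sorted by finish time.
    Returns the indices of a maximum-size set of mutually compatible activities.
    """
    if not activities:
        return []
    selected = [0]                    # always take the first activity to finish
    last_finish = activities[0][1]
    for i in range(1, len(activities)):
        start, finish = activities[i]
        if start >= last_finish:      # compatible with the last selected activity
            selected.append(i)
            last_finish = finish
    return selected

# Example: classes wanting the same classroom, sorted by finish time.
classes = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(select_activities(classes))     # [0, 3, 7]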

Activity Selector The activity picked next by the algorithm is always the one with the earliest finish time that can be scheduled without a conflict. This is the "greedy" choice, because the algorithm does not look ahead at all possible schedules; it only takes the best activity it sees right in front of it. Choosing the earliest finish time maximizes the amount of unscheduled time left for the remaining activities.

Greedy Properties How do we know if a greedy algorithm is the best choice for solving a problem? The first important thing to check is whether a globally optimal solution can be reached by making locally optimal choices. In other words, can we get the best answer for the whole problem by giving good answers to the smaller problems inside it? Because of this structure, we can sometimes use induction to show that a problem can be solved with a greedy algorithm.

Properties: Optimal Substructure "Optimal substructure" is an important property for a problem to have if you want to solve it using a greedy method. Optimal substructure means the best solution to a problem is made up of the best solutions to the sub-problems inside of it. The activity selector was an example of this: the sub-problem was deciding which single activity, added to the list right now, keeps the list as large as possible. After solving each of these sub-problems in turn, we have an answer that solves the whole problem.

Greedy Algorithms: Compression "Huffman coding" is a method to compress data. The main idea of data compression is that you start with an original object x (text, picture, movie) and you want to encode x with a function C(x) so that C(x) has fewer bits than x. But you also want a way to change C(x) back into x with a function D(x). Thus: D(C(x)) = x, where C(x) = compress x and D(x) = decompress x.

Compression There are two types of compression algorithms: –Lossless compression: D(C(x)) = x –Lossy compression: D(C(x)) ≈ x Things that compress text or programs should be lossless, but things like pictures and sound can be lossy –(You don’t want the words in a decompressed book to be only almost the same as the words in the original) –(But with a picture, if the quality is not perfect you will not notice that much)

Huffman Compression Huffman compression typically gives savings of 20% to 90%, depending on the data being compressed. Huffman's greedy algorithm uses a table of the frequencies of occurrence of each character to build an optimal way of representing each character as a binary string. This means that it counts how many times each letter appears and makes a table of the counts. Then it assigns the shortest binary strings (like "0" or "101") to the letters that occur most often.

Huffman Compression Huffman does compression based on a "variable-length" code. This means that characters that occur more often use fewer bits, while characters that occur rarely use more bits. A "fixed-length" code means every character is saved using the same number of bits. For example, suppose we have a book with 100,000 letters, using only the characters A–F, that we want to compress:
Character                    A     B     C     D     E     F
Frequency (in thousands)     45    13    12    16    9     5
Fixed-length codeword        000   001   010   011   100   101
Variable-length codeword     0     101   100   111   1101  1100

Huffman Compression With the "fixed-length" code we need 3 bits for each character, and there are 100,000 characters, so 300,000 bits are needed. With the variable-length code we can do better: –(45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) thousand = 224,000 bits –This is a saving of about 25% compared with the fixed-length code
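A quick sketch of that arithmetic in Python, using the frequencies and code lengths from the table above (variable and table layout are mine):

# Frequencies (in thousands) and code lengths from the table above.
freq = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}
fixed_len = 3                                   # every codeword uses 3 bits
var_len = {"A": 1, "B": 3, "C": 3, "D": 3, "E": 4, "F": 4}

fixed_bits = sum(freq[c] * fixed_len for c in freq) * 1000      # 300,000
variable_bits = sum(freq[c] * var_len[c] for c in freq) * 1000  # 224,000
print(fixed_bits, variable_bits, 1 - variable_bits / fixed_bits)  # saving ≈ 0.25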

Huffman Compression If you look at the variable-length codes above, there is one character with a 1-bit code but no characters with 2-bit codes. This is because Huffman uses "prefix codes." This means that no codeword is ever the beginning of another codeword. (A prefix is the part of a word that comes at the beginning.) Following this rule makes encoding and decoding much easier. For example: ABC = 0·101·100 = 0101100, and 001011101 = 0·0·101·1101 = AABE. Prefix codes make sure that we always get the same thing out that we put in (D(C(x)) = x).

Huffman The decoder needs an easy way to find which letter belongs to each code, and a binary tree is a good data structure for this. The leaves of the tree hold the characters, and the edges on a path represent the bits: 0 means "go to the left child" and 1 means "go to the right child." The tree also lets us easily compute how many bits we need to store the data: Cost(T) = Σ over all characters c of depth_T(c) · count(c), where depth_T(c) is the depth of c's leaf in the tree T (the length of c's codeword) and count(c) is the number of times c occurs.
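To make the decoding idea concrete, here is a minimal sketch, assuming the tree is represented with nested tuples (a leaf is a character, an internal node is a (left, right) pair); this representation and the helper names are illustrative, not from the slides.

# Tree for the variable-length code above: 0 = left child, 1 = right child.
huffman_tree = ("A", (("C", "B"), (("F", "E"), "D")))

def decode(bits, tree):
    """Decode a string of '0'/'1' characters by walking the tree."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):   # reached a leaf: emit the character
            out.append(node)
            node = tree             # restart at the root for the next codeword
    return "".join(out)

print(decode("0101100", huffman_tree))    # ABC
print(decode("001011101", huffman_tree))  # AABE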

Example Trees (figure omitted) The slide shows two code trees, one for the fixed-length codes and one for the variable-length codes. Each internal node stores the total number of characters in the subtree below it, and the leaves contain a character and its number of occurrences. We can see that a variable-length code tree may become very unbalanced.

Huffman Coding Huffman invented a greedy algorithm that finds the variable-length codes and builds the tree in a bottom-up manner. C is a set of n characters. Q is a priority queue that holds all the characters ordered by frequency. The for loop repeatedly takes out the two nodes with the minimum frequency and makes them the children of z, a new node whose count is the sum of their frequencies. Then z is inserted back into the queue, and this repeats until only one node remains. Lastly, we return Extract-Min(Q), which is the root of the finished tree.
Huffman(C)
  n = length of C
  Q = C
  for i = 1 to n – 1
    z = new Node()
    z->left  = Extract-Min(Q)
    z->right = Extract-Min(Q)
    z->count = z->left->count + z->right->count
    Insert(Q, z)
  return Extract-Min(Q)
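Here is a short Python sketch of the same bottom-up construction, using the standard-library heapq module as the priority queue; the function name and the tuple-based tree representation are my own choices, and the sketch assumes the input text is non-empty.

import heapq
from collections import Counter

def build_huffman_codes(text):
    """Build a Huffman code for `text`; returns {character: bitstring}."""
    freq = Counter(text)
    # Heap entries: (frequency, tie_breaker, tree). A tree is either a
    # character (leaf) or a (left, right) pair (internal node).
    heap = [(count, i, ch) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))
        tie += 1
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")       # 0 = go to the left child
            walk(node[1], prefix + "1")       # 1 = go to the right child
        else:
            codes[node] = prefix or "0"       # single-character edge case
    walk(root, "")
    return codes

print(build_huffman_codes("this is an example of a huffman tree"))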

Huffman Tree Building (figure omitted: the slide steps through repeatedly merging the two lowest-frequency nodes among the characters A–F until a single tree remains)

Huffman Trees Huffman coding can use a greedy algorithm because the problem has "optimal substructure." This means that if we can solve the small problems correctly, then we can put those solutions together to get the best solution for the whole problem. In Huffman coding, we solve a prefix-code problem for two nodes at a time: we "greedily" merge the two smallest nodes and then move on to the next two smallest. This process demonstrates how greedy algorithms can solve a problem that would otherwise be difficult.

Amortized Analysis Amortized analysis looks at the time required to perform a sequence of operations, averaged over all the operations performed. So far we have looked at the worst-case running times of individual operations, but sometimes the cost of a single operation varies a lot, so the worst case is not very informative. Instead, we want to look at the average cost of an operation over a series of operations. This is different from "average-case" analysis: amortized analysis looks at the average cost of an operation over a worst-case sequence, while average-case analysis looks at how long an entire function takes on average over random inputs.

Amortized Thus, "amortized time" means: –If any sequence of n operations takes at most T(n) time, the amortized time per operation is T(n)/n –Conversely, if the amortized time of one operation is U(n), then any sequence of n operations takes at most n·U(n) time This average is taken over a sequence of operations, for any sequence –Not the average over an input distribution –Not the average over random choices made by an algorithm Amortized analysis is a way to describe an algorithm so that even if the worst case of a single operation is bad, the total performance of a sequence of operations is still good

Amortized Analysis Method one: the accounting method –Charge each operation an amortized cost –Any amount not used right away is stored as credit in a "bank" –Later operations can use the stored credit –The balance must never go negative Method two: the aggregate method –A sequence of n operations takes total time T(n) –Average (amortized) cost of an operation = T(n)/n

Dynamic Tables What if we want to make a table, like a hash table, for dynamic data (data that changes), and we want to keep it as small as possible? Problem: if too many items are inserted, the table may be too small to hold them all. Solution: allocate more memory when we need it.

Dynamic Tables
1. Initialize the table to size m = 1
2. Insert elements until the number of elements n > m
3. Allocate a new table of size 2m
4. Reinsert the old elements into the new table
5. (back to step 2)
–What is the worst-case cost of a single insert? –One insert can be costly, but the total stays small, as the short simulation below suggests.
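Here is a small Python sketch of such a table that also counts the cost of each insert (1 for writing the new element, plus the number of elements copied when the table doubles); the class and attribute names are my own.

class DynamicTable:
    """Array that doubles its capacity whenever it fills up."""
    def __init__(self):
        self.capacity = 1
        self.items = [None]
        self.size = 0

    def insert(self, x):
        cost = 1                               # writing the new element
        if self.size == self.capacity:         # table is full: double it
            new_items = [None] * (2 * self.capacity)
            for i in range(self.size):         # reinsert the old elements
                new_items[i] = self.items[i]
            cost += self.size
            self.items = new_items
            self.capacity *= 2
        self.items[self.size] = x
        self.size += 1
        return cost

t = DynamicTable()
costs = [t.insert(i) for i in range(1, 10)]
print(costs)        # [1, 2, 3, 1, 5, 1, 1, 1, 9]
print(sum(costs))   # 24, which is less than 3 * 9 = 27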

Analysis Of Dynamic Tables Let c_i = the cost of the i-th insert: –c_i = i if i − 1 is an exact power of 2 (the table doubles, so i − 1 old elements are copied plus 1 new element is written) –c_i = 1 otherwise Example:
Operation    Table Size   Cost
Insert(1)         1          1
Insert(2)         2          2
Insert(3)         4          3
Insert(4)         4          1
Insert(5)         8          5
Insert(6)         8          1
Insert(7)         8          1
Insert(8)         8          1
Insert(9)        16          9

Aggregate Analysis: Dynamic Tables The total cost of n Insert() operations is Σ_{i=1}^{n} c_i ≤ n + Σ_{j=0}^{⌊lg n⌋} 2^j < n + 2n = 3n Average cost of an operation = (total cost)/(# operations) ≤ 3n/n = 3 So we can say a dynamic table costs asymptotically the same as a fixed-size table –Both are O(1) per Insert operation, even though in the dynamic table some individual operations are very expensive.

Accounting Analysis We can also use the other form of amortized analysis, the "accounting method." For our dynamic table we "charge" each operation an amortized cost of $3: –Use $1 to perform the immediate Insert() –Store $2 as credit –When the table doubles, use the saved $2 to reinsert two items: the item itself and one of the old items whose credit has already been spent –We have "paid" these copying costs with the credit stored by the last n/2 Insert()s Benefit: O(1) amortized cost per operation

Accounting Analysis Suppose we must also support Delete; then the table can both grow and shrink –Table overflows → double it (as before) –Table less than 1/4 full → halve it –Charge $3 for Insert (as before) –Charge $2 for Delete: store the extra $1 in the emptied slot and use it later to pay for copying the remaining items to the new table when the table shrinks –We only need an extra $1 because when shrinking we are reinserting fewer items

Example: Stack with Multipop A stack is a data structure that usually has these operations: –Push: insert a new element at the top of the stack –Pop: delete the top element from the stack A stack can be implemented so that Push and Pop are both O(1) operations. So what happens if we add another operation? –Multipop(k): pop k elements off the stack
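A minimal Python sketch of such a stack, with an operation counter added so the aggregate-analysis argument on the next slide can be checked experimentally; the class and counter are my own additions.

class MultipopStack:
    """Stack supporting Push, Pop, and Multipop(k)."""
    def __init__(self):
        self.items = []
        self.element_ops = 0            # counts individual element moves

    def push(self, x):
        self.items.append(x)
        self.element_ops += 1

    def pop(self):
        if not self.items:
            return None
        self.element_ops += 1
        return self.items.pop()

    def multipop(self, k):
        """Pop min(k, size) elements; costs one unit per element popped."""
        popped = []
        while self.items and len(popped) < k:
            popped.append(self.items.pop())
            self.element_ops += 1
        return popped

s = MultipopStack()
for i in range(10):
    s.push(i)
s.multipop(4)
s.multipop(100)            # pops only the 6 remaining elements
print(s.element_ops)       # 20 element moves over 12 operations: O(1) amortized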

Example: Stack with Multipop Analysis of a sequence of n operations –One Multipop can take O(n) time, so we might think that a sequence of n operations could take O(n²) time –But this bound is not very "tight": n operations actually have a smaller upper bound –If n elements have been pushed onto the stack, there can be no more than n pushes and n pops, because each element can be popped at most once for each time it is pushed –So the number of Pop operations (including those performed inside Multipop) is bounded by n –The total cost of any sequence of n Push, Pop and Multipop operations is therefore O(n) –Thus the amortized cost of one operation is O(n)/n = O(1) –The aggregate method used here shows that although the naive upper bound for n operations is O(n²), in reality the average cost of an operation is O(1)

Example: Binary Counter Consider the following problem: we want a binary counter that supports n Increment operations. We use an array A of bits, so A[i] = 0 or A[i] = 1. A[0] is the lowest-order bit, so the value of the counter is x = Σ_{i≥0} A[i]·2^i
Algorithm Increment(A)
  A[0] = A[0] + 1
  i = 0
  while (A[i] == 2)
    A[i+1] = A[i+1] + 1
    A[i] = 0
    i = i + 1
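A direct Python translation of this carry-propagation loop, assuming the counter is a Python list with enough bits allocated; the flip counter is my addition so the costs on the next slides can be reproduced.

def increment(A):
    """Increment the binary counter stored in list A (A[0] = lowest-order bit).

    Returns the number of bits that were changed (the cost of the operation).
    Assumes A is long enough that the final carry fits.
    """
    flips = 1
    A[0] += 1
    i = 0
    while A[i] == 2:          # propagate the carry to the next bit
        A[i] = 0
        A[i + 1] += 1
        i += 1
        flips += 1
    return flips

A = [0] * 8
for x in range(47):
    increment(A)              # counter now holds 47 = 101111 in binary
print(increment(A))           # 47 -> 48 changes 5 bits
print(increment(A))           # 48 -> 49 changes 1 bit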

Example: Binary Counter The cost of one Increment is the number of bits that change (the number of iterations of the while loop, plus one). Examples: –x = 47 → A holds 101111 –x = 48 → A holds 110000 –x = 49 → A holds 110001 The Increment from x = 47 to x = 48 has cost 5 (five bits change). The Increment from x = 48 to x = 49 has cost 1.

Example: Binary Counter Analysis of a sequence of n Increments –The number of bits in the representation of n is about lg n, so a naive bound for n operations is O(n) · O(lg n) = O(n lg n) –But the amortized running time of Increment is O(1) per operation, i.e. O(n) for n operations: –A[0] flips on every Increment (n times) –A[1] flips on every second Increment (n/2 times) –A[2] flips on every fourth Increment (n/4 times) –A[i] flips on every 2^i-th Increment (n/2^i times) –Total running time T(n) = Σ_{i=0}^{⌊lg n⌋} ⌊n/2^i⌋ < n · Σ_{i=0}^{∞} 1/2^i = 2n = O(n) –Amortized cost of one operation = O(n)/n = O(1)

Example: Binary Counter Accounting analysis: –Every 1 bit in A holds one credit –A change from 1 → 0 is paid for using that bit's stored credit –A change from 0 → 1 is paid for by the Increment itself: pay one credit to do the flip and place one credit on the new 1 bit –So each Increment is charged a constant number of credits, and its amortized cost is O(1)

Summary Keep these advanced design techniques in mind in case you run across a problem that seems especially difficult. Greedy algorithms can be very helpful for solving problems that would otherwise seem hard to program. Amortized analysis is not a way to program; it is a way to determine running times. What makes it special is that it recognizes that different operations and different inputs may change the running time. In these cases, we may be more interested in the average running time of an operation than in the worst case of the whole algorithm.