Advanced Data Structures Sartaj Sahni


1 Advanced Data Structures Sartaj Sahni
Acknowledge EDGE and on-campus overflow students.

2 Clip Art Sources www.barrysclipart.com www.livinggraphics.com

3 What The Course Is About
Study data structures for: External sorting Single and double ended priority queues Dictionaries Multidimensional search Computational geometry Image processing Packet routing and classification

4 What The Course Is About
Concerned with: Worst-case complexity Average complexity Amortized complexity

5 Prerequisites C++ (reading knowledge at least) Asymptotic Complexity
Big Oh, Theta, and Omega notations Undergraduate data structures Stacks and Queues Linked lists Trees Graphs

6 Web Site www.cise.ufl.edu/~sahni/cop5536 https://lss.at.ufl.edu
Handouts, syllabus, text, readings, assignments, past exams, past exam solutions, TAs, Internet lectures, PowerPoint presentations, etc. My office data.

7 Assignments, Tests, & Grades
25% for assignments (there will be two assignments). 25% for each test (there will be three tests).

8 Grades (Rough Cutoffs)
B+ >= 77% B >= 72% B- >= 67% C+ >= 63% C >= 60% C- >= 55%

9 Kinds Of Complexity Worst-case complexity. Average complexity.
Amortized complexity.

10 Quick Sort Sort n distinct numbers.
Worst-case time is (say) 10n² microseconds on some computer. This means that for every n, there is a sequence of n numbers for which quick sort takes 10n² microseconds to complete, and there is no sequence of n numbers for which it takes more than 10n² microseconds. For any n, the worst-case time is the maximum of the times over all instances. First we take a look at what we can and cannot do with worst-case and average complexities.

11 Quick Sort Average time is (say) 5n·log₂n microseconds on some computer. Consider any n, say n = 1000. Add up the time taken to sort each of the 1000! possible 1000-element sequences and divide by 1000!. The result is 5000·log₂1000 microseconds.

12 Quick Sort What if we sort only 500 of these 1000! sequences?
We can only conclude that the total time for these 500 sequences will be <= 500*(worst-case time) = 500*(10n²). We cannot conclude that the time will be 500*(average time). Average time gives no performance guarantee when the task is done several times.

13 Task Sequence Suppose that a sequence of n tasks is performed.
The worst-case cost of a task is cwc. Let ci be the (actual) cost of the ith task in this sequence. So, ci <= cwc, 1 <= i <= n. Hence n * cwc is an upper bound on the cost of the sequence, and j * cwc is an upper bound on the cost of the first j tasks. Now we turn our attention to a sequence of (abstract) tasks. The previous example focused on a sequence of quick sorts (each task was a sort using quick sort); more generally, the tasks could, for example, be inserts, deletes, and searches in some data structure. The average cost of a task is not of much use in this analysis: if Cavg is the average task cost (independent of the sequence), the cost of the task sequence need not be n * Cavg. So, the only tool we have to bound the cost of a task sequence is the worst-case cost.

14 Task Sequence Let cavg be the average cost of a task in this sequence.
So, cavg = (Σci)/n. n * cavg is the exact cost of the sequence, but j * cavg is not an upper bound on the cost of the first j tasks. Also, determining cavg is usually quite hard. Really, the only tool we presently have to provide performance guarantees is worst-case analysis.

15 Task Sequence At times, a better upper bound than j * cwc or n * cwc on the sequence cost is obtained using amortized complexity. Worst-case cost can be used to give a performance guarantee on a sequence of tasks, but average cost cannot. There exist several data structures that perform better in practice than the bound from a worst-case analysis suggests. For many of these, we can arrive at a better performance guarantee, one that explains the observed better performance, using amortized complexity.

16 Amortized Complexity The amortized complexity of a task is the amount you charge the task. The conventional way to bound the cost of doing a task n times is to use one of the expressions n*(worst-case cost of task) or Σ(worst-case cost of task i). The amortized-complexity way to bound the cost of doing a task n times is to use one of the expressions n*(amortized cost of task) or Σ(amortized cost of task i). The second expression is used when the bound on the cost of a task depends on the task index or on the nature of the task (insert, search, delete).

17 Amortized Complexity The amortized complexity/cost of the individual tasks in any task sequence must satisfy: Σ(actual cost of task i) <= Σ(amortized cost of task i). So, we can use Σ(amortized cost of task i) as a bound on the actual complexity of the task sequence. Worst-case analysis requires actual cost of task i <= worst-case cost of task i.

18 Amortized Complexity The amortized complexity of a task may bear no direct relationship to the actual complexity of the task.

19 Amortized Complexity In worst-case complexity analysis, each task is charged an amount that is >= its cost: Σ(actual cost of task i) <= Σ(worst-case cost of task i). In amortized analysis, some tasks may be charged an amount that is < their cost, so long as Σ(actual cost of task i) <= Σ(amortized cost of task i) still holds.

20 Potential Function P(i) = amortizedCost(i) – actualCost(i) + P(i – 1)
Σ(P(i) – P(i–1)) = Σ(amortizedCost(i) – actualCost(i)). The left side telescopes, so P(n) – P(0) = Σ(amortizedCost(i) – actualCost(i)). The requirement Σ(actual cost) <= Σ(amortized cost) is therefore equivalent to P(n) – P(0) >= 0. When P(0) = 0, P(i) is the amount by which the first i tasks/operations have been overcharged. P(i) is the potential after the ith operation; P(0) is the initial potential. The potential function keeps track of the accumulated difference between the amortized (i.e., charged) costs and the actual costs.

21 Arithmetic Statements
Rewrite an arithmetic statement as a sequence of statements that do not use parentheses. a = x+((a+b)*c+d)+y; is equivalent to the sequence: z1 = a+b; z2 = z1*c+d; a = x+z2+y;

22 Arithmetic Statements
a = x+((a+b)*c+d)+y; The rewriting is done using a stack and a method processNextSymbol.
create an empty stack;
for (int i = 1; i <= n; i++)  // n is the number of symbols in the statement
   processNextSymbol();

23 Arithmetic Statements
a = x+((a+b)*c+d)+y; processNextSymbol extracts the next symbol from the input statement. Symbols other than ) and ; are simply pushed onto the stack. After the prefix a = x+((a+b has been processed, the stack holds (bottom to top): a = x + ( ( a + b

24 Arithmetic Statements
a = x+((a+b)*c+d)+y; If the next symbol is ), symbols are popped from the stack up to and including the first (, an assignment statement is generated, and the left-hand symbol is pushed onto the stack. After the first ) is processed, z1 = a+b; is generated and the stack holds (bottom to top): a = x + ( z1

25 Arithmetic Statements
a = x+((a+b)*c+d)+y; After the second ) is processed (popping d, +, c, *, z1, and the matching (), z2 = z1*c+d; is generated and the stack holds (bottom to top): a = x + z2. Statements generated so far: z1 = a+b; z2 = z1*c+d;

26 Arithmetic Statements
a = x+((a+b)*c+d)+y; After the remaining symbols + and y are pushed, the stack holds (bottom to top): a = x + z2 + y. Statements generated so far: z1 = a+b; z2 = z1*c+d;

27 Arithmetic Statements
a = x+((a+b)*c+d)+y; If the next symbol is ;, symbols are popped from the stack until the stack becomes empty, and the final assignment statement a = x+z2+y; is generated. The complete sequence of generated statements is: z1 = a+b; z2 = z1*c+d; a = x+z2+y;

28 Complexity Of processNextSymbol
a = x+((a+b)*c+d)+y; The cost of one invocation is O(number of symbols popped from the stack). Since at most i – 1 symbols have been pushed before iteration i, this is O(i), where i is the for-loop index.

29 Overall Complexity (Conventional Analysis)
create an empty stack;
for (int i = 1; i <= n; i++)  // n is the number of symbols in the statement
   processNextSymbol();
So, the overall complexity is O(Σi) = O(n²). Alternatively, O(n*n) = O(n²). Although correct, a more careful analysis permits us to conclude that the complexity is O(n). Note that if we do a worst-case amount of work when i = 10 (say), we cannot do a worst-case amount of work when i = 11, because the stack then has only one element on it!

30 Ways To Determine Amortized Complexity
Aggregate method. Accounting method. Potential function method.

31 Aggregate Method Somehow obtain a good upper bound on the actual cost of the n invocations of processNextSymbol(). Divide this bound by n to get the amortized cost of one invocation of processNextSymbol(). It is easy to see that Σ(actual cost) <= Σ(amortized cost), because the sum of the amortized costs equals the obtained upper bound.

32 Aggregate Method The actual cost of the n invocations of processNextSymbol() equals the number of stack push and pop operations. The n invocations cause at most n symbols to be pushed onto the stack. This count includes the symbols for new variables, because each new variable is the result of a ) being processed; note that no )s get pushed onto the stack. Actually, there are at most n–1 pushes, as there is no push when ; is processed. Only pushed items may be popped, so there are at most n–1 pops.

33 Aggregate Method The actual cost of the n invocations of processNextSymbol() is therefore at most 2n. So, using 2n/n = 2 as the amortized cost of processNextSymbol() is OK, because this cost yields Σ(actual cost) <= Σ(amortized cost). Since the amortized cost of processNextSymbol() is 2, the actual cost of all n invocations is at most 2n. Note that the amortized cost of 2 is sometimes less than the actual cost, sometimes more, and sometimes equal. We could just as well assign an amortized cost of 3.

34 Aggregate Method The aggregate method isn't very useful, because to figure out the amortized cost we must first obtain a good bound on the aggregate cost of a sequence of invocations. Since our objective in using amortized complexity is to get a better bound on the cost of a sequence of invocations, if we can obtain this better bound through other techniques, we may as well use the bound directly rather than divide it by n to obtain an amortized cost.

