Tirgul 11: Complexity, Sorting, Searching
Complexity
A motivation for complexity
Talking about the running time of algorithms is difficult: different computers run the same algorithm at different speeds; different problem sizes take different amounts of time (even on the same computer); and different inputs of the same size take different amounts of time.
Using Upper and Lower Bounds to talk about running time.
[Figure: running times of different inputs on some computer, plotted against problem size (5, 10, 15, 20), with an upper-bound curve above them and a lower-bound curve below.]
Bounds on running time
We want to bound the running time: "for arrays of size n it takes my computer at most 45·n² milliseconds to sort the array with bubble sort." But what about other computers (faster/slower)? Instead we say: "On any computer, there is a constant c such that it takes at most c·n² time units to sort an array of size n with bubble sort." How is this sentence useful?
Big O notation
We say that the running time of an algorithm is O(f(n)) if there exists a constant c such that for large enough n: RunningTime(n) < c·f(n). Translation: from some point on, c·f(n) is an upper bound on the running time. Why only for "large enough n"?
Use of big-O notation
Example: the running time of bubble sort on arrays of size n is O(n²). Translation: for a sufficiently large array, the running time is less than c·n² for some constant c (no matter how bad the input is).
Big Omega, Big Theta
For lower bounds we use Ω(f(n)). I.e., if we say the running time of an algorithm is Ω(f(n)), we mean that RunningTime(n) > c·f(n) for some constant c (for large enough n). If the running time of an algorithm is both Ω(f(n)) and O(f(n)), we say that the running time of the algorithm is Θ(f(n)).
Big-O notation
Do not assume that because we analyze running time asymptotically we no longer care about the constants. If two algorithms have different asymptotic bounds, the choice is easy (though it may be misleading if the constants are really large). Usually we are happy when something runs in polynomial rather than exponential time: O(n²) is much, much better than O(2ⁿ).
Examples.
Examples
What is the running time of the following code?

for (int i = 0; i < arr.length; i++) {
    System.out.println(arr[i]);
}
Examples
What about this code?

for (int i = 0; i < arr.length; i++) {
    if (arr[i] == 1) break;
    System.out.println(arr[i]);
}
Examples
Checking whether the number n is prime: we can try to divide n by all integers in the range 2, …, √n. Best case: sometimes we find it isn't prime really quickly (e.g., it is divisible by 2). Worst case: sometimes the number is prime and we try √n different divisions. The running time is O(√n), and it is also Ω(1).
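The trial-division test above can be sketched in Java as follows (a minimal version; the class and method names are ours):

```java
public class Primes {
    // Trial division: try every candidate divisor d with d*d <= n.
    // If none divides n, then n has no divisor in 2..sqrt(n), hence none at all.
    public static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;   // found a divisor: not prime
        }
        return true;                        // worst case: about sqrt(n) divisions
    }

    public static void main(String[] args) {
        System.out.println(isPrime(2));    // true
        System.out.println(isPrime(91));   // false (7 * 13)
        System.out.println(isPrime(97));   // true
    }
}
```

Note the `(long) d * d` cast, which avoids integer overflow for large n.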
Bounds might not be tight
We saw that finding out whether a number n is prime takes us O(√n) time. It also takes us O(n⁵) time. The second bound guarantees less; it is not tight.
Space Complexity
The amount of memory an algorithm requires also matters to us: do we have enough memory to run it? When we say something like "the space complexity of this algorithm is O(n²)" we mean: there exists a constant c such that at most c·n² memory cells are used by the algorithm at any given time.
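As an illustration (our own example, not from the slides), a method that builds an n-by-n multiplication table uses n² memory cells, so its space complexity is O(n²):

```java
public class SpaceDemo {
    // Allocates an n-by-n table: n*n int cells => O(n^2) space.
    public static int[][] multiplicationTable(int n) {
        int[][] table = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                table[i][j] = (i + 1) * (j + 1);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[][] t = multiplicationTable(4);
        System.out.println(t[2][3]);   // row 3 times column 4 = 12
    }
}
```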
Sorting
Bubble Sort

public static void bubble1(int[] data) {
    for (int j = 0; j < data.length - 1; j++) {
        for (int i = 1; i < data.length - j; i++) {
            if (data[i-1] > data[i]) {
                swap(data, i-1, i);
            }
        }
    }
}

The inner loop runs n−1 times, then n−2 times, then n−3 times, and so on… What is the complexity?
Analysis of Running Time
Let's assume the swapping procedure, along with the if that surrounds it, runs in at most k time units. The total running time is then at most k·((n−1) + (n−2) + … + 1) = k·n(n−1)/2, which is O(n²). What is the lower bound?
A Stable Sorting Algorithm
A sorting algorithm is called "stable" if it maintains the original order of equal items. Example:
Original array: 4 3 2a 1a 1b 2b
Result of a stable sort: 1a 1b 2a 2b 3 4
A non-stable sort: 1a 1b 2b 2a 3 4
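Stability can be observed directly with Java's `Arrays.sort`, which is guaranteed stable when sorting objects (a sketch using the labels from the example; the comparator looks only at the numeric key, so the a/b labels reveal whether order is preserved):

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableDemo {
    public static void main(String[] args) {
        // Items labeled so we can see whether equal keys keep their order.
        String[] items = {"4", "3", "2a", "1a", "1b", "2b"};
        // Sort by the numeric key only (first character); labels are ignored.
        // Arrays.sort on objects is guaranteed stable, so 2a stays before 2b.
        Arrays.sort(items, Comparator.comparingInt((String s) -> s.charAt(0) - '0'));
        System.out.println(Arrays.toString(items));  // [1a, 1b, 2a, 2b, 3, 4]
    }
}
```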
Bubble Sort

public static void bubble1(int[] data) {
    for (int j = 0; j < data.length - 1; j++) {
        for (int i = 1; i < data.length - j; i++) {
            if (data[i-1] > data[i]) {
                swap(data, i-1, i);
            }
        }
    }
}

Question: Is this a stable sort? Question: What happens if we run this on an already-sorted array?
Modified Bubble Sort

public static void bubble2(int[] data) {
    for (int j = 0; j < data.length - 1; j++) {
        boolean movedSomething = false;        // added this
        for (int i = 1; i < data.length - j; i++) {
            if (data[i-1] > data[i]) {
                movedSomething = true;         // added this
                swap(data, i-1, i);
            }
        }
        if (!movedSomething) break;            // added this
    }
}

We detect when the array is sorted and stop in the middle.
Modified Bubble Sort
Before the modification, the running time was always nearly the same — but what about now? What is the best case? What is the worst case? What is the running time in each case? Which version of bubble sort is better?
Complexity of Modified Bubble Sort.
What is the worst-case running time? O(n²): the array is sorted in reverse, so we never exit early. What is the best-case running time? Ω(n): in the best case we start with a sorted array, go over it once, and see that nothing was moved.
Selection Sort (a.k.a. Max-Sort)
Pseudocode:
For i = 0, 1, …, n−2:
    Run over the array from index i onwards.
    Find j, the index of the minimal item.
    Swap the values at indexes i and j.
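A direct Java translation of the pseudocode might look like this (a sketch; the swap helper is the same one assumed by the bubble-sort slides, shown here so the example is self-contained):

```java
public class SelectionSort {
    public static void selectionSort(int[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            int minIndex = i;                      // index of the minimal item seen so far
            for (int j = i + 1; j < data.length; j++) {
                if (data[j] < data[minIndex]) {
                    minIndex = j;
                }
            }
            swap(data, i, minIndex);               // put the minimum at position i
        }
    }

    private static void swap(int[] data, int i, int j) {
        int tmp = data[i];
        data[i] = data[j];
        data[j] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {4, 1, 3, 2};
        selectionSort(a);
        System.out.println(java.util.Arrays.toString(a));   // [1, 2, 3, 4]
    }
}
```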
Running Time of selection sort
Note first that the running time is always pretty much the same: finding the minimal value always requires us to go over the entire sub-array. There is no shortcut, and we never stop early. First we have a sub-array of size n−1 to scan, then size n−2, then size n−3… We've already analyzed this pattern: the running time is O(n²).
Insert Sort

public static void insertSort(int[] data) {
    for (int i = 1; i < data.length; i++) {
        // the array is sorted before index i
        insert(data, i);   // insert data[i] into the sorted range 0...(i-1)
    }
}

private static void insert(int[] data, int index) {
    int value = data[index];
    int j;
    for (j = index - 1; j >= 0 && data[j] > value; j--) {
        data[j+1] = data[j];   // shift larger values one step to the right
    }
    data[j+1] = value;
}
Complexity of Insertion Sort
Worst-case complexity: O(n²). If the original array is sorted backwards, each insert goes all the way in: first into a sorted prefix of size 1, then 2, then 3… Is this pattern familiar? Best-case complexity: Ω(n). The array is already sorted; each insert stops after one step, but we call insert Ω(n) times.
Bucket Sort
Elements are distributed among bins; elements are then sorted within each bin.
Bucket Sort

public static void bucketSort(int[] data, int maxVal) {
    int[] temp = new int[maxVal + 1];
    int k = 0;
    for (int i = 0; i < data.length; i++) {
        temp[data[i]]++;                 // count occurrences of each value
    }
    for (int i = 0; i <= maxVal; i++) {
        for (int j = 0; j < temp[i]; j++) {
            data[k] = i;                 // write each value back, in order
            k++;
        }
    }
}
Complexity of Bucket Sort
O(n + M), where M is the range of the data and n is the number of elements in the array.
Conclusions
The lesson: pick the right algorithm for the job. If we know the array is nearly sorted, some algorithms may be better than others. Soon to come: sorting algorithms that sort in O(n·log(n)). This is the best one can do without assumptions on the data (there is a formal proof, but we will not learn it in this course).
Why Sort?
Faster search (Binary Search). Other things may also work faster. Another example: given an array of numbers, find the pair with the smallest difference and return that difference.
Finding the smallest difference

The naïve way:

public static int minimalDif(int[] numbers) {
    int diff = Integer.MAX_VALUE;
    for (int i = 0; i < numbers.length; i++) {
        for (int j = i + 1; j < numbers.length; j++) {
            int currentDiff = Math.abs(numbers[i] - numbers[j]);
            if (currentDiff < diff) {
                diff = currentDiff;
            }
        }
    }
    return diff;
}
Complexity of the naïve case
The inner loop runs n−1 times, then n−2 times, then n−3 times, … As we've seen before, this also gives O(n²) running time. A faster idea: use a sorting algorithm that works in O(n·log(n)), then go over consecutive pairs only. What is the total running time this way?
Finding the smallest difference

The better way (assumes numbers is sorted in non-decreasing order):

public static int minimalDif(int[] numbers) {
    int diff = Integer.MAX_VALUE;
    for (int i = 0; i < numbers.length - 1; i++) {
        int currentDiff = Math.abs(numbers[i] - numbers[i+1]);
        if (currentDiff < diff) {
            diff = currentDiff;
        }
    }
    return diff;
}
Linear Search

// searching for an element e in an unsorted array
public static int linearSearch(int[] intArr, int e) {
    for (int i = 0; i < intArr.length; i++) {
        if (intArr[i] == e) return i;
    }
    return -1;
}

We need to check each and every element until we find e. If e is not in the array, we checked n elements (n = array length). If the array is sorted, we know when to stop searching even if the element wasn't found. Complexity: O(n), where n is the number of elements in the array.
Linear search in a sorted array

public static int linearSearch(int[] intArr, int e) {
    for (int i = 0; i < intArr.length; i++) {
        if (intArr[i] == e) {
            return i;
        }
        if (intArr[i] > e) {   // passed where e would be: it isn't there
            return -1;
        }
    }
    return -1;
}

Complexity is still O(n).
Binary Search
Binary search
[Figure: binary search on a sorted array. At the start, all of the stored values are unknown (?). We look at the middle element first; everything above it is marked "definitely too high" (>), and we look next in the still-unknown part below. After the second comparison, a region is marked "definitely too low" (<), and the unknown area shrinks again.]
Termination
When the lower and upper bounds of the unknown area pass each other, the unknown area is empty and we terminate (unless we've already found the value). Goal: locate a value, or decide it isn't there. Intended bound: we've found the value. Necessary bound: the lower and upper bounds of our search pass each other. Plan: pick the component midway between the upper and lower bounds, then reset the lower or upper bound as appropriate.
The Binary Search Pseudocode

initialize lower and upper to the lower and upper array bounds;
do {
    set middle to (lower + upper) / 2;
    if the value at data[middle] is lower than the one we seek
        make middle + 1 the new lower bound
    else
        make middle - 1 the new upper bound
} until we find the value or run out of unknown data;
decide why we left the loop, and return an appropriate position
The Binary Search Java Code

int binarySearch(int[] data, int num) {
    // binary search for num in an ordered array
    int middle, lower = 0, upper = data.length - 1;
    do {
        middle = (lower + upper) / 2;
        if (num < data[middle])
            upper = middle - 1;
        else
            lower = middle + 1;
    } while ((data[middle] != num) && (lower <= upper));
    // postcondition: if data[middle] isn't num, no component is
    if (data[middle] == num)
        return middle;
    else
        return -1;
}

Complexity: O(log₂ n)
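The same search can also be written recursively (our own sketch, not from the slides); each call halves the unknown region, giving the same O(log n) bound:

```java
public class BinarySearchRec {
    // Returns an index of num in the sorted array data, or -1 if absent.
    public static int search(int[] data, int num) {
        return search(data, num, 0, data.length - 1);
    }

    private static int search(int[] data, int num, int lower, int upper) {
        if (lower > upper) return -1;            // unknown region is empty
        int middle = (lower + upper) / 2;
        if (data[middle] == num) return middle;
        if (num < data[middle])
            return search(data, num, lower, middle - 1);   // go left
        return search(data, num, middle + 1, upper);       // go right
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(search(sorted, 7));    // 3
        System.out.println(search(sorted, 4));    // -1
    }
}
```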
Recursive power
Given an integer x and a non-negative integer n, we want to compute xⁿ. What's the recursive formula? What's the base case? What do you think?
Recursive power #1

public static int power(int x, int n) {
    if (n == 0) {
        return 1;
    }
    return x * power(x, n-1);
}

2³ = 2·2² = 2·2·2¹ = 2·2·2·2⁰ = 2·2·2·1 = 8

What is the runtime complexity of this method?
Complexity of recursive methods
When analyzing the complexity of recursive methods, we need to take into account how many recursive calls will be made (relative to the size of the input). When calculating xⁿ, power is called n+1 times. Each call takes a constant amount of time, so the runtime complexity is O(n). We can show this more formally…
Recursive power #1 complexity
Let T(n) be the runtime of power(x, n). The base case takes constant time: T(0) = c. For the general case:
T(n) = c + T(n−1) = c + (c + T(n−2)) = 2c + T(n−2) = 2c + (c + T(n−3)) = 3c + T(n−3) = …
Is there a pattern here?
Recursive power #1 complexity
In general we get, for any i: T(n) = c·i + T(n−i). (We could prove this by induction.) We would like n−i to be 0, so assign i = n: T(n) = c·n + T(0) = cn + c = O(n). Actually, this argument gives a lower bound too: T(n) ≥ c·n, so T(n) = Ω(n).
Recursive power #1 complexity
Ω(f(n)) is a way of saying that an algorithm can't be more efficient than the growth rate of f(n). Lower bounds Ω(f(n)) can be used to show that an algorithm is bad (or worse than another); upper bounds O(f(n)) can be used to show that an algorithm is good (or better than another).
Recursive power #1 complexity
Don't be confused when comparing complexities from different contexts. For the recursive power #1 algorithm the runtime complexity is O(n), where n is the exponent argument of the power xⁿ. For insertion sort the runtime complexity is O(n²), where n is the length of the array to sort. We should be careful when referring to n as the "size" of the input.
Recursive power #1 complexity
So for the recursive power #1 algorithm computing xⁿ we got T(n) = O(n) and T(n) = Ω(n), i.e., T(n) = Θ(n). Can we do better?
Recursive power rethought
Recall that x^(a+b) = x^a · x^b. We can decompose the exponent n (using integer division): n = 2·(n/2) + (n%2). So we can "divide and conquer": x^n = x^(2·(n/2)+(n%2)) = x^(n/2) · x^(n/2) · x^(n%2).
Recursive power #2

public static int power(int x, int n) {
    if (n == 0)
        return 1;
    if (n == 1)
        return x;
    int tmp = power(x, n/2);
    return tmp * tmp * power(x, n%2);
}

Why do we need two base cases? Why do we use a temporary variable to store x^(n/2)?

2⁵ = 2²·2²·2¹ = (2¹·2¹·2⁰)²·2¹ = (2·2·1)²·2 = 32
Recursive power #2 complexity
How many times is power called when calculating xⁿ? Note that power(x, n%2) is always a base case (n%2 is either 0 or 1), and power(x, n) is computed using one recursive call to power(x, n/2). So the runtime complexity is O(log n). More formally…
Recursive power #2 complexity
Let T(n) be the runtime of power(x, n). The base cases take constant time: T(0) = T(1) = c. For the general case:
T(n) = c + T(n/2) = c + (c + T(n/4)) = 2c + T(n/4) = 2c + (c + T(n/8)) = 3c + T(n/8) = …
Is there a pattern here?
Recursive power #2 complexity
In general we get, for any i: T(n) = c·i + T(n/2ⁱ). We would like n/2ⁱ to be 1, so assign i = log₂(n): T(n) = c·log n + T(1) = c·log n + c = O(log n).
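Another common way to reach the same O(log n) bound is an iterative "exponentiation by squaring" version (our own sketch, not from the slides):

```java
public class FastPower {
    // Iterative exponentiation by squaring: processes the bits of n,
    // squaring the base at each step; O(log n) multiplications total.
    public static long power(long x, int n) {
        long result = 1;
        long base = x;
        while (n > 0) {
            if ((n % 2) == 1) {       // current lowest bit of n is set
                result *= base;
            }
            base *= base;             // x^1 -> x^2 -> x^4 -> ...
            n /= 2;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(power(2, 5));    // 32
        System.out.println(power(3, 0));    // 1
    }
}
```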
Exam questions
Common1 – The Idea
The problem (in all three "Common" variants, as the code below shows): find the most common value in an array. Here we have less memory and less power (no sorting), so we'll have to work more: for every cell in the array we count the number of appearances of that value. Counting this way is inefficient, and we only remember the most common value found so far. Tip: explain the algorithm you are about to use; if you get the implementation wrong you may still get some points.
Solution

Tip: explain and document your code.

public static int common1(int[] arr) {
    int mostCommon = 0;          // will hold the most common value found so far
    int numAppearances = 0;
    for (int i = 0; i < arr.length; i++) {
        // for every item, count the number of times it appears
        int currentAppear = countTimes(i, arr);
        // check against the current most common value
        if (currentAppear > numAppearances) {
            mostCommon = arr[i];
            numAppearances = currentAppear;
        }
    }
    return mostCommon;
}
Tip: break the code into small, manageable pieces.

private static int countTimes(int index, int[] arr) {
    // counts the number of times arr[index] appears from index onwards
    int result = 1;
    for (int j = index + 1; j < arr.length; j++) {
        if (arr[index] == arr[j]) {
            result++;
        }
    }
    return result;
}

It is enough to count the number of times the item appears after the index: if it also appears before the index, we already covered it then.
A small optimization
Close to the end of the array, countTimes(i, arr) can return at most arr.length − i, so once that is no larger than numAppearances we can stop:

public static int common1(int[] arr) {
    int mostCommon = 0;
    int numAppearances = 0;
    // stop early: countTimes(i, arr) <= arr.length - i, so once
    // arr.length - i <= numAppearances no later value can win
    for (int i = 0; i < arr.length - numAppearances; i++) {
        int currentAppear = countTimes(i, arr);
        if (currentAppear > numAppearances) {
            mostCommon = arr[i];
            numAppearances = currentAppear;
        }
    }
    return mostCommon;
}
Complexity

Tip: do not forget to answer all parts of the question!

Best case: an array where all values are equal (e.g., 1,1,1,1,1). We count the appearances of cell 0 and then stop early: O(n) running time for this input. Worst case: the array {1,2,3,4,5,6,…}. We count the appearances of each value (a pass of length n, then n−1, then n−2…): O(n²) work for this input.
Common 2
Now we are allowed to sort. This will probably save us running time. Working on a sorted array: 1,1,2,2,2,3,3,4,4,4,5,7,7,7,7 — each value appears in a consecutive segment. We'll use that.
Common 2

Tip: write a draft, then copy the code to a clean area once you are convinced it works.

public static int common2(int[] arr) {
    qsort(arr);
    int mostCommon = 0;          // as before, the most common value so far
    int numAppearances = 0;
    int segmentStart = 0;        // marks the beginning of the current segment
    do {
        // find the start of the next segment, and check the length of the current one
        int nextSegment = findNextSegment(segmentStart, arr);
        if (nextSegment - segmentStart > numAppearances) {
            numAppearances = nextSegment - segmentStart;
            mostCommon = arr[segmentStart];
        }
        segmentStart = nextSegment;   // advance to the next segment (people often forget this)
    } while (segmentStart < arr.length);
    return mostCommon;
}
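The slide does not show findNextSegment; a possible implementation (our sketch) simply scans forward until the value changes:

```java
public class Common2Helper {
    // Returns the index where the next segment of equal values begins:
    // the first index j > segmentStart with arr[j] != arr[segmentStart],
    // or arr.length if the current segment runs to the end of the array.
    static int findNextSegment(int segmentStart, int[] arr) {
        int j = segmentStart + 1;
        while (j < arr.length && arr[j] == arr[segmentStart]) {
            j++;
        }
        return j;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 1, 2, 2, 2, 3};
        System.out.println(findNextSegment(0, sorted));   // 2
        System.out.println(findNextSegment(2, sorted));   // 5
    }
}
```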
Complexity?
What is the running time now? QuickSort takes O(n·log(n)) in the best case and O(n²) in the worst case. Our segment scan touches every array item just once: for every segment we work in time linear in the length of the segment, a total of O(n). So the running time of quicksort dominates in both the best and the worst case.
Common 3
Now we have information about the data: the values are limited in range. We can allocate an array with one cell per possible value, and count the number of appearances of each value.
Common3

private static final int ALLOWED_VALS = 100;

public static int common3(int[] arr) {
    // count the appearances of each value
    int[] frequencies = new int[ALLOWED_VALS];
    for (int i = 0; i < arr.length; i++) {
        frequencies[arr[i]]++;
    }
    // now find the most common value
    int mostCommon = 0;
    for (int i = 1; i < frequencies.length; i++) {
        if (frequencies[mostCommon] < frequencies[i]) {
            mostCommon = i;
        }
    }
    return mostCommon;
}

Tip: simulate your code to see that it works.
Complexity?
We go over the array of data once, then go over the array of frequencies once. The values are limited in range, so the array of frequencies is of constant size! The running time is O(n).
Ternary Search

static int ternary(int[] data, int key) {
    int pivot1, pivot2, lower = 0, upper = data.length - 1;
    do {
        pivot1 = (upper - lower) / 3 + lower;
        pivot2 = 2 * (upper - lower) / 3 + lower;
        if (data[pivot1] > key) {
            upper = pivot1 - 1;
        } else if (data[pivot2] > key) {
            upper = pivot2 - 1;
            lower = pivot1 + 1;
        } else {
            lower = pivot2 + 1;
        }
    } while ((data[pivot1] != key) && (data[pivot2] != key) && (lower <= upper));
    if (data[pivot1] == key) return pivot1;
    if (data[pivot2] == key) return pivot2;
    return -1;
}
Complexity
Just like binary search, but instead of dividing into two parts we divide into three. The same analysis as for binary search shows that the complexity is O(log₃ n). We note that log_a n = log_b n / log_b a, therefore O(log₃ n) = O(log n).
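The change-of-base step can be written out in full:

```latex
\log_3 n \;=\; \frac{\log_2 n}{\log_2 3}
\quad\Longrightarrow\quad
O(\log_3 n) \;=\; O\!\left(\frac{1}{\log_2 3}\,\log_2 n\right) \;=\; O(\log n)
```

The factor 1/log₂ 3 is a constant, so big-O absorbs it: all logarithm bases give the same complexity class.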
Find Sum: x + y = z

static boolean FindSum(int[] data, int z) {
    for (int i = 0; i < data.length; i++) {
        for (int j = i + 1; j < data.length; j++) {
            if (data[i] + data[j] == z) {
                System.out.println("x=" + data[i] + " and y=" + data[j]);
                return true;
            }
        }
    }
    return false;
}

How do we show the complexity?
Complexity
Remember the analysis for bubble sort: the outer loop has n iterations, and the inner loop runs n−1 times, then n−2, and so on: (n−1) + (n−2) + … + 1 = n(n−1)/2, so the complexity is O(n²).
Find Sum in a sorted array: x + y = z

static boolean FindSumSorted(int[] data, int z) {
    int lower = 0, upper = data.length - 1;
    // check the bounds before indexing, so an empty array is handled safely
    while ((lower < upper) && (data[lower] + data[upper] != z)) {
        if (data[lower] + data[upper] > z)
            upper--;
        else
            lower++;
    }
    if (lower >= upper) {
        return false;
    } else {
        System.out.println("x=" + data[lower] + " and y=" + data[upper]);
        return true;
    }
}

Complexity?