1 Algorithms and Data Structures lecture 3
Szymon Grabowski, Łódź, 2015

2 Abstract data types (ADTs)
Idea: a set of operations (an interface) is specified; implementations may vary, and their details are hidden from the user. For example (as we’ll see later), a stack can be implemented as a linked list or in an array. Also, ADTs are usually containers, i.e., the type of the items they store can be arbitrary (but usually fixed for a particular structure instance). E.g., a stack may keep floating-point numbers, or integer numbers, or strings, or some database records, or even other stacks (!)... whatever.

3 Typical ADT interface

4 Basic data structures: arrays, lists
Array – extremely simple, good for static data. Sorted array: enables binary search to find a given key. Unsorted array: a linear scan is needed to find a given key.
Binary search, example: Where is x = 8 in 1, 2, 3, 3, 4, 6, 6, 8, 9, 11, 12?
6 < 8? Yes – continue in the right half. 8 < 8? No. 8 = 8? Yes – found.
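A minimal C++ sketch of the binary search just illustrated (an illustration, not code from the slides):

#include <vector>

// Iterative binary search: returns the index of key in the sorted
// vector a, or -1 if the key is absent.
int binarySearch(const std::vector<int>& a, int key) {
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;     // midpoint, written to avoid overflow
        if (a[mid] == key) return mid;    // found
        if (a[mid] < key)  lo = mid + 1;  // continue in the right half
        else               hi = mid - 1;  // continue in the left half
    }
    return -1;
}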

5 Array, cont’d
Search in an unsorted array, example: Where is x = 8 in 2, 4, 9, 3, 12, 6, 3, 6, 8, 11?
n–1 comparisons in the worst case, n/2 on average (if the key is known to occur!). If the searched key does not occur in the array, n comparisons are always needed.
Search in a sorted array: O(log n) worst-case time (an achieved lower bound – why?). Search in an unsorted array: O(n) worst-case time.

6 Array, cont’d
Searching faster than binary search (on avg)? Yes, possible (sometimes) – in a sorted array, of course.
Interpolation search. When you look up a word, e.g., ventriloquist, in a dictionary (a book), do you start right in the middle? No, that’s not a human strategy: ‘v’ is close to the alphabet’s end, so the first position visited in the dictionary will be near the end too. And so on... The same idea can be used in a machine-oriented algorithm: in each search step, calculate where in the remaining search interval the sought item might be, based on the key values at the bounds of the interval and the value of the sought key – usually via linear interpolation.

7 Interpolation search time complexities
Array, cont’d
On average: O(log log n) – e.g., for a uniformly random distribution. Worst case: O(n) time...
It’s possible to have O(log log n) time on average and O(log n) time in the worst case, at the same time! How? A simple trick (Perl & Reingold, 1977; Santoro & Sidney, 1985): the binary and interpolation searches are interleaved, so that in even iterations of the search one step of binary search is executed, and in odd iterations one step of interpolation search is executed.
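A C++ sketch of the interleaving trick (a reconstruction from the description above, not the papers’ code); plain interpolation search is the same loop with the interpolation step taken every time:

#include <vector>

// Alternate one interpolation step with one binary step:
// O(log log n) expected time, O(log n) worst-case time.
int interpolationBinarySearch(const std::vector<int>& a, int key) {
    int lo = 0, hi = (int)a.size() - 1;
    bool interpolate = true;                     // odd/even iteration toggle
    while (lo <= hi && key >= a[lo] && key <= a[hi]) {
        int mid;
        if (interpolate && a[hi] != a[lo]) {
            // interpolation step: linear guess of the key's position
            mid = lo + (int)((long long)(key - a[lo]) * (hi - lo)
                             / (a[hi] - a[lo]));
        } else {
            mid = lo + (hi - lo) / 2;            // binary step
        }
        if (a[mid] == key) return mid;
        if (a[mid] < key)  lo = mid + 1;
        else               hi = mid - 1;
        interpolate = !interpolate;              // interleave the two step types
    }
    return -1;
}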

8 Array, cont’d
Soft spot for (sorted) arrays: handling updates (insertions / deletions).
Deletions: in 1, 2, 3, 3, 4, 6, 6, 8, 9, 11, 12, we want to remove the highlighted 6. How?
Shifting the following items back by 1 position (O(n) time) is one idea. What else? An extra bit (flag) per item: position occupied or not. Binary search (slightly modified) is still possible, but both the search time and the space occupancy (“holes” appear!) slowly deteriorate.

9 Array, cont’d
Insertions – even worse. Shifting the following items forward by 1 position (O(n) time; but what about the holes left by deleted elements in the flag variant..?). Also, free space is needed at the end of the array; if it is not available: reallocate & copy everything.
Conclusion: don’t use an array if you expect updates to your set (apart, perhaps, from the case when the updates are going to be rare).

10 Array, cont’d
Array advantages: extreme simplicity; no extra space / pointers (apart, perhaps, from those bit flags...); very practical for static data; quite flexible (searches other than binary are possible too); constant-time access to a given (indexed) element.
Array disadvantages: problematic / useless in the dynamic case; space waste when not filled up; a block of contiguous memory is needed.

11 Init an array in O(1) time
Can we fill an array in o(n) time (with some fixed value, e.g., zero), assuming its items use O(w) bits each? (w – machine word size, in bits.) Of course, this is impossible. But it is possible to cheat somewhat and obtain O(1) “virtual” initialization – and it’s pretty easy. This trick was presented, e.g., in A. V. Aho, J. E. Hopcroft, J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974. But it was probably known earlier...

12 Init an array in O(1) time, cont’d
We insert elements into array A. We’ll also need an array AUX and a stack S. Example: A[850]=71, A[3]=5, A[200]=20, A[50]=34, ...

13 Init an array in O(1) time, cont’d
Want to read A[i]? There is no physical zeroing (or filling with another value...), so if nothing has been written to A[i], there will be a random value at that position. So: check S[AUX[i]], provided that AUX[i] is not larger than the total number of items written to A (i.e., provided that AUX[i] ≤ TOP); if it is larger, then A[i] hasn’t been written. Now, S[AUX[i]] stores i iff the cell A[i] has been written.
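A C++ sketch of the whole trick (hypothetical class and method names; 0-based indexing, so the slide’s test AUX[i] ≤ TOP becomes AUX[i] < TOP here):

#include <cstddef>
#include <vector>

// Virtual O(1) initialization: no array is ever physically cleared.
// A[i] counts as written iff AUX[i] points into the used part of the
// stack S and S[AUX[i]] == i.
struct VirtualArray {
    std::vector<int>    A;    // the payload values
    std::vector<size_t> AUX;  // AUX[i]: position on S claimed by cell i
    std::vector<size_t> S;    // stack of indices written so far
    size_t TOP = 0;           // number of cells written so far

    explicit VirtualArray(size_t n) : A(n), AUX(n), S(n) {}
    // (std::vector zero-fills for convenience; the trick works the same
    //  on genuinely uninitialized raw storage.)

    bool isWritten(size_t i) const {
        return AUX[i] < TOP && S[AUX[i]] == i;
    }
    void write(size_t i, int value) {
        if (!isWritten(i)) {        // first write to cell i: register it
            AUX[i] = TOP;
            S[TOP++] = i;
        }
        A[i] = value;
    }
    int read(size_t i, int defaultValue = 0) const {
        return isWritten(i) ? A[i] : defaultValue;  // O(1), no init needed
    }
};

// Usage matching the slide's example:
//   VirtualArray arr(1000);
//   arr.write(850, 71); arr.write(3, 5); arr.write(200, 20); arr.write(50, 34);
//   arr.read(850);  // 71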

14 Linked lists Variants: singly-linked lists; doubly-linked lists;
circular lists. From [Cormen, McGraw-Hill, 2000, p. 205] Doubly-linked list. (b) After inserting key 25, (c) after deleting key 4.

15 Linked lists, basic operations
Search (works with singly- and doubly-linked lists); insert (for doubly-linked lists). From [Cormen, McGraw-Hill, 2000, p. 205]

16 Linked lists, basic operations, cont’d
From [Cormen, McGraw-Hill, 2000, p. 206] Question: how to avoid checking the boundary conditions?

17 Singly-linked lists: 1. insert just before a pointed element? 2. delete?
LIST-INSERT(L, x, y) // insert node x before node y
1. next[x] = next[y] // first link x just after y...
2. next[y] = x
3. exchange key[x] ↔ key[y] // ...then swap the keys, so x’s key precedes y’s
Can we also delete a pointed x in O(1) time? Or would the list have to be traversed from the start, i.e., O(n) time?
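The same trick in C++ (a sketch; note it also answers the deletion question, except for the last node of the list):

#include <utility>

struct Node {
    int   key;
    Node* next;
};

// Insert node x just before the pointed node y in O(1):
// link x after y, then swap the two keys.
void insertBefore(Node* x, Node* y) {
    x->next = y->next;
    y->next = x;
    std::swap(x->key, y->key);   // y's node now carries x's key, and vice versa
}

// Analogous O(1) deletion of a pointed node x: copy the successor's key
// into x and splice the successor out. Fails if x is the last node.
void deletePointed(Node* x) {
    Node* victim = x->next;      // assumed non-null here
    x->key  = victim->key;
    x->next = victim->next;
    delete victim;
}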

18 Lists in programming (1/2)
C++, STL (part of the standard library): list – doubly-linked list implementation
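A small usage sketch (illustrative only; all calls are standard std::list / <iterator> API):

#include <iostream>
#include <iterator>
#include <list>

int main() {
    std::list<int> L = {1, 2, 4};

    auto it = L.begin();
    std::advance(it, 2);   // the iterator now points at the element 4
    L.insert(it, 3);       // O(1) insert just before the pointed element
    L.erase(it);           // O(1) erase of the pointed element

    for (int x : L) std::cout << x << ' ';   // prints: 1 2 3
    std::cout << '\n';
}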

19 Lists in programming (2/2)
Java, JCF (Java Collection Framework): Interface List, two implementations: ArrayList and LinkedList. ArrayList – dynamic array; it has O(1) access to an indexed item, but slow (O(n)) update (insert / delete) from the middle of the structure. LinkedList – doubly-linked list; fast (constant-time) update in a pointed position, slow access (O(n) – needs traversal over the list).

20 Linked lists, cont’d
Advantages: dynamic memory allocation; removing a pointed item in O(1) time; inserting a new item just after a pointed item in O(1) time.
Disadvantages: no indexed lookup; search for a key takes O(n) time in the worst and average case; extra space for pointers; handling is a bit more complicated than for an array.

21 Linked lists, questions
Sorting on linked lists. Is it possible to adapt bubble, insertion and selection sorts to linked lists? Consider the case of singly- and doubly-linked lists separately. What about merge sort on lists? And quick sort..?

22 Virtual Cache Lines (Rubin et al., 1999)
After a sequence of updates, a (plain) linked list may get scattered all over the memory → many cache misses. An interesting idea: store small blocks of data (of the physical (L1) cache line size, e.g., 64 bytes) instead of single nodes. Assume a block can store 8 items (of size 8 B each). To make updates faster on average, we relax the block density: e.g., from min = 5 to max = 8 items in each block. Reported ~2.5x faster performance (Pentium CPU, old paper…) than with plain (“scattered”) lists.
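A sketch of such a node layout in C++ (hypothetical names; in practice the count and next fields either push the node slightly past one 64-byte line or shave off one item slot):

#include <cstdint>

// One VCL node packs up to 8 items (8 B each) instead of a single item.
struct VclNode {
    int64_t  items[8];   // the packed items; items[0..count-1] are valid
    int      count;      // kept between min = 5 and max = 8 by updates
    VclNode* next;
};

// Search still scans node by node, but costs roughly one cache miss per
// 8 items instead of one miss per item in a plain scattered list.
bool contains(const VclNode* head, int64_t key) {
    for (const VclNode* p = head; p != nullptr; p = p->next)
        for (int i = 0; i < p->count; ++i)
            if (p->items[i] == key) return true;
    return false;
}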

23 VCL, example (min = 5, max = 8) [http://pages.cs.wisc...]

24 Stack
A classic LIFO (last in, first out) structure.
Supported operations for a stack S: push(S, x), pop(S), stack-empty(S) (a boolean function). Implemented as a linked list (even as a singly-linked list – how?) or in an array (beware of what..?).
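A minimal C++ sketch of the array variant (the thing to beware of being, presumably, overflow of the fixed-size array – plus underflow on pop):

#include <stdexcept>
#include <vector>

class ArrayStack {
    std::vector<int> buf;   // fixed-capacity storage
    int top = 0;            // number of items currently on the stack
public:
    explicit ArrayStack(int capacity) : buf(capacity) {}

    bool empty() const { return top == 0; }

    void push(int x) {
        if (top == (int)buf.size())
            throw std::overflow_error("stack overflow");
        buf[top++] = x;
    }
    int pop() {
        if (empty())
            throw std::underflow_error("stack underflow");
        return buf[--top];
    }
};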

25 Stack, cont’d
From [Cormen, McGraw-Hill, 2000, p. 201]

26 Applications of a stack
Local variables of a function/procedure (in a programming language like C) are pushed onto a stack (note: if function A() calls function B(), and B() calls C(), then upon termination of C() we go back to B(), and then to A(); clearly: last in, first out...).
Expressions in RPN (Reverse Polish Notation) are evaluated with a stack. The notation was devised by the Polish philosopher and mathematician Jan Łukasiewicz (1878–1956) for use in symbolic logic. In RPN, brackets are unnecessary.

27 RPN, how to evaluate an expression with a stack
Example: (3 + 4) * (6 – 2). RPN: 3 4 + 6 2 – *
1. Push 3 onto the stack.
2. Push 4 onto the stack.
3. Apply the + operation: it needs two arguments. The top two items are popped off the stack, the addition is performed, and the resulting 7 is pushed onto the stack.
4. Push 6 onto the stack.
5. Push 2 onto the stack.
6. Apply the – operation: pop the two top numbers off the stack, subtract the top one from the one below it, push the result (4) onto the stack.
7. Apply the * operation in a similar manner (7 * 4), obtain 28, push it onto the stack.
Only a single item left on the stack? Good – it means the expression was correct.
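A C++ sketch of this evaluator (a reconstruction; error handling for malformed expressions is omitted):

#include <iostream>
#include <sstream>
#include <stack>
#include <string>

// Evaluate a space-separated RPN expression with a single stack:
// operands are pushed; each operator pops two values and pushes one.
double evalRPN(const std::string& expr) {
    std::stack<double> st;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            double b = st.top(); st.pop();   // right operand (was on top)
            double a = st.top(); st.pop();   // left operand
            if (tok == "+") st.push(a + b);
            if (tok == "-") st.push(a - b);
            if (tok == "*") st.push(a * b);
            if (tok == "/") st.push(a / b);
        } else {
            st.push(std::stod(tok));         // a number: push it
        }
    }
    return st.top();   // a correct expression leaves exactly one value
}

int main() {
    std::cout << evalRPN("3 4 + 6 2 - *") << '\n';   // prints 28
}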

28 Queues
Straight queues (FIFO structures) and double-ended queues (deques). Like stacks, implemented as a linked list or in an array. The operations: Enqueue(Q, x), Dequeue(Q). From [Cormen, McGraw-Hill, 2000, p. 202]
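A minimal C++ sketch of the array representation as a circular buffer (head and tail wrap around modulo the capacity):

#include <stdexcept>
#include <vector>

class ArrayQueue {
    std::vector<int> buf;
    int head = 0;    // index of the front element
    int count = 0;   // number of stored elements
public:
    explicit ArrayQueue(int capacity) : buf(capacity) {}

    void enqueue(int x) {
        if (count == (int)buf.size())
            throw std::overflow_error("queue overflow");
        buf[(head + count) % (int)buf.size()] = x;   // tail wraps around
        ++count;
    }
    int dequeue() {
        if (count == 0)
            throw std::underflow_error("queue underflow");
        int x = buf[head];
        head = (head + 1) % (int)buf.size();         // head wraps around
        --count;
        return x;
    }
};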

29 Questions
1. Consider the following hybrid of a linked list and an array: the elements of the list are not single items but small arrays of fixed size (e.g., having 32 “cells” each). Describe how the insertion, deletion and search operations are performed on such a structure. What are its advantages over a plain linked list and a plain array? Does it have any weaknesses?
2. A queue represented in an array may suffer from an overflow (i.e., trying to add the (n+1)-th item when the array can store only n items). How would you fix the problem?
3. [Cormen] How to store two stacks in one array A[1..n] with no overflows as long as BOTH stacks in total have no more than n items? Of course, the push and pop operations should still be constant-time.

30 Sorting not based on comparisons (aka distribution-based sorting)
If operations other than comparisons are used to determine the sorted order of the elements, the set can be sorted in o(n log n) time – even in O(n) in some cases.
Counting sort. If the elements come from the range 1..k, we can sort them in O(n+k) time using O(k) extra space. If k = O(n), this is clearly O(n) time. Idea: for each input element x, determine the number of elements less than x.

31 Counting sort, cont’d
A – input array, B – output array, C – temporary (counting) array. Keys from the range 1..k.
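A C++ rendering of counting sort with the slide’s A/B/C naming (a sketch, not the [Cormen] pseudocode itself; 0-based array positions, keys in 1..k):

#include <vector>

// Stable counting sort, O(n + k) time, O(k) extra space.
std::vector<int> countingSort(const std::vector<int>& A, int k) {
    std::vector<int> C(k + 1, 0), B(A.size());
    for (int x : A)                  // C[x] = number of occurrences of x
        ++C[x];
    for (int v = 2; v <= k; ++v)     // C[v] = number of keys <= v
        C[v] += C[v - 1];
    for (int i = (int)A.size() - 1; i >= 0; --i)   // back to front => stable
        B[--C[A[i]]] = A[i];         // place A[i] at its final position
    return B;
}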

32 Counting sort, example
[Figure: the example, with array C as it looks in line 5 and in line 8 of the pseudocode.]

33 (LSD) radix sort (i.e., apply counting sort several times)
ra·dix (ray′diks), n., pl. rad·i·ces (rad′uh·seez, ray′duh-) or ra·dix·es. 1. Math. a number taken as the base of a system of numbers, logarithms, or the like. From Webster’s dictionary, 1992

34 Radix sort complexity
Let’s have radix r = 2^b, i.e., let’s use b bits of the keys for a pass. What’s the time for a single pass? We need to distribute and collect the keys. Distribution: O(n) time. Collecting: O(r + n) time. To have O(n) in total, we need r = O(n), i.e., b = O(log n).
Now, if we use k passes, and each pass takes linear time, the overall time complexity is O(kn). And the keys have (at most) k log n bits each, i.e., they can be numbers from 0..n^k – 1. For k = O(1), we have linear overall time.
Is radix sort in-place?
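An LSD radix sort sketch in C++ over 32-bit keys (an illustration; assumes 1 ≤ b ≤ 31, each pass being a stable counting sort on the next b bits):

#include <cstdint>
#include <vector>

void radixSort(std::vector<uint32_t>& A, int b = 8) {
    const uint32_t r = 1u << b, mask = r - 1;   // radix r = 2^b
    std::vector<uint32_t> B(A.size());
    for (int shift = 0; shift < 32; shift += b) {
        std::vector<uint32_t> C(r, 0);
        for (uint32_t x : A)                    // count digit occurrences
            ++C[(x >> shift) & mask];
        for (uint32_t v = 1; v < r; ++v)        // prefix sums
            C[v] += C[v - 1];
        for (int i = (int)A.size() - 1; i >= 0; --i)   // stable placement
            B[--C[(A[i] >> shift) & mask]] = A[i];
        A.swap(B);                              // result of this pass -> A
    }
}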

35 The origin of radix sort [http://ocw.mit.../JFall-2005/77B7E85E-5A31-4FAA-A3A2-57C6CC5440BD/0/lec5.pdf]
Radix sort idea: 1890, by Herman Hollerith (1860–1929). He used a most significant digit (MSD) radix variant. The 1880 U.S. Census took almost 10 years to process. While a lecturer at MIT, Hollerith prototyped punched-card technology. His machines, including a “card sorter”, allowed the 1890 census total to be reported in 6 weeks. He founded the Tabulating Machine Company in 1896; it merged with other companies in 1911 to form the Computing-Tabulating-Recording Company, renamed International Business Machines (IBM) in 1924.

36 Solving recurrences
Complexity analysis often involves solving some recurrence (examples of algorithms we already know..?). For example, if solving a problem for n items requires running the same procedure (recursively) for the first half (⌈n/2⌉ items) and for the second half (⌊n/2⌋ items), and then a constant-time step, then the time cost formula is:
T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(1).
How to solve it?

37 The substitution method (by guessing)
Merge sort recurrence: T(n) = 2T(⌊n/2⌋) + n. Let’s conjecture T(n) = O(n lg n), i.e., T(n) ≤ c·n lg n for some constant c > 0. Holds for any c ≥ 1.
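A reconstruction of the substitution step (assuming, as is standard, the inductive hypothesis T(⌊n/2⌋) ≤ c⌊n/2⌋ lg⌊n/2⌋):

\begin{align*}
T(n) &\le 2\bigl(c\lfloor n/2\rfloor \lg\lfloor n/2\rfloor\bigr) + n
      \le c\,n\lg(n/2) + n \\
     &= c\,n\lg n - c\,n + n
      \;\le\; c\,n\lg n \quad\text{for } c \ge 1.
\end{align*}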

38 The substitution method, cont’d
How to make a good guess? No rule, of course; intuition and experience are needed. You can prove loose upper and lower bounds and then make them tighter.
Changing variables. Consider this: T(n) = 2T(n^(1/2)) + lg n. Set m = lg n. Then we have: T(2^m) = 2T(2^(m/2)) + m. Another rename: S(m) = T(2^m), and thus S(m) = 2S(m/2) + m. So S(m) = O(m lg m). Finally, T(n) = T(2^m) = S(m) = O(m lg m) = O(lg n lg lg n). Looks familiar, huh?

39 The iteration method
Idea: expand (iterate) the recurrence and express it as a summation of terms dependent only on n and the initial conditions. Example: a geometric series... should be easy to follow. From [Cormen, McGraw-Hill, 2000, p. 58]

40 The iteration method, example, cont’d
From [Cormen, McGraw-Hill, 2000, p. 58]
Of course, the O(n) bound cannot be improved, so we can write Θ(n). Can we present this sort of recurrence visually? Think about a binary tree.

41 Recursion tree
Example: T(n) = 2T(n/2) + n². From [Cormen, McGraw-Hill, 2000, p. 60]
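Reading the tree level by level (a standard reconstruction, not copied from the book): level i holds 2^i subproblems of size n/2^i, so the level costs form a decreasing geometric series:

\[
T(n) \;=\; \sum_{i=0}^{\lg n} 2^i \Bigl(\frac{n}{2^i}\Bigr)^{\!2}
      \;=\; n^2 \sum_{i=0}^{\lg n} 2^{-i}
      \;<\; 2n^2 \;=\; \Theta(n^2).
\]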

42 The master method
Solves recurrences of the form T(n) = aT(n/b) + f(n), where a ≥ 1 and b > 1 are constants and f(n) is an asymptotically positive function. Under those assumptions, T(n) can be bounded asymptotically as follows. From [Cormen, McGraw-Hill, 2000, p. 62]
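The three cases, restated here for reference (the standard [Cormen] formulation):

\begin{enumerate}
\item If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$,
      then $T(n) = \Theta(n^{\log_b a})$.
\item If $f(n) = \Theta(n^{\log_b a})$,
      then $T(n) = \Theta(n^{\log_b a} \lg n)$.
\item If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$,
      and if $a f(n/b) \le c f(n)$ for some constant $c < 1$ and all
      sufficiently large $n$, then $T(n) = \Theta(f(n))$.
\end{enumerate}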

43 The master method, examples
From [Cormen, McGraw-Hill, 2000, p. 63]
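The examples behind this slide are presumably the classic [Cormen] ones; restated:

\begin{itemize}
\item $T(n) = 9T(n/3) + n$: here $n^{\log_3 9} = n^2$ and
      $f(n) = n = O(n^{2-\epsilon})$, so case 1 gives $T(n) = \Theta(n^2)$.
\item $T(n) = T(2n/3) + 1$: here $n^{\log_{3/2} 1} = n^0 = 1 = \Theta(f(n))$,
      so case 2 gives $T(n) = \Theta(\lg n)$.
\item $T(n) = 3T(n/4) + n\lg n$: here $n^{\log_4 3} = O(n^{0.793})$ and
      $f(n) = \Omega(n^{0.793+\epsilon})$; the regularity condition holds
      ($3(n/4)\lg(n/4) \le \tfrac{3}{4}\,n\lg n$), so case 3 gives
      $T(n) = \Theta(n\lg n)$.
\end{itemize}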

44 The master method is not a panacea
From [Cormen, McGraw-Hill, 2000, p. 63]
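A likely instance of what this slide shows (the classic gap example from [Cormen]): $T(n) = 2T(n/2) + n\lg n$. Here $f(n) = n\lg n$ is asymptotically larger than $n^{\log_2 2} = n$, but not polynomially larger:

\[
\frac{n \lg n}{n} \;=\; \lg n \;=\; o(n^{\epsilon}) \quad \text{for every constant } \epsilon > 0,
\]

so case 3 does not apply (nor does any other case), and the master method gives no answer.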

