
1 Algorithm and data structure

2 Overview
Programming and problem solving, with applications.
- Algorithm: method for solving a problem.
- Data structure: method to store information.
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton

3 Agenda  Data types  stack, queue, priority queue  Sorting  bubble sort, merge sort, quick sort, heap sort, radix sort  Searching  binary search tree, red black tree, hash table  Graphs  Depth-first search, breadth-first search CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton

4 Why study algorithms?
- Their impact is broad and far-reaching:
  - Internet: web search, packet routing, file sharing, …
  - Computers: compilers, file systems, …
  - Social media: news feeds, advertisements, …
- To solve problems. Example: the network connectivity problem. Given a large set of pairwise connected places, find a path from one place to another.

5 Why study algorithms?
- To become a proficient programmer.
- To understand nature: computational models are replacing mathematical models (formula-based models give way to algorithm-based ones). Scientists increasingly build computational models and simulate what might be happening in nature in order to understand it.
- For fun and profit: for example, a better chance in job interviews.

6 Steps to developing a usable algorithm
- Model the problem.
- Find an algorithm to solve it.
- Fast enough? Fits in memory?
- If not, figure out why.
- Find a way to address the problem.
- Iterate until satisfied.

7 Dynamic connectivity problem
Given a set of N objects:
- Union command: connect two objects.
- Find/connected query: is there a path connecting the two objects?
Example with the ten objects 0 to 9: after union(4,3), union(3,8), union(6,5), union(9,4) and union(2,1), the query connected(0,7) is false, while connected(8,9) is true through an indirect connection (8-3-4-9).

8 Modeling the objects  Pixels in digital photos  Computers in network  Friends in a social media  … Convenient to name objects 0 to N-1  Use integers as array index CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton

9 Modeling  Assumptions  “is connected to” is an equivalence relation:  Reflexive: p is connected to p  Symmetric: if p is connected to q, then q is connected to p.  Transitive: if p is connected to q and q is connected to r, then p is connected to r.  Connected components  Maximal set of objects that are mutually connected  {1,2} {5,6} {3,4,8,9} CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton 01234 56789

10 Implementation  Find query: Check if two objects are in the same component  Union Command: Replace components containing two objects with their union CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton 01234 56789 {1,2} {5,6} {3,4,8,9} {1,2} {5,6,3,4,8,9} Union(6,3) 01234 56789

11 Specify the union-find data type
Goal: design an efficient data structure for union-find.
- The number of objects N can be huge.
- The number of operations M can be huge.
- Find queries and union commands may be intermixed.

API of class UnionFind:
- UnionFind(int N): initialize the data structure with N objects (0 to N-1)
- void union(int p, int q): add a connection between p and q
- boolean connected(int p, int q): are p and q in the same component?
- int find(int p): component identifier for p
- int count(): number of components

12 Testing  read in number of objects N  Repeat:  read in pair of integers  if not connected yet, connect them, and print out pair CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton Public static void main(String[] args){ int N = StdIn.readInt(); UnionFind uf = new UnionFind(N); while(!StdIn.isEmpty()){ int p = StdIn.readInt(); int q = StdIn.readInt(); if(!uf.connected(p,q)){ uf.union(p,q); StdOut.println(p + “ ” +q); } }

13 Quick-find Data structure:  Integer array id[] of size N.  Interpretation: p and q are connected iff they have the same id.  Find: Check if p and q have the same id  Union: To merge components containing p and q, change all entries whose id equals id[p] to id[q] CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton 0123456789 0118800188 Id[] Id[6]=0 ; id[1]=1 6 and 1 are not connected Problem: Many values can change 0123456789 0123456789 Id[]
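To see why many values can change, the sketch below applies the union rule to the example id[] above. The operation union(6, 1) and the class name QuickFindExample are hypothetical, chosen only for illustration; the rule itself is the one stated on the slide.

```java
import java.util.Arrays;

class QuickFindExample {
    // Apply the quick-find union rule to a copy of id[]:
    // every entry equal to id[p] is changed to id[q].
    static int[] union(int[] idIn, int p, int q) {
        int[] id = idIn.clone();
        int pid = id[p]; // read once: id[p] itself may change inside the loop
        int qid = id[q];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
        return id;
    }

    public static void main(String[] args) {
        int[] id = {0, 1, 1, 8, 8, 0, 0, 1, 8, 8}; // the slide's example
        System.out.println(id[6] == id[1]);        // false: not connected
        int[] after = union(id, 6, 1);
        // [1, 1, 1, 8, 8, 1, 1, 1, 8, 8]: entries 0, 5 and 6 all changed
        System.out.println(Arrays.toString(after));
    }
}
```

Reading id[p] into pid before the loop matters: once id[p] is overwritten, later comparisons against id[p] would use the new value.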

14 Quick-find implementation

```java
public class QuickFind {
    private int[] id;

    // Set the id of each object to itself: N array accesses.
    public QuickFind(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++) {
            id[i] = i;
        }
    }

    // Check whether p and q have the same id: 2 array accesses.
    public boolean connected(int p, int q) {
        return id[p] == id[q];
    }

    // Change all entries with id[p] to id[q]: at most 2N + 2 array accesses.
    public void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }
}
```

15 Quick-find is too slow
Cost model: number of array accesses (reads and writes).

Order of growth of the number of array accesses:

| algorithm | initialize | union | find | cost |
|---|---|---|---|---|
| quick-find | N | N | 1 | expensive |

Union is too expensive: processing N union commands on N objects takes on the order of N^2 array accesses, i.e. quadratic time.
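The quadratic claim can be checked by counting accesses directly. This sketch instruments quick-find with a counter under the slide's cost model (every read or write of id[] counts once); the class QuickFindCost and the union sequence union(i, (i+1) % n) are hypothetical choices for illustration.

```java
class QuickFindCost {
    private final int[] id;
    private long accesses = 0; // reads + writes of id[], the slide's cost model

    QuickFindCost(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            accesses++; // one write per entry
        }
    }

    void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        accesses += 2; // two reads
        for (int i = 0; i < id.length; i++) {
            accesses++; // read id[i]
            if (id[i] == pid) {
                id[i] = qid;
                accesses++; // write id[i]
            }
        }
    }

    // Total array accesses to initialize n objects and run n union commands.
    static long cost(int n) {
        QuickFindCost uf = new QuickFindCost(n);
        for (int i = 0; i < n; i++) {
            uf.union(i, (i + 1) % n);
        }
        return uf.accesses;
    }

    public static void main(String[] args) {
        // Doubling n should roughly quadruple the count (quadratic growth).
        for (int n = 1000; n <= 8000; n *= 2) {
            System.out.println(n + "\t" + cost(n));
        }
    }
}
```

Every union scans the whole array, so n unions cost at least n * n reads regardless of which pairs are merged.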

16 Quadratic algorithms don’t scale  billion operations per second  billion words of main memory  Touch all words in about 1 sec  Since 1950! Example: - 10 9 union command - 10 9 objects - Quick-find take 10 18 operations - +30 years CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
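The "30+ years" figure follows from the slide's own numbers, assuming 10^9 operations per second and about 3.15 × 10^7 seconds in a year:

```latex
\begin{aligned}
\frac{10^{18}\ \text{operations}}{10^{9}\ \text{operations/s}} &= 10^{9}\ \text{s}\\
\frac{10^{9}\ \text{s}}{3.15 \times 10^{7}\ \text{s/year}} &\approx 31.7\ \text{years}
\end{aligned}
```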

17 Quick-union (lazy approach)
Data structure:
- Integer array id[] of size N.
- Interpretation: id[i] is the parent of i.
- The root of i is id[id[...id[i]...]]: keep following parent links until the value stops changing (the algorithm ensures there are no cycles).
- Find: check whether p and q have the same root.
- Union: to merge the components containing p and q, set the id of p's root to the id of q's root.
Example: with id[] = 0 1 9 4 9 6 6 7 8 9, the roots are 0, 1, 6, 7, 8 and 9; 2 and 4 point to 9, 3 points to 4, and 5 points to 6. A union that links root 9 under root 6 gives id[] = 0 1 9 4 9 6 6 7 8 6: only one value changes.

18 Demo (quick-union)

| operation | id[0..9] |
|---|---|
| initial | 0 1 2 3 4 5 6 7 8 9 |
| union(4,3) | 0 1 2 3 3 5 6 7 8 9 |
| union(3,8) | 0 1 2 8 3 5 6 7 8 9 |
| union(6,5) | 0 1 2 8 3 5 5 7 8 9 |
| union(9,4) | 0 1 2 8 3 5 5 7 8 8 |
| union(2,1) | 0 1 1 8 3 5 5 7 8 8 |
| connected(8,9): true | 0 1 1 8 3 5 5 7 8 8 |
| connected(5,4): false | 0 1 1 8 3 5 5 7 8 8 |
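The demo trace above can be replayed mechanically. This is a minimal sketch with a hypothetical class name, QuickUnionDemo, using the union rule from the previous slide: the root of p is set to point at the root of q.

```java
import java.util.Arrays;

class QuickUnionDemo {
    static int[] id;

    // Follow parent links until reaching a node that is its own parent.
    static int findRoot(int i) {
        while (i != id[i]) {
            i = id[i];
        }
        return i;
    }

    static void union(int p, int q) {
        id[findRoot(p)] = findRoot(q); // root of p now points to root of q
    }

    static boolean connected(int p, int q) {
        return findRoot(p) == findRoot(q);
    }

    // Replays the slide's union sequence and returns the final id[].
    static int[] replay() {
        id = new int[10];
        for (int i = 0; i < 10; i++) id[i] = i;
        int[][] ops = {{4, 3}, {3, 8}, {6, 5}, {9, 4}, {2, 1}};
        for (int[] op : ops) {
            union(op[0], op[1]);
            System.out.println("union(" + op[0] + "," + op[1] + ") " + Arrays.toString(id));
        }
        return id.clone();
    }

    public static void main(String[] args) {
        replay();
        System.out.println("connected(8,9) = " + connected(8, 9)); // true
        System.out.println("connected(5,4) = " + connected(5, 4)); // false
    }
}
```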

19 Quick-union implementation

```java
public class QuickUnion {
    private int[] id;

    // Set the id of each object to itself: N array accesses.
    public QuickUnion(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++) {
            id[i] = i;
        }
    }

    // Chase parent pointers until reaching the root: depth of i array accesses.
    private int findRoot(int i) {
        while (i != id[i]) {
            i = id[i];
        }
        return i;
    }

    // Check whether p and q have the same root: depth of p and q array accesses.
    public boolean connected(int p, int q) {
        return findRoot(p) == findRoot(q);
    }

    // Point the root of p at the root of q: depth of p and q array accesses.
    public void union(int p, int q) {
        int i = findRoot(p);
        int j = findRoot(q);
        id[i] = j;
    }
}
```

20 Quick-union is also too slow
Cost model: number of array accesses (reads and writes).
- Trees can get very tall.
- Find is too expensive (could take N array accesses).

Worst-case order of growth of the number of array accesses:

| algorithm | initialize | union | find | cost |
|---|---|---|---|---|
| quick-find | N | N | 1 | expensive |
| quick-union | N | N† | N | expensive |

† includes the cost of finding the roots.
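A short sketch shows how tall a plain quick-union tree can get. The union sequence union(0,1), union(1,2), ..., union(n-2,n-1) is a hypothetical worst-case input chosen for illustration: each union hangs the current chain under the next object, so find on object 0 must follow n-1 links.

```java
class QuickUnionWorstCase {
    // Runs union(i, i+1) for i = 0..n-2 with plain quick-union and returns
    // the number of parent links that findRoot(0) must follow.
    static int depthOfZero(int n) {
        int[] id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
        for (int i = 0; i + 1 < n; i++) {
            int r = i;
            while (r != id[r]) r = id[r];     // root of i
            int s = i + 1;
            while (s != id[s]) s = id[s];     // root of i + 1
            id[r] = s;                        // root of p points to root of q
        }
        int depth = 0;
        for (int x = 0; x != id[x]; x = id[x]) depth++;
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(depthOfZero(1000)); // 999: a chain 0 -> 1 -> ... -> 999
    }
}
```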

21 Improvement 1: weighting
Weighted quick-union:
- Modify quick-union to avoid tall trees.
- Keep track of the size of each tree (number of objects).
- Balance by linking the root of the smaller tree to the root of the larger tree.
- Plain quick-union might put the larger tree lower; weighted quick-union always chooses the better alternative.

22 Demo (weighted quick-union)

| operation | id[0..9] |
|---|---|
| initial | 0 1 2 3 4 5 6 7 8 9 |
| union(4,3) | 0 1 2 4 4 5 6 7 8 9 |
| union(3,8) | 0 1 2 4 4 5 6 7 4 9 |
| union(6,5) | 0 1 2 4 4 6 6 7 4 9 |
| union(9,4) | 0 1 2 4 4 6 6 7 4 4 |
| union(2,1) | 0 2 2 4 4 6 6 7 4 4 |
| connected(8,9): true | 0 2 2 4 4 6 6 7 4 4 |
| connected(5,4): false | 0 2 2 4 4 6 6 7 4 4 |
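The weighted demo trace above can be replayed the same way. This sketch uses a hypothetical class name, WeightedQuickUnionDemo; ties are broken as in the weighted code on the next slide (when sizes are equal, q's root goes under p's root), which is what reproduces the trace.

```java
import java.util.Arrays;

class WeightedQuickUnionDemo {
    static int[] id, sz;

    static int findRoot(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    static void union(int p, int q) {
        int i = findRoot(p), j = findRoot(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; } // smaller tree under larger
        else               { id[j] = i; sz[i] += sz[j]; } // ties: q's root under p's root
    }

    static boolean connected(int p, int q) {
        return findRoot(p) == findRoot(q);
    }

    // Replays the slide's union sequence and returns the final id[].
    static int[] replay() {
        id = new int[10];
        sz = new int[10];
        for (int i = 0; i < 10; i++) { id[i] = i; sz[i] = 1; }
        int[][] ops = {{4, 3}, {3, 8}, {6, 5}, {9, 4}, {2, 1}};
        for (int[] op : ops) {
            union(op[0], op[1]);
            System.out.println("union(" + op[0] + "," + op[1] + ") " + Arrays.toString(id));
        }
        return id.clone();
    }

    public static void main(String[] args) {
        replay();
        System.out.println("connected(8,9) = " + connected(8, 9)); // true
        System.out.println("connected(5,4) = " + connected(5, 4)); // false
    }
}
```

Compared with the plain quick-union demo, union(9,4) now links the single node 9 under the size-3 tree rooted at 4 instead of growing a longer path.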

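23 Weighted quick-union implementation
Maintain an extra array sz[] that counts the number of objects in the tree rooted at each node.

```java
public class WeightedQuickUnion {
    private int[] id;
    private int[] sz; // sz[i] = number of objects in the tree rooted at i

    public WeightedQuickUnion(int N) {
        id = new int[N];
        sz = new int[N];
        for (int i = 0; i < N; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    private int findRoot(int i) {
        while (i != id[i]) {
            i = id[i];
        }
        return i;
    }

    public boolean connected(int p, int q) {
        return findRoot(p) == findRoot(q);
    }

    // Link the root of the smaller tree to the root of the larger tree
    // and update the size of the larger tree.
    public void union(int p, int q) {
        int i = findRoot(p);
        int j = findRoot(q);
        if (i == j) return; // already connected; avoids doubling sz[i]
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }
}
```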

24 Weighted quick-union analysis
Running time:
- Find: takes time proportional to the depth of p and q.
- Union: takes constant time, given the roots.
Proposition. The depth of any node x is at most lg N.
Proof: when does the depth of x increase? It increases by 1 only when the tree T1 containing x is merged into another tree T2. Since |T2| >= |T1|, the size of the tree containing x at least doubles. Why can this happen at most lg N times? The tree containing x starts with size 1, so after k doublings it has at least 2^k objects; since a tree never holds more than N objects, 2^k <= N, i.e. k <= lg N.
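The lg N bound can also be observed empirically. This sketch (hypothetical class WeightedDepthBound) measures the maximum node depth after two kinds of union sequences on N = 1024 objects: always merging two trees of equal size, which is the worst case for weighting and reaches depth exactly lg N = 10, and random unions with a fixed seed, which the proposition says can never exceed 10.

```java
import java.util.Random;

class WeightedDepthBound {
    static int[] id, sz;

    static void init(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) { id[i] = i; sz[i] = 1; }
    }

    static int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    // Weighted union: link the root of the smaller tree under the larger one.
    static void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Largest number of parent links from any node to its root.
    static int maxDepth() {
        int max = 0;
        for (int x = 0; x < id.length; x++) {
            int d = 0;
            for (int y = x; y != id[y]; y = id[y]) d++;
            max = Math.max(max, d);
        }
        return max;
    }

    // Worst case for weighting: always merge two trees of equal size.
    static int equalSizeMerges(int n) {
        init(n);
        for (int s = 1; s < n; s *= 2)
            for (int k = 0; k + s < n; k += 2 * s)
                union(k, k + s);
        return maxDepth();
    }

    // m random union commands, reproducible through the seed.
    static int randomUnions(int n, int m, long seed) {
        init(n);
        Random rnd = new Random(seed);
        for (int t = 0; t < m; t++) union(rnd.nextInt(n), rnd.nextInt(n));
        return maxDepth();
    }

    public static void main(String[] args) {
        int n = 1024; // lg n = 10
        System.out.println("equal-size merges: max depth " + equalSizeMerges(n)); // 10
        System.out.println("random unions:     max depth " + randomUnions(n, 4 * n, 42L));
    }
}
```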

25 Weighted quick-union cost

| algorithm | initialize | union | find |
|---|---|---|---|
| quick-find | N | N | 1 |
| quick-union | N | N† | N |
| weighted QU | N | lg N† | lg N |

† includes the cost of finding the roots.

26 Improvement 2: path compression
Quick-union with path compression:
- Just after computing the root of p, set the id of each examined node to point to that root.
- This flattens the tree.
The code below is the simpler one-pass variant: it makes every other node on the path point to its grandparent, which halves the path length.

```java
private int findRoot(int i) {
    while (i != id[i]) {
        id[i] = id[id[i]]; // point i at its grandparent
        i = id[i];
    }
    return i;
}
```
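The first bullet describes full path compression, which takes two passes. This is a minimal sketch of that two-pass version, assuming an id[] parent array as in quick-union; the class name PathCompression and its constructor taking a parent array are hypothetical conveniences for the example.

```java
import java.util.Arrays;

class PathCompression {
    private final int[] id;

    PathCompression(int[] parent) {
        id = parent.clone();
    }

    // Two passes: first locate the root, then walk the same path again and
    // point every examined node directly at the root.
    int findRoot(int p) {
        int root = p;
        while (root != id[root]) root = id[root]; // pass 1: find the root
        while (p != root) {                        // pass 2: compress the path
            int next = id[p];
            id[p] = root;
            p = next;
        }
        return root;
    }

    int[] parents() {
        return id.clone();
    }

    public static void main(String[] args) {
        PathCompression uf = new PathCompression(new int[]{1, 2, 3, 4, 4}); // chain 0->1->2->3->4
        System.out.println(uf.findRoot(0));                 // 4
        System.out.println(Arrays.toString(uf.parents()));  // [4, 4, 4, 4, 4]
    }
}
```

The one-pass variant on the slide needs no second loop and, in practice, flattens trees almost as well.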

27 Analysis
Proposition: starting from an empty data structure, any sequence of M union-find operations on N objects, using weighted quick-union with path compression, makes at most c(N + M lg* N) array accesses, for some constant c.
- This is very close to linear time.
- In theory, it is not linear; in practice, it is.

Iterated log function lg* N (the number of times lg must be applied to N before the result is 1 or less):

| N | lg* N |
|---|---|
| 1 | 0 |
| 2 | 1 |
| 4 | 2 |
| 16 | 3 |
| 65536 | 4 |
| 2^65536 | 5 |
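The lg* table can be reproduced with integer arithmetic. This sketch (hypothetical class IteratedLog) iterates the ceiling of lg instead of the real-valued lg to avoid floating-point rounding; for integer inputs this gives the same count, because lg* only changes value at the integers 1, 2, 4, 16, 65536, ….

```java
class IteratedLog {
    // ceil(lg n) for n >= 1, using an exact bit operation on longs.
    static int ceilLg(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    // lg* n: how many times lg is applied before the value drops to 1 or below.
    static int lgStar(long n) {
        int k = 0;
        while (n > 1) {
            n = ceilLg(n);
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        long[] ns = {1, 2, 4, 16, 65536};
        for (long n : ns) {
            System.out.println(n + "\t" + lgStar(n));
        }
        // 2^65536 does not fit in a long, but lg(2^65536) = 65536,
        // so lg*(2^65536) = 1 + lg*(65536).
        System.out.println("2^65536\t" + (1 + lgStar(65536)));
    }
}
```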

28 Summary
Worst-case time for M union-find operations on a set of N objects:

| algorithm | worst-case time |
|---|---|
| quick-find | M N |
| quick-union | M N |
| weighted QU | N + M lg N |
| QU + path compression | N + M lg N |
| weighted QU + path compression | N + M lg* N |

Example: for 10^9 unions and finds with 10^9 objects, weighted QU + path compression reduces the time from 30 years to 6 seconds. A supercomputer won't help much; a good algorithm is the key to the solution.
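The "6 seconds" figure can be checked from the bound c(N + M lg* N), assuming c = 1 and the earlier rate of 10^9 array accesses per second (both are simplifying assumptions, not values stated on the slide). Since 65536 < 10^9 <= 2^65536, the lg* table gives lg* 10^9 = 5:

```latex
\begin{aligned}
N = M &= 10^{9}, \qquad \lg^{*} 10^{9} = 5\\
c\,(N + M \lg^{*} N) &= 1 \cdot (10^{9} + 5 \cdot 10^{9}) = 6 \times 10^{9}\ \text{array accesses}\\
\frac{6 \times 10^{9}}{10^{9}\ \text{accesses/s}} &= 6\ \text{s}
\end{aligned}
```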

29 Union-find applications
- Dynamic connectivity
- Image processing
- Graph processing
- Understanding physical phenomena
- …

30 Percolation
A model for many physical systems:
- N-by-N grid of sites.
- Each site is open with probability p (or blocked with probability 1 - p).
- The system percolates iff the top and bottom are connected by open sites.
[Figure: a grid that percolates and a grid that does not (no open site connected to the top reaches the bottom); legend: open site, open site connected to top, blocked site.]

31 Likelihood of percolation
Depends on the site vacancy probability p:
- p low (0.4): does not percolate (p < p*).
- p high (0.8): percolates (p > p*).
- p medium (0.5): ??
So the question is how to tell whether the system percolates, and what the value of p* is.
Percolation phase transition: when N is large, there is a sharp threshold p*. Above it the system almost certainly percolates; below it, it almost certainly does not.

32 Monte Carlo simulation
- Initialize an N-by-N grid with all sites blocked.
- Declare random sites open until the top is connected to the bottom.
- The fraction of open (vacant) sites estimates p*.
Every time we open a site, we check whether the system now percolates.

33 Percolation threshold
Q: What is the value of p*?
A: About 0.592746.
A fast algorithm enables an accurate answer to a scientific question.

34 Dynamic connectivity solution
[Figure: 5-by-5 grid of sites numbered 0 to 24 row by row, with one blocked site marked "open this site".]
Q: How do we model opening a new site?
A: Connect the newly opened site to all of its adjacent open sites: up to 4 calls to union().
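Opening a site can be sketched as follows, assuming the row-major numbering from the figure (site (row, col) is object row * N + col) and weighted quick-union underneath. The class SiteOpening and its methods are hypothetical names for illustration.

```java
class SiteOpening {
    private final int n;
    private final boolean[] isOpen;
    private final int[] id, sz; // weighted quick-union over the n*n sites

    SiteOpening(int n) {
        this.n = n;
        isOpen = new boolean[n * n];
        id = new int[n * n];
        sz = new int[n * n];
        for (int i = 0; i < n * n; i++) { id[i] = i; sz[i] = 1; }
    }

    // Row-major naming: site (row, col) becomes object row * n + col.
    private int index(int row, int col) { return row * n + col; }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    private void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    boolean connected(int r1, int c1, int r2, int c2) {
        return root(index(r1, c1)) == root(index(r2, c2));
    }

    // Open (row, col), then union it with each adjacent open site
    // (up, down, left, right): at most 4 calls to union().
    void open(int row, int col) {
        int site = index(row, col);
        isOpen[site] = true;
        int[][] neighbours = {{row - 1, col}, {row + 1, col}, {row, col - 1}, {row, col + 1}};
        for (int[] nb : neighbours) {
            int r = nb[0], c = nb[1];
            boolean inGrid = r >= 0 && r < n && c >= 0 && c < n;
            if (inGrid && isOpen[index(r, c)]) union(site, index(r, c));
        }
    }

    public static void main(String[] args) {
        SiteOpening grid = new SiteOpening(5);
        grid.open(1, 1);
        grid.open(1, 2);
        grid.open(3, 3);
        System.out.println(grid.connected(1, 1, 1, 2)); // true: adjacent open sites
        System.out.println(grid.connected(1, 1, 3, 3)); // false: no open path
    }
}
```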

35 Dynamic connectivity solution (continued)
- Create an object for each site and name them 0 to N^2 - 1.
- Sites are in the same component iff they are connected by open sites.
- The system percolates iff any site on the bottom row is connected to any site on the top row.
Brute-force algorithm: N^2 calls to connected() (each of the N top sites against each of the N bottom sites).

36 Dynamic connectivity solution (continued)
Clever trick: introduce 2 virtual sites, with connections to the top row and to the bottom row respectively.
- The system percolates iff the virtual top site is connected to the virtual bottom site.
- Only 1 call to connected().
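Putting the pieces together, this sketch combines the Monte Carlo procedure from slide 32 with the two virtual sites. Assumptions, all hypothetical: the class name PercolationMonteCarlo, objects n*n and n*n+1 as the virtual top and bottom, connecting a top-row or bottom-row site to its virtual site when that site is opened, and weighted quick-union with one-pass path compression underneath. The estimate should land near the slide's 0.592746, with exact digits depending on N, the number of trials, and the seed.

```java
import java.util.Random;

class PercolationMonteCarlo {
    private final int n;
    private final int top, bottom;     // indices of the two virtual sites
    private final boolean[] openSites; // openSites[row * n + col]
    private final int[] id, sz;        // weighted quick-union over n*n + 2 objects

    PercolationMonteCarlo(int n) {
        this.n = n;
        top = n * n;
        bottom = n * n + 1;
        openSites = new boolean[n * n];
        id = new int[n * n + 2];
        sz = new int[n * n + 2];
        for (int i = 0; i < id.length; i++) { id[i] = i; sz[i] = 1; }
    }

    private int root(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]]; // one-pass path compression
            i = id[i];
        }
        return i;
    }

    private void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    boolean isOpen(int row, int col) { return openSites[row * n + col]; }

    void open(int row, int col) {
        int site = row * n + col;
        if (openSites[site]) return;
        openSites[site] = true;
        if (row == 0) union(site, top);        // top row joins the virtual top
        if (row == n - 1) union(site, bottom); // bottom row joins the virtual bottom
        if (row > 0 && isOpen(row - 1, col)) union(site, site - n);
        if (row < n - 1 && isOpen(row + 1, col)) union(site, site + n);
        if (col > 0 && isOpen(row, col - 1)) union(site, site - 1);
        if (col < n - 1 && isOpen(row, col + 1)) union(site, site + 1);
    }

    // The single connectivity check between the two virtual sites.
    boolean percolates() { return root(top) == root(bottom); }

    // One trial: open random blocked sites until the system percolates,
    // then return the fraction of sites that are open.
    static double trial(int n, Random rnd) {
        PercolationMonteCarlo perc = new PercolationMonteCarlo(n);
        int opened = 0;
        while (!perc.percolates()) {
            int row = rnd.nextInt(n), col = rnd.nextInt(n);
            if (!perc.isOpen(row, col)) {
                perc.open(row, col);
                opened++;
            }
        }
        return (double) opened / (n * n);
    }

    // Average over several trials: the Monte Carlo estimate of p*.
    static double estimate(int n, int trials, long seed) {
        Random rnd = new Random(seed);
        double sum = 0;
        for (int t = 0; t < trials; t++) sum += trial(n, rnd);
        return sum / trials;
    }

    public static void main(String[] args) {
        System.out.println(estimate(100, 50, 2024L));
    }
}
```

Because each opened site triggers at most six unions and each percolation check is one root comparison, a trial costs close to linear time in the number of sites, which is what makes many large trials affordable.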

37 Summary  Dynamic connectivity problem  Modeled the problem to understand what kind of data structure and algorithm needed to solve it  Introduced few easy and inaccurate algorithms to solve it  Learned how to improve them to get efficient algorithm  Applications that could not be solved without these efficient algorithms CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
