Published by Kory Harris. Modified over 9 years ago.
1
Advanced Topics NP-complete reports. Continue on NP, parallelism
2
Reprise: Non-determinism Informal: add to any algorithm – taking a guess at one or more places – forking and pursuing one or more possibilities. If there is a non-deterministic algorithm, then there is a regular/standard (deterministic) algorithm – just try all the possibilities – may take a long time
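A minimal sketch of "just try all the possibilities", using the subset sum problem as an assumed example (the slide does not name one). A non-deterministic algorithm would guess which numbers to include; the deterministic version below tries every subset, which can take time exponential in the length of the list. The function name is hypothetical.

```python
from itertools import combinations

def subset_with_sum(numbers, target):
    """Return some subset of numbers that adds up to target, or None.

    Replaces the non-deterministic "guess" with an exhaustive search
    over all 2**len(numbers) subsets.
    """
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return subset
    return None

print(subset_with_sum([3, 34, 4, 12, 5, 2], 9))
print(subset_with_sum([3, 34, 4], 100))
```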
3
Reprise: the class P … is all problems for which there exists an algorithm with running time bounded by a polynomial in the size of the input.
4
Reprise: the class NP … is all problems for which there is an algorithm, possibly non-deterministic, that, assuming it takes the right paths, runs in time bounded by a polynomial. Alternative definition: a proposed answer can be checked for correctness in polynomial time.
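To illustrate the alternative definition, here is a sketch of a polynomial-time check for graph coloring, chosen as an assumed example. Finding a coloring with a given number of colors may be hard, but checking a proposed coloring only needs one pass over the vertices and one pass over the edges. The function name and the data layout (edge list plus a vertex-to-color dictionary) are assumptions made for this sketch.

```python
def is_valid_coloring(edges, coloring, num_colors):
    """Check a proposed coloring in time linear in the size of the graph.

    edges: list of (u, v) pairs
    coloring: dict mapping each vertex to a color in range(num_colors)
    """
    # Every color used must be one of the allowed colors.
    if any(c < 0 or c >= num_colors for c in coloring.values()):
        return False
    for u, v in edges:
        # Both endpoints must be colored, and with different colors.
        if u not in coloring or v not in coloring:
            return False
        if coloring[u] == coloring[v]:
            return False
    return True

triangle = [(0, 1), (1, 2), (0, 2)]
print(is_valid_coloring(triangle, {0: 0, 1: 1, 2: 2}, 3))
print(is_valid_coloring(triangle, {0: 0, 1: 1, 2: 0}, 3))
```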
5
Reprise: does P = NP? Is it possible to find actual standard (deterministic, polynomial-time) algorithms for these NP problems? THE great problem of computer science. Proving that P ≠ NP would also be significant. Theoretical problem with considerable practical value.
6
NP-complete A set of NP problems that can be translated into each other in polynomial time (and into which every NP problem can be translated), so… if one of the problems can be solved in polynomial time (aka tractable)… they all can.
7
NP-hard A problem is NP-hard if there is an NP-complete problem that can be translated into it in polynomial time – but not necessarily the other way. NP-hard problems are at least as hard as NP-complete problems.
8
NP-hard example Robot path planning in a dynamic environment
9
Reports on NP-complete problems Tetris Knapsack problem Steiner Tree problem Graph coloring Minesweeper Subset problem
10
Note There are methods for getting answers to NP problems, but they aren't guaranteed to be optimal. Called heuristics or approximations
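As a sketch of a heuristic, here is a greedy approach to the knapsack problem: take items in order of value per unit of weight while they still fit. This is an illustration chosen for this note, not a method from the slides, and the item data is made up. On the sample data the greedy answer is 160, while the best possible is 220 (the second and third items together), which shows the "not guaranteed to be optimal" point.

```python
def greedy_knapsack(items, capacity):
    """Greedy heuristic for 0/1 knapsack; fast but not always optimal.

    items: list of (weight, value) pairs with positive weights
    Returns (total_value, chosen_items).
    """
    chosen = []
    total_value = 0
    remaining = capacity
    # Consider the items with the best value-to-weight ratio first.
    by_ratio = sorted(items, key=lambda item: item[1] / item[0], reverse=True)
    for weight, value in by_ratio:
        if weight <= remaining:
            chosen.append((weight, value))
            total_value += value
            remaining -= weight
    return total_value, chosen

sample_items = [(10, 60), (20, 100), (30, 120)]  # hypothetical (weight, value) data
print(greedy_knapsack(sample_items, 50))
```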
11
Distributed computing Approach to NP problems: fork a new process That is, use distributed computing to investigate the different choices Some problems may be embarrassingly parallelizable.
12
Sources Many Google: http://code.google.com/edu/parallel/mapreduce-tutorial.html Note: there is controversy re: MapReduce – may be issue of patent – Is it the right framework – ??
13
Concepts key/value pair Master / Worker nodes on network –may be one Master and many Workers hashing: quick way to find data (key/value data) piece / partition / split / shard
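A small sketch of how hashing can assign key/value pairs to shards so that all pairs with the same key land in the same place. The function names are hypothetical, and `zlib.crc32` is used here only because it gives the same result on every run; real frameworks choose their own hash functions.

```python
import zlib

def shard_for_key(key, num_shards):
    """Map a string key to a shard index in range(num_shards)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def partition(pairs, num_shards):
    """Split (key, value) pairs into num_shards lists by hashing the key."""
    shards = [[] for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for_key(key, num_shards)].append((key, value))
    return shards

pairs = [("apple", 1), ("pear", 2), ("apple", 3), ("plum", 4)]
for index, shard in enumerate(partition(pairs, 3)):
    print(index, shard)
```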
14
Example from Google tutorial Compute pi using many workers, each doing a calculation using pseudo-random function. – no data (NOT typical MapReduce problem) Worker picks a random point in the square. If it is in the circle, worker increments a counter. http://faculty.purchase.edu/jeanine.meyer/processing/piEstimate/applet/
15
Formulas
Area_of_circle = pi * r^2
Area_of_square containing circle = 4 * r^2
So r^2 = Area_of_square / 4
Let Ac be Area_of_circle and As be Area_of_square
Then pi = 4 * Ac / As
Estimate for pi is 4 * counter / Number_of_points_tried
16
Informal proof The chance that a randomly chosen point in the square falls inside the circle equals the ratio of the two areas, Ac / As. Choosing many points randomly estimates this ratio: counter / Number_of_points_tried approaches Ac / As. Alternatively, we could [simply] use for-loops over a grid and do the calculation for every point.
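The estimate can be sketched on a single machine as follows, assuming a circle of radius 1 centered at the origin inside the square from -1 to 1 on each axis. The function name and the seed parameter are assumptions added so that runs are repeatable.

```python
import random

def estimate_pi(num_points, seed=0):
    """Estimate pi as 4 * counter / num_points using random points."""
    rng = random.Random(seed)
    counter = 0
    for _ in range(num_points):
        # Pick a random point in the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # The point is in the circle if its distance from the center is at most 1.
        if x * x + y * y <= 1.0:
            counter += 1
    return 4.0 * counter / num_points

print(estimate_pi(100000))
```

More points give a better estimate, but the error shrinks slowly, roughly with the square root of the number of points.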
17
MapReduce Model for distributed (aka parallel) computing There are different products that implement MapReduce. From a google search: –Google –Apache Hadoop: Open source –Teradata –Amazon –Greenplum –Platform
18
MapReduce Programmer sets up program for Master and for Workers. Typically, the Master program sets up and partitions input array(s). Typically, data is key/value pairs. Programmers write – Map functions that process data, possibly making use of functions in the MapReduce library – Reduce functions that combine the results. Workers work on Map tasks and/or Reduce tasks. The Map task is applied to the worker's piece (aka shard) of the input array.
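A single-process sketch of the Map and Reduce roles, using word counting as an assumed example. It does not use any real MapReduce library, and all function names are hypothetical; in a real system the shards would be handled by different Workers and the grouping step would be done by the framework.

```python
from collections import defaultdict

def map_words(shard):
    """Map task: emit a (word, 1) pair for every word in this shard."""
    for line in shard:
        for word in line.lower().split():
            yield word, 1

def group_by_key(pairs):
    """Collect all values that share a key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce task: combine the values for one key into a single result."""
    return key, sum(values)

def run_word_count(shards):
    all_pairs = []
    for shard in shards:  # each shard would go to a different Worker
        all_pairs.extend(map_words(shard))
    grouped = group_by_key(all_pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())

print(run_word_count([["the cat sat"], ["the dog", "the end"]]))
```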
19
MapReduce for pi estimate Not typical in that there is no data The map function does the calculation When all done, the reduce function adds up all the individual counters and calculates the estimate for pi
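A sketch of how the pi estimate splits into a map step and a reduce step. Everything runs in one process here, with the built-in `map` standing in for the Workers; the function names, the task tuple, and the per-worker seeds are assumptions made for the illustration.

```python
import random

def map_count_hits(task):
    """Map task: one worker tries its points and returns its own counter."""
    worker_id, num_points, base_seed = task
    rng = random.Random(base_seed + worker_id)  # a different stream per worker
    counter = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            counter += 1
    return counter

def reduce_to_pi(counters, points_per_worker):
    """Reduce task: add up the counters and compute the estimate for pi."""
    total_points = points_per_worker * len(counters)
    return 4.0 * sum(counters) / total_points

def estimate_pi_mapreduce(num_workers=100, points_per_worker=1000, base_seed=0):
    tasks = [(worker_id, points_per_worker, base_seed)
             for worker_id in range(num_workers)]
    counters = list(map(map_count_hits, tasks))  # would run in parallel on Workers
    return reduce_to_pi(counters, points_per_worker)

print(estimate_pi_mapreduce())
```

Because each map task is independent of the others, the call to `map` could be swapped for a process pool or a cluster framework without changing the two task functions.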
20
Speed up for pi estimate
Suppose
– each trial (getting the 2 random values and determining if in circle) takes K time units
– 1000 workers together calculate 1000000 values, so each worker does 1000 trials
– adding 2 numbers takes 1 time unit
Time without distributed computing: 1000000*K
Time with distributed computing: 1000*K + 1000 (the trials, then adding up the 1000 counters)
Speed up is 1000*K / (K + 1), which is slightly less than 1000 when K is large
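The arithmetic can be checked with a short sketch. The function name is hypothetical, and the model is the simple one on this slide: the workers run their trials at the same time, then the counters are added one at a time.

```python
def speed_up(total_points, num_workers, k):
    """Ratio of single-machine time to distributed time under the slide's model."""
    time_single = total_points * k
    # Each worker does its share of trials, then num_workers counters are added.
    time_distributed = (total_points / num_workers) * k + num_workers
    return time_single / time_distributed

for k in (1, 10, 100):
    print(k, speed_up(1000000, 1000, k))
```

With these numbers the result is 1000*K / (K + 1), so the speed up only gets close to 1000 when each trial costs much more than one addition.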
21
Follow-up Look up examples using MapReduce Note: one example is Google maintaining its keyword index by scanning (crawling) the web
22
Speaker Twitter: @kmwinterfield IBM Smarter Cities Social media for political campaigns World Community Grid
23
Homework Prepare question for Kevin –follow on twitter and send message OR –post on moodle Continue with postings Research unique NP complete problem and post summary and source!