CS 4700: Foundations of Artificial Intelligence. Carla P. Gomes. Module: Randomization in Complete Tree Search.


1 Carla P. Gomes CS4700 CS 4700: Foundations of Artificial Intelligence Carla P. Gomes gomes@cs.cornell.edu Module: Randomization in Complete Tree Search Algorithms. Wrap-up of Search!

2 Carla P. Gomes CS4700 Randomization in Local Search Randomized strategies are very successful in the area of local search: random hill climbing, simulated annealing, genetic algorithms, tabu search, GSAT and variants. Key limitation? The inherently incomplete nature of local search methods.

3 Carla P. Gomes CS4700 Randomization in Tree Search Can we add a stochastic element to a systematic (tree search) procedure without losing completeness? Yes: introduce randomness in the tree search method, e.g., by randomly breaking ties in variable and/or value selection. Why would we do that?
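As a sketch of the idea (my own illustrative code, not the course's reference implementation), a complete backtrack search for CNF SAT with randomized tie-breaking might look like this; the clause encoding and helper names are assumptions:

```python
import random

def solve(clauses, assignment=None, rng=None):
    """Complete backtrack search for CNF SAT with randomized tie-breaking.

    Each clause is a list of ints: positive for a variable, negative for its
    negation, e.g. (a OR NOT b) -> [1, -2].  Randomizing the branching
    variable and value order changes which execution we get, but every branch
    is still explored on backtracking, so completeness is preserved.
    """
    if assignment is None:
        assignment = {}
    if rng is None:
        rng = random.Random(0)

    # Simplify the clauses under the current partial assignment.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue                      # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None                   # clause falsified -> backtrack
        simplified.append(rest)
    if not simplified:
        return assignment                 # all clauses satisfied

    # Random tie-breaking: random unassigned variable, random value order.
    var = abs(rng.choice(rng.choice(simplified)))
    values = [True, False]
    rng.shuffle(values)
    for v in values:
        result = solve(simplified, {**assignment, var: v}, rng)
        if result is not None:
            return result
    return None

# The formula from the next slides: (a OR NOT b OR NOT c) AND (b OR NOT c) AND (a OR c)
print(solve([[1, -2, -3], [2, -3], [1, 3]]))
```

Running with different seeds produces the "two different executions" of the next slides while always terminating with a correct answer.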

4 Carla P. Gomes CS4700 Backtrack Search ( a OR NOT b OR NOT c ) AND ( b OR NOT c) AND ( a OR c)

5 Carla P. Gomes CS4700 Backtrack Search Two Different Executions ( a OR NOT b OR NOT c ) AND ( b OR NOT c) AND ( a OR c)

6 Carla P. Gomes CS4700 The fringe of the search space

7 Carla P. Gomes CS4700 Latin Square Completion: Randomized Backtrack Search Easy instance, 15% pre-assigned cells; runtimes across runs: (*), 3011, (*), 7, where (*) means no solution found within the cutoff of 2000. Gomes et al. 97

8 Erratic Mean Behavior Over repeated runs on the same instance, the sample mean of the runtime keeps climbing (around 3500 after 2000 runs, still rising between 500 and 2000 runs), while the median is just 1!

10 Carla P. Gomes CS4700 Runtime distribution F(x): proportion of cases solved vs. number of backtracks. 75% of the runs finish within 30 backtracks, while 5% take more than 100,000.

11 Carla P. Gomes CS4700 Run Time Distributions The runtime distributions of some of the instances reveal interesting properties: I Erratic behavior of mean. II Distributions have “heavy tails”.

12 Carla P. Gomes CS4700 Heavy-Tailed Distributions … infinite variance … infinite mean. Introduced by Pareto in the 1920's --- a "probabilistic curiosity." Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena. Examples: the stock market, earthquakes, weather, ...

13 Carla P. Gomes CS4700 Decay of Distributions Standard --- exponential decay, e.g. the normal: Pr[X > x] ~ C e^(-x^2/2). Heavy-tailed --- power-law decay, e.g. Pareto-Levy: Pr[X > x] ~ C x^(-α), with 0 < α < 2.
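To make the contrast concrete, here is a small illustrative computation (mine, not from the slides) comparing the exponential-decay tail of the standard normal with the power-law tail of a Pareto(α = 1) distribution:

```python
import math

def normal_tail(x):
    """P[X > x] for the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pareto_tail(x, alpha=1.0):
    """P[X > x] for a Pareto(alpha) variable with support x >= 1: power-law decay."""
    return x ** (-alpha)

# Exponential vs. power-law decay: at x = 10 the normal tail is already
# astronomically small, while the Pareto(1) tail is still 10%.
for x in [2, 5, 10]:
    print(x, normal_tail(x), pareto_tail(x))
```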

14 Normal, Cauchy, and Lévy Normal: exponential decay. Cauchy: power-law decay. Lévy: power-law decay.

15 Carla P. Gomes CS4700 Tail Probabilities (Standard Normal, Cauchy, Levy)

16 Carla P. Gomes CS4700 Fat-tailed distributions Kurtosis = (fourth central moment) / (second central moment, i.e., the variance)^2. Normal distribution → kurtosis is 3. Fat-tailed distribution → kurtosis > 3 (e.g., exponential, lognormal).
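The kurtosis ratio above is easy to estimate from samples; a small illustrative check (my own code) that a normal sample sits near 3 while an exponential sample sits well above it:

```python
import random

def kurtosis(xs):
    """Kurtosis = fourth central moment / (second central moment)^2."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2

rng = random.Random(0)
normal_sample = [rng.gauss(0, 1) for _ in range(200_000)]
expo_sample = [rng.expovariate(1.0) for _ in range(200_000)]
print(kurtosis(normal_sample))  # near the theoretical value 3
print(kurtosis(expo_sample))    # near the theoretical value 9: fat-tailed
```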

17 Carla P. Gomes CS4700 Fat- and heavy-tailed distributions Exponential decay for standard distributions, e.g. normal, lognormal, exponential: Pr[X > x] ~ C e^(-ax). Heavy-tailed: power-law decay, e.g. Pareto-Levy: Pr[X > x] ~ C x^(-α).

18 Carla P. Gomes CS4700 Pareto Distribution Density function: f(x) = α / x^(α+1) for x ≥ 1. Distribution function F(x) = P[X ≤ x]: F(x) = 1 - 1/x^α for x ≥ 1. Survival function (tail probability) S(x) = 1 - F(x) = P[X > x]: S(x) = 1/x^α for x ≥ 1, where α > 0 is a shape parameter.
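The three Pareto formulas translate directly into code; a minimal sketch (function names are mine):

```python
def pareto_pdf(x, alpha):
    """Density f(x) = alpha / x^(alpha + 1) for x >= 1."""
    return alpha / x ** (alpha + 1) if x >= 1 else 0.0

def pareto_cdf(x, alpha):
    """Distribution function F(x) = P[X <= x] = 1 - 1/x^alpha for x >= 1."""
    return 1.0 - x ** (-alpha) if x >= 1 else 0.0

def pareto_tail(x, alpha):
    """Survival function S(x) = 1 - F(x) = P[X > x] = 1/x^alpha for x >= 1."""
    return x ** (-alpha) if x >= 1 else 1.0

print(pareto_tail(10, 1.0))  # 0.1: with alpha = 1, 10% of the mass lies beyond x = 10
```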

19 Carla P. Gomes CS4700 Pareto Distribution Moments E(X^n) = α / (α - n) if n < α; E(X^n) = ∞ if n ≥ α. Mean: E(X) = α / (α - 1) if α > 1; E(X) = ∞ if α ≤ 1. Variance: var(X) = α / [(α - 1)^2 (α - 2)] if α > 2; var(X) = ∞ if α ≤ 2.

20 Carla P. Gomes CS4700 How to Check for "Heavy Tails"? Power-law decay of the tail: a log-log plot of the tail of the distribution (the survival function S(x) = 1 - F(x); e.g. for the Pareto, S(x) = 1/x^α for x ≥ 1) should be approximately linear. The slope gives the value of α: α ≤ 1 means infinite mean and infinite variance; 1 < α ≤ 2 means infinite variance.
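The log-log check can be automated with a least-squares fit to the empirical survival function; an illustrative sketch (my code, recovering α = 1 from Pareto(1) samples drawn by inverse-CDF sampling):

```python
import math
import random

def fit_tail_slope(samples, tail_fraction=0.5):
    """Least-squares slope of log S(x) vs. log x over the upper tail.

    For a power-law (heavy) tail the log-log plot is roughly linear and the
    slope estimates -alpha; for exponential-decay tails it bends downward.
    """
    xs = sorted(samples)
    n = len(xs)
    start = int(n * (1 - tail_fraction))
    # Empirical survival at the i-th order statistic is about (n - i) / n.
    pts = [(math.log(xs[i]), math.log((n - i) / n)) for i in range(start, n - 1)]
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    num = sum((px - mx) * (py - my) for px, py in pts)
    den = sum((px - mx) ** 2 for px, _ in pts)
    return num / den

rng = random.Random(1)
# Inverse-CDF sampling: X = 1 / (1 - U) is Pareto with alpha = 1.
pareto_sample = [1.0 / (1.0 - rng.random()) for _ in range(100_000)]
print(fit_tail_slope(pareto_sample))  # close to -1, i.e. alpha = 1
```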

21 Carla P. Gomes CS4700 Density functions f(x) of Pareto(α = 1) and Lognormal(1,1). Pareto with α = 1 has infinite mean and infinite variance.

22 Carla P. Gomes CS4700 How to Visually Check for Heavy-Tailed Behavior Log-log plot of tail of distribution exhibits linear behavior.

23 Carla P. Gomes CS4700 Survival Function: Pareto and Lognormal

24 Carla P. Gomes CS4700 Example of Heavy Tailed Model Random Walk: Start at position 0 Toss a fair coin: with each head take a step up (+1) with each tail take a step down (-1) X --- number of steps the random walk takes to return to position 0.
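The model above is easy to simulate; a small sketch (my code, with return times capped so censored runs terminate) showing the heavy-tailed signature of a tiny median next to a huge mean and maximum:

```python
import random

def return_time(rng, cap=100_000):
    """Steps for a +/-1 fair-coin walk starting at 0 to first return to 0 (capped)."""
    pos = 0
    for step in range(1, cap + 1):
        pos += 1 if rng.random() < 0.5 else -1
        if pos == 0:
            return step
    return cap  # censored: the walk did not return within the cap

rng = random.Random(42)
times = sorted(return_time(rng) for _ in range(2000))
median = times[len(times) // 2]
mean = sum(times) / len(times)
print(median, mean, max(times))  # tiny median, huge mean and maximum
```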

25 Carla P. Gomes CS4700 The record of 10,000 tosses of an ideal coin (Feller): the walk exhibits long periods without a zero crossing.

26 Random Walk: Heavy Tails vs. Non-Heavy Tails Log-log plot of the unsolved fraction 1 - F(x) against X, the number of steps the walk takes to return to zero: the walk's distribution has median 2, yet 0.1% of the walks take more than 200,000 steps; Normal(2,1) and even Normal(2,1000000) tails fall off far more sharply.

27 Heavy-Tailed Behavior in the Latin Square Completion Problem Log-log plot of the unsolved fraction 1 - F(x) against the number of backtracks: the roughly linear decay implies an infinite mean; 18% of the runs are unsolved at a small cutoff, and 0.002% remain unsolved even at a very large one.

28 Carla P. Gomes CS4700 Walsh 99 How Toby Walsh Fried his PC (Graph Coloring)

29 Carla P. Gomes CS4700 To Be or Not To Be Heavy-Tailed

30 Carla P. Gomes CS4700 Random Binary CSP Models: Model E N – number of variables (from 15 to 50); D – size of the domains; p – proportion of forbidden pairs (out of D^2 N(N-1)/2). (Achlioptas et al 2000)

31 Carla P. Gomes CS4700 Typical Case Analysis: Model E Phase transition phenomenon: discriminating "easy" vs. "hard" instances. Plot: computational cost (mean) and % of solvable instances against constrainedness. Hogg et al 96

32 Carla P. Gomes CS4700 Runtime distributions

33 Towards phase transition

34 Carla P. Gomes CS4700 Explaining and Exploiting Fat and Heavy Tails

35 Formal Models of Heavy and Fat Tails in Combinatorial Search How to explain the short runs? Heavy/fat tails mean a wide range of solution times: very short and very long runtimes. Backdoors: hidden tractable substructure in real-world problems; a subset of the "critical" variables such that, once they are assigned values, the instance simplifies to a tractable class. This has practical consequences.

36 Logistics Planning – instances with O(log(n)) backdoors Logistics planning problem formula: 843 vars, 7,301 constraints, 16 backdoor variables. Shown: the initial constraint graph, after setting 5 backdoor vars, and after setting 12 backdoor vars. (Visualization by Anand Kapur, 4701 project.)

37 Carla P. Gomes CS4700 Exploiting Backdoors

38 Carla P. Gomes CS4700 Algorithms Three kinds of strategies for dealing with backdoors: a complete deterministic backtrack-search algorithm; a complete randomized backtrack-search algorithm (provably better performance than the deterministic one); and a heuristically guided complete randomized backtrack-search algorithm (assumes the existence of a good heuristic for choosing variables to branch on; we believe this is close to what happens in practice). Williams, Gomes, Selman 03/04

39 Carla P. Gomes CS4700 Deterministic Generalized Iterative Deepening

40 Generalized Iterative Deepening All possible trees of depth 1: x1 = 0 / x1 = 1; x2 = 0 / x2 = 1; (…) xn = 0 / xn = 1.

41 Generalized Iterative Deepening, Level 2 All possible trees of depth 2: branch on x1 = 0 / x1 = 1, then on x2 = 0 / x2 = 1 beneath each.

42 Generalized Iterative Deepening, Level 2 All possible trees of depth 2: branch on xn-1 = 0 / xn-1 = 1, then on xn = 0 / xn = 1; then level 3, level 4, and so on.

43 Carla P. Gomes CS4700 Randomized Generalized Iterative Deepening Assumption: There exists a backdoor whose size is bounded by a function of n (call it B(n)) Idea: Repeatedly choose random subsets of variables that are slightly larger than B(n), searching these subsets for the backdoor
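A minimal sketch of this idea (my own illustrative code; the `subsolver` callback and the toy instance are hypothetical stand-ins for the polytime sub-solver and a real formula):

```python
import itertools
import random

def randomized_backdoor_search(variables, bound, subsolver, rng=None, tries=1000):
    """Sketch: repeatedly pick random variable subsets slightly larger than the
    assumed backdoor bound B(n), try every assignment to the subset, and let a
    polytime `subsolver` finish the simplified instance.

    `subsolver(partial)` is a hypothetical callback: given a partial assignment,
    it returns a full solution, or None if the simplified instance is unsolvable.
    """
    if rng is None:
        rng = random.Random(0)
    size = min(len(variables), bound + 1)   # "slightly larger" than B(n)
    for _ in range(tries):
        subset = rng.sample(variables, size)
        for values in itertools.product([False, True], repeat=size):
            solution = subsolver(dict(zip(subset, values)))
            if solution is not None:
                return solution
    return None

def toy_subsolver(partial):
    # Toy tractable instance whose backdoor is {x1}: solvable iff x1 is True.
    if partial.get("x1"):
        full = {"x1": True, "x2": False, "x3": False}
        full.update(partial)
        return full
    return None

print(randomized_backdoor_search(["x1", "x2", "x3"], bound=1, subsolver=toy_subsolver))
```

Each random subset contains the backdoor with some fixed probability, so repetition finds it quickly in expectation.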

44 Deterministic Versus Randomized Suppose each variable has 2 possible values (e.g. SAT). For a backdoor of size B(n) = n/k, the deterministic strategy's runtime is c^n for some constant c, and the deterministic algorithm outperforms brute-force search for k > 4.2.

45 Carla P. Gomes CS4700 Complete Randomized Depth-First Search with Heuristic Assume we have the following: DFS, a generic randomized depth-first backtrack-search solver with a (polytime) sub-solver A; and a heuristic H that (randomly) chooses variables to branch on in polynomial time, where H has probability 1/h of choosing a backdoor variable (h is a fixed constant). Call this ensemble (DFS, H, A).

46 Carla P. Gomes CS4700 Polytime Restart Strategy for (DFS, H, A) Essentially: If there is a small backdoor, then (DFS, H, A) has a restart strategy that runs in polytime.

47 Carla P. Gomes CS4700 Runtime Table for Algorithms DFS,H,A B(n) = upper bound on the size of a backdoor, given n variables When the backdoor is a constant fraction of n, there is an exponential improvement between the randomized and deterministic algorithm Williams, Gomes, Selman 03/04

48 Carla P. Gomes CS4700 How to avoid the long runs in practice? Restarts provably eliminate heavy-tailed behavior. Use restarts or parallel / interleaved runs to exploit the extreme variance in performance.

49 Restarts Log-log plot of the unsolved fraction 1 - F(x) against the number of backtracks: with no restarts, 70% of the runs are unsolved at the cutoff; restarting every 4 backtracks leaves only 0.001% unsolved after 250 backtracks (62 restarts).

50 Example of Rapid Restart Speedup (planning) Log-log plot of the number of backtracks against the restart cutoff: a cutoff near 2000 needs only ~10 restarts, a cutoff of 20 needs ~100 restarts, and the total effort stays far below the ~100,000 backtracks of the unrestarted search.

51 Carla P. Gomes CS4700 Super-linear Speedups Ten runs, one of which solves the instance in 1 second. Sequential: 50 + 1 = 51 seconds. Parallel (10 machines): 1 second, a 51x speedup. Interleaved (1 machine): 10 x 1 = 10 seconds, a 5x speedup.

52 Carla P. Gomes CS4700 Sketch of proof of elimination of heavy tails Truncate the search procedure after m backtracks, and let p be the probability of solving the problem with the truncated version. Run the truncated procedure and restart it repeatedly: the number of restarts needed is geometrically distributed, so the tail of the total runtime decays exponentially.

53 Carla P. Gomes CS4700 Y, the runtime of the restarted procedure, does not have heavy tails.

54 Paramedic Crew Assignment Paramedic crew assignment is the problem of assigning paramedic crews from different stations to cover a given region, given several resource constraints.

55 Deterministic Search

56 Restarts

57 Carla P. Gomes CS4700 Restart Strategies
–Restart with increasing cutoff – e.g., used by the satisfiability and constraint programming communities; the cutoff increases linearly.
–Randomized backtracking (Lynce et al 2001) – randomizes the target decision points when backtracking (several variants).
–Random jumping (Zhang 2002) – the solver randomly jumps to unexplored portions of the search space; jumping decisions are based on the ratio between the space searched and the remaining search space; solved several open problems in combinatorics.
–Geometric restarts (Walsh 99) – the cutoff is increased geometrically.
–Learning restart strategies (Kautz et al 2001 and Ruan et al 2002) – results on optimal policies for restarts under particular scenarios; a huge area for further research.
–Universal restart strategies (Luby et al 93) – seminal paper on optimal restart strategies for Las Vegas algorithms (theoretical paper).
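The universal Luby sequence is easy to generate; a sketch (the cutoff for run i is a fixed unit times luby(i)):

```python
def luby(i):
    """i-th term (1-indexed) of the Luby sequence 1, 1, 2, 1, 1, 2, 4, 1, ...

    Luby, Sinclair, and Zuckerman (1993): restarting run i after unit * luby(i)
    steps is within a logarithmic factor of the optimal restart strategy when
    the runtime distribution is unknown.
    """
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if (1 << k) - 1 == i:          # i ends a complete block of length 2^k - 1
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)   # otherwise recurse into the block

print([luby(i) for i in range(1, 16)])  # [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```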

58 Carla P. Gomes CS4700 Notes on Randomizing Backtrack Search Can we replay a "randomized" run? Yes, since we use pseudo-random numbers: if we save the "seed", we can repeat the run with the same seed. "Deterministic randomization" (Wolfram 2002) – the behavior of some very complex deterministic systems is so unpredictable that it actually appears to be random (e.g., adding learned clauses or cutting constraints between restarts, as used in the satisfiability community). What if we cannot randomize the code? Randomize the input: randomly rename the variables (Motwani and Raghavan 95). Walsh (99) applied this technique to study the runtime distributions of graph coloring using a deterministic algorithm based on DSATUR implemented by Trick.

59 Carla P. Gomes CS4700 Portfolios of Algorithms

60 Carla P. Gomes CS4700 Portfolio of Algorithms A portfolio of algorithms is a collection of algorithms running interleaved or on different processors. Goal: to improve the performance of the different algorithms in terms of: –expected runtime –“risk” (variance) Efficient Set or Pareto set: set of portfolios that are best in terms of expected value and risk.
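The expected-runtime / risk trade-off can be estimated by Monte-Carlo; a sketch (my code, with the two runtime samplers purely illustrative stand-ins for a heavy-tailed depth-first solver and a well-behaved best-bound solver, foreshadowing the figures on the next slides):

```python
import random
import statistics

def portfolio_stats(allocation, trials=10_000, seed=0):
    """Monte-Carlo estimate of a parallel portfolio's expected runtime and risk.

    `allocation` is a list of (sampler, processors) pairs; each sampler draws a
    runtime for one independent copy of an algorithm, and the portfolio finishes
    when its fastest copy does.
    """
    rng = random.Random(seed)
    runtimes = []
    for _ in range(trials):
        fastest = min(sampler(rng) for sampler, n in allocation for _ in range(n))
        runtimes.append(fastest)
    return statistics.mean(runtimes), statistics.pstdev(runtimes)

depth_first = lambda rng: (1.0 - rng.random()) ** -1.2   # heavy-tailed runtime
best_bound = lambda rng: rng.lognormvariate(1.0, 0.5)    # exponential-decay tail

# Sweep the 6-processor allocations to trace out the risk/return trade-off.
for n_df in range(7):
    mean, risk = portfolio_stats([(depth_first, n_df), (best_bound, 6 - n_df)])
    print(f"{n_df} DF / {6 - n_df} BB: mean={mean:.2f} risk={risk:.2f}")
```

Plotting mean against risk for each allocation gives the efficient (Pareto) set of the next slides.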

61 Branch & Bound for MIP: depth-first vs. best-bound. Cumulative frequencies of the number of nodes. Depth-first: average 18,000 nodes, st. dev. 30,000 (about 45% of cases solved early). Best-bound: average 1,400 nodes, st. dev. 1,300 (about 30% early). Optimal single strategy: best-bound.

62 Heavy-tailed behavior of Depth-first

63 Portfolio for 6 processors Expected run time vs. standard deviation of run time for portfolios ranging from 0 DF / 6 BB to 6 DF / 0 BB (e.g. 5 DF / 1 BB, 4 DF / 2 BB, 3 DF / 3 BB); the efficient set is the set of portfolios on the trade-off frontier.

64 Portfolio for 20 processors Expected run time vs. standard deviation of run time for portfolios from 0 DF / 20 BB to 20 DF / 0 BB. The optimal strategy is to run depth-first on all 20 processors! Optimal collective behavior emerges from suboptimal individual behavior.

65 Carla P. Gomes CS4700 Compute Clusters and Distributed Agents With the increasing popularity of compute clusters and distributed problem solving / agent paradigms, portfolios of algorithms --- and flexible computation in general --- are rapidly expanding research areas.

66 Carla P. Gomes CS4700 Summary Stochastic search methods (complete and incomplete) have been shown to be very effective. Restart strategies and portfolio approaches can lead to substantial improvements in the expected runtime and variance, especially in the presence of heavy-tailed phenomena. Randomization is therefore a tool to improve algorithmic performance and robustness. Take-home message: you should always "randomize" your complete search method.

67 Carla P. Gomes CS4700 Exploiting Structure using Randomization: Summary A very exciting new research area with success stories: e.g., state-of-the-art complete SAT and CP solvers use randomization and restarts. Very effective when combined with learning. More later…

68 Carla P. Gomes CS4700 Local Search - Summary Surprisingly efficient search method with a wide range of applications – any type of optimization / search task. Handles search spaces that are too large (e.g., 10^1000) for systematic search. Often the best available algorithm when global information is lacking. Formal properties remain largely elusive. The research area will most likely continue to thrive.

69 Carla P. Gomes CS4700 Summary: Search Uninformed search: DFS / BFS / uniform-cost search; time / space complexity; size of search space up to approx. 10^11 nodes. Informed search: use a heuristic function to guide the search to the goal; greedy best-first search; A* search (provably optimal); search space up to approximately 10^25 nodes.

70 Carla P. Gomes CS4700 Summary: Search (contd.) Special case: Constraint Satisfaction / CSPs generic framework that uses a restricted, structured format for representing states and goal: variables & constraints, backtrack search (DFS); propagation (forward-checking / arc-consistency, global constraints, variable / value ordering / randomized backtrack-search).

71 Summary: Search (Contd) Local search: greedy / hill-climbing, simulated annealing, genetic algorithms / genetic programming; search space 10^100 to 10^1000. Adversarial search / game playing: minimax, up to ~10^10 nodes, 6–7 ply in chess; alpha-beta pruning, up to ~10^20 nodes, 14 ply in chess, provably optimal.

72 Carla P. Gomes CS4700 Search and AI Why such a central role? Basically, because lots of tasks in AI are intractable, and search is "the only" way to handle them. Many applications of search, e.g. in learning / reasoning / planning / NLU / vision. Good thing: much recent progress (10^30 quite feasible; sometimes up to 10^1000). A qualitative difference from only a few years ago!

