# Local Search and Optimization


Local Search and Optimization
CS 2710 / ISSP 2610. Chapter 3, Part 3 (Heuristic Search); Chapter 4 (Local Search and Optimization)

Beam Search: a cheap, unpredictable search
For problems with many solutions, it may be worthwhile to discard unpromising paths. Beam search is greedy best-first search that keeps only a fixed number of nodes on the fringe.

Beam Search

```python
def beamSearch(fringe, beamwidth):
    while len(fringe) > 0:
        cur = fringe[0]
        fringe = fringe[1:]
        if goalp(cur):
            return cur
        newnodes = makeNodes(cur, successors(cur))
        for s in newnodes:
            fringe = insertByH(s, fringe)  # insert in order of heuristic value
        fringe = fringe[:beamwidth]        # keep only the best beamwidth nodes
    return []
```

Beam Search: Optimal? Complete? Hardly!
Space? O(b) (generating the successors of one node dominates, since the fringe itself is bounded by the beam width). Often useful in practice.
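As a concrete illustration, here is a self-contained, runnable version of the idea. The toy domain (reach a target number via +1 or *2 moves) and all helper names are my own, not from the slides:

```python
def beam_search(start, goal_test, successors, h, beam_width):
    """Greedy best-first search that keeps only the best beam_width nodes."""
    fringe = [start]
    visited = {start}
    while fringe:
        cur = fringe.pop(0)            # fringe is kept sorted, so this is the best node
        if goal_test(cur):
            return cur
        for s in successors(cur):
            if s not in visited:
                visited.add(s)
                fringe.append(s)
        fringe.sort(key=h)             # order by heuristic value (lower = better)
        fringe = fringe[:beam_width]   # discard everything past the beam
    return None

# Toy domain: reach 24 from 1 using +1 or *2; h = distance to the goal.
found = beam_search(
    1,
    lambda n: n == 24,
    lambda n: [n + 1, n * 2],
    lambda n: abs(24 - n),
    beam_width=3,
)
```

Note that nodes truncated off the beam are never revisited, which is exactly why the search is cheap but unpredictable.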

Creating Heuristics

Combining Heuristics If you have many heuristics, none dominates the others, and all are admissible… use them all! H(n) = max(h1(n), …, hm(n))
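A minimal sketch of this combination; the helper name `combine_max` and the toy state are illustrative only:

```python
def combine_max(*heuristics):
    """If every h_i is admissible, their pointwise max is admissible
    and at least as informed as any single one."""
    return lambda n: max(h(n) for h in heuristics)

# Toy state exposing two precomputed heuristic values (illustrative only).
h1 = lambda n: n["misplaced"]
h2 = lambda n: n["manhattan"]
h = combine_max(h1, h2)
state = {"misplaced": 5, "manhattan": 11}
# h(state) returns whichever estimate is larger for this state
```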

Relaxed Heuristic
A relaxed problem is a problem with fewer restrictions on the actions. The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem.

Relaxed Problems Exact solutions to different (relaxed) problems
h1 (# of misplaced tiles) is perfectly accurate if a tile could move to any square. h2 (sum of Manhattan distances) is perfectly accurate if a tile could move 1 square in any direction. The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem: the optimal solution to the original problem is, by definition, also a solution to the relaxed problem, and therefore must be at least as expensive as the optimal solution to the relaxed problem.
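The two relaxed heuristics can be sketched for the 8-puzzle as follows; the goal layout and tuple encoding are my own conventions:

```python
# 8-puzzle states as tuples of 9 entries read row by row; 0 is the blank.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def misplaced(state):
    """h1: count the tiles (not the blank) that are out of place."""
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def manhattan(state):
    """h2: sum over tiles of |row - goal_row| + |col - goal_col|."""
    total = 0
    for i, t in enumerate(state):
        if t == 0:
            continue
        g = GOAL.index(t)
        total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return total
```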

Relaxed Problems If a problem is defined formally as a set of constraints, relaxed problems can be generated automatically. Absolver (Prieditis, 1993) generates heuristics using relaxed problems and other techniques; it discovered a better heuristic for the 8-puzzle and the first useful heuristic for Rubik’s cube. Note that depending on the problem, the heuristic can be derived directly, or you might have to let h run a breadth-first search.

Systematic Relaxation
Precondition list: a conjunction of predicates that must hold true before the action can be applied.
Add list: a list of predicates that are added to the description of the world state as a result of applying the action.
Delete list: a list of predicates that are no longer true once the action is applied and should therefore be deleted from the state description.
Primitive predicates:
ON(x, y): tile x is on cell y
CLEAR(y): cell y is clear of tiles
ADJ(y, z): cell y is adjacent to cell z

Here is the full definition of a move for the n-puzzle:
Move(x, y, z):
precondition list: ON(x, y), CLEAR(z), ADJ(y, z)
add list: ON(x, z), CLEAR(y)
delete list: ON(x, y), CLEAR(z)

(1) Removing CLEAR(z) and ADJ(y, z) gives “# tiles out of place”. Misplaced distance is 1+1=2 moves.

(2) Removing CLEAR(z) gives “Manhattan distance”. Manhattan distance is 6+3=9 moves.

Pattern Database Heuristics
The idea behind pattern database heuristics is to store exact solution costs for every possible sub-problem instance.

Solve part of the problem (e.g., tiles 3, 7, 11, 12, 13, 14, 15), ignoring the other tiles.

Pattern Databases The optimal solution cost of the subproblem is ≤ the optimal solution cost of the full problem. Run exhaustive search to find optimal solutions for every possible configuration of tiles 3, 7, 11, 12, 13, 14, 15, and store the results. Do the same for the other tiles (perhaps in two 4-tile subsets). Do this once, before any problem solving is performed. Expensive, but can be worth it if the search will be applied to many problem instances (deployed).
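A sketch of the precomputation on a deliberately tiny domain, the 2x2 puzzle, so the exhaustive search stays readable. Note this version counts every move, which is the right cost for a max-based pattern database; the disjoint (additive) databases discussed later would count only moves of the pattern tiles:

```python
from collections import deque

# Tiny illustration: the 2x2 sliding puzzle (tiles 1-3, blank = 0).
GOAL = (1, 2, 3, 0)
NEIGHBORS = {0: (1, 2), 1: (0, 3), 2: (0, 3), 3: (1, 2)}  # cell adjacency

def successors(state):
    b = state.index(0)
    for n in NEIGHBORS[b]:
        s = list(state)
        s[b], s[n] = s[n], s[b]   # slide the adjacent tile into the blank
        yield tuple(s)

def build_pattern_db(pattern_tiles):
    """BFS outward from the goal (moves are reversible, so distance from the
    goal equals distance to it); record, for each configuration of the
    pattern tiles, the cheapest cost at which BFS first reaches it."""
    db = {}
    frontier = deque([(GOAL, 0)])
    seen = {GOAL}
    while frontier:
        state, cost = frontier.popleft()
        key = tuple(state.index(t) for t in pattern_tiles)
        db.setdefault(key, cost)   # first visit = minimum cost
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, cost + 1))
    return db

db = build_pattern_db((1,))   # exact cost to get tile 1 home, from anywhere
```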

Pattern Databases Recall, in our example, we have three subproblems (subsets of 7, 4, and 4 tiles) State S has specific configurations of those subsets h(s)?

h(s)? Look up the exact costs for s’s configurations of the 7, 4, and 4 tiles in the database Take the max! The max of a set of admissible heuristics is admissible What if it isn’t feasible to have entries for all possibilities? ….

What if it isn’t feasible to have entries for all possibilities? ….
Take the max of: the exact costs we do have, and the Manhattan distance for those we don’t.
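A sketch of the lookup-with-fallback; the database entries here are made up purely for illustration:

```python
# Hypothetical pattern database: maps a configuration of the tracked tiles
# (tuple of their positions) to the exact solution cost for that sub-problem.
pattern_db = {
    (3, 7, 11): 12,   # made-up entries for illustration
    (7, 3, 11): 14,
}

def pdb_h(config, manhattan_estimate):
    """Use the exact stored cost when we have it; otherwise fall back to
    Manhattan distance. Both are admissible, so their max is admissible."""
    exact = pattern_db.get(config)
    if exact is None:
        return manhattan_estimate
    return max(exact, manhattan_estimate)
```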

Sums of admissible heuristics
We would like to take the sum rather than the max, since the result is more informed In general, adding two admissible heuristics might not be admissible For example, moves that solve one subproblem might help another subproblem But we can choose patterns that are disjoint, so we can sum them

Disjoint Pattern Database Heuristics
Patterns that have no tiles in common. (As in our example) When calculating costs for a pattern, only count moves of the tiles in the pattern Add together the heuristic values for the individual patterns. The sum is admissible and more informed than taking the max

Examples for Disjoint Pattern Database Heuristics
20 moves are needed to solve the red tiles and 25 moves to solve the blue tiles; the overall heuristic is the sum, 20+25=45 moves.

A trivial example of disjoint pattern database heuristics is Manhattan distance, where we view every tile as its own pattern database. The overall heuristic is the sum of the Manhattan distances of the tiles, here 39 moves.

For your interest: http://idm-lab.org/bib/abstracts/Koen07g.html
P. Haslum, A. Botea, M. Helmert, B. Bonet and S. Koenig. Domain-Independent Construction of Pattern Database Heuristics for Cost-Optimal Planning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2007.

Linear Conflict Heuristic Function
Def. Linear Conflict Heuristic: two tiles tj and tk are in a linear conflict if tj and tk are in the same line, the goal positions of tj and tk are both in that line, tj is to the right of tk, and the goal position of tj is to the left of the goal position of tk.

Linear Conflict Example
Tiles 3 and 1 sit in the same row, with 3 to the left of 1, but their goal positions in that row are reversed (1 before 3). Manhattan distance is 2+2=4 moves, but one tile must first move out of the row to let the other pass, so extra moves are required. Add a penalty for each linear conflict.
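One way to count the conflicts in a row; the helper is my own, and a standard penalty is 2 extra moves per conflict, added on top of Manhattan distance:

```python
def linear_conflicts_in_row(row, goal_row):
    """Count pairs of tiles that sit in their goal row but in reversed
    relative order; each such pair forces at least 2 extra moves."""
    # Goal positions (within this row) of the tiles that belong here; 0 = blank.
    goal_pos = [goal_row.index(t) for t in row if t != 0 and t in goal_row]
    return sum(
        1
        for i in range(len(goal_pos))
        for j in range(i + 1, len(goal_pos))
        if goal_pos[i] > goal_pos[j]
    )

# Row (3, 0, 0, 1) against goal row (1, 2, 3, 4): tiles 3 and 1 are reversed.
penalty_moves = 2 * linear_conflicts_in_row((3, 0, 0, 1), (1, 2, 3, 4))
```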

Other Sources of Heuristics
Ad hoc, informal rules of thumb (guesswork). Approximate solutions to problems (algorithms course). Learn from experience (solving lots of 8-puzzles): each optimal solution is a learning example (node, actual cost to goal). Learn a heuristic function, e.g., H(n) = c1·x1(n) + c2·x2(n), where x1 = # misplaced tiles and x2 = # pairs of adjacent tiles that are also adjacent in the goal state; c1 and c2 are learned (best fit to the training data).
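A sketch of the learning step: fitting c1 and c2 by least squares via gradient descent. The training triples below are fabricated purely for illustration:

```python
# Hypothetical training data gathered from optimally solved 8-puzzles:
# (x1, x2, actual cost to goal), where x1 = # misplaced tiles and
# x2 = # adjacent tile pairs that are also adjacent in the goal.
data = [(5, 2, 9), (3, 4, 5), (7, 1, 14), (2, 5, 3), (6, 3, 11)]

def fit_linear(data, lr=0.01, steps=20000):
    """Least-squares fit of H(n) = c1*x1(n) + c2*x2(n) by gradient descent."""
    c1 = c2 = 0.0
    for _ in range(steps):
        g1 = g2 = 0.0
        for x1, x2, y in data:
            err = c1 * x1 + c2 * x2 - y   # prediction error on this example
            g1 += err * x1
            g2 += err * x2
        c1 -= lr * g1 / len(data)
        c2 -= lr * g2 / len(data)
    return c1, c2

c1, c2 = fit_linear(data)
H = lambda x1, x2: c1 * x1 + c2 * x2   # the learned heuristic
```

With this fabricated data the fitted c2 comes out negative, which is sensible: more correctly adjacent pairs means the puzzle is closer to solved.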

Search Types Backtracking state-space search
Local Search and Optimization Constraint satisfaction search Adversarial search

Local Search and Optimization
Previous searches keep paths in memory and remember alternatives so the search can backtrack; the solution is a path to a goal. The path may be irrelevant if only the final configuration is needed (8-queens, IC design, network optimization, …)

Local Search Use a single current state and move only to neighbors.
Use little space Can find reasonable solutions in large or infinite (continuous) state spaces for which the other algorithms are not suitable Iterative improvement: start with a complete configuration and make modifications to improve it

Optimization Local search is often suitable for optimization problems. Search for best state by optimizing an objective function.

Visualization States are laid out in a landscape
Height corresponds to the objective function value Move around the landscape to find the highest (or lowest) peak Only keep track of the current states and immediate neighbors

Local Search Algorithms
Two strategies for choosing the state to visit next Hill climbing Simulated annealing Then, an extension to multiple current states: Genetic algorithms

Hillclimbing (Greedy Local Search)
Generate nearby successor states of the current state. Pick the best and replace the current state with it. Loop.
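A minimal runnable sketch on a toy one-dimensional landscape; all names are my own:

```python
def hill_climb(state, neighbors, value):
    """Greedy local search: repeatedly move to the best neighbor; stop when
    no neighbor improves on the current state (a local maximum)."""
    while True:
        best = max(neighbors(state), key=value, default=None)
        if best is None or value(best) <= value(state):
            return state
        state = best

# Toy landscape: maximize -(x - 3)^2 over the integers, stepping by +/-1.
peak = hill_climb(
    10,
    lambda x: [x - 1, x + 1],
    lambda x: -(x - 3) ** 2,
)
```

This landscape has a single peak, so greedy climbing reaches the global maximum; the next slide shows why that is not true in general.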

Hill-climbing search problems
Local maximum: a peak that is lower than the highest peak, so a bad solution is returned.
Plateau: the evaluation function is flat, resulting in a random walk.
Ridge: a sequence of local maxima; the slopes rise very gently toward a peak, so the search may oscillate from side to side (see the picture on page 114).

Random-restart hill climbing
Start different hill-climbing searches from random starting positions, stopping when a goal is found. Save the best result from any search so far. If all states have equal probability of being generated, it is complete with probability approaching 1 (a goal state will eventually be generated). Finding an optimal solution becomes a question of a sufficient number of restarts. Surprisingly effective if there aren’t too many local maxima or plateaux. There are lots of variants of hill climbing.

Simulated Annealing Based on a metallurgical metaphor
Start with the temperature set very high and slowly reduce it. Run hill climbing with the twist that you can occasionally replace the current state with a worse state, based on the current temperature and how much worse the new state is. Annealing: a process used to temper or harden metals and glass by heating them to a high temperature and then gradually cooling them.

Simulated Annealing Annealing: harden metals and glass by heating them to a high temperature and then gradually cooling them At the start, make lots of moves and then gradually slow down

Simulated Annealing More formally…
Generate a random new neighbor from current state. If it’s better take it. If it’s worse then take it with some probability proportional to the temperature and the delta between the new and old states.

Simulated annealing Probability of a move decreases with the amount ΔE by which the evaluation is worsened A second parameter T is also used to determine the probability: high T allows more worse moves, T close to zero results in few or no bad moves Schedule input determines the value of T as a function of the completed cycles

```
function Simulated-Annealing(problem, schedule) returns a solution state
    inputs: problem, a problem
            schedule, a mapping from time to “temperature”
    current ← Make-Node(Initial-State[problem])
    for t ← 1 to ∞ do
        T ← schedule[t]
        if T = 0 then return current
        next ← a randomly selected successor of current
        ΔE ← Value[next] − Value[current]
        if ΔE > 0 then current ← next
        else current ← next only with probability e^(ΔE/T)
```
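A runnable Python sketch of the same procedure; the toy objective and the cooling schedule are my own choices:

```python
import math
import random

def simulated_annealing(state, successor, value, schedule):
    """Always accept improving moves; accept a worsening move with
    probability e^(dE/T), where dE < 0."""
    t = 1
    while True:
        T = schedule(t)
        if T <= 0:
            return state
        nxt = successor(state)
        dE = value(nxt) - value(state)
        if dE > 0 or random.random() < math.exp(dE / T):
            state = nxt
        t += 1

# Toy problem: maximize -(x - 7)^2 on the integers with random +/-1 steps.
random.seed(0)
best = simulated_annealing(
    50,
    lambda x: x + random.choice((-1, 1)),
    lambda x: -(x - 7) ** 2,
    lambda t: max(0.0, 10.0 * 0.99 ** t - 0.01),  # geometric cooling, hits 0 near t = 690
)
```

Early on (T large) almost any move is accepted; by the end (T near 0) the algorithm is effectively hill climbing, matching the intuition on the next slides.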

Intuitions Hill-climbing is incomplete
Pure random walk, keeping track of the best state found so far, is complete but very inefficient Combine the ideas: add some randomness to hill-climbing to allow the possibility of escape from a local optimum

Intuitions The algorithm wanders around during the early parts of the search, hopefully toward a good general region of the state space. Toward the end, it does a more focused search, making few bad moves.

Theoretical Completeness
There is a proof that if the schedule lowers T slowly enough, simulated annealing will find a global optimum with probability approaching 1 In practice, that may be way too many iterations In practice, though, SA can be effective at finding good solutions

Local Beam Search Keep track of k states rather than just one, as in hill climbing In comparison to beam search we saw earlier, this algorithm is state-based rather than node-based.

Local Beam Search Begins with k randomly generated states
At each step, all successors of all k states are generated. If any one is a goal, the algorithm halts. Otherwise, it selects the best k successors from the complete list, and repeats.
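A runnable sketch of these steps on a toy domain; the helper names are my own:

```python
import random

def local_beam_search(k, random_state, successors, value, goal_test, max_iters=100):
    """Pool all successors of all k current states, then keep the best k."""
    states = [random_state() for _ in range(k)]
    for _ in range(max_iters):
        pool = []
        for s in states:
            for nxt in successors(s):
                if goal_test(nxt):
                    return nxt
                pool.append(nxt)
        pool.sort(key=value, reverse=True)  # the best k survive, regardless of parent
        states = pool[:k]
    return max(states, key=value)

# Toy domain: find x == 42 from random integer starts, moving +/-1.
random.seed(1)
result = local_beam_search(
    3,
    lambda: random.randint(0, 100),
    lambda x: [x - 1, x + 1],
    lambda x: -abs(42 - x),
    lambda x: x == 42,
)
```

Because the k survivors are chosen from the shared pool, the beam can abandon weak starting points entirely, which is the key difference from k independent hill climbs.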

Local Beam Search Successors can become concentrated in a small part of state space Stochastic beam search: choose k successors, with probability of choosing a given successor increasing with value Like natural selection: successors (offspring) of a state (organism) populate the next generation according to its value (fitness)

Genetic Algorithms Variant of stochastic beam search
Combine two parent states to generate successors

```
function GA(pop, fitness-fn)
    repeat
        new-pop = {}
        for i from 1 to size(pop):
            x = rand-sel(pop, fitness-fn)
            y = rand-sel(pop, fitness-fn)
            child = reproduce(x, y)
            if (small rand prob):
                child = mutate(child)
            add child to new-pop
        pop = new-pop
    until an indiv is fit enough, or out of time
    return best indiv in pop, according to fitness-fn
```

```
function reproduce(x, y)
    n = len(x)
    c = random number from 1 to n
    return append(substr(x, 1, c), substr(y, c+1, n))
```

Example: n-queens Put n queens on an n × n board with no two queens on the same row, column, or diagonal
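A runnable GA sketch for n-queens, assuming the common encoding board[i] = row of the queen in column i; fitness counts non-attacking pairs, and the population size, mutation rate, and generation limit are illustrative choices:

```python
import random

def fitness(board):
    """Number of non-attacking queen pairs; board[i] = row of queen in column i."""
    n = len(board)
    attacks = 0
    for i in range(n):
        for j in range(i + 1, n):
            if board[i] == board[j] or abs(board[i] - board[j]) == j - i:
                attacks += 1
    return n * (n - 1) // 2 - attacks

def reproduce(x, y):
    c = random.randrange(1, len(x))     # crossover point
    return x[:c] + y[c:]

def genetic_nqueens(n=6, pop_size=50, mutation_rate=0.1, max_gens=500):
    best_possible = n * (n - 1) // 2
    pop = [tuple(random.randrange(n) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(max_gens):
        if max(fitness(b) for b in pop) == best_possible:
            break
        weights = [fitness(b) + 1 for b in pop]    # selection proportional to fitness
        new_pop = []
        for _ in range(pop_size):
            x, y = random.choices(pop, weights=weights, k=2)
            child = list(reproduce(x, y))
            if random.random() < mutation_rate:
                child[random.randrange(n)] = random.randrange(n)  # mutate one gene
            new_pop.append(tuple(child))
        pop = new_pop
    return max(pop, key=fitness)

random.seed(0)
solution = genetic_nqueens()
```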

Genetic Algorithms Notes
Representation of individuals: the classic approach is a string over a finite alphabet, with each element in the string called a gene (usually binary, instead of AGTC as in real DNA).
Selection strategy: random, with selection probability proportional to fitness; selection is done with replacement, so a very fit individual may reproduce several times.
Reproduction: random pairing of selected individuals; random selection of crossover points; each gene can be altered by a random mutation.

Genetic Algorithms When to use them?
Genetic algorithms are easy to apply. Results can be good on some problems and bad on others. Genetic algorithms are not well understood.