Heuristics CPSC 386 Artificial Intelligence Ellen Walker Hiram College.


Informed Search Strategies
Also called heuristic search
All are variations of best-first search
– The next node to expand is the one “most likely” to lead to a solution
– Priority queue, like uniform cost search, but priority is based on additional knowledge of the problem
– The priority function for the priority queue is usually called f(n)

Heuristic Function
Heuristic, from the Greek heuriskein, “to find” or “to discover”
Heuristic function, h(n) = estimated cost from the current state to the goal
Therefore, our best estimate of the total cost of a path through n is f(n) = g(n) + h(n)
– Recall, g(n) is cost from initial state to current state

In A*, better h means better search
When h = cost to the goal,
– Only nodes on correct path are expanded
– Optimal solution is found
When h < cost to the goal,
– Additional nodes are expanded
– Optimal solution is found
When h > cost to the goal,
– Optimal solution can be overlooked
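
The three cases above can be tried on a toy problem. The sketch below is a minimal A* over a small weighted graph; the graph, the node names, and the three heuristic tables are made up for illustration and are not from the slides.

```python
import heapq

def a_star(graph, h, start, goal):
    """Best-first search ordered by f(n) = g(n) + h(n).

    graph: dict mapping node -> list of (neighbor, step_cost) pairs.
    h: dict mapping node -> estimated cost from that node to the goal.
    Returns (path, cost, nodes_expanded), or (None, inf, nodes_expanded).
    """
    frontier = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
    best_g = {start: 0}
    expanded = 0
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found later
        if node == goal:
            return path, g, expanded
        expanded += 1
        for nbr, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(nbr, float("inf")):
                best_g[nbr] = new_g
                heapq.heappush(frontier, (new_g + h[nbr], new_g, nbr, path + [nbr]))
    return None, float("inf"), expanded

# Hypothetical graph: the cheapest route is S -> A -> B -> G with cost 5.
GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 2)]}
H_PERFECT = {"S": 5, "A": 4, "B": 2, "G": 0}  # exact cost to the goal
H_ZERO = {"S": 0, "A": 0, "B": 0, "G": 0}     # admissible but uninformative
H_OVER = {"S": 0, "A": 10, "B": 0, "G": 0}    # overestimates at A

for name, h in [("perfect", H_PERFECT), ("zero", H_ZERO), ("over", H_OVER)]:
    print(name, a_star(GRAPH, h, "S", "G"))
```

With the overestimating table, node A looks too expensive to be worth expanding, so the search returns S -> B -> G at cost 6 and misses the cost-5 route.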

Pruning the Search Tree
In A* search, if f(n) = g(n) + h(n) for a node exceeds the cost of the best solution, the node (and its successors, grand-successors, etc.) will never be expanded
This is called “pruning” (like removing branches from a tree)
Pruning the tree can reduce the search below exponential
– Only if a good heuristic is available

Costs of A*
Time
– The better the heuristic, the less time
– Best case: h is perfect, O(d)
– Worst admissible case: h is 0, O(b^d), i.e. uniform cost search (the same as breadth-first search when all step costs are equal)
Space
– All nodes (open and closed list) are saved in case of repetition
– This is exponential (b^d or worse)
– A* generally runs out of space before it runs out of time

Memory-bounded Heuristic Search
Iterative Deepening A* (IDA*)
– Like iterative deepening, but cutoff at (g+h) > max, rather than depth > max
– At each iteration, the new cutoff is the smallest f-cost that exceeded the cutoff in the previous iteration
Recursive Best-First Search (RBFS) (see textbook, fig 4.5)
Simplified Memory-Bounded A* (SMA*)
– Set max memory bound
– If memory is “full”, to add a node drop the worst (g+h) node that’s already stored
– Expands newest best leaf, deletes oldest worst leaf
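
The IDA* cutoff rule can be sketched as a depth-first search that is repeated with a growing f-bound. This is a minimal sketch, assuming a graph stored as a dict of (neighbor, cost) lists and a heuristic with h(goal) = 0; the graph and heuristic values are hypothetical.

```python
def ida_star(graph, h, start, goal):
    """Iterative deepening on f = g + h. Returns (path, cost) or (None, inf)."""

    def search(path, g, bound):
        node = path[-1]
        f = g + h[node]
        if f > bound:
            return f, None          # report the f-cost that broke the bound
        if node == goal:
            return g, list(path)
        smallest = float("inf")     # smallest f-cost seen beyond the bound
        for nbr, cost in graph.get(node, []):
            if nbr in path:
                continue            # avoid cycles along the current path
            path.append(nbr)
            t, found = search(path, g + cost, bound)
            path.pop()
            if found is not None:
                return t, found
            smallest = min(smallest, t)
        return smallest, None

    bound = h[start]
    while True:
        t, found = search([start], 0, bound)
        if found is not None:
            return found, t
        if t == float("inf"):
            return None, t          # nothing left to explore
        bound = t                   # next cutoff: smallest f that exceeded the old one

# Hypothetical graph and admissible (but not perfect) heuristic.
GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 2)]}
H = {"S": 4, "A": 3, "B": 2, "G": 0}
print(ida_star(GRAPH, H, "S", "G"))
```

Only the current path is stored, so memory grows with depth rather than with the number of generated nodes; the price is re-expanding shallow nodes on every iteration.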

Backed-up Values
The (real) f-value of any node on an optimal path is the same as the f-value of the solution
Therefore, you can update f of parent to best f of a child (this also helps when revisiting a node from a different parent)
If you have to “forget” deeper nodes, their consequences are remembered in the parent
(This concept is used more prominently in adversary games)

Comparing Heuristic Functions
An admissible heuristic function never overestimates the distance to the goal
The function h = 0 is the least useful admissible function
Given 2 admissible heuristic functions (h1 and h2), h1 dominates h2 if h1(n) ≥ h2(n) for every node n
The perfect h function is dominant over all other admissible heuristic functions
Dominant admissible heuristic functions are better

Combining Heuristic Functions
Every admissible heuristic is <= the actual distance to goal
Therefore, if you have 2 admissible heuristics, the higher value is closer to the actual distance
If you have 2 or more admissible heuristics, you can therefore combine them into a better one by taking the maximum value for any state
Useful when you have a set of heuristics where no one is dominant

Finding Heuristic Functions: Relaxed Problems
Remove constraints from the original problem to generate a “relaxed problem”
Cost of optimal solution to relaxed problem is an admissible heuristic for the original problem
– Because a solution to the original problem also solves the relaxed problem (at a cost ≥ relaxed solution cost)

8-puzzle examples
Number of tiles out of place
– Relax constraint that tiles must move into empty squares, and that tiles must move into adjacent squares
Manhattan distance to solution
– Relax (only) constraint that tiles must move into empty squares
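
Both 8-puzzle heuristics, and the maximum of the two, fit in a few lines. This is a sketch under assumptions: states are 9-tuples read row by row with 0 for the blank, and the goal layout and sample state are chosen here for illustration.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # assumed goal layout; 0 is the blank

def misplaced_tiles(state, goal=GOAL):
    """Number of tiles (blank excluded) that are not in their goal square."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal=GOAL):
    """Sum over tiles of row distance plus column distance to the goal square."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        goal_idx = goal.index(tile)
        total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total

def combined(state, heuristics=(misplaced_tiles, manhattan)):
    """Maximum of several admissible heuristics is itself admissible."""
    return max(hf(state) for hf in heuristics)

state = (8, 1, 3,
         4, 0, 2,
         7, 6, 5)
print(misplaced_tiles(state), manhattan(state), combined(state))  # 5 10 10
```

Every misplaced tile is at least one square from home, so Manhattan distance is never smaller than the misplaced-tile count; it dominates, and here the maximum simply equals the Manhattan value.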

Finding Heuristic Functions: Subproblems
Consider solving only part of the problem
– Example: getting 1, 2, 3 and 4 of 8-puzzle into place
Again, exact solutions to subproblems are admissible heuristics
Store subproblem solutions in a pattern database, look up heuristic
– # patterns is much smaller than state space!
– Generate database by working backwards from the solution
– If multiple subproblems apply, take the max
– If multiple disjoint subproblems apply, heuristics can be added (provided each one counts only the moves of its own tiles)

Finding Heuristic Functions: Learning
Take experience and learn a function
Each “experience” is a start state and the actual cost of the solution
Learn from “features” of a state that are relevant to a solution, rather than the state itself (helps generalization)
– Generate “many” states with a given feature and determine average distance
– Combine information from multiple features
h(n) = c1 * x1(n) + c2 * x2(n) + … where x1, x2 are features
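
One way to obtain the coefficients c1 and c2 is an ordinary least-squares fit to recorded (features, actual cost) pairs. The slides do not say which fitting method is meant, so least squares is an assumption here, and the sample data is invented for illustration.

```python
def fit_two_feature_heuristic(samples):
    """Fit h(n) = c1 * x1(n) + c2 * x2(n) by least squares.

    samples: list of (x1, x2, actual_cost) triples.
    Solves the 2x2 normal equations directly; returns (c1, c2).
    """
    s11 = sum(x1 * x1 for x1, _, _ in samples)
    s12 = sum(x1 * x2 for x1, x2, _ in samples)
    s22 = sum(x2 * x2 for _, x2, _ in samples)
    b1 = sum(x1 * y for x1, _, y in samples)
    b2 = sum(x2 * y for _, x2, y in samples)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are linearly dependent on these samples")
    c1 = (b1 * s22 - b2 * s12) / det
    c2 = (s11 * b2 - s12 * b1) / det
    return c1, c2

# Invented experiences, generated so that cost = 1 * x1 + 2 * x2 exactly.
SAMPLES = [(1, 0, 1), (0, 1, 2), (2, 1, 4), (3, 2, 7)]
c1, c2 = fit_two_feature_heuristic(SAMPLES)

def learned_h(x1, x2):
    return c1 * x1 + c2 * x2

print(c1, c2, learned_h(2, 2))
```

A heuristic learned this way fits the average case and is not guaranteed to be admissible, so A* using it may return a non-optimal solution.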

Local Search Algorithms
Instead of considering the whole state space, consider only the current state
Limits necessary memory; paths not retained
Amenable to large or continuous (infinite) state spaces where exhaustive algorithms aren’t possible
Local search algorithms can’t backtrack!

Optimization
Given a measure of goodness (of fit)
Find optimal parameters (e.g. correspondences)
That maximize the goodness measure (or minimize a badness measure)
Optimization techniques
– Direct (closed-form)
– Search (generate-test)
– Heuristic search (e.g. Hill Climbing)
– Genetic Algorithm

Direct Optimization
The slope of a function at the maximum or minimum is 0
– Function is neither growing nor shrinking
– True at global, but also local extreme points
Find where the slope is zero and you find extrema!
If you have the equation, use calculus (first derivative = 0), but watch out for “shoulders”
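
A short worked example may help; the two functions are chosen here for illustration and are not from the slides. The first has a genuine local maximum and minimum where the slope is zero; the second has a zero slope at a shoulder that is not an extremum.

```latex
\begin{align*}
f(x) &= x^3 - 3x, & f'(x) &= 3x^2 - 3 = 0 \;\Rightarrow\; x = \pm 1,\\
f''(x) &= 6x, & f''(-1) &= -6 < 0 \ (\text{local maximum}), \quad f''(1) = 6 > 0 \ (\text{local minimum}),\\
g(x) &= x^3, & g'(0) &= 0 \ \text{but } g \text{ increases on both sides of } 0 \ (\text{a shoulder}).
\end{align*}
```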

Hill Climbing
Consider all possible successors as “one step” from the current state on the landscape
At each iteration, go to
– The best successor (steepest ascent)
– Any uphill move (first choice)
– Any uphill move but steeper is more probable (stochastic)
All variations get stuck at local maxima
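
The steepest-ascent variant can be sketched as follows. The one-dimensional landscape is a made-up example with a local maximum at index 2 and the global maximum at index 7.

```python
def steepest_ascent(value, neighbors, start):
    """Move to the best successor until no successor is strictly better."""
    current = start
    while True:
        best = max(neighbors(current), key=value, default=current)
        if value(best) <= value(current):
            return current  # no uphill move: a local (possibly global) maximum
        current = best

# Hypothetical landscape: heights indexed by position.
LANDSCAPE = [1, 3, 5, 4, 2, 6, 8, 9, 7]

def value(i):
    return LANDSCAPE[i]

def neighbors(i):
    """One step left or right, staying inside the landscape."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]

print(steepest_ascent(value, neighbors, 0))  # 2: stuck on the local maximum
print(steepest_ascent(value, neighbors, 4))  # 7: reaches the global maximum
```

The result depends entirely on the starting point, which is the weakness the slide points out.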

Issues in Hill Climbing
Local maxima = no uphill step
– Algorithms on previous slide fail (not complete)
– Allow “random restart” which is complete, but might take a very long time
Plateau = all steps equal (flat or shoulder)
– Must move to equal state to make progress, but no indication of the correct direction
Ridge = narrow path of maxima, but might have to go down to go up (e.g. diagonal ridge in 4-direction space)
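
Random restart simply repeats hill climbing from random starting states and keeps the best result. A minimal sketch, assuming a made-up one-dimensional landscape and a fixed number of restarts (both are illustrative choices):

```python
import random

def hill_climb(value, neighbors, start):
    """Steepest-ascent hill climbing; stops at the first state with no uphill move."""
    current = start
    while True:
        best = max(neighbors(current), key=value, default=current)
        if value(best) <= value(current):
            return current
        current = best

def random_restart(value, neighbors, random_start, restarts, rng):
    """Run hill climbing from `restarts` random states and return the best peak found."""
    best = None
    for _ in range(restarts):
        peak = hill_climb(value, neighbors, random_start(rng))
        if best is None or value(peak) > value(best):
            best = peak
    return best

# Hypothetical landscape: local maximum at index 2, global maximum at index 7.
LANDSCAPE = [1, 3, 5, 4, 2, 6, 8, 9, 7]
value = lambda i: LANDSCAPE[i]
neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]
random_start = lambda rng: rng.randrange(len(LANDSCAPE))

result = random_restart(value, neighbors, random_start, restarts=50, rng=random.Random(0))
print(result, value(result))
```

On this landscape more than half of the starting positions lead to the global maximum, so a handful of restarts is enough; on harder landscapes the number of restarts needed can be very large.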

Simulated Annealing
Figure 4.14, simulate gradual cooling to low-energy crystalline state
Algorithm is randomized: take a step if a random number is less than a value based on both the objective function and the Temperature
When Temperature is high, the chance of stepping toward a higher (worse) value of the function J(x) being minimized is greater
Note higher dimension: “perturb parameter vector” vs. “look at next and previous value”
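
A minimal sketch of the idea, assuming the common Metropolis acceptance rule exp(-ΔJ / T) and a geometric cooling schedule; the slides do not specify either, and the objective function and parameter values below are illustrative.

```python
import math
import random

def simulated_annealing(energy, perturb, x0, t0=1.0, cooling=0.95, steps=500, rng=None):
    """Minimize `energy` by randomized local moves.

    A worse candidate is accepted with probability exp(-delta / T), so uphill
    (worse) steps are likely while T is high and rare once T is low.
    Returns the best state seen and its energy.
    """
    rng = rng or random.Random()
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(steps):
        candidate = perturb(x, rng)
        ce = energy(candidate)
        delta = ce - e
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = candidate, ce
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # gradual cooling (assumed geometric schedule)
    return best_x, best_e

# Illustrative objective J(x) = (x - 3)^2 with random steps of size up to 1.
J = lambda x: (x - 3.0) ** 2
step = lambda x, rng: x + rng.uniform(-1.0, 1.0)

best_x, best_e = simulated_annealing(J, step, x0=10.0, rng=random.Random(42))
print(best_x, best_e)
```

For a parameter vector, `perturb` would nudge one or more components at random instead of adding a number to a scalar.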

Local Beam Search
Keep track of K local searches at once
At each step, generate all successors and keep the best K
(Localized version of memory-bounded A*)
Stochastic: choose K states at random, but probability of state being chosen is proportional to its goodness
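
The deterministic version can be sketched as below. One assumption beyond the slide: the current states stay in the candidate pool alongside their successors, so the beam never gets worse. The landscape is a made-up example.

```python
def local_beam_search(value, neighbors, starts, k, max_iters=100):
    """Keep the k best states among the current beam and all of its successors."""
    beam = sorted(set(starts), key=value, reverse=True)[:k]
    for _ in range(max_iters):
        candidates = set(beam)          # assumption: current states may be kept
        for state in beam:
            candidates.update(neighbors(state))
        new_beam = sorted(candidates, key=value, reverse=True)[:k]
        if new_beam == beam:
            break                       # no change: the beam has converged
        beam = new_beam
    return beam[0]

# Hypothetical landscape: local maximum at index 2, global maximum at index 7.
LANDSCAPE = [1, 3, 5, 4, 2, 6, 8, 9, 7]
value = lambda i: LANDSCAPE[i]
neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]

print(local_beam_search(value, neighbors, starts=[0, 4], k=2))  # 7
```

Unlike K independent hill climbs, the K slots are shared: here both slots are soon taken by states near the taller peak, and the search that started at index 0 is abandoned.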

Genetic Algorithm
Quicker but randomized searching for an optimal parameter vector
Operations
– Crossover (2 parents -> 2 children)
– Mutation (one “bit”)
Basic structure
– Create population
– Perform crossover & mutation (on fittest)
– Keep only fittest children

Example: “Hello, World”
Initial population is 2048 random strings of length 12
Fitness of an individual is calculated by comparing each letter to its corresponding letter in the target phrase and adding up the differences
Top 10% of population is retained, remaining 90% is created by crossover of top 50% of population with 25% chance of mutation
– Crossover: choose a random position and swap substrings
– Mutation: choose a random position and replace by a random character
Source: http://generation5.org/content/2003/gahelloworld.asp
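
The procedure described above can be sketched as follows. This is not the code from the cited source: the character set, the use of one child per crossover, and the function names are assumptions made for a compact illustration.

```python
import random

TARGET = "Hello, World"
CHARS = [chr(c) for c in range(32, 127)]  # assumed alphabet: printable ASCII

def fitness(s):
    """Sum of character-code differences from the target; 0 is a perfect match."""
    return sum(abs(ord(a) - ord(b)) for a, b in zip(s, TARGET))

def evolve(pop_size=2048, generations=200, elite_frac=0.10,
           parent_frac=0.50, mutation_rate=0.25, seed=None):
    rng = random.Random(seed)
    pop = ["".join(rng.choice(CHARS) for _ in TARGET) for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        if history[-1] == 0:
            break
        elite = pop[:max(1, int(pop_size * elite_frac))]      # retained unchanged
        parents = pop[:max(2, int(pop_size * parent_frac))]   # allowed to reproduce
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, len(TARGET))
            child = a[:cut] + b[cut:]   # simplification: one child per crossover
            if rng.random() < mutation_rate:
                pos = rng.randrange(len(TARGET))
                child = child[:pos] + rng.choice(CHARS) + child[pos + 1:]
            children.append(child)
        pop = elite + children
    pop.sort(key=fitness)
    return pop[0], history

best, history = evolve(pop_size=200, generations=30, seed=1)
print(best, fitness(best))
```

Because the top 10% is copied forward unchanged, the best fitness in the population never gets worse from one generation to the next.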

Crossover and Mutation
Crossover
– Parents: “Habxcq, oorld” and “Yellav, adjfd”
– Children: “Hablav, adjfd” and “Yelxcq, oorld”
Mutation
– Before: “Habxcq, oorld”
– After: “Habxrq, oorld”
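
The two operators reproduce the strings above when the crossover point is 3 and the mutation replaces the character at index 4. The function names and the explicit position arguments are illustrative; in a real run both positions would be chosen at random.

```python
def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails of the two parents after `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def mutate(individual, position, new_char):
    """Replace the single character at `position` with `new_char`."""
    return individual[:position] + new_char + individual[position + 1:]

print(crossover("Habxcq, oorld", "Yellav, adjfd", 3))
# ('Hablav, adjfd', 'Yelxcq, oorld')
print(mutate("Habxcq, oorld", 4, "r"))
# Habxrq, oorld
```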

Genetic Algorithm: Why does it work?
Children carry parts of their parents’ data
Only “good” parents can reproduce
– Children are at least as “good” as parents? No, but “worse” children don’t last long
Large population allows many “current points” in search
– Can consider several regions (watersheds) at once

Genetic Algorithm: Issues & Pitfalls
Representation
– Children (after crossover) should be similar to parent, not random
– Binary representation of numbers isn’t good: what happens when you crossover in the middle of a number?
– Need “reasonable” breakpoints for crossover (e.g. between R, xcenter and ycenter but not within them)
“Cover”
– Population should be large enough to “cover” the range of possibilities
– Information shouldn’t be lost too soon
– Mutation helps with this issue

Experimenting With Genetic Algorithms
Be sure you have a reasonable “goodness” criterion
Choose a good representation (including methods for crossover and mutation)
Generate a sufficiently random, large enough population
Run the algorithm “long enough”
Find the “winners” among the population
Variations: multiple populations, keeping vs. not keeping parents, “immigration / emigration”, mutation rate, etc.

Summary: Search Techniques
Exhaustive
– Depth-first, Breadth First
– Uniform cost
– Iterative Deepening
Best-first (heuristic)
– Greedy
– A*
– Memory-bounded (beam, mbA*)
Local heuristic
– Hill-climbing (steepest, any upward, random restart)
– Simulated annealing (stochastic)
– Genetic Algorithm (highly parallel, stochastic)