Presentation on theme: "Artificial Intelligence I nformed search and exploration Fall 2008 professor: Luigi Ceccaroni."— Presentation transcript:
Artificial Intelligence I nformed search and exploration Fall 2008 professor: Luigi Ceccaroni
2 Heuristic search Heuristic function, h’(n): it estimates the lowest cost from the n state to a goal state. At each moment, most promising states are selected. Finding a solution (if it exist) is not guaranteed. Types of heuristic search: –BB (Branch & Bound), Best-first search –A, A* –IDA* –Local search: hill climbing simulated annealing genetic algorithms 2
3 Branch & Bound For each state, the cost is kept of getting from the initial state to that state (g). The global, minimum cost is kept, too, and guides the search. A branch is abandoned if its cost is greater than the current minimum. 3
Best-first search Priority is given by the heuristic function (estimation of the cost from a given state to a solution). At each iteration the node with minimum heuristic function is chosen. The optimal solution is not guaranteed.
Importance of the estimator BCDEFGHAABCDEFGH Initial state H1 = 4 H2 = -28 Final state H1 = 8 H2 = 28 (= 7+6+5+4+3+2+1) Operations: - put a free block on the table - put a free block on another free block Estimator H1: - add 1 for each block which is on the right block - subtract 1 otherwise Estimator H2: - for each block, if the underlying structure is correct, add 1 for each block of that structure - otherwise, subtract 1 for each block of the structure
Heuristic functions 2 8 3 1 6 4 7 5 Initial state 1 2 3 8 4 7 6 5 Final state Possible heuristic functions: h(n) = w(n) = #misplaced h(n) = p(n) = sum of distances to the final position h(n) = p(n) + 3 ·s(n) where s(n) is obtained cycling over non-central positions and adding 2 if a tile is not followed by the right one and adding 1 if there is a tile in the center
A and A* algorithms The estimation function (h’) has two components: 1)Minimum cost to get from an initial state to the current state 2)Minimum (estimated) cost to get from the current state to a solution f’(n) = g(n) + h’(n) f’ is an estimation of the total cost. h’ is an estimation of the real cost to get to a solution (h). g is a real cost of the minimum path to the current state. Priority is given to nodes with lower f’. In case of same h, priority is given to the node with lower h’. If h’(n) never overestimates the real cost, that is ∀ n h’(n) ≤ h(n), it can be demonstrated that the algorithm can find the optimal path. A A*
The priority is given by the estimation function f’(n)=g(n)+h’(n). At each iteration, the best estimated path is chosen (the first element in the queue). A* is an instance of best-first search algorithms. A* is complete when the branching factor is finished and each operator has a constant, positive cost.
13 A*: treatment of repeated states If the repeated state is in the structure of open nodes: –If its new cost (g) is lower, the new cost is used; possibly changing its position in the structure of open nodes. –If its new cost (g) is equal or greater, the node is forgotten. If the repeated state is in the structure of closed nodes: –If its new cost (g) is lower, the node is reopened and inserted in the structure of open nodes with the new cost. Nothing is done with its successors; they will be reopened if necessary. –If its new cost (g) is equal or greater, the node is forgotten. 13
A*: admissibility The A algorithm, depending on the heuristic function, finds or not an optimal solution. If the heuristic function is admissible, the optimization is granted. A heuristic function is admissible if it satisfies the following property: ∀ n 0 ≤ h’(n) ≤ h(n) h’(n) has to be an optimistic estimator; it never has to overestimate h(n). Using an admissible heuristic function guarantees that a node on the optimal path never seems too bad and that it is considered at some point in time.
Optimality of A* A * expands nodes in order of increasing f value. Gradually adds "f-contours" of nodes. Contour i has all nodes with f=f i, where f i < f i+1.
More and less informed algorithms Given a problem, there exist as many A* to solve it as heuristic functions we can define. A 1 *, A 2 * admissible ∀ n≠final: 0 ≤ h’ 2 (n) < h’ 1 (n) ≤ h(n) ⇒ A 1 * more informed than A 2 *
More informed algorithms A 1 * more informed than A 2 * ⇓ n expanded by A 1 * ⇒ n expanded by A 2 * ⇐ ⇓ A 1 * expands less nodes than A 2 * ⇓ The more informed, the better
More informed algorithms Compromise between: –Calculation time in h’ h’ 1 (n) could require more calculation time than h’ 2 (n) ! –Re-expansions A 1 * may re-expand more nodes than A 2 * ! –Not if A 1 * is consistent –Not if trees (instead of graphs) are considered Loss of admissibility –Working with non admissible heuristic functions can be interesting to gain speed.
8-puzzle (operation’s cost = 1) h’ 0 (n)=0 breadth first, A 0 * admissible, many generations and expansions h’ 1 (n)= # wrongly placed tiles A 1 * admissible, A 1 * more informed than A 0 * 13 4 5 2 67 8 132 4 567 8
h’ 2 (n)=Σ i ∈ [1,8] d i d i ≡ distance between current and final position of tile i distance ≡ minimum number of moves between two positions A 2 * admissible. Statistically, A 2 * is better than A 1 *, but it can’t be formally said that it is more informed.
h’ 3 (n)=Σ i ∈ [1,8] d i + 3 S(n) S(n)=Σ i ∈ [1,8] s i s i = 0 if tile i is not in the center and its successor is correct 1 if tile i is in the center 2 if tile i is not in the center and its successor is incorrect h’ 3 (n)=1+3(2+1)=10 h(n)=1 A 3 * cannot be compared to A 1 * o A 2 *, but it is faster (even if the h’ to be calculated requires more time). 13 24 567 8 } A 3 * no admissible
Memory bounded search The A* algorithm solves problems in which it is necessary to find the best solution. Its cost in space and time, in average and if the heuristic function is adequate, is better than that of blind-search algorithms. There exist problems in which the size of the search space does not allow a solution with A*. There exist algorithms which allow to solve problems limiting the memory used: –Iterative deepening A* (IDA*) –Recursive best-first –Simplified memory-bounded A* (SMA*)
Iterative deepening A* (IDA*) Iterative deepening A* is similar to the iterative deepening (ID) technique. In ID the limit is given by a maximum depth. In IDA* the limit is given by a maximum value of the f-cost. Important: The search is a standard depth-first; the f-cost is used only to limit the expansion. Starting limit = f (initial) g + h’ (deepening expansion limit) 0+2 1+1 1+2 2+1 3+1 2+1 3+1 4+0 5+0 4+1 goal
Iterative deepening A* (IDA*) Iterative deepening A* is similar to the iterative deepening (ID) technique. In ID the limit is given by a maximum depth. In IDA* the limit is given by a maximum value of the f-cost. Important: The search is a standard depth-first; the f-cost is used only to limit the expansion. Starting limit = f (initial) (1,3,8) (2,4,9) (5,10) (11) (6,12) (7,13) (14) (15) g + h’ (deepening expansion limit) 0+2 1+1 1+2 2+1 3+1 2+1 3+1 4+0 5+0 4+1 goal
Algorithm IDA* depth=f(Initial_state) While not is_final?(Current) do Open_states.insert(Initial_state) Current= Open_states.first() While not is_final?(Current) and not Open_states.empty?() do Open_states.delete_first() Closed_states.insert(Current) Successors= generate_successors(Current, depth) Successors= process_repeated(Successors, Closed_states, Open_states) Open_states.insert(Successors) Current= Open_states.first() eWhile depth=depth+1 Open_states.initialize() eWhile eAlgorithm The function generate_successors only generate those with an f-cost less or equal to the cutoff limit of the iteration. The OPEN structure is a stack (depth-first search). If repeated nodes are processed there is no space saving. Only the current path (tree branch) is saved in memory.
35 Other memory-bounded algorithms IDA*’s re-expansions can represent a high temporal cost. There are algorithms which, in general, expand less nodes. Their functioning is based on eliminating less promising nodes and saving information which allows to re-expand them (in necessary). Examples: –Recursive best-first –Memory bound A* (MA*) 35
36 Recursive best-first It is a recursive implementation of best- first, with lineal spatial cost. It forgets a branch when its cost is more than the best alternative. The cost of the forgotten branch is stored in the parent node as its new cost. The forgotten branch is re-expanded if its cost becomes the best one again. 36
39 Recursive best-first In general, it expands less nodes than IDA*. Not being able to control repeated states, its cost in time can be high if there are loops. Sometimes, memory restrictions can be relaxed. 39
40 Memory bound A* (MA*) It imposes a memory limit: number of nodes which can be stored. A* is used for exploration and nodes are stored while there is memory space. When there is no more space, the worst nodes are eliminated, keeping the best cost of forgotten descendants. The forgotten branches are re-expanded if their cost becomes the best one again. MA* is complete if the solution path fits in memory. 40
Local search algorithms and optimization problems In local search (LS), from a (generally random) initial configuration, via iterative improvements (by operators application), a state is reached from which no better state can be attained. LS algorithms (or meta-heuristics or local optimization methods) are prone to find local optima, which are not the best possible solution. The global optimum is generally impossible to be reached in limited time. In LS, there is a function to evaluate the quality of the states, but this is not necessarily related to a cost.
These algorithms do not systematically explore all the state space. The heuristic (or evaluation) function is used to reduce the search space (not considering states which are not worth being explored). Algorithms do not usually keep track of the path traveled. The memory cost is minimal. This total lack of memory can be a problem (i.e., cycles).
Hill-climbing search Standard hill-climbing search algorithm –It is a simple loop which search for and select any operation that improves the current state. Steepest-ascent hill climbing or gradient search –The best move (not just any one) that improves the current state is selected.
Algorithm Hill Climbing Current= Initial_state end= false While ¬end do Children= generate_successors(Current) Children= order_and_eliminate_worse_ones(Children, Current) if ¬empty?(Children) then Current = Select_best(Children) else end=true eWhile eAlgorithm
Hill climbing Children are considered only if their evaluation function is better than the one of the parent (reduction of the search space). A stack could be used to save children which are better than the parent, to be able to backtrack; but in general the cost of this is prohibitive. The characteristics of the heuristic function determine the success and the rapidity of the search. It is possible that the algorithm does not find the best solution: –Local optima: no successor has a better evaluation –Plateaux: all successors has the same evaluation
Simulated annealing It is a stochastic hill-climbing algorithm (stochastic local search, SLS): –A successor is selected among all possible successors according to a probability distribution. –The successor can be worse than the current state. Random steps are taken in the state space. It is inspired by the physical process of controlled cooling (crystallization, metal annealing): –A metal is heated up to a high temperature and then is progressively cooled in a controlled way. –If the cooling is adequate, the minimum-energy structure (a global minimum) is obtained.
Simulated annealing Aim: to avoid local optima, which represent a problem in hill climbing.
Simulated annealing Solution: to take, occasionally, steps in a different direction from the one in which the increase (or decrease) of energy is maximum.
Simulated annealing methodology Terminology from the physical problem is often used. Temperature (T) is a control parameter. Energy (f or E) is a heuristic function about the quality of a state. A function F decides about the selection of a successor state and depends on T and the difference between the quality of the nodes (Δf ): the lower the temperature, the lower the probability of selecting worse successors. The cooling strategy determines: –maximum number of iterations in the search process –temperature-decrease steps –number of iterations for each step
An initial temperature is defined. While the temperature is not zero do /* Random moves in the state space */ For a predefined number of iterations do L new =Generate_successor(L current ) if F(f(L actual )- f(L new ),T) > 0 then L actual =L new eFor The temperature is decreased. eWhile
Main idea: Steps taken in random directions do not decrease (but actually increase) the ability of finding a global optimum. Disadvantage: The very structure of the algorithm increases the execution time. Advantage: The random steps possibly allow to avoid small “hills”. Temperature: It determines (through a probability function) the amplitude of the steps, long at the beginning, and then shorter and shorter. Annealing: When the amplitude of the random step is sufficiently small not to allow to descend the hill under consideration, the result of the algorithm is said to be annealed.
Simulated annealing It is possible to demonstrate that, if the “temperature” of the algorithm is reduced very slowly, a global maximum will be found with a probability close to 1: –Value of the energy function (E) in the global maximum = m –Value of E in the best local maximum = l < m –There will be some temperature t high enough to allow to descend from the local maximum, but not from the global maximum. –If the temperature is reduced very slowly, the algorithm will work enough time with a temperature close to t, until it finds and ascend the global maximum and the solution will stay there because there won’t be available steps long enough to descend from it.
Simulated annealing Conclusion: During the resolution of a search problem, a node should occasionally be explored, which appears substantially worse than the best node in the L list of OPEN nodes.
Minimum spanning tree (MST, SST): A minimum-weight tree in a weighted graph which contains all of the graph's vertices. Traveling salesman (TSP): Find a path through a weighted graph which starts and ends at the same vertex, includes every other vertex exactly once, and minimizes the total cost of edges. Hill climbing and simulated annealing: applications
Simulated annealing: applications: TSP Search space: N! Operators define transformations between solutions: inversion, displacement, interchange… Energy function: sum of distances between cities, ordered according to the solution. An initial temperature is defined. (It may need some experimentation.) The number of iteration per temperature value is defined. The way to decrease the temperature is defined.
Hill climbing and simulated annealing: applications Euclidean Steiner tree: A tree of minimum Euclidean distance connecting a set of points, called terminals, in the plane. This tree may include points other than the terminals. Steiner tree: A minimum-weight tree connecting a designated set of vertices, called terminals, in an undirected, weighted graph or points in a space. The tree may include non-terminals, which are called Steiner vertices or Steiner points.
Hill climbing and simulated annealing: applications Bin packing problem: Determine how to put the most objects in the least number of fixed space bins. More formally, find a partition and assignment of a set of objects such that a constraint is satisfied or an objective function is minimized (or maximized). There are many variants, such as, 3D, 2D, linear, pack by volume, pack by weight, minimize volume, maximize value, fixed shape objects. Knapsack problem: Given items of different values and volumes, find the most valuable set of items that fit in a knapsack of fixed volume.
Simulated annealing: conclusions It is suitable for problems in which the global optimum is surrounded by many local optima. It is suitable for problems in which it is difficult to find a good heuristic function. Determining the values of the parameters can be a problem and requires experimentation.
Local beam search Keeping just one node in memory may be an extreme reaction to the problem of memory limits. The local beam search keeps track of k nodes: –It starts with k states randomly generated –At each step all successors of the k states are generated. –It is checked if some state is a goal. –If not, the k best successors from the complete list are selected and the process is repeated. 63
64 Local beam search There is a difference between k random executions in parallel and in sequence! –If a state generates various good successors, the algorithm quickly abandons unsuccessful searches and moves its resources where major progress is made. It can suffer from a lack of diversity among the k states (concentrated in a small region of the state space) and become an expensive version of hill climbing. 64
65 Stochastic beam search It is similar to local beam search. Instead of choosing the k best successors, k successors are chosen randomly: –The probability of choosing a successor grows with its fitness function. Analogy with the process of natural selection: successors (descendants) of a state (organism) populate the following generation as a function of their value (fitness, health, adaptability). 65
66 Genetic algorithms A genetic algorithm (GA) is a variant of stochastic beam search, in which two parent states are combined. Inspired by the process of natural selection: –Living beings adapt to the environment thanks to the characteristics inherited from their parents. –The possibility of survival and reproduction are proportional to the goodness of these characteristics. –The combination of “good” individuals can produce better adapted individuals. 66
Genetic algorithms To solve a problem via GAs requires: –The representation of the states (individuals): Each individual is represented as a string over a finite alphabet (usually, a string of 0s and 1s) –A function, which measure the fitness of the states –Operators, which combine states to obtain new states Cross-over and mutation operators –The size of the initial population: GAs start with a set of k states randomly generated –A strategy to combine individuals 67
Genetic algorithms: representation of individuals The coding of individuals defines the size of the search space and the kind of operators. If the state has to specify the position of n queens, each one in a column of n squares, n·log 2 n bits are required (8, in the example). The state could be represented also as n digits [1,n]. 68
Genetic algorithms: fitness function A fitness function is calculated for each state. It should return higher values for better states. In the n-queens problem, the number of non-attacking pairs of queens could be used. In the case of 4 queens, the fitness function has a value of 6 for a solution.
Genetic algorithms: operators The combination of individuals is carried out via cross-over operators. The basic operator is crossing over one (random) point and combining using that point as reference:
Genetic algorithms: operators Other possible exchange-operators: –Crossing over 2 points –Random exchange of bits –Ad hoc operators Mutation operators: –Analogy with gene combination: Sometimes the information of part of the genes randomly changes –A typical case (in the case of binary strings) consists of changing the sign of each bit with some (very low) probability.
Genetic algorithms: combination Each iteration of the search is a new generation of individuals –The size of the population is in general constant (N). To get to the following population, the individuals that “reproduce” have to be chosen (intermediate population).
Genetic algorithms: combination Selection of individuals: –Individuals are selected according to their fitness function value –N random tournaments of pairs of individuals, choosing the winner of each one –… –There will always be individuals represented more than once and other individuals not represented in the intermediate population
Genetic algorithms: algorithm Steps of the basic GA algorithm: 1.N individuals from current population are selected to form the intermediate population (according to some predefined criteria). 2.Individuals are paired and for each pair: a)The crossover operator is applied with probability P_c and two new individuals are obtained. b)New individuals are mutated with a probability P_m. 3.The resulting individuals form the new population.
Genetic algorithms: algorithm The process is iterated until the population converges or a specific number of iteration has passed. The crossover probability influences the diversity of the new population. The mutation probability is always very low.
76 Genetic algorithms: 8-queens problem An 8-queens state must specify the positions of 8 queens, each in a column of 8 squares, and so requires 8 x log 8 = 24 bits. Alternatively, the state can be represented as 8 digits, each in the range from 1 to 8. Each state is rated by the evaluation function or the fitness function. A fitness function should return higher values for better states, so, for the 8-queens problem the number of non-attacking pairs of queens is used (=28 for a solution). 76 2
77 Genetic algorithms: 8-queens problem The initial population in (a) is ranked by the fitness function in (b), resulting in pairs for mating in (c). They produce offspring in (d), which are subject to mutation in (e). 77
78 Genetic algorithms Like stochastic beam search, GAs combine: –an uphill tendency –random exploration –exchange of information among parallel search threads The primary advantage of GAs comes from the crossover operation. Yet it can be shown mathematically that, if the genetic code is permuted initially in a random order, crossover conveys no advantage. 78
79 Genetic algorithms: application In practice, GAs have had a widespread impact on optimization problems, such as: –circuit layout –scheduling At present, it is not clear whether the appeal of GAs arises: –from their performance or –from their aesthetically pleasing origins in the theory of evolution. 79