# Heuristic Search in Artificial Intelligence

Heuristic Search in Artificial Intelligence. Course written by Richard E. Korf, UCLA. The slides were made by students of this course from Bar-Ilan University, Tel-Aviv, Israel.

Chapter 1 Problems and Problem Spaces

Problems
There are 3 general categories of problems in AI: single-agent pathfinding problems, two-player games, and constraint satisfaction problems.

Single Agent Pathfinding Problems
In each of these problems, a single problem-solver makes the decisions, and the task is to find a sequence of primitive steps that takes us from the initial location to the goal location. Famous examples: Rubik’s Cube (Erno Rubik, 1975). Sliding-Tile puzzle. Navigation - Travelling Salesman Problem.

Two-Player Games
In a two-player game, one must consider the moves of an opponent, and the ultimate goal is a strategy that will guarantee a win whenever possible. Two-player perfect-information games have received the most attention from researchers so far. Nowadays, however, researchers are starting to consider more complex games, many of which involve an element of chance. The best Chess, Checkers, and Othello players in the world are computer programs!

Constraint-Satisfaction Problems
In these problems, we also have a single agent making all the decisions, but here we are not concerned with the sequence of steps required to reach the solution, only with the solution itself. The task is to identify a state of the problem such that all the constraints of the problem are satisfied. Famous examples: Eight Queens Problem. Number Partitioning.

Problem Spaces
A problem space consists of a set of states of a problem and a set of operators that change the state. State: a symbolic structure that represents a single configuration of the problem in sufficient detail to allow problem solving to proceed. Operator: a function that takes a state and maps it to another state.

Problem Spaces
Not all operators are applicable to all states. The conditions that must be true in order for an operator to be legally applied to a state are known as the preconditions of the operator. Examples: 8-Puzzle: the states are the different permutations of the tiles, and the operators move the blank tile up, down, right or left. Chess: the states are the different locations of the pieces on the board, and the operators are the legal moves according to chess rules.
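The 8-Puzzle operators and their preconditions can be sketched in a few lines. This is only an illustration: the tuple representation (row-major order, 0 for the blank) and the names `applicable`, `apply_move` and `successors` are assumptions made here, not part of the course material.

```python
# Assumed representation: a tuple of 9 ints in row-major order, 0 is the blank.
MOVES = {"up": -3, "down": 3, "left": -1, "right": 1}

def applicable(state, move):
    """Precondition of an operator: the blank must stay on the 3x3 board."""
    row, col = divmod(state.index(0), 3)
    return {
        "up": row > 0,
        "down": row < 2,
        "left": col > 0,
        "right": col < 2,
    }[move]

def apply_move(state, move):
    """Operator: map a state to the state with the blank moved one cell."""
    if not applicable(state, move):
        raise ValueError(f"{move} is not applicable to {state}")
    blank = state.index(0)
    target = blank + MOVES[move]
    cells = list(state)
    cells[blank], cells[target] = cells[target], cells[blank]
    return tuple(cells)

def successors(state):
    """All states reachable by one applicable operator."""
    return [apply_move(state, m) for m in MOVES if applicable(state, m)]
```

With the blank in a corner only two operators pass their preconditions, while a blank in the centre allows all four.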

Problem Spaces
A problem instance consists of a problem space, an initial state, and a set of goal states. There may be a single goal state, or a set of goal states, any one of which would satisfy the goal criteria. In addition, the goal could be stated explicitly or implicitly, by giving a rule for determining when the goal has been reached. All 4 combinations are possible: [single / set of goal state(s)] x [explicit / implicit].

Problem Spaces
For Constraint Satisfaction Problems, the goal will always be represented implicitly, since an explicit description is the solution itself. Example: 4-Queens has 2 different goal states. Here the goal is stated explicitly. [Figure: 4-Queens board with queens marked Q.]

Problem Representation
For some problems, the choice of a problem space is not so obvious. The choice of representation for a problem can have an enormous impact on the efficiency of solving the problem. There are no algorithms for problem representation. One general rule is that a smaller representation, in the sense of fewer states to search, is often better than a larger one.

Problem Representation
For example, in the 8-Queens problem, when every state is an assignment of the 8 queens on the board: the number of possibilities with all 8 queens on the board is 64 choose 8, which is over 4 billion. The solution of the problem prohibits more than one queen per row, so we may assign each queen to a separate row; now we have 8^8 > 16 million possibilities. The same goes for not allowing 2 queens in the same column either, which reduces the space to 8!, which is only 40,320 possibilities.
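The three counts for the 8-Queens representations can be checked directly. The variable names below are chosen for this illustration only.

```python
from math import comb, factorial

# Any 8 of the 64 squares may hold a queen.
unrestricted = comb(64, 8)

# One queen per row, each free to sit in any of the 8 columns.
one_per_row = 8 ** 8

# One queen per row and per column: a permutation of the columns.
one_per_row_and_column = factorial(8)

print(unrestricted, one_per_row, one_per_row_and_column)
```

Each restriction is a constraint that every solution satisfies anyway, so no solutions are lost while the space shrinks by several orders of magnitude.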

Problem-Space Graphs
A problem-space graph is a mathematical abstraction often used to represent a problem space: the states are represented by nodes of the graph, and the operators are represented by edges between nodes. Edges may be undirected or directed.

Problem-Space Graphs Example: a small part of the 8-puzzle problem-space graph:

Problem-Space Graphs
In most problem spaces there is more than one path between a pair of nodes. Detecting when the same state has been regenerated via a different path requires saving all the previously generated states, and comparing newly generated states against the saved states. Many search algorithms don’t detect when a state has previously been generated. The cost of this is that any state that can be reached by 2 different paths will be represented by duplicate nodes. The benefits are memory savings and simplicity.

Branching Factor and Solution Depth
The branching factor of a node is the number of children it has, not counting its parent if the operator is reversible. The branching factor of a problem space is the average number of children of the nodes in the space; it is a function of the problem space. The solution depth in a single-agent problem is the length of the shortest path from the initial node to a goal node; it is a function of the particular problem instance.

Eliminating Duplicate Nodes
In many cases we can reduce the size of the search tree by eliminating some simple duplicate paths. In general, we never apply an operator and its inverse in succession, since no optimal path can contain such a sequence. Therefore we never list the parent of a node as one of its children. This reduces the branching factor of the problem by approximately 1.

Types of Problem Spaces
There are several types of problem spaces: state spaces, problem reduction spaces, and AND/OR graphs.

State Space
The states represent situations of the problem, and the operators represent actions in the world. Forward search: the root of the problem space represents the start state, and the search proceeds forward to a goal state. Backward search: the root of the problem space represents the goal state, and the search proceeds backward to the initial state. For example, in Rubik’s Cube and the Sliding-Tile Puzzle, either a forward or a backward search is possible.

Problem Reduction Space
In a problem reduction space, the nodes represent problems to be solved or goals to be achieved, and the edges represent the decomposition of the problem into subproblems. This is best illustrated by the example of the Towers of Hanoi problem. [Figure: three pegs labeled A, B, C.]

Problem Reduction Space
The root node, labeled “3AC”, represents the original problem of transferring all 3 disks from peg A to peg C. The goal can be decomposed into three subgoals: 2AB, 1AC, 2BC. In order to achieve the goal, all 3 subgoals must be achieved. [Figure: problem-reduction tree with root 3AC, subgoals 2AB, 1AC, 2BC, and leaf moves 1AB, 1CB, 1BA, 1BC.]
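The decomposition of 3AC into 2AB, 1AC and 2BC applies recursively to each subgoal. A minimal sketch follows, assuming the labeling used in the slide ("number of disks, source peg, destination peg"); the function name `decompose` is an assumption for illustration.

```python
def decompose(n, src, dst):
    """Return the single-disk moves that solve the subproblem '<n><src><dst>'.

    Every subproblem is an AND node: all three of its parts must be solved.
    """
    if n == 1:
        return [f"1{src}{dst}"]
    spare = ({"A", "B", "C"} - {src, dst}).pop()
    return (
        decompose(n - 1, src, spare)      # e.g. 2AB for the root 3AC
        + [f"1{src}{dst}"]                # e.g. 1AC
        + decompose(n - 1, spare, dst)    # e.g. 2BC
    )

moves = decompose(3, "A", "C")
print(moves)
```

Reading the leaves of the reduction tree from left to right gives the order in which the moves are executed.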

AND/OR Graphs
An AND graph consists entirely of AND nodes, and in order to solve a problem represented by a node, you need to solve the problems represented by all of its children (Towers of Hanoi example). An OR graph consists entirely of OR nodes, and in order to solve the problem represented by a node, you only need to solve the problem represented by one of its children (Eight Puzzle tree example).

AND/OR Graphs
An AND/OR graph consists of both AND nodes and OR nodes. One source of AND/OR graphs is a problem where the effect of an action cannot be predicted in advance, as in an interaction with the physical world. Example: the counterfeit-coin problem.

Two-Player Game Trees
The most common source of AND/OR graphs is two-player perfect-information games. Example: the game tree for 5-Stone Nim. [Figure: game tree for 5-Stone Nim, with nodes labeled 5, 4, 3, 2, 1 and marked as OR nodes and AND nodes.]

Solution subgraph for AND/OR trees
In general, a solution to an AND/OR graph is a subgraph with the following properties: It contains the root node. For every OR node included in the solution subgraph, one child is included. For every AND node included in the solution subgraph, all the children are included. Every terminal node in the solution subgraph is a solved node.
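The four properties translate into a short recursive test for whether a solution subgraph exists. This is a sketch under assumptions: the dictionary keys `kind`, `children` and `solved`, and the function name `solvable`, are hypothetical and chosen only for this illustration.

```python
def solvable(node):
    """Return True if a solution subgraph exists rooted at `node`.

    Assumed node format: {'kind': 'AND' | 'OR' | 'LEAF',
                          'children': [...], 'solved': bool (leaves only)}.
    """
    if node["kind"] == "LEAF":
        return node["solved"]                 # terminal nodes must be solved
    results = (solvable(child) for child in node["children"])
    if node["kind"] == "AND":
        return all(results)                   # all children are included
    return any(results)                       # OR node: one child suffices
```

An AND node with one unsolved branch fails, while an OR node succeeds as soon as any single child does.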

Solutions
The notion of a solution is different for the different problem types. For a path-finding problem, an optimal solution is a solution of lowest cost. For a CSP, if there is a cost function associated with a state of the problem, an optimal solution would again be one of lowest cost. For a 2-player game: if the solution is simply a move to be made, an optimal solution would be the best possible move that can be made in a given situation; if the solution is considered a complete strategy subgraph, then an optimal solution might be one that forces a win in the fewest number of moves in the worst case.

Combinatorial Explosion
The number of different states of the problems above is enormous, and grows extremely fast as the problem size increases. Examples for the number of different possibilities:

Combinatorial Explosion
The combinatorial explosion of the number of possible states as a function of problem size is a key characteristic that separates artificial intelligence search algorithms from those in other areas of computer science. Techniques that rely on storing all possibilities in memory, or even generating all possibilities, are out of the question except for the smallest of these problems. As a result, the problem-space graphs of AI problems are usually represented implicitly by specifying an initial state and a set of operators to generate new states.

Search Algorithms
This course will focus on systematic search algorithms that are applicable to the different problem types, so a central concern is their efficiency. There are 3 primary measures of the efficiency of a search algorithm: the quality of the solution returned (is it optimal or not), the running time of the algorithm, and the amount of memory required by the algorithm.

The Next Chapters
Chapter 2: brute-force searches. Chapter 3: heuristic search algorithms. Chapter 4: search algorithms that run in linear space. Chapter 5: search algorithms for the case where individual moves of a solution must be executed in the real world before a complete optimal solution can be computed. Chapter 6: methods for deriving the heuristic function. Chapter 7: 2-player perfect-information games. Chapter 8: analysis of alpha-beta minimax. Chapter 9: games with more than 2 players. Chapter 10: the decision quality of minimax. Chapter 11: automatic learning of heuristic functions for 2-player games. Chapter 12: Constraint Satisfaction Problems. Chapter 13: parallel search algorithms.

Chapter 2 Brute-Force Search

Brute-Force Search
The most general search algorithms are brute-force searches, which do not use any domain-specific knowledge. They require: a state description, a set of legal operators, an initial state, and a description of the goal state. We will assume that all edges have unit cost. To generate a node means to create the data structure corresponding to that node. To expand a node means to generate all the children of that node.

Breadth-First Search (BFS)
BFS expands nodes in order of their depth from the root, generating one level of the tree at a time. It is implemented with a first-in first-out (FIFO) queue. At each cycle the node at the head of the queue is removed and expanded, and its children are placed at the end of the queue.

[Figure: search tree in which the numbers represent the order of generation by BFS.]

Solution Quality
BFS continues until a goal node is generated. There are two ways to report the actual solution path: store with each node the sequence of moves made to reach that node, or store with each node a pointer back to its parent, which is more memory efficient. If a goal exists in the tree, BFS will find a shortest path to a goal.
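The FIFO queue and the parent-pointer option can be sketched together. This is a minimal illustration, not the course's code: the names `bfs`, `is_goal` and `successors` are assumptions, and the `parent` dictionary is also used here to skip already generated states, which goes beyond plain tree search.

```python
from collections import deque

def bfs(start, is_goal, successors):
    """Return a shortest path from `start` to a goal as a list, or None."""
    if is_goal(start):
        return [start]
    parent = {start: None}        # pointer back to each node's parent
    queue = deque([start])        # FIFO queue of nodes awaiting expansion
    while queue:
        node = queue.popleft()    # remove the node at the head
        for child in successors(node):
            if child in parent:   # already generated (assumed extension)
                continue
            parent[child] = node
            if is_goal(child):    # stop as soon as a goal is generated
                path = [child]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(child)   # children go to the end of the queue
    return None
```

Following the parent pointers back from the goal costs only one extra reference per node, compared with storing a full move sequence in every node.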

Time Complexity
We assume that each node can be generated in constant time, so the running time is proportional to the number of nodes generated, which is a function of the branching factor b and the solution depth d. The number of nodes depends on where at level d the goal node is found. In the worst case, we have to generate all the nodes at level d. Let N(b,d) denote the total number of nodes generated.

Time Complexity
The time complexity of BFS is O(b^d).

Space Complexity
To report the solution we need to store all nodes generated, so Space Complexity = Time Complexity = O(b^d). Example: machine speed = 100 MHz; generating a new state takes 100 instructions, which gives 10^6 nodes/sec; node size = 4 bytes; total memory = 1 GB = 10^9 bytes; node capacity = 10^9 / 4 = 250 * 10^6 nodes. After 250 seconds (about 4 minutes) the memory is exhausted!

Space Complexity
The previous example is based on current technology. The problem won’t go away, since as memories increase in size, processors get faster and our appetite to solve larger problems grows. BFS and any algorithm that must store all the nodes are severely space-bound and will exhaust the memory in minutes.

Depth-First Search (DFS)
DFS always generates next a child of the deepest node that has not yet been completely expanded. The first implementation uses a last-in first-out (LIFO) stack. At each cycle the node at the top of the stack is removed and expanded, and its children are placed on top of the stack.
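A minimal sketch of the stack implementation follows. The names `dfs_stack`, `is_goal` and `successors` are assumptions for illustration, and the `cutoff` depth is added here so that the sketch terminates on infinite trees.

```python
def dfs_stack(start, is_goal, successors, cutoff):
    """Depth-first search with an explicit LIFO stack.

    Returns the nodes in the order they were removed from the stack,
    ending with the goal, or None if no goal lies within `cutoff`.
    """
    stack = [(start, 0)]
    order = []
    while stack:
        node, depth = stack.pop()          # remove the node on top
        order.append(node)
        if is_goal(node):
            return order
        if depth < cutoff:
            # Push in reverse so the first child ends up on top.
            for child in reversed(successors(node)):
                stack.append((child, depth + 1))
    return None
```

On a binary tree whose node n has children 2n and 2n+1, a search for node 5 examines 1, 2, 4 and then 5, going deep before it goes wide.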

DFS - stack implementation
[Figure: search tree in which the numbers represent the order of generation by the stack implementation of DFS.]

Depth-First Search (DFS)
The second implementation is recursive. The recursive function takes a node as an argument and performs a DFS below that node. This function loops through each of the node’s children and makes a recursive call to perform a DFS below each of the children in turn.
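The recursive version can be sketched as follows. As before, the names are assumptions for illustration, and the `cutoff` argument is added so that the recursion is bounded.

```python
def dfs_recursive(node, is_goal, successors, cutoff, path=None):
    """Return a path (list of nodes) from `node` to a goal, or None."""
    if path is None:
        path = []
    path.append(node)                      # the current root-to-node path
    if is_goal(node):
        return list(path)
    if cutoff > 0:
        for child in successors(node):     # DFS below each child in turn
            found = dfs_recursive(child, is_goal, successors, cutoff - 1, path)
            if found is not None:
                return found
    path.pop()                             # backtrack
    return None
```

Only the current path is kept, so the memory in use at any moment is proportional to the search depth.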

DFS - recursive implementation
[Figure: search tree in which the numbers represent the order of generation by the recursive implementation of DFS.]

Space Complexity
The space complexity is linear in the maximum search depth. Here d is the maximum depth of search and b is the branching factor. Depth-first generation stores O(d) nodes. Depth-first expansion stores O(b·d) nodes, that is, b times d. DFS is time-limited rather than space-limited.

Time Complexity and Solution Quality
DFS generates the same set of nodes as BFS. However, on an infinite tree DFS may not terminate. For example, the Eight Puzzle contains 181,440 states, but every path in its search tree is infinitely long, and thus DFS will never end. The time complexity of DFS is O(b^d).

Time Complexity and Solution Quality
The solution for an infinite tree is to impose an artificial cutoff depth on the search. If the chosen cutoff depth is less than d, the algorithm won’t find a solution. If the cutoff depth is greater than d, the time complexity is larger than that of BFS, and the first solution DFS finds may not be the optimal one.

Depth-First Iterative-Deepening (DFID)
DFID combines the best features of BFS and DFS. It first performs a DFS to depth one. Then it starts over, executing a DFS to depth two. It continues to run DFS to successively greater depths until a solution is found.
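The repeated depth-limited searches can be sketched as below. The names and the `max_depth` safety bound are assumptions for illustration; the sketch also starts at depth zero so that a goal at the root is handled.

```python
def depth_limited(node, is_goal, successors, limit):
    """DFS below `node` to at most `limit` further moves; return a path or None."""
    if is_goal(node):
        return [node]
    if limit == 0:
        return None
    for child in successors(node):
        tail = depth_limited(child, is_goal, successors, limit - 1)
        if tail is not None:
            return [node] + tail
    return None

def dfid(start, is_goal, successors, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... until a goal is found."""
    for limit in range(max_depth + 1):
        path = depth_limited(start, is_goal, successors, limit)
        if path is not None:
            return path
    return None
```

Because every iteration starts from scratch, nothing is stored between iterations except the current depth limit.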

Depth-First Iterative-Deepening (DFID)
[Figure: search tree in which the numbers represent the order of generation by DFID; nodes regenerated in later iterations carry several numbers, e.g. 1,3,9.]

Solution Quality DFID never generates a node until all shallower nodes have already been generated. The first solution found by DFID is guaranteed to be along a shortest path.

Space Complexity
Like DFS, at any given point DFID saves only a stack of nodes. The space complexity is only O(d).

Time Complexity
DFID does not waste a great deal of time in the iterations prior to the one that finds a solution; this extra work is usually insignificant. The total number of nodes generated by DFID is b^d + 2b^(d-1) + 3b^(d-2) + ... + db, which is O(b^d). The ratio of the number of nodes generated by DFID to those generated by BFS on a tree is approximately b/(b-1).
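The overhead of the earlier iterations can be checked numerically. The sketch assumes a uniform tree, the worst case in which every iteration generates its whole tree, and a count that excludes the root; the function names are chosen for this illustration.

```python
def bfs_nodes(b, d):
    """Nodes generated by BFS to depth d: b + b^2 + ... + b^d."""
    return sum(b ** i for i in range(1, d + 1))

def dfid_nodes(b, d):
    """Nodes generated by DFID: the depth-i iteration regenerates levels 1..i."""
    return sum(bfs_nodes(b, i) for i in range(1, d + 1))

for b in (2, 3, 10):
    ratio = dfid_nodes(b, 15) / bfs_nodes(b, 15)
    print(b, ratio, b / (b - 1))
```

The overhead is largest for b = 2, where DFID generates about twice as many nodes as BFS, and it shrinks quickly as the branching factor grows.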

Optimality of DFID
Theorem 2.1: DFID is asymptotically optimal in terms of time and space among all brute-force shortest-path algorithms on a tree with unit edge costs. Steps of proof: verify that DFID is optimal in terms of solution quality, time complexity, and space complexity.

Optimality of DFID- Solution Quality
Since DFID generates all nodes at a given level before any nodes at the next deeper level, the first solution it finds is arrived at via an optimal path.

Optimality of DFID- Time Complexity
Assume, to the contrary, that there is an algorithm A that runs on a problem P, finds a shortest path to a goal, and runs in less than b^d time. Since its running time is less than b^d and there are b^d nodes at depth d, there must be at least one node n at depth d that A doesn’t generate when solving P.

Optimality of DFID- Time Complexity
Construct a new problem P’ that is identical to P except that n is the only goal node. A examines the same nodes in both P and P’, so A doesn’t examine the node n. Therefore A fails to solve P’, since n is the only goal node. Hence no algorithm runs in better than O(b^d) time. Since DFID takes O(b^d) time, its time complexity is asymptotically optimal.

Optimality of DFID- Space Complexity
There is a well-known result from computer science that any algorithm that takes f(n) time must use at least log f(n) space. We have already seen that any brute-force search must take at least b^d time, so any such algorithm must use at least log(b^d) = d log b space, which is O(d) space. Since DFID uses O(d) space, it is asymptotically optimal in space.

Graph with Cycles
On a graph with cycles, BFS can be more efficient because it can detect all duplicate nodes, whereas DFS can’t. The complexity of BFS grows only as the number of nodes at a given depth.

Graph with Cycles
The complexity of DFS depends on the number of paths of a given length. In a graph with a large number of very short cycles, BFS is preferable to DFS, if sufficient memory is available. In a square grid with radius r, there are O(r^2) nodes and O(4^r) paths.

Pruning Duplicate Nodes in DFS
Eliminate the parent of each node as one of its children. This is easily done with a finite-state machine (FSM), and it reduces the branching factor from 4 to 3. [Figure: FSM with states start, right, up, left, down.]
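The effect of this pruning on a grid can be sketched by counting the nodes a depth-limited DFS generates. Here the FSM state is simply the last move made; the names `INVERSE` and `count_nodes` are assumptions for illustration, and the grid is taken to be unbounded.

```python
INVERSE = {"up": "down", "down": "up", "left": "right", "right": "left"}

def count_nodes(depth, last_move=None, prune=True):
    """Count nodes generated by a depth-limited DFS on an unbounded grid.

    `last_move` plays the role of the FSM state: with pruning on, the
    operator that would undo it (return to the parent) is not applied.
    """
    if depth == 0:
        return 0
    total = 0
    for move in INVERSE:
        if prune and last_move is not None and move == INVERSE[last_move]:
            continue
        total += 1 + count_nodes(depth - 1, move, prune)
    return total

print(count_nodes(3, prune=False), count_nodes(3, prune=True))
```

Without pruning the count grows as 4 + 4^2 + 4^3 + ..., while with pruning every level after the first multiplies by 3 instead of 4.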

Pruning Duplicate Nodes in DFS
A more efficient FSM allows sequences of moves up only or down only, and sequences of moves left only or right only. The time complexity of a DFS controlled by this FSM is, like that of BFS, O(r^2). [Figure: FSM with states start, right, left, up, down.]

Node Generation Times
BFS, DFS, and DFID generate asymptotically the same number of nodes on a tree, but DFS and DFID are more efficient than BFS in the time spent per node. In general, the amount of time to generate a node is proportional to the size of the state representation. If DFS is implemented as a recursive program, a move requires only constant time, instead of time linear in the number of tiles, since a single copy of the state can be modified in place and restored on backtracking. This advantage of DFS becomes increasingly significant as the state description grows larger.

Backward Chaining/Search
The root node represents the goal state, and we search backward until we reach the initial state. Requirements: the goal state must be represented explicitly, and we must be able to reason backwards about the operators.

Bidirectional Search
Main idea: simultaneously search forward from the initial state and backward from the goal state, until the two search frontiers meet at a common state.

Solution Quality
Bidirectional search guarantees finding a shortest path from the initial state to the goal state, if one exists. Assume that there is a solution of length d and that both searches are breadth-first. When the forward search has proceeded to depth k, its frontier will contain all nodes at depth k from the initial state.

Solution Quality
When the backward search has proceeded to depth d-k, its frontier will contain all states at depth d-k from the goal state. Consider a state s that lies on an optimal solution path, at depth k from the initial state and at depth d-k from the goal state. The state s is in the frontier of both searches, so the algorithm will find the match and return the optimal solution.

Time Complexity
If the two search frontiers meet in the middle, each search proceeds to depth d/2 before they meet. The total number of nodes generated is O(2b^(d/2)) = O(b^(d/2)). But this isn’t the asymptotic time complexity, because we have to compare every new node with the opposite search frontier. Naively, comparing each node with the entire opposite search frontier costs O(b^(d/2)) per node.

Time Complexity
With the naive comparison, the time complexity of the whole algorithm becomes O(b^d). A more efficient approach is to use hash tables. In the average case, the time to hash and compare is constant, so the asymptotic time complexity is O(b^(d/2)).

Space Complexity
The simplest implementation of bidirectional search runs one search as a BFS, while the search in the other direction can be a depth-first search such as DFID. At least one of the frontiers must be stored in memory. The space complexity of bidirectional search is dominated by the BFS and is O(b^(d/2)). Bidirectional search is space-bound. Bidirectional search is much more time-efficient than unidirectional search.
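A minimal sketch of bidirectional search with hash tables follows. It makes several assumptions beyond the slides: both directions are breadth-first (as in the solution-quality argument), the graph is undirected so one `neighbors` function serves both directions, and Python dictionaries play the role of the hash tables. The function name is hypothetical.

```python
from collections import deque

def bidirectional_bfs(start, goal, neighbors):
    """Return the length of a shortest path from start to goal, or None."""
    if start == goal:
        return 0
    dist_f, dist_b = {start: 0}, {goal: 0}       # hash tables of reached states
    frontier_f, frontier_b = deque([start]), deque([goal])
    while frontier_f and frontier_b:
        # Expand one complete level of the smaller frontier.
        if len(frontier_f) <= len(frontier_b):
            frontier, dist, other = frontier_f, dist_f, dist_b
        else:
            frontier, dist, other = frontier_b, dist_b, dist_f
        best = None
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nxt in neighbors(node):
                if nxt in other:                  # constant-time match test
                    length = dist[node] + 1 + other[nxt]
                    best = length if best is None else min(best, length)
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    frontier.append(nxt)
        if best is not None:
            return best
    return None
```

Finishing the whole level before returning, and keeping the minimum over all matches found in it, is what keeps the reported length optimal in this sketch.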