Two Player Games Competitive rather than cooperative – One player loses, one player wins Zero sum game – One player wins what the other one loses – See game theory for the mathematics Getting an agent to play a game – Boils down to how it plays each move – Express this as a search problem Cannot backtrack once a move has been made (episodic)
(Our) Basis of Game Playing: Search for best move every time. Figure: Initial Board State, search for move 1, Board State 2, opponent moves, Board State 3, search for move 3, Board State 4, opponent moves, Board State 5.
Lookahead Search If I played this move – Then they might play that move Then I could do that move – And they would probably do that move – Or they might play that move Then I could do that move – And they would play that move Or I could play that move – And they would do that move If I played this move…
Lookahead Search (best moves) If I played this move – Then their best move would be Then my best move would be – Then their best move would be – Or another good move for them is… Then my best move would be – Etc.
Minimax Search Like children sharing a cake Underlying assumption – Opponent acts rationally Each player moves in such a way as to – Maximise their final winnings, minimise their losses – i.e., play the best move at the time Method: – Calculate the guaranteed final scores for each move Assuming the opponent will try to minimise that score – Choose move that maximises this guaranteed score
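The method above can be sketched on a small explicit game tree. This is a minimal illustration, not a definitive implementation: the nested-list tree representation, the function names `minimax` and `best_move`, and the example scores are all assumptions made for this sketch. Leaves hold the final score for player 1, and the levels alternate between player 1 (maximising) and player 2 (minimising).

```python
from typing import Union

# Assumed representation: a leaf is an int (final score for player 1),
# an inner node is a list of child subtrees.
Tree = Union[int, list]


def minimax(node: Tree, maximising: bool = True) -> int:
    """Return the score player 1 is guaranteed from this node."""
    if isinstance(node, int):  # leaf: final score for player 1
        return node
    child_scores = [minimax(child, not maximising) for child in node]
    # Player 1 maximises the guaranteed score, player 2 minimises it.
    return max(child_scores) if maximising else min(child_scores)


def best_move(root: list) -> int:
    """Index of the root move with the highest guaranteed score."""
    scores = [minimax(child, maximising=False) for child in root]
    return scores.index(max(scores))


# Hypothetical two-ply game: player 1 picks a branch, player 2 picks a leaf.
example = [[3, 12], [2, 8], [14, 1]]
print(minimax(example), best_move(example))
```

The branch containing 14 looks tempting, but a rational opponent would answer with 1, so the first branch (guaranteed 3) is the minimax choice.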
Example Trivial Game Deal four playing cards out, face up Player 1 chooses one, player 2 chooses one – Player 1 chooses another, player 2 chooses another And the winner is…. – Add the cards up – The player with the highest even number Scores that amount (in pounds sterling from opponent)
For Trivial Games Draw the entire search space Put the scores associated with each final board state at the ends of the paths Move the scores from the ends of the paths to the starts of the paths – Whenever there is a choice use minimax assumption – This guarantees the scores you can get Choose the path with the best score at the top – Take the first move on this path as the next move
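For a game as small as the four-card example, the whole procedure fits in a few lines. The sketch below is only an illustration under stated assumptions: the cards 3, 5, 7, 8 are hypothetical (not the ones dealt in the lecture figures), the names `payoff` and `solve` are invented here, and a round where neither total is even, or where both even totals are equal, is assumed to pay nothing.

```python
def payoff(p1_cards, p2_cards):
    """Player 1's winnings: positive if P1 holds the highest even total."""
    s1, s2 = sum(p1_cards), sum(p2_cards)
    e1 = s1 if s1 % 2 == 0 else 0  # odd totals count for nothing
    e2 = s2 if s2 % 2 == 0 else 0
    if e1 > e2:
        return e1
    if e2 > e1:
        return -e2
    return 0  # assumption: no even total, or a tie, means nobody pays


def solve(remaining, p1=(), p2=(), p1_to_move=True):
    """Return (guaranteed score for player 1, best card for the mover)."""
    if not remaining:  # end of a path: read off the final score
        return payoff(p1, p2), None
    best_score, best_card = None, None
    for i, card in enumerate(remaining):
        rest = remaining[:i] + remaining[i + 1:]
        if p1_to_move:
            score, _ = solve(rest, p1 + (card,), p2, False)
            better = best_score is None or score > best_score  # P1 maximises
        else:
            score, _ = solve(rest, p1, p2 + (card,), True)
            better = best_score is None or score < best_score  # P2 minimises
        if better:
            best_score, best_card = score, card
    return best_score, best_card


cards = (3, 5, 7, 8)  # hypothetical deal
print(solve(cards))
```

With this deal, taking the 7 first guarantees player 1 ten pounds, whereas grabbing the only even card (the 8) guarantees a loss of ten.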
Entire Search Space
Moving the scores from the bottom to the top
Moving a score when there’s a choice Use minimax assumption – Rational choice for the player below the number you’re moving
Choosing the best move
For Real Games Search space is too large – So we cannot draw (search) the entire space For example: chess has branching factor of ~35 – Suppose our agent searches 1000 board states per second – And has a time limit of 150 seconds So can search 150,000 positions per move – This is only three or four ply look ahead Because 35^3 = 42,875 and 35^4 = 1,500,625 – Average humans can look ahead six to eight ply
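The arithmetic on this slide can be checked directly. The helper name `full_plies` is an assumption for this sketch; it simply finds the deepest level whose number of positions still fits in the budget, using the branching factor and timings quoted above.

```python
def full_plies(branching: int, positions: int) -> int:
    """Largest depth d such that branching**d <= positions."""
    depth = 0
    while branching ** (depth + 1) <= positions:
        depth += 1
    return depth


budget = 1000 * 150  # 1000 board states per second for 150 seconds
print(budget, 35 ** 3, 35 ** 4, full_plies(35, budget))
```

The budget of 150,000 positions lies between 35^3 and 35^4, so three plies can be searched completely and a fourth only partially.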
Cutoff Search Must use a heuristic search Use an evaluation function – Estimate the guaranteed score from a board state Draw search space to a certain depth – Depth chosen to limit the time taken Put the estimated values at the end of paths Propagate them to the top as before Question: – Is this a uniform path cost, greedy or A* search?
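A cutoff search is minimax with a depth limit, where the evaluation function stands in for the true score at the cutoff. The sketch below is a hedged illustration: `cutoff_minimax`, `successors`, and `evaluate` are hypothetical names, and the toy game (a numbered binary tree with an arbitrary made-up evaluation) exists only to make the code runnable.

```python
def cutoff_minimax(state, depth, maximising, successors, evaluate):
    """Depth-limited minimax: estimate the guaranteed score from `state`."""
    children = successors(state)
    if depth == 0 or not children:
        # At the cutoff (or at a final state) use the estimate directly.
        return evaluate(state)
    values = [
        cutoff_minimax(child, depth - 1, not maximising, successors, evaluate)
        for child in children
    ]
    return max(values) if maximising else min(values)


# Hypothetical toy game: states are integers in a binary tree of depth 3.
def successors(n):
    return [2 * n + 1, 2 * n + 2] if n < 7 else []


# Hypothetical evaluation function (arbitrary, for illustration only).
def evaluate(n):
    return (n * 7) % 10


for depth in (1, 2, 3):
    print(depth, cutoff_minimax(0, depth, True, successors, evaluate))
```

The estimate at the root changes as the depth limit changes, which is why the depth is chosen as large as the time limit allows.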
Evaluation Functions Must be able to differentiate between – Good and bad board states – Exact values not important – Ideally, the function would return the true score For goal states Example in chess – Weighted linear function – Weights: Pawn=1, knight=bishop=3, rook=5, queen=9
Example Chess Score Black has: – 5 pawns, 1 bishop, 2 rooks Score = 1*(5)+3*(1)+5*(2) = 5+3+10 = 18 White has: – 5 pawns, 1 rook Score = 1*(5)+5*(1) = 5+5 = 10 Overall scores for this board state: black = 18-10 = 8 white = 10-18 = -8
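The weighted linear function for this example can be written out as follows. The dictionary layout and the function name `material` are assumptions for this sketch; the weights and piece counts are the ones given on the slides.

```python
# Piece weights from the evaluation function slide.
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}


def material(pieces: dict) -> int:
    """Weighted linear sum of the pieces one side still has."""
    return sum(WEIGHTS[name] * count for name, count in pieces.items())


black = {"pawn": 5, "bishop": 1, "rook": 2}
white = {"pawn": 5, "rook": 1}

black_score = material(black) - material(white)
white_score = material(white) - material(black)
print(material(black), material(white), black_score, white_score)
```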
Evaluation Function for our Game Evaluation after the first move – Count zero if it’s odd, take the number if it’s even Evaluation function here would choose 10 – But this would be disastrous for the player
Problems with Evaluation Functions Horizon problem – Agent cannot see far enough into search space Potentially disastrous board position after seemingly good one Possible solution – Reduce the number of initial moves to look at Allows you to look further into the search space Non-quiescent search – Exhibits big swings in the evaluation function – E.g., when taking pieces in chess – Solution: advance search past non-quiescent part
Pruning Want to visit as many board states as possible – Want to avoid whole branches (prune them) Because they can’t possibly lead to a good score – Example: having your queen taken in chess (Queen sacrifices often very good tactic, though) Alpha-beta pruning – Can be used for entire search or cutoff search – Recognize that a branch cannot produce better score Than a node you have already evaluated
Alpha-Beta Pruning for Player 1
1. Given a node N which can be chosen by player one, then if there is another node, X, along any path, such that (a) X can be chosen by player two, (b) X is on a higher level than N, and (c) X has been shown to guarantee a worse score for player one than N, then the parent of N can be pruned.
2. Given a node N which can be chosen by player two, then if there is a node X along any path such that (a) player one can choose X, (b) X is on a higher level than N, and (c) X has been shown to guarantee a better score for player one than N, then the parent of N can be pruned.
Example of Alpha-Beta Pruning (Figure: game tree with alternating player 1 and player 2 levels and one branch marked "Prune".) Depth first search a good idea here – See notes for explanation
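A depth-first version of alpha-beta pruning can be sketched as below. This uses the standard alpha/beta bound formulation rather than the exact wording of the two rules, and the nested-list tree, the function name `alphabeta`, the `visited` list, and the example scores are assumptions made for illustration.

```python
import math


def alphabeta(node, maximising=True, alpha=-math.inf, beta=math.inf,
              visited=None):
    """Minimax value of `node`, skipping branches that cannot matter.

    alpha: best score player 1 can already guarantee higher up the tree.
    beta:  best (lowest) score player 2 can already guarantee higher up.
    """
    if isinstance(node, int):  # leaf: final score for player 1
        if visited is not None:
            visited.append(node)  # record which leaves were evaluated
        return node
    if maximising:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:  # player 2 would never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:  # player 1 already has something better
            break
    return value


# Hypothetical two-ply tree: player 1 picks a branch, player 2 picks a leaf.
tree = [[3, 12], [2, 8], [14, 1]]
seen = []
print(alphabeta(tree, visited=seen), seen)
```

In the second branch, once player 2 is seen to have a reply worth 2, the leaf 8 is never evaluated: player 1 is already guaranteed 3 from the first branch.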
Games with Chance Many more interesting games – Have an element of chance – Brought in by throwing a die, tossing a coin Example: backgammon – See Gerry Tesauro’s TD-Gammon program In these cases – We can no longer calculate guaranteed scores – We can only calculate expected scores Using probability to guide us
Expectimax Search Going to draw tree and move values as before Whenever there is a random event – Add an extra node for each possible outcome which will change the board states possible after the event – E.g., six extra nodes if each roll of die affects state Work out all possible board states from chance node When moving score values up through a chance node – Multiply each value by the probability of the event happening Add together all the products – Gives you expected value coming through the chance node
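Moving values up through chance nodes can be sketched as follows. The tagged-tuple tree representation, the function name `expectimax`, and the example scores are assumptions for this illustration; the probabilities 1/6 and 5/6 correspond to a die showing a six or not.

```python
from fractions import Fraction


def expectimax(node):
    """Expected score for player 1, with max, min and chance nodes."""
    if isinstance(node, int):  # leaf: final score for player 1
        return node
    kind, children = node
    if kind == "max":      # player 1 chooses
        return max(expectimax(child) for child in children)
    if kind == "min":      # player 2 chooses
        return min(expectimax(child) for child in children)
    if kind == "chance":   # random event: probability-weighted sum
        return sum(p * expectimax(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind}")


six, not_six = Fraction(1, 6), Fraction(5, 6)

# Hypothetical tree: player 1 moves, then a die is thrown.
tree = ("max", [
    ("chance", [(six, ("min", [-6, 0])), (not_six, ("min", [6, 12]))]),
    ("chance", [(six, 30), (not_six, 0)]),
])
print(expectimax(tree))
```

The first move is worth (1/6)(-6) + (5/6)(6) = 4 in expectation and the second (1/6)(30) = 5, so the second is preferred even though it usually scores nothing: the values are expected scores, not guaranteed ones.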
More interesting (but still trivial) game Deal four cards face up Player 1 chooses a card Player 2 throws a die – If it’s a six, player 2 chooses a card, swaps it with player 1’s and keeps player 1’s card – If it’s not a six, player 2 just chooses a card Player 1 chooses next card Player 2 takes the last card
Games Played by Computer Games played perfectly: – Connect four, noughts & crosses (tic-tac-toe) – Best move pre-calculated for each board state Small number of possible board states Games played well: – Chess, draughts (checkers), backgammon – Scrabble, tetris (using ANNs) Games played badly: – Go, bridge, soccer
Philosophical Questions Q1. Is how computers play chess – More fundamental than how people play chess? In science, simple & effective techniques are valued – Minimax cutoff search is simple and effective – But this is seen by some as stupid and "non-AI" Drew McDermott: – "Saying Deep Blue doesn't really think about chess is like saying an airplane doesn't really fly because it doesn't flap its wings" Q2. If aliens came to Earth and challenged us to chess… – Would you send Deep Blue or Kasparov into battle?
Introduction Robots are physical agents that perform tasks by manipulating the physical world. Robots are also equipped with sensors, including cameras and ultrasound to measure the environment, and gyroscopes and accelerometers to measure the robot's own motion.
Introduction Robot categories: – Manipulators, or robot arms – Mobile robots: move using wheels, legs, or similar mechanisms; examples are unmanned land vehicles (ULV), unmanned air vehicles (UAV), and autonomous underwater vehicles (AUV) – Humanoid robots (hybrid): mobile robots equipped with manipulators
Introduction (a) NASA's Sojourner, a mobile robot that explored the surface of Mars in July (b) Honda's P3 and Asimo humanoid robots.
Robot Hardware The success of real robots depends at least as much on the design of sensors and effectors that are appropriate for the task. 1. Sensors are the perceptual interface between robots and their environments. – Passive sensors, such as cameras – Active sensors, such as sonar – Range finders – Imaging sensors – Proprioceptive sensors – Inertial sensors, such as gyroscopes
Robot Hardware 2. Effectors are the means by which robots move and change the shape of their bodies. – Dynamically stable: the robot can remain upright while hopping around. – Statically stable: the robot can remain upright without moving its legs.
Robot Hardware (a) The Stanford Manipulator, an early robot arm with five revolute joints (R) and one prismatic joint (P), for a total of six degrees of freedom. (b) Motion of a nonholonomic four-wheeled vehicle with front-wheel steering.
Robot Hardware (a) The SICK LMS laser range scanner, a popular range sensor for mobile robots. (b) Range scan obtained with a horizontally mounted sensor, projected onto a two-dimensional environment map.
Robotic Perception Perception is the process by which robots map sensor measurements into internal representations of the environment. Perception is difficult because in general the sensors are noisy, and the environment is partially observable, unpredictable, and often dynamic.
Robotic Perception 1. Localization is a generic example of robot perception. It is the problem of determining where things are. For example, robot manipulators must know the location of objects they manipulate. 2. Mapping: the robot mapping problem is often referred to as simultaneous localization and mapping, abbreviated as SLAM.
Robotic Perception The robot's positional uncertainty is increasing, as is its uncertainty about the landmarks it encounters. Stages during which the robot encounters new landmarks, which are mapped with increasing uncertainty.
Planning to Move Decisions ultimately involve motion of effectors. The point-to-point motion problem is to deliver the robot or its end-effector to a designated target location. The path planning problem is to find a path from one configuration to another in configuration space.
Planning to Move (a) Workspace representation of a robot arm with 2 DOFs. The workspace is a box with a flat obstacle hanging from the ceiling. (b) Configuration space of the same robot. Only white regions in the space are configurations that are free of collisions. The dot in this diagram corresponds to the configuration of the robot shown on the left.
Planning to Move 1. Cell decomposition methods: the first approach to path planning uses cell decomposition. 2. Skeletonization methods: a second family of path-planning algorithms is based on the idea of skeletonization.
Planning uncertain movements None of the robot motion planning algorithms discussed thus far addresses a key characteristic of robotics problems: uncertainty. The planning of robot motion is usually done in configuration space, where each point specifies the location and orientation of the robot and its joint angles. Configuration space search algorithms include cell decomposition techniques, which decompose the space of all configurations into finitely many cells, and skeletonization techniques, which project configuration spaces onto lower-dimensional manifolds. The motion planning problem is then solved using search in these simpler structures.
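Once the configuration space has been decomposed into cells, path planning reduces to a graph search over free cells. The sketch below is one simple instance under stated assumptions: a regular grid decomposition where 0 marks a free cell and 1 an occupied one, four-way moves between neighbouring cells, and breadth-first search as the search method. The function name `plan_path` and the example grid are hypothetical.

```python
from collections import deque


def plan_path(grid, start, goal):
    """Breadth-first search over free cells; returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal not reachable through free cells


# Hypothetical decomposition: a wall with a gap on the right-hand side.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(plan_path(grid, (0, 0), (2, 0)))
```

A finer grid gives paths that hug obstacles more closely at the cost of more cells to search.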
Moving Dynamics and control: the dynamic state extends the kinematic state of a robot by modeling the robot's velocities. For example, in addition to the angle of a robot joint, the dynamic state also captures the rate of change of the angle. – A common technique to compensate for the limitations of kinematic plans is to use a separate mechanism, a controller, for keeping the robot on track. Potential field control: we introduced potential fields as an additional cost function in robot motion planning, but they can also be used for generating robot motion directly, dispensing with the path planning phase altogether.
Moving Potential field control. The robot ascends a potential field composed of repelling forces asserted from the obstacles, and an attracting force that corresponds to the target configuration. (a) Successful path. (b) Local optimum.
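Both outcomes in the figure, a successful path and a local optimum, can be reproduced with a small 2-D sketch. Everything specific here is an assumption made for illustration: a point robot, an attracting force proportional to the distance to the target, a repelling force that acts only within a fixed radius of each point obstacle, and the names `force` and `follow_field` with their gains.

```python
import math


def force(pos, goal, obstacles, k_att=1.0, k_rep=1.0, radius=1.0):
    """Net force: attraction to the goal plus repulsion from near obstacles."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0 < d < radius:  # obstacles only push when the robot is close
            push = k_rep * (1.0 / d - 1.0 / radius) / d ** 2
            fx += push * dx / d
            fy += push * dy / d
    return fx, fy


def follow_field(start, goal, obstacles, gain=0.05, max_step=0.05,
                 max_steps=2000, tol=0.05):
    """Repeatedly step along the net force; returns the positions visited."""
    pos = start
    path = [pos]
    for _ in range(max_steps):
        if math.dist(pos, goal) < tol:  # close enough to the target
            break
        fx, fy = force(pos, goal, obstacles)
        norm = math.hypot(fx, fy)
        if norm < 1e-6:  # forces cancel: stuck in a local optimum
            break
        scale = min(gain, max_step / norm)  # cap the length of one step
        pos = (pos[0] + scale * fx, pos[1] + scale * fy)
        path.append(pos)
    return path


free = follow_field((0.0, 0.0), (1.0, 0.0), [])
stuck = follow_field((0.0, 0.0), (2.0, 0.0), [(1.0, 0.0)])
print(free[-1], stuck[-1])
```

With no obstacles the robot reaches the target. With an obstacle placed exactly between start and target, attraction and repulsion balance in front of the obstacle and the robot stops there, which is the local-optimum case.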
Moving Reactive control: so far we have considered control decisions that require some model of the environment for constructing either a reference path or a potential field. There are some difficulties with this approach. – First, models that are sufficiently accurate are often difficult to obtain, especially in complex or remote environments, such as the surface of Mars. – Second, even in cases where we can devise a model with sufficient accuracy, computational difficulties and localization error might render these techniques impractical.
Moving (a) A hexapod robot. (b) An augmented finite state machine (AFSM) for the control of a single leg. Notice that this AFSM reacts to sensor feedback: if a leg is stuck during the forward swinging phase, it will be lifted increasingly higher.
Moving Robot arm control using (a) proportional control with gain factor 1.0, (b) proportional control with gain factor 0.1, and (c) PD control with gain factors 0.3 for the proportional and 0.8 for the differential component. In all cases the robot arm tries to follow the path shown in gray.
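The difference between proportional (P) and proportional-derivative (PD) control can be seen on a much simpler system than the robot arm in the figure. The sketch below is an assumption-laden illustration: a hypothetical unit-mass point on a line, driven toward a fixed target at 1.0, integrated with small time steps. The function names are invented here; only the gain values are taken from the caption.

```python
def p_control(target, position, kp):
    """Proportional control: force proportional to the current error."""
    return kp * (target - position)


def pd_control(target, position, velocity, kp, kd, target_velocity=0.0):
    """PD control: adds a term proportional to the rate of change of error."""
    return kp * (target - position) + kd * (target_velocity - velocity)


def simulate(controller, steps=2000, dt=0.01):
    """Hypothetical unit-mass point driven toward target 1.0."""
    x, v = 0.0, 0.0
    history = []
    for _ in range(steps):
        a = controller(1.0, x, v)  # unit mass: acceleration equals force
        v += a * dt
        x += v * dt
        history.append(x)
    return history


p_run = simulate(lambda t, x, v: p_control(t, x, kp=1.0))
pd_run = simulate(lambda t, x, v: pd_control(t, x, v, kp=0.3, kd=0.8))
print(max(p_run), pd_run[-1])
```

On this toy system the P controller overshoots the target and keeps oscillating, while the derivative term in the PD controller damps the motion so that it settles near the target.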
Robotic Software Architecture A software architecture is a methodology for structuring algorithms. An architecture usually includes languages and tools for writing programs. Subsumption architecture (Brooks, 1986) is a framework for assembling reactive controllers out of finite state machines. Three-layer architecture: – Reactive layer provides low-level control to the robot. – Executive layer (or sequencing layer) serves as the glue between the reactive and the deliberate layer. – Deliberate layer generates global solutions to complex tasks using planning.
Robotic Software Architecture Robotic programming languages: – Behavior language, defined by Brooks (1990). This language is a rule-based real-time control language that compiles into AFSM controllers. – Generic robot language, or GRL (Horswill, 2000). GRL is a functional programming language for programming large modular control systems such as C. – GOLOG (Levesque et al., 1997b). – ALisp (Andre and Russell, 2002) is an extension of Lisp. ALisp allows programmers to specify nondeterministic choice points, similar to the choice points in GOLOG.
Application Domains – Industry and Agriculture – Transportation – Hazardous environments – Exploration – Health care
Other Lecture e=PlayList&v=0yD3uBshJB0&list=PL65CC038 4A1798ADF