Lirong Xia Tuesday, May 6, 2014 Review of Introduction to AI.


1 Lirong Xia Tuesday, May 6, 2014 Review of Introduction to AI

2 About the final exam. When: Tues, 5/13, 3-6pm. Where: Low 4050. Same rules as the midterm: –open book and lecture notes –simple calculators are allowed –smartphones, laptops, and wifi are not allowed. No Joe’s OH tomorrow. 5/9: in-class office hours –please bring your HW2

3 Outline. Search: –uninformed search –informed search –CSP –planning –minimax, alpha-beta pruning –expectimax. Probabilistic inference. Machine learning.

4 Search Problems. A search problem consists of: –A state space …… –A successor function (with actions, costs) –A start state and a goal test. A solution is a sequence of actions (a plan) which transforms the start state to a goal state.

5 Search algorithms. Uninformed search: –BFS –DFS. Informed search: –UCS –Best first (greedy) –A*

6 State Graphs vs. Search Trees. State graph: a representation of the search problem –each node is an abstraction of the state of the world. Search tree: a tool that helps us find the solution –each node represents an entire path in the graph –tree nodes are constructed on demand, and we construct as few as possible.

7 Fixed BFS. Never expand a node whose state has been visited. The fringe can be maintained as a First-In-First-Out (FIFO) queue (class Queue in util.py). Maintain a set of visited states. fringe := {node corresponding to initial state}; loop: –if fringe is empty, declare failure –choose and remove the top node v from fringe –check if v’s state s is a goal state; if so, declare success –if v’s state has been visited before, skip –if not, expand v and insert the resulting nodes whose states have not been visited into fringe
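
The loop above can be sketched in Python roughly as follows. This is a minimal illustration, not the course's util.py code: `bfs`, `toy_successors`, and the toy graph `GRAPH` are hypothetical names introduced here, and a `collections.deque` stands in for the FIFO queue.

```python
from collections import deque

def bfs(start, is_goal, successors):
    """Graph-search BFS; returns a list of actions to a goal, or None on failure."""
    fringe = deque([(start, [])])        # FIFO queue of (state, actions so far)
    visited = set()
    while fringe:                        # empty fringe means failure
        state, path = fringe.popleft()   # remove the oldest node
        if is_goal(state):
            return path
        if state in visited:             # state already expanded: skip
            continue
        visited.add(state)
        for action, nxt in successors(state):
            if nxt not in visited:
                fringe.append((nxt, path + [action]))
    return None

# Hypothetical toy state graph (not from the slides).
GRAPH = {"S": ["A", "B"], "A": ["G"], "B": ["A", "G"], "G": []}

def toy_successors(state):
    return [(f"{state}->{t}", t) for t in GRAPH[state]]

plan = bfs("S", lambda s: s == "G", toy_successors)
```

On this toy graph the returned plan uses two actions, which is the fewest possible.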

8 A*: Combining UCS and Greedy. Uniform-cost orders by path cost, or backward cost $g(n)$. Greedy orders by goal proximity, or forward cost $h(n)$. A* search orders by the sum: $f(n) = g(n) + h(n)$

9 Admissible Heuristics. A heuristic $h$ is admissible (optimistic) if $0 \le h(n) \le h^*(n)$, where $h^*(n)$ is the true cost to a nearest goal. Coming up with admissible heuristics is most of what’s involved in using A* in practice.

10 Consistency of Heuristics. Stronger than admissibility. Definition: for every arc from A to C, $\mathrm{cost}(A, C) \ge h(A) - h(C)$ (real cost $\ge$ cost implied by heuristic). Consequence: the $f$ value along a path never decreases.
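
As a rough sketch of A* graph search, the code below orders the fringe by f(n) = g(n) + h(n) using a binary heap. The function `a_star`, the edge table `EDGES`, and the heuristic table `H` are hypothetical illustrations; the heuristic values were chosen by hand so that they are admissible and consistent on this toy graph.

```python
import heapq

def a_star(start, is_goal, successors, h):
    """A* graph search; returns (cost, path) or None if no goal is reachable."""
    # Fringe entries are (f = g + h, g, state, path).
    fringe = [(h(start), 0, start, [start])]
    closed = set()
    while fringe:
        f, g, state, path = heapq.heappop(fringe)   # smallest f first
        if is_goal(state):
            return g, path
        if state in closed:
            continue
        closed.add(state)
        for nxt, cost in successors(state):
            if nxt not in closed:
                g2 = g + cost
                heapq.heappush(fringe, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

# Hypothetical weighted graph and heuristic (assumptions for illustration).
EDGES = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)],
         "B": [("G", 2)], "G": []}
H = {"S": 4, "A": 3, "B": 2, "G": 0}

result = a_star("S", lambda s: s == "G", lambda s: EDGES[s], lambda s: H[s])
```

Here `result` is the pair (5, ["S", "A", "B", "G"]): the direct edge A to G costs 6, so going through B is cheaper.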

11 Constraint Satisfaction Problems. Standard search problems: –State is a “black box”: arbitrary data structure –Goal test: any function over states –Successor function can be anything. Constraint satisfaction problems (CSPs): –A special subset of search problems –State is defined by variables $X_i$ with values from a domain $D$ (sometimes $D$ depends on $i$) –Goal test is a set of constraints specifying allowable combinations of values for subsets of variables

12 Constraint Graphs. Binary CSP: each constraint relates (at most) two variables. Binary constraint graph: nodes are variables, arcs show constraints. General-purpose CSP algorithms use the graph structure to speed up search. E.g., Tasmania is an independent subproblem!

13 CSP algorithms. A special search problem: –constraints represented by a graph. Backtracking search: –DFS with a fixed variable order, choosing one value at every step. Improvements of backtracking search.

14 Arc Consistency of a CSP. A simple form of propagation makes sure all arcs are consistent. Delete from the tail! If V loses a value, neighbors of V need to be rechecked! Arc consistency detects failure earlier than forward checking. Can be run as a preprocessor or after each assignment. Might be time-consuming.
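
One common way to implement this propagation is the AC-3 algorithm; the slide does not name a specific algorithm, so treat the sketch below as an assumption. The names `revise`, `ac3`, and `different`, and the three-region map-coloring instance, are hypothetical.

```python
from collections import deque

def revise(domains, tail, head, allowed):
    """Delete tail values that have no compatible value left in head's domain."""
    removed = False
    for x in list(domains[tail]):
        if not any(allowed(tail, x, head, y) for y in domains[head]):
            domains[tail].remove(x)      # delete from the tail
            removed = True
    return removed

def ac3(domains, neighbors, allowed):
    """Make every arc consistent; returns False if some domain becomes empty."""
    queue = deque((t, h) for t in domains for h in neighbors[t])
    while queue:
        tail, head = queue.popleft()
        if revise(domains, tail, head, allowed):
            if not domains[tail]:
                return False             # failure detected early
            # tail lost a value, so arcs pointing into tail are rechecked
            for n in neighbors[tail]:
                if n != head:
                    queue.append((n, tail))
    return True

# Hypothetical instance: three mutually adjacent regions, WA already red.
domains = {"WA": {"red"}, "NT": {"red", "green"}, "SA": {"red", "green", "blue"}}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}

def different(tail, x, head, y):
    return x != y

ok = ac3(domains, neighbors, different)
```

After the call, propagation alone has reduced every domain to a single color.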

15 Improving Backtracking. General-purpose ideas give huge gains in speed. Ordering: –Minimum remaining values (MRV) –Least constraining value. Filtering: can we detect inevitable failure early? –forward checking search. Structure of the problem: –constraint graph is a tree

16 Planning problems. STRIPS language: –state of the world: conjunction of positive, ground, function-free literals. Action: –Preconditions: a set of activating literals –Effects: updates of active literals

17 Blocks world. Start: On(B, A), On(A, Table), On(D, C), On(C, Table), Clear(B), Clear(D). Move(x,y,z): –Preconditions: On(x,y), Clear(x), Clear(z) –Effects: On(x,z), Clear(y), NOT(On(x,y)), NOT(Clear(z)). MoveToTable(x,y): –Preconditions: On(x,y), Clear(x) –Effects: On(x,Table), Clear(y), NOT(On(x,y))

18 Blocks world example. Goal: On(A,B) AND Clear(A) AND On(C,D) AND Clear(C). A plan: MoveToTable(B, A), MoveToTable(D, C), Move(C, Table, D), Move(A, Table, B)
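
To check that the plan above really reaches the goal, the actions can be simulated over a set of literals. This is a minimal sketch: the tuple encoding of literals and the helper names `move`, `move_to_table`, and `apply_action` are assumptions made for illustration, while the preconditions and effects are copied from the slides.

```python
def move(x, y, z):
    """Move(x, y, z): move block x from y onto z."""
    return {"pre": {("On", x, y), ("Clear", x), ("Clear", z)},
            "add": {("On", x, z), ("Clear", y)},
            "del": {("On", x, y), ("Clear", z)}}

def move_to_table(x, y):
    """MoveToTable(x, y): move block x from y onto the table."""
    return {"pre": {("On", x, y), ("Clear", x)},
            "add": {("On", x, "Table"), ("Clear", y)},
            "del": {("On", x, y)}}

def apply_action(state, action):
    """Apply a STRIPS action: check preconditions, then delete and add literals."""
    if not action["pre"] <= state:
        raise ValueError("precondition not satisfied")
    return (state - action["del"]) | action["add"]

start = {("On", "B", "A"), ("On", "A", "Table"), ("On", "D", "C"),
         ("On", "C", "Table"), ("Clear", "B"), ("Clear", "D")}
goal = {("On", "A", "B"), ("Clear", "A"), ("On", "C", "D"), ("Clear", "C")}
plan = [move_to_table("B", "A"), move_to_table("D", "C"),
        move("C", "Table", "D"), move("A", "Table", "B")]

state = start
for action in plan:
    state = apply_action(state, action)
goal_reached = goal <= state   # every goal literal holds in the final state
```

Applying Move(x, Table, z) literally also adds Clear(Table), which is harmless in this encoding.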

19 Adversarial Games. Deterministic, zero-sum games: –Tic-tac-toe, chess, checkers –The MAX player maximizes result –The MIN player minimizes result. Minimax search: –A search tree –Players alternate turns –Each node has a minimax value: best achievable utility against a rational adversary

20 Alpha-Beta Pruning. General configuration: –We’re computing the MIN-VALUE at n –We’re looping over n’s children –n’s value estimate is dropping –α is the best value that MAX can get at any choice point along the current path –If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children –Define β similarly for MIN –α is usually smaller than β. Once α >= β, return to the upper layer.
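
A minimal sketch of minimax with alpha-beta pruning is shown below. The nested-list tree encoding (inner lists are choice nodes, numbers are leaf utilities) and the `visited` log are assumptions added for illustration.

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of node with alpha-beta pruning."""
    if isinstance(node, (int, float)):          # leaf: return its utility
        if visited is not None:
            visited.append(node)
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)           # best option for MAX so far
            if alpha >= beta:
                break                           # MIN above will avoid this node
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)                 # best option for MIN so far
        if alpha >= beta:
            break                               # MAX above will avoid this node
    return value

# Hypothetical tree: a MAX root over three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
leaves_seen = []
root_value = alphabeta(tree, True, visited=leaves_seen)
```

In the second MIN node the first leaf (2) is already worse than α = 3, so the leaves 4 and 6 are never evaluated.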

21 Expectimax Search Trees. Expectimax search: –Max nodes (we) as in minimax search –Chance nodes: need to compute chance node values as expected utilities
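
The difference from minimax can be sketched in a few lines: chance nodes take a probability-weighted average instead of a minimum. The tuple encoding of nodes and the example tree are hypothetical.

```python
def expectimax(node):
    """Value of a tree with 'max' nodes, 'chance' nodes, and numeric leaves."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectimax(child) for child in children)
    if kind == "chance":
        # children are (probability, subtree) pairs; take the expected utility
        return sum(p * expectimax(child) for p, child in children)
    raise ValueError(f"unknown node type: {kind}")

# Hypothetical tree: MAX chooses between two chance nodes.
tree = ("max", [("chance", [(0.5, 8), (0.5, 4)]),
                ("chance", [(0.25, 20), (0.75, 0)])])
value = expectimax(tree)
```

The first chance node is worth 6 and the second 5, so MAX picks the first even though the second contains the largest leaf.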

22 Outline. Search. Probabilistic inference: –Bayesian networks: probability representation, conditional independence (d-separation), inference (variable elimination) –Markov decision processes: value iteration, policy iteration –Hidden Markov models: filtering. Machine learning.

23 Bayesian networks. Definition of a Bayesian network (Bayes’ net or BN): a set of nodes, one per variable X; a directed, acyclic graph; a conditional distribution for each node –a collection of distributions over X, one for each combination of parents’ values, $p(X \mid a_1, \ldots, a_n)$ –CPT: conditional probability table. A Bayesian network = Topology (graph) + Local Conditional Probabilities

24 Probabilities in BNs. Bayesian networks implicitly encode joint distributions –as a product of local conditional distributions: $p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{parents}(X_i))$. This lets us reconstruct any entry of the full joint. Not every BN can represent every joint distribution –the topology enforces certain conditional independencies

25 Reachability (D-Separation). Question: are X and Y conditionally independent given evidence vars {Z}? –Yes, if X and Y are “separated” by Z –Look for active paths from X to Y –No active paths = independence! A path is active if each triple is active: –Causal chain A → B → C where B is unobserved (either direction) –Common cause A ← B → C where B is unobserved –Common effect A → B ← C where B or one of its descendants is observed. All it takes to block a path is a single inactive segment.

26 Variable elimination. Network variables: Rained (R), Sprinklers were on (S), Grass wet (G), Dog wet (D), Neighbor walked dog (N).
CPTs: p(+R) = .2; p(+S) = .6; p(+N|+R) = .3, p(+N|-R) = .4; p(+G|+R,+S) = .9, p(+G|+R,-S) = .7, p(+G|-R,+S) = .8, p(+G|-R,-S) = .2; p(+D|+N,+G) = .9, p(+D|+N,-G) = .4, p(+D|-N,+G) = .5, p(+D|-N,-G) = .3.
From the factor Σ_n p(n|+R) p(+D|n,g) we sum out n to obtain a factor depending only on g:
[Σ_n p(n|+R) p(+D|n,+G)] = p(+N|+R) p(+D|+N,+G) + p(-N|+R) p(+D|-N,+G) = .3*.9 + .7*.5 = .62
[Σ_n p(n|+R) p(+D|n,-G)] = p(+N|+R) p(+D|+N,-G) + p(-N|+R) p(+D|-N,-G) = .3*.4 + .7*.3 = .33
Continuing to the left, g will be summed out next, etc. (continued on board)
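
The summing-out step can be reproduced numerically. The sketch below only covers this one step, using the CPT entries from the slide; the dictionary encoding and the name `sum_out_n` are assumptions.

```python
# p(n | +R): p(+N|+R) = .3, so p(-N|+R) = .7
p_n_given_r = {"+n": 0.3, "-n": 0.7}

# p(+D | n, g), keyed by (n, g)
p_d_given_ng = {("+n", "+g"): 0.9, ("+n", "-g"): 0.4,
                ("-n", "+g"): 0.5, ("-n", "-g"): 0.3}

def sum_out_n(p_n, p_d):
    """Return the factor over g obtained by summing n out of p(n|+R) p(+D|n,g)."""
    return {g: sum(p_n[n] * p_d[(n, g)] for n in p_n) for g in ("+g", "-g")}

factor = sum_out_n(p_n_given_r, p_d_given_ng)
```

Up to floating-point rounding, `factor` holds .62 for +g and .33 for -g, matching the two sums worked out above.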

27 Markov Decision Processes. An MDP is defined by: –A set of states s ∈ S –A set of actions a ∈ A –A transition function T(s, a, s’): the probability that a from s leads to s’, i.e., p(s’ | s, a); sometimes called the model –A reward function R(s, a, s’); sometimes just R(s) or R(s’) –A start state (or distribution) –Maybe a terminal state. MDPs are a family of nondeterministic search problems –Reinforcement learning (next class): MDPs where we don’t know the transition or reward functions

28 Defining MDPs. Markov decision processes: –States S –Start state $s_0$ –Actions A –Transition p(s’|s,a) (or T(s,a,s’)) –Reward R(s,a,s’) (and discount $\gamma$). MDP quantities so far: –Policy = choice of action for each (MAX) state –Utility (or return) = sum of discounted rewards

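29 The Bellman Equations. The definition of “optimal utility” leads to a simple one-step lookahead relationship amongst optimal utility values: optimal rewards = maximize over the first action and then follow the optimal policy. Formally: $V^*(s) = \max_a Q^*(s,a)$, $Q^*(s,a) = \sum_{s'} T(s,a,s') \left[ R(s,a,s') + \gamma V^*(s') \right]$, and so $V^*(s) = \max_a \sum_{s'} T(s,a,s') \left[ R(s,a,s') + \gamma V^*(s') \right]$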

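30 Value Iteration. Idea: –Start with $V_1(s) = 0$ –Given $V_i$, calculate the values for all states for depth i+1: $V_{i+1}(s) \leftarrow \max_a \sum_{s'} T(s,a,s') \left[ R(s,a,s') + \gamma V_i(s') \right]$ –Repeat until convergence –Use $V_i$ as the evaluation function when computing $V_{i+1}$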
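
The update can be sketched as a short loop over a tabular MDP. Everything about the example MDP below (states `s0`, `s1`, `done`, the rewards, and the discount 0.9) is a hypothetical illustration; transitions are stored as lists of (probability, next state, reward) triples.

```python
def value_iteration(T, gamma, iterations):
    """Run a fixed number of Bellman updates; T[s][a] = [(prob, s2, reward), ...]."""
    V = {s: 0.0 for s in T}                       # start from all zeros
    for _ in range(iterations):
        # Each sweep uses the previous V as the evaluation function.
        V = {s: max((sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                     for outcomes in T[s].values()),
                    default=0.0)                  # terminal states have no actions
             for s in T}
    return V

# Hypothetical MDP: "go" usually reaches s1, where exiting pays 10.
T = {
    "s0": {"go": [(0.8, "s1", 0.0), (0.2, "s0", 0.0)],
           "quit": [(1.0, "done", 1.0)]},
    "s1": {"exit": [(1.0, "done", 10.0)]},
    "done": {},
}
V = value_iteration(T, gamma=0.9, iterations=100)
```

For this toy MDP the fixed point can be checked by hand: V(s1) = 10 and V(s0) solves V = 0.8(0.9)(10) + 0.2(0.9)V, which gives 7.2 / 0.82.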

31 Policy Iteration. Alternative approach: –Step 1, policy evaluation: calculate utilities for some fixed policy (not optimal utilities!) –Step 2, policy improvement: update the policy using one-step look-ahead with the resulting converged (but not optimal!) utilities as future values –Repeat steps until the policy converges

32 Markov Models. A Markov model is a chain-structured BN –Conditional probabilities are the same at every time step (stationarity) –The value of X at a given time is called the state –As a BN: $X_1 \to X_2 \to X_3 \to \cdots$ –Parameters: $p(X_1)$ and $p(X_t \mid X_{t-1})$, called transition probabilities or dynamics, specify how the state evolves over time (also, initial probs)

33 Hidden Markov Models. Markov chains are not so useful for most agents –Eventually you don’t know anything anymore –Need observations to update your beliefs. Hidden Markov models (HMMs): –Underlying Markov chain over state X –You observe outputs (effects) at each time step –As a Bayes’ net: $X_1 \to X_2 \to \cdots$, with an observation edge $X_t \to E_t$ at each time step

34 HMM weather example: Filtering. Weather states: s, c, r. You have been stuck in the lab for three days (!) On those days, your labmate was dry, wet, wet, respectively. What is the probability that it is now raining outside? $p(X_3 = r \mid E_1 = d, E_2 = w, E_3 = w)$. Observation model: p(w|s) = .1, p(w|c) = .3, p(w|r) = .8

35 Formal algorithm for filtering. The forward algorithm: –Elapse of time: compute $p(X_{t+1} \mid e_{1:t})$ from $p(X_t \mid e_{1:t})$ –Observe: compute $p(X_{t+1} \mid e_{1:t+1})$ from $p(X_{t+1} \mid e_{1:t})$ –Renormalization

36 Forward algorithm vs. particle filtering. Forward algorithm: –Elapse of time: $B'(X_t) = \sum_{x_{t-1}} p(X_t \mid x_{t-1}) B(x_{t-1})$ –Observe: $B(X_t) \propto p(e_t \mid X_t) B'(X_t)$ –Renormalize: $B(x_t)$ sums to 1. Particle filtering: –Elapse of time: x → x’ –Observe: $w(x') = p(e_t \mid x')$ –Resample: resample N particles
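
The three steps of the forward algorithm can be sketched as below. The emission probabilities come from the weather example (with p(d|x) = 1 - p(w|x)); the transition table and the uniform prior are NOT in the transcript and are purely hypothetical placeholders, so the resulting numbers are illustrative only.

```python
def forward(prior, transition, emission, evidence):
    """Return the belief B(X_t) after processing the whole evidence sequence."""
    belief = dict(prior)                              # B'(X_1) = p(X_1)
    for t, e in enumerate(evidence):
        if t > 0:
            # Elapse of time: B'(X_t) = sum over x_{t-1} of p(X_t|x_{t-1}) B(x_{t-1})
            belief = {x2: sum(transition[x1][x2] * belief[x1] for x1 in belief)
                      for x2 in belief}
        # Observe: B(X_t) is proportional to p(e_t|X_t) B'(X_t)
        belief = {x: emission[x][e] * b for x, b in belief.items()}
        # Renormalize so the belief sums to 1
        total = sum(belief.values())
        belief = {x: b / total for x, b in belief.items()}
    return belief

emission = {"s": {"w": 0.1, "d": 0.9},
            "c": {"w": 0.3, "d": 0.7},
            "r": {"w": 0.8, "d": 0.2}}
# Hypothetical prior and dynamics (assumed, not from the slides).
prior = {"s": 1 / 3, "c": 1 / 3, "r": 1 / 3}
transition = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},
              "c": {"s": 0.3, "c": 0.4, "r": 0.3},
              "r": {"s": 0.1, "c": 0.4, "r": 0.5}}

belief = forward(prior, transition, emission, ["d", "w", "w"])
p_rain_now = belief["r"]   # estimate of p(X_3 = r | d, w, w) under these assumptions
```

Swapping in the transition table from the lecture would give the actual answer to the slide's question.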

37 Outline. Search. Probabilistic inference. Machine learning: –supervised learning: parametric (generative: Naïve Bayes; discriminative: perceptrons and MIRA) and non-parametric (K-NN) –unsupervised learning: k-means –reinforcement learning: Q-learning

38 Important Concepts. Data: labeled instances, e.g. emails marked spam/ham –Training set –Held-out set (we will give examples today) –Test set. Features: attribute-value pairs that characterize each x. Experimentation cycle: –Learn parameters (e.g. model probabilities) on the training set –(Tune hyperparameters on the held-out set) –Compute accuracy on the test set –Very important: never “peek” at the test set! Evaluation: –Accuracy: fraction of instances predicted correctly. Overfitting and generalization: –Want a classifier which does well on test data –Overfitting: fitting the training data very closely, but not generalizing well

39 General Naive Bayes. A general naive Bayes model: $p(Y, F_1, \ldots, F_n) = p(Y) \prod_{i=1}^{n} p(F_i \mid Y)$. We only specify how each feature depends on the class. The total number of parameters is linear in n.

40 Estimation: Laplace Smoothing. Laplace’s estimate (extended): –Pretend you saw every outcome k extra times: $P_{LAP,k}(x) = \frac{c(x) + k}{N + k|X|}$ –What’s Laplace with k=0? –k is the strength of the prior. Laplace for conditionals: –Smooth each condition independently: $P_{LAP,k}(x \mid y) = \frac{c(x,y) + k}{c(y) + k|X|}$
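
A short sketch of the unconditional estimate, where c(x) is the count of outcome x, N the number of samples, and |X| the number of possible outcomes. The function name `laplace` and the red/blue sample are hypothetical.

```python
from collections import Counter

def laplace(samples, outcomes, k):
    """Laplace-smoothed estimate: (c(x) + k) / (N + k * |X|) for each outcome x."""
    counts = Counter(samples)
    n = len(samples)
    return {x: (counts[x] + k) / (n + k * len(outcomes)) for x in outcomes}

samples = ["red", "red", "blue"]          # hypothetical observations
outcomes = ["red", "blue"]

ml = laplace(samples, outcomes, k=0)      # k = 0 is the maximum-likelihood estimate
lap1 = laplace(samples, outcomes, k=1)    # one pretend observation per outcome
lap100 = laplace(samples, outcomes, k=100)
```

With k = 0 the estimates are 2/3 and 1/3; with k = 1 they become 3/5 and 2/5; with k = 100 the prior dominates and both are close to 1/2.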

41 Generative vs. Discriminative. Generative classifiers: –E.g. naive Bayes –A causal model with evidence variables –Query model for causes given evidence. Discriminative classifiers: –No causal model, no Bayes rule, often no probabilities at all! –Try to predict the label Y directly from X –Robust, accurate with varied features –Loosely: mistake driven rather than model driven

42 Linear Classifiers (perceptrons). Inputs are feature values. Each feature has a weight. The sum is the activation: $\mathrm{activation}_w(x) = \sum_i w_i f_i(x)$. If the activation is: –Positive: output +1 –Negative: output -1

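43 Learning: Multiclass Perceptron. Start with all weights = 0. Pick up training examples one by one. Predict with current weights: $y = \arg\max_{y} w_y \cdot f(x)$. If correct, no change! If wrong: lower the score of the wrong answer, $w_y \leftarrow w_y - f(x)$, and raise the score of the right answer, $w_{y^*} \leftarrow w_{y^*} + f(x)$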
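
The training loop can be sketched with one sparse weight vector per class. The helper names, the fixed number of passes, and the two-example spam/ham data set are hypothetical; ties in the prediction are broken in favor of the class listed first.

```python
def score(w, features):
    """Dot product of a sparse weight vector with a sparse feature vector."""
    return sum(w.get(name, 0.0) * value for name, value in features.items())

def predict(weights, features):
    """Pick the class whose weight vector gives the highest score."""
    return max(weights, key=lambda y: score(weights[y], features))

def train(data, classes, epochs):
    weights = {y: {} for y in classes}            # all weights start at 0
    for _ in range(epochs):
        for features, label in data:
            guess = predict(weights, features)
            if guess != label:                    # mistake-driven update
                for name, value in features.items():
                    # raise the score of the right answer
                    weights[label][name] = weights[label].get(name, 0.0) + value
                    # lower the score of the wrong answer
                    weights[guess][name] = weights[guess].get(name, 0.0) - value
    return weights

# Hypothetical training data: feature dicts with a constant bias feature.
data = [({"free": 1, "bias": 1}, "spam"),
        ({"meeting": 1, "bias": 1}, "ham")]
weights = train(data, classes=["ham", "spam"], epochs=5)
predictions = [predict(weights, f) for f, _ in data]
```

On this separable toy set the weights stop changing after the first pass, and both training examples are then classified correctly.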

44 MIRA. Idea: adjust the weight update to mitigate these effects. MIRA*: choose an update size that fixes the current mistake. *Margin Infused Relaxed Algorithm

45 Parametric / Non-parametric. Parametric models: –Fixed set of parameters –More data means better settings. Non-parametric models: –Complexity of the classifier increases with data –Better in the limit, often worse in the non-limit. (K)NN is non-parametric.

46 K-Means. An iterative clustering algorithm: –Pick K random points as cluster centers (means) –Alternate: assign data instances to the closest mean; assign each mean to the average of its assigned points –Stop when no points’ assignments change
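
A compact sketch of the alternation is given below. To keep the example reproducible, the initial centers are passed in by the caller rather than picked at random, which is an assumption that departs from the slide; the function names and the four sample points are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centers):
    centers = list(initial_centers)
    assignments = None
    while True:
        # Assignment step: each point goes to its closest mean.
        new = [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
               for p in points]
        if new == assignments:                    # no assignment changed: stop
            return centers, assignments
        assignments = new
        # Update step: each mean moves to the average of its assigned points.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:                           # leave empty clusters in place
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, assignments = kmeans(points, [(0, 0), (10, 0)])
```

Here the means settle at (0, 1) and (10, 1) after one update. Different initial centers can lead to a different final clustering.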

47 K-Means as Optimization. Consider the total distance to the means: $\phi(\{x_i\}, \{a_i\}, \{c_k\}) = \sum_i \mathrm{dist}(x_i, c_{a_i})$, with points $x_i$, assignments $a_i$, and means $c_k$. Each iteration reduces $\phi$. Two stages each iteration: –Update assignments: fix means c, change assignments a –Update means: fix assignments a, change means c

48 Reinforcement learning. Similar to an MDP. Don’t know T and/or R, but can observe R –Learn by doing –Can have multiple episodes (trials)

49 MDPs vs. RL. Things we know how to do: If we know the MDP: –Compute V*, Q*, π* exactly –Evaluate a fixed policy π. If we don’t know the MDP: –If we can estimate the MDP then solve –We can estimate V for a fixed policy π –We can estimate Q*(s,a) for the optimal policy while executing an exploration policy. Techniques: –Computation: value and policy iteration, policy evaluation –Model-based RL: sampling –Model-free RL: Q-learning

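50 Q-Learning. Q-Learning: sample-based Q-value iteration. Learn Q*(s,a) values: –Receive a sample (s,a,s’,R) –Consider your old estimate: Q(s,a) –Consider your new sample estimate: $\text{sample} = R + \gamma \max_{a'} Q(s', a')$ –Incorporate the new estimate into a running average: $Q(s,a) \leftarrow (1 - \alpha) Q(s,a) + \alpha \cdot \text{sample}$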
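
A single Q-learning update can be sketched as follows, where alpha is the learning rate of the running average and gamma the discount. The table layout, the function name `q_update`, and the sample transition are hypothetical.

```python
from collections import defaultdict

def q_update(Q, s, a, s2, reward, actions, alpha, gamma):
    """Blend one sample (s, a, s2, reward) into the running average Q[(s, a)]."""
    # New sample estimate: reward plus discounted value of the best next action.
    best_next = max((Q[(s2, a2)] for a2 in actions(s2)), default=0.0)
    sample = reward + gamma * best_next
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample

Q = defaultdict(float)                      # unseen Q-values default to 0

def actions(state):
    return ["left", "right"]                # hypothetical action set

q_update(Q, "A", "right", "B", 1.0, actions, alpha=0.5, gamma=0.9)
first = Q[("A", "right")]
q_update(Q, "A", "right", "B", 1.0, actions, alpha=0.5, gamma=0.9)
second = Q[("A", "right")]
```

With both Q-values at B still 0, each sample equals 1, so the estimate moves from 0 to 0.5 and then to 0.75.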

51 Exploration / Exploitation. Several schemes for forcing exploration –Simplest: random actions (ε-greedy): every time step, flip a coin; with probability ε, act randomly; with probability 1-ε, act according to the current policy –Problems with random actions? You do explore the space, but keep thrashing around once learning is done –One solution: lower ε over time
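
The coin flip can be sketched in a few lines. Here the "current policy" is assumed to be greedy with respect to a Q-table, and the function name, table layout, and example values are hypothetical.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """With probability epsilon act randomly; otherwise act greedily on Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)                       # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))   # exploit

Q = {("s", "left"): 1.0, ("s", "right"): 2.0}
greedy_choice = epsilon_greedy(Q, "s", ["left", "right"], epsilon=0.0)
random_choice = epsilon_greedy(Q, "s", ["left", "right"], epsilon=1.0)
```

Lowering ε over time then amounts to passing a smaller `epsilon` on later time steps.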

