Decision making in episodic environments
We have just looked at decision making in sequential environments. Now let's consider the "easier" problem of episodic environments: the agent gets a series of unrelated problem instances and has to make some decision or inference about each of them. This is what most of "machine learning" is about.
Example: Image classification
[Figure: input images mapped to desired output labels — apple, pear, tomato, cow, dog, horse]
Example: Spam Filter
Example: Seismic data
[Figure: body wave magnitude vs. surface wave magnitude, separating earthquakes from nuclear explosions]
The basic classification framework: y = f(x), where x is the input, f is the classification function, and y is the output.
Learning: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the parameters of the prediction function f.
Inference: apply f to a never-before-seen test example x and output the predicted value y = f(x).
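The learning/inference split can be sketched in a few lines. This is a minimal illustration, not any particular algorithm from the slides: a hypothetical 1-D classifier whose single parameter (a threshold) is estimated from labeled data, then applied to unseen inputs.

```python
# Sketch of the y = f(x) framework on hypothetical 1-D data.
# "Learning" estimates the parameters of f; "inference" applies f.

def learn_threshold(training_set):
    """Fit a 1-D threshold classifier: midpoint between the class means."""
    xs0 = [x for x, y in training_set if y == 0]
    xs1 = [x for x, y in training_set if y == 1]
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    # f maps a never-before-seen input x to a predicted label y
    return lambda x: 1 if x >= threshold else 0

train = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]  # labeled examples (x, y)
f = learn_threshold(train)        # learning
prediction = f(8.5)               # inference on a test example
```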
Example: Training and testing
Training set (labels known); test set (labels unknown).
Key challenge: generalization to unseen examples.
Naïve Bayes classifier
f(x) = argmax_y P(y) ∏_i P(x_i | y), where x_i is a single dimension or attribute of x.
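A small sketch of this idea, assuming discrete attribute values and Laplace (add-one) smoothing for unseen values; the function names and toy spam/ham data are illustrative, not from the slides:

```python
# Sketch of a Naive Bayes classifier over discrete attributes.
# Independence assumption: P(x | y) = product over i of P(x_i | y).
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (tuple of attribute values, label)."""
    labels = Counter(y for _, y in examples)
    counts = defaultdict(Counter)          # counts[(i, y)][value]
    for x, y in examples:
        for i, v in enumerate(x):
            counts[(i, y)][v] += 1

    def predict(x):
        def score(y):
            p = labels[y] / len(examples)  # prior P(y)
            for i, v in enumerate(x):      # per-attribute likelihood P(x_i | y)
                p *= (counts[(i, y)][v] + 1) / (labels[y] + 2)  # add-one smoothing
            return p
        return max(labels, key=score)
    return predict
```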
Decision tree classifier
Example problem: decide whether to wait for a table at a restaurant, based on the following attributes:
Alternate: is there an alternative restaurant nearby?
Bar: is there a comfortable bar area to wait in?
Fri/Sat: is today Friday or Saturday?
Hungry: are we hungry?
Patrons: number of people in the restaurant (None, Some, Full)
Price: price range ($, $$, $$$)
Raining: is it raining outside?
Reservation: have we made a reservation?
Type: kind of restaurant (French, Italian, Thai, Burger)
WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
Decision tree classifier
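The standard way to pick which attribute to split on is information gain (reduction in entropy); a sketch, with examples represented as (attribute-dict, label) pairs in the style of the restaurant problem above:

```python
# Sketch: choosing a decision-tree split by information gain.
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """examples: list of (dict of attributes, label).
    Gain = entropy before the split minus weighted entropy after."""
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for value in {x[attribute] for x, _ in examples}:
        subset = [y for x, y in examples if x[attribute] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain
```

A greedy tree builder would call information_gain for every remaining attribute at each node and split on the best one.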
Nearest neighbor classifier
f(x) = label of the training example nearest to x.
All we need is a distance function for our inputs; no training is required!
[Figure: test example among training examples from class 1 and class 2]
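The 1-nearest-neighbor rule fits in a few lines; Euclidean distance is one common choice of distance function for numeric feature vectors (the function names here are illustrative):

```python
# Sketch of the 1-nearest-neighbor rule: f(x) is the label of the
# training example closest to x under a supplied distance function.

def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def nearest_neighbor_classify(train, x, distance=euclidean):
    """train: list of (example, label); no training step, just lookup."""
    _, label = min(train, key=lambda ex: distance(ex[0], x))
    return label
```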
Linear classifier
Find a linear function to separate the classes:
f(x) = sgn(w1x1 + w2x2 + … + wDxD) = sgn(w · x)
Perceptron
[Figure: inputs x1 … xD with weights w1 … wD feeding a single unit with output sgn(w · x + b)]
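The classic perceptron learning rule trains f(x) = sgn(w · x + b) by updating only on mistakes: w ← w + y·x and b ← b + y. A sketch, assuming labels in {-1, +1} and linearly separable data:

```python
# Sketch of the perceptron learning rule for f(x) = sgn(w·x + b).

def train_perceptron(data, epochs=20):
    """data: list of (feature tuple, label in {-1, +1})."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            # mistake (or on the boundary): nudge w, b toward the example
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def sgn_classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

On linearly separable data the rule is guaranteed to converge; on non-separable data it never settles, which motivates the differentiable units below.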
Linear separability
Multi-Layer Neural Network
Can learn nonlinear functions.
Training: find network weights w to minimize the error between true and estimated labels of training examples, e.g. the squared error E(w) = Σ_i (y_i − f_w(x_i))².
Minimization can be done by gradient descent, provided f is differentiable. This training method is called back-propagation.
Differentiable perceptron
[Figure: inputs x1 … xd with weights w1 … wd feeding a single unit with output σ(w · x + b)]
Sigmoid function: σ(t) = 1 / (1 + e^(−t))
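Replacing sgn with the sigmoid makes the unit's output a differentiable function of the weights, so squared error can be minimized by gradient descent. A single-unit sketch (assumed setup: labels in {0, 1}, squared-error loss, per-example updates; for a multi-layer network the same chain-rule step is applied layer by layer, which is back-propagation):

```python
# Sketch: gradient descent on one sigmoid unit, output = sigmoid(w·x + b).
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_sigmoid_unit(data, lr=0.5, epochs=5000):
    """data: list of (feature tuple, label in {0, 1})."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # gradient of (out - y)^2 / 2, using sigmoid'(t) = out * (1 - out)
            delta = (out - y) * out * (1 - out)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b
```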
Review: Types of classifiers
Naïve Bayes
Decision tree
Nearest neighbor
Linear classifier
Nonlinear classifier