2Important ConceptsData: labeled instances, e.g. s marked spam/hamTraining setHeld out set (we will give examples today)Test setFeatures: attribute-value pairs that characterize each xExperimentation cycleLearn parameters (e.g. model probabilities) on training set(Tune hyperparameters on held-out set)Compute accuracy of test setVery important: never “peek” at the test set!EvaluationAccuracy: fraction of instances predicted correctlyOverfitting and generalizationWant a classifier which does well on test dataOverfitting: fitting the training data very closely, but not generalizing well
3General Naive Bayes A general naive Bayes model: We only specify how each feature depends on the classTotal number of parameters is linear in n
4Tuning on Held-Out Data Now we’ve got two kinds of unknownsParameters: the probabilities p(Y|X), p(Y)Hyperparameters, like the amount of smoothing to do: k,αWhere to learn?Learn parameters from training dataFor each value of the hyperparameters, train and test on the held-out dataChoose the best value and do a final test on the test data
5Today’s schedule Generative vs. Discriminative Binary Linear ClassifiersPerceptronMulti-class Linear ClassifiersMulti-class Perceptron
7Generative vs. Discriminative Generative classifiers:E.g. naive BayesA causal model with evidence variablesQuery model for causes given evidenceDiscriminative classifiers:No causal model, no Bayes rule, often no probabilities at all!Try to predict the label Y directly from XRobust, accurate with varied featuresLoosely: mistake driven rather than model driven
8Outline Generative vs. Discriminative Binary Linear Classifiers PerceptronMulti-class Linear ClassifiersMulti-class Perceptron
9Some (Simplified) Biology Very loose inspiration: human neurons
10Linear Classifiers Inputs are feature values Each feature has a weight Sum is the activationIf the activation is:Positive: output +1Negative, output -1
11Classification: Weights Binary case: compare features to a weight vectorLearning: figure out the weight vector from examples
12Linear classifiers Mini Exercise 1.Draw the 4 feature vectors and the weight vector w2.Which feature vectors are classified as +? As -?3.Draw the line separating feature vectors being classified + and -.
13Linear classifiers Mini Exercise 2---Bias Term 1.Draw the 4 feature vectors and the weight vector w2.Which feature vectors are classified as +? As -?3.Draw the line separating feature vectors being classified + and -.
14Linear classifiers Mini Exercise 3---adding features 1.Draw the 4 1-dimensional feature vectorsf(x1)=-2, f(x2)=-1, f(x3)=1, f(x4)=22.Is there a separating vector w thatclassifies x1 and x4 as +1 andclassifies x2 and x3 as -1?3. can you add one more feature to make this happen?
15Binary Decision Rule In the space of feature vectors Examples are pointsAny weight vector is a hyperplaneOne side corresponds to Y = +1Other corresponds to Y = -1
16Outline Generative vs. Discriminative Binary Linear Classifiers Perceptron: how to find the weight vector w from data.Multi-class Linear ClassifiersMulti-class Perceptron
17Learning: Binary Perceptron Start with weights = 0For each training instance:Classify with current weightsIf correct (i.e. y=y*), no change!If wrong: adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is -1.
18Outline Generative vs. Discriminative Binary Linear Classifiers PerceptronMulti-class Linear ClassifiersMulti-class Perceptron
19Multiclass Decision Rule If we have multiple classes:A weight vector for each class:Score (activation) of a class y:Prediction highest score winsBinary = multiclass where the negative class has weight zero
20Example Exercise --- Which Category is Chosen? “win the vote”
21Exercise: Multiclass linear classifier for 2 classes and binary linear classifier Consider the multiclass linear classifier for two classes withIs there an equivalent binary linear classifier, i.e., one that classifies all points x = (x1,x2) the same way?
22Outline Generative vs. Discriminative Binary Linear Classifiers PerceptronMulti-class Linear ClassifiersMulti-class Perceptron: learning the weight vectors wi from data
23Learning: Multiclass Perceptron Start with all weights = 0Pick up training examples one by onePredict with current weightsIf correct, no change!If wrong: lower score of wrong answer, raise score of right answer
24Example: Multiclass Perceptron “win the vote”“win the election”“win the game”