Empirical Research Methods in Computer Science Lecture 7 November 30, 2005 Noah Smith.

Presentation transcript:

1 Empirical Research Methods in Computer Science Lecture 7 November 30, 2005 Noah Smith

2 Using Data Data → Model → Action. Data to Model: estimation; regression; learning; training. Model to Action: classification; decision. This pipeline goes by many names: pattern classification, machine learning, statistical inference, ...

3 Probabilistic Models Let X and Y be random variables. (continuous, discrete, structured,...) Goal: predict Y from X. A model defines P(Y = y | X = x). 1. Where do models come from? 2. If we have a model, how do we use it?

4 Using a Model We want to classify a message, x, as spam or mail: y ∈ {spam, mail}. The model takes x and gives P(spam | x) and P(mail | x).

5 Bayes’ Rule P(y | x) = P(x | y) P(y) / P(x). The left side is what we said the model must define. P(x | y) is the likelihood: one distribution over complex observations per y. P(y) is the prior. Dividing by P(x) = Σ_y' P(x | y') P(y') normalizes the product into a distribution.
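As a concrete sketch, here is Bayes' rule applied to the spam example. The priors match the multinomial on slide 11; the likelihood values are hypothetical, invented for illustration.

```python
# Bayes' rule: P(y | x) = P(x | y) P(y) / P(x),
# where P(x) = sum over y of P(x | y) P(y).

def posterior(likelihood, prior):
    """Turn per-class likelihoods and priors into a posterior distribution."""
    evidence = sum(likelihood[y] * prior[y] for y in prior)
    return {y: likelihood[y] * prior[y] / evidence for y in prior}

prior = {"spam": 0.455, "mail": 0.545}      # P(y), from slide 11
likelihood = {"spam": 0.02, "mail": 0.001}  # hypothetical P(x | y) for one message x

post = posterior(likelihood, prior)
```

Even with a slightly lower prior, spam wins here because the message is far more likely under the spam likelihood.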

6 Naive Bayes Models Suppose X = (X_1, X_2, X_3, ..., X_m). Let P(x | y) = ∏_{i=1}^{m} P(x_i | y): the features are assumed conditionally independent given Y.

7 Naive Bayes: Graphical Model A single class node Y with an arrow to each feature node X_1, X_2, X_3, ..., X_m.
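The factorization this graph encodes can be sketched directly. All probabilities below are hypothetical; computing in log space avoids numeric underflow when m is large.

```python
import math

# Naive Bayes factorization: given the class y, the features are assumed
# conditionally independent, so P(x, y) = P(y) * prod_i P(x_i | y).

def log_joint(x, y, prior, cond):
    """log P(y) + sum_i log P(x_i | y)."""
    return math.log(prior[y]) + sum(math.log(cond[y][xi]) for xi in x)

prior = {"spam": 0.455, "mail": 0.545}
cond = {"spam": {"free": 0.10, "meeting": 0.01},   # hypothetical P(word | y)
        "mail": {"free": 0.01, "meeting": 0.10}}

x = ["free", "free"]
```

Classification then amounts to picking the y with the larger log-joint score.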

8 Part II Where do the model parameters come from?

9 Using Data Data → Model: estimation; regression; learning; training.

10 Warning This is a HUGE topic. We will barely scratch the surface.

11 Forms of Models Recall that a model defines P(x | y) and P(y). These can have a simple multinomial form, like P(mail) = 0.545, P(spam) = 0.455 Or they can take on some other form, like a binomial, Gaussian, etc.

12 Example: Gaussian Suppose y is {male, female}, and one observed variable is H, height. P(H | male) ~ N(μ_m, σ_m²); P(H | female) ~ N(μ_f, σ_f²). How do we estimate μ_m, σ_m², μ_f, σ_f²?

13 Maximum Likelihood Pick the model that makes the data as likely as possible: max over models of P(data | model).

14 Maximum Likelihood (Gaussian) Estimating the parameters μ_m, σ_m², μ_f, σ_f² can be seen as: fitting the data; estimating an underlying statistic (a point estimate).
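For a Gaussian, the maximum-likelihood point estimates have a closed form: the sample mean and the sample variance with a 1/n factor. A minimal sketch, with hypothetical height measurements:

```python
# MLE for a Gaussian: mu is the sample mean, sigma^2 is the
# biased sample variance (divide by n, not n - 1).

def gaussian_mle(samples):
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu) ** 2 for s in samples) / n
    return mu, var

heights_m = [178.0, 182.0, 175.0, 181.0]  # hypothetical heights for y = male
mu_m, var_m = gaussian_mle(heights_m)
```

The same estimator, run on the female-labeled heights, gives μ_f and σ_f².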

15 Using the model

17 Example: Regression Suppose y is actual runtime, and x is input length. Regression tries to predict some continuous variables from others.

18 Regression Linear: assume linear relationship, fit a line. We can turn this into a model!

19 Linear Model Given x, predict y. y = β_1 x + β_0 + ε, where β_1 x + β_0 is the true regression line and ε ~ N(0, σ²) is the random deviation.

20 Principle of Least Squares Minimize the sum of squared vertical deviations. Unique, closed-form solution!
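That closed form is short enough to write out. A sketch for simple (one-variable) regression, with hypothetical input-length/runtime pairs:

```python
# Least-squares fit of y = b0 + b1*x:
#   b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
#   b0 = mean_y - b1 * mean_x

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical input lengths
ys = [2.1, 3.9, 6.1, 7.9]   # hypothetical runtimes, roughly y = 2x
b0, b1 = fit_line(xs, ys)
```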

21 Other kinds of regression transform one or both variables (e.g., take a log) polynomial regression  (least squares → linear system) multivariate regression logistic regression

22 Example: text categorization Bag-of-words model: x is a histogram of counts for all words; y is a topic.

23 MLE for Multinomials “Count and normalize”: each word’s estimated probability is its count divided by the total count.
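Count-and-normalize is a one-liner in practice. A sketch over a tiny hypothetical word list:

```python
from collections import Counter

# MLE for a multinomial: P(w) = count(w) / total count.

def multinomial_mle(words):
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

probs = multinomial_mle(["free", "money", "free", "now"])
```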

24 The Truth about MLE You will never see all the words. For many models, MLE isn’t safe. To understand why, consider a typical evaluation scenario.

25 Evaluation Train your model on some data. How good is the model? Test on different data that the system never saw before.  Why?

26 Tradeoff At one extreme, the model overfits the training data and doesn’t generalize; at the other, it has low variance but low accuracy.

27 Text categorization again Suppose ‘v1@gra’ never appeared in any document in training, ever. What probability does the MLE model assign to a new document containing ‘v1@gra’ at test time? (Zero: its count, and hence its estimated probability, is zero.)

28 Solutions Regularization  Prefer less extreme parameters Smoothing  “Flatten out” the distribution Bayesian Estimation  Construct a prior over model parameters, then train to maximize P(data | model) × P(model)
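The smoothing idea can be made concrete with add-one (Laplace) smoothing: pretend every vocabulary word was seen once more than it was, so unseen words keep nonzero probability. The vocabulary and counts below are hypothetical.

```python
from collections import Counter

# Add-one smoothing: P(w) = (count(w) + 1) / (total + |vocab|).

def smoothed_mle(words, vocab):
    counts = Counter(words)
    total = len(words) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

vocab = ["free", "money", "now", "v1@gra"]
probs = smoothed_mle(["free", "money", "free", "now"], vocab)
```

Here ‘v1@gra’ was never observed, yet it still gets probability 1/8 instead of zero.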

29 One More Point Building models is not the only way to be empirical: neural networks, SVMs, instance-based learning. MLE and smoothed/Bayesian estimation are not the only ways to estimate: minimize error, for example (“discriminative” estimation).

30 Assignment 3 Spam detection. We provide a few thousand examples. Perform EDA and pick features. Estimate probabilities. Build a Naive Bayes classifier.
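The assignment’s pieces fit together as follows. This is a minimal end-to-end sketch, not the assignment’s reference solution: the training documents and labels are hypothetical, and features are just word tokens.

```python
import math
from collections import Counter

# Train: estimate class priors and add-one-smoothed word probabilities.
def train(docs, labels):
    vocab = {w for d in docs for w in d}
    classes = set(labels)
    prior = {y: labels.count(y) / len(labels) for y in classes}
    cond = {}
    for y in classes:
        words = [w for d, l in zip(docs, labels) if l == y for w in d]
        counts = Counter(words)
        total = len(words) + len(vocab)  # add-one smoothing denominator
        cond[y] = {w: (counts[w] + 1) / total for w in vocab}
    return prior, cond, vocab

# Classify: pick the class maximizing log P(y) + sum_i log P(x_i | y).
def classify(doc, prior, cond, vocab):
    def score(y):
        return math.log(prior[y]) + sum(
            math.log(cond[y][w]) for w in doc if w in vocab)
    return max(prior, key=score)

docs = [["free", "money"], ["meeting", "today"], ["free", "offer"]]
labels = ["spam", "mail", "spam"]
prior, cond, vocab = train(docs, labels)
pred = classify(["free", "free"], prior, cond, vocab)
```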

