LECTURE 06: CLASSIFICATION PT. 2 February 10, 2016 SDS 293 Machine Learning
Outline
- Motivation
- Bayes classifier
- K-nearest neighbors
- Logistic regression
  - Logistic model
  - Estimating coefficients with maximum likelihood
  - Multivariate logistic regression
  - Multiclass logistic regression
  - Limitations
- Linear discriminant analysis (LDA)
  - Bayes' theorem
  - LDA on one predictor
  - LDA on multiple predictors
- Comparing classification methods
Flashback: LR vs. KNN for regression

Linear regression (parametric): we assume an underlying functional form for f(X)
Pros:
- Coefficients have simple interpretations
- Easy to do significance testing
Cons:
- Wrong about the functional form → poor performance

K-nearest neighbors (non-parametric): no explicit assumptions about the form of f(X)
Pros:
- Doesn't require knowledge about the underlying form
- More flexible approach
Cons:
- Can accidentally "mask" the underlying function

Could a parametric approach work for classification?
Example: default dataset

  #    | default | student | balance | income
  1    | No      | No      | $729.52 | $44,…
  2    | No      | Yes     | $817.18 | $12,…
  ⋮    | ⋮       | ⋮       | ⋮       | ⋮
  1503 | Yes     | …       | …       | …
  ⋮    | ⋮       | ⋮       | ⋮       | ⋮

Want to estimate: p(default = Yes | balance)
Example: default dataset

How should we model p(X) = Pr(default = Yes | balance)?

We could try a linear approach:

    p(X) = \beta_0 + \beta_1 X

Then we could just use linear regression!
Example: default dataset

[Figure: default data (No/Yes) with a straight-line fit — the fitted "probability" falls below 0 for small balances]

What's the problem?
The problem with linear models

The true probability of default must lie in [0, 1], but our fitted line extends beyond that range.

This behavior isn't unique to this dataset: whenever we fit a straight line to data, we can always (in theory) predict arbitrarily large or small values.

Question: what do we need to fix it?
Logistic regression

Answer: we need a function that gives outputs in [0, 1] for ALL possible predictor values.

Lots of possible functions meet this criterion (like what?)

In logistic regression, we use the logistic function:

    p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}

where e is the base of the natural logarithm.
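As a quick sanity check, here is a minimal Python sketch of the logistic function (the coefficient values are made up purely for illustration, not fitted values):

    import numpy as np

    def logistic(x, b0, b1):
        # Maps any real-valued input to a probability strictly between 0 and 1
        return np.exp(b0 + b1 * x) / (1 + np.exp(b0 + b1 * x))

    b0, b1 = -10.0, 0.005  # hypothetical coefficients, for illustration only
    balances = np.array([0, 500, 1000, 2000, 3000])
    print(logistic(balances, b0, b1))  # every output lies in (0, 1)

No matter how extreme the input, the output never leaves (0, 1) — exactly the fix the previous slide asked for.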
Example: default dataset

[Figure: logistic fit of p(default = Yes | balance) — an S-shaped curve that stays between 0 (No) and 1 (Yes)]
Wait… why a logistic model?
Rearranging the logistic model:

    \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}

The left-hand side, p(happens) / p(doesn't happen), is called the "odds". Taking logs:

    \log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X

The left-hand side is now the "log odds", a.k.a. the logit — and it is linear in X.
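A quick worked example (the probability is made up for illustration): if p = 0.2, then

    \text{odds} = \frac{0.2}{1 - 0.2} = \frac{1}{4}, \qquad \text{log odds} = \log\frac{1}{4} \approx -1.386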
Estimating coefficients

In linear regression, we used least squares to estimate our coefficients: choose \beta_0, \beta_1 to minimize

    RSS = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

What's the intuition here?
Estimating coefficients

In logistic regression, we want coefficients \hat{\beta}_0 and \hat{\beta}_1 that yield:
- values close to 1 (high probability) for observations in the class
- values close to 0 (low probability) for observations not in the class

We can formalize this intuition mathematically using a likelihood function:

    \ell(\beta_0, \beta_1) = \prod_{i : y_i = 1} p(x_i) \prod_{i' : y_{i'} = 0} \left( 1 - p(x_{i'}) \right)

We want the coefficients that maximize this function.
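In practice we maximize the log of the likelihood, which turns the products into sums. A minimal sketch in Python (function and variable names are illustrative):

    import numpy as np

    def log_likelihood(b0, b1, x, y):
        # p(x_i) under the logistic model for candidate coefficients b0, b1
        p = np.exp(b0 + b1 * x) / (1 + np.exp(b0 + b1 * x))
        # log of: [prod over y_i=1 of p(x_i)] * [prod over y_i=0 of (1 - p(x_i))]
        return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

A fitting routine (e.g., Newton's method) then searches for the (b0, b1) that make this quantity as large as possible.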
Fun fact

Charnes, A., E. L. Frome, and P.-L. Yu. "The equivalence of generalized least squares and maximum likelihood estimates in the exponential family." Journal of the American Statistical Association (1976).
Back to our default example
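As a sketch, the fit can be reproduced in Python with statsmodels (this assumes a CSV export of the ISLR Default data; the file name and column names are assumptions):

    import pandas as pd
    import statsmodels.api as sm

    default = pd.read_csv("Default.csv")             # assumed export of ISLR's Default data
    y = (default["default"] == "Yes").astype(int)    # 1 = defaulted, 0 = did not
    X = sm.add_constant(default["balance"])          # intercept + single predictor

    model = sm.Logit(y, X).fit()
    print(model.summary())  # coefficient estimates, standard errors, z-stats, p-values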
Making predictions

Once we have estimates \hat{\beta}_0 and \hat{\beta}_1, we plug them (and a value of X) back into the logistic function:

    \hat{p}(X) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 X}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 X}}
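Continuing the statsmodels sketch from the previous slide (the balance values here are made up for illustration):

    import pandas as pd

    # `model` is the fitted Logit object from the previous slide
    new = pd.DataFrame({"const": 1.0, "balance": [1000.0, 2000.0]})
    print(model.predict(new))  # predicted default probabilities, each in (0, 1)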
Example: default dataset

[Figure: the fitted logistic curve for p(default = Yes | balance), plotted over the data (No = 0, Yes = 1)]
Example: default dataset

  #    | default | student | balance | income
  1    | No      | No      | $729.52 | $44,…
  2    | No      | Yes     | $817.18 | $12,…
  ⋮    | ⋮       | ⋮       | ⋮       | ⋮
  1503 | Yes     | …       | …       | …
  ⋮    | ⋮       | ⋮       | ⋮       | ⋮

How do we handle qualitative predictors?
Flashback: qualitative predictors

Recode the categorical variable as a 0/1 dummy variable:

    ({student: "yes"}, {student: "yes"}, {student: "no"}, …)
        ↓
    ({student[yes]: 1}, {student[yes]: 1}, {student[yes]: 0}, …)
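The same recoding in pandas, as a minimal sketch on toy data:

    import pandas as pd

    df = pd.DataFrame({"student": ["yes", "yes", "no"]})
    dummies = pd.get_dummies(df["student"], prefix="student", drop_first=True)
    print(dummies)  # a single 0/1 column, student_yes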
Qualitative predictors: default
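A sketch of fitting default on the student dummy alone, continuing the earlier statsmodels setup (`default` and `y` as before):

    import statsmodels.api as sm

    student_yes = (default["student"] == "Yes").astype(int).rename("student_yes")
    stu_model = sm.Logit(y, sm.add_constant(student_yes)).fit()
    print(stu_model.params)  # intercept and the student coefficient

In ISLR's version of this fit, the student coefficient comes out positive — students look riskier when considered alone. Keep that in mind for the confounding discussion coming up.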
Multivariate logistic regression

With p predictors, we model the log odds as a linear function of all of them:

    \log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p
Multivariate logistic regression: default
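A sketch of the full fit with all three predictors, continuing the same assumed setup:

    import statsmodels.api as sm

    X_full = default[["balance", "income"]].copy()
    X_full["student_yes"] = (default["student"] == "Yes").astype(int)
    full_model = sm.Logit(y, sm.add_constant(X_full)).fit()
    print(full_model.params)

With balance and income held in the model, the sign of the student coefficient flips relative to the student-only fit — which is exactly the puzzle the next slides unpack.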
What's going on here?

[Figure: default rate as a function of balance, and the distribution of balances, broken out for students vs. non-students]
What’s going on here? Students tend to have a higher balance than non-students Higher balances are associated with higher default rates (regardless of student status) But if we hold balance constant, a student is actually lower risk than a non-student! This phenomenon is known as “confounding”
Lab: Logistic Regression

To do today's lab in R: nothing special
To do today's lab in Python: statsmodels

Instructions and code:

The full version can be found beginning on p. 156 of ISLR.
Recap: binary classification

So far, logistic regression gives us a way to model a two-class (binary) response.
What if we have multiple classes?
Multiclass logistic regression The approach we discussed today has a straightforward extension to more than two classes R will build you multiclass logistic models, no problem In reality: they’re almost never used Want to know why? Come to class tomorrow
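For the curious, a sketch of a multiclass fit using statsmodels' multinomial logit, MNLogit (toy, randomly generated data — illustrative only):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=(300, 2))
    # Toy labels in {0, 1, 2}, constructed only so all three classes appear
    labels = (x[:, 0] + x[:, 1] > 0).astype(int) + (x[:, 0] > 1).astype(int)

    mn_model = sm.MNLogit(labels, sm.add_constant(x)).fit()
    print(mn_model.params)  # one column of coefficients per non-baseline class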
Coming up

Wednesday: Linear discriminant analysis
- Bayes' theorem
- LDA on one predictor
- LDA on multiple predictors

A1 due tonight by 11:59pm: moodle.smith.edu/mod/assign/view.php?id=90237
A2 posted, due Feb. 22nd by 11:59pm
Backup: logit derivation
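Starting from the logistic model and solving for the log odds:

    p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
    \quad\Rightarrow\quad
    1 - p(X) = \frac{1}{1 + e^{\beta_0 + \beta_1 X}}

Dividing the two and taking logs:

    \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}
    \quad\Rightarrow\quad
    \log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X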