Introduction to Linear and Logistic Regression

- Basic Ideas
- Linear Transformation
- Finding the Regression Line
- Minimize sum of the quadratic residuals
- Curve Fitting
- Logistic Regression
- Odds and Probability

Basic Ideas

Jargon:
- IV = X = Predictor (plural: predictors)
- DV = Y = Criterion (plural: criteria)
- "Regression of Y on X"

Linear Model: the relation between IV and DV is represented by a straight line. A score on Y has two parts: (1) a linear function of X and (2) error. (These are population values.)

Basic Ideas (2)

Sample values:
- Intercept: the value of Y where X = 0
- Slope: the change in Y when X changes by 1 unit

If error is removed, we have a predicted value for each person at X (the line). Suppose on average houses are worth about 50 Euro per square meter. Then the equation relating price to size would be Y′ = 0 + 50X, and the predicted price for a 2000 square meter house would be 50 × 2000 = 100,000 Euro.

Linear Transformation

A 1-to-1 mapping of variables via a line. Permissible operations are addition and multiplication (interval data): add a constant, or multiply by a constant.

Linear Transformation (2)

Centigrade to Fahrenheit is a 1-to-1 map: 0 degrees C is 32 degrees F, and 100 degrees C is 212 degrees F. Intercept? Slope?

Intercept: 32. When X (Centigrade) is 0, Y (Fahrenheit) is 32.
Slope: 1.8. When Centigrade goes from 0 to 100 (run), Fahrenheit goes from 32 to 212 (rise), and 212 − 32 = 180. Then rise over run = 180/100 = 1.8 is the slope.
So Y = 32 + 1.8X, i.e., F = 32 + 1.8C.
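As a quick check of the F = 32 + 1.8C transformation, a minimal sketch:

```python
def celsius_to_fahrenheit(c):
    """Linear transformation: intercept 32, slope 1.8 (rise 180 / run 100)."""
    return 32 + 1.8 * c

print(celsius_to_fahrenheit(0))    # 32.0
print(celsius_to_fahrenheit(100))  # approximately 212
```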

Standard Deviation and Variance

The standard deviation is the square root of the variance, which is the sum of squared distances between each value and the mean, divided by the population size (finite population).
Example: 1, 2, 15. Mean = 6, σ ≈ 6.37.
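A minimal sketch of the computation for the example:

```python
import math

def population_sd(values):
    """Population standard deviation: square root of the mean squared
    deviation from the mean (divide by N, not N - 1)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

print(population_sd([1, 2, 15]))  # approximately 6.377 (the mean is 6)
```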

Correlation Analysis

The correlation coefficient (also called Pearson's product-moment coefficient) r_X,Y measures the linear relationship between X and Y:
- r_X,Y > 0: X and Y are positively correlated (Y's values increase as X's do); the higher the value, the stronger the correlation.
- r_X,Y = 0: uncorrelated (no linear relationship; note this does not by itself imply independence).
- r_X,Y < 0: negatively correlated.
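A minimal sketch of computing r from raw scores (the data points are made up for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # approximately 1.0  (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))  # approximately -1.0 (perfect negative)
```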

Regression of Weight on Height

N = 10. Ht: mean = 67, σ = 4.57. Wt: mean = 150. Correlation r = .94. Regression equation: Y′ = a + bX.

Predicted Values & Residuals

Each score has a linear part and an error part. Y′ is called the predicted value, and Y − Y′ is the residual (RS); the residual is the error. The mean of Y′ equals the mean of Y, and the variance of Y equals the variance of Y′ plus the variance of RS.
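These two facts can be checked numerically. A minimal sketch (the height/weight numbers here are made up for illustration, not the slide's actual table):

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def pvar(vs):
    """Population variance."""
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

xs = [60, 62, 64, 66, 68, 70, 72]            # heights (illustrative)
ys = [120, 130, 135, 150, 148, 170, 165]     # weights (illustrative)
a, b = fit_line(xs, ys)
pred  = [a + b * x for x in xs]              # Y'
resid = [y - p for y, p in zip(ys, pred)]    # RS = Y - Y'

print(abs(sum(pred) / len(pred) - sum(ys) / len(ys)))   # ~ 0: means match
print(abs(pvar(ys) - (pvar(pred) + pvar(resid))))       # ~ 0: var(Y) = var(Y') + var(RS)
```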

Finding the Regression Line

To find the line you need the correlation, the standard deviations, and the means of X and Y.

Slope: b = r_XY · (σ_Y / σ_X)
Intercept: a = mean_Y − b · mean_X

Suppose r_XY = .50, σ_X = .5, mean_X = 10, σ_Y = 2, mean_Y = 5. Then the slope is b = .50 × (2 / .5) = 2, and the intercept is a = 5 − 2 × 10 = −15.
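As a quick check, using the standard least-squares formulas b = r·σ_Y/σ_X and a = mean_Y − b·mean_X on the numbers given:

```python
r, sd_x, mean_x, sd_y, mean_y = 0.50, 0.5, 10, 2, 5

b = r * sd_y / sd_x        # slope = 0.50 * 2 / 0.5 = 2.0
a = mean_y - b * mean_x    # intercept = 5 - 2 * 10 = -15.0

print(b, a)  # 2.0 -15.0
```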

Line of Least Squares

Assuming a linear relation is reasonable, the two variables can be represented by a line. Where should the line go? Place the line so the errors (residuals) are small. The line we calculate has a sum of errors equal to 0, and a sum of squared errors that is as small as possible: it provides the smallest sum of squared errors, hence "least squares".

Minimize sum of the quadratic residuals

S(a, b) = Σᵢ (yᵢ − (a + b·xᵢ))²

Derivation: set the partial derivatives equal to 0.

∂S/∂a = −2 Σᵢ (yᵢ − a − b·xᵢ) = 0
∂S/∂b = −2 Σᵢ xᵢ (yᵢ − a − b·xᵢ) = 0

The coefficients a and b are found by solving the following system of linear equations (the normal equations):

n·a + (Σ xᵢ)·b = Σ yᵢ
(Σ xᵢ)·a + (Σ xᵢ²)·b = Σ xᵢyᵢ
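This 2×2 system can be solved directly, e.g. by Cramer's rule. A minimal sketch (the test points are made up from the line y = 3 + 2x):

```python
def fit_line_normal_equations(xs, ys):
    """Solve  n*a + b*Sx = Sy ;  a*Sx + b*Sxx = Sxy  by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = fit_line_normal_equations([0, 1, 2, 3], [3, 5, 7, 9])  # data from y = 3 + 2x
print(a, b)  # 3.0 2.0
```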

Curve Fitting

- Linear Regression
- Exponential Curve
- Logarithmic Curve
- Power Curve

For the nonlinear curves, the coefficients a and b are found by solving the same system of linear equations, after transforming the variables so that the model becomes linear.

The transformations that make each model linear:

- Linear Regression: y = a + b·x (no transformation needed)
- Exponential Curve: y = a·e^(b·x), so ln y = ln a + b·x
- Logarithmic Curve: y = a + b·ln x
- Power Curve: y = a·x^b, so ln y = ln a + b·ln x
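One of these linearizations, the exponential curve, can be sketched as follows: fit a straight line to (x, ln y), then exponentiate the intercept (the sample data are made up from y = 2·e^(0.5x)):

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b*x) by linear least squares on (x, ln y)."""
    lys = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(xs) / n, sum(lys) / n
    b = (sum((x - mx) * (ly - my) for x, ly in zip(xs, lys))
         / sum((x - mx) ** 2 for x in xs))
    ln_a = my - b * mx
    return math.exp(ln_a), b

xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.5 * x) for x in xs]   # exact data, so the fit recovers a and b
a, b = fit_exponential(xs, ys)
print(a, b)  # approximately 2.0 and 0.5
```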

Multiple Linear Regression

For z = a + b·x + c·y, the coefficients a, b and c are found by solving the following system of linear equations:

n·a + (Σ xᵢ)·b + (Σ yᵢ)·c = Σ zᵢ
(Σ xᵢ)·a + (Σ xᵢ²)·b + (Σ xᵢyᵢ)·c = Σ xᵢzᵢ
(Σ yᵢ)·a + (Σ xᵢyᵢ)·b + (Σ yᵢ²)·c = Σ yᵢzᵢ

Polynomial Regression

For y = a + b·x + c·x², the coefficients a, b and c are found by solving the following system of linear equations:

n·a + (Σ xᵢ)·b + (Σ xᵢ²)·c = Σ yᵢ
(Σ xᵢ)·a + (Σ xᵢ²)·b + (Σ xᵢ³)·c = Σ xᵢyᵢ
(Σ xᵢ²)·a + (Σ xᵢ³)·b + (Σ xᵢ⁴)·c = Σ xᵢ²yᵢ
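A minimal sketch of solving such a 3×3 system for a quadratic fit, using Gauss-Jordan elimination (the sample points are made up from y = 1 + 2x + 3x²):

```python
def fit_quadratic(xs, ys):
    """Fit y = a + b*x + c*x^2 by solving the 3x3 normal equations."""
    # Power sums: S[0] = n, S[1] = sum(x), ..., S[4] = sum(x^4)
    S = [sum(x ** k for x in xs) for k in range(5)]
    # Right-hand sides: T[k] = sum(y * x^k)
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix of the normal equations
    M = [[S[0], S[1], S[2], T[0]],
         [S[1], S[2], S[3], T[1]],
         [S[2], S[3], S[4], T[2]]]
    # Gauss-Jordan elimination (no pivoting; fine for well-behaved data)
    for i in range(3):
        piv = M[i][i]
        M[i] = [v / piv for v in M[i]]
        for j in range(3):
            if j != i:
                f = M[j][i]
                M[j] = [vj - f * vi for vj, vi in zip(M[j], M[i])]
    return M[0][3], M[1][3], M[2][3]

a, b, c = fit_quadratic([0, 1, 2, 3, 4], [1, 6, 17, 34, 57])  # y = 1 + 2x + 3x^2
print(a, b, c)  # approximately 1.0 2.0 3.0
```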

Logistic Regression

The dependent variable is binary (a categorical variable with two values, such as "yes" and "no") rather than continuous: the DV (Y) is either 0 or 1. For example, we might code a successfully kicked field goal as 1 and a missed field goal as 0; yes as 1 and no as 0; admitted as 1 and rejected as 0; or Cherry Garcia flavor ice cream as 1 and all other flavors as 0.

If we code like this, the mean of the distribution equals the proportion of 1s. For example, if there are 100 people in the distribution and 30 of them are coded 1, then the mean of the distribution is .30, which is the proportion of 1s. The mean of a binary distribution so coded is denoted P, the proportion of 1s. The proportion of zeros is (1 − P), sometimes denoted Q. The variance of such a distribution is PQ, and the standard deviation is √(PQ).
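A quick numeric check of these facts, using the 100-person example from the text:

```python
data = [1] * 30 + [0] * 70           # 100 observations, 30 coded 1

n = len(data)
P = sum(data) / n                    # mean = proportion of 1s = 0.3
Q = 1 - P                            # proportion of 0s = 0.7
variance = sum((d - P) ** 2 for d in data) / n

print(P)                             # 0.3
print(abs(variance - P * Q))         # ~ 0: variance equals P*Q = 0.21
```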

Suppose we want to predict whether someone is male or female (DV: M = 1, F = 0) using height in inches (IV). We could plot the relation between the two variables as we customarily do in regression; the plot might look something like this.

None of the observations (data points) fall on the regression line; they are all zero or one.

Predicted values (DV = Y) correspond to probabilities. If linear regression is used, the predicted values will become greater than one or less than zero if one moves far enough along the X-axis. Such values are theoretically inadmissible.

Linear vs. Logistic regression

Odds and Probability

The odds are P/(1 − P). Logistic regression models the log of the odds, ln(P/(1 − P)) = a + b·X, as a linear function of X: on the logit scale, it is linear regression!
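A minimal sketch of the odds/probability relationship and the logistic (inverse-logit) function that maps predictions back to admissible probabilities:

```python
import math

def odds(p):
    """Odds = P / (1 - P)."""
    return p / (1 - p)

def logit(p):
    """ln(odds): the quantity logistic regression models linearly as a + b*X."""
    return math.log(odds(p))

def inv_logit(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

print(odds(0.75))                 # 3.0  ("3 to 1")
print(inv_logit(logit(0.75)))     # approximately 0.75 (round trip)
```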

Summary:
- Basic Ideas
- Linear Transformation
- Finding the Regression Line
- Minimize sum of the quadratic residuals
- Curve Fitting
- Logistic Regression
- Odds and Probability