Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 7: Regression.


Linear Regression The underlying analysis is the same as that of correlation, but with a prediction of a causal relation. The results will be the same, but the regression framework is used when we anticipate, through theory or experimentation, that one variable will influence another. The framework can also be extended to analyze multiple causes and to separate their unique levels of influence on a dependent variable.

Breast Cancer and Solar Radiation Let’s return to our example of breast cancer rate as a function of solar radiation. Here, the direction of causality can be inferred, though without conducting an experiment, it cannot be proven.

Breast Cancer and Solar Radiation As in correlation, the regression line is the line that minimizes residuals (i.e., errors of prediction).

Fitting a Regression Line A linear function (i.e., a straight line) is defined by two parameters: the intercept a, which is the predicted value of Y when X = 0, and the slope b, which is the change in Y associated with a one-unit change in X. Y-hat (Ŷ) is the predicted value of Y estimated with the regression equation: Ŷ = a + bX.
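The prediction equation above can be sketched in a few lines of Python; the intercept and slope values here are hypothetical, chosen only to illustrate the two parameters.

```python
# A minimal sketch of the linear prediction equation Y-hat = a + b*X.
# The intercept and slope below are hypothetical, not fitted values.
a = 2.0   # intercept: predicted Y when X = 0
b = 0.5   # slope: change in Y per one-unit change in X

def predict(x):
    """Return the predicted value of Y (Y-hat) for a given X."""
    return a + b * x

print(predict(0))  # 2.0 (the intercept)
print(predict(4))  # 4.0
```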

Breast Cancer and Solar Radiation Here, the residuals are defined as the differences between the observed and predicted values: Y − Ŷ. To fit the line, we want to minimize errors, but because positive and negative errors cancel, their sum equals zero. So, we minimize the squared errors instead: Σ(Y − Ŷ)².

Calculating Regression Coefficients The formulas used to calculate the intercept and slope are derived from the criterion of minimizing the squared residuals. This approach is often termed OLS (Ordinary Least Squares) regression.
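As a sketch of the OLS criterion: the slope works out to the sum of cross-products of deviations divided by the sum of squared deviations in X, and the intercept forces the line through the point (mean of X, mean of Y). The small data set below is made up for illustration.

```python
# Sketch of the OLS formulas: the slope is the covariance of X and Y
# divided by the variance of X, and the intercept makes the line pass
# through (mean of X, mean of Y). Data are made up.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx = sum(x) / n
my = sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)

b = sxy / sxx     # slope
a = my - b * mx   # intercept

print(round(b, 4), round(a, 4))  # 0.6 2.2
```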

What’s the predicted cancer rate for an area with solar radiation of 425?
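The actual fitted coefficients for the breast-cancer data are not reproduced here, so the sketch below uses hypothetical intercept and slope values purely to show how the prediction would be computed.

```python
# Hypothetical coefficients, standing in for the fitted values from the
# breast-cancer data set (which are not shown here).
a = 42.0    # hypothetical intercept
b = -0.04   # hypothetical slope (rate falls as solar radiation rises)

rate_hat = a + b * 425  # predicted cancer rate at solar radiation = 425
print(rate_hat)
```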

Standardized Regression When we are working with standardized variables, both the calculations and the association with correlation become clearer. In this case, both variables are z-transformed into distributions with means of zero and standard deviations of 1.

Standardized Regression Here, the intercept and slope are referred to as alpha (α) and beta (β), respectively. Note that α = 0 and that β must range from −1 to 1, as in correlation; in fact, β = r. Note that β is in sd units. What does β = .25 mean? For every 1-sd change in X, the predicted Y score increases by .25 sd's.
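A quick numerical check of the claim that β = r: z-score both variables, fit the OLS slope, and compare it to the correlation. The data are made up, and `zscores` is a helper defined here, not a library function.

```python
# Check that with z-scored variables the OLS slope (beta) equals r and
# the intercept (alpha) is 0. Data are made up; zscores is a local helper.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

def zscores(v):
    m = sum(v) / len(v)
    sd = math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))
    return [(vi - m) / sd for vi in v]

zx, zy = zscores(x), zscores(y)

r = sum(u * w for u, w in zip(zx, zy)) / (n - 1)                     # correlation
beta = sum(u * w for u, w in zip(zx, zy)) / sum(u * u for u in zx)   # OLS slope on z-scores
alpha = 0.0  # the intercept of a standardized regression is always 0

print(round(r, 4), round(beta, 4))  # the two values match
```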

Accuracy of Prediction Simply fitting a regression line with a given intercept and slope provides little information about the accuracy of prediction: the points could be close to or far from the line. Note that when using standardized scores, distance from the line is a function of the slope. We need a measure of fit that is sensitive to the magnitude of the residuals.

Standard Error of the Estimate In arriving at a measure of fit, we can begin with the idea of a standard deviation. If we knew nothing about a person's score on X, the best guess for their score on Y would be the mean of Y, and the standard deviation of Y would provide a measure of the accuracy of that guess.

Standard Error of the Estimate If we instead predict Y from a person's X score using the regression equation, we can calculate deviations from the predicted values rather than from the mean. The resulting measure is the standard error of the estimate. Its square is the error variance: the portion of the total variance in Y not explained by scores on X.
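The standard error of the estimate can be sketched as the standard deviation of the residuals, using N − 2 in the denominator because two parameters (a and b) were estimated from the data. The data below are made up.

```python
# Sketch of the standard error of the estimate: the standard deviation
# of the residuals around the regression line, with N - 2 in the
# denominator (two parameters, a and b, were estimated). Data are made up.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ss_error = sum(e ** 2 for e in residuals)
se_est = math.sqrt(ss_error / (n - 2))  # standard error of the estimate

print(round(se_est, 4))  # 0.8944
```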

Squared Correlation Coefficient Following the preceding logic, r² can be interpreted as the proportion of the variance in Y explained by X, where SS denotes a sum of squared deviations: r² = (SS_total − SS_error) / SS_total.
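A sketch verifying the variance-explained interpretation on made-up data: r² computed as 1 − SS_error / SS_total.

```python
# Sketch of r^2 as the proportion of variance explained:
# r^2 = 1 - SS_error / SS_total. Data are made up.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

ss_total = sum((yi - my) ** 2 for yi in y)                          # deviations from mean of Y
ss_error = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))    # deviations from predictions
r_squared = 1 - ss_error / ss_total

print(round(r_squared, 4))  # 0.6
```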

Influence of Extreme Values Extreme values bias regression coefficients in the same manner as correlation coefficients: they pull the line of best fit with inordinate strength. (Applet)

Hypothesis Testing in Regression The null hypothesis is simply that the slope equals zero. This is equivalent to testing ρ = 0 in correlation, so if the correlation is significant, the slope must be as well. The actual significance of the slope is tested using a t distribution. The logic is similar to all hypothesis testing: we compare the magnitude of the slope (b) to its standard error (i.e., the variability of slopes drawn from a population in which the null is true).

Hypothesis Testing in Regression The formula to calculate the t value is t = b / s_b, where s_b is the standard error of the slope; the result is evaluated on N − 2 degrees of freedom. We then determine how likely it would be to find a slope as large as ours using a t distribution (similar to the normal distribution).
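A sketch of the slope test on made-up data, assuming the common formula SE_b = s_est / √(Σ(X − X̄)²), where s_est is the standard error of the estimate; the resulting t is compared to a t distribution with N − 2 degrees of freedom.

```python
# Sketch of the t test for the slope: t = b / SE_b, where
# SE_b = s_est / sqrt(sum of squared deviations of X), compared to a
# t distribution with N - 2 degrees of freedom. Data are made up.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

ss_error = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_est = math.sqrt(ss_error / (n - 2))  # standard error of the estimate
se_b = s_est / math.sqrt(sxx)          # standard error of the slope
t = b / se_b                           # compare to t with n - 2 df

print(round(t, 4))  # 2.1213
```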