Experimental Statistics - week 10
Chapter 11: Linear Regression and Correlation
Note: Homework due Thursday.



2  2-Factor with Repeated Measures -- Model

Sources of variation: type, subject within type, time, type-by-time interaction.

NOTES:
- type and time are both fixed effects in the current example
- we say "subject is nested within type"
- Expected Mean Squares given on page 1032

3  2-Factor Repeated Measures -- ANOVA Output

The GLM Procedure
Dependent Variable: conc

                              Sum of
Source            DF         Squares    Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total

R-Square   Coeff Var   Root MSE   conc Mean

Source          DF   Type III SS   Mean Square   F Value   Pr > F
type
subject(type)                                              <.0001
time                                                       <.0001
type*time                                                  <.0001

4  2-Factor Repeated Measures -- Expected Mean Squares

The GLM Procedure
Source          Type III Expected Mean Square
type            Var(Error) + 5 Var(subject(type)) + Q(type,type*time)
subject(type)   Var(Error) + 5 Var(subject(type))
time            Var(Error) + Q(time,type*time)
type*time       Var(Error) + Q(type*time)

Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: conc

Source   DF   Type III SS   Mean Square   F Value   Pr > F
* type
Error: MS(subject(type))
* This test assumes one or more other fixed effects are zero.

Source          DF   Type III SS   Mean Square   F Value   Pr > F
subject(type)                                              <.0001
time                                                       <.0001
type*time                                                  <.0001
Error: MS(Error)

5  NOTE: Since the time x type interaction is significant, and since these are fixed effects, we DO NOT test main effects; we compare cell means (using MSE).

Cell Means (C, T)

6  The write-up related to the SAS output should be something like the following. Note that even though we get a significant variance component due to subject (within group), I did not estimate the variance component itself. (I did not give this particular variance-component estimation formula.) Note also that since there is a significant interaction between the fixed effects type and time, we do not test the main effects.

7  Dealing with Normality / Equal-Variance Issues

Normalizing transformations:
- log
- square root
- Box-Cox transformations

Note: the normalizing transformations sometimes also produce variance stabilization.
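These transformations are easy to try numerically. A minimal sketch in Python (the course itself uses SAS; `scipy` and the right-skewed sample values below are assumptions for illustration only):

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed response values (illustration only)
y = np.array([1.2, 1.5, 2.1, 2.8, 3.9, 5.5, 8.1, 13.0])

log_y = np.log(y)     # log transformation (requires y > 0)
sqrt_y = np.sqrt(y)   # square-root transformation (requires y >= 0)

# Box-Cox: scipy estimates the power lambda by maximum likelihood
bc_y, lam = stats.boxcox(y)
print("estimated Box-Cox lambda:", round(lam, 3))
```

All three are monotone transformations, so they preserve the ordering of the data; they only pull in the long right tail.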

8  Nonparametric "ANOVA"

- Mann-Whitney U -- for comparing 2 samples
- Kruskal-Wallis test -- for comparing more than 2 samples
- Friedman's test -- nonparametric alternative to the randomized complete block / 1-factor repeated measures design
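All three tests are available in `scipy.stats`; a sketch in Python rather than SAS, with hypothetical group data:

```python
from scipy import stats

# Hypothetical measurements from three treatment groups (5 subjects each)
g1 = [12.1, 14.3, 11.8, 13.5, 12.9]
g2 = [15.2, 16.8, 14.9, 17.1, 15.5]
g3 = [13.0, 13.9, 12.5, 14.2, 13.3]

# Mann-Whitney U: compares 2 independent samples
u_stat, u_p = stats.mannwhitneyu(g1, g2, alternative="two-sided")

# Kruskal-Wallis: compares more than 2 independent samples
h_stat, h_p = stats.kruskal(g1, g2, g3)

# Friedman: nonparametric alternative to a randomized complete block /
# 1-factor repeated measures design (positions within the lists act as blocks)
chi2_stat, f_p = stats.friedmanchisquare(g1, g2, g3)

print(u_p, h_p, f_p)
```

With g2 clearly shifted above the other groups, all three p-values come out small here.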

Histogram -- displays the distribution of 1 variable.
Scatter Diagram (Scatterplot) -- displays the joint distribution of 2 variables; plots the data as "points" in the x-y plane.


Association Between Two Variables -- knowing one variable helps in predicting the other.
Linear Association -- our interest in this course -- points "swarm" about a line.
Correlation Analysis -- measures the strength of linear association.


14 (association)

Regression Analysis -- we want to predict the dependent variable
- response variable
using the independent variable
- explanatory variable
- predictor variable

Dependent Variable (Y)    Independent Variable (X)

More than one independent variable -- Multiple Regression

Correlation Analysis

Correlation Coefficient -- measures linear association

-1                       0                       +1
perfect negative         no linear               perfect positive
linear relationship      relationship            linear relationship

Positive Correlation -- high values of one variable are associated with high values of the other.
Examples:
- father's height, son's height
- daily grade, final grade
r = 0.93 for plot on the left

19  (scatterplot: Exams I and II)

Negative Correlation -- high with low, low with high.
Examples:
- car age, selling price
- days absent, final grade
r = for plot shown here


Zero Correlation -- no linear relationship.
Example:
- height, IQ score
r = 0.0 for plot here


(scatterplots for r = …, 0, .5, .99)


26 Calculating the Correlation Coefficient

27  Notation:

Sxx = Σ(xi - x̄)²    Syy = Σ(yi - ȳ)²    Sxy = Σ(xi - x̄)(yi - ȳ)

So --

r = Sxy / √(Sxx Syy)
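The formula r = Sxy / √(Sxx Syy) can be checked numerically. A minimal sketch in plain Python (the x, y values below are hypothetical, not the study-time data from the slides):

```python
import math

# Hypothetical (x, y) pairs
x = [2, 3, 5, 7, 8]
y = [50, 55, 63, 70, 78]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Sums of squares and cross-products about the means
s_xx = sum((xi - xbar) ** 2 for xi in x)
s_yy = sum((yi - ybar) ** 2 for yi in y)
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 4))  # close to +1: strong positive linear association
```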

28  Study Time (hours) (X) and Exam Score (Y)

The data below are the study times and the test scores on an exam given over the material covered during the two weeks. Find r.

29
DATA one;
  INPUT time score;
DATALINES;
;
PROC CORR;
  VAR score time;
  TITLE 'Study Time by Score';
RUN;
PROC PLOT;
  PLOT time*score;
RUN;
PROC GPLOT;
  PLOT time*score;
RUN;

30  Study Time by Score

The CORR Procedure
2 Variables: score time

Simple Statistics
Variable   N   Mean   Std Dev   Sum   Minimum   Maximum
score
time

Pearson Correlation Coefficients, N = 8
Prob > |r| under H0: Rho=0
          score   time
score
time

31  Plot of score*time. Legend: A = 1 obs, B = 2 obs, etc.
(Scatterplot of score versus time; scores range from 74 to 92.)


33  Testing Statistical Significance of the Correlation Coefficient

Test Statistic:   t = r √(n - 2) / √(1 - r²),   df = n - 2

Rejection Region:   t > t(α/2)  or  t < -t(α/2)
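The test statistic t = r √(n - 2) / √(1 - r²) with df = n - 2 is quick to compute. A sketch in Python (`scipy` assumed available; the r and n values below are hypothetical):

```python
import math
from scipy import stats

# Hypothetical values: sample correlation r computed from n pairs
r = 0.93
n = 8

df = n - 2
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # test statistic
t_crit = stats.t.ppf(1 - 0.025, df)               # t(alpha/2) for alpha = .05
p_value = 2 * stats.t.sf(abs(t), df)              # two-sided p-value

print(round(t, 3), round(t_crit, 3), p_value < 0.05)
```

Here t is far beyond the critical value, so H0: rho = 0 would be rejected at the .05 level.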

34  Correlation Between Study Time and Score

H0: There is no correlation between study time and score
Ha: There is a correlation between study time and score

Rejection Region / Test Statistic / Conclusion / P-value


36  Properties of Correlation

- Correlation measures the strength of the linear relationship between two variables.
- Correlation requires that both variables be quantitative.
- r does not change when we change the units of measurement of x, y, or both.
- Correlation makes no distinction between explanatory and response variables.
- The correlation coefficient is not resistant to outliers.
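The unit-invariance property is easy to verify numerically. A sketch with hypothetical height/weight data (Python, not SAS):

```python
import math

def corr(x, y):
    """Pearson correlation via Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    syy = sum((v - ybar) ** 2 for v in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

height_in = [64, 66, 68, 70, 72]       # inches
weight_lb = [120, 140, 155, 170, 186]  # pounds

r1 = corr(height_in, weight_lb)
# Same data after a change of units: inches -> cm, pounds -> kg
r2 = corr([h * 2.54 for h in height_in], [w * 0.4536 for w in weight_lb])

print(round(r1, 6), round(r2, 6))  # r is unit-free
```

Rescaling x and y by positive constants scales Sxy, Sxx, and Syy by matching factors, so they cancel in r.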

37  Math vs Reading Scores

The CORR Procedure
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
          math     reading
math               <.0001
reading   <.0001

38  Math vs Reading Scores with Outlier

The CORR Procedure
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
          math     reading
math
reading

39  Pearson Correlation Coefficients, N = 14
Prob > |r| under H0: Rho=0
          math     reading
math
reading

40  Pearson Correlation Coefficients, N = 14
Prob > |r| under H0: Rho=0
          math     reading
math
reading

41  (Scatterplot: Divorce Rate (per 1000) vs % in prison on Drug Offenses)

IMPORTANT NOTE: Correlation DOES NOT imply causation.
- Strong association between 2 variables is not enough to justify conclusions about cause and effect.
- The best way to get evidence that X causes Y is through a controlled experiment.

Regression Analysis


45  Goal of Regression Analysis: Predict Y from knowledge of X

For data such as the Father-Son data, it seems reasonable to assume a model of the form

μ(Y|x) = β0 + β1x

i.e. the conditional means of Y given x follow a straight line.

46  Alternative mathematical expression for the "regression model":

Y = β0 + β1x + ε

In practice, we want to estimate this line from the data.

Which line is "closest" to the points?

Criterion for measuring "closeness" --- the sum of squared vertical distances from the points to the line.
Regression (Least Squares) Line --- the line for which this sum of squared distances is a minimum.

55  Notation

Theoretical model:   y = β0 + β1x + ε
Regression line:     ŷ = b0 + b1x

56  Data:  (x1, y1), (x2, y2), …, (xn, yn)

For each observation we write  ŷi = b0 + b1xi  (the predicted value) and  ei = yi - ŷi  (the residual).

57  NOTE: Finding the b0 and b1 that minimize the sum of squared residuals Σ(yi - b0 - b1xi)²
- this is a calculus problem

58  Least Squares Estimates

b1 = Sxy / Sxx          b0 = ȳ - b1x̄

Computation formulas:

Sxy = Σxiyi - (Σxi)(Σyi)/n          Sxx = Σxi² - (Σxi)²/n
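The estimates b1 = Sxy/Sxx and b0 = ȳ - b1x̄ can be sketched directly. A minimal Python illustration with hypothetical data (not the exam data from the slides):

```python
# Hypothetical (x, y) data for illustration only
x = [2, 3, 5, 7, 8]
y = [50, 55, 63, 70, 78]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
s_xx = sum((xi - xbar) ** 2 for xi in x)
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx       # slope:     b1 = Sxy / Sxx
b0 = ybar - b1 * xbar  # intercept: b0 = ybar - b1 * xbar

y_hat = [b0 + b1 * xi for xi in x]  # fitted values on the regression line
print(round(b0, 3), round(b1, 3))
```

A useful sanity check on any least squares fit: the residuals yi - ŷi sum to zero (up to rounding).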

59  Study Time (hours) (X) and Exam Score (Y)

The data below are the study times and the test scores on an exam given over the material covered during the two weeks. Find the equation of the regression line for predicting exam score from study time.

60
PROC GLM;
  MODEL score=time;
RUN;

The GLM Procedure
Dependent Variable: score

                              Sum of
Source            DF         Squares   Mean Square   F Value   Pr > F
Model
Error
Corrected Total

R-Square   Coeff Var   Root MSE   score Mean

Source   DF   Type I SS     Mean Square   F Value   Pr > F
time

Source   DF   Type III SS   Mean Square   F Value   Pr > F
time

                          Standard
Parameter    Estimate     Error       t Value   Pr > |t|
Intercept                                       <.0001
time