
1 REGRESSION
Jennifer Kensler

2 Laboratory for Interdisciplinary Statistical Analysis (LISA)
LISA helps VT researchers benefit from the use of statistics. www.lisa.stat.vt.edu
Areas: Experimental Design, Data Analysis, Interpreting Results, Grant Proposals, Software (R, SAS, JMP, SPSS...)
Collaboration: From our website, request a meeting for personalized statistical advice. Great advice right now: meet with LISA before collecting your data.
Short Courses: Designed to help graduate students apply statistics in their research.
Walk-In Consulting: Monday to Friday, 12-2 PM, for questions requiring <30 mins.
All services are FREE for VT researchers. We assist with research, not class projects or homework.

3 TOPICS
Simple Linear Regression
Multiple Linear Regression
Regression with Categorical Variables

4 TYPES OF STATISTICAL ANALYSES
Explanatory variable(s) may be Categorical, Continuous, or Categorical & Continuous.
Categorical response variable: Contingency Table or Logistic Regression
Continuous response variable: ANOVA (categorical explanatory), Regression (continuous explanatory), ANCOVA or Regression with categorical variables (categorical & continuous explanatory)

5 SIMPLE LINEAR REGRESSION

6 Simple Linear Regression (SLR) is used to model the relationship between two continuous variables. Scatterplots are used to graphically examine the relationship between two quantitative variables. (Sullivan, pg. 193)

7 TYPES OF RELATIONSHIPS BETWEEN TWO CONTINUOUS VARIABLES
Positive and negative linear relationships

8 TYPES OF RELATIONSHIPS BETWEEN TWO CONTINUOUS VARIABLES
Curved Relationship; No Relationship

9 CORRELATION

10 PROPERTIES OF THE CORRELATION COEFFICIENT
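
The two correlation slides survive in this transcript only as titles. As a hedged illustration, assuming they refer to the usual sample Pearson correlation coefficient r, the sketch below computes r from scratch. The function name `pearson_r` and all data values are made up for illustration and do not come from the presentation.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data: a perfect positive line, a perfect negative line,
# and a symmetric curve with no linear trend.
x = [-2, -1, 0, 1, 2]
r_positive = pearson_r(x, [2 * xi + 1 for xi in x])
r_negative = pearson_r(x, [-3 * xi for xi in x])
r_curved = pearson_r(x, [xi ** 2 for xi in x])
print(r_positive, r_negative, r_curved)
```

The curved example has a strong relationship but a correlation of zero, which is why a scatterplot should accompany any correlation: r measures only linear association.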

11 SIMPLE LINEAR REGRESSION
Can we describe the behavior between the two variables with a linear equation? The variable on the x-axis is often called the explanatory or predictor variable. The variable on the y-axis is called the response variable.

12 SIMPLE LINEAR REGRESSION
Objectives of Simple Linear Regression:
Determine the significance of the predictor variable in explaining variability in the response variable (e.g., is per capita GDP useful in explaining the variability in life expectancy?).
Predict values of the response variable for given values of the explanatory variable (e.g., if we know the per capita GDP, can we predict life expectancy?).
Note: The predictor variable does not necessarily cause the response.

13 SIMPLE LINEAR REGRESSION MODEL

14 SLR ESTIMATION OF PARAMETERS

15 THE RESIDUAL
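
The formulas on the model, estimation, and residual slides did not survive extraction. A minimal sketch follows, assuming the standard textbook setup: the model Y_i = b0 + b1 * X_i + error, least-squares estimates b1 = Sxy / Sxx and b0 = y_bar - b1 * x_bar, and residual e_i = y_i - (b0 + b1 * x_i). The function names and the data are hypothetical; the data are not the muscle mass data used later in the presentation.

```python
def fit_slr(x, y):
    """Return (b0, b1): least-squares intercept and slope for y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


def predict(b0, b1, x_new):
    """Fitted (predicted) response at a given value of the predictor."""
    return b0 + b1 * x_new


def residuals(x, y, b0, b1):
    """Observed minus fitted response for every observation."""
    return [yi - predict(b0, b1, xi) for xi, yi in zip(x, y)]


# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_slr(x, y)
e = residuals(x, y, b0, b1)
print(f"intercept={b0:.3f} slope={b1:.3f} sum of residuals={sum(e):.2e}")
```

With an intercept in the model, the least-squares residuals always sum to zero (up to rounding), which is a quick sanity check on any hand-rolled fit.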

16 SIMPLE LINEAR REGRESSION ASSUMPTIONS

17 DIAGNOSTICS: RESIDUAL PLOT
A residual plot is used to check the assumption of constant variance and to check model fit (is a line a good fit?). Good residual plot: no pattern.

18 DIAGNOSTICS
Left: Residuals show non-constant variance. Right: Residuals show non-linear pattern.

19 DIAGNOSTICS: NORMAL QUANTILE PLOT
Left: Residuals are not normal. Right: Normality assumption appropriate.
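
The normal quantile plots themselves are not part of this transcript. As a rough sketch of what such a plot displays, the code below pairs each sorted residual with a theoretical standard normal quantile. It assumes the plotting positions (i - 0.5) / n, which is one common convention; JMP and other packages may use a different one. The function name and the residual values are made up.

```python
from statistics import NormalDist


def normal_quantile_pairs(residuals):
    """Pair each sorted residual with a theoretical standard normal quantile.

    Assumes plotting positions (i - 0.5) / n for i = 1..n; other software
    may use a slightly different convention.
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one residual")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(residuals)
    return [
        (std_normal.inv_cdf((i - 0.5) / n), r)
        for i, r in enumerate(ordered, start=1)
    ]


# Made-up residuals for illustration.
pairs = normal_quantile_pairs([0.3, -1.2, 0.0, 1.1, -0.4])
for theoretical, observed in pairs:
    print(f"{theoretical:7.3f}  {observed:6.2f}")
```

If the residuals are roughly normal, the printed pairs fall close to a straight line when plotted; systematic curvature suggests the normality assumption is questionable.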

20 ANOVA TABLE FOR SIMPLE LINEAR REGRESSION
Columns: Source | SS | df | MS | F | P-value
Degrees of freedom by source: Regression, 1; Error, n-2; Total, n-1

21 TEST FOR PARAMETERS

22 COEFFICIENT OF DETERMINATION
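
Only the degrees of freedom of the ANOVA table survive in this transcript, and the test and R-squared slides are titles only. The sketch below fills in the arithmetic under standard textbook assumptions: SS Total = SS Regression + SS Error, MS = SS / df, F = MS Regression / MS Error, t = b1 / se(b1) with se(b1) = sqrt(MS Error / Sxx), and R-squared = SS Regression / SS Total. The function name and data are hypothetical. P-values are left out because the standard library has no t or F distribution.

```python
import math


def slr_anova(x, y):
    """ANOVA quantities, slope t statistic and R^2 for simple linear regression."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ssr = sum((f - y_bar) ** 2 for f in fitted)            # regression SS
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # error SS
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total SS
    msr = ssr / 1          # regression df = 1
    mse = sse / (n - 2)    # error df = n - 2
    if mse == 0:
        raise ValueError("perfect fit: F and t statistics are undefined")
    return {
        "ssr": ssr, "sse": sse, "sst": sst,
        "df": (1, n - 2, n - 1),
        "msr": msr, "mse": mse,
        "F": msr / mse,
        "t": b1 / math.sqrt(mse / sxx),   # test of H0: slope = 0
        "r2": ssr / sst,
    }


# Made-up illustrative data.
table = slr_anova([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
for name, value in table.items():
    print(name, value)
```

In simple linear regression the F test and the two-sided t test for the slope are equivalent, so the squared t statistic equals F.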

23 MUSCLE MASS EXAMPLE
A nutritionist randomly selected 15 women from each ten-year age group, beginning with age 40 and ending with age 79. The nutritionist recorded the age and muscle mass of each woman. The nutritionist would like to fit a model to explore the relationship between age and muscle mass. (Kutner et al., pg. 36)

24 JMP: MAKING A SCATTERPLOT
To analyze the data, click Analyze and then select Fit Y by X.

25 JMP: MAKING A SCATTERPLOT
As shown below: Y, Response: Muscle Mass; X, Factor: Age

26 JMP: SCATTERPLOT
This results in a scatterplot.

27 JMP: SIMPLE LINEAR REGRESSION
To perform the simple linear regression, click on the Red Arrow and then select Fit Line.

28 SIMPLE LINEAR REGRESSION RESULTS
The results on the right are displayed.

29 JMP: DIAGNOSTICS
Click on the Red Arrow next to Linear Fit and select Plot Residuals.

30 DIAGNOSTIC PLOTS
The plots to the right are then added to the JMP output.

31 MULTIPLE LINEAR REGRESSION

32 MULTIPLE LINEAR REGRESSION
Similar to simple linear regression, except now there is more than one explanatory variable.
Example: Body fat can be difficult to measure. A researcher would like to come up with a model that uses the more easily obtained measurements of triceps skinfold thickness, thigh circumference, and midarm circumference to predict body fat. (Kutner et al., pg. 256)

33 FIRST ORDER MULTIPLE LINEAR REGRESSION MODEL

34 MULTIPLE LINEAR REGRESSION ANOVA TABLE
Columns: Source | SS | df | MS | F | P-value
Degrees of freedom by source: Regression, p-1; Error, n-p; Total, n-1

35 COEFFICIENT OF MULTIPLE DETERMINATION

36 ASSUMPTIONS OF MULTIPLE LINEAR REGRESSION
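
The first-order model and the coefficient of multiple determination appear here as titles only. The sketch below assumes the standard first-order model Y_i = b0 + b1 * X_i1 + ... + b(p-1) * X_i,p-1 + error, with p parameters including the intercept (which matches the p-1 and n-p degrees of freedom in the ANOVA table), and R-squared = 1 - SS Error / SS Total. It estimates the coefficients by solving the normal equations (X'X) b = X'y in plain Python. All function names and data are hypothetical; in practice a statistics package would be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(rows, y):
    """Least-squares coefficients [b0, b1, ...] for predictor rows and response y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 is the intercept column
    p = len(design[0])
    xtx = [[sum(d[j] * d[k] for d in design) for k in range(p)] for j in range(p)]
    xty = [sum(d[j] * yi for d, yi in zip(design, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


def fitted_values(rows, coef):
    """Fitted response for each row of predictors."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def r_squared(rows, y, coef):
    """Coefficient of multiple determination, 1 - SSE / SSTO."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted_values(rows, coef)))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / ssto


# Made-up data: two predictors, response roughly 1 + 2*x1 - 3*x2 plus noise.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1.1, 2.9, -2.2, 0.1, 2.0, -2.9]
coef = fit_mlr(rows, y)
print(coef, r_squared(rows, y, coef))
```

Solving the normal equations directly is fine for a small illustration, but it can be numerically fragile when predictors are highly correlated, which ties in with the multicollinearity issue the presentation lists later.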

37 COMMERCIAL RENTAL RATES
A real estate company would like to build a model to help clients make decisions about properties. The company has information about rental rate (Y), age (X1), operating expenses and taxes (X2), vacancy rates (X3), and total square footage (X4). The information is regarding luxury real estate in a specific location. (Kutner et al., pg. 251)

38 JMP: COMMERCIAL RENTAL RATES
First, examine the data. Click Analyze, then Multivariate Methods, then Multivariate.

39 JMP: SCATTERPLOT MATRIX
For Y, Columns enter Y, X1, X2, X3 and X4. Then click OK.

40 JMP: CORRELATIONS AND SCATTERPLOT MATRIX

41 JMP: FITTING THE REGRESSION MODEL
Click Analyze and then select Fit Model.

42 JMP: FITTING THE REGRESSION MODEL
Set Y as the response (Y: Y). Highlight X1, X2, X3 and X4 and click Add. Then click Run.

43 FITTING THE MODEL
Examining the parameter estimates, we see that X3 is not significant. Fit a new model, this time omitting X3.

44 SOME JMP OUTPUT

45 JMP: CHECKING ASSUMPTIONS
Included output. Need residuals: click the red arrow next to Y Response → Save Columns → Residuals

46 JMP: CHECK NORMALITY ASSUMPTION
Analyze → Distribution → Y, Columns: Residual Y. Click the red arrow next to Distribution Residual Y and select Normal Quantile Plot.

47 JMP: CHECKING RESIDUALS VS. INDEPENDENT VARIABLES
Analyze → Fit Y by X → Y, Response: Residual Y; X, Factor: X1, X2, X4

48 OTHER MULTIPLE LINEAR REGRESSION ISSUES
Outliers
Higher Order Terms
Interaction Terms
Multicollinearity
Model Selection

49 REGRESSION WITH CATEGORICAL VARIABLES

50 REGRESSION WITH CATEGORICAL VARIABLES
Sometimes there are categorical explanatory variables that we would like to incorporate into our model. Suppose we would like to model the profit or loss of banks last year based on bank size and type of bank (commercial, mutual savings, or savings and loan). (Kutner et al., pg. 340)

51 REGRESSION MODEL WITH CATEGORICAL VARIABLES
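
The model on this slide is not in the transcript. A common way to bring a categorical predictor into a regression, and the one assumed here, is indicator (dummy) coding: a variable with c levels becomes c-1 columns of 0/1 values, with the remaining level acting as the baseline. The sketch below codes the three bank types from the example above. Choosing "commercial" as the baseline, the function name, and the bank sizes are all assumptions made for illustration.

```python
def indicator_columns(values, baseline):
    """Code a categorical variable as 0/1 indicator columns.

    Returns (levels, rows): the non-baseline levels in sorted order and,
    for each observation, one 0/1 entry per non-baseline level.
    """
    if baseline not in values:
        raise ValueError("baseline level does not occur in the data")
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Made-up observations: bank type and bank size.
bank_type = ["commercial", "mutual savings", "savings and loan", "commercial"]
bank_size = [120.0, 85.0, 40.0, 200.0]

levels, indicators = indicator_columns(bank_type, baseline="commercial")

# Design matrix rows: intercept, size, then one indicator per non-baseline type.
design = [[1.0, size] + ind for size, ind in zip(bank_size, indicators)]
for row in design:
    print(row)
```

Under this coding, each indicator coefficient is the estimated difference in mean response between that bank type and the baseline type, holding bank size fixed. Only c-1 indicators are used because a full set of c would be linearly dependent with the intercept column.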

52 REGRESSION WITH CATEGORICAL VARIABLES
A school district would like to determine if a new reading program improves student reading. The school district is also interested in the effect of days absent on reading improvement. Approximately half the students are assigned to the treatment group (new reading program) and half to the control group (traditional method). The students are tested at the beginning and end of the school year, and the change in their score is recorded.

53 JMP INSTRUCTIONS
Analyze → Fit Model. Y: Score Change. Add: Treatment, Days Absent. Run Model.
Response Score Change → Estimates → Show Prediction Expression

54 JMP OUTPUT
Treatment and days absent had significant effects on improvement.

55 DIAGNOSTICS: CONSTANT VARIANCE
Residual by Predicted plot produced automatically.

56 DIAGNOSTICS: CONSTANT VARIANCE
Residual by Factor Plots
First save residuals: Response Score Change → Save Columns → Residuals
Produce plots: Analyze → Fit Y by X → Y, Response: Residuals Score Change; X, Factor: Treatment, Days Absent

57 DIAGNOSTICS: NORMALITY
Analyze → Distribution → Y, Columns: Residual Score Change

58 CONCLUSIONS
Simple linear regression allows us to find the best-fit line between a continuous explanatory variable and a continuous response variable.
Multiple linear regression allows us to explore the relationship between a continuous response variable and multiple explanatory variables. (It also allows higher order terms to be introduced.)
Regression with categorical variables allows us to incorporate categorical predictor variables into the model.

59 SAS, SPSS AND R
For information about using SAS, SPSS and R to do regression:
http://www.ats.ucla.edu/stat/sas/topics/regression.htm
http://www.ats.ucla.edu/stat/spss/topics/regression.htm
http://www.ats.ucla.edu/stat/r/sk/books_pra.htm

60 REFERENCES
Michael Sullivan III. Statistics: Informed Decisions Using Data. Upper Saddle River, New Jersey: Pearson Education, 2004.
Michael H. Kutner, Christopher J. Nachtsheim, John Neter and William Li. Applied Linear Statistical Models. New York: McGraw-Hill Irwin, 2005.

