STAT E-100 Section 2—Monday, September 23, 2013 5:30-6:30PM, SC113
Housekeeping: Homework assignments are due at the beginning of each class (5:30PM on Tuesday). Comments on homework. If you have any questions about the homework, email me at email@example.com@fas.harvard.edu. The section review can be found on the course website under the “Sections” tab.
General Questions: General questions? Questions about HW assignments 1 or 2? Questions about SPSS?
Sample Question 2 A survey was given to a section of Stat 104 students last fall. It asked the students to report their self-recorded heart rate and the number of hours they exercised per week. Here are the histograms of the 2 variables.
Sample Question 2 (cont’d) The first (exercise) is skewed to the right, with much of the data falling between 0 and 10 hours. The second (heartrate) is slightly skewed to the left, with data falling more evenly on both sides of the mean (unlike the first).
Sample Question 2 (cont’d) Below is the scatterplot of y = heart rate vs. x = exercise. b) What do you think the correlation coefficient is between these 2 variables? Is the coefficient positive or negative? In other words, when there is an increase in hours/wk of exercise, what happens to heartrate?
Sample Question 2 (cont’d) Here is some SPSS output to describe the relationship between the variables exercise and heartrate:
Sample Question 2 (cont’d) c) What would be the correlation between heartrate and exercise if exercise were measured in minutes per week instead of hours? We can convert this in SPSS via the Transform->Compute Variable menu option (see the original SPSS handout for more information). The SPSS output is below:
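The SPSS output for part c) is not reproduced in this transcript, but the underlying idea can be checked by hand: multiplying x by a positive constant (here 60) leaves the correlation unchanged. The sketch below uses a small made-up data set, not the survey data, and the helper name `pearson_r` is only illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (NOT the Stat 104 survey data).
exercise_hours = [0, 2, 3, 5, 8, 10, 12]
heartrate = [78, 74, 72, 70, 66, 65, 60]

# Same exercise variable, re-expressed in minutes per week.
exercise_minutes = [60 * h for h in exercise_hours]

r_hours = pearson_r(exercise_hours, heartrate)
r_minutes = pearson_r(exercise_minutes, heartrate)
print(r_hours, r_minutes)  # the two values agree up to rounding error
```

Because r is computed from standardized values, the factor of 60 cancels in the numerator and denominator.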
Sample Question 2 (cont’d) d) What is the equation for the best fit line for this data? Remember, we use y = b0 + b1x format, with b0 being the y-intercept (i.e. what the value of y is when x=0, visually seen as when the line crosses the y axis) and b1 being the slope of the simple linear regression equation.
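As a rough sketch of what the regression output reports, the least squares slope and intercept can be computed directly from the data: b1 is the sum of cross-deviations divided by the sum of squared x-deviations, and b0 = mean(y) - b1 * mean(x). The data and the function name `fit_line` below are made up for illustration and are not the survey values.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1

# Hypothetical, perfectly linear data: y drops by 1 for each extra unit of x.
xs = [0, 1, 2, 3, 4]
ys = [70, 69, 68, 67, 66]

b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

With real, noisy data the points will not sit exactly on the line, but the same two formulas apply.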
Sample Question 2 (cont’d) e) What would be the equation for the least squares line between heartrate and exercise if exercise were measured in minutes per week? Using the new variable we computed and re-running the regression, we see that:
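The re-run SPSS output is not included in this transcript. As a hedged check, the minutes-based line can be derived from the hours-based coefficients used in part f) (b0 = 72.998, b1 = -0.883), assuming those are the hours-based fit: the intercept stays the same and the slope is divided by 60.

```latex
\begin{align*}
x_{\text{min}} &= 60\,x_{\text{hr}} \\
\hat{y} &= b_0 + b_1 x_{\text{hr}} = b_0 + \frac{b_1}{60}\,x_{\text{min}} \\
\hat{y} &= 72.998 + \frac{-0.883}{60}\,x_{\text{min}} \approx 72.998 - 0.0147\,x_{\text{min}}
\end{align*}
```

The predictions themselves do not change; only the units attached to the slope do.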
Sample Question 2 (cont’d) f) What is the estimated heartrate for a person who exercises 10 hours per week? How would this change if this person exercised an additional 5 hours per week?
ŷ = b0 + b1x
ŷ = 72.998 + (-0.883)(10) = 72.998 - 8.83 = 64.168, the predicted heartrate from our equation for a person who exercises 10 hours per week
ŷ = 72.998 + (-0.883)(15) = 72.998 - 13.245 = 59.753, the predicted heartrate at 15 hours per week
The additional 5 hours lowers the predicted heartrate by 5 × 0.883 = 4.415. Note that the hours belong to x only; ŷ is a heartrate, not a number of hours.
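The arithmetic in part f) can be reproduced with a few lines of code. This is only a sketch: the coefficients are the ones quoted in the slide, and the function name `predict_heartrate` is made up for illustration.

```python
B0 = 72.998   # intercept quoted in part f)
B1 = -0.883   # slope quoted in part f): change in heartrate per extra hour/week

def predict_heartrate(hours_per_week):
    """Predicted heartrate from the fitted line y-hat = B0 + B1 * x."""
    return B0 + B1 * hours_per_week

at_10 = predict_heartrate(10)
at_15 = predict_heartrate(15)
change = at_15 - at_10   # equals 5 * B1, up to rounding error

print(round(at_10, 3), round(at_15, 3), round(change, 3))
```

The change depends only on the slope and the number of extra hours, not on the starting point.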
“r” versus “r²” Correlation (represented as “r”) and proportion of variance explained (represented as “r²”) can be hard to tell apart at first glance. Similarities: They use the same letter to represent the two concepts. In simple linear regression (what we’ve been doing with one x variable), one can find the correlation (r) between the two variables from r². When exploring the strength of the relationship between the explanatory (x) and response (y) variables, one can do this by taking the square root of r² and giving it the same sign as the slope b1, since the square root alone loses the direction of the relationship.
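A minimal sketch of going from r² back to r in simple linear regression: take the square root and attach the sign of the slope. The r² value of 0.25 below is hypothetical (it is not from the SPSS output), and `r_from_r_squared` is an illustrative name.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r from r^2, taking the sign from the fitted slope."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must be between 0 and 1")
    magnitude = math.sqrt(r_squared)
    return -magnitude if slope < 0 else magnitude

# Hypothetical r^2 of 0.25 with a negative slope, as in the heartrate example.
example_r = r_from_r_squared(0.25, slope=-0.883)
print(example_r)
```

This shortcut only applies with a single explanatory variable.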
Correlation (r) From Kevin’s lecture, we know that correlation is a measure of the strength of the linear relationship between two variables. In SPSS, when using the Analyze->Correlate->Bivariate menu option, r is the value in the Pearson Correlation row where the two different variables meet (a variable matched with itself shows a value of 1, since the data are identical).
Proportion of variance (r²) Again, from Kevin (and he does a great job of explaining variance in lecture #3, towards the end of Part 1), r² is the fraction of the total variability in the values of y (irrespective of x) that is explained by your regression model. In other words, r² is a way to discern how helpful your regression equation is in explaining the variability in y.
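One common way to write this is r² = 1 - SSE/SST, where SST is the total variability of y around its mean and SSE is the variability left over around the fitted line. The sketch below uses made-up data (not the survey) and an illustrative function name.

```python
def r_squared(xs, ys):
    """Proportion of the variability in y explained by the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # SSE: squared residuals around the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # SST: squared deviations of y around its own mean (ignores x).
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical values for illustration only.
hours = [0, 2, 4, 6, 8]
rate = [75, 70, 71, 64, 62]

explained = r_squared(hours, rate)
print(round(explained, 4))
```

A value near 1 means the line accounts for most of the variability in y; a value near 0 means knowing x helps very little.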