
2 Research Is A Partnership Of Questions And Data
What Types Of Data Are Collected? What Kinds Of Question Can Be Asked Of Those Data?
Questions That Require Us To Examine Relationships Between Features of the Participants
    “Continuous” Data:
        - Do people who say they study for more hours also think they’ll finish their doctorate earlier?
        - Are computer-literate people less anxious about statistics?
        - … ?
    “Categorical” Data:
        - Are men more likely to study part-time?
        - Are women more likely to enroll in CCE?
        - … ?
Questions That Require Us To Describe Single Features of the Participants
    “Continuous” Data:
        - How tall are class members, on average?
        - How many hours a week do class members report that they study?
        - … ?
    “Categorical” Data:
        - How many members of the class are women?
        - What proportion of the class is full-time?
        - … ?
© Willett, Harvard University Graduate School of Education, 12/18/2015. S010Y/C10 – Slide 1. S010Y: Answering Questions with Quantitative Data. Class 10&11/III.3: Summarizing Relationships Between Continuous Variables.

3 Just to remind you, here’s the codebook for the WALLCHT data …
    Dataset:      WALLCHT.txt
    Overview:     Summary information on selected aspects of state educational performance outcomes, resource inputs, and population characteristics, in 1988.
    Source:       US Department of Education and the National Center for Education Statistics.
    Sample Size:  50 states
    Updated:      December 5, 2003

    Col  Variable Name  Description                                        Metric
    1    STATE          Name of the State.                                 words
    2    TCHRSAL        Average teacher salary in the State.               dollars
    3    STRATIO        Average number of students per teacher statewide.  ratio
    4    PPEXPEND       Average expenditure per pupil in the State.        dollars
    5    HSGRADRT       Average high-school graduation rate statewide.     percent

4 [Figure: scatterplot of 1988 Statewide H.S. Graduation Rate (HSGRADRT, vertical axis, 50 to 100) against 1988 Student/Teacher Ratio (STRATIO, horizontal axis, 10 to 25), with a fitted trend line. Marked on the line: STRATIO = 13.3 with predicted HSGRADRT = 78.8, and STRATIO = 24.7 with predicted HSGRADRT = 66.0.]
This is my “best guess” for a summary linear trend line to represent the HSGRADRT vs. STRATIO relationship. I obtained it by a mysterious process called ordinary least-squares (OLS) regression analysis.
After I have conducted my “OLS Regression Analysis,” I just pick some sensible values of STRATIO … the MIN and MAX perhaps?
… and the output from the analysis gives me its best prediction for the values of HSGRADRT (the “predicted values”).
And the line that joins up the predicted values is known as the “fitted regression line.”

5 [Figure: scatterplot of 1988 Statewide H.S. Graduation Rate (HSGRADRT, vertical axis, 50 to 100) against 1988 Student/Teacher Ratio (STRATIO, horizontal axis, 10 to 25), with the fitted trend line.]
The “OLS” method that was actually used by the regression analysis to provide this “best guess” for the trend …
… has a useful physical analogy in the “thumbtack and elastic band” approach ...
Both the thumbtack-and-elastic-band approach and the ordinary least-squares regression approach find the fitted linear trend line for which the sum of the squared vertical distances of the data points from the fitted line is the least.
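
The least-squares idea can be tried out on a handful of points. The following is a minimal Python sketch (Python rather than PC-SAS, purely for illustration) that assumes six made-up (STRATIO, HSGRADRT) pairs; the data, the function names, and the choice of alternative lines are all hypothetical and are not taken from WALLCHT.txt. It fits the line with the standard closed-form OLS formulas and then checks that nudging the intercept or the slope only makes the sum of squared vertical distances larger.

```python
def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_vertical_distances(xs, ys, intercept, slope):
    """Sum of squared vertical gaps between each point and the candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data for illustration only (not the WALLCHT values).
ratios = [13.0, 15.0, 17.0, 19.0, 21.0, 23.0]
grad_rates = [80.0, 76.0, 77.0, 72.0, 70.0, 69.0]

b0, b1 = fit_ols(ratios, grad_rates)
best = sum_squared_vertical_distances(ratios, grad_rates, b0, b1)
print(f"OLS line: intercept = {b0:.2f}, slope = {b1:.2f}, sum of squares = {best:.2f}")

# Any other line, e.g. one shifted up or down or tilted slightly, does worse.
alternatives = [(b0 + 1.0, b1), (b0 - 1.0, b1), (b0, b1 + 0.1), (b0, b1 - 0.1)]
alternative_sums = [
    sum_squared_vertical_distances(ratios, grad_rates, a0, a1)
    for a0, a1 in alternatives
]
for (a0, a1), total in zip(alternatives, alternative_sums):
    print(f"intercept = {a0:.2f}, slope = {a1:.2f}, sum of squares = {total:.2f}")
```

The four alternatives are only a spot check; the closed-form solution is what guarantees that no other straight line has a smaller sum of squares.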

6 Here are a couple of things to help you develop better intuition about the nature of fitted trend lines produced by OLS Regression Analysis.
[Figure: scatterplot of 1988 Statewide H.S. Graduation Rate (HSGRADRT, vertical axis, 50 to 100) against 1988 Student/Teacher Ratio (STRATIO, horizontal axis, 10 to 25).]
- A simulation that lets you try out the OLS regression fitting algorithm for yourself.
- A simulation that:
    - Provides data examples,
    - Lets you draw your own version of the fitted trend line,
    - Then shows you what an OLS regression analysis would produce, by way of comparison.

7 Of course, you can also get PC-SAS to tell you where the OLS-fitted regression line is …

    OPTIONS Nodate Pageno=1;
    TITLE1 'S010Y: Answering Questions with Quantitative Data';
    TITLE2 'Class 10/Handout 1: Summarizing Relationships Between Continuous Variables';
    TITLE3 'The Infamous Wallchart Data';
    TITLE4 'Data in WALLCHT.txt';

    *--------------------------------------------------------------------------------*
      Input data, name and label variables in the dataset
    *--------------------------------------------------------------------------------*;
    DATA WALLCHT;
      INFILE 'C:\DATA\A010Y\WALLCHT.txt';
      INPUT STATE $ TCHRSAL STRATIO PPEXPEND HSGRADRT;
      LABEL TCHRSAL  = '1988 Average Teacher Salary'
            STRATIO  = '1988 Student/Teacher Ratio'
            PPEXPEND = '1988 Expenditure/Student'
            HSGRADRT = '1988 Statewide H.S. Graduation Rate';

    *--------------------------------------------------------------------------------*
      Using regression analysis to summarize the relationship of HSGRADRT and STRATIO
    *--------------------------------------------------------------------------------*;
    PROC REG DATA=WALLCHT;
      TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
      MODEL HSGRADRT = STRATIO;

    *--------------------------------------------------------------------------------*
      Plotting the relationship between HSGRADRT and STRATIO
    *--------------------------------------------------------------------------------*;
    PROC PLOT DATA=WALLCHT;
      TITLE5 'Plot of H.S. Graduation Rates against Student/Teacher Ratios';
      PLOT HSGRADRT*STRATIO / HAXIS = 10 TO 25 BY 5 VAXIS = 50 TO 100 BY 10;
    RUN;

- DATA step: Here are the usual data input statements.
- PROC REG step: Here are the PC-SAS regression analysis commands; we dissect them in detail on the next slide.
- PROC PLOT step: Creates another scatterplot of the data for use later.

8 Here’s the part of the PC-SAS program that deals specifically with the OLS Regression Analysis of the HSGRADRT versus STRATIO relationship …

    *--------------------------------------------------------------------------------*
      Using regression analysis to summarize the relationship of HSGRADRT and STRATIO
    *--------------------------------------------------------------------------------*;
    PROC REG DATA=WALLCHT;
      TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
      MODEL HSGRADRT = STRATIO;

- PROC REG is the command in PC-SAS that requests an OLS Regression Analysis.
- You request an OLS Regression Analysis by specifying a “Regression Model” that identifies the “Outcome” and the “Predictor(s)” to include in the analysis: MODEL HSGRADRT = STRATIO
- You identify the outcome variable (HSGRADRT) by placing it to the left of the “equals” sign in the MODEL statement.
- You identify the predictor variable (STRATIO) by placing it to the right of the “equals” sign in the MODEL statement.

9 Here’s output from the OLS Regression Analysis of Outcome HSGRADRT on Predictor STRATIO …

    The REG Procedure
    Model: MODEL1
    Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate

    Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              1        337.52168     337.52168      6.07   0.0174
    Error             48       2669.04952      55.60520
    Corrected Total   49       3006.57120

    Root MSE          7.45689    R-Square   0.1123
    Dependent Mean   74.27600    Adj R-Sq   0.0938
    Coeff Var        10.03943

Ignore this part of the output. When you go on to S030, you’ll learn what it all means.

    Parameter Estimates
    Variable    Label                        DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept   Intercept                     1             93.69187          7.95093     11.78     <.0001
    STRATIO     1988 Student/Teacher Ratio    1             -1.12140          0.45516     -2.46     0.0174

This is the major part of the regression output. I unpack it on the next several slides.

10 The core part of the OLS Regression Output describes the fitted regression line. But how do you work with this “Fitted Model”?

    Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate

    Parameter Estimates
    Variable    Label                        DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept   Intercept                     1             93.69187          7.95093     11.78     <.0001
    STRATIO     1988 Student/Teacher Ratio    1             -1.12140          0.45516     -2.46     0.0174

These “Parameter Estimates” tell you where PROC REG thinks that the fitted trend line should be drawn … by listing them, it’s telling you that the fitted trend line has the following algebraic equation:
    Predicted value of HSGRADRT = (93.69) + (-1.12)(STRATIO)

11 Let’s try a couple. Remember that the fitted equation is telling us PROC REG’s best prediction for HSGRADRT at each value of STRATIO.
You substitute reasonable values for the predictor, STRATIO, into the fitted equation and then use it to compute the best predictions (the predicted values) for HSGRADRT. For instance …
1. When STRATIO = 13.3 (the minimum value of STRATIO),
   Predicted value of HSGRADRT = (93.69) + (-1.12)(13.3) = 93.69 - 14.90 = 78.8
2. When STRATIO = 24.7 (the maximum value of STRATIO),
   Predicted value of HSGRADRT = (93.69) + (-1.12)(24.7) = 93.69 - 27.66 = 66.0
Recognize these values?
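
The two substitutions above can be checked with a few lines of code. This is a small Python sketch, not part of the PC-SAS handout; the function name `predict_hsgradrt` is made up for illustration, and the coefficients are the rounded values 93.69 and -1.12 used on the slide (PROC REG reports 93.69187 and -1.12140).

```python
# Rounded parameter estimates from the PROC REG output.
INTERCEPT = 93.69
SLOPE = -1.12


def predict_hsgradrt(stratio):
    """Predicted graduation rate (percent) at a given student/teacher ratio."""
    return INTERCEPT + SLOPE * stratio


# The minimum and maximum values of STRATIO quoted on the slide.
for stratio in (13.3, 24.7):
    print(f"STRATIO = {stratio}: predicted HSGRADRT = {predict_hsgradrt(stratio):.1f}")
```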

12 [Figure: scatterplot of 1988 Statewide H.S. Graduation Rate (HSGRADRT, vertical axis, 50 to 100) against 1988 Student/Teacher Ratio (STRATIO, horizontal axis, 10 to 25), with the fitted trend line. Marked on the line: STRATIO = 13.3 with predicted HSGRADRT = 78.8, and STRATIO = 24.7 with predicted HSGRADRT = 66.0.]
Here they are … and, of course, by choosing other values of STRATIO, the fitted equation can also tell us the location of every other point on the fitted line in between.
To reproduce the fitted line, I just need to:
- Systematically substitute all possible values of STRATIO into the fitted equation, and
- Compute the corresponding predicted values of HSGRADRT.
Then, if I plotted them all, this is what I’d see.
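
In practice, “all possible values” can be approximated by an evenly spaced grid of STRATIO values. The Python sketch below (an illustration, not part of the handout) assumes the rounded coefficients 93.69 and -1.12 and a hypothetical helper, `fitted_line_points`; the choice of 10 steps between the sample minimum and maximum is arbitrary.

```python
# Rounded parameter estimates from the PROC REG output.
INTERCEPT = 93.69
SLOPE = -1.12


def fitted_line_points(x_min, x_max, steps):
    """Return (STRATIO, predicted HSGRADRT) pairs on an evenly spaced grid."""
    width = (x_max - x_min) / steps
    points = []
    for i in range(steps + 1):
        stratio = x_min + i * width
        points.append((stratio, INTERCEPT + SLOPE * stratio))
    return points


# Grid from the sample minimum (13.3) to the sample maximum (24.7).
points = fitted_line_points(13.3, 24.7, 10)
for stratio, predicted in points:
    print(f"STRATIO = {stratio:5.2f}  predicted HSGRADRT = {predicted:5.2f}")
```

Because the fitted equation is linear, every point produced this way falls on the straight line joining the two end points, so plotting just the predictions at the MIN and MAX and joining them gives the same line.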

13 The fitted equation is telling us PROC REG’s best prediction for HSGRADRT at each value of STRATIO. For instance …
1. When STRATIO = 0 (this is a value of STRATIO that does not exist in the dataset, but provides an interesting anchor point nevertheless),
   Predicted value of HSGRADRT = (93.69) + (-1.12)(0) = 93.69 - 0 = 93.69
Recognize this value?

14 The fitted equation is telling us PROC REG’s best prediction for HSGRADRT at each value of STRATIO. For instance …
1. When STRATIO = 20 (or any other ad-hoc value of STRATIO that is within the sample range),
   Predicted value of HSGRADRT = (93.69) + (-1.12)(20) = 93.69 - 22.4 = 71.29
2. When STRATIO = 21 (notice that this is just one unit higher than the previous value of 20),
   Predicted value of HSGRADRT = (93.69) + (-1.12)(21) = 93.69 - 23.52 = 70.17
Recognize the difference in these values? (70.17 - 71.29) = -1.12
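
The same difference appears whatever starting value is chosen, and a line of algebra makes that explicit. In this sketch, $b_0$ and $b_1$ are shorthand introduced here (they do not appear on the slides) for the estimated intercept and slope, and $x$ stands for any value of STRATIO:

```latex
\begin{align*}
\widehat{\text{HSGRADRT}}(x) &= b_0 + b_1 x, \qquad b_0 = 93.69,\; b_1 = -1.12 \\
\widehat{\text{HSGRADRT}}(x+1) - \widehat{\text{HSGRADRT}}(x)
  &= \bigl[b_0 + b_1 (x+1)\bigr] - \bigl[b_0 + b_1 x\bigr] = b_1 = -1.12
\end{align*}
```

Setting $x = 20$ reproduces the calculation above: $70.17 - 71.29 = -1.12$.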

15 This means that each term in the fitted regression model has a specific interpretation …
    Predicted value of HSGRADRT = (93.69) + (-1.12)(STRATIO)
- Predicted value of HSGRADRT: This is the predicted value of HSGRADRT, based on the OLS regression fit. In the fitted equation it is written with a “hat,” which indicates that it is a prediction. The predicted value represents the value of HSGRADRT that you would expect for a State, based solely on its value of STRATIO.
- 93.69: This is the estimated intercept of the fitted regression line. It tells you the predicted value of HSGRADRT when STRATIO is zero. In the current context, it doesn’t make much sense to interpret it (why?).
- -1.12: This is the estimated slope of the fitted regression line. It summarizes the relationship between HSGRADRT and STRATIO. It tells you the difference in the predicted value of HSGRADRT per unit difference in STRATIO. Here, the slope is negative, meaning that States with student/teacher ratios that are one child bigger will have a graduation rate that is 1.12 percentage points lower, on average.
- STRATIO: This represents the actual values of the predictor, STRATIO.

16 It’s the estimated slope in a regression analysis that captures the relationship between outcome and predictor …
- What would the scatterplot look like, and what would the slope be, if states with larger student/teacher ratios tended to have higher graduation rates? [Blank axes: HSGRADRT against STRATIO]
- What would the scatterplot look like, and what would the slope be, if there were no relationship between high school graduation rate and student/teacher ratio? [Blank axes: HSGRADRT against STRATIO]
Here’s a simulation that lets you create datasets with your mouse, and then shows you the OLS fitted line.
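
One way to explore these two questions without the simulation is to fit a line to a few invented points. The Python sketch below is illustrative only: the three tiny datasets are hypothetical (they are not WALLCHT values), and `ols_slope` is a made-up helper that applies the usual closed-form OLS slope formula.

```python
def ols_slope(xs, ys):
    """Closed-form OLS slope: sum of cross-products over sum of squares of x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical student/teacher ratios shared by all three scenarios.
ratios = [10.0, 15.0, 20.0, 25.0]

# Hypothetical graduation rates for three different kinds of relationship.
scenarios = {
    "larger ratio, lower graduation rate": [90.0, 80.0, 70.0, 60.0],
    "larger ratio, higher graduation rate": [60.0, 70.0, 80.0, 90.0],
    "no linear relationship": [70.0, 80.0, 80.0, 70.0],
}

slopes = {name: ols_slope(ratios, rates) for name, rates in scenarios.items()}
for name, slope in slopes.items():
    print(f"{name}: slope = {slope:+.2f}")
```

In these invented examples the sign of the slope follows the direction of the trend, and the slope is zero when graduation rates show no linear trend across the ratios.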

17
    Dependent Variable: HSGRADRT 1988 Statewide H.S. Graduation Rate

    Variable    Label                        DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept   Intercept                     1             93.69187          7.95093     11.78     <.0001
    STRATIO     1988 Student/Teacher Ratio    1             -1.12140          0.45516     -2.46     0.0174

As in our categorical data analysis, we can ask whether we could have reached this same conclusion by an accident of sampling. Could we have gotten a slope value of -1.12 by sampling from a population in which there was no relationship between HSGRADRT and STRATIO (i.e., by sampling from a null population in which the slope was zero)?
And, again, as in categorical data analysis, PROC REG provides a p-value to help you check out the effects of the idiosyncrasies of sampling:
- The p-value for the HSGRADRT/STRATIO regression slope is 0.0174.
- Since 0.0174 is less than .05, we can reject the null hypothesis that there is no relationship between HSGRADRT and STRATIO in the population.
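
For readers curious about where the t Value and the p-value in the output come from, here is a rough Python cross-check. It is a sketch under stated assumptions, not a description of how PROC REG works internally: it assumes the t Value is the estimate divided by its standard error, that the p-value is the two-sided tail area of a t distribution with the 48 error degrees of freedom shown in the output, and it approximates that area with Simpson's rule. The function names are made up for illustration.

```python
import math


def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_value, df, intervals=2000):
    """Two-sided tail area beyond |t_value|, via Simpson's rule on [0, |t_value|]."""
    upper = abs(t_value)
    h = upper / intervals  # intervals must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, intervals):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_density(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


# Values copied from the Parameter Estimates table for STRATIO.
slope_estimate = -1.12140
standard_error = 0.45516
error_df = 48  # Error DF from the Analysis of Variance table

t_value = slope_estimate / standard_error
p_value = two_sided_p_value(t_value, error_df)
print(f"t value = {t_value:.2f}, two-sided p-value = {p_value:.4f}")
```

The point of the check is only that the reported t Value and p-value are consistent with the estimate and standard error printed beside them.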

18 [Figure: scatterplot of 1988 Statewide H.S. Graduation Rate (HSGRADRT, vertical axis, 50 to 100) against 1988 Student/Teacher Ratio (STRATIO, horizontal axis, 10 to 25).]
The Story So Far …
In our investigation of state-level aggregate statistics, we have found that, on average, the percentage of seniors graduating from High School is lower in states with a higher student/teacher ratio.
When state-wide high-school graduation rate (HSGRADRT) is treated as the outcome and state-wide student/teacher ratio (STRATIO) is treated as the predictor, we find that the trend line estimated by ordinary least-squares regression analysis has a slope of -1.12 (p = 0.0174). This tells us that two states whose student/teacher ratios differ by 1 student per teacher will tend to have graduation rates that differ by 1.12 percentage points, with states that enjoy lower student/teacher ratios tending to have the higher high-school graduation rates.

