Presentation transcript:

1/11/2016Slide 1 Extending the relationships found in linear regression to a population is procedurally similar to what we have done for t-tests and chi-square tests. In regression, the null hypothesis is that there is no relationship between the dependent and independent variables. When there is no relationship, the predicted values for the dependent variable are the same for all values of the independent variable. In order for this to happen, the slope in the regression equation would have to be zero, i.e. estimated dependent variable = intercept + 0 × independent variable. The value for the independent variable would be multiplied by zero and would not change the prediction. The null hypothesis of no relationship translates to slope = 0, or b = 0. Without a relationship, our best estimate of the value of the dependent variable is the mean of the dependent variable (best = smallest total error). The alternative hypothesis is that there is a relationship, i.e. knowing the value of the independent variable helps us do a more accurate job of predicting values of the dependent variable (more accurate = less total error). If we reject the null hypothesis, we interpret the strength and direction of the relationship for the population represented by the sample. If we fail to reject the null hypothesis, we find that the data does not support the research hypothesis.
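The logic above can be sketched numerically. This is a hypothetical illustration (invented data, not the course data set) showing the regression slope t-test and the fact that under the null hypothesis, with b = 0, every case gets the same prediction, the mean of the dependent variable:

```python
import numpy as np
from scipy import stats

# Hypothetical data: a real positive relationship plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=50)           # independent variable
y = 0.8 * x + rng.normal(0, 10, size=50)   # dependent variable

# Test H0: b = 0 with the regression slope t-test
result = stats.linregress(x, y)
print(f"b = {result.slope:.3f}, p = {result.pvalue:.4g}")

# Under H0 the model is: estimate = intercept + 0 * x, so every case gets
# the same prediction; the least-squares choice for that constant is the mean
h0_prediction = np.full_like(y, y.mean())
print(f"prediction for every case under H0: {y.mean():.2f}")
```

With the invented data above, the slope is clearly nonzero and the null hypothesis would be rejected; with data showing no relationship, the p-value would be large and the mean would remain the best estimate.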

1/11/2016Slide 2 To test the inference in linear regression, we are required to satisfy the conditions stated for linear regression (linearity, equal variance of the residuals, and an absence of outliers). In addition, to use the normal distribution to accurately compute probabilities for the statistical test, the distribution of the residuals must be normal. Support for the normality of the residuals mirrors the criteria used for the normality of the dependent variable in t-tests – the variables are normally distributed, or if they are not, the sample size is large enough to apply the Central Limit Theorem. Since it is difficult to accurately evaluate the scatterplots to support equality of variance and normality of residuals, we introduce the use of diagnostic statistical tests, which provide the same numeric criteria for making decisions that we use in hypothesis tests. We will use the Breusch-Pagan test for evaluating equality of variance for the residuals and the Shapiro-Wilk test for normality. Diagnostic tests have a null hypothesis that the data meets the condition we are testing for, e.g. equality of variance or normality. Rejection of the null hypothesis implies that the condition is not satisfied.

1/11/2016Slide 3 Our objective in these tests is to fail to reject the null hypothesis, i.e. to conclude that the variance in the residuals is uniform or that the residuals are normally distributed. The goal is, thus, the opposite of what we hope to find in regular hypothesis tests. Our purpose is to assess or diagnose our data rather than to make inferences about the population. SPSS computes the Shapiro-Wilk test, but does not compute the Breusch-Pagan test. The script for Simple Linear Regression has been modified to include the Breusch-Pagan test in a table of statistics for homoscedasticity. The modified script file is named SimpleLinearRegressionInferenceTest.SBS and is available on the course web site. Due to the difficulties in running scripts, I have also provided a syntax file that computes the Breusch-Pagan statistic and probability. Syntax files do not usually have the same problems running on different versions of SPSS that we experience with script files, but they are more cumbersome to use. Demonstration of the syntax file is included in this tutorial. The syntax file is named BreuschPaganSyntax.sps, available on the course web site. There is an SPSS macro on the web for computing Breusch-Pagan, but I find that it does not produce correct answers (or at least not the answers produced by SAS and R). While I would usually set a more conservative alpha of 0.01 for diagnostic tests to make sure we only respond to serious violations, we will use 0.05 for this week's problems.

1/11/2016Slide 4 The introductory statement in the question indicates:
The data set to use: world2007.sav
The task to accomplish: a regression slope t-test
The variables to use in the analysis: the independent variable slum population as percentage of urban population [slumpct] and the dependent variable infant mortality rate [infmort]
The alpha level of significance for the hypothesis test: 0.05
The criteria for evaluating strength: Cohen's criteria

1/11/2016Slide 5 These problems also contain a second paragraph of instructions that provides the formulas to use if the analysis requires us to re-express or transform the variable to satisfy the conditions for linear regression.

1/11/2016Slide 6 The first statement asks about the level of measurement. The t-test of a regression slope requires that both the dependent variable and the independent variable be quantitative.

1/11/2016Slide 7 Since both the independent variable slum population as percentage of urban population [slumpct] and the dependent variable infant mortality rate [infmort] are quantitative, we mark the check box for a correct answer.

1/11/2016Slide 8 The first statement asks about the size of the sample. To answer this question, we run the linear regression in SPSS.

1/11/2016Slide 9 To compute a simple linear regression, select Regression> Linear from the Analyze menu.

1/11/2016Slide 10 First, move the dependent variable, infmort, to the Dependent text box. Second, move the independent variable, slumpct, to the Independent(s) list box. Third, click on the Statistics button to request basic descriptive statistics.

1/11/2016Slide 11 First, in addition to the defaults marked by SPSS, mark the check box for Descriptives so that we get the number of cases used in the analysis. Second, click on the Casewise diagnostics check box to produce the table with information about outliers and influential cases. Third, click on the Continue button to close the dialog box.

1/11/2016Slide 12 Next, click on the Plots button to request the residual plot.

1/11/2016Slide 13 First, move *ZRESID (for standardized residuals) to the Y axis text box. Second, move *ZPRED (for standardized predicted values) to the X axis text box. Third, mark the check boxes for a histogram and a normal probability plot of the residuals. Fourth, click on the Continue button to close the dialog box.

1/11/2016Slide 14 Next, click on the Save button to include Cooks distance in the output.

1/11/2016Slide 15 First, mark the check box for Cook's Distances to include this value in the data view and the output. Second, mark the check box for Standardized Residuals, which we will need in the test for the condition of normality of the residuals. Third, click on the Continue button to close the dialog box.

1/11/2016Slide 16 Click on the OK button to request the output.

1/11/2016Slide 17 The number of cases with valid data to analyze the relationship between "slum population as percentage of urban population" and "infant mortality rate" was 99, out of the total of 192 cases in the data set.

1/11/2016Slide 18 The number of cases with valid data to analyze the relationship between "slum population as percentage of urban population" and "infant mortality rate" was 99, out of the total of 192 cases in the data set. Mark the check box for a correct statement.

1/11/2016Slide 19 The next statement asks us to determine whether or not the data for the variables satisfies the conditions required for linear regression. Making inferences about the population based on linear regression requires four conditions or assumptions: a linear relationship between the variables, equal variance of the residuals across the predicted values, no outliers or influential cases distorting the relationship, and a normal distribution for the residuals.

1/11/2016Slide 20 To evaluate the linearity condition, we create a scatterplot. To create the scatterplot, select Legacy Dialogs > Scatter/Dot from the Graphs menu.

1/11/2016Slide 21 In the Scatter/Dot dialog box, we click on Simple Scatter as the type of plot we want to create. Click on the Define button to go to the next step.

1/11/2016Slide 22 First, move the dependent variable infmort to the Y axis text box. Second, move the independent variable slumpct to the X axis text box. Third, click on the OK button to produce the plot.

1/11/2016Slide 23 The scatterplot appears in the SPSS output window. To facilitate our determination about the linearity of the plot, we will add a linear fit line, a loess fit line, and a confidence interval to the plot. See slides 8 through 18 in the powerpoint titled: SimpleLinearRegression-Part2.ppt for directions on adding the fit lines and confidence interval to the plot.

1/11/2016Slide 24 The criterion we use for evaluating linearity is a comparison of the loess fit line to the linear fit line. If the loess fit line falls within a 99% confidence interval around the linear fit line, we characterize the relationship as linear. Minor fluctuations over the boundary of the confidence interval are ignored. The loess fit line in the scatterplot of the relationship between "slum population as percentage of urban population" and "infant mortality rate" does not lie within the confidence interval around the linear fit line. The pattern of points in the scatterplot shows an obvious curve, indicating non-linearity. We will re-express one or both variables if they are badly skewed to see if the relationship using transformed variables satisfies the assumption of linearity.

1/11/2016Slide 25 Since we did not satisfy the linearity condition, the statement is not marked. We do not need to test the other conditions, since we know we will not meet all of them. We will re-express one or both variables if they are badly skewed to see if the relationship using transformed variables satisfies the assumption of linearity.

1/11/2016Slide 26 When the raw data does not satisfy the conditions of linearity and equal variance, we examine the skewness of the variables to identify problematic skewing for one or both variables that might be corrected with re-expression. This statement suggests that the correct transformation should be a log of infant mortality rate. We should re-express variables whose skewness is less than or equal to -1.0 or greater than or equal to +1.0.
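The ±1.0 rule above can be checked programmatically. This hypothetical sketch uses scipy's skewness statistic on invented variables (note that scipy's default estimator differs slightly from the adjusted skewness SPSS reports):

```python
import numpy as np
from scipy.stats import skew

# Hypothetical variables: one roughly symmetric, one badly right-skewed
rng = np.random.default_rng(1)
symmetric = rng.normal(50, 10, size=1000)
right_skewed = rng.exponential(20, size=1000)

def needs_reexpression(values, threshold=1.0):
    """Flag a variable whose skewness is <= -1.0 or >= +1.0."""
    # scipy's default skew is the biased moment estimator; SPSS reports the
    # adjusted Fisher-Pearson version, so the values differ slightly
    return abs(skew(values)) >= threshold

print(needs_reexpression(symmetric))
print(needs_reexpression(right_skewed))
```

A right-skewed variable (positive skew) is a candidate for a log re-expression; a left-skewed variable (negative skew) is a candidate for squaring.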

1/11/2016Slide 27 We will use the Descriptives procedure to obtain skewness for both variables. Select Descriptive Statistics > Descriptives from the Analyze menu.

1/11/2016Slide 28 First, move the variables infmort and slumpct to the Variable(s) list box. Second, click on the Options button to specify our choice for statistics.

1/11/2016Slide 29 Next, mark the check boxes for Kurtosis and Skewness in addition to the defaults marked by SPSS. Finally, click on the Continue button to close the dialog box.

1/11/2016Slide 30 Click on the OK button to produce the output.

1/11/2016Slide 31 The skewness for "infant mortality rate" [infmort] was 1.470. The skewness for "slum population as percentage of urban population" [slumpct] was -0.178. Since the skew for the dependent variable "infant mortality rate" [infmort] (1.470) was equal to or greater than +1.0, we attempt to correct the violation of assumptions by re-expressing "infant mortality rate" on a logarithmic scale. Since the skew for the independent variable "slum population as percentage of urban population" [slumpct] (-0.178) was between -1.0 and +1.0, we do not attempt to correct the violation of assumptions by re-expressing it.

1/11/2016Slide 32 Since the skew for the dependent variable "infant mortality rate" [infmort] (1.470) was equal to or greater than +1.0, we attempt to correct violation of assumptions by re-expressing "infant mortality rate" on a logarithmic scale. We mark the statement as correct.

1/11/2016Slide 33 The next statement asks us to determine whether or not the data using the re-expressed variable satisfies the conditions required for linear regression. We check to see if the re-expressed variables satisfy the four conditions or assumptions required to make inferences about the population based on linear regression: a linear relationship between the variables, equal variance of the residuals across the predicted values, no outliers or influential cases distorting the relationship, and a normal distribution for the residuals.

1/11/2016Slide 34 We first create the transformed variable, the logarithm of infmort. Select the Compute Variable command from the Transform menu.

1/11/2016Slide 35 First, type the name for the re-expressed variable in the Target Variable text box. The directions for the problem give us the formula for the transformation: The formulas to transform "infant mortality rate" are "LG10(infmort)" and "(infmort)**2". Second, type the formula in the Numeric Expression text box. Third, click on the OK button to compute the transformation.
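The SPSS Compute Variable step above applies a base-10 logarithm; the same re-expressions can be sketched in Python (hypothetical values, not the world2007.sav data):

```python
import numpy as np

# Hypothetical infant mortality rates (deaths per 1,000 live births)
infmort = np.array([4.0, 12.5, 55.0, 100.0, 120.0])

# Mirrors the SPSS formula LG10(infmort): base-10 logarithm,
# used to correct strong positive (right) skew
lg_infmort = np.log10(infmort)

# Mirrors the alternative SPSS formula (infmort)**2: squaring,
# used to correct strong negative (left) skew
sq_infmort = infmort ** 2

print(lg_infmort)
```

The log compresses the long right tail of a positively skewed variable, which is why it is the transformation chosen for infant mortality rate in this problem.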

1/11/2016Slide 36 Next, we create the scatterplot for the relationship with the re-expressed variable. To create the scatterplot, select Legacy Dialogs > Scatter/Dot from the Graphs menu.

1/11/2016Slide 37 In the Scatter/Dot dialog box, we click on Simple Scatter as the type of plot we want to create. Click on the Define button to go to the next step.

1/11/2016Slide 38 First, move the dependent variable LG_infmort to the Y axis text box. Second, move the independent variable slumpct to the X axis text box. Third, click on the OK button to produce the plot.

1/11/2016Slide 39 The scatterplot looks linear, but to make sure we will add fit lines and a confidence interval. The criterion we use for evaluating linearity is a visual comparison of the loess fit line to the linear fit line. If the loess fit line falls within the 99% confidence interval around the linear fit line, we characterize the relationship as linear. Minor fluctuations within the confidence interval or over the boundary of the confidence interval are ignored.

1/11/2016Slide 40 The loess fit line in the scatterplot of the relationship between "slum population as percentage of urban population" and the log transformation of "infant mortality rate" lies within the confidence interval around the linear fit line. The relationship is sufficiently linear to satisfy the assumption of linearity.

1/11/2016Slide 41 To compute a simple linear regression, select Regression> Linear from the Analyze menu. We next do the regression analysis using the transformed variable, creating the residual plot and the normality plot in the process.

1/11/2016Slide 42 First, move the dependent variable, LG_infmort, to the Dependent text box. Second, move the independent variable, slumpct, to the Independent(s) list box. Third, click on the Statistics button to request basic descriptive statistics.

1/11/2016Slide 43 First, in addition to the defaults marked by SPSS, mark the check box for Descriptives so that we get the number of cases used in the analysis. Second, click on the Casewise diagnostics check box to produce the table with information about outliers and influential cases. Third, click on the Continue button to close the dialog box.

1/11/2016Slide 44 Next, click on the Plots button to request the residual plot and the normality plot.

1/11/2016Slide 45 First, move *ZRESID (for standardized residuals) to the Y axis text box. Second, move *ZPRED (for standardized predicted values) to the X axis text box. Third, mark the check boxes for a histogram and a normal probability plot of the residuals. Fourth, click on the Continue button to close the dialog box.

1/11/2016Slide 46 Next, click on the Save button to include Cook’s distance in the output.

1/11/2016Slide 47 First, mark the check box for Cook's Distances to include this value in the data view and the output. Second, mark the check box for Standardized Residuals, which we will need to test for the condition of normality of the residuals. Third, click on the Continue button to close the dialog box.

1/11/2016Slide 48 Click on the OK button to request the output.

1/11/2016Slide 49 The criteria we use for evaluating equal variance is a visual inspection of the residual plot to determine whether the horizontal pattern of the points is more rectangular or more funnel shaped, i.e. narrowly spread at one end of the plot and widely spread at the other end. If the plot of the residuals is more rectangular, the assumption of equal variance is satisfied. If the plot of the residuals is more funnel-shaped, the assumption of equal variance is not satisfied.

1/11/2016Slide 50 Because it is often difficult to distinguish when the pattern of the points is rectangular or funnel-shaped, we will supplement the evaluation of equal variance with a diagnostic statistical test: the Breusch-Pagan test. The Breusch-Pagan statistic tests the null hypothesis that the variance of the residuals is the same for all values of the independent variable. When the probability of the Breusch-Pagan statistic is less than or equal to alpha, we reject the null hypothesis, supporting a finding that the variance of the residuals differs across values of the independent variable and that we do not satisfy the equal variance assumption.

1/11/2016Slide 51 Download the syntax file, BreuschPaganSyntax.SPS, from the course web site. To use the syntax file, select Open > Syntax from the File menu.

1/11/2016Slide 52 Highlight the syntax file, BreuschPaganSyntax.SPS. Click on the Open button to open the syntax file.

1/11/2016Slide 53 The file opens in the SPSS Syntax Editor. The syntax file uses the Data Editor to store its results, creating all of these additional variables. The DELETE commands remove the extra variables left over from a previous run. If the syntax is run when these variables do not yet exist, SPSS will issue warning messages, which have no real consequence. If the file were run more than once without the DELETE commands, SPSS would generate a number of warning messages that it will not replace variables that were previously created, and we might not be looking at the correct results for our problem. We need to replace the names for the dependent and independent variables. Highlight the text for dependentVariableName.

1/11/2016Slide 54 Type the name of the dependent variable, LG_infmort. Highlight the text for independentVariableName.

1/11/2016Slide 55 First, replace the highlighted text with the name of the independent variable. Entering the names of the variables is all that we need to change. Second, to execute the commands in the syntax file, select All from the Run menu. Note: be careful that the periods at the end of the command lines are not deleted.

1/11/2016Slide 56 Since we had not run the syntax file before, SPSS produces a warning message for each of the variable names on the DELETE commands. It thinks that we are asking it to delete a variable that does not exist and it wants to let us know. These warning messages have no consequence.

1/11/2016Slide 57 The syntax file added all of these variables (and more to the left) to the data editor. The syntax file omits cases with missing data from the analysis.

1/11/2016Slide 58 The interpretation of equal variance based on visual inspection of the residual plot is supported by the Breusch-Pagan test, which had a probability of p = .069, greater than the alpha of .050. The null hypothesis is not rejected, and the assumption of equal variance is supported. The variable bp contains the Breusch-Pagan statistic and the column bpSig contains the p-value for the statistic. Having satisfied the condition for equal variance, we next check for influential cases.

1/11/2016Slide 59 Outliers and influential cases can alter the regression model that would otherwise represent the majority of cases in the analysis. SPSS will save Cook's distances as a measure of influence to the data editor so we can identify cases that have a large Cook's distance. We will operationally define a large Cook's distance as a value of 0.5 or more. When we ran the regression using LG_infmort as the dependent variable, we requested that Cook's distances be saved to the Data Editor and that our output include Casewise diagnostics. In the table titled "Residuals Statistics", we see that the maximum Cook's distance was .152, less than the criterion of 0.5. In this problem, there were no cases with a Cook's distance of 0.5 or greater that would qualify as influential cases. Since we have no outliers or influential cases, we will test the final condition, normality of the residuals.

1/11/2016Slide 60 The linear regression model expects the residuals to have a normal distribution. The distribution of the residuals is evaluated with the normality plot which compares the points for the actual distribution of the cases to a diagonal line that represents the expected pattern for a normally distributed variable. If the points deviate substantially and consistently from the diagonal line, the residuals are not normally distributed. Minor fluctuations around the line or at either end of the line can be ignored. In this problem, the plot of standardized regression residuals follows the diagonal, indicating that the residuals are normally distributed.
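The visual check against the diagonal has a numeric analogue: scipy's probplot returns the correlation between the ordered residuals and the quantiles expected under normality, where values near 1 correspond to points hugging the diagonal. A hypothetical sketch with invented residuals:

```python
import numpy as np
from scipy import stats

# Hypothetical standardized residuals, drawn from a normal distribution
rng = np.random.default_rng(3)
residuals = rng.normal(0, 1, size=99)

# probplot pairs each ordered residual with its expected normal quantile;
# r measures how closely the points follow the diagonal reference line
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(residuals)
print(f"correlation with normal quantiles: r = {r:.3f}")
```

Substantially non-normal residuals (heavy tails, strong skew) pull r noticeably below 1, mirroring points that drift away from the diagonal on the plot.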

1/11/2016Slide 61 Because it is often difficult to distinguish whether or not points deviate substantially and consistently from the diagonal line, we will supplement the evaluation of normality of residuals with a diagnostic statistical test: the Shapiro-Wilk test. The Shapiro-Wilk statistic tests the null hypothesis that the distribution of the residuals is normal. When the probability of the Shapiro-Wilk statistic is less than or equal to alpha, we reject the null hypothesis, supporting a finding that the residuals are not normally distributed and that we do not satisfy the assumption of normality.

1/11/2016Slide 62 The normality tests are part of the Explore procedure. Select Descriptive Statistics > Explore from the Analyze menu.

1/11/2016Slide 63 The normal condition requires that the residuals be normally distributed. We saved standardized residuals when we ran the regressions. The correct choice is the standardized residuals from the second analysis (ZRE_2), in which we used the transformed variable, LG_infmort. If we had satisfied the regression conditions without re-expressing the data, we would not have run the second regression and would have selected ZRE_1 to test for normality. Move the variable ZRE_2 to the Dependent List. The normality statistical tests are included with the plots, so we click on the Plots button.

1/11/2016Slide 64 Mark the check box for Normality plots with tests. Click on the Continue button to close the dialog box.

1/11/2016Slide 65 Click on the OK button to produce the output.

1/11/2016Slide 66 The interpretation of normal residuals is supported by the Shapiro-Wilk test, which had a probability of p = .612, greater than the alpha of .050. The null hypothesis is not rejected, and the assumption of normal residuals is supported.

1/11/2016Slide 67 We have satisfied all four of the conditions for making inferences based on linear regression. Mark the check box for a correct answer.

1/11/2016Slide 68 When the p-value for the statistical test is less than or equal to alpha, we reject the null hypothesis and interpret the results of the test. If the p-value is greater than alpha, we fail to reject the null hypothesis and do not interpret the result.

1/11/2016Slide 69 The p-value for this test (p <.001) is less than or equal to the alpha level of significance (p =.050) supporting the conclusion to reject the null hypothesis.

1/11/2016Slide 70 The p-value for this test (p <.001) is less than or equal to the alpha level of significance (p =.050) supporting the conclusion to reject the null hypothesis. Mark the question as correct. Rejection of the null hypothesis supports the research hypothesis and we interpret the results.

1/11/2016Slide 71 Since we know that we re-expressed the data to satisfy the conditions for linear regression, we skip the question that interprets the raw variables.

1/11/2016Slide 72 The final question focuses on the strength and direction of the relationship.

1/11/2016Slide 73 The strength of the relationship is based on the multiple R statistic in the Model Summary table. Applying Cohen's criteria for effect size (less than ±0.10 = trivial; ±0.10 up to ±0.30 = weak or small; ±0.30 up to ±0.50 = moderate; ±0.50 or greater = strong or large), the relationship was correctly characterized as a strong relationship (R =.795). Note: in SPSS output, the R statistic is always positive, so it does not show the direction of the relationship. The direction of the relationship is based on the b coefficient.
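Cohen's criteria as listed above can be encoded as a small helper (the function name is hypothetical, shown only for illustration):

```python
def cohen_strength(r):
    """Characterize the strength of a correlation using Cohen's criteria:
    < 0.10 trivial; 0.10 up to 0.30 weak or small; 0.30 up to 0.50 moderate;
    0.50 or greater strong or large."""
    r = abs(r)  # strength depends on magnitude; direction comes from the sign of b
    if r < 0.10:
        return "trivial"
    elif r < 0.30:
        return "weak or small"
    elif r < 0.50:
        return "moderate"
    return "strong or large"

print(cohen_strength(0.795))
```

Applied to the Model Summary value in this problem (R = .795), the helper returns the same characterization stated above: a strong relationship.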

1/11/2016Slide 74 Since the sign of the b coefficient was positive (b =.01), the relationship is positive and the values for the variables move in the same direction. Higher scores on the variable "slum population as percentage of urban population" were associated with higher scores on the log transformation of "infant mortality rate".

1/11/2016Slide 75 The strength and direction of the relationship were both correctly stated. The question is marked as correct.

1/11/2016Slide 76 Logic outline for homework problems
Both variables are quantitative?
  Yes: Mark statement check box.
  No: Do not mark check box. Mark only "None of the above." Stop.
Number of valid cases stated correctly?
  Yes: Mark statement check box.
  No: Do not mark check box.

1/11/2016Slide 77 All four regression conditions satisfied?
  - Relationship between variables is linear? (linear pattern in scatterplot)
  - Variance of residuals is homogeneous? (residual plot and Breusch-Pagan test)
  - No outliers impacting regression solution? (Cook's distance < 0.5)
  - Residuals are normally distributed? (normality plot and Shapiro-Wilk test, or Central Limit Theorem)
Yes: Mark check box for regression conditions.
No: Do not mark check box.

1/11/2016Slide 78 Skew of variables ≤ -1.0 or ≥ +1.0?
  Yes: Re-express badly skewed variables.
  No: Do not mark re-expression check box. Stop. With no skewed variables, we do not have a strategy for meeting conditions.
Since we have satisfied the regression conditions, the question on re-expressing data is skipped.

1/11/2016Slide 79 With the re-expressed data, all four regression conditions satisfied?
  - Relationship between variables is linear? (linear pattern in scatterplot)
  - Variance of residuals is homogeneous? (residual plot and Breusch-Pagan test)
  - No outliers impacting regression solution? (Cook's distance < 0.5)
  - Residuals are normally distributed? (normality plot and Shapiro-Wilk test, or Central Limit Theorem)
Yes: Mark check box for regression conditions.
No: Do not mark check box. Stop. We can't meet conditions.
Since we have satisfied the regression conditions, we do not re-express and do not check conditions.

1/11/2016Slide 80 Reject H0 is correct decision (p ≤ alpha)?
  Yes: Mark statement check box.
  No: Do not mark check box. Stop. We interpret results only if we reject the null hypothesis.
Interpretation is stated correctly? (The interpretation is stated for both raw data and re-expressed data.)
  Yes: Mark statement check box.
  No: Do not mark check box.