Presentation on theme: "Section 9-1: Inference for Slope and Correlation Section 9-3: Confidence and Prediction Intervals Visit the Maths Study Centre."— Presentation transcript:

1 Section 9-1: Inference for Slope and Correlation Section 9-3: Confidence and Prediction Intervals Mahrita.Harahap@uts.edu.au Visit the Maths Study Centre, 11am-5pm, CB04.03.331. This presentation is viewable on http://mahritaharahap.wordpress.com/teaching-areas/ Also check out www.khanacademy.org

2 Statistical inference is the process of drawing conclusions about the entire population based on information in a sample by: constructing confidence intervals on population parameters or by setting up a hypothesis test on a population parameter

3 Regression The linear regression line characterises the relationship between two quantitative variables. Using regression analysis on data can help us draw insights about that data: it helps us understand the impact of one variable on the other. It examines the relationship between one independent variable (predictor/explanatory) and one dependent variable (response/outcome). The linear regression line equation is based on the equation of a straight line: ŷ = β0 + β1x.
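As a rough sketch (not part of the original slides), the least-squares coefficients can be computed by hand: the slope is Sxy/Sxx and the intercept makes the line pass through (x̄, ȳ). The helper name `fit_line` and the data are made up for illustration:

```python
# Minimal least-squares fit by hand; illustrative data only.
def fit_line(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: b1 = S_xy / S_xx
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar  # intercept: line passes through (x_bar, y_bar)
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(b0, b1)  # 0.0 2.0 -- a perfect line y = 2x
```

With perfectly linear toy data the fitted line recovers slope 2 and intercept 0 exactly.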

4 X: Predictor Variable Explanatory Variable Independent Variable Variable one can control. Y: Outcome variable Response Variable Dependent Variable The outcome to be measured/predicted.

5

6 General: Hypothesis Testing We use hypothesis testing to infer conclusions about population parameters based on analysing the statistics of a sample. In statistics, a hypothesis is a statement about a population parameter.
1. The null hypothesis, denoted H0, is a statement or claim about a population parameter that is initially assumed to be true. It represents no "effect" or no "difference" and is always an equality (e.g. H0: population parameter = hypothesised null value).
2. The alternative hypothesis, denoted H1, is the competing claim: what we are trying to prove, the claim we seek evidence for (e.g. H1: population parameter ≠, <, or > hypothesised null value).
3. Test statistic: a measure of compatibility between the statement in the null hypothesis and the data obtained.
4. Decision criteria: the p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed sample value, given that H0 is true. If p-value ≤ 0.05, reject H0; if p-value > 0.05, do not reject H0.
5. Conclusion: make your conclusion in the context of the problem.

7 Hypothesis Test for Correlation Coefficient Correlation measures the strength of the linear association between two variables.
H0: ρ = 0 (the correlation is not significant).
H1: ρ ≠ 0 (the correlation is significant).
Sample correlation: r = Sxy / √(Sxx·Syy), where Sxy = Σ(x−x̄)(y−ȳ), Sxx = Σ(x−x̄)² and Syy = Σ(y−ȳ)².
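A hedged sketch of this test by hand: compute r from the sums of squares, then the usual test statistic t = r√(n−2)/√(1−r²) with df = n−2. The data and the helper name `correlation_test` are illustrative assumptions:

```python
import math

def correlation_test(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    r = s_xy / math.sqrt(s_xx * s_yy)                 # sample correlation
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # test statistic, df = n-2
    return r, t

r, t = correlation_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(t, 4))  # 0.7746 2.1213
```

Compare t to a t distribution with n−2 degrees of freedom (or read the p-value from software) to decide whether the correlation is significant.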

8 Interpretation of the slope INTERPRETATION OF THE SLOPE The slope β represents the predicted change in the response variable y given a one-unit increase in the explanatory variable x. As the independent variable increases by 1 unit, the predicted dependent variable increases/decreases by β units on average. ŷ = α + βx

9 Hypothesis Test for Slope
H0: β = 0. There is no association between the response variable and the independent variable (the regression is insignificant): y = α + 0·x.
H1: β ≠ 0. The independent variable affects the response variable (the regression is significant): y = α + βx.
If p-value ≤ 0.05, we reject H0. There is evidence that β ≠ 0, which means the independent variable is an effective predictor of the dependent variable, at the 5% level of significance.
If p-value > 0.05, we do not reject H0. There is no evidence that β ≠ 0, which means the independent variable is NOT an effective predictor of the dependent variable, at the 5% level of significance.
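As an illustrative sketch (data and helper name `slope_t_stat` are assumptions, not from the slides), the slope test statistic is t = b1/SE(b1), where SE(b1) = √(MSE/Sxx) and MSE = SSE/(n−2):

```python
import math

def slope_t_stat(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = s_xy / s_xx                      # fitted slope
    b0 = y_bar - b1 * x_bar               # fitted intercept
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / s_xx)   # standard error of the slope
    return b1, b1 / se_b1                     # slope and its t statistic (df = n-2)

b1, t = slope_t_stat([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 2), round(t, 4))  # 0.6 2.1213
```

In simple linear regression this t statistic is mathematically identical to the one from the correlation test, so the two tests always agree.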

10 Confidence Interval for Slope
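A hedged sketch of a 95% confidence interval for the slope, b1 ± t*·SE(b1) with df = n−2; the data and the t table value 3.182 (df = 3, 95% confidence) are illustrative assumptions:

```python
import math

# 95% CI for the slope: b1 +/- t* x SE(b1), with t* from a t table (df = n-2).
x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2) / s_xx)
t_star = 3.182                     # t table value for df = 3, 95% confidence
lo, hi = b1 - t_star * se_b1, b1 + t_star * se_b1
print(round(lo, 3), round(hi, 3))  # -0.3 1.5
```

Since this interval contains 0, it agrees with the two-sided slope test at the 5% level: we would not reject H0: β = 0 for these toy data.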

11

12 Coefficient of determination R² R-squared gives us the proportion of the total variability in the response variable (Y) that is "explained" by the least-squares regression line based on the predictor variable (X). It is usually stated as a percentage. Interpretation: on average, R²% of the variation in the dependent variable can be explained by the independent variable through the regression model.
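A small sketch (illustrative data, hypothetical helper name `r_squared`): R² = 1 − SSE/SST, the explained share of the total variability. In simple linear regression it also equals the square of the sample correlation r:

```python
def r_squared(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variability
    sst = sum((b - y_bar) ** 2 for b in y)                     # total variability
    return 1 - sse / sst

print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))  # 0.6
```

Here 60% of the variation in y is explained by the regression on x.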

13

14

15 Confidence Intervals and Prediction Intervals The key point is that a prediction interval tells you about the distribution of individual values, not just the uncertainty in estimating the population mean. Prediction intervals must account for both the uncertainty in estimating the population mean and the scatter of the data around it. So a prediction interval is always wider than the corresponding confidence interval.
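The width difference can be seen directly in the interval formulas for regression: the confidence-interval margin uses √(1/n + (x0−x̄)²/Sxx), while the prediction-interval margin adds an extra "1" under the square root for individual scatter. A sketch with illustrative data and an assumed t table value:

```python
import math

# Compare margins for estimating the MEAN response at x0 (confidence interval)
# versus predicting an INDIVIDUAL response at x0 (prediction interval).
x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2))
t_star, x0 = 3.182, 3.0            # t table value (df = 3, 95%); point of interest
core = 1 / n + (x0 - x_bar) ** 2 / s_xx
ci_margin = t_star * s * math.sqrt(core)       # mean response
pi_margin = t_star * s * math.sqrt(1 + core)   # individual response: extra "1" for scatter
print(pi_margin > ci_margin)  # True -- the prediction interval is always wider
```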

16

17

18 REVISION

19 Statistical inference is the process of drawing conclusions about the entire population based on information in a sample by: constructing confidence intervals on population parameters or by setting up a hypothesis test on a population parameter

20

21 General: Hypothesis Testing We use hypothesis testing to infer conclusions about population parameters based on analysing the statistics of a sample. In statistics, a hypothesis is a statement about a population parameter.
1. The null hypothesis, denoted H0, is a statement or claim about a population parameter that is initially assumed to be true. It represents no "effect" or no "difference" and is always an equality (e.g. H0: population parameter = hypothesised null value).
2. The alternative hypothesis, denoted H1, is the competing claim: what we are trying to prove, the claim we seek evidence for (e.g. H1: population parameter ≠, <, or > hypothesised null value).
3. Test statistic: a measure of compatibility between the statement in the null hypothesis and the data obtained.
4. Decision criteria: the p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed sample value, given that H0 is true. If p-value ≤ 0.05, reject H0; if p-value > 0.05, do not reject H0.
5. Conclusion: make your conclusion in the context of the problem.

22 Hypothesis Testing for a single mean
H0: μ = null parameter
Ha: μ ≠, <, or > null parameter
Test statistic: t = (x̄ − μ0) / (s/√n), with df = n−1.
If p-value < 0.05, we reject H0 and conclude that we have enough evidence to prove the alternative hypothesis is true at the 5% level of significance.
If p-value ≥ 0.05, we do not reject H0 and conclude that we do not have enough evidence to prove the alternative hypothesis is true at the 5% level of significance.
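A minimal sketch of the one-sample t statistic (data and helper name `one_sample_t` are made up for illustration):

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0):
    # t = (sample mean - mu0) / (s / sqrt(n)), df = n-1
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(n))

t = one_sample_t([5, 6, 7, 8, 9], mu0=6)
print(round(t, 4))  # 1.4142
```

Compare t to a t distribution with n−1 degrees of freedom (or read the p-value from software) and apply the usual 0.05 decision rule.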

23 Hypothesis Testing for Difference in Means (2 independent samples)

24 Hypothesis Testing for Paired Difference in Means

25 Hypothesis Testing for 1 Proportion

26 Hypothesis Testing for Difference in 2 Proportions

27 Hypothesis Testing for a Single Categorical Variable (Goodness of Fit test)
Ho: hypothesised proportions for each category, pi = ….
Ha: at least one pi is different.
Test statistic: calculate the expected counts for each cell as n·pi; make sure they are all greater than 5 to proceed. Then calculate the chi-squared statistic.
Find the p-value as the area in the tail to the right of the chi-squared statistic (always select right tail) for a chi-squared distribution with df = (number of categories − 1) and compare to the significance level α = 0.05.
If p-value < α, reject Ho. Conclude that we have evidence to prove the alternative is true at the α level of significance.
If p-value ≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative is true at the α level of significance.
α is 5% by default unless stated otherwise.
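The chi-squared statistic above is Σ(O−E)²/E with E = n·pi. A sketch with made-up counts and hypothesised proportions:

```python
def chi_squared_gof(observed, proportions):
    n = sum(observed)
    expected = [n * p for p in proportions]       # expected counts n * p_i
    assert all(e > 5 for e in expected), "expected counts should exceed 5"
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# H0: the three categories occur with proportions 0.25, 0.25, 0.50
stat = chi_squared_gof([30, 20, 50], [0.25, 0.25, 0.50])
print(stat)  # 2.0  (compare to a chi-squared distribution with df = 3 - 1 = 2)
```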

28 Hypothesis Testing for an Association between two categorical variables
Ho: the two variables are not associated.
Ha: the two variables are associated.
Test statistic: calculate the expected counts for each cell as (row total × column total)/n; make sure they are all greater than 5 to proceed. Then calculate the chi-squared statistic.
Find the p-value as the area in the tail to the right of the chi-squared statistic (always select right tail) for a chi-squared distribution with df = (r−1)(c−1) and compare to the significance level α = 0.05.
If p-value < α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α level of significance.
If p-value ≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α level of significance.
α is 5% by default unless stated otherwise.
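The same Σ(O−E)²/E statistic applies, with expected counts built from the row and column totals. A sketch using a made-up 2×2 table of counts:

```python
def chi_squared_independence(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # (row total x col total)/n
            stat += (observed - expected) ** 2 / expected
    return stat

# 2x2 table of counts; df = (2-1)*(2-1) = 1
stat = chi_squared_independence([[20, 30], [30, 20]])
print(stat)  # 4.0
```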

29 Hypothesis Testing for the difference in means of more than two samples (ANOVA – Analysis of Variance test)
The equal-variance condition is met when the ratio of the largest standard deviation to the smallest is less than 2. If the assumption of equal variances holds, it is appropriate to use the ANOVA table when testing the difference in the means.
Ho: μ1 = μ2 = μ3, i.e. the means are all equal to each other.
Ha: at least one mean is different.
Construct an ANOVA table to calculate the F test statistic from your sample data: F = MSG/MSE.
Find the p-value as the area in the tail to the right of the F-statistic (always select right tail) for an F distribution with df1 = k−1 and df2 = n−k, where k = number of groups and n = total number of observations, and compare to the significance level α = 0.05.
If p-value < α, reject Ho. Conclude that we have evidence to prove the alternative hypothesis (in context of the question) is true at the α level of significance.
If p-value ≥ α, do not reject Ho. Conclude that we do not have enough evidence to prove the alternative hypothesis (in context of the question) is true at the α level of significance.
α is 5% by default unless stated otherwise.
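A sketch of the ANOVA table arithmetic by hand (the three groups are made-up data; `anova_f` is a hypothetical helper): F = MSG/MSE, where MSG = SSG/(k−1) measures between-group variation and MSE = SSE/(n−k) measures within-group variation:

```python
def anova_f(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-groups sum of squares (SSG), df = k-1
    ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares (SSE), df = n-k
    sse = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    msg, mse = ssg / (k - 1), sse / (n - k)
    return msg / mse

f = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f)  # 3.0  (compare to an F distribution with df1 = 2, df2 = 6)
```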

