 Making Inferences for Associations Between Categorical Variables: Chi Square Chapter 12 Reading Assignment pp. 463-482; 485.

Elements of a Test of Hypotheses
Hypothesis testing: a process for finding out whether we can generalize about an association from a sample to a population.
Null hypothesis (H_0): represents the status quo to the party performing the sampling experiment, i.e., it will be accepted unless the data provide convincing evidence that it is false.
Research hypothesis (H_1) (aka alternative hypothesis): will be accepted only if the data provide convincing evidence of its truth.
Homework: Skills 1, p. 464

Process of Hypothesis Testing
Step 1: Specify a research hypothesis and a null hypothesis.
Step 2: Compute the value of a test statistic for the relationship.
Step 3: Calculate the degrees of freedom for the variables involved.
Step 4: Look up the distribution of the test statistic to find its critical value at a specified level of probability (to determine the likelihood that a test statistic of a particular value could have occurred by chance alone).
Step 5: Decide whether to reject the null hypothesis.
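The five steps can be sketched in code for a one-variable (goodness-of-fit) chi-square test. This is a minimal illustration, not the textbook's procedure: the function names are hypothetical, and the critical value is assumed to come from a printed chi-square table (Step 4) rather than being computed. The usage line borrows the coin-toss figures worked out later in this chapter.

```python
# Hypothetical helper names (not from the textbook). The critical value is read
# from a chi-square table in Step 4 and passed in, not computed here.

def chi_square_statistic(observed, expected):
    """Step 2: add up (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit_test(observed, expected, critical_value):
    """Carry out Steps 2-5 for a one-variable chi-square test.

    Step 1 is the analyst's job: H_0 says the observed frequencies follow the
    expected distribution, H_1 says they do not.
    """
    statistic = chi_square_statistic(observed, expected)  # Step 2
    df = len(observed) - 1                                # Step 3
    # Step 4: critical_value comes from a table for this df and significance level.
    reject_h0 = statistic >= critical_value               # Step 5
    return statistic, df, reject_h0


# Coin example: 115 heads and 85 tails against a fair-coin expectation of
# 100 each, using the df = 1 table value 3.84 (0.05 level).
statistic, df, reject_h0 = goodness_of_fit_test([115, 85], [100, 100], 3.84)
print(statistic, df, reject_h0)  # 4.5 1 True
```

Step 1 stays outside the function on purpose: choosing the hypotheses is a judgment made before any arithmetic.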

Null Hypothesis
Null hypothesis (H_0): speculates that there is no association between the two variables. Examples:
– H_0: Men are no different from women in their political affiliations.
– H_0: There is no relationship between a respondent’s educational level and his or her parents’ educational level.
– H_0: Older people are no more likely to be happy than younger people.
This is the only hypothesis that can actually be tested: we either reject or fail to reject the null hypothesis.
Example: H_0: There is no association between age and happiness among American adults. Homework: read p. 466.

Statistical Independence
Two variables are statistically independent when changes in one variable (e.g., age of respondents) have nothing to do with changes in a second (e.g., happiness); that is, they vary independently of one another.
Conversely, when two variables are statistically dependent, changes in one variable are associated with changes in the second; e.g., changes in age (older respondents) are associated with changes in levels of happiness (more happiness).
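One concrete way to see independence in a table of counts: if age and happiness are independent, the proportion of happy respondents is the same in every age group. The sketch below checks exactly that. The counts are hypothetical (invented for illustration, not from the textbook), and the function names are likewise hypothetical.

```python
# Hypothetical counts, for illustration only (not from the textbook).
# Rows are age groups (younger, older); columns are (happy, not happy).
independent_table = [[20, 30],
                     [40, 60]]
dependent_table = [[20, 30],
                   [60, 40]]


def row_distributions(table):
    """Convert each row of counts to proportions (the conditional distribution)."""
    return [[cell / sum(row) for cell in row] for row in table]


def looks_independent(table, tol=1e-9):
    """True when every row has the same conditional distribution, i.e. the
    column variable (happiness) does not change with the row variable (age)."""
    dists = row_distributions(table)
    return all(abs(a - b) <= tol for dist in dists for a, b in zip(dist, dists[0]))


print(row_distributions(independent_table), looks_independent(independent_table))
# [[0.4, 0.6], [0.4, 0.6]] True
print(row_distributions(dependent_table), looks_independent(dependent_table))
# [[0.4, 0.6], [0.6, 0.4]] False
```

In real samples the row proportions almost never match exactly, even when the variables are independent in the population; that is why a test is needed to decide whether the differences are larger than chance would produce.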

Statistical Independence and Hypothesis Testing
Example null hypothesis: age is statistically independent of happiness, i.e., differences among respondents in age are unrelated to any differences in their reported levels of happiness.
– Hypothesis testing lets us assess the likelihood that the degree of statistical dependence found in the sample is due to chance.
– If the degree of statistical dependence found in the sample is not likely to be due to chance, the null hypothesis is rejected.
– If it is likely to be due to chance, the null hypothesis is not rejected.

Type I and Type II Errors
These “mistakes” arise because a given sample may or may not be representative of the population.
If the null hypothesis states that there is no association between two variables, and we reject it even though there really is no association, we commit a Type I error, i.e., we call someone a liar when he is telling the truth.
If the null hypothesis states that there is no association between two variables, and we accept it even though there really is an association, we commit a Type II error, i.e., we say someone is truthful when he is lying.

Type I and Type II Errors
Conclusion               | H_0 actually true | H_1 actually true
H_0 (do not reject H_0)  | Correct decision  | Type II error
H_1 (reject H_0)         | Type I error      | Correct decision
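Both kinds of error can be made visible by simulation. The sketch below repeatedly tosses a simulated coin 200 times and applies the chi-square test at the 0.05 level (table value 3.84, df = 1). With a fair coin, every rejection is a Type I error; with a biased coin, every failure to reject is a Type II error. The 60%-heads coin, the trial count, the seed and the function names are all assumptions chosen for illustration.

```python
import random

CRITICAL_0_05_DF1 = 3.84  # chi-square table value for df = 1 at the 0.05 level


def coin_chi_square(heads, n):
    """Chi-square for H_0: the coin is fair (n/2 heads and n/2 tails expected)."""
    e = n / 2
    tails = n - heads
    return (heads - e) ** 2 / e + (tails - e) ** 2 / e


def rejection_rate(p_heads, n=200, trials=2000, seed=1):
    """Fraction of simulated samples of n tosses in which H_0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < p_heads for _ in range(n))
        if coin_chi_square(heads, n) >= CRITICAL_0_05_DF1:
            rejections += 1
    return rejections / trials


# H_0 true (fair coin): each rejection is a Type I error.
type_1_rate = rejection_rate(0.5)
# H_1 true (hypothetical coin with 60% heads): each non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(0.6)
print(f"Type I rate ~ {type_1_rate:.3f}, Type II rate ~ {type_2_rate:.3f}")
```

The Type I rate hovers near the chosen significance level, while the Type II rate depends on how far the truth is from H_0 and on the sample size; a coin that is only slightly biased would slip through far more often.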

Elements of a Test of Hypothesis
Null hypothesis (H_0): a theory about one of the population parameters. The theory generally represents the status quo, which is maintained unless it is proven false.
Research hypothesis (H_1): a theory that contradicts the null hypothesis. The theory generally represents a claim that will be accepted only if the data provide evidence for it.
Test statistic: a sample statistic used to decide whether to reject the null hypothesis.

Elements of a Test of Hypothesis (cont.)
Rejection region: the numerical values of the test statistic for which the null hypothesis will be rejected. The rejection region is chosen so that the probability is $\alpha$ that it will contain the test statistic when the null hypothesis is true, thereby leading to a Type I error. The value of $\alpha$ chosen is usually small (e.g., 0.01, 0.05, or 0.1) and is referred to as the level of significance of the test. A 0.05 (or 5%) level of significance means that, when the null hypothesis is true, there is a 5% chance that we will reject it anyway (a Type I error) and a 95% chance that we will correctly fail to reject it.
Assumptions: clear statement(s) of any assumptions made about the population(s) being sampled.
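For the coin-toss test used in this chapter (200 tosses, df = 1, table value 3.84 at the 0.05 level), the rejection region can be listed directly in terms of the number of heads, and the probability $\alpha$ that a fair coin lands in it can be computed exactly from the binomial distribution. This is an illustrative sketch; the constant and function names are hypothetical.

```python
from math import comb

N_TOSSES = 200
CRITICAL = 3.84  # chi-square table value, df = 1, 0.05 level of significance


def coin_chi_square(heads, n=N_TOSSES):
    """Chi-square for H_0: fair coin (expected n/2 heads and n/2 tails)."""
    e = n / 2
    return ((heads - e) ** 2 + ((n - heads) - e) ** 2) / e


# Rejection region: every possible head count whose statistic reaches the critical value.
rejection_region = [h for h in range(N_TOSSES + 1) if coin_chi_square(h) >= CRITICAL]

# alpha = P(statistic falls in the rejection region | H_0 true, P(heads) = 0.5)
alpha = sum(comb(N_TOSSES, h) for h in rejection_region) / 2 ** N_TOSSES

print(max(h for h in rejection_region if h < 100),
      min(h for h in rejection_region if h > 100))  # 86 114
print(f"alpha = {alpha:.4f}")
```

Because the number of heads can only take whole-number values, the exact $\alpha$ comes out close to, but not exactly, 0.05. The 115 heads in the chapter's example fall inside the region.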

Elements of a Test of Hypothesis (cont.)
Experiment and calculation of the test statistic.
Conclusion:
– If the numerical value of the test statistic falls in the rejection region, we reject the null hypothesis and conclude that the research hypothesis is true. We know that hypothesis testing will lead to this conclusion incorrectly (a Type I error) 100$\alpha$% of the time when H_0 is true.
– If the test statistic does not fall in the rejection region, we do not reject H_0; we reserve judgment about which hypothesis is true. We do not conclude that the null hypothesis is true because we do not, in general, know the probability that our test procedure will lead to an incorrect failure to reject H_0 (a Type II error).

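Chi-Square (Formula 12.1)
$\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over all cells, where O is the observed frequency and E is the expected frequency in each cell.
Observed vs. expected: roll a die 6 times and get three 3’s; three is the observed frequency of 3’s, while the expected frequency is one.
Pp. 469-71 and Skills: filling in the table of expected values. Skills 3, 4: Excel.
Generally, the greater the value of chi-square (for a given sample size and degrees of freedom), the greater the statistical dependence between the two variables.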
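Filling in the table of expected values follows a simple rule: under independence, each cell's expected count is its row total times its column total divided by the sample size N. The sketch below applies that rule and then sums $(O - E)^2 / E$ over all cells. The 2 × 2 counts are hypothetical, and the function names are hypothetical as well.

```python
def expected_counts(observed):
    """E_ij = (row i total) * (column j total) / N: the counts expected
    if the two variables were statistically independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Chi-square: the sum over all cells of (O - E)^2 / E."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))


table = [[10, 20],
         [30, 40]]  # hypothetical 2 x 2 counts, N = 100
print(expected_counts(table))      # [[12.0, 18.0], [28.0, 42.0]]
print(round(chi_square(table), 3))  # 0.794
```

A table whose rows are exactly proportional to one another (such as [[20, 30], [40, 60]]) has expected counts equal to its observed counts and therefore a chi-square of 0.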

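Chi-Square / Degrees of Freedom
We are using observations from a sample as well as certain population parameters. If these parameters are unknown, they must be estimated from the sample.
Degrees of freedom (df): in general, the number N of independent observations in the sample minus the number k of population parameters that must be estimated from the sample observations, $df = N - k$. For a one-variable chi-square test, the independent quantities are the category frequencies rather than the individual cases: with m categories whose frequencies must add up to the sample size, $df = m - 1$ (e.g., 10 − 1 = 9 for the ten digits 0-9).
When working with a contingency table, $df = (r-1)(c-1)$, where r and c are the number of rows and columns, respectively, in the contingency table.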
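The contingency-table rule $df = (r-1)(c-1)$ can be checked by construction: once the row and column totals are fixed, only an $(r-1) \times (c-1)$ block of cells can be filled in freely, and every remaining cell is forced by the totals. The sketch below rebuilds a full table from that free block. All function names are hypothetical, and the example counts are made up.

```python
def df_goodness_of_fit(num_categories):
    """One-variable test: the counts must add up to N, so one category is not free."""
    return num_categories - 1


def df_contingency(rows, cols):
    """Contingency table: (r - 1)(c - 1) cells are free once the totals are fixed."""
    return (rows - 1) * (cols - 1)


def complete_table(free_cells, row_totals, col_totals):
    """Rebuild an r x c table from its (r-1) x (c-1) free cells plus its marginals."""
    # Last column of each of the first r-1 rows is forced by the row total.
    table = [row + [total - sum(row)] for row, total in zip(free_cells, row_totals)]
    # The entire last row is forced by the column totals.
    last_row = [col_total - sum(col) for col_total, col in zip(col_totals, zip(*table))]
    table.append(last_row)
    return table


print(df_goodness_of_fit(10), df_contingency(2, 2), df_contingency(3, 3))  # 9 1 4
print(complete_table([[1, 2], [3, 4]], [10, 20, 30], [15, 20, 25]))
# [[1, 2, 7], [3, 4, 13], [11, 14, 5]]
```

In the 3 × 3 example only four cells were supplied; the other five follow from the totals, matching $(3-1)(3-1) = 4$ degrees of freedom.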

Chi-Squared Example: Generate Random Digits 250 Times
Question: Does the observed frequency differ from the expected distribution in a significant way?
Digit:          0   1   2   3   4   5   6   7   8   9
Observed freq: 17  31  29  18  14  20  35  30  20  36
Expected freq: 25  25  25  25  25  25  25  25  25  25

Chi-Square: Random Digit Example
$\chi^2 = \frac{(17-25)^2}{25} + \frac{(31-25)^2}{25} + \frac{(29-25)^2}{25} + \dots + \frac{(36-25)^2}{25} = 23.3$ (computed in Excel)
Degrees of freedom: 10 - 1 = 9.
From the table on p. 545, the value of $\chi^2$ at 0.99 for df = 9 is 21.7. Since 23.3 > 21.7, the observed frequency differs significantly from the expected frequency at the 0.01 level of significance, so the table of “random” numbers is somewhat doubtful.
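The Excel calculation can be reproduced term by term, which also shows which digits drive the result. This sketch uses the observed frequencies from the table above and the table value 21.7 for df = 9 at the 0.01 level; the variable names are illustrative.

```python
observed = [17, 31, 29, 18, 14, 20, 35, 30, 20, 36]  # digits 0-9, 250 draws
n = sum(observed)                                     # 250
expected = n / len(observed)                          # 25 per digit if truly random

# Each digit's contribution (O - E)^2 / E to the chi-square total.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(contributions)
df = len(observed) - 1
critical_0_01 = 21.7  # table value for df = 9 at 0.99 (0.01 level of significance)

for digit, c in enumerate(contributions):
    print(digit, round(c, 2))
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H_0: {chi_square > critical_0_01}")
# chi-square = 23.28, df = 9, reject H_0: True
```

Digits 4 and 9 (4.84 each) and digit 6 (4.00) contribute the most; they are the digits whose counts sit furthest from 25.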

Chi-Square Question
200 tosses of a coin give 115 heads and 85 tails. Test the hypothesis that the coin is fair using (a) the 0.05 and (b) the 0.01 level of significance.
Answer: df = 2 - 1 = 1 (two categories, H and T). O_1 = 115, O_2 = 85; E_1 = E_2 = 100.
$\chi^2 = \frac{(115-100)^2}{100} + \frac{(85-100)^2}{100} = 4.5$
(a) The $\chi^2$ table value at 0.95 (df = 1) is 3.84; 4.5 > 3.84, so reject the hypothesis that the coin is fair at the 0.05 level of significance.
(b) The $\chi^2$ table value at 0.99 (df = 1) is 6.63; 4.5 < 6.63, so we cannot reject the hypothesis that the coin is fair at the 0.01 level of significance.
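The two answers differ because the p value of the coin statistic lies between 0.01 and 0.05. For one degree of freedom, $\chi^2$ is the square of a standard normal variable, so its upper-tail probability is $\operatorname{erfc}(\sqrt{x/2})$, which Python's standard `math.erfc` can evaluate. This is a sketch under that df = 1 assumption; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_p_value_df1(statistic):
    """Upper-tail probability P(chi-square >= statistic) for df = 1.
    With one degree of freedom, chi-square = Z^2, so the tail is erfc(sqrt(x / 2))."""
    return erfc(sqrt(statistic / 2))


p = chi_square_p_value_df1(4.5)
print(round(p, 4))  # 0.0339
for alpha in (0.05, 0.01):
    print(alpha, "reject H_0" if p <= alpha else "cannot reject H_0")
```

As a check, the same function returns roughly 0.05 for 3.84 and roughly 0.01 for 6.63, the two table values used above.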

Interpreting Chi Square
When hypothesizing about an association between two variables, chi-square tells us the likelihood that the degree of statistical dependence observed is simply the luck of the draw.
A p value of 0.05 means that, if there were no association between the variables in the population, a degree of statistical dependence at least as large as the one observed would arise by chance in no more than 5 samples out of 100. Because that outcome is unlikely under the null hypothesis (no association between the variables), the null hypothesis is rejected.
The lower the p value we require before rejecting the null hypothesis, the less likely we are to make a Type I error.

Interpreting Chi Square (cont.)
Pp. 480-81: Table 12.4 (p. 472) has $\chi^2 = 15.487$, df = 6.
The higher the $\chi^2$ value (for given degrees of freedom), the less likely it is that the value obtained is due to chance (read Table 12.9, p. 481).
Rule of thumb: reject the null hypothesis when the p value associated with $\chi^2$ is 0.05 or less, i.e., when there are only 5 chances in 100 (or fewer) that a dependence this large would arise by chance.
Skills 7, p. 481; Skills 8, p. 485 (following their example, p. 484).
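Instead of bracketing $\chi^2$ between table values, the p value can be computed directly. For whole-number degrees of freedom the chi-square upper tail has standard closed-form finite series, which the sketch below uses with only the Python standard library. The function name is hypothetical; in practice a statistics package (for example SciPy's `scipy.stats.chi2.sf`) would normally be used instead.

```python
import math


def chi2_p_value(x, df):
    """Upper-tail probability P(chi-square >= x) for a whole-number df,
    computed from the closed-form series that exist for integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = x / 2.0
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #   + sqrt(2x/pi) * exp(-x/2) * sum_{k=0}^{(df-3)/2} x^k / (1*3*...*(2k+1))
    p = math.erfc(math.sqrt(x / 2.0))
    if df > 1:
        term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
        total = term
        for k in range(1, (df - 1) // 2):
            term *= x / (2 * k + 1)
            total += term
        p += total
    return p


p_table_12_4 = chi2_p_value(15.487, 6)  # Table 12.4: chi-square = 15.487, df = 6
print(f"p = {p_table_12_4:.4f}")
for alpha in (0.05, 0.01):
    print(alpha, "reject H_0" if p_table_12_4 <= alpha else "do not reject H_0")
```

For Table 12.4 the p value falls between 0.01 and 0.05, so by the rule of thumb the null hypothesis is rejected at the 0.05 level, though not at the stricter 0.01 level.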

Homework: p. 492, problems 1 and 3; p. 494, SPSS problems 1 and 2
