
Chi–squared Tests for Ordinal and Nominal data 1.


1 Chi–squared Tests for Ordinal and Nominal data 1

2 Techniques to summarize data
1. One variable: univariate methods
2. Two variables: bivariate methods
   Graphical displays:
   - Two interval variables: scatter plot
   - Two categorical variables: clustered bar chart
More than two variables: graphical displays are hard

3 Observations can be taken
1. At the same time: cross-sectional data
   Market surveys, e.g. brand preferences of 100 people.
2. Repeatedly at successive times: time series data
   E.g. the price of a certain stock over the last 5 years.
Note: The succession can also be in space, but we omit that discussion here.

4 Describing the relationship between two nominal/ordinal variables
A contingency table (also called a cross-classification or cross-tabulation table) is used to describe the relationship between two or more nominal variables.
Ex: Are profession and newspaper reading habits related? A sample of people is asked about their professions and newspaper preferences.
Person  Occupation    Newspaper
1       White-collar  Post
2       White-collar  Sun
3       Professional  Sun
...     ...           ...
354     Blue-collar   Mail
Contingency table (WC = white-collar, BC = blue-collar, Pro = professional):
Newspaper   WC   BC  Pro  Total
Globe       27   29   33     89
Mail        18   43   51    112
Post        38   21   22     81
Sun         37   15   20     72
Total      120  108  126    354

5 Relative frequencies (each count divided by its occupation total)
Newspaper  WC              BC              Pro
Globe      27/120 = 0.23   29/108 = 0.27   33/126 = 0.26
Mail       18/120 = 0.15   43/108 = 0.40   51/126 = 0.40
Post       38/120 = 0.32   21/108 = 0.19   22/126 = 0.17
Sun        37/120 = 0.31   15/108 = 0.14   20/126 = 0.16
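The relative frequencies above can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and the names `counts`, `col_totals` and `rel_freq` are illustrative choices, not part of the presentation.

```python
# Counts from the newspaper/occupation contingency table.
counts = {
    "Globe": {"WC": 27, "BC": 29, "Pro": 33},
    "Mail":  {"WC": 18, "BC": 43, "Pro": 51},
    "Post":  {"WC": 38, "BC": 21, "Pro": 22},
    "Sun":   {"WC": 37, "BC": 15, "Pro": 20},
}
occupations = ["WC", "BC", "Pro"]

# Column totals: number of respondents in each occupation group.
col_totals = {occ: sum(row[occ] for row in counts.values()) for occ in occupations}

# Relative frequency = cell count divided by its column (occupation) total.
rel_freq = {
    paper: {occ: row[occ] / col_totals[occ] for occ in occupations}
    for paper, row in counts.items()
}

for paper, row in rel_freq.items():
    print(paper, "  ".join(f"{occ}={row[occ]:.2f}" for occ in occupations))
```

Dividing by the column totals makes the three occupation groups comparable even though they have different sizes; within each occupation the relative frequencies add up to 1.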

6 Time series data
Observations are repeated at successive times.
Ex: Total amount of tax collected (in billions of US$) in the USA from 1993 to 2002.
Year  Tax
1993   594
1994   625
1995   686
1996   755
1997   848
1998   940
1999  1032
2000  1137
2001  1178
2002  1038

7 Chi-squared goodness-of-fit test
1. Binomial experiment: a nominal variable has two outcomes.
   E.g.: Do the majority of people like the new economic policies or not?
2. Multinomial experiment: a nominal variable has three or more outcomes, so we test more than two proportions.
   E.g.: Do people have equal preferences among five brands of tea?
Note: A multinomial case can sometimes be reduced to a binomial case!

8 Example
100 persons took part in a survey about different brands of coffee. Each person tasted four different kinds of coffee (in a blind test) and noted which one they liked best. The result of the test is as follows:
Brand:              Ellips  Gexus  Luber  Loflia
Number of persons:      26     28     16      30

9 Does the result of the survey show that some of the brands are more popular than the others, or are they all equally popular? In statistical terms we can formulate the problem as:
Null hypothesis: All the coffee brands are equally popular.
Alternative hypothesis: The coffee brands are not all equally popular.

10 If the null hypothesis is true, we would expect the following result of the survey (100 persons spread equally over four brands):
Brand:              Ellips  Gexus  Luber  Loflia
Number of persons:      25     25     25      25
Can we, at a significance level of 5%, say anything about whether the null hypothesis is true or not?

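11 One way of measuring how much the observed table differs from the expected table is to look at the differences (observed minus expected, with 25 persons expected per brand):
Brand:       Ellips  Gexus  Luber  Loflia
Difference:       1      3     -9       5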

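12 However, there is a problem: the difference between 10 and 20 is relatively much larger than the difference between 10000 and 10010. How can we take this into account? Square each difference, divide by the expected value, and sum over the categories to form a test statistic:
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ is the observed and $E_i$ the expected frequency in category $i$. For the coffee survey, with $E_i = 25$ for every brand:
$\chi^2 = \frac{1^2 + 3^2 + (-9)^2 + 5^2}{25} = \frac{116}{25} = 4.64$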
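The arithmetic for the coffee survey can be checked with a short Python sketch. The variable names are illustrative; the counts are those of the survey, and the expected count of 25 per brand follows from 100 persons and four equally popular brands.

```python
# Observed counts from the coffee survey.
observed = {"Ellips": 26, "Gexus": 28, "Luber": 16, "Loflia": 30}

n = sum(observed.values())          # total number of persons
expected = n / len(observed)        # equal popularity under the null hypothesis

# Raw differences between observed and expected counts.
differences = {brand: count - expected for brand, count in observed.items()}

# The raw differences always sum to zero, so they cannot be used directly.
print("sum of differences:", sum(differences.values()))

# Squaring removes the sign; dividing by the expected count puts each
# difference on a relative scale.
chi2 = sum(d ** 2 / expected for d in differences.values())
print("chi-squared statistic:", round(chi2, 2))
```

The sketch also shows why the differences are squared: positive and negative deviations would otherwise cancel.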

13 If the null hypothesis is true, $\chi^2$ ought to be close to zero. Is 4.64 so far from zero that we can reject the null hypothesis? What is the sampling distribution of $\chi^2$ if the null hypothesis is true?

14 Chi-squared
Chi-squared has two meanings:
1. A continuous distribution: the $\chi^2$ distribution
2. A statistical test in which the sampling distribution of the test statistic is $\chi^2$-distributed

15 Chi-squared distribution
The $\chi^2$ distribution is a parametric distribution with a parameter v, called the degrees of freedom. The distribution looks different for different degrees of freedom: the larger v is, the more symmetric the distribution becomes, and the larger its expected value and standard deviation.
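These statements can be made concrete with the standard moments of the chi-squared distribution: mean v, standard deviation sqrt(2v) and skewness sqrt(8/v). The helper name `chi2_summary` below is a hypothetical choice for this sketch.

```python
import math

def chi2_summary(v):
    """Return (mean, standard deviation, skewness) of a chi-squared
    distribution with v degrees of freedom."""
    return v, math.sqrt(2 * v), math.sqrt(8 / v)

# Mean and spread grow with v, while the skewness shrinks toward zero,
# i.e. the distribution becomes more symmetric.
for v in (1, 3, 10, 30):
    mean, sd, skew = chi2_summary(v)
    print(f"v={v:2d}  mean={mean:2d}  sd={sd:.2f}  skewness={skew:.2f}")
```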

16 Tabulated values

17 Chi-squared goodness-of-fit test
A test to see whether a variable with two or more possible categories has a specific distribution. (Do the observed frequencies in the different categories agree with what we can expect from some theory?)

18 Chi-squared goodness-of-fit test
1. Formulate the null and alternative hypotheses.
2. Compute the expected frequencies (expected counts) under the null hypothesis.
3. Note the observed frequencies (how many observations fall in each category).
4. Use the differences between the observed and the expected values to compute the value of the $\chi^2$ statistic.
5. Compare your value with the critical value of $\chi^2$, or compare the p-value with your level of significance.
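The steps above can be sketched as a small function. This is an illustration under stated assumptions, not the presentation's own code: the function name `goodness_of_fit` and the lookup table are hypothetical, the critical values are the usual tabulated 5% points of the chi-squared distribution, and the degrees of freedom are taken as the number of categories minus one (no parameters estimated from the data).

```python
# Tabulated 5% critical values of the chi-squared distribution,
# keyed by degrees of freedom.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def goodness_of_fit(observed, null_probs):
    """Chi-squared goodness-of-fit test at the 5% level.

    observed   -- observed counts per category
    null_probs -- category probabilities under the null hypothesis
    Returns (statistic, degrees of freedom, critical value, reject H0?).
    """
    n = sum(observed)
    expected = [n * p for p in null_probs]             # expected counts
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                             # categories minus one
    critical = CRITICAL_5_PERCENT[df]
    return chi2, df, critical, chi2 > critical

# Coffee survey: four brands, equal popularity under the null hypothesis.
chi2, df, critical, reject = goodness_of_fit([26, 28, 16, 30], [0.25] * 4)
print(f"chi2={chi2:.2f}, df={df}, critical={critical}, reject H0: {reject}")
```

For the coffee survey the statistic 4.64 lies below the critical value 7.815 for 3 degrees of freedom, so at the 5% level the null hypothesis of equal popularity is not rejected.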

19 Chi-squared test of independence
Tests whether two variables (each with two or more categories) are independent.
For two nominal variables: the chi-squared test of a contingency table (Pearson's chi-squared test).

20 During the first 13 weeks of the season, the Saturday-evening TV viewers were distributed as follows:
Channel  Share
SVT1     28%
SVT2     25%
TV3      18%
TV4      29%
After a change in the TV program presentation, a sample of 300 households was taken and the following numbers were observed:
Channel  Households
SVT1     70
SVT2     89
TV3      46
TV4      95
Has the change in the TV program presentation changed the viewing pattern?
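One way to answer this question is a goodness-of-fit test with the earlier viewing shares as the null distribution. The sketch below is a hedged illustration: the function name `chi2_sf_df3` is hypothetical, and it uses the closed-form upper-tail probability that holds only for 3 degrees of freedom (four channels minus one).

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-squared distribution
    with exactly 3 degrees of freedom (closed form, valid for df = 3 only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

channels = ["SVT1", "SVT2", "TV3", "TV4"]
old_shares = [0.28, 0.25, 0.18, 0.29]   # viewing shares before the change
observed = [70, 89, 46, 95]             # households after the change

n = sum(observed)
expected = [n * share for share in old_shares]   # counts if nothing changed

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2_sf_df3(chi2)

for name, o, e in zip(channels, observed, expected):
    print(f"{name}: observed={o}, expected={e:.0f}")
print(f"chi2={chi2:.2f}, p-value={p_value:.3f}")
```

Under these assumptions the statistic is about 6.87, below the tabulated 5% critical value of 7.815 for 3 degrees of freedom, so the sample does not give evidence at the 5% level that the viewing pattern has changed.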

21 E.g.: Bike helmets
A study was done to investigate whether the use of bicycle helmets is an effective way to protect people in bicycle accidents from skull damage. 793 persons participated in the study, with the following results:
Observed frequency table
Damaged skull  Used helmet: Yes  Used helmet: No  Total
Yes                          17              218    235
No                          130              428    558
Total                       147              646    793

22 We want to test:
Null hypothesis: The proportion of skull damage is the same whether or not a person in an accident is using a helmet (no relationship).
Alternative hypothesis: The proportion of skull damage is different for those who use helmets and those who don't.
Formally,
H0: Helmet use and skull damage are independent in accidents
H1: They are dependent

23 We compute the expected values under the null hypothesis (row total times column total, divided by the grand total) and perform a chi-squared test:
Expected frequency table
Damaged skull  Used helmet: Yes       Used helmet: No        Total
Yes            235·147/793 = 43.6     235·646/793 = 191.4      235
No             558·147/793 = 103.4    558·646/793 = 454.6      558
Total          147                    646                      793

24 We compare the observed table with the expected one. If the tables differ greatly, we reject the null hypothesis. We then have empirical evidence of a dependency between the variables.

25 If the null hypothesis is true, we would get a $\chi^2$ value close to zero. Is 28.57 so far from zero that we can reject the null hypothesis? We compare our observed value with the critical value. We can also compare our observed p-value with our significance level.
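The helmet example can be worked through in Python as a sketch of the test of independence. Assumptions made here: rows are taken as skull damage (yes/no) and columns as helmet use (yes/no), the variable names are illustrative, and the p-value uses the closed form that holds for 1 degree of freedom. Computed from the observed table with unrounded expected counts, the statistic comes out at about 28.26, slightly below the figure quoted on the slide; the conclusion is the same either way.

```python
import math

# Observed 2x2 table: rows = skull damaged (yes, no),
# columns = used helmet (yes, no).
observed = [[17, 218],
            [130, 428]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows-1)*(cols-1) = 1
critical = 3.841                                    # tabulated 5% value, df = 1
p_value = math.erfc(math.sqrt(chi2 / 2))            # upper tail, df = 1 only

print(f"chi2={chi2:.2f}, df={df}, critical={critical}, p-value={p_value:.2e}")
print("reject H0:", chi2 > critical)
```

The statistic is far above the critical value 3.841, so the null hypothesis of independence between helmet use and skull damage is rejected at the 5% level.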

