8.1 Using Statistics To Make Inferences 8. Summary: Contingency tables. Goodness of fit test. Saturday, 02 May 2015, 8:05 AM.


1 8.1 Using Statistics To Make Inferences 8. Summary: Contingency tables. Goodness of fit test. Saturday, 02 May 2015, 8:05 AM

2 8.2 Goals To assess contingency tables for independence. To perform and interpret a goodness of fit test. Practical: Construct and analyse contingency tables.

3 8.3 Recall To compare a population and sample variance we employed? χ²

4 8.4 Today The probability approach from last week is employed to tell if “observed” data conforms to the pattern “expected” under a given model.

5 8.5 Categorical Data - Example Assessed intelligence of athletic and non-athletic schoolboys. Rows: athletic, lazy, Total. Columns: bright, stupid, Total. K. Pearson, “On The Relationship Of Intelligence To Size And Shape Of Head, And To Other Physical And Mental Characters”, Biometrika, 1906, 5, data on page 144.

6 8.6 Procedure 1. Formulate a null hypothesis. Typically the null hypothesis is that there is no association between the factors. 2. Calculate expected frequencies for the cells in the table on the assumption that the null hypothesis is true. 3. Calculate the chi-squared statistic. This is χ² = Σ_i Σ_j (O_ij – E_ij)² / E_ij for an r x c table with observed entries O_ij and expected entries E_ij in row i and column j.

7 8.7 Procedure 4. Compare the calculated statistic with tabulated values of the chi-squared distribution with ν degrees of freedom. ν = (rows – 1)(columns – 1) = (r – 1)(c – 1)
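
The four steps of the procedure can be sketched in Python. This is only an illustration under stated assumptions, not the lecture's own code: the function name chi_squared_independence is made up here, and the 2 x 2 counts in the usage lines are hypothetical values chosen to show the mechanics.

```python
def chi_squared_independence(observed):
    """Return (statistic, degrees_of_freedom, expected) for an r x c table.

    `observed` is a list of rows of counts. Expected counts assume the null
    hypothesis of no association: row total x column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Step 2: expected frequency for every cell under independence.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Step 3: sum of (O - E)^2 / E over all cells.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Step 4: degrees of freedom used to look up the tabulated value.
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical counts, for illustration only (not the lecture data).
stat, dof, exp = chi_squared_independence([[10, 20], [30, 40]])
print(round(stat, 4), dof)
```

The returned statistic is then compared with the tabulated chi-squared value for the returned degrees of freedom.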

8 8.8 Key Assumptions 1. Independence of the observations. The data found in each cell of the contingency table used in the chi-squared test must be independent, uncorrelated observations. 2. Large enough expected cell counts. As described by Yates et al., "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, New York: W.H. Freeman, p. 734).

9 8.9 Key Assumptions 3. Randomness of data. The data in the table should be randomly selected. 4. Sufficient sample size. It is also generally assumed that the sample size for the entire contingency table is sufficiently large to prevent falsely accepting the null hypothesis when the null hypothesis is false.

10 Example Assessed intelligence of athletic and non-athletic schoolboys. Observed. Rows: athletic, lazy, Total. Columns: bright, stupid, Total.

11 Probabilities The probability a random boy is athletic is 1148/1708. The probability a random boy is bright is 790/1708 (the bright total is 1708 – 918 = 790). Assuming independence, the probability a random boy is both athletic and bright is (1148/1708) x (790/1708). For 1708 respondents the expected number of athletic bright boys is 1708 x (1148/1708) x (790/1708) = 1148 x 790 / 1708 = 530.98

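12 Expected (columns: bright, stupid, Total). athletic: ?, ?, 1148. lazy: ?, ?, 560. The expected number of athletic bright boys is 1148 x 790 / 1708 = 530.98

13 Expected (columns: bright, stupid, Total). athletic: 530.98, ?, 1148. lazy: ?, ?, 560. The expected number of athletic stupid boys is found by subtraction.

14 Expected (columns: bright, stupid, Total). athletic: 530.98, 617.02, 1148. lazy: ?, ?, 560. The expected number of athletic stupid boys is 1148 – 530.98 = 617.02

15 Expected (columns: bright, stupid, Total). athletic: 530.98, 617.02, 1148. lazy: ?, ?, 560. The expected number of lazy bright boys is 560 x 790 / 1708 = 259.02

16 Expected (columns: bright, stupid, Total). athletic: 530.98, 617.02, 1148. lazy: 259.02, ?, 560. The expected number of stupid lazy boys is found by subtraction.

17 Expected (columns: bright, stupid, Total). athletic: 530.98, 617.02, 1148. lazy: 259.02, 300.98, 560. The expected number of stupid lazy boys is 918 – 617.02 = 300.98

18 Expected (columns: bright, stupid, Total). athletic: 530.98, 617.02, 1148. lazy: 259.02, 300.98, 560. Total: 790, 918, 1708.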
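
The expected table can be rebuilt from its margins alone. The short sketch below uses only totals that appear on the slides (1148, 560, 918 and 1708); the bright total is taken as 1708 – 918, and the variable names are illustrative.

```python
# Margins taken from the slides; bright total derived by subtraction.
row_totals = {"athletic": 1148, "lazy": 560}
grand_total = 1708
col_totals = {"bright": grand_total - 918, "stupid": 918}

# Expected count under independence: row total x column total / grand total.
expected = {
    (row, col): r * c / grand_total
    for row, r in row_totals.items()
    for col, c in col_totals.items()
}

for cell, value in expected.items():
    print(cell, round(value, 2))
```

The athletic/bright and lazy/bright values agree with the 530.98 and 259.02 shown on the slides, and the other two cells follow by subtraction from the totals.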

19 χ² Observed and Expected. Only one cell is free, so ν = (2 – 1)(2 – 1) = 1.

20 χ² As a general rule to employ this statistic, all expected frequencies should exceed 5. If this is not the case, categories are pooled (merged) to achieve this goal. See the Prussian data later.

21 Conclusion Tabulated χ² values for ν at p = 0.1, 0.05, 0.025, 0.01, 0.005. The result is significant (26.73 > 3.84) at the 5% level. So we reject the hypothesis of independence between athletic prowess and intelligence.
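
As a cross-check on the table lookup, the upper-tail probability for one degree of freedom can be computed directly, because a chi-squared variable with ν = 1 is the square of a standard normal variable. The helper name chi2_sf_1df is an assumption of this sketch, and the identity used holds for ν = 1 only.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability P(X >= x) for chi-squared with 1 degree of freedom.

    For nu = 1, X = Z^2 with Z standard normal, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


print(chi2_sf_1df(3.84))   # tabulated 5% critical value
print(chi2_sf_1df(26.73))  # statistic reported on the slide
```

The first value is close to 0.05, and the second is far below 0.001, which is consistent with rejecting independence.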

22 SPSS Raw data. Note: v1 are the row labels, v2 are the column labels, v3 is the frequency for each cell.

23 SPSS Data > Weight Cases. Since frequency data has been input, it is necessary to weight the cases by the frequency variable. This is essential; do not use percentages.

24 SPSS Analyze > Descriptive Statistics > Crosstabs Set row and column variables. Frequencies already set.

25 SPSS Select chi-square

26 SPSS Select Observed – input data Expected – output data, under the model

27 SPSS Expected cell frequencies Expected under the model.

28 SPSS Pearson Chi Square is the required statistic. Do not report p = .000, rather p < .001. Note Fisher’s exact test, only available in SPSS for 2x2 tables (see next slide).

29 What If We Have Small Cell Counts? Fisher's exact test The Fisher's exact test is used when you want to conduct a chi-square test but one or more of your cells has an expected frequency of five or less. Remember that the chi-square test assumes that each cell has an expected frequency of five or more, but the Fisher's exact test has no such assumption and can be used regardless of how small the expected frequency is. In SPSS, unless you have the SPSS Exact Test Module, you can only perform a Fisher's exact test on a 2x2 table, and these results are presented by default.
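
For a 2x2 table, Fisher's exact test can also be computed by hand from the hypergeometric distribution. The sketch below is a plain-Python illustration rather than SPSS's implementation: the function name is hypothetical, and it uses one common two-sided convention (summing the probabilities of all tables that are no more likely than the observed one).

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over tables no more probable than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small-count table, for illustration only.
print(fisher_exact_2x2(3, 1, 1, 3))
```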

30 Aside Two dials were compared. A subject was asked to read each dial many times, and the experimenter recorded his errors. Altogether 7 subjects were tested. The data shows how many errors each subject produced. Do the two conditions differ at the 0.05 significance level (give the appropriate p value)? Observed data What key word describes this data?

31 Aside What tests are available for paired data? One sample t test. Sign test. Wilcoxon Signed Ranks Test.

32 Aside What tests are available for paired data? What assumptions are made? One sample t test: normality. Sign test and Wilcoxon Signed Ranks Test: no assumption of normality. The Wilcoxon Signed Ranks Test resembles the Sign Test in scope, but it is much more sensitive. In fact, for large numbers it is almost as sensitive as the Student t-test.

33 Aside What tests are available for paired data? One sample t test. Sign test. Wilcoxon Signed Ranks Test. The Sign test answers the question "How often?", whereas the other tests answer the question "How much?" One sample t test: mean. Wilcoxon Signed Ranks Test: median.
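
The "How often?" idea behind the sign test can be sketched as an exact binomial calculation. The dial error counts did not survive in this transcript, so the seven pairs below are hypothetical, as is the function name; the sketch only shows the mechanics.

```python
from math import comb


def sign_test_two_sided(first, second):
    """Exact two-sided sign test p-value for paired observations."""
    # Keep only the direction of each non-zero difference (ties are dropped).
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Under the null hypothesis each sign is + or - with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical error counts for 7 subjects on two dials (illustration only).
dial_a = [5, 7, 3, 8, 6, 9, 4]
dial_b = [3, 4, 4, 5, 2, 6, 1]
print(sign_test_two_sided(dial_a, dial_b))
```

Only the signs of the differences enter the calculation, which is why the test needs no normality assumption but is less sensitive than tests that use the sizes of the differences.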

34 Example The table is based on case-records of women employees in Royal Ordnance factories during The same test being carried out on the left eye (columns) and right eye (rows). Stuart, “The estimation and comparison of strengths of association in contingency tables”, Biometrika, 1953, 40.

35 Observed (rows and columns: Highest, Second, Third, Lowest, Total). Is there any obvious structure?

36 Expected In general to find the expected frequency in a particular cell the equation is Row total x Column total / Grand total

37 Expected In general to find the expected frequency in a particular cell the equation is Row total x Column total / Grand total. So for the top left cell (highest right eye, highest left eye) the equation becomes 1976 x 1907 / 7477 = 503.98
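
The same rule applies cell by cell in the 4 x 4 table. The sketch below uses only the totals that survive in this transcript (the four row totals and the column total 1907); treating 1907 as the total of the Highest left-eye column is an assumption based on the worked cell, and the variable names are illustrative.

```python
# Row totals (right eye: Highest, Second, Third, Lowest) from the slides.
row_totals = [1976, 2256, 2456, 789]
grand_total = sum(row_totals)

# Assumed to be the total of the Highest (left eye) column.
highest_col_total = 1907

# Expected counts for the first column: row total x column total / grand total.
first_column_expected = [r * highest_col_total / grand_total for r in row_totals]

# Degrees of freedom for a 4 x 4 table.
dof = (4 - 1) * (4 - 1)

print(grand_total, round(first_column_expected[0], 2), dof)
```

The expected counts in a column add up to that column's total, which is why the last entry in each row and column can be found by subtraction, leaving nine free cells.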

38 Expected (rows and columns: Highest, Second, Third, Lowest, Total). Row totals: Highest 1976, Second 2256, Third 2456, Lowest 789. Row total x Column total / Grand total: 1976 x 1907 / 7477 = 503.98

39 Expected. The cells in the Highest, Second and Third rows and columns are found from Row total x Column total / Grand total. The Lowest column and the Lowest row (marked ?) are still missing.

40 Expected. The missing values (marked ?) are simply found by subtraction.

41 Expected. For the Highest row, the missing Lowest-column value is the row total 1976 minus the three calculated values in that row.

42 Expected. The Highest row is now complete.

43 Expected. Similarly for the remaining cells.

44 Expected. The completed table (rows and columns: Highest, Second, Third, Lowest, Total).

45 Short Cut Contributions to the χ² statistic: for the top left cell the contribution is (O – E)² / E, with O and E the observed and expected frequencies for that cell.

46 Conclusion For the top left cell only. Nine cells are free: ν = (4 – 1)(4 – 1) = 9. Tabulated χ² values for ν at p = 0.1, 0.05, 0.025, 0.01, 0.005. The above statistic makes it very clear that there is some relationship between the quality of the right and left eyes.

47 Contributions to χ² by cell (rows and columns: Highest, Second, Third, Lowest, Total). Total χ² = 8097

48 Conclusion For all cells. Nine cells are free: ν = (4 – 1)(4 – 1) = 9. Tabulated χ² values for ν at p = 0.1, 0.05, 0.025, 0.01, 0.005. The above statistic makes it very clear that there is some relationship between the quality of the right and left eyes.

49 SPSS Raw data

50 SPSS Expected cell frequencies

51 SPSS Pearson Chi Square is the required statistic

52 Poisson Distribution The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. Typical applications are to queues/arrivals. The number of phone calls received per day. The occurrence of accidents/industrial injuries. More exotically, birth defects and the number of genetic mutations. The occurrence of rare diseases.

53 Poisson Distribution 1. Discrete events which are independent. 2. Events occur at a fixed rate λ per unit continuum (λ is lambda).

54 Poisson Distribution The probability of x successes is P(x) = e^(-λ) λ^x / x!, for x = 0, 1, 2, ... Here e is approximately equal to 2.718, λ is the rate per unit continuum, the mean is λ and the variance is λ.
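
The Poisson probabilities and the claim that the mean and variance both equal λ can be checked numerically. In this sketch the function name poisson_pmf is illustrative, λ = 0.7 is borrowed from the later worked example, and the sum is truncated at 50 terms, which is ample for a rate this small.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 0.7
probs = [poisson_pmf(x, lam) for x in range(50)]

# Mean and variance computed directly from the probabilities.
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(round(sum(probs), 6), round(mean, 6), round(variance, 6))
```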

55 Casio 83ES exp or “e”: exp(1) = 2.71828, exp(2) = 7.38906. Its inverse, on the same key, is ln, so ln(2.71828) = 1 and ln(7.38906) = 2.

56 Alternate applications A similar approach may be employed to test if simple models are plausible.

57 χ² Goodness of Fit Test The degrees of freedom are ν = m – n – 1, where there are m frequencies left in the problem, after pooling, and n parameters have been fitted from the raw data. For example…

58 Example The number of Prussian army corps in which soldiers died from the kicks of a horse in a year. Typical “industrial injury” data

59 Which distribution is appropriate? Is the data discrete or continuous? Discrete, since it is a simple count.

60 Check list of distributions Discrete: Binomial, Poisson. Continuous: Normal, Exponential.

61 Check list of distribution parameters Discrete: Binomial (n, p), Poisson (λ). Continuous: Normal (μ, σ²), Exponential (λ). Discrete, no “n” implies Poisson (λ).

62 Observed Data. Columns: number of deaths in a corps, observed frequency (O_i). The “or more” row has frequency 0; Total 280. We need to estimate the Poisson parameter λ, which is the mean of the distribution.

63 Observed Data. Columns: number of deaths in a corps, observed frequency (O_i). The “or more” row has frequency 0; Total 280.

64 Mean

65 Expected. Columns: number of deaths in a corps, Poisson model, expected probability. The “or more” row is found by subtraction, so that the probabilities total 1. λ = 0.7 and “e” is a constant on your calculator.

66 Expected. Columns: number of deaths in a corps, Poisson model, expected probability. The “or more” probability has been filled in by subtraction; the probabilities total 1.

67 Expected Frequency. Expected frequency for no deaths: 280 x e^(-0.7) = 280 x 0.4966 = 139.04. Columns: number of deaths in a corps, expected probability, expected frequency (E_i).

68 Expected Frequency. Expected frequency for remaining rows: 280 × probability = frequency. Columns: number of deaths in a corps, expected probability, expected frequency (E_i). Totals: probability 1, frequency 280. Note the two expected frequencies less than 5!

69 χ² Calculation. Columns: number of deaths in a corps, observed frequency (O_i), expected frequency (E_i). Pool to ensure all expected frequencies exceed 5.
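
The expected frequencies and the pooling step can be reproduced from the two numbers that survive on the slides, λ = 0.7 and the total of 280. The category boundaries are an assumption of this sketch: it uses 0, 1, 2, 3, 4 and "5 or more" deaths, which is consistent with the note that two expected frequencies fall below 5.

```python
import math

N = 280      # total number of observations (from the slides)
LAM = 0.7    # fitted Poisson mean (from the slides)

# Probabilities for 0..4 deaths, then "5 or more" by subtraction (assumed split).
probs = [math.exp(-LAM) * LAM ** x / math.factorial(x) for x in range(5)]
probs.append(1.0 - sum(probs))
expected = [N * p for p in probs]

# Expected frequencies that are too small for the chi-squared approximation.
small = [e for e in expected if e < 5]

# Pool from the tail until the last expected frequency is at least 5.
pooled = expected[:]
while len(pooled) > 1 and pooled[-1] < 5:
    tail = pooled.pop()
    pooled[-1] += tail

m = len(pooled)        # frequencies left after pooling
n_fitted = 1           # one parameter (lambda) estimated from the data
dof = m - n_fitted - 1

print([round(e, 2) for e in expected])
print([round(e, 2) for e in pooled], dof)
```

Pooling the tail leaves four categories (0, 1, 2, and 3 or more), giving ν = 4 – 1 – 1 = 2.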

70 Conclusion Here m (frequencies) = 4, n (fitted parameters) = 1, then ν = m – n – 1 = 4 – 1 – 1 = 2. Tabulated χ² values for ν at p = 0.1, 0.05, 0.025, 0.01, 0.005. The hypothesis that the data comes from a Poisson distribution is not rejected at the 5% level, since the statistic is below the tabulated value (1.95 < 5.991).
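
The whole goodness of fit calculation can be sketched end to end. The observed frequencies did not survive in this transcript, so the counts below are an assumption: they are the commonly quoted Prussian horse-kick figures, used here because they total 280 and give a mean of 0.7, matching the slides. Treat the result as a plausibility check, not as the lecture's own working.

```python
import math

# ASSUMED observed counts for 0, 1, 2, 3, 4 and "5 or more" deaths
# (commonly quoted figures; not recoverable from this transcript).
observed = [144, 91, 32, 11, 2, 0]

n_total = sum(observed)
lam = sum(x * o for x, o in enumerate(observed)) / n_total  # fitted mean

# Expected frequencies under the Poisson model; last class by subtraction.
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(5)]
probs.append(1.0 - sum(probs))
expected = [n_total * p for p in probs]

# Pool tail classes until the last expected frequency is at least 5.
obs_pooled, exp_pooled = observed[:], expected[:]
while len(exp_pooled) > 1 and exp_pooled[-1] < 5:
    e_tail = exp_pooled.pop()
    o_tail = obs_pooled.pop()
    exp_pooled[-1] += e_tail
    obs_pooled[-1] += o_tail

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
dof = len(exp_pooled) - 1 - 1  # m - n - 1 with one fitted parameter

print(n_total, round(lam, 2), round(chi2, 2), dof)
```

Under these assumed counts the statistic comes out close to the 1.95 quoted on the slide, with ν = 2.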

71 Next Week Bring your calculators next week

72 Read Read Howitt and Cramer pages Read Howitt and Cramer (e-text) pages Read Russo (e-text) pages Read Davis and Smith pages

73 Practical 8 This material is available from the module web page.

74 Practical 8 The material for the practical is available: instructions for the practical (Practical 8) and material for the practical (Practical 8).

75 Assignment 2 You will find submission details on the module web site. Note the dialers lower down the page give access to your individual assignment. It is necessary to enter your student number exactly as it appears on your smart card.

76 Assignment 2 As a general rule make sure you can perform the calculations manually. It does no harm to check your calculations using a software package. Some software packages employ non-standard definitions and should be used with caution.

77 Assignment 2 All submissions must be typed.

78 Whoops! Researchers at Cardiff University School of Social Science claim errors made by the Hawk-Eye line-calling technology can be greater than 3.6mm, the average error quoted by the manufacturers. Teletext, p June 2008

79 Whoops! Kate Middleton 'marries Prince Harry' on souvenir mug The Telegraph - Thursday 17 March 2011

80 Whoops! Poldark - BBC - 8 March 2015

