# Chapter 14: Goodness-of-Fit Tests and Categorical Data Analysis

Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.


## 14.1 Goodness-of-Fit Tests When Category Probabilities Are Completely Specified

### Terminology

A binomial experiment consists of a sequence of independent trials in which each trial can result in one of two possible outcomes. A multinomial experiment generalizes a binomial experiment by allowing each trial to result in one of $k$ possible outcomes, where $k$ is an integer greater than 2. The probability $p_i$ of outcome (category) $i$ is the same on every trial, and $p_1 + \cdots + p_k = 1$.

### Multinomial Experiment

Let $N_i$ denote the number of the $n$ trials that result in category $i$. The expected number of trials resulting in category $i$ is $E(N_i) = np_i$. When $H_0\colon p_1 = p_{10}, \ldots, p_k = p_{k0}$ is true, these expected values become $E(N_1) = np_{10}, \ldots, E(N_k) = np_{k0}$.
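A short simulation makes $E(N_i) = np_i$ concrete. The sketch below uses hypothetical values ($p_{10} = .2$, $p_{20} = .3$, $p_{30} = .5$, $n = 100$), none of which come from the slides. It averages the category counts over many simulated multinomial experiments, and each average should land close to $np_{i0}$.

```python
import random

def simulate_mean_counts(probs, n, reps, seed=0):
    """Average category counts over `reps` simulated multinomial experiments.

    Each experiment consists of n independent trials. Every trial falls in
    category i with probability probs[i], which is the same on each trial.
    """
    rng = random.Random(seed)
    k = len(probs)
    totals = [0] * k
    for _ in range(reps):
        outcomes = rng.choices(range(k), weights=probs, k=n)  # one experiment
        for category in outcomes:
            totals[category] += 1
    return [t / reps for t in totals]

probs = [0.2, 0.3, 0.5]                    # hypothetical p_10, p_20, p_30
n = 100
expected = [n * p for p in probs]          # E(N_i) = n * p_i0
observed_means = simulate_mean_counts(probs, n, reps=2000)
print(expected)
print(observed_means)                      # each close to the matching expected count
```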

### Recall: Chi-Squared Critical Value

Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the measurement axis such that a proportion $\alpha$ of the area under the chi-squared curve with $\nu$ df lies to the right of $\chi^2_{\alpha,\nu}$.

[Figure: Notation illustrated]

### Multinomial Experiment (continued)

Provided that $np_i \ge 5$ for every $i$, the random variable

$$\chi^2 = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i}$$

has approximately a chi-squared distribution with $k - 1$ df.

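### Test With Significance Level $\alpha$

Null hypothesis: $H_0\colon p_1 = p_{10}, \ldots, p_k = p_{k0}$

Alternative hypothesis: $H_a$: at least one $p_i$ does not equal $p_{i0}$

Test statistic value:

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - np_{i0})^2}{np_{i0}} = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

Rejection region: $\chi^2 \ge \chi^2_{\alpha,k-1}$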
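The procedure is sketched below in Python. The counts (9:3:3:1 hypothesized proportions, $n = 160$) are hypothetical. The critical value $\chi^2_{.05,3} = 7.815$ is the standard table value and is hard-coded rather than computed. The helper refuses to run when some $np_{i0} < 5$, which mirrors the condition for the chi-squared approximation.

```python
def chi_squared_gof(observed, null_probs):
    """Chi-squared goodness-of-fit statistic when every p_i0 is fully specified.

    Returns (statistic, df, expected_counts) with df = k - 1.
    """
    if abs(sum(null_probs) - 1.0) > 1e-9:
        raise ValueError("null probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in null_probs]          # n * p_i0
    if min(expected) < 5:
        raise ValueError("some expected count n*p_i0 is below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1, expected

observed = [95, 28, 27, 10]                 # hypothetical counts, n = 160
null_probs = [9/16, 3/16, 3/16, 1/16]       # hypothesized p_10, ..., p_40
stat, df, expected = chi_squared_gof(observed, null_probs)
critical = 7.815                            # table value chi^2_{.05,3}
print(f"chi2 = {stat:.4f}, df = {df}, reject H0: {stat >= critical}")
# prints: chi2 = 0.7111, df = 3, reject H0: False
```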

### P-Values for Chi-Squared Tests

The P-value for an upper-tailed chi-squared test is the area under the $\chi^2_\nu$ curve to the right of the calculated $\chi^2$.
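Both the critical value $\chi^2_{\alpha,\nu}$ and the P-value are upper-tail quantities of the chi-squared curve. The sketch below computes them with the Python standard library only. It assumes $\nu$ is a positive integer and uses the closed forms of the regularized upper incomplete gamma function. The critical value is found by bisection. In practice one would normally call a statistics library instead. The statistic value 0.7111 with 3 df is purely illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(chi^2_df > x) for a positive integer df.

    Uses Q(x; 2) = exp(-x/2), Q(x; 1) = erfc(sqrt(x/2)) and the recurrence
    Q(x; v + 2) = Q(x; v) + (x/2)**(v/2) * exp(-x/2) / Gamma(v/2 + 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        total, v = math.exp(-half), 2
    else:
        total, v = math.erfc(math.sqrt(half)), 1
    while v < df:
        # term computed on the log scale to avoid overflow for large x
        total += math.exp((v / 2) * math.log(half) - half - math.lgamma(v / 2 + 1))
        v += 2
    return total

def chi2_critical(alpha, df, tol=1e-10):
    """chi^2_{alpha, df}: the point with upper-tail area alpha (bisection)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until the tail area is below alpha
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

stat, df = 0.7111, 3                                 # illustrative statistic value and df
p_value = chi2_sf(stat, df)
print(f"chi2_.05,3 = {chi2_critical(0.05, 3):.3f}")  # 7.815
print(f"P-value = {p_value:.3f}")                    # about 0.87
```

A P-value at or below $\alpha$ gives the same decision as $\chi^2 \ge \chi^2_{\alpha,k-1}$.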

### When the $p_i$'s Are Functions of Other Parameters

Frequently the $p_i$'s are hypothesized to depend on a smaller number of parameters $\theta_1, \ldots, \theta_m$ ($m < k$). A specific hypothesis involving the $\theta_i$'s then yields specific $p_{i0}$'s, which are used in the test.
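For illustration (this model is not taken from the slides), consider genotype frequencies under Hardy-Weinberg equilibrium. Here the $k = 3$ cell probabilities depend on a single parameter $\theta$ ($m = 1$): $p_1 = \theta^2$, $p_2 = 2\theta(1-\theta)$, $p_3 = (1-\theta)^2$. The hypothesis value $\theta = 0.6$ and the sample size $n = 200$ are hypothetical.

```python
def hardy_weinberg_probs(theta):
    """Cell probabilities (AA, Aa, aa) as functions of the single parameter theta."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]

# A specific hypothesis H0: theta = 0.6 yields specific p_i0's ...
p0 = hardy_weinberg_probs(0.6)
# ... which are then used exactly as in the fully specified test.
n = 200
expected = [n * p for p in p0]
print([round(p, 2) for p in p0])        # [0.36, 0.48, 0.16]
print([round(e, 1) for e in expected])  # [72.0, 96.0, 32.0]
```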

### When the Underlying Distribution Is Continuous

Let $X$ denote the variable being sampled, and let $f_0(x)$ be the hypothesized pdf. Subdivide the measurement scale of $X$ into $k$ intervals $[a_0, a_1), [a_1, a_2), \ldots, [a_{k-1}, a_k)$. The cell probabilities specified by $H_0$ are then

$$p_{i0} = P(a_{i-1} \le X < a_i) = \int_{a_{i-1}}^{a_i} f_0(x)\,dx, \qquad i = 1, \ldots, k.$$
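Each integral equals $F_0(a_i) - F_0(a_{i-1})$, where $F_0$ is the cdf corresponding to $f_0$. The sketch below assumes, purely for illustration, an exponential $f_0$ with rate $\lambda_0 = 0.5$ and hypothetical cell boundaries 0, 1, 2, 4, $\infty$.

```python
import math

def exponential_cdf(x, lam):
    """cdf of the exponential distribution with rate lam (0 for x <= 0)."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-lam * x)

def cell_probabilities(cdf, boundaries):
    """p_i0 = F0(a_i) - F0(a_{i-1}) for cells [a_0, a_1), ..., [a_{k-1}, a_k).

    `boundaries` is [a_0, a_1, ..., a_k]; math.inf is allowed as a_k.
    """
    return [cdf(b) - cdf(a) for a, b in zip(boundaries, boundaries[1:])]

lam0 = 0.5                                 # hypothetical rate specified by H0
cells = [0.0, 1.0, 2.0, 4.0, math.inf]     # hypothetical boundaries, k = 4
p0 = cell_probabilities(lambda x: exponential_cdf(x, lam0), cells)
print([round(p, 4) for p in p0])
```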

## 14.2 Goodness of Fit for Composite Hypotheses

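### When Parameters Are Estimated

The null hypothesis states that each $p_i$ is a function of a small number of parameters $\theta_1, \ldots, \theta_m$, with the $\theta$'s otherwise unspecified:

$$H_0\colon p_1 = \pi_1(\boldsymbol\theta), \ldots, p_k = \pi_k(\boldsymbol\theta) \text{ for some } \boldsymbol\theta = (\theta_1, \ldots, \theta_m)$$

$H_a$: the hypothesis $H_0$ is not true.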

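### Joint Distribution

For general $k$, the joint distribution of $N_1, \ldots, N_k$ is the multinomial distribution with

$$P(N_1 = n_1, \ldots, N_k = n_k) = \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k}.$$

When $H_0$ is true, this becomes

$$P(N_1 = n_1, \ldots, N_k = n_k) = \frac{n!}{n_1! \cdots n_k!}\, [\pi_1(\boldsymbol\theta)]^{n_1} \cdots [\pi_k(\boldsymbol\theta)]^{n_k}.$$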

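### Method of Estimation

Let $n_1, \ldots, n_k$ denote the observed values of $N_1, \ldots, N_k$. Then $\hat\theta_1, \ldots, \hat\theta_m$ are the values of the $\theta_i$'s that maximize

$$[\pi_1(\boldsymbol\theta)]^{n_1} \cdots [\pi_k(\boldsymbol\theta)]^{n_k},$$

and the resulting estimators $\hat\theta_1, \ldots, \hat\theta_m$ are the maximum likelihood estimators of $\theta_1, \ldots, \theta_m$.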
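A sketch of this maximization follows. It uses, for illustration only, the Hardy-Weinberg model $\pi_1 = \theta^2$, $\pi_2 = 2\theta(1-\theta)$, $\pi_3 = (1-\theta)^2$ with hypothetical counts. The log of the product above is maximized numerically by golden-section search, a generic optimizer chosen here for convenience. Setting the derivative of the log-likelihood to zero gives the closed form $\hat\theta = (2n_1 + n_2)/(2n)$, which serves as a check on the numerical answer.

```python
import math

def hw_log_likelihood(theta, counts):
    """log([pi_1]^n1 [pi_2]^n2 [pi_3]^n3) for Hardy-Weinberg cell probabilities.

    The multinomial coefficient is omitted because it does not depend on theta.
    """
    probs = (theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2)
    return sum(c * math.log(p) for c, p in zip(counts, probs))

def golden_section_max(f, lo, hi, tol=1e-10):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d        # maximizer lies in [a, d]
        else:
            a = c        # maximizer lies in [c, b]
    return (a + b) / 2

counts = [60, 90, 50]                       # hypothetical n1, n2, n3 (n = 200)
theta_numeric = golden_section_max(lambda t: hw_log_likelihood(t, counts), 1e-9, 1 - 1e-9)
n1, n2, _ = counts
theta_closed = (2 * n1 + n2) / (2 * sum(counts))   # from d/dtheta log L = 0
print(round(theta_numeric, 6), theta_closed)       # both about 0.525
```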

### Theorem

Under general "regularity" conditions on $\theta_1, \ldots, \theta_m$ and the $\pi_i(\boldsymbol\theta)$'s, suppose $\theta_1, \ldots, \theta_m$ are estimated by the method of maximum likelihood as described previously and $n$ is large. Then

$$\chi^2 = \sum_{i=1}^{k} \frac{[N_i - n\pi_i(\hat{\boldsymbol\theta})]^2}{n\pi_i(\hat{\boldsymbol\theta})}$$

has approximately a chi-squared distribution with $k - 1 - m$ df when $H_0$ is true.

### Level $\alpha$ Test

An approximate level $\alpha$ test of $H_0$ versus $H_a$ is then to reject $H_0$ if $\chi^2 \ge \chi^2_{\alpha,k-1-m}$. In practice, the test can be used if $n\pi_i(\hat{\boldsymbol\theta}) \ge 5$ for every $i$.
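The sketch below carries out this level $\alpha$ test for an illustrative Hardy-Weinberg model ($\pi_1 = \theta^2$, $\pi_2 = 2\theta(1-\theta)$, $\pi_3 = (1-\theta)^2$, so $k = 3$ and $m = 1$). The model is not from the slides, the counts are hypothetical, and the estimate uses the closed-form MLE $\hat\theta = (2n_1 + n_2)/(2n)$. Estimating one parameter costs one degree of freedom, so the statistic is compared with $\chi^2_{.05,1} = 3.841$ rather than $\chi^2_{.05,2}$.

```python
def hw_chi_squared(counts):
    """Chi-squared statistic for H0: Hardy-Weinberg proportions, theta estimated by MLE.

    Returns (statistic, df, theta_hat) with df = k - 1 - m = 3 - 1 - 1.
    """
    n1, n2, n3 = counts
    n = n1 + n2 + n3
    theta_hat = (2 * n1 + n2) / (2 * n)          # closed-form MLE for this model
    probs = [theta_hat ** 2, 2 * theta_hat * (1 - theta_hat), (1 - theta_hat) ** 2]
    expected = [n * p for p in probs]            # n * pi_i(theta_hat)
    if min(expected) < 5:
        raise ValueError("an estimated expected count is below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    df = len(counts) - 1 - 1                     # k - 1 - m
    return stat, df, theta_hat

stat, df, theta_hat = hw_chi_squared([60, 90, 50])   # hypothetical counts
critical = 3.841                                     # table value chi^2_{.05,1}
print(theta_hat, round(stat, 3), df, stat >= critical)
```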

### Degrees of Freedom

A general rule of thumb for degrees of freedom in a chi-squared test is

$$\text{df} = (\text{number of freely determined cell counts}) - (\text{number of independent parameters estimated}).$$


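### Goodness of Fit for Discrete Distributions

Let $\hat\theta_1, \ldots, \hat\theta_m$ be the maximum likelihood estimators of $\theta_1, \ldots, \theta_m$ based on the full sample $X_1, \ldots, X_n$, and let $\chi^2$ denote the statistic based on these estimators. Then the critical value $c_\alpha$ that specifies a level $\alpha$ upper-tailed test satisfies

$$\chi^2_{\alpha,k-1-m} \le c_\alpha \le \chi^2_{\alpha,k-1}.$$

Consequently, $H_0$ can be rejected if $\chi^2 \ge \chi^2_{\alpha,k-1}$ and not rejected if $\chi^2 \le \chi^2_{\alpha,k-1-m}$. A value between the two is inconclusive.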
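The sketch below applies this to a Poisson fit. The data are hypothetical, and the cells $\{0\}, \{1\}, \{2\}, \{\ge 3\}$ are an illustrative choice that lumps the upper tail into the last cell. The parameter $\lambda$ is estimated from the full sample by its MLE $\bar x$, not from the grouped counts. With $k = 4$ and $m = 1$, the decision uses the bracketing table values $\chi^2_{.05,2} = 5.991$ and $\chi^2_{.05,3} = 7.815$.

```python
import math
from collections import Counter

def poisson_gof(sample, max_count):
    """Chi-squared statistic for H0: Poisson, lambda estimated from the full sample.

    Cells are {0}, {1}, ..., {max_count - 1} and {>= max_count}.
    """
    n = len(sample)
    lam_hat = sum(sample) / n                           # full-sample MLE: x-bar
    freq = Counter(min(x, max_count) for x in sample)   # lump the upper tail into one cell
    observed = [freq[c] for c in range(max_count + 1)]
    probs = [math.exp(-lam_hat) * lam_hat ** x / math.factorial(x) for x in range(max_count)]
    probs.append(1.0 - sum(probs))                      # P(X >= max_count)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, lam_hat, observed, expected

# Hypothetical sample of 100 counts
sample = [0] * 20 + [1] * 35 + [2] * 25 + [3] * 12 + [4] * 5 + [5] * 3
stat, lam_hat, observed, expected = poisson_gof(sample, max_count=3)
lower, upper = 5.991, 7.815    # chi^2_{.05,k-1-m} and chi^2_{.05,k-1} for k = 4, m = 1
if stat >= upper:
    decision = "reject H0"
elif stat <= lower:
    decision = "do not reject H0"
else:
    decision = "inconclusive"
print(lam_hat, observed, round(stat, 3), decision)
```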

### Goodness of Fit for Continuous Distributions

The chi-squared test can be used to test whether the sample comes from a specified family of continuous distributions. Once the cells are chosen (independently of the observations), it is usually difficult to estimate the unspecified parameters from the observed cell counts, so mle's based on the full sample are computed instead.
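Here is a sketch for the normal family. The cells are fixed before the data are examined. The mle's $\hat\mu = \bar x$ and $\hat\sigma = \sqrt{\sum (x_i - \bar x)^2 / n}$ come from the full sample (note the divisor $n$), and the expected cell counts come from the fitted normal cdf. The simulated data and the cell boundaries are hypothetical. With $k = 5$ and $m = 2$, the critical value lies between $\chi^2_{\alpha,2}$ and $\chi^2_{\alpha,4}$.

```python
import math
import random
import statistics

def normal_gof(sample, boundaries):
    """Chi-squared statistic for H0: normal family, mu and sigma from the full sample.

    `boundaries` = [a_0, ..., a_k] must be fixed before looking at the data.
    """
    n = len(sample)
    mu_hat = statistics.fmean(sample)
    sigma_hat = statistics.pstdev(sample)          # MLE uses divisor n, not n - 1
    fitted = statistics.NormalDist(mu_hat, sigma_hat)
    pairs = list(zip(boundaries, boundaries[1:]))
    observed = [sum(1 for x in sample if a <= x < b) for a, b in pairs]
    expected = [n * (fitted.cdf(b) - fitted.cdf(a)) for a, b in pairs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, observed, expected

rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]   # hypothetical data
cells = [-math.inf, 8.0, 9.5, 10.5, 12.0, math.inf]  # chosen in advance, k = 5
stat, observed, expected = normal_gof(sample, cells)
k, m = len(observed), 2
print(observed, [round(e, 1) for e in expected], round(stat, 3),
      "df between", k - 1 - m, "and", k - 1)
```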

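### Special Test for Normality

Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ denote the ordered sample values and $y_1, \ldots, y_n$ the corresponding normal scores. Compute the sample correlation coefficient $r$ for the pairs $(x_{(1)}, y_1), \ldots, (x_{(n)}, y_n)$. The Ryan-Joiner test of

$H_0$: the population distribution is normal

versus

$H_a$: the population distribution is not normal

consists of rejecting $H_0$ when $r \le c_\alpha$, where $c_\alpha$ is the critical value for a level $\alpha$ test.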
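The correlation $r$ can be computed as sketched below. The normal scores use the plotting position $(i - 3/8)/(n + 1/4)$ (Blom's approximation). That choice is an assumption here, because the slides do not define the $y_i$. The critical values $c_\alpha$ are not reproduced, so the sketch stops at $r$. A sample that lies exactly on its normal scores gives $r = 1$, and a strongly skewed sample gives a noticeably smaller $r$.

```python
import math
from statistics import NormalDist

def normal_scores(n):
    """y_i = Phi^{-1}((i - 3/8) / (n + 1/4)), i = 1..n (Blom's formula, assumed here)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def ryan_joiner_r(sample):
    """Sample correlation between the ordered values x_(i) and the normal scores y_i."""
    x = sorted(sample)
    y = normal_scores(len(x))
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(ryan_joiner_r(normal_scores(20)))              # essentially 1
print(ryan_joiner_r([2 ** i for i in range(10)]))    # skewed data: clearly below 1
```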

## 14.3 Two-Way Contingency Tables

### Data With Counts or Frequencies

1. There are $I$ populations of interest, each corresponding to a different row of the table, and each population is divided into the same $J$ categories. A sample is taken from the $i$th population, and the counts are entered in the cells in the $i$th row of the table.
2. There is a single population of interest, with each individual in the population categorized with respect to two different factors. There are $I$ categories associated with the first factor and $J$ categories associated with the second factor.

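### Two-Way Contingency Table

| | 1 | 2 | ⋯ | $j$ | ⋯ | $J$ |
|---|---|---|---|---|---|---|
| 1 | $n_{11}$ | $n_{12}$ | ⋯ | $n_{1j}$ | ⋯ | $n_{1J}$ |
| 2 | $n_{21}$ | | | | | |
| ⋮ | ⋮ | | | | | |
| $i$ | $n_{i1}$ | | | $n_{ij}$ | | |
| ⋮ | ⋮ | | | | | |
| $I$ | $n_{I1}$ | | | | | $n_{IJ}$ |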

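### Estimated Expected Counts Under $H_0$ (Homogeneity)

Under $H_0$, the common proportion $p_j$ of individuals in category $j$ is estimated by $\hat p_j = n_{\cdot j}/n$. The estimated expected count for cell $(i, j)$ is therefore $n_{i\cdot}\,\hat p_j$:

$$\hat e_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n} = \frac{(i\text{th row total})(j\text{th column total})}{n}.$$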

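### Test for Homogeneity

Null hypothesis: $H_0\colon p_{1j} = p_{2j} = \cdots = p_{Ij}$ for $j = 1, 2, \ldots, J$

Alternative hypothesis: $H_a$: $H_0$ is not true

Test statistic value:

$$\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - \hat e_{ij})^2}{\hat e_{ij}}, \qquad \hat e_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

Rejection region: $\chi^2 \ge \chi^2_{\alpha,(I-1)(J-1)}$

Apply the test as long as $\hat e_{ij} \ge 5$ for all cells.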
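The computation is sketched below for a hypothetical table with $I = 2$ populations and $J = 3$ categories. The expected counts are (row total)(column total)/$n$, and the degrees of freedom are $(I-1)(J-1) = 2$. The result is compared with the table value $\chi^2_{.05,2} = 5.991$.

```python
def expected_counts(table):
    """e_ij = (row i total)(column j total) / n for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared_two_way(table):
    """Return (statistic, df, expected) for the two-way chi-squared test."""
    expected = expected_counts(table)
    if min(min(row) for row in expected) < 5:
        raise ValueError("some estimated expected count is below 5")
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)   # (I - 1)(J - 1)
    return stat, df, expected

table = [[30, 50, 20],        # hypothetical sample from population 1
         [40, 40, 20]]        # hypothetical sample from population 2
stat, df, expected = chi_squared_two_way(table)
critical = 5.991              # table value chi^2_{.05,2}
print(round(stat, 3), df, stat >= critical)
```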


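### Estimated Expected Counts (Independence)

Under independence, $p_{ij} = p_{i\cdot}\,p_{\cdot j}$. The marginal proportions are estimated by $\hat p_{i\cdot} = n_{i\cdot}/n$ and $\hat p_{\cdot j} = n_{\cdot j}/n$, so

$$\hat e_{ij} = n\,\hat p_{i\cdot}\,\hat p_{\cdot j} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}.$$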

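### Test for Independence

Null hypothesis: $H_0\colon p_{ij} = p_{i\cdot}\,p_{\cdot j}$ for $i = 1, \ldots, I$ and $j = 1, \ldots, J$

Alternative hypothesis: $H_a$: $H_0$ is not true

Test statistic value:

$$\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - \hat e_{ij})^2}{\hat e_{ij}}, \qquad \hat e_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

Rejection region: $\chi^2 \ge \chi^2_{\alpha,(I-1)(J-1)}$

Apply the test as long as $\hat e_{ij} \ge 5$ for all cells.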
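For a 2 × 2 table, the test has $(2-1)(2-1) = 1$ df. In that case the P-value has the closed form $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, available in Python's `math` module. The sketch builds each $\hat e_{ij}$ explicitly as $n\,\hat p_{i\cdot}\,\hat p_{\cdot j}$ to mirror the independence hypothesis. The counts are hypothetical.

```python
import math

def independence_test_2x2(table):
    """Chi-squared test of independence for a 2 x 2 table (df = 1).

    Returns (statistic, P-value); for 1 df the upper-tail area is erfc(sqrt(x / 2)).
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("table must be 2 x 2")
    n = sum(sum(row) for row in table)
    p_row = [sum(row) / n for row in table]              # estimated p_i.
    p_col = [sum(col) / n for col in zip(*table)]        # estimated p_.j
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = n * p_row[i] * p_col[j]                  # e_ij = n * p_i. * p_.j
            if e < 5:
                raise ValueError("some estimated expected count is below 5")
            stat += (table[i][j] - e) ** 2 / e
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

table = [[30, 20],    # hypothetical counts: rows = first factor,
         [15, 35]]    # columns = second factor
stat, p_value = independence_test_2x2(table)
print(round(stat, 3), round(p_value, 4))   # small P-value: reject H0 at alpha = .05
```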
