# Model Selections and Comparisons (Categorical Data Analysis, Ch 9.2)

Yumi Kubo, Alvin Hsieh


## Survey Data

- 1992 survey by the Wright State University School of Medicine and United Health Services in Dayton, Ohio
- 2276 students in their last year of high school (nonurban area)
- We add more dimensions to the example in 8.2.4
- Variables: Alcohol (A), Cigarette (C), Marijuana (M)
- Added variables: Gender (G), Race (R)

## Association Graphs (Definitions)

- association graph: a set of vertices, each vertex representing a variable
- edge: a conditional association between two variables
- path: a sequence of edges leading from one variable to another
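The definitions above can be made concrete with a small sketch in R. It stores an association graph as an adjacency matrix and checks for a path with a breadth-first search. The edge list is an assumption taken from model 6 of Table 9.2 (AC, AM, CM, AG, AR, GM, GR), and `has_path` is a hypothetical helper written for this illustration, not part of any package.

```r
# Vertices: one per variable
vars <- c("A", "C", "M", "G", "R")

# Edges: assumed here to be the two-factor terms of model 6 in Table 9.2
edges <- c("AC", "AM", "CM", "AG", "AR", "GM", "GR")

# Symmetric adjacency matrix: TRUE where two variables share an edge
adj <- matrix(FALSE, length(vars), length(vars), dimnames = list(vars, vars))
for (e in edges) {
  v <- strsplit(e, "")[[1]]
  adj[v[1], v[2]] <- TRUE
  adj[v[2], v[1]] <- TRUE
}

# Hypothetical helper: is there a path from `from` to `to` that avoids
# the vertices listed in `blocked`? (breadth-first search)
has_path <- function(adj, from, to, blocked = character(0)) {
  keep <- setdiff(rownames(adj), blocked)
  visited <- from
  frontier <- from
  while (length(frontier) > 0) {
    nbrs <- keep[colSums(adj[frontier, keep, drop = FALSE]) > 0]
    frontier <- setdiff(nbrs, visited)
    visited <- c(visited, frontier)
  }
  to %in% visited
}

has_path(adj, "C", "G")                         # a path such as C - A - G exists
has_path(adj, "C", "G", blocked = c("A", "M"))  # every path from C to G uses A or M
```

Under this assumed edge list, C has no edge to G or R, and every path from C to either of them passes through A or M.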

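## Association Graphs (Saturated)

[Figure: association graph on the vertices M, A, C, R, G, with callouts marking a variable (vertex), a conditional association (edge), and a path through M, R, G.]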

## Association Graphs (Reduced)

[Figure: reduced association graph on the vertices M, A, C, R, G.]

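## Data Set

Counts of marijuana use (yes / no), cross-classified by race, gender, alcohol use, and cigarette use.

| Alcohol | Cigarette | White, Female: yes | White, Female: no | White, Male: yes | White, Male: no | Other, Female: yes | Other, Female: no | Other, Male: yes | Other, Male: no |
|---|---|---|---|---|---|---|---|---|---|
| yes | yes | 405 | 268 | 453 | 228 | 23 | 23 | 30 | 19 |
| yes | no | 13 | 218 | 28 | 201 | 2 | 19 | 1 | 18 |
| no | yes | 1 | 17 | 1 | 17 | 0 | 1 | 1 | 8 |
| no | no | 1 | 117 | 1 | 133 | 0 | 12 | 0 | 17 |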

## SAS Program

Too large to place here; see survey.sas.

## R Program

    library(MASS)  # provides stepAIC()

    # 2 x 2 x 2 x 2 x 2 table; cigarette varies fastest, race slowest
    survey <- data.frame(
      expand.grid(cigarette = c("Yes", "No"),
                  alcohol   = c("Yes", "No"),
                  marijuana = c("Yes", "No"),
                  gender    = c("female", "male"),
                  race      = c("white", "other")),
      count = c(405, 13, 1, 1, 268, 218, 17, 117,
                453, 28, 1, 1, 228, 201, 17, 133,
                23, 2, 0, 0, 23, 19, 1, 12,
                30, 1, 1, 0, 19, 18, 8, 17))

    # Mutual independence + GR
    fit.GR <- glm(count ~ . + gender*race, data = survey, family = poisson)
    # Homogeneous association (all two-factor terms)
    fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)
    # All three-factor terms
    fit.3fact <- glm(count ~ .^3, data = survey, family = poisson)

    # Backward elimination from the homogeneous association model,
    # always keeping the main effects and the GR term
    res <- stepAIC(fit.homog.assoc,
                   scope = list(lower = ~ cigarette + alcohol + marijuana + gender*race),
                   direction = "backward")
    summary(res)

    # Name the selected model, then drop MR and GM in turn
    fit.AC.AM.CM.AG.AR.GM.GR.MR <- res
    fit.AC.AM.CM.AG.AR.GM.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR.MR, ~ . - marijuana:race)
    fit.AC.AM.CM.AG.AR.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR, ~ . - marijuana:gender)

Adapted from the original code at http://math.cl.uh.edu/~thompsonla/RCode.txt
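As a hedged follow-up to the program above, the G² statistic and degrees of freedom of a fitted loglinear model can be read off with `deviance()` and `df.residual()`, and two nested fits can be compared with `anova()`. The sketch below rebuilds the data so that it runs on its own; the `gof` data frame is a name introduced here for illustration.

```r
# Same data layout as the slide: cigarette varies fastest, race slowest
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"),
              alcohol   = c("Yes", "No"),
              marijuana = c("Yes", "No"),
              gender    = c("female", "male"),
              race      = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117,
            453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12,
            30, 1, 1, 0, 19, 18, 8, 17))

# Sanity check: the counts should add up to the 2276 surveyed students
total <- sum(survey$count)

fit.GR <- glm(count ~ . + gender*race, data = survey, family = poisson)
fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)

# For a Poisson loglinear model the residual deviance is G^2
gof <- data.frame(
  model = c("mutual independence + GR", "homogeneous association"),
  G2    = c(deviance(fit.GR), deviance(fit.homog.assoc)),
  df    = c(df.residual(fit.GR), df.residual(fit.homog.assoc)))
print(gof)

# Likelihood-ratio comparison of the two nested models
print(anova(fit.GR, fit.homog.assoc, test = "Chisq"))
```

The printed G² and df values can then be checked against rows 1 and 2 of Table 9.2.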

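## R Program (P-values)

    # Each line tests the change in G^2 from removing one term (1 df)
    1 - pchisq(15.8 - 15.3, 1)  # model 2  -> model 4g
    1 - pchisq(16.7 - 15.8, 1)  # model 4g -> model 5
    1 - pchisq(19.9 - 16.7, 1)  # model 5  -> model 6
    1 - pchisq(28.8 - 19.9, 1)  # model 6  -> model 7
    1 - pchisq(40.3 - 28.8, 1)  # model 7  -> next candidate (G^2 = 40.3)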

## Model Selection

1. Select an alpha level (0.05 by default).
2. Look at the p-value for each model. In R, use 1-pchisq(G2, df), where G2 is the G² statistic (or the change in G²) and df its degrees of freedom.
3. Stop removing terms once a p-value falls below the alpha chosen in step 1.
4. Model 1: G+R+A+C+M+GR
5. Model 2: G+R+A+C+M+GR+(all pairs)

## Model Selection (Continued)

6. Model 3: G+R+A+C+M+GR+(all pairs)+(all three-factor terms)
7. Model 4g: smallest change in G², taking CR out of Model 2
8. Model 5: smallest change in G², taking out CG
9. Model 6: smallest change in G², taking out MR
10. Model 7: smallest change in G², taking out GM
11. Consider: A+C+M+AC+AM+CM
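The stopping rule in the steps above can be sketched in a few lines of R. The G² values are the ones quoted in the p-value slide and Table 9.2; the label "next" for the model with G² = 40.3 is a placeholder introduced here, since that model is not named in the slides.

```r
alpha <- 0.05

# G^2 along the backward path: 2 -> 4g -> 5 -> 6 -> 7 -> next candidate
g2 <- c("2" = 15.3, "4g" = 15.8, "5" = 16.7, "6" = 19.9, "7" = 28.8, "next" = 40.3)

# Each step removes one two-factor term, so each change in G^2 has 1 df
change <- diff(unname(g2))
p <- pchisq(change, df = 1, lower.tail = FALSE)

steps <- data.frame(from = names(g2)[-length(g2)],
                    to   = names(g2)[-1],
                    change = change,
                    p = round(p, 4))
print(steps)

# First removal whose p-value falls below alpha: stop before taking it
first_reject <- which(p < alpha)[1]
chosen <- names(g2)[first_reject]
cat("Stop at model", chosen, "\n")
```

With these inputs the first significant change is the step from model 6 to model 7, so the procedure keeps model 6.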

## Goodness-of-Fit Tests (Table 9.2)

G = Gender, R = Race, A = Alcohol, C = Cigarette, M = Marijuana

| Model | G² | df |
|---|---|---|
| 1. Mutual independence + GR | 1325.1 | 25 |
| 2. Homogeneous association | 15.3 | 16 |
| 3. All three-factor terms | 5.3 | 6 |
| 4a. (2) - AC | 201.2 | 17 |
| 4b. (2) - AM | 107.0 | 17 |
| 4c. (2) - CM | 513.5 | 17 |
| 4d. (2) - AG | 18.7 | 17 |
| 4e. (2) - AR | 20.3 | 17 |
| 4f. (2) - CG | 16.3 | 17 |
| 4g. (2) - CR | 15.8 | 17 |
| 4h. (2) - GM | 25.2 | 17 |
| 4i. (2) - MR | 18.9 | 17 |
| 5. (AC, AM, CM, AG, AR, GM, GR, MR) | 16.7 | 18 |
| 6. (AC, AM, CM, AG, AR, GM, GR) | 19.9 | 19 |
| 7. (AC, AM, CM, AG, AR, GR) | 28.8 | 20 |
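As a rough illustration of how the table is read, each G² can be referred to a chi-squared distribution on its df to test the model against the saturated model. The sketch below does this for the numbered models in Table 9.2; the data frame name `gof` is introduced here for illustration.

```r
# G^2 and df copied from Table 9.2 (models 1, 2, 3, 5, 6, 7)
gof <- data.frame(
  model = c("1", "2", "3", "5", "6", "7"),
  G2    = c(1325.1, 15.3, 5.3, 16.7, 19.9, 28.8),
  df    = c(25, 16, 6, 18, 19, 20))

# Upper-tail probability: a small p-value indicates lack of fit
gof$p <- pchisq(gof$G2, gof$df, lower.tail = FALSE)
print(gof)
```

Only model 1 shows clear lack of fit at the 0.05 level, which is why the comparisons between nested models, rather than overall fit alone, drive the selection among models 2 to 7.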

Thank You! Any Questions???
