Using Modern Missing Data Analyses for Effective Inference about Hunters’ Satisfaction with the OFW Program
Muhammad Imran Khan
Motivation of Study
– Hunting and fishing are part of Nebraska's heritage
– The Nebraska Game & Parks Commission (NGPC) is interested in improving hunter/angler recruitment and retention (NGPC, 2008)
– Data were collected in 2013 to learn about hunters’ motivations and satisfaction with Open Fields and Waters (OFW) lands
– The purpose of this study is to compare estimates obtained with appropriate imputation methods
Missing Data
Missingness in Surveys (Groves et al., 2004)
– Noncoverage
– Unit Nonresponse
– Item Nonresponse
– Partial Nonresponse (Brick & Kalton, 1996)
– Data Entry Error (Anne & Andrea, 2014)
Missing Data Mechanisms (Buuren, 2012)
– Missing Completely At Random (MCAR)
– Missing At Random (MAR)
– Missing Not At Random (MNAR)
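The three mechanisms can be illustrated with a small simulation. This is only a sketch on made-up data, not the survey data: the variables x and y, the missingness probabilities, and the function name simulate_mechanisms are all assumptions chosen for illustration. Here x is always observed and y is the variable that goes missing.

```python
import random
import statistics


def simulate_mechanisms(n=20000, seed=1):
    """Return the full-data mean of y and the complete-case mean under each mechanism."""
    rng = random.Random(seed)
    # Hypothetical data: x is always observed, y depends on x and may go missing.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [4.0 + xi + rng.gauss(0.0, 1.0) for xi in x]

    def observed_mean(prob_missing):
        # Keep y_i when a uniform draw is at least that case's probability of being missing.
        kept = [yi for xi, yi in zip(x, y) if rng.random() >= prob_missing(xi, yi)]
        return statistics.fmean(kept)

    return {
        "full": statistics.fmean(y),
        # MCAR: missingness ignores both x and y.
        "MCAR": observed_mean(lambda xi, yi: 0.3),
        # MAR: missingness depends only on the observed covariate x.
        "MAR": observed_mean(lambda xi, yi: 0.6 if xi < 0 else 0.0),
        # MNAR: missingness depends on the unobserved value of y itself.
        "MNAR": observed_mean(lambda xi, yi: 0.6 if yi < 4.0 else 0.0),
    }


means = simulate_mechanisms()
for name, value in means.items():
    print(f"{name:>4}: {value:.3f}")
```

In this setup the complete-case mean should stay close to the full-data mean under MCAR and drift upward under MAR and MNAR. The difference between the last two is that the MAR bias can in principle be corrected by conditioning on x, while the MNAR bias cannot be corrected from the observed data alone.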
How much missing data is “problematic”?
Researchers suggest different thresholds:
– > 5% (Schafer, 1999)
– > 10% (Bennett, 2001)
– > 20% (Peng et al., 2006)
– Widaman (2006) specified the following scale:
o 1%-2% (Negligible)
o 5%-10% (Minor)
o 10%-25% (Moderate)
o 25%-50% (High)
o >50% (Excessive)
Important problems caused by missingness (Bell & Fairclough, 2013)
– Decreased precision
– Increased bias in parameter estimates
NGPC & UNL conducted the survey
Sampling frame: hunters who purchased a license to hunt in Nebraska in 2012
– The survey contained three parts:
o Where and what respondents hunted; environmental impact
o Motivations (Relatedness, Competence, Autonomy)
o Socio-demographic factors
About the collected data
– Total questions = 42 (19 questions used for analysis)
– Sample size = 8181
– Completely filled = 1555 (19%)
– Unit nonresponse = 627 (8%)
– Item nonresponse = 5999 (73%)
o Each of these respondents is missing between 1 and 8 of the 19 questions
– Unit and item nonresponse together = 81%
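The percentages on this slide follow directly from the reported counts. A short check, using only the numbers given above (the dictionary keys are illustrative names, not names from the survey):

```python
# Counts as reported on the slide.
counts = {"complete": 1555, "unit_nonresponse": 627, "item_nonresponse": 5999}

total = sum(counts.values())  # should equal the stated sample size
shares = {name: round(100 * n / total) for name, n in counts.items()}

# Respondents with any missingness: unit plus item nonresponse.
incomplete = counts["unit_nonresponse"] + counts["item_nonresponse"]
incomplete_share = round(100 * incomplete / total)

print(total, shares, incomplete, incomplete_share)
```

The three categories add up to the sample size of 8181, and the 81% figure is the share of respondents with at least one kind of nonresponse.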
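Determining Type of Missing Data
[Table: missing-data patterns across Satisf., Rel_1, Rel_2, Comp., Auto., H_Days, “Harvest”, Educ., Income and Age, with counts (Ns) and percentages (%); values not shown]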
Data used for analysis
– 13 questions on motivation, based on SDT (Self-Determination Theory)
– 5 questions on relatedness, transformed to 2 factors
Data used for analysis
– 13 questions on motivation, based on SDT
– 4 questions each on competence and autonomy, each set transformed to 1 factor
Model used for the analysis
Satisfaction = Rel_1 + Rel_2 + Comp + Auto + Educ + Age + Income + H_Days + Harvest
Variable: Description of the variable [measured on 7-point Likert scale]
– Satisfaction: How satisfied were you with your experience on private lands enrolled in the Open Fields and Waters (OFW)?
– Relatedness_1: I enjoy mentoring other hunters
– Relatedness_2: I go hunting primarily to spend time with others & people I care about
– Competence: Overall, hunting makes me feel competent in other areas of my life
– Autonomy: Hunting helps me to feel independent, self-sufficient and more in control of my life
– Education: Highest level of education that you have completed (<HS; HS; S.C; C; ≥G)
– Age: Age (approximately, in years)
– Income: Total annual income for your household before taxes (8 different levels)
– Hunting_Days: Visiting OFW sites allowed me to increase total days I spent hunting
– “Harvest”: If you hunted in 2012 on an OFW site, did you harvest? (Yes/No)
Methods for Handling Missing Data
Deletion or non-imputing methods:
o List-wise Deletion (Pigott, 2001)
o Pair-wise Deletion (Bennett, 2001)
Nonstochastic or ad-hoc methods:
o Mean Imputation (Graham, 2003)
o Regression Imputation (Qin et al., 2007)
Stochastic or established methods:
o Stochastic Regression (Todd et al., 2013)
o Multiple Imputation (MI) (John et al., 2007)
o Full Information Maximum Likelihood (FIML)
o Expectation Maximization (EM) (Yiran & Chao-Ying, 2013)
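The two deletion methods differ in which cases they keep. A minimal sketch on a made-up five-row data set (the rows, values and function names are hypothetical; None marks a missing answer):

```python
# Hypothetical responses; None marks a missing item.
rows = [
    {"satisfaction": 6, "age": 34, "income": 3},
    {"satisfaction": 5, "age": None, "income": 4},
    {"satisfaction": None, "age": 51, "income": 2},
    {"satisfaction": 7, "age": 45, "income": None},
    {"satisfaction": 4, "age": 29, "income": 5},
]


def listwise(data):
    """List-wise deletion: keep only rows with no missing value at all."""
    return [r for r in data if all(v is not None for v in r.values())]


def pairwise_n(data, a, b):
    """Pair-wise deletion: number of rows usable for a statistic on variables a and b."""
    return sum(1 for r in data if r[a] is not None and r[b] is not None)


complete = listwise(rows)
print("list-wise rows kept:", len(complete))
for a, b in [("satisfaction", "age"), ("satisfaction", "income"), ("age", "income")]:
    print(f"pair-wise n for ({a}, {b}):", pairwise_n(rows, a, b))
```

List-wise deletion discards every row with any gap, while pair-wise deletion keeps more rows for each pair of variables at the cost of basing different statistics on different subsets of respondents.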
Mean Imputation
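A minimal sketch of mean imputation on made-up scores (the values and the helper name mean_impute are illustrative assumptions): every missing value is replaced by the mean of the observed values for that variable.

```python
import statistics


def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed_values = [v for v in values if v is not None]
    fill = statistics.fmean(observed_values)
    return [fill if v is None else v for v in values]


scores = [2, 4, None, 6, None, 8]  # hypothetical Likert-type answers
observed = [v for v in scores if v is not None]
imputed = mean_impute(scores)

print("imputed:", imputed)
print("variance of observed values:", statistics.variance(observed))
print("variance after imputation:  ", statistics.variance(imputed))
```

The mean is unchanged, but the filled-in values add no spread, so the sample variance after imputation is smaller than that of the observed values. This is one reason standard errors from mean-imputed data tend to be too small.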
Comparing Results
[Table: fitted model estimates and p-values under List-wise Deletion (cases or rows are deleted) and Mean Imputation (m=1, maxit=1); rows: Intercept, Relatedness_1, Relatedness_2, Competence, Autonomy, Education, Age, Income, Hunting_Days, “Harvest”; values not shown]
Multiple Imputation
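Multiple imputation fits the model on each of m completed data sets and then combines the m results. The combining step is commonly done with Rubin's rules, sketched below for a single coefficient. This shows only the pooling step, not the imputation itself; the function name pool_rubin and the input numbers are made up for illustration and are not results from this study.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the m point estimates; variances: their m squared standard errors.
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)      # average of the m estimates
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # variance of the estimates across imputations
    total = within + (1 + 1 / m) * between    # total variance of the pooled estimate
    return pooled, math.sqrt(total)


# Hypothetical estimates of one coefficient from m = 3 imputations.
est, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")
```

The between-imputation term is what lets the pooled standard error reflect uncertainty about the missing values, which a single imputation such as mean imputation ignores.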
Comparing Results
[Table: fitted model estimates and p-values under List-wise Deletion (cases or rows are deleted), Mean Imputation (m=1, maxit=1) and Multiple Imputation (m=20, maxit=10); rows: Intercept, Relatedness_1, Relatedness_2, Competence, Autonomy, Education, Age, Income, Hunting_Days, “Harvest”; values not shown]
Comparing Results
[Table: fitted model estimates and p-values under List-wise Deletion (cases or rows are deleted), Full Information Maximum Likelihood (FIML) Imputation and Expectation Maximization (EM) Imputation (EM algorithm (MLE) converges in 37 iterations); rows: Intercept, Relatedness_1, Relatedness_2, Competence, Autonomy, Education, Age, Income, Hunting_Days, “Harvest”; values not shown]
Comparison of Imputation Methods: Summary
% of values smaller than under List-wise Deletion, out of 10 variables

Approach | Estimates | Std. Err. | P-value | Suggestion
List-wise Deletion | Base | Base | Base | Avoid
Mean Imputation | 60% | 100% | 40% | Careful use
Multiple Imputation | 30% | 100% | 20% | Better
Full Information Maximum Likelihood | 30% | 100% | 20% | Better
Expectation Maximization | 40% | 90% | 20% | Preferred if converged

– EM only shows that Relatedness_2 is significant
– EM gives the smallest standard error for Income
Thanks for your kind attention
Special thanks to:
– Dr. Andrew Tyre, University of Nebraska, Lincoln
– Dr. Lisa Pennisi, University of Nebraska, Lincoln
– Dr. Allan McCutcheon, University of Nebraska, Lincoln
– Nebraska Game & Parks Commission
References
Anne-Kathrin, F. & Andrea B. (2014). The economic performance of Swiss drinking water utilities. Journal of Prod. Analysis. 41: doi /s
Bell, M. L., & Fairclough, D. L. (2013). Practical and statistical issues in missing data for longitudinal patient reported outcomes. Statistical Methods in Medical Research, 0(0), doi: /
Bennett, D. A. (2001). How can I deal with missing data in my study? Australian and New Zealand Journal of Public Health, 25,
Brick, J., & Kalton, J. (1996). Handling missing data in survey research. Statistical Methods in Medical Research, 5, 215–238. doi: /
Buuren, S. V. (2012). Flexible imputation of missing data. Taylor & Francis, FL: CRC Press.
John, W. G., Allison E. O., & Tamika D. G. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Springer, 8:
Graham, J. W. (2003). Adding missing-data-relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10, 80–100.
Groves, R., Fowler, F., Couper, M., Lepkowski, J., Singer, E., & Tourangeau, R. (2004). Survey methodology. Hoboken, NJ: John Wiley.
Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83,
NGPC (2008). Nebraska 20 year hunter/angler recruitment, development and retention plan. Lincoln, NE.
Pigott, T. D. (2001). A review of methods for missing data. Educational Research and Evaluation, 7(4),
Peng, C. Y., Harwell, M., Liou, S. M., & Ehman, L. H. (2006). Advances in missing data methods and implications for educational research. In S. Sawilowsky (Ed.), Real data analysis (pp. 31-78). Greenwich, CT: Information Age.
Qin, Y., Zhang, S., Zhu, X., Zang, J., & Zhang, C. (2007). Semi-parametric optimization for missing data imputation. Appl Intell 27, DOI /s
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8:
Todd D. L., Terrence D. J., Kyle M. L., & Whitney M. (2013). On the joys of missing data. Journal of Pediatric Psychology, doi: /jpepsy/jst048
Yiran D. & Chao-Ying J. P. (2013). Principled missing data methods for researchers. Springer, 2:222.
Contact Information: