Health and the Environment: A Case Study of Indonesian Children. Claire Harper and Craig Harper, Sta242/Env255, April 19, 2001
Introduction Policy makers have long been interested in the relationship between the environment and human health. In some cases, such as water contamination, this relationship has been well studied. For other environmental factors, such as forest cover, health effects are not as easily quantifiable (we do not know an LD-50 level for forest cover). The purpose of this project was to determine what effects, if any, various environmental factors have on the health of Indonesian children. Do forest cover, water area, rainfall, and erosion affect the number of illnesses in children, after accounting for family, housing, and village characteristics?
Data Collection Methods Data were collected by Professor Subhrendu Pattanayak during his doctoral research in Indonesia. Observations were obtained through surveys of randomly selected households in several villages. Villages were selected to represent a variety of environmental characteristics. Data were only collected from households with a total family size less than eight. A GIS was used to measure village area, forest cover, and water area. Erosion and sedimentation rates were also derived using a GIS.
Variables Selected for Analysis Dependent: annual number of illnesses per child. Independent: adult education (years); total family size; annual number of illnesses per adult; size of farm (hectares); condition of the floor (1 = stilts, 2 = dirt, 3 = cement); condition of the roof (1 = straw, 2 = wood, 3 = zinc); income from non-farm sources (0 = none, 1 = greater than zero); annual expenditures per family member (rupiahs); village density (people/hectare); primary forest cover (hectares); secondary forest cover (hectares); annual rainfall in watershed (mm); water area (hectares); annual erosion and sedimentation rate in watershed (tons).
Summary Statistics Focused on subset of data with > 0 illnesses per child
Exploratory Analysis Log transformed all numeric variables except family size
Model Selection We considered plausible interaction effects and quadratic terms. Subset model selection (leaps) was used to determine which interaction effects were significant with all main effects included. Then, we used manual backward stepwise selection to assess the significance of the main effects. Farm size (p=0.4749), erosion rate (p=0.3694), and floor condition (p=0.5069) were not significant and were not involved in a potentially important interaction effect, so they were removed.
Model Selection Subset model selection (leaps) was also used to determine which interaction effects were significant with the remaining main effects (minus farm size, erosion, and floor condition). We then analyzed the importance of each interaction effect using manual stepwise regression. Tests for influential observations using Cook's distance indicated that there were no influential points: every observation had a Cook's distance below 0.08. Finally, we considered additional transformations. Transforming family size (the only untransformed numeric variable) did not significantly improve our model.
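The Cook's distance check mentioned above can be sketched as follows. This is only an illustration for a one-predictor least-squares fit, not the multiple regression used in the project; the function name `cooks_distances` and the data are hypothetical.

```python
def cooks_distances(x, y):
    """Cook's distance for each point in a simple (one-predictor) least-squares fit."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


# Hypothetical data: the last point sits far from the trend of the others.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 12]
d = cooks_distances(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
print(most_influential, [round(di, 3) for di in d])
```

A point with a large residual and high leverage receives a large distance; in the project, no observation came close to a value that would raise concern.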
Fitted Model If the variable is highlighted in red, then an increase in this variable is associated with an increase in the annual number of illnesses per child. If the variable is highlighted in blue, then an increase in this variable is associated with a decrease in the annual number of illnesses per child. R-squared = 0.2943 R-squared adjusted = 0.2688 The residuals of this model are homoskedastic but have a heavy-tailed distribution (QQ Normal plot not shown).
Question 1: How do the environmental factors considered affect child health as a group? Hypotheses: H0: The model without environmental factors is adequate. HA: The full model (with environmental factors) is a significant improvement. Statistical Technique: We used an Extra Sum of Squares F-Test to test the joint significance of the environmental terms: log(rainfall), log(primary forest), log(secondary forest), log(water area), and the log(rainfall)*roof interaction. Results: The addition of environmental factors significantly improved our model (F=3.24, p=0.01, ESS F-test). Conclusions and Limitations: There was sufficient evidence to reject the null hypothesis and conclude that including environmental factors significantly improves our understanding of the annual number of illnesses in Indonesian children. The residual distribution remains heavy-tailed even after transformations, which may unduly influence the results. Likewise, there may be lurking variables (e.g., watershed area) that were not accounted for in the data.
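For reference, the Extra Sum of Squares F statistic compares the residual sum of squares of the reduced and full models. A minimal sketch follows; the function name `extra_ss_f` and all input values are hypothetical and are not the project's actual sums of squares.

```python
def extra_ss_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for comparing a reduced model nested inside a full model.

    rss_* are residual sums of squares; df_* are residual degrees of freedom.
    """
    extra_ss = rss_reduced - rss_full   # variation explained by the added terms
    extra_df = df_reduced - df_full     # number of added terms
    return (extra_ss / extra_df) / (rss_full / df_full)


# Hypothetical values: five terms added, reducing the RSS from 120 to 100.
f_stat = extra_ss_f(rss_reduced=120.0, df_reduced=55, rss_full=100.0, df_full=50)
print(f_stat)
```

The statistic is then compared against an F distribution with (extra_df, df_full) degrees of freedom to obtain the p-value.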
Question 2: How does forest cover affect child health? Hypotheses: H0: Primary forest cover is not a significant explanatory variable (β9 = 0). HA: Primary forest cover is significant (β9 ≠ 0). H0: Secondary forest cover is not a significant explanatory variable (β10 = 0). HA: Secondary forest cover is significant (β10 ≠ 0). Statistical Technique: Two-sided t-tests were used to test the significance of each coefficient at α = 0.05, after accounting for the other variables in the model. We calculated 95% family-wise confidence intervals using the Bonferroni method in order to simultaneously estimate the coefficients associated with environmental factors.
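A Bonferroni family-wise interval splits the error rate across the intervals in the family, which widens each one. The sketch below rests on several assumptions: it uses a normal approximation instead of the t distribution, the estimate and standard error are hypothetical, and the family size of 5 is inferred from the five environmental terms listed for Question 1.

```python
from statistics import NormalDist


def bonferroni_interval(estimate, std_error, family_size, alpha=0.05):
    """Family-wise interval using a normal approximation (an assumption here)."""
    per_tail = alpha / (2 * family_size)          # error allotted to each tail
    multiplier = NormalDist().inv_cdf(1 - per_tail)
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width, multiplier


# Hypothetical coefficient and standard error; family_size=5 is assumed.
low, high, mult = bonferroni_interval(0.10, 0.03, family_size=5)
print(round(mult, 4), round(low, 4), round(high, 4))
```

With five intervals the multiplier is about 2.58 rather than the usual 1.96, which is why a coefficient can pass an individual t-test yet have a family-wise interval that includes zero.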
Question 2: Results and Conclusions Results: According to the t-tests, there is sufficient evidence to reject the null hypotheses and conclude that both primary and secondary forest are significant explanatory variables in this model (primary: p=0.0210; secondary: p=0.0017). However, under the more conservative approach of family-wise confidence intervals, primary forest does not appear to be significant (95% CI: -0.01551, 0.2819; includes zero). Secondary forest does appear to be significant, even with the conservative family-wise confidence interval (95% CI: 0.01824,0.1846). Conclusions: An increase in secondary forest cover is associated with an increase in the median annual number of illnesses per child in Indonesia. Doubling the amount of secondary forest cover is associated with a 7% (95% CI: 1%,14%) increase in the median number of illnesses per child per year. We fail to reject the null hypothesis for primary forest cover under the Bonferroni confidence interval. However, our results are highly suggestive of an association between primary forest cover and child illness.
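The doubling interpretation follows from the log-log form of the model: if both the response and the predictor are log-transformed, doubling the predictor multiplies the median response by 2 raised to the coefficient. The sketch below reproduces the secondary forest figures from the reported interval, assuming the point estimate is the midpoint of that symmetric interval; the function name `doubling_effect` is hypothetical.

```python
def doubling_effect(coef):
    """Percent change in the median response when a logged predictor doubles."""
    return (2 ** coef - 1) * 100


# Reported family-wise interval for the secondary forest coefficient.
ci_low, ci_high = 0.01824, 0.1846
estimate = (ci_low + ci_high) / 2  # assumed: midpoint of a symmetric interval

print(round(doubling_effect(estimate)),
      round(doubling_effect(ci_low)),
      round(doubling_effect(ci_high)))
```

The same conversion applied to a negative coefficient gives a percent decrease.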
Question 3: How does the amount of rainfall affect child health? Hypotheses: H0: Rainfall is not a significant explanatory variable (β8 = 0). HA: Rainfall is significant (β8 ≠ 0). H0: The interaction between rainfall and roof type is not significant (β12 = 0). HA: The interaction effect is significant (β12 ≠ 0). Statistical Technique: Two-sided t-tests at α = 0.05. 95% family-wise confidence interval (Bonferroni) for the family of environmental variables. We also set up dummy variables for roof type to assess the degree of interaction between rainfall and each roof type.
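One way to code the roof dummy variables and their interactions with rainfall is sketched below, with roof 3 (zinc) as the baseline category. The function name `roof_design_row` and the rainfall value are hypothetical; the project's actual coding may have differed.

```python
import math


def roof_design_row(roof, rainfall_mm):
    """Design-matrix terms for one household, with roof 3 (zinc) as baseline."""
    log_rain = math.log(rainfall_mm)
    is_straw = 1.0 if roof == 1 else 0.0
    is_wood = 1.0 if roof == 2 else 0.0
    # Columns: log(rain), straw dummy, wood dummy, and the two interactions.
    return [log_rain, is_straw, is_wood, is_straw * log_rain, is_wood * log_rain]


# Hypothetical rainfall of 2000 mm for each roof type.
row_straw = roof_design_row(1, 2000.0)
row_wood = roof_design_row(2, 2000.0)
row_zinc = roof_design_row(3, 2000.0)
print(row_wood)
```

Under this coding, each dummy coefficient measures how that roof type differs from zinc, and each interaction coefficient measures how the rainfall slope differs from the slope for zinc roofs.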
Question 3: Results Results: According to the t-tests, there is sufficient evidence to reject the null hypotheses and conclude that both rainfall and the interaction between rainfall and roof type are significant (p=0.0028; p=0.0032, respectively). Under the more conservative approach of family-wise confidence intervals, the interaction effect does not appear to be significant (95% CI: -0.0140, 0.1518; includes zero). On the other hand, the rainfall variable is significant (95% CI: -0.7326,-0.0548). T-tests analyzing the significance of the dummy variables for both the roof variable and the interaction between roof and rain indicate that roofs 1 and 2 do not significantly differ from roof 3. Therefore, a reduced model without the dummy variables is a more appropriate model. The coefficients, t-values and p-values for the dummy variables are as follows:
Question 3: Conclusions Conclusions: An increase in rainfall is associated with a decrease in the median number of illnesses per child per year. In fact, a doubling of annual rainfall is associated with a 23.9% (95% CI: 4%,40%) decrease in the median number of illnesses per child per year. This is the strongest multiplicative change in median number of illnesses among the environmental factors. The negative results of the dummy variable tests are somewhat surprising. It was assumed by those collecting the data that the quality of roof increased from roof 1 (straw) to roof 3 (zinc). Interestingly, the coefficients of the dummy variables suggest that roof 2 (wood) is actually associated with a lower rate of child illness than roof 3. Unfortunately, the lack of significance of the coefficients prevents us from definitively answering this question. On the other hand, the fact that the family-wise (Bonferroni) confidence interval indicated that the interaction effect was not significant makes the lack of significance among the dummy variables less surprising.
General Conclusions Modeling Indonesian children’s health is an extremely complicated prospect. With all of the variables we have included, our model explains just 29% of the variation in annual number of illnesses (or 27% using adjusted R-squared). Our analysis indicates that environmental factors are important when attempting to explain child health, but the predictive power of such explanations is very low. Despite this lack of predictive power, the model does exhibit several interesting associations. For example, we expected the coefficient associated with water area to be positive because of a suspected increased number of insects; that coefficient turned out to be negative and significantly less than 0. We began this project in hopes of finding a human health argument for conservation of primary forest. On the contrary, the positive coefficient of the primary forest variable (significant under the individual t-test) creates a disincentive for conservation. Before promoting deforestation for health reasons, however, we must again consider the uncertainty inherent in the model.
Recommendations and Further Research The observational nature of the data prevents any inference of cause-and-effect relationships. Thus, we may only discuss associations between variables. We were highly suspicious of observations reporting no illnesses among children for a year, so we focused only on families with at least one recorded illness. Future surveys identifying the type of illness in question would be helpful in building a more descriptive model. Future studies should consider focusing analyses on a specific type of illness to increase the predictive power of the model. Few policy recommendations can be drawn from this particular model. More research is needed into the environmental factors affecting the amount of disease among children. Increasing the predictive power of the model will be key to increasing its utility as a policy tool. Acknowledgments: We would like to thank Professor Subhrendu Pattanayak for supplying the data.
How does the rate of erosion and sedimentation in the watershed affect child health? The erosion/sedimentation variable was not significant in any of the models considered. It was also not significant in the final model (F=0.3142, p=0.5755, ESS F-Test). Across the top 20 models, the total posterior probability that erosion and sedimentation rate affects the annual number of illnesses in children is only 2%. Limitations: The erosion/sedimentation variable is correlated with rain (correlation coefficient = -0.75). This collinearity inflates the standard error of the erosion coefficient, which lowers its t-statistic and widens its confidence interval, reducing our ability to assess the significance of this variable.
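The effect of this correlation on precision can be roughly quantified with a variance inflation factor. The sketch below assumes only the two correlated predictors matter, so the pairwise correlation stands in for the full multiple correlation; in the actual model, with more predictors, the inflation could be larger.

```python
import math

# Reported correlation between the erosion/sedimentation variable and rain.
r = -0.75

# Variance inflation factor under the two-predictor simplification (assumed).
vif = 1 / (1 - r ** 2)

# The standard error grows by the square root of the VIF.
se_inflation = math.sqrt(vif)

print(round(vif, 2), round(se_inflation, 2))
```

Under this simplification the standard error of the erosion coefficient is roughly one and a half times what it would be if the two variables were uncorrelated.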
Does water area affect child health? Hypotheses: H0: Water area is not a significant explanatory variable (β11 = 0). HA: Water area is significant (β11 ≠ 0). Statistical Technique: Two-sided t-tests at α = 0.05. 95% family-wise confidence interval (Bonferroni) for the family of environmental variables. Results: According to the t-test, there is sufficient evidence to reject the null hypothesis and conclude water area is a significant explanatory variable in this model (p=0.0094). The family-wise confidence interval supports the conclusion that water area is significant (95% CI: -0.2907, -0.0011). Conclusions: An increase in water area is associated with a decrease in the annual number of illnesses per child. Doubling the water area is associated with a 10% decrease in the median number of illnesses per child.