
1 Assessing the Health Effects of Air Pollution; Statistical and Computational Challenges Scott L. Zeger on behalf of The Environmental Biostatistics and Epidemiology Group (EBEG) The Johns Hopkins University Bloomberg School of Public Health CISES Meeting – Chicago October, 2004

2 Key Collaborators Francesca Dominici Aidan McDermott Jon Samet Roger Peng Leah Welty Hopkins Environmental Biostatistics and Epidemiology Group (EBEG) http://www.biostat.jhsph.edu/bstproj/ebeg/

3 Sources of Support U.S. National Institutes of Health (NIH) U.S. Environmental Protection Agency (EPA) Health Effects Institute (HEI) - independent non-profit that receives funds from: –U.S. EPA –Automobile Manufacturers Association

4 Outline Air pollution and mortality: a brief overview of the epidemiologic evidence –Cohort studies –Time series studies - NMMAPS Spatial-time series models –Temporal then spatial models Key statistical issues Toward reproducible research

5 Daytime in London, 1952 Source: National Archives Particulate levels – 3,000 µg/m^3

6 Designer Smog Masks - London 1950’s Source: DL Davis. When Smoke Ran Like Water (2002)

7 ~10,000 excess deaths

8 4,000 first week; 8,000 over next 2 months. Pollution or flu or both? 50th Anniversary Meeting

9 Can Air Pollution Kill at Doses an Order of Magnitude Lower? “Air pollution”: many constituents –Particles (<2.5 microns penetrate to deep lung) –Ozone –Gases: NO2, SO2, CO –… Focus on particles because of epidemiologic data

10 Key Epidemiologic Evidence Chronic exposures: cohort studies –Six Cities Study (e.g. Dockery, et al, 1993) –American Cancer Society (ACS) Study (e.g. Pope, et al, 2002) Acute exposures: multi-city time series studies –NMMAPS (90 U.S. cities; e.g. Samet, et al, 2000) –APHEA (29 European cities; e.g. Katsouyanni, et al, 2003) –Canadian (8 cities; e.g. Burnett, Goldberg, 2003)

11 Cohort Studies
                       Six Cities     ACS
People                 8,111          500,000+
Person years           111,076        7.5M
Deaths                 1,430          60,000+
Cities                 6              50
Exposure               Yearly average
Covariates             Age, smoking, exercise, +
Total mortality RR     1.26*          1.10*
Cardio-pulmonary RR    1.37*          1.17*
Lung cancer RR         1.37*          1.29*
* Most vs. least in Six Cities Study

12 Public Health Significance In US, EPA estimates on order of 10,000 particle-attributable deaths per year if cohort relative risks represent a causal effect. Smoking – 400,000 smoking-attributable deaths per year

13 Caveats on Cohort Studies Regressions of “adjusted” mortality rates on longer-term average pollution level Cross-city ecologic comparisons Sample size is number of cities –6CS – 6 –ACS – 50 What else is different between higher and lower polluted cities? Does air pollution cause mortality?

14 Multi-city Time Series Studies of Acute Effects Compare higher to lower polluted days within the same community Avoid problem of unmeasured differences among cities New confounders –Longer-term trends in population characteristics, medical practice, smoking rates, changing demographics, etc –Seasonal effects of infectious diseases and weather –Day of month, week, holidays

15 Risk Estimates From Cohort and Time Series Studies Cohort studies estimate the association between time-to-death and long-term exposure to air pollution (chronic exposure). Time series studies estimate the association between risk of death and the level of air pollution shortly before death, conditional on longer-term exposures (acute exposure). Time series studies of particulate pollution are useful to address the causal question, not to estimate the size of health effects: they ignore chronic exposures.

16 National Morbidity and Mortality Air Pollution Study (NMMAPS) HEI-funded collaboration of Johns Hopkins and Harvard Universities; Jon Samet, PI. 90 largest U.S. cities covering roughly 40% of annual deaths (now 105). 1987-1994; now updated through 2001. Mortality and hospitalizations (14 cities)

17 NMMAPS Locations

18 Data for Baltimore, Maryland

19 Semi-parametric Regression Model for Each City (c)
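
The formula on this slide did not survive in the transcript. A form consistent with the later slides (an over-dispersed Poisson model with smooth functions of time and temperature) might look like the sketch below. The lag \ell, the smoothing parameters \lambda_1 and \lambda_2, and the exact set of confounder terms are assumptions for illustration, not taken from the slide.

```latex
\begin{aligned}
\operatorname{E}(Y_t^c) &= \mu_t^c, \qquad \operatorname{Var}(Y_t^c) = \phi^c \mu_t^c
  \quad \text{(over-dispersed Poisson)} \\
\log \mu_t^c &= \beta^c \, x_{t-\ell}^c
  + s(t;\ \lambda_1)
  + s(\mathrm{temp}_t;\ \lambda_2)
  + \text{(day-of-week and other confounder terms)}
\end{aligned}
```

Here Y_t^c is the death count in city c on day t, x_{t-\ell}^c is the pollution level \ell days earlier, and \beta^c is the city-specific log relative rate.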

20 Statistical Problem Pollution signal embedded in correlated “noise”

21 City-specific Estimates

22 Map of City Specific Estimates

23 Spatial Model for Relative Rates

24 Three Models “Three stage”- as in previous slide “Two stage”- ignore region effects; assume cities have exchangeable random effects Two stage with “spatial” correlation -city random effects have isotropic exponentially decaying autocorrelation function
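
In symbols, the three models could be sketched as follows. The notation is an assumption made for illustration: \hat\beta^c is the city estimate with sampling variance v^c, r(c) is the region containing city c, d(c,c') is the distance between two cities, and \theta is a range parameter.

```latex
\begin{aligned}
\text{All models:}\quad & \hat\beta^c \mid \beta^c \sim N(\beta^c,\ v^c) \\
\text{Two stage:}\quad & \beta^c \sim N(\alpha,\ \tau^2), \ \text{exchangeable across cities} \\
\text{Three stage:}\quad & \beta^c \mid \alpha_{r(c)} \sim N(\alpha_{r(c)},\ \tau^2), \qquad
                           \alpha_r \sim N(\alpha,\ \sigma^2) \\
\text{Two stage, spatial:}\quad & \operatorname{Cov}(\beta^c, \beta^{c'}) = \tau^2 \exp\{-d(c,c')/\theta\}
\end{aligned}
```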

25 Joint Estimation of 90 City Slopes With Spatial Model Approximate the conditional distribution of each city estimate given its true value by a Gaussian model with mean and variance equal to the MLE and the inverse of the Fisher information under an over-dispersed Poisson model. No borrowing of strength across cities for estimation of the smooth functions of time and temperature (a full Bayesian analysis with “infinite” prior variances for these terms)
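
The Gaussian approximation above reduces each city to an estimate and a variance, which makes pooling cheap. The sketch below illustrates this with a moment-based (DerSimonian-Laird) two-stage pooling followed by shrinkage of each city toward the pooled mean. This is a swapped-in, simpler technique than the MCMC analysis described in the talk, and the input numbers are hypothetical, not NMMAPS results.

```python
import math


def pool_two_stage(estimates, variances):
    """Moment-based (DerSimonian-Laird) pooling of city-specific estimates.

    estimates: city-specific estimates (the MLEs)
    variances: their sampling variances (inverse Fisher information)
    Returns (pooled mean, its standard error, between-city variance tau2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # weights that ignore heterogeneity
    mean_fe = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    # Cochran's Q: spread of the estimates beyond what sampling error explains.
    q = sum(wi * (b - mean_fe) ** 2 for wi, b in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * b for wi, b in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2


def shrink(estimates, variances, pooled, tau2):
    """Pull each city estimate toward the pooled mean.

    Noisy cities (large sampling variance) are pulled the most.
    """
    shrunk = []
    for b, v in zip(estimates, variances):
        weight_on_city = tau2 / (tau2 + v)
        shrunk.append(weight_on_city * b + (1.0 - weight_on_city) * pooled)
    return shrunk


# Hypothetical city estimates and sampling variances (illustrative only).
beta_hat = [0.8, 0.2, 0.5, -0.1, 0.6]
var_hat = [0.04, 0.09, 0.02, 0.16, 0.05]

pooled, se, tau2 = pool_two_stage(beta_hat, var_hat)
shrunk = shrink(beta_hat, var_hat, pooled, tau2)
print(f"pooled = {pooled:.3f} (se {se:.3f}), tau^2 = {tau2:.4f}")
print("shrunk estimates:", [round(s, 3) for s in shrunk])
```

A full Bayesian analysis would also propagate uncertainty about the between-city variance, which this plug-in version ignores.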

26 Joint Estimation MCMC implementation with proper priors for the variance components –Standard uninformative priors are not –Half-Gaussian priors with large variances on the variance components Have compared inferences to full Bayes analysis in a parametric analogue – no difference

27 Posterior Distribution of National Average

28 Results Stratified by Cause of Death

29 Evidence for Heterogeneity Among Cities in True Relative Rates

30 Shrinkage

31 Bayes Posterior Estimates

32 Statistical Formulation Pollution effect Confounders Space-time frailty

33 Scientific and Statistical Issues 1. Model for the baseline frailty process and other unmeasured confounding processes in space and time –personal variables (smoking, exercise) –city-specific variables (demographics, medical services) –influenza epidemics 2. Co-pollutants 3. Public health significance: “harvesting”? 4. Distributed lags 5. Reproducible research

34 1. Model for Spatial Time Series By collecting people across a large city, central limit theorem smooths out individual behaviors and produces a temporally smooth nuisance function Ignore the spatial correlation in mortality process and estimate city-specific relative rates Model spatial associations among rate estimates instead of modeling associations among the mortality events themselves

35 Formulation of Time Series Model “Collect and Conquer”

36 Degree of Adjustment

37

38 2. Co-pollutants Recent Testimony on the EPA Proposed Decision on Particulate Matter Suresh H. Moolgavkar, M.D., Ph.D. Member, Fred Hutchinson Cancer Research Center; Professor of Epidemiology and Biostatistics, University of Washington - Leading Industry Consultant “the potential for uncontrolled confounding by co-pollutants currently preclude the conclusion that the particulate component of air pollution is causally associated with adverse effects on human health.”

39 Co-pollutants Estimated the same model with –PM10 + ozone –PM10 + ozone + NO2 –PM10 + ozone + SO2 –PM10 + ozone + CO Pooled data over the largest 20 cities that tell most of the story

40 Co-pollutants Individual cities can change substantially; Average across 20 cities changes little

41 3. Public Health Significance Harvesting idea –Only the very frail could possibly die from air pollution –They would have died anyway in a few days –Air pollution kills, but causes only a trivial loss of quality days of life If true, we would expect associations only at shorter time scales

42 Frequency Domain Decomposition: Total Suspended Particles and Mortality, Philadelphia
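
A frequency domain decomposition splits a daily series into components that vary on different time scales and that add back up to the original. The sketch below does this with a plain discrete Fourier transform on a synthetic series. The band boundaries and the series itself are hypothetical, and this is not the authors' code.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short series)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def band_component(x, lo, hi):
    """Part of x built from frequencies with lo <= cycles < hi.

    'cycles' counts full cycles over the length of the series, so small
    values are long time scales and large values are short time scales.
    """
    n = len(x)
    coef = dft(x)
    # Keep k and n - k together so the reconstructed component is real.
    kept = [c if lo <= min(k, n - k) < hi else 0 for k, c in enumerate(coef)]
    return [(sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


# Synthetic 60-day series: a mean, one slow cycle, and a 6-day cycle.
n_days = 60
series = [10.0
          + 3.0 * math.sin(2 * math.pi * t / n_days)
          + math.sin(2 * math.pi * 10 * t / n_days)
          for t in range(n_days)]

slow = band_component(series, 0, 5)                 # mean and long time scales
fast = band_component(series, 5, n_days // 2 + 1)   # short time scales

worst_gap = max(abs(s + f - x) for s, f, x in zip(slow, fast, series))
print(f"largest reconstruction error: {worst_gap:.2e}")
```

Regressing the mortality series on each pollution component separately then gives a relative risk estimate per time scale, which is the comparison the harvesting hypothesis calls for.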

43 Frequency Domain Log-linear Regression – Philadelphia TSP

44 Frequency Band Relative Risk Estimates Pooled over 4 Cities

45 4. Distributed Lag Models NMMAPS described mortality as a function of air pollution u=1 (or 0, 2, 3) days before, because PM data are available only every sixth day in most cities. To capture the entire acute effect, must include pollution levels from the previous week or two. Two statistical-computational issues –How to flexibly model the distributed lags –How to contend with substantial missing covariate data

46 Distributed Lag Models (DLMs) for PM10 on Mortality

47 Effect of unit increase in PM10 7 days ago on today’s mortality Distributed Lag Function = ‘total effect’
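
As a small worked example, the sketch below reads the "total effect" as the sum of the lag coefficients, which is the usual convention for distributed lag models. The coefficient values are hypothetical and chosen only to make the arithmetic visible.

```python
import math

# Hypothetical lag coefficients: theta[u] is the change in log mortality rate
# per unit of PM10 measured u days ago (u = 0..7). Illustrative values only.
theta = [0.0002, 0.0004, 0.0003, 0.0002, 0.0001, 0.0001, 0.00005, 0.00002]


def total_effect(lag_coefs):
    """Sum of the lag coefficients: effect of a sustained unit increase."""
    return sum(lag_coefs)


def log_relative_rate(lag_coefs, exposure_history):
    """Change in today's log mortality rate.

    exposure_history[u] is the increase in pollution u days ago.
    """
    return sum(coef * x for coef, x in zip(lag_coefs, exposure_history))


# A 10-unit increase on every one of the last 8 days.
sustained = [10.0] * len(theta)
# A single 10-unit spike exactly 7 days ago.
spike_7_days_ago = [0.0] * 7 + [10.0]

for label, history in [("sustained", sustained), ("spike 7 days ago", spike_7_days_ago)]:
    pct = 100.0 * (math.exp(log_relative_rate(theta, history)) - 1.0)
    print(f"{label}: {pct:.3f}% change in mortality")
```

A single-lag model would report only one of these coefficients, which is why the slides argue that the previous week or two must be included to capture the entire acute effect.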

48 Example DLMs for PM10 on Mortality Chicago 1987-2000

49 Prior Knowledge of DL Function 1. No knowledge of early lag effects 2. Lag effects must eventually go to zero 3. Lag effects get smoother further back in time Our approach: construct the prior so as to reflect 1-3

50 Constructing Distributed Lag Prior 1. No knowledge of early lag effects 2. Lag effects must eventually go to zero Large Variances → Small Variances

51 3. Lag effects tend to zero smoothly Uncorrelated → Correlated
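
One way to encode these three pieces of prior knowledge is a prior covariance matrix for the lag coefficients whose variances shrink with lag and whose neighbouring correlations grow with lag. The sketch below uses a hypothetical parametrization (sd0, eta_var, and eta_cor are made-up names and values). It is not the prior used in the talk, only an instance of the same idea.

```python
import math


def dl_prior_covariance(n_lags, sd0=1.0, eta_var=0.3, eta_cor=0.3):
    """Prior covariance matrix for distributed lag coefficients.

    Standard deviations decay with lag (large -> small variances) and the
    correlation between lag u and lag u + 1 grows with u
    (uncorrelated -> correlated).
    """
    sd = [sd0 * math.exp(-eta_var * u) for u in range(n_lags)]
    # rho[u] is the correlation between lags u and u + 1; rho[0] is 0.
    rho = [1.0 - math.exp(-eta_cor * u) for u in range(n_lags - 1)]
    cov = [[0.0] * n_lags for _ in range(n_lags)]
    for u in range(n_lags):
        for v in range(u, n_lags):
            # Correlation between lags u and v is the product of the
            # neighbouring correlations in between (a non-stationary AR(1)),
            # which keeps the matrix a valid covariance matrix.
            corr = 1.0
            for k in range(u, v):
                corr *= rho[k]
            cov[u][v] = cov[v][u] = sd[u] * sd[v] * corr
    return cov


prior_cov = dl_prior_covariance(n_lags=8)
prior_var = [prior_cov[u][u] for u in range(8)]
neighbour_corr = [prior_cov[u][u + 1] / math.sqrt(prior_cov[u][u] * prior_cov[u + 1][u + 1])
                  for u in range(7)]
print("prior variances:", [round(v, 4) for v in prior_var])
print("neighbour correlations:", [round(c, 3) for c in neighbour_corr])
```

Early lags get wide, nearly independent priors, so the data decide their values, while late lags are pulled toward zero and toward each other.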

52

53 Bayesian Averaged Dist Lags of PM10 on Mort (Chicago) Total Effect

54 Bayesian Averaged Dist Lags of PM10 on Mort (Detroit)

55 Toward Reproducible Epidemiologic Research (RER) U.S. EPA setting national policy about air pollution based on acute and chronic disease studies – lots of $$ at stake Research conducted in the context of an adversarial debate about whether current levels of pollution cause mortality – credibility of epidemiologic evidence

56 Statistical Problem Pollution signal embedded in correlated “noise”

57 Convergence Problem NMMAPS estimated the city-specific relative rates using Generalized Additive Models (gam) in S-plus. gam relies upon several parameters, four of which control the decision of when to declare convergence of the estimation algorithm. Five years into the work, we discovered that the default parameters we used were too lax for our application. In addition, Ramsay, et al discovered that gam underestimates the standard errors of the relative rate estimates
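
Why a lax convergence rule can matter is easy to see in a toy linear analogue of backfitting. When two predictors are highly correlated, each update moves the coefficients only slightly, so a rule that stops on small changes can stop far from the solution. The sketch below is not S-plus gam, and the correlation and tolerance values are illustrative assumptions.

```python
def backfit(r, s1, s2, tol, max_iter=1_000_000):
    """Backfitting (Gauss-Seidel) for y ~ b1*x1 + b2*x2.

    x1 and x2 are standardized with correlation r; s1 and s2 are their
    covariances with y. Stops when the largest coefficient change in one
    sweep falls below tol. Returns (b1, b2, iterations used).
    """
    b1 = b2 = 0.0
    for iteration in range(1, max_iter + 1):
        new_b1 = s1 - r * b2      # refit x1 to what x2 leaves unexplained
        new_b2 = s2 - r * new_b1  # refit x2 to what x1 leaves unexplained
        change = max(abs(new_b1 - b1), abs(new_b2 - b2))
        b1, b2 = new_b1, new_b2
        if change < tol:
            break
    return b1, b2, iteration


# Hypothetical setup: true coefficients 1 and 2, predictors correlated 0.99.
true_b1, true_b2, r = 1.0, 2.0, 0.99
s1 = true_b1 + r * true_b2
s2 = r * true_b1 + true_b2

lax = backfit(r, s1, s2, tol=1e-3)
strict = backfit(r, s1, s2, tol=1e-12)

print(f"lax:    b1 = {lax[0]:.6f} after {lax[2]} sweeps")
print(f"strict: b1 = {strict[0]:.6f} after {strict[2]} sweeps")
```

In this toy case the error left by the lax rule is many times larger than the tolerance itself, because the stopping rule measures how much the estimate is moving rather than how far it is from the solution.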

58 Model Sensitivity: Relative Rate Estimates for GAM (default and strict) versus GLM. Dominici, McDermott, Zeger, Samet, AJE 2002. Panels: GAM (default) versus GLM estimates; GAM (strict) versus GLM estimates

59 What Difference Did it Make?

60 The Press: The New York Times (June 2002)

61 Understatement of statistical uncertainty in the press: “(A)lthough many questions remain about how fine particles kill people, the NMMAPS study shows there’s no mistaking that PM is the culprit” (NMMAPS in Science, July 2000)

62 Levels in Replication Investigator Study Data Analysis Software Reproducibility

63 Toward Reproducibility in iHAPSS Post papers (tech reports) on iHAPSS web-site Hyperlink main results in paper (tables, figures) to –Statistical computing environment (R) with: program that generates the results datafile used by the program to generate the results Give user opportunity to alter the analyses –In this computing environment –In their own environment?

64 internet Health and Air Pollution Surveillance System (iHAPSS)

65 R as a Platform for Distributing Data Convenient online help system for documenting datasets Vignette system for more detailed descriptions of data or code Functions can be provided for handling data Data can be delivered as a single unit/package, rather than in separate (possibly unlinked) pieces

66 NMMAPSdata Preprocessing functions for setting up the database to reproduce recent NMMAPS findings –basicNMMAPS: analysis of PM10 and mortality –seasonal: estimating seasonally varying effects of PM10 –tempDLM: distributed lag models for temperature

67 NMMAPSdata Index Number of U.S. cities: 108 Number of days of observations: 5114 Number of age categories: 3 Number of variables: 291 Database size (uncompressed): 2.5GB

68

69

70 Toward Reproducibility of Epidemiologic Research iHAPSS as a model Journals require that published papers be accompanied by programs/data necessary to reproduce their results Next steps to move the field in this direction

71 Main Points Once Again Reviewed the epidemiologic evidence for an association of particulate air pollution and mortality –Cohort studies: RR=1.25 across range of exposures –Time series studies: Mortality in space and time –Summarize over time, then analyze in space

72 Main Points Once Again Value of Bayes estimates of maps of relative risks Time-scale specific relative risks Distributed lag models Reproducible Epidemiologic Research Science; Statistics

73 Testimony on the EPA Proposed Decision on Particulate Matter Suresh H. Moolgavkar, M.D., Ph.D. Professor of Epidemiology and Biostatistics, University of Washington Industry Consultant “The proposed new regulations for particulate matter are based on the assumption that the magnitude of the associations between these pollutants and adverse human health effects reported in some epidemiologic studies is predictive of the gains in human health that would accrue by lowering ambient concentrations. The evidence simply does not support this assumption. Briefly, the dearth of toxicological information, the absence of biological understanding of underlying mechanism, and the potential for uncontrolled confounding by co-pollutants currently preclude the conclusion that the particulate component of air pollution is causally associated with adverse effects on human health.”

