Assessing the Health Effects of Air Pollution: Statistical and Computational Challenges. Scott L. Zeger on behalf of the Environmental Biostatistics and Epidemiology Group (EBEG)


Assessing the Health Effects of Air Pollution: Statistical and Computational Challenges
Scott L. Zeger, on behalf of the Environmental Biostatistics and Epidemiology Group (EBEG)
The Johns Hopkins University Bloomberg School of Public Health
CISES Meeting, Chicago, October 2004

Key Collaborators
Francesca Dominici, Aidan McDermott, Jon Samet, Roger Peng, Leah Welty
Hopkins Environmental Biostatistics and Epidemiology Group (EBEG)

Sources of Support
U.S. National Institutes of Health (NIH)
U.S. Environmental Protection Agency (EPA)
Health Effects Institute (HEI), an independent non-profit that receives funds from:
– U.S. EPA
– Automobile Manufacturers Association

Outline
Air pollution and mortality: a brief overview of the epidemiologic evidence
– Cohort studies
– Time series studies: NMMAPS
Spatial time series models
– Temporal, then spatial, models
Key statistical issues
Toward reproducible research

Daytime in London, 1952
Source: National Archives
Particulate levels: 3,000 µg/m³

Designer Smog Masks, London, 1950s
Source: D.L. Davis, When Smoke Ran Like Water (2002)

~10,000 excess deaths

4,000 deaths in the first week; 8,000 over the next 2 months
Pollution, flu, or both?
50th Anniversary Meeting

Can Air Pollution Kill at Doses an Order of Magnitude Lower?
“Air pollution” has many constituents:
– Particles (those smaller than 2.5 microns penetrate deep into the lung)
– Ozone
– Gases: NO2, SO2, CO
– …
We focus on particles because of the epidemiologic data

Key Epidemiologic Evidence
Chronic exposures: cohort studies
– Six Cities Study (e.g., Dockery et al., 1993)
– American Cancer Society (ACS) study (e.g., Pope et al., 2002)
Acute exposures: multi-city time series studies
– NMMAPS (90 U.S. cities; e.g., Samet et al., 2000)
– APHEA (29 European cities; e.g., Katsouyanni et al., 2003)
– Canadian study (8 cities; e.g., Burnett and Goldberg, 2003)

Cohort Studies
                      Six Cities       ACS
People                8,111            500,000+
Person-years          111,076          7.5M
Deaths                1,430            60,000+
Cities                6                50
Exposure              Yearly average   Yearly average
Covariates            Age, smoking, exercise, + (both studies)
Total mortality RR    1.26*            1.10*
Cardiopulmonary RR    1.37*            1.17*
Lung cancer RR        1.37*            1.29*
* Most vs. least polluted in the Six Cities Study

Public Health Significance
In the U.S., EPA estimates on the order of 10,000 particle-attributable deaths per year, if the cohort relative risks represent a causal effect
Smoking, for comparison: 400,000 smoking-attributable deaths per year
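
The attributable-death figure is simple arithmetic once a causal relative risk is assumed. A minimal sketch, assuming a log-linear concentration-response so that RR = exp(beta × Δconcentration); the function name and inputs are hypothetical, and this is not EPA's actual estimation procedure:

```python
import math


def attributable_deaths(observed_deaths: float, beta_per_unit: float, delta_conc: float) -> float:
    """Deaths attributable to a concentration increment under a log-linear model.

    Assumes RR = exp(beta_per_unit * delta_conc) is causal, so the fraction of
    observed deaths that would not occur without the increment is 1 - 1/RR.
    """
    rr = math.exp(beta_per_unit * delta_conc)
    return observed_deaths * (1.0 - 1.0 / rr)


# Hypothetical inputs: RR of 1.10 for the full exposure contrast, 1,100 observed deaths.
beta = math.log(1.10)
print(round(attributable_deaths(1100, beta, 1.0), 6))  # 100.0
```

The result depends entirely on the causal reading of RR, which is why the slide attaches that condition to the estimate.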

Caveats on Cohort Studies
Regressions of “adjusted” mortality rates on the longer-term average pollution level
Cross-city ecologic comparisons
Sample size is the number of cities
– Six Cities Study (6CS): 6
– ACS: 50
What else differs between more and less polluted cities?
Does air pollution cause mortality?

Multi-city Time Series Studies of Acute Effects
Compare more polluted to less polluted days within the same community
Avoids the problem of unmeasured differences among cities
New confounders:
– Longer-term trends in population characteristics, medical practice, smoking rates, changing demographics, etc.
– Seasonal effects of infectious diseases and weather
– Day of the month and week, holidays

Risk Estimates From Cohort and Time Series Studies
Cohort studies estimate the association between time to death and long-term exposure to air pollution (chronic exposure)
Time series studies estimate the association between the risk of death and the level of air pollution shortly before death, conditional on longer-term exposures (acute exposure)
Time series studies of particulate pollution are useful for addressing the causal question, not for estimating the size of the health effects. They ignore chronic exposures.

National Morbidity, Mortality, and Air Pollution Study (NMMAPS)
HEI-funded collaboration of Johns Hopkins and Harvard Universities; Jon Samet, PI
90 largest U.S. cities (now 105), covering roughly 40% of annual deaths; data now updated through 2001
Mortality and hospitalizations (14 cities)

NMMAPS Locations

Data for Baltimore, Maryland

Semi-parametric Regression Model for Each City (c)
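
The model itself is not reproduced in this transcript. Later slides describe the city-level fit as an over-dispersed Poisson regression of daily deaths on lagged PM with smooth functions of time and temperature. A minimal sketch on simulated data, assuming a parametric stand-in: Fourier harmonics plus a linear trend replace the smooth function of time, and a quadratic replaces the smooth function of temperature. All names and values are hypothetical:

```python
import numpy as np


def poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Fit a log-link Poisson GLM by iteratively reweighted least squares.

    Returns coefficients, standard errors inflated by the Pearson
    over-dispersion estimate phi, and phi itself. Column 0 of X is the intercept.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # start at the mean daily count
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu             # working response
        XtW = X.T * mu                      # working weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    phi = np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1])
    cov = phi * np.linalg.inv((X.T * mu) @ X)
    return beta, np.sqrt(np.diag(cov)), phi


def harmonics(t, period=365.25, k_max=4):
    """Sine/cosine terms standing in for a smooth function of calendar time."""
    cols = []
    for k in range(1, k_max + 1):
        w = 2 * np.pi * k * t / period
        cols += [np.sin(w), np.cos(w)]
    return np.column_stack(cols)


# Simulated city with 8 years of daily data (all values hypothetical).
rng = np.random.default_rng(0)
n = 8 * 365
t = np.arange(n)
temp = 15 + 10 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 3, n)
pm = 20 + 8 * np.cos(2 * np.pi * t / 365.25) + rng.gamma(2.0, 10.0, n)
BETA_PM = 0.0005                            # true log relative rate per unit PM

tt, pm_lag1 = t[1:], pm[:-1]                # deaths on day t, PM on day t-1
temp_c = (temp[1:] - 15) / 10
log_mu = (np.log(50) + 0.1 * np.cos(2 * np.pi * tt / 365.25)
          + 0.03 * temp_c ** 2 + BETA_PM * pm_lag1)
deaths = rng.poisson(np.exp(log_mu)).astype(float)

X = np.column_stack([np.ones(len(tt)), pm_lag1, temp_c, temp_c ** 2,
                     tt / len(tt), harmonics(tt)])
beta_hat, se, phi = poisson_irls(X, deaths)
print(f"PM lag-1 log relative rate {beta_hat[1]:.5f} (SE {se[1]:.5f}); phi {phi:.2f}")
```

The number of harmonics (`k_max`) plays the role of the degree of smoothing: more terms remove more slow-moving confounding but leave less temporal variation in PM to estimate its coefficient from.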

Statistical Problem Pollution signal embedded in correlated “noise”

City-specific Estimates

Map of City Specific Estimates

Spatial Model for Relative Rates

Three Models
“Three-stage”: as in the previous slide
“Two-stage”: ignore region effects; assume cities have exchangeable random effects
Two-stage with “spatial” correlation: city random effects have an isotropic, exponentially decaying autocorrelation function
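
The “two-stage” exchangeable model can be approximated without MCMC. A minimal sketch, assuming a DerSimonian-Laird method-of-moments estimator for the between-city variance tau2 (a swapped-in technique, not the Bayesian fit used on these slides); `two_stage_pool` is a hypothetical name:

```python
import numpy as np


def two_stage_pool(beta_hat, v):
    """Pool city-specific estimates under exchangeable random effects.

    Model: beta_hat[c] ~ N(beta_c, v[c]) and beta_c ~ N(mu, tau2).
    tau2 comes from the DerSimonian-Laird method of moments; mu is then the
    inverse-variance weighted mean using weights 1 / (v[c] + tau2).
    """
    b = np.asarray(beta_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 1.0 / v
    mu_fixed = np.sum(w * b) / np.sum(w)
    q = np.sum(w * (b - mu_fixed) ** 2)           # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(b) - 1)) / c)       # truncated at zero
    w_star = 1.0 / (v + tau2)
    mu = np.sum(w_star * b) / np.sum(w_star)
    return mu, np.sqrt(1.0 / np.sum(w_star)), tau2


# Toy inputs: two cities with equal precision but very different estimates.
mu, se, tau2 = two_stage_pool([0.0, 2.0], [1.0, 1.0])
print(mu, se, tau2)  # 1.0 1.0 1.0
```

When the city estimates agree more closely than their sampling variances imply, tau2 is truncated at zero and the pooled mean reduces to the fixed-effect, inverse-variance weighted mean.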

Joint Estimation of 90 City Slopes With Spatial Model
Approximate the conditional distribution of each city estimate, given its true value, by a Gaussian model with mean equal to the MLE and variance equal to the inverse of the Fisher information under an over-dispersed Poisson model
No borrowing of strength across cities for estimating the smooth functions of time and temperature (i.e., a full Bayesian analysis with “infinite” prior variances for these terms)
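
Under the Gaussian approximation, estimating the national mean becomes a generalized-least-squares problem. A minimal sketch, assuming the spatial covariance parameters tau2 and phi are known (the slides estimate them by MCMC) and that cities have planar coordinates in km; all names and inputs are hypothetical:

```python
import numpy as np


def exp_cov(coords_km, tau2, phi_km):
    """Isotropic exponential covariance tau2 * exp(-distance / phi) between cities."""
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    return tau2 * np.exp(-dist / phi_km)


def spatial_pool(beta_hat, v, coords_km, tau2, phi_km):
    """GLS estimate (and SE) of the national mean relative rate.

    Gaussian approximation: beta_hat[c] ~ N(beta_c, v[c]), where v[c] is the
    inverse Fisher information; the true beta_c share mean mu and spatially
    correlated deviations, so Cov(beta_hat) = diag(v) + exp_cov(...).
    """
    b = np.asarray(beta_hat, dtype=float)
    coords = np.asarray(coords_km, dtype=float)
    sigma = np.diag(np.asarray(v, dtype=float)) + exp_cov(coords, tau2, phi_km)
    ones = np.ones(len(b))
    s_inv_1 = np.linalg.solve(sigma, ones)
    var_mu = 1.0 / (ones @ s_inv_1)
    return var_mu * (s_inv_1 @ b), np.sqrt(var_mu)


# Hypothetical inputs: three cities on a planar km grid.
coords = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 300.0]])
mu, se = spatial_pool([0.4, 0.6, 0.2], [0.04, 0.09, 0.05], coords, tau2=0.02, phi_km=200.0)
print(round(mu, 3), round(se, 3))
```

Setting tau2 to zero recovers the inverse-variance weighted mean; a large phi makes nearby cities nearly redundant, so they receive less combined weight.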

Joint Estimation
MCMC implementation with proper priors for the variance components
– Standard uninformative priors are not proper
– Half-Gaussian priors with large variances on the variance components
We have compared the inferences with a full Bayes analysis in a parametric analogue: no difference
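
A minimal Metropolis-within-Gibbs sketch of this joint estimation, assuming a flat prior on the national mean mu, a half-Gaussian prior on the between-city variance tau2, and city effects integrated out analytically under the Gaussian approximation. The function names, step size, and prior scale are hypothetical, and this exchangeable version omits the spatial term:

```python
import numpy as np


def log_target_tau2(tau2, mu, b, v, prior_sd):
    """Log posterior of tau2 given mu (up to a constant), city effects integrated out."""
    if tau2 <= 0:
        return -np.inf
    s2 = v + tau2                                  # marginal variance of each estimate
    loglik = -0.5 * np.sum(np.log(s2) + (b - mu) ** 2 / s2)
    log_prior = -0.5 * (tau2 / prior_sd) ** 2      # half-Gaussian prior on tau2 >= 0
    return loglik + log_prior


def sample_posterior(b, v, n_iter=5000, prior_sd=1.0, step=0.5, seed=0):
    """Metropolis-within-Gibbs draws of (mu, tau2) with a flat prior on mu."""
    rng = np.random.default_rng(seed)
    b = np.asarray(b, dtype=float)
    v = np.asarray(v, dtype=float)
    tau2 = np.var(b) + 1e-6                        # starting value
    mu_draws, tau2_draws = np.empty(n_iter), np.empty(n_iter)
    for i in range(n_iter):
        # Gibbs step: mu | tau2, b is Gaussian.
        w = 1.0 / (v + tau2)
        mu = rng.normal(np.sum(w * b) / np.sum(w), np.sqrt(1.0 / np.sum(w)))
        # Random-walk Metropolis on log(tau2); log(prop / tau2) is the Jacobian.
        prop = tau2 * np.exp(rng.normal(0.0, step))
        log_ratio = (log_target_tau2(prop, mu, b, v, prior_sd)
                     - log_target_tau2(tau2, mu, b, v, prior_sd)
                     + np.log(prop / tau2))
        if np.log(rng.uniform()) < log_ratio:
            tau2 = prop
        mu_draws[i], tau2_draws[i] = mu, tau2
    return mu_draws, tau2_draws


# Hypothetical data: 90 cities with true effects scattered around 0.5.
rng = np.random.default_rng(1)
v = np.full(90, 0.09)
b = rng.normal(rng.normal(0.5, 0.2, 90), np.sqrt(v))
mu_draws, tau2_draws = sample_posterior(b, v)
print(mu_draws[1000:].mean(), tau2_draws[1000:].mean())   # after 1,000 burn-in draws
```

The proper half-Gaussian prior keeps the posterior for tau2 well defined even when the data carry little information about between-city heterogeneity.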

Posterior Distribution of National Average

Results Stratified by Cause of Death

Evidence for Heterogeneity Among Cities in True Relative Rates

Shrinkage

Bayes Posterior Estimates
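
Shrinkage follows directly from the two-stage Gaussian model. A minimal sketch, conditional on mu and tau2 (in practice plugged in from a pooling step or averaged over MCMC draws); `shrink` is a hypothetical name:

```python
import numpy as np


def shrink(beta_hat, v, mu, tau2):
    """Posterior mean and variance of each city's true relative rate.

    Under beta_hat[c] ~ N(beta_c, v[c]) and beta_c ~ N(mu, tau2), the posterior
    mean is a precision-weighted average of the city estimate and the national mean.
    """
    b = np.asarray(beta_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    weight = tau2 / (tau2 + v)          # weight on the city's own estimate
    post_mean = weight * b + (1 - weight) * mu
    post_var = tau2 * v / (tau2 + v)
    return post_mean, post_var


# Toy inputs: two equally precise cities shrink halfway toward mu when tau2 == v.
m, s2 = shrink([1.0, 0.0], [1.0, 1.0], mu=0.5, tau2=1.0)
print(m, s2)  # [0.75 0.25] [0.5 0.5]
```

Noisy city estimates (large v) are pulled strongly toward the national mean, while precise ones keep most of their own value.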

Statistical Formulation (terms: pollution effect, confounders, space-time frailty)

Scientific and Statistical Issues
1. Model for the baseline frailty process and other unmeasured confounders in space and time
– personal variables (smoking, exercise)
– city-specific variables (demographics, medical services)
– influenza epidemics
2. Co-pollutants
3. Public health significance: “harvesting”?
4. Distributed lags
5. Reproducible research

1. Model for Spatial Time Series
By collecting people across a large city, the central limit theorem smooths out individual behaviors and produces a temporally smooth nuisance function
Ignore the spatial correlation in the mortality process and estimate city-specific relative rates
Model spatial associations among the rate estimates instead of associations among the mortality events themselves

Formulation of Time Series Model “Collect and Conquer”

Degree of Adjustment

2. Co-pollutants
Recent Testimony on the EPA Proposed Decision on Particulate Matter
Suresh H. Moolgavkar, M.D., Ph.D., Member, Fred Hutchinson Cancer Research Center; Professor of Epidemiology and Biostatistics, University of Washington; leading industry consultant
“the potential for uncontrolled confounding by co-pollutants currently preclude the conclusion that the particulate component of air pollution is causally associated with adverse effects on human health.”

Co-pollutants
Estimated the same model with:
– PM10 + ozone
– PM10 + ozone + NO2
– PM10 + ozone + SO2
– PM10 + ozone + CO
Pooled data over the 20 largest cities, which tell most of the story

Co-pollutants
Individual city estimates can change substantially; the average across the 20 cities changes little

3. Public Health Significance
The harvesting idea:
– Only the very frail could possibly die from air pollution
– They would have died anyway in a few days
– Air pollution kills, but causes only a trivial loss of quality days of life
If true, we would expect associations only at shorter time scales

Frequency Domain Decomposition: Total Suspended Particles and Mortality, Philadelphia

Frequency Domain Log-linear Regression – Philadelphia TSP

Frequency Band Relative Risk Estimates Pooled over 4 Cities
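
The time-scale idea can be sketched by splitting an exposure series into frequency bands and estimating a separate slope for each band; if harvesting were the whole story, only the short-period slopes would be non-zero. A minimal sketch, assuming FFT-based band-pass filtering and ordinary least squares as a stand-in for the log-linear Poisson regression on these slides; the band cut-offs and function names are hypothetical:

```python
import numpy as np


def band_decompose(x, cut_periods=(60.0, 14.0, 3.0)):
    """Split a daily series into components whose periods fall in each band.

    With the default (hypothetical) cut-offs the bands are > 60, 14-60, 3-14,
    and < 3 days. The components are orthogonal and sum back to x.
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    freq = np.fft.rfftfreq(n, d=1.0)                  # cycles per day
    edges = [0.0] + [1.0 / p for p in cut_periods] + [np.inf]
    comps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freq >= lo) & (freq < hi)
        comps.append(np.fft.irfft(spectrum * mask, n=n))
    return np.column_stack(comps)


def band_slopes(x, y, cut_periods=(60.0, 14.0, 3.0)):
    """Least-squares slope of y on each band component of x, plus an intercept."""
    comps = band_decompose(x, cut_periods)
    design = np.column_stack([np.ones(len(x)), comps])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


# Hypothetical "harvesting-like" outcome that responds only to < 3-day fluctuations.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 10 + 2.0 * band_decompose(x)[:, 3] + rng.normal(0, 0.1, 2000)
print(np.round(band_slopes(x, y), 2))
```

Because the band components are orthogonal, each slope uses only the variation in its own time scale, which is what makes the per-band comparison possible.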

4. Distributed Lag Models
NMMAPS described mortality as a function of air pollution u = 1 (or 0, 2, 3) days before, because PM data are available only every sixth day in most cities
To capture the entire acute effect, we must include pollution levels from the previous week or two
Two statistical-computational issues:
– How to model the distributed lags flexibly
– How to contend with substantial missing covariate data

Distributed Lag Models (DLMs) for PM10 on Mortality

Effect of a unit increase in PM10 7 days ago on today’s mortality
Sum of the distributed lag function = ‘total effect’
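
A minimal sketch of an unconstrained distributed lag model, assuming a linear model in place of the over-dispersed Poisson model and simulated data; the function names are hypothetical. It computes the total effect as the sum of the lag coefficients and shows why every-sixth-day PM sampling leaves no complete days once several lags enter the model:

```python
import numpy as np


def lag_matrix(x, max_lag):
    """Column u holds x lagged u days; values before the series starts are NaN."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.full((n, max_lag + 1), np.nan)
    for u in range(max_lag + 1):
        out[u:, u] = x[: n - u]
    return out


def fit_unconstrained_dlm(x, y, max_lag):
    """Least-squares fit of all lags jointly, using only days with every lag observed.

    Returns the estimated lag curve and its sum, the 'total effect'.
    """
    lags = lag_matrix(x, max_lag)
    keep = ~np.isnan(lags).any(axis=1)
    design = np.column_stack([np.ones(keep.sum()), lags[keep]])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float)[keep], rcond=None)
    theta = coef[1:]
    return theta, theta.sum()


# Hypothetical daily series with effects spread over the first four lags.
rng = np.random.default_rng(0)
n = 3000
theta_true = np.array([0.3, 0.2, 0.1, 0.05, 0.0, 0.0, 0.0, 0.0])
x = rng.normal(size=n)
y = lag_matrix(x, 7) @ theta_true + rng.normal(0, 0.5, n)   # first 7 days are NaN
theta_hat, total = fit_unconstrained_dlm(x, y, 7)
print(np.round(theta_hat, 2), round(total, 2))

# With PM measured only every sixth day, no day has all lags 0-7 observed.
x_every6 = np.where(np.arange(n) % 6 == 0, x, np.nan)
print((~np.isnan(lag_matrix(x_every6, 7)).any(axis=1)).sum())  # 0
```

Dropping incomplete days is therefore not an option with every-sixth-day data, which is why the missing covariate values have to be handled inside the model.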

Example DLMs for PM10 on Mortality: Chicago

Prior Knowledge of DL Function
1. No knowledge of early lag effects
2. Lag effects must eventually go to zero
3. Lag effects get smoother further back in time
Our approach: construct the prior so as to reflect 1-3

Constructing Distributed Lag Prior
1. No knowledge of early lag effects
2. Lag effects must eventually go to zero
Large variances → small variances

3. Lag effects tend to zero smoothly
Uncorrelated → correlated
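
One way to encode the three properties in a Gaussian prior is to set variances that start large and then shrink, and correlations that grow between neighboring lags further back in time. A minimal sketch of a hypothetical construction (not necessarily the authors' exact prior), using an exponential kernel on log(1 + lag) for the correlations, together with the posterior mean it implies in a linear-Gaussian model; all names and parameter values are hypothetical:

```python
import numpy as np


def dl_prior_cov(max_lag, sigma2=1.0, decay=0.5, n_flat=2):
    """Hypothetical Gaussian prior covariance for lag coefficients theta_0..theta_L.

    1. The first n_flat lags get the full prior variance sigma2 (little prior knowledge).
    2. Later variances shrink by a factor exp(-decay) per lag (effects go to zero).
    3. Correlations use an exponential kernel on log(1 + lag), so neighboring
       lags are more strongly correlated further back (a smoother lag curve).
    """
    lags = np.arange(max_lag + 1)
    var = sigma2 * np.exp(-decay * np.maximum(lags - n_flat + 1, 0))
    g = np.log1p(lags)
    corr = np.exp(-np.abs(g[:, None] - g[None, :]))
    sd = np.sqrt(var)
    return corr * np.outer(sd, sd)


def posterior_mean(design, y, prior_cov, noise_var):
    """Posterior mean of theta for y ~ N(design @ theta, noise_var * I), theta ~ N(0, prior_cov)."""
    precision = design.T @ design / noise_var + np.linalg.inv(prior_cov)
    return np.linalg.solve(precision, design.T @ y / noise_var)


cov = dl_prior_cov(14)
sd = np.sqrt(np.diag(cov))
adjacent_corr = np.diag(cov, k=1) / (sd[:-1] * sd[1:])
print(np.round(adjacent_corr[:4], 3))   # (u + 1) / (u + 2): 0.5, 0.667, 0.75, 0.8
```

The exponential kernel is a valid covariance on the real line, so applying it to the transformed lags keeps the prior positive definite while making adjacent correlations rise with lag.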

Bayesian-Averaged Distributed Lags of PM10 on Mortality (Chicago): Total Effect

Bayesian-Averaged Distributed Lags of PM10 on Mortality (Detroit)

Toward Reproducible Epidemiologic Research (RER)
U.S. EPA sets national policy about air pollution based on acute and chronic disease studies: lots of $$ at stake
Research is conducted in the context of an adversarial debate about whether current levels of pollution cause mortality: credibility of the epidemiologic evidence

Statistical Problem Pollution signal embedded in correlated “noise”

Convergence Problem
NMMAPS estimated the city-specific relative rates using generalized additive models (gam) in S-Plus
gam relies on several parameters, four of which control when convergence of the estimation algorithm is declared
Five years into the work, we discovered that the default parameters we used were too lax for our application
In addition, Ramsay et al. discovered that gam underestimates the standard errors of the relative rate estimates
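
The convergence problem can be reproduced in miniature. A minimal sketch, assuming a simplified backfitting loop for two linear terms rather than S-Plus gam's actual algorithm or its four control parameters: when covariates are nearly collinear (a linear analogue of concurvity among smooth terms), each sweep moves the estimates only slightly, so a lax stopping rule can halt well short of the least-squares solution. Names, data, and tolerances are hypothetical:

```python
import numpy as np


def backfit_two(x1, x2, y, tol, max_sweeps=100_000):
    """Backfit y ~ a * x1 + b * x2 (centered data, no intercept), one term at a time.

    Stops when neither coefficient moves by more than tol within a sweep.
    Returns (a, b, number of sweeps).
    """
    a = b = 0.0
    for sweep in range(1, max_sweeps + 1):
        a_new = x1 @ (y - b * x2) / (x1 @ x1)       # refit term 1 to partial residuals
        b_new = x2 @ (y - a_new * x1) / (x2 @ x2)   # refit term 2 to partial residuals
        done = max(abs(a_new - a), abs(b_new - b)) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b, sweep


# Hypothetical, nearly collinear covariates.
rng = np.random.default_rng(0)
n, r = 1000, 0.995
z, w = rng.normal(size=n), rng.normal(size=n)
x1 = z - z.mean()
x2 = r * z + np.sqrt(1 - r ** 2) * w
x2 = x2 - x2.mean()
y = x1 + x2 + rng.normal(0, 0.1, n)
y = y - y.mean()
a_ls, b_ls = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)[0]

for tol in (1e-3, 1e-12):
    a, b, sweeps = backfit_two(x1, x2, y, tol)
    print(f"tol={tol:g}: a={a:.4f} vs least squares {a_ls:.4f} after {sweeps} sweeps")
```

The per-sweep change here is roughly (1 - r²) times the remaining error, so a small change does not mean a small error; the strict tolerance needs many more sweeps but reaches the least-squares answer.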

Model Sensitivity: Relative Rate Estimates for GAM (Default and Strict) versus GLM
Dominici, McDermott, Zeger, Samet, AJE 2002
Panels: GAM (default) versus GLM estimates; GAM (strict) versus GLM estimates

What Difference Did it Make?

The Press: The New York Times (June 2002)

“(A)lthough many questions remain about how fine particles kill people, the NMMAPS study shows there’s no mistaking that PM is the culprit”
NMMAPS in Science, July 2000
Understatement of statistical uncertainty in the press

Levels in Replication
Investigator, study, data, analysis, software
Reproducibility

Toward Reproducibility in iHAPSS
Post papers (tech reports) on the iHAPSS website
Hyperlink the main results in each paper (tables, figures) to:
– the statistical computing environment (R), with the program that generates the results and the data file used by the program
Give users the opportunity to alter the analyses:
– in this computing environment
– in their own environment?

internet Health and Air Pollution Surveillance System (iHAPSS)

R as a Platform for Distributing Data
Convenient online help system for documenting datasets
Vignette system for more detailed descriptions of data or code
Functions can be provided for handling data
Data can be delivered as a single unit (package), rather than in separate, possibly unlinked, pieces

NMMAPSdata
Preprocessing functions for setting up the database to reproduce recent NMMAPS findings:
– basicNMMAPS: analysis of PM10 and mortality
– seasonal: estimating seasonally varying effects of PM10
– tempDLM: distributed lag models for temperature

NMMAPSdata Index
Number of U.S. cities: 108
Number of days of observation: 5,114
Number of age categories: 3
Number of variables: 291
Database size (uncompressed): 2.5 GB

Toward Reproducibility of Epidemiologic Research
iHAPSS as a model
Journals require that published papers be accompanied by the programs and data necessary to reproduce their results
Next steps to move the field in this direction

Main Points Once Again
Reviewed the epidemiologic evidence for an association between particulate air pollution and mortality
– Cohort studies: RR = 1.25 across the range of exposures
– Time series studies: mortality in space and time
– Summarize over time, then analyze in space

Main Points Once Again
Value of Bayes estimates for maps of relative risks
Time-scale-specific relative risks
Distributed lag models
Reproducible epidemiologic research
Science and Statistics

Testimony on the EPA Proposed Decision on Particulate Matter
Suresh H. Moolgavkar, M.D., Ph.D., Professor of Epidemiology and Biostatistics, University of Washington; industry consultant
“The proposed new regulations for particulate matter are based on the assumption that the magnitude of the associations between these pollutants and adverse human health effects reported in some epidemiologic studies is predictive of the gains in human health that would accrue by lowering ambient concentrations. The evidence simply does not support this assumption. Briefly, the dearth of toxicological information, the absence of biological understanding of underlying mechanism, and the potential for uncontrolled confounding by co-pollutants currently preclude the conclusion that the particulate component of air pollution is causally associated with adverse effects on human health.”