Investigating the time-varying effects of air pollution using distributed lagged models: a comparison of polynomial and window models

Adrian Barnett †, Gail Williams †, Trudi Best ‡, Anne Neller ‡, Rod Simpson ‡
† School of Population Health, The University of Queensland, Australia; ‡ The University of the Sunshine Coast, Australia

Introduction

Outdoor air pollution has a negative effect on health. On days of high air pollution, rates of cardiovascular and respiratory events increase. This increase may last for a number of days because, in some people, there is a delay between exposure and onset. The precise length of this delay remains unknown.

Most time-series studies of the health effects of air pollution have used as the exposure only the current day's pollution, or the current and previous day's pollution. These studies will have underestimated the effects of pollution if the longest delay is longer than a day. A study in Europe found associations between particulate matter and all-cause (non-external) mortality that lasted for forty days [1], although the largest increases in risk were within seven days. A study in Ireland found significant increases in respiratory deaths 3-4 weeks after exposure to black smoke [2].

As well as the length of the delay, the shape of the time-varying relationship is of interest. We might expect the shape to vary smoothly with time, and the risk to decrease gradually with increasing delay (although this decrease might not be monotone). The shape may also show harvesting, where events are only brought forward in time by air pollution. Such patterns reduce the overall increase in risk (which is estimated by the total area of the shape).

The aim of this work was to create a method that could correctly estimate the delay between air pollution exposure and event onset in time-series data. We also compared parametric and non-parametric estimates of the shape.

Time-series and regression notation

We examine time series of daily counts of events Y_t, t = 1, …, n. For notational purposes the covariates are split into a vector of daily air pollution levels Z and a matrix of other covariates X. We assume the counts follow a Poisson distribution, and regress the mean count on the covariates and the lagged (delayed) effect of air pollution, where L is the maximum lag (the key unknown parameter).

Likely covariates in air pollution models include season, temperature, humidity and day of the week. Non-linear models, especially splines, are often used for covariates such as temperature. The overall estimated increase in risk due to pollution is the sum of the lag estimates. The constant L is selected by the user. The user also determines the form of the constraint on the lag parameters. In this work we look at three constraints:

Unconstrained: each lag parameter (indexed by lag j) is estimated independently of its neighbours. The sum of the estimates gives an unbiased estimate of the overall risk, but the estimated shape is often very noisy.

Polynomial: the parameters are constrained to follow a smooth polynomial change over lagged time [3], where P is the model order (P = 0 constant, P = 1 linear, P = 2 quadratic, etc.). P can be selected by increasing the order stepwise and testing the significance of the highest-order term, stopping when it is no longer significant.

Window: the unconstrained parameters are smoothed using a moving-average window, where the starred parameters are the unconstrained parameters and W is the one-sided width of the window (selected by the user).

Criteria for an optimal fit

We use Akaike's Information Criterion (AIC) and the Deviance Information Criterion (DIC) [4] to judge the optimal value of L. Both criteria are given by the same formula, where m is the (effective) number of parameters (L for the unconstrained model, and P for the polynomial).
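The polynomial and window constraints, and the selection criterion, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' SAS or WinBUGS code. The equations are not reproduced in this transcript, so three things are assumptions: the polynomial constraint is taken to be the usual form in which the effect at lag j is a polynomial in j with coefficients `eta`, the moving-average window is truncated at lag 0 and at the maximum lag, and the shared criterion is taken to be the deviance plus 2m. All function and variable names are hypothetical.

```python
from typing import List, Sequence


def polynomial_lag_effects(eta: Sequence[float], max_lag: int) -> List[float]:
    """Lag effects constrained to a polynomial of order P = len(eta) - 1.

    Assumed form: effect_j = eta[0] + eta[1]*j + ... + eta[P]*j**P
    for lags j = 0, ..., max_lag.
    """
    return [sum(e * j ** p for p, e in enumerate(eta))
            for j in range(max_lag + 1)]


def window_lag_effects(raw: Sequence[float], width: int) -> List[float]:
    """Moving average of unconstrained lag effects, one-sided width W.

    Assumption: the window is truncated at the first and last lag, so
    the edge estimates average fewer unconstrained parameters.
    """
    n = len(raw)
    smoothed = []
    for j in range(n):
        lo, hi = max(0, j - width), min(n - 1, j + width)
        window = raw[lo:hi + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def information_criterion(deviance: float, m: float) -> float:
    """Assumed common form of the AIC and DIC: deviance plus twice m."""
    return deviance + 2.0 * m


if __name__ == "__main__":
    # A linear (P = 1) shape over lags 0..3.
    print(polynomial_lag_effects([2, 1], 3))
    # Noisy unconstrained estimates smoothed with W = 1.
    print(window_lag_effects([0, 3, 6, 3, 0], 1))
    # Criterion for a model with deviance 100 and m = 3.
    print(information_criterion(100.0, 3))
```

Under these assumptions, setting W = 0 returns the unconstrained estimates unchanged, so the window model contains the unconstrained model as a special case.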
As the AIC and DIC are a trade-off between model fit and complexity, their minimum value can be used to choose between candidate models.

Simulation study of mortality counts

We created 100 time series of mortality counts with a mean of 30 and length n = 365, using two time-varying shapes. Shape A is a delayed effect between exposure and onset followed by a period of harvesting. Shape B is a constant effect for 10 days followed by a non-linear sinusoid. Both shapes have a maximum lag of L = 24. For simplicity we included no covariates. We used SAS (and the AIC) to fit the unconstrained and polynomial models, and WinBUGS (and the DIC) to fit the window model. Importantly, when testing up to L lags the first L mortality counts were set to missing.

Figure: Shapes of the absolute lagged changes in mortality used in the simulation study.

Simulation results

* Evaluated at estimated L = 24. RMSE = root-mean-square error, based on the average difference between the estimated and known shape.

Real study: daily mortality in Brisbane, Australia

We examined the time-varying effect of average 24-hour PM10 and average 24-hour NO2 on total (non-external) mortality in Brisbane. We controlled for long-term trend, season, temperature, difference in temperature (current day minus previous day), average humidity, average pressure, and extremes of temperature (coldest and hottest 1% of days). The time series ran from 1998 to 2001, with an average daily mortality count of 25.

Results on Brisbane data

Figure: DIC over lag for PM10 (left) and NO2 (right) for the unconstrained and window models.

Discussion

The AIC performed badly at choosing the correct lag under the polynomial model. Increasing the lag under a polynomial model does not add parameters to the AIC, and for smoothly varying shapes it is likely to change the fit (likelihood) only subtly. For the unconstrained and window models the AIC/DIC found the correct lag, or a neighbouring lag, in a high proportion of the simulations.
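The point about the penalty can be made concrete with a small sketch. It assumes the criterion has the form deviance plus 2m (an assumption, since the formula itself is not reproduced in this transcript) and uses the values of m stated earlier: L for the unconstrained model and P for the polynomial. The function name and the default order P = 2 are hypothetical.

```python
def complexity_penalty(model: str, max_lag: int, poly_order: int = 2) -> int:
    """Return the penalty term 2*m for a candidate maximum lag.

    m follows the text: L for the unconstrained model and P for the
    polynomial model. poly_order=2 is an arbitrary illustrative default.
    """
    if model == "unconstrained":
        m = max_lag
    elif model == "polynomial":
        m = poly_order
    else:
        raise ValueError(f"unknown model: {model}")
    return 2 * m


if __name__ == "__main__":
    # Print the penalty for a few candidate lags under both models.
    for max_lag in (10, 24, 40):
        print(max_lag,
              complexity_penalty("unconstrained", max_lag),
              complexity_penalty("polynomial", max_lag))
```

The unconstrained penalty grows with every extra lag, while the polynomial penalty stays fixed, so under the polynomial model any preference between candidate lags has to come from the change in fit alone.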
The window smooth model gave the most accurate estimates of the shape. Knowing the correct lag has two advantages: the length of the effect may be of biological interest, and the estimate of the overall effect of the pollutant will be more accurate. In an analysis of total mortality in Brisbane the true lags for PM10 and NO2 were estimated to be 39 and 9 days respectively. The estimated shape for PM10 showed an increase in risk followed by some mortality displacement at around 25 to 40 days.

Proposed future work

Examine the time-varying effect of temperature, and of temperature and pollutants simultaneously.
Examine the use of the DIC to choose the optimal size of the smoothing parameter W.
Examine the use of an endpoint constraint in the window model, that is, setting the (L+1)th unconstrained parameter to zero.

References

1. Zanobetti A, Schwartz J, Samoli E, et al. (2002). The temporal pattern of mortality responses to air pollution: a multicity assessment of mortality displacement. Epidemiology 13.
2. Goodman P, Dockery D, Clancy L. (2004). Cause-specific mortality and the extended effects of particulate pollution and temperature exposure. Environmental Health Perspectives 112.
3. Diggle P, Heagerty P, Liang KY. (2002). Analysis of Longitudinal Data. Oxford, Oxford University Press.
4. Spiegelhalter DJ, Best NG, Carlin BP, et al. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society Series B 64.

Figure: Box plots of shape estimates (L = 24, W = 1).
Figure: Average AIC/DIC by lag (W = 1).
Figure: Estimated window smooth (W = 1) shapes for PM10 (left) and NO2 (right). PM10: estimated L = 39, AUS = -0.02 (95% PI: -0.16, 0.12). NO2: estimated L = 9, AUS = 0.7 (95% PI: 0.5, 0.8).

Detailed simulation results for Shape A

                                     Shape A                          Shape B
                                     Uncon.   Poly.   Window (W=1)    Uncon.   Poly.   Window (W=1)
Percent of correct estimates of L
  Estimated L = 24                   29%      0%      22%             33%      3%      34%
  Estimated L = 23, 25               53%      0%      50%             49%      6%      39%
Estimate of shape accuracy
  RMSE*