Day One: 8:30-12:00 Background and Overview


Using the Population Attributable Fraction (PAF) to Assess MCH Population Outcomes
Deborah Rosenberg, PhD and Kristin Rankin, PhD
Epidemiology and Biostatistics, School of Public Health, University of Illinois at Chicago

Day One: 8:30-12:00
- Background and Overview
- Basic Formulas and Initial Computations
- Moving Beyond Crude PAFs: Organizing Multiple Factors into a Risk System
- Summary, Component and “Adjusted” PAFs

Background
- Epidemiologists most commonly use ratio measures to estimate the magnitude of an association between a risk factor and an outcome.
- Impact measures, such as the Population Attributable Fraction (PAF), account for both the magnitude of association and the prevalence of risk in the population.
- PAFs are underused because of methodological concerns about how to appropriately account for the multifactorial nature of risk factors in the population.

Background
- In a multivariable context, the goal is to generate a PAF for each of multiple factors, taking into account relationships among the factors.
- Generating mutually exclusive and mutually adjusted PAFs is not straightforward given the overlapping distributions of exposure in the population; therefore methods that go beyond usual adjustment procedures are required.
- With appropriate methods, the PAF can be a tool for program planning and priority setting in public health since, unlike ratio measures, it permits sorting of risk factors according to their impact on an outcome.

Historical Highlights
Levin’s PAF (1953): “Indicated maximum proportion of disease attributable to a specific exposure”
If an exposure is completely eliminated, then the disease experience of all individuals would be the same as that of the “unexposed”.
- P(E) = prevalence of the exposure in the population as a whole
- p0 = prevalence of the outcome in the population as a whole
- p2 = prevalence of the outcome in the unexposed
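The formula itself did not survive in the transcript. The following is a sketch of the standard form of Levin's PAF in the slide's symbols. The symbols p_1 (prevalence of the outcome in the exposed) and RR = p_1 / p_2 are assumptions introduced here, since the slide defines only P(E), p_0 and p_2.

```latex
\[
\mathrm{PAF}
  = \frac{P(E)\,(RR - 1)}{1 + P(E)\,(RR - 1)}
  = \frac{p_0 - p_2}{p_0},
\qquad RR = \frac{p_1}{p_2}.
\]
% The two forms agree because the overall prevalence is a weighted average
% of the prevalence in the exposed and in the unexposed:
\[
p_0 = P(E)\,p_1 + \bigl(1 - P(E)\bigr)\,p_2
\;\Longrightarrow\;
p_0 - p_2 = P(E)\,(p_1 - p_2) = P(E)\,p_2\,(RR - 1).
\]
```

Dividing the last identity by p_0 = p_2 [1 + P(E)(RR - 1)] gives the first form.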

Historical Highlights
Miettinen (1974)
- Adjusted PAF = proportion of the disease that could be reduced by eliminating one risk factor, after controlling for other factors and accounting for effect modification
Bruzzi (1985) / Greenland and Drescher (1993)
- Summary PAF = proportion of the disease that could be reduced by simultaneously eliminating multiple risk factors from the population
- Method for using regression modeling to generate PAFs
Benichou and Gail (1990)
- Variance estimates for the adjusted and summary PAF based on the delta method

Example: Summary PAF for Three Risk Factors for a Health Outcome Components of a risk system: complete crossclassification of factors

Apportioning the Summary PAF The complete crossclassification of factors is not satisfactory because it fails to provide an overall estimate of impact for each risk factor. Methodological work has been and is still being carried out to develop approaches that apportion the Summary PAF in a way that yields estimates of impact for each of a set of risk factors

Apportioning the Summary PAF
Eide and Gefeller (1995/1998)
- Sequential PAF = proportion of the disease that could be reduced by eliminating one risk factor from the population after some factors have already been eliminated
- First Sequential PAF = the “adjusted PAF”: the particular sequential PAF in which a risk factor is eliminated first, before any other factors

Apportioning the Summary PAF
Ordering is imposed for eliminating risk factors from the population, while simultaneously controlling for all other factors in the model.
EXAMPLE (Sequence #1): Eliminate A, then B, then C
- Sequential PAF* (A) = (A | B, C)
- Sequential PAF (B) = (A U B | C) - (A | B, C)
- Sequential PAF (C) = (A U B U C) - (A U B | C)
*First Sequential or “adjusted” PAF

Summary PAF Apportioned into Sequential PAFs for Sequence #1 Eliminate A, then B, then C

Apportioning the Summary PAF
Eide and Gefeller (1995/1998)
- Average PAF = simple average of all sequential PAFs
- Equal apportionment of risk over every possible sequence (removal ordering), since the order in which risk factors will be eliminated in the “real world” is unknown
- Based on the Shapley solution in Game Theory: a method of fairly distributing the total profit gained by team members working in coalitions

Apportioning the Summary PAF: The Average PAF
Six Sequences for Three Risk Factors
- Sequence #1: Eliminate A, then B, then C
- Sequence #2: Eliminate A, then C, then B
- Sequence #3: Eliminate B, then A, then C
- Sequence #4: Eliminate B, then C, then A
- Sequence #5: Eliminate C, then A, then B
- Sequence #6: Eliminate C, then B, then A
There are a total of 6 sequential PAFs for each of the three risk factors. The Average PAF for each factor, then, is the simple average of all 6.

Summary PAF Apportioned into Average PAFs for Three Risk Factors

The Summary PAF: the Basis for Producing Multifactorial PAFs
The Summary PAF can be apportioned into:
- component PAFs, reflecting every possible combination of factors being considered
- sequential PAFs, reflecting pieces of one particular sequence in which risk factors might be eliminated
- average PAFs, reflecting estimates of the impact of eliminating multiple risk factors regardless of the order in which each is eliminated

PAFs from Different Study Designs
- Cross-sectional: prevalence and measure of effect are estimated from the same data source. Interpretation: proportion of prevalent cases that can be attributed to exposure.
- Cohort: Interpretation: proportion of incident cases that can be attributed to exposure.
- Case-Control: prevalence of exposure among the cases must be used, with the OR in place of the RR, using the rare disease assumption.

Methodological Issues for the PAF in a Multivariable Context
In addition to different computational approaches, decisions about how variables will be considered may be different when focusing on the PAF as compared with focusing on the ratio measures of association:
- Differentiating the handling of modifiable and unmodifiable factors
- Confounding and effect modification
- Handling factors in a causal pathway

Analytic Considerations: Variable Selection
Modifiability
- Unmodifiable factors are only used as potential confounders or effect modifiers; PAFs are not calculated for them.
- Modifiable factors are factors that can possibly be altered with clear intervention strategies.
- Classification of risk factors as unmodifiable or modifiable depends on perspective and may alter results.

Analytic Considerations: Model Building
- Differential handling of unmodifiable and modifiable factors
- Levels of measurement
- Coding choices
- Effect modification: within modifiable factors; across modifiable and unmodifiable factors; within unmodifiable factors
- Selection of a final model may not be based on statistical significance of the ratio measure of effect
- Stratified models
- Defining the “significance” of PAFs

Analytic Considerations: Presentation and Interpretation
- Average PAFs allow for the sorting of modifiable risk factors according to the potential impact of risk factor reduction strategies on an outcome in the population; ratio measures only provide the magnitude of the association between a risk factor and a disease.
- The PAF is the proportion of an outcome that could be reduced if a risk factor were completely eliminated in the population, so take care not to over-interpret findings.

Analytic Considerations
So, why isn’t the multifactorial PAF used more commonly in the analysis of public health data?
- No known standard statistical packages complete all of the steps.
- Variance estimates for the average PAF are not yet available, either for random samples or for samples from complex designs.
- Currently, 95% confidence intervals can only be reported around crude, summary, and first sequential (adjusted) PAFs.
- While the interpretation of average PAFs is strengthened by evidence of causality, an average PAF cannot itself establish causality.

Analytic Considerations As always, having an explicit conceptual framework / logic model is important for multivariable analysis Conceptualization is particularly critical when producing PAFs because decisions about variable handling and model building will determine the computational steps as well as influencing the substantive interpretation of results.

Laying the Groundwork: An Example with Crude PAFs

Overview of Attributable Risk Measures
Measures based on Risk Differences
- Attributable Risk
- Attributable Fraction
- Population Attributable Risk
- Population Attributable Fraction (PAF)

Overview of Attributable Risk Measures: General Interpretation
- Attributable Risk: the risk of an outcome attributed to a given risk factor among those with that factor
- Attributable Fraction: the proportion of cases of an outcome attributable to a risk factor in those with the given risk factor
- Pop. Attributable Risk: the risk of an outcome attributed to a given risk factor in the population as a whole
- Pop. Attributable Fraction (PAF): the proportion of cases of an outcome attributable to a risk factor in the population as a whole

Overview of Attributable Risk Measures: Equivalent / Alternative Terminology
- Attributable Risk: Risk Difference
- Attributable Fraction: Attributable Risk %, Attributable Proportion, Etiologic Fraction
- Pop. Attributable Risk
- Pop. Attributable Fraction: Population Attributable Risk %, Etiologic Fraction, Attributable Risk

Overview of Attributable Risk Measures Various Formulas For the Crude PAF

Example: Smoking and Low Birthweight
Crude RR = 10.00 / 6.25 = 1.60

Example: Smoking and Low Birthweight Crude Association Interpretation of the RR v. the PAF Women who smoke are at 1.6 times the risk of delivering a LBW infant compared to women who do not smoke. 10.7% of LBW births can be attributed to smoking. If smoking were eliminated, we would expect 75 fewer LBW births and the LBW rate would be reduced from 7% to 6.25%

Example: Cocaine and Low Birthweight, Crude Association
Crude RR = 30.00 / 6.29 = 4.77

Example: Cocaine and Low Birthweight Crude Association Interpretation of the RR v. the PAF Women who use cocaine are at 4.77 times the risk of delivering a LBW infant compared to women who do not use cocaine. 10.2% of LBW births can be attributed to cocaine use. If cocaine use were eliminated, we would expect 71 fewer LBW births and the LBW rate would thus be reduced from 7% to 6.29%

RR Compared to PAF: Smoking and Low Birthweight v. Cocaine and Low Birthweight
Notice that although the relative risk for the association between cocaine and low birthweight is much greater than that for smoking and low birthweight (4.77 v. 1.60), the PAF for each is quite similar: 10.7% for smoking and 10.2% for cocaine.
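The crude figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the crude PAF is computed as (overall risk - risk in the unexposed) / overall risk and, equivalently, with Levin's form. The function names are illustrative, and the exposure prevalence is not quoted on the slides, so it is backed out from the three quoted rates.

```python
def paf_from_rates(p_overall, p_unexposed):
    """Crude PAF as the relative drop in risk if everyone had the unexposed risk."""
    return (p_overall - p_unexposed) / p_overall


def paf_levin(prev_exposure, rr):
    """Levin's form of the crude PAF: P(E)(RR - 1) / (1 + P(E)(RR - 1))."""
    excess = prev_exposure * (rr - 1.0)
    return excess / (1.0 + excess)


def implied_exposure_prevalence(p_overall, p_exposed, p_unexposed):
    """Solve p_overall = P(E)*p_exposed + (1 - P(E))*p_unexposed for P(E)."""
    return (p_overall - p_unexposed) / (p_exposed - p_unexposed)


# (overall LBW risk, risk in the exposed, risk in the unexposed), as quoted on the slides
slide_rates = {
    "smoking": (0.07, 0.10, 0.0625),
    "cocaine": (0.07, 0.30, 0.0629),
}

crude = {}
for exposure, (p0, p1, p2) in slide_rates.items():
    rr = p1 / p2
    p_e = implied_exposure_prevalence(p0, p1, p2)  # assumption: derived, not quoted
    crude[exposure] = {
        "rr": rr,
        "paf_rates": paf_from_rates(p0, p2),
        "paf_levin": paf_levin(p_e, rr),
    }
    print(exposure, round(rr, 2), round(crude[exposure]["paf_rates"], 3))
```

For smoking this reproduces RR = 1.60 and a PAF of about 0.107. For cocaine, the rounded unexposed rate of 6.29% gives a PAF of about 0.101, slightly under the slide's 10.2%, which was presumably computed from unrounded counts.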

Moving Beyond Crude PAFs Multivariable Approaches: Organizing Multiple Factors into a Risk System

PAFs Based on Organizing Multiple Factors into a Risk System
- Summary PAF: the total PAF for many modifiable factors considered in a single risk system
- Component PAF: the separate PAF for each unique combination of exposure levels in a risk system
- “Adjusted” PAF: the PAF for eliminating a risk factor first from a risk system
- Sequential PAF: the PAF for eliminating a risk factor in a particular order from a risk system; sets of sequential PAFs comprise possible removal sequences
- Average PAF: the PAF summarizing all possible sequences for eliminating a risk factor

Extension of Basic Formulas for Multifactorial PAFs
[Equations shown on slide: Rothman formula; Bruzzi formula]
- k = number of unique exposure categories created with a complete cross-classification of independent variables
- pj = proportion of total cases that are in the “jth” unique exposure category
- RRj = relative risk for the “jth” exposure level compared with the common reference group
Important: in these formulas, the pj are column percents.

The Simple Case of 2 Binary Variables Organization into a Risk system

Equivalence of the Rothman and Bruzzi Formulas
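The equations for this slide are not in the transcript. The following is a sketch of the commonly cited forms, assumed here to be the ones the slides label “Rothman” and “Bruzzi”, using the symbols k, p_j and RR_j defined earlier in the deck. The index is assumed to run over all k categories, including the reference group with RR = 1.

```latex
\[
\mathrm{PAF}_{\text{Rothman}} = \sum_{j=1}^{k} p_j\,\frac{RR_j - 1}{RR_j},
\qquad
\mathrm{PAF}_{\text{Bruzzi}} = 1 - \sum_{j=1}^{k} \frac{p_j}{RR_j}.
\]
% Equivalence: the p_j are column percents of the cases, so they sum to 1.
\[
\sum_{j=1}^{k} p_j\,\frac{RR_j - 1}{RR_j}
  = \sum_{j=1}^{k} p_j - \sum_{j=1}^{k} \frac{p_j}{RR_j}
  = 1 - \sum_{j=1}^{k} \frac{p_j}{RR_j},
\quad\text{since } \sum_{j=1}^{k} p_j = 1 .
\]
```

In the first form each term is a component PAF, and the reference category contributes zero because its RR is 1. The second form yields only the total.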

The Simple Case of 2 Binary Variables: Smoking and Cocaine
Smoking: Crude RR = 1.60
Cocaine: Crude RR = 4.77

Smoking and Cocaine Organized into a Risk System If smoking and cocaine use were recoded as a single “substance use” variable:

Components of each combination of risk factors in the smoking-cocaine risk system
[Table columns: pj (column %), rpj (row %), RRj]

Component PAFs and Summary PAF for the Smoking-Cocaine Risk System
Using Rothman’s formula: the Summary PAF is the sum of the component PAFs, which here equals 0.16.

Component PAFs and Summary PAF for the Smoking-Cocaine Risk System Using Bruzzi’s formula: With Bruzzi’s formula, the Summary PAF is not built from component PAFs

Limitation of Component PAFs from the Smoking-Cocaine Risk System While the component PAFs of a risk system sum to the Summary PAF for the system as a whole, they do not provide mutually exclusive measures of the PAF for each risk factor Here, the Summary PAF = 0.16, but the two factors overlap: the component PAFs still do not disentangle smoking and cocaine for those who do both

The “Adjusted” PAF: Obtaining a Single PAF for a Given Risk Factor
The Stratified Approach: the PAF for eliminating a risk factor after controlling for other risk factors.
With the Rothman formula, data are organized into the more traditional strata set-up for adjustment:
- Not assuming homogeneity: pj and RRj are stratum-specific
- Assuming homogeneity: overall estimates are used

The “Adjusted” PAF: Obtaining a Single PAF for a Given Factor The Stratified Approach If there is multiplicative effect modification in the RR... As usual, it is inappropriate to average widely varying stratum-specific RRs, say 3.0 and 0.90, because a single average would misrepresent the magnitude of the association, and sometimes, as in this example, misrepresent the direction of the association as well.

The “Adjusted” PAF: Obtaining a Single PAF for a Given Factor The Stratified Approach If there is not multiplicative effect modification in the RR... If there is no evidence of multiplicative effect modification and sample size permits, there is really nothing to be gained by not using stratum-specific estimates. Whichever formula is used, the result is a single “adjusted” PAF.

The “Adjusted” PAF: Obtaining a Single PAF for a Given Factor Reorganizing the data to get an adjusted PAF with Rothman’s formula

The “Adjusted” PAF: The PAF for Smoking, Controlling for Cocaine Use*
Stratum-specific RRs for smoking: RR = 1.37 and RR = 1.36. The stratum-specific contributions are summed to give the “adjusted” PAF.
*Using stratum-specific estimates

The “Adjusted” PAF: The PAF for Cocaine, Controlling for Smoking*
Stratum-specific RRs for cocaine: RR = 4.33 and RR = 4.30. The stratum-specific contributions are summed to give the “adjusted” PAF.
*Using stratum-specific estimates

The “Adjusted” PAF: Obtaining a Single PAF for a Given Risk Factor
Using the Bruzzi formula, the “strata” are defined as each row of the risk system. In the smoking-cocaine risk system, then, there are 4 “strata”.
For the PAF for smoking, controlling for cocaine use, the 4 ps are the 4 column percents and the 4 RRs are:
- rp1/rp2
- rp2/rp2
- rp3/rp4
- rp4/rp4

For the Bruzzi Formula: the RRj* and RRj~
RR = 1.37, RR = 1, RR = 1.36, RR = 1

The “Adjusted” PAF: Obtaining a Single PAF for a Given Risk Factor
In the Bruzzi approach to “adjustment”, there are 3 different versions of the relative risks:
- RRj = the component RRs
- RRj* = the RRs for combinations of covariates in the absence of the factor being “adjusted”; in this simple example, these are the 2 RRs not involving smoking
- RRj~ = the RRs for the factor being “adjusted”, conditioned on combinations of the covariates; in this simple example, these are the 2 RRs for smoking in the presence and absence of cocaine use. These are the “stratum-specific” RRs in the classic stratified set-up.

The “Adjusted” PAF: Obtaining a Single PAF for a Given Risk Factor Using the Bruzzi method: PAF for Smoking, controlling for cocaine use. PAF for cocaine, controlling for smoking.

The “Adjusted” PAF: Obtaining a Single PAF for a Given Factor
The Stratified Approach
Notice that controlling for confounding typically reduces the PAF, just as it typically reduces the relative risk or odds ratio.
- Crude v. “Adjusted” PAF for smoking: 0.107 v. 0.076
- Crude v. “Adjusted” PAF for cocaine: 0.102 v. 0.099

Limitations of the “Adjusted” PAF
While adjustment methods control for other risk factors, the resulting adjusted PAFs still are not mutually exclusive, and they do not meet the criterion of summing to the Summary PAF for all factors combined:
- Sum of component PAFs (Summary PAF): 0.042 + 0.062 + 0.056 = 0.16
- Sum of adjusted PAFs: 0.076 + 0.099 = 0.175 ≠ 0.16

Limitations of the “Adjusted” PAF: Adjustment procedures result in a PAF that taken by itself represents an estimate—perhaps unrealistic—of the impact of eliminating one exposure first in a risk system, controlling for other factors, but not considering that some of those other factors may also be eliminated. The “adjusted” PAF becomes more useful when it is considered as one element of a set of possible sequences for addressing all of the risk factors in a risk system—HOLD THIS THOUGHT

Extension to the Case of 3 Binary Variables
Example: SAS code for reformatting individual-level data for the outcome and risk factors of interest into k observations
    /* Sort by the outcome and the three risk factors */
    proc sort data=work.Orig_SampleLBW;
      by lbw smoke cocaine poverty;
    run;
    /* List the frequency of every outcome-by-exposure combination */
    proc freq data=work.Orig_SampleLBW;
      tables lbw*smoke*cocaine*poverty / list;
    run;

Extension to the Case of 3 Binary Variables LBW by Smoking, Cocaine use and Poverty

Extension to the Case of 3 Binary Variables Data rearranged into “strata” in the Bruzzi sense...

Component Prevalences and Relative Risks for a Risk System with Three Variables
Prevalence and RR added. Example (first row):
- pj = 24 / 700 = 0.034
- RRj = (24/59) / (175/4605) = 10.70
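The first-row example can be reproduced directly from the quoted counts. This is a sketch in which the count labels are a reading of the slide's fractions (24 cases among 59 births in the row, 175 cases among 4605 births in the reference group, 700 cases overall). The component PAF line additionally assumes the Rothman-style form p_j (RR_j - 1) / RR_j, which the slide does not show.

```python
# Counts quoted on the slide for the first row of the risk system
cases_in_row = 24      # LBW cases in this exposure combination
births_in_row = 59     # all births in this exposure combination
total_cases = 700      # all LBW cases
ref_cases = 175        # LBW cases in the common reference group
ref_births = 4605      # all births in the common reference group

# Column percent: share of all cases that fall in this row
p_j = cases_in_row / total_cases

# Relative risk of this row versus the common reference group
rr_j = (cases_in_row / births_in_row) / (ref_cases / ref_births)

# Assumption: Rothman-style component PAF for this row
component_paf_j = p_j * (rr_j - 1.0) / rr_j

print(round(p_j, 3), round(rr_j, 2), round(component_paf_j, 3))
```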

Unique Cross-Classifications of n Variables
For binary variables, the number of strata is k = 2^n, where n = number of variables.
Example: Smoke (1=Yes, 0=No), Cocaine (1=Yes, 0=No), Poverty (1=Yes, 0=No): k = 2^3 = 8
In general, k = the product of the number of levels for each variable; e.g., in Bruzzi et al. (1985): k = 2*3*3*4 = 72
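The stratum count can be sketched by enumerating the cross-classification. The helper name `count_strata` is illustrative; the variable names follow the slide's coding (1=Yes, 0=No).

```python
from itertools import product
from math import prod


def count_strata(levels_per_variable):
    """k = product of the number of levels of each variable."""
    return prod(levels_per_variable)


variables = ("smoke", "cocaine", "poverty")

# Complete cross-classification of three binary variables
strata = [dict(zip(variables, combo))
          for combo in product((1, 0), repeat=len(variables))]

for stratum in strata:
    print(stratum)

print(count_strata([2, 2, 2]))     # three binary variables
print(count_strata([2, 3, 3, 4]))  # level counts quoted for Bruzzi et al. (1985)
```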

Component PAFs for Entire Risk System Summary PAF = 0.46

Summary and Adjusted PAFs for a 3 Factor Risk System Discuss Worksheet A in Supplementary Excel File Component, Adjusted and Summary PAF calculations for smoke, cocaine, and poverty

Using Modeling to Compute Summary and “Adjusted” PAFs
Advantages of modeling for obtaining intermediate estimates for PAFs (as usual, in comparison to stratified methods):
- Modeling is not as sensitive to sparse data in individual cells when there are many strata.
- If you choose to consider confounding and effect modification in the same model, estimates are generated more easily.
Note: using an assumption-free approach, all variables are treated as effect modifiers (but this method breaks down quickly as there are more variables in the risk system).

Assumption-Free Approach Using Fully Specified Model
    /* Binomial regression: directly estimate RRs */
    proc genmod data=LBW desc;
      model lbw = smoke cocaine poverty
                  smoke*cocaine smoke*poverty cocaine*poverty
                  smoke*cocaine*poverty / dist=bin link=log;
      weight freq;
    run;
    /* Logistic regression: ORs as estimates of RRs */
    /* The transcript keeps only the last term of this model statement; */
    /* the remaining terms are assumed to mirror the PROC GENMOD model  */
    proc logistic data=LBW desc;
      model lbw = smoke cocaine poverty
                  smoke*cocaine smoke*poverty cocaine*poverty
                  smoke*cocaine*poverty;
      weight freq;
    run;

Results from Fully-Specified Binomial Regression Model
Response Profile
  Ordered Value   Total Frequency
  1                           700
  2                          9300
PROC GENMOD is modeling the probability that lbw='1'.
Criteria For Assessing Goodness Of Fit
  Criterion             DF        Value     Value/DF
  Deviance               8    4816.8235     602.1029
  Scaled Deviance
  Pearson Chi-Square     8    9999.9937    1249.9992
  Scaled Pearson X2
  Log Likelihood             -2408.4117

Results from Fully-Specified Binomial Regression Model
Analysis of Parameter Estimates
  Parameter               DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
  Intercept                1    -3.2701           0.0741      1945.31       <.0001
  smoke                    1     0.5869           0.1386        17.94       <.0001
  cocaine                  1     1.8201           0.2140        72.36       <.0001
  poverty                  1     0.8447           0.0931        82.27       <.0001
  smoke*cocaine            1    -0.3155           0.2902         1.18       0.2769
  smoke*poverty            1    -0.5306           0.1836         8.35       0.0039
  cocaine*poverty          1    -0.6844           0.2951         5.38       0.0204
  smoke*cocaine*poverty    1     0.6494           0.4020         2.61       0.1062
  Scale                          1.0000           0.0000
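Because the model uses a log link, each stratum's relative risk is the exponential of the sum of the coefficients that are switched on in that stratum. The sketch below applies that to the estimates in the table; the dictionary layout and function name are illustrative and not part of the SAS output.

```python
from itertools import product
from math import exp

# Coefficients from the fully specified log-binomial model (table above).
# Keys list the variables that must all equal 1 for the term to apply.
beta = {
    (): -3.2701,  # intercept
    ("smoke",): 0.5869,
    ("cocaine",): 1.8201,
    ("poverty",): 0.8447,
    ("smoke", "cocaine"): -0.3155,
    ("smoke", "poverty"): -0.5306,
    ("cocaine", "poverty"): -0.6844,
    ("smoke", "cocaine", "poverty"): 0.6494,
}
variables = ("smoke", "cocaine", "poverty")


def linear_predictor(profile):
    """Log risk for an exposure profile such as {'smoke': 1, 'cocaine': 0, 'poverty': 1}."""
    return sum(b for terms, b in beta.items()
               if all(profile[t] == 1 for t in terms))


# Risk in the common reference group (no smoking, no cocaine, not in poverty)
baseline_risk = exp(linear_predictor(dict.fromkeys(variables, 0)))

# RR of every stratum versus the common reference group
relative_risks = {}
for combo in product((1, 0), repeat=len(variables)):
    profile = dict(zip(variables, combo))
    relative_risks[combo] = exp(linear_predictor(profile)) / baseline_risk
    print(combo, round(relative_risks[combo], 2))
```

With these estimates the baseline risk comes out at about 0.038, in line with 175/4605, and the stratum with all three exposures has an RR of about 10.70, matching the first-row example from the stratified table.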

Component and Summary PAFs from Fully-specified Model Discuss Worksheet B in Supplementary Excel File: Summary PAFs from Fully Specified Models

Re-examining Fully-Specified Model
Analysis of Parameter Estimates
  Parameter               DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
  Intercept                1    -3.2701           0.0741      1945.31       <.0001
  smoke                    1     0.5869           0.1386        17.94       <.0001
  cocaine                  1     1.8201           0.2140        72.36       <.0001
  poverty                  1     0.8447           0.0931        82.27       <.0001
  smoke*cocaine            1    -0.3155           0.2902         1.18       0.2769
  smoke*poverty            1    -0.5306           0.1836         8.35       0.0039
  cocaine*poverty          1    -0.6844           0.2951         5.38       0.0204
  smoke*cocaine*poverty    1     0.6494           0.4020         2.61       0.1062
Non-significant interaction terms could be dropped from the model.

Reduced Model
    proc genmod data=LBW desc;
      model lbw = smoke cocaine poverty smoke*poverty cocaine*poverty / dist=bin link=log;
      weight freq;
    run;
Analysis of Parameter Estimates
  Parameter          DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
  Intercept           1    -3.2506           0.0712      2083.76       <.0001
  smoke               1     0.5169           0.1251        17.08       <.0001
  cocaine             1     1.6369           0.1482       121.99       <.0001
  poverty             1     0.8111           0.0903        80.60       <.0001
  smoke*poverty       1    -0.3981           0.1643         5.87       0.0154
  cocaine*poverty     1    -0.3407           0.2012         2.87       0.0905
The non-significant interaction term could be dropped from the model.

Final Model
    proc genmod data=LBW desc;
      model lbw = smoke cocaine poverty smoke*poverty / dist=bin link=log;
      weight freq;
    run;
Analysis of Parameter Estimates
  Parameter          DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
  Intercept           1    -3.2341           0.0704      2110.99       <.0001
  smoke               1     0.5741           0.1203        22.79       <.0001
  cocaine             1     1.4372           0.0980       214.96       <.0001
  poverty             1     0.7787           0.0884        77.69       <.0001
  smoke*poverty       1    -0.4778           0.1568         9.28       0.0023

Component and Summary PAFs from Final Reduced Model
Discuss Worksheet C in Supplementary Excel File: Summary PAFs from Final Reduced Models

Day One: 1:00-3:15
- Exercise 1
- Discussion of Exercise 1

Day One: 3:15-5:00
- Overview of Sequential and Average PAFs: example with 2 modifiable risk factors
- Case study with 3 factors: first with 2 modifiable factors and 1 unmodifiable factor, then with 3 modifiable factors
- Introduction of Exercise 2

Sequential PAFs (PAFSEQ) for the Smoking-Cocaine Risk System
For the smoking-cocaine risk system, there are 2 possible sequences:
- Eliminate smoking first, controlling for cocaine use, then eliminate cocaine use
- Eliminate cocaine use first, controlling for smoking, then eliminate smoking
Within each sequence, there are two sequential PAFs.

Sequential PAFs (PAFSEQ) for the Smoking-Cocaine Risk System
The PAFSEQ for eliminating smoking, controlling for cocaine use:
PAFSEQ1a (S|C) = 0.076
The PAFSEQ for eliminating cocaine use after smoking has already been eliminated is the remainder of the Summary PAF:
PAFSEQ1b = PAFSUM - PAFSEQ1a (S|C) = 0.16 - 0.076 = 0.084

Sequential PAFs (PAFSEQ) for the Smoking-Cocaine Risk System
The PAFSEQ for eliminating cocaine use, controlling for smoking:
PAFSEQ2a (C|S) = 0.099
The PAFSEQ for eliminating smoking after cocaine use has already been eliminated is the remainder of the Summary PAF:
PAFSEQ2b = PAFSUM - PAFSEQ2a (C|S) = 0.16 - 0.099 = 0.061

Sequential PAFs (PAFSEQ) for the Smoking-Cocaine Risk System
By definition, the sequential PAFs within each of the two possible sequences sum to the Summary PAF:
- Smoking first: 0.076 + 0.084 = 0.16
- Cocaine use first: 0.099 + 0.061 = 0.16

Average PAF (PAFAVG) for the Smoking-Cocaine Risk System
While the sequential PAFs for each sequence sum to the Summary PAF, they still do not provide an overall comparison of the impact of smoking and cocaine use regardless of the order in which they are eliminated.
That is, regardless of when cocaine might be eliminated, what would the impact of eliminating smoking be on average?

Average PAF (PAFAVG) for the Smoking-Cocaine Risk System To calculate an average, the sequential PAFs are rearranged, grouping the two for smoking together and the two for cocaine together: Eliminating smoking first, averaged with eliminating smoking second Eliminating cocaine use first, averaged with eliminating cocaine use second

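Average PAF (PAFAVG) for the Smoking-Cocaine Risk System
Averaging Sequential PAFs
- Average PAF for Smoking: (PAFSEQ1a + PAFSEQ2b) / 2 = (0.076 + 0.061) / 2 = 0.0685
- Average PAF for Cocaine Use: (PAFSEQ2a + PAFSEQ1b) / 2 = (0.099 + 0.084) / 2 = 0.0915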

Average PAFs for the Smoking-Cocaine Risk System The Average PAFs for each factor in the risk system are mutually exclusive and their sum equals the Summary PAF: 0.0685 + 0.0915 = 0.16
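The averaging step generalizes to any number of factors by looping over every removal order. The following is a minimal sketch, assuming the inputs are the PAFs for jointly eliminating each subset of factors (here the two first sequential PAFs and the Summary PAF quoted above). The function and dictionary names are illustrative.

```python
from itertools import permutations


def average_pafs(joint_paf, factors):
    """Average each factor's sequential PAF over all removal orders.

    joint_paf maps a frozenset of eliminated factors to the PAF for
    eliminating that whole set (the empty set maps to 0).
    """
    totals = {f: 0.0 for f in factors}
    orders = list(permutations(factors))
    for order in orders:
        removed = frozenset()
        for factor in order:
            after = removed | {factor}
            # Sequential PAF: extra reduction from removing this factor next
            totals[factor] += joint_paf[after] - joint_paf[removed]
            removed = after
    return {f: totals[f] / len(orders) for f in factors}


# Values quoted on the slides for the smoking-cocaine risk system
joint_paf = {
    frozenset(): 0.0,
    frozenset({"smoking"}): 0.076,             # first sequential ("adjusted") PAF
    frozenset({"cocaine"}): 0.099,             # first sequential ("adjusted") PAF
    frozenset({"smoking", "cocaine"}): 0.16,   # Summary PAF
}

avg = average_pafs(joint_paf, ["smoking", "cocaine"])
print({factor: round(value, 4) for factor, value in avg.items()})
```

Because each order's sequential PAFs telescope to the Summary PAF, the averages always sum to it as well, which is the property the slide highlights.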

Case Study: Example with Three Factors Scenario: You are asked to prioritize spending for interventions that target the high rate of low birth weight (LBW) in your jurisdiction. Data: You have a data set with relatively reliable data on smoking during pregnancy, cocaine use during pregnancy and poverty level. Method: You would like to use one of the methods you just learned for calculating the PAFs for each of these factors.

Modifiable and Unmodifiable Risk Factors Using a Modeling Approach Within one model, we can differentiate between those factors considered to be modifiable and those factors considered to be unmodifiable While this does not change the model, this differentiation has an impact on the resulting summary, sequential, and average PAFs due to how relative risks are calculated

Decisions for PAF Analysis
Would you consider each of the following variables unmodifiable or modifiable for preventing LBW?
- Smoking (1=Smoking during pregnancy, 0=No smoking)
- Cocaine (1=Cocaine use during pregnancy, 0=No cocaine)
- Poverty (1=Below Federal Poverty Level, 0=Above FPL)
What type(s) of PAF is/are most appropriate?
- Adjusted (only focused on one factor, controlling for others)
- Sequential (specifying one ordering for targeting factors)
- Average (accounting for all possible sequences of eliminating each factor)

Descriptive Statistics for Case Study

Case Study Part I: Considering Poverty as Unmodifiable
Calculating Sequential and/or Average PAFs for Smoking and Cocaine Use

Sequential PAFs for the Smoking-Cocaine-Poverty Risk System, Considering Poverty as Unmodifiable
With 3 factors, but only 2 of them modifiable, there are 2 possible sequences:
- Eliminate smoking first, controlling for cocaine use and poverty, then eliminate cocaine use
- Eliminate cocaine use first, controlling for smoking and poverty, then eliminate smoking
Within each sequence, there are two sequential PAFs.

SAS Code: Obtaining Prevalences and Beta Estimates for Smoke, Cocaine and Poverty
    /* Create a listing of the frequencies for each possible combination of
       smoke, coke, poverty for LBW cases to calculate proportions */
    proc freq order=formatted;
      tables poverty*smoke*cocaine / list nopercent;
      where lbw=1;
    run;
    /* Binomial regression to obtain RRs */
    proc genmod;
      title2 "RRs for Smoke and Coke with LBW, controlling for Poverty";
      model lbw = smoke cocaine poverty smoke*poverty / dist=bin link=log obstats; /* Binomial distribution */
    run;

Sequential PAFs for the Smoking-Cocaine-Poverty Risk System, Considering Poverty as Unmodifiable
Discuss Worksheets D and E in Supplementary Excel File: Calculations for 1st Sequential PAFs, Summary PAFs, and Average PAFs for Smoking and Cocaine, Controlling for Poverty

PAFSEQ for Smoking and Cocaine, Considering Poverty as Unmodifiable
Sequence 1: Smoking, THEN Cocaine
- PAFSEQ1a: (S | C U P) = 0.074
- PAFSEQ1b: (C U S | P) - (S | C U P) = 0.156 - 0.074 = 0.082
Sequence 2: Cocaine, THEN Smoking
- PAFSEQ2a: (C | S U P) = 0.098
- PAFSEQ2b: (S U C | P) - (C | S U P) = 0.156 - 0.098 = 0.058
The Summary PAF includes only smoking and cocaine, since poverty is unmodifiable.

PAFSEQ for Smoking and Cocaine, Considering Poverty as Unmodifiable
[Figure panels: Smoking THEN Cocaine, Controlling for Poverty (PAFSUM = 0.156); Cocaine THEN Smoking, Controlling for Poverty (PAFSUM = 0.156); PAFAGG = 0.15]

Average PAF (PAFAVG)
Eide (1995): based on Game Theory according to Cox’s Theorem (1984) for risk allocation (attributable risk among the exposed).
[Equation shown on slide], where:
- “n” is the number of modifiable risk factors in the risk system
- “w” is the number of unique removal sequences for all variables in the risk system
- “i” represents a specific variable in the system
Note: the Average PAF is sometimes called the “partial” attributable fraction.

Average PAFs for Smoking and Cocaine, Controlling for Poverty
Average PAF for Smoking
- PAFAVG = (PAFSEQ1a + PAFSEQ2b) / 2 = (0.074 + 0.058) / 2 = 0.066
Average PAF for Cocaine
- PAFAVG = (PAFSEQ1b + PAFSEQ2a) / 2 = (0.082 + 0.098) / 2 = 0.090

Case Study Part II: Considering Poverty as Modifiable
Calculating Sequential and/or Average PAFs for Smoking, Cocaine Use, and Poverty

Sequential PAFs (PAFSEQ) for the Smoking-Cocaine-Poverty Risk System
For the smoking-cocaine-poverty risk system, there are 6 possible sequences:
- Smoking, cocaine use, poverty
- Smoking, poverty, cocaine use
- Cocaine use, smoking, poverty
- Cocaine use, poverty, smoking
- Poverty, smoking, cocaine use
- Poverty, cocaine use, smoking
Within each sequence, there are three sequential PAFs.

SAS Code: Obtaining Prevalences and Beta Estimates for Smoke, Cocaine and Poverty
/* Create a listing of the frequencies for each possible combination of
   smoke, coke, poverty for LBW cases to calculate proportions */
proc freq order=formatted;
  tables poverty*smoke*cocaine / list nopercent;
  where lbw=1;
run;
/* Binomial regression to obtain RRs */
proc genmod;
  title2 "RRs for Smoke and Coke with LBW, controlling for Poverty";
  model lbw = smoke cocaine poverty smoke*poverty
        / dist=bin link=log obstats;  /* Binomial distribution, log link */
run;

Sequential PAFs
Q: How many unique sequences will there be for removing risk factors from the risk system?
A: n!, where n = # of modifiable risk factors in system
  Ex: n=3, n! = 3x2x1 = 6 unique sequences
      n=4, n! = 4x3x2x1 = 24 unique sequences
      n=5, n! = 5x4x3x2x1 = 120 unique sequences
      etc.
To calculate the PAFSEQ for factors removed second and third in a 3 variable risk system, it is necessary to compute the PAF for every pair of two factors combined, adjusting for the third factor. These are intermediate Summary PAFs.
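As an illustration of the counting, a minimal Python sketch that enumerates the removal sequences for a three-factor system and the pairs that need intermediate Summary PAFs. The factor labels match the case study; everything else is illustrative.

```python
from itertools import combinations, permutations

factors = ["Smoking", "Cocaine", "Poverty"]

# Every ordering of the modifiable factors is one removal sequence (n! in total).
sequences = list(permutations(factors))
for number, order in enumerate(sequences, start=1):
    print(f"Sequence {number}: " + ", THEN ".join(order))

# Each pair of factors needs one intermediate Summary PAF,
# adjusted for the remaining third factor.
pairs = list(combinations(factors, 2))
print(len(sequences), "sequences;", len(pairs), "intermediate Summary PAFs")
```

With the factors listed in this order, the enumeration matches the numbering of Sequences 1 to 6 used in the case study.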

Sequential PAFs for the Smoking-Cocaine-Poverty Risk System, Considering Poverty as Modifiable
Discuss Worksheets F and G in Supplementary Excel File: Calculations for 1st Sequential PAFs, Summary PAFs, and Average PAFs for Smoking, Cocaine, and Poverty

PAFSEQ for Smoking Removed First
Sequence 1: Smoking, THEN Cocaine, THEN Poverty
  PAFSEQ1a: (S | C U P) = 0.074
  PAFSEQ1b: (S U C | P) – (S | C U P) = 0.156 – 0.074 = 0.082
  PAFSEQ1c: (S U C U P) – (S U C | P) = 0.441 – 0.156 = 0.286
Sequence 2: Smoking, THEN Poverty, THEN Cocaine
  PAFSEQ2a: (S | P U C) = 0.074
  PAFSEQ2b: (S U P | C) – (S | P U C) = 0.383 – 0.074 = 0.310
  PAFSEQ2c: (S U P U C) – (S U P | C) = 0.441 – 0.383 = 0.058

[Figure: PAFSEQ for Smoking Removed First. Panels: Smoking THEN Cocaine, THEN Poverty; Smoking THEN Poverty, THEN Cocaine]

PAFSEQ for Cocaine Removed First
Sequence 3: Cocaine, THEN Smoking, THEN Poverty
  PAFSEQ3a: (C | S U P) = 0.098
  PAFSEQ3b: (C U S | P) – (C | S U P) = 0.156 – 0.098 = 0.058
  PAFSEQ3c: (C U S U P) – (C U S | P) = 0.441 – 0.156 = 0.286
Sequence 4: Cocaine, THEN Poverty, THEN Smoking
  PAFSEQ4a: (C | P U S) = 0.098
  PAFSEQ4b: (C U P | S) – (C | P U S) = 0.355 – 0.098 = 0.257
  PAFSEQ4c: (C U P U S) – (C U P | S) = 0.441 – 0.355 = 0.086

[Figure: PAFSEQ for Cocaine Removed First. Panels: Cocaine THEN Smoking, THEN Poverty; Cocaine THEN Poverty, THEN Smoking]

PAFSEQ for Poverty Removed First
Sequence 5: Poverty, THEN Smoking, THEN Cocaine
  PAFSEQ5a: (P | S U C) = 0.275
  PAFSEQ5b: (P U S | C) – (P | S U C) = 0.383 – 0.275 = 0.108
  PAFSEQ5c: (P U S U C) – (P U S | C) = 0.441 – 0.383 = 0.058
Sequence 6: Poverty, THEN Cocaine, THEN Smoking
  PAFSEQ6a: (P | C U S) = 0.275
  PAFSEQ6b: (P U C | S) – (P | C U S) = 0.355 – 0.275 = 0.080
  PAFSEQ6c: (P U C U S) – (P U C | S) = 0.441 – 0.355 = 0.086

[Figure: PAFSEQ for Poverty Removed First. Panels: Poverty THEN Smoking, THEN Cocaine; Poverty THEN Cocaine, THEN Smoking]

PAFAVG for Smoking, Cocaine and Poverty (6 Sequential PAFs in each Average, 4 are Unique)
Average PAF for Smoking
  PAFAVG = (PAFSEQ1a + PAFSEQ2a + PAFSEQ3b + PAFSEQ4c + PAFSEQ5b + PAFSEQ6c) / 6
  PAFAVG = (2(0.074) + 0.058 + 0.108 + 2(0.086)) / 6 = 0.081
Average PAF for Cocaine
  PAFAVG = (PAFSEQ1b + PAFSEQ2c + PAFSEQ3a + PAFSEQ4a + PAFSEQ5c + PAFSEQ6b) / 6
  PAFAVG = (2(0.098) + 0.082 + 0.080 + 2(0.058)) / 6 = 0.079
Average PAF for Poverty
  PAFAVG = (PAFSEQ1c + PAFSEQ2b + PAFSEQ3c + PAFSEQ4b + PAFSEQ5a + PAFSEQ6a) / 6
  PAFAVG = (2(0.275) + 0.310 + 0.257 + 2(0.286)) / 6 = 0.281
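A minimal Python sketch of the same averaging over all six sequences. The sequential PAFs are the rounded values from the preceding slides, so results agree with the slide figures only to within rounding; the data layout and names are illustrative.

```python
# Sequences 1-6 from the slides: (factor, sequential PAF) in removal order.
sequences = [
    [("smoking", 0.074), ("cocaine", 0.082), ("poverty", 0.286)],
    [("smoking", 0.074), ("poverty", 0.310), ("cocaine", 0.058)],
    [("cocaine", 0.098), ("smoking", 0.058), ("poverty", 0.286)],
    [("cocaine", 0.098), ("poverty", 0.257), ("smoking", 0.086)],
    [("poverty", 0.275), ("smoking", 0.108), ("cocaine", 0.058)],
    [("poverty", 0.275), ("cocaine", 0.080), ("smoking", 0.086)],
]

# Sum each factor's sequential PAF across sequences, then divide by
# the number of sequences to get the average PAF.
totals = {}
for sequence in sequences:
    for factor, paf_seq in sequence:
        totals[factor] = totals.get(factor, 0.0) + paf_seq

paf_avg = {factor: total / len(sequences) for factor, total in totals.items()}
paf_sum = sum(paf_avg.values())

for factor, value in paf_avg.items():
    print(factor, round(value, 4))
print("sum of average PAFs:", round(paf_sum, 4))
```

The sum of the three averages should land close to the Summary PAF of 0.441, with small differences attributable to the rounded inputs.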

[Figure: Average PAFs for all possible models. Panels: Smoke and Coke, Controlling for Poverty; Smoke and Coke; Smoke, Coke and Poverty. Values shown: PAFSUM=0.16, PAFSUM=0.156, PAFSUM=0.441]

[Figure: Average PAFs for all possible models, with no interaction term for smoke*poverty. Panels: Smoke and Coke, Controlling for Poverty; Smoke and Coke; Smoke, Coke and Poverty. Values shown: PAFSUM=0.160, PAFSUM=0.155, PAFSUM=0.393]

[Figure: Average PAFs stratified by poverty. Panels: Poverty = Yes; Poverty = No. Values shown: PAFSUM=0.088, PAFSUM=0.245]

Introduction of Exercise 2

Day Two: 8:00-12:00
Exercise 2 and Discussion of Exercise 2
Brief Review
Model Building Issues
Exercise 3

Review
The Population Attributable Fraction (PAF) could be a useful tool to inform priority-setting and development of targeted interventions in public health, since it estimates the potential impact of risk reduction in the population on the occurrence of a health outcome.
The PAF incorporates both a measure of association between a risk factor and an outcome and the prevalence of the risk factor in the population as a whole.

Review
The Summary PAF is the proportion of an outcome that could be reduced by simultaneously eliminating from the population all modifiable factors in a risk system.
The Summary PAF can be partitioned into:
  Component PAFs
  Sequential PAFs corresponding to a particular removal sequence
  Average PAFs
The modifiable factors in the risk system can be “adjusted” both for each other and for other unmodifiable factors.

Review
[Diagram: Partitioning of the Summary PAF for a Risk System into Component PAFs, Sequential PAFs for One Possible Sequence, and Average PAFs]

Review
The component PAFs reflect every combination of the modifiable factors in the risk system and do not yield any factor-specific PAF.
Sequential PAFs yield factor-specific PAFs, but these factor-specific PAFs vary across the possible removal sequences; the first sequential PAF in any sequence is what is commonly called the “adjusted” PAF.
Component PAFs and Sequential PAFs for a given sequence are not mutually exclusive estimates of the impact of eliminating modifiable factors regardless of whether and when other modifiable factors are also eliminated.

Review
The number of possible sequences is a function of the number of variables in the risk system and becomes large quickly as the number of variables increases.

Number of       Number of Possible Removal    Number of Unique Sequential
Risk Factors    Orderings / Sequences         PAFs per Factor
2               2! = 2                        2
3               3! = 6                        4
4               4! = 24                       8
5               5! = 120                      16
6               6! = 720                      32
7               7! = 5,040                    64
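The two counts can be reproduced with a short Python sketch. It assumes that a factor's sequential PAF depends only on which set of factors was removed before it, not on the order within that set (consistent with the case study, where PAFSEQ1a equals PAFSEQ2a). The function name is illustrative.

```python
from itertools import permutations
from math import factorial


def count_unique_sequential_pafs(n):
    """Count the distinct sets of factors removed before factor 0
    across all n! removal orderings of n factors."""
    predecessor_sets = set()
    for order in permutations(range(n)):
        position = order.index(0)
        predecessor_sets.add(frozenset(order[:position]))
    return len(predecessor_sets)


# Columns: number of factors, number of orderings, unique sequential PAFs per factor.
counts = {n: (factorial(n), count_unique_sequential_pafs(n)) for n in range(2, 8)}
for n, (orderings, unique) in counts.items():
    print(n, orderings, unique)
```

Under that assumption, the unique count per factor is the number of subsets of the other n-1 factors, that is 2^(n-1).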

Review
The number of average PAFs equals the number of variables in a risk system.
Average PAFs, by considering every possible sequence, yield mutually exclusive estimates, making comparisons of the potential impact of risk reduction intervention strategies possible.
The average PAF may be a better measure of impact than the first sequential (“adjusted”) PAF, since typically there are multiple interventions operating simultaneously: risk reduction activities are unordered and often intersect.

Review
Sequence X: Factors M1, ..., Mn, controlling for UM1, ..., UMz
  PAFSEQXa: (M1 | M2 U ... U Mn U UM1 U ... U UMz) (“adjusted” PAF for M1)
  PAFSEQXb: (M1 U M2 | M3 U ... U Mn U UM1 U ... U UMz) – (M1 | M2 U ... U Mn U UM1 U ... U UMz)
  ...
  PAFSEQXn: (M1 U ... U Mn | UM1 U ... U UMz) – (M1 U ... U Mn-1 | Mn U UM1 U ... U UMz)
The 2nd, 3rd, to (n-1)th sequential PAFs are the remainders from intermediate Summary PAFs; the nth sequential PAF is the remainder from the total Summary PAF.

Review
Computation of the sequential PAFs within particular removal sequences becomes cumbersome as the number of variables, both modifiable and unmodifiable, increases.
Intermediate Summary PAFs are required for differing subsets of modifiable variables in a risk system.

Review Whether computing crude, “adjusted”, summary, or sequential PAFs, and whether using a stratified or modeling approach, some form of either the Rothman or Bruzzi formulas can be used.
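A hedged Python sketch of the two formulas, assuming the forms commonly given in the PAF literature: a case-based form, PAF = sum of pd_j (RR_j - 1) / RR_j, taken here to be the "Rothman" form, and Bruzzi's form, PAF = 1 - sum of pd_j / RR_j, where pd_j is the proportion of cases in exposure stratum j and the reference stratum has RR = 1. The input numbers are hypothetical and are not from the case study.

```python
def paf_rothman(case_proportions, relative_risks):
    """Case-based form: sum over strata of pd_j * (RR_j - 1) / RR_j."""
    return sum(p * (rr - 1.0) / rr for p, rr in zip(case_proportions, relative_risks))


def paf_bruzzi(case_proportions, relative_risks):
    """Bruzzi form: 1 - sum over strata of pd_j / RR_j."""
    return 1.0 - sum(p / rr for p, rr in zip(case_proportions, relative_risks))


# Hypothetical example: proportion of cases in each exposure stratum
# (reference stratum first, proportions sum to 1) and the stratum RRs.
case_proportions = [0.50, 0.30, 0.20]
relative_risks = [1.0, 1.5, 2.5]

paf_r = paf_rothman(case_proportions, relative_risks)
paf_b = paf_bruzzi(case_proportions, relative_risks)
print(round(paf_r, 3), round(paf_b, 3))
```

When the case proportions sum to 1 across all strata, the two forms are algebraically identical, which is why either can be used.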

Model Building Issues and Strategies in the Context of Estimating PAFs
Reporting PAFs

Model Building Issues and Strategies
Within one model, we can differentiate between those factors considered to be modifiable and those factors considered to be unmodifiable.
The differentiation between modifiable and unmodifiable variables may change the final model, since this differentiation has an impact on decisions as to whether a variable is included in a final model.
In addition, the resulting summary, sequential, and average PAFs will vary depending on which variables are designated as modifiable because of how relative risks are calculated.

Model Building Issues and Strategies
Variable Selection: Modifiability
  Unmodifiable factors are only used as potential confounders or effect modifiers; PAFs are not calculated for them.
  Modifiable factors are factors that can possibly be altered with clear intervention strategies.
  Being in the pool of modifiable factors not only influences final PAF estimates, but also may change level of measurement, choice of reference level, and handling of confounding and effect modification.

Model Building Issues and Strategies
Differential handling of unmodifiable and modifiable factors
Levels of measurement:
  Modifiable variables cannot be continuous.
  Modifiable variables can be ordinal or nominal.
  Sets of dummy variables can be used, but for modifiable factors it means there will be a separate PAF for each dummy variable.
  Unmodifiable variables can be at any level of measurement, although if there is effect modification with a modifiable factor, recoding into categories will be necessary for continuous variables.

Model Building Issues and Strategies
Choice of Reference Level for Comparison
Since PAFs quantify the impact of complete elimination of a risk factor, it may be more realistic to define reference groups that pull back from this maximum.
Some examples:
  >= 2 days exercise, rather than >= 5 days exercise
  <= 1 medical risk factor, rather than 0 medical risks

Model Building Issues and Strategies
Reference Groups for Modifiable Factors
  A more restrictive level of the reference group could lead to both a higher prevalence of exposure and a stronger measure of effect, resulting in an inflated PAF.
  Importance of distinguishing between never exposed and formerly exposed.
  Use conceptual framework and balance evidence with realistic goals.

Model Building Issues and Strategies
Effect modification
  Within modifiable factors: use either a product term or common reference coding to create a set of dummy variables.
  Across modifiable and unmodifiable factors: this might point to doing modeling stratified by the unmodifiable factor involved in the interaction; if the unmodifiable variable is continuous, it would have to be recoded into categories for stratification.
  Within unmodifiable factors: use a product term, or ignore the interaction if it does not have an impact on the measures of association for the modifiable factors.

Model Building Issues and Strategies Parsimony is not as important when building a model as a step toward obtaining average PAFs; that is, variables with insignificant RRs / ORs may be included in a final model if the resulting PAFs based on them are meaningfully large.

Model Building Issues and Strategies
Criteria for selection of variables for a final model:
The prevalence estimates themselves might also be used to inform decisions about which variables will stay in a model.
[Table: Criteria for Modifiable Risk Factors Staying in a Model. Dimensions: 1st Sequential PAF (95% CI Does Not Include 0; 95% CI Includes 0) by RR / OR (Significant; Not Significant). Cell labels shown: Close to the null; ?; Far from the null]

Model Building Issues and Strategies
For unmodifiable factors, statistical significance may be more important, as it is one component of indicating the presence of confounding of the effects of the modifiable factors.
The prevalence of the unmodifiable factors in the population is not of interest, since they are not part of the risk system for which PAFs are being estimated.

Model Building Issues and Strategies
Possible model building strategies:
  Build models with one modifiable factor at a time plus the unmodifiable factors.
  Build models with subsets of modifiable factors that are within a domain (substantively related) plus the unmodifiable factors.
  Build models starting with all modifiable and unmodifiable factors, and then use a manual backward elimination approach.

Moving from Modeling to Reporting of PAFs
For any model building strategy:
  Choose the final pool of modifiable factors based on the significance of the first sequential PAFs and 95% CIs, or some other explicitly decided upon criteria.
  Calculate average PAFs for all modifiable factors in the final model, but report only those with values above some threshold, e.g. 2%, 5%, 10%?

Moving from Modeling to Reporting of PAFs Even with careful choice of reference levels, average PAFs are probably over-estimates of the expected reduction in an outcome since they assume that all of the factors in a risk system can be completely eliminated from the population

Moving from Modeling to Reporting of PAFs Average PAFs can be refined by differentially weighting removal sequences to reflect issues such as funding streams or political will, since in reality not all removal sequences are equally likely, or by incorporating measures of uptake and efficacy of public health interventions (this is beyond the scope of this training)

Moving from Modeling to Reporting of PAFs
Variance estimates for Average PAFs need to be developed, and then a consensus needs to be reached for the interpretation of resulting confidence intervals.
As always, narrower CIs will mean increased reliability.
The CIs across multiple PAFs will undoubtedly overlap. What will this mean for informing the prioritization process across modifiable factors?
Will a CI with a lower bound < 1 mean a factor is not significant and therefore not a priority?

Presentation and Interpretation

Total PAF=0.457

Presentation of Sequential PAFs for the Smoke, Coke and Poverty Risk System

Interpretations of Sequential PAFs from the Smoke-Coke-Poverty Risk System
PAFSEQ (smoking 3rd, after coke and poverty) = 0.09
An additional nine percent of LBW cases can be attributed to smoking after cocaine use and poverty have already been eliminated from the population of pregnant women.
The expectation is that an additional 63 cases (0.09*700) of LBW in this sample of pregnant women would have been prevented had smoking been eliminated from the population after the elimination of cocaine and poverty.

Interpretation of Average PAFs from the Smoke, Coke, and Poverty Risk System
PAFAVG (Smoking) = 0.06
On average, regardless of the order in which risk factors are removed from the risk system, the expectation is that six percent of LBW cases would be prevented if smoking is eliminated from the population, while also considering the impact of cocaine and poverty.
PAFAVG (Cocaine) = 0.09
On average, nine percent of LBW cases would be prevented by the additional elimination of cocaine exposure from the population after a random collection of exposures has already been eliminated.

Presentation Issues to Consider
Is there any time when displaying stratified PAFs would be appropriate?
  Targeting an intervention to a particular risk group
  Displaying the interaction effects between variables
  Others?

Interpretation Issues to Consider
The PAF should not be misinterpreted as the percent of diseased who have the risk factor of interest, or as the percent of cases for which an identifiable risk factor can be found.
Example: PAF for impact of 10 factors on breast cancer = 0.25.
  Incorrect: Although various risk factors have been identified as causes of breast cancer, the fact remains that in 75% of all breast cancer no identifiable risk factor can be found.
  Incorrect: Only 25 percent of breast cancer cases can be attributed to one or more risk factors, meaning that the majority of cancers occur in women with no risk factors.
Rockhill et al., 1998

Interpretation Issues to Consider
Rothman: With a PAF of 25%, the following interpretation is not completely true: 25% of disease would be reduced if X risk factor were eliminated.
  Assumes all biases are absent.
  Assumes that absence of the risk factor would not expand person-years at risk, which could subsequently lead to more cases (in the case of competing risks).
Rothman & Greenland, 1998

Interpretation Issues to Consider
Rothman Example 1: PAF = 0.25 for smoking in relation to coronary deaths.
  Elimination of smoking could lead to fewer lung cancer deaths, which would lead to more people living long enough to die of coronary heart disease.
  Therefore, “25% fewer coronary deaths would have occurred had these doctors not smoked” is a little misleading.
Rothman Example 2: PAF = 0.20 for spermicide in relation to Down’s syndrome.
  Elimination of spermicide use could lead to more pregnancies, which would lead to more Down’s syndrome cases.
  Therefore, “20% fewer Down’s syndrome cases would have occurred had the couple not used spermicide” is a little misleading.

Exercise 3

Day Two: 1:00-2:00 Discuss Exercise 3

Day Two: 2:00-4:30 Interactive Model Building: Demonstration and Exercise