Applied Epidemiologic Analysis, Fall 2002
Patricia Cohen, Ph.D.
Henian Chen, M.D., Ph.D.
Teaching Assistants: Julie Kranick, Sylvia Taylor, Chelsea Morroni, Judith Weissman

Lecture 11: Standardization, sampling fractions, and multilevel analysis
Goals:
To review the motivation for methods of standardization of study data
To understand the connection between standardizing study data to match the population and weighted statistical analysis
To review the connection between clustered sampling and violation of the assumption of independence of observations
To see how clustered data can be informative about group/area effects and differences in the impact of exposures
To see how multilevel clustered data relate to ecologic analysis

Why standardize measures of incidence or exposure effects?
Measures of incidence and exposure effects as estimated in a study are inevitably influenced by how the population strata are represented in the study sample. Unless these measures are constant across strata, we can expect different estimates from different study designs and reference populations. Any comparison across populations (e.g., state or country estimates), and any projection to future or past populations, needs to take these differing population compositions into account.

Table (values not preserved in this transcript): a sample with 100 subjects in each age group; not the same distribution as the population. * standardized incidence proportion

Equivalent methods of standardization
Adjusting incidence or prevalence proportions to the population distribution of strata requires multiplying each stratum’s estimate by its relative sampling fraction, summing, and dividing by the sum of the strata sampling fractions.
Alternatively, each observation in each stratum may be weighted by the same relative sampling fraction; the weighted observations are then summed and the total divided by the sum of the relative sampling fractions (= N).
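
The equivalence of the two methods can be checked with a minimal sketch. The strata, population shares, and case counts below are hypothetical and chosen only for illustration; the design mirrors the equal-size sample (100 subjects per stratum) described earlier, and the weight is taken to be the population share divided by the sample share.

```python
# Hypothetical data (not from the lecture): 100 sampled subjects per age
# stratum, with unequal shares of the target population.
strata = [
    # (label, population share, sample size, cases observed)
    ("young", 0.50, 100, 5),
    ("middle", 0.30, 100, 10),
    ("old", 0.20, 100, 20),
]
total_n = sum(s[2] for s in strata)


def relative_weight(pop_share, n):
    """Relative sampling weight: population share divided by sample share."""
    return pop_share / (n / total_n)


# Method 1: weight each stratum's incidence proportion.
weights = [relative_weight(share, n) for _, share, n, _ in strata]
proportions = [cases / n for _, _, n, cases in strata]
by_stratum = sum(w * p for w, p in zip(weights, proportions)) / sum(weights)

# Method 2: weight each individual observation (1 = case, 0 = non-case).
weighted_cases = 0.0
weight_total = 0.0
for _, share, n, cases in strata:
    w = relative_weight(share, n)
    outcomes = [1] * cases + [0] * (n - cases)
    weighted_cases += sum(w * y for y in outcomes)
    weight_total += w * n
by_observation = weighted_cases / weight_total

# Unweighted (crude) proportion for comparison.
crude = sum(s[3] for s in strata) / total_n

print(round(by_stratum, 4), round(by_observation, 4), round(crude, 4))
```

In this sketch the observation weights sum to N, and the crude proportion is higher than the standardized one because the high-incidence stratum is over-represented in the sample.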

Representing the population in multivariate analyses; weighting data
Table (values not preserved in this transcript). * standardized incidence proportion

Standardizing risk and rate ratios
Both rely on comparable means: multiplying individual observations or strata estimates by relative sampling fractions will standardize risk ratios.
Special considerations for rate ratios:
– They require stratum-specific values for person-time to determine the appropriate relative sampling fraction
– These values are used to standardize both the numerator and the denominator of rate ratios
– Doing so assumes that exposure does not affect the strata person-time substantially (e.g., rare disease assumption)
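
As a rough sketch of the rate-ratio case, the same stratum-specific person-time weights can be applied to both the exposed and the unexposed rates before taking their ratio. All numbers below, including the "standard" person-years, are hypothetical and were chosen so that each stratum has a rate ratio of 2.

```python
# Hypothetical data (not from the lecture).
strata = [
    # (label, standard person-years, exposed cases, exposed person-years,
    #  unexposed cases, unexposed person-years)
    ("younger", 6000.0, 10, 1000.0, 20, 4000.0),
    ("older", 4000.0, 60, 2000.0, 15, 1000.0),
]


def standardized_rate(rates, weights):
    """Weighted average of stratum-specific rates."""
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


weights = [s[1] for s in strata]
exposed_rates = [s[2] / s[3] for s in strata]
unexposed_rates = [s[4] / s[5] for s in strata]

# The same weights standardize both numerator and denominator.
srr = standardized_rate(exposed_rates, weights) / standardized_rate(
    unexposed_rates, weights
)

# Crude rate ratio, ignoring strata.
crude_exposed = sum(s[2] for s in strata) / sum(s[3] for s in strata)
crude_unexposed = sum(s[4] for s in strata) / sum(s[5] for s in strata)
crude_rr = crude_exposed / crude_unexposed

print(round(srr, 3), round(crude_rr, 3))
```

Here the crude ratio exceeds the standardized one because the exposed person-time is concentrated in the higher-rate stratum.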

Standardized measures in epidemiology differ from standardized scores/coefficients in statistics
“Standard” or z scores
A correlation coefficient is a measure of association that substitutes standard deviation units for the original units of measurement.
A regression coefficient (or an odds ratio) expresses a relationship as a change in dependent variable units per unit change in the independent variable.
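
The contrast between the two kinds of coefficient can be illustrated with a small sketch on made-up data: the OLS slope changes when the exposure is re-expressed in different units, while the slope computed on z scores (the correlation coefficient) does not. The variable names and values are assumptions for illustration only.

```python
from statistics import mean, pstdev

# Made-up paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]


def slope(xs, ys):
    """OLS slope of ys on xs (change in y per unit change in x)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def zscores(values):
    """Re-express values in standard deviation units."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


b_raw = slope(x, y)                # depends on the units of x and y
r = slope(zscores(x), zscores(y))  # correlation: slope in SD units

# Changing the units of x rescales the slope but not the correlation.
x_rescaled = [10.0 * v for v in x]
b_rescaled = slope(x_rescaled, y)
r_rescaled = slope(zscores(x_rescaled), zscores(y))

print(round(b_raw, 3), round(b_rescaled, 3), round(r, 3), round(r_rescaled, 3))
```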

In what sense are epidemiological effect measures standardized measures?
Rate ratios are “unit free” because disease per unit of person-time appears in both numerator and denominator. Risk ratios are similarly unit free because disease per person appears in both numerator and denominator.
An advantage is that rate ratios and risk ratios can be compared across different outcomes, different exposures, or both.

Rate and risk differences and OLS regression coefficients
In contrast to rate and risk ratios, rate and risk differences retain the original units. They may be more constant across populations because they are less likely to change with a change in the distribution of other risks for the disease. In that sense they are much like OLS regression coefficients (literally, they are regression coefficients if OLS is applied to a disease outcome), which are also expected to be more stable across populations.

Advantages and disadvantages of standardized coefficients
Risk and rate ratios and correlation coefficients share advantages and disadvantages:
– They are unit-free and thus comparable across changes in variables (e.g., disease and/or exposure)
– Even when the variables are the same, they are likely to vary across populations with different distributions of the same or other relevant variables

Other circumstances when differential sampling fractions may be employed
In many study designs certain subpopulations are deliberately over-represented in the sample. This is particularly likely when these subpopulations are to be the subject of special study and representation at the overall sampling fraction would yield too small a sample for adequate statistical power.

Over-sampling of particular strata
Often the goal is to estimate the incidence proportion, or measures of effect, for the entire population. In these circumstances, using the relative sampling fraction as a weight for individual observations accomplishes this goal. Virtually all statistical programs provide for the use of such weights, and there is no constraint on the complexity of the model that may be estimated.

Clustering
Ordinary statistical analyses assume that study participants are individually randomly sampled from the population, so that observations are independent. Whenever observations are clustered on predictors of the dependent variable (DV) that are NOT included in the model, the estimated standard errors will be too small.
Such clustering occurs when study participants are obtained from “group settings” such as different diagnostic or treatment sources, whole classrooms or other groups, or neighborhoods that may differ on relevant variables.
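
One way to gauge how much too small the standard errors are is the design effect, a standard survey-sampling approximation that is not part of the lecture itself. For a simple mean or proportion, equal cluster size m, and intraclass correlation rho, it is 1 + (m - 1) * rho. The sketch below applies it under those assumptions; the function names and input values are illustrative only.

```python
import math


def design_effect(cluster_size, icc):
    """Approximate variance inflation for equal-sized clusters."""
    return 1.0 + (cluster_size - 1) * icc


def corrected_se(naive_se, cluster_size, icc):
    """Inflate a standard error that was computed assuming independence."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n / design_effect(cluster_size, icc)


# Illustrative values: 40 classrooms of 25 pupils, modest clustering.
deff = design_effect(cluster_size=25, icc=0.05)
se = corrected_se(naive_se=0.02, cluster_size=25, icc=0.05)
n_eff = effective_sample_size(n=1000, cluster_size=25, icc=0.05)

print(round(deff, 2), round(se, 4), round(n_eff, 1))
```

Even a small intraclass correlation more than doubles the variance in this example, which is why ignoring the clustering understates uncertainty.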

Clusters at the sampling stage
In the past these clusters were often ignored. The availability of statistical programs to analyze them properly has produced both better statistical estimates (especially of confidence limits) and new awareness of important substantive questions.
When there are few (e.g., <10) clusters, it may be more efficient and informative to include cluster “membership” as a categorical variable in the analysis.

Variables measured at the cluster level
Differences between clusters on variables measured on the study individuals, such as average age, average education, ethnic background, or mean score on a test
Differences between clusters on variables that either:
– Represent cluster properties per se: weather, urbanization, location, pollution, teacher’s training, doctors per patient
– Could be measured at the individual level but are not study variables: % voting, mean persons per room, % of babies born with low birth weight

Goals of multilevel analysis statistical programs
Produce proper standard errors.
Ask new kinds of questions:
– Questions at the aggregate or cluster level
– Questions about effects of aggregate characteristics on individual participants (at the individual level)
– Questions about different individual-level effects that depend on the aggregate or cluster in which they appear

Goals of multilevel analysis statistical programs
Questions at the aggregate or cluster level
– For persons with a given disease, are there differences in mortality rates (or disease markers) associated with the medical systems from which they were recruited?
Questions about effects of aggregate characteristics on individual participants (at the individual level)
– Does an individual’s risk for disease depend on the average risk of those around them (a question that is particularly interesting for a non-infectious disease)?
Questions about different individual-level effects depending on the aggregate or cluster in which they appear
– Are there differences in the relationship between use of a given therapy and outcome for individuals in different treatment settings?

Multilevel analyses as “random coefficient” methods
Multilevel analyses are sometimes referred to as “random coefficient” methods because they can answer these new types of questions:
As in traditional analyses, the individual-level outcome variable is viewed as a “random” (to-be-predicted) variable.
Additionally, the variation in relationships (e.g., of exposures to diseases), as reflected in different cluster estimates of effects (coefficients), may also be viewed as a “random” (to-be-predicted) variable.

Multilevel regression analysis
Uses maximum likelihood techniques (including “empirical Bayes” estimation).
Examines the possibility that exposures may have different effects in different contexts.
Analyses are carried out at the cluster level AND the individual participant level.

Multilevel regression analysis, continued
First stage: The intraclass correlation (ICC) reflects the fraction of disease variance associated with cluster differences.
Second stage: Add individual- or cluster-level “fixed” independent variables to the prediction.
Third stage: Add other variables that may characterize clusters.
Fourth stage: Add potential interactions between individual- and aggregate-level variables.
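
To make the first-stage quantity concrete, the sketch below computes an ICC with the classical one-way ANOVA estimator for equal-sized clusters. This is a simpler stand-in, not the maximum likelihood estimate a multilevel program would report, and the clinic data are invented for illustration.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equally sized clusters."""
    k = len(clusters)      # number of clusters
    m = len(clusters[0])   # observations per cluster
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")

    grand_mean = sum(sum(c) for c in clusters) / (k * m)
    cluster_means = [sum(c) / m for c in clusters]

    ss_between = m * sum((cm - grand_mean) ** 2 for cm in cluster_means)
    ss_within = sum(
        (y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (m - 1))

    # Share of total variance attributable to differences between clusters.
    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


# Invented disease-marker values for three clinics with three patients each.
marker_by_clinic = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
icc = anova_icc(marker_by_clinic)
print(round(icc, 3))
```

Note that this estimator can be negative when clusters differ less than chance would predict, for example when every cluster contains the same values.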

Neighborhood socioeconomic status and all-cause mortality
Hans Bosma, H. Dike van de Mheen, Gerard J. J. M. Borsboom, and Johan P. Mackenbach, American Journal of Epidemiology, 2001, 153, 363-371.
8,506 participants in a survey of quality of life and contextual factors in Eindhoven, the Netherlands
86 neighborhood clusters
All-cause mortality from municipal registers matched to individuals
Four SES indicators:
– % with primary schooling only
– % unskilled laborers
– % unemployed or disabled
– % with severe financial problems

FIGURE 1. Percent deceased during follow-up by individual and neighborhood educational level. Estimated for men aged 49 years without baseline diseases (n = 6,506 deaths). (Recolored from original.)
Legend, individual education: High; Intermediately high; Intermediately low; Low

An example of a policy-effect analysis in public health
Averett SL, Rees DI, Argys LM. 2002. The impact of government policies and neighborhood characteristics on teenage sexual activity and contraceptive use. American Journal of Public Health, 92, 1773-1778.
In this study of teenage sexual activity and contraceptive use, the predictors included individual-level variables such as religious background and parental education, the section of the US from which the subsample was drawn (and associated differences in family planning service availability), and neighborhood characteristics such as median income and racial composition.

Ecologic studies
Individual-level data are not available.
All data, including the dependent variable, are measured only at the aggregate or cluster level.
They still attempt to draw some conclusions about individuals, since causal impacts on disease operate at the individual level.
These could be thought of as multilevel studies without the individual-level data.

Such analyses may address public health and epidemiological issues otherwise neglected
See Diez-Roux AV. 1998. Bringing context back into epidemiology: variables and fallacies in multilevel analysis. American Journal of Public Health, 88, 216-222.

Ecologic studies
Robinson (1950) showed that conclusions about individuals based on ecologic data are not necessarily appropriate: the “ecologic fallacy.”
Still used because of:
– Low cost and convenience
– Measurement limitations of individual-level studies
– Design limitations of individual-level studies
– Interest in ecologic effects per se
– Simplicity of analysis and presentation
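
The ecologic fallacy can be shown with a small constructed example. In the invented data below, the correlation between group means is perfectly positive, yet within every group the individual-level correlation is perfectly negative, so the aggregate result would mislead anyone reading it as a statement about individuals. The group labels and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Invented (exposure, outcome) pairs for individuals in three areas.
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Ecologic analysis: only the area means are used.
mean_x = [sum(x for x, _ in pts) / len(pts) for pts in groups.values()]
mean_y = [sum(y for _, y in pts) / len(pts) for pts in groups.values()]
ecologic_r = pearson(mean_x, mean_y)

# Individual-level analysis within each area.
within_r = {
    name: pearson([x for x, _ in pts], [y for _, y in pts])
    for name, pts in groups.items()
}

print(ecologic_r, within_r)
```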

Karpati A, Galea S, Awerbuch T, Levins R. 2002. Variability and vulnerability at the ecological level: Implications for understanding the social determinants of health. American Journal of Public Health, 92, 1768-1772.
Examination of variations in disease and mortality rates across US counties. Note that the regions with the greatest variability in county disease and mortality rates were those with the greatest variability in county SES indicators.
