# Multivariate analysis: Introduction (Third Training Module, EpiSouth, 15th to 19th June 2009)

Slide 1/29. Multivariate analysis: Introduction. Third Training Module, EpiSouth, Madrid, 15th to 19th June 2009. Dr D. Hannoun, National Institute of Public Health, Algeria.

Slide 2/29. Introduction: Generality

Stratification allows us to:

- Control confounding
- Reveal effect modification

Limits of stratification:

- Only a small number of confounders can be controlled simultaneously
- The joint effect of confounders cannot be analysed correctly (important)
- Classes must be chosen for quantitative variables

→ Other tools: MULTIVARIATE ANALYSIS, to assess the reality of the effect of exposure on the disease.

Slide 3/29. Introduction: Joint effect

Example: Hepatitis B → SEP. Potential confounders: age (children/adults) and immunity (good/deficient).

- Joint effect: the effect of two or more factors combined together.
- Marginal effect: the effect of one confounder alone, without taking the other potential confounders into account.

Crude effect: 2.0

| Control on | Stratum 1 | Stratum 2 | Stratum 3 | Stratum 4 | Adjusted measure |
|---|---|---|---|---|---|
| Age (F1) | 2.0 (F+) | 2.0 (F-) | | | 2.0 |
| Immunity (F2) | 2.0 (F+) | 2.0 (F-) | | | 2.0 |
| Factors 1 + 2 | 1.0 (F1+/F2+) | 1.0 (F1-/F2-) | 1.0 (F1+/F2-) | 1.0 (F1-/F2+) | 1.0 |

Slide 4/29. Multivariate analysis: Definition

Definition: procedures, at the analysis phase, that:

- Simultaneously adjust for several variables
- Simultaneously control for several potential confounders

Several models:

- Multiple linear regression
- Logistic regression
- Cox regression
- …

Vocabulary:

- Disease Y = dependent variable
- Risk factors = independent variables or predictors

Slide 5/29. Multivariate analysis: Definition

How:

- Represent the disease Y as a function of other variables (risk factors, potential confounders)
- By modelling the relationship studied

Set of variables → Statistical procedures (multivariate analysis) → The best subset of variables that describes the relationship between risk factors (RF) and disease.

- Measure of the relationship: parameters
- To describe the disease via an equation: the best model fitting the data

Slide 6/29. Multivariate analysis: Definition

Writing the model:

$$\mathrm{E}(Y \mid E, X_1, X_2, \dots, X_p) = f(E, X_1, X_2, \dots, X_p)$$

- $Y$: a given disease
- $E$: exposure
- $X_1, X_2, \dots$: other variables

Example, with $f$ a linear function:

$$\mathrm{E}(Y \mid E, X_1, X_2, \dots, X_p) = \alpha + \beta E + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

$\beta, \beta_1, \beta_2, \dots$ measure the relation between the exposure $E$, the other risk factors $X_1, X_2, \dots$ and the disease $Y$, each controlled for the other variables.

If $\beta = 0$ → no relationship between the exposure and the disease.

Slide 7/29. Multivariate analysis: Definition

- The adjusted measures of association we obtain from multivariable analysis are direct effects and not total effects.
- For each variable in the model, we obtain the effect measure of the relationship between this variable and the disease, controlled for the other variables.

Slide 8/29. Multivariate analysis: Advantages

Advantages/techniques:

- Estimate effects while controlling for more than one confounder simultaneously
- Study the joint effect of several risk factors and quantify the intensity of interaction
- Allow continuous risk factors
- Study the dose-response relationship: of interest for causality and for the specific risk at intermediate levels
- Study the trend of the effect according to the level of the risk factor
- Predict the disease

Slide 9/29. Multivariate analysis: Steps

Several steps:

1. Choose the appropriate model to summarize the data
2. Define the variable selection strategy
3. Estimate the model coefficients
   - Method of least squares (LS) estimation
   - Method of maximum likelihood (ML) estimation
4. Write and interpret the model
5. Study the adequacy of the model

Slide 10/29. Multivariate analysis: Choice of the model

The choice depends on the form of the function $f$:

1. Nature of the outcome variable
   - Continuous outcome → multiple linear regression
   - Categorical outcome → logistic regression (LR)
   - Outcome is time to an event → Cox regression
2. Nature of the joint effect
   - Additive → multiple linear regression
   - Multiplicative → logistic regression, Cox regression
3. Form of the variable distribution: normally distributed…
4. Assumptions

Slide 11/29. Multivariate analysis: Variable selection

The final model depends on which variables are selected.

At the study design:

- Decide which variables to adjust or control for
- Decide how each variable will be coded
- Decide which interactions should be considered

At the analytical phase:

- Which variables must be entered in the model
- Which variables must be forced
- P value

E.g.: 7 variables coded 0/1 with all interaction terms give $2^7 = 128$ coefficients to estimate in the final model!

→ Necessity of a STRATEGY
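To see where the figure of 128 comes from, the sketch below counts every term of a fully saturated model for 7 binary variables. The variable names `X1` … `X7` are placeholders for illustration, not names used in the slides.

```python
from itertools import combinations
from math import comb

variables = [f"X{i}" for i in range(1, 8)]  # 7 hypothetical binary (0/1) variables

# One coefficient per subset of variables: the empty subset is the intercept,
# single variables are main effects, larger subsets are interaction terms.
terms = [subset
         for k in range(len(variables) + 1)
         for subset in combinations(variables, k)]

n_coefficients = len(terms)

# Number of terms of each order (0 = intercept, 1 = main effects, 2 = two-way, ...).
by_order = {k: comb(len(variables), k) for k in range(len(variables) + 1)}

print(n_coefficients)
print(by_order)
```

The count doubles with every additional binary variable, which is why a selection strategy is needed before fitting.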

Slide 12/29. Multivariate analysis: Parameter estimation

Purpose of multivariate analysis: to obtain a measure of effect that describes the exposure-outcome relationship, adjusted for relevant extraneous factors.

Parameter estimation depends on the model used:

- In MLR → regression coefficients β
- In LR → odds ratio
- In Cox → hazard ratio
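In logistic and Cox regression the fitted coefficient is on a log scale, so the odds ratio or hazard ratio is obtained by exponentiating it. A minimal sketch, assuming a hypothetical coefficient value chosen only for illustration:

```python
import math

def ratio_from_coefficient(beta):
    """Convert a logistic (or Cox) regression coefficient into an odds (or hazard) ratio."""
    return math.exp(beta)

# Hypothetical coefficient, chosen so that the resulting ratio is 2.
beta_exposure = math.log(2.0)
odds_ratio = ratio_from_coefficient(beta_exposure)

# A coefficient of 0 corresponds to a ratio of 1, i.e. no association.
null_ratio = ratio_from_coefficient(0.0)

print(odds_ratio, null_ratio)
```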

Slide 13/29. Multivariate analysis: Model adequacy

Verify the adequacy of the model: the capacity of the model to represent correctly the value of the disease, given the values of a subset of risk factors.

Steps:

- Adequacy of the model:
  - Graphical methods (important)
  - Statistical tests
- Interpreting the tests: beware of outliers
- The best model is not necessarily the best statistical model: choose the model that gives the best understanding of the disease

→ The fitted model can be used for prediction.

Slide 14/29. MLR: Introduction

Multiple linear regression (MLR) is the multivariate model used for continuous data.

- Principle: describe one variable as a linear function of one or more other variables.
- Form: $\mathrm{E}(Y) = f(E, X_1, X_2, \dots)$, where $f$ is a linear function.
  - Simple linear regression model: $\mathrm{E}(Y \mid X) = \alpha + \beta X$
  - Multiple linear regression model: $\mathrm{E}(Y \mid X_1, \dots, X_p) = \alpha + \beta_1 X_1 + \dots + \beta_p X_p$

[Figure: the line $\mathrm{E}(Y) = \alpha + \beta X$; axis label: Disease.]

Slide 15/29. MLR: Introduction

In simple linear regression:

- Statistical model: $Y = \alpha + \beta X + \varepsilon$
- Fitted line: $\hat{Y} = \hat{\alpha} + \hat{\beta} X$
- $\beta$ = slope of the straight line. It estimates the change in $Y$ for one unit of $X$. E.g. when atmospheric pollution increases by 1%, the incidence rate of ARI increases by 2 cases per 100,000 persons.
- $\alpha$ = intercept, which corresponds to the value of the disease when the exposure equals 0 or, more generally, describes the baseline.
- $\varepsilon$ = error term in the model.

[Figure: incidence rate of ARI plotted against atmospheric pollution (density of PM10).]
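The roles of the intercept and the slope can be sketched with a fitted line used as a prediction function. The slope of 2 comes from the slide's example; the baseline of 10 cases per 100,000 is an assumed value for illustration only.

```python
ALPHA = 10.0  # hypothetical baseline incidence (cases per 100,000) when X = 0
BETA = 2.0    # slope from the slide's example: +2 cases per 100,000 per unit of X

def predicted_incidence(pollution):
    """Predicted ARI incidence rate for a given pollution level (fitted line, no error term)."""
    return ALPHA + BETA * pollution

# The intercept is the prediction when the exposure equals 0.
baseline = predicted_incidence(0)

# The slope is the change in the prediction for a one-unit increase in X,
# whatever the starting level.
change_per_unit = predicted_incidence(6) - predicted_incidence(5)

print(baseline, change_per_unit)
```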

Slide 16/29. MLR: Introduction

In multiple linear regression, the statistical model is:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$

E.g.: variation of the incidence rate of ARI with atmospheric pollution. Potential confounders: age and smoking.

- $X_1$ = density of PM10
- $X_2$ = age of the person
- $X_3$ = smoking

$$\text{ARI incidence rate} = \alpha + \beta_1\,\text{PM10 density} + \beta_2\,\text{Age} + \beta_3\,\text{Smoking} + \varepsilon$$

Slide 17/29. MLR: Introduction

In multiple linear regression:

$$\text{ARI incidence rate} = \alpha + \beta_1\,\text{PM10 density} + \beta_2\,\text{Age} + \beta_3\,\text{Smoking} + \varepsilon$$

- $\beta_1$ = slope along the $X_1$ dimension: variation of ARI for a change of one unit of PM10 density, controlled for the other variables
- $\beta_2$ = slope along the $X_2$ dimension: variation of ARI for a change of one unit of age, controlled for the other variables
- $\beta_3$ = slope along the $X_3$ dimension: variation of ARI for a change of one unit of smoking (person/year), controlled for the other variables
- $\alpha$ = intercept, the value of the disease when there is no risk factor…
- $\varepsilon$ = error term in the model
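A minimal sketch of fitting this three-predictor model with NumPy's least squares solver. The data are synthetic and noise-free, and the "true" coefficients are invented for illustration, so the fit recovers them up to rounding. Nothing here comes from a real ARI dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50

# Synthetic predictors (hypothetical ranges, for illustration only).
pm10 = rng.uniform(10, 80, n)      # X1: density of PM10
age = rng.uniform(1, 70, n)        # X2: age
smoking = rng.uniform(0, 30, n)    # X3: smoking

# Assumed coefficients: alpha, beta1, beta2, beta3.
true_coef = np.array([5.0, 2.0, 0.1, 3.0])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), pm10, age, smoking])
y = X @ true_coef  # no error term, so the model fits exactly

estimated, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["alpha", "beta1", "beta2", "beta3"], estimated.round(3))))
```

Each estimated slope is the change in the outcome for one unit of its predictor with the other two held fixed, which is the "controlled for the other variables" reading given above.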

Slide 18/29. MLR: Parameter estimation

Method used: least squares estimation.

Principle: identify the best straight line, the one that minimizes the sum of squared residuals:

$$\mathrm{SSR} = \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i (Y_i - \alpha - \beta X_i)^2$$

[Figure: least squares line fit, showing an observed point $(X_i, Y_i)$ and its fitted value $(X_i, \hat{Y}_i)$.]
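The least squares principle can be sketched in plain Python for the simple (one-predictor) case, using the standard closed-form solution. The five data points are invented for illustration.

```python
def least_squares_line(x, y):
    """Closed-form least squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

def ssr(x, y, alpha, beta):
    """Sum of squared residuals of the line alpha + beta * x."""
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = least_squares_line(x, y)
best = ssr(x, y, alpha_hat, beta_hat)

# Any other line gives a larger SSR than the least squares line.
shifted = ssr(x, y, alpha_hat + 0.1, beta_hat)
tilted = ssr(x, y, alpha_hat, beta_hat - 0.1)
print(alpha_hat, beta_hat, best, shifted, tilted)
```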

Slide 19/29. MLR: Variable selection

Decide which variables to control for, according to the purpose:

1. Prediction of the risk of the disease
   - We do not need to take all confounders into account, only the best group of predictors
   - Important in terms of public health
   - E.g.: incidence rate of ARI. Exposure: atmospheric pollution. Predictors: age and smoking
2. Estimation of the relation between exposure and disease
   - We have to take ALL confounders into account to control confounding
   - Important in terms of causal association
   - E.g.: incidence rate of ARI. Exposure: atmospheric pollution. Predictors: age, smoking, breastfeeding, ROR…

Slide 20/29. MLR: Variable selection

Which variables must be entered in the initial model: 2 situations

- Some are obligatory in the model because they are recognized risk factors: the exposure
- Other variables → those with a significant relationship with the disease in the bivariate analysis

→ Together, these are the candidate variables for modelling.

Slide 21/29. MLR: Variable selection

Which interactions should be considered:

- The problem of interaction must be approached in a manner which facilitates understanding of the nature of the causal effect
- Statistical considerations should serve rather than determine our objectives
- Adding an interaction term → one more regression coefficient in the equation, and a model that is more difficult to interpret
- For a given interaction, you must ensure that the variables in the interaction term are also contained in the model

Slide 22/29. MLR: Variable selection

Example: incidence rate of ARI.

1. Model WITH an interaction term. Interaction BETWEEN smoking and age: $\beta_{2,3} X_2 X_3$

$$\text{ARI incidence rate} = \alpha + \beta_1\,\text{PM10 density} + \beta_2\,\text{Age} + \beta_3\,\text{Smoking} + \beta_{2,3}\,\text{Age} \times \text{Smoking} + \beta_4\,\text{Breastfeeding} + \beta_5\,\text{ROR} + \varepsilon$$

$$\text{ARI incidence rate} = \alpha + \beta_1\,\text{PM10 density} + \beta_2\,\text{Age} + \beta_{2,3}\,\text{Age} \times \text{Smoking} + \varepsilon$$

Note that in the second equation smoking appears in the interaction term but not as a term of its own.
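What an age-by-smoking interaction term means in practice can be sketched as follows: the effect of one unit of smoking is no longer a single number but depends on age. All coefficient values are invented for illustration and are not estimates from the slides.

```python
# Hypothetical coefficients, for illustration only.
COEF = {"alpha": 5.0, "pm10": 2.0, "age": 0.5, "smoking": 3.0, "age_x_smoking": 0.25}

def ari_rate(pm10, age, smoking):
    """Predicted ARI incidence rate from a model with an age-by-smoking interaction."""
    return (COEF["alpha"]
            + COEF["pm10"] * pm10
            + COEF["age"] * age
            + COEF["smoking"] * smoking
            + COEF["age_x_smoking"] * age * smoking)

def smoking_effect(age, pm10=0.0):
    """Change in the predicted rate for one extra unit of smoking at a given age."""
    return ari_rate(pm10, age, 1.0) - ari_rate(pm10, age, 0.0)

# With the interaction term, the smoking effect equals beta3 + beta23 * age.
effect_at_20 = smoking_effect(20)
effect_at_60 = smoking_effect(60)
print(effect_at_20, effect_at_60)
```

Keeping the smoking term alongside the interaction is what lets the smoking effect be read as a baseline plus an age-dependent part.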

Slide 23/29. MLR: Variable selection

Which variables must be entered in the initial model: 2 situations…

How the variables must be entered in the initial model: a strategy must be defined.

- Start with ALL variables → backward elimination
- Start with NO variable → forward selection
- Mix of the two previous methods → stepwise selection

Slide 24/29. MLR: Variable selection

[Flow diagram, summarized:]

1. At the study design: sex, age, pollution, ROR, smoking, breastfeeding, region, profession, age*smoking
2. First part of the analytical phase: bivariate analysis and stratification
   - Significant variables: pollution, age, smoking, breastfeeding, ROR
   - Variable that must be forced: pollution
   - → Candidate variables for modelling: the largest possible model
3. Define how the variables are entered in the model: backward, forward, stepwise
4. Second part of the analytical phase: multivariate analysis (rules)
   - Final model: pollution, age, smoking

Slide 25/29. MLR: Backward strategy

Principle:

- Begin with ALL candidate variables in the model → the largest POSSIBLE model
- At each step, drop one variable. The choice of this variable is based on statistical rules → remove a variable that is not significant
- Continue until no more variables can be dropped, meaning all remaining variables are relevant

Advantages: evaluates the joint confounding effects of all variables.

Limits: with many risk factors, some strata may provide no information.
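A minimal sketch of backward elimination, under stated assumptions: the "statistical rule" is taken to be an absolute t-statistic below 2 (a hypothetical threshold), the exposure `pollution` is forced into the model, and the data are synthetic with an unrelated `region` variable. The slides do not specify which rule to use, so this is one possible reading, not the author's procedure.

```python
import numpy as np

def fit_ols(design, y):
    """Return OLS coefficients and t-statistics (design includes the intercept column)."""
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, beta / np.sqrt(np.diag(cov))

def backward_elimination(X, y, names, forced=("pollution",), t_min=2.0):
    """Start from all variables; repeatedly drop the weakest non-significant one."""
    kept = list(names)
    while True:
        cols = [names.index(v) for v in kept]
        design = np.column_stack([np.ones(len(y)), X[:, cols]])
        _, t = fit_ols(design, y)
        # t[0] is the intercept, so variable i of `kept` has t-statistic t[i + 1].
        droppable = [(abs(t[i + 1]), v) for i, v in enumerate(kept)
                     if v not in forced and abs(t[i + 1]) < t_min]
        if not droppable:
            return kept
        kept.remove(min(droppable)[1])

# Synthetic data: region has no effect on the outcome.
rng = np.random.default_rng(0)
names = ["pollution", "age", "smoking", "region"]
X = rng.normal(size=(200, 4))
y = 5 + 2 * X[:, 0] + 0.5 * X[:, 1] + 3 * X[:, 2] + rng.normal(size=200)

kept = backward_elimination(X, y, names)
print(kept)
```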

Slide 26/29. MLR: Forward strategy

Principle:

- Begin with NO variable in the model → the smallest POSSIBLE model
- At each step, keep one variable in the model. The choice of this variable is based on statistical rules
  - Start with the variable that has the biggest change-in-estimate impact when evaluated individually
  - Keep the variable which meaningfully changes the adjusted estimate
- Continue until no other variables can be added

Advantages: avoids the initial sparse cell problem of the backward approach.

Limits: does not evaluate the joint confounding effects of many variables.
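A minimal sketch of forward selection driven by the change-in-estimate rule, under stated assumptions: the exposure is always in the model, "meaningful" is taken to be a relative change of more than 10% in the exposure coefficient (a hypothetical threshold), and the data are synthetic, with `age` built as a confounder and `region` as an unrelated variable.

```python
import numpy as np

def exposure_coef(y, exposure, covariates):
    """OLS coefficient of the exposure, adjusted for the given covariate columns."""
    design = np.column_stack([np.ones(len(y)), exposure, *covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

def forward_selection(y, exposure, candidates, min_change=0.10):
    """Add, one at a time, the candidate that changes the exposure estimate the most."""
    selected = []
    current = exposure_coef(y, exposure, [])  # crude estimate
    while True:
        best_name, best_change, best_coef = None, min_change, current
        for name, col in candidates.items():
            if name in selected:
                continue
            adjusted_for = [candidates[s] for s in selected] + [col]
            coef = exposure_coef(y, exposure, adjusted_for)
            change = abs(coef - current) / abs(current)
            if change > best_change:
                best_name, best_change, best_coef = name, change, coef
        if best_name is None:
            return selected, current
        selected.append(best_name)
        current = best_coef

# Synthetic data: age drives both pollution and the outcome; region is noise.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
region = rng.normal(size=n)
pollution = 0.8 * age + rng.normal(size=n)
y = 1 + 2 * pollution + 3 * age + rng.normal(size=n)

crude = exposure_coef(y, pollution, [])
selected, adjusted = forward_selection(y, pollution, {"age": age, "region": region})
print(selected, round(crude, 2), round(adjusted, 2))
```

The crude estimate overstates the pollution effect because age is left out; once age enters, adding region no longer moves the estimate, so the procedure stops.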

Slide 27/29. MLR: Conclusion

Goal of modelling: to obtain

- The smallest subset of relevant risk factors that describes the disease
- With the best understanding of the disease

As for stratification, you must:

- First, identify significant interaction terms. Do not forget to verify that the variables in the interaction term are also contained in the model → statistical significance + biological considerations
- Secondly, test the confounding effect → no statistical test
- Retain the significant risk factors, the confounders and the interaction terms that help us to understand and explain the occurrence of the disease

Slide 28/29. Conclusion

- Multivariate analysis makes it possible to control and adjust the effect of the exposure for several extraneous factors simultaneously
- The adjusted measures of association are direct effects and not total effects
- Multivariate analysis is a useful tool, but it can be very dangerous if the strategy has not been defined beforehand:
  - Purpose of the study
  - Method of variable selection
  - Assumptions
  - Adequacy of the model…

Slide 29/29. Conclusion

- As with the stratification method, statistical considerations should serve rather than determine our objectives
- Multivariate analysis requires a computer to run the statistical programme
- The choice of the model depends on many factors: the outcome variable, the form of the relationship between exposure and disease…
