1 Regression Analysis: Modeling Relationships
2 Regression Analysis
Regression analysis is the study of the relationship between a set of independent variables and a dependent variable.
The linear equation representing the 'true' or population relationship:
y = β0 + β1X1 + β2X2 + ... + βkXk + ε
where y is the dependent variable, X1, ..., Xk are the independent variables, and ε is the random error term.
3 Variables
Dependent Variable: Also called the predicted variable. Its value depends on, or can be predicted by, the independent variables.
Independent Variables: Also called the predictor variables. These can be measured directly and are used to predict the dependent variable (or simply to understand it better).
4 Modeling Process
Define Goal: To study the impact of various factors on individual health.
Choose y: Lung capacity, measured in cc.
List possible Xs: Minutes of exercise per day, number of days per week of exercise, ethnicity, gender, age, height, altitude at which the person lived.
Collect Data: Primary and secondary sources.
Preliminary Analyses: Univariate, bivariate.
Build Regression Model: How is y related to all the Xs?
Evaluate Model: How good is the model at predicting y?
Implement/Monitor: Create DSS, monitor, update.
5 The Data
Y = Lung Capacity (cc), X1 = Gender, X2 = Height, X3 = Smoker, X4 = Exercise, X5 = Age
A portion of the data is shown below. See the spreadsheet for all data.
6 Preliminary Analyses
The table below shows some descriptive statistics for each variable. What basic statements about our data can we make from this?
Columns: Lung Capacity (cc), Gender, Height, Smoker, Exercise, Age
Rows: Mean, Stdev, Min, Max
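The descriptive statistics named on this slide (mean, standard deviation, min, max) can be computed per variable as in the minimal sketch below. The rows are hypothetical toy values, not the course spreadsheet, and the `describe` helper is an illustrative name rather than anything from the original material.

```python
import statistics

# HYPOTHETICAL toy rows for illustration only; these are not the course data.
rows = [
    {"capacity_cc": 3800, "height_in": 62},
    {"capacity_cc": 4200, "height_in": 66},
    {"capacity_cc": 4600, "height_in": 70},
    {"capacity_cc": 5000, "height_in": 74},
]

def describe(values):
    """Return the mean, sample standard deviation, min, and max of a list."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }

# One summary per variable, keyed by column name.
summary = {name: describe([row[name] for row in rows]) for name in rows[0]}

for name, stats in summary.items():
    print(name, stats)
```

Reading the summary column by column is a quick way to spot impossible values (for example a negative height) before any model is fitted.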
7 Capacity by Gender, Smoking
Pivot table of Lung Capacity (cc). Columns: Female, Male, Grand Total. Rows: Non-Smoker, Smoker, Total. Each cell reports the average of Lung Capacity (cc), the standard deviation of Lung Capacity (cc), and the count.
Does there appear to be a relationship between smoking, gender, and lung capacity?
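A pivot table of this shape can be approximated in a few lines by grouping records on (smoker, gender) and summarizing each group. This is only a sketch: the records are hypothetical toy values, and the labels "Female", "Male", "Smoker", and "Non-Smoker" are assumed codings.

```python
import statistics
from collections import defaultdict

# HYPOTHETICAL toy records (gender, smoker status, lung capacity in cc).
records = [
    ("Female", "Non-Smoker", 4000),
    ("Female", "Non-Smoker", 4200),
    ("Female", "Smoker", 3600),
    ("Female", "Smoker", 3800),
    ("Male", "Non-Smoker", 5000),
    ("Male", "Non-Smoker", 5200),
    ("Male", "Smoker", 4500),
    ("Male", "Smoker", 4700),
]

# Collect the capacities that fall into each (smoker, gender) cell.
groups = defaultdict(list)
for gender, smoker, capacity in records:
    groups[(smoker, gender)].append(capacity)

# Average, sample standard deviation, and count for each cell.
pivot = {
    key: {
        "average": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "count": len(values),
    }
    for key, values in groups.items()
}

for key in sorted(pivot):
    print(key, pivot[key])
```

Comparing the cell averages against the cell standard deviations gives a first, informal sense of whether the group differences are large relative to the spread within each group.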
8 Distributions
9 Bivariate Analysis – Matrix Plot
10 Capacity Distribution by Gender, Smoking
Men have a larger lung capacity than women, on average. Non-smokers have a larger lung capacity than smokers, on average. What about the variance?
11 Simple Regression
How well can exercise time alone predict lung capacity?
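One way to answer this question is to fit a one-predictor least-squares line and look at R squared, the share of variation in lung capacity that exercise time explains. The sketch below uses the textbook closed-form formulas; the data are hypothetical toy values and `simple_regression` is an illustrative helper name.

```python
def simple_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared

# HYPOTHETICAL toy data: minutes of exercise per day and lung capacity in cc.
exercise_minutes = [0, 15, 30, 45, 60]
lung_capacity_cc = [4000, 4150, 4350, 4400, 4600]

intercept, slope, r_squared = simple_regression(exercise_minutes, lung_capacity_cc)
print(f"capacity = {intercept:.1f} + {slope:.2f} * exercise, R^2 = {r_squared:.3f}")
```

An R squared near 1 means exercise alone tracks capacity closely in the sample; a low value suggests other variables are needed, which is the motivation for multiple regression.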
12 Multiple Regression
How do all the Xs together help predict y?
SUMMARY OUTPUT
Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations = 100
Coefficient table columns: Coefficients, Standard Error, t Stat, P-value
Rows: Intercept, Gender (P-value on the order of 1E-06), Height (P-value on the order of 1E-10), Smoker (P-value on the order of 1E-07), Exercise, Age
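The coefficients column of a summary like this comes from a least-squares fit on several predictors at once. The sketch below shows that step only, by solving the normal equations with plain Gaussian elimination; it does not compute standard errors, t statistics, or p-values. Everything in it is an assumption for illustration: the toy predictor rows, the 0/1 codings, and the made-up coefficients used to generate the responses. Four predictors are used to keep the toy data small; Age would simply be one more column.

```python
def fit_least_squares(rows, y):
    """Return [intercept, b1, ..., bk] solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(xr[i] * xr[j] for xr in X) for j in range(k)]
        + [sum(xr[i] * yi for xr, yi in zip(X, y))]
        for i in range(k)
    ]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# HYPOTHETICAL toy predictors: (gender, height in inches, smoker, exercise minutes).
# ASSUMED coding: gender 0 = female, 1 = male; smoker 0 = no, 1 = yes.
predictors = [
    (0, 62, 0, 0), (1, 70, 0, 30), (0, 66, 1, 15), (1, 72, 1, 45),
    (0, 64, 0, 60), (1, 68, 1, 0), (0, 60, 1, 20), (1, 74, 0, 10),
]
# Responses generated from MADE-UP coefficients so the fit can be checked exactly.
true_b = [1000.0, 300.0, 50.0, -400.0, 5.0]
capacity = [
    true_b[0] + sum(b * v for b, v in zip(true_b[1:], row)) for row in predictors
]

coefficients = fit_least_squares(predictors, capacity)
print([round(c, 3) for c in coefficients])
```

Because the toy responses contain no noise, the fit recovers the generating coefficients; with real data the estimates carry sampling error, which is what the standard error and p-value columns quantify.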
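13 Final Model
SUMMARY OUTPUT
Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations = 100
Coefficient table columns: Coefficients, Standard Error, t Stat, P-value
Rows: Intercept, Gender (P-value on the order of 1E-06), Height (P-value on the order of 1E-10), Smoker (P-value on the order of 1E-07), Exercise
Fitted equation: Lung Capacity = b0 + b1 * Gender + b2 * Height - b3 * Smoker + b4 * Exercise, where b0 through b4 stand for the magnitudes of the fitted coefficients in the table.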
14 Prediction Exercise
1. Predict the lung capacity for a non-smoking female who does not exercise and is 66 inches tall, based on the model above.
2. What would be the predicted value if she smoked?
3. What would it be for a male in both of the above cases?
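The mechanics of the exercise can be sketched as plugging each person's values into the fitted equation. The coefficients below are placeholders chosen for illustration, not the values from the final model, and the 0/1 codings for gender and smoker are assumptions; substitute the real coefficients and codings from the regression output before reading anything into the numbers.

```python
# PLACEHOLDER coefficients: hypothetical values, NOT the fitted final model.
COEFFS = {
    "intercept": 1000.0,
    "gender": 300.0,
    "height": 50.0,
    "smoker": -400.0,  # negative: smoking lowers the predicted capacity
    "exercise": 5.0,
}

def predict_capacity(gender, height, smoker, exercise, coeffs=COEFFS):
    """Evaluate the linear model for one person and return capacity in cc."""
    return (
        coeffs["intercept"]
        + coeffs["gender"] * gender
        + coeffs["height"] * height
        + coeffs["smoker"] * smoker
        + coeffs["exercise"] * exercise
    )

# ASSUMED coding: gender 0 = female, 1 = male; smoker 0 = no, 1 = yes.
# All four cases: 66 inches tall, no exercise.
female_nonsmoker = predict_capacity(gender=0, height=66, smoker=0, exercise=0)
female_smoker = predict_capacity(gender=0, height=66, smoker=1, exercise=0)
male_nonsmoker = predict_capacity(gender=1, height=66, smoker=0, exercise=0)
male_smoker = predict_capacity(gender=1, height=66, smoker=1, exercise=0)

for label, value in [
    ("female, non-smoker", female_nonsmoker),
    ("female, smoker", female_smoker),
    ("male, non-smoker", male_nonsmoker),
    ("male, smoker", male_smoker),
]:
    print(label, value)
```

Because the model is additive, switching smoker from 0 to 1 shifts the prediction by exactly the Smoker coefficient, and switching gender shifts it by exactly the Gender coefficient, regardless of the other inputs.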