Simple Linear Regression


1 Simple Linear Regression
Correlation and Simple Linear Regression © Scott Evans, Ph.D.

2 Correlation
Used to assess the relationship between two continuous variables.
Example: weight and number of hours of exercise per week.
Scatterplots (plots of y vs. x) are useful graphical tools for visualizing the relationship between two continuous variables.
Plots are generally recommended: they are informative, easy to understand, and cost nothing in a statistical sense.

3 Correlation Coefficient
Ranges from –1 (perfect negative correlation) to 1 (perfect positive correlation).
A correlation of –1 means that all of the data lie on a straight line with negative (but otherwise unknown) slope.
A correlation of 1 means that all of the data lie on a straight line with positive (but otherwise unknown) slope.
A correlation of 0 means that the variables are not correlated; a plot of the data would reveal a cloud with no discernible pattern.

4 Correlation Coefficient
ρ denotes the population correlation coefficient (a parameter).
r denotes the sample correlation coefficient (a statistic).
Measures the strength of a linear relationship.
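As an illustrative sketch (not part of the original slides), the sample correlation coefficient r can be computed directly from its definition; the data here are made up:

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-deviations
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Perfectly linear data with positive slope gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Note that r = 1 regardless of the actual slope (here 2), illustrating that correlation measures strength of the linear relationship, not its magnitude.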

5 Correlation Coefficient
Hypothesis tests for ρ can be conducted using the t distribution.
H0: ρ = 0 is the null hypothesis most often tested (but this is not limiting).
Confidence intervals for ρ may also be obtained.

6 Correlation Coefficient
Two major correlation coefficients:
Pearson correlation: parametric; sensitive to extreme observations.
Spearman correlation: nonparametric, based on ranks; robust to extreme observations and thus recommended, particularly when "outliers" are present.
I use Spearman almost exclusively.
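To make the rank-based idea concrete, here is a sketch (with made-up data) in which one extreme observation pulls the Pearson coefficient down while the Spearman coefficient, which sees only the ordering, is unaffected:

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..n, with ties given their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_r(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = list(range(1, 11))
y = [2 * v for v in x]
y[-1] = 500  # one extreme observation; the ordering is unchanged
print(round(spearman_r(x, y), 3), round(pearson_r(x, y), 3))  # 1.0 0.55
```

Because the relationship is still perfectly monotone, Spearman stays at 1.0, while the extreme point drags Pearson down to about 0.55.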

7 Correlation Warnings:
Correlation does NOT necessarily imply causation.
Correlation does NOT measure the magnitude of the regression slope.

8 Simple Linear Regression
Used to describe and estimate the relationship between two continuous variables.
We attempt to characterize this relationship with two parameters: (1) an intercept and (2) a slope.
Has several assumptions (e.g., linearity, independent errors, constant error variance).

9 Simple Linear Regression
The model: yi = β0 + β1xi + εi
yi is the dependent variable.
xi is the independent variable (predictor).
εi is the random error term.
β0 is the y-intercept.
β1 is the slope of the line: it describes how steep the line is and which way it leans.
β1 is also the "effect" (e.g., a treatment effect) on y of a 1-unit change in x.
Note that if there were no association between x and y, β1 would be close to 0 (i.e., x has no effect on y).

10 Simple Linear Regression
Plot the data first (scatterplots).
This enables you to see the relationship between the two variables.
If the relationship is nonlinear (as evidenced by the plot), then a simple linear regression model is the wrong approach.

11 Simple Linear Regression
Uses the method of least squares: identifies the line that minimizes the sum of squared deviations between observed and predicted values.
Produces an ANOVA table with an F test.
Note that when the numerator d.f. = 1, the F test is equivalent to the t test (F = t²).
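As a sketch of how these pieces fit together (the data are made up, not from the slides), the closed-form least-squares estimates, the ANOVA decomposition, and the F = t² identity for the slope test can all be computed directly:

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares estimates of intercept b0 and slope b1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx            # slope minimizing sum of squared deviations
    b0 = my - b1 * mx         # intercept: line passes through (x-bar, y-bar)
    return b0, b1

x = [1, 2, 3, 4, 5]              # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical, roughly linear responses
n = len(x)
b0, b1 = fit_line(x, y)
my = sum(y) / n
yhat = [b0 + b1 * a for a in x]

# ANOVA decomposition: SSR (regression, 1 df) and SSE (error, n - 2 df)
ssr = sum((f - my) ** 2 for f in yhat)
sse = sum((o - f) ** 2 for o, f in zip(y, yhat))
mse = sse / (n - 2)
F = ssr / mse

# t statistic for H0: beta1 = 0; with 1 numerator df, F = t^2
mx = sum(x) / n
sxx = sum((a - mx) ** 2 for a in x)
t = b1 / sqrt(mse / sxx)
print(round(b1, 2), abs(F - t ** 2) < 1e-9)  # 1.99 True
```

The identity F = t² holds algebraically, since SSR = b1²·Sxx for a single predictor.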

12 Simple Linear Regression
Can be used to estimate the correlation between x and y.
Hypothesis tests and CIs may be obtained for β1.
One may make predictions of the effect on y of changes in x.

13 Simple Linear Regression
We want the model to capture all of the structure in the data (the systematic component), leaving only random errors.
Thus, if we plotted the errors (residuals), we would hope to see no discernible pattern.
If a pattern exists, then we have not captured all of the systematic structure in the data, and we should look for a better model.
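A small sketch of this diagnostic idea (with made-up data): fitting a straight line to a truly quadratic relationship leaves residuals with an obvious U-shaped pattern, signaling that the model missed systematic structure.

```python
def fit_line(x, y):
    """Least-squares estimates of intercept b0 and slope b1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

x = [0, 1, 2, 3, 4]
y = [xi ** 2 for xi in x]  # the true relationship is quadratic, not linear
b0, b1 = fit_line(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(resid)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals run positive, negative, then positive again rather than varying randomly; a better model (e.g., one including an x² term) is needed.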

14 Multiple Regression
Can incorporate (and control for) many variables.
A single (continuous) dependent variable.
Multiple independent variables (predictors), which may be of any scale (continuous, nominal, or ordinal).
Outcome = function of many variables (e.g., sex, age, race, smoking status, exercise, education, treatment, genetic factors, etc.).

15 Multiple Regression
Multiple regression can estimate the effect of each of these variables while controlling for (adjusting for) the effects of the other (potentially confounding) variables in the model.
Confounding occurs when the effect of a variable of interest is distorted because the effect of another, "confounding," variable is not controlled for.
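As a minimal sketch of how multiple regression estimates each coefficient while adjusting for the others (in practice one would use standard statistical software), the least-squares solution can be obtained from the normal equations (X'X)b = X'y; the two-predictor data below are made up and noise-free so the true coefficients are recovered exactly:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def multiple_regression(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y.
    Rows of X are observations; a leading 1 is added for the intercept."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[a] * z[c] for z in Z) for c in range(p)] for a in range(p)]
    Xty = [sum(Z[i][a] * y[i] for i in range(len(Z))) for a in range(p)]
    return solve(XtX, Xty)

# Hypothetical outcome depending on two predictors: y = 1 + 2*x1 - 3*x2
X = [[1, 0], [2, 1], [3, 1], [4, 2], [5, 3]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
b = multiple_regression(X, y)
print([round(c, 6) for c in b])  # [1.0, 2.0, -3.0]
```

The estimated effect of x1 (2.0) is its effect holding x2 fixed, which is the sense in which the model "adjusts for" the other predictors.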

16 Multiple Regression
Interactions (effect modification) may be investigated: the effect of one variable depends on the level of another variable.
Example: the effect of treatment may depend on whether you are male or female.

17 Multiple Regression
Indicator variables are created and used for categorical variables.
Selection procedures can help a researcher choose a final model from a shopping list of potential independent variables:
Backwards
Forwards
Stepwise
Best subsets

18 Multiple Regression
Models require an assessment of model adequacy and goodness of fit, e.g., examination of residuals (comparison of observed vs. predicted values).

19 Other Regression Models
Generalized Linear Models (GLMs): a function (the link) of the mean of the dependent variable is a linear function of the covariates.
Multiple regression (link = identity).
Logistic regression: used when the dependent variable is binary; very common in public health/medical studies (e.g., disease vs. no disease).
Poisson regression: used when the dependent variable is a count.
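To give a flavor of the logistic case (a sketch only; real analyses would use standard software, which fits by maximum likelihood), the model P(y = 1 | x) = sigmoid(β0 + β1·x) can be fit by gradient ascent on the log-likelihood; the binary data below are hypothetical:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(x, y, lr=0.1, steps=5000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # gradient of the average log-likelihood: residuals (y - p) drive the update
        g0 = sum(yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)) / n
        g1 = sum((yi - sigmoid(b0 + b1 * xi)) * xi for xi, yi in zip(x, y)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical binary outcome that becomes more likely as x grows
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
p_low, p_high = sigmoid(b0 + b1 * 0), sigmoid(b0 + b1 * 7)
print(p_low < 0.5 < p_high)  # True
```

The fitted slope β1 is the change in the log-odds of the outcome per 1-unit change in x, the logistic analogue of the slope interpretation in linear regression.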

20 Other Regression Models
(Cox) Proportional Hazards (PH) regression: used when the dependent variable is an event time subject to censoring. © Scott Evans, Ph.D.
