Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton
Simple linear regression is a standard technique in the Analysis of Biological Data: The main idea is assessing the relationship between two variables, assuming that the relationship is directional and linear, and assuming that one variable is a driver of the relationship. The Response variable (plotted on Y) is assumed to respond in a linear relationship to changes in the Predictor variable (plotted on X). The reverse is not assumed in this analysis (that Y drives X). Think heart rate and exercise. Other examples?
But if you have a cloud of points…where do you put the line?
Best fit lines & “Least Squares” regression: The idea is to drive the line through the cloud so that it minimizes the sum of the squared vertical distances between the points and the line.
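To make the least-squares idea concrete, here is a minimal sketch in plain Python (the course itself runs these analyses in R). The function name `least_squares_fit` and the exercise/heart-rate numbers are hypothetical, made up purely for illustration.

```python
def least_squares_fit(x, y):
    """Return (slope, intercept) of the line that minimizes the
    sum of squared vertical distances between points and line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: exercise level (predictor, X) and heart rate (response, Y)
exercise = [0, 1, 2, 3, 4]
heart_rate = [62, 75, 81, 98, 104]

slope, intercept = least_squares_fit(exercise, heart_rate)
print(f"heart_rate = {intercept:.1f} + {slope:.1f} * exercise")
```

The slope is the expected change in the response for a one-unit change in the predictor; the intercept is the fitted response when the predictor is zero.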
Regression residuals: You can generate a table of residuals… a new data set! How much does each point deviate from the regression line?
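As a rough sketch of what that residual table looks like, the plain-Python example below fits a line and then subtracts the fitted value from each observed value. The data and variable names are hypothetical, chosen only to illustrate the calculation.

```python
# Hypothetical data: predictor (X) and response (Y)
x = [0, 1, 2, 3, 4]
y = [62, 75, 81, 98, 104]

# Fit the least-squares line
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value predicted by the line
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, yi, fi, ri in zip(x, y, fitted, residuals):
    print(f"x={xi}  observed={yi}  fitted={fi:.1f}  residual={ri:+.1f}")
```

Positive residuals sit above the line and negative residuals sit below it. For a least-squares line with an intercept, the residuals sum to zero.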
Detrending… a scientific siren song
Regression lines can have varying slopes from a single Y intercept.
Regression lines can have identical slopes, but different Y intercepts.
We will be running a test of this sort in R. The thing I want you to understand is that the statistical test, and the P-value it generates, relates to the null hypothesis of NO SLOPE: that the line is indeed flat. That would mean the response variable is NOT changing in relation to the predictor.
IMPORTANT! The P-value from a regression tells you whether the line is statistically flat… it does not tell you how much variation is captured! That is what r² measures.
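The share of variation captured by the line is the coefficient of determination, r². Here is a minimal plain-Python sketch of how it is computed. The helper name `slope_and_r_squared` and both data sets are hypothetical: they were built to share exactly the same fitted line (slope 2, intercept 0) while differing in scatter. No P-value is computed here, since that needs a t distribution, which the Python standard library does not provide.

```python
def slope_and_r_squared(x, y):
    """Fit a least-squares line and return (slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over around the line vs. total variation in y
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, 1 - ss_res / ss_tot

# Hypothetical data: same underlying line, different amounts of scatter
x = [1, 2, 3, 4, 5, 6]
y_tight = [2.5, 3.5, 5.5, 8.5, 10, 12]   # points hug the line
y_loose = [7, -1, 1, 13, 10, 12]         # points scatter widely

slope_tight, r2_tight = slope_and_r_squared(x, y_tight)
slope_loose, r2_loose = slope_and_r_squared(x, y_loose)
print(f"tight: slope={slope_tight:.2f}, r2={r2_tight:.2f}")
print(f"loose: slope={slope_loose:.2f}, r2={r2_loose:.2f}")
```

Both data sets give the same slope, yet the line captures most of the variation in one and less than half in the other. The slope alone says nothing about how tightly the points follow the line.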
It may be more useful to calculate a confidence interval
You might wish to have replicate values
Your relationship might not be linear! Polynomial Regression
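One way to see why this matters is to fit both a straight line and a quadratic to the same hump-shaped data. The plain-Python sketch below does so by solving the least-squares normal equations directly. The function name `polyfit` and the data are hypothetical, and for real work a statistics package would be a better choice than this hand-rolled solver.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial fit.
    Returns coefficients [b0, b1, ..., b_degree] for
    y = b0 + b1*x + ... + b_degree*x**degree."""
    m = degree + 1
    # Normal equations A b = c, with A[i][j] = sum(x^(i+j)), c[i] = sum(y * x^i)
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    c = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        c[col], c[pivot] = c[pivot], c[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            c[row] -= factor * c[col]
    # Back substitution
    b = [0.0] * m
    for i in range(m - 1, -1, -1):
        b[i] = (c[i] - sum(a[i][k] * b[k] for k in range(i + 1, m))) / a[i][i]
    return b

# Hypothetical hump-shaped data (response rises, then falls)
x = [0, 1, 2, 3, 4]
y = [1, 4, 5, 4, 1]

linear = polyfit(x, y, 1)      # [intercept, slope]
quadratic = polyfit(x, y, 2)   # [b0, b1, b2]
print("linear fit:   ", [round(v, 3) for v in linear])
print("quadratic fit:", [round(v, 3) for v in quadratic])
```

With these data the straight line comes out flat even though the response clearly depends on the predictor. The quadratic term is what picks up the curvature.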
Regression Diagnostics! A stepwise process of adding factors to the regression. Testing P value, r², etc. If you are going to take this on, you need to grind! Read, analyze, read some more.
Correlation is a related form of analysis, but is different in one fundamental way… a correlation is testing for a relationship between two factors, but NOT ASSUMING one causes the other. Thus, there is no predictor and no response.
You would use a correlation analysis if you are not making assumptions about one factor driving another. Pearson correlation for normally distributed data; Spearman (rank) correlation for non-normally distributed data.
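The two coefficients are closely related: Spearman's correlation is Pearson's correlation computed on the ranks of the data rather than on the raw values. Here is a minimal plain-Python sketch, with hypothetical function names (`pearson`, `ranks`, `spearman`) and made-up data in which one variable increases steadily with the other, but along a curve.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: b always increases with a, but not in a straight line
a = [1, 2, 3, 4, 5]
b = [1, 4, 9, 16, 25]

r_pearson = pearson(a, b)
r_spearman = spearman(a, b)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Because Spearman uses only the ordering of the values, a relationship that is perfectly monotonic gets a coefficient of 1 even when it is curved, while Pearson falls short of 1 because it measures straight-line association.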
Logistic regression: To be used if your response variable is categorical (for example, present/absent)…