1 Correlation and regression

2 Introduction
- Scientific rules and principles are often expressed mathematically.
- There are two main approaches to finding a mathematical relationship between variables:
  - Analytical: based on theory
  - Empirical: based on observation and experience

3 The straight line (1)
- Most graphs based on numerical data are curves.
- The straight line is a special case.
- Data is often manipulated to yield straight-line graphs, as the straight line is relatively easy to analyse.

4 The straight line (2)
- Straight line equation: y = mx + c
- Slope = m, where m = Δy/Δx
- Intercept = c
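As a small illustration of the slope and intercept definitions above, the following sketch recovers m and c from two points on a line. The function name and the example points are made up for illustration and do not come from the slides.

```python
def slope_intercept(p1, p2):
    """Return (m, c) for the line y = m*x + c through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    m = (y2 - y1) / (x2 - x1)  # slope = delta y / delta x
    c = y1 - m * x1            # intercept: the value of y at x = 0
    return m, c

# Hypothetical points (1, 3) and (3, 7) lie on y = 2x + 1
m, c = slope_intercept((1, 3), (3, 7))
print(m, c)
```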

5 Correlation & Regression
- These are statistical processes which:
  - Suggest the existence of a relationship
  - Determine the best equation to fit the data
- Correlation is a measure of the strength of a relationship between two variables.
- Regression is the process of determining that relationship.

6 Correlation and Regression
The next few slides illustrate correlation and regression.

7 No correlation

8 Positive correlation

9 Negative correlation

10 Curvilinear correlation

11 Correlation coefficient
- A statistical measure of the strength of a relationship between two variables:
  - Pearson's product-moment correlation coefficient, r
  - Spearman's rank correlation coefficient, ρ
- Both take a value in the range -1.0 to +1.0:
  - r or ρ = +1.0 represents a perfect positive correlation
  - r or ρ = -1.0 represents a perfect negative correlation
  - r or ρ = 0.0 represents no correlation
  - Each value of r or ρ has an associated probability, which is used to judge whether a relationship really exists.
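The two coefficients can be computed by hand in a few lines. The sketch below is illustrative only: the function names and the data are assumptions, not part of the slides. It computes Spearman's coefficient as Pearson's r applied to the ranks, with tied values given the average of their ranks.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Made-up data: y = x**2 rises steadily but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson_r(x, y))     # a little below 1: strong but not perfectly linear
print(spearman_rho(x, y))  # 1.0: the ranks agree perfectly
```

The example shows why the two can differ: Spearman's coefficient only asks whether y increases whenever x does, while Pearson's r measures how close the points lie to a straight line.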

12 Linear regression
- Is the process of trying to fit the best straight line to a set of data.
- The usual method is based on minimising the sum of the squared errors between the data and the fitted line.
- For this reason, it is called "the method of least squares".
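For reference, the least-squares criterion can be written out as follows. This is the standard textbook result rather than something stated on the slide; the symbols α (intercept) and β (slope) are chosen here to label the fitted line y = α + βx, and the bars denote sample means.

```latex
S(\alpha, \beta) = \sum_{i=1}^{n} \bigl( y_i - \alpha - \beta x_i \bigr)^2

\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```

The estimates follow from setting the partial derivatives of S with respect to α and β to zero.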

13 Linear regression - assumptions
- The error in the independent (x) variable is negligible relative to the error in the dependent (y) variable.
- The errors are normally, independently and identically distributed with mean 0 and constant variance: NIID(0, σ²).

14 Linear regression model
- For a set of data, (x, y), there is an equation that best fits the data, of the form
  y = α + βx + ε, with calculated value Y = α + βx
  - x is the independent variable or the predictor
  - y is the measured value of the dependent variable
  - Y is the calculated (predicted) value of the dependent variable
  - ε is the error term and accounts for that part of y not "explained" by x.
- For any individual data point, i, the difference between the observed and predicted value of y is called the residual, r_i
  - i.e. r_i = y_i - Y_i = y_i - (α + βx_i)
  - The residuals provide a measure of the error term.
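A minimal sketch of the model in code, assuming the usual closed-form least-squares estimates for the intercept and slope. The function names and the data set are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (alpha, beta) for the line Y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

def residuals(xs, ys, alpha, beta):
    """Observed minus predicted value for every data point."""
    return [y - (alpha + beta * x) for x, y in zip(xs, ys)]

# Made-up measurements that lie close to a straight line
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha, beta = fit_line(x, y)
res = residuals(x, y, alpha, beta)
print(alpha, beta)  # roughly 0.05 and 1.99
print(res)          # small values of mixed sign that sum to about zero
```

When the line includes an intercept, the least-squares residuals always sum to zero (up to rounding), which makes a quick sanity check on a fit.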

15 Regression analysis (1)
- Check the correlation coefficient.
- Hypotheses:
  - H0: There is no correlation between x and y
  - H1: There is a correlation between x and y
  - Decision rule: reject H0 if |r| ≥ critical value at α = 0.05
- If you cannot reject H0 then proceed no further; otherwise carry out a full regression.
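The critical value of r is normally read from a table. As a sketch of where such a table comes from, the code below derives it from the two-tailed 5% critical value of Student's t with n - 2 degrees of freedom, using r_crit = t / sqrt(t² + df). The small lookup table holds standard t values for a few degrees of freedom only; the names are hypothetical and the approach is an assumption, not taken from the slides.

```python
from math import sqrt

# Two-tailed critical values of Student's t at the 5% level,
# keyed by degrees of freedom (df = n - 2). Only a few df are listed.
T_CRIT_05 = {3: 3.182, 5: 2.571, 8: 2.306, 10: 2.228, 20: 2.086}

def critical_r(n):
    """Critical |r| at the 5% level for n data pairs (n - 2 must be in the table)."""
    df = n - 2
    t = T_CRIT_05[df]
    return t / sqrt(t * t + df)

def reject_h0(r, n):
    """Decision rule: reject 'no correlation' if |r| reaches the critical value."""
    return abs(r) >= critical_r(n)

print(round(critical_r(5), 3))  # about 0.878 for five data pairs
print(reject_h0(0.95, 5))       # True: go on to a full regression
print(reject_h0(0.50, 10))      # False: proceed no further
```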

16 Regression analysis (2)
- Regression analysis can be carried out using either Excel or Minitab. Excel will need the Analysis ToolPak add-in installed.
- The output from both Minitab and Excel will give the following information:
  - The regression equation (in the form y = a + bx)
  - Probabilities (p-values) for testing whether a ≠ 0 and b ≠ 0
  - The coefficient of determination, R²
  - Analysis of variance
- In addition you will need to produce at least one of:
  - Residuals vs. fitted values
  - Residuals vs. x-values
  - Residuals vs. y-values

17 Interpreting output
- Regression equation: this is the equation that best fits the data and provides the predicted values of y.
- Analysis of variance: determines what proportion of the variation in y is accounted for by the regression equation and what proportion is accounted for by the error term. The p-value arising out of this tells us whether the regression equation explains significantly more of the variation than would be expected by chance.
  - The proportion of the variation in the data accounted for by the regression equation is called the coefficient of determination, R². For a straight-line fit it is equal to the square of the correlation coefficient.
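The link between the coefficient of determination and the correlation coefficient can be checked numerically. In this sketch R² is computed from the sums of squares as 1 - SS_error / SS_total and compared with r² for the same straight-line fit. The data and function names are made up for illustration.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation in y, and the part left unexplained by the line
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_error = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_error / ss_total

def pearson_r_squared(xs, ys):
    """Square of Pearson's correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Made-up data with some scatter about a rising line
x = [1, 2, 3, 4, 5, 6]
y = [1.8, 4.3, 5.9, 8.4, 9.6, 12.5]
print(r_squared(x, y), pearson_r_squared(x, y))  # the two values agree
```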

18 Output plots
- The output plots are used to check the assumptions about the errors.
- The normal probability plot should show the residuals lying on a straight line.
- The residual plots should have no obvious pattern and should not show the residuals increasing or decreasing with increase in the fitted or measured values.

19 Non-linear relationships
- Many functions can be manipulated mathematically to yield a straight-line equation.
- Some examples are given in the next few slides.

20 Linearisation (2)

21 Linearisation (3)

22 Functions involving logs (1)
- Some functions can be linearised by taking logs.
- These are:
  - y = A x^n
  - y = A e^(kx)

23 Functions involving logs (2)
- For y = A x^n, taking logs gives
  log y = log A + n log x
- A graph of log y vs. log x gives a straight line with slope n and intercept log A.
- To find A you must take the antilog of the intercept (antilog of x = 10^x).
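A sketch of this log-log procedure in code, assuming base-10 logs and an ordinary least-squares line through the transformed points. The data are generated from y = 2x³ purely for illustration, so the fit should return n = 3 and A = 2; the function names are hypothetical.

```python
from math import log10

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_power_law(xs, ys):
    """Fit y = A * x**n by regressing log10(y) on log10(x). Needs x, y > 0."""
    log_x = [log10(x) for x in xs]
    log_y = [log10(y) for y in ys]
    intercept, n = fit_line(log_x, log_y)
    A = 10 ** intercept  # antilog of the intercept
    return A, n

# Illustrative data generated from y = 2 * x**3
x = [1, 2, 4, 8]
y = [2, 16, 128, 1024]
A, n = fit_power_law(x, y)
print(A, n)  # close to 2 and 3
```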

24 Functions involving logs (3)
- For y = A e^(kx), we must use natural logs:
  ln y = ln A + kx
- A graph of ln y vs. x gives a straight line with slope k and intercept ln A.
- To find A we must take the antilog of the intercept (antilog of x = e^x).
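The same idea for the exponential case, sketched under the assumption of an ordinary least-squares line through the points (x, ln y). The data are generated from y = 3e^(0.5x) for illustration only, so the fit should return A = 3 and k = 0.5; the function names are hypothetical.

```python
from math import exp, log

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_exponential(xs, ys):
    """Fit y = A * exp(k * x) by regressing ln(y) on x. Needs y > 0."""
    ln_y = [log(y) for y in ys]
    intercept, k = fit_line(xs, ln_y)
    A = exp(intercept)  # natural antilog of the intercept
    return A, k

# Illustrative data generated from y = 3 * exp(0.5 * x)
x = [0, 1, 2, 3]
y = [3 * exp(0.5 * xi) for xi in x]
A, k = fit_exponential(x, y)
print(A, k)  # close to 3 and 0.5
```

Note that only y is transformed here, whereas the power-law case transforms both x and y.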

25 Polynomials
- These are functions of general formula
  y = a + bx + cx² + dx³ + …
- They cannot be transformed into a straight-line form.
- Techniques for fitting polynomials exist.
  - Both Excel and Minitab provide for fitting polynomials to data.
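One such technique is least squares again: although a polynomial is not a straight line in x, it is still linear in its coefficients a, b, c, …, so the same minimisation applies. The sketch below is an illustration under that assumption and is not a description of what Excel or Minitab do internally. It builds the normal equations and solves them by Gaussian elimination; the function name and data are made up.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [a, b, c, ...] of y = a + b*x + c*x**2 + ..."""
    m = degree + 1
    # Normal equations M * coeffs = v, built from sums of powers of x
    M = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    v = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        v[col], v[pivot] = v[pivot], v[col]
        for row in range(col + 1, m):
            factor = M[row][col] / M[col][col]
            for k in range(col, m):
                M[row][k] -= factor * M[col][k]
            v[row] -= factor * v[col]
    # Back substitution
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(M[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (v[i] - tail) / M[i][i]
    return coeffs

# Illustrative data generated from y = 1 + 2x + 3x**2
x = [0, 1, 2, 3, 4]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]
coeffs = fit_polynomial(x, y, 2)
print(coeffs)  # close to [1, 2, 3]
```

The normal equations become badly conditioned for high degrees or widely spread x values, so this direct approach is best kept to low-order polynomials.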
