
1 Correlation and regression
http://sst.tees.ac.uk/external/U0000504

2 Introduction
- Scientific rules and principles are often expressed mathematically.
- There are two main approaches to finding a mathematical relationship between variables:
  - Analytical: based on theory
  - Empirical: based on observation and experience

3 The straight line (1)
- Most graphs based on numerical data are curves; the straight line is a special case.
- Data are often manipulated to yield straight-line graphs, as the straight line is relatively easy to analyse.

4 The straight line (2)
- Straight-line equation: y = mx + c
- Slope: m = Δy/Δx
- Intercept: c
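The slope and intercept can be found directly from any two points on the line; a minimal sketch (the point values are made up for illustration):

```python
# Slope of a straight line through two points: m = Δy/Δx
x1, y1 = 1.0, 3.0
x2, y2 = 4.0, 9.0

m = (y2 - y1) / (x2 - x1)   # Δy/Δx = 6/3 = 2.0
c = y1 - m * x1             # rearrange y = mx + c for the intercept
```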

5 Correlation & regression
- These are statistical processes which:
  - suggest the existence of a relationship
  - determine the best equation to fit the data
- Correlation is a measure of the strength of a relationship between two variables.
- Regression is the process of determining that relationship.

6 Correlation and regression
The next few slides illustrate correlation and regression.

7 No correlation

8 Positive correlation

9 Negative correlation

10 Curvilinear correlation

11 Correlation coefficient
- A statistical measure of the strength of a relationship between two variables:
  - Pearson's product-moment correlation coefficient, r
  - Spearman's rank correlation coefficient, ρ
- Both take a value in the range -1.0 to +1.0:
  - r or ρ = +1.0 represents a perfect positive correlation
  - r or ρ = -1.0 represents a perfect negative correlation
  - r or ρ = 0.0 represents no correlation
  - Intermediate values of r or ρ are associated with a probability of there being a relationship.

12 Linear regression
- The process of trying to fit the best straight line to a set of data.
- The usual method is based on minimising the squares of the errors between the data and the predicted line.
- For this reason it is called "the method of least squares".
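A minimal sketch of a least-squares straight-line fit, using NumPy's `polyfit` (the data values are illustrative):

```python
import numpy as np

# Noisy data scattered around y = 2x + 1 (values are illustrative)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# A degree-1 polynomial fit is a straight line chosen to minimise
# the sum of squared errors between the data and the line
m, c = np.polyfit(x, y, 1)
```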

13 Linear regression - assumptions
- The error in the independent (x) variable is negligible relative to the error in the dependent (y) variable.
- The errors are normally, independently and identically distributed with mean 0 and constant variance: NIID(0, σ²).

14 Linear regression model
- For a set of data (x, y), there is an equation that best fits the data, of the form Y = α + βx + ε, where:
  - x is the independent variable, or the predictor
  - y is the measured dependent, or predicted, variable
  - Y is the calculated dependent, or predicted, variable
  - ε is the error term, and accounts for that part of Y not "explained" by x.
- For any individual data point i, the difference between the observed and predicted value of y is called the residual, r_i, i.e. r_i = y_i - Y_i = y_i - (α + βx_i).
- The residuals provide a measure of the error term.
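Once the line is fitted, the residuals fall out directly; a sketch (data values illustrative) that also shows a standard property of least squares, namely that residuals about a fitted line with an intercept sum to roughly zero:

```python
import numpy as np

# Illustrative data scattered around a straight line
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

b, a = np.polyfit(x, y, 1)   # fitted slope (b) and intercept (a)
Y = a + b * x                # predicted values
residuals = y - Y            # r_i = y_i - Y_i
# for a least-squares fit with an intercept, the residuals sum to ~0
```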

15 Regression analysis (1)
- Check the correlation coefficient.
- Hypotheses:
  - H0: there is no correlation between x and y
  - H1: there is a correlation between x and y
- Decision rule: reject H0 if |r| ≥ the critical value at α = 0.05.
- If you cannot reject H0 then proceed no further; otherwise carry out a full regression.
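Equivalently, the p-value reported alongside r can be compared with α = 0.05 directly; a sketch assuming SciPy is available (data values illustrative):

```python
import numpy as np
from scipy import stats

# Illustrative data with a clear linear trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.8, 6.3, 7.9, 9.8, 12.2, 14.1, 15.8])

r, p = stats.pearsonr(x, y)
reject_h0 = p < 0.05   # reject "no correlation" at the 5% level
```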

16 Regression analysis (2)
- Regression analysis can be carried out using either Excel or Minitab; Excel will need the Analysis ToolPak add-in installed.
- The output from both Minitab and Excel will give the following information:
  - the regression equation (in the form y = a + bx)
  - probabilities that a ≠ 0 and b ≠ 0
  - the coefficient of determination, R²
  - analysis of variance
- In addition you will need to produce at least one of:
  - residuals vs. fitted values
  - residuals vs. x-values
  - residuals vs. y-values
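The same headline numbers (regression equation, p-value, R²) can also be reproduced outside Excel/Minitab; a sketch using `scipy.stats.linregress` (data values illustrative):

```python
import numpy as np
from scipy import stats

# Illustrative data scattered around y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.0, 5.1, 6.9, 9.2, 10.8, 13.1])

fit = stats.linregress(x, y)
b, a = fit.slope, fit.intercept   # regression equation y = a + bx
r_squared = fit.rvalue ** 2       # coefficient of determination, R²
p_value = fit.pvalue              # p-value for the hypothesis b = 0
```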

17 Interpreting output
- Regression equation: this is the equation that best fits the data and provides the predicted values of y.
- Analysis of variance: determines what proportion of the variation in x and y can be accounted for by the regression equation, and what proportion is accounted for by the error term. The p-value arising from this tells us how well the regression equation fits the data.
- The proportion of the variation in the data accounted for by the regression equation is called the coefficient of determination, R², and is equal to the square of the correlation coefficient.

18 Output plots
- The output plots are used to check the assumptions about the errors.
- The normal probability plot should show the residuals lying on a straight line.
- The residual plots should have no obvious pattern, and should not show the residuals increasing or decreasing with increase in the fitted or measured values.

19 Non-linear relationships
- Many functions can be manipulated mathematically to yield a straight-line equation.
- Some examples are given in the next few slides.

20 Linearisation (2)

21 Linearisation (3)

22 Functions involving logs (1)
- Some functions can be linearised by taking logs.
- These are y = Ax^n and y = Ae^(kx).

23 Functions involving logs (2)
- For y = Ax^n, taking logs gives log y = log A + n log x.
- A graph of log y vs. log x gives a straight line, slope n and intercept log A.
- To find A you must take antilogs (10^x).
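A sketch of this log-log fit (the values A = 3 and n = 2 are chosen arbitrarily for illustration):

```python
import numpy as np

# Power-law data: y = A * x**n with A = 3, n = 2 (illustrative)
x = np.array([1.0, 2.0, 4.0, 8.0])
y = 3.0 * x ** 2

# log y = log A + n log x: a straight line in log-log space,
# slope n and intercept log A
n, logA = np.polyfit(np.log10(x), np.log10(y), 1)
A = 10 ** logA   # take antilogs to recover A
```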

24 Functions involving logs (3)
- For y = Ae^(kx), we must use natural logs: ln y = ln A + kx.
- This gives a straight line, slope k and intercept ln A.
- To find A we must take antilogs (e^x).
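The same idea for the exponential case, in a semi-log sketch (A = 2 and k = 0.5 are illustrative values):

```python
import numpy as np

# Exponential data: y = A * exp(k*x) with A = 2, k = 0.5 (illustrative)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(0.5 * x)

# ln y = ln A + k*x: a straight line in semi-log space,
# slope k and intercept ln A
k, lnA = np.polyfit(x, np.log(y), 1)
A = np.exp(lnA)   # take antilogs (base e) to recover A
```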

25 Polynomials
- These are functions of general formula y = a + bx + cx² + dx³ + …
- They cannot be linearised.
- Techniques for fitting polynomials exist; both Excel and Minitab provide for fitting polynomials to data.
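NumPy provides the same polynomial-fitting capability as Excel and Minitab; a sketch for a quadratic (the coefficients a = 1, b = 2, c = 3 are illustrative):

```python
import numpy as np

# Quadratic data: y = a + bx + cx^2 with a = 1, b = 2, c = 3 (illustrative)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# polyfit returns coefficients highest power first
c, b, a = np.polyfit(x, y, 2)
```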
