Published by Max Toogood

1
Session I.2.15-16
IAEA Post Graduate Educational Course: Radiation Protection and Safety of Radiation Sources
Part I: Review of Fundamentals
Module 2: Basic Physics and Mathematics Used in Radiation Protection
Session 15-16: Data Analysis
3/2003 Rev 1 I.2.15-16 – slide 1 of 33

2
Objectives
Upon completion of this section the student should be able to demonstrate an understanding of the following statistical concepts pertaining to sample data:
- Regression
- Correlation

3
Regression
- A regression is a statistical technique used to investigate the relationship among two or more variables
- An independent variable, x, is linked to a dependent variable, y
- The relationship may follow the generic form of a straight line, for example, $y = mx + b$, where m is the slope and b is the intercept

4
Regression
Other forms of relationships include:
- $Y = a + bx + cx^2$, a parabola
- $Y = ab^x$, an exponential curve
- $Y = ax^b$, a geometric curve

5
Linear Least Squares Regression
- Linear least squares regression is by far the most widely used modeling method. It is what most people mean when they say they have used "regression", "linear regression" or "least squares" to fit a model to their data.
- Not only is linear least squares regression the most widely used modeling method, but it has also been adapted to a broad range of situations outside its direct scope.
- It plays a strong underlying role in many other modeling methods.

6
Linear Least Squares Regression
Used directly, with an appropriate data set, linear least squares regression can fit the data with any function of the form:
$F(\vec{x}, \vec{\beta}) = \beta_1 + \beta_2 x_1 + \beta_3 x_2 + \ldots$
in which (see next slide):

7
Linear Least Squares Regression
- Each explanatory variable in the function is multiplied by an unknown parameter
- There is at most one unknown parameter with no corresponding explanatory variable
- All of the individual terms are summed to produce the final function value

8
Linear Least Squares Regression
- In statistical terms, any function that meets these criteria would be called a "linear function"
- The term "linear" is used even though the function may not be a straight line

9
Linear Least Squares Regression
- The unknown parameters are considered to be variables, and the explanatory variables are considered to be known coefficients corresponding to those "variables"
- The problem becomes a system of linear equations that can be solved for the values of the unknown parameters
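To make this concrete, consider a sketch that assumes the straight-line model $F(x, \vec{\beta}) = \beta_1 + \beta_2 x$ and the five data points of the regression example in this session (X = 1, 2, 3, 5, 7; Y = 4, 3, 4, 8, 9). Each observation contributes one linear equation in the unknowns $\beta_1$ and $\beta_2$:

```latex
\begin{aligned}
\beta_1 + 1\,\beta_2 &= 4\\
\beta_1 + 2\,\beta_2 &= 3\\
\beta_1 + 3\,\beta_2 &= 4\\
\beta_1 + 5\,\beta_2 &= 8\\
\beta_1 + 7\,\beta_2 &= 9
\end{aligned}
```

These five equations have no exact common solution. The first two force $\beta_1 = 5$ and $\beta_2 = -1$, which contradicts the third. A criterion is therefore needed to choose the best approximate solution, and least squares supplies one.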

10
Linear Least Squares Regression
- In the least squares method, the unknown parameters are estimated by minimizing the sum of the squared deviations between the data and the model
- The minimization process reduces the (generally overdetermined) system of equations formed by the data to a sensible system of p equations in p unknowns, where p is the number of parameters in the functional part of the model
- This new system of equations is then solved to obtain the parameter estimates
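Here is a minimal Python sketch of these steps. It assumes the usual formulation, in which the p equations in p unknowns are the normal equations $(X^T X)\vec{\beta} = X^T \vec{y}$. It uses plain Gaussian elimination for illustration rather than a numerically robust solver. The helper names `solve_linear_system` and `least_squares` are hypothetical, and the data are those of the regression example in this session:

```python
# Minimal sketch: linear least squares via the normal equations (X^T X) beta = X^T y.
# Plain Gaussian elimination is used for illustration only; it is not a
# numerically robust solver for ill-conditioned problems.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix copy, so the caller's lists are not modified
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(design, y):
    """Return the parameters minimizing the sum of squared deviations.

    design: one row per observation; each row holds the known coefficients
    (explanatory-variable values) that multiply each unknown parameter.
    """
    p = len(design[0])
    # Normal equations: p equations in p unknowns
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y_k for row, y_k in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Straight-line model F(x, beta) = beta1 + beta2 * x for the example data
x_data = [1, 2, 3, 5, 7]
y_data = [4, 3, 4, 8, 9]
design = [[1.0, x] for x in x_data]  # the column of 1s carries beta1
beta1, beta2 = least_squares(design, y_data)
print(f"beta1 = {beta1:.2f}, beta2 = {beta2:.2f}")  # beta1 = 1.84, beta2 = 1.04
```

In practice a library routine such as NumPy's `numpy.linalg.lstsq` is preferable. Forming $X^T X$ explicitly can lose precision when the data are ill-conditioned.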

11
Linear Least Squares Regression
- Linear models are not limited to straight lines or planes; they include a fairly wide range of shapes
- For example, a simple quadratic curve is linear in the statistical sense
- Straight-line models and polynomials are linear in the statistical sense because they are linear in the parameters; a polynomial of degree two or higher is, however, not linear with respect to the observed explanatory variable, x
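A small sketch can check that a quadratic is linear in the parameters. The design matrix simply gains a column of $x^2$ values, and the same normal-equation approach recovers the coefficients of $Y = a + bx + cx^2$. The data are hypothetical and noise-free, generated from a = 1, b = 2, c = 3. The `solve` helper is an illustrative name:

```python
# Sketch: a quadratic Y = a + b*x + c*x^2 is "linear" in the statistical sense,
# so the normal-equation approach fits it. The data are synthetic (hypothetical),
# generated exactly from a = 1, b = 2, c = 3.

def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]  # synthetic, noise-free

# Each row lists the known coefficients multiplying a, b and c: 1, x, x^2
design = [[1.0, x, x ** 2] for x in xs]
p = len(design[0])
xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
a, b, c = solve(xtx, xty)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")  # a = 1.000, b = 2.000, c = 3.000
```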

12
Linear Least Squares Regression
- Just as models that are linear in the statistical sense do not have to be linear with respect to the explanatory variables, nonlinear models can be linear with respect to the explanatory variables but not with respect to the parameters

13
Linear Least Squares Regression
- For example, $F(x, \vec{\beta}) = \beta_1 + \beta_1 \beta_2 x$ is linear in x, but it cannot be written in the general form of a linear model
- This is because the slope of this line is expressed as the product of two parameters
- As a result, nonlinear least squares regression could be used to fit this model, but linear least squares cannot

14
Linear Least Squares Regression
Advantages:
- Although there are types of data that are better described by functions that are nonlinear in the parameters, many processes in science and engineering are well described by linear models
- This is either because the processes are inherently linear or because, over short ranges, any process can be well approximated by a linear model

15
Linear Least Squares Regression
Disadvantages:
The main disadvantages of linear least squares are:
- limitations in the shapes that linear models can assume over long ranges
- poor extrapolation properties
- sensitivity to outliers
Linear models with nonlinear terms in the predictor variables curve relatively slowly, so for inherently nonlinear processes it becomes increasingly difficult to find a linear model that fits the data well as the range of the data increases.

16
Linear Least Squares Regression
- Finally, while the method of least squares often gives optimal estimates of the unknown parameters, it is very sensitive to the presence of unusual data points in the data used to fit a model
- One or two outliers can sometimes seriously skew the results of a least squares analysis
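The sketch below shows this sensitivity, assuming the straight-line model and the example data of this session. The outlier value of 30 that replaces the last observation is hypothetical. The slope uses the closed-form least squares formula for a straight line:

```python
# Sketch: how a single outlier shifts a least squares slope.
# The outlier value (30) is hypothetical, chosen only for illustration.

def ls_slope(x, y):
    """Closed-form least squares slope of a straight line through (x, y)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)


x = [1, 2, 3, 5, 7]
y = [4, 3, 4, 8, 9]
y_outlier = y[:-1] + [30]  # last observation replaced by a hypothetical outlier

print(f"slope, original data:    {ls_slope(x, y):.2f}")          # 1.04
print(f"slope, with one outlier: {ls_slope(x, y_outlier):.2f}")  # 4.12
```

Changing a single observation roughly quadruples the estimated slope.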

17
Linear Least Squares Regression
- Applying this concept enables us to fit a model of a relationship to data
- Do not expect the data to follow the model perfectly
- The idea is to establish the model that best fits the data

18
Regression Example
For the following data, what is the function that relates the Y values to the X values?
X: 1  2  3  5  7
Y: 4  3  4  8  9

19
Regression Example
[Figure: plot of the sample data]

20
Regression Example
- A line is drawn through the data points
- For each data point, the difference between the value represented by the line and the observed value is determined
- This difference is squared, so that positive and negative differences cannot cancel each other out
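The procedure can be sketched in Python for the example data. The trial line $y = x + 2$ is a hypothetical guess. The second line uses the least squares slope and intercept for these data, 121/116 ≈ 1.04 and 1070/580 ≈ 1.84, which follow from the slope formula later in this session:

```python
# Sketch: for a candidate line y = intercept + slope * x, square each difference
# between the line and the observed value, then sum the squares.
# The trial line y = x + 2 is a hypothetical guess.

x = [1, 2, 3, 5, 7]
y = [4, 3, 4, 8, 9]


def sum_of_squares(intercept, slope):
    """Sum of squared differences between observed y and the candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


ss_trial = sum_of_squares(2.0, 1.0)
ss_best = sum_of_squares(1070 / 580, 121 / 116)  # least squares line for these data
print(f"trial line SS:         {ss_trial:.3f}")  # 4.000
print(f"least squares line SS: {ss_best:.3f}")   # 3.957
```

No other line can give a smaller sum of squares than the least squares line.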

21
Sum of Squares - Total
- The squared differences are summed over all data points; this sum is called the sum of squares, or SS
- The best-fit line is the one that gives the smallest SS value
- The total sum of squares, $SS_{TOTAL}$, instead measures the spread of the observed Y values about their mean and does not depend on the fitted line:
$SS_{TOTAL} = \sum (Y_i - Y_{AVG})^2$
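For the example data, $SS_{TOTAL}$ can be computed directly. A minimal sketch:

```python
# Sketch: total sum of squares, SS_TOTAL = sum((Y_i - Y_AVG)^2), for the example data.
# It depends only on the observed Y values, not on any fitted line.
y = [4, 3, 4, 8, 9]
y_avg = sum(y) / len(y)
ss_total = sum((yi - y_avg) ** 2 for yi in y)
print(f"Y_AVG = {y_avg}, SS_TOTAL = {ss_total:.2f}")  # Y_AVG = 5.6, SS_TOTAL = 29.20
```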

22
Sum of Squares - Residual
The residual sum of squares, $SS_{RES}$, is calculated by determining all the residuals, squaring them, and summing the squares. It has n - 2 degrees of freedom:
$SS_{RES} = \sum (Y_i - \hat{Y}_i)^2$
where $\hat{Y}_i$ is the fitted or predicted value of Y.
Another procedure for calculating $SS_{RES}$ is:
$SS_{RES} = SS_{TOTAL} - SS_{REG}$

23
Sum of Squares - Regression
The regression sum of squares, $SS_{REG}$, is calculated by determining the deviation of each fitted value from the mean of the observed values, squaring these deviations, and summing the squares. It has 1 degree of freedom:
$SS_{REG} = \sum (\hat{Y}_i - Y_{AVG})^2$
where $\hat{Y}_i$ is the fitted or predicted value of Y.
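The following sketch computes all three sums of squares for the example data. It also checks $SS_{TOTAL} = SS_{REG} + SS_{RES}$, the identity behind the alternative formula for $SS_{RES}$.

Two choices need stating. The slope is computed in its centered form, $\sum (X_i - X_{AVG})(Y_i - Y_{AVG}) / \sum (X_i - X_{AVG})^2$, which is algebraically equivalent to the slope formula B given later in this session. The intercept uses the fact that the least squares line passes through $(X_{AVG}, Y_{AVG})$:

```python
# Sketch: SS_TOTAL, SS_REG and SS_RES for the example data, plus a check that
# SS_TOTAL = SS_REG + SS_RES for a least squares line with an intercept.
x = [1, 2, 3, 5, 7]
y = [4, 3, 4, 8, 9]
n = len(x)

x_avg, y_avg = sum(x) / n, sum(y) / n
sxx = sum((xi - x_avg) ** 2 for xi in x)
sxy = sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
slope = sxy / sxx                    # centered form of the slope formula
intercept = y_avg - slope * x_avg    # line passes through (X_AVG, Y_AVG)
y_hat = [intercept + slope * xi for xi in x]  # fitted values

ss_total = sum((yi - y_avg) ** 2 for yi in y)              # n - 1 degrees of freedom
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # n - 2 degrees of freedom
ss_reg = sum((yh - y_avg) ** 2 for yh in y_hat)            # 1 degree of freedom

print(f"SS_TOTAL = {ss_total:.3f}")  # 29.200
print(f"SS_REG   = {ss_reg:.3f}")    # 25.243
print(f"SS_RES   = {ss_res:.3f}")    # 3.957
assert abs(ss_total - (ss_reg + ss_res)) < 1e-9
```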

24
F Statistic
- When comparing the variances of two different groups (populations), the null hypothesis, $H_0$, is that the variances of the two groups are equal ($\sigma_A^2 = \sigma_B^2$)
- The alternative hypothesis, $H_1$, is that the variances of the two groups are not equal ($\sigma_A^2 \neq \sigma_B^2$)

25
F Statistic
- From the first population, you make $n_A$ observations and calculate the sample variance, $S_A^2$, with $df_A = n_A - 1$
- From the second population, you make $n_B$ observations and calculate the sample variance, $S_B^2$, with $df_B = n_B - 1$

26
F Statistic
- Let $S_{MAX}^2$ and $S_{MIN}^2$ denote the larger and smaller of $S_A^2$ and $S_B^2$, respectively
- Likewise, let $df_{MAX}$ and $df_{MIN}$ denote their respective degrees of freedom
- The test statistic, F, also known as the F ratio or variance ratio, is:
$F = \frac{S_{MAX}^2}{S_{MIN}^2}$

27
F Statistic
- If $H_0$ is correct, the F ratio should not be much larger than 1
- The question is: how large is large?
- For a two-tailed test at the 5% significance level, the critical value is the 0.975 quantile of the F distribution, $f_{0.975}(df_{MAX}, df_{MIN})$; $H_0$ is rejected if F exceeds this value
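Here is a sketch of the F ratio calculation using Python's standard `statistics.variance`, which gives the sample variance with an n - 1 denominator. Both samples are hypothetical illustration data.

The critical value $f_{0.975}(df_{MAX}, df_{MIN})$ is not computed here. It comes from an F table or, if SciPy is available, from `scipy.stats.f.ppf(0.975, df_max, df_min)`:

```python
import statistics

# Sketch: F ratio for comparing two sample variances.
# Both samples are hypothetical illustration data.
sample_a = [10, 12, 14, 16, 18]
sample_b = [11, 12, 13, 14, 15]

var_a = statistics.variance(sample_a)  # sample variance, n - 1 denominator
var_b = statistics.variance(sample_b)
df_a, df_b = len(sample_a) - 1, len(sample_b) - 1

# Put the larger variance in the numerator, so that F >= 1
if var_a >= var_b:
    s2_max, df_max, s2_min, df_min = var_a, df_a, var_b, df_b
else:
    s2_max, df_max, s2_min, df_min = var_b, df_b, var_a, df_a

f_ratio = s2_max / s2_min
print(f"F = {f_ratio:.2f} with (df_MAX, df_MIN) = ({df_max}, {df_min})")
# F = 4.00 with (df_MAX, df_MIN) = (4, 4)
```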

28
Regression Analysis Table

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Regression | $SS_{REG}$ | 1 | $MS_{REG} = SS_{REG}/df$ | $MS_{REG}/MS_{RES}$ |
| Residual | $SS_{RES}$ | n - 2 | $MS_{RES} = SS_{RES}/df$ | |
| Total | $SS_{TOTAL}$ | n - 1 | | |
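The sketch below fills in the table for the example data, using the definitions of the sums of squares and mean squares given for this regression. The column layout of the printed output is an arbitrary choice:

```python
# Sketch: regression analysis table for the example data.
x = [1, 2, 3, 5, 7]
y = [4, 3, 4, 8, 9]
n = len(x)

x_avg, y_avg = sum(x) / n, sum(y) / n
sxx = sum((xi - x_avg) ** 2 for xi in x)
sxy = sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_avg - slope * x_avg
y_hat = [intercept + slope * xi for xi in x]  # fitted values

ss_reg = sum((yh - y_avg) ** 2 for yh in y_hat)
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_total = sum((yi - y_avg) ** 2 for yi in y)

ms_reg = ss_reg / 1          # regression: 1 degree of freedom
ms_res = ss_res / (n - 2)    # residual: n - 2 degrees of freedom
f_stat = ms_reg / ms_res

print(f"{'Source':<11}{'SS':>9}{'df':>4}{'MS':>9}{'F':>8}")
print(f"{'Regression':<11}{ss_reg:>9.3f}{1:>4}{ms_reg:>9.3f}{f_stat:>8.2f}")
print(f"{'Residual':<11}{ss_res:>9.3f}{n - 2:>4}{ms_res:>9.3f}")
print(f"{'Total':<11}{ss_total:>9.3f}{n - 1:>4}")
```

For these data F ≈ 19.14, with 1 and 3 degrees of freedom.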

29
Slope
- The equation for the slope of a regression line is:
$B = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$
- For the data in the example, the slope is B = 1.04
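The formula can be checked against the example data with a short sketch. The running sums are shown in comments:

```python
# Sketch: slope formula B applied to the example data.
x = [1, 2, 3, 5, 7]
y = [4, 3, 4, 8, 9]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                   # 18, 28
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 125
sum_x2 = sum(xi ** 2 for xi in x)               # 88

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 121 / 116
print(f"B = {b:.2f}")  # B = 1.04
```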

30
Regression Analysis
[Figure: regression analysis plot of the sample data]

31
Regression Analysis
A regression analysis of these sample data indicates a slope of 1.04 and an intercept of 1.84 (see next slide), so the fitted line is Y = 1.04X + 1.84.
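Here is a sketch that reproduces these values. The slides give no formula for the intercept, so this sketch assumes the standard closed form $A = (\sum Y - B \sum X)/n$, which makes the line pass through the point of means. The variable names `a` and `b` are illustrative:

```python
# Sketch: slope, intercept and fitted values of the least squares line
# for the example data. The intercept formula is the standard closed form.
x = [1, 2, 3, 5, 7]
y = [4, 3, 4, 8, 9]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = (sum_y - b * sum_x) / n                                   # intercept

print(f"Y = {b:.2f} X + {a:.2f}")  # Y = 1.04 X + 1.84
for xi, yi in zip(x, y):
    y_fit = a + b * xi
    print(f"X = {xi}: observed {yi}, fitted {y_fit:.2f}, residual {yi - y_fit:+.2f}")
```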

32
3/2003 Rev 1 I.2.15-16 – slide 32 of 33 Regression Analysis

