10-1 Chapter 10 Curve Fitting and Regression Analysis Correlation and regression analyses can be used to establish the relationship between two or more variables. Correlation and regression analyses should be preceded by a graphical analysis to determine: 1. if the relationship is linear or nonlinear; 2. if the relationship is direct or indirect (indirect meaning a decrease in Y as X increases); 3. if there are any extreme events that might control the relationship.
10-2 Graphical Analysis 1. Degree of common variation, which is an indication of the degree to which the two variables are related. 2. Range and distribution of the sample data points. 3. Presence of extreme events. 4. Form of the relationship between the two variables. 5. Type of relationship.
10-3 Figure: Different Degrees of Correlation between Variables (X and Y): (a) R = 1.0; (b) R = 0.5; (c) R = 0.0; (d) R = -0.5; (e) R = -1.0; (f) R = 0.3. Note |R|=1 means “very high”. |R|=0 means “very low”.
10-4 Figure: Instability in the Relationship between Random Variables.
10-5 Figure: Effect of an Extreme Event in a Data Sample on the Correlation: (a) High Correlation; (b) Low Correlation.
10-6 Correlation Analysis Correlation is the degree of association between two variables. Correlation analysis provides a quantitative index of the degree to which one or more variables can be used to predict the values of another variable. Correlation analyses are most often performed after an equation relating one variable to another has been developed.
10-7 Separation of Variation TV = EV + UV, where TV is the total variation (the sum of the squares of the sample data points about the mean of the data points), EV is the explained variation, and UV is the unexplained variation. It can be represented by $\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
10-8 Figure: Separation of Variation: (a) Total Variation; (b) Explained Variation; (c) Unexplained Variation.
10-9 Correlation: Fraction of Explained Variation $R^2 = EV / TV$, where R is the correlation coefficient. If the explained variation equals the total variation (that is, the estimated values fit the data points exactly), |R| = 1: R = 1 for a direct relationship and R = -1 for an inverse relationship. If R = 0, it is called the null correlation: there is no linear association between X and Y.
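
The separation of variation and the resulting correlation coefficient can be checked numerically. The sketch below assumes the usual definition R^2 = EV/TV and uses a small made-up data set (not from the slides); the least-squares line is used only to produce predicted values.

```python
import math

# Illustrative data (hypothetical, not taken from the presentation).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares line, needed to obtain the predicted values y_hat.
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_mean - b1 * x_mean
y_hat = [b0 + b1 * xi for xi in x]

tv = sum((yi - y_mean) ** 2 for yi in y)               # total variation
ev = sum((yh - y_mean) ** 2 for yh in y_hat)           # explained variation
uv = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation

r_squared = ev / tv
# R takes the sign of the slope: positive for direct, negative for inverse.
r = math.copysign(math.sqrt(r_squared), b1)

print(f"TV={tv:.4f}  EV={ev:.4f}  UV={uv:.4f}  R={r:.4f}")
```

For a least-squares line with an intercept, TV and EV + UV agree to rounding error; for other fitted curves the identity need not hold exactly.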
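10-10 Zero-intercept Model Linear model with zero intercept: $\hat{Y} = bX$. Objective function F: minimize the sum of squares of the differences ($e_i$) between the predicted values ($\hat{Y}_i$) and the measured values ($Y_i$) of Y: $F = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2 = \min \sum_{i=1}^{n} (bX_i - Y_i)^2$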
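
For the zero-intercept model, setting dF/db = 0 gives the closed-form slope b = sum(X_i Y_i) / sum(X_i^2). A minimal sketch follows; the function name `fit_zero_intercept` and the data are illustrative assumptions, not taken from the slides.

```python
def fit_zero_intercept(x, y):
    """Return the slope b that minimizes sum((b * x_i - y_i) ** 2)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all X values are zero; slope is undefined")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def sum_squared_errors(b, x, y):
    """Objective function F for a given slope b."""
    return sum((b * xi - yi) ** 2 for xi, yi in zip(x, y))


# Illustrative data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]

b = fit_zero_intercept(x, y)
print(f"b = {b:.4f}, F = {sum_squared_errors(b, x, y):.4f}")
```

Perturbing b in either direction should increase F, which is a quick way to confirm that the computed slope is the minimizer.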
10-12 Example: Fitting a Zero-intercept Model Data set Perform the necessary computations:
10-13 Introduction to Regression Elements of statistical optimization: (1) An objective function: to define what is meant by best fit. (2) A mathematical model: an explicit function relating a criterion variable (dependent variable) to unknowns and predictor variables (independent variables). (3) A data set: a matrix of measured data.
10-14 Linear bivariate model: $\hat{Y} = b_0 + b_1 X$, where $b_0$ = the intercept coefficient and $b_1$ = the slope coefficient; $b_0$ and $b_1$ are called regression coefficients. Linear multivariate model, relating a criterion variable to two or more predictor variables: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$, where p = number of predictor variables, $X_i$ = ith predictor variable, $b_i$ = ith slope coefficient, $b_0$ = intercept coefficient, and i = 1, 2, ..., p.
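10-15 Principle of Least Squares It is a regression method. Error: $e_i = \hat{Y}_i - Y_i$. Objective function: $F = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2$. By the linear model: $F = \min \sum_{i=1}^{n} (b_0 + b_1 X_i - Y_i)^2$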
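
The minimization can be carried one step further. Assuming the bivariate model with intercept b_0 and slope b_1 and the error defined as predicted minus measured, setting the partial derivatives of F to zero gives the two normal equations. This is the standard textbook derivation, shown here as a sketch rather than reproduced from the slide:

```latex
\begin{align*}
\frac{\partial F}{\partial b_0} &= 2\sum_{i=1}^{n} (b_0 + b_1 X_i - Y_i) = 0
  &&\Rightarrow\quad n\,b_0 + b_1 \sum X_i = \sum Y_i \\
\frac{\partial F}{\partial b_1} &= 2\sum_{i=1}^{n} (b_0 + b_1 X_i - Y_i)\,X_i = 0
  &&\Rightarrow\quad b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i \\[4pt]
b_1 &= \frac{\sum X_i Y_i - \frac{1}{n}\sum X_i \sum Y_i}
            {\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2},
  &&\qquad b_0 = \bar{Y} - b_1 \bar{X}
\end{align*}
```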
10-16 In solving a linear model that includes an intercept coefficient using the least-squares principle, the sum of the errors always equals zero.
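
The zero-sum property of the errors can be verified directly. The sketch below fits a bivariate line by least squares to made-up data (hypothetical, not from the slides) and, for contrast, also fits a zero-intercept model, for which the property is not guaranteed.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y_hat = b0 + b1 * x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b1 = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    b0 = sy / n - b1 * sx / n
    return b0, b1


# Illustrative data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.2, 4.1, 6.3, 6.9, 9.0]

b0, b1 = fit_line(x, y)
errors = [b0 + b1 * xi - yi for xi, yi in zip(x, y)]  # predicted minus measured
sum_errors = sum(errors)

# Zero-intercept fit for comparison: the errors need not sum to zero.
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
sum_errors_zero_intercept = sum(b * xi - yi for xi, yi in zip(x, y))

print(f"with intercept:    sum of errors = {sum_errors:.2e}")
print(f"without intercept: sum of errors = {sum_errors_zero_intercept:.4f}")
```

The property follows from the first normal equation, which only exists when the model has an intercept term.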
10-17 Example: Least-squares Analysis
10-18 Reliability of the Regression Equation 1. Correlation coefficient (least-squares principle). 2. Standard error of estimate. 3. The rationality of the coefficients and the relative importance of the predictor variables (can be assessed using the standardized partial regression coefficients).
10-19 Standard Error of Estimate Standard deviation of Y: $S_Y = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}$. Standard error of estimate = standard deviation of the errors: $S_e = \sqrt{\frac{1}{\nu} \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2}$, where $\nu$ = degrees of freedom, the sample size minus the number of unknowns. For the bivariate model, p = 1 and $\nu = n - 2$. For the general linear model, $\nu = n - p - 1$. When a regression equation fits the data points exactly, $S_e = 0$.
10-20 Computation of $S_e$: Approximation of $S_e$:
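
A minimal sketch of the standard error of estimate, assuming the degrees of freedom are n - p - 1 as stated on the slide. The function name `standard_error_of_estimate` and the measured and predicted values are illustrative assumptions.

```python
import math


def standard_error_of_estimate(y, y_hat, p):
    """Standard deviation of the errors with n - p - 1 degrees of freedom.

    y      measured values
    y_hat  predicted values from a fitted linear model
    p      number of predictor variables in that model
    """
    n = len(y)
    dof = n - p - 1
    if dof <= 0:
        raise ValueError("need more observations than unknowns")
    sse = sum((yh - yi) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / dof)


# Illustrative measured and predicted values (hypothetical).
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.1, 3.9]

s_e = standard_error_of_estimate(y, y_hat, p=1)
s_e_exact_fit = standard_error_of_estimate(y, y, p=1)  # perfect fit
print(f"S_e = {s_e:.4f}, exact fit S_e = {s_e_exact_fit:.1f}")
```

Comparing S_e with the standard deviation of Y indicates how much the regression reduces the spread of the prediction error relative to simply using the mean.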
10-21 Standardized Partial Regression Coefficients $t = b_1 S_X / S_Y$, where t is called the standardized partial regression coefficient, and $S_X$ and $S_Y$ are the standard deviations of the predictor and criterion variables, respectively. For the bivariate linear model, t = R.
10-22 Example: Linear Model with Least-Squares Principle A linear equation can be obtained:
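
The equality of t and R in the bivariate case can be confirmed numerically. This sketch assumes the usual definition t = b_1 * S_X / S_Y and uses made-up data (hypothetical, not from the slides).

```python
import math

# Illustrative data (hypothetical).
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.5, 3.9, 4.2, 7.1, 8.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

b1 = sxy / sxx                     # least-squares slope
s_x = math.sqrt(sxx / (n - 1))     # standard deviation of the predictor
s_y = math.sqrt(syy / (n - 1))     # standard deviation of the criterion

t = b1 * s_x / s_y                 # standardized partial regression coefficient
r = sxy / math.sqrt(sxx * syy)     # Pearson correlation coefficient

print(f"t = {t:.6f}, R = {r:.6f}")
```

With more than one predictor the two quantities generally differ, so the check applies only to the bivariate model.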
10-23 Figure: Linear model with least-squares principle. Standard error of estimate: S e = Correlation coefficient R=0.945.
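10-24 Multiple Linear Model The model: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$, where p = number of predictor variables, $X_i$ = ith predictor variable, $b_i$ = partial regression coefficient, $b_0$ = intercept coefficient, and Y = criterion variable. It is better that $n \ge 4p$, where n is the number of observations.
10-25 The objective function (least-squares principle): $F = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2$. Consider p = 2: $F = \min \sum_{i=1}^{n} (b_0 + b_1 X_{1i} + b_2 X_{2i} - Y_i)^2$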
10-26 Solving the 3 simultaneous equations, we can get the values of $b_0$, $b_1$, and $b_2$.
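
For p = 2, the three simultaneous equations are the normal equations obtained by setting the partial derivatives of F with respect to b_0, b_1, and b_2 to zero. The sketch below builds and solves them with plain Gaussian elimination. The helper names and the synthetic data (generated exactly from Y = 1 + 2 X_1 + 3 X_2) are illustrative assumptions.

```python
def solve_linear_system(a, rhs):
    """Solve a * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def fit_two_predictors(x1, x2, y):
    """Return [b0, b1, b2] from the normal equations for p = 2."""
    n = len(y)
    a = [
        [n,       sum(x1),     sum(x2)],
        [sum(x1), dot(x1, x1), dot(x1, x2)],
        [sum(x2), dot(x1, x2), dot(x2, x2)],
    ]
    rhs = [sum(y), dot(x1, y), dot(x2, y)]
    return solve_linear_system(a, rhs)


# Synthetic data (hypothetical): y is exactly 1 + 2*x1 + 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * u + 3.0 * v for u, v in zip(x1, x2)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

Because the data contain no noise, the fit should recover the generating coefficients to rounding error; with real data a library least-squares routine is usually preferable to hand-rolled elimination.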
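10-27 Common Nonlinear Models 1. Bivariate a. Polynomial: $\hat{Y} = b_0 + b_1 X + b_2 X^2 + \dots + b_p X^p$ b. Power: $\hat{Y} = b_0 X^{b_1}$ 2. Multivariate a. Polynomial: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1^2 + b_4 X_2^2 + b_5 X_1 X_2$ b. Power: $\hat{Y} = b_0 X_1^{b_1} X_2^{b_2} \cdots X_p^{b_p}$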
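10-28 Nonlinear Polynomial Models For the bivariate polynomial $\hat{Y} = b_0 + b_1 X + b_2 X^2 + \dots + b_p X^p$, a new set of predictor variables: $W_j = X^j$, j = 1, 2, ..., p. This results in the model: $\hat{Y} = b_0 + \sum_{j=1}^{p} b_j W_j$. The coefficients $b_j$ can be estimated using a standard linear multiple regression analysis.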
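
The transformation to a multiple linear model can be sketched for a second-order polynomial. The code assumes new predictors W_1 = X and W_2 = X^2 and solves the resulting normal equations; the helper names and the synthetic data (generated from Y = 1 + 0.5 X + 2 X^2) are illustrative assumptions.

```python
def solve_linear_system(a, rhs):
    """Solve a * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_linear(predictors, y):
    """Least-squares [b0, b1, ..., bp] for a list of predictor columns."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in predictors]  # leading column for b0
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve_linear_system(a, rhs)


# Synthetic data (hypothetical): y is exactly 1 + 0.5*x + 2*x**2.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 0.5 * xi + 2.0 * xi ** 2 for xi in x]

# New predictor variables W_j = X**j.
w1 = x
w2 = [xi ** 2 for xi in x]

b0, b1, b2 = fit_multiple_linear([w1, w2], y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

The model is nonlinear in X but linear in the coefficients, which is why the linear machinery applies unchanged. Powers of X are strongly correlated, so high-order fits can become numerically ill-conditioned.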
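10-29 For the multivariate polynomial $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1^2 + b_4 X_2^2 + b_5 X_1 X_2$, a new set of predictor variables: $W_1 = X_1$, $W_2 = X_2$, $W_3 = X_1^2$, $W_4 = X_2^2$, $W_5 = X_1 X_2$. The revised model: $\hat{Y} = b_0 + \sum_{j=1}^{5} b_j W_j$. This is also a multiple linear model.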
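10-30 Power Models Taking logarithms of the bivariate power model $\hat{Y} = b_0 X^{b_1}$ gives a new set of variables: $Z = \log Y$, $c = \log b_0$, $W = \log X$. This results in the bivariate linear model: $\hat{Z} = c + b_1 W$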
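
The logarithmic transformation can be sketched as follows, assuming the bivariate power model Y = b_0 * X^(b_1) with strictly positive X and Y. The function name `fit_power_model` and the synthetic data (generated from Y = 2 X^1.5) are illustrative assumptions.

```python
import math


def fit_power_model(x, y):
    """Fit y = b0 * x**b1 by least squares on the log-transformed variables."""
    w = [math.log(xi) for xi in x]   # W = log X
    z = [math.log(yi) for yi in y]   # Z = log Y
    n = len(w)
    w_mean = sum(w) / n
    z_mean = sum(z) / n
    b1 = (sum((wi - w_mean) * (zi - z_mean) for wi, zi in zip(w, z))
          / sum((wi - w_mean) ** 2 for wi in w))
    c = z_mean - b1 * w_mean         # c = log b0
    return math.exp(c), b1


# Synthetic data (hypothetical): y is exactly 2 * x**1.5.
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0 * xi ** 1.5 for xi in x]

b0, b1 = fit_power_model(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The fit minimizes the squared errors of log Y rather than of Y itself, so with noisy data the back-transformed equation is not the least-squares solution in the original units.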
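10-31 For the multivariate power model $\hat{Y} = b_0 X_1^{b_1} X_2^{b_2} \cdots X_p^{b_p}$, a new set of criterion and predictor variables: $Z = \log Y$, $c = \log b_0$, $W_i = \log X_i$, i = 1, 2, ..., p. The resulting model: $\hat{Z} = c + \sum_{i=1}^{p} b_i W_i$. This is also a multiple linear model.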
10-32 Additional Model Forms It can be transformed into: It can be easily solved by letting