# Use of regression analysis

## Presentation transcript

Use of regression analysis

Regression analysis:
- relation between dependent variable Y and one or more independent variables Xi

Use of regression model in general:
- making forecasts/predictions/estimates for Y
- investigation of functional relationship between Y and Xi
- filling-in missing data in Y-series
- validation of Y-series

Use of regression model in data processing:
- validation and in-filling of missing data using a relation curve and of discharges using a rainfall-runoff (RR) relation
- transformation of water levels to discharges using a power type regression equation
- estimation of rainfall/climatic variable on a catchment grid like in kriging

OHS - 1

Linear and non-linear regression equations

Linear regression:
- simple linear regression (i = 1)
- multiple and stepwise regression (i > 1)
  - in stepwise regression the independent variables enter the model one by one, based on the largest reduction of unexplained variance (free variables); forced variables always enter the model

Non-linear regression

OHS - 2

Suitable regression model

Model depends on:
- variables considered
- physics of the processes
- range of the data of interest

A non-linear relation may well be described by a linear regression equation within a particular range of the variables in regression:
- annual rainfall-runoff relation is in principle non-linear, but:
  - for low rainfall, abstractions vary strongly due to evaporation
  - for very high rainfall, evaporation has reached its potential and is almost constant
  - within a limited range, the assumption of linearity is often suitable

OHS - 3

[Figure: general form of relation between annual rainfall and runoff; labels: "Evaporation", "Runoff = Rainfall"]

OHS - 4

Use of regression model for discharge validation

Steps:
- develop regression model where runoff/discharge is regressed on rainfall: $Q_t = f(P_t, P_{t-1}, \ldots)$
- test stationarity of the relationship by investigating the time-wise behaviour of the residuals
- if rainfall is error free, deviations from stationarity may be due to:
  - change in drainage characteristics
  - incorrect runoff data due to errors in the water level data and/or in the stage-discharge relation
- visualise non-stationarity by double mass analysis of observed discharge and discharge computed via regression

OHS - 5

ii ŶiŶi Residual = part of Y not explained by regression Part of Y explained by regression Distribution of residuals Simple linear regression model Ŷ =  +  X Y =  +  X +  Y - Y =   Y 2 =  Y 2 +   2 Ŷ =  +  X Y =  +  X +  Y - Y =   Y 2 =  Y 2 +   2 Total variance = explained variance + unexplained variance Ŷ =  +  X OHS - 6

[Figure: 3-D plot of monthly rainfall; axes: Years, Months; annotation: direction of data vector for regression analysis (direction for parameter estimation)]

OHS - 7

Estimation of regression coefficients

- Minimising the sum of squared errors M to obtain the least squares estimators
- First derivatives of M with respect to a and b set to zero: normal equations
- Solutions for b and a

OHS - 8
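The formulas of this slide are not preserved in the transcript. A textbook reconstruction for the simple linear model $\hat{Y} = a + bX$ is sketched below. It assumes that M is the sum of squared errors over n observations $(x_i, y_i)$; the index notation is an assumption and may differ from the original slide.

```latex
\begin{aligned}
M &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial M}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0 \\
\frac{\partial M}{\partial b} &= -2 \sum_{i=1}^{n} x_i \, (y_i - a - b x_i) = 0 \\
\text{normal equations:}\quad & n a + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad
a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \\
b &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
a = \bar{y} - b \, \bar{x}
\end{aligned}
```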

Measure for goodness of fit

Other forms of regression equation:

$$\hat{Y} - \bar{Y} = b\,(X - \bar{X})$$

or, with correlation coefficient $r = S_{XY} / (\sigma_X \sigma_Y)$:

$$\hat{Y} - \bar{Y} = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{X})$$

By squaring the previous equation and averaging, the explained variance is $r^2 \sigma_Y^2$, so the unexplained variance is:

$$\sigma_\varepsilon^2 = \sigma_Y^2\,(1 - r^2)$$

- $r^2$ = coefficient of determination
- $r^2$ is a measure for the quality of the regression fit

NOTE: A high $r^2$ is not sufficient; the behaviour of the residual about the regression line and its development with time are also extremely important.

OHS - 9
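The relation between residual variance and correlation coefficient can be checked numerically. The sketch below uses hypothetical rainfall and runoff values (not data from the presentation) and an illustrative helper `fit_line`; variances are computed by dividing by n, which is the convention under which the identity holds exactly.

```python
from math import sqrt

def fit_line(x, y):
    """Least squares fit of y = a + b*x; returns (a, b, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / sqrt(sxx * syy)
    return a, b, r

# Hypothetical annual rainfall and runoff (mm), for illustration only.
rain = [800.0, 950.0, 1100.0, 1250.0, 1400.0, 1550.0]
runoff = [210.0, 300.0, 420.0, 480.0, 610.0, 700.0]

a, b, r = fit_line(rain, runoff)
residuals = [q - (a + b * p) for p, q in zip(rain, runoff)]

n = len(runoff)
mean_q = sum(runoff) / n
var_y = sum((q - mean_q) ** 2 for q in runoff) / n   # total variance
var_e = sum(e ** 2 for e in residuals) / n           # unexplained variance

print(f"r^2 = {r ** 2:.4f}")
print(f"residual variance = {var_e:.2f}, var_y * (1 - r^2) = {var_y * (1 - r ** 2):.2f}")
```

The two printed variances agree up to rounding, but as the note on the slide says, this number alone says nothing about how the residuals behave along the line or in time.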

Confidence limits

- Error variance
- Confidence limits regression line
- Confidence limits prediction

MIND THE DIFFERENCE

OHS - 10
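The formulas of this slide are not preserved in the transcript. The sketch below uses the common textbook forms for simple linear regression, which are an assumption about what the slide showed: error variance $s_e^2 = \sum (y_i - \hat{y}_i)^2 / (n - 2)$, limits for the regression line with the factor $\sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$, and limits for a prediction with an extra 1 under the root. The function name, the data and the Student t value passed in by the caller are illustrative.

```python
from math import sqrt

def regression_limits(x, y, x0, t_value):
    """Return (estimate at x0, half-width for the regression line,
    half-width for a single prediction) for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))                      # standard error of estimate
    spread = 1.0 / n + (x0 - mean_x) ** 2 / sxx
    half_line = t_value * s_e * sqrt(spread)        # uncertainty of the mean response
    half_pred = t_value * s_e * sqrt(1.0 + spread)  # uncertainty of one new value
    return a + b * x0, half_line, half_pred

# Hypothetical data (mm); 2.776 is the two-sided 95% Student t value for 4 degrees of freedom.
rain = [800.0, 950.0, 1100.0, 1250.0, 1400.0, 1550.0]
runoff = [210.0, 300.0, 420.0, 480.0, 610.0, 700.0]
T_95_4DF = 2.776

y_mid, line_mid, pred_mid = regression_limits(rain, runoff, 1175.0, T_95_4DF)
y_far, line_far, pred_far = regression_limits(rain, runoff, 2000.0, T_95_4DF)
print(f"at mean rainfall: {y_mid:.0f} +/- {line_mid:.0f} (line), +/- {pred_mid:.0f} (prediction)")
print(f"at 2000 mm:       {y_far:.0f} +/- {line_far:.0f} (line), +/- {pred_far:.0f} (prediction)")
```

The prediction limits are always wider than the limits of the regression line, and both widen away from the mean of X.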

Application of regression analysis for data validation

17 years of annual rainfall and runoff data

Procedure:
- Plotting of time series
- Fitting of regression equation R = f(P)
- Plot of residual versus P
- Plot of residual versus time
- Plot of accumulated residual with time
- Double mass analysis of observed versus regression based runoff
- Adjustment of runoff data
- Repetition of the above procedure and comparison with the earlier results
- Comparison of coefficients of determination
- Computation of confidence limits about the regression and for prediction

OHS - 11
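The residual-based steps of this procedure can be sketched as follows. The series is synthetic and hypothetical, not the 1961-1977 record of the presentation: runoff follows a linear relation with rainfall, and the last five values are inflated by 30% to imitate a break in the measurements. All names are illustrative.

```python
from itertools import accumulate

def fit_line(x, y):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Synthetic example: 10 years of rainfall (mm) and runoff with a break after year 5.
rain = [900.0, 1100.0, 1000.0, 1200.0, 800.0, 1000.0, 800.0, 1200.0, 900.0, 1100.0]
true_runoff = [0.5 * p - 200.0 for p in rain]
observed = [q if i < 5 else 1.3 * q for i, q in enumerate(true_runoff)]

a, b = fit_line(rain, observed)
computed = [a + b * p for p in rain]

# Residual versus time and accumulated residual with time.
residuals = [o - c for o, c in zip(observed, computed)]
accumulated = list(accumulate(residuals))

# Double mass analysis: cumulative computed runoff against cumulative observed runoff.
double_mass = list(zip(accumulate(computed), accumulate(observed)))

# In this synthetic case the accumulated residual reaches its minimum in the
# last year before the break and then returns to zero.
break_index = accumulated.index(min(accumulated))
print("accumulated residuals:", [round(v, 1) for v in accumulated])
print("last year before the break (0-based index):", break_index)
```

A change of sign in the residuals that persists in time, or a kink in the double mass line, points to a non-stationary relation; with real data the break is rarely as clean as in this example.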

Rainfall-runoff record 1961-1977 OHS - 12

Regression fit rainfall-runoff OHS - 13

Plot of residual versus rainfall OHS - 14

Plot of residual versus time OHS - 15

Plot of accumulated residual OHS - 16

Double mass analysis of observed versus computed runoff (annotation: break in measured runoff) OHS - 17

Plot of rainfall versus corrected runoff OHS - 18

Plot of rainfall-corrected runoff regression OHS - 19

Plot of residual (corrected) versus rainfall OHS - 20

Plot of residual (corrected) versus time OHS - 21

Plot of regression line with confidence limits OHS - 22

Extrapolation

Extrapolation of a regression equation beyond the measured range of X to obtain a value of Y is not recommended:
- confidence intervals become large
- relation Y = f(X) may be non-linear for the full range of X
- extrapolate only if there is evidence that the relation is applicable

OHS - 23

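Multiple linear regression models

Model for monthly rainfall:

$$R(t) = \alpha + \beta_1 P(t) + \beta_2 P(t-1) + \ldots$$

General linear model:

$$Y = \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \varepsilon$$

Matrix form:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where (Y and X centered about the mean):
- $\mathbf{Y}$ = (n x 1) data vector of $(y_i - \bar{y})$
- $\mathbf{X}$ = (n x p) data matrix of $(x_{i1} - \bar{x}_1), \ldots, (x_{ip} - \bar{x}_p)$
- $\boldsymbol{\beta}$ = (p x 1) column vector of regression coefficients
- $\boldsymbol{\varepsilon}$ = (n x 1) column vector of residuals

OHS - 24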

Estimation of regression coefficients Minimisation of residual sum of squares  T  :  YX  YX   T  = (Y - X  ) T (Y - X  )   bDifferentiating with respect to  and replacing  by its estimate b normal equations: XXbXY X T Xb = X T Y bFor b it follows: bXXXY b = (X T X) -1 X T Y b  with: E[b] =  b =   2 (XX) Cov(b) =   2 (X T X) -1 OHS - 25

Analysis of variance table (ANOVA)

Total sum of squares about the mean = regression sum of squares + residual sum of squares

Coefficient of determination: $R_m^2 = S_R / S_Y = 1 - S_e / S_Y$

OHS - 26

Coefficient of determination

From the ANOVA table, the coefficient of determination $R_m^2$:

$$R_m^2 = S_R / S_Y = 1 - S_e / S_Y$$

Coefficient of determination adjusted for the number of independent variables in regression, $R_{ma}^2$:

$$R_{ma}^2 = 1 - MS_e / MS_Y = 1 - (1 - R_m^2)\,\frac{n - 1}{n - p - 1}$$

OHS - 27
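Both coefficients follow directly from the sums of squares. The sketch below uses a hypothetical helper `determination` and made-up observed and fitted values; `p` is the number of independent variables.

```python
def determination(y, y_fitted, p):
    """Return (R_m^2, R_ma^2) from observed values, fitted values and
    the number of independent variables p."""
    n = len(y)
    mean_y = sum(y) / n
    s_y = sum((v - mean_y) ** 2 for v in y)              # total sum of squares
    s_e = sum((v - f) ** 2 for v, f in zip(y, y_fitted))  # residual sum of squares
    r2 = 1.0 - s_e / s_y
    r2_adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, r2_adjusted

# Hypothetical observed and fitted values, one independent variable.
observed = [10.0, 12.0, 15.0, 19.0, 24.0]
fitted = [10.5, 12.0, 14.5, 19.5, 23.5]

r2, r2_adjusted = determination(observed, fitted, p=1)
print(f"R_m^2 = {r2:.4f}, R_ma^2 = {r2_adjusted:.4f}")
```

The adjusted value is never larger than the unadjusted one, and the gap grows as p approaches n.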

Comments

Points of concern in using multiple regression:
- can a linear model be used
- what independent variables should be included

Independent variables may be mutually correlated:
- investigate through the correlation matrix

Retaining variables in regression that are highly correlated complicates interpretation of the regression coefficients, which may take physically nonsensical values.

Apply stepwise regression to select the "best" regression equation.

In stepwise regression a distinction can be made between "free" and "forced" variables:
- free variables may enter the regression, dependent on correlation
- forced variables will enter the regression irrespective of correlation

OHS - 28

Non-linear models

By transformation, non-linear models can be transformed to linear models, e.g. $Y = \alpha X^{\beta}$ to:

$$\ln Y = \ln \alpha + \beta \ln X$$

or:

$$Y_T = \alpha_T + \beta_T X_T$$

where: $Y_T = \ln Y$, $X_T = \ln X$, $\alpha_T = \ln \alpha$, $\beta_T = \beta$

Remarks:
- The transformed residual sum of squares is minimised rather than the residual sum of squares
- Error term is additive in the transformed state, i.e. multiplicative in the power model: $\varepsilon_T = \ln \varepsilon$

OHS - 29
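The log transformation can be sketched as below: a simple linear regression on (ln X, ln Y), followed by back-transformation of the intercept. The function name and the data are hypothetical; Y is generated from $\alpha = 5$ and $\beta = 1.5$ without error so that the fit can be checked.

```python
from math import exp, log

def fit_power_law(x, y):
    """Fit Y = alpha * X**beta by least squares on the log-transformed data.
    Note: this minimises the squared residuals of ln Y, not of Y."""
    xt = [log(v) for v in x]
    yt = [log(v) for v in y]
    n = len(xt)
    mean_x = sum(xt) / n
    mean_y = sum(yt) / n
    beta = (sum((u - mean_x) * (v - mean_y) for u, v in zip(xt, yt))
            / sum((u - mean_x) ** 2 for u in xt))
    alpha = exp(mean_y - beta * mean_x)  # alpha_T = ln(alpha), so back-transform
    return alpha, beta

# Hypothetical, error-free data following Y = 5 * X**1.5.
x_values = [0.5, 1.0, 2.0, 4.0]
y_values = [5.0 * v ** 1.5 for v in x_values]

alpha, beta = fit_power_law(x_values, y_values)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

All X and Y values must be positive for the logarithm to exist. With noisy data the coefficients differ from those that minimise the untransformed residual sum of squares, as the first remark on the slide states.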

Filling-in missing data

- Filling-in of missing water level and rainfall data: see previous modules
- Filling-in of discharge data using a regression relation with rainfall is often suitable for monthly, seasonal or annual data

Monthly regression model (regression model for month k, computing values for Q in year m), e.g.:

$$Q_{k,m} = a_k + b_{1k} P_{k,m} + b_{2k} P_{k-1,m} + s_{e,k}\, e$$

Addition of random component, yes or no:
- Note: E[e] = 0, hence for a single value no random component
- For longer in-filling: could be considered, dependent on use, as no addition reduces the variance of the series

OHS - 30
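The monthly model with an optional random component can be sketched as follows. The function name, parameter values and rainfall figures are hypothetical, and e is drawn from a standard normal distribution, which is an assumption: the slide only states E[e] = 0.

```python
import random

def fill_discharge(a_k, b1_k, b2_k, p_current, p_previous, s_e_k=0.0, rng=None):
    """Estimate Q for month k from rainfall in month k and month k-1.
    Without rng the deterministic regression estimate is returned; with rng
    a random component s_e_k * e is added (e assumed standard normal)."""
    estimate = a_k + b1_k * p_current + b2_k * p_previous
    if rng is None:
        return estimate
    return estimate + s_e_k * rng.gauss(0.0, 1.0)

# Single missing value: no random component.
single = fill_discharge(10.0, 0.4, 0.1, p_current=200.0, p_previous=150.0)

# Longer gap: a seeded generator keeps the filled series reproducible.
noisy_first = fill_discharge(10.0, 0.4, 0.1, 200.0, 150.0, s_e_k=12.0, rng=random.Random(1))
noisy_again = fill_discharge(10.0, 0.4, 0.1, 200.0, 150.0, s_e_k=12.0, rng=random.Random(1))

print("deterministic estimate:", single)
print("with random component: ", round(noisy_first, 2))
```

Whether to add the random component depends on the use of the filled series: leaving it out gives the expected value, at the cost of a reduced variance of the series.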

Type of regression model for filling-in missing flows

Previously the following rainfall-discharge relation was proposed:

$$Q_{k,m} = a_k + b_{1k} P_{k,m} + b_{2k} P_{k-1,m} + s_{e,k}\, e$$

Often regression coefficients do not vary much from month to month, but rather with wetness of the month. Two sets of parameters are then used in a regression model for all or a number of months:
- one set for dry conditions
- another set for wet conditions

In the latter approach the non-linear relationship is fitted by two linear models.

OHS - 31
