Experimental design and analyses of experimental data Lesson 2 Fitting a model to data and estimating its parameters.


1 Experimental design and analyses of experimental data Lesson 2 Fitting a model to data and estimating its parameters

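2 (-2,16) (-1,7) (0,4) (1,6) (2,10)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, where $x_1 = x$ and $x_2 = x_1^2$

3 (-2,16) (-1,7) (0,4) (1,6) (2,10)
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$, where $x_1 = x$ and $x_2 = x_1^2$
$\varepsilon_i$ is the residual for the ith observation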

4 The best fit of a model is the one that minimizes the sum of squared deviations between observed and predicted values, i.e. the one that minimizes $\sum_i \varepsilon_i^2 = \sum_i (y_i - \hat{y}_i)^2$
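In matrix notation, the least-squares criterion on this slide leads to the normal equations. The derivation below is a standard sketch; the symbol $Q$ for the sum of squares is introduced here for illustration and does not come from the slides.

```latex
\begin{aligned}
Q(\beta) &= \sum_{i=1}^{n} \varepsilon_i^2 = (y - X\beta)'(y - X\beta) \\
\frac{\partial Q}{\partial \beta} &= -2X'(y - X\beta) = 0 \\
X'X\hat{\beta} &= X'y \\
\hat{\beta} &= (X'X)^{-1}X'y
\end{aligned}
```

The last line requires $X'X$ to be invertible, which holds when the columns of $X$ are linearly independent.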

5 How to do the calculations
where $x_1 = x$ and $x_2 = x_1^2$
(x,y) = (-2,16) => $y = \beta_0(1) + \beta_1(-2) + \beta_2(4) + \varepsilon = 16$
(x,y) = (-1,7) => $y = \beta_0(1) + \beta_1(-1) + \beta_2(1) + \varepsilon = 7$
(x,y) = (0,4) => $y = \beta_0(1) + \beta_1(0) + \beta_2(0) + \varepsilon = 4$
(x,y) = (1,6) => $y = \beta_0(1) + \beta_1(1) + \beta_2(1) + \varepsilon = 6$
(x,y) = (2,10) => $y = \beta_0(1) + \beta_1(2) + \beta_2(4) + \varepsilon = 10$
x0  x1  x2   y
 1  -2   4  16
 1  -1   1   7
 1   0   0   4
 1   1   1   6
 1   2   4  10
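The five equations above can be collected into a design matrix X (columns x0, x1, x2) and a response vector y. A minimal Python sketch, assuming plain lists as a stand-in for matrices; the slides themselves use SAS, and the variable names are illustrative.

```python
# The five observations (x, y) from the slide.
data = [(-2, 16), (-1, 7), (0, 4), (1, 6), (2, 10)]

# Each row of X holds x0 = 1 (intercept), x1 = x and x2 = x**2.
X = [[1, x, x**2] for x, _ in data]
y = [obs for _, obs in data]

for row, obs in zip(X, y):
    print(row, obs)
```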

6 Transposed X matrix
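The transposed matrix X' is needed to form X'X and X'y, the two quantities that enter the least-squares solution. A small Python sketch for the five data points, assuming list-of-lists matrices (names are illustrative, not from the slides):

```python
X = [[1, -2, 4], [1, -1, 1], [1, 0, 0], [1, 1, 1], [1, 2, 4]]
y = [16, 7, 4, 6, 10]

# Transpose: the columns of X become the rows of Xt.
Xt = [list(col) for col in zip(*X)]

# X'X: inner products between every pair of columns of X.
XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in Xt] for ci in Xt]

# X'y: inner product of every column of X with y.
Xty = [sum(a * b for a, b in zip(ci, y)) for ci in Xt]

print(Xt)
print(XtX)
print(Xty)
```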

7

8

9

10

11

12

13

14 Inverse X’X matrix

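15 $(X'X)^{-1}$ is called the inverse matrix of $X'X$. It is defined as the matrix that satisfies $(X'X)^{-1}(X'X) = (X'X)(X'X)^{-1} = I$, where $I$ is the identity matrix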
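To illustrate the inverse, the sketch below inverts X'X for the five data points by Gauss-Jordan elimination and then forms the estimates as the inverse times X'y. This is one possible way to do the arithmetic, not necessarily the method shown on the slides; exact fractions are used so that the check against the identity matrix is free of rounding error.

```python
from fractions import Fraction

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        # Swap a row with a nonzero pivot into place (StopIteration if singular).
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in a]

XtX = [[5, 0, 10], [0, 10, 0], [10, 0, 34]]   # X'X for the five data points
Xty = [43, -13, 117]                          # X'y for the five data points

XtX_inv = inverse(XtX)
beta = [sum(c * v for c, v in zip(row, Xty)) for row in XtX_inv]

# Check the defining property: (X'X)^-1 (X'X) = I.
product = [[sum(XtX_inv[i][k] * XtX[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]

print([float(b) for b in beta])
```

The estimate for the linear term comes out as -13/10 = -1.3.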

16

17 Variance-covariance matrix

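18 Estimation of residual variance ($s^2$)
$s^2 = SSE / (n - p)$, where SSE is the Sum of Squared Errors and $n - p$ (observations minus estimated parameters) is the degrees of freedom for $s^2$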
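As a worked example of the residual variance, the sketch below computes the Sum of Squared Errors and s² for the five data points. The parameter estimates are entered as exact fractions; they are what the least-squares solution gives for these data (an input assumed here rather than read off the slides).

```python
from fractions import Fraction

data = [(-2, 16), (-1, 7), (0, 4), (1, 6), (2, 10)]

# Least-squares estimates for these data, i.e. (X'X)^-1 X'y as exact fractions
# (approximately 4.171, -1.300 and 2.214).
b0, b1, b2 = Fraction(146, 35), Fraction(-13, 10), Fraction(31, 14)

# Residual = observed y minus predicted y.
residuals = [y - (b0 + b1 * x + b2 * x**2) for x, y in data]

sse = sum(e**2 for e in residuals)   # Sum of Squared Errors
df = len(data) - 3                   # n observations minus 3 estimated parameters
s2 = sse / df                        # residual variance
s = float(s2) ** 0.5                 # residual standard deviation

print(float(sse), df, float(s2), s)
```

The value of s² agrees with the 0.829 used later in the lesson.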

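19 Variance of estimated parameters
Variance-covariance matrix: $s^2 (X'X)^{-1}$; the diagonal elements are the variances $V(\hat{\beta}_i)$

20 Covariance of estimated parameters
Variance-covariance matrix: $s^2 (X'X)^{-1}$; the off-diagonal elements are the covariances $\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j)$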
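A sketch of the variance-covariance matrix of the estimates, computed as s² times the inverse of X'X. The inverse and s² are entered directly as the values these five data points give; variable names are illustrative.

```python
from fractions import Fraction

# (X'X)^-1 for the five data points, and the residual variance s^2 = SSE / 2.
XtX_inv = [[Fraction(17, 35), 0, Fraction(-1, 7)],
           [0, Fraction(1, 10), 0],
           [Fraction(-1, 7), 0, Fraction(1, 14)]]
s2 = Fraction(29, 35)   # approximately 0.829

# Variance-covariance matrix of the estimates: s^2 (X'X)^-1.
cov = [[s2 * v for v in row] for row in XtX_inv]

variances = [cov[i][i] for i in range(3)]           # diagonal: V(b0), V(b1), V(b2)
std_errors = [float(v) ** 0.5 for v in variances]   # standard errors
cov_b0_b2 = cov[0][2]                               # off-diagonal: Cov(b0, b2)

print([float(v) for v in variances], std_errors, float(cov_b0_b2))
```

Because the x values are symmetric around zero, the covariances involving the linear term are zero here.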

21 Confidence limits for $\beta_i$
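Confidence limits for a parameter are commonly computed as the estimate plus or minus a t quantile times its standard error. The sketch below assumes two-sided 95% limits and takes the t value for 2 degrees of freedom (4.303) from a t table; both the constant and the variable names are assumptions made for this illustration.

```python
beta = [146 / 35, -13 / 10, 31 / 14]            # estimates b0, b1, b2
variances = [493 / 1225, 29 / 350, 29 / 490]    # diagonal of s^2 (X'X)^-1

# Assumption: 97.5% point of Student's t with n - p = 5 - 3 = 2 degrees of
# freedom, taken from a t table (two-sided 95% limits).
T_975_DF2 = 4.303

limits = []
for b, v in zip(beta, variances):
    half_width = T_975_DF2 * v ** 0.5   # t quantile times standard error
    limits.append((b - half_width, b + half_width))

for name, (low, high) in zip(["b0", "b1", "b2"], limits):
    print(f"{name}: {low:.3f} to {high:.3f}")
```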

22 Variance of the predicted line
Let us assume that we want to predict y for a given value of x. The chosen value of x is called a. We can now write the equation as $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 a + \hat{\beta}_2 a^2$
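One way to obtain the variance of the predicted line at x = a is the quadratic form s² · a'(X'X)⁻¹a, where the vector a = (1, a, a²). The sketch below evaluates it at a = -4 for the five data points; `a_vec` and the other names are illustrative, and the inputs are the values these data give.

```python
from fractions import Fraction

XtX_inv = [[Fraction(17, 35), 0, Fraction(-1, 7)],
           [0, Fraction(1, 10), 0],
           [Fraction(-1, 7), 0, Fraction(1, 14)]]
s2 = Fraction(29, 35)                                           # residual variance
beta = [Fraction(146, 35), Fraction(-13, 10), Fraction(31, 14)]

a = -4
a_vec = [1, a, a**2]   # the row of the design matrix for x = a

# Predicted value on the line at x = a.
y_hat = sum(b * v for b, v in zip(beta, a_vec))

# Quadratic form a'(X'X)^-1 a.
quad = sum(a_vec[i] * XtX_inv[i][j] * a_vec[j]
           for i in range(3) for j in range(3))

var_line = s2 * quad   # variance of the predicted line at x = a

print(float(y_hat), float(quad), float(var_line))
```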

23 Ex. a = -4 (Error! Should have been -1.3)

24 An alternative way of computation
$V(x+y) = V(x) + V(y) + 2\,\mathrm{Cov}(x,y)$
$V(x-y) = V(x) + V(y) - 2\,\mathrm{Cov}(x,y)$
$V(ax) = a^2 V(x)$
$\mathrm{Cov}(ax,by) = ab\,\mathrm{Cov}(x,y)$
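Applying these rules term by term to the predicted value $\hat{y} = \hat{\beta}_0 + a\hat{\beta}_1 + a^2\hat{\beta}_2$ gives the expansion below. The numerical line is a check worked out here for a = -4, using variances and covariances computed from the five data points; it is not taken from the slides.

```latex
\begin{aligned}
V(\hat{y}) &= V(\hat{\beta}_0) + a^2 V(\hat{\beta}_1) + a^4 V(\hat{\beta}_2) \\
&\quad + 2a\,\mathrm{Cov}(\hat{\beta}_0,\hat{\beta}_1)
 + 2a^2\,\mathrm{Cov}(\hat{\beta}_0,\hat{\beta}_2)
 + 2a^3\,\mathrm{Cov}(\hat{\beta}_1,\hat{\beta}_2) \\
a = -4:\quad V(\hat{y}) &\approx 0.4024 + 16(0.0829) + 256(0.0592) + 0 + 32(-0.1184) + 0 \approx 13.09
\end{aligned}
```

The two zero terms arise because the covariances involving $\hat{\beta}_1$ vanish for x values that are symmetric around zero.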

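25 The variance of a new observation of y
a = -4
$V(y) = (\mathbf{a}'(X'X)^{-1}\mathbf{a} + 1)\,s^2$ with $\mathbf{a} = (1, a, a^2)'$ and $s^2 = 0.829$, giving SE(y) = 3.73
Inside the parentheses, the first term gives the variance of the line and the 1 gives the variance of a new observation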
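The arithmetic on this slide can be retraced as follows. The sketch assumes the quadratic form a'(X'X)⁻¹a equals 15.8 at a = -4, which is what the five data points give; variable names are illustrative.

```python
from fractions import Fraction

s2 = Fraction(29, 35)    # residual variance, approximately 0.829
quad = Fraction(79, 5)   # a'(X'X)^-1 a at a = -4, i.e. 15.8

var_line = quad * s2         # variance of the fitted line at a
var_new = (quad + 1) * s2    # add s^2 for the scatter of a new observation
se_new = float(var_new) ** 0.5

print(float(var_line), float(var_new), round(se_new, 2))
```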

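26 Confidence limits (a = -4)
95% confidence limits for the line: $\hat{y} \pm t_{0.975,\,n-p}\sqrt{V(\hat{y})}$
95% confidence limits for a single observation: $\hat{y} \pm t_{0.975,\,n-p}\sqrt{V(\hat{y}) + s^2}$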
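A numerical sketch of both sets of limits at a = -4. It assumes the predicted value, variance of the line and s² that the five data points give, and a t value of 4.303 for 2 degrees of freedom taken from a t table; all names are illustrative.

```python
y_hat = 44.8             # predicted value at a = -4
var_line = 2291 / 175    # variance of the line at a = -4 (about 13.09)
s2 = 29 / 35             # residual variance (about 0.829)

# Assumption: 97.5% point of Student's t with 2 degrees of freedom (t table).
T_975_DF2 = 4.303

half_line = T_975_DF2 * var_line ** 0.5         # half-width for the line
half_obs = T_975_DF2 * (var_line + s2) ** 0.5   # half-width for a new observation

line_limits = (y_hat - half_line, y_hat + half_line)
obs_limits = (y_hat - half_obs, y_hat + half_obs)

print(line_limits)
print(obs_limits)
```

The limits for a single observation are always wider than those for the line, because they include the residual variance as well.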

27 95% confidence limits

28 How to do it with SAS?

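29 DATA eks21;
  /* data lines: the five (x, y) observations from slide 2 */
  INPUT x y;
  CARDS;
-2 16
-1 7
0 4
1 6
2 10
;
PROC GLM;
  /* quadratic model; the solution option prints the parameter estimates */
  MODEL y = x x*x / solution;
  /* p = predicted value; L95M/U95M = 95% limits for the line; L95/U95 = 95% limits for a single observation */
  OUTPUT out=new p=yhat L95M=low_mean U95M=up_mean L95=low U95=upper;
RUN;
PROC PRINT;
RUN;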

30 Number of observations in data set = 5
General Linear Models Procedure
Dependent Variable: Y
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model
Error
Corrected Total
R-Square   C.V.   Root MSE   Y Mean
Source   DF   Type I SS   Mean Square   F Value   Pr > F
X
X*X
Source   DF   Type III SS   Mean Square   F Value   Pr > F
X
X*X
Parameter   Estimate   T for H0: Parameter=0   Pr > |T|   Std Error of Estimate
INTERCEPT
X
X*X
OBS   X   Y   YHAT   LOW_MEAN   UP_MEAN   LOW   UPPER
(Slide annotations: the Error Mean Square is $s^2$; Root MSE is $s$)

31 DATA eks21;
  /* data lines: the five (x, y) observations from slide 2 */
  INPUT x y;
  CARDS;
-2 16
-1 7
0 4
1 6
2 10
;
PROC GLM;
  MODEL y = x x*x / solution;
  OUTPUT out=new p=yhat L95M=low_mean U95M=up_mean L95=low U95=upper;
RUN;
PROC PRINT;
RUN;

32 OBS X Y YHAT LOW_MEAN UP_MEAN LOW UPPER

33 A more complex problem: fit a model to these data

34 DATA polynom;
  INPUT x y;
  CARDS;
;
DATA add;
  SET polynom;
  x2 = x**2;
  x3 = x**3;
  x4 = x**4;
PROC REG;
  MODEL y = x x2 x3 x4;
RUN;
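The same idea, building powers of x as regressors and fitting by least squares, can be sketched in Python for a polynomial of any order. The data for this slide are not in the transcript, so the sketch uses the five points from the start of the lesson as a stand-in. `fit_polynomial` and `inverse` are hypothetical helpers written for this illustration, not SAS or library functions.

```python
from fractions import Fraction

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def fit_polynomial(xs, ys, order):
    """Least-squares polynomial fit; returns (coefficients, residual variance s^2)."""
    p = order + 1
    if len(xs) <= p:
        raise ValueError("need more observations than parameters to estimate s^2")
    # Design matrix with columns x**0, x**1, ..., x**order.
    X = [[Fraction(x) ** k for k in range(p)] for x in xs]
    cols = list(zip(*X))
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, ys)) for ci in cols]
    beta = [sum(c * v for c, v in zip(row, Xty)) for row in inverse(XtX)]
    sse = sum((y - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, y in zip(X, ys))
    return beta, sse / (len(xs) - p)

xs = [-2, -1, 0, 1, 2]
ys = [16, 7, 4, 6, 10]

line_beta, line_s2 = fit_polynomial(xs, ys, 1)   # straight line
quad_beta, quad_s2 = fit_polynomial(xs, ys, 2)   # second order polynomial

print(float(line_s2) ** 0.5, float(quad_s2) ** 0.5)
```

For these five points the second order polynomial has a much smaller residual standard deviation s than the straight line, which is the kind of comparison made on the slides that follow.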

35 A fourth order polynomial
The SAS System 08:22 Tuesday, October 29,
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model   <.0001
Error
Corrected Total
Root MSE   R-Square
Dependent Mean   Adj R-Sq
Coeff Var
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept
x
x2
x3
x4

36 A third order polynomial
Same REG output layout as slide 35 (Model Pr > F <.0001), with Parameter Estimates for Intercept, x, x2, x3

37 A second order polynomial
Same REG output layout as slide 35 (Model Pr > F <.0001), with Parameter Estimates for Intercept, x, x2

38 A first order polynomial (a straight line)
Same REG output layout as slide 35, with Parameter Estimates for Intercept, x

39 True relationship: y = x – 0.02x x 3 + ε
ε is normally distributed with mean 0 and σ = 10
Estimated relationship: y = – 1.415x x 2, s =
Estimated relationship: y = x, s =
This is a better fit than this

40 Matrix Notation
Of particular interest to us is the fact that not even in regression analysis was much use made of matrix algebra. In fact one of us, as a statistics graduate student at Cambridge University in the early 1950s, had lectures on multiple regression that were couched in scalar notation! This absence of matrices and vectors is surely surprising when one thinks of A.C. Aitken. His two books, Matrices and Determinants and Statistical Mathematics were both first published in 1939, had fourth and fifth editions, respectively, in 1947 and 1948, and are still in print. Yet, very surprisingly, the latter makes no use of matrices and vectors which are so thoroughly dealt with in the former. There were exceptions, of course, as have already been noted, such as Kempthorne (1952) and his co-workers, e.g. Wilk and Kempthorne (1955, 1956) – and others, too.
Even with matrix expressions available, arithmetic was a real problem. A regression analysis in the New Zealand Department of Agriculture in the mid-1950s involved 40 regressors. Using electromechanical calculators, two calculators (people) using row echelon methods needed six weeks to invert the 40 x 40 matrix. One person could do a row, then the other checked it (to a maximum capacity of 8 to 10 digits, hoping for 4- or 5-digit accuracy in the final result). That person did the next row and passed it to the first person for checking; and so on. This was the impasse: matrix algebra was appropriate and not really difficult. But the arithmetic stemming therefrom could be a nightmare.
(From Linear Models by Shayle R. Searle and Charles E. McCulloch in Advances in Biometry (eds. Peter Armitage and Herbert A. David), John Wiley & Sons, 1996)

