Applications: The General Linear Model
Transformations
Transformations to Linearity

Many non-linear curves can be put into a linear form by appropriate transformations of either the dependent variable Y or some (or all) of the independent variables X1, X2, …, Xp. This leads to the wide utility of the linear model. We have seen that, through the use of dummy variables, categorical independent variables can be incorporated into a linear model. We will now see that, through the technique of variable transformation, many examples of non-linear behaviour can also be converted to linear behaviour.
Intrinsically Linear (Linearizable) Curves

1. Hyperbolas: y = x/(ax − b). Linear form: 1/y = a − b(1/x), or Y = β0 + β1X. Transformations: Y = 1/y, X = 1/x, β0 = a, β1 = −b.
2. Exponential: y = βe^(λx) = βα^x. Linear form: ln y = ln β + λx = ln β + (ln α)x, or Y = β0 + β1X. Transformations: Y = ln y, X = x, β0 = ln β, β1 = λ = ln α.
3. Power functions: y = ax^b. Linear form: ln y = ln a + b ln x, or Y = β0 + β1X. Transformations: Y = ln y, X = ln x, β0 = ln a, β1 = b.
4. Logarithmic functions: y = a + b ln x. Linear form: Y = β0 + β1X. Transformations: Y = y, X = ln x, β0 = a, β1 = b.
5. Other special functions: y = ae^(b/x). Linear form: ln y = ln a + b(1/x), or Y = β0 + β1X. Transformations: Y = ln y, X = 1/x, β0 = ln a, β1 = b.
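As a concrete illustration of the technique, here is a minimal sketch fitting the last curve after transformation (the data are placeholders, not from the slides):

```python
import numpy as np

# A minimal sketch (illustrative data): fit y = a * exp(b/x) by
# linearizing to ln y = ln a + b*(1/x) and running ordinary least
# squares on the transformed variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.7, 1.6, 1.4, 1.3, 1.2])

X = 1.0 / x      # transformed predictor X = 1/x
Y = np.log(y)    # transformed response Y = ln y

# np.polyfit with degree 1 returns (slope, intercept) = (b1, b0)
b1, b0 = np.polyfit(X, Y, 1)
a, b = np.exp(b0), b1    # back-transform: a = exp(b0), b = b1
print(f"a = {a:.3f}, b = {b:.3f}")
```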
The Box-Cox Family of Transformations
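The defining formula was an image on the original slide; the standard Box-Cox family, indexed by λ, is:

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda}-1}{\lambda}, & \lambda \neq 0,\\[1ex]
\ln y, & \lambda = 0.
\end{cases}
```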
The Transformation Staircase
Graph of ln(x)
The effect of the transformation
The ln transformation is the member of the Box-Cox family of transformations with λ = 0. If you decrease the value of λ, the effect of the transformation is greater; if you increase the value of λ, the effect is less.
The effect of the ln transformation: it spreads out values that are close to zero and compacts values that are large.
The Bulging Rule: the direction in which the scatterplot bulges indicates whether to transform x up or down, or y up or down (x up, y up, y down, x down).
Non-Linear Models: models that cannot be linearized.
Non-Linear Growth Models: many models cannot be transformed into a linear model.

The Mechanistic Growth Model. Equation: Y = α(1 − βe^(−kx)) + ε or, ignoring ε, "rate of increase in Y" = dY/dx = k(α − Y).
The Logistic Growth Model. Equation: Y = α/(1 + βe^(−kx)) + ε or, ignoring ε, "rate of increase in Y" = dY/dx = (k/α)Y(α − Y).
The Gompertz Growth Model. Equation: Y = α exp(−βe^(−kx)) + ε or, ignoring ε, "rate of increase in Y" = dY/dx = kY ln(α/Y).
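Since these growth models cannot be linearized, they are usually fitted by nonlinear least squares. A minimal sketch for the logistic model (simulated placeholder data, not from the slides):

```python
import numpy as np
from scipy.optimize import curve_fit

# Sketch: fit the logistic growth model Y = alpha / (1 + beta*exp(-k*x))
# by nonlinear least squares. The data below are simulated placeholders.
def logistic(x, alpha, beta, k):
    return alpha / (1.0 + beta * np.exp(-k * x))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = logistic(x, 10.0, 20.0, 0.8) + rng.normal(0.0, 0.3, x.size)

# Nonlinear fits need starting values, supplied through p0.
params, cov = curve_fit(logistic, x, y, p0=[8.0, 10.0, 0.5])
print("alpha, beta, k =", params)
```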
Polynomial Regression Models
Polynomial Models: y = β0 + β1x + β2x² + β3x³. Linear form: Y = β0 + β1X1 + β2X2 + β3X3. Variables: Y = y, X1 = x, X2 = x², X3 = x³.
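A minimal sketch of this device (illustrative data, not the slides'); b0 through b3 play the roles of β0 through β3:

```python
import numpy as np

# Sketch: fit the cubic y = b0 + b1*x + b2*x^2 + b3*x^3 as a linear
# model by constructing the columns X1 = x, X2 = x^2, X3 = x^3.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.0, 4.8, 11.2, 21.9, 38.5])

# Design matrix with columns 1, x, x^2, x^3
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2, b3 =", beta)
```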
Suppose that we have two variables: 1. Y, the dependent variable (response variable); 2. X, the independent variable (explanatory variable, factor).
Assume that we have collected data on two variables X and Y. Let (x1, y1), (x2, y2), (x3, y3), …, (xn, yn) denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).
The assumption will be made that y1, y2, y3, …, yn are: 1. independent random variables; 2. normally distributed; 3. of common variance σ²; and 4. with the mean of yi given by a polynomial in xi, E[yi] = β0 + β1xi + ⋯ + βpxi^p.
Each yi is assumed to be randomly generated from a normal distribution with mean μi = E[yi] and standard deviation σ.
The Model: the matrix formulation.
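The matrix algebra on this slide was an image; a standard rendering of the degree-p polynomial model in matrix form (my reconstruction of the usual notation) is:

```latex
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},\quad
X = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^p \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^p
\end{pmatrix},\quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix}.
```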
The Normal Equations
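In the same notation, the normal equations and their least-squares solution are:

```latex
X^{\top}X\,\hat{\boldsymbol{\beta}} = X^{\top}\mathbf{y}
\quad\Longrightarrow\quad
\hat{\boldsymbol{\beta}} = \left(X^{\top}X\right)^{-1} X^{\top}\mathbf{y}.
```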
Example: in the following example two quantities are being measured: X = the amount of an additive to a chemical process, and Y = the yield of the process.
Graph X vs Y
The Model: a cubic polynomial (degree 3). Comment: a cubic polynomial in x can be fitted to y by defining the variables X1 = x, X2 = x², and X3 = x³, and then fitting the linear model Y = β0 + β1X1 + β2X2 + β3X3 + ε.
Response Surface Models: extending polynomial regression models to k independent variables.
Response surface models (2 independent variables): dependent variable Y and two independent variables x1 and x2. (These ideas are easily extended to more than two independent variables.) The Model is a cubic response surface model; compare this with a linear model, Y = β0 + β1x1 + β2x2 + ε.
The response surface model can be put into the form of a linear model

Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + β9X9 + ε

by defining X1 = x1, X2 = x2, X3 = x1², X4 = x1x2, X5 = x2², X6 = x1³, X7 = x1²x2, X8 = x1x2², X9 = x2³.
More generally, consider the random variable Y with

1. E[Y] = g(U1, U2, …, Uk) = β1φ1(U1, U2, …, Uk) + β2φ2(U1, U2, …, Uk) + ⋯ + βpφp(U1, U2, …, Uk), and
2. var(Y) = σ²,

where β1, β2, …, βp are unknown parameters and φ1, φ2, …, φp are known functions of the non-random variables U1, U2, …, Uk. Assume further that Y is normally distributed.
Now suppose that n independent observations of Y, (y1, y2, …, yn), are made corresponding to n sets of values of (U1, U2, …, Uk): (u11, u12, …, u1k), (u21, u22, …, u2k), …, (un1, un2, …, unk). Let xij = φj(ui1, ui2, …, uik) for j = 1, 2, …, p and i = 1, 2, …, n. Then yi = β1xi1 + β2xi2 + ⋯ + βpxip + εi or, in matrix form, y = Xβ + ε.
Polynomial Regression Model: One variable U. Quadratic Response Surface Model: Two variables U 1, U 2.
Trigonometric Polynomial Models
y = β0 + δ1cos(2πf1x) + γ1sin(2πf1x) + ⋯ + δkcos(2πfkx) + γksin(2πfkx). Linear form: Y = β0 + δ1C1 + γ1S1 + ⋯ + δkCk + γkSk. Variables: Y = y, C1 = cos(2πf1x), S1 = sin(2πf1x), …, Ck = cos(2πfkx), Sk = sin(2πfkx).
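A sketch of fitting such a model as a linear regression (the data and the frequencies are illustrative placeholders):

```python
import numpy as np

# Sketch: fit a trigonometric polynomial as a linear model by building
# cosine/sine columns C_i, S_i at chosen frequencies f_i.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = (2.0 + np.cos(2*np.pi*1*x) + 0.5*np.sin(2*np.pi*3*x)
     + rng.normal(0.0, 0.1, x.size))

freqs = [1.0, 2.0, 3.0]
cols = [np.ones_like(x)]               # intercept column
for f in freqs:
    cols.append(np.cos(2*np.pi*f*x))   # C_i = cos(2*pi*f_i*x)
    cols.append(np.sin(2*np.pi*f*x))   # S_i = sin(2*pi*f_i*x)
X = np.column_stack(cols)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```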
The general set of models, and the normal equations given the data.
Two important special cases: polynomial models and trig-polynomial models.
Orthogonal Polynomial Models
Definition: consider the values x0, x1, …, xn. The polynomials φ0(x), φ1(x), φ2(x), … are orthogonal relative to x0, x1, …, xn if Σj φi(xj)φm(xj) = 0 whenever i ≠ m. If in addition Σj φi(xj)² = 1 for each i, they are called orthonormal.
Consider the model y = α0φ0(x) + α1φ1(x) + α2φ2(x) + ⋯ + ε. This is equivalent to a polynomial model: rather than the basis for the model being 1, x, x², x³, …, the basis is φ0(x), φ1(x), φ2(x), φ3(x), …, polynomials of degree 0, 1, 2, 3, etc.
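A quick way to obtain such a basis numerically (a shortcut using QR factorization, not the tabulated orthogonal contrasts these slides develop next) is:

```python
import numpy as np

# Sketch: build polynomials orthonormal over the observed x-values by
# QR-decomposing the raw polynomial design matrix. The columns of Q
# span the same space as 1, x, x^2, x^3 but satisfy Q'Q = I, so the
# normal equations become trivial.
x = np.linspace(0.0, 10.0, 11)               # equally spaced points
V = np.vander(x, N=4, increasing=True)       # columns 1, x, x^2, x^3
Q, R = np.linalg.qr(V)

print(np.allclose(Q.T @ Q, np.eye(4)))       # True: discrete orthonormality
```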
The Normal Equations, given the data.
Derivation of Orthogonal Polynomials With equally spaced data points
Suppose x 0 = a, x 1 = a + b, x 2 = a + 2b, …, x n = a + nb
To do the calculations we need the values of φi(xj). These values depend only on: 1. n, the number of observations; 2. i, the degree of the polynomial; and 3. j, the index of xj.
Orthogonal Linear Contrasts for Polynomial Regression
The Use of Dummy Variables
In the examples so far the independent variables have been continuous numerical variables. Suppose that some of the independent variables are categorical. Dummy variables are artificially defined variables designed to convert a model that includes categorical independent variables into the standard multiple regression model.
Example: Comparison of Slopes of k Regression Lines with Common Intercept
Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable). Y is assumed to be linearly related to X, with the slope dependent on the treatment (population) while the intercept is the same for each treatment.
The Model: yij = β0 + βixij + εij, with common intercept β0 and slope βi for treatment i.
This model can be artificially put into the form of the multiple regression model by the use of dummy variables to handle the categorical independent variable, Treatments. Dummy variables are variables that are artificially defined.
In this case we define a new variable for each category of the categorical variable. That is, we define Xi for each category of treatments as follows: Xi = X if the observation comes from treatment i, and Xi = 0 otherwise.
Then the model can be written as follows. The Complete Model: Y = β0 + β1X1 + β2X2 + ⋯ + βkXk + ε, where the Xi are as defined above.
In this case: Dependent variable: Y. Independent variables: X1, X2, …, Xk.
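A minimal sketch of this dummy construction (the treatment labels and x values are placeholders, not the pesticide data below):

```python
import numpy as np

# Sketch of the dummy construction for comparing k slopes with a common
# intercept: X_i equals x for cases from treatment i and 0 otherwise.
treatment = np.array(["A", "A", "B", "B", "C", "C"])
x = np.array([2.0, 4.0, 2.0, 4.0, 2.0, 4.0])

levels = ["A", "B", "C"]
dummies = np.column_stack(
    [np.where(treatment == t, x, 0.0) for t in levels]
)
print(dummies)   # columns X1, X2, X3; each row has x in exactly one column
```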
In the above situation we would likely be interested in testing the equality of the slopes, namely the null hypothesis H0: β1 = β2 = ⋯ = βk (q = k − 1 restrictions).
The Reduced Model: Y = β0 + β1X + ε. Dependent variable: Y. Independent variable: X = X1 + X2 + ⋯ + Xk.
Example: in the following example we are measuring yield, Y, as it depends on the amount, X, of a pesticide. Again we will assume that the dependence of Y on X is linear. (The concepts used in this discussion can easily be adapted to the non-linear situation.)
Suppose that the experiment is going to be repeated for three brands of pesticide: A, B and C. The quantity X of pesticide was set at 3 different levels: 2 units/hectare, 4 units/hectare and 8 units/hectare. Four test plots were randomly assigned to each of the nine combinations of brand and level of pesticide.
Note that we would expect a common intercept for each brand of pesticide, since when the amount of pesticide, X, is zero the three brands would be equivalent.
The data for this experiment are given in the following table:

        X = 2   X = 4   X = 8
A       29.63   28.16   28.45
        31.87   33.48   37.21
        28.02   28.13   35.06
        35.24   28.25   33.99
B       32.95   29.55   44.38
        24.74   34.97   38.78
        23.38   36.35   34.92
        32.08   38.38   27.45
C       28.68   33.79   46.26
        28.70   43.95   50.77
        22.67   36.89   50.21
        30.02   33.56   44.14
The data as they would appear in a data file; the variables X1, X2 and X3 are the "dummy" variables.

Pesticide   X   X1   X2   X3     Y
A           2    2    0    0   29.63
A           2    2    0    0   31.87
A           2    2    0    0   28.02
A           2    2    0    0   35.24
B           2    0    2    0   32.95
B           2    0    2    0   24.74
B           2    0    2    0   23.38
B           2    0    2    0   32.08
C           2    0    0    2   28.68
C           2    0    0    2   28.70
C           2    0    0    2   22.67
C           2    0    0    2   30.02
A           4    4    0    0   28.16
A           4    4    0    0   33.48
A           4    4    0    0   28.13
A           4    4    0    0   28.25
B           4    0    4    0   29.55
B           4    0    4    0   34.97
B           4    0    4    0   36.35
B           4    0    4    0   38.38
C           4    0    0    4   33.79
C           4    0    0    4   43.95
C           4    0    0    4   36.89
C           4    0    0    4   33.56
A           8    8    0    0   28.45
A           8    8    0    0   37.21
A           8    8    0    0   35.06
A           8    8    0    0   33.99
B           8    0    8    0   44.38
B           8    0    8    0   38.78
B           8    0    8    0   34.92
B           8    0    8    0   27.45
C           8    0    0    8   46.26
C           8    0    0    8   50.77
C           8    0    0    8   50.21
C           8    0    0    8   44.14
Fitting the complete model:

ANOVA
              df          SS             MS            F           Significance F
Regression     3   1095.8158133   365.2719378   18.33114788   4.19538E-07
Residual      32    637.6415754    19.92629923
Total         35   1733.457389

Coefficients
Intercept   26.24166667
X1           0.981388889
X2           1.422638889
X3           2.602400794
Fitting the reduced model:

ANOVA
              df          SS             MS            F           Significance F
Regression     1    623.8232508   623.8232508   19.11439978   0.000110172
Residual      34   1109.634138     32.63629818
Total         35   1733.457389

Coefficients
Intercept   26.24166667
X            1.668809524
The ANOVA table for testing the equality of slopes:

                    df         SS            MS            F           Significance F
Common slope zero    1   623.8232508   623.8232508   31.3065283    3.51448E-06
Slope comparison     2   471.9925627   235.9962813   11.84345766   0.000141367
Residual            32   637.6415754    19.92629923
Total               35  1733.457389
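The slope-comparison line of this table can be reproduced directly; a short check (using scipy purely as a calculator, which the slides themselves do not do):

```python
from scipy.stats import f

# Reproducing the slope-comparison line of the ANOVA table above:
# F = (SS_slopes / df_slopes) / MS_residual from the complete model.
ss_slopes, df_slopes = 471.9925627, 2
ms_resid, df_resid = 19.92629923, 32

F = (ss_slopes / df_slopes) / ms_resid
p = f.sf(F, df_slopes, df_resid)
print(F, p)   # ~11.8435 and ~0.000141, matching the table
```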
Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)
Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable). Y is assumed to be linearly related to X, with the intercept dependent on the treatment (population) while the slope is the same for each treatment. Y is called the response variable, while X is called the covariate.
The Model: yij = αi + βxij + εij, with intercept αi for treatment i and common slope β.
Equivalent forms of the model: 1) yij = αi + βxij + εij; 2) yij = αk + δi + βxij + εij, where δi = αi − αk is the difference in intercept between treatment i and the reference treatment k (so δk = 0).
This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments.
In this case we define a new variable for each category of the categorical variable. That is, we define Xi for the categories i = 1, 2, …, (k − 1) of treatments as follows: Xi = 1 if the observation comes from treatment i, and Xi = 0 otherwise.
Then the model can be written as follows. The Complete Model: Y = β0 + δ1X1 + δ2X2 + ⋯ + δk−1Xk−1 + βX + ε, where the Xi are as defined above.
In this case: Dependent variable: Y. Independent variables: X1, X2, …, Xk−1, X.
In the above situation we would likely be interested in testing the equality of the intercepts, namely the null hypothesis H0: δ1 = δ2 = ⋯ = δk−1 = 0 (q = k − 1 restrictions).
The Reduced Model: Y = β0 + βX + ε. Dependent variable: Y. Independent variable: X.
Example: in the following example we are interested in comparing the effects of five workbooks (A, B, C, D, E) on the performance of students in mathematics. For each workbook, 15 students are selected (in total, n = 15 × 5 = 75). Each student is given a pretest (pretest score ≡ X) and a final test (final score ≡ Y). The data are given on the following slide.
The data and the model.
Graphical display of data
Some comments: 1. The linear relationship between Y (final score) and X (pretest score) models the students' differing aptitudes for mathematics. 2. The shifting up and down of this linear relationship measures the effect of the workbooks on the final score Y.
The Model:
The data as it would appear in a data file.
The data as it would appear in a data file with the dummy variables (X1, X2, X3, X4) added.
Here is the data file in SPSS with the dummy variables (X1, X2, X3, X4) added. These can be added within SPSS.
Fitting the complete model: the dependent variable is the final score, Y; the independent variables are the pre-score X and the four dummy variables X1, X2, X3, X4.
The Output
The Output - continued
The interpretation of the coefficients: the common slope.
The interpretation of the coefficients: the intercept for workbook E.
The interpretation of the coefficients: the changes in the intercept when we change from workbook E to the other workbooks.
The model can be written as follows. The Complete Model: Y = β0 + δ1X1 + δ2X2 + δ3X3 + δ4X4 + βX + ε.

1. When the workbook is E, then X1 = 0, …, X4 = 0 and the intercept is β0.
2. When the workbook is A, then X1 = 1 and X2 = X3 = X4 = 0, and hence δ1 is the change in the intercept when we change from workbook E to workbook A.
Testing for the equality of the intercepts: the reduced model. The independent variable is only X (the pre-score).
Fitting the reduced model: the dependent variable is the final score, Y; the only independent variable is the pre-score, X.
The output for the reduced model: a lower R².
The output, continued: an increased residual sum of squares (RSS).
The F Test
The reduced model and the complete model.
The F test
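The statistic on the slide was an image; the general form used for all of these reduced-versus-complete comparisons, with q the number of restrictions and p the number of parameters in the complete model fitted to n cases, is:

```latex
F = \frac{\left(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{complete}}\right)/q}
         {\mathrm{RSS}_{\text{complete}}/(n - p)}.
```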
Testing for zero slope: the reduced model. The independent variables are X1, X2, X3, X4 (the dummies).
The reduced model and the complete model.
The F test
The Analysis of Covariance: this analysis can also be performed by using a package that can perform analysis of covariance (ANACOVA). The package sets up the dummy variables automatically.
Here is the data file in SPSS. The Dummy variables are no longer needed.
In SPSS, to perform ANACOVA you select from the menu: Analyze -> General Linear Model -> Univariate.
This dialog box will appear
You now select: 1. the dependent variable, Y (final score); 2. the fixed factor (the categorical independent variable, workbook); 3. the covariate (the continuous independent variable, pretest score).
The output: the ANOVA table. Compare this with the previously computed table.
The output: the ANOVA table. This is the sum of squares in the numerator when we test whether the slope is zero (while allowing the intercepts to be different).
The Use of Dummy Variables
Example: Comparison of Slopes of k Regression Lines with Common Intercept
Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable). Y is assumed to be linearly related to X, with the slope dependent on the treatment (population) while the intercept is the same for each treatment.
The Model: yij = β0 + βixij + εij, with common intercept β0 and slope βi for treatment i.
The model can be written as follows. The Complete Model: Y = β0 + β1X1 + β2X2 + ⋯ + βkXk + ε, where Xi = X for observations from treatment i and Xi = 0 otherwise.
Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)
Situation: k treatments or k populations are being compared. For each of the k treatments we have measured both Y (the response variable) and X (an independent variable). Y is assumed to be linearly related to X, with the intercept dependent on the treatment (population) while the slope is the same for each treatment. Y is called the response variable, while X is called the covariate.
The Model: yij = αi + βxij + εij, with intercept αi for treatment i and common slope β.
The model can be written as follows. The Complete Model: Y = β0 + δ1X1 + ⋯ + δk−1Xk−1 + βX + ε, where Xi = 1 for observations from treatment i and Xi = 0 otherwise (i = 1, …, k − 1).
Another application of dummy variables: the dependent variable, Y, is linearly related to X, but the slope changes at one or several known values of X (nodes).
The model: Y is piecewise linear in X, with slopes β1, β2, …, βk on the successive intervals defined by the nodes x1, x2, …, xk.
Now define X1 = min(X, x1), X2 = max(min(X, x2) − x1, 0), etc., so that X1 + X2 + ⋯ = X.
Then the model can be written Y = β0 + β1X1 + β2X2 + ⋯ + ε.
An Example: in this example we are measuring Y at time X. Y is growing linearly with time. At time X = 10, an additive is added to the process, which may change the rate of growth. The data follow.
Graph
Now define the dummy variables X1 = min(X, 10) and X2 = max(X − 10, 0), so that X1 + X2 = X.
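A sketch of this construction (the X values are placeholders; the node at 10 follows the example):

```python
import numpy as np

# Node construction for this example: with the single node at X = 10,
# X1 = min(X, 10) and X2 = max(X - 10, 0), so X1 + X2 = X and the
# fitted slope may change at the node.
node = 10.0
X = np.arange(0.0, 21.0)

X1 = np.minimum(X, node)
X2 = np.maximum(X - node, 0.0)

# Regressing y on X1 and X2 gives beta1, the slope before the node,
# and beta2, the slope after it.
print(np.column_stack([X, X1, X2])[8:13])
```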
The data as it appears in SPSS – x1, x2 are the dummy variables
We now regress y on x1 and x2.
The Output
Graph
Testing for no change in slope: here we want to test H0: β1 = β2 vs HA: β1 ≠ β2. The reduced model is Y = β0 + β1(X1 + X2) + ε = β0 + β1X + ε.
Fitting the reduced model: we now regress y on x.
The Output
Graph – fitting a common slope
The test for the equality of slopes.