Projection on Latent Variables



Presentation on theme: "Projection on Latent Variables"— Presentation transcript:

1 Projection on Latent Variables
PLS: Partial Least Squares, but also Projection on Latent Variables

2 (figure-only slide; no text captured in the transcript)

3 Step 1: First marginal (partial) regression through the origin, of X1 on Y

4 Second marginal regression through the origin, of X2 on Y

5 STEP 2: COVARIANCE and DIRECTION COSINES

6 The marginal slopes w, after normalization, describe the first LATENT VARIABLE

7 Step 3: The scores t on the latent variable are computed

8 Step 4: Regression of the response variable y on the latent variable t. This gives the model, and the residuals used to compute the next latent variables.

9 A very simple numerical example:
Object | Predictor x1 | Predictor x2 | Response y (the table's values were not captured in the transcript)
You can easily note that the two predictors are about equal. In fact, they are obtained by two repetitions of the measurement of the same quantity.
KNOWLEDGE of DATA (EXPERIENCE)

10 Strategy A
We know that there is really only one predictor, measured twice, so we decide to use only the first, x1. With the method of least squares we compute the slope and intercept. This is the regression model:
y = a + b x1, with a = … and b = …
With σ² the variance of the predictor (obviously the same for the two predictors), the variance of the estimate of the response, by the law of propagation of variances, is:
σy² = b² σ² ≈ 25 σ²
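Since the slide's data table was not captured, the NumPy sketch below recreates a dataset with the structure the slides describe (two noisy repetitions of one quantity, response with slope about 5 and intercept about 0); the seed, sample size, and σ = 0.05 are invented for illustration. It reproduces Strategy A and its propagated variance:

```python
import numpy as np

# Hypothetical stand-in for the slides' table (the original values were
# lost): x1, x2 are two repetitions of the same measurement, and y
# depends on the underlying quantity with slope ~5 and intercept ~0.
rng = np.random.default_rng(42)
n = 20
truth = rng.uniform(1.0, 10.0, n)      # underlying quantity
sigma = 0.05                           # std dev of a single measurement
x1 = truth + rng.normal(0.0, sigma, n)
x2 = truth + rng.normal(0.0, sigma, n)
y = 5.0 * truth

# Strategy A: ordinary least squares on the first predictor only.
b, a = np.polyfit(x1, y, 1)            # slope b and intercept a
var_y = b**2 * sigma**2                # law of propagation of variances
print(b, var_y / sigma**2)             # slope ~5, variance factor ~25
```

The printed "variance factor" is σy²/σ², which for a slope near 5 comes out near 25, matching the slide.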

11 Strategy A: we used only the chemical knowledge.
σy² ≈ 25 σ²

12 Strategy B
We know that the mean of two repetitions has one half the variance of a single repetition, so we decide to use as a single predictor the mean m of the two determinations. With the method of least squares we compute the slope and intercept. This is the regression model:
y = a + b m, with a = -0.00685 and b = 5.00013

13 With 2 variance of the predictors the variance of the mean is:
2/2 So the variance of the response is computed as: 2y = b2 2/2  2
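Strategy B can be sketched on the same kind of made-up dataset (two repetitions of one measurement, response with slope about 5; all values invented for illustration):

```python
import numpy as np

# Hypothetical data as described on the earlier slides (values invented).
rng = np.random.default_rng(42)
n = 20
truth = rng.uniform(1.0, 10.0, n)
sigma = 0.05
x1 = truth + rng.normal(0.0, sigma, n)
x2 = truth + rng.normal(0.0, sigma, n)
y = 5.0 * truth

# Strategy B: regress y on the mean of the two repetitions.
m = (x1 + x2) / 2.0                    # var(m) = sigma^2 / 2
b, a = np.polyfit(m, y, 1)
var_y = b**2 * sigma**2 / 2.0          # propagated variance of y
print(var_y / sigma**2)                # factor ~12.5, half of Strategy A
```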

14 With strategy B we used both the chemical knowledge and the knowledge of statistics.
σy² ≈ 12.5 σ²

15 Strategy C
We use least-squares multiple regression (MLR, or OLS). With the method of least squares we compute two slopes and the intercept. This is the regression model:
y = a + b1 x1 + b2 x2, with a = …, b1 = 1.5019 and b2 = 3.49844
The variance of the estimate of the response is obtained from the law of propagation of variances as:
σy² = b1² σ² + b2² σ² ≈ 14.5 σ²
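Strategy C on a made-up dataset with the same structure (values invented, so the individual slopes will differ from the slides' b1 = 1.5019 and b2 = 3.49844, but the sum of slopes stays near 5 and the variance factor stays at or above Strategy B's 12.5):

```python
import numpy as np

# Hypothetical data mirroring the slides' description (values invented).
rng = np.random.default_rng(42)
n = 20
truth = rng.uniform(1.0, 10.0, n)
sigma = 0.05
x1 = truth + rng.normal(0.0, sigma, n)
x2 = truth + rng.normal(0.0, sigma, n)
y = 5.0 * truth

# Strategy C: OLS with both predictors (intercept column included).
X = np.column_stack([np.ones(n), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

var_y = (b1**2 + b2**2) * sigma**2     # propagated variance of y
print(b1 + b2, var_y / sigma**2)       # sum of slopes ~5; factor >= 12.5
```

With two nearly collinear predictors the split between b1 and b2 is unstable; only their sum is well determined, which is exactly the point of slide 16.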

16 We were very lucky!!! In its effort to minimize the sum of the squares of the residuals, OLS can do even worse. In fact, notice that the sum of the two slopes, b1 = 1.5019 and b2 = 3.49844, is 5.00034, about the same as the single slope obtained with strategies A and B. Apparently, with two almost equal predictors, what is important is the sum of the slopes: it must be about 5. So the result b1 = 15 and b2 = -10 would also seem acceptable, BUT …….

17 b1² = 15² = 225 and b2² = (-10)² = 100
σy² = b1² σ² + b2² σ² = 325 σ²

18 Conclusion: OLS, even though it uses all the experimental information, never gives a model better than that of strategy B (knowledge of data and of statistics), and the result can be worse than that obtained with strategy A, which uses only a fraction of the information.

19 Strategy PLS
First step: regression of the two predictors on the response variable:
x1 = c1 + d1 y, with c1 = … and d1 = …
x2 = c2 + d2 y, with c2 = … and d2 = …

20 Strategy PLS
Second step: normalization of the slopes. Result: w1 = … and w2 = …

21 Strategy PLS
Step 3: definition of a LATENT VARIABLE, a combination of the two predictors by means of the coefficients w:
t = w1 x1 + w2 x2

22 Strategy PLS
Step 4: regression of the response on the latent variable. We obtain the regression model as a function of the latent variable:
y = e + f t, with e = … and f = …

23 From y = e + f t, taking into account that t = w1 x1 + w2 x2 = … x1 + … x2, and that f = …, we obtain:
y = … + … x1 + … x2 (PLS closed form)

24 Finally, from y = … + … x1 + … x2 we can compute the variance of the response:
σy² = b1² σ² + b2² σ² = (…) σ² ≈ 12.5 σ²
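The four PLS steps of slides 19-24 can be sketched end to end on a made-up dataset with the structure the slides describe (all numerical choices invented for illustration). Following the slides, the marginal regressions are taken through the origin, which is reasonable here because the true intercept is about 0:

```python
import numpy as np

# Hypothetical data mirroring the slides' example (values invented).
rng = np.random.default_rng(42)
n = 20
truth = rng.uniform(1.0, 10.0, n)
sigma = 0.05
x1 = truth + rng.normal(0.0, sigma, n)
x2 = truth + rng.normal(0.0, sigma, n)
y = 5.0 * truth

# Step 1: marginal regressions through the origin of each predictor on y.
d1 = (x1 @ y) / (y @ y)
d2 = (x2 @ y) / (y @ y)

# Step 2: normalize the marginal slopes to direction cosines w.
w1, w2 = np.array([d1, d2]) / np.hypot(d1, d2)

# Step 3: scores on the first latent variable.
t = w1 * x1 + w2 * x2

# Step 4: regress y on t, then expand to the closed form
# y = e + f*t = e + (f*w1)*x1 + (f*w2)*x2.
f, e = np.polyfit(t, y, 1)
b1, b2 = f * w1, f * w2

var_y = (b1**2 + b2**2) * sigma**2
print(b1, b2, var_y / sigma**2)        # b1 ~ b2, variance factor ~12.5
```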

25 PLS is an intelligent technique
The PLS model gives the same uncertainty on the response as strategy B (knowledge of data and statistics, use of all the information). PLS "understands" that the two predictors have the same importance (slopes more or less equal). PLS is an intelligent technique.
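As a check of the conclusions on slides 18 and 25, the sketch below computes the propagated-variance factor σy²/σ² for all four strategies on one made-up dataset of the kind the slides describe (all names and values invented for illustration):

```python
import numpy as np

# Hypothetical data mirroring the slides' example (values invented).
rng = np.random.default_rng(42)
n = 20
truth = rng.uniform(1.0, 10.0, n)
sigma = 0.05
x1 = truth + rng.normal(0.0, sigma, n)
x2 = truth + rng.normal(0.0, sigma, n)
y = 5.0 * truth

# Propagated variance of the predicted response, in units of sigma^2.
bA = np.polyfit(x1, y, 1)[0]                       # Strategy A
bB = np.polyfit((x1 + x2) / 2.0, y, 1)[0]          # Strategy B (mean)
_, b1, b2 = np.linalg.lstsq(
    np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]  # Strategy C
d = np.array([x1 @ y, x2 @ y]) / (y @ y)           # PLS step 1
w = d / np.linalg.norm(d)                          # PLS step 2
f = np.polyfit(w[0] * x1 + w[1] * x2, y, 1)[0]     # PLS steps 3-4

factors = {
    "A": bA**2,              # ~25
    "B": bB**2 / 2.0,        # ~12.5 (the mean halves the variance)
    "C": b1**2 + b2**2,      # >= 12.5, and can be much worse
    "PLS": f**2,             # ~12.5, matches Strategy B
}
print(factors)
```

Note that the PLS factor reduces to f² because (f·w1)² + (f·w2)² = f²(w1² + w2²) = f², since w is normalized.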

