Download presentation

Presentation is loading. Please wait.

Published byRichard Davis Modified over 3 years ago

1
PLS path modeling and Regularized Generalized Canonical Correlation Analysis for multi-block data analysis Michel Tenenhaus

2
**Sensory analysis of 21 Loire Red Wines**

3 Appellations 4 Soils 4 blocks of variables A famous example of Jérôme Pagès X1 X2 X3 X4 Illustrative variable X1 = Smell at rest, X2 = View, X3 = Smell after shaking, X4 = Tasting

3
**components positively**

PCA of each block: Correlation loadings 2 dimensions 1 dimension Are these first components positively correlated ? Same question for the second components. 2 dimensions 2 dimensions

4
**RGCCA is a compromise between PLS-mode B optimizes the**

Using XLSTAT-PSLPM / Mode PCA on variables more correlated to PC1 than to PC2 Model 1 Outer model PCA optimizes the Inner model RGCCA is a compromise between PCA and PLS-mode B PLS-mode B optimizes the

5
**Model 1 : PCA of each block All loadings are significant (except one).**

6
**Model 1 : PCA of each block All weights are significant.**

PCA is very stable. All weights are significant.

7
**Multi-Block Analysis is a factor analysis of tables :**

PLS-Mode B: F1h,…,FJh optimize the inner model. PCA: Fj1,…,Fjmj optimize the outer model. subject to constraints : Factors (LV, Scores, Components) are well explaining their own block . RGCCA gives a compromise between these two objectives. and/or Same order factors are well ( positively ) correlated ( to improve interpretation ).

8
**PLS-mode B and RGCCA for Multi-Block data Analysis**

Inner model: connections between LV’s Outer model: connections between MV’s and their LV’s. Maximizing correlations for inner model: PLS-mode B (H. Wold, 1982 and Hanafi, 2007). But, for each block, more observations than variables are needed. Maximizing correlations for inner model and explained variances for outer model: Regularized Generalized Canonical Correlation Analysis (A. & M. Tenenhaus, 2011) No constraints on block dimensions when the “shrinkage constants” are positive. PLS-mode B is a special case of RGCCA.

9
PLS-mode B where: H. Wold (1982) has described a monotone convergent algorithm related to this optimization problem. (Proof by Hanafi in 2007.)

10
**Wold’s algorithm: PLS-Mode B**

yj = Xjaj Outer component (summarizes the block) Inner component (takes into account relations between blocks) aj Initial step Iterate until convergence of the criterion. (Hanafi, 2007) Choice of inner weights ejk: Horst : ejk = cjk Centroid : ejk = cjksign(Cor(yk,yj)) Factorial : ejk = cjkCor(yk,yj) cjk = 1 if blocks are connected, 0 otherwise Limitation: nj > pj

11
**Optimizing the inner model (with XLSTAT)**

PLS-mode B, Centroid scheme <=> pj < nj Model 2 Inner model One step average two-block CCA

12
**Optimizing the inner model (with XLSTAT)**

Mode B, Factoriel <=> pj < nj Model 3 Inner model One step average two-block CCA

13
Model 3

14
**PLS-mode B is very unstable.**

Model 3 PLS-mode B is very unstable.

15
**Conclusion Many weights are not significant !!!**

If you want the butter (good correlations for the inner and outer models) and the money of the butter (significant weights) , you must switch to Regularized Generalized Canonical Correlation Analysis (RGCCA).

16
**Regularized generalized CCA**

A monotone convergent algorithm related to this optimization problem is proposed (A.& M. Tenenhaus, 2011). where: and:

17
**The PLS algorithm for RGCCA**

yj = Xjaj Outer component (summarizes the block) Inner component (takes into account relations between blocks) aj Initial step Iterate until convergence of the criterion. Choice of inner weights ejk: Horst : ejk = cjk Centroid : ejk = cjksign(Cor(yk,yj)) Factorial : ejk = cjkCov(yk,yj) cjk = 1 if blocks are connected, 0 otherwise. nj can be <= pj, for j > 0.

18
**All j = 0, RGCCA = PLS-Mode B**

yj = Xjaj Outer component (summarizes the block) Inner component (takes into account relations between blocks) aj Initial step Iterate until convergence of the criterion. Choice of inner weights ejk: Horst : ejk = cjk Centroid : ejk = cjksign(Cor(yk,yj)) Factorial : ejk = cjkCor(yk,yj) cjk = 1 if blocks are connected, 0 otherwise.

19
**All j = 1, RGCCA - Mode A aj yj = Xjaj Initial step**

Outer component (summarizes the block) Inner component (takes into account relations between blocks) aj Initial step Iterate until convergence of the criterion. Choice of inner weights ejk: Horst : ejk = cjk Centroid : ejk = cjksign(Cor(yk,yj)) Factorial : ejk = cjkCov(yk,yj) cjk = 1 if blocks are connected, 0 otherwise. nj can be <= pj.

20
**Latent variables have been afterwards standardized.**

Model 4 : RGCCA, factorial scheme, mode A One step average two-block PLS regression Latent variables have been afterwards standardized.

21
**All loadings are significant.**

Model 4 All loadings are significant.

22
**RGCCA-mode A is very stable. All weights are also significant.**

Model 4 RGCCA-mode A is very stable. All weights are also significant.

23
Model Comparison (Schäfer & Strimmer, 2005) R-code (Arthur T.)

24
**Mode A favors the outer model. Mode B favors the inner model.**

AVE outer Same for all blocks AVE inner Mode B : = 0 Mode A : = 1 Mode A favors the outer model. Mode B favors the inner model.

25
**Hierarchical model for wine data: Model 5**

RGCCA: Factorial, Mode A Dimension 1 One-step hierarchical PLS regression 2nd order block “Global” contains all the MV’s of the 1st order blocks

26
**Hierarchical model for wine data: Model 6**

RGCCA: Factorial, Mode A for initial blocks, Mode B for global block Mode A This method has been proposed independently at least three times: - Covariance criterion (J.D. Carroll, 1968) - Consensus PCA (S. Wold et al., 1987) - Multiple co-inertia analysis (Chessel & Hanafi, 1996) Mode B Mode A Mode A Mode A One-step hierarchical redundancy analysis 2nd order block “Global” contains all the MV’s of the 1st order blocks

27
**2nd order block “Global” contains all the MV’s of the 1st order blocks**

Hierarchical model for wine data: Model 7 RGCCA, Factorial, Mode A Dimension 2 Block View is given up. 2nd order block “Global” contains all the MV’s of the 1st order blocks

28
**Mapping of the correlations with the global components**

29
**Wine visualization in the global component space **

Wines marked by Appellation

30
**Wine visualization in the global component space**

Wines marked by Soil DAM = Dampierre-sur-Loire GOOD QUALITY

31
Cuvée Lisagathe 1995 A soft, warm, blackberry nose. A good core of fruit on the palate with quite well worked tannin and acidity on the finish; Good length and a lot of potential. DECANTER (mai 1997) (DECANTER AWARD ***** : Outstanding quality, a virtually perfect example)

32
References

33
**Final conclusion All the proofs of a pudding are in the eating, but**

it will taste even better if you know the cooking.

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google