PLS path modeling and Regularized Generalized Canonical Correlation Analysis for multi-block data analysis Michel Tenenhaus.


1 PLS path modeling and Regularized Generalized Canonical Correlation Analysis for multi-block data analysis Michel Tenenhaus

2 Sensory analysis of 21 Loire Red Wines
A famous example of Jérôme Pagès: 3 appellations, 4 soils, 4 blocks of variables, plus an illustrative variable.
X1 = Smell at rest, X2 = View, X3 = Smell after shaking, X4 = Tasting

3 PCA of each block: Correlation loadings
[Figure: correlation-loading plots, one per block; panels labelled 2 dimensions, 1 dimension, 2 dimensions, 2 dimensions.]
Are these first components positively correlated? Same question for the second components.
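The question on this slide can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and it assumes each block is standardized before PCA, which the transcript does not state. It computes the components of each block and the correlation matrix of the same-order components across blocks. Because the sign of a principal component is arbitrary, a negative correlation here says nothing by itself.

```python
import numpy as np

def block_components(X, n_comp=2):
    """Principal component scores of one block (columns standardized first).

    Assumes no column is constant, otherwise the standardization divides by zero.
    """
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0)
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :n_comp] * s[:n_comp]

def same_order_correlations(blocks, h=0):
    """Correlation matrix of the (h+1)-th components of all blocks."""
    comps = np.column_stack([block_components(X, h + 1)[:, h] for X in blocks])
    return np.corrcoef(comps, rowvar=False)
```

Calling `same_order_correlations(blocks, h=0)` gives the matrix for the first components and `h=1` for the second ones, provided every block has at least two variables.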

4 Model 1: PCA of each block
Using XLSTAT-PLSPM / Mode PCA on the variables more correlated to PC1 than to PC2.
PCA optimizes the outer model. PLS-mode B optimizes the inner model.
RGCCA is a compromise between PCA and PLS-mode B.

5 Model 1: PCA of each block
All loadings are significant (except one).

6 Model 1: PCA of each block
PCA is very stable. All weights are significant.

7 Multi-Block Analysis is a factor analysis of tables
subject to constraints:
Factors (LV, Scores, Components) explain their own block well. PCA: Fj1,…,Fjmj optimize the outer model.
and/or
Same-order factors are well (positively) correlated (to improve interpretation). PLS-Mode B: F1h,…,FJh optimize the inner model.
RGCCA gives a compromise between these two objectives.

8 PLS-mode B and RGCCA for Multi-Block data Analysis
Inner model: connections between LVs.
Outer model: connections between MVs and their LVs.
Maximizing correlations for the inner model: PLS-mode B (H. Wold, 1982 and Hanafi, 2007). But, for each block, more observations than variables are needed.
Maximizing correlations for the inner model and explained variances for the outer model: Regularized Generalized Canonical Correlation Analysis (A. & M. Tenenhaus, 2011). No constraints on block dimensions when the “shrinkage constants” are positive.
PLS-mode B is a special case of RGCCA.

9 PLS-mode B
[Equation: the PLS-mode B optimization criterion and its "where:" definitions appeared here as an image.]
H. Wold (1982) has described a monotone convergent algorithm related to this optimization problem (proof by Hanafi, 2007).
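The criterion is not legible in this transcript. A hedged reconstruction, assuming the formulation of the mode B problem usually given in the cited papers, reads as follows. Here a_j is the outer weight vector of block X_j, c_jk is 1 when blocks j and k are connected and 0 otherwise, and the scheme function g is an assumption introduced for this sketch.

```latex
\max_{a_1,\dots,a_J} \; \sum_{j \neq k} c_{jk}\, g\bigl(\operatorname{Cor}(X_j a_j, X_k a_k)\bigr)
\quad \text{subject to} \quad \operatorname{Var}(X_j a_j) = 1, \quad j = 1,\dots,J,
\qquad \text{where } g(x) = x \ (\text{Horst}), \quad g(x) = |x| \ (\text{centroid}), \quad g(x) = x^2 \ (\text{factorial}).
```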

10 Wold’s algorithm: PLS-Mode B
Initial step: starting weights aj.
Outer component (summarizes the block): yj = Xjaj
Inner component (takes into account relations between blocks).
Iterate until convergence of the criterion (Hanafi, 2007).
Choice of inner weights ejk:
Horst: ejk = cjk
Centroid: ejk = cjk sign(Cor(yk,yj))
Factorial: ejk = cjk Cor(yk,yj)
cjk = 1 if blocks are connected, 0 otherwise.
Limitation: nj > pj
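The three inner-weight schemes listed on this slide can be written down in a few lines. This is a minimal sketch with hypothetical function names. The formula for the inner component, zj = Σk ejk yk, is the usual one and is assumed here, since the slide's own formula did not survive in the transcript.

```python
import numpy as np

def inner_weights(Y, C, scheme="centroid"):
    """Inner weights e_jk for outer components stored as the columns of Y.

    Y: (n, J) array of outer components y_1..y_J.
    C: (J, J) 0/1 design matrix, c_jk = 1 if blocks j and k are connected.
    """
    C = np.asarray(C, dtype=float)
    R = np.corrcoef(Y, rowvar=False)      # R[j, k] = Cor(y_j, y_k)
    if scheme == "horst":
        return C.copy()
    if scheme == "centroid":
        return C * np.sign(R)
    if scheme == "factorial":
        return C * R
    raise ValueError("scheme must be 'horst', 'centroid' or 'factorial'")

def inner_components(Y, C, scheme="centroid"):
    """Assumed inner components z_j = sum_k e_jk y_k, returned as columns."""
    return Y @ inner_weights(Y, C, scheme).T
```

With a design matrix whose diagonal is zero, a block never contributes to its own inner component.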

11 Optimizing the inner model (with XLSTAT)
Model 2: PLS-mode B, Centroid scheme (requires pj < nj).
Inner model: one-step average two-block CCA.

12 Optimizing the inner model (with XLSTAT)
Model 3: Mode B, Factorial scheme (requires pj < nj).
Inner model: one-step average two-block CCA.

13 Model 3

14 Model 3
PLS-mode B is very unstable.

15 Conclusion
Many weights are not significant!
If you want the butter (good correlations for the inner and outer models) and the money of the butter (significant weights), you must switch to Regularized Generalized Canonical Correlation Analysis (RGCCA).

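16 Regularized generalized CCA
[Equation: the RGCCA optimization criterion and its "where:" and "and:" definitions appeared here as an image.]
A monotone convergent algorithm related to this optimization problem is proposed (A. & M. Tenenhaus, 2011).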
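The formulas on this slide were lost in the transcript. As a hedged reconstruction, assuming the criterion as published by A. & M. Tenenhaus (2011), the problem can be written as below. Here a_j is the outer weight vector of block X_j, c_jk is the 0/1 design coefficient, τ_j in [0, 1] is the shrinkage constant of block j, and g is the identity, the absolute value or the square for the Horst, centroid and factorial schemes.

```latex
\max_{a_1,\dots,a_J} \; \sum_{j \neq k} c_{jk}\, g\bigl(\operatorname{Cov}(X_j a_j, X_k a_k)\bigr)
\quad \text{subject to} \quad
(1-\tau_j)\operatorname{Var}(X_j a_j) + \tau_j \lVert a_j \rVert^2 = 1, \quad j = 1,\dots,J.
```

Under this form, τ_j = 0 forces unit-variance components, so covariances become correlations (mode B), while τ_j = 1 forces unit-norm weights (mode A).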

17 The PLS algorithm for RGCCA
Initial step: starting weights aj.
Outer component (summarizes the block): yj = Xjaj
Inner component (takes into account relations between blocks).
Iterate until convergence of the criterion.
Choice of inner weights ejk:
Horst: ejk = cjk
Centroid: ejk = cjk sign(Cor(yk,yj))
Factorial: ejk = cjk Cov(yk,yj)
cjk = 1 if blocks are connected, 0 otherwise.
nj can be <= pj, for τj > 0.
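The loop on this slide can be sketched as follows. This is an illustration under stated assumptions, not the author's code. The weight update aj ∝ Mj⁻¹ Xj' zj with Mj = τj I + (1 − τj) Xj'Xj / n is taken from the published RGCCA algorithm rather than from this transcript. The function name `rgcca`, the 1/n covariance, a symmetric design matrix with zero diagonal and the all-ones starting weights are all assumptions.

```python
import numpy as np

def rgcca(blocks, C, tau, scheme="factorial", tol=1e-10, max_iter=500):
    """Sketch of the RGCCA iteration for one component per block.

    blocks: list of (n, p_j) arrays; C: (J, J) symmetric 0/1 design matrix;
    tau: list of shrinkage constants in [0, 1] (tau_j = 0 needs n > p_j).
    Returns outer weights a_j, components y_j and the criterion history.
    """
    C = np.asarray(C, dtype=float)
    J, n = len(blocks), blocks[0].shape[0]
    X = [B - B.mean(axis=0) for B in blocks]               # center each block
    M = [tau[j] * np.eye(X[j].shape[1]) + (1 - tau[j]) * X[j].T @ X[j] / n
         for j in range(J)]
    g = {"horst": lambda c: c, "centroid": abs, "factorial": lambda c: c ** 2}[scheme]

    def normalize(j, a):                                   # enforce a' M_j a = 1
        return a / np.sqrt(a @ M[j] @ a)

    def criterion():
        return sum(C[j, k] * g(y[j] @ y[k] / n)
                   for j in range(J) for k in range(J) if j != k)

    a = [normalize(j, np.ones(X[j].shape[1])) for j in range(J)]   # initial step
    y = [X[j] @ a[j] for j in range(J)]                    # outer components
    history = [criterion()]
    for _ in range(max_iter):
        for j in range(J):
            z = np.zeros(n)                                # inner component z_j
            for k in range(J):
                if k == j or C[j, k] == 0:
                    continue
                cov = y[j] @ y[k] / n
                e = {"horst": 1.0, "centroid": np.sign(cov), "factorial": cov}[scheme]
                z += C[j, k] * e * y[k]
            a[j] = normalize(j, np.linalg.solve(M[j], X[j].T @ z))
            y[j] = X[j] @ a[j]                             # updated component is used at once
        history.append(criterion())
        if history[-1] - history[-2] < tol:
            break
    return a, y, history
```

Setting every `tau[j]` to 0 gives the mode B case and setting every `tau[j]` to 1 gives the mode A case. With `tau[j] = 0` and nj <= pj the matrix Mj is singular and the solve step fails, which is the limitation stated for PLS-mode B.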

18 All τj = 0: RGCCA = PLS-Mode B
Initial step: starting weights aj.
Outer component (summarizes the block): yj = Xjaj
Inner component (takes into account relations between blocks).
Iterate until convergence of the criterion.
Choice of inner weights ejk:
Horst: ejk = cjk
Centroid: ejk = cjk sign(Cor(yk,yj))
Factorial: ejk = cjk Cor(yk,yj)
cjk = 1 if blocks are connected, 0 otherwise.

19 All τj = 1: RGCCA - Mode A
Initial step: starting weights aj.
Outer component (summarizes the block): yj = Xjaj
Inner component (takes into account relations between blocks).
Iterate until convergence of the criterion.
Choice of inner weights ejk:
Horst: ejk = cjk
Centroid: ejk = cjk sign(Cor(yk,yj))
Factorial: ejk = cjk Cov(yk,yj)
cjk = 1 if blocks are connected, 0 otherwise.
nj can be <= pj.

20 Model 4: RGCCA, factorial scheme, mode A
One-step average two-block PLS regression.
The latent variables were standardized afterwards.

21 Model 4
All loadings are significant.

22 Model 4
RGCCA-mode A is very stable. All weights are also significant.

23 Model Comparison (Schäfer & Strimmer, 2005) R-code (Arthur T.)

24 AVE outer and AVE inner (same τ for all blocks)
Mode B: τ = 0. Mode A: τ = 1.
Mode A favors the outer model. Mode B favors the inner model.
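The slide does not define the two AVE indices. The sketch below assumes the definitions commonly used with RGCCA. The AVE of a block is the mean squared correlation between its manifest variables and its latent variable. AVE outer is the average of the block AVEs weighted by pj. AVE inner is the average of the squared correlations between connected latent variables. The function names are hypothetical.

```python
import numpy as np

def ave_block(X, y):
    """Mean squared correlation between the columns of X and the component y."""
    r = [np.corrcoef(X[:, h], y)[0, 1] for h in range(X.shape[1])]
    return float(np.mean(np.square(r)))

def ave_outer(blocks, lvs):
    """Block AVEs averaged with weights p_j (number of variables per block)."""
    p = np.array([X.shape[1] for X in blocks], dtype=float)
    ave = np.array([ave_block(X, y) for X, y in zip(blocks, lvs)])
    return float(np.sum(p * ave) / np.sum(p))

def ave_inner(lvs, C):
    """Average squared correlation over the connected pairs (j < k)."""
    num = den = 0.0
    for j in range(len(lvs)):
        for k in range(j + 1, len(lvs)):
            if C[j][k]:
                num += C[j][k] * np.corrcoef(lvs[j], lvs[k])[0, 1] ** 2
                den += C[j][k]
    return num / den
```

Evaluating both functions on the components obtained for several values of τ is one way to trace the trade-off the slide describes between the outer and the inner model.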

25 Hierarchical model for wine data: Model 5
RGCCA: Factorial, Mode A. Dimension 1.
One-step hierarchical PLS regression.
2nd order block “Global” contains all the MVs of the 1st order blocks.
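The second-order block described here is a column-wise concatenation of the first-order blocks. The sketch below builds it together with a design matrix in which each first-order block is connected only to the global block. That connection pattern is the usual hierarchical layout and is an assumption here, since the path diagram is not in the transcript. The function name is hypothetical.

```python
import numpy as np

def add_global_block(blocks):
    """Append the 2nd order block and build the hierarchical design matrix.

    blocks: list of J first-order blocks, each of shape (n, p_j).
    Returns the J + 1 blocks and a (J + 1, J + 1) 0/1 design matrix C.
    """
    J = len(blocks)
    global_block = np.hstack(blocks)        # all MVs of the 1st order blocks
    C = np.zeros((J + 1, J + 1), dtype=int)
    C[:J, J] = 1                            # each 1st order block <-> Global
    C[J, :J] = 1
    return list(blocks) + [global_block], C
```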

26 Hierarchical model for wine data: Model 6
RGCCA: Factorial, Mode A for initial blocks, Mode B for global block.
One-step hierarchical redundancy analysis.
This method has been proposed independently at least three times:
- Covariance criterion (J.D. Carroll, 1968)
- Consensus PCA (S. Wold et al., 1987)
- Multiple co-inertia analysis (Chessel & Hanafi, 1996)
2nd order block “Global” contains all the MVs of the 1st order blocks.

27 Hierarchical model for wine data: Model 7
RGCCA, Factorial, Mode A. Dimension 2. The View block is dropped.
2nd order block “Global” contains all the MVs of the 1st order blocks.

28 Mapping of the correlations with the global components

29 Wine visualization in the global component space
Wines marked by Appellation

30 Wine visualization in the global component space
Wines marked by Soil. DAM = Dampierre-sur-Loire.
[Figure annotation: GOOD QUALITY]

31 Cuvée Lisagathe 1995
A soft, warm, blackberry nose. A good core of fruit on the palate with quite well worked tannin and acidity on the finish. Good length and a lot of potential.
DECANTER (May 1997). DECANTER AWARD *****: Outstanding quality, a virtually perfect example.

32 References

33 Final conclusion
All the proofs of a pudding are in the eating, but it will taste even better if you know the cooking.

