1 SEM for small samples Michel Tenenhaus ESSEC-HEC Research Workshop Series on “PLS (Partial Least Squares) Developments”
2 Orange juice example (J. Pagès) X 1 = Physico-chemical, X 2 = Sensorial, X = [X 1, X 2 ], Y = Hedonic
3 Structural Equation Modeling: the PLS approach of Herman Wold
Study of a system of linear relationships between latent variables. Each latent variable is described by a set of manifest variables, or summarizes them. Variables can be numerical, ordinal or nominal (no need for normality assumptions). The number of observations can be small compared to the number of variables.
4 Orange juice example on a homogeneous group of judges
[Path diagram showing the measurement model and the structural model. Legend: manifest variable, exogenous latent variable, endogenous latent variable.]
- Physico-chemical block (weights w11 ... w19): Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Citric acid, Vitamin C
- Sensorial block (weights w21 ... w27): Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness
- Hedonic block (weights w32 ... w3,96): Judge 2, Judge 3, ..., Judge 96
5 A SEM tree
SEM
- Component-based SEM (score computation):
  - Herman Wold: NIPALS (1966), PLS approach (1975)
  - J.-B. Lohmöller: LVPLS 1.8 (1984)
  - W. Chin: PLS-Graph
  - Chatelin, Esposito Vinzi, Fahmy, Jäger, Tenenhaus: XLSTAT-PLSPM (2007)
  - H. Hwang, Y. Takane: GSCA (2004)
  - H. Hwang: VisualGSCA 1.0 (2007)
- Covariance-based SEM (CSA) (model estimation/validation):
  - AMOS 6.0, 2007
  - Score computed for each block using MV loadings
  - Path analysis on the structural model defined on the scores
For good blocks (high Cronbach's alpha):
- Score = 1st PC
- Score = sum of the MVs
For good blocks, all methods give almost the same results.
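The remark about good blocks can be checked numerically. The following is a minimal sketch on simulated data (the block, its size and the noise level are assumptions made for illustration, not the orange juice data): it computes Cronbach's alpha for one block and compares the first principal component score with the simple sum score.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def cronbach_alpha(X):
    """Cronbach's alpha of a block of manifest variables (columns of X)."""
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)

def first_pc_score(X):
    """Score on the first principal component of the standardized block."""
    Xs = standardize(X)
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ vt[0]

def sum_score(X):
    """Sum of the standardized manifest variables."""
    return standardize(X).sum(axis=1)

# Hypothetical "good" block: 5 MVs driven by one common factor plus noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=200)
X = factor[:, None] + 0.5 * rng.normal(size=(200, 5))

alpha = cronbach_alpha(standardize(X))
r = abs(np.corrcoef(first_pc_score(X), sum_score(X))[0, 1])
print(f"alpha = {alpha:.3f}, |cor(1st PC, sum score)| = {r:.3f}")
```

The absolute value is taken because the sign of a principal component is arbitrary.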
6 When all blocks are good, all the methods give practically the same results: M. Tenenhaus, Component-based SEM, Total Quality Management, 2008.
For all data, PLS and SEM yield highly correlated LV scores: M. Tenenhaus, SEM for small samples, HEC Working paper.
Results: Data structures are stronger than statistical methods.
7 PLS algorithm (Mode A, Centroid scheme)
[Path diagram: Physico-chemical block (Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Citric acid, Vitamin C; weights w11 ... w19), Sensorial block (Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness; weights w21 ... w27), Hedonic block (Judge 2, Judge 3, ..., Judge 96; weights w32 ... w3,96)]
Outer estimates: Y1 = X1 w1, Y2 = X2 w2, Y3 = X3 w3
Inner estimates: Z1 = Y2 + Y3, Z2 = Y1 + Y3, Z3 = Y1 + Y2
Weights:
- w11 = Cor(glucose, Z1), w12 = Cor(fructose, Z1), ..., w19 = Cor(vitamin C, Z1)
- w21 = Cor(smell int., Z2), w22 = Cor(odor typ., Z2), ..., w27 = Cor(sweetness, Z2)
- w32 = Cor(judge 2, Z3), w33 = Cor(judge 3, Z3), ..., w3,96 = Cor(judge 96, Z3)
Iterate until convergence.
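The iteration on this slide can be sketched in a few lines of NumPy. This is an illustrative sketch, not the XLSTAT-PLSPM implementation: the function names, the simulated blocks and the stopping rule are assumptions. It uses the general centroid scheme, in which each neighbouring score is multiplied by the sign of its correlation; when all correlations are positive this reduces to the plain sums Z1 = Y2 + Y3, etc.

```python
import numpy as np

def standardize(A):
    """Center columns and scale them to unit variance (ddof=1)."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

def outer_estimates(blocks, weights):
    """Standardized outer estimates Y_j = X_j w_j, one column per block."""
    return standardize(np.column_stack([X @ w for X, w in zip(blocks, weights)]))

def pls_mode_a_centroid(blocks, links, tol=1e-8, max_iter=500):
    """blocks: list of (n, p_j) arrays. links: symmetric 0/1 matrix with a
    zero diagonal; links[j][k] = 1 when LVs j and k are connected
    (every LV is assumed to have at least one neighbour)."""
    blocks = [standardize(X) for X in blocks]
    C = np.asarray(links, dtype=float)
    n = blocks[0].shape[0]
    weights = [np.ones(X.shape[1]) for X in blocks]
    for _ in range(max_iter):
        Y = outer_estimates(blocks, weights)
        # Inner estimate (centroid scheme): signed sum of connected scores.
        signs = np.sign(np.corrcoef(Y, rowvar=False))
        Z = standardize(Y @ (C * signs))
        # Mode A: each weight is the correlation between an MV and Z_j.
        new_weights = [X.T @ Z[:, j] / (n - 1) for j, X in enumerate(blocks)]
        change = max(np.abs(nw - w).max() for nw, w in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break
    return weights, outer_estimates(blocks, weights)

# Simulated data (hypothetical): three blocks driven by correlated LVs.
rng = np.random.default_rng(1)
n = 60
lv1 = rng.normal(size=n)
lv2 = 0.8 * lv1 + 0.6 * rng.normal(size=n)
lv3 = 0.7 * lv2 + 0.7 * rng.normal(size=n)
blocks = [lv[:, None] + 0.5 * rng.normal(size=(n, p))
          for lv, p in ((lv1, 4), (lv2, 3), (lv3, 5))]
links = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])

weights, scores = pls_mode_a_centroid(blocks, links)
print(np.round(np.corrcoef(scores, rowvar=False), 3))
```

Standardizing the scores at each step fixes their scale, which the algorithm otherwise leaves undetermined.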
8 Special cases of PLS path modelling
- Principal component analysis
- Multiple factor analysis
- Canonical correlation analysis
- Redundancy analysis
- PLS regression
- Generalized canonical correlation analysis (Horst)
- Generalized canonical correlation analysis (Carroll)
- Multiple co-inertia analysis (Chessel & Hanafi)
- etc.
9 Use of XLSTAT-PLSPM
10 Outer weight w Non significant variables are in red
16 Model estimation by PLS: inner model and correlations
[Path diagram: the Physico-chemical, Sensorial and Hedonic blocks with their manifest variables. Values legible on the inner model: (t = 1.522), .713 (t = 3.546), (t = 2.864); R² = [not legible].]
Non-significant variables are in red.
17 Estimation of the inner model by PLS regression
The correlation between the physico-chemical and the sensorial variables can be taken into account by using PLS regression.
[Chart: CoeffCS (Hedonic) for Physico-chemical and Sensorial; R² = [not legible]]
Validation of PLS regression by jackknife.
18 Use of the PLS option of XLSTAT-PLSPM Physico-chemical has no direct effect on Hedonic, but a strong indirect effect.
21 Structural Equation Modeling
[Diagram: measurement model (outer model) linking the manifest variables (MV) to the endogenous and exogenous latent variables (LV)]
22 Structural Equation Modeling
MV covariance matrix, expressed in terms of:
- Outer model
- Inner model
- Covariance for exogenous LVs
- Variance for structural residuals
- Variance for measurement residuals
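The formula itself did not survive in this transcript. As a hedged reconstruction, the sketch below uses the standard LISREL-type parameterization (x = Λx ξ + δ, y = Λy η + ε, η = Bη + Γξ + ζ), which matches the five ingredients listed on the slide; the notation and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

def implied_covariance(lam_x, lam_y, B, Gamma, Phi, Psi, theta_x, theta_y):
    """Model-implied covariance matrix of the MVs, ordered (x, y).

    lam_x, lam_y     : outer model (loadings of exogenous / endogenous blocks)
    B, Gamma         : inner model (eta = B eta + Gamma xi + zeta)
    Phi              : covariance of the exogenous LVs
    Psi              : covariance of the structural residuals
    theta_x, theta_y : variances of the measurement residuals
    """
    m = B.shape[0]
    A = np.linalg.inv(np.eye(m) - B)
    cov_eta = A @ (Gamma @ Phi @ Gamma.T + Psi) @ A.T
    cov_eta_xi = A @ Gamma @ Phi
    sxx = lam_x @ Phi @ lam_x.T + np.diag(theta_x)
    syx = lam_y @ cov_eta_xi @ lam_x.T
    syy = lam_y @ cov_eta @ lam_y.T + np.diag(theta_y)
    return np.block([[sxx, syx.T], [syx, syy]])

# Hypothetical model: one exogenous LV and two endogenous LVs in a chain,
# two MVs per LV. All numbers are made up.
lam_x = np.array([[1.0], [0.8]])
lam_y = np.array([[1.0, 0.0], [0.9, 0.0], [0.0, 1.0], [0.0, 0.7]])
B = np.array([[0.0, 0.0], [0.6, 0.0]])
Gamma = np.array([[0.7], [0.0]])
Phi = np.array([[1.0]])
Psi = np.diag([0.51, 0.64])
Sigma = implied_covariance(lam_x, lam_y, B, Gamma, Phi, Psi,
                           theta_x=[0.3, 0.4], theta_y=[0.3, 0.3, 0.3, 0.3])
print(np.round(Sigma, 3))
```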
23 Covariance-based SEM
ULS algorithm (Unweighted Least Squares): S = observed covariance matrix for the MVs
Goodness-of-fit Index (Jöreskog & Sörbom)
Generalization of PCA
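The ULS criterion and the index are not preserved in this transcript. The sketch below uses the forms in which they are commonly written, F = ½ tr[(S − Σ)²] and GFI = 1 − tr[(S − Σ)²] / tr(S²); treat both as assumptions rather than the slide's exact formulas. To illustrate the "generalization of PCA" remark, it fits a one-factor model with zero residual variances, for which the ULS solution is given by the first principal component of S. The data are simulated.

```python
import numpy as np

def uls_discrepancy(S, Sigma):
    """ULS fit function: half the sum of squared residual covariances."""
    D = S - Sigma
    return 0.5 * np.trace(D @ D)

def gfi_uls(S, Sigma):
    """Goodness-of-fit index in its ULS form."""
    D = S - Sigma
    return 1.0 - np.trace(D @ D) / np.trace(S @ S)

# Simulated block of 4 MVs sharing one common factor (hypothetical data).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 1)) + 0.7 * rng.normal(size=(50, 4))
S = np.corrcoef(data, rowvar=False)

# One-factor model without residual variances: Sigma = loading loading'.
# Its ULS minimizer is the best rank-one approximation of S, i.e. the
# first principal component scaled by the square root of its eigenvalue.
evals, evecs = np.linalg.eigh(S)          # eigenvalues in ascending order
loading = np.sqrt(evals[-1]) * evecs[:, -1]
Sigma = np.outer(loading, loading)

gfi = gfi_uls(S, Sigma)
print(f"F_ULS = {uls_discrepancy(S, Sigma):.4f}, GFI = {gfi:.4f}")
```

In this special case the GFI equals the squared largest eigenvalue of S divided by the sum of all squared eigenvalues.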
24 Use of AMOS 6.0
Method = ULS
First idea of Roderick McDonald (1996): measurement residual variances are canceled.
This is a computational trick: residual variances are passed to errors and can always be computed afterwards.
25 Covariance-based SEM
ULS algorithm with McDonald's constraints: S = observed covariance matrix for the MVs
Goodness-of-fit Index (Jöreskog & Sörbom)
26 Use of AMOS Method = ULS - Measurement residual variances = 0
27 Results
Outer LV estimates. Second idea of McDonald: PLS estimate of the LV
- Mode A
- LV inner estimate = theoretical LV
- Computing the LV inner estimate is unnecessary.
GFI = .903
30 Model estimation by SEM-ULS: inner model and correlations
[Path diagram: the Physico-chemical, Sensorial and Hedonic blocks with their manifest variables. Values legible on the inner model: (P = .35), .64 (P = .05), (P = .01); R² = [not legible].]
Non-significant variables are in red. Constrained weights are in blue.
32 Comparison between the PLS and SEM-ULS scores
33 Path analysis on scores with AMOS Bootstrap validation
34 Direct, indirect and total effects
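For a recursive path model, the decomposition can be computed directly from the matrix of path coefficients. The sketch below is an illustration under stated assumptions: the coefficients are invented, not the values estimated in this presentation, and the LV ordering (Physico-chemical, Sensorial, Hedonic) is chosen for the example.

```python
import numpy as np

def decompose_effects(B):
    """B[i, j] = direct effect of LV j on LV i (recursive model).

    Total effects are B + B^2 + B^3 + ... = (I - B)^(-1) - I;
    indirect effects are total minus direct."""
    m = B.shape[0]
    total = np.linalg.inv(np.eye(m) - B) - np.eye(m)
    indirect = total - B
    return B, indirect, total

# Hypothetical path coefficients; order: Physico-chemical, Sensorial, Hedonic.
B = np.array([
    [0.0, 0.0, 0.0],   # Physico-chemical (exogenous)
    [0.8, 0.0, 0.0],   # Sensorial <- Physico-chemical
    [0.1, 0.6, 0.0],   # Hedonic   <- Physico-chemical, Sensorial
])
direct, indirect, total = decompose_effects(B)
print("Physico-chemical -> Hedonic:",
      f"direct = {direct[2, 0]:.2f},",
      f"indirect = {indirect[2, 0]:.2f},",
      f"total = {total[2, 0]:.2f}")
```

With these made-up values, the indirect effect through Sensorial (0.8 × 0.6 = 0.48) dominates the small direct effect, the same qualitative pattern as a weak direct effect combined with a strong indirect one.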
35 Conclusion 1: SEM-ULS > PLS
- When Mode A is chosen, outer LV estimates using covariance-based SEM (ULS or ML) or component-based SEM (PLS) are always very close.
- It is possible to mimic PLS with covariance-based SEM software (McDonald, 1996; Tenenhaus, 2001).
- Covariance-based SEM allows constraints to be imposed on the model parameters. This is impossible with PLS.
36 Conclusion 2: PLS > SEM-ULS
- When SEM-ULS does not converge or does not give an admissible solution, PLS is an attractive alternative.
- PLS offers many optimization criteria for the LV search (but rigorous proofs are still to be found).
- PLS still works when the number of MVs is very high and the number of cases very small (for example, 38 MVs and 6 cases).
- PLS makes it much easier to use formative LVs than SEM-ULS.
37 Second particular case : Multi-block data analysis
Sensory analysis of 21 Loire Red Wines (J. Pagès)
4 blocks of variables: X1 = Smell at rest, X2 = View, X3 = Smell after shaking, X4 = Tasting
Illustrative variables: 3 appellations, 4 soils
PCA of each block: Correlation loadings
PCA of each block with AMOS: Correlation loadings GFI =.301
41 Multi-block data analysis = Confirmatory Factor Analysis
[Path diagram with the blocks Smell at rest, View, Smell after shaking and Tasting]
GFI = .849
42 First dimension Using MV with significant loadings
43 First global score GFI =.973 2nd order CFA
44 Validation of the first dimension
[Correlation table between Rest1, View, Shaking1, Tasting1 and Score1]
45 Second dimension
46 2 nd global score GFI =.905
47 Validation of the second dimension
[Correlation table between Rest2, Shaking2, Tasting2 and Score2]
48 Mapping of the correlations with the global scores
Score 1 is related to quality. Score 2 is unrelated to quality.
49 Correlation with global quality New result: Not obtained with other multi-block data analysis methods, nor with factor analysis of the whole data set.
50 Wine visualization in the global score space Wines marked by Appellation
51 Wine visualization in the global score space Wines marked by Soil
DAM = Dampierre-sur-Loire
Cuvée Lisagathe 1995
"A soft, warm, blackberry nose. A good core of fruit on the palate with quite well worked tannin and acidity on the finish; Good length and a lot of potential." DECANTER (May 1997)
(DECANTER AWARD *****: Outstanding quality, a virtually perfect example)
Final conclusion « All the proofs of a pudding are in the eating, not in the cooking ». William Camden (1623)