Published by Shon Carroll. Modified about 1 year ago.

1
1 SEM for small samples Michel Tenenhaus ESSEC-HEC Research Workshop Series on “PLS (Partial Least Squares) Developments”

2
2 Orange juice example (J. Pagès): $X_1$ = Physico-chemical, $X_2$ = Sensorial, $X = [X_1, X_2]$, $Y$ = Hedonic

3
3 Structural Equation Modeling: the PLS approach of Herman Wold
- Study of a system of linear relationships between latent variables.
- Each latent variable is described by a set of manifest variables, or summarizes them.
- Variables can be numerical, ordinal or nominal (no need for normality assumptions).
- The number of observations can be small compared to the number of variables.

4
4 Orange juice example on a homogeneous group of judges
[Path diagram with three blocks of manifest variables (MV), each attached to one latent variable (LV):]
- Physico-chemical (outer weights w_11 to w_19): Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Citric acid, Vitamin C
- Sensorial (outer weights w_21 to w_27): Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness
- Hedonic (outer weights w_32, w_33, ..., w_3,96): Judge 2, Judge 3, ..., Judge 96
Diagram legend: manifest variable, exogenous latent variable, endogenous latent variable, measurement model, structural model.

5
5 A SEM tree
SEM
- Component-based SEM (score computation)
  - Herman Wold: NIPALS (1966), PLS approach (1975)
  - J.-B. Lohmöller: LVPLS 1.8 (1984)
  - W. Chin: PLS-Graph
  - Chatelin, Esposito Vinzi, Fahmy, Jäger, Tenenhaus: XLSTAT-PLSPM (2007)
  - H. Hwang, Y. Takane: GSCA (2004)
  - H. Hwang: VisualGSCA 1.0 (2007)
- Covariance-based SEM (CSA) (model estimation/validation)
  - AMOS 6.0, 2007
  - Score computed for each block using MV loadings
  - Path analysis on the structural model defined on the scores
For good blocks (high Cronbach α):
- Score = 1st PC
- Score = MV's
For good blocks, all methods give almost the same results.
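The remark about good blocks can be checked numerically. The sketch below computes Cronbach's α for one block and compares its first principal component with a plain sum of standardized MVs. The data are simulated, the function names are hypothetical, and reading the slide's second score option as a sum of standardized MVs is an assumption, not something the slide states.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha of a block X (n cases x k manifest variables)."""
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()      # sum of the MV variances
    total_var = X.sum(axis=1).var(ddof=1)       # variance of the summed MVs
    return k / (k - 1) * (1.0 - item_var / total_var)

def first_pc_score(X):
    """Standardized first principal component of the standardized block."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    score = Z @ Vt[0]
    return score / score.std(ddof=1)

# Simulated "good" block: 5 MVs driven by one common factor (illustrative data only).
rng = np.random.default_rng(1)
factor = rng.normal(size=(30, 1))
X = factor + 0.4 * rng.normal(size=(30, 5))

alpha = cronbach_alpha(X)
pc = first_pc_score(X)
sum_score = ((X - X.mean(axis=0)) / X.std(axis=0, ddof=1)).sum(axis=1)
r = abs(np.corrcoef(pc, sum_score)[0, 1])      # sign of a PC is arbitrary
print(f"alpha = {alpha:.3f}, |cor(1st PC, sum of MVs)| = {r:.4f}")
```

On a block of this kind the two scores should be nearly collinear, which is the sense in which the choice of method matters little for good blocks.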

6
6 Results
- When all blocks are good, all the methods give practically the same results: M. Tenenhaus, Component-based SEM, Total Quality Management, 2008.
- For all data, PLS and SEM yield highly correlated LV scores: M. Tenenhaus, SEM for small samples, HEC Working paper, 2008.
Data structures are stronger than statistical methods.

7
7 PLS algorithm (Mode A, Centroid scheme)
[Path diagram: the three blocks of slide 4, with outer weights w_11 to w_19 (physico-chemical MVs), w_21 to w_27 (sensorial MVs) and w_32 to w_3,96 (Judge 2 to Judge 96).]
Outer estimates: $Y_1 = X_1 w_1$, $Y_2 = X_2 w_2$, $Y_3 = X_3 w_3$
Inner estimates: $Z_1 = Y_2 + Y_3$, $Z_2 = Y_1 + Y_3$, $Z_3 = Y_1 + Y_2$
Weight updates:
- $w_{11} = \mathrm{Cor}(\text{glucose}, Z_1)$, $w_{12} = \mathrm{Cor}(\text{fructose}, Z_1)$, ..., $w_{19} = \mathrm{Cor}(\text{vitamin C}, Z_1)$
- $w_{21} = \mathrm{Cor}(\text{smell int.}, Z_2)$, $w_{22} = \mathrm{Cor}(\text{odor typ.}, Z_2)$, ..., $w_{27} = \mathrm{Cor}(\text{sweetness}, Z_2)$
- $w_{32} = \mathrm{Cor}(\text{judge 2}, Z_3)$, $w_{33} = \mathrm{Cor}(\text{judge 3}, Z_3)$, ..., $w_{3,96} = \mathrm{Cor}(\text{judge 96}, Z_3)$
Iterate until convergence.
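The iteration on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the XLSTAT-PLSPM implementation: the function name `pls_mode_a_centroid`, the `links` matrix, the starting weights and the simulated data are all assumptions. The centroid scheme is written in its usual form, where each connected LV enters the inner estimate with the sign of its correlation; when all correlations are positive this reduces to the plain sums shown on the slide.

```python
import numpy as np

def standardize(a):
    """Center and scale columns (or a single vector) to unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def pls_mode_a_centroid(blocks, links, tol=1e-8, max_iter=500):
    """blocks: list of (n x p_k) arrays; links: symmetric 0/1 matrix of connected LVs."""
    blocks = [standardize(b) for b in blocks]
    weights = [np.ones(b.shape[1]) for b in blocks]   # arbitrary starting weights
    for _ in range(max_iter):
        # Outer estimates Y_k = X_k w_k, standardized
        Y = np.column_stack([standardize(b @ w) for b, w in zip(blocks, weights)])
        # Centroid scheme: inner weight = sign of the correlation between connected LVs
        E = np.sign(np.corrcoef(Y, rowvar=False)) * links
        Z = Y @ E                                      # inner estimates Z_k
        # Mode A: w_kj = Cor(x_kj, Z_k)
        new_weights = [
            np.array([np.corrcoef(b[:, j], Z[:, k])[0, 1] for j in range(b.shape[1])])
            for k, b in enumerate(blocks)
        ]
        delta = max(np.max(np.abs(nw - w)) for nw, w in zip(new_weights, weights))
        weights = new_weights
        if delta < tol:
            break
    scores = np.column_stack([standardize(b @ w) for b, w in zip(blocks, weights)])
    return weights, scores

# Simulated data: three blocks of 9, 7 and 5 MVs sharing one factor (illustrative only).
rng = np.random.default_rng(0)
f = rng.normal(size=(40, 1))
blocks = [f + 0.3 * rng.normal(size=(40, p)) for p in (9, 7, 5)]
links = np.ones((3, 3)) - np.eye(3)                   # every LV connected to the others
weights, scores = pls_mode_a_centroid(blocks, links)
```

With `links` fully connected, as here, each inner estimate is built from the two other LVs, which matches the three-block layout of the orange juice example.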

8
8 SPECIAL CASES OF PLS PATH MODELLING
- Principal component analysis
- Multiple factor analysis
- Canonical correlation analysis
- Redundancy analysis
- PLS regression
- Generalized canonical correlation analysis (Horst)
- Generalized canonical correlation analysis (Carroll)
- Multiple co-inertia analysis (Chessel & Hanafi)
- etc.

9
9 Use of XLSTAT-PLSPM

10
10 Outer weight w Non significant variables are in red

11
11 Outer weight w

12
12 Correlation MV-LV

13
13 Correlation MV-LV

14
14 Use of XLSTAT-PLSPM
Latent variables
===========================================================
                   Physico-chemical   Sensorial   Hedonic
-----------------------------------------------------------
Fruivita refr.           0.917          0.964      1.253
Tropicana refr.          0.630          1.378      0.946
Tropicana r.t.           1.120          0.462      0.742
-----------------------------------------------------------
Pampryl refr.           -0.176         -0.570     -0.747
Joker r.t.              -1.680         -0.852     -0.991
Pampryl r.t.            -0.810         -1.381     -1.203
===========================================================

15
15 Use of XLSTAT-PLSPM

16
16 Model estimation by PLS: Inner model and correlations
[Path diagram: latent variables 1, 2 and 3 with their manifest variables (Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Vitamin C, Citric acid; Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness; Judge 2, Judge 3, ..., Judge 96).]
MV-LV correlations shown on the diagram, in extraction order: -.89, .93, .1, .95, .94, -.97, -.98, .98, .41, -.19, .71, -.64, -.93, -.95, .97
Path coefficients: .306 (t = 1.522), .713 (t = 3.546), .820 (t = 2.864)
R² = 0.96
Non-significant variables are in red.

17
17 Estimation of the inner model by PLS regression
R² = 0.946
The correlation between the physico-chemical and the sensorial variables can be taken into account by using PLS regression.
[Bar chart: CoeffCS[1](hédonic) for Physico-chemical and Sensorial, axis from 0.00 to 0.70.]
Validation of PLS regression by jackknife.

18
18 Use of the PLS option of XLSTAT-PLSPM Physico-chemical has no direct effect on Hedonic, but a strong indirect effect.

19
19 Direct, indirect and total effects
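A short sketch of how direct, indirect and total effects relate in a recursive path model: with B the matrix of direct effects, total effects are (I - B)^-1 - I and indirect effects are total minus direct. The three coefficients are the PLS values printed on slide 16, but which arrow each one belongs to is an assumption made here for illustration (the transcript does not preserve the mapping).

```python
import numpy as np

labels = ["Physico-chemical", "Sensorial", "Hedonic"]

# B[i, j] = direct effect of LV j on LV i.
# ASSUMED mapping of the slide-16 coefficients to arrows (not given in the transcript):
B = np.zeros((3, 3))
B[1, 0] = 0.820   # Physico-chemical -> Sensorial (assumed)
B[2, 0] = 0.306   # Physico-chemical -> Hedonic   (assumed)
B[2, 1] = 0.713   # Sensorial        -> Hedonic   (assumed)

I = np.eye(3)
total = np.linalg.inv(I - B) - I      # sums the effects along all directed paths
indirect = total - B                  # what remains after removing direct effects

for i, target in enumerate(labels):
    for j, source in enumerate(labels):
        if total[i, j] != 0:
            print(f"{source} -> {target}: direct {B[i, j]:.3f}, "
                  f"indirect {indirect[i, j]:.3f}, total {total[i, j]:.3f}")
```

Under this assumed mapping, the indirect effect of the first LV on the third is the product of the two coefficients along the mediated path, and the total effect adds the direct coefficient to it.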

20
20 Covariance-based Structural Equation Modeling Latent variables: Structural model (inner model): Here:

21
21 Structural Equation Modeling Measurement model (outer model): MV LV MV Endogenous Exogenous

22
22 Structural Equation Modeling MV covariance matrix : Outer model Inner model Cov. for exo. LV Variance for structural residuals Variance for measurement residuals

23
23 Covariance-based SEM ULS algorithm (Unweighted Least Squares): S = Observed covariance matrix for MVs. Goodness-of-fit Index (Jöreskog & Sörbom): Generalization of PCA
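The formulas on this slide did not survive extraction. As a hedged illustration, the sketch below uses the commonly cited forms: the ULS criterion as the sum of squared differences between S and the model-implied covariance matrix, and the ULS goodness-of-fit index as 1 - tr[(S - Σ)²] / tr(S²). Both the function names and the assumption that the slide used these exact forms are mine.

```python
import numpy as np

def uls_discrepancy(S, Sigma):
    """Sum of squared residuals between observed S and model-implied Sigma."""
    D = S - Sigma
    return np.trace(D @ D)

def gfi_uls(S, Sigma):
    """ULS goodness-of-fit index: share of tr(S^2) reproduced by the model."""
    return 1.0 - uls_discrepancy(S, Sigma) / np.trace(S @ S)

# Toy example: two MVs correlated at 0.5, fitted by a model implying no correlation.
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
Sigma_independent = np.eye(2)

print(gfi_uls(S, Sigma_independent))  # residual 0.5 on each off-diagonal cell
print(gfi_uls(S, S))                  # a model reproducing S exactly
```

In the toy example the residual matrix has 0.5 in both off-diagonal cells, so tr(D²) = 0.5 against tr(S²) = 2.5.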

24
24 Use of AMOS 6.0
Method = ULS
Roderick McDonald's first idea (1996): measurement residual variances are canceled.
This is a computational trick: residual variances are passed to errors and can always be computed afterwards.

25
25 Covariance-based SEM ULS algorithm with McDonald's constraints: S = Observed covariance matrix for MVs. Goodness-of-fit Index (Jöreskog & Sörbom):

26
26 Use of AMOS 6.0 - Method = ULS - Measurement residual variances = 0

27
27 Results
GFI = .903
Outer LV estimates: McDonald's 2nd idea
PLS estimate of LV:
- Mode A
- LV inner estimate = theoretical LV
- LV inner estimate computation is useless.

28
28

29
29

30
30 Model estimation by SEM-ULS: Inner model and correlations
[Path diagram: latent variables 1, 2 and 3 with their manifest variables (Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Vitamin C, Citric acid; Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness; Judge 2, Judge 3, ..., Judge 96).]
Values shown on the diagram, in extraction order: -.77, -.76, .89, .22, 1, 1.00, -.87, -.88, .94, .26, -.08, .66, -.56, -.94, -.97
Coefficients with significance levels: 1.22 (P = .35), .64 (P = .05), .79 (P = .01)
R² = 0.96
Non-significant variables in red. Constrained weights in blue.

31
31 Use of SEM-ULS
Latent variable estimates (scores)
===========================================================
                   Physico-chemical   Sensorial   Hedonic
-----------------------------------------------------------
Fruivita refr.           0.915          0.866      1.141
Tropicana refr.          0.526          1.270      0.868
Tropicana r.t.           0.832          0.422      0.672
-----------------------------------------------------------
Pampryl refr.           -0.158         -0.526     -0.686
Joker r.t.              -1.740         -0.774     -0.867
Pampryl r.t.            -0.375         -1.258     -1.127
===========================================================

32
32 Comparison between the PLS and SEM-ULS scores
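The comparison can be reproduced from the two score tables (slide 14 for PLS, slide 31 for SEM-ULS). The sketch below simply correlates the two sets of scores, latent variable by latent variable; the numbers are copied from those tables, while the variable names are illustrative.

```python
import numpy as np

products = ["Fruivita refr.", "Tropicana refr.", "Tropicana r.t.",
            "Pampryl refr.", "Joker r.t.", "Pampryl r.t."]
lvs = ["Physico-chemical", "Sensorial", "Hedonic"]

# LV scores from XLSTAT-PLSPM (slide 14)
pls = np.array([[ 0.917,  0.964,  1.253],
                [ 0.630,  1.378,  0.946],
                [ 1.120,  0.462,  0.742],
                [-0.176, -0.570, -0.747],
                [-1.680, -0.852, -0.991],
                [-0.810, -1.381, -1.203]])

# LV scores from SEM-ULS (slide 31)
uls = np.array([[ 0.915,  0.866,  1.141],
                [ 0.526,  1.270,  0.868],
                [ 0.832,  0.422,  0.672],
                [-0.158, -0.526, -0.686],
                [-1.740, -0.774, -0.867],
                [-0.375, -1.258, -1.127]])

# Correlation between the PLS and SEM-ULS scores of each latent variable
corr = {name: float(np.corrcoef(pls[:, j], uls[:, j])[0, 1])
        for j, name in enumerate(lvs)}
for name, r in corr.items():
    print(f"{name}: r = {r:.3f}")
```

With only six products these correlations are descriptive rather than inferential, but they show how close the two sets of scores are for each block.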

33
33 Path analysis on scores with AMOS Bootstrap validation

34
34 Direct, indirect and total effects

35
35 Conclusion 1: SEM-ULS > PLS
- When mode A is chosen, outer LV estimates using covariance-based SEM (ULS or ML) or component-based SEM (PLS) are always very close.
- It is possible to mimic PLS with covariance-based SEM software (McDonald, 1996; Tenenhaus, 2001).
- Covariance-based SEM allows constraints to be imposed on the model parameters. This is impossible with PLS.

36
36 Conclusion 2: PLS > SEM-ULS
- When SEM-ULS does not converge or does not give an admissible solution, PLS is an attractive alternative.
- PLS offers many optimization criteria for the LV search (but rigorous proofs are still to be found).
- PLS still works when the number of MVs is very high and the number of cases very small (for example 38 MVs and 6 cases).
- PLS makes it much easier to use formative LVs than SEM-ULS.

37
37 Second particular case : Multi-block data analysis

38
Sensory analysis of 21 Loire red wines (J. Pagès)
4 blocks of variables: $X_1$ = Smell at rest, $X_2$ = View, $X_3$ = Smell after shaking, $X_4$ = Tasting
Illustrative variables: 3 appellations, 4 soils

39
PCA of each block: Correlation loadings

40
PCA of each block with AMOS: Correlation loadings GFI =.301

41
41 Multi-block data analysis = Confirmatory Factor Analysis
[Diagram with the blocks Smell at rest, View, Smell after shaking and Tasting.]
GFI = .849

42
42 First dimension Using MV with significant loadings

43
43 First global score GFI =.973 2nd order CFA

44
44 Validation of the first dimension
Correlations
            Rest1    View    Shaking1   Tasting1
Rest1       1
View        .621     1
Shaking1    .865     .762    1
Tasting1    .682     .813    .895       1
Score1      .813     .920    .942       .944

45
45 Second dimension

46
46 2 nd global score GFI =.905

47
47 Validation of the second dimension
Correlations
            Rest2    Shaking2   Tasting2
Rest2       1
Shaking2    .789     1
Tasting2    .782     .803       1
Score2      .944     .904       .928

48
48 Mapping of the correlations with the global scores
- Score 1 is related to quality.
- Score 2 is unrelated to quality.

49
49 Correlation with global quality
New result: not obtained with other multi-block data analysis methods, nor with factor analysis of the whole data set.

50
50 Wine visualization in the global score space Wines marked by Appellation

51
51 Wine visualization in the global score space Wines marked by Soil

52
DAM = Dampierre-sur-Loire

53
Cuvée Lisagathe 1995
"A soft, warm, blackberry nose. A good core of fruit on the palate with quite well worked tannin and acidity on the finish. Good length and a lot of potential." DECANTER (May 1997)
(DECANTER AWARD *****: Outstanding quality, a virtually perfect example)

54
Final conclusion « All the proofs of a pudding are in the eating, not in the cooking ». William Camden (1623)
