1 SEM for small samples Michel Tenenhaus ESSEC-HEC Research Workshop Series on “PLS (Partial Least Squares) Developments”

2 Orange juice example (J. Pagès): X1 = Physico-chemical, X2 = Sensorial, X = [X1, X2], Y = Hedonic

3 Structural Equation Modeling: the PLS approach of Herman Wold. Study of a system of linear relationships between latent variables. Each latent variable is described by a set of manifest variables, or summarizes them. Variables can be numerical, ordinal or nominal (no need for normality assumptions). The number of observations can be small compared to the number of variables.

4 Orange juice example on a homogeneous group of judges. (Path diagram. Measurement model: each manifest variable is attached to its latent variable through a weight. Physico-chemical block, weights w11 to w19: Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Citric acid, Vitamin C. Sensorial block, weights w21 to w27: Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness. Hedonic block, weights w32 to w3,96: Judge 2, Judge 3, ..., Judge 96. Structural model: paths between the exogenous and endogenous latent variables.)

5 A SEM tree. SEM splits into two branches. Component-based SEM (score computation): Herman Wold, NIPALS (1966) and PLS approach (1975); J.-B. Lohmöller, LVPLS 1.8 (1984); W. Chin, PLS-Graph; Chatelin, Esposito Vinzi, Fahmy, Jäger, Tenenhaus, XLSTAT-PLSPM (2007); H. Hwang and Y. Takane, GSCA (2004); H. Hwang, VisualGSCA 1.0 (2007). Covariance-based SEM (CSA) (model estimation/validation): AMOS 6.0 (2007). Score computed for each block using MV loadings. Path analysis on the structural model defined on the scores. For good blocks (high Cronbach α): Score = 1st PC, or Score = Σ MVs. For good blocks, all methods give almost the same results.
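The remark that, for good blocks, the first principal component and the sum of the MVs give almost the same score can be checked numerically. The sketch below is only an illustration on simulated data, not on the data from the talk; the block, its size, the noise level and the helper names (`standardize`, `cronbach_alpha`, `first_pc_score`) are assumptions made for the example.

```python
import numpy as np

def standardize(x):
    """Center each column and scale it to unit sample variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

def cronbach_alpha(block):
    """Cronbach's alpha of an (n_cases, n_mv) block of manifest variables."""
    block = np.asarray(block, dtype=float)
    k = block.shape[1]
    sum_item_var = block.var(axis=0, ddof=1).sum()
    total_var = block.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

def first_pc_score(block):
    """Score of each case on the first principal component of the standardized block."""
    z = standardize(block)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]

# Simulated "good" block (assumption): 5 MVs driven by one common factor plus noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=50)
block = factor[:, None] + 0.4 * rng.normal(size=(50, 5))

alpha = cronbach_alpha(standardize(block))
pc_score = first_pc_score(block)
sum_score = standardize(block).sum(axis=1)
# The sign of a principal component is arbitrary, so compare the absolute correlation.
agreement = abs(np.corrcoef(pc_score, sum_score)[0, 1])
print(f"alpha = {alpha:.3f}, |cor(PC1, sum of MVs)| = {agreement:.4f}")
```

With a weaker common factor (a larger noise multiplier), alpha drops and the two scores can drift apart, which is the situation where the choice of method starts to matter.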

6 When all blocks are good, all the methods give practically the same results (M. Tenenhaus: Component-based SEM, Total Quality Management, 2008). For all data, PLS and SEM yield highly correlated LV scores (M. Tenenhaus: SEM for small samples, HEC Working paper). Results: data structures are stronger than statistical methods.

7 PLS algorithm (Mode A, Centroid scheme), on the orange juice path diagram with blocks X1 (Physico-chemical), X2 (Sensorial) and X3 (Hedonic: Judge 2, Judge 3, ..., Judge 96). Outer estimates: Y1 = X1 w1, Y2 = X2 w2, Y3 = X3 w3. Inner estimates: Z1 = Y2 + Y3, Z2 = Y1 + Y3, Z3 = Y1 + Y2. Weight updates: w11 = Cor(glucose, Z1), w12 = Cor(fructose, Z1), ..., w19 = Cor(vitamin C, Z1); w21 = Cor(smell int., Z2), w22 = Cor(odor typ., Z2), ..., w27 = Cor(sweetness, Z2); w32 = Cor(judge 2, Z3), w33 = Cor(judge 3, Z3), ..., w3,96 = Cor(judge 96, Z3). Iterate until convergence.
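The iteration on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the XLSTAT-PLSPM implementation: the function name `pls_mode_a_centroid`, the `links` matrix and the simulated blocks are assumptions. The inner estimate uses the sign of the correlation between connected LVs, which reduces to the plain sums Z1 = Y2 + Y3, etc. shown on the slide when all LV correlations are positive.

```python
import numpy as np

def standardize(x):
    """Center and scale to unit sample variance (works on columns or a 1-D vector)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

def pls_mode_a_centroid(blocks, links, tol=1e-8, max_iter=500):
    """PLS path modelling, Mode A with centroid scheme.

    blocks: list of (n_cases, p_j) arrays of manifest variables.
    links:  symmetric 0/1 matrix, links[j][k] = 1 if LV j and LV k are connected.
            Every LV is assumed to be connected to at least one other LV.
    Returns the outer weights and the standardized LV scores.
    """
    xs = [standardize(b) for b in blocks]
    links = np.asarray(links, dtype=float)
    ws = [np.ones(x.shape[1]) for x in xs]
    for _ in range(max_iter):
        # Outer estimates Y_j = X_j w_j, standardized.
        ys = np.column_stack([standardize(x @ w) for x, w in zip(xs, ws)])
        # Inner estimates Z_j = sum over connected k of sign(cor(Y_j, Y_k)) * Y_k.
        signs = np.sign(np.corrcoef(ys, rowvar=False)) * links
        zs = ys @ signs.T
        # Mode A update: each weight is the correlation between an MV and its Z_j.
        new_ws = [
            np.array([np.corrcoef(x[:, h], zs[:, j])[0, 1] for h in range(x.shape[1])])
            for j, x in enumerate(xs)
        ]
        delta = max(np.max(np.abs(nw - w)) for nw, w in zip(new_ws, ws))
        ws = new_ws
        if delta < tol:
            break
    ys = np.column_stack([standardize(x @ w) for x, w in zip(xs, ws)])
    return ws, ys

# Simulated data (assumption): three blocks sharing one underlying factor.
rng = np.random.default_rng(0)
f = rng.normal(size=30)
blocks = [f[:, None] + 0.5 * rng.normal(size=(30, p)) for p in (4, 3, 5)]
links = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # all three LVs connected, as on the slide

weights, scores = pls_mode_a_centroid(blocks, links)
lv_corr = np.corrcoef(scores, rowvar=False)
print(np.round(lv_corr, 3))
```

Mode A only needs simple correlations between each MV and an inner estimate, never the inverse of a block covariance matrix, so the update remains computable when a block has more variables than cases.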

8 Special cases of PLS path modelling: principal component analysis; multiple factor analysis; canonical correlation analysis; redundancy analysis; PLS regression; generalized canonical correlation analysis (Horst); generalized canonical correlation analysis (Carroll); multiple co-inertia analysis (Chessel & Hanafi); etc.

9 Use of XLSTAT-PLSPM

10 Outer weight w. Non-significant variables are in red.

11 Outer weight w

12 Correlation MV-LV

13 Correlation MV-LV

14 Use of XLSTAT-PLSPM. (Table of latent variable scores. Columns: Physico-chemical, Sensorial, Hedonic. Rows: Fruivita refr, Tropicana refr, Tropicana r.t, Pampryl refr, Joker r.t, Pampryl r.t. Values not recoverable.)

15 Use of XLSTAT-PLSPM

16 Model estimation by PLS: inner model and correlations. (Path diagram relating the three latent variables to their manifest variables. Physico-chemical: Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Vitamin C, Citric acid. Sensorial: Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness. Hedonic: Judge 2, Judge 3, Judge 96. Legible annotations: (t = 1.522); .713 (t = 3.546); (t = 2.864); > 0; the R² value is missing. Non-significant variables are in red.)

17 Estimation of the inner model by PLS regression. The correlation between the physico-chemical and the sensorial variables can be taken into account by using PLS regression. (Chart: coefficients CoeffCS[1](Hedonic) for Physico-chemical and Sensorial, with validation of the PLS regression by jackknife; the R² value is missing.)

18 Use of the PLS option of XLSTAT-PLSPM. Physico-chemical has no direct effect on Hedonic, but a strong indirect effect.

19 Direct, indirect and total effects
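The decomposition into direct, indirect and total effects follows from the path coefficients alone. The sketch below assumes a recursive model and uses made-up coefficients (0.8 and 0.7 are hypothetical, not the values from the talk), with a zero direct path from Physico-chemical to Hedonic to mirror the statement on slide 18; `decompose_effects` is a name chosen for the example.

```python
import numpy as np

def decompose_effects(direct):
    """Split path effects into direct, indirect and total parts.

    direct[i][j] is the direct effect of LV j on LV i.
    Total effects sum the products of coefficients along every path from j to i,
    which equals (I - B)^-1 - I for a recursive model.
    """
    direct = np.asarray(direct, dtype=float)
    identity = np.eye(direct.shape[0])
    total = np.linalg.inv(identity - direct) - identity
    return direct, total - direct, total

names = ["Physico-chemical", "Sensorial", "Hedonic"]
# Hypothetical coefficients: Physico-chemical -> Sensorial = 0.8,
# Sensorial -> Hedonic = 0.7, no direct Physico-chemical -> Hedonic path.
b = [[0.0, 0.0, 0.0],
     [0.8, 0.0, 0.0],
     [0.0, 0.7, 0.0]]

direct, indirect, total = decompose_effects(b)
print("indirect effect of", names[0], "on", names[2], "=", round(indirect[2, 0], 2))
```

Here the indirect effect of Physico-chemical on Hedonic is the product of the two coefficients along the path through Sensorial.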

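20 Covariance-based Structural Equation Modeling. Latent variables; structural model (inner model), in general form and in the form it takes here (equations lost in transcription).

21 Structural Equation Modeling. Measurement model (outer model), linking the MVs to the endogenous and exogenous LVs (equations lost in transcription).

22 Structural Equation Modeling. MV covariance matrix, expressed in terms of the outer model, the inner model, the covariance for exogenous LVs, the variance for structural residuals and the variance for measurement residuals (equation lost in transcription).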
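For reference, covariance-based SEM is commonly written in LISREL notation as below. This is the textbook formulation, offered as an assumption about the kind of equations these three slides displayed; the symbols are the conventional ones and may differ from those used in the talk. Here Φ is the covariance matrix of the exogenous LVs, Ψ that of the structural residuals, and Θ_ε, Θ_δ those of the measurement residuals.

```latex
\begin{align*}
\eta &= B\eta + \Gamma\xi + \zeta
  && \text{(structural / inner model)}\\
y &= \Lambda_y \eta + \varepsilon, \qquad x = \Lambda_x \xi + \delta
  && \text{(measurement / outer model)}\\
\Sigma &=
\begin{pmatrix}
\Lambda_y A (\Gamma\Phi\Gamma' + \Psi) A' \Lambda_y' + \Theta_\varepsilon
  & \Lambda_y A \Gamma \Phi \Lambda_x' \\
\Lambda_x \Phi \Gamma' A' \Lambda_y'
  & \Lambda_x \Phi \Lambda_x' + \Theta_\delta
\end{pmatrix},
\qquad A = (I - B)^{-1}
  && \text{(implied MV covariance matrix)}
\end{align*}
```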

23 Covariance-based SEM. ULS algorithm (Unweighted Least Squares), with S = observed covariance matrix for the MVs. Goodness-of-fit Index (Jöreskog & Sörbom). Generalization of PCA.
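The formulas on this slide are not preserved, so the sketch below relies on the standard definitions as an assumption: ULS minimizes the sum of squared differences between S and the model-implied matrix Σ (up to a constant factor), and the ULS version of the Jöreskog and Sörbom GFI is 1 − tr[(S − Σ)²] / tr(S²). The data are simulated and the function names are chosen for the example. It also illustrates the link with PCA: with a single factor and no residual variances, the best ULS fit is the leading principal component of S.

```python
import numpy as np

def uls_discrepancy(s, sigma):
    """Sum of squared residuals between the observed S and the implied Sigma."""
    resid = np.asarray(s, dtype=float) - np.asarray(sigma, dtype=float)
    return float(np.sum(resid ** 2))

def gfi_uls(s, sigma):
    """GFI for ULS: 1 - tr[(S - Sigma)^2] / tr(S^2), for symmetric matrices."""
    s = np.asarray(s, dtype=float)
    return 1.0 - uls_discrepancy(s, sigma) / float(np.sum(s ** 2))

# Simulated MVs (assumption): 4 variables sharing one common factor.
rng = np.random.default_rng(1)
data = rng.normal(size=(40, 1)) + 0.6 * rng.normal(size=(40, 4))
s = np.cov(data, rowvar=False)

# One-factor implied matrix with zero residual variances, built from the
# leading eigenpair of S (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(s)
loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
sigma_pca = np.outer(loadings, loadings)

gfi_pca = gfi_uls(s, sigma_pca)
# Any other rank-one candidate fits no better (Eckart-Young theorem).
other = rng.normal(size=4)
gfi_other = gfi_uls(s, np.outer(other, other))
print(f"GFI (PCA solution) = {gfi_pca:.3f}, GFI (arbitrary loadings) = {gfi_other:.3f}")
```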

24 Use of AMOS 6.0, Method = ULS. Roderick McDonald's first idea (1996): measurement residual variances are canceled. This is a computational trick: residual variances are passed to errors and can always be computed afterwards.

25 Covariance-based SEM. ULS algorithm with McDonald's constraints, with S = observed covariance matrix for the MVs. Goodness-of-fit Index (Jöreskog & Sörbom).

26 Use of AMOS, Method = ULS, measurement residual variances = 0

27 Results. GFI = .903. Outer LV estimates, McDonald's second idea: PLS estimate of LV with Mode A; LV inner estimate = theoretical LV; LV inner estimate computation is useless.

30 Model estimation by SEM-ULS: inner model and correlations. (Path diagram relating the three latent variables to their manifest variables. Physico-chemical: Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Vitamin C, Citric acid. Sensorial: Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness. Hedonic: Judge 2, Judge 3, Judge 96. Legible annotations: (P = .35); .64 (P = .05); (P = .01); > 0; the R² value is missing. Non-significant variables in red. Constrained weights in blue.)

31 Use of SEM-ULS: latent variable estimates (scores). (Table of latent variable scores. Columns: Physico-chemical, Sensorial, Hedonic. Rows: Fruivita refr, Tropicana refr, Tropicana r.t, Pampryl refr, Joker r.t, Pampryl r.t. Values not recoverable.)

32 Comparison between the PLS and SEM-ULS scores

33 Path analysis on scores with AMOS. Bootstrap validation.
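Bootstrap validation of a path analysis on scores can be sketched as follows. This is not the AMOS procedure: it is a plain case-resampling bootstrap of OLS path coefficients, run on simulated LV scores, and every name and number in it is an assumption made for illustration.

```python
import numpy as np

def path_coefficients(scores):
    """OLS coefficients of Hedonic (column 2) on Physico-chemical and Sensorial (columns 0, 1)."""
    design = np.column_stack([np.ones(len(scores)), scores[:, :2]])
    coef, *_ = np.linalg.lstsq(design, scores[:, 2], rcond=None)
    return coef[1:]  # drop the intercept

def bootstrap_paths(scores, n_boot=1000, seed=0):
    """Re-estimate the path coefficients on n_boot resamples of the cases."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    draws = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample cases with replacement
        draws[b] = path_coefficients(scores[idx])
    return draws

# Simulated LV scores (assumption): Hedonic depends on Sensorial only.
rng = np.random.default_rng(2)
physico = rng.normal(size=40)
sensorial = 0.8 * physico + 0.6 * rng.normal(size=40)
hedonic = 0.7 * sensorial + 0.3 * rng.normal(size=40)
scores = np.column_stack([physico, sensorial, hedonic])

estimate = path_coefficients(scores)
draws = bootstrap_paths(scores)
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
for name, est, lo, hi in zip(["Physico-chemical", "Sensorial"], estimate, lower, upper):
    print(f"{name} -> Hedonic: {est:.3f}  95% interval [{lo:.3f}, {hi:.3f}]")
```

A path is usually read as significant when its percentile interval excludes zero. With very few cases, such as the six juices of the orange juice example, many resamples contain too few distinct cases for the regression to be identified, so the intervals should be read with caution.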

34 Direct, indirect and total effects

35 Conclusion 1: SEM-ULS > PLS. When mode A is chosen, outer LV estimates using covariance-based SEM (ULS or ML) or component-based SEM (PLS) are always very close. It is possible to mimic PLS with covariance-based SEM software (McDonald, 1996; Tenenhaus, 2001). Covariance-based SEM allows constraints to be imposed on the model parameters. This is impossible with PLS.

36 Conclusion 2: PLS > SEM-ULS. When SEM-ULS does not converge or does not give an admissible solution, PLS is an attractive alternative. PLS offers many optimization criteria for the LV search (but rigorous proofs are still to be found). PLS still works when the number of MVs is very high and the number of cases very small (for example 38 MVs and 6 cases). PLS makes it much easier to use formative LVs than SEM-ULS does.

37 Second particular case: Multi-block data analysis

38 Sensory analysis of 21 Loire red wines (J. Pagès). 4 blocks of variables: X1 = Smell at rest, X2 = View, X3 = Smell after shaking, X4 = Tasting. Illustrative variables: 3 appellations, 4 soils.

39 PCA of each block: Correlation loadings

40 PCA of each block with AMOS: correlation loadings. GFI = .301

41 Multi-block data analysis = Confirmatory Factor Analysis. Blocks: Smell at rest, View, Smell after shaking, Tasting. GFI = .849

42 First dimension, using MVs with significant loadings

43 First global score: 2nd order CFA. GFI = .973

44 Validation of the first dimension. (Correlation matrix between Rest1, View, Shaking1, Tasting1 and Score1; values not recoverable.)

45 Second dimension

46 Second global score. GFI = .905

47 Validation of the second dimension. (Correlation matrix between Rest2, Shaking2, Tasting2 and Score2; values not recoverable.)

48 Mapping of the correlations with the global scores. Score 1 is related to quality; Score 2 is unrelated to quality.

49 Correlation with global quality. New result: not obtained with other multi-block data analysis methods, nor with factor analysis of the whole data set.

50 Wine visualization in the global score space. Wines marked by Appellation.

51 Wine visualization in the global score space. Wines marked by Soil.

52 DAM = Dampierre-sur-Loire

53 Cuvée Lisagathe 1995. A soft, warm, blackberry nose. A good core of fruit on the palate with quite well worked tannin and acidity on the finish; good length and a lot of potential. DECANTER (May 1997). DECANTER AWARD *****: Outstanding quality, a virtually perfect example.

54 Final conclusion « All the proofs of a pudding are in the eating, not in the cooking ». William Camden (1623)

