10.02.08 1 WSC-6 Critical levels in projection Alexey Pomerantsev Semenov Institute of Chemical Physics, Moscow.


1 Critical levels in projection. Alexey Pomerantsev, Semenov Institute of Chemical Physics, Moscow

2 Projection approach

3 Scores & Orthogonal Distances. OD: distance to the model. SD: distance within the model.

4 Where applied: SIMCA classification; PLS/PCR influence plot; MSPC

5 Giants battle at ICS-L, April 2007. Svante Wold: "The ratios of residual variances of PCA are fairly well F-distributed. This is easy - the shape of the distribution of a ratio of two variances usually looks like an F." Barry Wise: "No, the residuals from PCA don't follow an F-distribution unless you fuss with the degrees of freedom, and there are better alternatives in any case."

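6 Full PCA Decomposition. $K = \mathrm{rank}(X) \le \min(I, J)$; $X = T P^t$, with $X$ of size $I \times J$, $T$ of size $I \times K$ and $P^t$ of size $K \times J$; $\Lambda = T^t T = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$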

7 Truncated PCA Decomposition, $A \le K$. $X = T_A P_A^t + E_A$, with $X$ and $E_A$ of size $I \times J$, $T_A$ of size $I \times A$ and $P_A^t$ of size $A \times J$
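
The full and truncated decompositions on slides 6 and 7 can be sketched with a singular value decomposition. This is only an illustration: it assumes NumPy and a column-centered data matrix, and the random data, the sizes and the variable names are not from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, A = 20, 8, 3                      # illustrative sizes, not from the slides
X = rng.normal(size=(I, J))
X = X - X.mean(axis=0)                  # column centering (an assumption)

# Full PCA via SVD: X = U S V^t, so T = U S and P = V
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U * s                               # scores, I x K
P = Vt.T                                # loadings, J x K
lam = s ** 2                            # diagonal of T^t T

# Truncated model with A components
T_A, P_A = T[:, :A], P[:, :A]
E_A = X - T_A @ P_A.T                   # residuals, I x J

print(np.allclose(T.T @ T, np.diag(lam)))        # T^t T is diagonal
print(np.allclose(E_A @ P_A, 0.0, atol=1e-10))   # residuals are orthogonal to the model
```

The second check shows why the residual part is called "orthogonal": the rows of E_A have no component along the retained loadings.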

8 Score distance (SD), $h_i$. Leverage $= h_i + 1/I$; Mahalanobis distance $= (h_i)^{1/2}$

9 Orthogonal distance (OD), $v_i$. Variance per sample $= v_i/J$; Q statistic $= v_i$
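
The defining formulas for the two distances were on the slide images and did not survive in the transcript. The sketch below assumes the common definitions that agree with the identities quoted on slides 8 and 9: the score distance is the sum of squared scores scaled by the eigenvalues, and the orthogonal distance is the sum of squared residuals. NumPy, the centering step, the function name `sd_od` and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def sd_od(X, A):
    """Return (h, v): score and orthogonal distances for an A-component PCA model."""
    Xc = X - X.mean(axis=0)                    # column centering (an assumption)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T_A = U[:, :A] * s[:A]                     # scores
    E_A = Xc - T_A @ Vt[:A]                    # residuals
    h = np.sum(T_A ** 2 / s[:A] ** 2, axis=1)  # h_i = sum_a t_ia^2 / lambda_a
    v = np.sum(E_A ** 2, axis=1)               # v_i = sum_j e_ij^2
    return h, v

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 25))                 # synthetic data, illustrative sizes
h, v = sd_od(X, A=5)
print(h.sum(), v.mean())
```

With these definitions the score distances of the calibration samples sum to A, which is a convenient sanity check on an implementation.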

10 Distribution of distances: the shape? $x = h/h_0$ or $x = v/v_0$; $x \sim \chi^2(N)/N$, where $N$ = DoF; $\mathrm{E}(x) = 1$, $\mathrm{D}(x) = 2/N$
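
The two moments on slide 10 follow directly from the mean and variance of the chi-squared distribution. A short worked step, writing $u$ for either distance and $u_0$ for its scaling factor (names chosen here for illustration):

```latex
\mathrm{E}\,\chi^2(N) = N, \qquad \mathrm{D}\,\chi^2(N) = 2N
\;\Longrightarrow\;
\mathrm{E}(x) = \frac{N}{N} = 1, \qquad
\mathrm{D}(x) = \frac{2N}{N^2} = \frac{2}{N}.

x = \frac{u}{u_0}
\;\Longrightarrow\;
\mathrm{E}(u) = u_0, \qquad
\mathrm{D}(u) = \frac{2 u_0^2}{N}, \qquad
N = \frac{2 u_0^2}{\mathrm{D}(u)}.
```

The last relation is what a moment-based estimate of the DoF rests on.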

11 Example: Leon Rusinov data. $I = 1440$, $A = 6$, $N_h = 5$, $N_v = 1$. [Figure: SD and OD panels]

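12 Distribution of distances: DoF? For $x = h/h_0$ or $x = v/v_0$, the sample $x_1, \ldots, x_I \sim \chi^2(N)/N$ with $N$ = ? Two estimators: the method of moments, and the interquartile approach, which orders the sample, $x_{(1)} \le x_{(2)} \le \ldots \le x_{(I-1)} \le x_{(I)}$, and uses the interquartile range (IQR) that remains after cutting ¼ of the ordered values from each end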
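
The two estimators named on slide 12 can be sketched as follows. The transcript does not give their formulas, so everything here is an assumption: the moment estimator uses N = 2·mean²/variance, which follows from E(x) = 1 and D(x) = 2/N, and the robust estimator matches the sample IQR-to-median ratio to that of a chi-squared variable. To stay within the Python standard library, chi-squared quantiles are replaced by the Wilson-Hilferty approximation, which is a swapped-in shortcut rather than the author's procedure. All function names are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def chi2_quantile_wh(p, n):
    """Wilson-Hilferty approximation to the p-quantile of chi2(n)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * n)
    return n * (1.0 - c + z * c ** 0.5) ** 3

def dof_moments(u):
    """Method of moments: u0 = mean, N = 2 * mean^2 / variance."""
    u0 = statistics.fmean(u)
    return u0, 2.0 * u0 ** 2 / statistics.variance(u)

def dof_iqr(u, n_lo=0.5, n_hi=500.0):
    """Robust sketch: match the sample IQR/median ratio to the chi2(N) ratio."""
    q1, med, q3 = statistics.quantiles(u, n=4)
    target = (q3 - q1) / med

    def ratio(n):
        iqr = chi2_quantile_wh(0.75, n) - chi2_quantile_wh(0.25, n)
        return iqr / chi2_quantile_wh(0.5, n)

    for _ in range(100):                      # bisection: ratio(n) decreases as n grows
        mid = 0.5 * (n_lo + n_hi)
        if ratio(mid) > target:
            n_lo = mid
        else:
            n_hi = mid
    n = 0.5 * (n_lo + n_hi)
    u0 = med * n / chi2_quantile_wh(0.5, n)   # rescale so that the medians agree
    return u0, n

# Synthetic distances u = u0 * chi2(N) / N with known N and u0 (illustrative values)
random.seed(0)
N_true, u0_true = 5.0, 2.0
u = [u0_true * random.gammavariate(N_true / 2.0, 2.0) / N_true for _ in range(5000)]

u0_m, n_m = dof_moments(u)
u0_r, n_r = dof_iqr(u)
print("moments:", u0_m, n_m)
print("IQR    :", u0_r, n_r)
```

The moment estimator uses every value, so a single gross outlier inflates the variance and pulls N down; the quartile-based estimator ignores the tails, at the price of a noisier estimate on clean data.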

13 Type I error, $I = 100$. At type I error 0.01, 1 point is out; at 0.05, 5 points are out; at 0.1, 11 points are out; at 0.2, 22 points are out; at 0.4, 43 points are out
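
The counts on slide 13 can be compared with what a given type I error predicts. A minimal sketch, assuming the samples behave as independent draws, so that the number of points outside the acceptance area is binomial; the variable names are chosen here, and the counts are those quoted on the slide.

```python
I = 100
observed = {0.01: 1, 0.05: 5, 0.1: 11, 0.2: 22, 0.4: 43}   # counts quoted on the slide

for type1_error, n_out in observed.items():
    expected = type1_error * I                          # mean of Binomial(I, error)
    sd = (I * type1_error * (1 - type1_error)) ** 0.5   # its standard deviation
    print(f"error={type1_error}: expected {expected:.0f} +/- {sd:.1f}, observed {n_out}")
```

Every observed count lies well within one standard deviation of its expected value, which is what a correctly calibrated acceptance area should produce.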

14 SIM Data. MSPC task. $I = 100$, $J = 25$, $A = 5$, type I error = 0.05

15 SD & OD values

16 DoF Estimates. Interquartile approach: $N_h = 5.7$, $N_v = 21.6$. Method of moments: $N_h = 5.0$, $N_v = 20.0$

17 Acceptance areas: conventional. $I = 100$, type I error = 0.05

18 Acceptance areas: Sum of CHIs. $I = 100$, type I error = 0.05

19 Acceptance areas: Ratio of CHIs. $I = 100$, type I error = 0.05

20 Wilson-Hilferty approximation for Chi
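
The formula on slide 20 is not preserved in the transcript. The sketch below uses the standard Wilson-Hilferty result, in which the cube root of χ²(N)/N is approximately normal with mean 1 − 2/(9N) and variance 2/(9N). The function names are hypothetical, and how the presentation turns this transform into an acceptance area is not shown here.

```python
from statistics import NormalDist

def wh_quantile(p, n):
    """Approximate p-quantile of chi2(n)/n via the Wilson-Hilferty cube-root transform."""
    c = 2.0 / (9.0 * n)
    z = NormalDist().inv_cdf(p)
    return (1.0 - c + z * c ** 0.5) ** 3

def wh_zscore(x, n):
    """Map x ~ chi2(n)/n to an approximately standard normal value."""
    c = 2.0 / (9.0 * n)
    return (x ** (1.0 / 3.0) - (1.0 - c)) / c ** 0.5

n = 10
x95 = wh_quantile(0.95, n)
print(n * x95)   # compare with the tabulated chi2 0.95-quantile for 10 DoF, about 18.31
```

The transform is useful because it lets both distances be treated as approximately normal variables, whatever their DoF.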

21 Acceptance areas: Wilson-Hilferty. $I = 100$, type I error = 0.05

22 Modified Wilson-Hilferty approximation. $1 - \gamma = P_0 + P_1 + P_2 + P_3 = \Phi(r) - \tfrac{1}{4}\exp(-\tfrac{1}{2} r^2)$, $r = r(\gamma)$
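
The relation on slide 22 defines r only implicitly. One way to obtain r(γ) is a numerical root search; the sketch below uses bisection with the Python standard library. The function names and the search interval are assumptions, and the code only solves the equation as quoted, without reconstructing how the terms P0 to P3 arise.

```python
from math import exp
from statistics import NormalDist

def coverage(r):
    """Right-hand side of the slide's equation: Phi(r) - 1/4 * exp(-r^2 / 2)."""
    return NormalDist().cdf(r) - 0.25 * exp(-0.5 * r * r)

def solve_r(gamma, lo=0.0, hi=10.0, tol=1e-12):
    """Find r >= 0 with coverage(r) = 1 - gamma by bisection (coverage is increasing)."""
    target = 1.0 - gamma
    if not coverage(lo) < target < coverage(hi):
        raise ValueError("gamma outside the range reachable for r in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r = solve_r(0.05)
print(r, coverage(r))
```

Since coverage(0) = 0.25, the equation has a solution only for γ below 0.75, which covers every level used in the presentation.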

23 Acceptance areas: modified Wilson-Hilferty. $I = 100$, type I error = 0.05

24 Areas Validation: variation of the type I error

25 BMT Data. SIMCA. $I = 45$, $J = 3501$, $A = 2$, $N_h = 3$, $N_v = 2$, type I error = 0.025

26 Extremes & Outliers in calibration set.  is the significance level for outliers;  = 1 – (1 – )^(1/I). Plot labels: extreme, outlier. Calibration set: $I = 45$; γ × I = 0.025 × 45 = 1.125; $I_{out} = 2$
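
The relation on slide 26 has the form a = 1 − (1 − g)^(1/I); the symbols themselves are lost in the transcript, so a and g are placeholder names used here. For I independent samples, a is the per-sample level at which the chance that at least one sample exceeds the limit equals g. A minimal sketch, with a hypothetical value for g:

```python
def per_sample_level(overall, n_samples):
    """Level a with 1 - (1 - a)**n_samples == overall (independent samples assumed)."""
    return 1.0 - (1.0 - overall) ** (1.0 / n_samples)

I = 45
overall = 0.05                       # hypothetical value, not given in the transcript
a = per_sample_level(overall, I)
print(a, 1.0 - (1.0 - a) ** I)       # the second value recovers `overall`

expected_extremes = 0.025 * I        # the slide's product for the calibration set
print(expected_extremes)
```

The per-sample level is always much smaller than the overall one, which is why a sample can be an extreme without being an outlier.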

27 SIMCA Classification without G07-4. New set: $I_{new} = 30$ (10 genuine + 20 fakes); γ × I_new = 0.025 × 10 = 0.25; $I_{out} = 3$

28 What's up? This is an absolutely wrong classification, but Oxana will explain how to fix it later.

29 GRAIN Data. Influence plots. $I = 123$, $J = 118$, $A = 4$, type I error = 0.01, $N_h = 5.7$, $N_v = 3.0$, $N_u = 1.0$. [Figure: X and Y panels]

30 Orthogonal distance to Y

31 Back to WSC-4

32 Boundary samples (WSC-4, 10.02.05). Training set; Model 1; boundary subset, l=19

33 Influence plots for X and Y. Calibration; Boundary (SIC)

34 Box or Egg? $I < 30$

35 Conclusion 1. The $\chi^2$-distribution can be used in the modeling of the score and orthogonal distances.

36 Conclusion 2. Any classification problem should be solved with respect to a given type I error. Five acceptance areas have been presented, but only two are recommended: one for $I > 30$ and one for $I < 30$.

37 Conclusion 3. Estimation of DoF is a key challenge in projection modeling. A data-driven estimator of DoF, rather than a theory-driven one, should be used. The method of moments is efficient but sensitive to outliers. The IQR estimator is a robust but less efficient alternative. More examples will be demonstrated in the subsequent presentation by Oxana.

