Pattern Recognition for the Natural Sciences Explorative Data Analysis Principal Component Analysis (PCA) Lutgarde Buydens, IMM, Analytical Chemistry.

1 Pattern Recognition for the Natural Sciences Explorative Data Analysis Principal Component Analysis (PCA) Lutgarde Buydens, IMM, Analytical Chemistry

2 Why Explorative Data Analysis? Paradigm change in natural sciences. Classical science: hypothesis driven.

3 Why Explorative Data Analysis? Paradigm change in natural sciences. Classical science: hypothesis driven. Science with advanced technologies: data driven, explorative analysis of data.

4 Explorative Data Analysis. Advanced technology: high-throughput (high-quality) analysis (NMR, HPLC, GC, MS/MS, immunoassays, hybrids); nano/sensor technology; genomics (gene expression profiling); proteomics, metabolomics; fingerprinting; profiling in drug design. Result: an overwhelming amount of data.

5 Explorative Data Analysis. Visualization (principal component analysis, projections); unsupervised pattern recognition (clustering); supervised pattern recognition (classification); quantitative analysis (correlations, predictions).

6 Principal Component Analysis: an Example. 150 samples of Italian wines from the same region, 3 different cultivars. Is it possible to characterise the cultivars? Which variables are relevant for which cultivars?

7 Data Matrix. $X$ has $n$ rows (objects: 150 wine samples) and $p$ columns (variables: 13 properties). Element $x_{ij}$ is the value of property $j$ for sample $i$; for example, $x_{75,7}$ is the flavanoid concentration of sample 75.

8 Principal Component Analysis. [Figure: bar plot of 1 wine sample]

9 Principal Component Analysis. [Figure: line plot of 1 wine sample; bar plot of 1 wine sample]

10 Principal Component Analysis. [Figure: line plot of 1 wine sample; bar plot of 1 wine sample]

11 Principal Component Analysis. [Figure: line plot of 1 wine sample; bar plot of 1 wine sample]

12 Data Matrix Representation. $X$ ($n \times p$) has elements $x_{ij}$: rows $x_i$, $i = 1, \dots, n$ (# samples), and columns $x_j$, $j = 1, \dots, p$ (# properties).

13 Data Matrix Representation. $X$ is $150 \times 13$. Each row $x_i$ (e.g. sample 75) is a point in the p (13)-dimensional variable space $S^p$, which therefore holds 150 samples.

14 Data Matrix Representation. $X$ is $150 \times 13$. Each row $x_i$ (e.g. sample 75) is a point in the p (13)-dimensional variable space $S^p$ (150 samples). Each column $x_j$ (e.g. property 7, flavanoids) is a point in the n (150)-dimensional object space $S^n$ (13 variables).

15 Explorative Data Analysis

16 Principal Component Analysis. PCA: visualization by projection in 2 dimensions. The 150 samples in the p (13)-dimensional space of variables $S^p$ are projected onto an r (2)-dimensional space $S^2$ with axes lv 1 and lv 2. Likewise, the 13 variables in the n (150)-dimensional space of objects $S^n$ are projected onto an r (2)-dimensional space $S^2$ with axes lv 1 and lv 2.

17 Principal Component Analysis. [Figure: 12 samples in the 3-variable space $S^3$, axes x1, x2, x3]

18 Principal Component Analysis. [Figure: 12 samples in the 3-variable space $S^3$, axes x1, x2, x3]

19 Principal Component Analysis. $\mathrm{PC}_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$. [Figure: 12 samples in $S^3$ (axes x1, x2, x3) with PC 1]

20 Principal Component Analysis. $\mathrm{PC}_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$. Criterion: maximum variance of the projections (marked x). [Figure: 12 samples in $S^3$ (axes x1, x2, x3) projected onto PC 1]

21 Principal Component Analysis. $\mathrm{PC}_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$; $\mathrm{PC}_2 = l_{21} x_1 + l_{22} x_2 + l_{23} x_3$. Criterion: maximum variance of the projections (marked x). [Figure: 12 samples in $S^3$ (axes x1, x2, x3) with PC 1 and PC 2]
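
The maximum-variance criterion can be illustrated with a short NumPy sketch. It assumes the usual formulation in which the loading vectors are the eigenvectors of the covariance matrix of the mean-centered data; the 12 x 3 data set is synthetic, and the names `principal_axes` and `mixing` are hypothetical, not taken from the slides.

```python
import numpy as np

def principal_axes(X):
    """Return loading vectors (one per row) and projection variances, largest first."""
    Xc = X - X.mean(axis=0)                  # mean-center each variable
    cov = np.cov(Xc, rowvar=False)           # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    return eigvecs[:, order].T, eigvals[order]

# Synthetic stand-in for "12 samples, 3 variables" (assumption, not slide data).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 1.0, 0.5],
                   [0.0, 1.0, 0.2],
                   [0.0, 0.0, 0.3]])
X = rng.normal(size=(12, 3)) @ mixing

loadings, variances = principal_axes(X)
Xc = X - X.mean(axis=0)
pc1 = Xc @ loadings[0]   # PC1 = l11*x1 + l12*x2 + l13*x3
pc2 = Xc @ loadings[1]   # PC2 = l21*x1 + l22*x2 + l23*x3
```

Under these assumptions the variance of `pc1` equals the largest eigenvalue, and no other unit direction gives projections with a larger variance; `pc2` is the best direction among those orthogonal to the first.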

22 Principal Components Space. [Figure: 12 samples in $S^2$, axes PC 1 and PC 2]

23 Principal Component Analysis: Score plot. The 150 samples in the p (13)-dimensional space of variables $S^p$ are projected onto the r (2)-dimensional space $S^2$ with axes pc 1 and pc 2.

24 Principal Component Analysis: Score plot. The 150 samples in the p (13)-dimensional space of variables $S^p$ are projected onto the r (2)-dimensional space $S^2$ with axes pc 1 and pc 2. [Figure: wine data score plot, PC1 (38%) vs PC2 (20%)]

25 Principal Component Analysis: Loading plot. The 13 variables in the n (150)-dimensional space of objects $S^n$ are projected onto $S^2$ with axes pc 1 and pc 2.

26 Principal Component Analysis: Loading plot. The 13 variables in the n (150)-dimensional space of objects $S^n$ are projected onto $S^2$ with axes pc 1 and pc 2. [Figure: wine data loading plot, PC1 (38%) vs PC2 (20%)]

27 Singular Value Decomposition (SVD). $X_{n \times p} = U_{n \times r} D_{r \times r} V^T_{r \times p}$, with $U^T U = V^T V = I$. Left singular vectors (columns of $U$): PC scores. Right singular vectors (columns of $V$): PC loadings.
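
As a rough illustration of the decomposition, the sketch below runs `numpy.linalg.svd` on a mean-centered matrix with the dimensions of the wine example. The data are random placeholders, not the wine measurements, and mean-centering is an assumption. The scores are taken here as $UD$, a common convention; the slide attaches the label "PC scores" to the left singular vectors $U$ themselves, which differ from $UD$ only by the scaling in $D$.

```python
import numpy as np

# Random placeholder with the shape of the wine example: n = 150, p = 13.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 13))
Xc = X - X.mean(axis=0)                     # assumption: column mean-centering

# Thin SVD: U is n x p, d holds the singular values, Vt is p x p.
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

r = 2                                       # keep the first r components
U_r = U[:, :r]                              # left singular vectors  (n x r)
D_r = np.diag(d[:r])                        # singular values        (r x r)
Vt_r = Vt[:r, :]                            # right singular vectors (r x p)

scores = U_r @ D_r                          # sample coordinates for a score plot
loadings = Vt_r.T                           # variable coordinates for a loading plot
X_hat = U_r @ D_r @ Vt_r                    # rank-r approximation of Xc
```

Plotting the two columns of `scores` gives the score plot, the two columns of `loadings` the loading plot, and overlaying both the biplot.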

28 Principal Component Analysis: Biplot. The score plot (150 samples, projected from $S^p$, p = 13) and the loading plot (13 variables, projected from $S^n$, n = 150) share the axes pc 1 and pc 2 of $S^2$. The biplot overlays both: 150 samples + 13 variables.

29 Principal Component Analysis: an Example. [Figure: wine data, axes PC1 (38%) and PC2 (20%)]

30 Principal Component Analysis: Some Issues. How many PCs? Scaling. Outliers.

31 How many PCs? [Figure: cumulative % of variance (up to 100%) versus number of PCs; scree plot of log variance versus number of PCs]

32 How many PCs? Wine data

33 How many PCs?
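
The quantities behind the cumulative-variance and scree plots can be computed from the singular values. The sketch below is one possible way to do this; the 90% threshold, the synthetic data, and the function name `explained_variance` are illustrative assumptions, not values from the slides.

```python
import numpy as np

def explained_variance(X):
    """Per-PC variance, % of total variance, and cumulative % (largest first)."""
    Xc = X - X.mean(axis=0)                       # column mean-centering
    d = np.linalg.svd(Xc, compute_uv=False)       # singular values, descending
    var = d**2 / (X.shape[0] - 1)                 # variance captured by each PC
    pct = 100.0 * var / var.sum()
    return var, pct, np.cumsum(pct)

# Synthetic data: 13 variables with decreasing spread (assumption).
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 13)) * np.linspace(5.0, 0.5, 13)

var, pct, cum_pct = explained_variance(X)
log_var = np.log10(var)                           # y-axis of a log-variance scree plot

# Smallest number of PCs whose cumulative variance reaches 90% (arbitrary cut-off).
n_pcs = int(np.searchsorted(cum_pct, 90.0) + 1)
```

A fixed cut-off is only one heuristic; looking for the "elbow" in the scree plot is another, and the two need not agree.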

34 PCA: Scaling. For better interpretation; may obscure results. Options: raw data; mean-centering (column-wise, row-wise, double); auto-scaling (column-wise, row-wise); ...

35 PCA: Scaling. [Figure: wine data mean-centered; wine data autoscaled]

36 PCA: Scaling. [Figure: wine data raw; wine data mean-centered; axes PC1 (99.79%) and PC2 (0.20%) in both panels]
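
The scaling options listed above can be sketched in a few lines of NumPy. The helper names (`mean_center`, `double_center`, `autoscale`, `pc1_share`) are hypothetical, and the data are synthetic: one variable is deliberately put on a much larger scale to mimic the situation where a single variable dominates PC1.

```python
import numpy as np

def mean_center(X, axis=0):
    """Subtract the mean along `axis` (0: column-wise, 1: row-wise)."""
    return X - X.mean(axis=axis, keepdims=True)

def double_center(X):
    """Remove both row and column means (double centering)."""
    return (X - X.mean(axis=0, keepdims=True)
              - X.mean(axis=1, keepdims=True) + X.mean())

def autoscale(X, axis=0):
    """Mean-center and divide by the standard deviation along `axis`."""
    return mean_center(X, axis) / X.std(axis=axis, ddof=1, keepdims=True)

def pc1_share(Xs):
    """Percentage of the total sum of squares captured by the first PC."""
    d = np.linalg.svd(Xs, compute_uv=False)
    return 100.0 * d[0]**2 / np.sum(d**2)

# Synthetic data: 13 variables, the last one on a ~1000x larger scale (assumption).
rng = np.random.default_rng(4)
X = rng.normal(size=(150, 13))
X[:, -1] *= 1000.0

share_centered = pc1_share(mean_center(X))   # dominated by the large-scale variable
share_auto = pc1_share(autoscale(X))         # all variables weighted equally
```

With mean-centering alone, PC1 mostly reproduces the large-scale variable; after autoscaling each variable contributes unit variance, so the share of PC1 drops sharply.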

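37 PCA: Outliers. [Figure: 12 samples in the 3-variable space $S^3$ (axes x1, x2, x3) with PC1]

38 PCA: Outliers. [Figure: 12 samples + 1 outlier in the 3-variable space $S^3$ (axes x1, x2, x3) with PC1]

39 PCA: Outliers. Leverage effect. [Figure: 3-variable space $S^3$ (axes x1, x2, x3) with PC1]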
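
The leverage effect can be reproduced on synthetic data. In the sketch below, 12 samples lie close to a line in the x1-x2 plane and a single outlier is added far out along x3; all values and the helper name `first_loading` are made up for illustration.

```python
import numpy as np

def first_loading(X):
    """Loading vector of PC1 for column mean-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# 12 synthetic samples spread along the direction (1, 1, 0), plus small noise.
rng = np.random.default_rng(3)
t = np.linspace(-1.0, 1.0, 12)
X = np.outer(t, [1.0, 1.0, 0.0]) + 0.05 * rng.normal(size=(12, 3))

# One outlier far away along x3 (assumption chosen to make the effect obvious).
outlier = np.array([[0.0, 0.0, 10.0]])
X_out = np.vstack([X, outlier])

v_clean = first_loading(X)
v_out = first_loading(X_out)

# Angle between the two PC1 directions (the sign of a loading vector is arbitrary).
cos = abs(v_clean @ v_out)
angle_deg = np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))
```

Because PCA maximizes variance, the single distant point pulls PC1 almost entirely toward itself, so `angle_deg` comes out close to 90 degrees here.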

40 Principal Component Analysis: a Recent Research Example. Data matrix $X$ of gene expression values $x_{ij}$: 50,000 genes (rows) by 4 treatments (columns). Organon, Department of Cell Biology.

41 PCA Interaction Gene Treatment

