Published by Sierra Verrier. Modified about 1 year ago.

1
Pattern Recognition for the Natural Sciences
Explorative Data Analysis: Principal Component Analysis (PCA)
Lutgarde Buydens, IMM, Analytical Chemistry

2
Why Explorative Data Analysis?
Paradigm change in natural sciences.
Classical science: hypothesis driven.
[Diagram: a question (?) directed at a system]

3
Why Explorative Data Analysis?
Paradigm change in natural sciences.
Classical science: hypothesis driven.
Science with advanced technologies: data driven, explorative analysis of data.
[Diagram: a question (?) directed at a system, shown for both approaches]

4
Explorative Data Analysis
Advanced technology:
High-throughput (high-quality) analysis: NMR, HPLC, GC, MS/MS, immunoassays, hybrids
Nano/sensor technology
Genomics (gene expression profiling)
Proteomics, metabolomics
Fingerprinting
Profiling in drug design
Result: an overwhelming amount of data

5
Explorative Data Analysis
Visualization (principal component analysis, projections)
Unsupervised pattern recognition (clustering)
Supervised pattern recognition (classification)
Quantitative analysis (correlations, predictions)

6
Principal Component Analysis: an Example
150 samples of Italian wines from the same region
3 different cultivars
Is it possible to characterise the cultivars?
Which variables are relevant for which cultivars?

7
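Data Matrix
$X$ is an $n \times p$ matrix: $n$ = 150 wine samples (objects) as rows, $p$ = 13 properties (variables) as columns.
Element $x_{ij}$ is the value of variable $j$ for sample $i$, e.g. the flavanoid concentration of sample 75.
Row $x_i$ holds all properties of one sample; column $x_j$ holds one property (e.g. flavanoid concentration) for all samples.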

8
Principal Component Analysis
[Figure: bar plot of one wine sample]

9
Principal Component Analysis
[Figures: line plot of one wine sample; bar plot of one wine sample]

12
Data Matrix Representation
$X$ ($n \times p$) with elements $x_{ij}$: rows $x_i$, $i = 1, \dots, n$ ($n$ = number of samples); columns $x_j$, $j = 1, \dots, p$ ($p$ = number of properties).

13
Data Matrix Representation
Each row $x_i$ of $X$ (e.g. sample 75) is one of 150 sample points in the $p$ (13)-dimensional variable space $S^p$.

14
Data Matrix Representation
Each row $x_i$ of $X$ (e.g. sample 75) is one of 150 sample points in the $p$ (13)-dimensional variable space $S^p$.
Each column $x_j$ of $X$ (e.g. property 7, flavanoids) is one of 13 variable points in the $n$ (150)-dimensional object space $S^n$.
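The two views of the data matrix can be sketched in NumPy. This is only an illustration: the numbers are random placeholders, not the wine measurements, and the column position of the flavanoid variable is an assumption taken from the "property 7" label on the slide.

```python
import numpy as np

# Placeholder data: random numbers stand in for the real wine measurements.
rng = np.random.default_rng(0)
n, p = 150, 13                    # n objects (samples), p variables (properties)
X = rng.normal(size=(n, p))       # data matrix X with shape (n, p)

SAMPLE_75 = 74                    # sample 75 in the slide's 1-based numbering
FLAVANOIDS = 6                    # property 7 in 1-based numbering (assumed column order)

x_i = X[SAMPLE_75, :]             # row x_i: one sample, a point in the 13-dim variable space
x_j = X[:, FLAVANOIDS]            # column x_j: one variable, a point in the 150-dim object space
x_ij = X[SAMPLE_75, FLAVANOIDS]   # element x_ij: flavanoid value of sample 75
```

Reading rows gives the sample view (variable space); reading columns, or equivalently the rows of `X.T`, gives the variable view (object space).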

15
Explorative Data Analysis

16
Principal Component Analysis
PCA: visualization by projection in 2 dimensions.
The 150 samples, points in the $p$ (13)-dimensional space of variables $S^p$, are projected onto an $r$ (2)-dimensional space $S^2$ with axes lv 1 and lv 2.
The 13 variables, points in the $n$ (150)-dimensional space of objects $S^n$, are projected onto an $r$ (2)-dimensional space $S^2$ with axes lv 1 and lv 2.

17
Principal Component Analysis
[Figure: 12 samples plotted in the 3-variable space $S^3$ with axes x1, x2, x3]

19
Principal Component Analysis
$PC_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$
[Figure: PC 1 drawn through the 12 samples in $S^3$ (axes x1, x2, x3)]

20
Principal Component Analysis
$PC_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$
Criterion: maximum variance of the projections (x) of the samples onto PC 1.
[Figure: the 12 samples in $S^3$ (axes x1, x2, x3) and their projections (x) onto PC 1]

21
Principal Component Analysis
$PC_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$
$PC_2 = l_{21} x_1 + l_{22} x_2 + l_{23} x_3$
Criterion: maximum variance of the projections (x).
[Figure: the 12 samples in $S^3$ (axes x1, x2, x3) with PC 1, PC 2 and the projections (x)]
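The maximum-variance criterion can be checked numerically. The sketch below is a minimal illustration, not the method used to produce the slides: it assumes mean-centered data and obtains the loadings from an eigendecomposition of the covariance matrix, a standard route to PCA. The 12 x 3 data set is a random placeholder and `projected_variance` is a helper introduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder data: 12 samples, 3 variables (x1, x2, x3), correlated on purpose.
latent = rng.normal(size=(12, 1))
X = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.3 * rng.normal(size=(12, 3))
Xc = X - X.mean(axis=0)                  # mean-center each variable

C = np.cov(Xc, rowvar=False)             # 3 x 3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

l1 = eigvecs[:, 0]                       # loadings (l11, l12, l13)
l2 = eigvecs[:, 1]                       # loadings (l21, l22, l23)
pc1 = Xc @ l1                            # PC1 = l11*x1 + l12*x2 + l13*x3 for each sample
pc2 = Xc @ l2                            # PC2 = l21*x1 + l22*x2 + l23*x3 for each sample

def projected_variance(direction):
    """Variance of the samples projected onto a unit vector along `direction`."""
    unit = direction / np.linalg.norm(direction)
    return np.var(Xc @ unit, ddof=1)
```

No other direction gives a larger `projected_variance` than `l1`, and `l2` is the best direction among those orthogonal to `l1`.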

22
Principal Components Space
[Figure: the 12 samples plotted in $S^2$ with axes PC 1 and PC 2]

23
Principal Component Analysis: Score plot
The samples, points $x_i$ in the $p$ (13)-dimensional space of variables $S^p$, are plotted in the $r$ (2)-dimensional space $S^2$ with axes pc 1 and pc 2.

24
Principal Component Analysis: Score plot
The samples, points $x_i$ in the $p$ (13)-dimensional space of variables $S^p$, are plotted in the $r$ (2)-dimensional space $S^2$ with axes pc 1 and pc 2.
[Figure: wine data score plot, axes PC1 (38%) and PC2 (20%)]

25
Principal Component Analysis: Loading plot
The 13 variables, points $x_j$ in the $n$ (150)-dimensional space of objects $S^n$, are plotted in $S^2$ with axes pc 1 and pc 2.

26
Principal Component Analysis: Loading plot
The 13 variables, points $x_j$ in the $n$ (150)-dimensional space of objects $S^n$, are plotted in $S^2$ with axes pc 1 and pc 2.
[Figure: wine data loading plot, axes PC1 (38%) and PC2 (20%)]

27
Singular Value Decomposition (SVD)
$X_{n \times p} = U_{n \times r} D_{r \times r} V^{T}_{r \times p}$
$U^{T} U = V^{T} V = I$
Left singular vectors ($U$): PC scores
Right singular vectors ($V$): PC loadings
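A minimal NumPy sketch of this decomposition follows, under a few assumptions not stated on the slide: the data are random placeholders, they are mean-centered first, and the scores are taken as $UD$ (some texts use $U$ itself as the scores).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 150, 13
X = rng.normal(size=(n, p))              # placeholder for the wine data matrix
Xc = X - X.mean(axis=0)                  # mean-centering (an assumed preprocessing step)

# Thin SVD: U is (n, r), d holds the r singular values, Vt is (r, p), r = min(n, p).
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
D = np.diag(d)

scores = U @ D                           # PC scores, one row per sample
loadings = Vt.T                          # PC loadings, one row per variable
explained_pct = 100.0 * d**2 / np.sum(d**2)   # share of total variance per PC

scores_2d = scores[:, :2]                # coordinates for a 2-D score plot
loadings_2d = loadings[:, :2]            # coordinates for a 2-D loading plot
```

Because $V$ is orthonormal, the scores are also the projections `Xc @ loadings`, which links the SVD to the projection picture on the earlier slides.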

28
Principal Component Analysis: Biplot
Score plot: the 150 samples from $S^p$ (13) plotted in $S^2$ with axes pc 1 and pc 2.
Loading plot: the 13 variables from $S^n$ (150) plotted in $S^2$ with axes pc 1 and pc 2.
Biplot: 150 samples + 13 variables in a single plot with axes pc 1 and pc 2.

29
Principal Component Analysis: an Example
[Figure: wine data, axes PC1 (38%) and PC2 (20%)]

30
Principal Component Analysis: Some Issues
How many PCs?
Scaling
Outliers

31
How many PCs?
[Figure: cumulative % of variance (up to 100%) versus number of PCs]
[Figure: scree plot, log variance versus number of PCs]
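The two diagnostics can be computed as in the sketch below. The data are random placeholders built so that two directions dominate, and the 90% cut-off passed to the hypothetical helper `n_components_for` is an arbitrary illustration value, not a recommendation from the slides.

```python
import numpy as np

def variance_profile(X):
    """Return per-PC variances and the cumulative % of variance."""
    Xc = X - X.mean(axis=0)
    d = np.linalg.svd(Xc, compute_uv=False)        # singular values, descending
    variances = d**2 / (X.shape[0] - 1)            # variance captured by each PC
    cumulative_pct = 100.0 * np.cumsum(variances) / variances.sum()
    return variances, cumulative_pct

def n_components_for(cumulative_pct, threshold_pct):
    """Smallest number of PCs whose cumulative % of variance reaches the threshold."""
    idx = int(np.searchsorted(cumulative_pct, threshold_pct))
    return min(idx + 1, len(cumulative_pct))

rng = np.random.default_rng(3)
X = 0.1 * rng.normal(size=(150, 6))                # small noise in 6 variables
X[:, 0] += 10.0 * rng.normal(size=150)             # first dominant direction
X[:, 1] += 7.0 * rng.normal(size=150)              # second dominant direction

variances, cumulative_pct = variance_profile(X)
log_variances = np.log10(variances)                # y-values of a scree plot
k = n_components_for(cumulative_pct, threshold_pct=90.0)
```

Plotting `cumulative_pct` and `log_variances` against the component number gives the two figures; a sharp drop in the scree plot marks the point beyond which components mostly describe noise.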

32
How many PCs?
[Figure: wine data]

33
How many PCs?

34
PCA: Scaling
Scaling is applied for better interpretation, but it may obscure results.
Options:
Raw data
Mean-centering (column-wise, row-wise, double)
Auto-scaling (column-wise, row-wise)
...
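The scaling options can be written as small NumPy helpers. This is a sketch using the usual definitions (subtract the mean; for auto-scaling also divide by the standard deviation); the function names are introduced here and the data are random placeholders.

```python
import numpy as np

def mean_center(X, axis=0):
    """axis=0: column-wise centering (per variable); axis=1: row-wise (per sample)."""
    return X - X.mean(axis=axis, keepdims=True)

def double_center(X):
    """Remove column means and row means, then add back the grand mean."""
    return (X - X.mean(axis=0, keepdims=True)
              - X.mean(axis=1, keepdims=True)
              + X.mean())

def autoscale(X, axis=0):
    """Center, then divide by the standard deviation along the same axis."""
    return mean_center(X, axis) / X.std(axis=axis, ddof=1, keepdims=True)

rng = np.random.default_rng(4)
# Placeholder data: 20 samples, 3 variables on very different scales.
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(20, 3))

X_col_centered = mean_center(X, axis=0)
X_row_centered = mean_center(X, axis=1)
X_double = double_center(X)
X_auto = autoscale(X, axis=0)
```

Column-wise auto-scaling gives every variable unit variance, so no variable dominates the PCA merely because of its measurement units.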

35
PCA: Scaling
[Figures: wine data mean-centered; wine data autoscaled]

36
PCA: Scaling
[Figures: wine data raw, axes PC1 (99.79%) and PC2 (0.20%); wine data mean-centered, axes PC1 (99.79%) and PC2 (0.20%)]
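A PC1 share close to 100% typically appears when one variable has a much larger numerical range than the others. The sketch below reproduces that effect on placeholder data, not on the wine data: three independent variables, one of them on a scale 1000 times larger. The helper `explained_pct` is introduced here for illustration.

```python
import numpy as np

def explained_pct(Z):
    """Percentage of total sum of squares captured by each PC of Z."""
    d = np.linalg.svd(Z, compute_uv=False)
    return 100.0 * d**2 / np.sum(d**2)

rng = np.random.default_rng(6)
# Placeholder data: 150 samples, 3 independent variables, the first on a far larger scale.
X = rng.normal(size=(150, 3)) * np.array([1000.0, 1.0, 1.0])

centered = X - X.mean(axis=0)                      # mean-centering only
autoscaled = centered / X.std(axis=0, ddof=1)      # mean-centering plus unit variance

pct_centered = explained_pct(centered)
pct_autoscaled = explained_pct(autoscaled)
```

With mean-centering alone PC1 is essentially the large-scale variable; after auto-scaling the variance is spread over the components, so the score and loading plots reflect all variables.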

37
PCA: Outliers
[Figure: 12 samples in the 3-variable space $S^3$ (axes x1, x2, x3) with PC1]

38
PCA: Outliers
[Figure: the samples in the 3-variable space (axes x1, x2, x3) with an outlier and PC1]

39
PCA: Outliers
Leverage effect
[Figure: the samples in the 3-variable space $S^3$ (axes x1, x2, x3) with PC1]
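The leverage effect can be illustrated numerically: one distant sample is enough to pull PC1 towards itself. The sketch uses placeholder data (12 samples spread along the x1 = x2 direction plus one assumed outlier far out on x3). The leverage measure shown, the squared first left singular vector, is one common choice and is an assumption here, not necessarily the definition behind the slide.

```python
import numpy as np

def first_pc(X):
    """Loading vector of PC1 for mean-centered X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def angle_deg(a, b):
    """Angle between two directions in degrees, ignoring sign."""
    cosine = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cosine, 0.0, 1.0))))

rng = np.random.default_rng(5)
t = rng.normal(size=(12, 1))
X = t @ np.array([[1.0, 1.0, 0.0]]) + 0.1 * rng.normal(size=(12, 3))
outlier = np.array([[0.0, 0.0, 25.0]])             # hypothetical outlier far along x3
X_out = np.vstack([X, outlier])

pc1_clean = first_pc(X)                            # PC1 without the outlier
pc1_out = first_pc(X_out)                          # PC1 with the outlier
rotation = angle_deg(pc1_clean, pc1_out)           # how far PC1 has turned

# Leverage on PC1: each sample's share of the PC1 sum of squares.
Xc_out = X_out - X_out.mean(axis=0)
U, d, Vt = np.linalg.svd(Xc_out, full_matrices=False)
leverage_pc1 = U[:, 0] ** 2                        # sums to 1 over all samples
```

A sample whose leverage is far above the average of 1/n deserves inspection before the score and loading plots are interpreted.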

40
Principal Component Analysis: a Recent Research Example
Data matrix $X$ of gene expression values $x_{ij}$: genes by 4 treatments.
Organon, Department of Cell Biology

41
PCA Interaction Gene Treatment
