1
**Principal Component Analysis**

Olympic Heptathlon Ch. 13

2
**Principal components**

The principal components method summarizes data by finding the linear combinations of the observed variables that capture the major correlations among them.

Usually little information is lost in the process.

Major application: correlated variables are transformed into uncorrelated variables.
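The decorrelation idea can be checked on a small example. This is only a sketch on made-up toy data (the variables `x1` and `x2` are hypothetical, not part of the heptathlon data): two correlated columns go in, and the component scores that come out of `prcomp()` are uncorrelated.

```r
# Toy data, made up for illustration: x2 is built to correlate with x1.
set.seed(1)
x1 <- rnorm(50)
x2 <- 0.8 * x1 + rnorm(50, sd = 0.3)
toy <- data.frame(x1 = x1, x2 = x2)

# Correlation between the original variables (far from zero).
cor_raw <- cor(toy)

# Principal components on the standardized variables.
toy_pca <- prcomp(toy, scale. = TRUE)

# Correlation between the component scores (zero up to rounding error).
cor_pc <- cor(toy_pca$x)

print(round(cor_raw, 2))
print(round(cor_pc, 2))
```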

3
**Olympic Heptathlon Data**

7 events: hurdles, highjump, shot, run200m, longjump, javelin, run800m.

The scores for these events are all on different scales.

A relatively high number could be good or bad depending on the event.

25 Olympic competitors.

4
**R Commands: reorder the scores so that a high number means a good score**

`heptathlon$hurdles <- max(heptathlon$hurdles) - heptathlon$hurdles`

Hurdles, run200m, and run800m require reordering, because for these timed events a lower number is better.
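The same recoding can be applied to all three timed events in one loop. The sketch below uses a tiny made-up data frame (`toy`, with hypothetical times in seconds) so that it runs without the real heptathlon data; the column names are copied from the slides.

```r
# Hypothetical mini data set: times in seconds, so lower is better.
toy <- data.frame(
  hurdles = c(12.7, 13.5, 14.1),
  run200m = c(22.6, 24.0, 25.2),
  run800m = c(128.5, 135.0, 150.3)
)

# Subtract each time from the column maximum so that larger means better.
for (event in c("hurdles", "run200m", "run800m")) {
  toy[[event]] <- max(toy[[event]]) - toy[[event]]
}

print(toy)
```

After the recoding, the slowest competitor in each event has the value 0 and the fastest has the largest value.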

5
**Basic plot to look at data**

R Commands

`score <- which(colnames(heptathlon) == "score")`

`which` searches the column names of the heptathlon data.frame for "score" and stores the index of that column in the variable `score`.

`plot(heptathlon[, -score])`

Scatterplot matrix, excluding the score column.
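A negative index drops a column rather than selecting it. The following sketch shows the pattern on a made-up data frame (`toy` and its values are hypothetical), assuming only base R.

```r
# Hypothetical data frame with three event columns and a total score.
toy <- data.frame(
  hurdles = c(1.4, 0.6, 0.0),
  shot    = c(15.8, 13.2, 12.1),
  javelin = c(45.7, 42.6, 38.0),
  score   = c(7291, 6100, 5500)
)

# Position of the column called "score".
score <- which(colnames(toy) == "score")

# A negative index keeps every column except that one.
events_only <- toy[, -score]
print(colnames(events_only))

# plot() on a data frame with more than two columns draws a scatterplot matrix.
plot(events_only)
```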

7
**Interpretation**

The data look correlated except for the javelin event.

The book speculates that the javelin is a ‘technical’ event, whereas the others are all ‘power’ events.

8
**To get numerical correlation values:**

`round(cor(heptathlon[, -score]), 2)`

The `cor()` function computes the pairwise correlations between the columns of a data.frame, and `round(..., 2)` rounds them to two decimals.

The resulting values agree with this interpretation.

[Output: 7 x 7 correlation matrix with rows and columns hurdles, highjump, shot, run200m, longjump, javelin, run800m]
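To see what such a matrix looks like when one variable stands apart, here is a sketch on made-up data. The columns `sprint`, `jump`, and `throw` are hypothetical: the first two share a common factor, the third is generated independently.

```r
# Toy data, made up for illustration.
set.seed(7)
n <- 30
power <- rnorm(n)
toy <- data.frame(
  sprint = power + rnorm(n, sd = 0.4),
  jump   = power + rnorm(n, sd = 0.4),
  throw  = rnorm(n)                    # generated independently of the others
)

# Pairwise correlations, rounded to two decimals.
cor_mat <- round(cor(toy), 2)
print(cor_mat)
```

The matrix is symmetric with ones on the diagonal, so only the entries below (or above) the diagonal need to be read.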

9
**Running a Principal Component analysis**

`heptathlon_pca <- prcomp(heptathlon[, -score], scale = TRUE)`

`print(heptathlon_pca)`

[Output: standard deviations of the seven components, followed by the rotation matrix with columns PC1 to PC7 and rows hurdles, highjump, shot, run200m, longjump, javelin, run800m]

10
**a1 <- heptathlon_pca$rotation[, 1]**

This shows the coefficients for the first principal component, Y1.

Y1 is the linear combination of the variables that has the largest sample variance among all linear combinations whose coefficient vector has unit length.

Y2 is the linear combination with the largest share of the remaining sample variance, with the added constraint of being uncorrelated with Y1.
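These properties can be verified numerically. The sketch below uses made-up toy data (the variables `x1` to `x3` and the random direction `u` are hypothetical) and checks that the coefficient vectors have unit length and are orthogonal, that Y1 and Y2 are uncorrelated, and that no other unit-length combination has a larger variance than Y1.

```r
# Toy data, made up for illustration: x1 and x2 share a common factor.
set.seed(42)
n <- 40
base <- rnorm(n)
toy <- data.frame(
  x1 = base + rnorm(n, sd = 0.5),
  x2 = base + rnorm(n, sd = 0.5),
  x3 = rnorm(n)
)

toy_pca <- prcomp(toy, scale. = TRUE)
a1 <- toy_pca$rotation[, 1]   # coefficients of Y1
a2 <- toy_pca$rotation[, 2]   # coefficients of Y2

len_a1 <- sum(a1^2)           # 1: unit length
dot_a1_a2 <- sum(a1 * a2)     # 0: orthogonal coefficient vectors

# Component scores computed from the standardized data.
z <- scale(toy)
y1 <- drop(z %*% a1)
y2 <- drop(z %*% a2)

var_y1 <- var(y1)             # equals toy_pca$sdev[1]^2
cor_y1_y2 <- cor(y1, y2)      # 0: Y1 and Y2 are uncorrelated

# Any other unit-length combination has variance no larger than Y1.
u <- rnorm(3)
u <- u / sqrt(sum(u^2))
var_u <- var(drop(z %*% u))

print(c(len_a1 = len_a1, dot_a1_a2 = dot_a1_a2, cor_y1_y2 = cor_y1_y2))
print(c(var_y1 = var_y1, var_random_direction = var_u))
```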

11
**`a1 <- heptathlon_pca$rotation[, 1]`**

`a1`

[Output: one coefficient for each of hurdles, highjump, shot, run200m, longjump, javelin, run800m]

12
**Interpretation**

The 200 m run and the long jump receive the largest weights, so they are the most important factors.

The javelin result is less important.

13
**Data Analysis using the first principal component**

`center <- heptathlon_pca$center`

This is the vector of variable means that `prcomp()` subtracted from the data. `center` is also a logical argument of `prcomp()` that controls whether the variables are shifted to have mean zero.

`scale <- heptathlon_pca$scale`

`scale` is likewise an argument of `prcomp()`; when it is TRUE, the variables are scaled to unit variance. Here the component just stores the scaling (the standard deviations) that was applied.

`hm <- as.matrix(heptathlon[, -score])`

This coerces the data.frame heptathlon into a matrix and excludes score.

`drop(scale(hm, center = center, scale = scale) %*% heptathlon_pca$rotation[, 1])`

`scale()` standardizes the raw heptathlon data with the same center and scale that the principal component analysis used. `%*%` performs matrix multiplication with the coefficients of the linear combination for the first principal component (Y1). `drop()` removes the redundant dimension, so the one-column result is returned as a plain vector.

14
[Output: one first principal component score for each of the 25 competitors, listed in this order: Joyner-Kersee (USA), John (GDR), Behmer (GDR), Sablovskaite (URS), Choubenkova (URS), Schulz (GDR), Fleming (AUS), Greiner (USA), Lajbnerova (CZE), Bouraga (URS), Wijnsma (HOL), Dimitrova (BUL), Scheider (SWI), Braun (FRG), Ruotsalainen (FIN), Yuping (CHN), Hagger (GB), Brown (USA), Mulliner (GB), Hautenauve (BEL), Kytola (FIN), Geremias (BRA), Hui-Ing (TAI), Jeong-Mi (KOR), Launa (PNG)]

15
**An easier way**

`predict(heptathlon_pca)[, 1]`

This accomplishes the same thing as the previous set of commands.
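The equivalence of the two routes can be confirmed directly. This is a sketch on made-up toy data (`toy`, `x1` to `x3` are hypothetical); it repeats the manual steps from the slides and compares the result with `predict()`.

```r
# Toy data, made up for illustration.
set.seed(11)
n <- 25
base <- rnorm(n)
toy <- data.frame(
  x1 = base + rnorm(n, sd = 0.5),
  x2 = base + rnorm(n, sd = 0.5),
  x3 = rnorm(n)
)

toy_pca <- prcomp(toy, scale. = TRUE)

# The long way: standardize with the stored center and scale, then project
# onto the coefficients of the first principal component.
tm <- as.matrix(toy)
manual <- drop(
  scale(tm, center = toy_pca$center, scale = toy_pca$scale) %*%
    toy_pca$rotation[, 1]
)

# The short way.
shortcut <- predict(toy_pca)[, 1]

# TRUE when both routes give the same scores (up to rounding error).
same <- isTRUE(all.equal(unname(manual), unname(shortcut)))
print(same)
```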

16
**Principal Components Proportion of Sample Variances**

The first component accounts for by far the largest share of the total sample variance.

Just looking at the first two (uncorrelated!) principal components will account for most of the overall sample variance (~81%).

`plot(heptathlon_pca)`
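The proportions behind such a plot come from the squared standard deviations of the components. The sketch below computes them by hand on made-up toy data (`toy` is hypothetical) and compares them with the table that `summary()` reports.

```r
# Toy data, made up for illustration.
set.seed(5)
n <- 40
base <- rnorm(n)
toy <- data.frame(
  x1 = base + rnorm(n, sd = 0.5),
  x2 = base + rnorm(n, sd = 0.5),
  x3 = rnorm(n)
)

toy_pca <- prcomp(toy, scale. = TRUE)

# Variance of each component is its standard deviation squared.
variances <- toy_pca$sdev^2

# Share of the total variance per component, and the running total.
prop <- variances / sum(variances)
cum_prop <- cumsum(prop)
print(round(rbind(prop, cum_prop), 3))

# summary() reports the same quantities.
print(summary(toy_pca)$importance)
```

With `scale. = TRUE` every variable has unit variance, so the total variance equals the number of variables.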

17
**First two Principal Components**

`biplot(heptathlon_pca, col = c("gray", "black"))`

18
**Interpretation**

The Olympians with the highest scores seem to be at the bottom left of the graph, while the javelin event seems to add a finer variation to the scores and give some competitors a slight edge.

19
**How well does it fit the Scoring?**

The correlation between Y1 and the scoring looks very strong.

`cor(heptathlon$score, heptathlon_pca$x[, 1])`

[Output: a single correlation value]
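When reading this correlation, keep in mind that the sign of a principal component is arbitrary, so a strong relationship can show up as a value near -1 as well as near +1. The sketch below illustrates this on made-up toy data; the `events` columns and the `total` score (a simple sum of standardized events) are assumptions for illustration, not the official heptathlon scoring.

```r
# Toy data, made up for illustration: four events driven by one ability.
set.seed(3)
n <- 30
ability <- rnorm(n)
events <- data.frame(
  e1 = ability + rnorm(n, sd = 0.5),
  e2 = ability + rnorm(n, sd = 0.5),
  e3 = ability + rnorm(n, sd = 0.5),
  e4 = ability + rnorm(n, sd = 0.5)
)

# Hypothetical overall score: the sum of the standardized event results.
total <- rowSums(scale(events))

events_pca <- prcomp(events, scale. = TRUE)

# Correlation between the score and the first principal component.
r <- cor(total, events_pca$x[, 1])
print(r)

# The sign of a component is arbitrary, so judge the strength by abs(r).
print(abs(r))
```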

20
**Homework! (Ch. 13)**

Use the “meteo” data on page 225 and create scatterplots to check for correlation (don’t recode/reorder anything, and remember not to include columns in the analysis that don’t belong!).

Is there correlation? Don’t have R calculate the numerical values unless you really want to.

Run PCA using the long way or the shorter “predict” command (remember not to include the unnecessary column!).

Create a biplot, but use colors other than gray and black!

Create a scatterplot like the one on page 224 of the 1st principal component and the yield.

What is the numerical value of the correlation?

Don’t forget to copy and paste your commands into Word and print it out for me (and include the scatterplot)!
