Published by Lydia Leonards. Modified about 1 year ago.

1
Plotting Multivariate Data
Harry R. Erwin, PhD
School of Computing and Technology
University of Sunderland

2
Resources
Everitt, B S, and G Dunn (2001), Applied Multivariate Data Analysis, London: Arnold.
Everitt, B S (2005), An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.

3
Edward Tufte’s Recommendations
Show the data
Induce the viewer to think about the substance of the data
Avoid distorting what the data have to say
Present many numbers in a small space
Make large data sets coherent
Encourage comparison
Reveal the data at several levels of detail
Serve a clear purpose
Be closely integrated with the statistical and verbal descriptions of the data
–Tufte, E R (2001), The Visual Display of Quantitative Information, Graphics Press.

4
Tufte’s Points
Graphics reveal data.
Graphics can be more precise and revealing than conventional statistics.
Anscombe’s data
–Anscombe, F J (1973), “Graphs in Statistical Analysis”, American Statistician, 27:17-21.
All four data sets are described by the same linear model.
    I             II           III            IV
  x      y      x      y      x      y      x      y
 10   8.04     10   9.14     10   7.46      8   6.58
  8   6.95      8   8.14      8   6.77      8   5.76
 13   7.58     13   8.74     13  12.74      8   7.71
  9   8.81      9   8.77      9   7.11      8   8.84
 11   8.33     11   9.26     11   7.81      8   8.47
 14   9.96     14   8.10     14   8.84      8   7.04
  6   7.24      6   6.13      6   6.08      8   5.25
  4   4.26      4   3.10      4   5.39     19  12.50
 12  10.84     12   9.13     12   8.15      8   5.56
  7   4.82      7   7.26      7   6.42      8   7.91
  5   5.68      5   4.74      5   5.73      8   6.89
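The claim that all four data sets share one linear model can be checked directly. The sketch below assumes R's built-in `anscombe` data frame (columns x1 to x4 and y1 to y4) in place of typing the table in by hand; the names `fits`, `coefs` and `r` are illustrative only.

```r
# Fit y_i ~ x_i separately for each of Anscombe's four data sets,
# using the built-in `anscombe` data frame (an assumption: the slide
# presents the numbers as a table, not as an R object).
fits <- lapply(1:4, function(i) {
  f <- reformulate(paste0("x", i), response = paste0("y", i))
  lm(f, data = anscombe)
})

# One row per data set: intercept and slope of the fitted line.
coefs <- t(sapply(fits, coef))
dimnames(coefs) <- list(c("I", "II", "III", "IV"), c("intercept", "slope"))
print(round(coefs, 2))

# Correlation coefficient of each (x, y) pair.
r <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
print(round(r, 3))
```

Each row comes out close to an intercept of 3 and a slope of 0.5, and the correlations agree as well, even though the four scatterplots look nothing alike.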

5
The Anscombe Graphics
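The four panels can be redrawn in a few lines. This is a sketch assuming the built-in `anscombe` data frame; the axis limits, plotting symbol and panel titles are choices made here, not taken from the slide.

```r
# Draw Anscombe's four data sets in a 2 x 2 grid, each with its
# least-squares line, on common axes so the panels are comparable.
labels <- c("I", "II", "III", "IV")
slopes <- numeric(4)

op <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  slopes[i] <- coef(fit)[["x"]]          # keep the slope for comparison
  plot(x, y, pch = 19, xlim = c(3, 20), ylim = c(2, 14),
       main = paste("Data set", labels[i]))
  abline(fit, lwd = 2)                   # fitted line, nearly the same in every panel
}
par(op)  # restore the previous graphics settings
```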

6
Ways of Looking at Data
Scatterplots
–Demonstration
“The convex hull of bivariate data”
–Demonstration
Chiplot
–Demonstration
Bivariate Boxplot
–Demonstration

7
And More Multivariate Graphics
Bivariate Densities
–Demonstration
Other Variables in a Scatterplot
–Demonstration
Scatterplot Matrix
–Demonstration of pairs
3-D Plots
–Demonstration
Conditioning Plots
–Demonstration

8
Demonstration
Launch R
Set the working directory to Statistics/RSPCMA/Data
airpoll<-source("chap2airpoll.dat")$value
Review exercises on pages 19-22

9
Convex Hull of Bivariate Data
Scatterplots are often examined when calculating the correlation coefficient of two variables, because they help to detect outliers.
Convex hull trimming gives a robust estimate of the correlation coefficient: drop the points that lie on the convex hull of the data and recompute the correlation from the remaining points.
Demonstration
–attach(airpoll)
–cor(SO2, Mortality)

10
Robust Estimation of the Correlation
hull<-chull(SO2, Mortality) # indices of the points on the convex hull
plot(SO2, Mortality, pch=1)
polygon(SO2[hull], Mortality[hull], density=15, angle=30) # shade the hull
cor(SO2[-hull], Mortality[-hull]) # correlation with the hull points removed
The trimmed and untrimmed correlations are almost identical, which is unusual.
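Because `airpoll` is read from a local file, the same steps can be tried on simulated data. The sketch below is an illustration under stated assumptions: the data, the planted outlier and the names `r_all` and `r_trim` are invented here and are not part of the demonstration.

```r
# Simulated stand-in for SO2 / Mortality (hypothetical data).
set.seed(1)
x <- rnorm(50)
y <- 0.7 * x + rnorm(50, sd = 0.5)

# Plant one gross outlier far from the main cloud (point number 51).
x <- c(x, 4)
y <- c(y, -4)

hull   <- chull(x, y)                  # indices of the points on the convex hull
r_all  <- cor(x, y)                    # ordinary correlation, outlier included
r_trim <- cor(x[-hull], y[-hull])      # correlation after removing the hull points

plot(x, y, pch = 1)
polygon(x[hull], y[hull], density = 15, angle = 30)
print(c(all = r_all, trimmed = r_trim))
```

With a planted outlier the two estimates should differ noticeably, in contrast to the airpoll result, because the outlier lies on the hull and is removed by the trimming.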

11
Chiplot
A way of augmenting the scatterplot to spot dependence/independence.
See Statistics/RSPCMA/functions.txt
chiplot(SO2,Mortality,vlabs=c("SO2", "Mortality"))
For independent data, the points will be scattered in a horizontal band centered around 0.
Departure from independence here is shown by the points missing from the band (-0.25, 0.25).
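`chiplot` is defined in the course's functions file rather than in base R. For readers without that file, the sketch below computes chi-plot coordinates following the usual Fisher and Switzer construction. The function name `chi_plot_values`, the simulated data and the 1.78/sqrt(n) guide lines are assumptions made here; this is not the code in functions.txt and its output may differ in detail.

```r
# Chi-plot coordinates (lambda_i, chi_i) for paired data x, y.
# A sketch only, not the chiplot() function from functions.txt.
chi_plot_values <- function(x, y) {
  n <- length(x)
  Fi <- Gi <- Hi <- numeric(n)
  for (i in seq_len(n)) {
    # Empirical distribution functions evaluated at point i,
    # computed from the other n - 1 points.
    Fi[i] <- mean(x[-i] <= x[i])
    Gi[i] <- mean(y[-i] <= y[i])
    Hi[i] <- mean(x[-i] <= x[i] & y[-i] <= y[i])
  }
  Si <- sign((Fi - 0.5) * (Gi - 0.5))
  # chi_i: scaled difference between the joint and product distributions.
  chi <- (Hi - Fi * Gi) / sqrt(Fi * (1 - Fi) * Gi * (1 - Gi))
  # lambda_i: signed distance of point i from the centre of the data.
  lambda <- 4 * Si * pmax((Fi - 0.5)^2, (Gi - 0.5)^2)
  # Drop the most extreme points, where chi is unstable or undefined.
  keep <- is.finite(chi) & abs(lambda) < 4 * (1 / (n - 1) - 0.5)^2
  data.frame(lambda = lambda[keep], chi = chi[keep])
}

set.seed(2)
x <- rnorm(100)
y <- rnorm(100)                 # independent of x by construction
cp <- chi_plot_values(x, y)

plot(cp$lambda, cp$chi, xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "lambda", ylab = "chi")
abline(h = c(-1, 1) * 1.78 / sqrt(length(x)), lty = 2)  # approximate 95% band
```

For independent data such as this simulation, most points should fall inside the dashed band; strongly dependent data push the chi values away from 0.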

12
Bivariate Boxplot
Two-dimensional analogue of the boxplot
A pair of concentric ellipses: the inner ellipse (the “hinge”) holds half the data, and the outer ellipse (the “fence”) identifies outliers.
Regression lines of x on y and y on x are shown.
–bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality")
Cleaned up (more robust):
–bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality", method="O")

13
Bivariate Densities
The goal of examining a scatterplot is to identify clusters and outliers.
Humans are not particularly good at this, so graphical aids help.
Adding a bivariate density estimate to the scatterplot is one such aid.
Histograms are too rough for this purpose, though, so a smoother estimate is needed.

14
Demo of Bivariate Density
den1<-bivden(SO2,Mortality)
persp(den1$seqx, den1$seqy, den1$den, xlab="SO2", ylab="Mortality", zlab="Density", lwd=2)
plot(SO2, Mortality)
contour(den1$seqx, den1$seqy, den1$den, lwd=2, nlevels=20, add=TRUE)
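`bivden` comes from the book's function file. A comparable picture can be produced with `kde2d` from the MASS package, which is distributed with R. This swaps in a different estimator function and uses the built-in `faithful` data as a stand-in for `airpoll`, so it is a sketch of the idea rather than a reproduction of the slide.

```r
library(MASS)  # provides kde2d(), a two-dimensional kernel density estimate

# Stand-in data: eruption length and waiting time from the built-in
# `faithful` data set (an assumption; the slide uses SO2 and Mortality).
x <- faithful$eruptions
y <- faithful$waiting

den <- kde2d(x, y, n = 30)   # list with grid vectors x, y and density matrix z

# Perspective view of the estimated density surface.
persp(den$x, den$y, den$z, xlab = "eruptions", ylab = "waiting",
      zlab = "Density", theta = 30, phi = 30)

# Scatterplot with density contours overlaid.
plot(x, y, xlab = "eruptions", ylab = "waiting")
contour(den$x, den$y, den$z, nlevels = 10, lwd = 2, add = TRUE)
```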

15
Adding a Third Variable to the Scatterplot
The bubbleplot
plot(SO2, Mortality, pch=1, lwd=2, ylim=c(700,1200), xlim=c(-5,300)) # basic scatterplot
symbols(SO2, Mortality, circles=Rainfall, inches=0.4, add=TRUE, lwd=2) # adding Rainfall to each point
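One detail worth knowing about `symbols` is that `circles` supplies the radii, so a variable passed directly is encoded by radius rather than by area. The sketch below uses the built-in `trees` data as a stand-in for `airpoll` (an assumption) and passes the square root of the third variable so that circle area is proportional to it.

```r
# Bubble plot on the built-in `trees` data (stand-in for airpoll):
# Girth and Height on the axes, Volume shown by circle size.
radii <- sqrt(trees$Volume)   # square root, so circle AREA tracks Volume

plot(trees$Girth, trees$Height, pch = 1,
     xlab = "Girth", ylab = "Height")
symbols(trees$Girth, trees$Height, circles = radii,
        inches = 0.3, add = TRUE)   # `inches` sets the size of the largest symbol
```

Whether to encode by radius or by area is a design choice; area is usually the fairer visual comparison because radius exaggerates large values.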

16
Scatterplot Matrix
pairs(airpoll)
To add regression lines
–pairs(airpoll, panel=function(x,y) { abline(lsfit(x,y)$coef, lwd=2); lines(lowess(x,y), lty=2, lwd=2); points(x,y) })
For 3D graphics, use cloud (from the lattice package)
–library(lattice)
–cloud(Mortality~SO2+Rainfall)

17
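Conditioning Plots
Each panel shows Mortality against SO2 for one range of values of the conditioning variable Popden.
coplot(Mortality~SO2|Popden)
To add a local regression fit
coplot(Mortality~SO2|Popden, panel=function(x,y,col,pch) panel.smooth(x,y,span=1))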
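`coplot` splits the conditioning variable into overlapping intervals and draws one panel per interval. The sketch below shows those intervals with `co.intervals` and then draws a conditioning plot on the built-in `quakes` data, which stands in for `airpoll` here (an assumption); the name `iv` is illustrative.

```r
# Intervals of the conditioning variable used for the panels:
# 6 intervals, each sharing about half its points with its neighbour.
iv <- co.intervals(quakes$depth, number = 6, overlap = 0.5)
print(iv)   # one row per panel: lower and upper limit of depth

# Latitude against longitude, conditioned on depth,
# with a smooth curve added in each panel.
coplot(lat ~ long | depth, data = quakes,
       given.values = iv,
       panel = function(x, y, ...) panel.smooth(x, y, span = 1, ...))
```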

18
Conclusions
The purpose of graphics is to aid your intuition.
Explore them: the appropriate graphics reflect your questions and the structure of the data.
Next week: graphic presentations to avoid, because they mislead you and your audience.
Look at the books by Edward Tufte in the library.
