Data Visualization using R: How to get, manage, and present data to tell a compelling science story. William Gunn, Head of Academic Outreach, Mendeley.

1 Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley

2 1. A short history of graphical presentation of data 2. Introduction to R 3. Finding, cleaning, and presenting data 4. Reproducibility and data sharing

3 Data viz has a long history John Snow’s cholera map helped communicate the idea that cholera was a water-borne disease.

4 Florence Nightingale used dataviz

5 Modernization of dataviz

6 Chart junk: good, bad, and ugly Which presentation is better?

7

8 It can be elegant…

9

10 Tufte

11

12 How our eyes and brain perceive It takes 200 ms to initiate an eye movement, but the red dot can be found in 100 ms or less. This is due to pre-attentive processing.

13 Shape is a little slower than color!

14 Pre-attentive processing fails!

15 There are many “primitive” properties which we perceive: length, width, size, density, hue, color intensity, depth, 3-D orientation

16 Length

17 Width

18 Density

19 Hue

20 Color Intensity

21 Depth

22 3D orientation

23

24 Types of color schemes
Sequential – suited for ordered data that progress from low to high. Use light colors for low values and dark colors for higher values.
Diverging – uses hue to show the breakpoint and intensity to show divergent extremes.
Qualitative – uses different colors to represent different categories. Beware of using hue/saturation to highlight unimportant categories.

25 Sequential http://colorbrewer2.org/

26 Diverging

27 Qualitative
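The three scheme types can be sketched in base R with hcl.colors() (available in R 3.6 and later). This is only an illustration: the slides point to ColorBrewer, and the palette names "Blues", "Blue-Red", and "Set 2" are stand-ins chosen here, not the palettes shown on the slides.

```r
# Sequential: light for low values, dark for high.
# rev = TRUE puts the light end of the palette first.
seq_cols <- hcl.colors(5, palette = "Blues", rev = TRUE)

# Diverging: two hues that meet at a neutral midpoint.
div_cols <- hcl.colors(7, palette = "Blue-Red")

# Qualitative: distinct hues for unordered categories.
qual_cols <- hcl.colors(4, palette = "Set 2")

print(seq_cols)
print(div_cols)
print(qual_cols)
```

An odd number of colors for the diverging scheme keeps one class centered on the breakpoint.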

28 Tips for maps
Keep it to 5-7 data classes
~8% of men are red-green colorblind
Diverging schemes don’t do well when printed or photocopied
Colors will often render differently on different screens, especially low-end LCD screens
http://colorbrewer2.org
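One way to keep a map to a handful of data classes is to bin the variable before coloring it. The sketch below assumes made-up values and equal-width bins; the variable names are hypothetical and the binning rule is a choice made here, not one stated on the slide.

```r
# Hypothetical values to be mapped (for example, a rate per region)
values <- c(2, 7, 11, 18, 23, 31, 38, 44, 49)

# Bin the values into 5 equal-width classes
classes <- cut(values, breaks = 5)

# One sequential color per class, light (low) to dark (high)
class_cols <- hcl.colors(nlevels(classes), palette = "Blues", rev = TRUE)

# Look up each value's color through its class
value_cols <- class_cols[as.integer(classes)]

print(table(classes))
print(value_cols)
```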

29 Part 2 Introduction to R

30 Why R?
Open source tool
Huge variety of packages for any kind of analysis
Saves time repeating data processing steps
Allows working with more diverse types of data and much larger datasets than Excel
Processing is much faster than Excel
Scripts are easily shareable, promoting reproducible work

31 .csv and .xls / .xlsx
Excel files are designed to hold the appearance of the spreadsheet in addition to the data. R just wants the data, so always save as .csv if you have tabular data.

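32 data structures
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)  # combine values into a vector
x
length(x)     # number of elements
x[1]          # first element (R indexes from 1)
x[2]
x <- c(1:10)  # the same values, written as a range
x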

33 types of data
y <- c("abc", "def", "g", "h", "i")
y
class(y)   # "character"
y[2]
length(y)
data can be integer (1L, 2L, 3L, …), numeric (1.0, 2.3, …), character ("a", "b", "c", …), logical (TRUE, FALSE) or other things
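A quick way to check which type R has assigned is class(). The sketch below uses throwaway variable names chosen for illustration; note that a bare 3 is numeric, while 3L or a range such as 1:3 is integer.

```r
int_class  <- class(1:3)             # "integer"
num_class  <- class(c(1.0, 2.3))     # "numeric"
chr_class  <- class(c("a", "b"))     # "character"
lgl_class  <- class(c(TRUE, FALSE))  # "logical"

# A number typed without the L suffix is stored as numeric, not integer
bare_class   <- class(3)    # "numeric"
suffix_class <- class(3L)   # "integer"

print(c(int_class, num_class, chr_class, lgl_class, bare_class, suffix_class))
```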

34 Vectors
R can hold data organized a few different ways:
vectors – one type of data: (1,2,3,4) but not (1,2,3,x,y,z)
lists – can hold heterogeneous data (1, 2, a, x)
arrays – multi-dimensional
dataframes – lists of vectors, like spreadsheets
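The four structures can be compared side by side. This is a minimal sketch with invented values; it also shows what happens when types are mixed in a vector: R does not raise an error but silently converts everything to one type.

```r
# Mixing numbers and text in a vector coerces everything to character
v <- c(1, 2, 3, "x")
print(class(v))              # "character"

# A list keeps each element's own type
l <- list(1, 2, "a", "x")
print(class(l[[1]]))         # "numeric"
print(class(l[[3]]))         # "character"

# An array has any number of dimensions (here 2 x 3 x 4)
a <- array(1:24, dim = c(2, 3, 4))
print(dim(a))

# A data frame is a list of equal-length vectors, one per column
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
print(df)
```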

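35 Vector operations
x + 1                 # adds 1 to every element
x
sum(x)
mean(x)
mean(x + 1)
x[2] <- x[2] + 1      # change a single element
x
x + c(2:3)            # the shorter vector is recycled
x[2:10] + c(2:3)      # 9 elements vs 2: recycles with a warning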
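The last two lines of that slide rely on recycling, which is easy to miss. A small sketch, starting from a fresh x so the result does not depend on the earlier assignment:

```r
x <- 1:10

# The shorter vector repeats: 2, 3, 2, 3, ...
recycled <- x + c(2, 3)
print(recycled)   # 3 5 5 7 7 9 9 11 11 13

# When the lengths do not divide evenly, R still recycles but warns.
# tryCatch() is used here only to capture the warning text.
warning_text <- tryCatch(
  x[2:10] + c(2, 3),
  warning = function(w) conditionMessage(w)
)
print(warning_text)
```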

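36 working with lists
y <- list(name = "Bob", age = 24)
y
y$name
y[1]            # single brackets return a list
y[[1]]          # double brackets return the element itself
class(y[1])
class(y[[1]])
y <- list(y$name, "Sue")   # y is now an unnamed list
y$name                     # NULL, because the names are gone
y$age[2] <- list(33)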
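The difference between single and double brackets, and one way to keep names when holding more than one record, might look like this. The variable names person and people are hypothetical; the nested-list layout is one option among several, not the approach taken on the slide.

```r
person <- list(name = "Bob", age = 24)

single <- person[1]         # still a list, of length 1
double <- person[["name"]]  # the character value itself
print(class(single))        # "list"
print(class(double))        # "character"

# A list of named lists keeps each record's fields accessible by name
people <- list(person, list(name = "Sue", age = 33))
all_names <- sapply(people, function(p) p$name)
all_ages  <- sapply(people, function(p) p$age)
print(all_names)
print(all_ages)
```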

37 Loading data
data <- read.csv("C:/Users/William Gunn/Desktop/Dropbox/Scripting/Data/traffic_accidents/accidents2010_all.csv", header = TRUE, stringsAsFactors = FALSE)
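Since the file on the slide is not available here, the same read.csv() call can be tried on a small file written to a temporary path. The column names and values below are invented for illustration and are not the columns of the accidents dataset.

```r
# Write a tiny stand-in CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
example <- data.frame(
  accident_id = 1:3,
  severity = c("slight", "serious", "slight"),
  stringsAsFactors = FALSE
)
write.csv(example, csv_path, row.names = FALSE)

# Read it back the same way as on the slide
accidents <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

str(accidents)   # structure: column names, types, first values
head(accidents)  # first rows
```

With stringsAsFactors = FALSE, text columns stay as character vectors instead of being converted to factors.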

38 Selecting subsets of data: "[", "$", which, grep and grepl, subset
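Each of those tools can be tried on a small data frame. The data below is invented for illustration.

```r
df <- data.frame(
  city = c("Leeds", "London", "Luton", "York"),
  casualties = c(3, 12, 1, 5),
  stringsAsFactors = FALSE
)

cities <- df$city                    # "$" pulls out one column
busy   <- df[df$casualties > 2, ]    # "[" filters rows with a logical test

idx      <- which(df$casualties > 2) # positions where the test is TRUE
l_pos    <- grep("^L", df$city)      # positions matching a pattern
l_flags  <- grepl("^L", df$city)     # TRUE/FALSE for each element

# subset() combines conditions without repeating df$
busy_l <- subset(df, casualties > 2 & grepl("^L", city))

print(busy)
print(busy_l)
```

grep returns positions while grepl returns a logical vector, so grepl is the one to use inside "[" or subset when combining it with other conditions.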

39 PLOTS
ggplot2 – an implementation of the “grammar of graphics” in R
a set of graph types and a way of mapping variables to graph features
graph types are called “geoms”
mappings are “aesthetics”
graphs are built up by layering geoms
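Layering can be sketched with the mtcars dataset that ships with R. This assumes the ggplot2 package is installed; the choice of dataset and variables is for illustration only and does not come from the slides.

```r
library(ggplot2)

# aes() maps variables to graph features; each geom_ call adds a layer
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                  # layer 1: the raw points
  geom_smooth(method = "lm", formula = y ~ x)     # layer 2: a fitted line

print(p)
```

Because the aesthetics are set once in ggplot(), both layers share the same x and y mapping.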

40 Types of geoms
point – scatterplot – takes x,y coords of points
abline – line layer – takes slope, intercept
line – connect points with a line
smooth – fit a curve
bar – bar chart of counts; use histogram for binned continuous data – takes vector of data
boxplot – box and whiskers
density – to show relative distributions
errorbar – what it says on the tin
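A few of these geoms, again sketched on the built-in mtcars data and assuming ggplot2 is installed. The variable choices and the bin count are arbitrary.

```r
library(ggplot2)

# histogram: bins a continuous variable and counts rows per bin
hist_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# bar: counts rows per category
bar_plot <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# boxplot: distribution of a continuous variable within each category
box_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

print(hist_plot)
print(bar_plot)
print(box_plot)
```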

