Open Source Analytics: Visualization and Predictive Modeling of Big Data with R
Michael E. Driscoll, Ph.D.
July 22, 2009, OSCON

6 Hard-working Middle Class Hypothesis (from Jessica Hagy's thisisindexed.com)

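7 Munge & Model OECD Data
gdp <- read.csv('gdp.csv')
hours <- read.csv('hours.csv')
gdp.hours <- merge(hours, gdp)  # join on the shared column(s), e.g. country
# free time = 4380 h (12 h x 365 days, presumably waking hours) minus hours worked
gdp.hours$freetime <- 4380 - gdp.hours$hours
plot(freetime ~ gdp, data = gdp.hours)
m <- lm(freetime ~ gdp, data = gdp.hours)       # linear fit
abline(m, col = 3, lwd = 2)
pm <- loess(freetime ~ gdp, data = gdp.hours)   # local (loess) fit
lines(spline(gdp.hours$gdp, fitted(pm)))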
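
The merge step is where rows can silently disappear: by default merge() joins on every column the two tables have in common and keeps only rows that match in both (an inner join). A minimal sketch with two small hypothetical tables; the `country` column and all values are made up for illustration, since the real OECD files are not shown on the slide:

```r
# hypothetical stand-ins for gdp.csv and hours.csv (values are made up)
gdp   <- data.frame(country = c("A", "B", "C"), gdp   = c(30000, 45000, 52000))
hours <- data.frame(country = c("A", "B", "D"), hours = c(1900, 1750, 1600))

# inner join on the shared 'country' column: C and D have no partner and are dropped
gdp.hours <- merge(hours, gdp)

# derived column, as on the slide: 4380 minus annual hours worked
gdp.hours$freetime <- 4380 - gdp.hours$hours

# all = TRUE keeps unmatched countries and fills the gaps with NA
gdp.hours.all <- merge(hours, gdp, all = TRUE)

print(gdp.hours)
print(nrow(gdp.hours.all))
```

Checking nrow() before and after a merge is a cheap way to notice countries that were spelled differently in the two files.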

8 Visualize the Analysis: is it True?

9 Modeling Big Data

10 100 thousand gene measures

12 1 million transactions during this presentation

13 If You Liked ____, You'll Love ____!
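
The slide does not show how such recommendations are computed. One common, simple approach is item co-occurrence: count how often two items are bought by the same customer and suggest the most frequent partner. A minimal sketch of that technique (not necessarily the one discussed in the talk) on a made-up customer-by-item purchase matrix; the item names and purchases are hypothetical:

```r
# rows = customers, columns = items; 1 means the customer bought the item (made-up data)
purchases <- matrix(c(1, 1, 0, 0,
                      1, 1, 1, 0,
                      0, 1, 1, 0,
                      1, 0, 0, 1),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("cust", 1:4),
                                    c("book", "lamp", "mug", "pen")))

# item-by-item co-occurrence: entry [i, j] = number of customers who bought both i and j
co <- t(purchases) %*% purchases
diag(co) <- 0  # an item should not recommend itself

# "If you liked <item>, you'll love ...": the item most often bought alongside it
recommend <- function(item) names(which.max(co[item, ]))

recommend("book")  # "lamp"
```

At the scale the slides mention, the same idea is usually computed with sparse matrices, since most customer-item pairs are zero.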

14 1 billion clicks during this presentation

16 1 million pitches thrown since 2007

17 A Tale of Two Pitchers: Hamels and Webb

18 xyplot(x ~ y, data=pitch)

19 xyplot(x ~ y, groups=type, data=pitch)

20 xyplot(x ~ y | type, data=pitch)
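
The `pitch` data frame is not included with the slides, so the progression from slide 18 to slide 20 can be tried on R's built-in `iris` data instead, with `Species` standing in for the pitch `type`. `groups =` overlays the categories in a single panel, distinguished by color, while `| type` conditions on them and draws one panel per category:

```r
library(lattice)  # ships with R as a recommended package

# slide 18 analogue: one panel, all points alike
p1 <- xyplot(Sepal.Length ~ Petal.Length, data = iris)

# slide 19 analogue: one panel, points colored by category via groups =
p2 <- xyplot(Sepal.Length ~ Petal.Length, groups = Species, data = iris,
             auto.key = TRUE)

# slide 20 analogue: conditioning with |, one panel per category
p3 <- xyplot(Sepal.Length ~ Petal.Length | Species, data = iris)

# trellis objects are only drawn when printed, e.g. print(p3)
```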

21 xyplot(x ~ y | type, data = pitch,
       fill.color = pitch$color,
       panel = function(x, y, fill.color, ..., subscripts) {
         # pick the colors of the rows drawn in this panel
         fill <- fill.color[subscripts]
         panel.xyplot(x, y, fill = fill, ...)
       })
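
The `subscripts` argument is what makes this panel function work: for each panel, lattice passes the row numbers of the points being drawn, so a per-row vector such as `pitch$color` can be indexed to match. A self-contained sketch on `iris`, with a made-up per-row `color` column standing in for `pitch$color`; it uses a filled symbol (`pch = 21`) because `fill` only affects symbols 21 to 25:

```r
library(lattice)

# made-up per-row color: highlight flowers with wide petals
iris.col <- iris
iris.col$color <- ifelse(iris.col$Petal.Width > 1.5, "red", "grey")

p <- xyplot(Sepal.Length ~ Petal.Length | Species, data = iris.col,
            fill.color = iris.col$color,  # extra argument, handed on to the panel function
            panel = function(x, y, fill.color, ..., subscripts) {
              # subscripts = row numbers (in iris.col) of the points in this panel
              fill <- fill.color[subscripts]
              panel.xyplot(x, y, pch = 21, fill = fill, ...)
            })
```

Because the panel function has a `subscripts` argument, lattice supplies it automatically; no `subscripts = TRUE` is needed in the call.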

23 Visualizing Big Data

25 ggplot2 = grammar of graphics

27 qplot(carat, price, data = diamonds)

28 qplot(log(carat), log(price), data = diamonds)
OR
qplot(carat, price, log = "xy", data = diamonds)
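
Why the log transform helps: if price grows roughly as a power of carat, price = a * carat^b, then log(price) = log(a) + b * log(carat), which is a straight line whose slope is the exponent b. A minimal check on made-up values that follow an exact power law (a = 1000 and b = 2 are chosen only for illustration; these are not diamond prices):

```r
# made-up data following price = 1000 * carat^2 exactly
carat <- c(0.25, 0.5, 1, 2, 4)
price <- 1000 * carat^2

# straight-line fit on the log-log scale
fit <- lm(log(price) ~ log(carat))

coef(fit)[["log(carat)"]]       # slope: the exponent b, here 2 (up to rounding error)
exp(coef(fit)[["(Intercept)"]]) # back-transformed intercept: the constant a, here 1000
```

On real data the points scatter around such a line, and the fitted slope estimates the exponent.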

29 qplot(log(carat), log(price), data = diamonds, alpha = I(1/20))

30 qplot(log(carat), log(price), data = diamonds, alpha=I(1/20)) + facet_grid(. ~ color)
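
The same two ideas, transparency against overplotting and one panel per level of a categorical variable, are also available in lattice. A sketch on R's built-in `quakes` data (1,000 earthquakes off Fiji), swapped in because `diamonds` ships with ggplot2 rather than base R; the three depth bands are an arbitrary choice for illustration:

```r
library(lattice)

# split depth into three equal-width bands to use as the conditioning factor
quakes$depth.band <- cut(quakes$depth, breaks = 3)

# alpha = 1/5: overlapping points build up darker regions,
# much like alpha = I(1/20) in the qplot call; | depth.band plays the role of facet_grid
p <- xyplot(lat ~ long | depth.band, data = quakes,
            pch = 16, alpha = 1/5, layout = c(3, 1))
```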

32 R on the cloud

33 Data Desktop

34 Coding vs. Clicking

35 Linux, Apache, MySQL, R: http://labs.dataspora.com/gameday

38 Final thoughts
