Open Source Analytics: Visualization and Predictive Modeling of Big Data with R. Michael E. Driscoll, Ph.D. July 22, 2009, OSCON.

Presentation transcript:


Hard-working Middle Class Hypothesis (from Jessica Hagy's thisisindexed.com)

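Munge & Model OECD Data

gdp <- read.csv('gdp.csv')
hours <- read.csv('hours.csv')
# Join the two tables on the column(s) they share
gdp.hours <- merge(hours, gdp)
# The transcript shows "<" here; an assignment is assumed, and any
# conversion from hours worked to free time is not recoverable
gdp.hours$freetime <- gdp.hours$hours
# Scatterplot of free time against GDP
plot(freetime ~ gdp, data=gdp.hours)
# Linear fit, drawn as a green line of width 2
m <- lm(freetime ~ gdp, data=gdp.hours)
abline(m, col=3, lwd=2)
# Loess (local regression) fit, drawn as a smoothed curve
pm <- loess(freetime ~ gdp, data=gdp.hours)
lines(spline(gdp.hours$gdp, fitted(pm)))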
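The merge-then-model step can be tried without the OECD files. The following is a minimal sketch on made-up data: the `country` key and all the numbers are assumptions for illustration, not values from the talk. It shows that `merge()` joins on the shared column and keeps only rows present in both tables, and that `lm()` then fits a straight line to the merged result.

```r
# Hypothetical stand-ins for gdp.csv and hours.csv; all values are made up.
gdp <- data.frame(country = c("A", "B", "C", "D", "E"),
                  gdp     = c(20, 30, 40, 50, 60))
hours <- data.frame(country = c("B", "C", "D", "E", "F"),
                    hours   = c(1900, 1800, 1750, 1600, 1500))

# merge() joins on the columns the two frames share (here: country)
# and, by default, keeps only rows that appear in both.
gdp.hours <- merge(hours, gdp)   # 4 rows: B, C, D, E

# Linear model of hours worked against GDP on the merged table
m <- lm(hours ~ gdp, data = gdp.hours)
slope <- unname(coef(m)["gdp"])
print(gdp.hours)
print(slope)
```

Passing `data = gdp.hours` to `lm()` keeps the column lookup explicit, which matters here because a data frame named `gdp` also exists in the workspace.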

Visualize the Analysis: is it True?

modeling Big Data

100 thousand gene measures

1 million transactions during this presentation

If You Liked ____, You'll Love ____!

1 billion clicks during this presentation

1 million pitches thrown since 2007

A Tale of Two Pitchers: Hamels and Webb

xyplot(x ~ y, data=pitch)

xyplot(x ~ y, groups=type, data=pitch)

xyplot(x ~ y | type, data=pitch)
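The three lattice calls above differ only in how `type` is used: ignored, as a grouping variable within one panel, or as a conditioning variable that splits the data into one panel per level. Below is a small sketch on simulated data. The `pitch` data frame here is a hypothetical stand-in with the same column names as the slides, and the pitch-type labels are invented.

```r
library(lattice)

# Simulated stand-in for the pitch data; the values and labels are made up.
set.seed(1)
pitch <- data.frame(
  x    = rnorm(60),
  y    = rnorm(60),
  type = factor(rep(c("FA", "CU", "CH"), each = 20))
)

# One panel, all points drawn alike
p.all <- xyplot(x ~ y, data = pitch)

# One panel, points distinguished by type, with a legend
p.grouped <- xyplot(x ~ y, groups = type, data = pitch, auto.key = TRUE)

# One panel per level of type (conditioning with "|")
p.panels <- xyplot(x ~ y | type, data = pitch)
```

Each call returns a trellis object; printing it (for example `print(p.panels)`) draws the plot on the active graphics device.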

xyplot(x ~ y | type, data=pitch,
       fill.color = pitch$color,
       panel = function(x, y, fill.color, ..., subscripts) {
         fill <- fill.color[subscripts]
         panel.xyplot(x, y, fill = fill, ...)
       })


visualizing Big Data

ggplot2 = grammar of graphics

qplot(carat, price, data = diamonds)

qplot(log(carat), log(price), data = diamonds)
OR
qplot(carat, price, log = "xy", data = diamonds)

qplot(log(carat), log(price), data = diamonds, alpha = I(1/20))

qplot(log(carat), log(price), data = diamonds, alpha=I(1/20)) + facet_grid(. ~ color)
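The "grammar of graphics" idea is easier to see when the last `qplot()` call is spelled out with `ggplot()`, where data, aesthetic mappings, geometry, and facets are separate pieces added together. The sketch below is one way to write an equivalent plot, not the code from the talk. It is also the form to prefer in current ggplot2, where `qplot()` is deprecated (as of version 3.4.0).

```r
library(ggplot2)

# Data and aesthetic mappings: which columns go on which axes
p <- ggplot(diamonds, aes(x = log(carat), y = log(price))) +
  # Geometry: points, made mostly transparent to reveal dense regions
  geom_point(alpha = 1/20) +
  # Facets: one column of panels per diamond color grade
  facet_grid(. ~ color)
```

Setting `alpha` outside `aes()` applies a constant transparency, which is what `I(1/20)` achieves in the `qplot()` version. Printing `p` draws the plot.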

R on the cloud

Data Desktop

Coding vs. Clicking

Linux Apache MySQL R

Final thoughts