© 2011 Deloitte Touche Tohmatsu About me Educational background – Applied Econometrics 4 years statistical modelling experience R experience – 2 years.


2 About me
Educational background: Applied Econometrics
4 years statistical modelling experience
R experience: 2 years
Currently Senior Analyst at Deloitte
Hobby: rock climbing, data mining competitions
  Why? Early retirement
Current interest: Text analytics

3 Topic: The benefits of R from a data mining competitor's point of view and from the point of view of an employee at Deloitte
Work: professional and pragmatic
Home: the playful scientist

4 Agenda
1. Quick introduction to R
2. What I use R for
3. R at work
  Introduction to Deloitte
  Frequently used tools
  Some of the work we do using R
    Examples
  Challenges: Data Storage
  Challenges: Standardisation
  How Deloitte is addressing this issue
4. R at home
  Some of the work I do using R, at home
  Flexibility and convenience
    Examples
  Prototyping and experimenting
    Examples
5. Questions
6. Essential R packages for everyday use

5 Quick introduction to R
"A statistical software created by statisticians, for statisticians"
Personally, I use R for data analysis and statistical modelling
Unique features worth noting:
  Open source: free, and easy to find help in the active community
  Understands mathematical computations and matrix operations naturally
  Thousands of packages, implementations of almost any algorithm
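As a small illustration of the point about matrix operations, the sketch below solves an ordinary least squares problem directly with matrix algebra in base R and compares the result with `lm()`. The data are made up for the example and do not come from the talk.

```r
# Made-up data for illustration: y = 2 + 3x exactly
x <- c(1, 2, 3, 4, 5)
y <- 2 + 3 * x

# Design matrix with an intercept column
X <- cbind(intercept = 1, x = x)

# OLS coefficients via the normal equations: (X'X) b = X'y
beta <- solve(t(X) %*% X, t(X) %*% y)

# Compare with R's built-in linear model
fit <- lm(y ~ x)
print(beta)
print(coef(fit))
```

Transpose (`t`), matrix product (`%*%`) and linear solve (`solve`) are all part of base R, so no package is needed for this kind of calculation.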

6 Introduction to R
Thousands of packages, implementations of almost any algorithm: ggplot2, EBImage, randomForest, etc.
N = 500+ packages

7 R at work

8 Introduction to Deloitte
1. We help clients capture, manage and analyse data to solve important business problems and make informed decisions
2. A holistic process of data mining

9 Introduction to Deloitte: Typical activity involved in a project at Deloitte
[Figure: level of activity over the project timeline; phases labelled initiating processes, planning processes, modelling and closing processes, with data loading and data preparation marked]
But not everything is R
20% - 40% of time spent on modelling

10 Frequently used tools
Geospatial analytics: Tactician
Segmentation: self-organising maps
Modelling
Visualisation
SQL Server

11 Some of the work we do using R
In Deloitte:
  Statistical analysis and predictive modelling
  Time series analysis
  Social network analysis
  Data visualisation
  Text analytics (NEW!)

12 Examples: Time Series
[Figure: time series plot; y-axis: retail activity?; x-axis: time (days); series: estimate, actual, fitted]
R package: forecast
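A minimal sketch of the actual/fitted/forecast workflow shown on this slide. The slide credits the forecast package; to stay dependency-free, this sketch swaps in `arima()` and `predict()` from base R's stats package. The daily series is simulated rather than real retail data, and the model order is an assumption, not a tuned choice.

```r
set.seed(1)

# Simulated daily "retail activity": drifting level + weekly pattern + noise
n <- 140
level <- 100 + cumsum(rnorm(n, mean = 0.2, sd = 1))
weekly <- 10 * sin(2 * pi * seq_len(n) / 7)
y <- ts(level + weekly + rnorm(n, sd = 2), frequency = 7)

# Seasonal ARIMA(0,1,1)(0,1,1)[7]; the order is assumed for illustration
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 7))

# In-sample fitted values and a 14-day-ahead forecast
fitted_vals <- y - residuals(fit)
fc <- predict(fit, n.ahead = 14)

# Actual (black), fitted (blue, dashed) and forecast (red) on one plot
ts.plot(y, fitted_vals, fc$pred,
        col = c("black", "blue", "red"), lty = c(1, 2, 1))
```

With the forecast package installed, `auto.arima()` and `forecast()` cover the same steps and choose the model order automatically.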

13 Challenges: Data Storage
We have a dedicated tool to store and clean data: SQL
R holds data in memory, so it struggles with large data sets
Error: cannot allocate vector of size Kb
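To see why memory becomes the constraint, it can help to check how much space an object takes. A rough sketch in base R follows; the sizes assume a standard 64-bit build where a numeric value takes 8 bytes, and the table dimensions are hypothetical.

```r
# One million doubles: about 8 MB (8 bytes per value plus a small header)
v <- numeric(1e6)
size_bytes <- as.numeric(object.size(v))
print(format(object.size(v), units = "MB"))

# Back-of-envelope estimate for a hypothetical table of
# 50 million rows by 20 numeric columns, in gigabytes
rows <- 5e7
cols <- 20
est_gb <- rows * cols * 8 / 1024^3
print(round(est_gb, 1))
```

Filtering and aggregating in the database first means only the reduced result has to fit in R's memory.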

14 Challenges: Standardisation
"You're not the only one using it"
One of the reasons why other commercial tools are preferred over R
Transferable skills across the team
Reliability of packages
Standardised functions and procedures

15 How Deloitte is addressing this issue
Creating standardised process:
R package: RODBC

16 How Deloitte is addressing this issue
Creating standardised functions:

    library(ggplot2)

    # Density plot for the variable in column `col` of `dataset`
    DensityPlot <- function(dataset, col) {
      ds <- data.frame(dataset)
      ds$c <- ds[, col]  # copy the chosen column to a fixed name for aes()
      ggplot(data = ds, aes(x = c)) +
        geom_density(kernel = "biweight")
    }

    # Usage: DensityPlot(dataset, column number)

Retrieving data from the database (RODBC):

    library(RODBC)

    conn <- odbcDriverConnect("driver=SQL Server; database=DataBaseName; server=servername;")
    query <- "SELECT * FROM TableName"
    df <- sqlQuery(conn, query)
    odbcClose(conn)  # release the connection when done

R package: RODBC

17 R at home

18 Some of the work I do using R, at home
In Deloitte (we don't just use R):
  Statistical analysis and predictive modelling
  Time series analysis
  Social network analysis
  Data visualisation
  Text analytics (NEW!)
At home, in data mining competitions (I mainly use R):
  Statistical analysis and predictive modelling
  Time series analysis
  Social network analysis
  Data visualisation
  Text analytics
  Image analysis

19 Flexibility and convenience
1. R is one of the easier programming languages to pick up
2. You can dive into the analysis quickly

20 Examples: Image analysis
R package: EBImage

21 Examples: Image analysis
R package: EBImage

22 Prototyping and experimenting
1. Access to the latest, most innovative techniques
2. Great for prototyping new algorithms

23 Examples: Text analytics
1 The latest proof that Google can do no wrong |
2 Teen girls look to YouTube for self-image validation |
3 Why libraries need us now more than ever #sxsw |
4 PHOTOS: Amazing Photos of the Sun
5 Why libraries need us now more than ever #sxsw |
6
R package: twitteR

24 Examples: Word cloud of Twitter feeds
R package: wordcloud
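The input to a word cloud is a table of word frequencies. Below is a minimal base R sketch using four of the tweet texts shown in the slides. The tokenisation rule and the tiny stop-word list are simplifying assumptions, and the final `wordcloud()` call is left commented out because it needs the wordcloud package installed.

```r
tweets <- c(
  "The latest proof that Google can do no wrong",
  "Teen girls look to YouTube for self-image validation",
  "Why libraries need us now more than ever #sxsw",
  "PHOTOS: Amazing Photos of the Sun"
)

# Lower-case, then split on anything that is not a letter, digit or '#'
words <- unlist(strsplit(tolower(tweets), "[^a-z0-9#]+"))
words <- words[nchar(words) > 0]

# Tiny illustrative stop-word list (an assumption; tm ships fuller lists)
stop_words <- c("the", "that", "can", "do", "no", "to", "for", "us", "of", "than")
words <- words[!words %in% stop_words]

# Word counts, most frequent first
freq <- sort(table(words), decreasing = TRUE)
print(head(freq, 5))

# With the wordcloud package installed, the cloud could be drawn with:
# wordcloud::wordcloud(names(freq), as.numeric(freq), min.freq = 1)
```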

25 Examples: Text analytics
What are the common themes that are being tweeted by Time magazine?

26 Tweet
[Figure: top words associated to the classification, panels A, B, C and D]
R package: ggplot2

27 Classification results
Columns: Tweets, Topic 1, Topic 2, Topic 3, Topic 4
1 The latest proof that Google can do no wrong | 60%
2 Teen girls look to YouTube for self-image validation | 100%
3 Why libraries need us now more than ever #sxsw |
4 PHOTOS: Amazing Photos of the Sun
5 Why libraries need us now more than ever #sxsw |
6 Why libraries need us now more than ever #sxsw |
7 responds q: "What is the most astounding fact about the Universe?" | beautiful vid 0% 67% 33%
8 Why libraries need us now more than ever #sxsw |
9 Living Alone Is The New Norm #teamhermit 0% 100%
10 PHOTOS: Seven days of strange landscapes |
11 Subject for Debate: Are Women People? 100%
12 PHOTOS: Seven days of strange landscapes | Israel's bogus case for bombing Gaza obscures political motives | Al Akhbar English 0% 100% 0%

28 Questions?

29 Essential R packages for everyday use
Essential: ggplot2, reshape, RODBC, randomForest, rpart
Nice to have: caret, forecast, tm

