Saskatoon SAS user group

Presentation on theme: "Saskatoon SAS user group"— Presentation transcript:

1 Saskatoon SAS user group
Efficiency and data mining?

2 Agenda Background Case Study

3 Agenda Background Case Study

4 It means different things to different people?
Predictive Analytics… Data science… Statistics… Machine Learning… Data mining
Uses a variety of tools
Audiences: Data Scientist, Business Analyst, Heavy Excel user, IT, Management, Executive
What they say: "Consistent answers." "Tries to avoid next migraine." "How do we manage this?" "Show me the easy button." "Show me the power." "So what?"
Data Scientist:
  Modern machine learning algorithms
  Quickly build hundreds or thousands of models
  Reusable assets and best practices
Business Users:
  Sound, reliable, analytically backed decisions
  Analytics integrated into day-to-day operations
  Easy to use and understand interfaces
  Easily combine analytical models and rules into business decisions in a single interface

5 The Data Mining Process
CRISP-DM Methodology
  CRISP-DM is a good methodology.
  SEMMA is a process in Enterprise Miner. It aligns well with CRISP-DM.
  This process is your friend. Use it. Iterate. Fail fast.
SEMMA Process: Sample, Explore, Modify, Model, Assess, Deploy

6 Building a predictive model
3 Approaches
Rapid Predictive Modeler (RPM):
  Preconfigured Enterprise Miner workflow in Enterprise Guide
  Easy
  Quick
  Good models
  Auditable and reusable
Enterprise Miner:
  Visual workflows
  Powerful
  Medium difficulty
  Great models
  Auditable and reusable
Programming:
  Difficult to learn
  Some data scientists prefer this
  Not suitable for the business analyst

7 The Data Mining Process
How to add efficiency
  Understand the problem
  Understand the data
  Use visualization early in the process
  Don’t be afraid to build models; start with RPM
  Fail fast

8 Agenda Background Case Study

9 The Data Mining Process
Case study
"We have a problem! Use actionable, in-memory, big-data, cloud, machine-learning analytics to fix it."
"You mean use predictive modeling to find the trucks that are going to blow up?"
"Last time it was altitude related."

10 40 000 vehicles – Fleet is ageing
Trucks are equipped with telematics
The data scientist is on vacation
Dataset = 1.5 GB (2M rows)! My spreadsheet won’t open it…
Roles: Business Analyst, Data Scientist

11 Case study: What I am going to show you
Use visualization early in the process to formulate a strategy
Demo 1
  Visual exploration of timeline
  Cluster analysis

12 Case study: What I am going to show you
Don’t be afraid to model
Tools: Rapid Predictive Modeler, Enterprise Miner
Demo 2
  Feature engineering
  2-minute model
  Enterprise model

13 Case study: What I am going to show you
This is how we derive value from the model
Demo 3
  Create score code
  Geospatial representation of scored data

14 Sample & Explore Data
Missing data is a landmine. Identify and remediate.
Visualize: reconstruct a timeline
Explore before subsetting or filtering
Demo 1
  Visual exploration of timeline
  Cluster analysis

15

16

17 Sample & Explore Data
Cluster Analysis in Visual Analytics
Now that I understand the data, I have a plan:
  Sample only alternator faults
  Focus on recent data. Using all the history may pollute my model

18 Modify Model Assess
Use Rapid Predictive Modeler to fail fast
Look at the variable importance chart
Engineer features into the data
Mitigate the risk of overfitting (holdouts, model selection criteria)
Demo 2
  Feature engineering
  RPM
  Advanced EM model
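The slide names holdouts as one way to mitigate overfitting. The following is a minimal Python sketch of that idea, not the Enterprise Miner implementation: a random holdout split plus a misclassification rate that would be measured on the held-out rows. The function names, the 30% holdout fraction, and the toy (oil_temp, fault) rows are all assumptions made for illustration.

```python
import random


def holdout_split(rows, holdout_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, holdout)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def misclassification_rate(actual, predicted):
    """Percentage of predictions that differ from the actual labels."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return 100.0 * wrong / len(actual)


# Hypothetical toy data: (oil_temp, fault) pairs, not the fleet dataset.
rows = [(temp, 1 if temp > 100 else 0) for temp in range(60, 140)]
train, holdout = holdout_split(rows)

# Fit on the training rows only: a threshold halfway between class means.
mean_fault = sum(t for t, y in train if y == 1) / sum(1 for _, y in train if y == 1)
mean_ok = sum(t for t, y in train if y == 0) / sum(1 for _, y in train if y == 0)
threshold = (mean_fault + mean_ok) / 2

# Assess on rows the model never saw.
predicted = [1 if t > threshold else 0 for t, _ in holdout]
actual = [y for _, y in holdout]
print(f"holdout misclassification rate: {misclassification_rate(actual, predicted):.2f}%")
```

The point of the sketch is only that the threshold is fitted on `train` and scored on `holdout`, so the reported error is not flattered by rows the model has already seen.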

19 Modify Data
Engineered Features
Binning into deciles:
  Altitude
  Engine hours
  Years in service
  Odometer mileage
  Oil temp
  Water temp
Computed variables:
  RPM
  Days since service origin
  Water temp * Oil temp
Binning into quartiles:
  Speed
  RPM
  Water temp * Oil temp
  Days since service origin
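The engineered features on this slide (quantile binning and computed variables) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code used in the demo: the field names (`service_start`, `water_temp`, `oil_temp`) are hypothetical, and "days since service origin" is assumed to mean days elapsed since the truck entered service.

```python
from bisect import bisect_right
from datetime import date
from statistics import quantiles


def quantile_bins(values, n_bins):
    """Assign each value a bin from 1..n_bins using quantile cut points.

    n_bins=10 gives deciles, n_bins=4 gives quartiles.
    """
    cuts = quantiles(values, n=n_bins)  # returns n_bins - 1 cut points
    return [bisect_right(cuts, v) + 1 for v in values]


def add_computed(record, as_of):
    """Return a copy of record with two computed variables added."""
    out = dict(record)
    # Assumed meaning of "days since service origin".
    out["days_in_service"] = (as_of - record["service_start"]).days
    # Interaction term: water temp * oil temp.
    out["water_x_oil_temp"] = record["water_temp"] * record["oil_temp"]
    return out


# Hypothetical values for illustration only.
altitudes = list(range(1, 101))
altitude_decile = quantile_bins(altitudes, 10)

truck = {"service_start": date(2015, 1, 1), "water_temp": 90.0, "oil_temp": 100.0}
enriched = add_computed(truck, as_of=date(2015, 1, 31))
print(enriched["days_in_service"], enriched["water_x_oil_temp"])
```

Binning a computed variable, as the quartile list on the slide does, is then just a matter of calling `quantile_bins` on the computed column.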

20 Modify Model Assess
Step                                  Misclassification rate %   % Improvement   Champion model
Just do it – Model on full dataset    10.30                      -               Logistic regression
RPM - Regression on segmented data    8.56                       16.89           Logistic regression (segmented dataset; sampled)
RPM - Intermediate                    8.02                       6.31            Decision tree 2
RPM - Advanced                        7.27                       9.35            Decision tree 3
Add feature engineered variables      6.94                       4.54            -
Use Enterprise Miner                  6.46                       6.92            Ensemble (neural network and decision tree)
We improve the model by iterating
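The "% Improvement" figures on this slide are consistent with a relative reduction in misclassification rate from each step to the next. A short Python check, using only the rates shown on the slide (the function name is an assumption for illustration):

```python
def relative_improvement(previous, current):
    """Relative reduction in misclassification rate, in percent."""
    return round(100.0 * (previous - current) / previous, 2)


# Misclassification rates (%) in slide order.
rates = [10.30, 8.56, 8.02, 7.27, 6.94, 6.46]

# Step-over-step improvement, as in the "% Improvement" column.
step_improvements = [
    relative_improvement(prev, cur) for prev, cur in zip(rates, rates[1:])
]
print(step_improvements)  # [16.89, 6.31, 9.35, 4.54, 6.92]

# Improvement from the first model to the last.
overall = relative_improvement(rates[0], rates[-1])
print(overall)  # 37.28
```

Because each figure is relative to the previous step, the individual percentages do not sum to the overall improvement.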

21 Pre-release version of SAS Visual Data Mining and Machine Learning

22 Deploy
How will the model output be used by someone who knows nothing about data science?
Score code is useful. A model is not.
Visualize the output
Demo 3
  Create score code
  Geospatial representation of scored data

23 Deploy
Out of a truck fleet of 2000+:
  72 have fault codes on alternators
  12 are prioritized for maintenance based on the prediction
This is where they are

24 The Data Mining Process
How to add efficiency
  Use visualization early in the process
  Don’t be afraid to build models. It is easy; start with RPM
  Fail fast

25 Ideas? Questions?

