Copyright © 2006, SAS Institute Inc. All rights reserved. Predictive Modeling Concepts and Algorithms Russ Albright and David Duling SAS Institute.


1 Predictive Modeling Concepts and Algorithms. Russ Albright and David Duling, SAS Institute

2 Predictive Modeling Landscape
1. Background
2. Modeling Overview
3. Models
4. Model Assessment and Selection
5. Model Deployment / Scoring

3 Use Cases for Data Mining
1. Offline applications: campaign planning, adverse event detection
2. On-demand applications: front office data collection & recommendation
3. Real-time applications: transaction processing, fraud detection, website product recommendation
4. Real-time modeling and scoring of data streams (the future!): mega data streams, Internet traffic, satellite transmissions, digital data acquisition

4 Background - Enterprise Miner Functionality: Sample, Explore, Modify, Model, Assess

5 Background - Predictive Modeling Terminology
[Figure: diagram of training data, validation and test data, and scoring data, labeled with observations, variables/features/attributes, actual target, and predicted target (output).]

6 Modeling Overview
- What do we mean by prediction?
- What is a predictive model?
  Classification/discriminant model: target is categorical, usually binary
  Regression model: target is continuous
- Given {x(i), y(i)}: y = f(x, θ), E(y|x, θ), p(y|x, θ)

7 Consider the following data. Predict the Response for a new value of Attribute.
[Figure: plot of Response versus Attribute.]

8 The Simplest Model: y = Ȳ (the mean of the response)
[Figure: plot of Response versus Attribute.]

9 What about a polynomial?
[Figure: plot of Response versus Attribute.]

10 What about a better polynomial?
[Figure: plot of Response versus Attribute.]

11 Now acquire more data and call it “validation data”. The blue model is said to overfit the training data. The mean model is said to underfit the training data.
[Figure: plot of Response versus Attribute showing training and validation points.]
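The underfit/overfit contrast above can be sketched in a few lines. This is a minimal illustration, not the presenters' code: the data are synthetic, and `true_response`, the noise level, and the chosen degrees are assumptions made only for the example. Degree 0 is the mean model; higher degrees are the polynomial models.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the monomial basis 1, x, x^2, ...
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def mse(coeffs, xs, ys):
    """Mean squared error of the fitted polynomial on (xs, ys)."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def true_response(x):
    # Assumed ground truth, used only to generate synthetic data.
    return x ** 3 - x

rng = random.Random(0)
train_x = [rng.uniform(-1, 1) for _ in range(12)]
train_y = [true_response(x) + rng.gauss(0, 0.1) for x in train_x]
valid_x = [rng.uniform(-1, 1) for _ in range(12)]
valid_y = [true_response(x) + rng.gauss(0, 0.1) for x in valid_x]

print("degree  train_mse  valid_mse")
for degree in (0, 1, 3, 8):
    coeffs = fit_polynomial(train_x, train_y, degree)
    print(degree, round(mse(coeffs, train_x, train_y), 4),
          round(mse(coeffs, valid_x, valid_y), 4))
```

Training error can only fall as the degree rises, because each model nests the previous one. The validation column is the one to compare when choosing a degree.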

12 Models
- Linear Regression: y = β_0 + β_1 x_1 + β_2 x_2
- Logistic Regression (Generalized Linear Model): log(p_j / (1 - p_j)) = β_0 + β_1 x_1 + β_2 x_2
  0-1 target/response variable
  Fit p_j = p(y_j = 0 | x) = 1 - p(y_j = 1 | x)
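The relationship between the linear predictor and the fitted probability in logistic regression can be sketched as follows. The coefficients and inputs are hypothetical values chosen for illustration, and `p` here is simply the probability of whichever target level is being modeled.

```python
import math

def linear_predictor(betas, xs):
    """Return beta_0 + beta_1 * x_1 + ...; betas[0] is the intercept."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

betas = [-1.0, 0.8, 0.5]   # hypothetical fitted coefficients
x = [2.0, 1.0]             # hypothetical inputs x_1, x_2

eta = linear_predictor(betas, x)   # log-odds: -1.0 + 0.8*2.0 + 0.5*1.0
p = inverse_logit(eta)             # probability of the modeled level
print(eta, p, 1.0 - p)
```

The same linear form serves both models: linear regression reports `eta` directly as the prediction, while logistic regression passes it through the inverse logit so the output stays between 0 and 1.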

13 Idea: What if we break the data into smaller chunks to identify local phenomena?
[Figure: plot of Response versus Attribute.]

14 Decision Trees
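One way to picture "breaking the data into chunks" is a single-split regression tree (a stump). The sketch below is an assumption-laden illustration rather than the Enterprise Miner algorithm: it tries every midpoint between sorted attribute values and keeps the split that minimizes the summed squared error of the two chunks. The data values are made up.

```python
def sse(values):
    """Sum of squared deviations from the mean of a chunk."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for the lowest-error split."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct attribute values")
    return best[1:]

def stump_predict(split, x):
    """Predict the mean response of the chunk that x falls into."""
    threshold, left_mean, right_mean = split
    return left_mean if x <= threshold else right_mean

# Hypothetical data with two clearly separated local regimes.
attribute = [1, 2, 3, 10, 11, 12]
response = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

split = best_split(attribute, response)
print(split, stump_predict(split, 2.5), stump_predict(split, 11.5))
```

A full decision tree repeats this search recursively inside each chunk until a stopping rule is met.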

15 Neural Networks ftp://ftp.sas.com/pub/neural/FAQ.html

16 Evolution of model training error and validation error
[Figure: plot of Model Error with Training Error and Validation Error curves, starting at Initialization, with regions labeled Underfitting, Optimal fit, and Overfitting.]
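A common way to act on this picture is early stopping: keep the model from the step where validation error was lowest. The slide does not name this technique, so the sketch below is an added illustration. The `patience` parameter and the error values are hypothetical.

```python
def best_iteration(validation_errors, patience=2):
    """Index of the lowest validation error seen before giving up.

    The scan stops once `patience` consecutive steps have passed
    without a new minimum.
    """
    best_idx, best_err = 0, validation_errors[0]
    for i, err in enumerate(validation_errors[1:], start=1):
        if err < best_err:
            best_idx, best_err = i, err
        elif i - best_idx >= patience:
            break
    return best_idx

# Hypothetical validation errors recorded after each training step.
validation_curve = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
stop_at = best_iteration(validation_curve)
print(stop_at, validation_curve[stop_at])
```

Steps before the returned index correspond to the underfitting region, and steps after it to the overfitting region.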

17 Memory Based Reasoning (Nearest Neighbors)
[Figure: scatter plot of points on X and Y axes, with the neighbors of a query point marked.]
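A minimal k-nearest-neighbors classifier can be sketched as follows. The training points, the labels, and the choice of Euclidean distance with majority voting are assumptions made for this illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical stored observations: two clusters with labels 1 and 2.
training_points = [
    ((1.0, 1.0), 1), ((1.5, 0.8), 1), ((0.7, 1.3), 1),
    ((5.0, 5.0), 2), ((5.4, 4.6), 2), ((4.8, 5.3), 2),
]

print(knn_predict(training_points, (1.2, 0.9)))
print(knn_predict(training_points, (5.1, 4.9)))
```

No model is fitted in advance. The training data themselves are kept in memory and searched at prediction time, which is where the name "memory based reasoning" comes from.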

18 Model Assessment and Selection – Lift charts
[Figure: test data table with columns Actual Target, Predicted Target (Output), and Decision; the values shown include actual targets 1, 0, 0, 1 and predicted outputs .9, .8, .3, .6.]
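Cumulative lift can be computed by ranking test records by predicted score and comparing the response rate in the top slice with the overall rate. The sketch below uses hypothetical test data and a hypothetical helper name.

```python
def cumulative_lift(actual, scores, fraction):
    """Response rate in the top `fraction` of records by score, over the overall rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), reverse=True)]
    top_n = max(1, round(fraction * len(ranked)))
    top_rate = sum(ranked[:top_n]) / top_n
    overall_rate = sum(ranked) / len(ranked)
    return top_rate / overall_rate

# Hypothetical test data: actual 0/1 targets and predicted scores.
actual_target = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [0.95, 0.90, 0.85, 0.80, 0.60, 0.50, 0.45, 0.30, 0.20, 0.10]

for fraction in (0.2, 0.5, 1.0):
    print(fraction, cumulative_lift(actual_target, predicted, fraction))
```

A lift chart plots this ratio against the fraction of records selected. A value above 1 means the model finds responders faster than random selection, and the curve always ends at 1 when every record is included.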

19 Model Assessment and Selection – ROC Curves
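An ROC curve traces the true positive rate against the false positive rate as the decision threshold is lowered. The sketch below builds the curve points and the area under the curve with the trapezoid rule. The data are hypothetical, and it assumes at least one positive and one negative record.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) points, from (0, 0) to (1, 1)."""
    positives = sum(actual)
    negatives = len(actual) - positives
    ranked = sorted(zip(scores, actual), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last record of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical test data: actual 0/1 targets and predicted scores.
actual_target = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(actual_target, predicted)
print(curve)
print(auc(curve))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, so competing models can be compared by this single number.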


21 5. $ Model Deployment / “Scoring” $
- It is definitely not (just) about building the models.
- Scoring and Score Code
- Monitoring

22 Batch Score Delivery to Offline Applications
- ETL for model development and scoring
- Scores generated on a nightly basis
- ID and score data pre-loaded into data store
- Score requests contain ID
- Decision server translates score to action
[Figure: flow linking Data Mining, Campaign Planning, and Campaign Execution.]
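The batch delivery flow above can be sketched as a lookup followed by a rule. Everything named here is hypothetical: the customer IDs, the thresholds, the action names, and the choice to return a default action for unknown IDs are assumptions for illustration, not part of any SAS product.

```python
# Hypothetical output of the nightly scoring job, pre-loaded by ID.
score_store = {"C001": 0.91, "C002": 0.35, "C003": 0.62}

def score_to_action(score):
    """Translate a model score into a campaign action (assumed thresholds)."""
    if score >= 0.8:
        return "send_premium_offer"
    if score >= 0.5:
        return "send_standard_offer"
    return "no_action"

def handle_request(customer_id, store):
    """Serve a score request that carries only an ID."""
    score = store.get(customer_id)
    if score is None:
        # Assumption: IDs missing from the nightly batch get the default action.
        return "no_action"
    return score_to_action(score)

for customer_id in ("C001", "C002", "C003", "C999"):
    print(customer_id, handle_request(customer_id, score_store))
```

Because the scores are computed ahead of time, the request path does no modeling work at all. The cost is that scores can be up to a day old.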

23 Thanks!

