
1 Profiting from Data Mining
Bob Stine, Department of Statistics, The Wharton School, Univ of Pennsylvania
April 5, 2002
www-stat.wharton.upenn.edu/~bob

2 Overview
 • Critical stages of the data mining process
   - Choosing the right data, people, and problems
   - Modeling
   - Validation
 • Automated modeling
   - Feature creation and selection
   - Exploiting expert knowledge, "insights"
 • Applications
   - Little detail – Biomedical: finding predictive risk factors
   - More detail – Financial: predicting returns on the market
   - Lots of detail – Credit: anticipating the onset of bankruptcy

3 Predicting Health Risk
 • Who is at risk for a disease?
   - Example: detect osteoporosis without the expense of an x-ray
 • Goals
   - Improve public health
   - Save on medical care
   - Confirm an informal model with data mining
 • Many types of features and interested groups
   - Clinical observations by doctors
   - Laboratory measurements, "genetic"
   - Self-reported behavior
 • Missing data

4 Predicting the Stock Market
 • Small, "hands-on" example
 • Goals
   - Better retirement savings?
   - Money for that special vacation? College?
   - Trade-offs: risk vs. return
 • Lots of "free" data
   - Access to accurate historical time trends, macro factors
   - Recent data more useful than older data
 • "Simple" modeling technique
 • Validation

5 Predicting the Market: Specifics
 • Build a regression model
   - Response is the return on the value-weighted S&P
   - Use standard forward/backward stepwise selection (a sketch of the forward pass follows this slide)
   - Battery of 12 predictors with interactions
 • Train the model on 1992-1996 (training data)
   - Model captures most of the variation in 5 years of returns
   - Retain only the most significant features (Bonferroni)
 • Predict returns in 1997 (validation data)
 • Another version appears in Foster, Stine & Waterman
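As a minimal sketch (an illustration under stated assumptions, not the original analysis), the forward half of the stepwise-with-Bonferroni idea can be written as follows: at each step, add the candidate with the largest |t| ratio, but only if its p-value clears a Bonferroni cutoff for the number of candidates. The predictor names and toy data below are made up.

```python
import numpy as np
from scipy import stats

def forward_stepwise(X, y, names, alpha=0.05):
    """Greedy forward selection: add the predictor with the largest |t|,
    stopping when no candidate clears the Bonferroni cutoff alpha / p."""
    n, p = X.shape
    chosen = []
    while len(chosen) < p:
        best = None
        for j in (j for j in range(p) if j not in chosen):
            Z = np.column_stack([np.ones(n), X[:, chosen + [j]]])  # intercept + predictors
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            resid = y - Z @ beta
            df = n - Z.shape[1]
            cov = np.linalg.inv(Z.T @ Z) * (resid @ resid / df)
            t = beta[-1] / np.sqrt(cov[-1, -1])                     # t-ratio of the candidate
            if best is None or abs(t) > abs(best[1]):
                best = (j, t, df)
        j, t, df = best
        if 2 * stats.t.sf(abs(t), df) > alpha / p:                  # Bonferroni: stop here
            break
        chosen.append(j)
        print(f"added {names[j]} (t = {t:.2f})")
    return [names[j] for j in chosen]

# Toy usage: 12 made-up predictors, only two of which carry signal.
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 12))
y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=60)
print("kept:", forward_stepwise(X, y, [f"x{j}" for j in range(12)]))
```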

6 Historical patterns?
[chart of historical market returns]

7 Fitted model predicts...
[chart of model predictions; annotation: "Exceptional Feb return?"]

8 What happened?
[chart; the training period is marked]

9 Claimed versus Actual Error
[chart of squared prediction error; series: Actual, Claimed]

10 Over-confidence?
 • Over-fitting
   - Model fits the training data too well – better than it can predict the future.
   - Greedy fitting procedure: "Optimization capitalizes on chance"
 • Some intuition
   - Coincidences: cancer clusters, the "birthday problem"
   - Illustration with an auction: what is the value of the coins in this jar?

11 Auctions and Over-fitting
 • What is the value of these coins?

12 Auctions and Over-fitting
 • Auction a jar of coins to a class of MBA students
 • Histogram shows the bids of 30 students
 • Most were suspicious, but a few were not!
 • Actual value is $3.85
 • Known as the "Winner's Curse"
 • Similar to over-fitting: the best model is like the high bidder (a small simulation follows)
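A small simulation (not from the talk; the bid spread is an assumption) makes the analogy concrete: even when every bidder's estimate of the jar is unbiased, the winning bid systematically overshoots the true value, just as the best-fitting model overstates how well it will predict.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 3.85   # actual value of the coins (from the slide)
n_bidders = 30      # number of students bidding (from the slide)
noise_sd = 1.25     # spread of individual estimates -- an assumption

# Each bidder bids an unbiased but noisy estimate of the jar's value.
bids = rng.normal(true_value, noise_sd, size=(10_000, n_bidders))
winning = bids.max(axis=1)                  # the high bid wins each auction

print(f"average bid:         {bids.mean():.2f}")     # close to the true value
print(f"average winning bid: {winning.mean():.2f}")  # well above the true value
```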

13 Profiting from data mining?
 • Where's the profit in this?
   - "Mining the miners" vs. getting value from your data
   - Lost opportunities
 • Importance of domain knowledge
 • Validation as a measure of success
   - Prediction provides an explicit check
   - Does your application predict something?

14 Pitfalls and Role of Management
Over-fitting is dominated by other issues…
 • Management support
   - Life in silos
   - Coordination across domains
 • Responsibility and reward
   - Accountability
   - Who gets the credit when it succeeds? Who suffers if the project is not successful?

15 Specific Potholes
 • Moving targets
   - "Let's try this with something else."
 • Irrational expectations
   - "I could have done better than that."
 • Not with my data
   - "It's our data. You can't use it."
   - "You did not use our data properly."

16 Back to a real application…
Emphasis on the statistical issues…

17 Predicting Bankruptcy
 • Goal
   - Reduce losses stemming from personal bankruptcy
 • Possible strategies
   - If we can identify those with the highest risk of bankruptcy, take some action:
     call them for a "friendly chat" about their circumstances, or unilaterally reduce the credit limit
 • Trade-off
   - Good customers borrow lots of money
   - Bad customers also borrow lots of money

18 Predicting Bankruptcy
 • "Needle in a haystack"
   - 3,000,000 months of credit-card activity
   - 2,244 bankruptcies
   - A simple predictor that calls everyone OK looks pretty good.
 • What factors anticipate bankruptcy?
   - Spending patterns? Payment history?
   - Demographics? Missing data?
   - Combinations of factors? Cash advance + Las Vegas = problem
 • We consider more than 100,000 predictors!

19 Modeling: Predictive Models
 • Build the model
   - Identify patterns in the training data that predict future observations.
   - Which features are real? Which are coincidental?
 • Evaluate the model: how do you know that it works?
   - During the model-construction phase: only incorporate meaningful features
   - After the model is built: validate by predicting new observations

20 Are all prediction errors the same?
 • Symmetry
   - Is over-predicting as costly as under-predicting?
   - Managing inventories and sales
   - Visible costs versus hidden costs
 • Does a false positive = a false negative?
   - Classification in data mining
   - Credit modeling, flagging "risky" customers
   - False positive: call a good customer "bad"
   - False negative: fail to identify a "bad"
   - Differential costs for different types of errors (a cost-weighting sketch follows this slide)
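To make the asymmetry concrete, here is a minimal sketch that scores a classification rule with different dollar costs for the two kinds of error; the specific cost figures and toy data are assumptions, and only the idea of weighting a missed bankruptcy more heavily than a false alarm comes from the slides.

```python
import numpy as np

# Hypothetical costs: a missed bankruptcy (false negative) is assumed far
# more expensive than bothering a good customer (false positive).
COST_FALSE_POSITIVE = 50      # e.g. goodwill lost from an unnecessary "friendly chat"
COST_FALSE_NEGATIVE = 2_000   # e.g. written-off balance

def expected_cost(y_true, p_hat, threshold):
    """Average cost per account when accounts with p_hat >= threshold are flagged."""
    y_true, p_hat = np.asarray(y_true), np.asarray(p_hat)
    flagged = p_hat >= threshold
    false_pos = np.sum(flagged & (y_true == 0))
    false_neg = np.sum(~flagged & (y_true == 1))
    return (COST_FALSE_POSITIVE * false_pos + COST_FALSE_NEGATIVE * false_neg) / len(y_true)

# Toy usage: choose the threshold that minimizes expected cost, not the error rate.
rng = np.random.default_rng(1)
y = rng.binomial(1, 0.01, size=10_000)               # roughly 1% "bankrupt"
p = np.clip(rng.normal(0.2 * y + 0.05, 0.05), 0, 1)  # noisy risk scores
best = min(np.linspace(0.01, 0.5, 50), key=lambda t: expected_cost(y, p, t))
print(f"cost-minimizing threshold: {best:.2f}")
```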

21 Building a Predictive Model
So many choices…
 • Structure: what type of model?
   - Neural net
   - CART, classification tree
   - Additive model or regression spline
 • Identification: which features to use?
   - Time lags, "natural" transformations
   - Combinations of other features
 • Search: how does one find these features?
   - Brute force has become cheap.

22 Our Choices
 • Structure
   - Linear regression with nonlinearity via interactions
   - All 2-way and some 3-way and 4-way interactions
   - Missing data handled with indicators (a feature-construction sketch follows this slide)
 • Identification
   - Conservative standard error
   - Comparison of the conservative t-ratio to an adaptive threshold
 • Search
   - Forward stepwise regression
   - Coming: a dynamically changing list of features; a good choice affects where you search next.
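A minimal sketch of the kind of feature construction described above: a missing-value indicator for each raw column plus all pairwise interaction terms. The pandas-based approach, the column names, and the mean-fill are illustrative assumptions, not the tools used in the original work.

```python
from itertools import combinations

import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add a missing-data indicator per column, fill the gaps, and
    append all pairwise (2-way) interaction terms."""
    out = df.copy()
    for col in df.columns:
        out[f"{col}_missing"] = df[col].isna().astype(int)  # indicator for missingness
        out[col] = df[col].fillna(df[col].mean())           # simple fill so products are defined
    for a, b in combinations(df.columns, 2):
        out[f"{a}_x_{b}"] = out[a] * out[b]                 # 2-way interaction
    return out

# Toy usage with hypothetical credit-card fields
raw = pd.DataFrame({"balance": [1200.0, None, 300.0],
                    "cash_advance": [0.0, 500.0, None],
                    "util_rate": [0.45, 0.90, 0.10]})
print(build_features(raw))
```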

23 Identifying Predictive Features
 • Classical problem of "variable selection"
 • Thresholding methods (compare a t-ratio to a threshold)
   - Akaike information criterion (AIC)
   - Bayes information criterion (BIC)
   - Hard thresholding and Bonferroni
 • Arguments for adaptive thresholds
   - Empirical Bayes
   - Information theory
   - Step-up/step-down tests

24 Adaptive Thresholding
 • The threshold changes to conform to attributes of the data
   - It becomes easier to add features as more are found.
 • Threshold for the first predictor
   - Compare the conservative t-ratio to Bonferroni.
   - The Bonferroni cutoff is about sqrt(2 log p).
   - If something significant is found, continue.
 • Threshold for the second predictor
   - Compare the t-ratio to a reduced threshold.
   - The new cutoff is about sqrt(2 log(p/2)). (A numeric sketch follows this slide.)
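A small numeric sketch of this declining sequence of cutoffs: with p candidate features, the q-th feature to enter faces a threshold of roughly sqrt(2 log(p/q)), so each accepted feature slightly lowers the bar for the next. The value p = 100,000 echoes the bankruptcy application; the chosen values of q are just for illustration.

```python
import math

p = 100_000  # number of candidate predictors, as in the bankruptcy example

def adaptive_threshold(q: int, p: int = p) -> float:
    """Approximate |t| cutoff faced by the q-th feature entering the model."""
    return math.sqrt(2 * math.log(p / q))

for q in (1, 2, 5, 10, 39):
    print(f"feature {q:>2}: |t| must exceed about {adaptive_threshold(q):.2f}")
```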

25 Adaptive Thresholding: Benefits
 • Easy: as easy and fast to implement as the standard criterion used in stepwise regression.
 • Theory: the resulting model is provably as good as the best Bayes model for the problem at hand.
 • Real world: it works! It finds models with real signal, and stops when the signal runs out.

26 Bankruptcy Model: Construction
 • Data: reserve 80% for validation
   - Training data: 600,000 months, 458 bankruptcies
   - Validation data: 2,400,000 months, 1,786 bankruptcies
 • Selection via adaptive thresholding
   - Compare the sequence of t-statistics to sqrt(2 log(p/q))
   - Dynamic expansion of the feature space

27 Bankruptcy Model: Preview
 • Predictors
   - The initial search identifies 39; the validation SS falls monotonically to 1650, while a linear fit can do no better than 1735.
   - An expanded search of higher-order interactions finds a bit more, given the nature of the predictors comprising the interactions; the validation SS drops 10 more.
 • Validation: lift chart
   - The top 1000 candidates include 351 bankruptcies.
 • More validation: calibration
   - Close to the actual Pr(bankrupt) for most groups.

28 Bankruptcy Model: Fitting
 • Where should the fitting process be stopped?

29 Bankruptcy Model: Fitting
 • Our adaptive selection procedure stops at a model with 39 predictors.

30 Bankruptcy Model: Validation
 • The validation indicates that the fit keeps getting better as the model expands, avoiding over-fitting.

31 Bankruptcy Model: Linear?
 • Choosing from linear predictors (no interactions) does not match the performance of the full search.

32 Bankruptcy Model: More?
 • Searching higher-order interactions offers modest improvement.

33 Lift Chart
 • Measures how well the model classifies the sought-for group
 • Depends on the rule used to label customers
   - Very high threshold: lots of lift, but few bankrupt customers are found.
   - Lower threshold: lift drops, but more bankrupt customers are found.
   (A small lift computation is sketched after this slide.)
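As a minimal illustration (not the original analysis), a lift curve can be computed by sorting accounts by predicted risk and comparing the share of bankruptcies captured in the top-scoring fraction to what random selection would capture. The toy scores and prevalence below are assumptions.

```python
import numpy as np

def lift_curve(y_true, scores, n_bins=10):
    """Cumulative lift: share of positives captured among the top k/n_bins of
    accounts, divided by the share expected under random selection."""
    order = np.argsort(scores)[::-1]              # highest predicted risk first
    y_sorted = np.asarray(y_true)[order]
    total_pos = y_sorted.sum()
    lifts = []
    for k in range(1, n_bins + 1):
        top = y_sorted[: int(len(y_sorted) * k / n_bins)]
        captured = top.sum() / total_pos          # fraction of positives found so far
        lifts.append(captured / (k / n_bins))     # divide by the random baseline
    return lifts

# Toy usage: lift should start well above 1 and fall toward 1.
rng = np.random.default_rng(2)
y = rng.binomial(1, 0.01, size=50_000)            # rare "bankrupt" outcome
scores = 0.5 * y + rng.uniform(size=50_000)       # scores loosely related to the outcome
print([round(lift, 2) for lift in lift_curve(y, scores)])
```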

34 Generic Lift Chart
[chart; curves labeled "Model" and "Random"]

35 Bankruptcy Model: Lift
 • Much better than diagonal!

36 Calibration
 • The classifier assigns a Prob("BR") rating to each customer, like a weather forecast.
 • Among those classified as having a 2/10 chance of "BR", how many actually are BR?
 • Closer to the diagonal is better. (A small calibration check is sketched after this slide.)
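A minimal sketch of that calibration check (illustrative, not from the talk): bin accounts by their predicted probability and compare each bin's average prediction with the observed rate; for a well-calibrated model the two columns are close, i.e. the points lie near the diagonal.

```python
import numpy as np

def calibration_table(y_true, p_hat, n_bins=10):
    """For each predicted-probability bin, report the mean prediction and the
    observed event rate; a calibrated model has the two nearly equal."""
    y_true, p_hat = np.asarray(y_true), np.asarray(p_hat)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (p_hat >= lo) & (p_hat < hi)
        if in_bin.any():
            rows.append((lo, hi, p_hat[in_bin].mean(), y_true[in_bin].mean()))
    return rows

# Toy usage: outcomes are drawn from the stated probabilities, so the
# predictions are calibrated by construction and the columns should match.
rng = np.random.default_rng(3)
p = rng.uniform(0, 1, size=20_000)
y = rng.binomial(1, p)
for lo, hi, predicted, observed in calibration_table(y, p):
    print(f"[{lo:.1f}, {hi:.1f}): predicted {predicted:.2f}  observed {observed:.2f}")
```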

37 Bankruptcy Model: Calibration
 • Over-predicts risk above claimed probability 0.4

38 Summary of Bankruptcy Model
 • Automatic, adaptive selection
   - Finds patterns that predict new observations
   - Predictive, but not easy to explain
 • Dynamic feature set
   - Current research
   - Information theory allows changing the search space
   - Finds more structure than a direct search could find
 • Validation
   - Essential for judging fit.
   - Better than "hand-made models" that take years to create.

39 So, where's the profit in DM?
 • Automated modeling has become very powerful, avoiding problems of over-fitting.
 • Role for expert judgment remains
   - What data to use?
   - Which features to try first?
   - What are the economics of the prediction errors?
 • Collaboration
   - Data sources
   - Data analysis
   - Strategic decisions

