Published by Esteban Camm. Modified about 1 year ago.

1
Treatment Forests
Identifying Subgroups of Enhanced Treatment Effect Using Random Forests
Padraic G. Neville
Fairport, NY
SAS
Cary, NC

2
Two Separate Objectives
Select prospects for a sales promotion
◦ Model: rank prospects by utility of promotion
◦ Black-box prediction
Identify subgroups that a drug will help
◦ Model: plausible characterization
◦ Understandable (simple) description

3
Schism of Inference Rules
Discover and replicate
◦ No p-values
◦ Requires more data
Pre-specify hypothesis
◦ Multiple-testing limits variety of ideas
◦ Requires less data

4
Modeling
Assume randomly assigned treatments
Separate models for treated, untreated
◦ Focus on response blurs differential response
◦ Differential response is a weak signal
Tree-based models most common
◦ Focus on differential response
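With randomly assigned treatments, the differential response in a subgroup can be estimated as the treated response rate minus the untreated response rate. The following is a minimal sketch of that arithmetic, not the presenter's implementation. The function names `response_rate` and `treatment_effect`, the `(treated, responded)` record layout, and the toy data are all assumptions made for illustration.

```python
def response_rate(records, treated):
    """Share of responders among records with the given treatment flag."""
    group = [responded for is_treated, responded in records if is_treated == treated]
    if not group:
        raise ValueError("no records in this treatment arm")
    return sum(group) / len(group)


def treatment_effect(records):
    """Differential response: treated response rate minus untreated response rate."""
    return response_rate(records, True) - response_rate(records, False)


# Made-up toy subgroup of (treated, responded) pairs, for illustration only.
toy = [(True, 1), (True, 1), (True, 0), (True, 0),
       (False, 1), (False, 0), (False, 0), (False, 0)]
effect = treatment_effect(toy)  # 2/4 treated responders minus 1/4 untreated
```

Because the effect is a difference of two noisy rates, it is a weaker signal than either rate alone, which is the point the slide makes about modeling response separately in each arm.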

5
TITANIC DECISION TREE
N=788 P=35%
  Female: N=288 P=67%
    1st & 2nd Class: N=156 P=89%
    3rd Class: N=132 P=41%
  Male: N=500 P=16%
    Age ≤ 10: N=12 P=75%
    Age > 10: N=488 P=15%

6
TITANIC DECISION TREE 2
N=801 P=34%
  Women: N=279 P=66%
    1st & 2nd Class: N=144 P=89%
    3rd Class: N=135 P=41%
  Men: N=522 P=17%
    1st Class: N=118 P=39%
    2nd & 3rd Class: N=404 P=11%

7
FICTITIOUS TITANIC DECISION TREE
Randomized Treatment: Life Jackets

8
Two Splitting Criteria

9

10
Simulation to Compare Criteria

11

12
MineThatData Data
Kevin Hillstrom’s 2008 challenge
Data (N=42,693):
◦ Customers who purchased within last year
Treatment (N=21,387):
◦ Promotion of Women’s merchandise
Response:
◦ Customer visited website in next two weeks
Challenge:
◦ Rank customers by effect of treatment

13
MineThatData Covariates

14
Random Forest
Average prediction over many trees
To create different trees:
◦ Use different samples
◦ Exclude variables from a split search
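The two sources of variety named above, and the averaging step, can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the forest used in the presentation: each "tree" is a single-split stump, samples are drawn with replacement, and the names `fit_stump`, `fit_forest`, `predict_forest`, `vars_per_split`, and the toy data are all hypothetical.

```python
import random


def fit_stump(rows, targets, candidate_vars):
    """Fit a one-split tree, searching only the candidate variables."""
    overall = sum(targets) / len(targets)
    best = None  # (sse, var, threshold, left_mean, right_mean)
    for var in candidate_vars:
        for threshold in sorted({row[var] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[var] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[var] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, var, threshold, lm, rm)
    if best is None:  # no valid split found: predict the sample mean
        return lambda row: overall
    _, var, threshold, lm, rm = best
    return lambda row: lm if row[var] <= threshold else rm


def fit_forest(rows, targets, n_trees=25, vars_per_split=1, seed=0):
    """Build stumps that differ by sample and by searched variables."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    trees = []
    for _ in range(n_trees):
        # Different sample per tree: draw rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Exclude variables from the split search: keep a random subset.
        candidate_vars = rng.sample(range(n_vars), vars_per_split)
        trees.append(fit_stump([rows[i] for i in idx],
                               [targets[i] for i in idx],
                               candidate_vars))
    return trees


def predict_forest(trees, row):
    """Average the predictions of all trees."""
    return sum(tree(row) for tree in trees) / len(trees)


# Made-up toy data: the target depends on the first variable only.
rows = [(0, 5), (1, 3), (2, 8), (3, 1), (4, 7), (5, 2), (6, 6), (7, 4)]
targets = [0, 0, 0, 0, 1, 1, 1, 1]
trees = fit_forest(rows, targets)
prediction = predict_forest(trees, (7, 4))
```

Restricting each split search to a subset of variables keeps the trees from all choosing the same dominant split, so the average draws on more varied trees.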

15
Data Roles
Data 100.0% N=42,693
◦ Model 50.0% N=21,347
    Train 25.0% N=10,673
    Out-Of-Bag 12.5% N=5,336
    Prune 12.5% N=5,337
◦ Test 50.0% N=21,346
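One way to produce a partition with these proportions is to shuffle the case indices and cut them by fraction. The sketch below is an assumption about how such a split could be done, not a description of how the presenter did it. The function name `assign_roles` and the seed are hypothetical, and because of rounding the resulting counts can differ from the slide's by a case or two.

```python
import random


def assign_roles(n_cases, seed=0):
    """Shuffle case indices and cut them into the four roles by fraction."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    fractions = [("train", 0.25), ("oob", 0.125), ("prune", 0.125)]
    roles, start = {}, 0
    for name, fraction in fractions:
        size = int(n_cases * fraction)  # round down to a whole number of cases
        roles[name] = indices[start:start + size]
        start += size
    roles["test"] = indices[start:]  # remaining cases, roughly 50%
    return roles


roles = assign_roles(42693)
sizes = {name: len(cases) for name, cases in roles.items()}
```

Every case lands in exactly one role, so the test half is never seen while the trees are grown or pruned.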

16
Cumulative Lift
1. Use treatment test data
2. Use forest to predict treatment effect
3. Sort by predicted treatment effect
4. Cumulate count of responders
5. Plot count as proportion vs percent cases
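Steps 3 to 5 above can be sketched as follows, taking the predictions from step 2 as given. This is a minimal illustration, not the presenter's code. The function name `cumulative_lift` and the toy inputs are hypothetical, and reading "proportion" as the share of all responders is an assumption.

```python
def cumulative_lift(predicted_effect, responded):
    """Return (percent_of_cases, proportion_of_responders) points for a lift curve."""
    # Step 3: sort cases from highest to lowest predicted treatment effect.
    order = sorted(range(len(responded)),
                   key=lambda i: predicted_effect[i], reverse=True)
    total = sum(responded)
    if total == 0:
        raise ValueError("no responders to cumulate")
    points, running = [], 0
    for rank, i in enumerate(order, start=1):
        running += responded[i]  # Step 4: cumulate count of responders.
        # Step 5: count as a proportion (assumed: of all responders) vs percent of cases.
        points.append((100.0 * rank / len(order), running / total))
    return points


# Made-up toy inputs: predictions for four treated test cases and their responses.
points = cumulative_lift([0.9, 0.1, 0.5, 0.3], [1, 0, 1, 0])
```

A curve that rises quickly at low percentages means the cases ranked highest by predicted effect contain most of the responders.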

17
[Figure: Cumulative Lift of Treatment Test Cases. Annotated steps: predict treatment effect, (2) sort by prediction, (3) cumulate Y. Horizontal axis: percent of treatment test cases, from predicted good to predicted poor.]

18
[Figure: Cumulative Response and Uplift in Test Data. Series: treated population, untreated population, uplift (difference), uplift from random prediction. Horizontal axis: percent of population.]
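The quantities named in this figure can be computed along the following lines. This is a sketch under assumptions, since the slide does not give formulas: cumulative response is taken as responders in the top fraction of a group divided by the group size, uplift as the treated value minus the untreated value, and the random-prediction baseline as a straight line to the overall difference. The names `cumulative_response` and `uplift_curve` and the toy data are hypothetical.

```python
def cumulative_response(scores, responded, depth):
    """Responders among the top `depth` fraction by score, as a share of the whole group."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    top = ranked[:round(depth * len(ranked))]
    return sum(r for _, r in top) / len(ranked)


def uplift_curve(treated, untreated, depths):
    """Each group is (scores, responded). Returns (depth, uplift, random_uplift) rows."""
    overall = cumulative_response(*treated, 1.0) - cumulative_response(*untreated, 1.0)
    rows = []
    for depth in depths:
        uplift = (cumulative_response(*treated, depth)
                  - cumulative_response(*untreated, depth))
        rows.append((depth, uplift, depth * overall))  # random ranking: straight line
    return rows


# Made-up toy groups: predicted effects and observed responses.
treated = ([0.9, 0.7, 0.3, 0.1], [1, 1, 0, 0])
untreated = ([0.8, 0.6, 0.4, 0.2], [0, 0, 1, 0])
rows = uplift_curve(treated, untreated, [0.5, 1.0])
```

Where the uplift value sits above the random baseline at a given depth, the ranking is concentrating the treatment effect in the highest-ranked part of the population.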

19
[Figure: OOB Treatment Effect vs Cluster of Subgroups. 113 subgroups (leaves) from 51 trees, grouped into 10 clusters. Vertical axis: treatment effect. Horizontal axis: cluster. Annotations: overall; Cluster 3 (Train:, Test:).]

20
Thank you for your attention!
