
1 Decision Tree Pruning

2 Problem Statement
- We would like to output a small decision tree (model selection).
- The tree is built until the training error reaches zero.
- Option 1: stop early
  - Stop when a split gives only a small decrease in the index function
  - Con: may miss structure
- Option 2: prune after building

3 Pruning
- Input: tree T, sample S
- Output: tree T'
- Basic pruning: T' is a sub-tree of T
  - Can only replace inner nodes by leaves
- More advanced: replace an inner node by one of its children
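
To make the pruning operations concrete, here is a minimal sketch in Python. The `Node` class and helper names are my own illustration, not from the slides; it assumes binary splits and that every node stores the majority label of the training points reaching it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None   # split feature tested at an inner node
    left: Optional["Node"] = None   # child for feature value 0
    right: Optional["Node"] = None  # child for feature value 1
    label: int = 0                  # majority label of the points reaching this node

def is_leaf(node: Node) -> bool:
    return node.left is None and node.right is None

def to_leaf(node: Node) -> Node:
    # Basic pruning step: replace the subtree rooted at `node`
    # by a leaf predicting the node's majority label.
    return Node(label=node.label)
```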

4 Reduced Error Pruning
- Split the sample into two parts, S1 and S2.
- Use S1 to build a tree.
- Use S2 to decide whether to prune.
- Process every inner node v after all its children have been processed:
  - Compute the observed error of T_v and of leaf(v)
  - If leaf(v) has fewer errors, replace T_v by leaf(v)
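
A sketch of reduced error pruning under the same assumptions (binary features, the hypothetical `Node` helpers above); the recursion processes children before their parent, as the slide requires:

```python
def predict(node: Node, x) -> int:
    while not is_leaf(node):
        node = node.left if x[node.feature] == 0 else node.right
    return node.label

def errors(node: Node, sample) -> int:
    # Number of (x, y) pairs in `sample` misclassified by the subtree at `node`.
    return sum(1 for x, y in sample if predict(node, x) != y)

def reduced_error_prune(node: Node, s2) -> Node:
    # Bottom-up pass over the tree built on S1; S2 decides the pruning.
    if is_leaf(node):
        return node
    node.left = reduced_error_prune(
        node.left, [(x, y) for x, y in s2 if x[node.feature] == 0])
    node.right = reduced_error_prune(
        node.right, [(x, y) for x, y in s2 if x[node.feature] == 1])
    # Replace T_v by leaf(v) if the leaf makes no more errors on S2.
    if errors(to_leaf(node), s2) <= errors(node, s2):
        return to_leaf(node)
    return node
```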

5 Reduced Error Pruning: Example

6 Pruning: CV & SRM
- Generate: for each pruning size, compute the minimal-error pruning
  - At most m different sub-trees
- Select between the prunings using:
  - Cross Validation
  - Structural Risk Minimization
  - Any other index method

7 Finding the minimum pruning
Procedure Compute
- Inputs:
  - k: number of errors allowed
  - T: tree
  - S: sample
- Output:
  - P: pruned tree
  - size: size of P

8 Procedure Compute
IF IsLeaf(T)
  IF Errors(T) ≤ k THEN size = 1 ELSE size = ∞
  P = T; return
IF Errors(root(T)) ≤ k
  size = 1; P = root(T); return

9 Procedure Compute
FOR i = 0 TO k DO
  Call Compute(i, T[0], S_0, size_{i,0}, P_{i,0})
  Call Compute(k-i, T[1], S_1, size_{i,1}, P_{i,1})
size = min_i { size_{i,0} + size_{i,1} + 1 }
I = argmin_i { size_{i,0} + size_{i,1} + 1 }
P = MakeTree(root(T), P_{I,0}, P_{I,1})
What is the time complexity?
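
A sketch of this recursion in Python, again using the hypothetical `Node` helpers from above. `compute(k, node, sample)` returns the size of the smallest pruning making at most k errors on `sample`, together with that pruning, or `(math.inf, None)` when none exists:

```python
import math

def compute(k: int, node: Node, sample):
    leaf_errs = sum(1 for _, y in sample if y != node.label)
    if is_leaf(node):                  # base case from slide 8
        return (1, to_leaf(node)) if leaf_errs <= k else (math.inf, None)
    if leaf_errs <= k:                 # collapsing to a leaf already fits the budget
        return (1, to_leaf(node))
    s0 = [(x, y) for x, y in sample if x[node.feature] == 0]
    s1 = [(x, y) for x, y in sample if x[node.feature] == 1]
    best_size, best_tree = math.inf, None
    for i in range(k + 1):             # split the error budget between the children
        size0, p0 = compute(i, node.left, s0)
        size1, p1 = compute(k - i, node.right, s1)
        if size0 + size1 + 1 < best_size:
            best_size = size0 + size1 + 1
            best_tree = Node(feature=node.feature, left=p0, right=p1,
                             label=node.label)
    return best_size, best_tree
```

As a partial answer to the slide's question: memoizing on (node, error budget) pairs avoids exponential blow-up, and the budget at a node never usefully exceeds the number of sample points reaching it; slide 12 states the resulting overall cost as O(m²).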

10 Cross Validation
- Split the sample into S1 and S2.
- Build a tree using S1.
- Compute the candidate prunings.
- Select using S2: output the pruning with the smallest error on S2.
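
A short sketch of the selection step, reusing the hypothetical `compute` and `errors` helpers above: generate one candidate per error budget on S1 (up to `max_errors`, e.g. |S1|), then pick the candidate with the fewest errors on the held-out S2.

```python
def select_by_cv(tree: Node, s1, s2, max_errors: int) -> Node:
    candidates = []
    for k in range(max_errors + 1):
        _, pruned = compute(k, tree, s1)   # minimal pruning with <= k errors on S1
        if pruned is not None:
            candidates.append(pruned)
    # Output the candidate pruning with the smallest error on S2.
    return min(candidates, key=lambda p: errors(p, s2))
```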

11 SRM
- Build a tree T using S.
- Compute the candidate prunings: k_d is the size of the minimal pruning with d errors.
- Select using the SRM formula.
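
The slide does not reproduce the SRM formula. A typical SRM-style rule for this setting (an assumed form, not necessarily the slide's exact expression) selects the number of errors d minimizing observed error plus a size-dependent penalty:

$$\hat{d} \;=\; \arg\min_{d}\;\left[\; \frac{d}{m} \;+\; c\,\sqrt{\frac{k_d \log|H| + \log(1/\delta)}{m}} \;\right]$$

where d/m is the observed error of the candidate pruning and k_d its size.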

12 Drawbacks
- Running time:
  - Since |T| = O(m), the running time is O(m²)
  - Many passes over the data
- A significant drawback for large data sets

13 Linear Time Pruning
- A single bottom-up pass: linear time
- Uses an SRM-like formula with local soundness
- Competitive with any pruning

14 Algorithm
- Process a node only after processing its children.
- Local parameters at node v:
  - T_v: the current sub-tree at v, of size size_v
  - S_v: the sample reaching v, of size m_v
  - l_v: the length of the path leading to v
- Local test (prune v to a leaf when it holds):
  obs(T_v, S_v) + a(m_v, size_v, l_v, δ) > obs(root(T_v), S_v)
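
A sketch of the single bottom-up pass, reusing `errors` and `to_leaf` from the sketches above. Since the exact formula for a(...) is not in the transcript, this `a` follows the O(sqrt(r_v/m_v)) shape given on slide 19, and `H_SIZE` and `DELTA` are placeholder constants:

```python
import math

H_SIZE, DELTA = 1024, 0.05   # placeholder |H| and confidence parameter

def a(m_v: int, size_v: int, l_v: int, delta: float) -> float:
    # SRM-like penalty with the O(sqrt(r_v / m_v)) shape of slide 19.
    r_v = (l_v + size_v) * math.log(H_SIZE) + math.log(m_v / delta)
    return min(1.0, math.sqrt(r_v / m_v))

def subtree_size(node: Node) -> int:
    return 1 if is_leaf(node) else 1 + subtree_size(node.left) + subtree_size(node.right)

def bottom_up_prune(node: Node, sample, l_v: int = 0) -> Node:
    # Process children first, then apply the local test at v.
    if is_leaf(node):
        return node
    if not sample:
        return to_leaf(node)   # no data reaches v: prune to a leaf
    node.left = bottom_up_prune(
        node.left, [(x, y) for x, y in sample if x[node.feature] == 0], l_v + 1)
    node.right = bottom_up_prune(
        node.right, [(x, y) for x, y in sample if x[node.feature] == 1], l_v + 1)
    m_v = len(sample)
    obs_tree = errors(node, sample) / m_v            # obs(T_v, S_v)
    obs_leaf = errors(to_leaf(node), sample) / m_v   # obs(root(T_v), S_v)
    if obs_tree + a(m_v, subtree_size(node), l_v, DELTA) > obs_leaf:
        return to_leaf(node)
    return node
```

A single pass over the tree suffices, avoiding the candidate enumeration of the CV/SRM approach.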

15 The function a()
- Parameters:
  - paths(H, l): the set of paths of length l over H
  - trees(H, s): the set of trees of size s over H
- Formula: (lost in the transcript; a reconstruction is given after the next slide)

16 The function a()
- Finite class H:
  - |paths(H, l)| < |H|^l
  - |trees(H, s)| < (4|H|)^s
- Formula: (lost; see the reconstruction below)
- Infinite classes: use the VC dimension in place of log|H|
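
From the parameters on slide 15, the counting bounds above, and the O(sqrt(r_v/m_v)) shape stated on slide 19, a plausible reconstruction of the missing formula (my assumption, not the slides' verbatim statement) is:

$$a(m_v, size_v, l_v, \delta) \;=\; c\,\sqrt{\frac{\ln|trees(H, size_v)| + \ln|paths(H, l_v)| + \ln(m_v/\delta)}{m_v}} \;\le\; c'\,\sqrt{\frac{(l_v + size_v)\log|H| + \log(m_v/\delta)}{m_v}}$$

with the second form following from |paths(H, l)| < |H|^l and |trees(H, s)| < (4|H|)^s.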

17 Example
[Plot residue: a graph of the penalty a(m_v, size_v, l_v, δ) for l_v = 3, as size_v and m_v/m vary.]

18 Local uniform convergence
- Sample S: S_c = { x ∈ S | c(x) = 1 }, m_c = |S_c|
- Finite classes C and H:
  - e(h|c) = Pr[ h(x) ≠ f(x) | c(x) = 1 ]
  - obs(h|c): its observed counterpart on S_c
- Lemma: with probability 1-δ, e(h|c) and obs(h|c) are uniformly close (the displayed bound is lost; a standard form is sketched below)
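
The standard finite-class uniform-convergence form the lemma most likely takes (an assumption on my part, not the slide's verbatim bound) is: with probability at least 1-δ, for all c ∈ C and h ∈ H with m_c > 0,

$$\big|\,e(h \mid c) - obs(h \mid c)\,\big| \;\le\; \sqrt{\frac{\ln|C| + \ln|H| + \ln(2/\delta)}{2\,m_c}}$$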

19 Global Analysis
Notation:
- T: the original tree (depends on S)
- T*: the pruned tree
- T_opt: the optimal tree
- r_v = (l_v + size_v) log|H| + log(m_v/δ)
- a(m_v, size_v, l_v, δ) = O( sqrt{ r_v / m_v } ) ≤ 1

20 Sub-Tree Property
Lemma: with probability 1-δ, T* is a sub-tree of T_opt.
Proof sketch:
- Assume all the local lemmas hold.
- Each pruning step reduces the error.
- Suppose T* had a subtree outside T_opt.
- Adding that subtree to T_opt would improve it, a contradiction.

21 Comparing T* and T_opt
- Additional pruned nodes: V = { v_1, ..., v_t }
- Additional error:
  e(T*) - e(T_opt) = Σ_i ( e(v_i) - e(T*_{v_i}) ) · Pr[v_i]
- Claim: with high probability this additional error is small (the displayed bound is lost in the transcript)

22 Analysis
Lemma: with probability 1-δ:
  if Pr[v_i] > 12 ( l_opt log|H| + log(1/δ) ) / m = b
  then Pr[v_i] < 2 · obs(v_i)
Proof:
- Relative Chernoff bound
- Union bound over the |H|^l paths
Define V' = { v_i ∈ V | Pr[v_i] > b }.
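
For reference, the relative (multiplicative) Chernoff bound used here states that for a binomial variable with success probability p over m trials,

$$\Pr\big[\,obs < (1-\gamma)\,p\,\big] \;\le\; e^{-\gamma^2 p m / 2},$$

so taking γ = 1/2, any node with Pr[v_i] > b has obs(v_i) ≥ Pr[v_i]/2 except with probability e^{-pm/8}, which is small enough to survive the union bound over the |H|^{l_opt} paths.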

23 Analysis of the additional error
- The sum over V - V' is bounded by size_opt · b.

24 Analysis of the additional error
- Σ_v m_v < l_opt · size_opt
- Σ_v r_v < size_opt ( l_opt log|H| + log(m/δ) )
- Putting it all together: (final bound lost in the transcript; a plausible form is sketched below)
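
Assembling the pieces (the V - V' term bounded by size_opt · b, and the V' term controlled through the observed errors and Σ r_v), the final claim plausibly takes a form like

$$e(T^*) - e(T_{opt}) \;=\; O\!\left(\frac{size_{opt}\,\big(l_{opt}\log|H| + \log(m/\delta)\big)}{m}\right)$$

up to lower-order square-root terms; this is a reconstruction from the surrounding slides, not the slides' verbatim statement.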

