
1 Radosław Wesołowski Tomasz Pękalski, Michal Borkowicz, Maciej Kopaczyński 12-03-2008

2 What is it anyway? Decision tree T – a tree with a root (in the graph-theory sense), in which we assign the following meanings to its elements: - inner nodes represent attributes, - edges represent values of the attribute, - leaves represent classification decisions. Using a decision tree we can visualize a program consisting only of 'if-then' instructions, as in the sketch below.
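A minimal sketch (not from the slides) of such a tree written as nested 'if-then' instructions; the attributes, values, and decisions are illustrative assumptions.

```python
def classify(example):
    # inner node: test the 'temperature' attribute
    if example["temperature"] == "hot":
        return "no"                      # leaf: classification decision
    elif example["temperature"] == "mild":
        # inner node on this branch: test 'humidity'
        return "yes" if example["humidity"] == "normal" else "no"
    else:                                # edge for the remaining value: 'cold'
        return "yes"

print(classify({"temperature": "mild", "humidity": "normal"}))  # -> yes
```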

3

4

5 Testing functions Let us consider an attribute A (e.g. temperature). Let V_A denote the set of all possible values of A (0 K up to infinity). Let R_t denote the set of all possible test results (hot, mild, cold). By a testing function we mean a map t: V_A → R_t. We distinguish two main types of testing functions, depending on the set V_A: discrete and continuous. A sketch of such a function follows.
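A hedged sketch of a testing function t: V_A → R_t for the continuous attribute 'temperature'; the thresholds (in Kelvin) are assumptions, not values from the slides.

```python
def temperature_test(value_kelvin: float) -> str:
    if value_kelvin >= 298.0:
        return "hot"
    elif value_kelvin >= 283.0:
        return "mild"
    return "cold"

# For a discrete attribute the test can simply return the value itself
# (or group values), e.g. t(v) = v for v in {"sunny", "rainy"}.
print(temperature_test(300.0))  # -> hot
```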

6 Quality of a decision tree (Occam's razor): - we prefer small, simple trees, - we want to gain maximum accuracy of classification (training set, test set). For example: Q(T) = α·size(T) + β·accuracy(T)
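A minimal sketch of the quality criterion Q(T) above. Here size(T) is taken to be the number of nodes and accuracy(T) the fraction of correctly classified examples; α is chosen negative so that smaller trees score higher. The concrete weights are assumptions.

```python
def quality(tree_size: int, accuracy: float,
            alpha: float = -0.01, beta: float = 1.0) -> float:
    # penalize size, reward accuracy
    return alpha * tree_size + beta * accuracy

print(quality(tree_size=15, accuracy=0.92))  # -> 0.77
```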

7 Optimal tree – we are given: - a training set S, - a set of testing functions TEST, - a quality criterion Q. Target: find T optimising Q(T). Fact: usually this is an NP-hard problem. Conclusion: we have to use heuristics.

8 Building a decision tree: - top-down method: a. at the beginning the root contains all training examples, b. we divide them recursively, choosing one attribute at a time; - bottom-up method: we remove subtrees or edges to improve accuracy when classifying new cases. A sketch of the top-down method follows.
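A minimal sketch of the top-down method: all training examples start at the root and are split recursively, one attribute at a time. The attribute-scoring function (e.g. the information gain defined on later slides) is passed in; the data layout (a list of dicts with a "class" key) is an assumption.

```python
from collections import Counter

def build_tree(examples, attributes, score):
    classes = {e["class"] for e in examples}
    if len(classes) == 1 or not attributes:
        # leaf: the node is pure, or there is nothing left to test;
        # answer with the majority class
        return Counter(e["class"] for e in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: score(examples, a))  # inner node: attribute
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {e[best] for e in examples}:                 # edge: attribute value
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = build_tree(subset, remaining, score)
    return tree
```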

9

10 Entropy – the average number of bits needed to represent the decision d for an object chosen at random from a given set S. Why? Because the optimal binary encoding assigns -log2(p) bits to a decision whose probability is p. This gives the formula: entropy(p1, ..., pn) = -p1*log2(p1) - ... - pn*log2(pn)
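A direct transcription of the formula above, assuming the input is the list of decision probabilities p1, ..., pn (summing to 1).

```python
import math

def entropy(probabilities):
    # terms with p = 0 contribute nothing (by convention 0*log2(0) = 0)
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: two equally likely decisions
print(entropy([1.0]))        # 0 bits (printed as -0.0): no uncertainty
```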

11

12

13 Information gain: gain(·) = info before the split – info after the split (each subset's info weighted by its size)
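A sketch of this quantity for a split on a single discrete attribute, reusing the entropy from slide 10; the data layout (a list of (attribute_value, class) pairs) is an assumption.

```python
from collections import Counter
import math

def entropy_of(labels):
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples):
    labels = [cls for _, cls in examples]
    info_before = entropy_of(labels)
    info_after = 0.0
    for value in {v for v, _ in examples}:
        subset = [cls for v, cls in examples if v == value]
        info_after += len(subset) / len(examples) * entropy_of(subset)
    return info_before - info_after

data = [("hot", "no"), ("hot", "no"), ("mild", "yes"), ("cold", "yes")]
print(information_gain(data))  # 1.0: this split separates the classes perfectly
```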

14

15 Overtraining: We say that a model H overfits if there is a model H' such that: - training_error(H) < training_error(H'), - testing_error(H) > testing_error(H'). Avoiding overtraining: - adequate stopping criteria, - post-pruning, - pre-pruning.
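A minimal sketch of the definition above: H overfits if some alternative model H' has a higher training error but a lower testing error. The error values below are made-up numbers for illustration.

```python
def overfits(train_err_h, test_err_h, train_err_h2, test_err_h2):
    return train_err_h < train_err_h2 and test_err_h > test_err_h2

# H fits the training set better than H' yet generalises worse -> overfitting
print(overfits(train_err_h=0.01, test_err_h=0.25,
               train_err_h2=0.05, test_err_h2=0.12))  # -> True
```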

16 Some decision tree algorithms: - 1R, - ID3 (Iterative Dichotomiser 3), - C4.5 (ID3 + discretization + pruning), - CART (Classification and Regression Trees), - CHAID (CHi-squared Automatic Interaction Detection).

