Oliver Schulte Machine Learning 726 Decision Tree Classifiers.




1 Oliver Schulte Machine Learning 726 Decision Tree Classifiers

2 Overview

  Child Node \ Parent Node   Discrete parent                         Continuous parent
  Discrete child             Maximum Likelihood Decision Trees       logit distribution (logistic regression)
  Continuous child           conditional Gaussian (not discussed)    linear Gaussian (linear regression)

3 Decision Tree Popular type of classifier; easy to visualize, especially for discrete values but also for continuous ones. Learning is based on information theory.

4 Decision Tree Example

5 Exercise Find a decision tree to represent: A OR B; A AND B; A XOR B; (A AND B) OR (C AND NOT D AND E).
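As a quick illustration of what such a tree looks like (my own sketch, not the posted solution), A XOR B can be represented by testing A at the root and B at each branch:

```python
def xor_tree(a: bool, b: bool) -> bool:
    """One possible decision tree for A XOR B: the root tests A, each branch tests B.
    XOR needs both attributes on every path from root to leaf."""
    if a:             # root node: split on A
        return not b  # A = true branch: leaf value depends on B
    else:
        return b      # A = false branch: leaf value depends on B
```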

6 Decision Tree Learning Basic loop:
1. A := the "best" decision attribute for the next node.
2. For each value of A, create a new descendant of the node.
3. Assign the training examples to the leaf nodes.
4. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes.
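A minimal Python sketch of this loop, assuming examples are given as (attribute-dict, label) pairs and a best_attribute scorer (e.g., the information gain defined on the later slides) is supplied; all names here are illustrative, not from the slides:

```python
from collections import Counter

def learn_tree(examples, attributes, best_attribute):
    """Sketch of the basic decision-tree loop: pick the 'best' attribute,
    branch on each of its values, and recurse until a node is pure."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                 # training examples perfectly classified
        return labels[0]                      # leaf node
    if not attributes:                        # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    A = best_attribute(examples, attributes)  # step 1: choose the split attribute
    tree = {A: {}}
    for v in {x[A] for x, _ in examples}:     # step 2: one descendant per value of A
        subset = [(x, y) for x, y in examples if x[A] == v]   # step 3: assign examples
        tree[A][v] = learn_tree(subset,
                                [a for a in attributes if a != A],
                                best_attribute)               # step 4: iterate on new leaves
    return tree
```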

7 Entropy

8 Uncertainty and Probability The more "balanced" a probability distribution is, the less information it conveys (e.g., about the class label). How do we quantify this? Information theory: entropy measures balance. If S is a sample, p+ the proportion of positive examples, and p- the proportion of negative examples, then
Entropy(S) = -p+ log2(p+) - p- log2(p-)
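A small numeric check of the two-class entropy formula (a sketch; the specific proportions are just examples):

```python
import math

def binary_entropy(p_pos: float) -> float:
    """Entropy(S) = -p+ log2(p+) - p- log2(p-), with 0 * log(0) taken as 0."""
    p_neg = 1.0 - p_pos
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(binary_entropy(0.5))   # 1.0 bit: maximally balanced, least informative about the label
print(binary_entropy(0.9))   # about 0.469 bits: skewed, more informative
print(binary_entropy(1.0))   # 0.0 bits: pure sample
```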

9 Entropy: General Definition For a discrete random variable X with distribution p, H(X) = -sum_x p(x) log2 p(x). This is an important quantity in coding theory, statistical physics, and machine learning.

10 Intuition

11 Entropy

12 Coding Theory Coding theory: X is discrete with 8 possible states ("messages"); how many bits are needed to transmit the state of X? Shannon's information theorem: an optimal code assigns -log2 p(x) bits to each "message" X = x. If all states are equally likely, log2(8) = 3 bits are needed.
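A short sketch of this calculation, assuming the 8 equally likely states described above (the state names are made up):

```python
import math

# 8 equally likely states: each has p(x) = 1/8, so the optimal code
# assigns -log2(1/8) = 3 bits to each message.
uniform = {f"s{i}": 1 / 8 for i in range(8)}
lengths = {x: -math.log2(p) for x, p in uniform.items()}
print(lengths["s0"])                                     # 3.0 bits per message
print(sum(p * lengths[x] for x, p in uniform.items()))   # expected code length: 3.0 bits
```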

13 Another Coding Example

14 Zipf's Law General principle: frequent messages get shorter codes, e.g., abbreviations. This is the link between information and compression.

15 The Kullback-Leibler Divergence Measures an information-theoretic "distance" between two distributions p and q:
KL(p || q) = sum_x p(x) [-log2 q(x)] - sum_x p(x) [-log2 p(x)] = sum_x p(x) log2(p(x) / q(x))
The first term is the expected code length of x when coding with the wrong distribution q; the second is the expected code length under the true distribution p.
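A minimal sketch of this quantity, with made-up distributions p and q; it assumes q(x) > 0 wherever p(x) > 0:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) * log2(p(x) / q(x)): the expected extra code length
    paid for coding with the wrong distribution q instead of the true p."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]        # true distribution
q = [1/3, 1/3, 1/3]          # wrong (uniform) model
print(kl_divergence(p, q))   # about 0.085 bits of wasted code length
print(kl_divergence(p, p))   # 0.0: no penalty when the model is exact
```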

16 Information Gain

17 Splitting Criterion Splitting on an attribute changes the entropy. Intuitively, we want to split on the attribute that gives the greatest reduction in entropy, averaged over its attribute values. Gain(S, A) = expected reduction in entropy due to splitting on A:
Gain(S, A) = Entropy(S) - sum over values v of A of (|S_v| / |S|) * Entropy(S_v)
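A minimal sketch of Gain(S, A) on a tiny made-up sample; the attribute name "Wind" and the data are illustrative only:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a labelled sample, computed from its class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, A):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v):
    the expected reduction in entropy from splitting S on attribute A."""
    labels = [y for _, y in examples]
    remainder = 0.0
    for v in {x[A] for x, _ in examples}:
        subset = [y for x, y in examples if x[A] == v]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return entropy(labels) - remainder

# Tiny made-up sample: splitting on 'Wind' separates some of the labels.
S = [({"Wind": "weak"}, "yes"), ({"Wind": "weak"}, "yes"),
     ({"Wind": "strong"}, "no"), ({"Wind": "strong"}, "yes")]
print(gain(S, "Wind"))   # about 0.31 bits
```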

18 Example

19 PlayTennis

