Chapter 18 From Data to Knowledge


1 Chapter 18 From Data to Knowledge
Decision Trees

2 Types of Learning
Classification: from examples labelled with a "class", create a decision procedure that will predict the class.
Regression: from examples labelled with a real value, create a decision procedure that will predict the value.
Unsupervised learning: from examples, generate groups that are interesting to the user.

3 Concerns
Representational Bias
Generalization Accuracy: is the learned concept correct? (gold standard)
Comprehensibility (e.g., medical diagnosis)
Efficiency of Learning
Efficiency of the Learned Procedure

4 Classification Examples
Medical records on patients with diseases.
Bank loan records on individuals.
DNA sequences corresponding to a "motif".
Images of handwritten digits.
See the UCI Machine Learning Repository.

5 Regression
Stock histories -> future stock price
Patient data -> internal heart pressure
House data -> house value
Representation is key.

6 Unsupervised Learning
Astronomical maps -> groups of stars that astronomers found useful.
Patient data -> new diseases (treatments depend on the correct disease class).
Gene data -> co-regulated genes and transcription factors.
Often exploratory.

7 Weather Data
Four features: outlook, windy (nominal); temperature, humidity (numeric). Class: play.
The learned tree:
outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

8 Dumb DT Algorithm
Build tree (discrete features only):
If all entries below the node are homogeneous, stop.
Else pick a feature at random, create a node for that feature, and form a subtree for each of the feature's values.
Recurse on each subtree.
Will this work?
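A minimal Python sketch of this random-split builder, assuming discrete features and examples stored as dicts; the name dumb_tree and the data layout are illustrative, not from the slides:

import random
from collections import Counter

def dumb_tree(rows, features, target):
    # Stop when the node is homogeneous (or no features remain); return the majority class.
    labels = [r[target] for r in rows]
    if len(set(labels)) <= 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise pick a feature at random and recurse on the subset for each of its values.
    f = random.choice(features)
    rest = [g for g in features if g != f]
    return {f: {v: dumb_tree([r for r in rows if r[f] == v], rest, target)
                for v in set(r[f] for r in rows)}}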

9 Properties of Dumb Algorithm
Complexity: the homogeneity test costs O(DataSize) and splitting costs O(DataSize); multiplied by the number of nodes in the tree, this bounds the total work.
Accuracy on the training set: perfect.
Accuracy on the test set: not so perfect, almost random.

10 Recall: Iris
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

11 Heuristic DT algorithm
Entropy: for a set S with mixed classes c1, c2, ..., ck,
Entropy(S) = - sum_i pi * lg(pi), where pi is the (estimated) probability of class ci.
To score a split, sum the weighted entropies of the subtrees, where each weight is the proportion of examples in that subtree.
This defines a quality measure on features.
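A short Python sketch of both quantities, assuming discrete features and examples stored as dicts; entropy and split_entropy are illustrative names, not from the slides:

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = - sum_i pi * lg(pi), with pi estimated from class counts.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(rows, feature, target):
    # Weighted sum of subtree entropies; each weight is the proportion of
    # examples falling into that subtree. Lower is better for this feature.
    n = len(rows)
    total = 0.0
    for v in set(r[feature] for r in rows):
        subset = [r[target] for r in rows if r[feature] == v]
        total += (len(subset) / n) * entropy(subset)
    return total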

12 Shannon Entropy
Entropy is the only function that:
Is 0 when only one class is present.
Is k when there are 2^k classes, equally represented.
Is "additive", i.e. E(X,Y) = E(X) + E(Y) if X and Y are independent.
Entropy is sometimes called uncertainty and sometimes information.
Uncertainty is defined on a random variable whose "draws" are from the set of classes.

13 Shannon Entropy Properties
The probability of guessing the state/class is 2^{-Entropy(S)}.
Entropy(S) = the average number of yes/no questions needed to reveal the state/class.
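As a worked example (four equally likely classes, an illustration not from the slides): Entropy(S) = -4 * (1/4) * lg(1/4) = 2 bits, so the chance of guessing the class outright is 2^{-2} = 1/4, and two yes/no questions (halve the set of classes, then halve again) identify the class.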

14 Majority Function
Suppose 2n boolean features.
The class is defined by n or more features being on.
How big is the tree? At least (2n choose n) leaves.
Prototype function: "at least k of n are true" is a common medical concept.
Concepts that are prototypical do not match the representational bias of DTs.
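To see the scale of that bound, a one-line check (Python 3.8+, purely illustrative): for n = 10, i.e. 20 boolean features, the tree needs at least C(20, 10) leaves.

import math
print(math.comb(20, 10))   # 184756 leaves, at minimum, for "at least 10 of 20 are true"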

15 DTs with Real-Valued Attributes
Idea: convert to the already-solved problem.
For each real-valued attribute f with (sorted) values v1, v2, ..., vn, add binary features:
f1: f < (v1+v2)/2
f2: f < (v2+v3)/2
etc.
Other approaches are possible, e.g. fi: f < vj for any vj, so no sorting is needed.
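A minimal sketch of the midpoint construction, assuming the attribute's numeric values arrive as a list; candidate_thresholds is an illustrative name:

def candidate_thresholds(values):
    # Midpoints between consecutive distinct sorted values of a numeric attribute;
    # each midpoint t yields one binary feature "value < t".
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

print(candidate_thresholds([64, 65, 68, 70, 75]))   # [64.5, 66.5, 69.0, 72.5]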

16 DTs -> Rules (j48.part)
For each leaf, we make a rule by collecting the tests on the path to the leaf.
Number of rules = number of leaves.
Simplification: test each condition of a rule and see if dropping it harms accuracy.
Can we go from rules to DTs? Not easily. Hint: no root.
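A small sketch of the leaf-to-rule conversion, using a nested-dict tree for the weather data of slide 7; tree_to_rules and the dict layout are illustrative, not the j48.part implementation:

def tree_to_rules(tree, path=()):
    # One rule (a conjunction of tests) per leaf, gathered along each root-to-leaf path.
    if not isinstance(tree, dict):            # leaf: the predicted class
        return [(list(path), tree)]
    rules = []
    for test, subtree in tree.items():        # each branch adds one test to the path
        rules.extend(tree_to_rules(subtree, path + (test,)))
    return rules

weather_tree = {
    "outlook = sunny": {"humidity <= 75": "yes", "humidity > 75": "no"},
    "outlook = overcast": "yes",
    "outlook = rainy": {"windy = TRUE": "no", "windy = FALSE": "yes"},
}
for tests, label in tree_to_rules(weather_tree):
    print(" AND ".join(tests), "->", label)   # five rules, one per leaf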

17 Summary
Comprehensible if the tree is not large.
Effective if a small number of features is sufficient (bias).
Handles multi-class problems naturally.
Can be extended for regression.
Easy to implement, with low complexity.

