DECISION TREES
Decision trees One possible representation for hypotheses
Choosing an attribute Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative". Which is a better choice? Patrons.
Using information theory Implement Choose-Attribute in the DTL algorithm based on information content, measured by entropy. Entropy is a measure of the uncertainty of a random variable: more uncertainty leads to higher entropy; more knowledge leads to lower entropy.
Entropy
Entropy Examples
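For a class distribution p_1, ..., p_n, the quantity on these slides is H = -sum_i p_i log2 p_i. A minimal sketch in Python (the function name is my own):

```python
import math

def entropy(probs):
    """H = -sum_i p_i * log2(p_i); terms with p = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0  (fair coin: maximum uncertainty for 2 outcomes)
print(entropy([1.0]))        # 0.0  (certain outcome: no uncertainty)
print(entropy([0.9, 0.1]))   # ~0.469 (more knowledge, lower entropy)
```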
Information Gain Measures the reduction in entropy achieved by a split. Choose the split that achieves the greatest reduction (maximizes information gain). Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.
Information Gain Example Consider the attributes Patrons and Type: Patrons has the highest information gain of all attributes and so is chosen by the DTL algorithm as the root.
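Information gain is the parent's entropy minus the size-weighted entropy of the child groups. A sketch reproducing the comparison (I assume the standard 12-example restaurant breakdown: Patrons = None is all negative, Some is all positive, Full is 2 positive / 4 negative, while Type splits every value 50/50; helper names are illustrative):

```python
from collections import Counter
import math

def entropy_of(labels):
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy_of(g) for g in groups)
    return entropy_of(labels) - remainder

parent = ['+'] * 6 + ['-'] * 6               # 6 positive, 6 negative examples
patrons = [['-', '-'],                       # None: all negative
           ['+', '+', '+', '+'],             # Some: all positive
           ['+', '+', '-', '-', '-', '-']]   # Full: mixed
type_attr = [['+', '-'], ['+', '-'],         # every Type value is 50/50
             ['+', '+', '-', '-'], ['+', '+', '-', '-']]

print(round(information_gain(parent, patrons), 3))        # 0.541 bits
print(abs(information_gain(parent, type_attr)) < 1e-9)    # True: Type gains nothing
```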
Learned Restaurant Tree Decision tree learned from the 12 examples: substantially simpler than the full tree. Raining and Reservation were not needed to classify all the data.
Stopping Criteria Stop expanding a node when all the records belong to the same class, or when all the records have similar attribute values.
Overfitting Overfitting results in decision trees that are more complex than necessary. Training error does not provide a good estimate of how well the tree will perform on previously unseen records (a test set is needed).
How to Address Overfitting 1 …
How to Address Overfitting 2 …
How to Address Overfitting… Is the early stopping rule strictly better than pruning (i.e., generating the full tree and then cutting it)?
Remaining Challenges… Continuous values: need to be split into discrete categories. Sort all values, then consider split points between two examples in sorted order that have different classifications. Missing values: affect how an example is classified, information-gain calculations, and the test-set error rate. Pretend that the example has all possible values for the missing attribute, weighted by each value's frequency among the examples in the current node.
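The split-point rule for continuous values can be sketched directly (the data and function name are hypothetical; using the midpoint between the two examples is one common convention):

```python
def candidate_splits(values, labels):
    """Sort examples by attribute value; propose a threshold midway between
    consecutive examples whose class labels differ."""
    pairs = sorted(zip(values, labels))
    splits = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:
            splits.append((v1 + v2) / 2)
    return splits

# Hypothetical continuous attribute with +/- class labels:
print(candidate_splits([48, 60, 70, 80, 90], ['-', '-', '+', '+', '-']))
# [65.0, 85.0] -- only boundaries where the class changes are considered
```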
Summary Advantages of decision trees: inexpensive to construct; extremely fast at classifying unknown records; easy to interpret for small-sized trees; accuracy comparable to other classification techniques on many simple data sets. Learning performance = prediction accuracy measured on a test set.
K-NEAREST NEIGHBORS
K-Nearest Neighbors What value do we assign to the green sample?
K-Nearest Neighbors k = 1 vs. k = 3
Decision Regions for 1-NN
K-Nearest Neighbors
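A minimal k-NN classifier over 2-D points, assuming Euclidean distance and majority vote (the toy training set is hypothetical):

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), 'red'), ((1, 2), 'red'), ((2, 1), 'red'),
         ((5, 5), 'blue'), ((6, 5), 'blue')]
print(knn_classify(train, (1.5, 1.5), k=1))  # red
print(knn_classify(train, (5.5, 5.0), k=3))  # blue
```

Sorting the whole training set makes each query O(n log n); the quadtree and kd-tree structures below exist precisely to avoid this linear scan.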
Weighting the Distance to Remove Irrelevant Features
Nearest Neighbors Search
Quadtree
Quadtree Construction Input: point set P. while some cell C contains more than one point do: split cell C.
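The construction loop above can be sketched recursively (the cell bounds and nested-dict representation are my own choices; points are assumed distinct, since duplicates would recurse forever):

```python
def build_quadtree(points, x0, y0, x1, y1):
    # Split the cell (x0, y0)-(x1, y1) into four quadrants until each
    # cell holds at most one point.
    if len(points) <= 1:
        return {'points': points}
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quads = {'nw': [], 'ne': [], 'sw': [], 'se': []}
    for (x, y) in points:
        quads[('n' if y >= ym else 's') + ('e' if x >= xm else 'w')].append((x, y))
    return {
        'split': (xm, ym),
        'nw': build_quadtree(quads['nw'], x0, ym, xm, y1),
        'ne': build_quadtree(quads['ne'], xm, ym, x1, y1),
        'sw': build_quadtree(quads['sw'], x0, y0, xm, ym),
        'se': build_quadtree(quads['se'], xm, y0, x1, ym),
    }

tree = build_quadtree([(10, 10), (90, 90), (60, 60)], 0, 0, 100, 100)
print(tree['sw'])   # {'points': [(10, 10)]} -- that cell needed no further split
```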
Nearest Neighbor Search
Quadtree – Query Descend the tree by comparing the query point against each node's split coordinates (X1, Y1), following the quadrant (P ≥ X1 or P < X1, P ≥ Y1 or P < Y1) that contains it. In many cases this works.
Quadtree – Pitfall 1 In some cases it doesn't: there could be points in adjacent cells that are closer than the points in the query's own cell.
Quadtree – Pitfall 2 Could result in query time exponential in the number of dimensions.
Quadtree Simple data structure. Versatile, easy to implement. Often space and time inefficient.
kd-trees (k-dimensional trees) Main ideas: one-dimensional splits; instead of splitting in the middle, choose the split "carefully" (many variations); nearest neighbor queries are the same as for quadtrees.
2-dimensional kd-trees Algorithm: choose the x or y coordinate (alternating between them); choose the median of that coordinate, which defines a horizontal or vertical split line; recurse on both sides until only one point is left, which is stored as a leaf. We get a binary tree of size O(n), construction time O(n log n), and depth O(log n).
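A sketch of this construction (re-sorting at every level, as done here, costs O(n log² n); the O(n log n) bound on the slide requires linear-time median selection or pre-sorting):

```python
def build_kdtree(points, depth=0):
    # Alternate between x (axis 0) and y (axis 1); split at the median
    # point, recursing until a single point remains in a leaf.
    if len(points) == 1:
        return {'leaf': points[0]}
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {'axis': axis,
            'split': points[mid][axis],     # coordinate of the split line
            'left': build_kdtree(points[:mid], depth + 1),
            'right': build_kdtree(points[mid:], depth + 1)}

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree['split'])   # 7 -- the root splits on x at the median x-coordinate
```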
Nearest Neighbor with KD Trees We traverse the tree looking for the nearest neighbor of the query point.
Nearest Neighbor with KD Trees Examine nearby points first: explore the branch of the tree that is closest to the query point first.
Nearest Neighbor with KD Trees When we reach a leaf node: compute the distance to each point in the node.
Nearest Neighbor with KD Trees Then we can backtrack and try the other branch at each node visited.
Nearest Neighbor with KD Trees Each time a new closest point is found, we can update the distance bound.
Nearest Neighbor with KD Trees Using the distance bound and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
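The steps above (descend into the near branch first, score the leaf, backtrack, prune with the distance bound) can be sketched end to end; the tree builder is a minimal median-split version and all names are illustrative:

```python
import math

def build(points, depth=0):
    # Minimal kd-tree: alternate x/y, split at the median, one point per leaf.
    if len(points) == 1:
        return {'leaf': points[0]}
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {'axis': axis, 'split': points[mid][axis],
            'left': build(points[:mid], depth + 1),
            'right': build(points[mid:], depth + 1)}

def nearest(node, query, best=None):
    """Explore the branch containing the query first; visit the far branch
    only if the split line is closer than the best distance found so far."""
    if 'leaf' in node:
        p = node['leaf']
        if best is None or math.dist(p, query) < math.dist(best, query):
            return p
        return best
    axis, split = node['axis'], node['split']
    if query[axis] < split:
        near, far = node['left'], node['right']
    else:
        near, far = node['right'], node['left']
    best = nearest(near, query, best)
    # Prune: every point in `far` is at least |query[axis] - split| away,
    # so skip it unless the split line lies within the current bound.
    if best is None or abs(query[axis] - split) < math.dist(best, query):
        best = nearest(far, query, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (8, 3)))   # (7, 2)
```

The pruning test is conservative, so the search is exact; in favorable cases it visits only a few branches, but in the worst case it can still examine most of the tree.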
Summary of K-Nearest Neighbors