Published by Uriel Hargus. Modified about 1 year ago.

1
DECISION TREES

2
Decision trees
One possible representation for hypotheses.

3
Choosing an attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
Which is a better choice? Patrons.

4
Using information theory
Implement Choose-Attribute in the DTL algorithm based on information content, measured by entropy.
Entropy is a measure of the uncertainty of a random variable.
More uncertainty leads to higher entropy.
More knowledge leads to lower entropy.
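
The entropy measure described above can be sketched in a few lines. This is a minimal illustration that assumes the usual Shannon entropy in bits; the function name `entropy` and the sample labels are hypothetical and do not come from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # Sum of -p * log2(p) over the observed classes.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# An even two-class split is the most uncertain case for two classes.
print(entropy(["yes", "no"]))
# A pure set carries no uncertainty.
print(entropy(["yes", "yes", "yes"]))
```

The more evenly the labels are mixed, the larger the value; a set with a single class gives zero.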

5
Entropy

6
Entropy Examples

7
Information Gain
Measures the reduction in entropy achieved by a split.
Choose the split that achieves the greatest reduction (maximizes information gain).
Disadvantage: tends to prefer splits that result in a large number of partitions, each small but pure.
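
A short sketch of information gain, assuming it is computed as the entropy of the labels minus the size-weighted entropy of the partitions. The names `information_gain`, `useful`, `useless`, and `record_id` are hypothetical, and the four-example data set is made up to show the disadvantage mentioned above.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = ["+", "+", "-", "-"]
useful = ["a", "a", "b", "b"]     # separates the classes perfectly
useless = ["a", "b", "a", "b"]    # tells us nothing about the class
record_id = [1, 2, 3, 4]          # one tiny, pure partition per example

print(information_gain(useful, labels))
print(information_gain(useless, labels))
print(information_gain(record_id, labels))
```

In this toy data the identifier-like attribute scores as well as the genuinely useful one, even though it cannot generalize to new records, which is the bias toward many small pure partitions.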

8
Information Gain Example
Consider the attributes Patrons and Type.
Patrons has the highest information gain of all attributes, so the DTL algorithm chooses it as the root.
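
The comparison can be checked numerically. The slide's figure did not survive, so the counts below are an assumption: they are the per-value (positive, negative) counts of the standard 12-example restaurant data set from Russell and Norvig, which these slides appear to follow. The helper names are hypothetical.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(partitions):
    """partitions: one (positives, negatives) pair per attribute value."""
    pos = sum(p for p, _ in partitions)
    neg = sum(n for _, n in partitions)
    total = pos + neg
    remainder = sum((p + n) / total * entropy_from_counts((p, n))
                    for p, n in partitions)
    return entropy_from_counts((pos, neg)) - remainder

# Assumed counts (positives, negatives) per attribute value.
patrons = [(0, 2), (4, 0), (2, 4)]                # None, Some, Full
restaurant_type = [(1, 1), (1, 1), (2, 2), (2, 2)]  # French, Italian, Thai, Burger

print(round(gain(patrons), 3))
print(round(gain(restaurant_type), 3))
```

Under these counts Patrons gains about 0.54 bits while Type gains nothing, because every Type value leaves an even mix of positive and negative examples.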

9
Learned Restaurant Tree
Decision tree learned from the 12 examples.
Substantially simpler than the full tree: Raining and Reservation were not necessary to classify all the data.

10
Stopping Criteria
Stop expanding a node when all the records belong to the same class.
Stop expanding a node when all the records have similar attribute values.
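
The two criteria above could be checked as follows. This is a sketch under one simplifying assumption: "similar attribute values" is read as "identical attribute values", which is the easiest case to test. The function name `should_stop` and the record layout (one tuple of attribute values per record) are hypothetical.

```python
def should_stop(records, labels):
    """Return True if a node should become a leaf under the two criteria."""
    # Criterion 1: all records belong to the same class.
    if len(set(labels)) <= 1:
        return True
    # Criterion 2 (assumed reading): all records share the same attribute
    # values, so no further split can separate them.
    first = records[0]
    return all(record == first for record in records)

print(should_stop([("Full", "Thai"), ("Some", "Thai")], ["no", "no"]))
print(should_stop([("Full", "Thai"), ("Full", "Thai")], ["yes", "no"]))
print(should_stop([("Full", "Thai"), ("Some", "Thai")], ["yes", "no"]))
```

The first two calls stop (same class, then identical attributes); the third keeps expanding because the records differ in both class and attributes.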

11
Overfitting
Overfitting results in decision trees that are more complex than necessary.
Training error does not provide a good estimate of how well the tree will perform on previously unseen records (a test set is needed).

12
How to Address Overfitting 1 …

13
How to Address Overfitting 2 …

14
How to Address Overfitting…
Is the early stopping rule strictly better than pruning (i.e., generating the full tree and then cutting it back)?

15
Remaining Challenges…
Continuous values: need to be split into discrete categories. Sort all values, then consider split points between two examples in sorted order that have different classifications.
Missing values: affect how an example is classified, information gain calculations, and the test set error rate. Pretend that the example has all possible values for the missing attribute, weighting each value by its frequency among all the examples in the current node.
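
The continuous-value procedure above can be sketched as a search for candidate thresholds. Placing each threshold at the midpoint between two neighbouring values is an assumption (the slide only says "between two examples"), and the function name and sample data are hypothetical.

```python
def candidate_thresholds(values, labels):
    """Midpoints between consecutive sorted values whose classes differ."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        # Only boundaries where the class changes are worth testing;
        # equal values cannot be separated by a threshold.
        if y1 != y2 and v1 != v2:
            thresholds.append((v1 + v2) / 2)
    return thresholds

wait_minutes = [60, 40, 90, 48, 80, 72]
will_wait = ["yes", "no", "no", "no", "yes", "yes"]
print(candidate_thresholds(wait_minutes, will_wait))
```

Each returned threshold would then be scored like any other binary attribute, for example by information gain, and the best one kept.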

16
Summary
Advantages of decision trees:
Inexpensive to construct.
Extremely fast at classifying unknown records.
Easy to interpret for small trees.
Accuracy is comparable to other classification techniques for many simple data sets.
Learning performance = prediction accuracy measured on the test set.

17
K-NEAREST NEIGHBORS

18
K-Nearest Neighbors
What value do we assign to the green sample?

19
K-Nearest Neighbors
[Figure panels: k = 1 and k = 3]
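
A minimal sketch of how the answer can change between k = 1 and k = 3, assuming Euclidean distance and a simple majority vote. The training points, the class names "red" and "blue", and the function name are hypothetical; the slide's actual figure is not available.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [
    ((1.6, 1.8), "red"),
    ((0.5, 0.5), "red"),
    ((2.6, 2.4), "blue"),
    ((2.5, 1.4), "blue"),
]
green_sample = (2.0, 2.0)

print(knn_classify(train, green_sample, k=1))
print(knn_classify(train, green_sample, k=3))
```

With k = 1 the single closest point decides the class; with k = 3 the two slightly more distant points outvote it.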

20
Decision Regions for 1-NN

22
K-Nearest Neighbors

23
Weighting the Distance to Remove Irrelevant Features
[Figure: scatter of "+" and "o" training points with an unlabeled query point "?"]
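
One way to read this slide is as a per-feature weight inside the distance function, where a weight of zero removes a feature entirely. The sketch below assumes a weighted Euclidean distance; the data, the weights, and the function names are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def nearest_label(train, query, weights):
    """Label of the training point closest to the query (1-NN)."""
    closest = min(train, key=lambda item: weighted_distance(item[0], query, weights))
    return closest[1]

# Assumed setup: the first feature is informative, the second is noise.
train = [((1.0, 9.0), "+"), ((5.0, 2.0), "o")]
query = (1.5, 2.5)

print(nearest_label(train, query, weights=(1.0, 1.0)))  # noise feature dominates
print(nearest_label(train, query, weights=(1.0, 0.0)))  # noise feature removed
```

With equal weights the noisy second feature pulls the query toward the "o" point; once that feature is weighted to zero, the query is matched on the informative feature alone.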

26
Nearest Neighbors Search
[Figure: points labelled q and p]

27
Quadtree

28
Quadtree Construction
Input: point set P
while some cell C contains more than 1 point do
    split cell C
end
[Figure: example point set a to l in the X-Y plane and its quadtree; split labels include X 25, Y 300; X 50, Y 200; X 75, Y 100]
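
The construction loop can be sketched as follows. This version inserts points one at a time and splits a cell as soon as it would hold two points, which maintains the same invariant as the slide's loop (no cell with more than one point) but is not the literal batch procedure. Splitting each cell at its midpoint, the class name `QuadTree`, and the sample points are assumptions.

```python
class QuadTree:
    """Quadtree cell covering [x0, x1) x [y0, y1), at most one point per leaf."""

    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.point = None      # the single point stored in a leaf, if any
        self.children = None   # four sub-cells once the cell has been split

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
        elif self.point is None:
            self.point = p
        elif self.point != p:  # ignore exact duplicates, which can never be separated
            old, self.point = self.point, None
            self._split()
            self._child_for(old).insert(old)
            self._child_for(p).insert(p)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [
            QuadTree(x0, y0, mx, my), QuadTree(mx, y0, x1, my),
            QuadTree(x0, my, mx, y1), QuadTree(mx, my, x1, y1),
        ]

    def _child_for(self, p):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        index = (1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)
        return self.children[index]

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

points = [(10, 10), (80, 20), (30, 70), (35, 75), (90, 90)]
tree = QuadTree(0, 0, 100, 100)
for p in points:
    tree.insert(p)
print(sorted(leaf.point for leaf in tree.leaves() if leaf.point is not None))
```

Two points that lie very close together force many nested splits before they separate, which is one source of the space inefficiency noted later in the deck.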

29
Nearest Neighbor Search

30
Quadtree - Query
[Figure: X-Y plane split at (X1, Y1); one quadrant is labelled P≥X1, P≥Y1; the remaining labels are cut off]

31
Quadtree - Query
In many cases this works.
[Figure: X-Y plane split at (X1, Y1) with query point P]

32
Quadtree - Pitfall 1
In some cases it doesn’t: there could be points in adjacent buckets that are closer.
[Figure: X-Y plane split at (X1, Y1) with query point P]
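
A small numeric illustration of this pitfall. It uses a fixed grid of square buckets as a stand-in for quadtree leaves, which is a simplification; the bucket size, the points, and the helper name `cell_of` are hypothetical.

```python
import math

def cell_of(p, size):
    """Index of the square bucket of the given side length that contains p."""
    return (int(p[0] // size), int(p[1] // size))

size = 50.0
points = [(10.0, 10.0), (51.0, 49.0)]
query = (49.0, 49.0)

# Looking only inside the query's own bucket...
same_bucket = [p for p in points if cell_of(p, size) == cell_of(query, size)]
bucket_answer = min(same_bucket, key=lambda p: math.dist(p, query))

# ...misses a much closer point just across the bucket boundary.
true_answer = min(points, key=lambda p: math.dist(p, query))

print(bucket_answer, true_answer)
```

A correct query therefore has to inspect neighbouring buckets whenever the best distance found so far reaches past the boundary of the current one.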

33
Quadtree - Pitfall 2
Query time could be exponential in the number of dimensions.

34
Quadtree
Simple data structure.
Versatile, easy to implement.
Often space and time inefficient.

35
kd-trees (k-dimensional trees)
Main ideas:
One-dimensional splits.
Instead of splitting in the middle, choose the split “carefully” (many variations).
Nearest neighbor queries work the same way as for quadtrees.

36
2-dimensional kd-trees
Algorithm:
Choose the x or y coordinate (alternate between them).
Choose the median of that coordinate; this defines a horizontal or vertical line.
Recurse on both sides until there is only one point left, which is stored as a leaf.
We get a binary tree with size O(n), construction time O(n log n), and depth O(log n).
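
The algorithm above can be sketched as a short recursive function. Note one deliberate simplification: this version re-sorts the points at every level, which costs O(n log² n) rather than the O(n log n) quoted on the slide (that bound needs presorting or linear-time median selection). The dictionary layout and the names `build` and `depth` are hypothetical.

```python
def build(points, depth=0):
    """Build a 2-d kd-tree; every point ends up in its own leaf."""
    if len(points) == 1:
        return {"leaf": points[0]}
    axis = depth % 2                          # alternate x (0) and y (1)
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                   # median position
    return {
        "axis": axis,
        "split": ordered[mid][axis],          # coordinate of the splitting line
        "left": build(ordered[:mid], depth + 1),    # values <= split
        "right": build(ordered[mid:], depth + 1),   # values >= split
    }

def depth(node):
    """Number of edges on the longest root-to-leaf path."""
    if "leaf" in node:
        return 0
    return 1 + max(depth(node["left"]), depth(node["right"]))

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(tree["axis"], tree["split"], depth(tree))
```

Because each split sends about half of the points to each side, the depth grows logarithmically with the number of points.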

37
Nearest Neighbor with KD Trees
We traverse the tree looking for the nearest neighbor of the query point.

38
Nearest Neighbor with KD Trees
Examine nearby points first: explore the branch of the tree that is closest to the query point first.

40
Nearest Neighbor with KD Trees
When we reach a leaf node: compute the distance to each point in the node.

42
Nearest Neighbor with KD Trees
Then we can backtrack and try the other branch at each node visited.

43
Nearest Neighbor with KD Trees
Each time a new closest node is found, we can update the distance bounds.

44
Nearest Neighbor with KD Trees
Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
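
The search steps from the preceding slides (descend into the closer branch first, measure distances at leaves, backtrack, update the bound, prune) can be sketched as below. One substitution to note: instead of the bounding box of the data below each node, this sketch prunes with the simpler test of distance to the splitting line, which is a weaker but still correct bound. The tree layout and the names `build` and `nearest` are hypothetical.

```python
import math

def build(points, depth=0):
    """2-d kd-tree with median splits; every point is stored in a leaf."""
    if len(points) == 1:
        return {"leaf": points[0]}
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {"axis": axis, "split": ordered[mid][axis],
            "left": build(ordered[:mid], depth + 1),
            "right": build(ordered[mid:], depth + 1)}

def nearest(node, query, best=None):
    """Return (distance, point) for the stored point closest to the query."""
    if "leaf" in node:
        d = math.dist(node["leaf"], query)
        # Update the distance bound when a closer point is found.
        return (d, node["leaf"]) if best is None or d < best[0] else best
    axis, split = node["axis"], node["split"]
    if query[axis] < split:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)      # closest branch first
    # Backtrack into the other branch only if the splitting line is within
    # the current best distance; otherwise that subtree is pruned.
    if abs(query[axis] - split) <= best[0]:
        best = nearest(far, query, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(nearest(tree, (9, 2)))   # nearest point is (8, 1)
```

For the query (9, 2), the whole left half of the tree is skipped: the root's splitting line at x = 7 is 2 units away, farther than the best distance already found.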

47
Summary of K-Nearest Neighbor
