SEG4630 2009-2010 Tutorial 1 – Classification: Decision Tree, Naïve Bayes & k-NN
CHANG Lijun
Classification: Definition
- Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class.
- Find a model that predicts the class attribute as a function of the values of the other attributes (e.g., decision tree, Naïve Bayes, k-NN).
- Goal: previously unseen records should be assigned a class as accurately as possible.
Decision Tree
- Goal: construct a tree so that instances belonging to different classes are separated.
- Basic algorithm (a greedy algorithm):
  - The tree is constructed in a top-down, recursive manner.
  - At the start, all training examples are at the root.
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
  - Examples are partitioned recursively based on the selected attributes.
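A minimal Python sketch of this greedy recursion (an illustration, not the tutorial's own code; records are assumed to be dicts keyed by attribute name, and `score` stands in for whichever attribute-selection measure is used, e.g., the information gain defined below):

from collections import Counter

def build_tree(records, attributes, target, score):
    """Greedy top-down induction. records: list of dicts; attributes: attribute
    names still available for testing; target: name of the class attribute;
    score(records, attr, target): an attribute-selection measure."""
    labels = [r[target] for r in records]
    # Stop when the node is pure or no test attributes remain: return the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the best-scoring attribute, split on it, recurse on each partition.
    best = max(attributes, key=lambda a: score(records, a, target))
    branches = {}
    for value in set(r[best] for r in records):
        subset = [r for r in records if r[best] == value]
        remaining = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, remaining, target, score)
    return (best, branches)

Plugging in an information-gain score and the Play Tennis table used below should reproduce the tree derived later in this tutorial.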
Attribute Selection Measure 1: Information Gain
- Let p_i be the probability that a tuple in D belongs to class C_i, estimated by |C_i,D| / |D|.
- Expected information (entropy) needed to classify a tuple in D:
  Info(D) = -Σ_i p_i log2(p_i)
- Information needed (after using A to split D into v partitions) to classify D:
  Info_A(D) = Σ_j (|D_j| / |D|) × Info(D_j)
- Information gained by branching on attribute A:
  Gain(A) = Info(D) - Info_A(D)
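These three formulas, written as a small Python sketch (helper names such as info and info_after_split are invented here; class distributions are passed as plain lists of counts):

from math import log2

def info(counts):
    # Info(D): entropy of a class distribution given as a list of class counts.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_after_split(partitions):
    # Info_A(D): weighted entropy after splitting D; each partition is a list of class counts.
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * info(p) for p in partitions)

def gain(counts, partitions):
    # Gain(A) = Info(D) - Info_A(D)
    return info(counts) - info_after_split(partitions)

print(round(info([9, 5]), 2))   # 0.94 -- the 14-record example used below (9 Yes, 5 No)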
Attribute Selection Measure 2: Gain Ratio
- The information gain measure is biased towards attributes with a large number of values.
- C4.5 (a successor of ID3) uses the gain ratio to overcome this problem (a normalization of information gain):
  SplitInfo_A(D) = -Σ_j (|D_j| / |D|) × log2(|D_j| / |D|)
  GainRatio(A) = Gain(A) / SplitInfo(A)
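A corresponding sketch for the gain ratio (again with invented helper names; the example numbers assume the Gain(Outlook) ≈ 0.25 computed in the tree-induction example below):

from math import log2

def split_info(sizes):
    # SplitInfo_A(D) = -sum_j (|D_j|/|D|) * log2(|D_j|/|D|), over the partition sizes.
    total = sum(sizes)
    return -sum((s / total) * log2(s / total) for s in sizes if s > 0)

def gain_ratio(gain_a, sizes):
    # GainRatio(A) = Gain(A) / SplitInfo(A)
    return gain_a / split_info(sizes)

# Outlook splits the 14 records into partitions of size 5, 4 and 5.
print(round(split_info([5, 4, 5]), 2), round(gain_ratio(0.25, [5, 4, 5]), 2))   # 1.58 0.16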
Attribute Selection Measure 3: Gini Index
- If a data set D contains examples from n classes, the Gini index gini(D) is defined as
  gini(D) = 1 - Σ_j p_j^2
  where p_j is the relative frequency of class j in D.
- If D is split on A into two subsets D_1 and D_2, the Gini index of the split is defined as
  gini_A(D) = (|D_1| / |D|) × gini(D_1) + (|D_2| / |D|) × gini(D_2)
- Reduction in impurity:
  Δgini(A) = gini(D) - gini_A(D)
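And a sketch for the Gini index and the impurity reduction of a binary split (helper names are assumptions; the example uses the Humidity split counts that appear in the worked example below):

def gini(counts):
    # gini(D) = 1 - sum_j p_j^2, with the class distribution given as counts.
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(part1, part2):
    # gini_A(D) = |D1|/|D| * gini(D1) + |D2|/|D| * gini(D2) for a binary split.
    n1, n2 = sum(part1), sum(part2)
    return (n1 / (n1 + n2)) * gini(part1) + (n2 / (n1 + n2)) * gini(part2)

# Example: splitting the 14 records on Humidity gives High [3+, 4-] and Normal [6+, 1-].
reduction = gini([9, 5]) - gini_split([3, 4], [6, 1])   # reduction in impurity, Δgini
print(round(gini([9, 5]), 2), round(reduction, 2))       # 0.46 0.09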
Example
Outlook    Temperature  Humidity  Wind    Play Tennis
Sunny      >25          High      Weak    No
Sunny      >25          High      Strong  No
Overcast   >25          High      Weak    Yes
Rain       15-25        High      Weak    Yes
Rain       <15          Normal    Weak    Yes
Rain       <15          Normal    Strong  No
Overcast   <15          Normal    Strong  Yes
Sunny      15-25        High      Weak    No
Sunny      <15          Normal    Weak    Yes
Rain       15-25        Normal    Weak    Yes
Sunny      15-25        Normal    Strong  Yes
Overcast   15-25        High      Strong  Yes
Overcast   >25          Normal    Weak    Yes
Rain       15-25        High      Strong  No
Tree Induction Example
S [9+, 5-] split on Outlook: Sunny [2+, 3-], Overcast [4+, 0-], Rain [3+, 2-]
S [9+, 5-] split on Temperature: <15 [3+, 1-], 15-25 [4+, 2-], >25 [2+, 2-]

Info(S) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94

Gain(Outlook) = 0.94
  - 5/14 [-2/5 log2(2/5) - 3/5 log2(3/5)]
  - 4/14 [-4/4 log2(4/4) - 0/4 log2(0/4)]
  - 5/14 [-3/5 log2(3/5) - 2/5 log2(2/5)]
  = 0.94 - 0.69 = 0.25

Gain(Temperature) = 0.94
  - 4/14 [-3/4 log2(3/4) - 1/4 log2(1/4)]
  - 6/14 [-4/6 log2(4/6) - 2/6 log2(2/6)]
  - 4/14 [-2/4 log2(2/4) - 2/4 log2(2/4)]
  = 0.94 - 0.91 = 0.03
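The gains above can be checked with a short script; a sketch that encodes the Play Tennis table directly (attribute indices and function names are my own):

from math import log2

# (Outlook, Temperature, Humidity, Wind, PlayTennis) -- the 14 records from the table above.
data = [
    ("Sunny", ">25", "High", "Weak", "No"),        ("Sunny", ">25", "High", "Strong", "No"),
    ("Overcast", ">25", "High", "Weak", "Yes"),    ("Rain", "15-25", "High", "Weak", "Yes"),
    ("Rain", "<15", "Normal", "Weak", "Yes"),      ("Rain", "<15", "Normal", "Strong", "No"),
    ("Overcast", "<15", "Normal", "Strong", "Yes"),("Sunny", "15-25", "High", "Weak", "No"),
    ("Sunny", "<15", "Normal", "Weak", "Yes"),     ("Rain", "15-25", "Normal", "Weak", "Yes"),
    ("Sunny", "15-25", "Normal", "Strong", "Yes"), ("Overcast", "15-25", "High", "Strong", "Yes"),
    ("Overcast", ">25", "Normal", "Weak", "Yes"),  ("Rain", "15-25", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    labels = [r[-1] for r in rows]
    return -sum((labels.count(c) / len(rows)) * log2(labels.count(c) / len(rows))
                for c in set(labels))

def gain(rows, attr):
    col = ATTRS[attr]
    rem = sum((len(sub) / len(rows)) * entropy(sub)
              for v in set(r[col] for r in rows)
              for sub in [[r for r in rows if r[col] == v]])
    return entropy(rows) - rem

for a in ATTRS:
    print(a, round(gain(data, a), 2))   # Outlook 0.25, Temperature 0.03, Humidity 0.15, Wind 0.05

Applying gain to the Sunny and Rain subsets in the same way gives the branch-level gains computed below.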
S [9+, 5-] split on Humidity: High [3+, 4-], Normal [6+, 1-]
S [9+, 5-] split on Wind: Weak [6+, 2-], Strong [3+, 3-]

Gain(Humidity) = 0.94
  - 7/14 [-3/7 log2(3/7) - 4/7 log2(4/7)]
  - 7/14 [-6/7 log2(6/7) - 1/7 log2(1/7)]
  = 0.94 - 0.79 = 0.15

Gain(Wind) = 0.94
  - 8/14 [-6/8 log2(6/8) - 2/8 log2(2/8)]
  - 6/14 [-3/6 log2(3/6) - 3/6 log2(3/6)]
  = 0.94 - 0.89 = 0.05
Gain(Outlook) = 0.25, Gain(Temperature) = 0.03, Gain(Humidity) = 0.15, Gain(Wind) = 0.05
Outlook has the highest gain, so it becomes the root of the tree:
  Outlook = Overcast → Yes
  Outlook = Sunny   → ?
  Outlook = Rain    → ?
(The training data is the Play Tennis table shown above.)
Info(Sunny) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.97

Sunny [2+, 3-] split on Temperature: <15 [1+, 0-], 15-25 [1+, 1-], >25 [0+, 2-]
Gain(Temperature) = 0.97
  - 1/5 [-1/1 log2(1/1) - 0/1 log2(0/1)]
  - 2/5 [-1/2 log2(1/2) - 1/2 log2(1/2)]
  - 2/5 [-0/2 log2(0/2) - 2/2 log2(2/2)]
  = 0.97 - 0.40 = 0.57

Sunny [2+, 3-] split on Humidity: High [0+, 3-], Normal [2+, 0-]
Gain(Humidity) = 0.97
  - 3/5 [-0/3 log2(0/3) - 3/3 log2(3/3)]
  - 2/5 [-2/2 log2(2/2) - 0/2 log2(0/2)]
  = 0.97 - 0 = 0.97

Sunny [2+, 3-] split on Wind: Weak [1+, 2-], Strong [1+, 1-]
Gain(Wind) = 0.97
  - 3/5 [-1/3 log2(1/3) - 2/3 log2(2/3)]
  - 2/5 [-1/2 log2(1/2) - 1/2 log2(1/2)]
  = 0.97 - 0.95 = 0.02
Humidity has the highest gain on the Sunny branch, so the tree so far is:
  Outlook = Overcast → Yes
  Outlook = Sunny → Humidity: Normal → Yes, High → No
  Outlook = Rain → ?
Info(Rain) = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.97

Rain [3+, 2-] split on Temperature: <15 [1+, 1-], 15-25 [2+, 1-], >25 [0+, 0-]
Gain(Temperature) = 0.97
  - 2/5 [-1/2 log2(1/2) - 1/2 log2(1/2)]
  - 3/5 [-2/3 log2(2/3) - 1/3 log2(1/3)]
  - 0/5 × 0
  = 0.97 - 0.95 = 0.02

Rain [3+, 2-] split on Humidity: High [1+, 1-], Normal [2+, 1-]
Gain(Humidity) = 0.97
  - 2/5 [-1/2 log2(1/2) - 1/2 log2(1/2)]
  - 3/5 [-2/3 log2(2/3) - 1/3 log2(1/3)]
  = 0.97 - 0.95 = 0.02

Rain [3+, 2-] split on Wind: Weak [3+, 0-], Strong [0+, 2-]
Gain(Wind) = 0.97
  - 3/5 [-3/3 log2(3/3) - 0/3 log2(0/3)]
  - 2/5 [-0/2 log2(0/2) - 2/2 log2(2/2)]
  = 0.97 - 0 = 0.97
Wind has the highest gain on the Rain branch, giving the final tree:
  Outlook = Overcast → Yes
  Outlook = Sunny → Humidity: Normal → Yes, High → No
  Outlook = Rain → Wind: Weak → Yes, Strong → No
Bayesian Classification
- A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities P(C | x_1, ..., x_n), where x_i is the value of attribute A_i.
- Choose the class label that has the highest probability.
- Foundation: Bayes' theorem
  P(C | X) = P(X | C) P(C) / P(X)
  (posterior = likelihood × prior / evidence); the likelihood P(X | C) and prior P(C) form the model and are computed from the data.
Naïve Bayes Classifier
- Problem: the joint probability P(x_1, ..., x_n | C) is difficult to estimate directly.
- Naïve Bayes assumption: the attributes are conditionally independent given the class:
  P(x_1, ..., x_n | C) = P(x_1 | C) × P(x_2 | C) × ... × P(x_n | C)
Naïve Bayes Classifier
A    B    C
m    b    t
m    s    t
g    q    t
h    s    t
g    q    t
g    q    f
g    s    f
h    b    f
h    q    f
m    b    f

P(C=t) = 1/2            P(C=f) = 1/2
P(A=m | C=t) = 2/5      P(A=m | C=f) = 1/5
P(B=q | C=t) = 2/5      P(B=q | C=f) = 2/5

Test record: A=m, B=q, C=?
Naïve Bayes Classifier
For C = t:
  P(A=m | C=t) × P(B=q | C=t) × P(C=t) = 2/5 × 2/5 × 1/2 = 2/25
  P(C=t | A=m, B=q) = (2/25) / P(A=m, B=q)
For C = f:
  P(A=m | C=f) × P(B=q | C=f) × P(C=f) = 1/5 × 2/5 × 1/2 = 1/25
  P(C=f | A=m, B=q) = (1/25) / P(A=m, B=q)
The posterior for C = t is higher (2/25 > 1/25), so the conclusion is: A=m, B=q → C=t.
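A small sketch that reproduces this computation from the ten training records above (plain counting estimates, no smoothing; the function name is invented):

# (A, B, C) records from the table above; C is the class.
data = [("m","b","t"), ("m","s","t"), ("g","q","t"), ("h","s","t"), ("g","q","t"),
        ("g","q","f"), ("g","s","f"), ("h","b","f"), ("h","q","f"), ("m","b","f")]

def naive_bayes_score(a, b, cls):
    # P(A=a|C=cls) * P(B=q|C=cls) * P(C=cls), all estimated by counting.
    rows = [r for r in data if r[2] == cls]
    p_a = sum(r[0] == a for r in rows) / len(rows)
    p_b = sum(r[1] == b for r in rows) / len(rows)
    prior = len(rows) / len(data)
    return p_a * p_b * prior

for cls in ("t", "f"):
    print(cls, naive_bayes_score("m", "q", cls))   # t: 0.08 (= 2/25), f: 0.04 (= 1/25)

Since 0.08 > 0.04, the sketch agrees with the conclusion above: the test record is labelled C = t.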
Nearest Neighbor Classification
- Input: a set of stored records and k, the number of nearest neighbors.
- Output: the class label of the unknown record, determined as follows:
  1. Compute the distance to every stored record, e.g., Euclidean distance
     d(p, q) = sqrt(Σ_i (p_i - q_i)^2)
  2. Identify the k nearest neighbors.
  3. Determine the class label of the unknown record from the class labels of those neighbors (i.e., by taking a majority vote).
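A minimal k-NN sketch following these three steps, using Euclidean distance and a majority vote (function and parameter names are assumptions):

from collections import Counter
from math import dist   # Euclidean distance, Python 3.8+

def knn_classify(training, query, k):
    """training: list of (point, label) pairs; query: point; returns the majority label."""
    # 1. Compute the distance from the query to every stored record and rank the records.
    neighbours = sorted(training, key=lambda rec: dist(rec[0], query))
    # 2. Keep the k nearest and 3. take a majority vote over their labels.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]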
Nearest Neighbor Classification: A Discrete Example
Given 8 training instances:
  P1 (4, 2)      Orange
  P2 (0.5, 2.5)  Orange
  P3 (2.5, 2.5)  Orange
  P4 (3, 3.5)    Orange
  P5 (5.5, 3.5)  Orange
  P6 (2, 4)      Black
  P7 (4, 5)      Black
  P8 (2.5, 5.5)  Black
k = 1 and k = 3; new instance Pn (4, 4): class?

Calculate the distances:
  d(P1, Pn) = 2       d(P2, Pn) = 3.80    d(P3, Pn) = 2.12    d(P4, Pn) = 1.12
  d(P5, Pn) = 1.58    d(P6, Pn) = 2       d(P7, Pn) = 1       d(P8, Pn) = 2.12
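Running the same idea directly on these eight points (a self-contained check; the printed distances match the values above):

from collections import Counter
from math import dist

points = {"P1": ((4, 2), "Orange"),    "P2": ((0.5, 2.5), "Orange"),
          "P3": ((2.5, 2.5), "Orange"),"P4": ((3, 3.5), "Orange"),
          "P5": ((5.5, 3.5), "Orange"),"P6": ((2, 4), "Black"),
          "P7": ((4, 5), "Black"),     "P8": ((2.5, 5.5), "Black")}
pn = (4, 4)

# Rank the training points by distance to Pn.
ranked = sorted(points.items(), key=lambda kv: dist(kv[1][0], pn))
for name, (xy, label) in ranked:
    print(name, label, round(dist(xy, pn), 2))            # P7 is nearest (d = 1)

# Majority vote among the k nearest neighbors.
for k in (1, 3):
    vote = Counter(label for _, (_, label) in ranked[:k]).most_common(1)[0][0]
    print("k =", k, "->", vote)                            # k=1: Black, k=3: Orange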
Nearest Neighbor Classification
[Figure: the eight training points and Pn plotted for k = 1 and k = 3.]
With k = 1 the single nearest neighbor is P7 (Black), so Pn is classified Black; with k = 3 the nearest neighbors are P7 (Black), P4 (Orange) and P5 (Orange), so the majority vote gives Orange.
Nearest Neighbor Classification …
- Scaling issues: attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes; each attribute should be mapped onto the same range, e.g., by min-max normalization.
- Example: two data records a = (1, 1000) and b = (0.5, 1). What is dis(a, b)? Without scaling it is ≈ 999, determined almost entirely by the second attribute.
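A brief sketch of why scaling matters for this example, with min-max normalization x' = (x - min) / (max - min) applied per attribute (for illustration, the min and max are taken over just these two records):

from math import dist

a, b = (1, 1000), (0.5, 1)
print(round(dist(a, b), 2))   # ≈ 999.0 -- dominated entirely by the second attribute

# Min-max normalization per attribute: x' = (x - min) / (max - min)
lo = [min(col) for col in zip(a, b)]
hi = [max(col) for col in zip(a, b)]

def scale(p):
    return tuple((x - l) / (h - l) for x, l, h in zip(p, lo, hi))

print(round(dist(scale(a), scale(b)), 2))   # ≈ 1.41 -- both attributes now contribute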
Lazy & Eager Learning
Two types of learning methodologies:
- Lazy learning: instance-based learning (e.g., k-NN).
- Eager learning: decision-tree and Bayesian classification, ANN & SVM.
Lazy & Eager Learning: Key Differences
- Lazy learning: does not require model building; less time training but more time predicting. A lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function.
- Eager learning: requires model building; more time training but less time predicting. It must commit to a single hypothesis that covers the entire instance space.