1 SEG4630 2009-2010 Tutorial 1 – Classification: Decision Tree, Naïve Bayes & k-NN (CHANG Lijun)

2 Classification: Definition
 Given a collection of records (the training set): each record contains a set of attributes, one of which is the class.
 Find a model that expresses the class attribute as a function of the values of the other attributes (e.g. a decision tree, Naïve Bayes, or k-NN).
 Goal: previously unseen records should be assigned a class as accurately as possible.

3 Decision Tree
 Goal: construct a tree so that instances belonging to different classes are separated.
 Basic algorithm (a greedy algorithm):
  The tree is constructed in a top-down recursive manner.
  At the start, all training examples are at the root.
  Test attributes are selected on the basis of a heuristic or statistical measure (e.g. information gain).
  Examples are partitioned recursively based on the selected attributes.
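
As a rough illustration (not the tutorial's own code), the loop above can be sketched in Python. The attribute-selection measure is passed in as a function, since the next slides present several choices for it; records are assumed to be dicts keyed by attribute name.

    from collections import Counter

    def build_tree(records, attributes, target, select_attribute):
        """Greedy top-down induction. select_attribute(records, attributes, target)
        returns the attribute to test next (e.g. the one with maximum information gain)."""
        labels = [r[target] for r in records]
        if len(set(labels)) == 1 or not attributes:        # pure node, or nothing left to test
            return Counter(labels).most_common(1)[0][0]    # leaf labelled with the majority class
        best = select_attribute(records, attributes, target)
        node = {"test": best, "branches": {}}
        for value in set(r[best] for r in records):        # partition on the selected attribute
            subset = [r for r in records if r[best] == value]
            rest = [a for a in attributes if a != best]
            node["branches"][value] = build_tree(subset, rest, target, select_attribute)
        return node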

4 Attribute Selection Measure 1: Information Gain
 Let pi be the probability that a tuple in D belongs to class Ci, estimated by |Ci,D|/|D|.
 Expected information (entropy) needed to classify a tuple in D:
  Info(D) = -Σi pi log2(pi)
 Information needed (after using A to split D into v partitions) to classify D:
  InfoA(D) = Σj (|Dj|/|D|) × Info(Dj),  for j = 1, ..., v
 Information gained by branching on attribute A:
  Gain(A) = Info(D) - InfoA(D)
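
A small self-contained Python sketch of these two formulas, assuming records are dicts and the class attribute is named explicitly; an illustrative helper, not the tutorial's own code.

    import math
    from collections import Counter

    def entropy(records, target):
        """Info(D) = -sum_i p_i log2(p_i) over the class distribution of the records."""
        total = len(records)
        counts = Counter(r[target] for r in records)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def info_gain(records, attribute, target):
        """Gain(A) = Info(D) - Info_A(D), where each partition D_j is weighted by |Dj|/|D|."""
        total = len(records)
        partitions = [[r for r in records if r[attribute] == v]
                      for v in set(r[attribute] for r in records)]
        info_a = sum((len(p) / total) * entropy(p, target) for p in partitions)
        return entropy(records, target) - info_a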

5 Attribute Selection Measure 2: Gain Ratio
 The information gain measure is biased towards attributes with a large number of values.
 C4.5 (a successor of ID3) uses the gain ratio to overcome the problem (a normalization of information gain):
  SplitInfo(A) = -Σj (|Dj|/|D|) log2(|Dj|/|D|),  for j = 1, ..., v
  GainRatio(A) = Gain(A) / SplitInfo(A)
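
Continuing the sketch above (gain_ratio below reuses the info_gain helper from the previous snippet):

    import math
    from collections import Counter

    def split_info(records, attribute):
        """SplitInfo(A) = -sum_j (|Dj|/|D|) log2(|Dj|/|D|) over the values of attribute A."""
        total = len(records)
        counts = Counter(r[attribute] for r in records)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def gain_ratio(records, attribute, target):
        """GainRatio(A) = Gain(A) / SplitInfo(A)."""
        return info_gain(records, attribute, target) / split_info(records, attribute)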

6 Attribute Selection Measure 3: Gini Index
 If a data set D contains examples from n classes, the gini index gini(D) is defined as
  gini(D) = 1 - Σj pj^2
 where pj is the relative frequency of class j in D.
 If D is split on A into two subsets D1 and D2, the gini index of the split, giniA(D), is defined as
  giniA(D) = (|D1|/|D|) gini(D1) + (|D2|/|D|) gini(D2)
 Reduction in impurity:
  Δgini(A) = gini(D) - giniA(D)
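
A self-contained Python sketch of the gini computations, again treating records as dicts and considering a binary split of the chosen attribute (an illustration, not the tutorial's code):

    from collections import Counter

    def gini(records, target):
        """gini(D) = 1 - sum_j p_j^2 over the class distribution of the records."""
        total = len(records)
        counts = Counter(r[target] for r in records)
        return 1.0 - sum((c / total) ** 2 for c in counts.values())

    def gini_split(records, attribute, value, target):
        """gini_A(D) for the binary split D1 = {attribute == value}, D2 = the rest."""
        d1 = [r for r in records if r[attribute] == value]
        d2 = [r for r in records if r[attribute] != value]
        total = len(records)
        return (len(d1) / total) * gini(d1, target) + (len(d2) / total) * gini(d2, target)

    def gini_reduction(records, attribute, value, target):
        """Reduction in impurity: gini(D) - gini_A(D)."""
        return gini(records, target) - gini_split(records, attribute, value, target)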

7 Example

Outlook   Temperature  Humidity  Wind    Play Tennis
Sunny     >25          High      Weak    No
Sunny     >25          High      Strong  No
Overcast  >25          High      Weak    Yes
Rain      15-25        High      Weak    Yes
Rain      <15          Normal    Weak    Yes
Rain      <15          Normal    Strong  No
Overcast  <15          Normal    Strong  Yes
Sunny     15-25        High      Weak    No
Sunny     <15          Normal    Weak    Yes
Rain      15-25        Normal    Weak    Yes
Sunny     15-25        Normal    Strong  Yes
Overcast  15-25        High      Strong  Yes
Overcast  >25          Normal    Weak    Yes
Rain      15-25        High      Strong  No

8 Tree induction example

S [9+, 5-] split on Outlook: Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]
S [9+, 5-] split on Temperature: <15 [3+,1-], 15-25 [4+,2-], >25 [2+,2-]

Info(S) = -9/14(log2(9/14)) - 5/14(log2(5/14)) = 0.94

Gain(Outlook) = 0.94 – 5/14[-2/5(log2(2/5)) - 3/5(log2(3/5))]
                     – 4/14[-4/4(log2(4/4)) - 0/4(log2(0/4))]
                     – 5/14[-3/5(log2(3/5)) - 2/5(log2(2/5))]
              = 0.94 – 0.69 = 0.25

Gain(Temperature) = 0.94 – 4/14[-3/4(log2(3/4)) - 1/4(log2(1/4))]
                         – 6/14[-4/6(log2(4/6)) - 2/6(log2(2/6))]
                         – 4/14[-2/4(log2(2/4)) - 2/4(log2(2/4))]
                  = 0.94 – 0.91 = 0.03

(By convention 0·log2(0) = 0 in the terms with empty counts.)
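
A short, self-contained Python check of these numbers (it re-derives the entropy and gains directly from the table on slide 7; values are rounded to two decimals):

    import math
    from collections import Counter

    # (Outlook, Temperature, Humidity, Wind, PlayTennis) rows from the example table
    data = [
        ("Sunny", ">25", "High", "Weak", "No"),         ("Sunny", ">25", "High", "Strong", "No"),
        ("Overcast", ">25", "High", "Weak", "Yes"),     ("Rain", "15-25", "High", "Weak", "Yes"),
        ("Rain", "<15", "Normal", "Weak", "Yes"),       ("Rain", "<15", "Normal", "Strong", "No"),
        ("Overcast", "<15", "Normal", "Strong", "Yes"), ("Sunny", "15-25", "High", "Weak", "No"),
        ("Sunny", "<15", "Normal", "Weak", "Yes"),      ("Rain", "15-25", "Normal", "Weak", "Yes"),
        ("Sunny", "15-25", "Normal", "Strong", "Yes"),  ("Overcast", "15-25", "High", "Strong", "Yes"),
        ("Overcast", ">25", "Normal", "Weak", "Yes"),   ("Rain", "15-25", "High", "Strong", "No"),
    ]
    ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

    def entropy(rows):
        n, counts = len(rows), Counter(r[-1] for r in rows)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def gain(rows, attr):
        i, n = ATTRS[attr], len(rows)
        parts = [[r for r in rows if r[i] == v] for v in set(r[i] for r in rows)]
        return entropy(rows) - sum(len(p) / n * entropy(p) for p in parts)

    print(round(entropy(data), 2))        # 0.94
    for a in ATTRS:                       # Outlook 0.25, Temperature 0.03, Humidity 0.15, Wind 0.05
        print(a, round(gain(data, a), 2))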

9

S [9+, 5-] split on Humidity: High [3+,4-], Normal [6+,1-]
S [9+, 5-] split on Wind: Weak [6+,2-], Strong [3+,3-]

Gain(Humidity) = 0.94 – 7/14[-3/7(log2(3/7)) - 4/7(log2(4/7))]
                      – 7/14[-6/7(log2(6/7)) - 1/7(log2(1/7))]
               = 0.94 – 0.79 = 0.15

Gain(Wind) = 0.94 – 8/14[-6/8(log2(6/8)) - 2/8(log2(2/8))]
                  – 6/14[-3/6(log2(3/6)) - 3/6(log2(3/6))]
           = 0.94 – 0.89 = 0.05

10

Gain(Outlook) = 0.25   Gain(Temperature) = 0.03   Gain(Humidity) = 0.15   Gain(Wind) = 0.05
Outlook has the highest gain and is selected as the root test:
  Outlook → Overcast: Yes;  Sunny: ??;  Rain: ??
The Sunny and Rain branches still have to be expanded using their own subsets of the training table.

11 Expanding the Sunny branch

Info(Sunny) = -2/5(log2(2/5)) - 3/5(log2(3/5)) = 0.97

Sunny [2+,3-] split on Temperature: <15 [1+,0-], 15-25 [1+,1-], >25 [0+,2-]
Gain(Temperature) = 0.97 – 1/5[-1/1(log2(1/1)) - 0/1(log2(0/1))]
                         – 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                         – 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))]
                  = 0.97 – 0.4 = 0.57

Sunny [2+,3-] split on Humidity: High [0+,3-], Normal [2+,0-]
Gain(Humidity) = 0.97 – 3/5[-0/3(log2(0/3)) - 3/3(log2(3/3))]
                      – 2/5[-2/2(log2(2/2)) - 0/2(log2(0/2))]
               = 0.97 – 0 = 0.97

Sunny [2+,3-] split on Wind: Weak [1+,2-], Strong [1+,1-]
Gain(Wind) = 0.97 – 3/5[-1/3(log2(1/3)) - 2/3(log2(2/3))]
                  – 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
           = 0.97 – 0.95 = 0.02

12

The tree so far:
  Outlook → Overcast: Yes;  Sunny: Humidity;  Rain: ??
  Humidity (under Sunny) → High: No;  Normal: Yes

13 Expanding the Rain branch

Info(Rain) = -3/5(log2(3/5)) - 2/5(log2(2/5)) = 0.97

Rain [3+,2-] split on Temperature: <15 [1+,1-], 15-25 [2+,1-], >25 [0+,0-]
Gain(Temperature) = 0.97 – 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                         – 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))]
                         – 0/5[...]  (the empty partition contributes 0)
                  = 0.97 – 0.95 = 0.02

Rain [3+,2-] split on Humidity: High [1+,1-], Normal [2+,1-]
Gain(Humidity) = 0.97 – 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                      – 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))]
               = 0.97 – 0.95 = 0.02

Rain [3+,2-] split on Wind: Weak [3+,0-], Strong [0+,2-]
Gain(Wind) = 0.97 – 3/5[-3/3(log2(3/3)) - 0/3(log2(0/3))]
                  – 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))]
           = 0.97 – 0 = 0.97

14

The final decision tree:
  Outlook → Overcast: Yes;  Sunny: Humidity;  Rain: Wind
  Humidity (under Sunny) → High: No;  Normal: Yes
  Wind (under Rain) → Strong: No;  Weak: Yes
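
For illustration, the final tree can be written out directly as a small Python function (attribute values are plain strings matching the example table); this is a hand-coded rendering of the tree above, not generated code.

    def play_tennis(outlook, temperature, humidity, wind):
        """Classify one record with the tree induced above (the tree never tests temperature)."""
        if outlook == "Overcast":
            return "Yes"
        if outlook == "Sunny":
            return "Yes" if humidity == "Normal" else "No"
        if outlook == "Rain":
            return "Yes" if wind == "Weak" else "No"
        raise ValueError("unknown Outlook value: " + outlook)

    # e.g. the first training record (Sunny, >25, High, Weak) is classified as "No"
    print(play_tennis("Sunny", ">25", "High", "Weak"))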

15 Bayesian Classification
 A statistical classifier: it performs probabilistic prediction, i.e. it predicts class membership probabilities P(C | x1, ..., xn), where xi is the value of attribute Ai.
 Choose the class label that has the highest probability.
 Foundation: Bayes' theorem
  P(C | x) = P(x | C) P(C) / P(x)
  (posterior = likelihood × prior / evidence; the likelihood and the prior are estimated from the data)

16 Naïve Bayes Classifier
 Problem: the joint probability P(x1, ..., xn | C) is difficult to estimate directly.
 Naïve Bayes assumption: the attributes are conditionally independent given the class, so
  P(x1, ..., xn | C) = P(x1 | C) × P(x2 | C) × ... × P(xn | C)

17 Naïve Bayes Classifier

A  B  C
m  b  t
m  s  t
g  q  t
h  s  t
g  q  t
g  q  f
g  s  f
h  b  f
h  q  f
m  b  f

P(C=t) = 1/2         P(C=f) = 1/2
P(A=m|C=t) = 2/5     P(A=m|C=f) = 1/5
P(B=q|C=t) = 2/5     P(B=q|C=f) = 2/5

Test record: A=m, B=q, C=?

18 Naïve Bayes Classifier
 For C = t:
  P(A=m|C=t) × P(B=q|C=t) × P(C=t) = 2/5 × 2/5 × 1/2 = 2/25
  P(C=t|A=m, B=q) = (2/25) / P(A=m, B=q)
 For C = f:
  P(A=m|C=f) × P(B=q|C=f) × P(C=f) = 1/5 × 2/5 × 1/2 = 1/25
  P(C=f|A=m, B=q) = (1/25) / P(A=m, B=q)
 2/25 > 1/25, so the test record A=m, B=q is classified as C=t.
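
A self-contained Python sketch that reproduces this toy example; probabilities are estimated by simple counting, with no smoothing, exactly as on the slide.

    # (A, B, C) records from the table on the previous slide
    data = [("m", "b", "t"), ("m", "s", "t"), ("g", "q", "t"), ("h", "s", "t"), ("g", "q", "t"),
            ("g", "q", "f"), ("g", "s", "f"), ("h", "b", "f"), ("h", "q", "f"), ("m", "b", "f")]

    def nb_score(a, b, c):
        """P(A=a|C=c) * P(B=b|C=c) * P(C=c): the unnormalised posterior for class c."""
        rows_c = [r for r in data if r[2] == c]
        prior = len(rows_c) / len(data)
        p_a = sum(1 for r in rows_c if r[0] == a) / len(rows_c)
        p_b = sum(1 for r in rows_c if r[1] == b) / len(rows_c)
        return p_a * p_b * prior

    for c in ("t", "f"):
        print(c, nb_score("m", "q", c))   # t: 0.08 (= 2/25),  f: 0.04 (= 1/25)
    # The larger score wins, so the test record (A=m, B=q) is classified as C=t.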

19 Nearest Neighbor Classification
 Input:
  A set of stored records
  k: the number of nearest neighbors
 To classify an unknown record:
  Compute its distance to every stored record, e.g. the Euclidean distance d(p, q) = sqrt(Σi (pi - qi)^2)
  Identify the k nearest neighbors
  Determine the class label of the unknown record from the class labels of those neighbors (e.g. by majority vote)

20 Nearest Neighbor Classification: A Discrete Example
 Given 8 training instances:
  P1 (4, 2) → Orange      P5 (5.5, 3.5) → Orange
  P2 (0.5, 2.5) → Orange  P6 (2, 4) → Black
  P3 (2.5, 2.5) → Orange  P7 (4, 5) → Black
  P4 (3, 3.5) → Orange    P8 (2.5, 5.5) → Black
 k = 1 and k = 3; new instance Pn (4, 4) → ???
 Calculate the distances:
  d(P1, Pn) = 2      d(P5, Pn) = 1.58
  d(P2, Pn) = 3.80   d(P6, Pn) = 2
  d(P3, Pn) = 2.12   d(P7, Pn) = 1
  d(P4, Pn) = 1.12   d(P8, Pn) = 2.12
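
A self-contained Python sketch of this example (Euclidean distance plus a majority vote; point coordinates and class names are taken from the slide):

    import math
    from collections import Counter

    points = {"P1": ((4, 2), "Orange"), "P2": ((0.5, 2.5), "Orange"), "P3": ((2.5, 2.5), "Orange"),
              "P4": ((3, 3.5), "Orange"), "P5": ((5.5, 3.5), "Orange"), "P6": ((2, 4), "Black"),
              "P7": ((4, 5), "Black"), "P8": ((2.5, 5.5), "Black")}

    def knn_predict(query, k):
        """Return the majority class among the k stored points nearest to query."""
        dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        ranked = sorted(points.values(), key=lambda pc: dist(pc[0], query))
        votes = Counter(label for _, label in ranked[:k])
        return votes.most_common(1)[0][0]

    pn = (4, 4)
    print(knn_predict(pn, 1))   # nearest neighbor is P7, so "Black"
    print(knn_predict(pn, 3))   # P7, P4, P5 -> two Orange vs. one Black -> "Orange"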

21 Nearest Neighbor Classification
[Figure: the eight training points and Pn plotted twice, highlighting the k = 1 and the k = 3 neighborhoods of Pn]

22 Nearest Neighbor Classification: Scaling Issues
 Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes; each attribute should be mapped into the same range.
 Min-max normalization: v' = (v - min) / (max - min)
 Example: two data records a = (1, 1000) and b = (0.5, 1); dis(a, b) = ?
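
A brief sketch of the point being made, assuming for illustration that the per-attribute min and max are taken over just these two records:

    import math

    a, b = (1.0, 1000.0), (0.5, 1.0)

    def euclidean(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    # Raw distance: the second attribute (range ~1000) completely dominates the first
    print(euclidean(a, b))                        # ~999.0

    # Min-max normalize each attribute to [0, 1] using the (illustrative) per-attribute min and max
    lo = [min(x, y) for x, y in zip(a, b)]
    hi = [max(x, y) for x, y in zip(a, b)]
    scale = lambda p: tuple((v - l) / (h - l) for v, l, h in zip(p, lo, hi))
    print(euclidean(scale(a), scale(b)))          # sqrt(2) ~ 1.41; both attributes now contribute equally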

23 Lazy & Eager Learning
 Two types of learning methodologies:
  Lazy learning: instance-based learning (e.g. k-NN)
  Eager learning: decision tree and Bayesian classification; also ANN & SVM

24 Lazy & Eager Learning
 Key differences:
  Lazy learning
   Does not require model building
   Less time training, but more time predicting
   Effectively uses a richer hypothesis space, since it uses many local linear functions to form an implicit global approximation to the target function
  Eager learning
   Requires model building
   More time training, but less time predicting
   Must commit to a single hypothesis that covers the entire instance space

