Download presentation

Presentation is loading. Please wait.

Published byKhalil Hilborn Modified over 3 years ago

1
Data Mining Techniques: Classification

2
Classification What is Classification? –Classifying tuples in a database –In training set E each tuple consists of the same set of multiple attributes as the tuples in the large database W additionally, each tuple has a known class identity –Derive the classification mechanism from the training set E, and then use this mechanism to classify general data (in W)

3
Learning Phase Learning –The class label attribute is credit_rating –Training data are analyzed by a classification algorithm –The classifier is represented in the form of classification rules

4
Testing Phase Testing (Classification) –Test data are used to estimate the accuracy of the classification rules –If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples

5
Classification by Decision Tree A top-down decision tree generation algorithm: ID-3 and its extended version C4.5 (Quinlan’93): J.R. Quinlan, C4.5 Programs for Machine Learning, Morgan Kaufmann, 1993

6
Decision Tree Generation At start, all the training examples are at the root Partition examples recursively based on selected attributes Attribute Selection –Favoring the partitioning which makes the majority of examples belong to a single class Tree Pruning (Overfitting Problem) –Aiming at removing tree branches that may lead to errors when classifying test data Training data may contain noise, …

7
EyeHairHeightOriental Black ShortYes BlackWhiteTallYes BlackWhiteShortYes Black TallYes BrownBlackTallYes BrownWhiteShortYes BlueGoldTallNo BlueGoldShortNo BlueWhiteTallNo BlueBlackShortNo BrownGoldShortNo 1 2 3 4 5 6 7 8 9 10 11 Another Examples

8
Decision Tree

10
Decision Tree Generation Attribute Selection (Split Criterion) –Information Gain (ID3/C4.5/See5) –Gini Index (CART/IBM Intelligent Miner) –Inference Power These measures are also called goodness functions and used to select the attribute to split at a tree node during the tree generation phase

11
Decision Tree Generation Branching Scheme –Determining the tree branch to which a sample belongs –Binary vs. K-ary Splitting When to stop the further splitting of a node –Impurity Measure Labeling Rule –A node is labeled as the class to which most samples at the node belongs

12
Decision Tree Generation Algorithm: ID3 (7.1) Entropy ID: Iterative Dichotomiser

13
Decision Tree Algorithm: ID3

16
yes

17
Decision Tree Algorithm: ID3

18
Exercise 2

19
Decision Tree Generation Algorithm: ID3

22
How to Use a Tree Directly –Test the attribute value of unknown sample against the tree. –A path is traced from root to a leaf which holds the label Indirectly –Decision tree is converted to classification rules –One rule is created for each path from the root to a leaf –IF-THEN is easier for humans to understand

23
Generating Classification Rules

25
There are 4 decision rules are generated by the tree –Watch the game and home team wins and out with friends then bear –Watch the game and home team wins and sitting at home then diet soda –Watch the game and home team loses and out with friend then bear –Watch the game and home team loses and sitting at home then milk Optimization for these rules –Watch the game and out with friends then bear –Watch the game and home team wins and sitting at home then diet soda –Watch the game and home team loses and sitting at home then milk

26
Decision Tree Generation Algorithm: ID3 All attributes are assumed to be categorical (discretized) Can be modified for continuous-valued attributes –Dynamically define new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals –A V | A < V Prefer Attributes with Many Values Cannot Handle Missing Attribute Values Attribute dependencies do not consider in this algorithm

27
Attribute Selection in C4.5

28
Handling Continuous Attributes

30
Sorted By First Cut Second Cut Third Cut

31
Handling Continuous Attributes Root Price On Date T+1 > 18.02 Price On Date T+1 <= 18.02 Price On Date T > 17.84 Price On Date T <= 17.84 Price On Date T+1 > 17.70 Price On Date T+1 <= 17.70 Buy Sell BuySell First Cut Second Cut Third Cut

32
Exercise 3: 分析房價 SF : Square Feet IDLocationTypeMilesSFCMHome Price (K) 1UrbanDetached2200050High 2RuralDetached920005Low 3UrbanAttached31500150High 4UrbanDetached152500250High 5RuralDetached3030001Low 6RuralDetached3250010Medium 7RuralDetached2018005Medium 8UrbanAttached5180050High 9RuralDetached3030001Low 10UrbanAttached251200100Medium CM : No. of Homes in Community

33
Unknown Attribute Values in C4.5 Training Testing

34
Unknown Attribute Values Adjustment of Attribute Selection Measure

35
Fill in Approach

36
Probability Approach

38
Unknown Attribute Values Partitioning the Training Set

39
Probability Approach

40
Unknown Attribute Values Classifying an Unseen Case

41
Probability Approach

42
Evaluation – Coincidence Matrix Cost = $190 * (closing good account) + $10 * (keeping bad account open) Accuracy ( 正確率 ) = (36+632) / 718 = 93.0% Precision ( 精確率 ) for Insolvent = 36/58 = 62.01% Recall ( 捕捉率 ) for Insolvent = 36/64 = 56.25% F Measure = 2 * Precision * Recall / (Precision + Recall ) = 2 * 62.01% * 56.25% / (62.01% + 56.25% ) = 0.7 / 1.1826 = 0.59 Cost = $190 * 22 + $10 * 28 = $4,460 Decision Tree Model

43
Decision Tree Generation Algorithm: Gini Index If a data set S contains examples from n classes, gini index, gini(S), is defined as where p j is the relative frequency of class C j in S. If a data set S is split into two subsets S 1 and S 2 with sizes N 1 and N 2 respectively, the gini index of the split data contains examples from n classes, the gini index, gini(S), is defined as

44
Decision Tree Generation Algorithm: Gini Index The attribute provides the smallest ginisplit(S) is chosen to split the node The computation cost of gini index is less than information gain All attributes are binary splitting in IBM Intelligent Miner –A V | A < V

45
Decision Tree Generation Algorithm: Inference Power A feature that is useful in inferring the group identity of a data tuple is said to have a good inference power to that group identity. In Table 1, given attributes (features) “Gender”, “Beverage”, “State”, try to find their inference power to “Group id”

48
Naive Bayesian Classification Each data sample is a n-dim feature vector –X = (x1, x2,.. xn) for attributes A1, A2, … An Suppose there are m classes –C = {C1, C2,.. Cm} The classifier will predict X to the class Ci that has the highest posterior probability, conditioned on X –X belongs to Ci iff P(Ci|X) > P(Cj|X) for all 1<=j<=m, j!=i

49
Naive Bayesian Classification P(Ci|X) = P(X|Ci) P(Ci) / P(X) –P(Ci|X) = P(Ci ∪ X) / P(X) ; P(X|Ci) = P(Ci ∪ X) / P(Ci) => P(Ci|X) P(X) = P(X|Ci) P(Ci) P(Ci) = si / s –si is the number of training sample of class Ci –s is the total number of training samples Assumption: Independent between Attributes –P(X|Ci) = P(x1|Ci) P(x2|Ci) P(x3|Ci)... P(xn|Ci) P(X) can be ignored

50
Naive Bayesian Classification Classify X=(age=“<=30”, income=“medium”, student=“yes”, credit-rating=“fair”) –P(buys_computer=yes) = 9/14 –P(buys_computer=no)=5/14 –P(age=<30|buys_computer=yes)=2/9 –P(age=<30|buys_computer=no)=3/5 –P(income=medium|buys_computer=yes)=4/9 –P(income=medium|buys_computer=no)=2/5 –P(student=yes|buys_computer=yes)=6/9 –P(student=yes|buys_computer=no)=1/5 –P(credit-rating=fair|buys_computer=yes)=6/9 –P(credit-rating =fair|buys_computer=no)=2/5 –P(X|buys_computer=yes)=0.044 –P(X|buys_computer=no)=0.019 –P(buys_computer=yes|X) = P(X|buys_computer=yes) P(buys_computer=yes)=0.028 –P(buys_computer=no|X) = P(X|buys_computer=no) P(buys_computer=no)=0.007

51
Homework Assignment

Similar presentations

Presentation is loading. Please wait....

OK

Classification Continued

Classification Continued

© 2018 SlidePlayer.com Inc.

All rights reserved.

To ensure the functioning of the site, we use **cookies**. We share information about your activities on the site with our partners and Google partners: social networks and companies engaged in advertising and web analytics. For more information, see the Privacy Policy and Google Privacy & Terms.
Your consent to our cookies if you continue to use this website.

Ads by Google

Raster scan display ppt on tv Ppt on history of irrational numbers Ppt on question tags grammar Free download ppt on revolt of 1857 Ppt on district institute of education and training Fiber post ppt online Ppt on cross docking technique Ppt on machine translation of japanese Ppt on world literacy day Ppt on second law of thermodynamics for kids