Slide 1: 3. Classification Methods
- Patterns and Models
- Regression, NBC
- k-Nearest Neighbors
- Decision Trees and Rules
- Large size data

Slide 2: Models and Patterns
- A model is a global description of data, or an abstract representation of a real-world process
  – Estimating the parameters of a model
  – Data-driven model building
  – Examples: regression, graphical models (BN), HMM
- A pattern describes some local aspect of the data
  – Patterns in data matrices: predicates such as (age < 40) ^ (income < 10)
  – Patterns for strings (ASCII characters, DNA alphabet)
  – Pattern discovery: rules

Slide 3: Performance Measures
- Generality: how many instances are covered?
- Applicability: is it useful? ("All husbands are male" is general and accurate but not very useful.)
- Accuracy: is it always correct? If not, how often?
- Comprehensibility: is it easy to understand? (a subjective measure)

Slide 4: Forms of Knowledge
- Concepts
  – Probabilistic, logical (proposition/predicate), functional
- Rules
- Taxonomies and hierarchies
  – Dendrograms, decision trees
- Clusters
- Structures and weights/probabilities
  – ANN, BN

Slide 5: Induction from Data
- Inferring knowledge from data: generalization
- Supervised vs. unsupervised learning
  – Some graphical illustrations of learning tasks (regression, classification, clustering)
  – Any other types of learning?
- The task of deduction: infer information that is a logical consequence of what is stored in a database
  – Who taught this class before? Which courses are attended by Mary?
  – Deductive databases: extending the RDBMS

Slide 6: What is a bad classifier?
- One of the simplest classifiers: table lookup
  – What if x cannot be found in the training data? Do we give up?
  – Or, we can …
- A simple classifier Cs can be built as a reference (see the sketch below)
  – If x is found in the table (training data), return its class; otherwise, what should it return?
  – A bad classifier is one that does worse than Cs.
- Do we need to learn a classifier for data of one class?
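A minimal Python sketch of such a reference classifier Cs. The class name and the majority-class fallback are illustrative assumptions; the slide deliberately leaves "what should it return?" open.

```python
from collections import Counter

class LookupClassifier:
    """Reference classifier Cs: memorize the training data, fall back on a default.

    The fallback used here (the majority class of the training data) is one
    possible answer to the slide's open question, assumed for illustration.
    """
    def fit(self, X, y):
        self.table = {tuple(x): label for x, label in zip(X, y)}
        self.default = Counter(y).most_common(1)[0][0]  # majority class
        return self

    def predict(self, x):
        # Return the memorized class if x was seen, otherwise the default.
        return self.table.get(tuple(x), self.default)

# Usage: any learned classifier should do better than Cs to be worth keeping.
# cs = LookupClassifier().fit(X_train, y_train); cs.predict(x_new)
```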

Slide 7: Many Techniques
- Decision trees
- Linear regression
- Neural networks
- k-nearest neighbour
- Naïve Bayesian classifiers
- Support Vector Machines
- and many more …

Slide 8: Regression for Numeric Prediction
- Linear regression is a statistical technique that applies when the class and all the attributes are numeric.
- y = α + βx, where α and β are the regression coefficients
- We use the training instances to find α and β
  – by minimizing the sum of squared errors (least squares; see the sketch below)
  – SSE = Σ_i (y_i − y_i′)² = Σ_i (y_i − (α + βx_i))²
- Extensions
  – Multiple regression
  – Piecewise linear regression
  – Polynomial regression
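As an illustration of the least-squares step, a small Python sketch using the standard closed-form estimates (the function name and the sample data are made up for the example):

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates of alpha and beta for y = alpha + beta * x.

    Minimizes SSE = sum((y_i - (alpha + beta * x_i)) ** 2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Example: a, b = fit_simple_regression([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.1])
```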

Slide 9: Nearest Neighbor
- Also called instance-based learning
- Algorithm (see the sketch below)
  – Given a new instance x,
  – find its nearest neighbor <x', y'> in the training data,
  – return y' as the class of x
- Distance measures
  – Normalization?!
- Some interesting questions
  – What is its time complexity?
  – Does it learn?
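A minimal 1-NN sketch in Python, assuming numeric attributes already normalized to comparable ranges (names are illustrative):

```python
import math

def nearest_neighbor_predict(x, train_X, train_y):
    """1-NN: return the class of the training instance closest to x.

    Euclidean distance is one common choice of distance measure.
    """
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    best_i = min(range(len(train_X)), key=lambda i: dist(x, train_X[i]))
    return train_y[best_i]
```

Each prediction scans all n training instances, which is where the time-complexity question points: roughly O(n·d) per query, while "learning" amounts to simply storing the data.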

Slide 10: Nearest Neighbor (2)
- Dealing with noise: k-nearest neighbor
  – Use more than one neighbor
  – How many neighbors?
  – Weighted nearest neighbors (see the sketch below)
- How to speed it up?
  – Huge storage
  – Use representatives (a problem of instance selection)
    - Sampling
    - Grid
    - Clustering
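One way to realize the weighted variant, sketched under the assumption that each neighbor's vote is weighted by the inverse of its distance (the slide does not fix a weighting scheme):

```python
import math
from collections import defaultdict

def knn_predict(x, train_X, train_y, k=3, weighted=True):
    """k-NN with optional distance weighting (weight = 1 / (distance + eps))."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    # Take the k closest training instances.
    neighbors = sorted(zip(train_X, train_y), key=lambda p: dist(x, p[0]))[:k]
    votes = defaultdict(float)
    for xi, yi in neighbors:
        votes[yi] += 1.0 / (dist(x, xi) + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)
```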

Slide 11: Naïve Bayes Classification
- A direct application of Bayes' rule: P(C|x) = P(x|C)P(C)/P(x), where x is a vector (x1, x2, …, xn)
- With the true P(x|C) and P(C), that is the best classifier you can ever build
  – You don't even need to select features; it takes care of that automatically
- But there are problems
  – Only a limited number of instances is available
  – How do we estimate P(x|C)?

Slide 12: NBC (2)
- Assume conditional independence between the xi's
- We then have P(C|x) ∝ P(x1|C) P(x2|C) … P(xn|C) P(C)
- How good is it in reality?
- Let's build an NBC for a very simple data set (next slide)
  – Estimate the priors and conditional probabilities from the training data
  – P(C=1) = ? P(C=2) = ?
  – What is the class for x = (1,2,1)?
    P(C=1|x) ∝ P(x1=1|C=1) P(x2=2|C=1) P(x3=1|C=1) P(C=1), and similarly
    P(C=2|x) ∝ P(x1=1|C=2) P(x2=2|C=2) P(x3=1|C=2) P(C=2)
  – What is the class for (1,2,2)?

Slide 13: Example of NBC
Training data (7 instances: 4 of class 1, 3 of class 2):
  A1 A2 A3 | C
   1  2  1 | 1
   0  0  1 | 1
   2  1  2 | 2
   1  2  1 | 2
   0  1  2 | 1
   2  2  2 | 2
   1  0  1 | 1
Class counts per attribute value, written as (C=1, C=2):
  A1: 0 -> (2, 0), 1 -> (2, 1), 2 -> (0, 2)
  A2: 0 -> (2, 0), 1 -> (1, 1), 2 -> (1, 2)
  A3: 1 -> (3, 1), 2 -> (1, 2)
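A small sketch that estimates the priors and conditionals from the table above using plain maximum-likelihood counts (no smoothing) and scores the two query instances from the previous slide; names are illustrative:

```python
from collections import Counter

# Training data reconstructed from the slide: rows are (A1, A2, A3, class).
data = [(1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
        (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1)]

def nbc_scores(x):
    """Return unnormalized P(C|x) per class, using maximum-likelihood
    estimates of the prior and the per-attribute conditionals."""
    class_counts = Counter(row[-1] for row in data)
    n = len(data)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / n                                  # P(C = c)
        for j, v in enumerate(x):                        # product of P(xj = v | C = c)
            n_cv = sum(1 for row in data if row[-1] == c and row[j] == v)
            score *= n_cv / n_c
        scores[c] = score
    return scores

print(nbc_scores((1, 2, 1)))   # compare the two scores to decide the class
print(nbc_scores((1, 2, 2)))
```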

Slide 14: Golf Data
(The golf training table shown on this slide is not preserved in the transcript.)

Slide 15: Decision Trees
A decision tree for the golf data:
  Outlook
    sunny    -> Humidity
                  high   -> NO
                  normal -> YES
    overcast -> YES
    rain     -> Wind
                  strong -> NO
                  weak   -> YES

Slide 16: How to "grow" a tree?
- Randomly? → Random Forests (Breiman, 2001)
- What are the criteria for building a tree?
  – Accurate
  – Compact
- A straightforward way to grow one (sketched below):
  – Pick an attribute
  – Split the data according to its values
  – Recursively repeat the first two steps until
    - no data are left, or
    - no features are left
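A Python sketch of this recursive procedure. The (features_dict, class_label) row format and the pluggable choose_attribute hook are assumptions for illustration; a purity check is added as an extra stopping condition, which is what the later heuristic aims for.

```python
from collections import Counter

def grow_tree(rows, attributes, choose_attribute):
    """Grow a tree recursively: pick an attribute, split on its values,
    recurse until no data or no attributes remain (or the node is pure).

    `choose_attribute(rows, attributes)` stands in for the selection
    criterion (random for Random Forest-style trees, information gain later).
    """
    if not rows:                                    # no data left
        return None
    classes = [label for _, label in rows]
    if not attributes or len(set(classes)) == 1:    # no features left, or pure node
        return Counter(classes).most_common(1)[0][0]
    attr = choose_attribute(rows, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in {feats[attr] for feats, _ in rows}:
        subset = [(feats, label) for feats, label in rows if feats[attr] == value]
        branches[value] = grow_tree(subset, remaining, choose_attribute)
    return (attr, branches)
```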

Slide 17: Discussion
- There are many possible trees
  – let's try it on the golf data
- How do we find the most compact tree that is consistent with the data?
- Why the most compact?
  – Occam's razor principle
- The issue of efficiency vs. optimality
  – One attribute at a time, or …

Slide 18: Grow a good tree efficiently
- The heuristic: find commonality in the feature values associated with the class values
  – to build a compact tree generalized from the data
  – that is, for each feature, check which of its splits can lead to pure leaf nodes
- Is it a good heuristic?
  – What do you think?
  – How do we judge it?
  – Is it really efficient?
  – How do we implement it?

Slide 19: Measuring the purity of a data set – Entropy
- Information gain (see the brief review)
- Choose the feature with the maximum gain (a sketch follows below)
- Let's grow one: Outlook (7, 7) → Sun (5), Rain (5), OCa (4)
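A possible Python sketch of entropy and information gain, using the same (features_dict, class_label) row format assumed in the tree-growing sketch above:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    """Gain = entropy(parent) - weighted entropy of the children after
    splitting on `attr`."""
    labels = [label for _, label in rows]
    parent = entropy(labels)
    n = len(rows)
    children = 0.0
    for value in {feats[attr] for feats, _ in rows}:
        subset = [label for feats, label in rows if feats[attr] == value]
        children += (len(subset) / n) * entropy(subset)
    return parent - children

# Choosing the split: best = max(attributes, key=lambda a: information_gain(rows, a))
```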

Slide 20: Different numbers of values
- Different attributes can have different numbers of values
- Some treatments
  – Removing useless attributes before learning
  – Binarization
  – Discretization
- Gain ratio is another practical solution (sketched below)
  – Gain = root-Info − Info_Attribute(i)
  – Split-Info = −Σ_i (|T_i|/|T|) log2(|T_i|/|T|)
  – Gain-ratio = Gain / Split-Info
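A small gain-ratio sketch, reusing information_gain from the entropy sketch above and assuming the same row format:

```python
import math

def split_info(rows, attr):
    """Split information: entropy of the partition sizes induced by `attr`."""
    n = len(rows)
    sizes = {}
    for feats, _ in rows:
        sizes[feats[attr]] = sizes.get(feats[attr], 0) + 1
    return -sum((s / n) * math.log2(s / n) for s in sizes.values())

def gain_ratio(rows, attr):
    """Gain ratio = information gain / split info (guarding against zero)."""
    si = split_info(rows, attr)
    return information_gain(rows, attr) / si if si > 0 else 0.0
```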

Slide 21: Another kind of problem
- A difficult problem. Why is it difficult?
  – Splitting on any single attribute gives no information gain here.
- Similar ones are the parity and majority problems.
- The XOR problem:
    x1 x2 | class
     0  0 |   0
     0  1 |   1
     1  0 |   1
     1  1 |   0

Slide 22: Tree Pruning
- An effective approach to avoid overfitting the data and to obtain a more compact tree (easier to understand)
- Two general ways to prune
  – Pre-pruning: stop splitting further
    - e.g., when there is no significant difference in classification accuracy before and after a division
  – Post-pruning: grow the full tree, then trim it back

Slide 23: Rules from Decision Trees
- Two types of rule sets
  – Order sensitive (more compact, less efficient)
  – Order insensitive
- The most straightforward way is …
- Class-based method
  – Group rules according to their classes
  – Select the most general rules (or remove redundant ones)
- Data-based method
  – Select one rule at a time (keep the most general one)
  – Work on the remaining data until all data are covered

Slide 24: Variants of Decision Trees and Rules
- Tree stumps
- Holte's 1R rules (1993)
  – For each attribute:
    - sort the data according to the attribute's values v
    - find the most frequent class value c for each v (breaking ties by coin flipping)
    - this gives a rule set of the form "if v then c"
  – Output the most accurate rule set (see the sketch below)
  – An example (the golf data)
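A sketch of 1R under the same assumed row format; note that ties are broken here by counter order rather than by an explicit coin flip:

```python
from collections import Counter, defaultdict

def one_r(rows):
    """Holte's 1R: for each attribute, map each of its values to its most
    frequent class; return the attribute whose rule set makes the fewest
    errors on the training data, along with that rule set."""
    best_attr, best_rules, best_errors = None, None, None
    attributes = rows[0][0].keys()
    for attr in attributes:
        counts = defaultdict(Counter)                  # value -> class counts
        for feats, label in rows:
            counts[feats[attr]][label] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(label != rules[feats[attr]] for feats, label in rows)
        if best_errors is None or errors < best_errors:
            best_attr, best_rules, best_errors = attr, rules, errors
    return best_attr, best_rules
```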

Slide 25: Handling Large Size Data
- When the data simply cannot fit in memory …
  – Is it a big problem?
- Three representative approaches
  – Smart data structures to avoid unnecessary recalculation
    - Hash trees
    - SPRINT
  – Sufficient statistics
    - AVC-set (Attribute-Value, Class label) summarizing the class distribution for each attribute (see the sketch below)
    - Example: RainForest
  – Parallel processing
    - Make the data parallelizable
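A minimal sketch of building AVC-sets as per-attribute (value, class) count tables, assuming the same row format as earlier; in a real out-of-core setting the loop would stream rows from disk instead of holding them all in memory.

```python
from collections import defaultdict

def build_avc_sets(rows):
    """Build one AVC-set per attribute: counts of (attribute value, class label).

    These counts are sufficient statistics for evaluating splits, so one
    sequential pass over the data is all that is needed.
    """
    avc = defaultdict(lambda: defaultdict(int))        # attr -> (value, class) -> count
    for feats, label in rows:                          # one pass over the data
        for attr, value in feats.items():
            avc[attr][(value, label)] += 1
    return {attr: dict(counts) for attr, counts in avc.items()}
```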

Slide 26: Ensemble Methods
- A group of classifiers
  – Hybrid (stacking)
  – Single type
- Strong vs. weak learners
- A good ensemble needs
  – Accuracy
  – Diversity
- Some major approaches to forming ensembles (bagging is sketched below)
  – Bagging
  – Boosting
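A bagging sketch; `learn` is a placeholder for any base learner that returns an object with a predict(x) method (an assumption for illustration, e.g. a decision-tree learner):

```python
import random
from collections import Counter

def bagging(rows, learn, n_models=10):
    """Bagging: train each model on a bootstrap sample (drawn with replacement)
    of the training data, then predict by majority vote."""
    models = []
    for _ in range(n_models):
        sample = [random.choice(rows) for _ in range(len(rows))]   # bootstrap sample
        models.append(learn(sample))

    def predict(x):
        votes = Counter(m.predict(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict
```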

Slide 27: Bibliography
- I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.
- M. Kantardzic. Data Mining: Concepts, Models, Methods, and Algorithms. IEEE Press, 2003.
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
- D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001.
- T.G. Dietterich. Ensemble Methods in Machine Learning. In J. Kittler and F. Roli (eds.), 1st International Workshop on Multiple Classifier Systems, pp. 1-15, Springer-Verlag, 2000.

