Classification using Decision Trees 1.Data Mining and Information 2.Data Mining and Machine Learning Techniques 3.Decision trees and C5 4.Applications.

1 Classification using Decision Trees
1. Data Mining and Information
2. Data Mining and Machine Learning Techniques
3. Decision trees and C5
4. Applications
5. Plan for this week

2 Data Mining and Information
Any result should answer a practical or theoretical question. In most applications, results must be interpretable to be useful. Data mining is the process of finding, interpreting, and evaluating patterns in large sets of data.

3 Data Mining and Machine Learning Techniques
Machine learning programs adapt their behavior with experience. To “learn” is to be trained by data with a set of well-defined instructions: machine learning algorithms. Data mining tools are supplements to, rather than substitutes for, human knowledge and intuition. The objective of running a learning algorithm on the data is to find patterns or trends that aid in understanding the data.

4 Model Classification by Outcome
Predictive; Classifier; Knowledge Based; Expert systems; Fuzzy systems; Evolutionary programs; Neural network; Expert systems; Genetic algorithm; Mathematical; Regression analysis; Correlation; Adaptive learning; Cluster analysis; Classification and Decision Trees (CART, C5, QUEST); Self-Organizing maps

5 Classification Problem
Given a dataset D and a class label C, find a classifier d whose misclassification rate is minimized.
Goal: to produce an accurate classifier and to understand the structure of the problem.
Requirements: high accuracy, interpretability, and fast construction on very large training data.
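The misclassification rate that the classifier d should minimize can be made concrete with a small sketch. The toy records, the attribute names, and the helper `misclassification_rate` are hypothetical and chosen only for illustration; they are not from the slides.

```python
def misclassification_rate(classifier, dataset):
    """Fraction of (record, label) pairs that the classifier labels incorrectly."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    errors = sum(1 for record, label in dataset if classifier(record) != label)
    return errors / len(dataset)


# Hypothetical toy dataset D: each record is a dict of attributes plus its class label.
D = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "stay"),
    ({"outlook": "rainy", "windy": False}, "stay"),
    ({"outlook": "rainy", "windy": True}, "stay"),
]


def d(record):
    """A hypothetical classifier that looks only at the 'windy' attribute."""
    return "stay" if record["windy"] else "play"


rate = misclassification_rate(d, D)
print(rate)  # d gets the third record wrong, so the rate is 1 error out of 4
```

A better classifier for this toy data would also test "outlook", which is the kind of structure a decision tree is meant to find.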

6 Decision Trees
A decision tree T encodes d (a classifier) in the form of a tree.
- Internal node: a binary or k-ary split
- Leaf node: labeled with one class label
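One way to picture the two node types is the following minimal sketch. The class names `Leaf` and `Internal`, the function `classify`, and the example attributes are assumptions made for illustration, not part of C5 or any other package.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Leaf node: carries exactly one class label."""
    label: str


@dataclass
class Internal:
    """Internal node: tests one attribute and has one child per attribute value (k-ary split)."""
    attribute: str
    children: dict = field(default_factory=dict)


def classify(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while isinstance(node, Internal):
        node = node.children[record[node.attribute]]
    return node.label


# Hypothetical tree: split on "outlook" first, then on "windy" for sunny days.
tree = Internal("outlook", {
    "sunny": Internal("windy", {False: Leaf("play"), True: Leaf("stay")}),
    "rainy": Leaf("stay"),
})

print(classify(tree, {"outlook": "sunny", "windy": False}))
```

A binary split is simply the special case in which every internal node has two children.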

8 Decision Tree Construction
Top-down tree construction schema:
- Examine the training data and find the best splitting attribute for the root node
- Partition the training data
- Recur on each child node

9 Decision Tree Construction (contd.)
BuildTree (Node t, Training data D, Split Selection Method S)
(1) Apply S to D to find splitting criterion
(2) If (t is not a leaf node)
(3)     create children nodes of t
(4)     partition D into children partitions
(5)     recur on each partition
(6) Endif
Three algorithmic components:
- Split selection (C5, CART, QUEST, …)
- Pruning
- Data access
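The BuildTree schema can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used by C5, CART, or QUEST: the nested-dict tree format, the helper names, and the placeholder split selection method (which simply takes the first attribute that still has more than one value) are all hypothetical. Pruning and data access are not covered.

```python
from collections import Counter


def majority_label(data):
    """Most frequent class label among the (record, label) pairs."""
    return Counter(label for _, label in data).most_common(1)[0][0]


def first_useful_attribute(data, attributes):
    """Placeholder for S: the first attribute that takes more than one value in data."""
    for a in attributes:
        if len({record[a] for record, _ in data}) > 1:
            return a
    return None


def build_tree(data, attributes, select_split=first_useful_attribute):
    labels = {label for _, label in data}
    # (1) Apply S to find the splitting criterion (skipped if the node is already pure).
    attribute = select_split(data, attributes) if len(labels) > 1 else None
    # (2) If no split is possible or needed, t is a leaf node.
    if attribute is None:
        return {"leaf": majority_label(data)}
    # (3)-(4) Create one child per attribute value and partition the data.
    partitions = {}
    for record, label in data:
        partitions.setdefault(record[attribute], []).append((record, label))
    remaining = [a for a in attributes if a != attribute]
    # (5) Recur on each partition.
    return {
        "split": attribute,
        "children": {
            value: build_tree(part, remaining, select_split)
            for value, part in partitions.items()
        },
    }


# Hypothetical training data.
D = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "stay"),
    ({"outlook": "rainy", "windy": False}, "stay"),
    ({"outlook": "rainy", "windy": True}, "stay"),
]
tree = build_tree(D, ["outlook", "windy"])
print(tree)
```

Passing a different `select_split` function is where an impurity-based or model-based method would plug in.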

10 Split Selection Methods
- Impurity-based split selection: CART, C5 (most common in today’s data mining tools)
- Model-based split selection: QUEST (quick, unbiased, efficient, statistical tree; Loh and Shih, 1997; freeware, available at www.stat.wisc.edu/~loh)
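To illustrate the impurity-based idea, the sketch below scores each candidate attribute by the weighted impurity of the partitions it produces and picks the lowest. The Gini index is the measure usually associated with CART, and entropy underlies the information-gain criteria of Quinlan's trees; the exact criteria inside those packages differ in detail and are not reproduced here. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits (0 for a pure node)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(data, attribute, impurity=gini):
    """Impurity of the child partitions, weighted by partition size."""
    partitions = {}
    for record, label in data:
        partitions.setdefault(record[attribute], []).append(label)
    n = len(data)
    return sum(len(p) / n * impurity(p) for p in partitions.values())


def best_attribute(data, attributes, impurity=gini):
    """Pick the attribute whose split leaves the least impurity."""
    return min(attributes, key=lambda a: split_impurity(data, a, impurity))


# Hypothetical data in which "outlook" separates the classes perfectly.
D = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "play"),
    ({"outlook": "rainy", "windy": False}, "stay"),
    ({"outlook": "rainy", "windy": True}, "stay"),
]
print(best_attribute(D, ["outlook", "windy"]))
```

Here splitting on "outlook" gives two pure partitions, while splitting on "windy" leaves both partitions evenly mixed, so "outlook" is selected under either measure.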

11 Decision Trees and C5
Decision trees are one of the data mining methods commonly reported in the literature. C5 is a software package based on the decision tree method of J. R. Quinlan. One major advantage of decision trees over other machine learning techniques is that they produce models (rules) that can be interpreted by humans. To learn more about Rule Induction …
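The link between a tree and human-readable rules can be sketched by turning every root-to-leaf path into one IF-THEN rule. This is only an illustration of the general idea, not C5's own rule-generation procedure; the nested-dict tree format and the function `extract_rules` are assumptions.

```python
def extract_rules(node, conditions=()):
    """Return one IF-THEN rule per root-to-leaf path of a nested-dict tree."""
    if "leaf" in node:
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN class = {node['leaf']}"]
    rules = []
    for value, child in node["children"].items():
        rules.extend(extract_rules(child, conditions + ((node["split"], value),)))
    return rules


# Hypothetical tree in the assumed format: {"split": attribute, "children": {...}} or {"leaf": label}.
tree = {
    "split": "outlook",
    "children": {
        "sunny": {
            "split": "windy",
            "children": {False: {"leaf": "play"}, True: {"leaf": "stay"}},
        },
        "rainy": {"leaf": "stay"},
    },
}

rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

Each printed rule can be read and checked by a domain expert without any knowledge of how the tree was built.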

12 CSUS Access to C5
- Log in to quad
- Change directory to /opt/C50Release1
- Read the “ReadMe” file for example and format requirements
- You are ready to use C5
An example of a C5 application

13 Extracting Knowledge from Gene Expression Data: A Case Study of Batten Disease
S. M. Lin (Duke University Medical Center) proposed a prototype KDD system to enable scientists to analyze massive microarray data, form hypotheses, and draw insights directly into the underlying mechanisms of diseases.
Data -> Microarray database -> data mining -> patterns -> human experts -> Genomics knowledge base -> discoveries

14 Plan for this week
Monday (Lu, Dunham part II)
- DT-based: 1R, ID3, C5, CART
- Rule-generating: Prism
Wednesday (Han-ch7, Dunham-part II)
- Statistics-based: Regression (D), Naïve Bayes
- Distance-based: KNN (D)
- ANN

