Chapter 18 From Data to Knowledge


Decision Trees

Types of learning
- Classification: from examples labelled with a "class", create a decision procedure that will predict the class.
- Regression: from examples labelled with a real value, create a decision procedure that will predict the value.
- Unsupervised learning: from examples, generate groups that are interesting to the user.

Concerns
- Representational bias
- Generalization accuracy: is the learned concept correct? (the gold standard)
- Comprehensibility: e.g. a medical diagnosis should be explainable
- Efficiency of learning
- Efficiency of the learned procedure

Classification examples
- Medical records on patients with diseases.
- Bank loan records on individuals.
- DNA sequences corresponding to a "motif".
- Images of handwritten digits.
- See the UCI Machine Learning Repository for many more datasets.

Regression
- Stock histories -> future stock price
- Patient data -> internal heart pressure
- House data -> house value
Representation is key.

Unsupervised learning
- Astronomical maps -> groups of stars that astronomers found useful.
- Patient data -> new diseases (treatments depend on the correct disease class).
- Gene data -> co-regulated genes and transcription factors.
Often exploratory.

Weather data. Four features: outlook, windy (nominal); temperature, humidity (numeric); class: play. A decision tree learned from this data:

outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)
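A minimal sketch (not from the slides) of how a comparable tree can be grown with scikit-learn on a hand-entered copy of the classic weather data; the DataFrame, column names, and one-hot encoding step are illustrative assumptions, and the printed tree may differ in detail from the J48 output above.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hand-entered copy of the classic 14-example weather data.
data = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
                "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"],
    "temperature": [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71],
    "humidity":    [85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91],
    "windy": [False, True, False, False, False, True, True,
              False, False, False, True, True, False, True],
    "play":  ["no", "no", "yes", "yes", "yes", "no", "yes",
              "no", "yes", "yes", "yes", "yes", "yes", "no"],
})

# Nominal features need encoding for scikit-learn; one-hot is the simplest choice.
X = pd.get_dummies(data.drop(columns="play"))
y = data["play"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```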

Dumb DT algorithm (discrete features only)
Build tree:
- If all examples below the node are homogeneous (same class), stop.
- Else pick a feature at random, create a node for that feature, and form a subtree for each of the feature's values.
- Recurse on each subtree.
Will this work? (A sketch of the procedure follows below.)
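A minimal sketch of the "dumb" builder, assuming a list of (feature-dict, label) examples; this is an illustration, not code from the slides.

```python
import random
from collections import Counter

def build_dumb_tree(examples, features):
    """examples: list of (feature_dict, class_label); features: list of feature names."""
    classes = [label for _, label in examples]
    if len(set(classes)) == 1 or not features:        # homogeneous, or no features left
        return Counter(classes).most_common(1)[0][0]  # leaf: (majority) class
    feature = random.choice(features)                 # no quality measure at all
    remaining = [f for f in features if f != feature]
    tree = {"feature": feature, "children": {}}
    for value in {ex[feature] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[feature] == value]
        tree["children"][value] = build_dumb_tree(subset, remaining)
    return tree
```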

Properties of the dumb algorithm
- Complexity: the homogeneity check is O(data size) and splitting is O(data size); multiplying by the number of nodes in the tree gives a bound on the total work.
- Accuracy on the training set: perfect.
- Accuracy on the test set: not so perfect, close to random.

Recall: Iris

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Heuristic DT algorithm: entropy
- For a set S with mixed classes c1, c2, ..., ck: Entropy(S) = - sum_i pi * lg(pi), where pi is the (estimated) probability of class ci.
- To score a split, sum the weighted entropies of the subtrees, where each weight is the proportion of examples falling in that subtree.
- This defines a quality measure on features (see the sketch below).
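A minimal sketch of the entropy and weighted-entropy split score described above, for discrete features; the example data and function names are illustrative assumptions, not code from the slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """- sum_i p_i * lg(p_i) over the class distribution of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def split_score(examples, feature):
    """Weighted entropy of the subtrees produced by splitting on `feature`.
    Lower is better; information gain = entropy(parent) - split_score."""
    total = len(examples)
    by_value = {}
    for ex, label in examples:                 # examples: list of (feature_dict, label)
        by_value.setdefault(ex[feature], []).append(label)
    return sum(len(labels) / total * entropy(labels) for labels in by_value.values())

# Tiny illustration: scoring the outlook feature on four weather-style examples.
examples = [({"outlook": "sunny"}, "no"), ({"outlook": "sunny"}, "no"),
            ({"outlook": "overcast"}, "yes"), ({"outlook": "rainy"}, "yes")]
gain = entropy([label for _, label in examples]) - split_score(examples, "outlook")
print(f"information gain of outlook: {gain:.3f}")
```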

Shannon entropy
Entropy is the only function that:
- is 0 when only one class is present,
- is k when there are 2^k classes, equally represented,
- is "additive", i.e. E(X, Y) = E(X) + E(Y) if X and Y are independent.
Entropy is sometimes called uncertainty and sometimes information; the uncertainty is defined on a random variable whose "draws" are from the set of classes.

Shannon entropy properties
- The probability of guessing the state/class is 2^{-Entropy(S)}.
- Entropy(S) is the average number of yes/no questions needed to reveal the state/class.

Majority function
- Suppose 2n boolean features, with the class defined as "n or more of the features are on".
- How big is the decision tree? At least (2n choose n) leaves (a quick computation follows below).
- Prototype functions ("at least k of n conditions are true") are common medical concepts.
- Concepts that are prototypical do not match the representational bias of decision trees.
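A quick illustration (an assumption, not from the slides) of how fast the (2n choose n) lower bound on the number of leaves grows.

```python
from math import comb

# Lower bound on leaves for the majority function over 2n boolean features.
for n in (2, 5, 10, 20):
    print(f"2n = {2 * n:3d} features -> at least {comb(2 * n, n):,} leaves")
```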

DTs with real-valued attributes
Idea: convert to the solved (discrete) problem.
- For each real-valued attribute f with sorted values v1, v2, ..., vn, add binary features f1: f < (v1+v2)/2, f2: f < (v2+v3)/2, etc., i.e. thresholds at the midpoints between consecutive values (a sketch follows below).
- Other approaches are possible, e.g. fi: f < vj for any vj, so no sorting is needed.
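A minimal sketch of generating the candidate "f < threshold" tests at midpoints between consecutive sorted values; the humidity values reuse the weather data above, and the function name is an assumption for illustration.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values of one attribute."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

humidity = [85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91]
for t in candidate_thresholds(humidity):
    below = sum(v < t for v in humidity)
    print(f"humidity < {t:5.1f}: {below} examples satisfy the test")
```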

DTs -> rules (j48.part)
- For each leaf, we make a rule by collecting the tests on the path to the leaf, so the number of rules equals the number of leaves (a sketch follows below).
- Simplification: test each condition of a rule and see if dropping it harms accuracy.
- Can we go from rules back to DTs? Not easily. Hint: there is no obvious root.
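A minimal sketch of turning a tree into one rule per root-to-leaf path; the nested-dict tree format mirrors the build_dumb_tree sketch above and is an assumption, not the j48.part implementation.

```python
def tree_to_rules(tree, path=()):
    """Yield (conditions, class) pairs, one per leaf of a nested-dict tree."""
    if not isinstance(tree, dict):              # leaf: a class label
        yield list(path), tree
        return
    for value, subtree in tree["children"].items():
        yield from tree_to_rules(subtree, path + ((tree["feature"], value),))

# Example on a tiny hand-written tree.
tree = {"feature": "outlook",
        "children": {"sunny": {"feature": "windy",
                               "children": {True: "no", False: "yes"}},
                     "overcast": "yes",
                     "rainy": "yes"}}
for conditions, label in tree_to_rules(tree):
    tests = " AND ".join(f"{f} = {v}" for f, v in conditions) or "TRUE"
    print(f"IF {tests} THEN play = {label}")
```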

Summary
- Comprehensible if the tree is not large.
- Effective if a small number of features is sufficient; this is the representational bias.
- Handles multi-class problems naturally.
- Can be extended to regression.
- Easy to implement, with low complexity.