Data Mining with Decision Trees
Lutz Hamel, Dept. of Computer Science and Statistics, University of Rhode Island

What is Data Mining?
Data mining is the application of machine learning techniques to large databases in order to extract knowledge (KDD – Knowledge Discovery in Databases). This definition is no longer strictly true: data mining now encompasses other computational techniques outside the classic machine learning domain.

What is Machine Learning?
Programs that get better with experience, given a task and some performance measure.
– Learning to classify news articles
– Learning to recognize spoken words
– Learning to play board games
– Learning to classify customers

What is Knowledge?
• Structural descriptions of data (transparent)
– If-then-else rules
– Decision trees
• Models of data (non-transparent)
– Neural networks
– Clustering (Self-Organizing Maps, k-Means)
– Naïve Bayes classifiers

Why Data Mining?
Oversimplifying somewhat: queries allow you to retrieve existing knowledge from a database; data mining induces new knowledge from it.

Why Data Mining? (Cont.)
Example: give me a description of customers who spent more than $100 in my store.

Why Data Mining? (Cont.)
The Query:
• The only thing a query can do is give you a list of every single customer who spent more than $100.
• This is probably not very informative: you will most likely just see a long list of customer records.

Why Data Mining? (Cont.)
Data Mining Techniques:
• Data mining techniques allow you to generate structural descriptions of the data in question, i.e., induce new knowledge.
• In the case of rules this might look something like:
IF age < 35 AND car = MINIVAN THEN spent > $100

Why Data Mining? (Cont.)
• In principle, you could generate the same kind of knowledge you gain with data mining techniques using only queries (a rough sketch follows this list):
– look at the data set of customers who spent more than $100 and propose a hypothesis
– test this hypothesis against your data using a query
– if the query returns a non-null result set, then you have found a description of a subset of your customers
• This, however, is a time-consuming, undirected search.
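As a rough illustration of the query-versus-hypothesis-testing contrast, here is a hypothetical pandas sketch; the customer table and all column names are invented for this example:

```python
import pandas as pd

# Hypothetical customer data; columns and values are invented for illustration.
customers = pd.DataFrame({
    "age":   [28, 45, 33, 52, 30],
    "car":   ["MINIVAN", "SEDAN", "MINIVAN", "SUV", "MINIVAN"],
    "spent": [120, 80, 150, 95, 110],
})

# The query view: just retrieve the matching records.
big_spenders = customers[customers["spent"] > 100]

# The manual "mining by queries" loop: propose a hypothesis, then test it.
hypothesis = (customers["age"] < 35) & (customers["car"] == "MINIVAN")
support = customers[hypothesis & (customers["spent"] > 100)]
if not support.empty:
    print("Hypothesis describes", len(support), "customers")
```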

Decision Trees
• Decision trees are concept learning algorithms.
• Once a concept is acquired, the algorithm can classify objects according to this concept.
• Concept learning:
– acquiring the definition of a general category given a sample of positive and negative examples of the category;
– can be formulated as a search through a predefined space of potential concepts for the concept that best fits the training examples.
• Best-known algorithms: ID3, C4.5, CART

Example
MI – Myocardial Infarction (Source: “Neural Networks and Artificial Intelligence for Biomedical Engineering”, IEEE Press, 2000)
Below is a table of patients who have entered the emergency room complaining about chest pain, with two types of diagnoses: Angina and Myocardial Infarction. Question: can we generalize beyond this data?

Example (Cont'd)
C4.5 induces the following decision tree (decision surface) for the data:

Systolic Blood Pressure
  > 130  → Angina
  <= 130 → Myocardial Infarction

Definition of Concept Learning
• Given:
– A data universe X
– A sample set S, where S ⊆ X
– Some target concept c: X → {true, false}
– Labeled training examples D, where D = { ⟨s, c(s)⟩ | s ∈ S }
• Using D, determine:
– A function c′ such that c′(x) ≈ c(x) for all x ∈ X.
Notes:
– This is called supervised learning because of the necessity of labeled data provided by the trainer.
– Once we have determined c′, we can use it to make predictions on unseen elements of the data universe.

The Inductive Learning Hypothesis
Any function found to approximate the target concept well over a sufficiently large set of training examples will also approximate the target concept well over other, unobserved examples. In other words, we are able to generalize beyond what we have seen.

Recasting our Example as a Concept Learning Problem
• The data universe X consists of ordered pairs of the form ⟨Systolic Blood Pressure, White Blood Count⟩.
• The sample set S ⊆ X is the table of value pairs we are given.
• Target concept: Diagnosis: X → {Angina, MI}
• Training examples: the table D = { ⟨s, Diagnosis(s)⟩ | s ∈ S }
• Find a function Diagnosis′ that best describes D.

Recasting our Example as a Concept Induction Problem
A definition of the learned function Diagnosis′:

Diagnosis′(Systolic Blood Pressure, White Blood Count) =
  IF Systolic Blood Pressure > 130 THEN Diagnosis′ = Angina
  ELSE IF Systolic Blood Pressure <= 130 THEN Diagnosis′ = MI
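A minimal Python sketch of this learned function (the slide gives only the IF/THEN rule; the function and parameter names are ours):

```python
def diagnosis(systolic_blood_pressure: float, white_blood_count: float) -> str:
    """Learned classifier induced by C4.5 from the patient table."""
    # White blood count is ignored: the induced tree splits only on
    # systolic blood pressure.
    if systolic_blood_pressure > 130:
        return "Angina"
    return "MI"  # Myocardial Infarction

print(diagnosis(142, 8.3))  # -> Angina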

Decision Tree Representation
We can represent the learned function as a tree:
• Each internal node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a classification

Systolic Blood Pressure
  > 130  → Angina
  <= 130 → Myocardial Infarction
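One way to make the representation concrete is a nested-dict encoding in Python (a hypothetical encoding of ours, not the slides' notation): internal nodes carry an attribute and branches, leaves are plain classification strings.

```python
# Internal node: attribute to test, plus one subtree per branch condition.
# Leaf node: a plain classification string.
tree = {
    "attribute": "Systolic Blood Pressure",
    "branches": {
        "> 130": "Angina",
        "<= 130": "Myocardial Infarction",
    },
}

def classify(tree, example):
    """Walk the tree: test the node's attribute, follow the matching branch."""
    if isinstance(tree, str):          # leaf: return the classification
        return tree
    value = example[tree["attribute"]]
    for condition, subtree in tree["branches"].items():
        op, threshold = condition.split()
        if (op == ">" and value > float(threshold)) or \
           (op == "<=" and value <= float(threshold)):
            return classify(subtree, example)

print(classify(tree, {"Systolic Blood Pressure": 125}))  # -> Myocardial Infarction
```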

Entropy
• S is a sample of training examples
• p₊ is the proportion of positive examples in S
• p₋ is the proportion of negative examples in S
• Entropy measures the impurity (randomness) of S:

Entropy(S) = −p₊ log₂ p₊ − p₋ log₂ p₋
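A direct translation into Python (names are ours; by convention 0 · log₂ 0 is taken to be 0):

```python
from math import log2

def entropy(positive: int, negative: int) -> float:
    """Entropy of a two-class sample from its class counts (0 = pure, 1 = 50/50)."""
    total = positive + negative
    result = 0.0
    for count in (positive, negative):
        if count:                      # skip empty classes: 0 * log2(0) := 0
            p = count / total
            result -= p * log2(p)
    return result

print(entropy(9, 5))   # ~0.940, the classic 9+/5- example value from Mitchell
```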

Top-down Induction of Decision Trees
• Recursive algorithm
• Main loop:
– Let A be the attribute whose split minimizes the expected entropy at the current node (equivalently, maximizes information gain)
– For each value of A, create a new descendant of the node
– Sort the training examples to the leaf nodes
– If the training examples are classified satisfactorily, then STOP; else iterate over the new leaf nodes.
(A compact Python sketch of this loop follows.)
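Below is a minimal sketch of the recursive loop, for discrete-valued attributes only (function and variable names are ours, not the slides'); it relies on an information_gain helper sketched after the next slide, and it stops when a node is pure or no attributes remain:

```python
from collections import Counter

def id3(examples, attributes, target):
    """Recursively build a decision tree from a list of dict-valued examples."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                 # pure node: stop
        return labels[0]
    if not attributes:                        # no tests left: majority vote
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the highest information gain
    # (equivalently, the lowest expected entropy after the split).
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree["branches"][value] = id3(subset, remaining, target)
    return tree
```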

Information Gain
Gain(S, A) = the expected reduction in entropy due to sorting (splitting) on A. In other words, Gain(S, A) measures the information provided about the target concept by knowing the value of attribute A.
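Concretely, the standard definition (following Mitchell, Machine Learning, Ch. 3) is

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of S for which attribute A has value v. A matching Python helper (names are ours; this is the information_gain used by the id3 sketch above, generalized here to any number of classes):

```python
from collections import Counter
from math import log2

def label_entropy(labels):
    """Entropy of a list of class labels (any number of classes)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Expected reduction in entropy from partitioning on `attribute`."""
    before = label_entropy([ex[target] for ex in examples])
    after = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        after += len(subset) / len(examples) * label_entropy(subset)
    return before - after
```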

Training, Evaluation and Prediction
We know how to induce classification rules from the data, but:
• How do we measure performance?
• How do we use our rules to do prediction?

Training & Evaluation
The simplest method of measuring performance is the hold-out method:
• Given labeled data D, we divide the data into two sets:
– A hold-out (test) set D_h of size h
– A training set D_t = D − D_h
• The error of the induced function c′_t is given by:

error_h(c′_t) = (1/h) · Σ_{⟨s, c(s)⟩ ∈ D_h} δ(c′_t(s), c(s))

where δ(p, q) = 1 if p ≠ q and 0 otherwise.
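A direct Python rendering of this error measure (names are ours; the hold-out set is a list of (instance, label) pairs, and the usage example reuses the diagnosis sketch from earlier with invented labels):

```python
def holdout_error(classifier, holdout):
    """Hold-out error: fraction of (s, c(s)) pairs that the classifier gets wrong."""
    mistakes = sum(1 for s, label in holdout if classifier(s) != label)
    return mistakes / len(holdout)

# Hypothetical usage with the diagnosis sketch from earlier:
holdout_set = [((142, 8.3), "Angina"), ((118, 12.1), "MI"), ((150, 9.0), "MI")]
print(holdout_error(lambda s: diagnosis(*s), holdout_set))  # -> 0.333...
```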

Training & Evaluation
• However, since we trained and evaluated the learner on a finite set of data, we want to know what the confidence interval is.
• We can compute the 95% confidence interval of error_h as follows:
– Assume that the hold-out set D_h has h ≥ 30 members.
– Assume that each d in D_h has been selected independently and according to the probability distribution over the domain.
• Then:

error_h ± 1.96 · √( error_h · (1 − error_h) / h )
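In Python, this interval is a one-liner (names are ours; the approximation is only considered valid for h ≥ 30, as the slide assumes):

```python
from math import sqrt

def confidence_interval_95(error: float, h: int):
    """Approximate 95% confidence interval for the true error (valid for h >= 30)."""
    margin = 1.96 * sqrt(error * (1 - error) / h)
    return (error - margin, error + margin)

print(confidence_interval_95(0.10, 50))  # -> roughly (0.017, 0.183)
```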

Prediction  As we have said earlier, the induced function c ’  c, that is, the induced function is an estimate of the target concept.  Therefore, we can use c ’ to estimate (predict) the label for any unseen instance x  X with an appropriate accuracy.

Summary
• Data mining is the application of machine learning algorithms to large databases in order to induce new knowledge.
• Machine learning can be viewed as a directed search over the space of all possible descriptions of the training data for the description that best fits the data set and also generalizes well to unseen instances.
• Decision trees are concept learning algorithms that learn classification functions.