DATA MINING TECHNIQUES (DECISION TREES) Presented by: Shweta Ghate, MIT College of Engineering.



What is Data Mining?
Data mining automates the search for patterns in data. It is the discovery of hidden knowledge, unexpected patterns, and new rules in large databases.

Data Mining Techniques
Key techniques:
- Association
- Classification
- Decision trees
- Clustering techniques
- Regression

Classification
Classification is one of the most familiar and most popular data mining techniques. Its applications include image and pattern recognition, loan approval, and fault detection in industrial applications. All approaches to classification assume some knowledge of the data. A training set is used to develop the specific parameters required by the technique. The goal of classification is to build a concise model that can be used to predict the class of records whose class label is not known.

Classification
Classification consists of assigning a class label to each case in a set of unclassified cases.
1. Supervised classification: the set of possible classes is known in advance.
2. Unsupervised classification: the set of possible classes is not known. After classification we can try to assign a name to each class. Unsupervised classification is called clustering.

Decision tree
- A classification scheme
- Generates a tree and a set of rules
- The set of records is divided into 2 subsets:
  - training set (used to derive the classifier)
  - test set (used to measure the accuracy of the classifier)
- Attributes are divided into 2 types:
  - numerical attributes
  - categorical attributes
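The training/test division described above can be sketched as follows. This is a minimal illustration, not the presenter's procedure: the function names `split_records` and `accuracy`, the 30% test fraction, and the toy records are all assumptions made for the example.

```python
import random


def split_records(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test) lists.

    test_fraction and seed are illustrative defaults, not values from the slides.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(classifier, test_set):
    """Fraction of test records whose predicted class equals the true label."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for attrs, label in test_set if classifier(attrs) == label)
    return correct / len(test_set)


# Made-up toy records: (attribute dict, class label).
records = ([({"outlook": "sunny"}, "play")] * 7
           + [({"outlook": "rain"}, "no play")] * 3)
training, test = split_records(records)
print(len(training), len(test))
```

The classifier is derived from `training` only; `test` is held back so that the measured accuracy reflects records the classifier has not seen.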

Decision tree
- A flow-chart-like tree structure
- An internal node denotes a test on an attribute
- A branch represents an outcome of the test
- Leaf nodes represent class labels, class distributions, or rules
Use of a decision tree: classifying an unknown sample
- Test the attribute values of the sample against the decision tree

Training Dataset

Output: A Decision Tree
OUTLOOK
- sunny: HUMIDITY
  - <=75: PLAY
  - >75: NO PLAY
- overcast: PLAY
- rain: WINDY
  - true: NO PLAY
  - false: PLAY

Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand

Output: A decision tree for deciding whether to play golf
RULE 1: If it is sunny and the humidity is not above 75%, then play.
RULE 2: If it is sunny and the humidity is above 75%, then don't play.
RULE 3: If it is overcast, then play.
RULE 4: If it is rainy and not windy, then play.
RULE 5: If it is rainy and windy, then don't play.

Example
The classification of an unknown input vector is done by traversing the tree from the root node to a leaf node. For example: given outlook = rain, temp = 70, humidity = 65, and windy = true, what is the value of the class attribute?

Tree construction principle
- Splitting attribute
- Splitting criterion
3 main phases:
- Construction phase
- Pruning phase
- Processing the pruned tree to improve understandability

The Generic Algorithm
Let the training data set be T with class labels {C1, C2, ..., Ck}. The tree is built by repeatedly partitioning the training data set. The process continues until all the records in a partition belong to the same class.

T is homogeneous
- T contains cases all belonging to a single class Cj.
- The decision tree for T is a leaf identifying class Cj.
T is not homogeneous
- T contains cases that belong to a mixture of classes.
- A test is chosen, based on a single attribute, that has one or more mutually exclusive outcomes {O1, O2, ..., On}.
- T is partitioned into subsets T1, T2, ..., Tn, where Ti contains all those cases in T that have outcome Oi of the chosen test.
- The decision tree for T consists of a decision node identifying the test, and one branch for each possible outcome.

- The same tree-building method is applied recursively to each subset of training cases.
- If n is taken as 2, a binary decision tree is generated.
T is trivial
- T contains no cases.
- The decision tree for T is a leaf, but the class to be associated with the leaf must be determined from information other than T.
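The three cases can be sketched as one recursive function. This is a simplified illustration under stated assumptions, not CART, ID3, or C4.5: attributes are categorical, `choose_test` is a hypothetical stand-in criterion (fewest misclassified cases after the split) rather than an information-based measure, the empty case falls back to the parent's majority class, and the toy cases are made up.

```python
from collections import Counter


def choose_test(cases, attributes):
    """Hypothetical criterion: pick the attribute whose split leaves the fewest
    cases outside the majority class of their subset."""
    def errors_after_split(attribute):
        subsets = {}
        for attrs, label in cases:
            subsets.setdefault(attrs[attribute], []).append(label)
        return sum(len(labels) - Counter(labels).most_common(1)[0][1]
                   for labels in subsets.values())
    return min(attributes, key=errors_after_split)


def build_tree(cases, attributes, default=None):
    """cases: list of (attribute dict, class label). Returns a leaf label or
    a decision node (attribute, {outcome: subtree})."""
    if not cases:
        # T is trivial: the class comes from outside T (assumed here to be
        # the majority class of the parent partition, passed in as default).
        return default
    labels = [label for _, label in cases]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        # T is homogeneous (or no attribute is left to test): make a leaf.
        return majority
    # T is not homogeneous: choose a test and partition T by its outcomes.
    attribute = choose_test(cases, attributes)
    subsets = {}
    for attrs, label in cases:
        subsets.setdefault(attrs[attribute], []).append((attrs, label))
    remaining = [a for a in attributes if a != attribute]
    # Because outcomes are taken from the cases themselves, the recursion
    # never produces an empty subset; the trivial case covers direct calls.
    return (attribute, {outcome: build_tree(subset, remaining, majority)
                        for outcome, subset in subsets.items()})


# Made-up toy cases for illustration only.
cases = [
    ({"outlook": "sunny", "windy": "false"}, "play"),
    ({"outlook": "sunny", "windy": "true"}, "play"),
    ({"outlook": "rain", "windy": "true"}, "no play"),
    ({"outlook": "rain", "windy": "false"}, "play"),
    ({"outlook": "overcast", "windy": "true"}, "play"),
]
tree = build_tree(cases, ["outlook", "windy"])
print(tree)
```

Each recursive call receives a smaller partition and one fewer attribute, so the recursion terminates once a partition is homogeneous or no attributes remain.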

Decision Tree Construction Algorithms
- CART (Classification And Regression Trees)
- ID3 (Iterative Dichotomiser 3)
- C4.5

Advantages
- Generate understandable rules
- Able to handle both numeric and categorical attributes
- Provide a clear indication of which fields are most important for prediction or classification

Weaknesses
- Some decision trees can only deal with binary-valued target classes.
- Others can assign records to an arbitrary number of classes, but are error-prone when the number of training examples per class gets small.
- The process of growing a decision tree is computationally expensive.

References
- ba-data-mining-techniques/index.html
- Data Mining: Concepts and Techniques (Chapter 7 slides for the textbook), Jiawei Han and Micheline Kamber, Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Canada
- Data Mining Techniques, second edition, Arun K. Pujari