Next Generation Techniques: Trees, Network and Rules

Presentation transcript:


What is a Decision Tree?
A decision tree is a predictive model that can be viewed as a tree. Specifically, each branch of the tree is a classification question, and the leaves of the tree are partitions of the dataset with their classification.

What is a Decision Tree? (cont’d)
Some interesting things about the tree: It divides up the data at each branch point without losing any of the data. The number of records in each class (for example, churners and non-churners in a customer churn model) is conserved as you move up or down the tree. It is pretty easy to understand how the model is being built. It is pretty easy to use this model.
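The conservation property can be checked on a toy example. The sketch below is illustrative only: the records, the field order, and the two split questions are invented for the example and do not come from the presentation.

```python
from collections import Counter

# Hypothetical customer records: (monthly_minutes, years_as_customer, label)
records = [
    (120, 1, "churn"),
    (300, 5, "stay"),
    (80, 2, "churn"),
    (450, 7, "stay"),
    (200, 1, "churn"),
    (350, 3, "stay"),
    (90, 6, "stay"),
]

def split(rows, index, threshold):
    """Partition rows with a yes/no question: is rows[index] <= threshold?"""
    yes = [r for r in rows if r[index] <= threshold]
    no = [r for r in rows if r[index] > threshold]
    return yes, no

def label_counts(rows):
    """Count how many rows carry each class label."""
    return Counter(r[-1] for r in rows)

# Root question, then a second question on the low-usage branch.
low_usage, high_usage = split(records, 0, 200)
new_low, old_low = split(low_usage, 1, 2)
leaves = [new_low, old_low, high_usage]

# Adding up the leaves recovers the class counts at the root.
leaf_total = sum((label_counts(leaf) for leaf in leaves), Counter())
print(label_counts(records), leaf_total)
```

Every record lands in exactly one leaf, so the class counts summed over the leaves equal the counts at the root.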

Applying Decision Trees to Business
Because of their tree structure and their capability to easily generate rules, decision trees are the favored technique for building understandable models. Because of their high level of automation and the ease of translating decision tree models into SQL for deployment in relational databases, the technology has also proven easy to integrate with existing IT processes.
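To make the SQL translation concrete, here is a minimal sketch that walks a small tree and emits one `WHEN` clause per leaf. The nested-dict tree layout, the column names, and the `customers` table are assumptions made up for this illustration; the sketch also assumes the tree has at least one split.

```python
# Hypothetical tree: internal nodes are dicts, leaves are class labels.
tree = {
    "column": "monthly_minutes",
    "threshold": 200,
    "yes": {
        "column": "years_as_customer",
        "threshold": 2,
        "yes": "churn",
        "no": "stay",
    },
    "no": "stay",
}

def leaf_rules(node, conditions=()):
    """Yield (conditions, label) for every leaf, one rule per leaf."""
    if isinstance(node, str):
        yield list(conditions), node
        return
    col, thr = node["column"], node["threshold"]
    yield from leaf_rules(node["yes"], conditions + (f"{col} <= {thr}",))
    yield from leaf_rules(node["no"], conditions + (f"{col} > {thr}",))

def tree_to_sql(tree, table):
    """Render the tree as a SQL CASE expression over the given table."""
    lines = ["SELECT *,", "  CASE"]
    for conditions, label in leaf_rules(tree):
        lines.append(f"    WHEN {' AND '.join(conditions)} THEN '{label}'")
    lines.append("  END AS prediction")
    lines.append(f"FROM {table};")
    return "\n".join(lines)

sql = tree_to_sql(tree, "customers")
print(sql)
```

Each path from the root to a leaf becomes a conjunction of conditions, which is why the translation is mechanical.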

CART: Growing a Forest and Picking the Best Tree
CART, which stands for Classification and Regression Trees, is a data exploration and prediction algorithm developed by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. Predictors are picked according to how much they decrease the disorder of the data. In building the CART tree, each predictor is picked based on how well it teases apart the records with different predictions.
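One common way to measure "disorder" for classification is Gini impurity, which is the measure usually associated with CART. The sketch below assumes that measure and a made-up six-record dataset; it searches for the single threshold split with the lowest weighted impurity and is not a full CART implementation (no recursion, no pruning).

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(left, right):
    """Impurity of a split, weighting each side by its share of records."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows):
    """Return (impurity, feature_index, threshold) of the best split.

    Each row is a tuple whose last element is the class label.
    """
    best = None
    n_features = len(rows[0]) - 1
    for i in range(n_features):
        for threshold in sorted({r[i] for r in rows}):
            left = [r[-1] for r in rows if r[i] <= threshold]
            right = [r[-1] for r in rows if r[i] > threshold]
            if not left or not right:
                continue  # a split must send records both ways
            score = weighted_gini(left, right)
            if best is None or score < best[0]:
                best = (score, i, threshold)
    return best

# Hypothetical records: (monthly_minutes, years_as_customer, label)
rows = [
    (120, 1, "churn"), (80, 2, "churn"), (200, 1, "churn"),
    (300, 5, "stay"), (450, 7, "stay"), (350, 3, "stay"),
]
print(best_split(rows))
```

On this toy data, splitting on the first predictor at 200 separates the two classes completely, so its weighted impurity is zero.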

CHAID
Another popular decision tree technology is CHAID, or Chi-Square Automatic Interaction Detector. CHAID is similar to CART, but it differs in the way that it chooses its splits: it relies on the chi-square test used in contingency tables rather than on a measure of disorder.
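As a rough illustration of the statistic behind CHAID's name, the sketch below computes the Pearson chi-square value for a contingency table of predictor category against outcome. The two tables are invented, and the sketch leaves out what a real CHAID implementation adds on top (category merging and significance testing). It assumes every row and column total is non-zero.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    table[i][j] is the number of records in predictor category i
    that have outcome j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if predictor and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts of [churn, stay] per category.
by_plan = [[30, 10], [10, 30]]    # outcome depends strongly on plan
by_region = [[20, 20], [20, 20]]  # outcome independent of region

print(chi_square(by_plan), chi_square(by_region))
```

A larger statistic means the predictor is further from independence with the outcome, which makes it the more attractive candidate for a split.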

What is a Neural Network?
When data mining algorithms are talked about these days, people usually talk about either decision trees or neural networks. Of the two, neural networks have probably been of greater interest through the formative stages of data mining technology.

Are Neural Networks Easy to Use?
A common claim for neural networks is that they are automated, so that the user does not need to know much about how they work, about predictive modeling, or even about the database in order to use them. In practice, however, many important design decisions need to be made to use a neural network effectively, such as: How should the nodes in the network be connected? How many neuron-like processing units should be used? When should ‘training’ be stopped in order to avoid overfitting?
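One widely used answer to the last question is early stopping: hold out a validation set and stop once its loss has not improved for a few epochs. The sketch below is one possible formulation, not a method prescribed by the presentation; the `patience` parameter and the loss values are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch whose model should be kept.

    Training is considered stopped once the validation loss has failed
    to improve for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop here
    return best_epoch

# Hypothetical validation losses, one per training epoch.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.50]
print(early_stopping_epoch(losses))
```

Note the trade-off: with `patience=2` training stops before the later improvement at the last epoch is ever seen, so the patience value is itself a design decision.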

Applying Neural Networks to Business
Neural networks are very powerful predictive modeling techniques, but some of the power comes at the expense of ease of use and ease of deployment. The model itself is represented by numeric values in a complex calculation that requires all of the predictor values to be in the form of numbers. The output of the neural network is also numeric and needs to be translated if the actual prediction value is categorical. For example, predicting the demand for blue, white, or black jeans for a clothing manufacturer requires that the values blue, white, and black of the predictor color be converted to numbers.
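The jeans example can be sketched with one-hot encoding, one common way of turning categories into numbers; the presentation does not say which encoding is intended, so treat this as one possible choice. The function names and the sample output vector are made up for the illustration.

```python
COLORS = ["blue", "white", "black"]

def one_hot(color):
    """Encode a color as a vector with a 1.0 in that color's position."""
    return [1.0 if color == c else 0.0 for c in COLORS]

def decode(outputs):
    """Translate numeric network outputs back into a color.

    Picks the color whose output value is largest.
    """
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return COLORS[best]

print(one_hot("white"))         # numeric input for the network
print(decode([0.2, 0.1, 0.7]))  # categorical reading of a numeric output
```

The same mapping is used in both directions: categorical inputs become numbers before they reach the network, and numeric outputs are mapped back to a category afterward.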

Applying Neural Networks to Business (cont’d)
These shortcomings of the neural network model have been successfully addressed in the following two ways: The neural network is packaged up into a complete solution, such as fraud prediction. The neural network is packaged up with expert consulting services.

What Does a Neural Network Look Like?
A neural network is loosely based on the way some people believe that the human brain is organized and how it learns. There are two main structures of consequence in the neural network: The node, which loosely corresponds to the neuron in the human brain. The link, which loosely corresponds to the connections between neurons (axons, dendrites, and synapses) in the human brain.

What does a neural network look like (cont’d)
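A minimal sketch of nodes and links in code: each link carries a weight, and each node combines its weighted inputs into a single output. The sigmoid activation, the weights, and the two-hidden-node layout below are assumptions chosen for illustration; the presentation does not specify them.

```python
import math

def node_output(inputs, weights, bias):
    """Output of one node: weighted sum over its incoming links,
    passed through a sigmoid so the result lies between 0 and 1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def tiny_network(inputs):
    """Two input values feed two hidden nodes, which feed one output node.

    All weights and biases are made-up numbers; in practice they are
    learned during training.
    """
    hidden = [
        node_output(inputs, [0.5, -0.4], 0.1),
        node_output(inputs, [-0.3, 0.8], 0.0),
    ]
    return node_output(hidden, [1.2, -0.7], 0.05)

print(tiny_network([0.6, 0.9]))
```

Changing a weight changes how strongly one node influences the next, which is what training adjusts.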

Rule Induction
Rule induction is one of the major forms of data mining and is perhaps the most common form of knowledge discovery in unsupervised learning systems. It is also perhaps the form of data mining that most closely resembles the process that most people think about when they think about data mining, namely ‘mining’ for gold through a vast database.

What to Do with Rules
When the rules are mined out of the database, they can be used either for better understanding the business problems that the data reflects or for performing actual predictions against a prediction target. Because a rule has both a left side and a right side (antecedent and consequent), rules can be used in several ways in your business: target the antecedent; target the consequent; target based on accuracy; target based on coverage; target based on ‘interestingness’.
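Accuracy and coverage can be computed directly from the records. The sketch below assumes the usual definitions (coverage: how often the antecedent applies; accuracy: how often the consequent also holds when it does), which the slide itself does not spell out, and uses made-up shopping baskets.

```python
# Hypothetical records: each is the set of items in one shopping basket.
baskets = [
    {"bagels", "cream cheese"},
    {"bagels", "cream cheese", "coffee"},
    {"bagels", "coffee"},
    {"milk"},
    {"bagels", "cream cheese"},
]

def coverage(antecedent, records):
    """Fraction of records to which the rule's antecedent applies."""
    return sum(antecedent <= r for r in records) / len(records)

def accuracy(antecedent, consequent, records):
    """Among records matching the antecedent, the fraction that also
    contain the consequent. Returns 0.0 if the rule never applies."""
    matching = [r for r in records if antecedent <= r]
    if not matching:
        return 0.0
    return sum(consequent <= r for r in matching) / len(matching)

# Rule: IF bagels THEN cream cheese
print(coverage({"bagels"}, baskets))
print(accuracy({"bagels"}, {"cream cheese"}, baskets))
```

A rule can score high on one measure and low on the other, which is why the two are listed as separate targeting criteria.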

Rules versus Decision Trees
Decision trees also produce rules, but in a very different way than rule induction systems. The main difference between the rules produced by decision trees and those produced by rule induction systems is as follows: Decision trees produce rules that are mutually exclusive and collectively exhaustive with respect to the training database. Rule induction systems produce rules that are not mutually exclusive and might be collectively exhaustive.
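The difference shows up when counting how many rules fire for a given record. In the sketch below, both rule sets and both records are invented for illustration: the tree rules are the leaves of a hypothetical two-level tree, while the induced rules are two independent rules of the kind a rule induction system might report.

```python
# Rules read off a hypothetical tree: one rule per leaf.
tree_rules = {
    "minutes <= 200 and years <= 2":
        lambda r: r["minutes"] <= 200 and r["years"] <= 2,
    "minutes <= 200 and years > 2":
        lambda r: r["minutes"] <= 200 and r["years"] > 2,
    "minutes > 200":
        lambda r: r["minutes"] > 200,
}

# Hypothetical output of a rule induction system.
induced_rules = {
    "minutes <= 200": lambda r: r["minutes"] <= 200,
    "years <= 2": lambda r: r["years"] <= 2,
}

def matching_rules(rules, record):
    """Names of all rules whose condition holds for the record."""
    return [name for name, test in rules.items() if test(record)]

new_light_user = {"minutes": 120, "years": 1}
old_heavy_user = {"minutes": 300, "years": 5}

for record in (new_light_user, old_heavy_user):
    print(len(matching_rules(tree_rules, record)),
          len(matching_rules(induced_rules, record)))
```

Exactly one tree rule fires for each record, whereas the induced rules overlap on the first record and say nothing about the second.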

Another Commonality between Decision Trees and Rule Induction Systems
One other thing that decision trees and rule induction systems have in common is that both need to find ways to combine and simplify rules. In a decision tree, this can be as simple as recognizing that if a lower split on a predictor is more constrained than a higher split on the same predictor, both do not need to be shown to the user, only the more restrictive one. Rules from rule induction systems are generally created by taking a simple high-level rule and then adding new constraints to it until the coverage gets too small for the rule to be meaningful.
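The tree-side simplification can be sketched as keeping only the tightest bound per predictor and direction. The `(predictor, operator, value)` representation and the restriction to the `>` and `<=` operators are assumptions made for this example.

```python
def simplify(conditions):
    """Drop constraints made redundant by a tighter one.

    conditions: list of (predictor, op, value) with op either ">" or "<=".
    For ">" the largest value is the most restrictive; for "<=" the
    smallest value is.
    """
    tightest = {}
    for predictor, op, value in conditions:
        key = (predictor, op)
        if key not in tightest:
            tightest[key] = value
        elif op == ">":
            tightest[key] = max(tightest[key], value)
        else:  # op == "<="
            tightest[key] = min(tightest[key], value)
    return [(p, op, v) for (p, op), v in tightest.items()]

# Path through a hypothetical tree that splits on age twice.
path = [("age", ">", 30), ("income", "<=", 50000), ("age", ">", 40)]
print(simplify(path))
```

Here `age > 30` is implied by `age > 40`, so only the latter needs to be shown to the user.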