By Adeyemo O.O., Adewole A.P., Ogunbiyi T.D., Oni Samson.


PREDICTION & CLASSIFICATION CAPABILITIES OF DECISION TREE ALGORITHMS IN MODELLING

ABSTRACT A decision tree is a data mining technique that can accurately classify data and make effective predictions. As a comprehensible form of knowledge representation, it has been successfully employed for data analysis in a broad range of fields such as customer relationship management, engineering, medicine, agriculture, computational biology, business management and fraudulent statement detection.

In this paper, we review research publications that have explored the accuracy of the prediction and classification capabilities of decision trees in data mining models, in comparison with several other algorithms, across different application domains. This gives researchers a general overview of the knowledge gaps in decision tree data mining algorithms. Data mining takes advantage of large available datasets to carry out prediction and classification. We therefore used records of heart disease patients gathered over the years and performed data mining on them using a decision tree.

INTRODUCTION A decision tree is a classification and prediction tool. It is widely used because the knowledge it discovers is presented in a hierarchical structure, which makes it easy to understand for people who are not experts in data mining.

It is a predictive modelling technique developed by Ross Quinlan. It is a sequential classifier in the form of a recursive tree structure. The data set is analyzed by developing a branch-like structure with an appropriate decision tree algorithm. Each internal node of the tree splits into branches based on the splitting criteria: each internal (test) node denotes a test on an attribute, and each terminal (leaf) node represents the decision, i.e. a class. Decision trees can work on both continuous and categorical attributes (Manpreet Singh et al., 2013).
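To make these node roles concrete, here is a minimal Python sketch (not the authors' implementation) of a tree whose internal nodes test an attribute, whose branches carry the test outcomes and whose leaves hold the decision. The class names `Leaf` and `TestNode`, the function `classify` and the toy attributes are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    label: str  # the decision (class label)

@dataclass
class TestNode:
    attribute: str  # attribute tested at this internal node
    branches: Dict[str, Union["Leaf", "TestNode"]] = field(default_factory=dict)  # outcome -> child

def classify(node: Union[Leaf, TestNode], record: Dict[str, str]) -> str:
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, TestNode):
        node = node.branches[record[node.attribute]]  # KeyError if the value was never seen
    return node.label

# Hypothetical toy tree; attribute names and values are illustrative only.
tree = TestNode("chest_pain", {
    "yes": TestNode("age_over_50", {"yes": Leaf("present"), "no": Leaf("absent")}),
    "no": Leaf("absent"),
})
print(classify(tree, {"chest_pain": "yes", "age_over_50": "yes"}))  # present
```

Classification is a single walk from the root, so its cost grows with the depth of the tree rather than with the size of the training data.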

RESEARCH OBJECTIVES To adopt a fast and reliable means of predicting or detecting heart disease, a disease that has claimed many lives in Nigeria, in Africa and in the world at large, so that it becomes possible to eradicate it. With a decision-making system that implements a decision tree (whose predictive capability in heart disease prediction and other domains is critically reviewed in this paper), heart disease could be eradicated or reduced to a minimal level in Nigeria.

PROCESSES OF DEVELOPING A DECISION TREE MODEL TREE GROWING The initial stage of creating a decision tree model is tree growing, which includes two steps: tree merging and tree splitting. Tree merging: the non-significant predictor categories and the significant categories within a dataset are grouped together. Tree splitting: records are separated into different leaves to remove the impurities within the model, which increase as the tree grows and may reduce the accuracy of the model (Mutasem Sh. Alkhasawneh et al., 2012).
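The source does not name the impurity measure used during splitting. As one common choice (the Gini index, which CART uses), the sketch below measures how much a candidate split reduces impurity. The function names and toy labels are assumptions for illustration, and tree merging is not shown.

```python
from collections import Counter
from typing import List, Sequence

def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(children: List[Sequence[str]]) -> float:
    """Impurity after a split: child impurities weighted by child size."""
    total = sum(len(c) for c in children)
    return sum(len(c) / total * gini(c) for c in children)

parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(gini(parent))                                       # 0.5
print(round(split_impurity(children), 2))                 # 0.32
print(round(gini(parent) - split_impurity(children), 2))  # impurity decrease: 0.18
```

A splitting step would evaluate this decrease for every candidate attribute and keep the largest.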

TREE PRUNING Tree pruning removes irrelevant splitting nodes. Removing them reduces the chance of creating an over-fitted tree, which is particularly useful because an over-fitted tree model may misclassify data in real-world applications (Mutasem Sh. Alkhasawneh et al., 2012).

TREE SELECTION The final stage of developing a decision tree model is tree selection. At this stage, the decision tree model is evaluated using either cross-validation or a testing dataset. This stage is essential because it can reduce the chances of misclassifying data in real-world applications and, consequently, minimize the cost of developing further applications (Mutasem Sh. Alkhasawneh et al., 2012).
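The pruning stage can be illustrated with one well-known strategy, reduced-error pruning. This minimal sketch assumes that a held-out validation set is available, which also fits the tree selection stage. Trees are represented as hypothetical nested dicts: an internal node stores its test attribute, its branches and the majority training class at that node; a leaf is just a class label. This is an illustration, not the procedure used in the cited work.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, dict]               # leaf = class label; internal node = dict
Example = Tuple[Dict[str, str], str]  # (attribute values, true class)

def predict(tree: Tree, record: Dict[str, str]) -> str:
    while isinstance(tree, dict):
        child = tree["branches"].get(record[tree["attr"]])
        if child is None:  # unseen value: fall back to the node's majority class
            return tree["majority"]
        tree = child
    return tree

def errors(tree: Tree, examples: List[Example]) -> int:
    return sum(predict(tree, x) != y for x, y in examples)

def prune(tree: Tree, validation: List[Example]) -> Tree:
    """Bottom-up reduced-error pruning; returns a new tree."""
    if not isinstance(tree, dict):
        return tree
    # First prune each subtree using only the validation examples that reach it.
    branches = {
        value: prune(child, [(x, y) for x, y in validation if x[tree["attr"]] == value])
        for value, child in tree["branches"].items()
    }
    node = {"attr": tree["attr"], "branches": branches, "majority": tree["majority"]}
    # Then collapse this node to a majority leaf if that is no worse on validation data.
    if errors(node["majority"], validation) <= errors(node, validation):
        return node["majority"]
    return node

# Hypothetical toy tree and validation set.
tree = {"attr": "a", "majority": "no", "branches": {
    "x": {"attr": "b", "majority": "yes", "branches": {"p": "yes", "q": "no"}},
    "y": "no",
}}
validation = [({"a": "x", "b": "p"}, "yes"), ({"a": "x", "b": "q"}, "yes"),
              ({"a": "y", "b": "p"}, "no")]
print(prune(tree, validation))
```

Here the `b` test under `x` is collapsed to the leaf `yes`, while the root test on `a` is kept because replacing it would add errors. One design consequence is that a subtree that no validation example reaches has zero errors either way, so it is always collapsed.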

DECISION TREE ALGORITHMS Commonly used decision tree algorithms include ID3, C4.5, C5.0, CHAID and CART.
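These algorithms differ mainly in their splitting criterion. ID3 selects the attribute with the highest information gain. C4.5 uses the gain ratio, which normalises the gain by the split information so that attributes with many distinct values are not automatically favoured. The Python sketch below compares the two criteria on an informative binary attribute and on an identifier-like attribute that puts every record in its own branch. Its function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(partitions: Dict[str, List[str]]) -> float:
    """ID3 criterion: parent entropy minus the size-weighted entropy of the partitions."""
    parent = [y for ys in partitions.values() for y in ys]
    n = len(parent)
    return entropy(parent) - sum(len(ys) / n * entropy(ys) for ys in partitions.values())

def gain_ratio(partitions: Dict[str, List[str]]) -> float:
    """C4.5 criterion: information gain divided by the split information."""
    n = sum(len(ys) for ys in partitions.values())
    split_info = -sum(len(ys) / n * math.log2(len(ys) / n) for ys in partitions.values())
    return information_gain(partitions) / split_info if split_info > 0 else 0.0

# Same 8 hypothetical records, partitioned by two different attributes.
by_attribute = {"a": ["yes"] * 4, "b": ["no"] * 4}
by_id = {f"r{i}": [y] for i, y in enumerate(["yes"] * 4 + ["no"] * 4)}
for name, parts in (("attribute", by_attribute), ("id", by_id)):
    print(name, information_gain(parts), round(gain_ratio(parts), 3))
```

Both partitions have an information gain of 1.0, so ID3 cannot tell them apart, whereas the gain ratio is 1.0 for the binary attribute and about 0.333 for the identifier.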

ALGORITHM FOR DECISION TREE INDUCTION
BASIC ALGORITHM (A GREEDY ALGORITHM)
- The tree is constructed in a top-down, recursive, divide-and-conquer manner.
- At the start, all the training examples are at the root.
- Attributes are categorical (continuous-valued attributes are discretized in advance).
- Examples are partitioned recursively based on selected attributes.
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
CONDITIONS FOR STOPPING PARTITIONING
- All samples for a given node belong to the same class.
- There are no remaining attributes for further partitioning; majority voting is employed to classify the leaf.
- There are no samples left.
(Jiawei Han, 2006)
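The basic greedy algorithm and its three stopping conditions can be sketched in Python as follows. This is a minimal ID3-style illustration, not the authors' Java system. It assumes categorical attributes, uses information gain as the selection measure and represents trees as nested dicts. The toy records and the attribute names `pain` and `smoker` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Union

Record = Dict[str, str]
Tree = Union[str, dict]  # a leaf is a class label; an internal node is a dict

def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def majority(labels: List[str]) -> str:
    return Counter(labels).most_common(1)[0][0]

def info_gain(rows: List[Record], labels: List[str], attr: str) -> float:
    groups: Dict[str, List[str]] = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def id3(rows: List[Record], labels: List[str], attrs: List[str],
        domains: Dict[str, List[str]], parent_majority: Optional[str] = None) -> Optional[Tree]:
    if not labels:             # no samples left: use the parent's majority class
        return parent_majority
    if len(set(labels)) == 1:  # all samples belong to the same class
        return labels[0]
    if not attrs:              # no attributes left: majority voting
        return majority(labels)
    # Greedy step: choose the attribute with the highest information gain.
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    node = {"attr": best, "branches": {}}
    for value in domains[best]:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                                      remaining, domains, majority(labels))
    return node

def build(rows: List[Record], labels: List[str]) -> Optional[Tree]:
    attrs = list(rows[0])
    domains = {a: sorted({r[a] for r in rows}) for a in attrs}
    return id3(rows, labels, attrs, domains)

# Hypothetical toy records, for illustration only.
rows = [
    {"pain": "none", "smoker": "no"}, {"pain": "none", "smoker": "yes"},
    {"pain": "mild", "smoker": "no"}, {"pain": "mild", "smoker": "yes"},
    {"pain": "severe", "smoker": "no"}, {"pain": "severe", "smoker": "yes"},
]
labels = ["absent", "absent", "absent", "present", "present", "present"]
print(build(rows, labels))
```

On the toy data, `pain` has the higher information gain and becomes the root, and only the `mild` branch needs a second test on `smoker`. Branching over every value seen in the full training set (the `domains` map) is what allows the "no samples left" case to occur. That leaf then falls back to the parent's majority class.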

DECISION TREE APPLICATIONS Decision trees have been used to develop prediction and classification models in many domains, including business management, customer relationship management, fraudulent statement detection, engineering, energy consumption, fault diagnosis, healthcare management and agriculture, as described in the studies below.

CLASSIFICATION Decision tree algorithms used for classification in different domains, both independently and in combination with other algorithms, are discussed below. Mohd Najwadi Yusoff and Aman Jantan (2011) proposed using a Genetic Algorithm (GA) to optimize a Decision Tree (DT) for malware classification, and compared the approach with current malware classification techniques. A new classifier, named the Anti-Malware System (AMS) Classifier, was developed by combining GA with DT in order to classify unique types of malware. Their results show that the AMS Classifier achieves an accuracy increase of 4.5% to 6.5% over the DT Classifier.

Baisen Zhang and Russ Tillman (2007) investigated the potential of a decision tree approach for modelling NFUE (Nitrogen Fertilizer Use Efficiency) in New Zealand pastures. Their models were validated in 11 of the 16 trials tested, a predictive accuracy of 69%.

D. Senthil Kumar et al. focused on medical diagnosis, learning patterns from collected data on diabetes, hepatitis and heart disease in order to develop intelligent medical decision support systems that help physicians. They proposed using the C4.5, ID3 and CART decision tree algorithms to classify these diseases and compared their effectiveness and correction rates. Abolfazl Kazemi et al. (2011) researched the use of the CHAID, CRT, QUEST and C5.0 decision tree algorithms to help organizations determine the criteria needed to identify potential customers in the competitive environment of their business. The tree obtained with the C5.0 algorithm provided the most optimal variables and decision tree, with 83.96% accuracy, which was closest to the field results used for comparison, and it performed better in practice.

In their NFUE study, Baisen Zhang and Russ Tillman (2007) concluded that this type of modelling approach can be used to predict NFUE and thereby assist decisions on when and where to apply N fertilizer in pastures, increasing productivity while reducing the environmental impact. Abishek Suresh et al. investigated the application of decision tree models to the formation of protein homodimer complexes, which play a role in molecular catalysis and regulation. The decision tree model produced positive predictive values (PPV) of 72% for 2S, 58% for 3SMI and 57% for 3SDI in cross-validation. It was thus concluded that the method is useful for assigning folding mechanisms to homodimers.
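The positive predictive value for a class is the fraction of records predicted as that class that truly belong to it, i.e. TP / (TP + FP). The following is a minimal Python sketch. The class labels reuse the names from the study, but the predictions are made up purely for illustration.

```python
from typing import Sequence

def positive_predictive_value(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> float:
    """PPV for one class: TP / (TP + FP), i.e. precision over records predicted as cls."""
    truths_for_predicted = [t for t, p in zip(y_true, y_pred) if p == cls]
    if not truths_for_predicted:
        return float("nan")  # undefined when cls is never predicted
    return sum(t == cls for t in truths_for_predicted) / len(truths_for_predicted)

# Made-up predictions; only the class names come from the study.
y_true = ["2S", "2S", "3SMI", "3SDI", "3SMI"]
y_pred = ["2S", "3SMI", "3SMI", "3SMI", "2S"]
for cls in ("2S", "3SMI"):
    print(cls, positive_predictive_value(y_true, y_pred, cls))
```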

J. Majoobi (2007) studied the performance of decision tree classification for predicting wave parameters, which are needed in many coastal and offshore engineering applications. According to the researchers, various prediction models have been proposed in the literature for this purpose, and decision tree models were found to give better accuracy. Wang Wei (2012) used a decision tree for image classification, based on an analysis of spectral characteristics, texture characteristics and other auxiliary information such as NDVI, NDBI and topographic characteristics. The results indicated that the accuracy of decision tree classification was 4.06% higher than that of maximum likelihood classification, and the Kappa coefficient increased by 5.61%.
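The Kappa coefficient mentioned above (Cohen's kappa) corrects the observed agreement p_o between predicted and reference classes for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal Python sketch with hypothetical land-cover labels. Note that the coefficient is undefined when p_e = 1.

```python
from collections import Counter
from typing import Sequence

def cohen_kappa(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e); undefined if p_e == 1."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal class frequencies of both label sets.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical land-cover labels, for illustration only.
reference = ["water", "water", "urban", "urban"]
predicted = ["water", "urban", "urban", "urban"]
print(cohen_kappa(reference, predicted))  # 0.5
```

Here 75% raw agreement shrinks to a kappa of 0.5, because half of the agreement would be expected by chance given how often each class appears.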

Kuldeep Kumar et al. (2006) discussed the effectiveness of decision trees for classification in mammography. The results obtained with decision-tree-based algorithms were compared with those produced by a neural network, and the decision tree was reported to have a higher classification rate. Michael D. Twa (2011) described the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. The aim was to address the challenge of interpreting the volume and complexity of data produced during videokeratography examinations. The proposed method was compared with other known classification methods, and the decision tree classifier performed as well as or better than the other classifiers tested.

Gregor Stiglic et al. (2012) presented an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The results demonstrate a significant increase in accuracy for less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, higher accuracy gains were observed on bioinformatics datasets. Peng Du and Ding Xiaoqing (2008) presented a method based on a decision tree classifier to identify the gender of a person. Their results show that the performance of the decision tree classifier is superior to that of the ordinary classifier.

Felipe Lirra (2013) developed a decision tree model indicating the action range of peptides, that is, the types of microorganisms on which they can exercise biological activity. The work supports recent efforts to find effective substitutes for combating infections, which have focused on identifying natural antimicrobial peptides that circumvent resistance to commercial antibiotics. The results showed that using decision trees to evaluate the antimicrobial activity of synthetic peptides enables the creation of more effective models for the development of new drugs.

PREDICTION Decision tree algorithms used for prediction in different domains, both independently and in combination with other algorithms, are discussed below. Jay Gholap (2013) used attribute selection and boosting meta-techniques to tune the performance of the J48 decision tree algorithm on the large amounts of data harvested along with crops, in order to predict the soil fertility class, with the aim of achieving and maintaining appropriate levels of soil fertility. J48 gave an accuracy of 96.73%, which makes it a good predictive model for soil fertility in agriculture.

Mohammad Taha Khan et al. (2012) researched the application of two decision tree algorithms, C4.5 and C5.0, to breast cancer and heart disease prediction. On a breast cancer dataset of 400 records, C4.5 produced 5 training errors, whereas C5.0 produced only 3. C5.0 produces rules in an easily readable form, whereas C4.5 generates the rule set in the form of a decision tree. Yoshikazu Goto et al. (2010) developed a simple and generally applicable bedside model for predicting outcomes after out-of-hospital cardiac arrest (OHCA). This simple prediction model may provide clinicians with a practical bedside tool for stratifying OHCA patients in the emergency department.

Atul Kumar Pandey et al. (2013) compared a pruned J48 decision tree using the reduced error pruning approach against simply pruned and unpruned approaches for classifying heart disease from patients' clinical data. They also developed a heart disease prediction model that can assist medical professionals in predicting heart disease status from these clinical features. From the results obtained, fasting blood sugar was found to be the most important attribute, giving better classification than the other attributes, although it did not give better accuracy.

A. R. Senthil Kumar et al. (2013) investigated the performance of soft computing techniques in modelling qualitative and quantitative water resource variables such as stream flow. The REPTree (decision tree) model was found to perform well compared with other soft computing techniques such as MLR, ANN, fuzzy logic and M5P. B.S. Zhang et al. (2004) applied decision tree models to predict annual and seasonal pasture production and investigated the interactions between pasture production and environmental and management factors in the North Island hill country. The decision tree models for annual, spring, summer, autumn and winter pasture production correctly predicted 82%, 71%, 90%, 88% and 90% of cases, respectively, in model validation.

Sevgi Zeynep Dogan et al. (2008) compared the performance of three decision-tree-based methods of assigning attribute weights for use in a case-based reasoning (CBR) prediction model. By comparing the impact of the attribute weights generated by the three methods, the study highlights that the prediction rate of models such as CBR depends largely on the data associated with the parameters used in the model. Bark Cheung Chiu et al. (2013) combined Input-Output Agent Modelling (IOAM), an approach to modelling an agent in terms of relationships between the inputs and outputs of the cognitive system, with a leading inductive learning algorithm, C4.5, to build a subtraction skill modeller, C4.5-IOAM. Their experimental results in the domain of modelling elementary subtraction skills showed that the tree quality and leaf quality of a decision path provided valuable references for resolving contradictory predictions, and that a single-tree model representation performed nearly as well as the multi-tree model representation.

Middendorf et al. used alternating decision trees to predict whether an S. cerevisiae gene would be up- or down-regulated under particular conditions of transcription regulator expression, given the sequence of its regulatory region. In addition to performing well at predicting the expression state of target genes, they were able to identify motifs and regulators that appear to control the expression of those genes. Lee S. and Park I. (2013) analyzed ground subsidence hazard (GSH) using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The highest accuracy was achieved by the decision tree model using the CHAID algorithm (94.01%), compared with the QUEST algorithm (90.37%) and the frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision trees. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for predicting various spatial events.

Heiko Milde et al. (1999) introduced the MAD system, which generates decision trees based on a new method for qualitative electrical circuit analysis. In particular, their new approach to qualitative reasoning about faults in electrical circuits has matured to the point where it can be used to generate diagnosis systems employed in industry. Smitha T. and Dr. V. Sundaram (2012) applied the ID3 algorithm to build a decision tree model that predicts the chances of disease occurring in an area by identifying the significant parameters for the prediction process. A prediction accuracy of 95% was achieved with the decision tree classification model, which led the researchers to conclude that the people suffering from the disease are mostly female inhabitants with a hereditary history, living in poor environmental conditions and with an average age greater than 35.

METHODOLOGY In this research, the ID3 (Iterative Dichotomiser 3) decision tree algorithm was used. This classification algorithm was selected because it has the potential to yield good results in prediction and classification applications.

A heart disease dataset of records with medical attributes was obtained online from a hospital. Using this dataset, the patterns significant to heart attack prediction were extracted with the developed ID3 data mining model. The records were split equally into two datasets, a training dataset and a testing dataset. To avoid bias, the records for each set were selected randomly. The data include values for the following:
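A random, equal split of the kind described here can be sketched in a few lines of Python. The seeded shuffle and the function name `split_half` are assumptions added for reproducibility and illustration. The source does not state how the random selection was implemented.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_half(records: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly shuffle the records, then split them into training and testing halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded only so the split is reproducible
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

train, test = split_half(range(10))
print(len(train), len(test))  # 5 5
```

With an odd number of records, the testing half receives the extra one. Shuffling before slicing is what prevents any ordering in the source file (for example, by admission date) from biasing either set.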

Heart Disease Predictor Interface

The result page shows the outcome of the prediction: heart disease is either Present or Absent.

RESULTS A decision tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test and each leaf node represents a class label (the decision taken after computing all attributes). A path from the root to a leaf represents a classification rule. The Java program consists of several packages, but the ID3 Logic package does the main work. The system has been built into a jar file, which runs when double-clicked on a system with the Java runtime installed.
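Because every root-to-leaf path is a classification rule, the rules can be read off a tree by a depth-first walk. The Python sketch below is hypothetical and does not reproduce the authors' ID3 Logic package. It assumes trees are nested dicts in which internal nodes have `attr` and `branches` keys and leaves are class labels.

```python
from typing import List, Optional, Union

Tree = Union[str, dict]  # leaf = class label; internal node = {"attr": ..., "branches": {...}}

def extract_rules(tree: Tree, conditions: Optional[List[str]] = None) -> List[str]:
    """Turn every root-to-leaf path into an IF ... THEN rule (depth-first)."""
    conditions = conditions or []
    if not isinstance(tree, dict):
        lhs = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {lhs} THEN class = {tree}"]
    rules: List[str] = []
    for value, child in tree["branches"].items():
        rules += extract_rules(child, conditions + [f"{tree['attr']} = {value}"])
    return rules

# Hypothetical toy tree, for illustration only.
tree = {"attr": "pain", "branches": {
    "none": "absent",
    "severe": {"attr": "smoker", "branches": {"no": "absent", "yes": "present"}},
}}
for rule in extract_rules(tree):
    print(rule)
```

The toy tree yields one rule per leaf (three in total), for example `IF pain = severe AND smoker = yes THEN class = present`.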

CONCLUSION Decision trees have been found useful in classification and prediction modelling because they can accurately discover hidden relationships between variables and can remove insignificant attributes within a dataset. Twenty-one studies published between 1999 and 2014 in more than three application domains were examined in this research and met the minimum criteria for inclusion in our literature review. The decision tree data mining model developed and employed in this research was used to predict the existence of heart disease in any diagnosed patient, providing a solution that helps remove the bottleneck at hospitals. It also gives an idea of a patient's possible heart disease status without carrying out laboratory tests, simply from the symptoms the patient is experiencing. Notably, anybody can use the system, since it needs to be trained only once for any particular dataset.