Literal and ProRulext: Algorithms for Rule Extraction of ANNs Paulemir G. Campos, Teresa B. Ludermir {pgc,

Presentation Summary 1. Introduction 2. Literal and ProRulext 3. Experiments 4. Results 5. Discussions 6. Conclusions Acknowledgements References

1. Introduction Main features of Artificial Neural Networks (ANNs): Excellent capacity for generalization; They have been applied successfully to many real-world problems; They represent the domain knowledge in their topology, weight values and biases; However, they cannot readily explain their answers (their main deficiency).

1. Introduction Usually this deficiency can be reduced by extracting “IF-THEN” rules from the trained network (ANN + Rule Extraction). However, other hybrid models exist for this purpose, such as Evolutionary Algorithms and Neuro-Fuzzy Systems.

1. Introduction This paper presents two algorithms for extracting rules from trained networks: Literal and ProRulext. Literal's distinguishing feature is its portability. ProRulext has a relatively low computational cost when extracting rules from feedforward MLP networks with one hidden layer.

2. Literal and ProRulext Literal: A very simple algorithm proposed for the extraction of “IF-THEN” propositional rules from trained networks applied to pattern classification and time series forecasting problems; The rules are extracted through a literal mapping of the network inputs and outputs; This approach is a Pedagogical technique (Andrews et al. [2] taxonomy).

2. Literal and ProRulext Overview of the Literal algorithm: 1. Discretize the network inputs and outputs into intervals of equal width; 2. Normalize the patterns of the network's training set to values within [0; 1] or [-1; 1]; 3. Present each of these normalized input patterns to the trained network, obtaining the respective rule consequents;

2. Literal and ProRulext Overview of the Literal algorithm (continued): 4. De-normalize the rule antecedents and consequents previously obtained back to the original values of the database; 5. Store the new rules created in the previous steps in a file; 6. Select the input attribute whose contents appear most frequently per rule conclusion;

2. Literal and ProRulext Overview of the Literal algorithm (continued): 7. Eliminate the other attributes from each of these rules, guaranteeing more general rules; 8. Eliminate the redundant rules that may result from steps 6 and 7;

2. Literal and ProRulext Overview of the Literal algorithm (continued): 9. Calculate the training-set coverage of each resulting rule, per conclusion, based on the number of activations of these rules; 10. Exclude the rules with 0% coverage of the patterns used to train the network from which the rules were originally extracted.
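
As a hedged illustration, steps 1-5 of Literal can be sketched in Python as below. The helper names (`equal_width_bins`, `literal_rules`) and the `predict` callable are assumptions made for this sketch, not the authors' implementation:

```python
import numpy as np

def equal_width_bins(col, n_bins=2):
    """Step 1: equal-width discretization of one attribute column."""
    lo, hi = col.min(), col.max()
    width = (hi - lo) / n_bins or 1.0          # guard against constant columns
    return np.minimum(((col - lo) / width).astype(int), n_bins - 1)

def literal_rules(predict, X, n_bins=2):
    """Steps 2-5: normalize the training patterns, query the trained
    network for each one, and store one IF-THEN rule per pattern."""
    # antecedents as interval indices over the original values (steps 1 and 4)
    bins = np.column_stack([equal_width_bins(X[:, j], n_bins)
                            for j in range(X.shape[1])])
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_norm = (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # step 2: map to [0, 1]
    rules = set()
    for i in range(len(X)):
        consequent = predict(X_norm[i])                   # step 3: query the net
        rules.add((tuple(bins[i]), consequent))           # step 5: store the rule
    return rules
```

With a toy `predict` such as `lambda x: int(x.sum() > 1)`, this yields one rule per distinct antecedent/consequent pair; the simplification steps 6-10 then generalize and prune the set.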

2. Literal and ProRulext ProRulext: The other algorithm proposed in this paper for the extraction of “IF-THEN” propositional rules from MLP networks with one hidden layer trained for pattern classification and time series forecasting;

2. Literal and ProRulext ProRulext (continued): The rules are extracted by using a decompositional method to obtain their antecedents and by applying a pedagogical approach to determine the consequents; This approach is an Eclectic technique (Andrews et al. [2] taxonomy).

2. Literal and ProRulext Overview of the ProRulext algorithm: 1. Discretize the network inputs and outputs into intervals of equal width; 2. Normalize the network input and output patterns of the training set to values within [0; 1] or [-1; 1]; 3. Present each of these input patterns to the trained network;

2. Literal and ProRulext Overview of the ProRulext algorithm (continued): 4. Build the AND/OR graph of the trained network, considering only its positive weights; 5. Determine the antecedents of the rules through the decompositional method; 6. Apply a pedagogical approach to find the consequents of these rules;

2. Literal and ProRulext Overview of the ProRulext algorithm (continued): 7. De-normalize the rule antecedents and consequents previously obtained back to the original values of the database; 8. Store the new rules created in the previous steps in a file; 9. Select the input attribute whose contents appear most frequently per rule conclusion;

2. Literal and ProRulext Overview of the ProRulext algorithm (continued): 10. Eliminate the other attributes from each of these rules, guaranteeing more general rules; 11. Eliminate the redundant rules that may result from steps 9 and 10;

2. Literal and ProRulext Overview of the ProRulext algorithm (continued): 12. Calculate the training-set coverage of each resulting rule, per conclusion, based on the number of activations of these rules; 13. Erase the rules with 0% coverage of the patterns used to train the network from which the rules were originally extracted.
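
Steps 4-6 can be sketched for a one-hidden-layer MLP as follows. This is an illustrative sketch under stated assumptions (simple threshold selection over positive weights; the names `prorulext_sketch`, `W_in`, `W_out` are hypothetical), not the authors' implementation:

```python
import numpy as np

def prorulext_sketch(W_in, W_out, predict, theta_if=0.5, theta_then=0.5):
    """W_in: input-to-hidden weights (n_inputs x n_hidden);
    W_out: hidden-to-output weights (n_hidden x n_outputs).
    Antecedents come from the positive-weight AND/OR graph (decompositional,
    steps 4-5); consequents come from querying the network (pedagogical, step 6)."""
    rules = []
    for h in range(W_in.shape[1]):
        # inputs reaching hidden unit h through positive weights above the
        # "IF part" limit form the rule antecedent
        antecedent = tuple(np.flatnonzero(W_in[:, h] > theta_if))
        # keep the unit only if it also drives some output strongly enough
        if not antecedent or not np.any(W_out[h] > theta_then):
            continue
        # pedagogical consequent: activate only the antecedent inputs and
        # read the trained network's answer
        probe = np.zeros(W_in.shape[0])
        probe[list(antecedent)] = 1.0
        rules.append((antecedent, predict(probe)))
    return rules
```

The “IF part” and “THEN part” limits used in the experiments (0.1, 0.5, 0.9) would play the role of `theta_if` and `theta_then` here.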

2. Literal and ProRulext It is worth emphasizing that both algorithms include rule simplification stages (the last five steps of Literal and ProRulext). These stages ensure the acquisition of concise and legible rules from networks trained for pattern classification and time series forecasting.
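
A minimal sketch of this shared simplification stage, assuming rules are represented as (antecedent-dict, conclusion) pairs and that per-rule activation counts on the training set are already available (both representations are assumptions for illustration):

```python
from collections import Counter

def simplify_rules(rules, activation_counts):
    """Sketch of the last five steps: keep only the attribute most frequent
    per conclusion, drop redundant rules, and drop rules with 0% coverage."""
    # count how often each attribute appears per conclusion
    freq = Counter((attr, concl) for ant, concl in rules for attr in ant)
    best = {}
    for (attr, concl), c in freq.items():
        if concl not in best or c > freq[(best[concl], concl)]:
            best[concl] = attr                  # most frequent attribute
    simplified, seen = [], set()
    for i, (ant, concl) in enumerate(rules):
        attr = best[concl]
        if attr not in ant or activation_counts[i] == 0:   # 0% coverage: drop
            continue
        key = (attr, ant[attr], concl)
        if key not in seen:                     # drop redundant rules
            seen.add(key)
            simplified.append(({attr: ant[attr]}, concl))
    return simplified
```

On a toy rule set this collapses several multi-attribute rules into one general rule per conclusion, which matches the stated goal of concise, legible rule sets.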

3. Experiments The trained networks and the respective rule sets were generated with the AHES (Applied Hybrid Expert System) version ‘ ’ [4].

3. Experiments The models implemented in the AHES are feedforward MLP networks with one hidden layer and the rule extraction techniques: BIO-RE [11], Geometrical [7], NeuroLinear [10], Literal [5] and ProRulext [4].

3. Experiments - Databases For the pattern classification problem, a Breast Cancer database from the Proben1 repository [6] is used. This base contains 699 cases, of which 458 correspond to benign and 241 to malignant breast cancer, each with 10 attributes plus the breast cancer class.

3. Experiments - Databases For the time series forecasting problem, a database with the São Paulo Stock Market Index (IBOVESPA) [6] is used. The series predicted in this work is the minimum series, with a total of 584 patterns.

3. Experiments - Databases Before the experiments, these bases were submitted to pre-processing stages [6]. Thus, the Breast Cancer database was reduced to 457 cases, 219 benign and 238 malignant. For the IBOVESPA database, the indicated time window size is two, and the number of patterns became 582.

3. Experiments - Databases Furthermore, the databases were normalized to values in the interval [0; 1] or [-1; 1] (depending on the activation function used) before the training and rule extraction stages of each trained network.
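
The normalization described can be sketched as plain min-max scaling; the target range per activation function follows the text, while the function name is an assumption:

```python
import numpy as np

def normalize(X, activation="sigmoid"):
    """Min-max normalization: [0, 1] for the sigmoid activation,
    [-1, 1] for the hyperbolic tangent."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # guard against constant columns
    scaled = (X - lo) / span                    # maps each column to [0, 1]
    return scaled if activation == "sigmoid" else 2.0 * scaled - 1.0
```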

3. Experiments – The Trained Networks The MLP networks were trained according to the Holdout methodology: each training set contains 2/3 of the total normalized input and output patterns, and each test set has the remaining 1/3.
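
A minimal sketch of the Holdout split described; the shuffling and the seed are assumptions, since the slides do not state how patterns were assigned to each set:

```python
import numpy as np

def holdout_split(X, y, train_frac=2/3, seed=0):
    """Holdout: 2/3 of the patterns for training, the remaining 1/3 for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = round(len(X) * train_frac)   # round() avoids float-division drift
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```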

3. Experiments – The Trained Networks Fixed parameters during the training of the networks obtained with the Breast Cancer database: weight adjustment per epoch (batch); fixed initial weights chosen among values in the interval [-0.1; 0.1]; momentum term equal to 0.1, number of epochs equal to 100 and maximum desired output error equal to 0.01.

3. Experiments – The Trained Networks Fixed parameters during the training of the networks obtained with the IBOVESPA database: weight adjustment per pattern (on-line); fixed initial weights chosen among values in the interval [-0.1; 0.1]; no momentum term; number of epochs equal to 100 and maximum desired output error equal to 0.01.

3. Experiments – The Trained Networks Variable parameters during the training of the networks obtained with the Breast Cancer and IBOVESPA databases: number of hidden-layer units (1, 3 and 5); learning rate (0.1, 0.5, 0.9); use or not of bias; and the kind of non-linear activation function (sigmoid or hyperbolic tangent).
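
The variable parameters above define a small grid; a sketch of its enumeration (the tuple names are illustrative):

```python
from itertools import product

hidden_units = (1, 3, 5)            # units in the hidden layer
learning_rates = (0.1, 0.5, 0.9)
use_bias = (True, False)
activations = ("sigmoid", "tanh")   # sigmoid or hyperbolic tangent

# every combination of the variable parameters, per database
configs = list(product(hidden_units, learning_rates, use_bias, activations))
print(len(configs))  # 36 trained-network configurations
```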

3. Experiments – The Trained Networks Trained networks selected using the Breast Cancer database, where: CM1 Network – CM_Tan_NE9_Bias_4; CM2 Network – CM_Sig_NE9_Bias_1.

3. Experiments – The Trained Networks Trained networks selected using the IBOVESPA database, where: IB1 Network – IBOVESPA_Sig_Bias_2; IB2 Network – IBOVESPA_Tan_4; MAE – Mean Absolute Error.

3. Experiments – Extracting Rules ProRulext algorithm: limits of the “IF part” for both databases: 0.1, 0.5 and 0.9; limits of the “THEN part” for the Breast Cancer database: 0.1, 0.5 and 0.9; and limits of the “THEN part” for the IBOVESPA database: 0.1, 0.5 and 0.8, because with 0.9 no rule was obtained.

3. Experiments – Extracting Rules Literal and ProRulext algorithms: number of intervals used to discretize the numerical input and output attributes of the two databases: 2 (two), in order to obtain rule sets as compact as possible.

3. Experiments – Extracting Rules Examples of rules extracted by Literal from the CM2 Network (Breast Cancer)

3. Experiments – Extracting Rules Examples of rules extracted by ProRulext from the IB1 Network (IBOVESPA)

3. Experiments – Extracting Rules Rule sets were also obtained with the BIO-RE (Bio) [11], Geometrical (Geo) [7] and NeuroLinear (Neuro) [10] techniques, for comparison between their results and those presented by Literal and ProRulext.

4. Results The best results of the rule sets extracted from the networks trained with the Breast Cancer database, where: Sig – sigmoid, Tan – hyperbolic tangent, Irr – irrelevant (either Sig or Tan)

4. Results The best results of the rule sets extracted from the networks trained with the IBOVESPA database, where: Sig – sigmoid, Tan – hyperbolic tangent, Irr – irrelevant (either Sig or Tan)

5. Discussions The results using the Breast Cancer database indicate that the BIO-RE technique [11] obtained the most concise, comprehensible and faithful rule sets. By contrast, the antecedents of the rules extracted by the Geometrical approach [7] are hidden units, which harms their legibility.

5. Discussions The Literal and ProRulext algorithms presented performance comparable to that of the NeuroLinear technique, which is widely recognized for extracting very faithful, compact and legible rules.

5. Discussions However, NeuroLinear was the most computationally expensive method. The BIO-RE and Literal techniques were not affected by the kind of activation function used in network training.

5. Discussions Analyzing the results obtained with the IBOVESPA database, it can be concluded that all the investigated approaches, except the Geometrical technique, yielded rule sets that are very concise, legible and faithful to the networks from which they were obtained.

5. Discussions It is important to mention that Literal and ProRulext do not share the disadvantages presented by the other investigated methods. Besides, the algorithms proposed in this paper extract very expressive rules, as already illustrated.

6. Conclusions It has been observed that the Literal and ProRulext algorithms presented performance similar to NeuroLinear, obtaining rule sets that are concise, legible and faithful to the networks from which they were extracted, at a lower computational cost, and are applicable to networks trained for pattern classification and time series forecasting.

6. Conclusions BIO-RE obtained optimal rule sets, but it is only applicable to binary data, or when the conversion to this type does not significantly affect network performance [11].

6. Conclusions Thus, as Literal and ProRulext do not have that limitation, these new approaches emerge as efficient alternatives for extracting rules from trained networks to justify their inferred outputs.

Acknowledgements The authors thank CNPq and CAPES (Brazilian government research agencies) for the financial support for the development of this research.

References
[1] R. Andrews and S. Geva, “Rule Extraction from Local Cluster Neural Nets”, Neurocomputing, vol. 47, 2002, pp.
[2] R. Andrews, A. B. Tickle and J. Diederich, “A Survey and Critique of Techniques for Extracting Rules from Trained Artificial Neural Networks”, Knowledge-Based Systems, vol. 8, n. 6, 1995, pp. 373–389.
[3] B. Baesens, R. Setiono, C. Mues and J. Vanthienen, “Using Neural Network Rule Extraction and Decision Tables for Credit-Risk Evaluation”, Management Science, vol. 49, 2003, pp.

References
[4] P. G. Campos, “Explanatory Mechanisms for ANNs as Extraction of Knowledge”, Master Thesis, Federal University of Pernambuco, Brazil, 2005 (in Portuguese).
[5] P. G. Campos and T. B. Ludermir, “Literal – A Pedagogical Technique for Rules Extraction of ANNs”, V ENIA – Brazilian Conference of Artificial Intelligence, São Leopoldo-RS, 2005, pp. (in Portuguese).
[6] P. G. Campos, E. M. J. Oliveira, T. B. Ludermir and A. F. R. Araújo, “MLP Networks for Classification and Prediction with Rule Extraction Mechanism”, Proceedings of the International Joint Conference on Neural Networks, Budapest, 2004, pp.

References
[7] Y. M. Fan and C. J. Li, “Diagnostic Rule Extraction from Trained Feedforward Neural Networks”, Mechanical Systems and Signal Processing, vol. 16, n. 6, 2002, pp.
[8] Y. Hayashi, R. Setiono and K. Yoshida, “A Comparison Between Two Neural Network Rule Extraction Techniques for the Diagnosis of Hepatobiliary Disorders”, Artificial Intelligence in Medicine, vol. 20, n. 3, 2000, pp.
[9] T. B. Ludermir, A. C. P. L. F. Carvalho, A. P. Braga et al., “Hybrid Intelligent Systems”, in: S. O. Rezende (Organizer), Intelligent Systems: Foundations and Applications, Manole, Barueri, 2003, pp. (in Portuguese).

References
[10] R. Setiono and H. Liu, “NeuroLinear: From Neural Networks to Oblique Decision Rules”, Neurocomputing, vol. 17, 1997, pp.
[11] I. A. Taha and J. Ghosh, “Symbolic Interpretation of Artificial Neural Networks”, IEEE Transactions on Knowledge and Data Engineering, vol. 11, n. 3, 1999, pp.