Fuzzy Machine Learning Methods for Biomedical Data Analysis


Fuzzy Machine Learning Methods for Biomedical Data Analysis
Yanqing Zhang
Department of Computer Science, Georgia State University
Atlanta, GA 30302-5060
yzhang@gsu.edu

Outline
- Background
- Fuzzy Association Rule Mining for Decision Support (FARM-DS)
- FARM-DS on Medical Data
- FARM-DS on Microarray Expression Data
- Fuzzy-Granular Gene Selection on Microarray Expression Data
- Conclusion and Future Work

This is the outline of my presentation. First, I will briefly review the background. After introducing the proposed fuzzy association rule mining for decision support (FARM-DS) system, I will present experimental results on biomedical datasets. Because that part was already reported in my proposal, today we will focus on microarray data analysis with fuzzy-granular methods, including FARM-DS. Finally, I will summarize the presentation and discuss future work.

Background
- Theory: Computational Intelligence, Granular Computing, Fuzzy Sets; Knowledge Discovery and Data Mining (KDD); Decision Support Systems (DSS); Rule-Based Reasoning (RBR) and Association Rule Mining
- Application: Bioinformatics, Medical Informatics, etc.
- Concerns: Accuracy; Interpretability

In the last decade, with the advent of genomic and proteomic technologies, more and more biomedical databases have been created and have been growing rapidly. The general goal of my research is intelligent data analysis with hybrid Computational Intelligence techniques, including fuzzy sets, granular computing, clustering, and association rule mining, to extract knowledge from these databases and ease the biomedical decision-making process. An effective decision support system is expected to be both accurate and easy to interpret.

Outline
- Background
- Fuzzy Association Rule Mining for Decision Support (FARM-DS)
- FARM-DS on Medical Data
- FARM-DS on Microarray Expression Data
- Fuzzy-Granular Gene Selection on Microarray Expression Data
- Conclusion and Future Work

Now I will quickly go through the algorithm design of FARM-DS.

Motivation – Dealing with Numeric Data
- Traditional association rule mining: rules of the form "IF X, THEN Y", with confidence Conf = Pr(Y|X) and support Supp = Pr(X and Y); these algorithms do not work on numeric data
- Fuzzy logic (Zadeh, 1965): feature transformation, then fuzzy AR mining

Basically, association rules identify feature subsets that are statistically related in the underlying data. An association rule has the form "IF X, THEN Y", where X and Y are disjoint conjunctions of feature-value pairs. The confidence of the rule is the conditional probability of Y given X, Pr(Y|X), and the support of the rule is the prior probability of X and Y, Pr(X and Y); here, probability is taken to be the observed frequency in the dataset. However, traditional AR mining algorithms can only handle datasets with categorical features. To extend them to discover correlations in numeric data, it is natural to use fuzzy logic to split a numeric feature into discrete fuzzy sets for feature transformation; traditional AR mining algorithms can then work on the transformed dataset. Much of the work on fuzzy association rule mining follows this basic idea.
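To make the support and confidence definitions concrete, here is a minimal sketch that estimates both as observed frequencies over categorical transactions (the transactions and item names are invented for illustration):

```python
from typing import FrozenSet, List, Set

def support(transactions: List[Set[str]], items: FrozenSet[str]) -> float:
    """Supp = observed frequency of transactions containing all the items."""
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions: List[Set[str]], X: FrozenSet[str], Y: FrozenSet[str]) -> float:
    """Conf = Pr(Y | X), estimated as supp(X and Y) / supp(X)."""
    return support(transactions, X | Y) / support(transactions, X)

# Invented categorical transactions for the rule "IF f1 is low, THEN y is positive".
data = [{"f1_low", "y_pos"}, {"f1_low", "y_pos"},
        {"f1_high", "y_neg"}, {"f1_low", "y_neg"}]
X, Y = frozenset({"f1_low"}), frozenset({"y_pos"})
print(support(data, X | Y))    # Supp = Pr(X and Y) = 0.5
print(confidence(data, X, Y))  # Conf = Pr(Y|X) ~= 0.667
```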

Motivation – Decision Support
- FARs for classification: accuracy vs. interpretability
- Very few prior works:
  - Hu et al. (2002): combinatorial rule explosion
  - Chatterjee et al. (2004): requires human intervention

However, most AR or FAR mining algorithms are used to describe or interpret correlations inside a dataset. At the other end of the story, humans need to make decisions in the real world. The simplest decision is binary classification: for example, given a sample, we may need to decide whether it is good or bad. State-of-the-art classifiers such as SVMs and neural networks demonstrate high classification accuracy but are well known as "black boxes": how they classify or predict a sample is hard for a human to understand, so they cannot provide effective decision support. Because association rules are easy to understand, it is promising to go one step beyond data description and use these ARs or FARs for decision support, helping human experts make decisions, provided that predictions based on these rules are accurate enough. As far as we know, very few works exist in this promising research field. Hu et al. proposed a FARM system in 2002, but it suffers from combinatorial rule explosion and hence cannot handle data with a high-dimensional feature space. Chatterjee et al. designed another FARM system in 2004; one shortcoming is that some of its parameters must be predefined by humans based on experience, and it is usually difficult for a human to estimate these parameters accurately.

FARM-DS
- Target: numeric data; binary classification
- Effectiveness: accuracy; interpretability
- Modeling process: training; testing

To extract fuzzy association rules from numeric data, and to use these FARs to provide effective decision support for binary classification problems, we propose the FARM-DS system. Note that effectiveness is evaluated by both accuracy and interpretability. FARM-DS consists of two phases: training and testing. In the training phase, four steps are executed to extract fuzzy association rules; these FARs are then used to predict unseen samples in the testing phase.

Step 1: Fuzzy Interval Partition
- A 1-in-1-out 0-order TSK model per feature
- ANFIS for model optimization and parameter selection (Jang, 1993)

Step 1 of the training phase is fuzzy interval partition. In this step, we build a very simple 1-input-1-output 0-order TSK fuzzy model for each feature. As a result, each numeric feature is split into multiple fuzzy intervals, represented by simple fuzzy rules and their corresponding fuzzy membership functions. Note that the conclusion part of each rule is just the class label, and there are at least two fuzzy sets per feature. For example, a feature may be split into two fuzzy sets with the linguistic terms "low" and "high": if the feature value falls in the low fuzzy set, the sample is negative; if it is high, the sample is positive. An ANFIS system is used to find the optimal number of fuzzy sets and the fuzzy membership functions with the optimal shapes and parameters, guided by a validation heuristic.
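As a minimal sketch of such a two-fuzzy-set partition (the trapezoid breakpoints are invented for illustration; in FARM-DS the number, shapes, and parameters of the membership functions would be tuned by ANFIS):

```python
import numpy as np

def trapmf(x: np.ndarray, a: float, b: float, c: float, d: float) -> np.ndarray:
    """Trapezoidal membership function with feet a, d and shoulders b, c."""
    left = np.clip((x - a) / (b - a), 0.0, 1.0)
    right = np.clip((d - x) / (d - c), 0.0, 1.0)
    return np.minimum(left, right)

x = np.linspace(0.0, 1.0, 5)
mu_low = trapmf(x, -0.1, 0.0, 0.3, 0.6)   # "low" fuzzy set  -> class -1
mu_high = trapmf(x, 0.4, 0.7, 1.0, 1.1)   # "high" fuzzy set -> class +1
for xi, lo, hi in zip(x, mu_low, mu_high):
    print(f"x={xi:.2f}  low={lo:.2f}  high={hi:.2f}")
```

In a 0-order TSK model the consequent of each rule is a constant, here simply the class label, so these two membership functions fully describe the per-feature partition.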

Step 2: Data Abstraction
- Clustering: K-Means; Fuzzy C-Means
- Validation: number of clusters; optimal clustering; silhouette value
- A cluster with more positive than negative samples is a positive cluster, and vice versa

In parallel with step 1, the second step performs data abstraction. Clustering algorithms such as K-means, fuzzy C-means, or self-organizing maps are used to group similar samples into clusters based on their patterns in the feature space. In FARM-DS, a cluster with more positive samples than negative samples is defined to be a positive cluster, and a cluster with more negative samples than positive samples is defined to be a negative cluster. As in the first step, a validation heuristic decides the optimal number of clusters and the optimal clustering result; the optimization target is to maximize the overall silhouette value. The silhouette value of a sample measures how similar it is to samples in its own cluster compared with samples in other clusters, and ranges from -1 to +1, so a larger overall silhouette value means that samples in the same cluster are more similar and samples from different clusters are more different. After clustering, each cluster can be represented by representative samples, for example its center, achieving a high-level data abstraction. In this way, the number of transactions (and of subsequent rules) is independent of the dimension of the input feature space: it is decided only by the number of clusters, which yields a compact rule base and in turn enhances generalization and interpretability when predicting unknown new samples. Currently only the K-means clustering algorithm is used for data abstraction; other clustering algorithms such as fuzzy C-means or self-organizing maps can be tried in the near future.
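A minimal sketch of the silhouette-based model selection described above, using scikit-learn (the data is synthetic and the candidate range of k is an assumption):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=200, centers=4, random_state=0)  # toy data

best_k, best_score, best_model = None, -1.0, None
for k in range(2, 9):  # candidate numbers of clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    score = silhouette_score(X, km.labels_)  # mean silhouette over all samples
    if score > best_score:
        best_k, best_score, best_model = k, score, km

print(best_k, round(best_score, 3))
centers = best_model.cluster_centers_  # cluster centers used as representatives
```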

Step 3: Generating Fuzzy Discrete Transactions
- Project the center of each cluster onto each feature
- Create |s_k+ - s_k-| identical transactions per cluster
- For a positive cluster, +1 is inserted; for a negative cluster, -1 is inserted

After fuzzy interval partition at step 1 (numeric data transformation) and clustering at step 2 (data abstraction), step 3 generates fuzzy discrete transactions. The idea is straightforward: given a cluster with s_k+ positive samples and s_k- negative samples, |s_k+ - s_k-| identical "fuzzy discrete transactions" are created. If it is a positive cluster, +1 is inserted into these transactions; if it is a negative cluster, -1 is inserted. The center of the cluster is then projected onto each feature: if the difference between the projection (membership) values on the different fuzzy sets of feature Fi is not significant, Fi is not inserted into these transactions, i.e. Fi is pruned. This pruning improves the interpretability of the rules because shorter rules are induced. Otherwise, if the difference is significant enough (>= alpha), Fi is inserted in the form "Fi_1" or "Fi_0". Currently, only two membership functions per feature from step 1 are considered: on each input feature fi, two membership values are calculated for a center by projecting the center onto the feature; for example, a center with fi = 0.113 is projected onto the trapezoidal membership functions. If there are multiple fuzzy sets for a feature, the alpha-cut may be replaced by more general operations such as max or sum.
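A minimal sketch of this transaction-generation rule under the stated assumptions (two fuzzy sets per feature and an alpha threshold; the toy membership functions, the alpha value, and the cluster are all invented for illustration):

```python
ALPHA = 0.3  # significance threshold for the membership difference (assumed)

def cluster_transactions(center, n_pos, n_neg, mf_low, mf_high):
    """Build |n_pos - n_neg| identical fuzzy discrete transactions for one cluster."""
    label = "+1" if n_pos > n_neg else "-1"
    items = [label]
    for i, x in enumerate(center):
        lo, hi = mf_low[i](x), mf_high[i](x)  # project center onto both fuzzy sets
        if abs(hi - lo) >= ALPHA:             # keep feature only if the difference is significant
            items.append(f"F{i+1}_{1 if hi > lo else 0}")
    return [items] * abs(n_pos - n_neg)

# Hypothetical cluster with 5 positive and 2 negative samples in a 2-feature space.
mf_low = [lambda x: max(0.0, 1.0 - 2 * x)] * 2               # toy "low" memberships
mf_high = [lambda x: min(1.0, max(0.0, 2 * x - 1.0))] * 2    # toy "high" memberships
print(cluster_transactions([0.9, 0.5], 5, 2, mf_low, mf_high))
# -> three copies of ['+1', 'F1_1']  (F2 pruned: its low/high memberships are too close)
```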

Step 3 – Example
- A cluster with 5 positive and 2 negative samples yields 5 - 2 = 3 identical transactions
- Combinatorial rule explosion is avoided: the number of distinct transactions is decided by the number of clusters

Here is an example of projecting a cluster onto two features to generate fuzzy discrete transactions. The cluster contains 5 positive samples and 2 negative samples, so it is a positive cluster and 5 - 2 = 3 identical transactions are generated, each including the item "1". The center is then projected onto the two features F1 and F2. Because the difference between the projection values on the two fuzzy sets of F2 is not significant, F2 is not inserted into the transactions. For F1, the projection value on "high" is significantly larger than the projection value on "low", so F1 is inserted in the form "F1_1". The advantage is that combinatorial rule explosion is avoided, because the number of rules is not directly related to the dimensionality but is decided by the number of clusters.

Step 4: Association Rule Mining
- Apply the traditional Apriori algorithm (Agrawal and Srikant, 1994) to the fuzzy discrete transactions
- Rule form: IF f1 is low, f2 is high, ..., fh is low, THEN y = 1/-1
- Rule pruning: for a pair of rules A and B, if B is more specific than A (that is, A is included in B) and B has the same support value as A, then A is eliminated
  - A: IF f1 is low, THEN y = 1, sup = 50%
  - B: IF f1 is low AND f2 is high, THEN y = 1, sup = 50%

In the first three steps, numeric data is transformed into fuzzy discrete transactions, so it is easy to apply traditional AR mining algorithms such as the Apriori algorithm proposed by Agrawal and Srikant in 1994. The association rules take the form shown above; note that the number of features in a rule is usually smaller than the number of original features, because insignificant features are removed from the corresponding fuzzy transactions. To improve interpretability and simplify the model, a rule-pruning step is conducted: for a pair of rules A and B, if B is more specific than A and has the same support value, A is eliminated, as in the example above.
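A minimal sketch of this mining step on toy one-hot transactions, assuming the mlxtend library is available (the thresholds and item names are made up, and the pruning pass is a simple post-filter written to match the rule stated above):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot fuzzy discrete transactions (items are illustrative).
df = pd.DataFrame(
    [{"f1_low": 1, "f2_high": 1, "y_pos": 1},
     {"f1_low": 1, "f2_high": 1, "y_pos": 1},
     {"f1_low": 0, "f2_high": 0, "y_pos": 0},
     {"f1_low": 1, "f2_high": 0, "y_pos": 0}]
).astype(bool)

freq = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(freq, metric="confidence", min_threshold=0.8)
rules = rules[rules["consequents"].apply(lambda c: c == frozenset({"y_pos"}))]

# FARM-DS-style pruning: drop a rule if a strictly more specific rule
# with the same consequent has the same support.
def is_pruned(row, all_rules):
    for _, other in all_rules.iterrows():
        if (row["antecedents"] < other["antecedents"]
                and row["support"] == other["support"]):
            return True
    return False

rules = rules[~rules.apply(lambda r: is_pruned(r, rules), axis=1)]
print(rules[["antecedents", "consequents", "support", "confidence"]])
```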

Testing Phase
- weight+ = sum of the firing strengths of all positive rules; weight- is defined similarly over the negative rules
- Class label derived from weight+ - weight- plus a bias, optimized by cross validation

In the testing phase, the performance of the fuzzy association rules is evaluated on the testing dataset. Assume there are r+ positive rules and r- negative rules. For each new sample, its positive weight weight+ is defined as the sum of the firing strengths of all positive rules, and its negative weight weight- is defined analogously. The firing strength of a rule is calculated by projecting the sample onto each feature in the rule and then calculating the activation difference on the different fuzzy sets. Finally, the class label is calculated from the difference between the positive and negative weights plus a bias, which can be optimized by cross validation.
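A minimal sketch of this scoring scheme (the rule representation, the product t-norm for firing strength, and the bias value are assumptions for illustration; the paper's exact firing-strength definition may differ):

```python
from typing import Callable, Dict, List, Tuple

# A rule is (label, {feature_index: membership_function}); firing strength is
# taken here as the product of the sample's memberships on the rule's features.
Rule = Tuple[int, Dict[int, Callable[[float], float]]]

def firing_strength(sample: List[float], rule: Rule) -> float:
    strength = 1.0
    for i, mf in rule[1].items():
        strength *= mf(sample[i])
    return strength

def classify(sample: List[float], rules: List[Rule], bias: float = 0.0) -> int:
    w_pos = sum(firing_strength(sample, r) for r in rules if r[0] == +1)
    w_neg = sum(firing_strength(sample, r) for r in rules if r[0] == -1)
    return +1 if w_pos - w_neg + bias >= 0 else -1

# Hypothetical rules: "IF f1 is high THEN +1" and "IF f1 is low AND f2 is low THEN -1".
high = lambda x: min(1.0, max(0.0, 2 * x - 1.0))
low = lambda x: max(0.0, 1.0 - 2 * x)
rules = [(+1, {0: high}), (-1, {0: low, 1: low})]
print(classify([0.9, 0.2], rules))  # -> +1
```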

Adaptive FARM-DS
- Training: (1) fuzzy interval partition; (2) data abstraction; (3) generate fuzzy discrete transactions; (4) AR mining
- Testing

In summary, the FARM-DS algorithm consists of a training phase and a testing phase. In the training phase, four steps are executed to mine fuzzy association rules. At step 1, a 1-in-1-out ANFIS system generates fuzzy intervals on each input feature, each defined by a fuzzy membership function. At step 2, clustering performs data abstraction to extract the inherent data distribution. At step 3, FARM-DS transforms quantitative samples into "fuzzy discrete transactions" by projecting the center of each cluster from step 2 onto the fuzzy intervals from step 1. Finally, at step 4, simple IF-THEN fuzzy association rules are mined from the fuzzy discrete transactions by the traditional Apriori algorithm. These FARs are then used to predict unseen samples in the testing phase. Note that steps 1 and 2 can be executed independently in parallel. (He et al. 2006a, IJDMB)

Outline
- Background
- Fuzzy Association Rule Mining for Decision Support (FARM-DS)
- FARM-DS on Medical Data
- FARM-DS on Microarray Expression Data
- Fuzzy-Granular Gene Selection on Microarray Expression Data
- Conclusion and Future Work

I'd like to move quickly through this part, as it has already been reported in my proposal.

Empirical Studies
- Classification algorithms: C4.5 decision trees (Quinlan, 1993); support vector machines (Vapnik, 1995); FARM-DS (He et al. 2006a, IJDMB)
- Accuracy estimation: 5-fold cross validation
- Interpretability

In this group of empirical studies, we compare FARM-DS with two other popular classifiers: C4.5 decision trees, proposed by Quinlan in 1993, and support vector machines, proposed by Vapnik in 1995. Five-fold cross validation is used to evaluate performance: the original dataset is randomly split into 5 equal-size subsets; four subsets are combined as the training set and the remaining one is used as the testing set. The training-testing process is repeated five times so that each subset is used as the testing set exactly once, and the average accuracy is reported as the accuracy of the system. The parameters with the best validation accuracy are then used to extract FARs on the whole dataset for interpretability analysis.
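A minimal sketch of this evaluation protocol with scikit-learn stand-ins (a decision tree and an SVM on scikit-learn's built-in Wisconsin breast cancer dataset; FARM-DS itself is omitted since no public implementation is given here):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the BCW dataset
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)  # 5-fold accuracy
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```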

Evaluation Metrics
- Accuracy: classification error; area under ROC curve (future work) (Bradley, 1997)
- Interpretability: number of rules; average rule length

We adopt multiple metrics focused on two aspects: accuracy and interpretability. Accuracy is evaluated with the classification error metric and the area under the ROC curve (AUC) metric; although we do not report AUC yet, we will do so in the near future. A smaller error and a larger AUC mean a more accurate classifier. The number of extracted rules and the average rule length are used to evaluate interpretability: intuitively, a classifier is easy to interpret if it has a small number of short rules.
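Both accuracy metrics are standard; a minimal sketch of how they would be computed with scikit-learn (the labels and scores are invented):

```python
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = [1, 1, 0, 0, 1, 0]                # invented ground truth
y_pred = [1, 0, 0, 0, 1, 1]                # invented hard predictions
y_score = [0.9, 0.4, 0.2, 0.1, 0.8, 0.6]   # invented decision scores

error = 1.0 - accuracy_score(y_true, y_pred)  # classification error
auc = roc_auc_score(y_true, y_score)          # area under the ROC curve
print(f"error={error:.3f}  AUC={auc:.3f}")
```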

Datasets

The datasets used in this group of experiments are the Wisconsin breast cancer (BCW) dataset and the Cleveland heart-disease dataset, both available from the UCI repository of machine learning databases (Merz et al., 1998).

Result Analysis: Accuracy
- FARM-DS ≈ SVM > C4.5
- SVM2 and C4.5 results are from Bennett et al. (1997)

We ran FARM-DS and an SVM on these two datasets; because our experimental conditions are exactly the same as in Bennett's work in 1997, their results can be compared directly with ours. The SVM modeled by us is called SVM1 and the SVM modeled by Bennett is called SVM2. The results demonstrate that FARM-DS achieves almost the same accuracy as the optimal SVM and higher accuracy than the C4.5 decision tree classifier.

Result Analysis: Interpretability
- SVM: high accuracy, hard to interpret
- C4.5: low accuracy, easy to interpret
- FARM-DS: high accuracy, easy to interpret

SVMs are well known as black boxes because their classification decisions are hard to understand. Decision trees can induce rules that are easy to interpret, but their lower accuracy decreases the effectiveness of the induced rules. FARM-DS offers both high accuracy and good interpretability.

Interpretability (1)

First, the FARs extracted by FARM-DS are short and compact, and hence easy to understand. In these experiments, FARM-DS is executed again on the whole dataset. Taking the BCW data as an example, 22 positive rules and 8 negative rules are extracted; on average, a positive rule has length 2.6, a negative rule has length 4.3, and every sample activates 3.3 positive rules and 5.6 negative rules. We believe that both the short rule length and the small number of activated rules make the extracted FARs easy to understand for further study.

Interpretability (2)

Second, FARs may help human experts correct wrongly classified samples. FARM-DS misclassifies 19 samples in the Wisconsin dataset, but we notice that 12 of those samples activate some correct rules.

Interpretability (3)

Third, the larger support of the negative rules may help human experts make the final correct decision and find the inherent disease-causing mechanisms. For example, the first validation sample in fold 1 is classified as positive but is actually negative (a false positive): its positive weight is weight+ = 2.0000 and its negative weight is weight- = 0.9660. For this sample, FARM-DS returns 2 fired positive rules and 5 fired negative rules, of which the most general and the most specific ones are shown in a table (omitted here). The larger support of the negative rules may help experts reach the correct final decision.

Interpretability (4)
- FARs are helpful for selecting important features: a higher activation frequency means a more important feature

Last, FARs are helpful for selecting important features. Intuitively, the more frequently a feature is activated, the more important it is. In the experiments we calculate each feature's activation frequency. For the BCW data, f4, f6, and f8 are more important, while f1, f7, and f9 are less important; human experts may work on the important features first.

Outline
- Background
- Fuzzy Association Rule Mining for Decision Support (FARM-DS)
- FARM-DS on Medical Data
- FARM-DS on Microarray Expression Data
- Fuzzy-Granular Gene Selection on Microarray Expression Data
- Conclusion and Future Work

Now I will report the results of microarray expression data analysis with FARM-DS.

Microarray Expression Data
- Extremely high dimensionality
- Gene selection
- Cancer classification
- Rule-based reasoning

A typical microarray expression dataset is extremely sparse compared to a traditional classification dataset. For example, the AML/ALL leukemia dataset has only 72 samples (tissues) with 7129 features (gene expression measurements); without gene selection, we would have to discriminate and classify these few samples in that very high-dimensional space. This is unnecessary or even harmful for classification, because it is believed that no more than 10% of these 7129 genes are relevant to leukemia classification (Golub et al., 1999). This extreme sparseness is believed to significantly deteriorate classifier performance, so the ability to extract a subset of informative genes while removing irrelevant or redundant genes is crucial for accurate classification; it also helps biologists find the inherent cancer-causing mechanisms. After gene selection, FAR mining is conducted on the expression data, and the resulting FARs are used to classify new tissue samples. Moreover, thanks to their easy interpretability, FARs may also support human experts in rule-based reasoning.

Empirical Studies
- Rule-based reasoning/classification: CART decision trees (Breiman et al., 1984); ANFIS fuzzy neural networks (Jang, 1993); FARM-DS (He et al. 2006a, IJDMB)

FARM-DS is compared with two other rule-based classifiers: CART decision trees, proposed by Breiman et al. in 1984, and ANFIS, proposed by Jang in 1993. The accuracy of each model is estimated with leave-one-out cross validation.

Evaluation Metrics
- Accuracy: classification error; area under ROC curve (Bradley, 1997)
- Accuracy estimation: leave-one-out cross validation
- Interpretability: number of rules; average rule length

Accuracy is evaluated with the classification error and the area under the ROC curve; a smaller error and a larger AUC mean a more accurate classifier. Leave-one-out cross validation is used to estimate the real classification performance on unknown new samples: given n samples, in each fold we build a classifier on n-1 samples and test it on the remaining sample. This process is repeated n times so that each sample is used for testing exactly once, and the accuracy averaged over the n folds estimates the real classification accuracy. Interpretability is evaluated with the number of rules and the rule lengths: a classifier is easy to interpret if its rules are few and short.
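A minimal sketch of leave-one-out estimation with scikit-learn (a CART-style decision tree stands in for the models compared here, and the built-in dataset stands in for a microarray dataset):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset
clf = DecisionTreeClassifier(random_state=0)  # CART-style decision tree

scores = cross_val_score(clf, X, y, cv=LeaveOneOut())  # n folds, 1 test sample each
print(f"LOOCV accuracy = {scores.mean():.3f}, error = {1 - scores.mean():.3f}")
```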

AML/ALL Leukemia Dataset

The AML/ALL leukemia dataset is used in these experiments. The 8 gene features listed on the slide (table omitted) are believed to be related to leukemia and hence are used for rule extraction (Tang et al., 2006).

Result Analysis: AML/ALL Leukemia Dataset
- Higher accuracy than CART
- Easier to interpret than ANFIS

FARM-DS is more accurate than CART: there are 2 classification errors with FARM-DS versus 7 with CART, and FARM-DS also has the largest area under the ROC curve. On the other hand, compared with ANFIS, FARM-DS extracts much shorter rules (average rule length 4.8 vs. 8) and is thus easier to interpret.

Rules Extracted by FARM-DS: AML/ALL Leukemia Dataset
- Example: IF gene2 (Y12670), gene3 (D14659), and gene5 (M80254) are down-regulated, THEN the tissue is ALL (-1)

The slide's table (omitted) lists the 5 rules extracted by FARM-DS. For example, the first rule reads: if genes Y12670, D14659, and M80254 are down-regulated, then the tissue is ALL. Clearly, these rules are easy to interpret and hence may be more helpful for biomedical studies.

Prostate Cancer Dataset

Another group of experiments uses the prostate cancer dataset, which contains 102 tissue samples and 12600 gene features. The 8 genes listed on the slide (table omitted) are closely related to prostate cancer (Tang et al., 2006).

Result Analysis: Prostate Cancer Dataset
- Higher accuracy than CART
- Easier to interpret than ANFIS

As on the leukemia dataset, FARM-DS has higher accuracy than CART: there are 13 errors with CART but only 7 with FARM-DS, and FARM-DS has a larger area under the ROC curve than CART. Compared with ANFIS, FARM-DS extracts much shorter rules (average rule length 3.1 vs. 8) and hence is easier to interpret.

Rules Extracted by FARM-DS: Prostate Cancer Dataset

The 15 fuzzy association rules are listed in the slide's table (omitted). They show some interesting patterns: it seems that if gene G5 is down-regulated, the sample is healthy; otherwise, it is cancerous tissue. A similar pattern is also demonstrated for gene G1.

Outline
- Background
- Fuzzy Association Rule Mining for Decision Support (FARM-DS)
- FARM-DS on Medical Data
- FARM-DS on Microarray Expression Data
- Fuzzy-Granular Gene Selection on Microarray Expression Data
- Conclusion and Future Work

We also designed a fuzzy-granular method to select marker genes from microarray expression data.

Gene Selection and Cancer Classification on Microarray Expression Data
- Extremely high dimensionality: the AML/ALL leukemia dataset is 72 × 7129, with no more than 10% relevant genes (Golub et al., 1999)
- Gene selection: accurate classification; helpful for cancer study

A typical gene expression dataset is extremely sparse compared to a traditional classification dataset: the AML/ALL leukemia dataset has only 72 samples (tissues) with 7129 features (gene expression measurements), and no more than 10% of these genes are believed to be relevant to leukemia classification (Golub et al., 1999). This extreme sparseness significantly deteriorates classifier performance, so extracting a subset of informative genes while removing irrelevant or redundant genes is crucial for accurate classification and also helps biologists find the inherent cancer-causing mechanisms. From the data mining viewpoint, this gene selection problem is essentially a feature selection or dimensionality reduction problem.

Gene Categorization and Gene Ranking
- Informative genes: really cancer-related
- Redundant genes: also cancer-related, but other informative genes are regulated similarly and more significantly for cancer classification
- Irrelevant genes: not cancer-related; their existence does not affect cancer classification
- Noisy genes: not cancer-related, and they have negative effects on cancer classification

Information Loss
- Noise: a noisy gene may individually contribute to discriminating the training samples through non-cancer-related factors, so it is ranked high (overfitting); it may be complementary to some redundant or irrelevant genes, so those genes are ranked higher; or it may conflict with some informative genes, so those informative genes are ranked lower
- Imbalanced gene selection
- Inflexibility

Note that the pre-filtering step by the relevance index (RI) metric targets these noise effects by eliminating most irrelevant genes. How can we further decrease information loss? Granulation!

Coarse Granulation with Relevance Indexes
- Target: remove irrelevant genes
- Target: tune thresholds to select genes in balance
(Slide figure contrasting imbalanced and balanced gene selections omitted.)

Fine Granulation with Fuzzy C-Means Clustering
- Clustering in the training-sample space
- Assumption: genes with similar expression patterns have similar functions
- A gene may have multiple functions (this is where fuzziness helps)

We explicitly group genes with similar expression patterns into clusters; the lower-ranked genes in each cluster can then be safely removed as redundant. The assumption is that genes with similar expression patterns also have similar functions in regulating cancers. Furthermore, due to the complex correlations between genes, similarity is by no means a crisp concept: fuzzy C-means handles these correlations by assigning a gene to multiple clusters, so a really informative gene gets more than one opportunity to survive.
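A minimal self-contained sketch of fuzzy C-means over genes, following the standard FCM update rules (rows are genes, columns are training samples; the cluster count, fuzzifier m, and toy data are illustrative):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-means: returns (centers, U), where U[i, k] is the membership
    of point i in cluster k and each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # random fuzzy memberships
    for _ in range(n_iter):
        W = U ** m                                    # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))            # inverse-distance memberships
        U /= U.sum(axis=1, keepdims=True)             # normalize rows to sum to 1
    return centers, U

# Toy gene-by-sample expression matrix (values invented).
genes = np.array([[0.1, 0.2, 0.1], [0.2, 0.1, 0.2],
                  [0.9, 1.0, 0.8], [1.0, 0.9, 0.9]])
centers, U = fuzzy_c_means(genes, c=2)
print(np.round(U, 2))  # each gene belongs to every cluster with some degree
```

Because memberships are graded rather than crisp, a gene with substantial membership in two clusters can be ranked (and potentially kept) in both, which is exactly the "more than one opportunity to survive" property described above.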

Conquer with Correlation-Based Ranking
- Lower-ranked genes in each cluster are removed as redundant genes

After clustering, the third step ranks the genes in each cluster separately. Three correlation-based methods are used for gene ranking: signal-to-noise (S2N), proposed by Furey et al. in 2000; Fisher criterion (FC), designed by Pavlidis et al. in 2001; and T-statistics (TS), used by Duan et al. in 2004. In each formula, a larger weight value means a higher rank, so the gene with the largest weight is the most informative in a cluster; lower-ranked genes are removed as redundant.
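The slide's formulas are not reproduced in this transcript; below is a sketch using the standard forms of these three statistics (the cited papers may use minor variants), computed per gene from class-conditional means and standard deviations:

```python
import numpy as np

def ranking_weights(X_pos, X_neg):
    """Per-gene S2N, Fisher criterion, and t-statistic weights.
    X_pos, X_neg: (samples x genes) expression matrices for the two classes."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    sd_p, sd_n = X_pos.std(axis=0, ddof=1), X_neg.std(axis=0, ddof=1)
    n_p, n_n = len(X_pos), len(X_neg)
    s2n = np.abs(mu_p - mu_n) / (sd_p + sd_n)          # signal-to-noise
    fc = (mu_p - mu_n) ** 2 / (sd_p ** 2 + sd_n ** 2)  # Fisher criterion
    ts = np.abs(mu_p - mu_n) / np.sqrt(sd_p ** 2 / n_p + sd_n ** 2 / n_n)  # t-statistic
    return s2n, fc, ts

rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(25, 6))  # toy expression data, class +1
X_neg = rng.normal(0.0, 1.0, size=(47, 6))  # toy expression data, class -1
s2n, fc, ts = ranking_weights(X_pos, X_neg)
print("top gene by S2N:", int(np.argmax(s2n)))
```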

Aggregation with Data Fusion
- Pick up genes from different clusters in balance
- An informative gene is more likely to survive (due to fuzzy clustering, it can appear in multiple clusters)

Workflow: Original Gene Set → Relevance-Index-based Pre-filtering → Relevant Gene Set → Fuzzy C-Means Clustering → Gene Clusters 1..K → Correlation-based Gene Ranking within each cluster → Final Gene Set

Empirical Study
- Signal-to-noise (S2N) (Furey et al., 2000) vs. Fuzzy-Granular + S2N
- Fisher criterion (FC) (Pavlidis et al., 2001) vs. Fuzzy-Granular + FC
- T-statistics (TS) (Duan et al., 2004) vs. Fuzzy-Granular + TS

In our experiments, comparison studies are carried out for the three correlation-based algorithms on the whole gene set and on the gene subsets obtained after fuzzy granulation, giving altogether 6 gene selection methods.

Evaluation Methods
- Metrics: accuracy; sensitivity; specificity; area under ROC curve
- Estimation: leave-one-out CV; .632 bootstrapping, with .632 Perf = 0.368 * training perf + 0.632 * testing perf

Performance is evaluated with four metrics: accuracy, sensitivity, specificity, and area under the ROC curve. The performance of each model is estimated with leave-one-out cross validation. We also tried .632 bootstrapping, because some argue that bootstrapping estimates accuracy better than cross validation on small datasets such as microarray datasets. The basic bootstrapping process: given n samples, randomly select n samples with replacement (after picking a sample, return it and draw again), so a sample may be selected multiple times. On average, 63.2% of the samples are selected for training and the other 36.8% are used for testing. The performance of one bootstrap round is Perf = 0.368 * training performance + 0.632 * testing performance; this process is repeated B times, and the average over the B rounds estimates the real classification performance.
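A minimal sketch of the .632 bootstrap estimate described above (the classifier, dataset, and number of rounds B are illustrative):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
rng = np.random.default_rng(0)
B, n = 100, len(X)

perfs = []
for _ in range(B):
    train_idx = rng.integers(0, n, size=n)             # draw n indices with replacement
    test_idx = np.setdiff1d(np.arange(n), train_idx)   # out-of-bag samples (~36.8%)
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    train_acc = clf.score(X[train_idx], y[train_idx])
    test_acc = clf.score(X[test_idx], y[test_idx])
    perfs.append(0.368 * train_acc + 0.632 * test_acc)  # the .632 combination
print(f".632 bootstrap accuracy estimate: {np.mean(perfs):.3f}")
```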

Prostate Cancer Dataset

The prostate cancer dataset is used; it is very high-dimensional (dataset statistics table omitted).

Result Analysis: Prostate Cancer Dataset

The results show that fuzzy granulation consistently improves classification performance over the corresponding correlation-based method without fuzzy granulation: with leave-one-out cross validation, it improves accuracy and area under the ROC curve by about 3%-8%, and a similar improvement is observed with 100 rounds of .632 bootstrapping. The results also show that fuzzy granulation plus signal-to-noise is the best method, outperforming fuzzy granulation plus Fisher criterion and fuzzy granulation plus T-statistics.

Colon Cancer Dataset

A similar performance improvement is also observed on the colon cancer dataset (dataset statistics table omitted).

Result Analysis: Colon Cancer Dataset

First, fuzzy granulation plus a correlation-based method is always better than directly applying the correlation-based method to the whole gene set. Second, fuzzy granulation plus S2N is better than FG+FC and FG+TS.

Conclusion
- FARM-DS: high-level data abstraction via data clustering; quantitative data transformed into fuzzy discrete transactions via fuzzy interval partition; Apriori algorithm for AR mining; strong decision support for biomedical studies with high accuracy and easy interpretability
- Fuzzy-granular gene selection: more accurate cancer classification; eliminates irrelevant/redundant genes to decrease noise; selects informative genes in balance

Two algorithms have been designed in this dissertation work. The first is a general fuzzy association rule mining algorithm: FARM-DS implements high-level data abstraction with data clustering techniques, automatically transforms continuous data into fuzzy discrete transactions with a simple 1-in-1-out TSK model, and then applies the Apriori algorithm to mine association rules from these fuzzy discrete transactions. Our experimental results show that FARM-DS can provide strong decision support for biomedical studies, because classification based on the FARs is both highly accurate and easy to interpret. The second algorithm applies fuzzy granulation to the large gene set of microarray expression data; the fuzzy-granular method improves classification accuracy, mainly by eliminating irrelevant and redundant genes and by selecting informative genes in balance.

Future Work
- Applying FARM-DS to other biomedical applications
- Integrating more intelligent data analysis techniques
- Cloud-computing-based fuzzy data mining algorithms for big data mining
- GPU-based fuzzy data mining algorithms for big data mining

References
[1] Y.C. He, Y.C. Tang, Y.-Q. Zhang and R. Sunderraman, "Mining Fuzzy Association Rules from Microarray Gene Expression Data for Leukemia Classification," Proc. of International Conference on Granular Computing (GrC-IEEE 2006), Atlanta, pp. 461-465, May 10-12, 2006.
[2] Y.C. He, Y.C. Tang, Y.-Q. Zhang and R. Sunderraman, "Adaptive Fuzzy Association Rule Mining for Effective Decision Support in Biomedical Applications," International Journal of Data Mining and Bioinformatics, Vol. 1, No. 1, pp. 3-18, 2006.
[3] Y.C. He, Y.C. Tang, Y.-Q. Zhang and R. Sunderraman, "Fuzzy-Granular Gene Selection from Microarray Expression Data," Proc. of DMB 2006, in conjunction with IEEE ICDM 2006, Hong Kong, Dec. 18, 2006 (accepted).
[4] Y.C. He, Y.C. Tang, Y.-Q. Zhang and R. Sunderraman, "Fuzzy-Granular Methods for Identifying Marker Genes from Microarray Expression Data," in Computational Intelligence for Bioinformatics, Gary B. Fogel, David Corne, and Yi Pan (eds.), IEEE Press, 2007.

Acknowledgments

Thanks go to Dr. Yuchun Tang and Dr. Yuanchen He for their hard work on this research project.

Questions? Comments?