Application of Stacked Generalization to a Protein Localization Prediction Task. Melissa K. Carroll, M.S. and Sung-Hyuk Cha, Ph.D. Pace University, School of Computer Science and Information Systems. September 27, 2003.

Overview: Introduction, Purpose, Methods, Algorithms, Results, Conclusions and Future Work.

Introduction

Introduction: Data Mining. The application of machine learning algorithms to large databases, often used to classify future data based on a training set. The "target" variable is the variable to be predicted. Theoretically, the algorithms are context-independent.

Introduction: Stacked Generalization. A method for combining models: part of the training set is used to train level-0, or base, models as usual; level-1 data are built from the predictions of the level-0 models on the remainder of the set; level-1 generalizers are models trained on the level-1 data.
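The slides do not say which software was used; purely as an illustration of the scheme described here, a minimal stacking sketch in Python with scikit-learn on toy stand-in data (the model choices, sizes, and variable names below are assumptions, not the ones used in the study):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the encoded gene data (3 localization classes).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 20))
y = rng.integers(0, 3, size=300)

# Hold out part of the training data; the level-0 models never see it.
X_l0, X_hold, y_l0, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)

level0 = [DecisionTreeClassifier(random_state=0),
          MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)]
for model in level0:
    model.fit(X_l0, y_l0)

# Level-1 data: each level-0 model's prediction on the held-out instances.
Z = np.column_stack([model.predict(X_hold) for model in level0])

# Level-1 generalizer: a classifier trained on the level-0 predictions.
generalizer = DecisionTreeClassifier(random_state=0).fit(Z, y_hold)
```

At prediction time the level-0 models are applied first and the generalizer is applied to their outputs.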

Introduction: Bioinformatics and Protein Localization. Bioinformatics is the application of computing to molecular biology, and there is currently much interest in information about proteins. Proteins are expressed in a particular type or part of the cell (localization), and knowledge of a protein's localization can shed light on its function. Data mining is employed here to predict localization from a database of information about the encoding genes.

Introduction: KDD Cup 2001 Task. The KDD Cup is an annual data mining competition sponsored by ACM SIGKDD. Participants use a training set to predict target variable values in a test dataset of different instances; the winner is the most accurate model (correct predictions / total instances in the test set). The 2001 task was to predict the protein localization of genes: anonymized genes were the instances, and information about the genes formed the attributes. These datasets (including the revealed target values) were used in this project.

Purpose. Apply the Stacked Generalization approach to this task, compare inter-algorithm performance for the level-0 models and level-1 generalizers, and evaluate the strategy of equally distributing the target variable.

Methods

Methods: Dataset Manipulations. Reduce the number of input variables; reduce the number of potential target values to 3; separate the original training dataset into training and validation sets for stacking; eliminate effectively unary variables in the final training dataset.
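As a rough sketch only (the actual tool, the file name, and the 0.99 "effectively unary" threshold below are assumptions), these manipulations might look like this in pandas:

```python
import pandas as pd

genes = pd.read_csv("genes_train.csv")   # hypothetical file of the gene attributes

# Drop effectively unary attributes: columns where a single value accounts
# for (nearly) every instance and so carries almost no information.
keep = [col for col in genes.columns
        if genes[col].value_counts(normalize=True).iloc[0] < 0.99]
genes = genes[keep]

# Hold out part of the original training data as the validation set whose
# level-0 predictions will become the level-1 (stacking) data.
validation = genes.sample(frac=0.3, random_state=0)
training = genes.drop(validation.index)
```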

Table: Target Variable Distribution

Methods: Equally Distributed Approach. A second training set was created by stratified sampling to ensure equally distributed localizations. Level-0 models were trained on both the raw (unequally distributed) and the equally distributed training sets, and separate level-1 data and level-1 generalizers were built from this dataset.
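A minimal sketch of one way to build such an equally distributed set, assuming the `training` DataFrame and the `localization` column name from the preprocessing sketch above (both assumptions):

```python
# Downsample each localization class to the size of the rarest class so
# that every target value appears equally often.
n_min = training["localization"].value_counts().min()
balanced = (training.groupby("localization", group_keys=False)
                    .apply(lambda g: g.sample(n=n_min, random_state=0)))
```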

Algorithms

Algorithms: Level-0 Artificial Neural Network (ANN). A fully connected feedforward network. The input variables were recoded as dummy variables, giving 186 input nodes; the target variable was recoded as dummy variables, giving 2 output nodes; the network had 1 hidden node. Training was based on the change in the misclassification rate.
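The original modeling software is not named in the slides; as a rough modern stand-in only, a one-hidden-node feedforward network in scikit-learn with dummy-coded inputs (scikit-learn encodes the multi-class output internally rather than through explicit dummy output nodes, and the data names are assumptions carried over from the earlier sketch):

```python
import pandas as pd
from sklearn.neural_network import MLPClassifier

# training: DataFrame from the preprocessing sketch above (assumed names).
X_train = pd.get_dummies(training.drop(columns=["localization"]))  # dummy-coded inputs
y_train = training["localization"]

# Fully connected feedforward network with a single hidden node.
ann = MLPClassifier(hidden_layer_sizes=(1,), max_iter=2000, random_state=0)
ann.fit(X_train, y_train)
```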

Algorithms: Level-0 Decision Tree. Used a CHAID-like algorithm with a chi-squared p-value splitting criterion (p < 0.2); model selection was based on the proportion of instances correctly classified.
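A sketch of the chi-squared splitting idea only, not the tool actually used: test each candidate attribute's association with the target and split on the attribute with the smallest p-value, provided it falls below the 0.2 threshold.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def choose_split(data: pd.DataFrame, target: str, alpha: float = 0.2):
    """Return the attribute most associated with the target (CHAID-style), or None."""
    best_attr, best_p = None, alpha
    for attr in data.columns.drop(target):
        table = pd.crosstab(data[attr], data[target])
        _, p_value, _, _ = chi2_contingency(table)
        if p_value < best_p:             # keep only splits with p below the threshold
            best_attr, best_p = attr, p_value
    return best_attr

split_attr = choose_split(training, "localization")   # names assumed as before
```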

Algorithms: Level-0 Nearest Neighbor (NN). Each test instance is compared with each training instance by counting the number of matching attributes; the predicted target value is that of the training instance matching on the greatest number of attributes. Ties are broken using the target's relative frequency in the unequally distributed dataset.
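A small reconstruction of this matching-attribute rule from the slide's description (not the original code), again assuming the DataFrame and column names from the earlier sketches:

```python
import pandas as pd

def nn_predict(test_row: pd.Series, train: pd.DataFrame,
               target: str, priors: pd.Series):
    """Predict the class of the training instance sharing the most attribute values."""
    attrs = train.columns.drop(target)
    matches = (train[attrs] == test_row[attrs]).sum(axis=1)
    candidates = train.loc[matches == matches.max(), target].unique()
    # Break ties with the class frequencies from the unbalanced training set.
    return max(candidates, key=lambda c: priors[c])

priors = training["localization"].value_counts(normalize=True)
prediction = nn_predict(validation.iloc[0], training, "localization", priors)
```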

Algorithms: Level-0 Hybrid Decision Tree/ANN. It is difficult for an ANN to learn with too many variables, so a decision tree can be used as a "feature selector": the important variables are those used as branching criteria, and a new ANN is trained using only those variables as inputs.
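Again purely illustrative, with present-day tools rather than the original software: train a tree, keep only the attributes it branches on, and train the network on those (variable names reuse the assumptions from the ANN sketch above).

```python
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# X_train, y_train: the dummy-coded inputs and target from the ANN sketch above.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# "Important" variables are those the tree actually splits on (-2 marks leaves).
used = X_train.columns[sorted(set(tree.tree_.feature) - {-2})]

hybrid_ann = MLPClassifier(hidden_layer_sizes=(1,), max_iter=2000, random_state=0)
hybrid_ann.fit(X_train[used], y_train)
```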

Algorithms: Level-1 Generalizers. The ANN and decision tree were designed and trained essentially the same as their level-0 counterparts; the level-1 ANN had 8 input nodes. The naïve Bayesian model calculated the likelihood of each target value using Bayes' rule and predicted the value with the highest likelihood.
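The naïve Bayesian generalizer described here is the standard rule of picking the class that maximizes the prior times the product of the per-attribute conditional probabilities. A hand-rolled sketch over the level-1 data (the names and the smoothing floor for unseen values are assumptions):

```python
import pandas as pd

def naive_bayes_predict(level1_train: pd.DataFrame, target: str, new_row: pd.Series):
    """Pick the class maximizing P(class) * prod_j P(attribute_j | class)."""
    priors = level1_train[target].value_counts(normalize=True)
    scores = {}
    for cls, group in level1_train.groupby(target):
        likelihood = priors[cls]
        for attr in level1_train.columns.drop(target):
            freq = group[attr].value_counts(normalize=True)
            likelihood *= freq.get(new_row[attr], 1e-6)  # small floor for unseen values
        scores[cls] = likelihood
    return max(scores, key=scores.get)
```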

Results

Results: Accuracy Rates

Results: Evaluation of Accuracy Rates. Accuracy was similar to that of the highest-performing KDD Cup models; however, predictions here were drawn from a much smaller pool of potential localizations, and the results were not much better than simply predicting nucleus for every gene. Still, these models had fewer input variables with which to work.

Level-1 Decision Tree Diagram

Results: Statistical Comparisons. There were no significant inter-algorithm differences among the level-0 models, although the hybrid offered some improvement over the ANN alone. Equal distribution usually resulted in slightly worse performance, while Stacked Generalization resulted in better performance, sometimes significantly so.

Conclusions and Future Work

Conclusions and Future Work: Stratifying for Equal Distribution. Stratifying was not worth it and was perhaps harmful; the resulting small sample size may be to blame. Sampling from the full training set or other sampling approaches could be used instead, and a weight variable is not necessarily meaningful here.

Conclusions and Future Work: Specific Models. The algorithms performed comparably to each other. The ANN may need more hidden nodes; the hybrid model improved the ANN's performance slightly, but not by much; the NN may owe some of its performance to the tie-breaker implementation; and the naïve Bayesian model was not the standout it might have been expected to be (running an Apriori search first could help).

Conclusions and Future Work: Stacked Generalization in General. Stacking gave somewhat, though not drastically, better performance. Possible ways to improve it: cross-validation could improve both performance and evaluation; posterior probabilities could be used instead of the actual class predictions; different algorithms could be tried; and stacking could be continued over more levels (level-2, level-3, etc.). Stacked Generalization could also be applied to the actual KDD Cup task. A sketch of the first two suggestions follows.
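One way to realize cross-validated stacking on posterior probabilities, sketched with scikit-learn under the assumed names from the earlier sketches (an illustration, not something from the original study):

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# X_train, y_train: as in the level-0 sketches above (assumed names).
level0 = [DecisionTreeClassifier(random_state=0),
          MLPClassifier(hidden_layer_sizes=(1,), max_iter=2000, random_state=0)]

# Cross-validated class posteriors, so the whole training set can serve as
# level-1 data and no separate hold-out portion has to be sacrificed.
Z = np.hstack([cross_val_predict(m, X_train, y_train, cv=10, method="predict_proba")
               for m in level0])

for m in level0:                                   # refit on all training data
    m.fit(X_train, y_train)
generalizer = DecisionTreeClassifier(random_state=0).fit(Z, y_train)
```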

References
Page, D. (2001). KDD Cup 2001 website.
Ting, K.M. and Witten, I.H. (1997). Stacked generalization: when does it work? Proceedings of the International Joint Conference on Artificial Intelligence, Japan.
Witten, I.H. and Frank, E. (2000). Data Mining. Morgan Kaufmann, San Francisco.
Wolpert, D.H. (1992). Stacked generalization. Neural Networks, 5.