Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Presenter : Chien-Shing Chen Author: Gustavo.

Presentation transcript:

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology. Advisor: Dr. Hsu. Presenter: Chien-Shing Chen. Authors: Gustavo E. A. P. A. Batista, Ronaldo C. Prati, Maria Carolina Monard. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. SIGKDD, 2004.

Outline: Motivation; Objective; Introduction; k-NN; 10 methods; Experimental Results; Conclusions; Personal Opinion.

Motivation: Class imbalance causes a significant loss of performance in standard classifiers.

Objective: A broad experimental evaluation of 10 methods for dealing with the class imbalance problem, carried out on 13 UCI data sets.

Introduction: Learning is difficult when there is a large imbalance between the majority class and the minority class and the classes also present some degree of overlapping. A nearest-neighbor classifier may then incorrectly classify many cases from the minority class, because the nearest neighbors of these cases are examples belonging to the majority class.

Introduction: Accuracy alone is misleading under class imbalance. For instance, it is straightforward to create a classifier having an accuracy of 99% in a domain where the majority class proportion corresponds to 99% of the examples, by simply forecasting every new example as belonging to the majority class.
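
The arithmetic behind this remark can be checked with a few lines of Python. This is only an illustrative sketch: the data set (990 majority and 10 minority examples) and the function names are hypothetical, not taken from the paper.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)

def minority_recall_of_baseline(labels, minority):
    """Fraction of minority examples the always-majority classifier gets right."""
    majority = Counter(labels).most_common(1)[0][0]
    return 1.0 if majority == minority else 0.0

# Hypothetical data set: 990 majority ("neg") and 10 minority ("pos") examples.
labels = ["neg"] * 990 + ["pos"] * 10
print(majority_baseline_accuracy(labels))          # 0.99
print(minority_recall_of_baseline(labels, "pos"))  # 0.0
```

The accuracy looks excellent while not a single minority example is recognised, which is why the slides move on to ROC-based evaluation.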

Introduction: The area under the ROC curve (AUC) represents the expected performance as a single scalar. It is equivalent to the Wilcoxon test of ranks.
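
The equivalence with the Wilcoxon rank statistic can be illustrated as follows. This is a minimal sketch, assuming binary labels coded as 1 (positive) and 0 (negative); the function names and the four-example data set are made up for illustration.

```python
def auc_from_ranks(scores, labels):
    """AUC from the rank sum of the positive examples (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc_from_ranks(scores, labels))  # 0.75
print(auc_pairwise(scores, labels))    # 0.75
```

Both functions return the same value: the rank-sum form is just a faster way of counting correctly ordered positive/negative pairs.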

Methodology: Implement the k-NN algorithm using the Heterogeneous Value Difference Metric (HVDM) as the distance function: Euclidean distance for quantitative attributes and VDM distance for qualitative attributes.
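
A distance of this kind could be sketched as below. The slides do not give the formula, so this follows the commonly cited HVDM formulation as an assumption: numeric differences are divided by four standard deviations, and qualitative values are compared through their class-conditional probabilities. The names `fit_hvdm` and `hvdm` are hypothetical, and the authors' implementation may differ in detail.

```python
import math
from collections import Counter

def fit_hvdm(X, y, nominal):
    """Collect per-attribute statistics from the training data.

    X: list of rows, y: class labels, nominal: set of column indices
    that hold qualitative values.
    """
    classes = sorted(set(y))
    stats = {}
    for a in range(len(X[0])):
        column = [row[a] for row in X]
        if a in nominal:
            value_counts = Counter(column)
            joint = Counter(zip(column, y))
            # P(class | attribute value) for every observed value
            stats[a] = {v: [joint[(v, c)] / value_counts[v] for c in classes]
                        for v in value_counts}
        else:
            mean = sum(column) / len(column)
            std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
            stats[a] = std if std > 0 else 1.0  # avoid division by zero
    return stats

def hvdm(u, v, stats, nominal):
    """Distance between rows u and v (nominal values must occur in the training data)."""
    total = 0.0
    for a, s in stats.items():
        if a in nominal:
            total += sum((p - q) ** 2 for p, q in zip(s[u[a]], s[v[a]]))
        else:
            total += (abs(u[a] - v[a]) / (4 * s)) ** 2
    return math.sqrt(total)

# Tiny illustrative data set: one numeric and one qualitative attribute.
X = [[1.0, "a"], [2.0, "a"], [3.0, "b"], [4.0, "b"]]
y = [0, 0, 1, 1]
stats = fit_hvdm(X, y, nominal={1})
print(hvdm(X[0], X[1], stats, {1}), hvdm(X[0], X[3], stats, {1}))
```

Rows that share a qualitative value and differ little numerically come out close; rows whose qualitative values point to different classes come out far apart.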

Method (1) Random over-sampling: random replication of minority class examples.
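
Random over-sampling could look like the following sketch. It assumes a two-class problem and replicates minority examples until both classes have the same size; that target and the function name `random_oversample` are assumptions for illustration, not details stated on the slide.

```python
import random

def random_oversample(X, y, minority, seed=0):
    """Replicate randomly chosen minority examples until the classes are balanced."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    n_extra = max(0, n_majority - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = list(range(len(y))) + extra  # originals first, then the copies
    return [X[i] for i in keep], [y[i] for i in keep]

# Illustrative data: 8 majority (0) and 2 minority (1) examples.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y, minority=1)
print(y_bal.count(0), y_bal.count(1))  # 8 8
```

No new information is created: every added row is an exact copy of an existing minority example.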

Method (2) Random under-sampling: random elimination of majority class examples.
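
Random under-sampling could be sketched in the same way. The assumptions are again a two-class problem and a balanced target; `random_undersample` is a hypothetical name.

```python
import random

def random_undersample(X, y, minority, seed=0):
    """Keep all minority examples and an equally sized random sample of the majority."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    n_keep = min(len(minority_idx), len(majority_idx))
    kept_majority = rng.sample(majority_idx, n_keep)
    keep = sorted(minority_idx + kept_majority)  # preserve the original order
    return [X[i] for i in keep], [y[i] for i in keep]

# Illustrative data: 8 majority (0) and 2 minority (1) examples.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y, minority=1)
print(y_bal.count(0), y_bal.count(1))  # 2 2
```

The drawback is the mirror image of over-sampling: potentially useful majority examples are thrown away at random.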

Method (3) Tomek links: Given two examples E_i and E_j belonging to different classes, let d(E_i, E_j) be the distance between them. The pair (E_i, E_j) is called a Tomek link if there is no example E_l such that d(E_i, E_l) < d(E_i, E_j) or d(E_j, E_l) < d(E_i, E_j). If two examples form a Tomek link, then either (1) both are borderline or (2) one of them is noise. As an under-sampling method, only the majority class example of each link is eliminated; as a data cleaning method, the examples of both classes are eliminated.
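
The definition translates almost literally into a brute-force search. The sketch below assumes numeric attributes and plain Euclidean distance (`math.dist`) instead of the HVDM distance used in the presentation; the function names are hypothetical.

```python
import math

def tomek_links(X, y):
    """Return index pairs (i, j) of opposite classes with no third example closer to either."""
    n = len(X)
    links = []
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue
            d_ij = math.dist(X[i], X[j])
            closer_exists = any(
                math.dist(X[i], X[l]) < d_ij or math.dist(X[j], X[l]) < d_ij
                for l in range(n) if l not in (i, j)
            )
            if not closer_exists:
                links.append((i, j))
    return links

def remove_tomek_links(X, y, majority=None):
    """Drop linked examples: only those of class `majority` if given, otherwise both ends."""
    drop = set()
    for i, j in tomek_links(X, y):
        for k in (i, j):
            if majority is None or y[k] == majority:
                drop.add(k)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

# Illustrative data: the majority point (5, 5) sits right next to the minority point (5.2, 5).
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5.2, 5), (9, 9)]
y = [0, 0, 0, 0, 1, 1]
print(tomek_links(X, y))  # [(3, 4)]
```

Calling `remove_tomek_links(X, y, majority=0)` gives the under-sampling variant, and leaving `majority` unset gives the data cleaning variant.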

Method (4) Condensed Nearest Neighbor Rule (CNN): Find a consistent subset of examples, eliminating the examples from the majority class that are distant from the decision border. A subset Ê ⊆ E is consistent with E if, using 1-NN, Ê correctly classifies the examples in E. An algorithm to create a subset Ê from E as an under-sampling method: 1. Randomly draw one majority class example and all examples from the minority class, and put these examples in Ê. 2. Use 1-NN over the examples in Ê to classify the examples in E. 3. Move every misclassified example from E to Ê.
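
The three steps can be sketched as a single pass over the data. Euclidean distance and the name `cnn_undersample` are assumptions made for illustration; a fuller version of the rule would repeat the pass until no further example is moved.

```python
import math
import random

def cnn_undersample(X, y, minority, seed=0):
    """One pass of the condensed nearest neighbor rule used as under-sampling."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    # Step 1: all minority examples plus one randomly drawn majority example.
    subset = minority_idx + [rng.choice(majority_idx)]
    in_subset = set(subset)
    # Steps 2 and 3: classify each remaining example with 1-NN over the subset
    # and move it into the subset if it is misclassified.
    for i in range(len(X)):
        if i in in_subset:
            continue
        nearest = min(subset, key=lambda s: math.dist(X[i], X[s]))
        if y[nearest] != y[i]:
            subset.append(i)
            in_subset.add(i)
    keep = sorted(subset)
    return [X[k] for k in keep], [y[k] for k in keep]

# Illustrative data: a tight majority cluster far away from two minority points.
X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.2, 0), (5, 5), (5.1, 5)]
y = [0, 0, 0, 0, 0, 1, 1]
X_sub, y_sub = cnn_undersample(X, y, minority=1)
print(y_sub.count(0), y_sub.count(1))  # 1 2
```

Because every majority example here is far from the decision border, one representative is enough to classify the rest, so the other four are discarded.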

Method (5) One-sided selection (OSS): An under-sampling method resulting from the application of Tomek links followed by the application of CNN. It removes noisy and borderline majority class examples.

Method (6) CNN + Tomek links: Similar to OSS, but the method to find the consistent subset is applied before the Tomek links. As finding Tomek links is computationally demanding, it is computationally cheaper to look for them on the reduced data set.

Method (7) Neighborhood Cleaning Rule (NCL): Uses Wilson’s Edited Nearest Neighbor Rule (ENN) to remove majority class examples.
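
The slide gives no details, so the following is only a sketch of the two-class rule as it is usually described: each example is classified by its three nearest neighbors; a misclassified majority example is removed, and for a misclassified minority example the majority examples among its neighbors are removed. Euclidean distance, k = 3 and the function names are assumptions.

```python
import math
from collections import Counter

def k_nearest(X, i, k):
    """Indices of the k examples closest to X[i], excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

def neighborhood_cleaning(X, y, minority, k=3):
    """Remove majority examples that disagree with, or mislead, their neighborhood."""
    drop = set()
    for i in range(len(X)):
        neighbors = k_nearest(X, i, k)
        predicted = Counter(y[j] for j in neighbors).most_common(1)[0][0]
        if predicted == y[i]:
            continue
        if y[i] != minority:
            drop.add(i)  # majority example misclassified by its neighbors
        else:
            # minority example misclassified: drop the majority neighbors
            drop.update(j for j in neighbors if y[j] != minority)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Illustrative data: one majority point, (5.1, 5.1), lies inside the minority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5.1, 5.1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1]
X_clean, y_clean = neighborhood_cleaning(X, y, minority=1)
print(y_clean)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Only majority examples are ever removed, so the minority class keeps all of its original examples.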

Method (8) Synthetic Minority Over-sampling Technique (SMOTE): Its main idea is to form new minority class examples by interpolating between several minority class examples that lie together. This causes the decision boundaries for the minority class to spread further into the majority class space.
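
The interpolation idea can be sketched as follows. This is a simplified illustration rather than the reference algorithm: it assumes numeric attributes, Euclidean distance and at least two minority examples, and it picks the base example at random for each synthetic point. The name `smote` and its parameters are chosen here for illustration.

```python
import math
import random

def smote(minority_X, n_new, k=5, seed=0):
    """Create n_new synthetic minority examples by interpolating between neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_X))
        base = minority_X[i]
        others = [j for j in range(len(minority_X)) if j != i]
        neighbors = sorted(others, key=lambda j: math.dist(base, minority_X[j]))[:k]
        neighbor = minority_X[rng.choice(neighbors)]
        gap = rng.random()  # position on the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Illustrative minority class: the four corners of the unit square.
minority_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_X, n_new=10, k=2)
print(len(new_points))  # 10
```

Every synthetic point lies on a line segment between two existing minority examples, so unlike random over-sampling the new rows are not exact copies.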

Method (9) SMOTE + Tomek links.

Method (10) SMOTE + ENN: ENN removes more examples than Tomek links do, so it is expected to provide a more in-depth data cleaning.

Experimental Evaluation: The C4.5 symbolic learning algorithm is used to induce decision trees on 15 UCI data sets.

Experimental Evaluation: Unpruned decision trees obtained better results.

Conclusion: Ten methods for dealing with class imbalance were compared using ROC curves. SMOTE + Tomek or SMOTE + ENN might be applied to data sets with a small number of positive instances. For data sets with a large number of positive examples, the Random over-sampling method, which is computationally less expensive than the other methods, would produce meaningful results.

Conclusions: Drawback: go deeper! Be careful! Application: what alternative methodologies are there? Future work: easy to implement.