Anti-Learning
Adam Kowalczyk
Statistical Machine Learning, NICTA, Canberra
Overview
Anti-learning
– Elevated XOR
Natural data
– Predicting Chemo-Radio-Therapy (CRT) response for Oesophageal Cancer
– Classifying Aryl Hydrocarbon Receptor genes
Synthetic data
– High-dimensional mimicry
Conclusions
Appendix: A Theory of Anti-learning
– Perfect anti-learning
– Class-symmetric kernels
Definition of anti-learning
A classifier anti-learns when, systematically:
training accuracy > random-guessing accuracy > off-training (test) accuracy.
Anti-learning in Low Dimensions
[Figure: the "elevated XOR" construction in (x, y, z), classes +1 and -1, contrasting the anti-learning and learning configurations.]
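As a minimal sketch of how anti-learning already appears in low dimensions (an illustrative construction of my own, not the slide's exact elevated-XOR figure): leave-one-out nearest-centroid classification of the plain XOR labels is wrong on every held-out point, i.e. off-training accuracy is 0% while the labels are perfectly separable in principle.

```python
import numpy as np

# XOR labelling: diagonal pairs share a class.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([+1, +1, -1, -1])

errors = 0
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    Xtr, ytr = X[mask], y[mask]
    # Nearest-centroid rule fitted on the three remaining points.
    mu_pos = Xtr[ytr == +1].mean(axis=0)
    mu_neg = Xtr[ytr == -1].mean(axis=0)
    pred = +1 if np.linalg.norm(X[i] - mu_pos) < np.linalg.norm(X[i] - mu_neg) else -1
    errors += (pred != y[i])

print(errors)  # 4: every held-out point is misclassified
```

Flipping every prediction of this classifier would be perfect, which is exactly the signature of anti-learning.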
Evaluation Measure
Area under the Receiver Operating Characteristic curve (AROC): for a scoring function f, AROC(f) is the area under the curve of true-positive rate against false-positive rate as the decision threshold varies.
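The AROC can be computed directly from scores via the rank (Mann-Whitney) statistic; a short sketch (the function name `aroc` is my own):

```python
import numpy as np

def aroc(scores, labels):
    """AROC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive outscores a randomly chosen negative
    (ties counted as 1/2)."""
    pos = scores[labels == +1]
    neg = scores[labels == -1]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

labels = np.array([+1, +1, -1, -1])
perfect = aroc(np.array([0.9, 0.8, 0.2, 0.1]), labels)   # 1.0
inverted = aroc(np.array([0.1, 0.2, 0.8, 0.9]), labels)  # 0.0, the anti-learning extreme
```

Note the symmetry this makes explicit: negating the scores maps AROC to 1 − AROC, so a systematically anti-learning classifier (test AROC < 0.5) carries as much information as a learning one.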
Learning and anti-learning modes of supervised classification
Random guessing: AROC = 0.5.
Learning mode: training AROC > 0.5 and test AROC > 0.5.
Anti-learning mode: training AROC > 0.5 but test AROC < 0.5.
[Figure: training and test ROC curves (true positive vs. false positive rate) for the two modes.]
Anti-learning in Cancer Genomics
From Oesophageal Cancer to a machine learning challenge
Anti-learning in Classification of Genes in Yeast
KDD’02 task: identification of Aryl Hydrocarbon Receptor genes (AHR data)
Anti-learning in the AHR data set from KDD Cup 2002
Average of 100 trials; random splits, training : test = 66% : 34%.
KDD Cup 2002 Yeast Gene Regulation Prediction Task
Winner (Vogel, AI Insight): single-class SVM.
Two subtasks: "change" and "change or control".
38/84 training examples used (1.3%/2.8% of the data) in ~14,000 dimensions.
Anti-learning in High Dimensional Approximation (Mimicry)
Paradox of High Dimensional Mimicry
If detection is based on a large number of features, and the imposters are samples from a distribution whose marginals perfectly match the distribution of each individual feature of a finite genuine sample, then the imposters are perfectly detectable by ML filters operating in the anti-learning mode.
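A hedged sketch of one way such imposters can be generated (my own illustrative construction, under the marginal-matching assumption stated above): resample each feature independently from the genuine sample's empirical marginal, which preserves every one-dimensional marginal while destroying all cross-feature dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 1000                       # small genuine sample, high dimension
genuine = rng.standard_normal((n, d))

# Resample each feature column independently from its own empirical
# marginal: every 1-D marginal of the imposters matches the genuine
# sample's, but dependence between features is destroyed.
imposters = np.column_stack(
    [rng.choice(genuine[:, j], size=n) for j in range(d)]
)
```

Feature-by-feature, the imposters are indistinguishable from the genuine sample; it is only the joint, high-dimensional structure that betrays them, and (per the slide's claim) it does so in the anti-learning regime.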
Mimicry in High Dimensional Spaces
Quality of mimicry
[Figure: quality of mimicry, averaged over 50 independent test repeats, for d = 1000 and d = 5000, as a function of the ratio n_E / n_X.]
Formal result:
Proof idea 1: Geometry of the mimicry data Key Lemma:
Proof idea 1: Geometry of the mimicry data
Proof idea 2:
Proof idea 3: kernel matrix
Proof idea 4
Theory of anti-learning
Hadamard Matrix
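Hadamard matrices (square ±1 matrices with mutually orthogonal rows) underpin the perfect anti-learning constructions that follow; a minimal sketch of the standard Sylvester construction:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction: H_{2k} = [[H_k, H_k], [H_k, -H_k]],
    for n a power of two."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

H = hadamard(8)
# Rows are mutually orthogonal: H @ H.T = 8 * I.
assert np.array_equal(H @ H.T, 8 * np.eye(8, dtype=int))
```

The orthogonality of the ±1 rows is what makes the Hadamard dataset below a natural extreme case for class-symmetric kernels.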
CS-kernels
Perfect learning/anti-learning for CS-kernels (Kowalczyk & Chapelle, ALT'05)
[Figure: training ROC on T and test ROC on S−T in the (false positive, true positive) unit square.]
Perfect learning/anti-learning for CS-kernels (Kowalczyk & Chapelle, ALT'05)
Perfect learning/anti-learning for CS-kernels
Perfect anti-learning theorem Kowalczyk & Smola, Conditions for Anti-Learning
Anti-learning in classification of Hadamard dataset Kowalczyk & Smola, Conditions for Anti-Learning
AHR data set from KDD Cup'02 (Kowalczyk & Smola, Conditions for Anti-Learning, submitted)
From Anti-learning to learning: the class-symmetric (CS) kernel case (Kowalczyk & Chapelle, ALT'05)
Perfect anti-learning: an i.i.d. learning curve
n = 100, n_Rand = 1000; AROC mean ± std for n i.i.d. samples drawn from the perfect anti-learning set S.
More is not necessarily better!
Conclusions
– Statistics and machine learning are indispensable components of the forthcoming revolution in medical diagnostics based on genomic profiling.
– The high dimensionality of the data poses new challenges, pushing statistical techniques into uncharted waters.
– The challenges of biological data can stimulate novel directions in machine learning research.
Acknowledgements
Telstra
– Bhavani Raskutti
Peter MacCallum Cancer Centre
– David Bowtell
– Coung Duong
– Wayne Phillips
MPI
– Cheng Soon Ong
– Olivier Chapelle
NICTA
– Alex Smola