Anti-Learning
Adam Kowalczyk
Statistical Machine Learning, NICTA, Canberra (Adam.Kowalczyk@nicta.com.au)
National ICT Australia Limited is funded and supported by:

Overview
Anti-learning
–Elevated XOR
Natural data
–Predicting Chemo-Radio-Therapy (CRT) response for Oesophageal Cancer
–Classifying Aryl Hydrocarbon Receptor genes
Synthetic data
–High dimensional mimicry
Conclusions
Appendix: A Theory of Anti-learning
–Perfect anti-learning
–Class-symmetric kernels

Definition of anti-learning: systematically, off-training accuracy is below random guessing accuracy, while training accuracy is above it.

Anti-learning in Low Dimensions [Figure: points labelled +1 and -1 in three dimensions; axes x, y, z.]
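The low-dimensional picture can be reproduced with a small sketch. The coordinates below are an assumption (the slide's figure is not preserved): an XOR pattern in (x, y), with each class lifted slightly along z by a hypothetical elevation `H`, so that the whole set is linearly separable. The nearest-centroid classifier is also an illustrative choice, not necessarily the one used in the talk. Under these assumptions, leave-one-out training fits the three training points perfectly and misclassifies the held-out point every time.

```python
# Elevated XOR (assumed coordinates): an XOR pattern in (x, y), with each class
# lifted slightly along z so that the full set is linearly separable by sign(z).
H = 0.1  # assumed elevation; a small positive value
POINTS = [
    ((+1.0, +1.0, +H), +1),
    ((-1.0, -1.0, +H), +1),
    ((+1.0, -1.0, -H), -1),
    ((-1.0, +1.0, -H), -1),
]


def mean(vectors):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


def fit_centroid(train):
    """Nearest-centroid linear classifier; returns f(x) = w.x + b."""
    pos = mean([x for x, y in train if y > 0])
    neg = mean([x for x, y in train if y < 0])
    w = tuple(p - q for p, q in zip(pos, neg))
    b = -(dot(pos, pos) - dot(neg, neg)) / 2.0
    return lambda x: dot(w, x) + b


def accuracy(f, data):
    """Fraction of (x, y) pairs whose label sign matches the sign of f(x)."""
    return sum(1 for x, y in data if f(x) * y > 0) / len(data)


def leave_one_out(points):
    """Return (mean training accuracy, mean held-out accuracy)."""
    train_acc, test_acc = [], []
    for i, held_out in enumerate(points):
        train = points[:i] + points[i + 1:]
        f = fit_centroid(train)
        train_acc.append(accuracy(f, train))
        test_acc.append(accuracy(f, [held_out]))
    return sum(train_acc) / len(points), sum(test_acc) / len(points)


train_accuracy, off_training_accuracy = leave_one_out(POINTS)
print(train_accuracy, off_training_accuracy)
```

With these four points the training accuracy is 1.0 in every fold and the held-out accuracy is 0.0, which is below the 0.5 expected from random guessing.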

[Figure panels: Anti-Learning; Learning]

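Evaluation Measure: Area under Receiver Operating Characteristic (AROC) [Figure: ROC curve of a decision function f thresholded at θ; axes False Positive and True Positive, each from 0 to 1; the area under the curve is AROC(f).]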
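As a minimal sketch of the evaluation measure, AROC can be computed without drawing the curve, using the standard pairwise (Mann-Whitney) formulation: the fraction of (positive, negative) pairs that the decision function ranks correctly, with ties counted as one half. The function name `aroc` and the toy scores are illustrative only.

```python
def aroc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Counts the fraction of (positive, negative) pairs in which the positive
    example gets the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y > 0]
    neg = [s for s, y in zip(scores, labels) if y <= 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [+1, +1, -1, -1]
print(aroc([0.9, 0.8, 0.2, 0.1], labels))  # perfect ranking
print(aroc([0.5, 0.5, 0.5, 0.5], labels))  # uninformative scores
print(aroc([0.1, 0.2, 0.8, 0.9], labels))  # perfectly inverted ranking
```

The three calls print 1.0, 0.5 and 0.0. In these terms, learning corresponds to a test AROC above 0.5 and anti-learning to a test AROC systematically below 0.5.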

Learning and anti-learning mode of supervised classification [Figure: training and test ROC plots (axes labelled TP and FN, 0 to 1) with AROC, for the learning and anti-learning modes. Random: AROC = 0.5.]

Anti-learning in Cancer Genomics

From Oesophageal Cancer to machine learning challenge

Learning and anti-learning mode of supervised classification [Figure: training and test ROC plots (axes labelled TP and FN, 0 to 1) with AROC, for the learning and anti-learning modes. Random: AROC = 0.5.]

Anti-learning in Classification of Genes in Yeast

KDD’02 task: identification of Aryl Hydrocarbon Receptor genes (AHR data)

Anti-learning in the AHR data set from KDD Cup 2002. Average of 100 trials; random splits, training : test = 66% : 34%

KDD Cup 2002 Yeast Gene Regulation Prediction Task (http://www.biostat.wisc.edu/~craven/kddcup/task2.ppt)
–Vogel - AI Insight
–change; change or control
–Single class SVM
–38/84 training examples
–1.3/2.8% of data used, in ~14,000 dimensions

Anti-learning in High Dimensional Approximation (Mimicry)

Paradox of High Dimensional Mimicry: if detection is based on a large number of (high dimensional) features, and the imposters are samples from a distribution whose marginals perfectly match the distribution of the individual features for a finite genuine sample, then the imposters are perfectly detectable by ML-filters in the anti-learning mode.
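One way to realise "marginals perfectly matching the individual features of a finite genuine sample" is to draw each imposter feature independently from the values that feature takes in the genuine sample. The sketch below does only that; it is an assumed construction, not necessarily the one used in the talk. The function name `mimic`, the Gaussian genuine data and the sample sizes are hypothetical. It does not reproduce the detection result itself.

```python
import random


def mimic(genuine, n_imposters, rng):
    """Draw imposters from the product of the empirical marginals.

    Each feature of each imposter is sampled independently (with replacement)
    from the values that feature takes in the genuine sample.
    """
    columns = list(zip(*genuine))  # one tuple of observed values per feature
    return [
        [rng.choice(col) for col in columns]
        for _ in range(n_imposters)
    ]


rng = random.Random(0)                      # fixed seed, for repeatability only
d, n_genuine, n_imposters = 1000, 20, 20    # illustrative sizes (assumed)
genuine = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_genuine)]
imposters = mimic(genuine, n_imposters, rng)

# Every imposter value is a value that some genuine example has in that feature,
# so no single feature on its own can expose an imposter.
for k, col in enumerate(zip(*genuine)):
    allowed = set(col)
    assert all(e[k] in allowed for e in imposters)
```

Because features are drawn independently, an imposter typically mixes values from many different genuine examples; the joint structure of the genuine sample is what the mimicry fails to copy.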

Mimicry in High Dimensional Spaces

Quality of mimicry. Average of independent tests for 50 repeats [Figure: panels d = 1000 and d = 5000, plotted against the ratio |n_E| / |n_X|.]

Formal result:

Proof idea 1: Geometry of the mimicry data Key Lemma:

Proof idea 1: Geometry of the mimicry data

Proof idea 2:

Proof idea 3: kernel matrix

Proof idea 4

Theory of anti-learning

Hadamard Matrix
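For reference, a Hadamard matrix is a square matrix with entries +1 and -1 whose rows are mutually orthogonal. The sketch below uses Sylvester's construction, which yields sizes that are powers of two; the slide does not say which construction the talk used, so this choice is an assumption.

```python
def sylvester_hadamard(k):
    """Return the 2**k x 2**k Hadamard matrix from Sylvester's construction."""
    h = [[1]]
    for _ in range(k):
        top = [row + row for row in h]                    # [H  H]
        bottom = [row + [-v for v in row] for row in h]   # [H -H]
        h = top + bottom
    return h


def gram(matrix):
    """Matrix of inner products between rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in matrix] for r in matrix]


H = sylvester_hadamard(3)  # 8 x 8
G = gram(H)
n = len(H)

# Rows are mutually orthogonal: H H^T = n I.
assert all(G[i][j] == (n if i == j else 0) for i in range(n) for j in range(n))
```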

CS-kernels

Perfect learning/anti-learning for CS-kernels (Kowalczyk & Chapelle, ALT’05) [Figure: train ROC on T and test ROC on S-T; axes False positive and True positive, each up to 1.]

Perfect learning/anti-learning for CS-kernels Kowalczyk & Chapelle, ALT’ 05

Perfect learning/anti-learning for CS-kernels

Perfect anti-learning theorem Kowalczyk & Smola, Conditions for Anti-Learning

Anti-learning in classification of Hadamard dataset Kowalczyk & Smola, Conditions for Anti-Learning
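A small sketch of how a Hadamard-based dataset can exhibit perfect anti-learning. Everything specific here is an assumption rather than the setup of Kowalczyk & Smola: the 8 x 8 Sylvester matrix, the use of the last column as the label with the remaining columns as features, the balanced training sets of two examples per class, and the nearest-centroid classifier. Under these assumptions any two distinct rows have feature inner product -1 when they share a label and +1 when they do not, so unseen examples look more like the opposite class.

```python
from itertools import combinations


def sylvester_hadamard(k):
    """Return the 2**k x 2**k Hadamard matrix from Sylvester's construction."""
    h = [[1]]
    for _ in range(k):
        h = [r + r for r in h] + [r + [-v for v in r] for r in h]
    return h


def mean(vectors):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


def fit_centroid(train):
    """Nearest-centroid linear classifier; returns f(x) = w.x + b."""
    pos = mean([x for x, y in train if y > 0])
    neg = mean([x for x, y in train if y < 0])
    w = tuple(p - q for p, q in zip(pos, neg))
    b = -(dot(pos, pos) - dot(neg, neg)) / 2.0
    return lambda x: dot(w, x) + b


def accuracy(f, data):
    """Fraction of (x, y) pairs whose label sign matches the sign of f(x)."""
    return sum(1 for x, y in data if f(x) * y > 0) / len(data)


H = sylvester_hadamard(3)
LABEL_COL = len(H) - 1  # assumed: last column is the label, the rest are features
data = [(tuple(row[:LABEL_COL]), row[LABEL_COL]) for row in H]
pos_idx = [i for i, (_, y) in enumerate(data) if y > 0]
neg_idx = [i for i, (_, y) in enumerate(data) if y < 0]

# Try every balanced training set with two examples per class; test on the rest.
results = []
for tp in combinations(pos_idx, 2):
    for tn in combinations(neg_idx, 2):
        train_idx = set(tp) | set(tn)
        train = [data[i] for i in sorted(train_idx)]
        test = [data[i] for i in range(len(data)) if i not in train_idx]
        f = fit_centroid(train)
        results.append((accuracy(f, train), accuracy(f, test)))

print(set(results))
```

For every one of the 36 balanced splits the pair (training accuracy, test accuracy) is (1.0, 0.0): the classifier fits the training set and is wrong on every unseen example.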

AHR data set from KDD Cup’02 (Kowalczyk & Smola, Conditions for Anti-Learning, submitted)

From Anti-learning to learning: Class Symmetric (CS) kernel case (Kowalczyk & Chapelle, ALT’05)

Perfect anti-learning: i.i.d. learning curve (n = 100, n_Rand = 1000) [Figure: AROC (mean ± std) against the number of samples, for i.i.d. samples from the perfect anti-learning set S, with a random baseline.] More is not necessarily better!

Conclusions
Statistics and machine learning are indispensable components of the forthcoming revolution in medical diagnostics based on genomic profiling
High dimensionality of the data poses new challenges, pushing statistical techniques into uncharted waters
Challenges of biological data can stimulate novel directions of machine learning research

Acknowledgements
Telstra
–Bhavani Raskutti
Peter MacCallum Cancer Centre
–David Bowtell
–Coung Duong
–Wayne Phillips
MPI
–Cheng Soon Ong
–Olivier Chapelle
NICTA
–Alex Smola
