Anti-Learning Adam Kowalczyk Statistical Machine Learning

Presentation on theme: "Anti-Learning Adam Kowalczyk Statistical Machine Learning"— Presentation transcript:

1 Anti-Learning Adam Kowalczyk Statistical Machine Learning
NICTA, Canberra. National ICT Australia Limited is funded and supported by: [sponsor logos not preserved]

2 Overview
Anti-learning: Elevated XOR
Natural data: Predicting Chemo-Radio-Therapy (CRT) response for Oesophageal Cancer; Classifying Aryl Hydrocarbon Receptor genes
Synthetic data: High dimensional mimicry
Conclusions
Appendix: A Theory of Anti-learning (Perfect anti-learning; Class-symmetric kernels)

3 Definition of anti-learning
Systematically: off-training accuracy < random guessing accuracy < training accuracy

4 Anti-learning in Low Dimensions
[Figure: data points in three dimensions (axes x, y, z) carrying class labels -1 and +1]
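The low-dimensional case can be reproduced numerically. The sketch below is only an illustration: the four coordinates are an assumed "elevated XOR" configuration (an XOR pattern in the x-y plane, lifted by a small amount `eps` along z according to class), and the nearest-centroid classifier is a stand-in chosen for brevity, not necessarily the classifier used in the talk. The helper names `elevated_xor`, `centroid_fit` and `leave_one_out` are hypothetical.

```python
import numpy as np

def elevated_xor(eps=0.1):
    """Assumed layout: XOR in the (x, y) plane, class +1 lifted to z=+eps, class -1 to z=-eps."""
    X = np.array([[ 1.0,  1.0,  eps],
                  [-1.0, -1.0,  eps],
                  [ 1.0, -1.0, -eps],
                  [-1.0,  1.0, -eps]])
    y = np.array([1, 1, -1, -1])
    return X, y

def centroid_fit(X, y):
    """Nearest-centroid linear classifier: score(x) = w.x + b, positive means class +1."""
    c_pos = X[y == 1].mean(axis=0)
    c_neg = X[y == -1].mean(axis=0)
    w = c_pos - c_neg
    b = -(c_pos @ c_pos - c_neg @ c_neg) / 2.0
    return w, b

def leave_one_out(X, y):
    """Return (mean training accuracy, accuracy on the held-out points)."""
    train_acc, test_hits = [], []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, b = centroid_fit(X[keep], y[keep])
        train_acc.append(np.mean(np.sign(X[keep] @ w + b) == y[keep]))
        test_hits.append(np.sign(X[i] @ w + b) == y[i])
    return float(np.mean(train_acc)), float(np.mean(test_hits))

X, y = elevated_xor()
train_acc, loo_acc = leave_one_out(X, y)
print(train_acc, loo_acc)  # training accuracy 1.0, held-out accuracy 0.0
```

With these assumed coordinates every training point is classified correctly in each fold, while the held-out point is always misclassified, which matches the pattern "off-training accuracy below random guessing, training accuracy above it".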

5 Anti-Learning Learning

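6 Evaluation Measure: Area under Receiver Operating Characteristic (AROC)
[Figure: ROC curve of a classifier f with decision threshold θ; axes: False Positive (horizontal) and True Positive (vertical), each from 0 to 1; the area under the curve is AROC(f)]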
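A short sketch of how AROC can be computed may help here. It uses the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties counted as one half). The function name `aroc` and the toy scores are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def aroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == -1]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AROC needs at least one example of each class")
    diff = pos[:, None] - neg[None, :]          # every (positive, negative) pair
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / diff.size)

labels = np.array([1, 1, -1, -1])
perfect = aroc([0.9, 0.8, 0.2, 0.1], labels)    # positives ranked above negatives
inverted = aroc([0.1, 0.2, 0.8, 0.9], labels)   # positives ranked below negatives
constant = aroc([0.5, 0.5, 0.5, 0.5], labels)   # uninformative scores
print(perfect, inverted, constant)              # 1.0 0.0 0.5
```

On this scale 0.5 corresponds to random guessing, values near 1 to learning, and values near 0 to the anti-learning mode.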

7 Learning and anti-learning mode of supervised classification
[Figure: ROC plots (axis labels TP and FN) for training and test data in the learning and anti-learning modes; random guessing corresponds to AROC = 0.5]

8 Anti-learning in Cancer Genomics

9 From Oesophageal Cancer to machine learning challenge

10 Learning and anti-learning mode of supervised classification
[Figure: ROC plots (axis labels TP and FN) for training and test data; panels: Learning and Anti-learning; random guessing corresponds to AROC = 0.5]

11 Anti-learning in Classification of Genes in Yeast

12 KDD’02 task: identification of Aryl Hydrocarbon Receptor genes (AHR data)

13 Anti-learning in AHR-data set from KDD Cup 2002
Average of 100 trials; random splits with training : test = 66% : 34%

14 KDD Cup 2002 Yeast Gene Regulation Prediction Task http://www. biostat
Vogel - AI Insight; change; change or control. Single class SVM: 38/84 training examples (1.3/2.8% of data used) in ~14,000 dimensions

15 Anti-learning in High Dimensional Approximation (Mimicry)

16 Paradox of High Dimensional Mimicry
If detection is based on a large number of features, and the imposters are samples from a distribution whose marginals perfectly match the distribution of the individual features for a finite genuine sample, then the imposters are perfectly detectable by ML filters operating on the high dimensional features in the anti-learning mode.

17 Mimicry in High Dimensional Spaces

18 Quality of mimicry
[Figure: panels for d = 1000 and d = 5000; the panel labels contain the ratio |nE| / |nX| (the symbol it defines was lost in extraction). Average of independent tests over 50 repeats.]

19 Formal result:

20 Proof idea 1: Geometry of the mimicry data
Key Lemma:

21 Proof idea 1: Geometry of the mimicry data

22 Proof idea 2:

23 Proof idea 2:

24 Proof idea 2:

25 Proof idea 3: kernel matrix

26 Proof idea 4

27 Theory of anti-learning

28 Hadamard Matrix
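The slide body is not preserved in this transcript. For reference, the standard textbook definition of a Hadamard matrix and the Sylvester construction are given below; this is general background and may differ in notation from what the slide showed.

```latex
% A Hadamard matrix of order n has entries +-1 and mutually orthogonal rows:
H_n \in \{-1,+1\}^{n \times n}, \qquad H_n H_n^{\top} = n\, I_n .

% Sylvester construction for orders that are powers of two:
H_1 = (1), \qquad
H_{2n} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}.
```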

29 CS-kernels

30 Perfect learning/anti-learning for CS-kernels
[Figure: ROC curves (axes: False positive, True positive), labelled "Train ROCT" and "Test ROCS-T"]
Kowalczyk & Chapelle, ALT'05

31 Perfect learning/anti-learning for CS-kernels
Kowalczyk & Chapelle, ALT’ 05

32 Perfect learning/anti-learning for CS-kernels

33 Perfect learning/anti-learning for CS-kernels

34 Perfect anti-learning theorem
Kowalczyk & Smola, Conditions for Anti-Learning

35 Anti-learning in classification of Hadamard dataset
Kowalczyk & Smola, Conditions for Anti-Learning
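The behaviour on Hadamard data can be checked with a few lines of code. The sketch below rests on assumptions that are not confirmed by the transcript: the dataset is taken to be the rows of a Sylvester Hadamard matrix with the last column used as the label and the remaining columns as features, and a bias-free centroid classifier stands in for whatever classifier the slide used. The names `sylvester_hadamard`, `hadamard_dataset` and `centroid_scores` are hypothetical.

```python
import numpy as np

def sylvester_hadamard(k):
    """Hadamard matrix of order 2**k built by the Sylvester construction."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

def hadamard_dataset(k):
    """Assumed dataset: last column is the label, all other columns are features."""
    H = sylvester_hadamard(k)
    return H[:, :-1].astype(float), H[:, -1]

def centroid_scores(X_train, y_train, X):
    """Scores from the difference of class means (no bias term)."""
    w = X_train[y_train == 1].mean(axis=0) - X_train[y_train == -1].mean(axis=0)
    return X @ w

X, y = hadamard_dataset(5)                      # 32 points, 31 features, 16 per class
K = X @ X.T                                     # linear kernel matrix
same = np.equal.outer(y, y)
off_diag = ~np.eye(len(y), dtype=bool)
same_vals = np.unique(K[same & off_diag])       # kernel between distinct same-class points
diff_vals = np.unique(K[~same])                 # kernel between opposite-class points

pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == -1)
train = np.concatenate([pos[:10], neg[:10]])    # deterministic stratified split
test = np.concatenate([pos[10:], neg[10:]])
s_train = centroid_scores(X[train], y[train], X[train])
s_test = centroid_scores(X[train], y[train], X[test])

print(same_vals, diff_vals)                     # [-1.] [1.]
print(s_test[y[test] == 1].max(), s_test[y[test] == -1].min())
```

Under these assumptions the kernel takes one value for all same-class pairs and a larger value for all opposite-class pairs, so each point is more similar to the other class than to its own. The training points are ranked correctly, while every held-out positive scores below every held-out negative, i.e. the test AROC is 0.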

36 AHR data set from KDD Cup’02
Kowalczyk & Smola, Conditions for Anti-Learning (submitted)

37 From Anti-learning to learning: Class Symmetric (CS) kernel case
Kowalczyk & Chapelle, ALT’ 05

38 Perfect anti-learning: i.i.d. learning curve
More is not necessarily better!
[Figure: random AROC (mean ± std) against the number of i.i.d. samples drawn from the perfect anti-learning set S; n = 100, nRand = 1000]

39 Conclusions
Statistics and machine learning are indispensable components of the forthcoming revolution in medical diagnostics based on genomic profiling.
The high dimensionality of the data poses new challenges, pushing statistical techniques into uncharted waters.
The challenges of biological data can stimulate novel directions of machine learning research.

40 Acknowledgements
Telstra: Bhavani Raskutti
Peter MacCallum Cancer Centre: David Bowtell, Coung Duong, Wayne Phillips
MPI: Cheng Soon Ong, Olivier Chapelle
NICTA: Alex Smola

