Prototype-based classifiers and their applications in the life sciences
Michael Biehl, Mathematics and Computing Science, University of Groningen / NL
www.cs.rug.nl/~biehl
LVQ and Relevance Learning: frequently asked questions and rarely given answers (WSOM 2014)
Michael Biehl, Mathematics and Computing Science, University of Groningen / NL
www.cs.rug.nl/~biehl
Frequently asked question: So, why do you still do this LVQ stuff?
Frequently asked questions
- basics: distance-based classifiers, relevance learning
- What about the curse of dimensionality?
- How do you find a good distance measure? Example: Generalized Matrix LVQ
- What about over-fitting?
- Is the relevance matrix unique? Is it useful in practice?
- applications: bio-medical data (adrenal tumors, rheumatoid arthritis)
- outlook: What's next?
K-NN classifier, a simple distance-based classifier:
- store a set of labeled examples
- classify a query according to the label of its nearest neighbor (or the majority among its K nearest neighbors)
- piece-wise linear class borders, parameterized by all examples
+ conceptually simple, no training required, a single parameter (K)
- expensive storage and computation; sensitivity to outliers can result in overly complex decision boundaries
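A minimal K-NN sketch in Python (illustrative, not code from the talk; the array names `X_train`, `y_train`, `x` are assumptions):

```python
import numpy as np

def knn_classify(X_train, y_train, x, K=3):
    # squared Euclidean distances from the query to every stored example
    d = np.sum((X_train - x) ** 2, axis=1)
    # labels of the K nearest neighbors
    nearest = y_train[np.argsort(d)[:K]]
    # majority vote among them
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]
```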
Prototype-based classification:
- represent the data by one or several prototypes per class
- classify a query according to the label of the nearest prototype (or alternative schemes)
- piece-wise linear class borders, parameterized by the prototypes
+ less sensitive to outliers, lower storage needs, little computational effort in the working phase
- a training phase is required in order to place the prototypes; model selection problem: number of prototypes per class, etc.
Nearest Prototype Classifier (NPC): given a set of prototypes $w_j$ carrying class labels $c_j$ and a dissimilarity/distance measure $d(w, x)$,
- determine the winner $w_L$ with minimal distance, $d(w_L, x) = \min_j d(w_j, x)$
- assign $x$ to class $c_L$.
Minimal requirements on the measure: $d(w, x) \ge 0$ and $d(w, w) = 0$. Standard example: squared Euclidean distance, $d(w, x) = (x - w)^2$.
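The NPC rule itself fits in a few lines; a sketch assuming a prototype array `W` and a label array `c`:

```python
import numpy as np

def npc_classify(W, c, x):
    # squared Euclidean distance to every prototype
    d = np.sum((W - x) ** 2, axis=1)
    # the winner is the closest prototype; return its class label
    return c[np.argmin(d)]
```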
Learning Vector Quantization (LVQ):
- identification of prototype vectors from labeled example data (N-dimensional feature vectors)
- distance-based classification (e.g. Euclidean)
Competitive learning, LVQ1 [Kohonen, 1990]:
- initialize prototype vectors for the different classes
- present a single example and identify the winner (closest prototype)
- move the winner closer towards the data point if it carries the same class, away from it if the classes differ
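A sketch of a single LVQ1 step, with an assumed learning rate `eta`:

```python
import numpy as np

def lvq1_step(W, c, x, y, eta=0.05):
    # find the winner: the prototype closest to the example x
    j = np.argmin(np.sum((W - x) ** 2, axis=1))
    # attract the winner if the labels match, repel it otherwise
    sign = 1.0 if c[j] == y else -1.0
    W[j] += sign * eta * (x - W[j])
    return W
```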
Properties of LVQ:
- tessellation of the feature space (piece-wise linear class borders)
- distance-based classification (here: Euclidean distances)
- generalization ability: correct classification of new data
- aim: discrimination of classes (≠ vector quantization or density estimation)
What about the curse of dimensionality? Norms and distances concentrate for large N, so are distance-based methods bound to fail in high dimensions? In LVQ:
- prototypes are not just random data points but carefully selected representatives of the data
- the distances of a given data point to the prototypes are compared, which amounts to a projection onto a non-trivial low-dimensional subspace.
See also [Ghosh et al., 2007; Witoelar et al., 2010]: models of LVQ training, treated analytically in the limit of large N, show that successful training needs a number of training examples on the order of N.
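The concentration effect is easy to demonstrate numerically; a small sketch (not from the talk) showing that the relative spread of distances shrinks as N grows:

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (2, 100, 10000):
    X = rng.standard_normal((500, N))
    d = np.linalg.norm(X - X[0], axis=1)[1:]  # distances to one reference point
    print(N, d.std() / d.mean())              # relative spread decreases with N
```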
Cost function based LVQ. One example: Generalized LVQ (GLVQ) [Sato & Yamada, 1995]. For each example, the two winning prototypes are the closest prototype of the same class (distance $d_J$) and the closest prototype of a different class (distance $d_K$); minimize
$$E = \sum_{\mu} \Phi\!\left( \frac{d_J(x^\mu) - d_K(x^\mu)}{d_J(x^\mu) + d_K(x^\mu)} \right),$$
i.e. make $d_J$ small and $d_K$ large. With a sigmoidal $\Phi$ (linear for small arguments), E approximates the number of misclassifications; with a linear $\Phi$, E favors large-margin separation of the classes and class-typical prototypes.
"There is nothing objective about objective functions." (J. McClelland)
GLVQ training = optimization of E with respect to the prototype positions, e.g. by stochastic gradient descent: single example presentation, a stochastic sequence of examples, update of the two winning prototypes per step. The scheme is based on any non-negative, differentiable distance; the requirement is that each update decreases $d_J$ and increases $d_K$. For the squared Euclidean distance the update moves the prototypes towards / away from the sample,
$$\Delta w_J \propto +\,\Phi'(\mu)\,\frac{d_K}{(d_J + d_K)^2}\,(x - w_J), \qquad \Delta w_K \propto -\,\Phi'(\mu)\,\frac{d_J}{(d_J + d_K)^2}\,(x - w_K),$$
with $\mu = (d_J - d_K)/(d_J + d_K)$ and the indicated prefactors.
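A sketch of one such GLVQ step, taking $\Phi$ as the identity and an assumed learning rate `eta` (constant factors absorbed):

```python
import numpy as np

def glvq_step(W, c, x, y, eta=0.05):
    d = np.sum((W - x) ** 2, axis=1)
    # closest correct and closest wrong prototype
    J = np.where(c == y)[0][np.argmin(d[c == y])]
    K = np.where(c != y)[0][np.argmin(d[c != y])]
    dJ, dK = d[J], d[K]
    denom = (dJ + dK) ** 2
    # attract the correct winner, repel the wrong one
    W[J] += eta * (dK / denom) * (x - W[J])
    W[K] -= eta * (dJ / denom) * (x - W[K])
    return W
```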
What is a good distance measure?
Fixed distance measures:
- select a distance measure according to prior knowledge
- data-driven choice in a preprocessing step
- compare the performance of various measures
Example: divergence-based LVQ (DLVQ), Mwebaze et al., Neurocomputing (2011).
Relevance learning:
- employ a parameterized distance measure
- update its parameters in the training process, together with the prototypes
- adaptive, data-driven dissimilarity
Example: Matrix Relevance LVQ.
Relevance Matrix LVQ: a generalized quadratic distance in LVQ,
$$d_\Lambda(w, x) = (x - w)^\top \Lambda\, (x - w), \qquad \Lambda = \Omega^\top \Omega.$$
Variants:
- one global, several local, or class-wise relevance matrices $\Lambda^{(j)}$ → piece-wise quadratic decision boundaries
- rectangular $\Omega$: discriminative low-dimensional representation, e.g. for visualization [Bunte et al., 2012]
- diagonal matrices: single feature weights [Bojer et al., 2001; Hammer et al., 2002]
- possible constraints: rank control, sparsity, … [Schneider et al., 2009]
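A sketch of the adaptive quadratic distance; the parameterization $\Lambda = \Omega^\top \Omega$ guarantees non-negativity for any real $\Omega$ (names are illustrative):

```python
import numpy as np

def matrix_distance(w, x, Omega):
    z = Omega @ (x - w)   # difference vector in the linearly transformed space
    return float(z @ z)   # (x-w)^T Omega^T Omega (x-w) >= 0
```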
Generalized Matrix LVQ (GMLVQ): optimization of the cost function with respect to both the prototypes and the distance measure, i.e. the matrix $\Omega$ is adapted in the same training process.
Heuristic interpretation: the diagonal element $\Lambda_{ii} = \sum_k \Omega_{ki}^2$ summarizes the contribution of original dimension $i$ to the distance, i.e. the relevance of original feature $i$ for the classification; note that $d_\Lambda$ is the standard Euclidean distance for the linearly transformed features $\Omega x$. The interpretation implicitly assumes that the features are of equal order of magnitude, e.g. after a z-score transformation to zero mean and unit variance (averages over the data set).
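A sketch of both steps, z-scoring the data and reading the feature relevances off the diagonal of $\Lambda$:

```python
import numpy as np

def zscore(X):
    # transform each feature to zero mean and unit variance over the data set
    return (X - X.mean(axis=0)) / X.std(axis=0)

def feature_relevances(Omega):
    Lambda = Omega.T @ Omega
    return np.diag(Lambda)   # Lambda_ii: relevance of original feature i
```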
Application: classification of adrenal tumors.
Wiebke Arlt, Angela Taylor, Dave J. Smith, Peter Nightingale, P.M. Stewart, C.H.L. Shackleton et al. (School of Medicine, Queen Elizabeth Hospital, University of Birmingham/UK, plus several centers in Europe); Petra Schneider, Han Stiekema, Michael Biehl (Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen).
[Arlt et al., J. Clin. Endocrinology & Metabolism, 2011]
[Biehl et al., Europ. Symp. Artificial Neural Networks (ESANN), 2012]
Adrenocortical tumors, a difficult differential diagnosis:
- ACC: adrenocortical carcinomas
- ACA: adrenocortical adenomas
Idea: steroid metabolomics, i.e. tumor classification based on urinary steroid excretion; 32 candidate steroid markers.
Generalized Matrix LVQ, ACC vs. ACA classification. Data set: 24 hrs. urinary steroid excretion of 102 patients with benign ACA and 45 patients with malignant ACC.
- data divided into a 90% training and a 10% test set
- determine the prototypes (one per class): typical profiles
- adaptive generalized quadratic distance measure, parameterized by $\Omega$
- apply the classifier to the test data, evaluate the performance (error rates, ROC)
- repeat and average over many random splits
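A sketch of this protocol; a class-mean nearest-prototype model stands in here for the actual GMLVQ training, which is not reproduced:

```python
import numpy as np

def class_mean_prototypes(X, y):
    classes = np.unique(y)
    W = np.array([X[y == k].mean(axis=0) for k in classes])
    return W, classes

def repeated_split_error(X, y, n_splits=100, test_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        n_test = max(1, int(test_frac * len(y)))
        test, train = idx[:n_test], idx[n_test:]
        W, classes = class_mean_prototypes(X[train], y[train])  # stand-in trainer
        # nearest-prototype classification of the held-out samples
        d = ((X[test][:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
        y_hat = classes[np.argmin(d, axis=1)]
        errors.append(np.mean(y_hat != y[test]))
    return float(np.mean(errors))   # average test error over the random splits
```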
[Figure: ACA and ACC prototypes, i.e. log-transformed steroid excretion values in ACA/ACC, rescaled using the healthy control group values.]
[Figure: relevance matrix for the adrenocortical tumor data, diagonal and off-diagonal elements.] A subset of 9 selected steroids suffices for the technical realization (patented, University of Birmingham/UK).
[Figure: adrenocortical tumor data plotted for two individually weakly discriminative markers, 5a-THA (8) and TH-Doc (12), whose combination is highly discriminative.]
ROC characteristics (sensitivity vs. 1-specificity): clear improvement due to adaptive distances.
- Euclidean distance: AUC 0.87
- diagonal relevances (GRLVQ): AUC 0.93
- full relevance matrix (GMLVQ): AUC 0.97
Frequently asked questions:
- How relevant are the relevances?
- What about over-fitting? Relevance matrices introduce O(N²) additional adaptive parameters!
- Is the relevance matrix unique?
  - uniqueness of the parameterization (Ω for a given Λ)?
  - uniqueness of the relevance matrix Λ itself?
  - interpretation of the relevance matrix (hinges on uniqueness)
What about over-fitting? Observation: the resulting relevance matrix has low rank, so the effective number of degrees of freedom is on the order of N rather than N². [Figure: eigenvalues of Λ in the ACA/ACC classification.]
Mathematics: the stationarity conditions imply that the columns of the stationary $\Omega^\top$ are vectors in the eigenspace associated with the smallest eigenvalue of a pseudo-covariance matrix Γ, which is
- not necessarily positive (semi-)definite
- dependent on Ω itself
- and can therefore not be determined prior to training.
Biehl et al., Machine Learning Reports (2009); journal version in preparation (forever).
By-product: a low-dimensional representation. [Figure: data projected onto the first and second eigenvectors of Λ; control, benign, and malignant cases.]
Is the relevance matrix unique? (I) Uniqueness of Ω, given Λ: the matrix square root is not unique; Ω is determined only up to irrelevant rotations, reflections, and symmetries, since $(R\Omega)^\top (R\Omega) = \Omega^\top \Omega$ for any orthogonal $R$. A canonical representation follows from the eigen-decomposition of Λ: choose the symmetric, positive semi-definite root $\Omega = \Lambda^{1/2}$.
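A sketch of the canonical choice via the eigen-decomposition:

```python
import numpy as np

def canonical_omega(Lam):
    vals, vecs = np.linalg.eigh(Lam)        # Lambda is symmetric p.s.d.
    vals = np.clip(vals, 0.0, None)         # guard against small negative noise
    return (vecs * np.sqrt(vals)) @ vecs.T  # Omega = V sqrt(D) V^T

# any orthogonal R leaves Lambda unchanged: (R Omega)^T (R Omega) = Lambda
```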
(II) Uniqueness of Λ, given the transformation of the data: a modification $\Omega \to \tilde\Omega$ is possible if the rows of $\tilde\Omega - \Omega$ lie in the null-space of the data; it yields an identical mapping of all examples (the argument can be extended to include the prototypes) but a different Λ. The data matrix is singular, and the null-space non-trivial, if features are correlated or linearly dependent.
Regularization: the training process yields some Ω; determine the eigenvectors and eigenvalues of the data (pseudo-)covariance and project the rows of Ω onto the span of the K leading eigenvectors. With K < J (J = rank of the data), this retains the eigenspace corresponding to the largest eigenvalues and also removes the span of small non-zero eigenvalues; with K = J it removes exactly the null-space contributions and yields the unique solution with minimal Euclidean norm of the row vectors.
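A sketch of this projection, assuming the eigenvectors are taken from the covariance of the training data:

```python
import numpy as np

def regularize_omega(Omega, X, K):
    C = np.cov(X, rowvar=False)                # data covariance
    vals, vecs = np.linalg.eigh(C)
    V = vecs[:, np.argsort(vals)[::-1][:K]]    # K leading eigenvectors
    return Omega @ V @ V.T                     # remove components outside the span
```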
The regularization can be applied in two ways:
- as pre-processing of the data (PCA-like): operates in the mapped feature space, K is fixed, the prototypes are yet unknown
- as a regularized mapping after/during training: retains the original features, K is flexible, and the prototypes may be included.
Strickert, Hammer, Villmann, Biehl: Regularization and improved interpretation of linear data mappings and adaptive distance measures. IEEE SSCI 2013.
Illustrative example: GMLVQ classification of infra-red spectral data; 124 wine samples at 256 wavelengths, 30 training spectra and 94 test spectra; classes: low, medium, and high alcohol content.
[Figure: GMLVQ test performance vs. number of retained dimensions K.] Best performance with 7 dimensions remaining; an over-fitting effect is visible otherwise. The pure null-space correction corresponds to keeping K = P = 30 dimensions.
[Figure: raw relevance matrix (original) vs. posterior regularization (regularized).] The regularization
- enhances generalization
- smoothens the relevance profile/matrix
- removes 'false relevances'
- improves the interpretability of Λ.
Application: early diagnosis of Rheumatoid Arthritis.
Synovial expression of CXCL4 and CXCL7 by macrophages during early inflammatory arthritis predicts progression to rheumatoid arthritis (in preparation).
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow, C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner; Rheumatology Research Group, Univ. of Birmingham, UK.
Rheumatoid Arthritis (RA):
- chronic inflammatory disease
- the immune system affects the joints
- RA leads to deformation and disability
Synovial tissue cytokine expression: tissue section (synovium) → mRNA extraction → real-time PCR, yielding a panel of 117 cytokines. Cytokines are cell-signaling proteins that regulate the immune response, produced by, e.g., T-cells, macrophages, lymphocytes, fibroblasts, etc.
Patient groups:
- uninflamed control (n=9)
- established RA (n=12)
- early inflammation, resolving (n=9)
- early RA (n=17)
Goal: cytokine-based diagnosis of RA at the earliest possible stage. Long-term goals: understand pathogenesis and the mechanism of progression.
GMLVQ analysis. Pre-processing: log-transformed expression values (117-dimensional data, 47 samples in total); the 21 leading principal components explain 95% of the variation.
Two two-class problems:
(A) established RA vs. uninflamed controls
(B) early RA vs. resolving inflammation
One prototype per class, a global relevance matrix, distance measure $d_\Lambda(w, x) = (x - w)^\top \Lambda (x - w)$; leave-two-out validation (one sample from each class); evaluation in terms of Receiver Operating Characteristics.
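A sketch of the leave-two-out loop; again a class-mean nearest-prototype score stands in for the trained GMLVQ model:

```python
import numpy as np
from itertools import product

def leave_two_out_scores(X, y):
    scores, truth = [], []
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    for i, j in product(idx0, idx1):          # hold out one sample per class
        train = np.setdiff1d(np.arange(len(y)), [i, j])
        w0 = X[train][y[train] == 0].mean(axis=0)   # stand-in "prototypes"
        w1 = X[train][y[train] == 1].mean(axis=0)
        for t in (i, j):
            # signed score: distance to class-0 prototype minus class-1
            s = np.sum((X[t] - w0) ** 2) - np.sum((X[t] - w1) ** 2)
            scores.append(s)
            truth.append(y[t])
    return np.array(scores), np.array(truth)  # basis for an ROC analysis
```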
[Figure: ROC curves (true positive rate vs. false positive rate) and relevance profiles (diagonal Λ_ii vs. cytokine index i) for (A) established RA vs. uninflamed control and (B) early RA vs. resolving inflammation; the relevances learned in (A) serve as prior knowledge for the initialization in (B). The profiles single out a small set of relevant cytokines.]
PF4 (platelet factor 4) = CXCL4, chemokine (C-X-C motif) ligand 4; PPBP (pro-platelet basic protein) = CXCL7, chemokine (C-X-C motif) ligand 7. These cytokines are (historically) associated with platelets, but are also produced by other cell types.
Direct study on the protein level: imaging of synovial tissue with co-staining for CD41 (platelets), CD68 (macrophages; here the predominant source of CXCL4/7 expression), and vWF (vascular endothelium).
Protein-level studies show high levels of CXCL4 and CXCL7 in the first 12 weeks of synovitis (less pronounced later); expression on macrophages outside of blood vessels discriminates early RA from resolving inflammation. These cytokines are potentially important for disease progression, early diagnosis, and outcome prediction.
[Figure: as before, ROC curves and relevance profiles for both two-class problems after initialization of the relevances; among the most relevant cytokines: macrophage stimulating 1.]
What next? Just two (selected) on-going projects (MIWOCI poster session):
- Improved interpretation of linear mappings (with B. Frenay, D. Hofmann, A. Schulz, B. Hammer): minimal/maximal feature relevances by null-space contributions at constant (minimal) L1-norm of the rows of Ω
- Optimization of Receiver Operating Characteristics (with M. Kaden, P. Stürmer, T. Villmann): the statistical interpretation of the AUC allows for direct optimization based on pairs of examples (one from each class), as sketched below.
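A sketch of the underlying statistic: the AUC equals the fraction of correctly ordered (positive, negative) score pairs (the Wilcoxon-Mann-Whitney view), which is what makes pair-based optimization possible:

```python
import numpy as np

def pairwise_auc(scores, labels):
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # fraction of (positive, negative) pairs ranked in the correct order
    return float(np.mean(pos[:, None] > neg[None, :]))
```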
Links:
Matlab collection, Relevance and Matrix adaptation in Learning Vector Quantization (GRLVQ, GMLVQ and LiRaM LVQ): http://matlabserver.cs.rug.nl/gmlvqweb/web/
Pre-/re-prints etc.: http://www.cs.rug.nl/~biehl/