Prototype-based classifiers and their applications in the life sciences
Michael Biehl, Mathematics and Computing Science, University of Groningen / NL
www.cs.rug.nl/~biehl
LVQ and Relevance Learning: frequently asked questions and rarely given answers (WSOM 2014)
Michael Biehl, Mathematics and Computing Science, University of Groningen / NL
www.cs.rug.nl/~biehl
Frequently asked question: So, why do you still do this LVQ stuff?
Frequently asked questions
- basics: distance-based classifiers, relevance learning
- What about the curse of dimensionality?
- How do you find a good distance measure? Example: Generalized Matrix LVQ
- What about over-fitting?
- Is the relevance matrix unique? Is it useful in practice?
- applications: bio-medical data (adrenal tumors, rheumatoid arthritis)
- outlook: What's next?
K-NN classifier, a simple distance-based classifier:
- store a set of labeled examples
- classify a query according to the label of its nearest neighbor (or the majority among its K nearest neighbors)
- piece-wise linear class borders, parameterized by all examples
+ conceptually simple, no training required, a single parameter (K)
- expensive storage and computation; sensitivity to outliers can result in overly complex decision boundaries
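A minimal K-NN sketch in Python (illustrative, not code from the talk; the array names `X_train`, `y_train`, `x` are assumptions):

```python
import numpy as np

def knn_classify(X_train, y_train, x, K=3):
    # squared Euclidean distances from the query to every stored example
    d = np.sum((X_train - x) ** 2, axis=1)
    # labels of the K nearest neighbors
    nearest = y_train[np.argsort(d)[:K]]
    # majority vote among them
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]
```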
Prototype-based classification:
- represent the data by one or several prototypes per class
- classify a query according to the label of the nearest prototype (or alternative schemes)
- piece-wise linear class borders, parameterized by the prototypes
+ less sensitive to outliers, lower storage needs, little computational effort in the working phase
- a training phase is required in order to place the prototypes; model selection problem: number of prototypes per class, etc.
Nearest Prototype Classifier (NPC): given a set of prototypes $w_j$ carrying class labels $c_j$ and a dissimilarity/distance measure $d(w, x)$,
- determine the winner $w_L$ with minimal distance, $d(w_L, x) = \min_j d(w_j, x)$
- assign $x$ to class $c_L$.
Minimal requirements on the measure: $d(w, x) \ge 0$ and $d(w, w) = 0$. Standard example: squared Euclidean distance, $d(w, x) = (x - w)^2$.
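The NPC rule itself fits in a few lines; a sketch assuming a prototype array `W` and a label array `c`:

```python
import numpy as np

def npc_classify(W, c, x):
    # squared Euclidean distance to every prototype
    d = np.sum((W - x) ** 2, axis=1)
    # the winner is the closest prototype; return its class label
    return c[np.argmin(d)]
```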
Learning Vector Quantization (LVQ):
- identification of prototype vectors from labeled example data (N-dimensional feature vectors)
- distance-based classification (e.g. Euclidean)
Competitive learning, LVQ1 [Kohonen, 1990]:
- initialize prototype vectors for the different classes
- present a single example and identify the winner (closest prototype)
- move the winner closer towards the data point if it carries the same class, away from it if the classes differ
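A sketch of a single LVQ1 step, with an assumed learning rate `eta`:

```python
import numpy as np

def lvq1_step(W, c, x, y, eta=0.05):
    # find the winner: the prototype closest to the example x
    j = np.argmin(np.sum((W - x) ** 2, axis=1))
    # attract the winner if the labels match, repel it otherwise
    sign = 1.0 if c[j] == y else -1.0
    W[j] += sign * eta * (x - W[j])
    return W
```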
Properties of LVQ:
- tessellation of the feature space (piece-wise linear class borders)
- distance-based classification (here: Euclidean distances)
- generalization ability: correct classification of new data
- aim: discrimination of classes (≠ vector quantization or density estimation)
What about the curse of dimensionality? Norms and distances concentrate for large N, so are distance-based methods bound to fail in high dimensions? In LVQ:
- prototypes are not just random data points but carefully selected representatives of the data
- the distances of a given data point to the prototypes are compared, which amounts to a projection onto a non-trivial low-dimensional subspace.
See also [Ghosh et al., 2007; Witoelar et al., 2010]: models of LVQ training, treated analytically in the limit of large N, show that successful training needs a number of training examples on the order of N.
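The concentration effect is easy to demonstrate numerically; a small sketch (not from the talk) showing that the relative spread of distances shrinks as N grows:

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (2, 100, 10000):
    X = rng.standard_normal((500, N))
    d = np.linalg.norm(X - X[0], axis=1)[1:]  # distances to one reference point
    print(N, d.std() / d.mean())              # relative spread decreases with N
```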
Cost function based LVQ. One example: Generalized LVQ (GLVQ) [Sato & Yamada, 1995]. For each example, the two winning prototypes are the closest prototype of the same class (distance $d_J$) and the closest prototype of a different class (distance $d_K$); minimize
$$E = \sum_{\mu} \Phi\!\left( \frac{d_J(x^\mu) - d_K(x^\mu)}{d_J(x^\mu) + d_K(x^\mu)} \right),$$
i.e. make $d_J$ small and $d_K$ large. With a sigmoidal $\Phi$ (linear for small arguments), E approximates the number of misclassifications; with a linear $\Phi$, E favors large-margin separation of the classes and class-typical prototypes.
"There is nothing objective about objective functions." (J. McClelland)
GLVQ training = optimization of E with respect to the prototype positions, e.g. by stochastic gradient descent: single example presentation, a stochastic sequence of examples, update of the two winning prototypes per step. The scheme is based on any non-negative, differentiable distance; the requirement is that each update decreases $d_J$ and increases $d_K$. For the squared Euclidean distance the update moves the prototypes towards / away from the sample,
$$\Delta w_J \propto +\,\Phi'(\mu)\,\frac{d_K}{(d_J + d_K)^2}\,(x - w_J), \qquad \Delta w_K \propto -\,\Phi'(\mu)\,\frac{d_J}{(d_J + d_K)^2}\,(x - w_K),$$
with $\mu = (d_J - d_K)/(d_J + d_K)$ and the indicated prefactors.
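A sketch of one such GLVQ step, taking $\Phi$ as the identity and an assumed learning rate `eta` (constant factors absorbed):

```python
import numpy as np

def glvq_step(W, c, x, y, eta=0.05):
    d = np.sum((W - x) ** 2, axis=1)
    # closest correct and closest wrong prototype
    J = np.where(c == y)[0][np.argmin(d[c == y])]
    K = np.where(c != y)[0][np.argmin(d[c != y])]
    dJ, dK = d[J], d[K]
    denom = (dJ + dK) ** 2
    # attract the correct winner, repel the wrong one
    W[J] += eta * (dK / denom) * (x - W[J])
    W[K] -= eta * (dJ / denom) * (x - W[K])
    return W
```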
What is a good distance measure?
Fixed distance measures:
- select a distance measure according to prior knowledge
- data-driven choice in a preprocessing step
- compare the performance of various measures
Example: divergence-based LVQ (DLVQ), Mwebaze et al., Neurocomputing (2011).
Relevance learning:
- employ a parameterized distance measure
- update its parameters in the training process, together with the prototypes
- adaptive, data-driven dissimilarity
Example: Matrix Relevance LVQ.
Relevance Matrix LVQ: a generalized quadratic distance in LVQ,
$$d_\Lambda(w, x) = (x - w)^\top \Lambda\, (x - w), \qquad \Lambda = \Omega^\top \Omega.$$
Variants:
- one global, several local, or class-wise relevance matrices $\Lambda^{(j)}$ → piece-wise quadratic decision boundaries
- rectangular $\Omega$: discriminative low-dimensional representation, e.g. for visualization [Bunte et al., 2012]
- diagonal matrices: single feature weights [Bojer et al., 2001; Hammer et al., 2002]
- possible constraints: rank control, sparsity, … [Schneider et al., 2009]
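A sketch of the adaptive quadratic distance; the parameterization $\Lambda = \Omega^\top \Omega$ guarantees non-negativity for any real $\Omega$ (names are illustrative):

```python
import numpy as np

def matrix_distance(w, x, Omega):
    z = Omega @ (x - w)   # difference vector in the linearly transformed space
    return float(z @ z)   # (x-w)^T Omega^T Omega (x-w) >= 0
```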
Generalized Matrix LVQ (GMLVQ): optimization of the cost function with respect to both the prototypes and the distance measure, i.e. the matrix $\Omega$ is adapted in the same training process.
Heuristic interpretation: the diagonal element $\Lambda_{ii} = \sum_k \Omega_{ki}^2$ summarizes the contribution of original dimension $i$ to the distance, i.e. the relevance of original feature $i$ for the classification; note that $d_\Lambda$ is the standard Euclidean distance for the linearly transformed features $\Omega x$. The interpretation implicitly assumes that the features are of equal order of magnitude, e.g. after a z-score transformation to zero mean and unit variance (averages over the data set).
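A sketch of both steps, z-scoring the data and reading the feature relevances off the diagonal of $\Lambda$:

```python
import numpy as np

def zscore(X):
    # transform each feature to zero mean and unit variance over the data set
    return (X - X.mean(axis=0)) / X.std(axis=0)

def feature_relevances(Omega):
    Lambda = Omega.T @ Omega
    return np.diag(Lambda)   # Lambda_ii: relevance of original feature i
```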
Application: classification of adrenal tumors.
Wiebke Arlt, Angela Taylor, Dave J. Smith, Peter Nightingale, P.M. Stewart, C.H.L. Shackleton et al. (School of Medicine, Queen Elizabeth Hospital, University of Birmingham/UK, plus several centers in Europe); Petra Schneider, Han Stiekema, Michael Biehl (Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen).
[Arlt et al., J. Clin. Endocrinology & Metabolism, 2011]
[Biehl et al., Europ. Symp. Artificial Neural Networks (ESANN), 2012]
Adrenocortical tumors, a difficult differential diagnosis:
- ACC: adrenocortical carcinomas
- ACA: adrenocortical adenomas
Idea: steroid metabolomics, i.e. tumor classification based on urinary steroid excretion; 32 candidate steroid markers.
Generalized Matrix LVQ, ACC vs. ACA classification. Data set: 24 hrs. urinary steroid excretion of 102 patients with benign ACA and 45 patients with malignant ACC.
- data divided into a 90% training and a 10% test set
- determine the prototypes (one per class): typical profiles
- adaptive generalized quadratic distance measure, parameterized by $\Omega$
- apply the classifier to the test data, evaluate the performance (error rates, ROC)
- repeat and average over many random splits
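A sketch of this protocol; a class-mean nearest-prototype model stands in here for the actual GMLVQ training, which is not reproduced:

```python
import numpy as np

def class_mean_prototypes(X, y):
    classes = np.unique(y)
    W = np.array([X[y == k].mean(axis=0) for k in classes])
    return W, classes

def repeated_split_error(X, y, n_splits=100, test_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        n_test = max(1, int(test_frac * len(y)))
        test, train = idx[:n_test], idx[n_test:]
        W, classes = class_mean_prototypes(X[train], y[train])  # stand-in trainer
        # nearest-prototype classification of the held-out samples
        d = ((X[test][:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
        y_hat = classes[np.argmin(d, axis=1)]
        errors.append(np.mean(y_hat != y[test]))
    return float(np.mean(errors))   # average test error over the random splits
```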
[Figure: ACA and ACC prototypes, i.e. log-transformed steroid excretion values in ACA/ACC, rescaled using the healthy control group values.]
[Figure: relevance matrix for the adrenocortical tumor data, diagonal and off-diagonal elements.] A subset of 9 selected steroids suffices for the technical realization (patented, University of Birmingham/UK).
[Figure: adrenocortical tumor data plotted for two individually weakly discriminative markers, 5a-THA (8) and TH-Doc (12), whose combination is highly discriminative.]
ROC characteristics (sensitivity vs. 1-specificity): clear improvement due to adaptive distances.
- Euclidean distance: AUC 0.87
- diagonal relevances (GRLVQ): AUC 0.93
- full relevance matrix (GMLVQ): AUC 0.97
Frequently asked questions:
- How relevant are the relevances?
- What about over-fitting? Relevance matrices introduce O(N²) additional adaptive parameters!
- Is the relevance matrix unique?
  - uniqueness of the parameterization (Ω for a given Λ)?
  - uniqueness of the relevance matrix Λ itself?
  - interpretation of the relevance matrix (hinges on uniqueness)
What about over-fitting? Observation: the resulting relevance matrix has low rank, so the effective number of degrees of freedom is on the order of N rather than N². [Figure: eigenvalues of Λ in the ACA/ACC classification.]
Mathematics: the stationarity conditions imply that the columns of the stationary $\Omega^\top$ are vectors in the eigenspace associated with the smallest eigenvalue of a pseudo-covariance matrix Γ, which is
- not necessarily positive (semi-)definite
- dependent on Ω itself
- and can therefore not be determined prior to training.
Biehl et al., Machine Learning Reports (2009); journal version in preparation (forever).
By-product: a low-dimensional representation. [Figure: data projected onto the first and second eigenvectors of Λ; control, benign, and malignant cases.]
Is the relevance matrix unique? (I) Uniqueness of Ω, given Λ: the matrix square root is not unique; Ω is determined only up to irrelevant rotations, reflections, and symmetries, since $(R\Omega)^\top (R\Omega) = \Omega^\top \Omega$ for any orthogonal $R$. A canonical representation follows from the eigen-decomposition of Λ: choose the symmetric, positive semi-definite root $\Omega = \Lambda^{1/2}$.
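A sketch of the canonical choice via the eigen-decomposition:

```python
import numpy as np

def canonical_omega(Lam):
    vals, vecs = np.linalg.eigh(Lam)        # Lambda is symmetric p.s.d.
    vals = np.clip(vals, 0.0, None)         # guard against small negative noise
    return (vecs * np.sqrt(vals)) @ vecs.T  # Omega = V sqrt(D) V^T

# any orthogonal R leaves Lambda unchanged: (R Omega)^T (R Omega) = Lambda
```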
(II) Uniqueness of Λ, given the transformation of the data: a modification $\Omega \to \tilde\Omega$ is possible if the rows of $\tilde\Omega - \Omega$ lie in the null-space of the data; it yields an identical mapping of all examples (the argument can be extended to include the prototypes) but a different Λ. The data matrix is singular, and the null-space non-trivial, if features are correlated or linearly dependent.
Regularization: the training process yields some Ω; determine the eigenvectors and eigenvalues of the data (pseudo-)covariance and project the rows of Ω onto the span of the K leading eigenvectors. With K < J (J = rank of the data), this retains the eigenspace corresponding to the largest eigenvalues and also removes the span of small non-zero eigenvalues; with K = J it removes exactly the null-space contributions and yields the unique solution with minimal Euclidean norm of the row vectors.
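A sketch of this projection, assuming the eigenvectors are taken from the covariance of the training data:

```python
import numpy as np

def regularize_omega(Omega, X, K):
    C = np.cov(X, rowvar=False)                # data covariance
    vals, vecs = np.linalg.eigh(C)
    V = vecs[:, np.argsort(vals)[::-1][:K]]    # K leading eigenvectors
    return Omega @ V @ V.T                     # remove components outside the span
```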
The regularization can be applied in two ways:
- as pre-processing of the data (PCA-like): operates in the mapped feature space, K is fixed, the prototypes are yet unknown
- as a regularized mapping after/during training: retains the original features, K is flexible, and the prototypes may be included.
Strickert, Hammer, Villmann, Biehl: Regularization and improved interpretation of linear data mappings and adaptive distance measures. IEEE SSCI 2013.
Illustrative example: GMLVQ classification of infra-red spectral data; 124 wine samples at 256 wavelengths, 30 training spectra and 94 test spectra; classes: low, medium, and high alcohol content.
[Figure: GMLVQ test performance vs. number of retained dimensions K.] Best performance with 7 dimensions remaining; an over-fitting effect is visible otherwise. The pure null-space correction corresponds to keeping K = P = 30 dimensions.
[Figure: raw relevance matrix (original) vs. posterior regularization (regularized).] The regularization
- enhances generalization
- smoothens the relevance profile/matrix
- removes 'false relevances'
- improves the interpretability of Λ.
Application: early diagnosis of Rheumatoid Arthritis.
Synovial expression of CXCL4 and CXCL7 by macrophages during early inflammatory arthritis predicts progression to rheumatoid arthritis (in preparation).
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow, C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner; Rheumatology Research Group, Univ. of Birmingham, UK.
Rheumatoid Arthritis (RA):
- chronic inflammatory disease
- the immune system affects the joints
- RA leads to deformation and disability
Synovial tissue cytokine expression: tissue section (synovium) → mRNA extraction → real-time PCR, yielding a panel of 117 cytokines. Cytokines are cell-signaling proteins that regulate the immune response, produced by, e.g., T-cells, macrophages, lymphocytes, fibroblasts, etc.
Patient groups:
- uninflamed control (n=9)
- established RA (n=12)
- early inflammation, resolving (n=9)
- early RA (n=17)
Goal: cytokine-based diagnosis of RA at the earliest possible stage. Long-term goals: understand pathogenesis and the mechanism of progression.
GMLVQ analysis. Pre-processing: log-transformed expression values (117-dimensional data, 47 samples in total); the 21 leading principal components explain 95% of the variation.
Two two-class problems:
(A) established RA vs. uninflamed controls
(B) early RA vs. resolving inflammation
One prototype per class, a global relevance matrix, distance measure $d_\Lambda(w, x) = (x - w)^\top \Lambda (x - w)$; leave-two-out validation (one sample from each class); evaluation in terms of Receiver Operating Characteristics.
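A sketch of the leave-two-out loop; again a class-mean nearest-prototype score stands in for the trained GMLVQ model:

```python
import numpy as np
from itertools import product

def leave_two_out_scores(X, y):
    scores, truth = [], []
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    for i, j in product(idx0, idx1):          # hold out one sample per class
        train = np.setdiff1d(np.arange(len(y)), [i, j])
        w0 = X[train][y[train] == 0].mean(axis=0)   # stand-in "prototypes"
        w1 = X[train][y[train] == 1].mean(axis=0)
        for t in (i, j):
            # signed score: distance to class-0 prototype minus class-1
            s = np.sum((X[t] - w0) ** 2) - np.sum((X[t] - w1) ** 2)
            scores.append(s)
            truth.append(y[t])
    return np.array(scores), np.array(truth)  # basis for an ROC analysis
```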
[Figure: ROC curves (true positive rate vs. false positive rate) and relevance profiles (diagonal Λ_ii vs. cytokine index i) for (A) established RA vs. uninflamed control and (B) early RA vs. resolving inflammation; the relevances learned in (A) serve as prior knowledge for the initialization in (B). The profiles single out a small set of relevant cytokines.]
PF4 (platelet factor 4) = CXCL4, chemokine (C-X-C motif) ligand 4; PPBP (pro-platelet basic protein) = CXCL7, chemokine (C-X-C motif) ligand 7. These cytokines are (historically) associated with platelets, but are also produced by other cell types.
Direct study on the protein level: imaging of synovial tissue with co-staining for CD41 (platelets), CD68 (macrophages; here the predominant source of CXCL4/7 expression), and vWF (vascular endothelium).
Protein-level studies show high levels of CXCL4 and CXCL7 in the first 12 weeks of synovitis (less pronounced later); expression on macrophages outside of blood vessels discriminates early RA from resolving inflammation. These cytokines are potentially important for disease progression, early diagnosis, and outcome prediction.
[Figure: as before, ROC curves and relevance profiles for both two-class problems after initialization of the relevances; among the most relevant cytokines: macrophage stimulating 1.]
What next? Just two (selected) on-going projects (MIWOCI poster session):
- Improved interpretation of linear mappings (with B. Frenay, D. Hofmann, A. Schulz, B. Hammer): minimal/maximal feature relevances by null-space contributions at constant (minimal) L1-norm of the rows of Ω
- Optimization of Receiver Operating Characteristics (with M. Kaden, P. Stürmer, T. Villmann): the statistical interpretation of the AUC allows for direct optimization based on pairs of examples (one from each class), as sketched below.
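A sketch of the underlying statistic: the AUC equals the fraction of correctly ordered (positive, negative) score pairs (the Wilcoxon-Mann-Whitney view), which is what makes pair-based optimization possible:

```python
import numpy as np

def pairwise_auc(scores, labels):
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # fraction of (positive, negative) pairs ranked in the correct order
    return float(np.mean(pos[:, None] > neg[None, :]))
```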
Links:
Matlab collection, Relevance and Matrix adaptation in Learning Vector Quantization (GRLVQ, GMLVQ and LiRaM LVQ): http://matlabserver.cs.rug.nl/gmlvqweb/web/
Pre-/re-prints etc.: http://www.cs.rug.nl/~biehl/