
1 Analysis of tiling microarray data by Learning Vector Quantization and Relevance Learning
Michael Biehl, Intelligent Systems Group, University of Groningen
Rainer Breitling, Yang Li, Groningen Bioinformatics Centre
Stabil07, 03/10/2007

2 Outline
- Learning Vector Quantization: introduction to prototype learning and LVQ; distance-based classification; basic training prescription: LVQ1
- Adaptive metrics and relevance learning: weighted Euclidean distance, relevance learning; feature weighting, feature selection; adaptive relevance matrices in LVQ
- Example: intron/exon classification based on tiling microarray data; application of standard LVQ1 (fixed metric)
- Summary / Outlook


4 Learning Vector Quantization (LVQ)
- identification of prototype vectors from labelled example data
- parameterization of distance-based classification (e.g. Euclidean)
aim: generalization ability, i.e. classification of novel data after learning from examples
classification: assignment of a vector ξ to the class of the closest prototype w
example: basic LVQ scheme "LVQ1" [Kohonen]; often: heuristically motivated variations of competitive learning
- initialize prototype vectors for the different classes
- present a single example
- identify the closest prototype, i.e. the so-called winner
- move the winner closer towards the data (same class) or away from the data (different class)
→ piecewise linear decision boundaries

5 Nearest prototype classifier
- set of prototypes w_i ∈ ℝ^N, each representing a class S_i
- classification based on a similarity/distance measure d(w, ξ)
- given a feature vector ξ, determine the winner w_i* with minimal distance → assign ξ to class S_i*
examples: squared Euclidean distance d(w, ξ) = Σ_j (w_j - ξ_j)², Manhattan distance d(w, ξ) = Σ_j |w_j - ξ_j|
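The nearest-prototype rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the prototype positions and labels are made-up toy values:

```python
import numpy as np

def sq_euclidean(w, xi):
    # squared Euclidean distance d(w, xi) = sum_j (w_j - xi_j)^2
    return np.sum((w - xi) ** 2)

def manhattan(w, xi):
    # Manhattan distance d(w, xi) = sum_j |w_j - xi_j|
    return np.sum(np.abs(w - xi))

def classify(xi, prototypes, labels, dist=sq_euclidean):
    # nearest prototype classifier: assign xi to the class of the winner
    i_star = min(range(len(prototypes)), key=lambda i: dist(prototypes[i], xi))
    return labels[i_star]

# toy example: two prototypes per class in R^2
prototypes = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
              np.array([3.0, 3.0]), np.array([4.0, 3.0])]
labels = [0, 0, 1, 1]
print(classify(np.array([0.2, 0.1]), prototypes, labels))  # -> 0
```

Swapping `dist=manhattan` changes the metric without touching the classifier, which is the point of the "parameterized distance" view above.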

6 LVQ1 training: sequential presentation of labelled examples (ξ, σ)
- randomized initial w_k, e.g. close to the class-conditional means
- the winner takes it all: only the closest prototype w_i* is updated, moved towards ξ if its class label matches σ (attraction) and away from ξ otherwise (repulsion)
- η_w: learning rate, step size of the update
many variants/modifications:
- learning rate schedule η_w(t)
- update of more than one prototype
- more general update functions Ψ(S, σ, w_i*, …)
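The LVQ1 prescription above (winner-takes-all attraction/repulsion) can be sketched as a single update step; a minimal sketch with made-up toy values, not the authors' code:

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, xi, sigma, eta_w=0.01):
    # one LVQ1 update step: "the winner takes it all"
    d = np.sum((prototypes - xi) ** 2, axis=1)   # squared Euclidean distances
    i_star = int(np.argmin(d))                   # the closest prototype = winner
    psi = 1.0 if proto_labels[i_star] == sigma else -1.0
    # attraction (same class as the example) or repulsion (different class)
    prototypes[i_star] += eta_w * psi * (xi - prototypes[i_star])
    return i_star

# toy run: two prototypes, one per class; present one labelled example
prototypes = np.array([[0.0, 0.0], [2.0, 2.0]])
proto_labels = [0, 1]
winner = lvq1_step(prototypes, proto_labels, np.array([0.5, 0.5]), sigma=0)
```

In a full training run this step would be applied to randomly permuted examples over many epochs, with a learning-rate schedule η_w(t) as noted above.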

7 LVQ algorithms …
+ frequently applied in a variety of practical problems
+ plausible, intuitive, flexible
+ fast, easy to implement
+ natural tool for multi-class problems
- often based on purely heuristic arguments, or on cost functions with unclear relation to the classification error
- limited theoretical understanding of convergence etc.
- important issue: which is the 'right' distance measure?
Relevance learning: adaptation of the metric / distance measure during training; here applied in a (non-standard) classification problem from bioinformatics

8 Gene expression (figure c/o R. Breitling)

9 Genomic tiling array data (C. elegans)
genomic sequence (e.g. A C T T A C A A G G A G T C T A G G C A … C A T T A C G A C T) covered by 'path' or 'tiling' microarray: transcription intensity vs. genomic position
repeated for many samples:
- different developmental stages
- varying external conditions
- different strains (variants)
- mutants …
probe intensities come as perfect match (PM) and mismatch (MM) pairs, e.g.
PM: G A G T C T A G G
MM: G A G T G T A G G

10 (4120) genomic positions, classified as intronic/exonic; 21 samples + mutant probe intensities vs. position on the genome (c/o R. Breitling, Y. Li)
- exons: transcribed → mRNA → translated → protein
- introns: transcribed → (pre-) mRNA, but spliced out before leaving the nucleus → no translation (Wikipedia: non-coding DNA inside a gene)

11 (4120) genomic positions, classified as intronic/exonic; 21 samples + mutant probe intensities vs. position on the genome (c/o R. Breitling, Y. Li)
24 features constructed from the 'raw data', including: median PM and MM probe intensities, correlations of neighboring genome positions, melting temperatures, …
Note: class membership labels are taken from the current genome annotation; the true introns/exons are not exactly known!
Aim: identify false introns = potential new exons (or even genes)?

12 Data set: 4120 labelled vectors (2587 from class "0", 1533 from class "1"); 24 features (real numbers), z-transformed
(plot: example feature vectors for class 0 and class 1 vs. feature #)
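The z-transformation mentioned above standardizes each feature to zero mean and unit variance across the data set; a minimal sketch with made-up toy data:

```python
import numpy as np

def z_transform(X):
    # standardize each feature (column) to zero mean and unit variance
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # guard against constant features
    return (X - mu) / sd

# toy data: 3 "genomic positions", 2 features
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = z_transform(X)
```

Standardization matters here because unweighted Euclidean or Manhattan distances would otherwise be dominated by features with large raw scales (e.g. intensities vs. correlations).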

13 Class-conditional mean (ccm) vectors of the same data set → (Manhattan) distance based classifier
(plot: ccm vectors for class 0 and class 1 vs. feature #)
evaluation scheme: training from 3000 examples, testing on 1000 examples (averaged over >10 random permutations); error rates reported for all / class 0 / class 1, training and test

14 LVQ1 training with one prototype per class (η_w = 0.01):
- compared to the ccm prototypes, LVQ1 exaggerates the differences between the classes
- here: almost identical performance
several prototypes per class (2+1, 3+3, 6+6):
- increased complexity, improved performance
- possible over-fitting (?) (low training error, high test error) due to highly specialized w_i
(plot: test error vs. epochs)

15 Scores with asymmetric prototype assignments:

test error   2+1 prototypes   1+2 prototypes
all              26%              12%
class 0          29%              20%
class 1          45%              68%

(→ place more prototypes in the class with greater variability)

16 Prototypes and scores (6+6 prototypes)
(plot: prototypes for class 0 and class 1 with per-prototype scores, e.g. 4%, 5%, 7%, 6%, 12%, 10%, 7%, 10%, 18%, …)

17 Adaptive distance measures: relevance learning
scaled features, e.g. modified Euclidean (or Manhattan, …) distances with relevance factors λ_j ≥ 0, Σ_j λ_j = 1:
- global relevances: one relevance profile shared by all prototypes
- local relevances: one profile per prototype
- class-wise relevances: one profile per class
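A relevance-weighted distance as described above can be sketched as follows; a minimal illustration with made-up toy values (the relevance profiles shown are hypothetical, not learned):

```python
import numpy as np

def weighted_sq_euclidean(w, xi, lam):
    # relevance-weighted squared Euclidean distance:
    # d_lam(w, xi) = sum_j lam_j * (w_j - xi_j)^2,  lam_j >= 0, sum_j lam_j = 1
    return np.sum(lam * (w - xi) ** 2)

xi = np.array([1.0, 0.0])
w  = np.array([0.0, 2.0])
lam_global = np.array([0.5, 0.5])   # one shared (global) relevance profile
lam_local  = np.array([1.0, 0.0])   # e.g. a prototype-specific (local) profile
print(weighted_sq_euclidean(w, xi, lam_global))  # -> 2.5
print(weighted_sq_euclidean(w, xi, lam_local))   # -> 1.0
```

Setting a relevance factor to zero removes that feature from the classification entirely, which is how relevance learning performs feature selection.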

18 Adaptive distance measures: relevance learning
LVQ training + adaptation of the relevances, e.g. heuristic RLVQ [Bojer, Hammer et al., 2001]:
→ determine the winning prototype, update the winner as in LVQ1
→ update the (global) relevances according to the per-feature contribution δ_j to the distance:
- winner correct: λ_j decreases where δ_j is large, increases where δ_j is small
- winner wrong: λ_j increases where δ_j is large, decreases where δ_j is small
- afterwards enforce λ_j ≥ 0 and Σ_j λ_j = 1
benefits:
- weighting/ranking of features → better performance
- elimination of noisy/irrelevant features → reduced complexity
- insight into the data / classification problem
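The heuristic relevance update above can be sketched as follows. This is a simplified sketch, not the exact prescription of Bojer, Hammer et al.; here the per-feature contribution δ_j is taken as the absolute deviation |w_j - ξ_j| of the winner from the example, and the toy values are made up:

```python
import numpy as np

def rlvq_relevance_update(lam, w_winner, xi, correct, eta_lam=1e-4):
    # heuristic RLVQ: after the LVQ1 winner update, adapt the
    # global relevance factors lam_j
    delta = np.abs(w_winner - xi)     # per-feature contribution delta_j
    if correct:
        lam = lam - eta_lam * delta   # large contribution -> lam_j decreases
    else:
        lam = lam + eta_lam * delta   # large contribution -> lam_j increases
    lam = np.maximum(lam, 0.0)        # enforce lam_j >= 0
    return lam / lam.sum()            # enforce sum_j lam_j = 1
```

Note that the relevance learning rate `eta_lam` is chosen much smaller than the prototype rate, in line with the observation later in the talk that successful learning requires η_λ ≪ η_w.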

19 (1+1) prototypes, global relevance learning; η_w = 10^-2, η_λ = 10^-4
- improved performance by weighting and selection of features: training and test error ≈ 0.115
- over-simplified classification (≠ over-fitting): training and test error > 0.15
- successful learning requires η_λ ≪ η_w
(plots: training/test error (ccm/LVQ1 reference) vs. epochs; relevance profiles)

20 (6+6) prototypes, global relevance learning; η_w = 10^-2, η_λ = 10^-5
- improved performance by weighting and selection of features: training and test error < 0.11
- over-simplified classification: training and test error ≈ 0.115
(plots: training/test error (ccm/LVQ1 reference); relevance profiles)

21 The data revisited: global relevances
relevance profile (median over all samples) across the feature groups:
- significance of strain and stage effects
- perfect match (p.m.) intensities of the probe itself (3) and of neighboring probes
- mismatch (m.m.) intensities
- p.m. correlations of probe with neighbors
- m.m. correlations of probe with neighbors
- melting temperature (G-C content) of probe and its neighbors

22 (2+1) prototypes, local relevances; η_w = 10^-2, η_λ = 10^-5
(plots: test error vs. epochs; prototypes and relevance profiles for class 0 and class 1)

23 (2+1) prototypes, local relevances; η_w = 10^-2, η_λ = 10^-5
local relevances yield a very simple classifier: determine the minimum of … → class 1 / → class 0
(plots: test error vs. epochs; prototypes and relevance profiles for class 0 and class 1)

24 The data revisited: local or class-wise relevances (class 0 vs. class 1)
relevance profiles (median over all samples) across the feature groups:
- significance of strain and stage effects
- perfect match (p.m.) intensities of the probe itself (3) and of neighboring probes
- mismatch (m.m.) intensities
- p.m. correlations of probe with neighbors: important for exon identification!
- m.m. correlations of probe with neighbors
- melting temperature (G-C content) of probe and its neighbors

25 Adaptive metrics extended: relevance matrices [Schneider, Biehl, Hammer, 2007]
generalized quadratic distance: d_Λ(w, ξ) = (ξ - w)^T Λ (ξ - w)
- Λ takes into account correlations between features
- the parameterization Λ = Ω^T Ω enforces that Λ is positive semi-definite; Ω is adaptive and acts as an instantaneous linear transformation of feature space
- relevance update: adaptation of Ω during training, along with the prototypes
variants: global, class-wise, local relevance matrices
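The generalized quadratic distance above can be sketched directly from the parameterization Λ = Ω^T Ω; a minimal illustration with a made-up 2×2 matrix Ω, not a learned one:

```python
import numpy as np

def matrix_distance(w, xi, omega):
    # generalized quadratic distance d_Lambda(w, xi) = (xi - w)^T Lambda (xi - w)
    # with Lambda = Omega^T Omega, which guarantees d >= 0;
    # equivalently d = || Omega (xi - w) ||^2
    diff = xi - w
    return diff @ (omega.T @ omega) @ diff

omega = np.array([[1.0, 0.5],
                  [0.0, 1.0]])   # adaptive linear transformation of feature space
w  = np.array([0.0, 0.0])
xi = np.array([1.0, 1.0])
print(matrix_distance(w, xi, omega))  # -> 3.25
```

The off-diagonal elements of Λ are what let the classifier exploit correlations between features; a diagonal Λ recovers the relevance-weighted Euclidean distance of the previous slides.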

26 Preliminary results: 2+1 prototypes, global relevance matrix; η_w = 10^-2, η_Ω = 10^-6
- training/test error below the "6+6" result
- eigenvalues of Λ: effectively a 5-dim. subspace
(plots: training/test error vs. epochs; diagonal elements Λ_ii; off-diagonal elements Λ_jk; eigenvalue spectrum)

27 Global relevance matrix
(plot: matrix elements over the feature groups p-value, p.m. intensities, m.m. intensities, p.m. correlations, m.m. correlations, melting temp., on both axes)

28 2+1 prototypes, class-wise relevance matrices
(plots: relevance matrices for class 0 and class 1)

29 Summary (LVQ)
LVQ classifiers:
+ easy to interpret, distance based schemes
+ parameterized in terms of typical data
+ natural tool for multi-class problems
+ suitable for large amounts of data
- standard problems of model selection, parameter tuning, …
- choice of appropriate metrics
Relevance learning:
+ adapts the distance measure while training the prototypes
+ facilitates significant improvement of performance
+ can simplify the classifier drastically
+ matrix RLVQ can take into account correlations
- may suffer from over-simplification effects

30 Outlook (LVQ): relevance learning
- put forward the matrix method, apply it in different contexts (→ P. Schneider)
- theoretical analysis (dynamics, convergence properties)
- regularization (here: early stopping)
- feature construction (beyond weighting/selection)
- advertise in the bioinformatics community, e.g. an R implementation of RLVQ (Yang Li, in preparation)

31 Summary (biology): classification of exonic/intronic gene sequences
- weighting/selection of features leads to improvement and/or simplification of the classifier
- plausible results when forced to over-simplify
- importance of p.m. correlations for exon identification (novel set of features suggested by Breitling et al.)
Outlook (biology)
- systematic study of the matrix method (correlations between features)
- extension to whole-genome tiling data (millions of probes!)
- different organisms and technological platforms
- analysis of raw data before heuristic construction of features
- investigation of false introns

32 Thanks!

33 The current "LVQ group" in Groningen: Petra Schneider, W. Storteboom, Fabio Bracci, Piter Pasma, Gert-Jan de Vries, Caesar Ogole, Aree Witoelar, M.B., Julius Kidubuka, (Kerstin Bunte)

