
1 Dynamics of Learning VQ and Neural Gas
Aree Witoelar, Michael Biehl
Mathematics and Computing Science, University of Groningen, Netherlands
in collaboration with Barbara Hammer (Clausthal) and Anarta Ghosh (Groningen)

2 Dagstuhl Seminar, 25.03.2007
Outline
• Vector Quantization (VQ)
• Analysis of VQ Dynamics
• Learning Vector Quantization (LVQ)
• Summary

3 Dagstuhl Seminar, 25.03.2007
Vector Quantization
Objective: representation of (many) data with (few) prototype vectors.
Assign each data point ξ^μ to the nearest prototype vector w_j (by a distance measure, e.g. Euclidean), grouping the data into clusters, e.g. for classification.
Find the optimal set W for the lowest quantization error, i.e. the distance to the nearest prototype, averaged over the data (see the reconstruction below).
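The slide does not spell the cost out; a standard reconstruction, assuming squared Euclidean distance, is:

```latex
% Quantization error: every example contributes its distance to the
% closest of the K prototypes (squared Euclidean distance assumed).
E(W) \;=\; \frac{1}{P} \sum_{\mu=1}^{P} \, \min_{j \in \{1,\dots,K\}} \left( \xi^{\mu} - w_{j} \right)^{2}
```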

4 Dagstuhl Seminar, 25.03.2007
Example: Winner-Takes-All (WTA)
• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example
This is a stochastic gradient descent with respect to a cost function; the prototypes settle in areas with a high density of data. A minimal sketch follows below.
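A minimal Python sketch of one WTA step, following the recipe above (array layout and names are illustrative, not from the talk):

```python
import numpy as np

def wta_step(prototypes, xi, eta):
    """One online Winner-Takes-All update.

    prototypes : (K, N) array of prototype vectors w_j (updated in place)
    xi         : (N,) single example
    eta        : learning rate
    """
    # identify the closest prototype, i.e. the winner
    dists = np.sum((prototypes - xi) ** 2, axis=1)
    winner = np.argmin(dists)
    # move the winner even closer towards the example
    prototypes[winner] += eta * (xi - prototypes[winner])
    return winner
```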

5 Dagstuhl Seminar, 25.03.2007
Problems with Winner-Takes-All: it is sensitive to initialization.
Less sensitive to initialization? "Winner takes most": update according to a "rank", e.g. Neural Gas.

6 Dagstuhl Seminar, 25.03.2007
(L)VQ algorithms are intuitive, fast, powerful and flexible, but have a limited theoretical background w.r.t. convergence speed, robustness to initial conditions, etc.
Analysis of VQ dynamics: an exact mathematical description in very high dimensions; a study of the typical learning behavior.

7 Dagstuhl Seminar, 25.03.2007
Model: two Gaussian clusters of high-dimensional data.
Random vectors ξ ∈ ℝ^N are generated according to the prior probabilities p_+, p_- with p_+ + p_- = 1; classes σ = {+1, −1}.
Cluster centers B_+, B_- ∈ ℝ^N; variances υ_+, υ_-; separation ℓ.
The clusters are separable only in the projection onto the (B_+, B_-) plane, i.e. in 2 of the N dimensions, and not separable in other projections.
A simple model, but not trivial.
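A sketch of how one might sample from this model (my reading of the slide; following the cited JMLR paper, the centers are taken as ℓB_± with orthonormal B_±):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_example(B_plus, B_minus, p_plus=0.6, ell=1.0, v_plus=1.5, v_minus=1.0):
    """Draw one example xi in R^N from the two-cluster model:
    class sigma = +1 with probability p_plus, mean ell * B_sigma,
    variance v_sigma per component."""
    sigma = 1 if rng.random() < p_plus else -1
    center = ell * (B_plus if sigma == 1 else B_minus)
    variance = v_plus if sigma == 1 else v_minus
    xi = center + rng.normal(0.0, np.sqrt(variance), size=B_plus.shape[0])
    return xi, sigma

# orthonormal center directions in N = 200 dimensions
N = 200
B_plus, B_minus = np.eye(N)[0], np.eye(N)[1]
xi, sigma = draw_example(B_plus, B_minus)
```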

8 Dagstuhl Seminar, 25.03.2007
Online learning from a sequence of independent random data: update the prototype vectors w_s ∈ ℝ^N by moving a prototype towards the current data point, with a learning rate (step size) η. The modulation function f_s[…] describes the algorithm used: the strength and direction of the update, depending on the prototype class, the data class, the "winner", etc. The update rule is reconstructed below.
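In the notation of the cited JMLR paper, the generic update the slide describes reads (a reconstruction; the 1/N scaling matches the thermodynamic-limit analysis on the following slides):

```latex
w_{s}^{\mu} \;=\; w_{s}^{\mu-1} \;+\; \frac{\eta}{N}\, f_{s}\!\left[\,\ldots\,\right] \left( \xi^{\mu} - w_{s}^{\mu-1} \right)
```

For WTA, f_s[…] = 1 for the winner and 0 otherwise; for LVQ1 it additionally carries the sign ±1 from comparing prototype and data class.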

9 Dagstuhl Seminar, 25.03.2007
1. Define a few characteristic quantities of the system: the projections of the prototypes onto the cluster centers, R_sσ = w_s · B_σ, and the lengths and overlaps of the prototypes, Q_st = w_s · w_t (notation as in the cited JMLR paper). The random vector ξ^μ enters only through its projections.
2. Derive recursion relations of these quantities for new input data.
3. Calculate the averaged recursions.

10 Dagstuhl Seminar, 25.03.2007
In the thermodynamic limit N → ∞:
• the projections become correlated Gaussian quantities, completely specified in terms of their first and second moments;
• the characteristic quantities self-average with respect to the random sequence of data (the fluctuations vanish), so one can average over the examples;
• define a continuous learning time t = μ/N (μ: discrete, 1, 2, …, P; t: continuous).

11 Dagstuhl Seminar, 25.03.2007
4. Derive ordinary differential equations for the characteristic quantities.
5. Solve for R_sσ(t), Q_st(t):
• dynamics and asymptotic behavior (t → ∞)
• quantization/generalization error
• sensitivity to initial conditions, learning rates, and the structure of the data

12 Dagstuhl Seminar, 25.03.2007
Results for VQ with 2 prototypes: numerical integration of the ODEs (w_s(0) ≈ 0, p_+ = 0.6, ℓ = 1.0, υ_+ = 1.5, υ_- = 1.0, η = 0.01).
[Figure: the characteristic quantities R_1+, R_1-, R_2+, R_2-, Q_11, Q_12, Q_22 and the quantization error E(W) as functions of t; w_s denotes the winner.]

13 Dagstuhl Seminar, 25.03.2007
[Figure: projections R_S+, R_S- of the prototypes onto the (B_+, B_-) plane at t = 50, for 2 and 3 prototypes; separation ℓ.]
With p_+ > p_-, two of the prototypes move to the stronger cluster.

14 Dagstuhl Seminar, 25.03.2007
Neural Gas: a winner-takes-most algorithm (here: 3 prototypes). The update strength decreases exponentially with the rank. λ(t) is large initially and is decreased over time (λ_i = 2, λ_f = 10^-2); for λ(t) → 0 the algorithm becomes identical to WTA. A sketch of the rank-based update follows below.
[Figure: projections R_S+, R_S- at t = 0 and t = 50, and the quantization error E(W) as a function of t.]
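A sketch of the rank-based update (the exponential rank weighting is the standard Neural Gas form; the annealing schedule and other details are assumptions, not from the slide):

```python
import numpy as np

def neural_gas_step(prototypes, xi, eta, lam):
    """One online Neural Gas ("winner takes most") update: every prototype
    moves towards the example with a strength that decays exponentially
    with its distance rank."""
    dists = np.sum((prototypes - xi) ** 2, axis=1)
    ranks = np.argsort(np.argsort(dists))    # rank 0 = winner
    strengths = np.exp(-ranks / lam)         # exponential decay by rank
    prototypes += eta * strengths[:, None] * (xi - prototypes)

def lam_schedule(t, t_max, lam_i=2.0, lam_f=1e-2):
    """Anneal lambda from lam_i to lam_f: large initially, decreased over
    time (the exponential form of the schedule is an assumption)."""
    return lam_i * (lam_f / lam_i) ** (t / t_max)
```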

15 Dagstuhl Seminar, 25.03.2007
Sensitivity to initialization [Figure: projections R_S+, R_S- from t = 0 to t = 50 for Neural Gas and WTA].
Neural Gas: more robust w.r.t. initialization.
WTA: E(W) (eventually) reaches its minimum, but the learning time depends on the initialization and can become large: the dynamics can get stuck on a "plateau" where ∇H_VQ ≈ 0.

16 Dagstuhl Seminar, 25.03.2007
Learning Vector Quantization (LVQ)
Objective: classification of data using prototype vectors.
Assign data {ξ, σ}, ξ ∈ ℝ^N, to the nearest prototype vector (by a distance measure, e.g. Euclidean); an example is misclassified if its nearest prototype carries the wrong class.
Find the optimal set W for the lowest generalization error.

17 Dagstuhl Seminar, 25.03.2007
LVQ1: update the winner w_s towards the data if its class is correct, away from the data if it is wrong (±1). There is no cost function related to the generalization error.
[Figure: trajectories R_S+, R_S- for two prototypes with c = {+1, −1} and for three prototypes with c = {+1, +1, −1} or c = {+1, −1, −1}.]
With a 3rd prototype: to which class should it be added?

18 Dagstuhl Seminar, 25.03.2007
Generalization error: the probability of misclassifying new data.
[Figure: ε_g as a function of t for p_+ = 0.6, p_- = 0.4, υ_+ = 1.5, υ_- = 1.0, together with the class-wise contributions of misclassified data.]
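The class-wise contributions suggest the standard decomposition (a reconstruction, not spelled out on the slide):

```latex
\varepsilon_{g} \;=\; p_{+}\, \varepsilon_{+} \;+\; p_{-}\, \varepsilon_{-}
```

where ε_σ is the probability that an example drawn from cluster σ is closest to a prototype of the other class.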

19 Dagstuhl Seminar, 25.03.2007
Optimal decision boundary: the (hyper)plane where the prior-weighted class densities are equal (see below; p_+ > p_-, separation ℓ, distance d).
Equal variances (υ_+ = υ_-): a linear decision boundary; K = 2 prototypes are optimal.
Unequal variances (υ_+ > υ_-): more prototypes (K = 3) give a better approximation of the optimal decision boundary.
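Completing the slide's dangling "(hyper)plane where" with the standard Bayes-optimal criterion for this two-Gaussian model (an assumption, not spelled out on the slide):

```latex
p_{+}\, P(\xi \mid +1) \;=\; p_{-}\, P(\xi \mid -1)
```

For equal variances this condition is linear in ξ, i.e. a hyperplane with normal along B_+ − B_-.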

20 Dagstuhl Seminar, 25.03.2007
Asymptotic generalization error ε_g(t → ∞) as a function of p_+, for υ_+ > υ_- (υ_+ = 0.81, υ_- = 0.25).
c = {+1, +1, −1}: the optimal classifier is better with K = 3, and LVQ1 is better with K = 3; it is best to place more prototypes on the class with the larger variance.
c = {+1, −1, −1}: the optimal classifier with K = 3 equals that with K = 2, but LVQ1 with K = 3 is worse.
More prototypes are not always better for LVQ1.

21 Dagstuhl Seminar, 25.03.2007
Summary
• dynamics of (Learning) Vector Quantization for high-dimensional data
• Neural Gas: more robust w.r.t. initialization than WTA
• LVQ1: more prototypes are not always better
Outlook
• study of different algorithms, e.g. LVQ+/-, LFM, RSLVQ
• more complex models
• multi-prototype, multi-class problems
Reference: M. Biehl, A. Ghosh, and B. Hammer. Dynamics and Generalization Ability of LVQ Algorithms. Journal of Machine Learning Research 8:323-360 (2007). http://jmlr.csail.mit.edu/papers/v8/biehl07a.html

22 Questions?


24 Example: LVQ1
• initialize K prototype vectors with classes c_s
• present a single example (of correct or incorrect class)
• identify the closest prototype, i.e. the winner
• move the winner towards/away from the example if the prototype class is correct/wrong
A minimal sketch follows below.
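A Python sketch of one LVQ1 step per the recipe above (names and array layout are mine):

```python
import numpy as np

def lvq1_step(prototypes, classes, xi, sigma, eta):
    """One online LVQ1 update: the winner moves towards the example if its
    class label is correct, away from it if the label is wrong.

    prototypes : (K, N) prototype vectors (updated in place)
    classes    : (K,) prototype labels in {+1, -1}
    xi, sigma  : (N,) example and its true label
    """
    dists = np.sum((prototypes - xi) ** 2, axis=1)
    s = np.argmin(dists)                              # the winner
    direction = 1.0 if classes[s] == sigma else -1.0  # correct vs. wrong class
    prototypes[s] += eta * direction * (xi - prototypes[s])
```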

25 Dagstuhl Seminar, 25.03.2007
Central Limit Theorem: let x_1, x_2, …, x_N be independent random numbers drawn from an arbitrary probability distribution p(x_j) with finite mean and variance. The distribution of the average of the x_j approaches a normal distribution as N becomes large.
[Figure: distributions of the average for N = 1, 2, 5, 50, starting from a non-normal example distribution.]
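A quick numerical illustration of the slide's panels (the exponential source distribution is my choice of a non-normal example):

```python
import numpy as np

rng = np.random.default_rng(1)
# averages of N exponential draws: the spread shrinks like 1/sqrt(N) and the
# histogram of the averages approaches a Gaussian as N grows
for N in (1, 2, 5, 50):
    averages = rng.exponential(1.0, size=(100_000, N)).mean(axis=1)
    print(f"N={N:2d}  mean={averages.mean():.3f}  std={averages.std():.3f}")
```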

26 Dagstuhl Seminar, 25.03.2007
Self-averaging: Monte Carlo simulations over 100 independent runs show that the fluctuations decrease with a larger degree of freedom N; for N → ∞ the fluctuations vanish (the variance becomes zero). A toy illustration follows below.
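A toy check of self-averaging (my own single-prototype setup, not the talk's simulation): repeating the same online learning many times, the run-to-run variance of the order parameter R = w·B shrinks as N grows.

```python
import numpy as np

rng = np.random.default_rng(2)
eta, ell, t_max, runs = 0.5, 1.0, 5.0, 100
for N in (25, 100, 400):
    B = np.zeros(N)
    B[0] = 1.0                                   # unit cluster-center direction
    R_final = []
    for _ in range(runs):
        w = np.zeros(N)
        for _ in range(int(t_max * N)):          # mu = 1 .. t_max*N, t = mu/N
            xi = ell * B + rng.normal(0.0, 1.0, size=N)
            w += (eta / N) * (xi - w)            # single prototype: always wins
        R_final.append(w @ B)
    print(f"N={N:3d}  mean R={np.mean(R_final):.3f}  var={np.var(R_final):.2e}")
```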

27 Dagstuhl Seminar, 25.03.2007
"LVQ+/-": update the correct and the incorrect winner. The correct winner is the closest prototype with the correct class, d_s = min{d_k} with c_s = σ^μ; the incorrect winner is the closest prototype with a wrong class, d_t = min{d_k} with c_t ≠ σ^μ.
The algorithm is strongly divergent! For p_+ >> p_- there is strong repulsion by the stronger class. To overcome the divergence: e.g. early stopping at ε_g(t) = ε_g,min (difficult in practice). A sketch of the update follows below.
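A sketch of the LVQ+/- update described above (early stopping is not included; names and array layout are mine):

```python
import numpy as np

def lvq_pm_step(prototypes, classes, xi, sigma, eta):
    """One "LVQ+/-" update: the closest correct prototype moves towards the
    example, the closest incorrect prototype moves away from it.

    classes is a (K,) numpy array of labels in {+1, -1}; sigma is the
    true label of the example xi.
    """
    dists = np.sum((prototypes - xi) ** 2, axis=1)
    correct = (classes == sigma)
    s = np.where(correct)[0][np.argmin(dists[correct])]      # closest correct
    t = np.where(~correct)[0][np.argmin(dists[~correct])]    # closest incorrect
    prototypes[s] += eta * (xi - prototypes[s])
    prototypes[t] -= eta * (xi - prototypes[t])
```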

28 Dagstuhl Seminar, 25.03.2007
Comparison of LVQ1 and LVQ+/- (asymptotic ε_g as a function of p_+).
Equal variances (υ_+ = υ_- = 1.0): LVQ1 outperforms LVQ+/- with early stopping.
Unequal variances (υ_+ = 0.81, υ_- = 0.25; c = {+1, +1, −1}): LVQ+/- with early stopping outperforms LVQ1 in a certain interval of p_+.
LVQ+/-: the performance depends on the initial conditions.

