
1 PCA Networks: Unsupervised Learning Networks

2 PCA is a representation network useful for signal, image, and video processing.

3 PCA Networks. To analyze multi-dimensional input vectors, the representation that retains maximum information is principal component analysis (PCA). Per component: extract the most significant features; between components: avoid duplication or redundancy among the neurons.

4 An estimate of the autocorrelation matrix is obtained by taking the time average over the sample vectors: $R_x \approx \hat{R}_x = \frac{1}{M}\sum_t x(t)\,x^T(t)$, with eigen-decomposition $R_x = U\Lambda U^T$.
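A minimal numpy sketch of this estimate (the function name and the sample layout, one x(t) per row, are illustrative assumptions):

```python
import numpy as np

def autocorrelation_estimate(X):
    """Estimate R_x = (1/M) * sum_t x(t) x(t)^T from M sample vectors.

    X: array of shape (M, n), one sample vector x(t) per row.
    """
    M = X.shape[0]
    R_hat = X.T @ X / M                          # time-averaged outer products
    # Eigen-decomposition R_x = U Lambda U^T, eigenvalues sorted descending
    eigvals, U = np.linalg.eigh(R_hat)
    order = np.argsort(eigvals)[::-1]
    return R_hat, eigvals[order], U[:, order]
```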

5 The optimal matrix W is formed by the first m singular vectors (principal eigenvectors) of $R_x$, so that $x(t) \approx W\,a(t)$. The errors of this optimal estimate are [Jain89]: matrix 2-norm error $= \lambda_{m+1}$; least-mean-square error $= \sum_{i=m+1}^{n} \lambda_i$.
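A small numerical check of the least-mean-square formula, using synthetic data and the estimate above (sizes and scalings are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, M = 8, 3, 5000
X = rng.standard_normal((M, n)) @ np.diag(np.linspace(3.0, 0.5, n))

R_hat = X.T @ X / M
eigvals, U = np.linalg.eigh(R_hat)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

W = U[:, :m]                           # first m principal eigenvectors
X_rec = (X @ W) @ W.T                  # optimal rank-m reconstruction x ~ W a
mse = np.mean(np.sum((X - X_rec) ** 2, axis=1))
print(mse, eigvals[m:].sum())          # both equal the sum of trailing eigenvalues
```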

6 First PC. To enhance the correlation between the input x(t) and the extracted component a(t), it is natural to use a Hebbian-type rule: $a(t) = w(t)^T x(t)$, $w(t+1) = w(t) + \beta\,x(t)a(t)$.

7 Oja Learning Rule. The Oja learning rule is equivalent to a normalized Hebbian rule (show the procedure!): $\Delta w(t) = \beta\,[x(t)a(t) - w(t)\,a(t)^2]$.
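A minimal sketch of the single-unit Oja rule (function name, step size, and epoch count are illustrative assumptions):

```python
import numpy as np

def oja_first_pc(X, beta=0.001, epochs=50, seed=0):
    """Single-unit Oja rule: a = w^T x,  w <- w + beta * (x * a - w * a^2)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            a = w @ x                        # extracted component a(t)
            w += beta * (x * a - w * a * a)  # normalized Hebbian (Oja) update
    return w                                 # converges toward e_1 (up to sign)
```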

8

9 Convergence theorem: Single Component. By the Oja learning rule, w(t) converges asymptotically (with probability 1) to $w = w(\infty) = e_1$, where $e_1$ is the principal eigenvector of $R_x$.

10 Proof: Starting from $\Delta w(t) = \beta\,[x(t)a(t) - w(t)a(t)^2]$ and substituting $a(t) = w(t)^T x(t)$, $\Delta w(t) = \beta\,[x(t)x^T(t)w(t) - a(t)^2 w(t)]$. Taking the average over a block of data and re-denoting $\check{t}$ as the block time index (with $\sigma(\check{t})$ the block average of $a(t)^2$): $\Delta w(\check{t}) = \beta\,[R_x - \sigma(\check{t})I]\,w(\check{t}) = \beta\,[U\Lambda U^T - \sigma(\check{t})I]\,w(\check{t}) = \beta\,U[\Lambda - \sigma(\check{t})I]\,U^T w(\check{t})$, so $\Delta[U^T w(\check{t})] = \beta\,[\Lambda - \sigma(\check{t})I]\,U^T w(\check{t})$, i.e. $\Delta\Theta(\check{t}) = \beta\,[\Lambda - \sigma(\check{t})I]\,\Theta(\check{t})$.

11 Convergence Rates. With $\Theta(\check{t}) = [\theta_1(\check{t})\ \theta_2(\check{t})\ \ldots\ \theta_n(\check{t})]^T$, each eigen-component is enhanced/dampened by $\theta_i(\check{t}+1) = [1 + \beta'\lambda_i - \beta'\sigma(\check{t})]\,\theta_i(\check{t})$. The relative dominance of the principal component grows, with the i-th component changing relative to the first by the ratio $(1 + \beta'[\lambda_i - \sigma(\check{t})])\,/\,(1 + \beta'[\lambda_1 - \sigma(\check{t})])$.
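A toy iteration of this diagonal recursion (the eigenvalues, step size, and the choice of $\sigma$ as the output power are assumptions of the sketch), showing the non-principal components decaying relative to $\theta_1$:

```python
import numpy as np

# theta_i(t+1) = [1 + beta*(lambda_i - sigma(t))] * theta_i(t), in the eigenbasis;
# sigma(t) is taken as the output power lambda^T theta^2 (an assumption of this toy).
lambdas = np.array([3.0, 2.0, 1.0, 0.5])    # assumed eigenvalues of R_x
theta = np.full(4, 0.5)                     # initial coordinates Theta(0)
beta = 0.05
for _ in range(300):
    sigma = float(lambdas @ theta**2)       # roughly the averaged a(t)^2 = w^T R_x w
    theta = (1.0 + beta * (lambdas - sigma)) * theta
print(theta / np.linalg.norm(theta))        # approaches [1, 0, 0, 0]: first PC dominates
```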

12 Simulation: Decay Rates of PCs

13 How to Extract Multiple Principal Components

14 Let W denote an $n \times m$ weight matrix: $\Delta W(t) = \beta\,[x(t) - W(t)a(t)]\,a(t)^T$. Concern: duplication/redundancy between components.
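A minimal sketch of this multi-unit update (names and step size assumed). Note that this symmetric rule by itself only recovers the principal subspace, not the ordered individual components, which is exactly the duplication/redundancy concern that the deflation and APEX schemes below address:

```python
import numpy as np

def oja_subspace(X, m, beta=0.001, epochs=50, seed=0):
    """Multi-unit rule: a = W^T x,  Delta W = beta * (x - W a) a^T."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], m))
    for _ in range(epochs):
        for x in X:
            a = W.T @ x                          # output vector a(t)
            W += beta * np.outer(x - W @ a, a)   # subspace update
    return W                                     # columns span the principal subspace
```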

15 Deflation Method. Assume that the first component is already obtained; then the data can be "deflated" by the following transformation: $\tilde{x} = (I - w_1 w_1^T)\,x$.
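A sketch of the deflation scheme, reusing the single-unit Oja update from above; the function name and hyperparameters are illustrative:

```python
import numpy as np

def deflation_pca(X, m, beta=0.001, epochs=50, seed=0):
    """Extract m components one at a time, deflating the data after each."""
    rng = np.random.default_rng(seed)
    X_work = X.copy()
    W = []
    for _ in range(m):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(epochs):
            for x in X_work:
                a = w @ x
                w += beta * (x * a - w * a * a)    # single-unit Oja rule
        W.append(w.copy())
        X_work = X_work - np.outer(X_work @ w, w)  # x_tilde = (I - w w^T) x
    return np.array(W)                             # rows: successive principal components
```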

16 Lateral Orthogonalization Network. The basic idea is to allow the old hidden units to influence the new units so that the new ones do not duplicate information (in full or in part) already provided by the old units. In this way, the deflation process is effectively implemented in an adaptive manner.

17

18 APEX Network (multiple PCs)

19 APEX: Adaptive Principal-component Extractor. The Oja rule for the i-th component (e.g. i = 2): $\Delta w_i(t) = \beta\,[x(t)a_i(t) - w_i(t)\,a_i(t)^2]$. Dynamic Orthogonalization Rule (e.g. i = 2, j = 1): $\Delta\alpha_{ij}(t) = \beta\,[a_i(t)a_j(t) - \alpha_{ij}(t)\,a_i(t)^2]$.
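A two-unit APEX sketch combining these two rules with the laterally corrected output $a_2(t) = x^T(t)w_2(t) - \alpha(t)a_1(t)$ used on slide 21 (function name and hyperparameters assumed):

```python
import numpy as np

def apex_two_units(X, beta=0.001, epochs=50, seed=0):
    """First two components via APEX: Oja rule per unit plus a lateral weight alpha."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    w1 = rng.standard_normal(n); w1 /= np.linalg.norm(w1)
    w2 = rng.standard_normal(n); w2 /= np.linalg.norm(w2)
    alpha = 0.0
    for _ in range(epochs):
        for x in X:
            a1 = w1 @ x
            a2 = w2 @ x - alpha * a1                     # laterally corrected output
            w1 += beta * (x * a1 - w1 * a1 * a1)         # Oja rule, unit 1
            w2 += beta * (x * a2 - w2 * a2 * a2)         # Oja rule, unit 2 (i = 2)
            alpha += beta * (a1 * a2 - alpha * a2 * a2)  # orthogonalization (i = 2, j = 1)
    return w1, w2, alpha
```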

20 Convergence theorem: Multiple Components. The Hebbian weight matrix W(t) in APEX converges asymptotically to a matrix formed by the m largest principal components: with probability 1, $W(\infty) = W$, where W is the matrix formed by the m row vectors $w_i^T$, with $w_i = w_i(\infty) = e_i$.

21 For the second unit, with $a_2(t) = x^T(t)w_2(t) - \alpha(t)a_1(t)$, the update rules are $\Delta w_2(t) = \beta\,[x(t)a_2(t) - w_2(t)a_2(t)^2]$ and $\Delta\alpha(t) = \beta\,[a_1(t)a_2(t) - \alpha(t)a_2(t)^2]$. Premultiplying the first by $w_1^T$ (using $a_1(t) = w_1^T x(t)$): $\Delta[w_1^T w_2(t)] = \beta\,[a_1(t)a_2(t) - w_1^T w_2(t)\,a_2(t)^2]$. Subtracting the $\alpha$ update, $\Delta[w_1^T w_2(t) - \alpha(t)] = -\beta\,[w_1^T w_2(t) - \alpha(t)]\,a_2(t)^2$, i.e. $[w_1^T w_2(t+1) - \alpha(t+1)] = [1 - \beta\sigma(t)][w_1^T w_2(t) - \alpha(t)]$, so $w_1^T w_2(t) - \alpha(t) \to 0$ and $\alpha(t) \to w_1^T w_2(t)$. Then $a_2(t) = x^T(t)w_2(t) - \alpha(t)a_1(t) = x^T(t)[I - w_1 w_1^T]\,w_2(t)$: the deflated output.

22 Learning Rates of APEX. In block form, $[w_1^T w_2(\check{t}+1) - \alpha(\check{t}+1)] = [1 - \beta'\sigma(\check{t})][w_1^T w_2(\check{t}) - \alpha(\check{t})]$. Choosing $\beta' = 1/\sigma(\check{t})$ drives this difference to zero in one block step; equivalently $\beta = 1/[\sum_t a_2(t)^2]$, or with a forgetting factor, $\beta = 1/[\sum_t \gamma^t a_2(t)^2]$.
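A sketch of one way to realize this adaptive step size; the forgetting-factor convention (down-weighting older samples by $\gamma$) is an assumption, since the slide only lists the formula:

```python
import numpy as np

def adaptive_beta(a2_history, gamma=0.99):
    """beta = 1 / sum_t gamma^t * a2(t)^2 over the past unit-2 outputs.

    a2_history: 1-D sequence of a2(t) values, most recent last.
    gamma: assumed forgetting factor (gamma = 1 recovers beta = 1 / sum a2^2).
    """
    a2 = np.asarray(a2_history, dtype=float)
    weights = gamma ** np.arange(len(a2))[::-1]   # older samples weighted down
    denom = float(np.sum(weights * a2**2))
    return 1.0 / max(denom, 1e-12)                # guard against division by zero
```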

23 Other Extensions. PAPEX: Hierarchical Extraction. DCA: Discriminant Component Analysis. ICA: Independent Component Analysis.

