
1 An Introduction to Independent Component Analysis (ICA) 吳育德 陽明大學放射醫學科學研究所 台北榮總整合性腦功能實驗室

2 The Principle of ICA: a cocktail-party problem. Each microphone records a linear mixture of the source signals: $x_1(t) = a_{11}s_1(t) + a_{12}s_2(t) + a_{13}s_3(t)$; $x_2(t) = a_{21}s_1(t) + a_{22}s_2(t) + a_{23}s_3(t)$; $x_3(t) = a_{31}s_1(t) + a_{32}s_2(t) + a_{33}s_3(t)$.
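A minimal numerical sketch of this mixing model (NumPy assumed; the mixing matrix A and the three source waveforms below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)

# Three hypothetical source signals s_i(t)
s = np.vstack([np.sin(2 * np.pi * 5 * t),           # s1: sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t)),  # s2: square wave
               rng.laplace(size=t.size)])           # s3: noise

# Arbitrary 3x3 mixing matrix a_ij
A = np.array([[0.8, 0.3, 0.5],
              [0.4, 0.9, 0.2],
              [0.3, 0.6, 0.7]])

x = A @ s           # x_i(t) = sum_j a_ij * s_j(t): each row is one microphone
print(x.shape)      # (3, 1000)
```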

3 Independent Component Analysis. Reference: A. Hyvärinen, J. Karhunen, and E. Oja (2001), Independent Component Analysis, John Wiley & Sons.

4 Central limit theorem. The distribution of a sum of independent random variables tends toward a Gaussian distribution. Observed signal $= m_1\,\mathrm{IC}_1 + m_2\,\mathrm{IC}_2 + \cdots + m_n\,\mathrm{IC}_n$: the mixture tends toward Gaussian, while each individual IC is non-Gaussian.

5 Central Limit Theorem. Consider the partial sum $x_k = \sum_{i=1}^{k} z_i$ of a sequence $\{z_i\}$ of independent and identically distributed random variables. Since the mean and variance of $x_k$ can grow without bound as $k \to \infty$, consider instead of $x_k$ the standardized variable $y_k = \dfrac{x_k - E\{x_k\}}{\sqrt{\operatorname{var}(x_k)}}$. The distribution of $y_k$ converges to a Gaussian distribution with zero mean and unit variance as $k \to \infty$.
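A small sketch of the theorem at work, assuming NumPy and i.i.d. uniform terms $z_i$: the sample kurtosis of the standardized partial sums drifts toward the Gaussian value 0 as k grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_kurtosis(y):
    # kurt(y) = E{y^4} - 3 * (E{y^2})^2 for zero-mean y
    y = y - y.mean()
    return np.mean(y**4) - 3 * np.mean(y**2)**2

for k in (1, 2, 10, 100):
    z = rng.uniform(-1, 1, size=(10000, k))    # k i.i.d. uniform terms per sample
    x_k = z.sum(axis=1)                        # partial sums
    y_k = (x_k - x_k.mean()) / x_k.std()       # standardized variables
    print(k, round(sample_kurtosis(y_k), 3))   # approaches 0 (Gaussian) as k grows
```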

6 How to estimate the ICA model. Principle for estimating the ICA model: maximization of nongaussianity.

7 Measures of nongaussianity. Kurtosis: $\operatorname{kurt}(x) = E\{(x-\mu)^4\} - 3\,[E\{(x-\mu)^2\}]^2$. Super-Gaussian: kurtosis > 0; Gaussian: kurtosis = 0; sub-Gaussian: kurtosis < 0. For independent $x_1$ and $x_2$: $\operatorname{kurt}(x_1 + x_2) = \operatorname{kurt}(x_1) + \operatorname{kurt}(x_2)$ and $\operatorname{kurt}(\alpha x_1) = \alpha^4 \operatorname{kurt}(x_1)$.
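A quick numerical check of these signs (NumPy assumed; the distributions and sample size are chosen only for illustration): Laplacian samples are super-Gaussian, uniform samples sub-Gaussian.

```python
import numpy as np

rng = np.random.default_rng(2)

def kurt(x):
    # E{(x - mu)^4} - 3 * [E{(x - mu)^2}]^2
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2)**2

n = 100_000
print(kurt(rng.laplace(size=n)))    # > 0: super-Gaussian (theoretical value 3)
print(kurt(rng.normal(size=n)))     # ~ 0: Gaussian
print(kurt(rng.uniform(-1, 1, n)))  # < 0: sub-Gaussian (theoretical value -1.2)
```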

8 Whitening. Assume the measurement $x = As$ is zero mean and $E\{ss^T\} = I$. Let $E$ and $D$ be the eigenvector and eigenvalue matrices of the covariance matrix of $x$, i.e. $E\{xx^T\} = EDE^T$. Then $V = D^{-1/2}E^T$ is a whitening matrix: with $z = Vx$, $E\{zz^T\} = V\,E\{xx^T\}\,V^T = D^{-1/2}E^T (EDE^T) E D^{-1/2} = I$.
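A minimal numerical check of this construction, assuming NumPy; the sources and mixing matrix are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(3, 10000))  # E{ss^T} ~ I, zero mean
A = rng.normal(size=(3, 3))
x = A @ s                                                   # zero-mean measurements

d, E = np.linalg.eigh(np.cov(x))     # eigenvalues D and eigenvectors E of E{xx^T}
V = np.diag(d ** -0.5) @ E.T         # whitening matrix V = D^(-1/2) E^T
z = V @ x
print(np.round(np.cov(z), 2))        # ~ identity: E{zz^T} = I
```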

9 Importance of whitening. For the whitened data $z$, find a vector $w$ such that the linear combination $y = w^T z$ has maximum nongaussianity under the constraint $E\{y^2\} = 1$. Then $E\{y^2\} = w^T E\{zz^T\} w = \|w\|^2$, so it suffices to maximize $|\operatorname{kurt}(w^T z)|$ under the simpler constraint $\|w\| = 1$.

10 Constrained optimization: $\max F(w)$ subject to $\|w\|^2 = 1$. Form the Lagrangian $L(w,\lambda) = F(w) + \lambda(\|w\|^2 - 1)$. Setting $\partial L/\partial w = \partial F(w)/\partial w + 2\lambda w = 0$ shows that at a stationary point the gradient of $F(w)$ must point in the direction of $w$, i.e. $\partial F(w)/\partial w = -2\lambda w$, a scalar multiple of $w$.

11 Gradient of kurtosis. $F(w) = \operatorname{kurt}(w^T z) = E\{(w^T z)^4\} - 3\,[E\{(w^T z)^2\}]^2$. Its gradient is $\dfrac{\partial |\operatorname{kurt}(w^T z)|}{\partial w} = 4\,\operatorname{sign}(\operatorname{kurt}(w^T z))\,\big[\,E\{z (w^T z)^3\} - 3 w \|w\|^2\,\big]$, where expectations are estimated by sample averages, $E\{y\} \approx \frac{1}{T}\sum_{t=1}^{T} y(t)$.

12 Fixed-point algorithm using kurtosis. Gradient step: $w_{k+1} = w_k + \mu\,\frac{\partial F(w_k)}{\partial w}$, followed by $w \leftarrow w/\|w\|$. Note that at the stationary point the gradient points in the direction of $w_k$, so adding it does not change the direction: $w_{k+1} = w_k + \mu(-2\lambda w_k) = (1 - 2\mu\lambda)\,w_k$. Therefore the fixed point satisfies $w \propto E\{z (w^T z)^3\} - 3w$, giving the iteration $w \leftarrow E\{z (w^T z)^3\} - 3w$, $w \leftarrow w/\|w\|$. Convergence: $|w_{k+1}^T w_k| = 1$, since $w_k$ and $w_{k+1}$ are unit vectors.

13 Fixed-point algorithm using kurtosis (one-by-one estimation)
1. Centering
2. Whitening
3. Choose m, the number of ICs to estimate. Set counter p ← 1.
4. Choose an initial guess of unit norm for w_p, e.g. randomly.
5. Fixed-point iteration: $w_p \leftarrow E\{z (w_p^T z)^3\} - 3 w_p$
6. Deflation decorrelation: $w_p \leftarrow w_p - \sum_{j=1}^{p-1} (w_p^T w_j)\, w_j$
7. Let $w_p \leftarrow w_p / \|w_p\|$
8. If $w_p$ has not converged ($|w_p^{\text{new}\,T} w_p^{\text{old}}| \neq 1$), go back to step 5.
9. Set p ← p+1. If p ≤ m, go back to step 4.
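A compact sketch of this one-by-one (deflation) procedure, assuming already-whitened data z of shape (n, T); the function name and defaults are illustrative, not from the slides:

```python
import numpy as np

def fastica_kurtosis(z, m, max_iter=200, tol=1e-10, seed=0):
    """Estimate m ICs from whitened data z (n x T) by the kurtosis
    fixed-point rule w <- E{z (w'z)^3} - 3w, with deflation."""
    rng = np.random.default_rng(seed)
    n, T = z.shape
    W = np.zeros((m, n))
    for p in range(m):
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)                       # step 4: random unit-norm start
        for _ in range(max_iter):
            w_old = w
            y = w @ z                                # y = w^T z
            w = (z * y**3).mean(axis=1) - 3 * w      # step 5: fixed-point update
            w -= W[:p].T @ (W[:p] @ w)               # step 6: deflation decorrelation
            w /= np.linalg.norm(w)                   # step 7: renormalize
            if abs(w @ w_old) > 1 - tol:             # step 8: |w_new^T w_old| ~ 1
                break
        W[p] = w
    return W                                         # rows are the estimated unmixing vectors
```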

14 Fixed-point algorithm using negentropy. Kurtosis is very sensitive to outliers, which may be erroneous or irrelevant observations. Example: for a zero-mean, unit-variance random variable with sample size 1000 that contains a single value equal to 10, the sample kurtosis $E\{x^4\} - 3$ is at least $10^4/1000 - 3 = 7$. We therefore need a more robust measure of nongaussianity: an approximation of negentropy.

15 Fixed-point algorithm using negentropy. Entropy: $H(y) = -\int f(y)\,\log f(y)\,dy$. Negentropy: $J(y) = H(y_{\mathrm{gauss}}) - H(y)$, where $y_{\mathrm{gauss}}$ is a Gaussian variable with the same variance as $y$. Approximation of negentropy: $J(y) \approx c\,[\,E\{G(y)\} - E\{G(\nu)\}\,]^2$, with $\nu \sim N(0,1)$ and $G$ a nonquadratic function.

16 Fixed-point algorithm using negentropy. Maximize $J(y)$: $w \leftarrow E\{z\,g(w^T z)\} - E\{g'(w^T z)\}\,w$, then $w \leftarrow w/\|w\|$. Convergence: $|w^{\text{new}\,T} w^{\text{old}}| = 1$.

17 Fixed-point algorithm using negentropy (one-by-one estimation)
1. Centering
2. Whitening
3. Choose m, the number of ICs to estimate. Set counter p ← 1.
4. Choose an initial guess of unit norm for w_p, e.g. randomly.
5. Fixed-point iteration: $w_p \leftarrow E\{z\, g(w_p^T z)\} - E\{g'(w_p^T z)\}\, w_p$
6. Deflation decorrelation: $w_p \leftarrow w_p - \sum_{j=1}^{p-1} (w_p^T w_j)\, w_j$
7. Let $w_p \leftarrow w_p / \|w_p\|$
8. If $w_p$ has not converged, go back to step 5.
9. Set p ← p+1. If p ≤ m, go back to step 4.
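A sketch of the same deflation scheme with the negentropy-based update, here with the commonly used nonlinearity g(u) = tanh(u), g'(u) = 1 − tanh²(u) (an assumed choice; the slides do not fix a particular g):

```python
import numpy as np

def fastica_negentropy(z, m, max_iter=200, tol=1e-10, seed=0):
    """Estimate m ICs from whitened data z (n x T) with the update
    w <- E{z g(w'z)} - E{g'(w'z)} w, using g = tanh."""
    rng = np.random.default_rng(seed)
    n, T = z.shape
    W = np.zeros((m, n))
    for p in range(m):
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            w_old = w
            y = w @ z
            th = np.tanh(y)                                  # g(y) = tanh(y)
            w = (z * th).mean(axis=1) - (1 - th**2).mean() * w   # step 5: Newton-like update
            w -= W[:p].T @ (W[:p] @ w)                       # step 6: deflation
            w /= np.linalg.norm(w)                           # step 7: normalize
            if abs(w @ w_old) > 1 - tol:                     # step 8: convergence
                break
        W[p] = w
    return W
```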

18 Implementations: Create two uniform sources

19 Implementations: Create two uniform sources
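A plausible NumPy stand-in for this step (two independent, zero-mean, unit-variance uniform sources; the sample size is made up):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5000
# Uniform on [-sqrt(3), sqrt(3)] gives zero mean and unit variance
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))
print(s.mean(axis=1), s.var(axis=1))   # roughly [0, 0] and [1, 1]
```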

20 Implementations: Two mixed observed signals

21 Implementations: Two mixed observed signals
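Continuing the sketch, the two observed signals are obtained by mixing the sources with an arbitrary (made-up) 2×2 matrix A:

```python
import numpy as np

rng = np.random.default_rng(3)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))  # sources from the previous step

A = np.array([[1.0, 0.6],   # arbitrary mixing matrix
              [0.4, 1.0]])
x = A @ s                   # two mixed observed signals
```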

22 Implementations: Centering

23 Implementations: Centering
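Centering just subtracts each observed signal's sample mean; a one-line sketch on stand-in data:

```python
import numpy as np

x = np.random.default_rng(3).normal(size=(2, 5000))   # stand-in for the mixed observations
x_centered = x - x.mean(axis=1, keepdims=True)        # subtract each row's mean
print(x_centered.mean(axis=1))                        # ~ [0, 0]
```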

24 Implementations: Whitening

25 Implementations: Whitening

26 Implementations: Fixed-point iteration using kurtosis

27 Implementations: Fixed-point iteration using kurtosis

28 Implementations: Fixed-point iteration using kurtosis
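A self-contained sketch of this step, assuming the made-up two-source demo above: whiten the mixtures and run the one-unit kurtosis iteration, printing |w_new^T w_old| so the convergence criterion is visible.

```python
import numpy as np

rng = np.random.default_rng(3)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))   # two uniform sources
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s                 # mixed observations
x -= x.mean(axis=1, keepdims=True)                         # centering
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ x                           # whitening

w = rng.normal(size=2)
w /= np.linalg.norm(w)
for it in range(20):
    w_old = w
    w = (z * (w @ z) ** 3).mean(axis=1) - 3 * w            # kurtosis fixed-point update
    w /= np.linalg.norm(w)
    print(it, abs(w @ w_old))                              # -> 1 at convergence
    if abs(w @ w_old) > 1 - 1e-10:
        break
```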

29 Implementations: Fixed-point iteration using negentropy

30 Implementations: Fixed-point iteration using negentropy

31 Implementations: Fixed-point iteration using negentropy

32 Implementations: Fixed-point iteration using negentropy

33 Implementations: Fixed-point iteration using negentropy
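An end-to-end sketch tying these negentropy slides together, again on the made-up two-source demo: whiten the mixtures, run the tanh-based fixed-point iteration with deflation for both components, and check that each recovered signal matches one source up to sign (g = tanh is an assumed choice).

```python
import numpy as np

rng = np.random.default_rng(3)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))   # true sources
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s                 # observed mixtures
x -= x.mean(axis=1, keepdims=True)                         # centering
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ x                           # whitened data

W = np.zeros((2, 2))
for p in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        w_old = w
        y = w @ z
        th = np.tanh(y)
        w = (z * th).mean(axis=1) - (1 - th**2).mean() * w  # negentropy fixed-point update
        w -= W[:p].T @ (W[:p] @ w)                          # deflation decorrelation
        w /= np.linalg.norm(w)
        if abs(w @ w_old) > 1 - 1e-10:
            break
    W[p] = w

y = W @ z
# Cross-correlation between recovered components and true sources:
# each row should have one entry close to +1 or -1 (sign is indeterminate).
print(np.round(np.corrcoef(np.vstack([y, s]))[:2, 2:], 2))
```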

34 Fixed-point algorithm using negentropy. Entropy: $H(y) = -\int f(y)\,\log f(y)\,dy$; a Gaussian variable has the largest entropy among all random variables of equal variance. Negentropy: $J(y) = H(y_{\mathrm{gauss}}) - H(y)$; it is zero for a Gaussian variable and strictly positive for a nonGaussian one. Negentropy is the optimal measure of nongaussianity, but it is computationally difficult since it requires an estimate of the pdf; hence we use an approximation of negentropy.

35 Fixed-point algorithm using negentropy. High-order cumulant approximation: $J(y) \approx \frac{1}{12}E\{y^3\}^2 + \frac{1}{48}\operatorname{kurt}(y)^2$. It is quite common that random variables have approximately symmetric distributions, so the first term vanishes and only the kurtosis term remains. For robustness, replace the polynomial functions by nonpolynomial functions $G_i$, e.g. $G_1$ odd and $G_2$ even.

36 Fixed-point algorithm using negentropy. Maximize $E\{G(w^T z)\}$ under the constraint $\|w\|^2 = 1$. According to the Lagrange multiplier method, at the optimum the gradient must point in the direction of $w$: $E\{z\,g(w^T z)\} - \beta w = 0$, where $g = G'$.

37 Fixed-point algorithm using negentropy. A plain gradient iteration does not have good convergence properties here, because nonpolynomial moments lack the nice algebraic properties of cumulants. Instead, find $w$ by an approximative Newton method. A true Newton method converges in few steps, but it requires a matrix inversion at every step, a large computational load. Thanks to special properties of the ICA problem, an approximative Newton method needs no matrix inversion yet converges in roughly the same number of steps as the true Newton method.

38 Fixed-point algorithm using negentropy. By the Lagrange condition the gradient must point in the direction of $w$: $F(w) = E\{z\,g(w^T z)\} - \beta w = 0$. Solve this equation by Newton's method: $JF(w)\,\Delta w = -F(w)$, i.e. $\Delta w = [JF(w)]^{-1}[-F(w)]$. Since the data are whitened, $E\{zz^T g'(w^T z)\} \approx E\{zz^T\}\,E\{g'(w^T z)\} = E\{g'(w^T z)\}\,I$, so the Jacobian $JF(w) \approx [E\{g'(w^T z)\} - \beta]\,I$ is diagonal and trivially inverted. The Newton step becomes $w \leftarrow w - [E\{z\,g(w^T z)\} - \beta w]\,/\,[E\{g'(w^T z)\} - \beta]$; multiplying through by $\beta - E\{g'(w^T z)\}$ gives the fixed-point update $w \leftarrow E\{z\,g(w^T z)\} - E\{g'(w^T z)\}\,w$, followed by $w \leftarrow w/\|w\|$.

39 Fixed-point algorithm using negentropy. Update: $w \leftarrow E\{z\,g(w^T z)\} - E\{g'(w^T z)\}\,w$, then $w \leftarrow w/\|w\|$. Convergence: $|w^{\text{new}\,T} w^{\text{old}}| = 1$; the absolute value is used because the ICs are defined only up to a multiplicative sign.

