
1 Blind Source Separation by Independent Components Analysis. Professor Dr. Barrie W. Jervis, School of Engineering, Sheffield Hallam University, England. B.W.Jervis@shu.ac.uk

2 The Problem Temporally independent unknown source signals are linearly mixed in an unknown system to produce a set of measured output signals. It is required to determine the source signals.

3 Methods of solving this problem are known as Blind Source Separation (BSS) techniques. In this presentation the method of Independent Components Analysis (ICA) will be described. The arrangement is illustrated in the next slide.

4 Arrangement for BSS by ICA. [Block diagram: the sources s_1, s_2, ..., s_n pass through the mixing matrix A to give the measured signals x_1, x_2, ..., x_n; the unmixing matrix W produces the activations u_1, u_2, ..., u_n, which pass through the nonlinearities g(.) to give the outputs y_1 = g_1(u_1), y_2 = g_2(u_2), ..., y_n = g_n(u_n).]

5 Neural Network Interpretation
– The s_i are the independent source signals,
– A is the linear mixing matrix,
– The x_i are the measured signals,
– W ≈ A^(-1) is the estimated unmixing matrix,
– The u_i are the estimated source signals or activations, i.e. u_i ≈ s_i,
– The g_i(u_i) are monotonic nonlinear functions (sigmoids, hyperbolic tangents),
– The y_i are the network outputs.
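The following minimal sketch illustrates this arrangement numerically; the two example sources, the random mixing matrix, and all variable names are assumptions chosen for illustration only (in practice A is unknown and W must be learned).

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 1000                                   # number of sources, number of samples

# Two temporally independent source signals s_i (one per row)
t = np.linspace(0, 1, T)
S = np.vstack([np.sign(np.sin(2 * np.pi * 3 * t)),   # square-like wave
               rng.laplace(size=T)])                 # heavy-tailed noise source

A = rng.normal(size=(n, n))                      # "unknown" linear mixing matrix
X = A @ S                                        # measured signals x_i

# ICA seeks W ~ A^-1 so that U = W X recovers the sources (up to scale, sign,
# and permutation). Here we cheat and use the exact inverse, purely to show
# the roles of the matrices in the diagram above.
W = np.linalg.inv(A)
U = W @ X
print(np.allclose(U, S))                         # True
```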

6 Principles of Neural Network Approach. Use Information Theory to derive an algorithm which minimises the mutual information between the outputs y = g(u). This minimises the mutual information between the source signal estimates, u, since g(u) introduces no dependencies. The u_i are then temporally independent and are the estimated source signals.

7 Cautions I
The magnitudes and signs of the estimated source signals are unreliable, since
– the magnitudes are not scaled,
– the signs are undefined,
because magnitude and sign information is shared between the source signal vector and the unmixing matrix, W.
The order of the outputs is permuted compared with the inputs.

8 Cautions II. Similar overlapping source signals may not be properly extracted. If the number of output channels is less than the number of source signals, those source signals of lowest variance will not be extracted. This is a problem when these signals are important.

9 Information Theory I. If X is a vector of variables (messages) x_i which occur with probabilities P(x_i), then the average information content of a stream of N messages is
H(X) = −Σ_i P(x_i) log2 P(x_i) bits,
and is known as the entropy of the random variable X.
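As a small worked example (the message probabilities below are made up), the entropy can be computed directly from the formula:

```python
import numpy as np

P = np.array([0.5, 0.25, 0.125, 0.125])   # P(x_i) for four messages; must sum to 1
H = -np.sum(P * np.log2(P))               # entropy H(X) in bits
print(H)                                  # 1.75 bits
```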

10 Information Theory II. Note that the entropy is expressible in terms of probability. Given the probability density function (pdf) of X we can find the associated entropy. This link between entropy and pdf is of the greatest importance in ICA theory.

11 Information Theory III. The joint entropy of two random variables X and Y is given by
H(X,Y) = −Σ_x Σ_y P(x,y) log2 P(x,y).
For independent variables,
H(X,Y) = H(X) + H(Y).

12 Information Theory IV. The conditional entropy of Y given X measures the average uncertainty remaining about y when x is known, and is
H(Y|X) = −Σ_x Σ_y P(x,y) log2 P(y|x).
The mutual information between Y and X is
I(Y,X) = H(Y) − H(Y|X).
In ICA, X represents the measured signals, which are applied to the nonlinear function g(u) to obtain the outputs Y.
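A short numerical check of these definitions, using a small, made-up joint distribution (all probabilities below are illustrative assumptions):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

Pxy = np.array([[0.3, 0.1],          # joint probabilities P(x, y)
                [0.2, 0.4]])
Px = Pxy.sum(axis=1)                 # marginal P(x)
Py = Pxy.sum(axis=0)                 # marginal P(y)

H_XY = entropy(Pxy)                  # joint entropy H(X,Y)
H_Y_given_X = H_XY - entropy(Px)     # H(Y|X) = H(X,Y) - H(X)
I_YX = entropy(Py) - H_Y_given_X     # I(Y,X) = H(Y) - H(Y|X)
print(H_XY, H_Y_given_X, I_YX)       # I(Y,X) > 0: X and Y are dependent
```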

13 Bell and Sejnowski's ICA Theory (1995). Aim: maximise the mutual information between the inputs X and the outputs Y of the neural network,
I(Y,X) = H(Y) − H(Y|X),
where H(Y) is the uncertainty about Y when X is unknown. Y is a function of W and g(u). Here we seek to determine the W which produces the u_i ≈ s_i, assuming the correct g(u).

14 Differentiating:
∂I(Y,X)/∂W = ∂H(Y)/∂W − ∂H(Y|X)/∂W, with ∂H(Y|X)/∂W = 0, since H(Y|X) did not come through W from X.
So, maximising this mutual information is equivalent to maximising the joint output entropy H(Y), which is seen to be equivalent to minimising the mutual information between the outputs and hence between the u_i, as desired.

15 The Functions g(u). The outputs y_i are amplitude-bounded random variables, and so the marginal entropies H(y_i) are maximum when the y_i are uniformly distributed - a known statistical result. With the H(y_i) maximised, the mutual information between the outputs is zero, and, with the y_i uniformly distributed, the nonlinearity g_i(u_i) has the form of the cumulative distribution function of the probability density function of the s_i - a proven result.

16 Pause and review g(u) and W. W has to be chosen to maximise the joint output entropy H(Y), which minimises the mutual information between the estimated source signals, u_i. The g(u) should be the cumulative distribution functions of the source signals, s_i. Determining the g(u) is a major problem.

17 One input and one output. For a monotonic nonlinear function g(x),
p(y) = p(x) / |∂y/∂x|.
Also,
H(y) = −E[ln p(y)].
Substituting:
H(y) = E[ln |∂y/∂x|] (we only need to maximise this term) − E[ln p(x)] (independent of W).

18 A stochastic gradient ascent learning rule is adopted to maximise H(y) by assuming
Δw ∝ ∂H(y)/∂w.
Further progress requires knowledge of g(u). Assume for now, after Bell and Sejnowski, that g(u) is sigmoidal, i.e.
y = 1 / (1 + e^(−u)).
Also assume
u = wx + w0.

19 Learning Rule: 1 input, 1 output. Hence, we find:
Δw ∝ 1/w + x(1 − 2y),
Δw0 ∝ 1 − 2y.
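A minimal sketch of this single-unit rule applied to a stream of samples; the heavy-tailed input data, the learning rate, and the initial weights are assumptions chosen only to make the loop runnable:

```python
import numpy as np

rng = np.random.default_rng(1)
x_stream = rng.laplace(size=5000)        # hypothetical 1-D input samples
w, w0, lr = 0.1, 0.0, 0.01               # initial weight, bias, learning rate (assumed)

for x in x_stream:
    u = w * x + w0
    y = 1.0 / (1.0 + np.exp(-u))         # sigmoidal g(u)
    w += lr * (1.0 / w + x * (1.0 - 2.0 * y))   # delta-w  proportional to 1/w + x(1 - 2y)
    w0 += lr * (1.0 - 2.0 * y)                  # delta-w0 proportional to 1 - 2y

print(w, w0)                             # learned scaling and bias of the single unit
```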

20 Learning Rule: N inputs, N outputs. Need
ΔW ∝ ∂H(y)/∂W.
Assuming g(u) is sigmoidal again, we obtain:
ΔW ∝ [W^T]^(−1) + (1 − 2y)x^T.

21 The network is trained until the changes in the weights become acceptably small at each iteration. Thus the unmixing matrix W is found.

22 The Natural Gradient. The computation of the inverse matrix [W^T]^(−1) is time-consuming, and may be avoided by rescaling the entropy gradient by multiplying it by W^T W. Thus, for a sigmoidal g(u) we obtain
ΔW ∝ [I + (1 − 2y)u^T] W.
This is the natural gradient, introduced by Amari (1998), and now widely adopted.
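A hedged sketch of a batch version of this natural-gradient update (the function name, learning rate, iteration count, and initialisation are assumptions; the data are expected to be pre-processed/sphered as described later):

```python
import numpy as np

def infomax_natural_gradient(X, n_iter=200, lr=0.01, seed=0):
    """X: (n_channels, n_samples) measured (preferably sphered) signals."""
    rng = np.random.default_rng(seed)
    n, T = X.shape
    W = np.eye(n) + 0.01 * rng.normal(size=(n, n))   # initial unmixing estimate
    I = np.eye(n)
    for _ in range(n_iter):
        U = W @ X                                    # current activations
        Y = 1.0 / (1.0 + np.exp(-U))                 # sigmoidal outputs
        dW = (I + (1.0 - 2.0 * Y) @ U.T / T) @ W     # [I + (1 - 2y) u^T] W, averaged over samples
        W += lr * dW
    return W
```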

23 The nonlinearity, g(u). We have already learnt that the g(u) should be the cumulative distribution functions of the individual source distributions. So far the g(u) have been assumed to be sigmoidal, so what are the pdfs of the s_i? The corresponding pdfs of the s_i are super-Gaussian.

24 Super- and sub-Gaussian pdfs. [Figure: sketches comparing a Gaussian pdf with super-Gaussian and sub-Gaussian pdfs.] * Note: there are no mathematical definitions of super- and sub-Gaussians.

25 Super- and sub-Gaussians
– Super-Gaussians: kurtosis (fourth-order central moment, measures the flatness of the pdf) > 0; infrequent signals of short duration, e.g. evoked brain signals.
– Sub-Gaussians: kurtosis < 0; signals mainly "on", e.g. 50/60 Hz electrical mains supply, but also eye blinks.

26 Kurtosis. Kurtosis is based on the 4th-order central moment; for zero-mean u_i,
kurt(u_i) = E[u_i^4] / (E[u_i^2])^2 − 3,
and is seen to be calculated from the current estimates of the source signals. To separate the independent sources, information about their pdfs such as skewness (3rd moment) and flatness (kurtosis) is required; the 1st and 2nd moments (mean and variance) are insufficient.
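A short sketch of estimating this quantity from the current activations (the normalised, excess form is assumed here; the sample data and names are illustrative):

```python
import numpy as np

def kurtosis(u):
    u = u - u.mean()                                      # remove the mean first
    return np.mean(u ** 4) / np.mean(u ** 2) ** 2 - 3.0   # > 0 super-, < 0 sub-Gaussian

rng = np.random.default_rng(2)
print(kurtosis(rng.laplace(size=100_000)))                # ~ +3.0 (super-Gaussian)
print(kurtosis(rng.uniform(-1, 1, size=100_000)))         # ~ -1.2 (sub-Gaussian)
print(kurtosis(rng.normal(size=100_000)))                 # ~  0.0 (Gaussian)
```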

27 A more generalised learning rule. Girolami (1997) showed that tanh(u_i) and −tanh(u_i) could be used for super- and sub-Gaussians respectively. Cardoso and Laheld (1996) developed a stability analysis to determine whether the source signals were to be considered super- or sub-Gaussian. Lee, Girolami, and Sejnowski (1998) applied these findings to develop their extended infomax algorithm for super- and sub-Gaussians using a kurtosis-based switching rule.

28 Extended Infomax Learning Rule. With super-Gaussians modelled as a Gaussian density multiplied by sech^2(u),
p(u) ∝ N(0, 1) sech^2(u),
and sub-Gaussians as a Pearson mixture model,
p(u) = (1/2)[N(μ, σ^2) + N(−μ, σ^2)],
the new extended learning rule is
ΔW ∝ [I − K tanh(u)u^T − uu^T] W,
with k_i = +1 for super-Gaussian and k_i = −1 for sub-Gaussian sources.

29 Switching Decision.
k_i = sign( E[sech^2(u_i)] E[u_i^2] − E[u_i tanh(u_i)] ),
and the k_i are the elements of the N-dimensional diagonal matrix K. Modifications of the formula for k_i exist, but in our experience the extended algorithm has been unsatisfactory.
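A hedged sketch of one extended-infomax step combining the learning rule and this switching decision (the function name, step size, and batch averaging are assumptions):

```python
import numpy as np

def extended_infomax_step(W, X, lr=0.01):
    """One batch update; X: (n, T) measured signals, W: (n, n) current unmixing estimate."""
    n, T = X.shape
    U = W @ X                                          # current activations
    tU = np.tanh(U)
    sech2 = 1.0 - tU ** 2                              # sech^2(u) = 1 - tanh^2(u)
    k = np.sign(sech2.mean(axis=1) * (U ** 2).mean(axis=1)
                - (U * tU).mean(axis=1))               # +1: super-Gaussian, -1: sub-Gaussian
    K = np.diag(k)
    dW = (np.eye(n) - K @ tU @ U.T / T - U @ U.T / T) @ W   # [I - K tanh(u) u^T - u u^T] W
    return W + lr * dW
```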

30 Reasons for the unsatisfactory extended algorithm
1) Initial assumptions about the super- and sub-Gaussian distributions may be too inaccurate.
2) The switching criterion may be inadequate.
Alternatives:
– Postulate vague distributions for the source signals which are then developed iteratively during training.
– Use an alternative approach, e.g. statistically based methods such as JADE (Cardoso).

31 Summary so far We have seen how W may be obtained by training the network, and the extended algorithm for switching between super- and sub-Gaussians has been described. Alternative approaches have been mentioned. Next we consider how to obtain the source signals knowing W and the measured signals, x.

32 Source signal determination. The system is: [block diagram: the unknown sources s_i enter the mixing matrix A to give the measured x_i; the unmixing matrix W gives the estimated u_i ≈ s_i, which pass through g(u) to give the y_i.] Hence U = W·x and x = A·S, where A ≈ W^(−1) and U ≈ S. The rows of U are the estimated source signals, known as activations (as functions of time). The rows of x are the time-varying measured signals.

33 Source Signals. [Figure: the estimated source signals plotted by channel number against time, or sample number.]

34 Expressions for the Activations. We see that consecutive values of u are obtained by filtering consecutive columns of x by the same row of W:
u_i(t) = Σ_j W_ij x_j(t).
The ith row of u is the ith row of W multiplied by the successive columns of x.
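A brief sketch of this computation in matrix form (the placeholder data and the matrix shapes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 4, 1000
X = rng.normal(size=(n, T))     # measured signals, one row per sensor (placeholder data)
W = rng.normal(size=(n, n))     # trained unmixing matrix (placeholder values)

U = W @ X                       # rows of U are the activations u_i(t)
u_0 = W[0, :] @ X               # first activation: row 0 of W times every column of X
print(np.allclose(u_0, U[0]))   # True
```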

35 Procedure
– Record N time points from each of M sensors, where N ≥ 5M.
– Pre-process the data, e.g. filtering, trend removal.
– Sphere the data using Principal Components Analysis (PCA). This is not essential, but speeds up the computation by first removing the first- and second-order moments (see the sketch below).
– Compute the u_i ≈ s_i, including desphering.
– Analyse the results.
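A hedged sketch of the sphering step under the usual whitening interpretation (the function name and implementation details are assumptions, not prescribed by the slides):

```python
import numpy as np

def sphere(X):
    """X: (M, N) measured signals, M sensors, N >= 5M time points."""
    Xc = X - X.mean(axis=1, keepdims=True)                    # remove first-order moment (mean)
    C = Xc @ Xc.T / Xc.shape[1]                               # sensor covariance matrix
    eigval, eigvec = np.linalg.eigh(C)
    S = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T    # sphering (whitening) matrix
    return S @ Xc, S

# After training on the sphered data, the overall unmixing matrix is W_total = W_trained @ S,
# and desphering means folding S back in when interpreting W or back-projecting activations.
```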

36 Optional Procedures I The contribution of each activation at a sensor may be found by “back-projecting” it to the sensor.

37 Optional Procedures II. A measured signal which is contaminated by artefacts or noise may be extracted by "back-projecting" all the signal activations to the measurement electrode, setting the other activations to zero. (An artefact and noise removal method.)
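A minimal sketch of this back-projection idea (the function name, indices, and choice of which activations to keep are illustrative assumptions):

```python
import numpy as np

def backproject(W, X, keep):
    """Reconstruct the measured signals using only the activations listed in `keep`."""
    U = W @ X                           # all activations
    U_clean = np.zeros_like(U)
    U_clean[keep, :] = U[keep, :]       # keep the wanted activations, zero the rest
    return np.linalg.inv(W) @ U_clean   # back-project to the sensors: x_clean = W^-1 u_clean

# Example: drop activation 2 (say, an eye-blink component) from a 4-channel recording:
# X_clean = backproject(W, X, keep=[0, 1, 3])
```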

38 Current Developments
– Overcomplete representations - more signal sources than sensors.
– Nonlinear mixing.
– Nonstationary sources.
– General formulation of g(u).

39 Conclusions It has been shown how to extract temporally independent unknown source signals from their linear mixtures at the outputs of an unknown system using Independent Components Analysis. Some of the limitations of the method have been mentioned. Current developments have been highlighted.

