Lecture 20 Empirical Orthogonal Functions and Factor Analysis.


1 Lecture 20 Empirical Orthogonal Functions and Factor Analysis

2 Motivation: in Fourier analysis, the choice of sine and cosine "patterns" is prescribed by the method. Could we instead use the data itself as a source of information about the shape of the patterns?

3 Example: maps of some hypothetical function, say sea surface temperature, forming a sequence in time.

4 [figure: the data, a sequence of maps through time]

5 [figure: the data]

6 [figure: pattern importance vs. pattern number]

7 [figure: pattern importance vs. pattern number, with the 3 most important patterns marked] Choose just the most important patterns.

8 [figure: the 3 most important patterns]

9 Comparison: the original vs. a reconstruction using only 3 patterns. Note that this process has reduced the noise (since the noise has no pattern common to all the images).

10 [figure: amplitudes of the patterns vs. time]

11 [figure: amplitudes of the patterns vs. time] Note: there is no requirement that a pattern be periodic in time.

12 Discussion: mixing of end-members.

13 Ternary diagram: a useful tool for data that has three "components," A, B, and C.

14 [figure: ternary diagram with lines of constant A at 100%, 75%, 50%, 25%, and 0%, and similarly for B and C] Works for 3 end-members, as long as A + B + C = 100%.

15 [figure: ternary diagram with data points falling near a line] Suppose the data fall near a line on the diagram.

16 [figure: the same diagram, with two points, f1 and f2, marked on the line] The points f1 and f2 are end-members, or factors.


18 [figure: the same diagram, with the mixing line between f1 and f2 drawn and its 50% point marked]

19 [figure: ternary diagram, with the data points projected onto the mixing line between f1 and f2] Idealize the data as lying exactly on the mixing line.

20 [figure: ternary diagram with a third point, f3, off the line] You could represent the data exactly with a third "noise" factor, f3. It doesn't much matter where you put f3, as long as it's not on the line.

21 S: components (A, B, C, …) in each sample, s

S = [ (A in s1) (B in s1) (C in s1)
      (A in s2) (B in s2) (C in s2)
      (A in s3) (B in s3) (C in s3)
      ...
      (A in sN) (B in sN) (C in sN) ]

Note: a sample occupies a row of S. With N samples and M components, S is N × M.

22 F: components (A, B, C, …) in each factor, f

F = [ (A in f1) (B in f1) (C in f1)
      (A in f2) (B in f2) (C in f2)
      (A in f3) (B in f3) (C in f3) ]

With M components and M factors, F is M × M.

23 C: coefficients of the factors

C = [ (f1 in s1) (f2 in s1) (f3 in s1)
      (f1 in s2) (f2 in s2) (f3 in s2)
      (f1 in s3) (f2 in s3) (f3 in s3)
      ...
      (f1 in sN) (f2 in sN) (f3 in sN) ]

With N samples and M factors, C is N × M.

24 Putting it together:

S = C F
samples (N × M) = coefficients (N × M) × factors (M × M)

Each row of S (a sample) is a linear combination of the rows of F (the factors), with coefficients taken from the corresponding row of C.
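To make the bookkeeping concrete, here is a minimal MatLab sketch with made-up numbers: two end-member factors in (A, B, C) space and three samples mixed from them (the two-factor case from the ternary diagrams, rather than the full M-factor case):

% two made-up end-member factors; rows are factors, columns are components A, B, C
F = [0.8, 0.1, 0.1;    % f1: mostly A
     0.1, 0.1, 0.8];   % f2: mostly C
% made-up coefficients; row i gives the amounts of f1 and f2 in sample i
C = [0.75, 0.25;
     0.50, 0.50;
     0.25, 0.75];
S = C*F;   % sample matrix, 3 samples x 3 components; each row sums to 1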

25 Approximate the data using only the p most important factors:

S ≈ C' F'
selected coefficients: C' is N × p; selected factors: F' is p × M (e.g., ignore f3)

The p most important factors are those with the biggest coefficients.

26 [figure: samples s1, s2, s3 drawn as vectors in (A, B, C) space, along with a factor f] View the samples as vectors in space. Let the factors be unit vectors; then the coefficients are the projections (dot products) of the samples onto the factors.

27 This suggests a method of choosing factors so that they have large coefficients: find the factor f that maximizes E = Σ_i (s_i · f)^2 subject to the constraint f · f = 1. [figure: samples s1, s2, s3 and a candidate factor f in (A, B, C) space] Note: we square the dot product since it can be negative.

28 Find the factor f that maximizes E = Σ_i (s_i · f)^2 subject to the constraint L = f · f − 1 = 0.

E = Σ_i (s_i · f)^2 = Σ_i [Σ_j S_ij f_j][Σ_k S_ik f_k] = Σ_j Σ_k [Σ_i S_ij S_ik] f_j f_k = Σ_j Σ_k M_jk f_j f_k

with M_jk = Σ_i S_ij S_ik, i.e. M = S^T S, a symmetric matrix, and L = Σ_i f_i^2 − 1.

Use Lagrange multipliers, finding the stationary point of Φ = E − λ^2 L, where λ^2 is the Lagrange multiplier (written as a square for reasons that will become apparent later). We solved this problem two lectures ago. Its solution is the algebraic eigenvalue problem M f = λ^2 f. Recall that the eigenvalue is the corresponding value of E.

29 So the factors solve the algebraic eigenvalue problem:

[S^T S] f = λ^2 f

[S^T S] is a square matrix with the same number of rows and columns as there are components, so there are as many factors as there are components; the factors span a space of the same dimension as the components. If you sort the eigenvectors by the size of their eigenvalues, then the ones with the largest eigenvalues have the largest coefficients, so selecting the most important factors is easy.
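A minimal MatLab sketch of this recipe, assuming the sample matrix S has already been built: form [S^T S], solve the eigenvalue problem, and sort the factors by eigenvalue:

STS = S'*S;                                      % square, symmetric M x M matrix
[V, LAMBDA2] = eig(STS);                         % columns of V are the factors; LAMBDA2 holds the lambda^2
[lambda2, k] = sort(diag(LAMBDA2), 'descend');   % biggest eigenvalues first
V = V(:,k);                                      % reorder the factors to match
F = V';                                          % factors as the rows of F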

30 An important tidbit from the theory of eigenvalues and eigenvectors that we'll use later on: for [S^T S] f = λ^2 f, let Λ^2 be the diagonal matrix of eigenvalues λ_i^2, and let V be the matrix whose columns are the corresponding factors, f^(i). Then

[S^T S] = V Λ^2 V^T

31 Note also that the factors are orthogonal: f^(i) · f^(j) = 0 if i ≠ j. This is a mathematically pleasant property, but it may not always be the physically most relevant choice. [figure: two ternary diagrams comparing an orthogonal pair of factors, where f2 contains negative A, with a non-orthogonal pair, where f1 is close to the mean of the data]

32 Upshot: the eigenvectors of [S^T S] f = λ^2 f with the p largest eigenvalues identify a p-dimensional subspace in which most of the data lie. You can use those eigenvectors as factors, or you can choose any other p factors that span that subspace. In the ternary-diagram example, they must lie on the line connecting the two SVD factors.

33 Singular Value Decomposition (SVD): any N × M matrix S can be written as the product of three matrices,

S = U Λ V^T

where U is N × N and satisfies U^T U = U U^T = I, V is M × M and satisfies V^T V = V V^T = I, and Λ is an N × M diagonal matrix of singular values.

34 Now note that if S = U Λ V^T, then

S^T S = [U Λ V^T]^T [U Λ V^T] = V Λ U^T U Λ V^T = V Λ^2 V^T

Compare this with the tidbit mentioned earlier, S^T S = V Λ^2 V^T: the SVD's V is the same V we were talking about. The columns of V are the eigenvectors f, so F = V^T. So we can use the SVD to calculate the factors, F.

35 But it's even better than that! Write S = U Λ V^T as

S = [U Λ] [V^T] = C F

So the coefficients are C = U Λ and, as shown previously, the factors are F = V^T. So we can use the SVD to calculate both the coefficients, C, and the factors, F.

36 MatLab code for computing C and F:

[U, LAMBDA, V] = svd(S);
C = U*LAMBDA;
F = V';

37 MatLab code for approximating S ≈ Sp using only the p most important factors:

p = (whatever);
Up = U(:,1:p);
LAMBDAp = LAMBDA(1:p,1:p);
Cp = Up*LAMBDAp;
Vp = V(:,1:p);
Fp = (Vp)';
Sp = Cp*Fp;

38 back to my example

39 Each pixel is a component of the image, and the patterns are factors. Our derivation assumed that the data (samples, s^(i)) were vectors. However, in this example the data are images (matrices), so what I had to do was write out the pixels of each image as a vector.

40 Steps:
1) load the images
2) reorganize the images into S
3) take the SVD of S to get U, Λ, and V
4) examine Λ to identify the number of significant factors (see the sketch below)
5) build S' using only the significant factors
6) reorganize S' back into images
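A minimal sketch of step 4, assuming the SVD of S from the earlier code: svd() returns the singular values sorted in decreasing order, so plotting them shows where they level off and suggests a choice of p:

[U, LAMBDA, V] = svd(S);
lambda = diag(LAMBDA);         % singular values, in decreasing order
plot(lambda, 'o-');            % pick p where the values level off
xlabel('factor number'); ylabel('singular value');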

41 MatLab code for reorganizing a sequence of images D(p,q,r), with p = 1…Nx, q = 1…Nx, r = 1…Nt, into the sample matrix S(r,s), with r = 1…Nt, s = 1…Nx^2:

for r = [1:Nt] % time r
for p = [1:Nx] % row p
for q = [1:Nx] % col q
    s = Nx*(p-1)+q; % index s
    S(r,s) = D(p,q,r);
end
end
end

42 MatLab code for reorganizing the sample matrix S(r,s), with r = 1…Nt, s = 1…Nx^2, back into a sequence of images D(p,q,r), with p = 1…Nx, q = 1…Nx, r = 1…Nt:

for r = [1:Nt] % time r
for s = [1:Nx*Nx] % index s
    p = floor( (s-1)/Nx + 0.01 ) + 1; % row p
    q = s - Nx*(p-1); % col q
    D(p,q,r) = S(r,s);
end
end
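As an aside, both reorganizations can be written without explicit loops using reshape() and permute(); this is a sketch assuming D and S are dimensioned as above (the permute is needed because MatLab stores arrays column by column, while our index s runs along each image row first):

S = reshape(permute(D, [2 1 3]), Nx*Nx, Nt)';    % images -> sample matrix, Nt x Nx^2
D = permute(reshape(S', Nx, Nx, Nt), [2 1 3]);   % sample matrix -> images, Nx x Nx x Nt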

43 Reality of Factors: are factors intrinsically meaningful, or just a convenient way of representing data? Example: suppose the samples are rocks and the components are element concentrations; then thinking of the factors as minerals might make intuitive sense. A mineral has a fixed element composition; a rock is a mixture of minerals.

44 [figure: rocks 1 through 7 plotted in composition space, all lying among just three minerals (factors) 1, 2, and 3] Many rocks, but just a few minerals.

45 Possibly Desirable Properties of Factors
- Factors are unlike each other (different minerals typically contain different elements).
- A factor contains either large or near-zero components (a mineral typically contains only a few elements).
- Factors have only positive components (minerals are composed of positive amounts of chemical elements).
- Coefficients of the factors are positive (rocks are composed of positive amounts of minerals).
- Coefficients are typically either large or near-zero (rocks are composed of just a few major minerals).

46 Transformations of Factors. Starting from S = C F, suppose we mix the factors together to get a new set of factors:

F_new = T F_old
new factors (M × M) = transformation (M × M) × old factors (M × M)

Row i of F_new gives the components (A, B, C, …) in the new factor f'_i; entry (i,j) of T gives the amount of old factor f_j in new factor f'_i.

47 Transformations of Factors: F_new = T F_old. A requirement is that T^-1 exists, else F_new will not span the same space as F_old. Then

S = C F = C (T^-1 T) F = (C T^-1)(T F) = C_new F_new

So you could try to achieve the desirable properties by designing an appropriate transformation matrix, T. A somewhat restrictive choice is T = R, where R is a rotation matrix (rotation matrices satisfy R^-1 = R^T).
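A minimal MatLab sketch, with a made-up angle, of applying such a transformation; R rotates the first two factors in their plane, and the coefficients are transformed with the inverse so that S is unchanged:

theta = 30*pi/180;                  % made-up rotation angle
R = [ cos(theta), sin(theta), 0;    % T = R, a rotation of factors 1 and 2
     -sin(theta), cos(theta), 0;
      0,          0,          1];
Fnew = R*F;                         % new factors
Cnew = C*R';                        % new coefficients; R' = inv(R) since R is a rotation
% check: Cnew*Fnew equals C*F, so S is unchanged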

48 A method for implementing one of these properties: a factor contains either large or near-zero components (a mineral typically contains only a few elements).

49 "A factor contains either large or near-zero components" is more or less equivalent to: lots of variance in the amounts of the components contained in the factor.

50 Usual formula for the variance of data x:

σ_d^2 = N^-2 [ N Σ_i x_i^2 − (Σ_i x_i)^2 ]

Applied to a factor, f:

σ_f^2 = N^-2 [ N Σ_i f_i^4 − (Σ_i f_i^2)^2 ]

Note that we are measuring the variance of the squares of the elements of f. Thus a factor has large σ_f^2 if the absolute values of its elements have a lot of variation; the sign of the elements is irrelevant.
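A short MatLab sketch of this formula, for a factor f stored as a vector of length N:

N = length(f);
varf = (N*sum(f.^4) - sum(f.^2)^2) / N^2;   % variance of the squared elements of f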

51 Varimax Factors: a procedure for maximizing the variance of the factors while still preserving their orthogonality.

52 It is based on rotating pairs of factors in their plane. [figure: f1_old and f2_old rotated by an angle θ into f1_new and f2_new]

53 Rotating a pair of factors (here f2 and f3, out of four) in their plane by an amount θ:

[ f1                     ]   [ 1     0        0      0 ] [ f1 ]
[ cos(θ) f2 + sin(θ) f3  ] = [ 0   cos(θ)   sin(θ)   0 ] [ f2 ]
[ -sin(θ) f2 + cos(θ) f3 ]   [ 0  -sin(θ)   cos(θ)   0 ] [ f3 ]
[ f4                     ]   [ 0     0        0      1 ] [ f4 ]

The middle matrix is R. It is called a Givens rotation, by the way.
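A small MatLab sketch of building this matrix, written as a hypothetical helper function that rotates factors i and j out of M:

function R = givens_rotation(M, i, j, theta)
% GIVENS_ROTATION hypothetical helper: the identity matrix, except for
% a rotation by theta in the (i,j) plane
R = eye(M);
R(i,i) = cos(theta);  R(i,j) = sin(theta);
R(j,i) = -sin(theta); R(j,j) = cos(theta);
end

Applying it to the factor matrix, Fnew = givens_rotation(4, 2, 3, theta)*F, reproduces the product above.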

54 Varimax procedure: for a pair of factors f^s and f^t, find the θ that maximizes the sum of their variances,

N^2 (σ_f's^2 + σ_f't^2) = N Σ_i (f'_i^s)^4 − (Σ_i (f'_i^s)^2)^2 + N Σ_i (f'_i^t)^4 − (Σ_i (f'_i^t)^2)^2

where f'_i^s = cos(θ) f_i^s + sin(θ) f_i^t and f'_i^t = −sin(θ) f_i^s + cos(θ) f_i^t. Just solve dE/dθ = 0.

55 After much algebra:

θ = ¼ tan^-1 [ 2( N Σ_i u_i v_i − Σ_i u_i Σ_i v_i ) / ( N Σ_i (u_i^2 − v_i^2) − ( (Σ_i u_i)^2 − (Σ_i v_i)^2 ) ) ]

where u_i = (f_i^s)^2 − (f_i^t)^2 and v_i = 2 f_i^s f_i^t.
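A minimal MatLab sketch of this formula, for a pair of factors fs and ft stored as column vectors; atan2 is used so that the angle lands in the correct quadrant. Plugging in the (rounded) vectors from Example 2 below gives theta near 26.56°:

u = fs.^2 - ft.^2;                              % u_i
v = 2*fs.*ft;                                   % v_i
N = length(fs);
num = 2*(N*sum(u.*v) - sum(u)*sum(v));
den = N*sum(u.^2 - v.^2) - (sum(u)^2 - sum(v)^2);
theta = 0.25*atan2(num, den);                   % varimax rotation angle for this pair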

56 Then just apply this rotation to every pair of factors.* The result is a new set of factors that are mutually orthogonal but have maximal variance, hence the name Varimax.

*Actually, you need to repeat the whole procedure multiple times to get convergence, since subsequent rotations to some extent undo the work of previous rotations. A sketch of one full pass appears below.
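Putting the pieces together, a minimal MatLab sketch of one full pass, assuming the factors are stored as the rows of F (in practice the sweep is repeated until the rotation angles become negligibly small):

Mf = size(F,1);                                       % number of factors
for s = 1:Mf-1
for t = s+1:Mf
    fs = F(s,:)'; ft = F(t,:)';
    u = fs.^2 - ft.^2; v = 2*fs.*ft; N = length(fs);  % as on slide 55
    num = 2*(N*sum(u.*v) - sum(u)*sum(v));
    den = N*sum(u.^2 - v.^2) - (sum(u)^2 - sum(v)^2);
    theta = 0.25*atan2(num, den);
    F(s,:) = ( cos(theta)*fs + sin(theta)*ft)';       % rotate the pair in its plane
    F(t,:) = (-sin(theta)*fs + cos(theta)*ft)';
end
end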

57 Example 1: f^s = [½, ½, ½, ½]^T and f^t = [½, -½, ½, -½]^T. [figure: sum of variances σ_fs^2 + σ_ft^2 vs. rotation angle θ] The starting factors (θ = 0) are the worst case: zero variance. The maximum is at θ = 45°, which gives f'^s = [1/√2, 0, 1/√2, 0]^T and f'^t = [0, -1/√2, 0, -1/√2]^T.

58 Example 2: f^s = [0.63, 0.31, 0.63, 0.31]^T and f^t = [0.31, -0.63, 0.31, -0.63]^T. [figure: sum of variances σ_fs^2 + σ_ft^2 vs. rotation angle θ] The maximum is at θ = 26.56°, which gives f'^s = [0.71, 0.00, 0.71, 0.00]^T and f'^t = [0.00, -0.71, 0.00, -0.71]^T.

