Presentation on theme: "Dimension reduction (2) EDR space Sliced inverse regression Multi-dimensional LDA Partial Least Squares Network Component analysis."— Presentation transcript:

1 Dimension reduction (2) EDR space Sliced inverse regression Multi-dimensional LDA Partial Least Squares Network Component analysis

2 EDR space Now we start talking about regression. The data are {x_i, y_i}. Is dimension reduction on the X matrix alone helpful here? Possibly, if the dimension reduction preserves the essential structure of Y|X; but that is questionable in general. Effective Dimension Reduction --- reduce the dimension of X without losing information that is essential for predicting Y.

3 EDR space The model: Y = g(β_1'X, β_2'X, …, β_K'X, ε), i.e. Y is predicted through a set of K linear combinations of X via an unknown function g() and noise ε. If g() were known, this would not be very different from a generalized linear model. For dimension reduction purposes, is there a scheme that can work for almost any g(), without knowledge of its actual form?
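
For concreteness, a small simulated instance of this model is sketched below in Python (the dimensions, the two directions, and the particular g() are made-up illustrations, not taken from the slides): Y depends on X only through two linear combinations, while g() itself would be unknown to the analyst.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 10
X = rng.standard_normal((n, p))

# Two hypothetical e.d.r. directions beta_1 and beta_2
beta1 = np.r_[1.0, 1.0, np.zeros(p - 2)]
beta2 = np.r_[0.0, 0.0, 1.0, -1.0, np.zeros(p - 4)]

# Y = g(beta_1'X, beta_2'X, eps): Y depends on X only through the two projections
Y = np.sin(X @ beta1) + (X @ beta2) ** 2 + 0.1 * rng.standard_normal(n)
```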

4 EDR space The general model encompasses many models as special cases:

5 EDR space Under this general model, the space B generated by β_1, β_2, …, β_K is called the e.d.r. space. Reducing to this subspace causes no loss of information regarding the prediction of Y. Similar to factor analysis, the subspace B is identifiable, but the individual vectors β_k are not. Any non-zero vector in the e.d.r. space is called an e.d.r. direction.

6 This equation assumes almost the weakest possible form, reflecting the hope that a low-dimensional projection of a high-dimensional regressor variable contains most of the information that can be gathered from a sample of modest size. It does not impose any structure on how the projected regressor variables affect the output variable. Most regression models assume K=1, plus additional structure on g().

7 EDR space The philosophical point of Sliced Inverse Regression: estimating the projection directions can be a more important statistical issue than estimating the structure of g() itself. After finding a good e.d.r. space, we can project the data onto this smaller space. Then we are in a better position to identify what should be pursued further: model building, response surface estimation, cluster analysis, heteroscedasticity analysis, variable selection, …

8 SIR Sliced Inverse Regression. In regular regression, our interest is the conditional density h(Y|X); most important are E(Y|x) and Var(Y|x). SIR treats Y as the independent variable and X as the dependent variable: given Y=y, what values will X take? This takes us from a p-dimensional problem (subject to the curse of dimensionality) back to one-dimensional curve-fitting problems: E(x_i|y), i=1,…,p.

9 SIR

10

11 Let V denote the covariance matrix of the slice means of x, weighted by the slice sizes, and let Σ denote the sample covariance matrix of the x_i's. Find the SIR directions by conducting the generalized eigenvalue decomposition of V with respect to Σ: solve V b = λ Σ b and take the leading eigenvectors b as the estimated e.d.r. directions.
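
A minimal sketch of this estimator in Python, assuming equal-count slices over the sorted response (the slice count, the function name, and solving the generalized eigen-problem via Σ^{-1}V are illustrative choices, not taken from the slides):

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=2):
    """Sliced Inverse Regression: estimate e.d.r. directions.

    X: (n, p) predictor matrix, y: (n,) response.  Builds the covariance V of the
    slice means of x (weighted by slice sizes) and solves V b = lambda * Sigma b.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                    # center the predictors
    Sigma = Xc.T @ Xc / n                      # sample covariance of the x_i's

    order = np.argsort(y)                      # slice the data on y
    V = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Xc[idx].mean(axis=0)               # slice mean of centered x
        V += (len(idx) / n) * np.outer(m, m)   # weight by slice size

    # Generalized eigen-decomposition: V b = lambda * Sigma b
    evals, evecs = np.linalg.eig(np.linalg.solve(Sigma, V))
    top = np.argsort(evals.real)[::-1][:n_directions]
    return evecs.real[:, top]                  # columns span the estimated e.d.r. space
```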

12 SIR An example response surface found by SIR.

13 SIR and LDA Reminder: Fisher's linear discriminant analysis seeks a projection direction that maximizes class separation. When the underlying distributions are Gaussian, it agrees with the Bayes decision rule. It seeks to maximize the ratio of the between-group variance to the within-group variance of the projected data, J(w) = (w'S_B w)/(w'S_W w), where S_B is the between-group scatter matrix and S_W is the within-group scatter matrix.

14 SIR and LDA The solution is the first eigenvector in the generalized eigenvalue decomposition S_B w = λ S_W w. If we slice Y by class label, so that the slice means are the class means, LDA agrees with SIR up to a scaling.

15 Multi-class LDA Structure-preserving dimension reduction in classification. With a denoting the observations, c_i the class centers, and c the overall center: within-class scatter S_w = Σ_i Σ_{a in class i} (a − c_i)(a − c_i)'; between-class scatter S_b = Σ_i n_i (c_i − c)(c_i − c)'; mixture scatter S_m = S_w + S_b = Σ_a (a − c)(a − c)'. Kim et al. Pattern Recognition 2007, 40:2939

16 Multi-class LDA Maximize the between-class scatter of the projected data relative to its within-class scatter. The solution comes from the eigenvalues/eigenvectors of S_w^{-1} S_b. When we have N << p, S_w is singular. Let … Kim et al. Pattern Recognition 2007, 40:2939
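
A brief Python sketch of this multi-class LDA projection built from the scatter matrices above; the small ridge added to S_w is an assumption for coping with the singular N << p case, not something taken from the slides:

```python
import numpy as np

def multiclass_lda(X, labels, n_components=2, ridge=1e-6):
    """Project X onto the leading eigenvectors of (S_w + ridge*I)^{-1} S_b."""
    n, p = X.shape
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for cls in np.unique(labels):
        Xc = X[labels == cls]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)              # within-class scatter
        d = (mc - overall_mean).reshape(-1, 1)
        Sb += Xc.shape[0] * (d @ d.T)              # between-class scatter
    # Regularize S_w so the eigen-problem is well defined when N << p
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + ridge * np.eye(p), Sb))
    top = np.argsort(evals.real)[::-1][:n_components]
    return X @ evecs.real[:, top]                  # projected data
```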

17 Multi-class LDA

18 PLS Finding latent factors in X that can predict Y. X is multi-dimensional; Y can be either a random variable or a random vector. The model will look like Y = T_1 b_1 + T_2 b_2 + … + E, where each T_j is a linear combination of X. PLS is suitable for handling the p >> N situation.
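
For orientation, this latent-factor model is what off-the-shelf PLS routines fit; a minimal usage sketch with scikit-learn (assuming that library is available; the data below are synthetic) could look like:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))           # p >> N situation
Y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(50)

pls = PLSRegression(n_components=2)          # two latent factors T_1, T_2
pls.fit(X, Y)
Y_hat = pls.predict(X)                       # predictions from the latent factors
T = pls.transform(X)                         # the components T_j (linear in X)
```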

19 PLS Data: observations (x_i, y_i), i = 1, …, N, collected into X (N×p) and Y. Goal: find weight vectors a_k such that the components T_k = X a_k have high covariance with Y and can be used to predict Y.

20 PLS Solution: a_{k+1} is the (k+1)-th eigenvector of X'Y Y'X. Alternatively, the PLS components can be characterized as minimizing a least-squares criterion on the residuals; this can be solved by iterative regression.
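
A compact sketch of the iterative-regression view for a single response (a classical PLS1-style recipe with deflation; the details are one common implementation, not necessarily the exact formulation on the slide):

```python
import numpy as np

def pls1(X, y, n_components=2):
    """Fit PLS with a single response by iterative regression and deflation."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    weights, loadings, coefs = [], [], []
    for _ in range(n_components):
        a = Xk.T @ yk                    # weight vector (eigenvector of X'yy'X)
        a /= np.linalg.norm(a)
        t = Xk @ a                       # latent component T_j = X a
        p_load = Xk.T @ t / (t @ t)      # X loadings from regressing X on t
        b = (yk @ t) / (t @ t)           # regression coefficient of y on t
        Xk = Xk - np.outer(t, p_load)    # deflate X
        yk = yk - b * t                  # deflate y
        weights.append(a); loadings.append(p_load); coefs.append(b)
    return np.array(weights).T, np.array(loadings).T, np.array(coefs)
```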

21 PLS Example: PLS vs. PCA in regression, where Y is related to X_1.

22 Network component analysis Other than the dimension-reduction / hidden-factor interpretation, there is another way to understand a model like this: it can be understood as explaining the data by a bipartite network --- a control layer and an output layer. Unlike PCA and ICA, NCA does not assume a fully linked loading matrix; rather, the matrix is sparse, and the non-zero locations are pre-determined by biological knowledge about regulatory networks. For example:

23 Motivation: instead of blindly searching for a lower-dimensional space, a priori information is incorporated into the loading matrix.

24 NCA X_{N×P} = A_{N×K} P_{K×P} + E_{N×P}. Conditions for the solution to be unique: (1) A has full column rank; (2) when a column of A is removed, together with all rows corresponding to non-zero entries in that column, the remaining matrix still has full column rank; (3) P has full row rank.
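
A small Python helper that checks these three conditions by rank computations (a sketch; the function name and the zero test are illustrative choices):

```python
import numpy as np

def nca_identifiable(A, P=None):
    """Check the three NCA identifiability conditions via rank tests."""
    N, K = A.shape
    # (1) A has full column rank
    if np.linalg.matrix_rank(A) < K:
        return False
    # (2) remove column k and all rows where that column is non-zero;
    #     the remaining matrix must still have full column rank (K - 1)
    for k in range(K):
        keep = np.isclose(A[:, k], 0.0)
        sub = np.delete(A[keep], k, axis=1)
        if np.linalg.matrix_rank(sub) < K - 1:
            return False
    # (3) P has full row rank
    if P is not None and np.linalg.matrix_rank(P) < P.shape[0]:
        return False
    return True
```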

25 NCA Fig. 2. A completely identifiable network (a) and an unidentifiable network (b). Although the two initial [A] matrices describing the networks have an identical number of constraints (zero entries), the network in (b) does not satisfy the identifiability conditions because of the connectivity pattern of R3. The edges in red are the differences between the two networks.

26 NCA Notice that both A and P are to be estimated, so the identifiability criteria are in fact not testable in advance. To compute NCA, minimize the squared loss function ||X − AP||^2 over A and P, subject to A ∈ Z_0. Z_0 is the topology constraint matrix, i.e. it specifies which positions of A are allowed to be non-zero; it is based on prior knowledge and is the network connectivity matrix.

27 NCA Solving NCA: this is a linear decomposition system with the bi-convex property. It is solved by iteratively solving for A and P while fixing the other; both steps use least squares. Convergence is judged by the total least-squares error, which is non-increasing at each step. Optimality is guaranteed if the three conditions for identifiability are satisfied; otherwise a local optimum may be found.
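
A minimal alternating-least-squares sketch of this procedure in Python; Z0 is passed as a boolean connectivity mask, and M stands in for the column dimension (written P on slide 24) to avoid clashing with the matrix P. The row-wise masked solve and the random start are implementation choices, not prescribed by the slides:

```python
import numpy as np

def nca(X, Z0, n_iter=100, tol=1e-8):
    """Decompose X ~ A @ P with the zero pattern of A fixed by Z0 (N x K boolean)."""
    N, M = X.shape
    K = Z0.shape[1]
    rng = np.random.default_rng(0)
    A = Z0 * rng.standard_normal((N, K))          # random start on the allowed pattern
    prev_err = np.inf
    for _ in range(n_iter):
        # P-step: ordinary least squares for P given A
        P, *_ = np.linalg.lstsq(A, X, rcond=None)
        # A-step: solve each row of A over its allowed (non-zero) positions only
        for i in range(N):
            idx = np.flatnonzero(Z0[i])
            if idx.size:
                sol, *_ = np.linalg.lstsq(P[idx].T, X[i], rcond=None)
                A[i, idx] = sol
        err = np.sum((X - A @ P) ** 2)            # total least-squares error
        if prev_err - err < tol:                  # non-increasing; stop when it stalls
            break
        prev_err = err
    return A, P
```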

