Published by Aliza Wagner. Modified over 4 years ago.

1
Statistical perturbation theory for spectral clustering
Harrachov, 2007
A. Spence and Z. Stoyanov

2
Plan of the Talk
A. Clustering (brief overview).
B. Deterministic Perturbation Theory.
C. Statistical Perturbation Theory.

3
Graph Clustering
[Figure: graph with seven vertices labelled 1 to 7]

4
[Figure: graph with seven vertices labelled 1 to 7]

5
Graph Clustering + Perturbation
[Figure: graph with seven vertices labelled 1 to 7, with a "?" marking the effect of the perturbation]

6
Gene Expression Data Clustering: An Application
There are over 10 000 genes expressed in any one tissue; DNA arrays typically produce very noisy data.
Issues:
1. Do genes in the same cluster behave similarly?
2. Do genes in different clusters behave differently?

7
Bi-partite Graphs
[Figure: bipartite graph with one vertex set labelled 1 to 4 and the other labelled 1 to 3]

8
Matrix Form

9
A Real Data Matrix (Leukemia)

10
Spectral Clustering: General Idea
Discrete Optimisation Problem (NP-hard): exact, but impractical.
Approximation by a Real Optimisation Problem (tractable): heuristic, but practical.

11
Discrete Optimisation and the SVD
Solution: Singular Value Decomposition of the scaled W.
[Diagram labels: Active, Inactive, Active]
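The slide only names the SVD of a scaled W, so the following is a minimal sketch of one common way such a step is carried out, not necessarily the authors' algorithm. It assumes W is a nonnegative rows-by-columns data matrix, that the scaling is by the inverse square roots of the row and column sums, and that two clusters are read off from the signs of the second singular vectors. The function name `bipartite_spectral_split` is hypothetical.

```python
import numpy as np

def bipartite_spectral_split(W):
    """Split the rows and columns of a nonnegative matrix W into two clusters.

    Assumption (not from the slides): W is scaled as D_r^{-1/2} W D_c^{-1/2},
    where D_r and D_c hold the row and column sums, and the partition is
    taken from the signs of the second left/right singular vectors.
    """
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)  # degrees of the row vertices
    d_cols = W.sum(axis=0)  # degrees of the column vertices
    # Requires strictly positive row and column sums.
    W_scaled = W / np.sqrt(np.outer(d_rows, d_cols))
    U, s, Vt = np.linalg.svd(W_scaled, full_matrices=False)
    # The leading singular pair is trivial (all entries of one sign);
    # the second pair carries the two-way partition.
    u2 = U[:, 1] / np.sqrt(d_rows)
    v2 = Vt[1, :] / np.sqrt(d_cols)
    return u2 >= 0, v2 >= 0
```

Rows and columns that receive the same boolean label are placed in the same cluster. For more than two clusters, a clustering step such as k-means on several singular vectors is normally used instead of a sign test.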

12
Clustering Algorithm: Summary ACTIVE INACTIVE

13
Literature

14
Types of Graph Matrices

15
How we Cluster

16
Leukemia Data

17
Clustered Leukemia Data

18
Inaccuracies in the Data (Perturbation Theory)

19
Perturbation Theory (Deterministic Noise)

20
Deterministic Perturbation (Symmetric Matrix)
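The formulas on this slide did not survive in the transcript. For orientation, the standard first-order results for a symmetric matrix can be written as below. This is a sketch under assumptions that the transcript does not state: the perturbed matrix is taken to be $A(\varepsilon) = A + \varepsilon E$ with $A$ and $E$ symmetric, $\lambda$ is a simple eigenvalue of $A$ with unit eigenvector $v$, and the symbols $A$, $E$, $\varepsilon$ are chosen here for illustration.

```latex
\begin{align}
A(\varepsilon)\, v(\varepsilon) &= \lambda(\varepsilon)\, v(\varepsilon),
  & v(\varepsilon)^T v(\varepsilon) &= 1, \\
\lambda'(0) &= v^T E\, v, \\
(A - \lambda I)\, v'(0) &= -\bigl(E - \lambda'(0)\, I\bigr)\, v,
  & v^T v'(0) &= 0, \\
\lambda(\varepsilon) &= \lambda + \varepsilon\, \lambda'(0) + O(\varepsilon^2),
  & v(\varepsilon) &= v + \varepsilon\, v'(0) + O(\varepsilon^2).
\end{align}
```

The second and third lines follow from differentiating the first line at $\varepsilon = 0$ and multiplying on the left by $v^T$; the condition $v^T v'(0) = 0$ comes from differentiating the normalisation.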

21
Linear Solve
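The content of this slide is not in the transcript. As a hedged illustration of how eigenpair derivatives can be obtained from a single linear solve, the sketch below assumes a perturbation of the form A + eps*E with A and E symmetric and a simple eigenvalue. It solves the bordered system formed from (A - lam I) dv - dlam v = -E v together with v^T dv = 0. The function name `eigenpair_derivative` and its parameters are hypothetical.

```python
import numpy as np

def eigenpair_derivative(A, E, k=-1):
    """Derivatives at eps = 0 of the k-th eigenpair of A + eps*E.

    Assumptions (not from the slides): A and E are symmetric and the
    k-th eigenvalue of A is simple. Returns (lam, v, dlam, dv).
    """
    lam_all, V = np.linalg.eigh(A)
    lam, v = lam_all[k], V[:, k]
    n = A.shape[0]
    # Bordered system:
    #   (A - lam I) dv - dlam * v = -E v
    #   v^T dv                    = 0
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = A - lam * np.eye(n)
    M[:n, n] = -v
    M[n, :n] = v
    rhs = np.concatenate([-E @ v, [0.0]])
    sol = np.linalg.solve(M, rhs)
    return lam, v, sol[n], sol[:n]
```

The bordered matrix is nonsingular when the eigenvalue is simple, even though A - lam I itself is singular, which is why the extra row and column are added.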

22
Taylor Expansions

23
Rectangular Case Symmetric

24
Random Perturbations (plan)
1. The Model
2. Issues with the Theory
3. A Possible Solution via Simulations?
4. Experiments

25
The Model
[Figure: graph with seven vertices labelled 1 to 7]

26
Difficulties with Random Matrix Theory (RMT)

27
Deterministic Perturbation Stochastic Perturbation (simple eigenvector)

28
Deterministic Perturbation Stochastic Perturbation (simple eigenvalues)

29
PP Plot - Test for Normality (Largest eigenvalue of a Symmetric Matrix)

30
Simulated Random Perturbation (Largest eigenvalue of a Symmetric Matrix)
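The plot on this slide is not in the transcript. The sketch below shows what such a simulation could look like, under a noise model that is an assumption here rather than the talk's: A is perturbed by eps*E, where E is symmetric with independent standard normal entries on and above the diagonal. Under that model the first-order term v^T E v is normal with mean zero and variance 2 - sum(v_i^4), which gives a predicted standard deviation to compare against the samples. Both function names are hypothetical.

```python
import numpy as np

def simulate_largest_eigenvalue(A, eps, n_samples, rng):
    """Sample the largest eigenvalue of A + eps*E for random symmetric E.

    Assumed noise model (not from the slides): E has independent N(0, 1)
    entries on and above the diagonal, mirrored below it.
    """
    n = A.shape[0]
    samples = np.empty(n_samples)
    for s in range(n_samples):
        G = rng.standard_normal((n, n))
        E = np.triu(G) + np.triu(G, 1).T  # symmetric noise matrix
        samples[s] = np.linalg.eigvalsh(A + eps * E)[-1]
    return samples

def first_order_std(A, eps):
    """Standard deviation of eps * v^T E v under the noise model above."""
    v = np.linalg.eigh(A)[1][:, -1]  # unit eigenvector of the largest eigenvalue
    # Var(v^T E v) = sum_i v_i^4 + 4 * sum_{i<j} v_i^2 v_j^2 = 2 - sum_i v_i^4
    return eps * np.sqrt(2.0 - np.sum(v ** 4))
```

A typical use is to draw a few thousand samples with `np.random.default_rng(seed)` and compare their mean and standard deviation with the unperturbed eigenvalue and `first_order_std(A, eps)`; a PP plot or histogram of the samples then gives a visual check of normality.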

31
Deterministic Perturbation Stochastic Perturbation (simple eigenvectors)

32
Results for Laplacian Matrices

33
Functional of the Eigenvector

34
Results for $h^T v_2$

35
PP Plot of $h^T v'(0)$ - Test for Normality ($h = e_j$)

36
Histogram of $h^T v'(0)$ - Simulations ($h = e_j$)

37
PP Plot of Simulated v [j] ( ) (Distribution close to Normal)

38
Histogram of Simulated v [j] ( ) (Distribution close to Normal)

39
Extension to the Rectangular Case

40
Probability of “Wrong Clustering”
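The slide's definition of "wrong clustering" is not in the transcript. One plausible reading, used in the sketch below as an assumption, is that a perturbed graph is wrongly clustered when the two-way partition given by the signs of the second Laplacian eigenvector differs from that of the unperturbed graph. The noise model (independent Gaussian noise of size eps on existing edge weights) and the function names are likewise hypothetical.

```python
import numpy as np

def fiedler_labels(W):
    """Two-way partition from the signs of the second Laplacian eigenvector."""
    L = np.diag(W.sum(axis=1)) - W  # unnormalised graph Laplacian
    v2 = np.linalg.eigh(L)[1][:, 1]
    labels = v2 >= 0
    # Remove the sign ambiguity of the eigenvector: vertex 0 is always True.
    return labels if labels[0] else ~labels

def wrong_clustering_probability(W, eps, n_samples, rng):
    """Monte Carlo estimate of the probability that the partition changes.

    Assumed noise model (not from the slides): each existing edge weight
    receives independent N(0, eps^2) noise. For large eps this can produce
    negative weights, which the sketch does not guard against.
    """
    base = fiedler_labels(W)
    n = W.shape[0]
    mask = np.triu(W > 0, 1)  # perturb existing edges only
    wrong = 0
    for _ in range(n_samples):
        noise = np.where(mask, eps * rng.standard_normal((n, n)), 0.0)
        W_pert = W + noise + noise.T
        wrong += not np.array_equal(fiedler_labels(W_pert), base)
    return wrong / n_samples
```

This brute-force estimate needs one eigendecomposition per sample, so it is mainly useful as a baseline against which cheaper approximations can be checked.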

41
Issues with Numerics

42
Efficient Simulations

43
Solution via Simulations?

44
Solution via Simulations? (Algorithm)

45
Comparing: Direct Calculation Vs. Repeated Linear Solve
