Statistical perturbation theory for spectral clustering. Harrachov, 2007. A. Spence and Z. Stoyanov.

Presentation transcript:

Statistical perturbation theory for spectral clustering Harrachov, 2007 A. Spence and Z. Stoyanov

Plan of the Talk
A. Clustering (Brief overview).
B. Deterministic Perturbation Theory.
C. Statistical Perturbation Theory.

Graph Clustering

Graph Clustering + Perturbation ?

Gene Expression Data Clustering: An Application
There are over [...] genes expressed in any one tissue; DNA arrays typically produce very noisy data.
Issues:
1. Genes in same cluster behave similarly?
2. Genes in different clusters behave differently?

Bi-partite Graphs

Matrix Form

A Real Data Matrix (Leukemia)

Spectral Clustering: General Idea
Discrete Optimisation Problem (NP-Hard): Exact - Impractical
Approximation → Real Optimisation Problem (Tractable): Heuristic - Practical

Discrete Optimisation → SVD
Solution: Singular Value Decomposition of W_scaled (figure labels: Active, Inactive)
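
The slide gives only the headline (SVD of a scaled W), so the following is a minimal sketch under stated assumptions rather than the authors' algorithm. It assumes the common degree scaling W_scaled = D_r^{-1/2} W D_c^{-1/2} for a bipartite graph and splits rows and columns by the sign of the second singular vectors. The function name `bipartition_by_svd` and the toy matrix are hypothetical.

```python
import numpy as np

def bipartition_by_svd(W):
    """Split rows and columns of a nonnegative matrix W into two groups.

    Assumption: W_scaled = D_r^{-1/2} W D_c^{-1/2}, with D_r and D_c the
    diagonal matrices of row and column sums (all sums must be positive).
    """
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)
    d_cols = W.sum(axis=0)
    W_scaled = W / np.sqrt(np.outer(d_rows, d_cols))
    U, s, Vt = np.linalg.svd(W_scaled, full_matrices=False)
    # The leading singular pair is trivial (proportional to sqrt of degrees);
    # the second pair carries the two-way split.
    row_score = U[:, 1] / np.sqrt(d_rows)
    col_score = Vt[1, :] / np.sqrt(d_cols)
    return row_score >= 0, col_score >= 0

# Toy 4 x 3 data matrix: rows 0-1 go with columns 0-1, rows 2-3 with column 2.
W = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 0.1],
              [0.1, 0.0, 5.0],
              [0.0, 0.2, 4.0]])
row_labels, col_labels = bipartition_by_svd(W)
```

Rows and columns that receive the same label are placed in the same co-cluster; which group is labelled True depends on the arbitrary sign of the singular vectors.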

Clustering Algorithm: Summary (figure labels: ACTIVE, INACTIVE)

Literature

Types of Graph Matrices

How we Cluster

Leukemia Data

Clustered Leukemia Data

Inaccuracies in the Data (Perturbation Theory)

Perturbation Theory (Deterministic Noise)

Deterministic Perturbation (Symmetric Matrix)

Linear Solve

Taylor Expansions
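
The slide bodies for the deterministic theory did not survive, so this is a hedged sketch of the standard first-order result for a simple eigenpair of a symmetric matrix A(eps) = A + eps E. With the normalisation v^T v'(0) = 0, the derivatives solve one bordered linear system, and the Taylor expansions are lambda(eps) ≈ lambda + eps lambda'(0) and v(eps) ≈ v + eps v'(0). The helper name `eig_derivatives` and the test matrices are hypothetical.

```python
import numpy as np

def eig_derivatives(A, E, k=-1):
    """First-order derivatives of the k-th eigenpair of A + eps*E at eps = 0.

    Solves the bordered system
        [A - lam*I   -v] [v'  ]   [-E v]
        [v^T          0] [lam'] = [  0 ]
    which is nonsingular when lam is a simple eigenvalue.
    """
    lam, V = np.linalg.eigh(A)
    lam_k, v = lam[k], V[:, k]
    n = A.shape[0]
    M = np.block([[A - lam_k * np.eye(n), -v[:, None]],
                  [v[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([-E @ v, [0.0]])
    sol = np.linalg.solve(M, rhs)
    return lam_k, v, sol[n], sol[:n]

rng = np.random.default_rng(0)
n = 6
S = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1)) + 0.05 * (S + S.T)  # well-separated spectrum
G = rng.standard_normal((n, n))
E = (G + G.T) / 2.0                                    # symmetric perturbation

lam, v, dlam, dv = eig_derivatives(A, E)

# Compare the first-order Taylor expansion with a direct recomputation.
eps = 1e-4
lam_all, V_all = np.linalg.eigh(A + eps * E)
lam_eps, v_eps = lam_all[-1], V_all[:, -1]
if v_eps @ v < 0:          # eigenvectors are defined only up to sign
    v_eps = -v_eps
err_lam = abs(lam_eps - (lam + eps * dlam))
err_vec = np.linalg.norm(v_eps - (v + eps * dv))
```

Both errors should be of order eps squared, and `dlam` agrees with the familiar formula v^T E v.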

Rectangular Case → Symmetric
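
One standard way to reduce a rectangular matrix to the symmetric case is the augmented matrix B = [[0, W], [W^T, 0]], whose nonzero eigenvalues are plus and minus the singular values of W. Whether the talk used exactly this construction is an assumption; the sketch below only checks the identity numerically on a random matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))
m, n = W.shape

# Symmetric (m+n) x (m+n) augmented matrix built from the rectangular W.
B = np.block([[np.zeros((m, m)), W],
              [W.T, np.zeros((n, n))]])

eigvals = np.linalg.eigvalsh(B)               # ascending order
U, sigma, Vt = np.linalg.svd(W, full_matrices=False)  # sigma is descending

top_eigs = eigvals[::-1][:n]                  # the n largest eigenvalues of B
bottom_eigs = eigvals[:n]                     # the n smallest eigenvalues of B

# An eigenvector of B for sigma[0] is the stacked pair (u_1, v_1) / sqrt(2).
z = np.concatenate([U[:, 0], Vt[0, :]]) / np.sqrt(2.0)
residual = np.linalg.norm(B @ z - sigma[0] * z)
```

Here `top_eigs` matches `sigma`, `bottom_eigs` matches `-sigma`, and the remaining m - n eigenvalues are zero, so perturbation results for simple eigenvalues of symmetric matrices carry over to simple singular values.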

Random Perturbations (plan)
The Model
Issues with the Theory
A Possible Solution via Simulations?
Experiments

The Model

Difficulties with Random Matrix Theory (RMT)

Deterministic Perturbation → Stochastic Perturbation (simple eigenvector)

Deterministic Perturbation → Stochastic Perturbation (simple eigenvalues)

PP Plot - Test for Normality (Largest eigenvalue of a Symmetric Matrix)

Simulated Random Perturbation (Largest eigenvalue of a Symmetric Matrix)
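
The noise model on the earlier slide is not recoverable, so the following simulation sketch assumes a symmetric Gaussian perturbation E with independent N(0, 1) entries on and above the diagonal. Under that assumption the first-order shift v^T E v of the largest eigenvalue is a linear combination of Gaussians, hence normal with mean zero and variance sum_i v_i^4 + 4 sum_{i<j} v_i^2 v_j^2. All variable names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_samples, eps = 6, 2000, 1e-3

# Base symmetric matrix with a well-separated largest eigenvalue.
S = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1)) + 0.05 * (S + S.T)
lam, V = np.linalg.eigh(A)
lam_max, v = lam[-1], V[:, -1]

# Assumed noise model: symmetric E, N(0, 1) entries on and above the diagonal.
G = rng.standard_normal((n_samples, n, n))
E = np.triu(G) + np.transpose(np.triu(G, 1), (0, 2, 1))

# Largest eigenvalue of every perturbed matrix, computed in one batched call.
lam_eps = np.linalg.eigvalsh(A + eps * E)[:, -1]
scaled_shift = (lam_eps - lam_max) / eps

# First-order prediction: standard deviation of v^T E v under the noise model.
iu = np.triu_indices(n, k=1)
predicted_std = np.sqrt(np.sum(v**4) + 4.0 * np.sum((v[iu[0]] * v[iu[1]])**2))
simulated_std = scaled_shift.std(ddof=1)
```

A histogram or PP plot of `scaled_shift` against a normal distribution with standard deviation `predicted_std` is the kind of check the neighbouring slides describe.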

Deterministic Perturbation → Stochastic Perturbation (simple eigenvectors)

Results for Laplacian Matrices

Functional of the Eigenvector

Results for h^T v_2

PP Plot of h^T v'(0) - Test for Normality (h = e_j)

Histogram of h^T v'(0) - Simulations (h = e_j)

PP Plot of Simulated v [j] (  ) (Distribution close to Normal)

Histogram of Simulated v [j] (  ) (Distribution close to Normal)

Extension to the Rectangular Case

Probability of “Wrong Clustering”
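
The slide's definition of "wrong clustering" is not preserved, so this Monte Carlo sketch adopts one plausible reading: a trial counts as wrong when the sign pattern of the second Laplacian eigenvector of the perturbed graph differs from the unperturbed one. The Gaussian noise, the clipping of negative weights to zero, the toy graph and all function names are assumptions made for illustration.

```python
import numpy as np

def fiedler_partition(W):
    """Two-way split from the sign of the second eigenvector of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, V = np.linalg.eigh(L)
    part = V[:, 1] >= 0
    # Remove the arbitrary global sign: node 0 is always labelled True.
    return part if part[0] else ~part

def wrong_clustering_probability(W, eps, n_trials=500, seed=0):
    """Fraction of noisy trials whose partition differs from the clean one."""
    rng = np.random.default_rng(seed)
    base = fiedler_partition(W)
    n = W.shape[0]
    wrong = 0
    for _ in range(n_trials):
        G = rng.standard_normal((n, n))
        E = np.triu(G, 1)
        E = E + E.T                                # symmetric, zero diagonal
        W_eps = np.clip(W + eps * E, 0.0, None)    # keep weights nonnegative
        wrong += int(not np.array_equal(fiedler_partition(W_eps), base))
    return wrong / n_trials

# Toy graph: two triangles (nodes 0-2 and 3-5), a weak bridge between them,
# and node 6 attached to both sides with slightly different weights.
W = np.zeros((7, 7))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1),
                (2, 3, 0.2), (6, 0, 0.6), (6, 5, 0.4)]:
    W[i, j] = W[j, i] = w

p_small = wrong_clustering_probability(W, eps=1e-6)
p_large = wrong_clustering_probability(W, eps=0.1)
```

Node 6 sits close to the cluster boundary, so it is the node most likely to change sides as the noise level grows, while negligible noise leaves the partition unchanged.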

Issues with Numerics

Efficient Simulations

Solution via Simulations?

Solution via Simulations? (Algorithm)

Comparing: Direct Calculation Vs. Repeated Linear Solve