1
Fitting Tree Metrics: Hierarchical Clustering and Phylogeny
Nir Ailon, Moses Charikar
Princeton University
2
Data with dissimilarity information (big number = high dissimilarity)
Represented by matrix D
Complete information
[Figure: points u, v, w, x, y with pairwise dissimilarities on the edges, e.g. D(u,v)=1]
3
Goal: Fit data to tree structure
Preserve dissimilarity info: tree metric $d_T$ close to $D$
[Figure: tree T with leaves u, v, w, x, y; $d_T(u,v)$ is the distance between u and v in T]
4
Objective function
Minimize: cost(T) = $\|D - d_T\|_p$, where $D$ and $d_T$ are $\binom{n}{2}$-dimensional real vectors
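A minimal sketch of this objective, assuming D and the tree metric are given as symmetric n x n lists of lists (the names `fit_cost`, `D`, and `d_T` are illustrative, not from the talk). The norm is taken over the n-choose-2 unordered pairs only.

```python
from itertools import combinations

def fit_cost(D, d_T, p=1.0):
    """l_p norm of D - d_T over unordered pairs; p=float('inf') gives the max norm."""
    n = len(D)
    diffs = [abs(D[u][v] - d_T[u][v]) for u, v in combinations(range(n), 2)]
    if p == float("inf"):
        return max(diffs, default=0.0)
    return sum(x ** p for x in diffs) ** (1.0 / p)

# Hypothetical 3-point example: only the pair (1, 2) disagrees, by 1.
D = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
d_T = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
print(fit_cost(D, d_T, p=1), fit_cost(D, d_T, p=2), fit_cost(D, d_T, p=float("inf")))
```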
5
Applications
Evolutionary biology
– Molecular phylogeny: dissimilarity information from DNA
Gene expression analysis
Historical linguistics
...
6
Special case: Ultrametrics (Hierarchical clustering)
Equivalently: the two largest distances in every triangle are equal
[Figure: tree T with M=3 levels and leaves u, v, w, x, y; $d_T(v,x)=1$, $d_T(u,w)=3$]
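The three-point condition (in every triple of points, the two largest pairwise distances are equal) can be checked directly. The sketch below assumes a symmetric matrix given as a list of lists; `is_ultrametric` and the tolerance `tol` are illustrative names.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """True if, in every triple, the two largest pairwise distances coincide."""
    n = len(d)
    for u, v, w in combinations(range(n), 3):
        a, b, c = sorted((d[u][v], d[u][w], d[v][w]))
        if c - b > tol:  # largest strictly exceeds second largest
            return False
    return True

# Hypothetical examples: distances (1, 3, 3) pass, distances (1, 2, 3) fail.
good = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
bad = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
print(is_ultrametric(good), is_ultrametric(bad))
```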
7
Previous results
Fitting ultrametrics under $\|\cdot\|_\infty$: in P [FKW95]
Fitting trees under $\|\cdot\|_\infty$: APX-hard [ABFPT99]
Fitting ultrametrics under $\|\cdot\|_1$: APX-hard [W93]; under $\|\cdot\|_2$: NP-hard
An f(n)-approximation algorithm for ultrametrics implies a 3f(n)-approximation algorithm for trees (under any $\|\cdot\|_p$) [ABFPT99]
8
Previous results
$O(\min\{n^{1/p}, (k \log n)^{1/p}\})$-approximation for trees under $\|\cdot\|_p$ [HKM05]
Fitting ultrametrics for M=2 under $\|\cdot\|_1$: Correlation Clustering [BBC02, CGW03, ACN05, ...]
...
9
Our results
(M+1)-approximation for fitting level-M ultrametrics under $\|\cdot\|_1$
$O((\log n \log\log n)^{1/p})$-approximation for general weighted trees under $\|\cdot\|_p$
10
Reconstructing T from ultrametric D
Given ultrametric $D \in \{1..M\}^{n \times n}$
Pick pivot vertex u
Recursively solve for neighbor-classes
[Figure: pivot u and its neighbor-classes at distances 1, 2, 3; labels M=3 and M=2]
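The pivot recursion can be sketched as follows for a consistent ultrametric input. The tree representation (a leaf is a vertex id, an internal node is a `(height, children)` tuple) and the helper names are assumptions made for illustration; the pivot is simply the first vertex, since any pivot works when D is already an ultrametric.

```python
def build_tree(vertices, D):
    """Return a nested tree: a leaf is a vertex id, an internal node is (height, children)."""
    pivot, rest = vertices[0], vertices[1:]
    # Group the remaining vertices into neighbor-classes by distance to the pivot.
    classes = {}
    for v in rest:
        classes.setdefault(D[pivot][v], []).append(v)
    tree = pivot  # the path from the pivot leaf to the root is built bottom-up
    for dist in sorted(classes):
        subtree = build_tree(classes[dist], D)  # recursively solve the class
        tree = (dist, [tree, subtree])          # hang it at height `dist`
    return tree

def leaves(tree):
    if not isinstance(tree, tuple):
        return [tree]
    return [x for child in tree[1] for x in leaves(child)]

def tree_metric(tree, n):
    """Distance between two leaves = height of their lowest common ancestor."""
    d = [[0] * n for _ in range(n)]
    def visit(node):
        if not isinstance(node, tuple):
            return
        height, children = node
        groups = [leaves(c) for c in children]
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                for a in groups[i]:
                    for b in groups[j]:
                        d[a][b] = d[b][a] = height
        for c in children:
            visit(c)
    visit(tree)
    return d

# Hypothetical ultrametric on 5 points with M = 3.
D = [[0, 1, 2, 3, 3],
     [1, 0, 2, 3, 3],
     [2, 2, 0, 3, 3],
     [3, 3, 3, 0, 1],
     [3, 3, 3, 1, 0]]
T = build_tree(list(range(5)), D)
print(tree_metric(T, 5) == D)
```

The resulting tree is not canonical (a class whose internal distances equal its distance to the pivot yields two stacked nodes of the same height), but its leaf-to-leaf distances reproduce D.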
11
Minimizing $\|\cdot\|_1$ for inconsistent $D \in \{1..M\}^{n \times n}$
Same algorithm!
Pick pivot vertex u (uniformly at random)
Freeze distances incident to u
Fix inter-class distances
Fix intra-class distances
Recurse...
Lemma: no cancellations
Theorem: M+1 approximation
[Figure: pivot u with neighbor-classes at distances 1, 2, 3; changed distances are marked X (total cost contribution: 4)]
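One possible reading of these steps, as a hedged sketch rather than the authors' exact procedure: distances to the pivot are kept as given, a pair in two different classes gets the larger of the two class distances, and a pair inside one class is capped at that class's distance before recursing. The capping rule and all names (`pivot_fit`, `fit_ultrametric`, `seed`) are assumptions.

```python
import random

def pivot_fit(vertices, D, cap, rng, out):
    """Fill out[v][w] for v, w in `vertices` with an ultrametric, values at most `cap`."""
    if len(vertices) <= 1:
        return
    pivot = rng.choice(vertices)  # uniformly random pivot
    classes = {}
    for v in vertices:
        if v != pivot:
            dist = min(D[pivot][v], cap)
            out[pivot][v] = out[v][pivot] = dist  # freeze distances incident to pivot
            classes.setdefault(dist, []).append(v)
    levels = sorted(classes)
    for i, a in enumerate(levels):
        for b in levels[i + 1:]:
            for v in classes[a]:
                for w in classes[b]:
                    out[v][w] = out[w][v] = b  # inter-class: the larger class distance
        pivot_fit(classes[a], D, a, rng, out)  # intra-class: recurse, capped at a

def fit_ultrametric(D, seed=0):
    n = len(D)
    out = [[0] * n for _ in range(n)]
    M = max((max(row) for row in D), default=0)
    pivot_fit(list(range(n)), D, M, random.Random(seed), out)
    return out

# Hypothetical inconsistent input: distances (1, 1, 3) violate the ultrametric condition.
D_bad = [[0, 1, 3], [1, 0, 1], [3, 1, 0]]
print(fit_ultrametric(D_bad, seed=1))
```

On an input that is already an ultrametric, this sketch returns the input unchanged for every pivot choice, which matches the "same algorithm" remark.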
12
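Proof idea
A triangle with distances $\delta_1 > \delta_2 \ge \delta_3$ is violating
Optimal solution pays $\ge \delta_1 - \delta_2$
Algorithm charging scheme: the charge to a violating triangle (u, v, w) depends on which of its vertices is chosen as pivot
[Figure: triangle u, v, w with distances $\delta_1, \delta_2, \delta_3$; table of charges per pivot, with entries built from $\delta_1 - \delta_2$ and $\delta_2 - \delta_3$]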
13
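General ultrametrics
$D \in \mathbb{R}_+^{n \times n}$
Fit D to weighted ultrametric
M possible distances: $\delta_1 = L_1$, $\delta_2 = L_1 + L_2$, ..., $\delta_M = L_1 + \dots + L_M$
Ex: $d_T(v,w) = L_1 + L_2$
[Figure: tree T with leaves u, v, w, x, y and level weights $L_1, L_2, \dots, L_M$]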
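As a small worked example, the M possible distances are the running sums of the level weights. The weights below are made up for illustration, and `level_distances` is a hypothetical helper name.

```python
from itertools import accumulate

def level_distances(L):
    """Return [L_1, L_1 + L_2, ..., L_1 + ... + L_M] for level weights L."""
    return list(accumulate(L))

L = [1, 2, 3]                # hypothetical level weights L_1, L_2, L_3
deltas = level_distances(L)  # running sums of L
print(deltas)
```

With these weights, two leaves whose distance is $L_1 + L_2$ are at distance 3.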
14
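Fitting D to M-level weighted ultrametric under $\|\cdot\|_1$
Integer program formulation:
$x^t_{uv} \in \{0,1\}$, with $x^t_{uv} = 1 \iff u, v$ separated at level $t$
$0 \le x^M_{uv} \le x^{M-1}_{uv} \le \dots \le x^1_{uv} = 1$
Triangle inequality at each level: $x^t_{uv} \le x^t_{uw} + x^t_{wv}$
Cost: $\min \sum_{t=1}^{M} L_t \left( \sum_{D(u,v) < \delta_t} x^t_{uv} + \sum_{D(u,v) \ge \delta_t} (1 - x^t_{uv}) \right)$, where $\delta_t = L_1 + \dots + L_t$
Linear relaxation: $x^t_{uv} \in [0,1]$
[Figure: tree T with leaves u, v, w, x, y and level weights $L_1, L_2, \dots, L_M$; example: $x^M_{uy} = 0$, $x^2_{uy} = 0$, $x^1_{uy} = 1$]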
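The sketch below checks, on a made-up instance, that a level-by-level cost of this kind agrees with the $\ell_1$ distance between D and the tree metric. It rests on assumptions that are my reading of the slide: the tree distance is the sum of $L_t$ over the levels at which u and v are separated, and at level t a pair pays $L_t x^t_{uv}$ if D(u,v) is below $L_1 + \dots + L_t$ and $L_t (1 - x^t_{uv})$ otherwise. The tree is encoded as one cluster label per vertex per level (`labels`), a representation chosen here for convenience.

```python
from itertools import accumulate, combinations

def ip_cost(D, L, labels):
    """Level-by-level cost; x = 1 when u, v lie in different level-t clusters."""
    deltas = list(accumulate(L))  # deltas[t] = L[0] + ... + L[t]
    total = 0
    for t, (L_t, delta_t) in enumerate(zip(L, deltas)):
        for u, v in combinations(range(len(D)), 2):
            x = 1 if labels[t][u] != labels[t][v] else 0
            total += L_t * (x if D[u][v] < delta_t else 1 - x)
    return total

def l1_cost(D, L, labels):
    """Sum over pairs of |D(u,v) - d_T(u,v)|, with d_T = sum of L_t over separating levels."""
    total = 0
    for u, v in combinations(range(len(D)), 2):
        d_T = sum(L_t for t, L_t in enumerate(L) if labels[t][u] != labels[t][v])
        total += abs(D[u][v] - d_T)
    return total

# Hypothetical instance: 4 points, level weights (1, 2, 3), possible distances 1, 3, 6.
L = [1, 2, 3]
labels = [[0, 1, 2, 3],   # level 1: every pair separated
          [0, 0, 1, 2],   # level 2: points 0 and 1 merged
          [0, 0, 0, 1]]   # level 3: points 0, 1, 2 merged
D = [[0, 3, 3, 6],
     [3, 0, 1, 6],
     [3, 1, 0, 3],
     [6, 6, 3, 0]]
print(ip_cost(D, L, labels), l1_cost(D, L, labels))
```

The agreement relies on the clusters being nested from level to level, which is what the monotonicity constraint on $x^t_{uv}$ encodes.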
15
Rounding the LP: An O(log n loglog n)-approximation
A divisive (top-down) algorithm
At each level t = M, M-1, ..., 1:
– Solve a multi-cut-like problem
– Cluster so as to separate pairs u, v such that $x^t_{uv} \ge 2/3$
Danger: High levels influence low ones!
16
General $\|\cdot\|_p$ cost
Similar analysis gives the same bound for $\|\cdot\|_p^p$
Therefore: $O((\log n \log\log n)^{1/p})$-approximation
By [ABFPT99], applies also to fitting trees
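The step from the $p$-th power of the norm to the norm itself is just a $p$-th root. Writing $T^*$ for an optimal tree and $\alpha$ for the approximation factor (both symbols introduced here for illustration):

```latex
\[
  \|D - d_T\|_p^p \;\le\; \alpha \,\|D - d_{T^*}\|_p^p
  \quad\Longrightarrow\quad
  \|D - d_T\|_p \;\le\; \alpha^{1/p}\, \|D - d_{T^*}\|_p ,
  \qquad \alpha = O(\log n \,\log\log n).
\]
```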
17
Future work
O(log n)-approximation algorithm? Better?
Stronger lower bounds
Derandomize (M+1)-approximation algorithm
Aggregation [ACN05]
Applications
Thank You !!!