Fitting Tree Metrics: Hierarchical Clustering and Phylogeny. Nir Ailon, Moses Charikar, Princeton University.

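2 Data with dissimilarity information: points u, v, w, x, y with pairwise dissimilarities (big number = high dissimilarity), e.g. $D(u,v) = 1$. [Figure: the five points joined by edges labeled with their pairwise dissimilarities.] The data is represented by a matrix $D$; complete information (every pairwise dissimilarity is given).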

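3 Goal: fit the data to a tree structure $T$ whose leaves are u, v, w, x, y. Preserve the dissimilarity information: the tree metric $d_T$, where $d_T(u,v)$ is the length of the path between u and v in $T$, should be close to $D$.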
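To make $d_T$ concrete, here is a minimal sketch that computes all leaf-to-leaf path lengths in a weighted tree. The edge-list representation and the name `tree_metric` are hypothetical choices for illustration; the tree is assumed connected and acyclic, so a single DFS from each leaf reaches every node along its unique path.

```python
from collections import defaultdict


def tree_metric(edges, leaves):
    """Return {(a, b): d_T(a, b)} for all ordered pairs of distinct leaves.

    edges: list of (node1, node2, weight) describing an undirected tree
    (hypothetical input format).
    """
    adj = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))

    dist = {}
    for src in leaves:
        # Iterative DFS: in a tree every node is reached by exactly one path,
        # so the first distance recorded for a node is its path length from src.
        seen = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + w
                    stack.append(nxt)
        for dst in leaves:
            if dst != src:
                dist[(src, dst)] = seen[dst]
    return dist
```

For example, with an internal node `r` adjacent to leaf `u` (length 1) and to an internal node `a` (length 2), where `a` carries leaves `v` (length 1) and `w` (length 3), this gives $d_T(u,v) = 4$ and $d_T(u,w) = 6$.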

4 Objective function. Minimize $\mathrm{cost}(T) = \|D - d_T\|_p$, where $D$ and $d_T$ are viewed as $\binom{n}{2}$-dimensional real vectors.
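The objective treats each unordered pair as one coordinate. A minimal sketch, assuming both $D$ and $d_T$ are given as symmetric list-of-lists matrices indexed $0..n-1$ (the name `lp_cost` is hypothetical), with $p = \infty$ handled as the maximum deviation:

```python
from itertools import combinations


def lp_cost(D, dT, p):
    """||D - d_T||_p over the n-choose-2 unordered pairs (i < j)."""
    n = len(D)
    diffs = [abs(D[i][j] - dT[i][j]) for i, j in combinations(range(n), 2)]
    if p == float("inf"):
        return max(diffs)
    return sum(x ** p for x in diffs) ** (1.0 / p)
```

Only the upper triangle is read, so each pair contributes once rather than twice.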

5 Applications: evolutionary biology (molecular phylogeny: dissimilarity information from DNA); gene expression analysis; historical linguistics; ...

6 Special case: ultrametrics (hierarchical clustering). Example with $M = 3$ levels: a tree $T$ over the leaves u, v, w, x, y, with $d_T(v,x) = 1$ and $d_T(u,w) = 3$. Equivalently: the two largest distances in every triangle are equal.
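The "two largest distances in every triangle are equal" characterization translates directly into a check. This is a minimal sketch, assuming a symmetric matrix with zero diagonal indexed $0..n-1$; `is_ultrametric` is a hypothetical name.

```python
from itertools import combinations


def is_ultrametric(D):
    """True iff in every triangle the two largest distances are equal."""
    n = len(D)
    for a, b, c in combinations(range(n), 3):
        sides = sorted((D[a][b], D[a][c], D[b][c]))
        if sides[1] != sides[2]:
            return False
    return True
```

It runs in $O(n^3)$ time, which is fine for small examples.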

7 Previous results: fitting ultrametrics under $\|\cdot\|_\infty$ is in P [FKW95]; fitting trees under $\|\cdot\|_\infty$ is APX-hard [ABFPT99]; fitting ultrametrics under $\|\cdot\|_1$ is APX-hard [W93] and under $\|\cdot\|_2$ is NP-hard; an $f(n)$-approximation algorithm for ultrametrics yields a $3f(n)$-approximation algorithm for trees (under any $\|\cdot\|_p$) [ABFPT99].

8 Previous results (continued): an $O(\min\{n^{1/p}, (k \log n)^{1/p}\})$-approximation for trees under $\|\cdot\|_p$ [HKM05]; fitting ultrametrics with $M = 2$ under $\|\cdot\|_1$ is correlation clustering [BBC02, CGW03, ACN05, ...]. ...
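The $M = 2$ case can be spelled out in a short derivation of our own, assuming $D$ takes values in $\{1, 2\}$ and the norm sums over unordered pairs. A 2-level ultrametric $d$ has $d(u,v) \in \{1, 2\}$, and the ultrametric condition makes "$d = 1$" transitive (sides $1, 1, 2$ would leave the two largest unequal), so $d$ is exactly a clustering: $d(u,v) = 1$ iff u and v share a cluster. Each pair then costs 0 or 1:

$$\|D - d\|_1 = \sum_{\{u,v\}} |D(u,v) - d(u,v)| = \bigl|\{\{u,v\} : D(u,v) = 1,\ u, v \text{ separated}\}\bigr| + \bigl|\{\{u,v\} : D(u,v) = 2,\ u, v \text{ together}\}\bigr|,$$

which is the number of disagreements minimized in correlation clustering.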

9 Our results: an $(M+1)$-approximation for fitting $M$-level ultrametrics under $\|\cdot\|_1$; an $O((\log n \log\log n)^{1/p})$-approximation for general weighted trees under $\|\cdot\|_p$.

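10 Reconstructing $T$ from an ultrametric $D$. Given an ultrametric $D \in \{1, \dots, M\}^{n \times n}$: pick a pivot vertex u, split the remaining vertices into neighbor classes according to their distance $1, 2, \dots, M$ from u, and recursively solve for each neighbor class. [Figure: pivot u with neighbor classes 1, 2, 3 for $M = 3$, and the $M = 2$ case.]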
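A minimal sketch of the pivot recursion, assuming an exact ultrametric with integer values in $\{1, \dots, M\}$ stored as a list-of-lists matrix; the name `hierarchy` and the `max_level` parameter are hypothetical. It returns the partition at every level: in an ultrametric, u's cluster at level $t$ is the ball $\{v : D(u,v) \le t\}$, and class $t$ only merges with u at level $t$, so its internal structure is recovered by recursing on levels $1, \dots, t-1$.

```python
import random


def hierarchy(D, points, max_level, rng):
    """Return {t: set of frozensets}: the partition of `points` at levels
    t = 1 .. max_level - 1. Call with max_level = M + 1 at the top."""
    u = rng.choice(points)                      # pivot vertex
    classes = {}                                # distance from u -> neighbor class
    for v in points:
        if v != u:
            classes.setdefault(D[u][v], []).append(v)

    levels = {t: set() for t in range(1, max_level)}
    for t in range(1, max_level):
        # u's cluster at level t is the ball of radius t around u.
        ball = {u}.union(*(m for s, m in classes.items() if s <= t))
        levels[t].add(frozenset(ball))
    for t, members in classes.items():
        # Class t joins u only at level t: recurse on levels 1 .. t-1.
        for level, clusters in hierarchy(D, members, t, rng).items():
            levels[level] |= clusters
    return levels
```

Usage: `hierarchy(D, list(range(n)), M + 1, random.Random(0))`. On an exact ultrametric the result does not depend on which pivots are drawn.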

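11 Minimizing $\|\cdot\|_1$ for inconsistent $D \in \{1, \dots, M\}^{n \times n}$: the same algorithm! Pick a pivot vertex u uniformly at random and freeze the distances incident to u. Fix the inter-class distances (for v in class $i$ and w in class $j \ne i$, the triangle u, v, w forces $d(v,w) = \max(i,j)$) and fix the intra-class distances (within class $i$, distances are at most $i$). Recurse. [Figure: pivot u with neighbor classes 1, 2, 3; the fixed entries have total cost contribution 4.] Lemma: no cancellations. Theorem: $(M+1)$-approximation.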
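Below is a minimal sketch of the pivot algorithm on an inconsistent $D$, under our reading of the slide: distances to the pivot are frozen, an inter-class pair in classes $i < j$ gets $j$, and each class $i$ is solved recursively with its entries clipped to $i$. The clipping rule for intra-class pairs, the name `pivot_fit` and the `cap` parameter are assumptions for illustration, not the authors' code.

```python
import random
from itertools import combinations


def pivot_fit(D, points, cap, rng):
    """Fit D (integer values in 1..M) to an M-level ultrametric by random pivoting.

    Returns {frozenset({a, b}): fitted distance} for every pair in `points`.
    Call with cap = M at the top level.
    """
    if len(points) < 2:
        return {}

    def eff(a, b):
        # Entries of D clipped to the cap inherited from the parent call.
        return min(D[a][b], cap)

    u = rng.choice(points)                  # pivot chosen uniformly at random
    classes = {}                            # neighbor classes of u, keyed by distance
    for v in points:
        if v != u:
            classes.setdefault(eff(u, v), []).append(v)

    fitted = {}
    for v in points:                        # freeze the distances incident to u
        if v != u:
            fitted[frozenset((u, v))] = eff(u, v)
    for i, j in combinations(sorted(classes), 2):
        for v in classes[i]:                # inter-class pairs: i < j forces d = j
            for w in classes[j]:
                fitted[frozenset((v, w))] = j
    for i, members in classes.items():      # intra-class pairs: cap at i, recurse
        fitted.update(pivot_fit(D, members, i, rng))
    return fitted
```

Whatever pivots are drawn, the output is an ultrametric, and an input that is already an ultrametric is returned unchanged. The $(M+1)$ guarantee is a statement about the expected cost and is not checked here.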

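12 Proof idea. A triangle $\Delta$ is violating if its sides satisfy $\delta_1 > \delta_2 \ge \delta_3$; the optimal solution pays at least $\delta_1 - \delta_2$ on it. Algorithm charging scheme: depending on which vertex of $\Delta = (u, v, w)$ is chosen as the pivot, $\Delta$ is charged amounts such as $\delta_1 - \delta_2$ or $(\delta_2 - \delta_3) + (\delta_1 - \delta_2)$. [Figure: triangle u, v, w with sides $\delta_1, \delta_2, \delta_3$.]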
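The per-triangle bound can be computed directly. This is a minimal sketch (the name `triangle_lower_bound` is hypothetical) that lists the violating triangles and their gaps $\delta_1 - \delta_2$. Triangles share pairs, so the gaps cannot simply be summed; the largest single gap is a valid lower bound on the optimal $\|\cdot\|_1$ cost, which is our own simple observation, not the slide's charging argument.

```python
from itertools import combinations


def triangle_lower_bound(D):
    """Return (bound, violating) for a symmetric dissimilarity matrix D.

    A triangle with sorted sides d1 >= d2 >= d3 is violating if d1 > d2;
    any ultrametric fit pays at least d1 - d2 on its three pairs.
    """
    violating = []
    for a, b, c in combinations(range(len(D)), 3):
        d1, d2, _ = sorted((D[a][b], D[a][c], D[b][c]), reverse=True)
        if d1 > d2:
            violating.append(((a, b, c), d1 - d2))
    bound = max((gap for _, gap in violating), default=0)
    return bound, violating
```

An exact ultrametric has no violating triangles, so its bound is 0.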

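13 General ultrametrics: $D \in \mathbb{R}_+^{n \times n}$. Fit $D$ to a weighted ultrametric, e.g. $d_T(v,w) = L_1 + L_2$. There are $M$ possible distances: $\alpha_1 = L_1$, $\alpha_2 = L_1 + L_2$, ..., $\alpha_M = L_1 + \dots + L_M$. [Figure: tree $T$ over the leaves v, y, x, w, u, with levels of length $L_1, L_2, \dots, L_M$.]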

14 Fitting $D$ to an $M$-level weighted ultrametric under $\|\cdot\|_1$. Integer program formulation: $x^t_{uv} \in \{0,1\}$, where $x^t_{uv} = 1 \iff$ u, v are separated at level $t$; $0 \le x^M_{uv} \le x^{M-1}_{uv} \le \dots \le x^1_{uv} = 1$ (in the figure, $x^M_{uy} = 0$, $x^2_{uy} = 0$, $x^1_{uy} = 1$); $\Delta$-inequality at each level: $x^t_{uv} \le x^t_{uw} + x^t_{wv}$. Cost: $\min \sum_{t=1}^{M} L_t \bigl( \sum_{D(u,v) \le \alpha_t} x^t_{uv} + \sum_{D(u,v) > \alpha_t} (1 - x^t_{uv}) \bigr)$. Linear relaxation: $x^t_{uv} \in [0,1]$.
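To see how an ultrametric tree sits inside this integer program, here is a minimal sketch, assuming a fitted ultrametric $d$ whose values lie in $\{\alpha_1, \dots, \alpha_M\}$ and positive $L_t$. It sets $x^t_{uv} = 1$ iff $d(u,v) \ge \alpha_t$ and checks the constraints from the slide. Both function names and the dict-of-matrices layout for $x$ are hypothetical.

```python
from itertools import combinations, permutations


def level_indicators(d, L):
    """x[t][u][v] = 1 iff d(u, v) >= alpha_t, for t = 1..M (L[t-1] = L_t)."""
    M, n = len(L), len(d)
    alpha = [sum(L[:t]) for t in range(M + 1)]       # alpha[t] = L_1 + ... + L_t
    return {t: [[int(u != v and d[u][v] >= alpha[t]) for v in range(n)]
                for u in range(n)]
            for t in range(1, M + 1)}


def satisfies_ip(x, M, n):
    """Check x^1 = 1, x^M <= ... <= x^1, and the triangle inequality per level."""
    for u, v in combinations(range(n), 2):
        chain = [x[t][u][v] for t in range(1, M + 1)]   # [x^1, x^2, ..., x^M]
        if chain[0] != 1 or any(a < b for a, b in zip(chain, chain[1:])):
            return False
    for t in range(1, M + 1):
        for u, v, w in permutations(range(n), 3):
            if x[t][u][v] > x[t][u][w] + x[t][w][v]:
                return False
    return True
```

With this encoding $d(u,v) = \sum_t L_t x^t_{uv}$: a pair at distance $\alpha_j$ has $x^t_{uv} = 1$ exactly for $t \le j$.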

15 Rounding the LP: an $O(\log n \log\log n)$-approximation. A divisive (top-down) algorithm: at each level $t = M, M-1, \dots, 1$, solve a multicut-like problem, clustering so as to separate the pairs u, v with $x^t_{uv} \ge 2/3$. Danger: high levels influence low ones!
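The top-down structure can be illustrated with a deliberately simplified rounding step. In place of the multicut-like procedure (whose details are not on the slide), this sketch carves each parent cluster into balls $\{v : x^t_{cv} < 1/3\}$. Because $x^t$ obeys the triangle inequality, any two points in one ball have $x^t_{uv} < 2/3$, so every pair with $x^t_{uv} \ge 2/3$ is separated. The sketch does not attain the $O(\log n \log\log n)$ bound. The name `divisive_rounding` and the dict-of-matrices input are hypothetical.

```python
def divisive_rounding(x, points, M):
    """Top-down rounding sketch; x[t][u][v] in [0, 1] with zero diagonal.

    At each level t = M, ..., 1, every cluster from the level above is carved
    into balls {v : x[t][center][v] < 1/3}, so no cluster contains a pair with
    x[t][u][v] >= 2/3. Returns {t: list of clusters}.
    """
    current = [list(points)]
    partitions = {}
    for t in range(M, 0, -1):
        refined = []
        for cluster in current:                 # stay inside the parent cluster
            remaining = list(cluster)
            while remaining:
                center = remaining[0]           # x[t][center][center] = 0, so the ball is nonempty
                ball = [v for v in remaining if x[t][center][v] < 1 / 3]
                refined.append(ball)
                remaining = [v for v in remaining if x[t][center][v] >= 1 / 3]
        partitions[t] = refined
        current = refined
    return partitions
```

Refining only inside the parent cluster keeps the levels nested, which is also why decisions at high levels constrain the lower ones.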

16 General $\|\cdot\|_p$ cost. A similar analysis gives the same bound for $\|\cdot\|_p^p$. Therefore: an $O((\log n \log\log n)^{1/p})$-approximation. By [ABFPT99], this also applies to fitting trees.
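The step from $\|\cdot\|_p^p$ to $\|\cdot\|_p$ relies on $s \mapsto s^{1/p}$ being monotone. Both objectives have the same minimizer $d_{\mathrm{OPT}}$, so a factor-$c$ guarantee for $\|\cdot\|_p^p$ gives:

$$\|D - d_{\mathrm{ALG}}\|_p = \bigl(\|D - d_{\mathrm{ALG}}\|_p^p\bigr)^{1/p} \le \bigl(c \, \|D - d_{\mathrm{OPT}}\|_p^p\bigr)^{1/p} = c^{1/p} \, \|D - d_{\mathrm{OPT}}\|_p.$$

With $c = O(\log n \log\log n)$ this is $O((\log n \log\log n)^{1/p})$. The reduction from trees to ultrametrics multiplies the factor by 3, which the $O(\cdot)$ absorbs.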

17 Future work: an $O(\log n)$-approximation algorithm? Better? Stronger lower bounds. Derandomize the $(M+1)$-approximation algorithm. Aggregation [ACN05]. Applications. Thank you!

