
1 Take a Walk and Cluster Genes: A TSP-based Approach to Optimal Rearrangement Clustering
Sharlee Climer and Weixiong Zhang
This research was supported in part by NDSEG and Olin Fellowships and by NSF grants IIS-0196057 and ITR/EIA-0113618.

2 Overview
Sharlee Climer, Washington University in St. Louis
Introduction
Example
Results
Conclusion

3 Introduction
Rearrangement clustering
• Rearrange rows of a matrix
• Minimize the sum of the differences between adjacent rows
• min Σ d(i, i+1)
• Rows correspond to objects
• Columns correspond to features
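The objective on this slide can be illustrated with a minimal sketch. The slide does not fix a particular difference measure d, so the Manhattan distance used here is an assumption, and the function names and the tiny matrix are made up for illustration.

```python
from typing import Callable, Sequence

Row = Sequence[float]


def manhattan(a: Row, b: Row) -> float:
    # Assumed difference measure d; the slides do not specify one here.
    return sum(abs(x - y) for x, y in zip(a, b))


def rearrangement_cost(
    rows: Sequence[Row],
    order: Sequence[int],
    d: Callable[[Row, Row], float] = manhattan,
) -> float:
    # Sum of d(i, i+1) over adjacent rows in the given ordering.
    return sum(
        d(rows[order[i]], rows[order[i + 1]]) for i in range(len(order) - 1)
    )


# Hypothetical 4-object, 3-feature matrix (rows = objects, columns = features).
rows = [[0, 0, 1], [5, 5, 5], [0, 1, 1], [5, 4, 5]]

original = rearrangement_cost(rows, [0, 1, 2, 3])
reordered = rearrangement_cost(rows, [0, 2, 3, 1])
print(original, reordered)
```

Placing similar rows next to each other lowers the objective, which is what a rearrangement clustering algorithm searches for.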

4 Introduction
Applications
• Information retrieval
• Manufacturing
• Software engineering

5 Example

6 Example
Bond Energy Algorithm (BEA)
• Introduced in 1972 (McCormick, Schweitzer, White)
• Approximate solution
• Still widely used

7 Example

8 Example
Optimal solution
Lenstra (1974) observed equivalence to the Traveling Salesman Problem (TSP)
• Given n cities and the distance between each pair
• Find shortest cycle visiting every city
• NP-hard problem

9 Example
Transform into a TSP
• Each object corresponds to a city
• Distance between two cities equals the difference between the corresponding objects
• Dummy city added to problem; costs from the dummy city to all other cities equal a constant
• Location of dummy city indicates position to cut cycle into a path
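The transformation on this slide can be sketched as follows. This is only an illustration under stated assumptions: a brute-force search stands in for a real TSP solver such as Concorde, the dummy constant is taken to be 0, and the function names and the four-object distance matrix are hypothetical.

```python
from itertools import permutations


def add_dummy_city(dist, constant=0.0):
    # Append one dummy city whose cost to every real city is the same constant.
    n = len(dist)
    aug = [list(row) + [constant] for row in dist]
    aug.append([constant] * n + [0.0])
    return aug


def brute_force_tsp(dist):
    # Exhaustive search; only practical for a handful of cities.
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_tour, best_len = tour, length
    return list(best_tour), best_len


def optimal_ordering(dist, constant=0.0):
    # Solve the TSP with the dummy city, then cut the cycle at the dummy.
    dummy = len(dist)
    tour, length = brute_force_tsp(add_dummy_city(dist, constant))
    cut = tour.index(dummy)
    path = tour[cut + 1:] + tour[:cut]
    # The two dummy edges contribute 2 * constant to the cycle length.
    return path, length - 2 * constant


# Hypothetical objects described by one feature; distance = absolute difference.
values = [0, 10, 1, 11]
dist = [[abs(a - b) for b in values] for a in values]
path, cost = optimal_ordering(dist)
print(path, cost)
```

Because every edge to the dummy city costs the same, the dummy does not bias which pair of objects ends up at the two ends of the path.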

10 Example
TSP solvers extremely slow even for small problems in the 1970s
Massive research efforts to solve TSP over last three decades
Current solvers
• Concorde (Applegate, Bixby, Chvátal, Cook, 2001); solved a 15,112-city TSP

11 Example

12 Example
BEA and TSP offer approximate and optimal solutions, respectively
We have observed a flaw in the objective function when the objects form natural clusters
The objective minimizes the sum of the differences between every pair of adjacent rows
Inter-cluster distances tend to be significantly larger than intra-cluster distances
Summation dominated by inter-cluster distances

13 Example
TSPCluster addresses this flaw
Add k dummy cities
• k clusters are specified by the output
TSP solver ignores inter-cluster distances
• Minimizes sum of intra-cluster distances
Use sufficiently small constant for distances to/from dummy cities
• Dummy cities never adjacent to each other
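A small sketch of the k-dummy-city idea follows. It is not the authors' implementation: a brute-force search replaces the TSP solver, the dummy-to-real constant is assumed to be 0, and the dummy-to-dummy cost is assumed to be a large value so that two dummy cities are never adjacent in an optimal tour. All names and the six-object example are hypothetical.

```python
from itertools import permutations


def add_dummy_cities(dist, k, constant=0.0):
    n = len(dist)
    # Assumption: a dummy-to-dummy cost larger than any tour over real
    # cities keeps dummy cities from sitting next to each other.
    big = sum(map(sum, dist)) + 1.0
    size = n + k
    aug = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            if i < n and j < n:
                aug[i][j] = dist[i][j]
            elif i >= n and j >= n:
                aug[i][j] = big
            else:
                aug[i][j] = constant
    return aug


def brute_force_tsp(dist):
    # Exhaustive search; stands in for a real solver on tiny inputs only.
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_tour, best_len = tour, length
    return list(best_tour), best_len


def tsp_cluster(dist, k):
    n = len(dist)
    tour, _ = brute_force_tsp(add_dummy_cities(dist, k))
    # Rotate so the cycle starts at a dummy city, then split at each dummy.
    start = next(i for i, city in enumerate(tour) if city >= n)
    rotated = tour[start:] + tour[:start]
    clusters, current = [], []
    for city in rotated[1:]:
        if city >= n:
            clusters.append(current)
            current = []
        else:
            current.append(city)
    clusters.append(current)
    return clusters


# Hypothetical objects forming two natural groups on one feature.
values = [0, 1, 2, 20, 21, 22]
dist = [[abs(a - b) for b in values] for a in values]
clusters = tsp_cluster(dist, k=2)
print(clusters)
```

Each dummy city absorbs one inter-cluster jump at constant cost, so the solver is left minimizing only the intra-cluster distances.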

14 Example

15 Results
Arabidopsis
• 499 genes
• 25 conditions
Comparison with BEA
• Used BEA similarity measure
• BEA score: 447,070
• TSPCluster score: 452,109 (k = 1)

16 Results
(Figure panels: BEA, TSPCluster)

17 Results
Compared with Cluster (Eisen et al., 1998) and k-ary (Bar-Joseph et al., 2003)
Used Pearson correlation coefficient
• Cluster: 398
• k-ary: 427
• TSPCluster: 436 (k = 1)

18 Results
(Figure panels: Cluster, k-ary, TSPCluster)

19 Results
TSPCluster with k equal to 2 to 50
How many clusters?
Average inter-cluster distances
BEA local peaks:
• 6, 13, 19, 26, 29, 35, 40, 47
Pearson correlation coefficient local peaks:
• 3, 9, 12, 21, 26, 40
Computation time varied
• Less than half a minute to ~3 minutes
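The slide reports local peaks in the average inter-cluster distance as k varies. One plausible reading, offered only as a sketch, is that a value of k counts as a local peak when its average exceeds the averages at both neighbouring k values. The function name is hypothetical and the numbers below are made up for illustration; they are not the Arabidopsis results.

```python
def local_peaks(avg_by_k):
    # Return every k whose value is strictly greater than both neighbours.
    ks = sorted(avg_by_k)
    return [
        k
        for prev, k, nxt in zip(ks, ks[1:], ks[2:])
        if avg_by_k[k] > avg_by_k[prev] and avg_by_k[k] > avg_by_k[nxt]
    ]


# Made-up averages for k = 2..7, purely illustrative.
made_up_averages = {2: 1.0, 3: 1.4, 4: 1.2, 5: 1.3, 6: 1.9, 7: 1.5}
peaks = local_peaks(made_up_averages)
print(peaks)
```

The endpoints of the k range are never reported, since they have only one neighbour to compare against.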

20 Results
(Figure panels: k = 26, k = 40)

21 Conclusion
Most problems have errors in their data
Error introduced by approximation algorithms can’t be expected to “undo” this error
Computers are cheap
Computers and solvers are sophisticated
Don’t always have to resort to approximate solutions, even for NP-hard problems

22 Conclusion
Rearrangement clustering provides a linear ordering
Linear ordering inherent to many applications
• Information retrieval
• Manufacturing
• Software engineering

23 Conclusion
Gene data arranged in linear order to examine data
Linear ordering not necessarily essential to gene clustering problems
Current work
• Optimally solve subproblems in clustering algorithms

24 Questions?

