Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College Joint work with: Mike Charleston (Univ. of Sydney) Chris Conow (USC) Ben Cousins.

1 Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College Joint work with: Mike Charleston (Univ. of Sydney) Chris Conow (USC) Ben Cousins (Clemson) Daniel Fielder (HMC) John Peebles (HMC) Tselil Schramm (HMC) Anak Yodpinyanee (HMC)

2 Integrated CS/Bio Course

3 Overview: A 75-minute research lecture to first-year students in our CS/Bio intro course. Show first-year students that what they've learned is relevant to current research. Showcase research done with senior students. What have they done so far? – Biology: Genes, alignment, phylogenetic trees, RNA folding – CS: Programming, recursion, memoization

4 Specifically… Pairwise global alignment and RNA folding – Why you should care – Designed and implemented recursive solutions – Why are they slow? – How do we make them faster? – Memoization idea – Wow, that's fast! (but no actual analysis yet) – Designed and implemented memoized versions – Used their implementations to investigate questions. Around 10 lines of Python code!
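The step from a recursive solution to a memoized one can be sketched roughly as follows. This is a minimal illustration, not the course's actual code: the scoring values MATCH, MISMATCH and GAP and both function names are assumptions made for the example.

```python
from functools import lru_cache

# Hypothetical scoring scheme, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2

def align_score(s, t):
    """Best global alignment score of s and t, by plain recursion."""
    if not s:
        return GAP * len(t)
    if not t:
        return GAP * len(s)
    diag = align_score(s[1:], t[1:]) + (MATCH if s[0] == t[0] else MISMATCH)
    return max(diag,
               align_score(s[1:], t) + GAP,   # gap in t
               align_score(s, t[1:]) + GAP)   # gap in s

def align_score_memo(s, t):
    """Same recurrence, but each pair of suffixes (i, j) is solved once."""
    @lru_cache(maxsize=None)
    def best(i, j):
        if i == len(s):
            return GAP * (len(t) - j)
        if j == len(t):
            return GAP * (len(s) - i)
        diag = best(i + 1, j + 1) + (MATCH if s[i] == t[j] else MISMATCH)
        return max(diag, best(i + 1, j) + GAP, best(i, j + 1) + GAP)
    return best(0, 0)
```

Both functions return the same score; the memoized one indexes into the strings so that repeated subproblems hit the cache instead of being recomputed.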

5 Specifically… Phylogenetic trees – Why you should care – Implemented simple algorithm (e.g. UPGMA) – Used their implementation to answer questions… – Existence and relative merits of other algorithms (mention maximum likelihood… but it's slow!)
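For readers who have not seen UPGMA, a compact sketch is given here. It assumes distances are supplied as a dictionary keyed by frozenset pairs of leaf names (an input format chosen for this example), returns only the tree shape as nested tuples, and omits branch lengths. It is not the students' implementation.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a tree topology by UPGMA.

    names: list of leaf labels.
    dist:  dict mapping frozenset({x, y}) to the distance between leaves x and y.
    Returns nested 2-tuples of leaf labels.
    """
    # Each cluster is (tree, leaves): the tree built so far and its leaf labels.
    clusters = [(name, [name]) for name in names]

    def avg_dist(c1, c2):
        # UPGMA distance: mean over all leaf pairs drawn from the two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merged = ((a[0], b[0]), a[1] + b[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

Recomputing the average from leaf distances at every step is slower than the usual update rule but keeps the sketch short and easy to check by hand.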

6 A 75-minute lecture in 30 minutes (or less)

7 Cophylogenetics "I can understand how a flower and a bee might slowly become, either simultaneously or one after the other, modified and adapted in the most perfect manner to each other, by the continued preservation of individuals presenting mutual and slightly favourable deviations of structure." Charles Darwin, The Origin of Species. Actual 75-minute lecture starts here! (Also a chapter in new B4B)

8 Obligate Mutualism of Figs and Fig Wasps (figure label: ovipositor) From Cophylogeny of the Ficus Microcosm, A. Jackson, 2004

9 The Cophylogeny Problem From Hafner MS and Nadler SA, Phylogenetic trees support the coevolution of parasites and their hosts. Nature 1988, 332:

10 Indigobirds and Finches High level of host specificity (e.g. mouth markings)

11 The Question… Given a host tree, parasite tree, and tip mapping, what is the most plausible mapping between the trees and is it suggestive of coevolution? This seems to be a hard problem!

12 Measuring the Hardness of Computational Problems There are three kinds of problems… 1. Easy 2. Hard 3. Impossible!

13 Easy Problems Sorting a list of n numbers: [42, 3, 17, 26, …, 100] Multiplying two n x n matrices (figure: an n x n matrix times an n x n matrix gives an n x n matrix)

14 Global Alignment is easy! Reminder of 2^n running time of alignment. Informally motivate n^2 running time of memoized version.
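One informal way to see the gap between the two running times is to count work directly. The sketch below assumes the usual three-way alignment recursion (match/mismatch, gap in one string, gap in the other) with a base case when either string is empty; the function names are made up for this example. The call count is itself computed with a cache so that it finishes quickly.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def naive_calls(m, n):
    """Calls made by the un-memoized recursion on strings of lengths m and n."""
    if m == 0 or n == 0:
        return 1
    return (1 + naive_calls(m - 1, n - 1)
              + naive_calls(m - 1, n)
              + naive_calls(m, n - 1))

def memo_subproblems(m, n):
    """Distinct (i, j) subproblems the memoized version can ever solve."""
    return (m + 1) * (n + 1)

for n in (5, 10, 20):
    print(n, naive_calls(n, n), memo_subproblems(n, n))
```

Under these assumptions the naive count grows at least as fast as 2^n, while the number of distinct subproblems grows like n^2.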

15 Hard Problems: Snowplows of Northern Minnesota (map: Burrsburg, Frostbite City, Shiversville, Tundratown, Freezeapolis)

16 Snowplows of Northern Minnesota (map: Burrsburg, Frostbite City, Shiversville, Tundratown, Freezeapolis) Brute-force? Greed?
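The transcript does not spell out the snowplow problem, so the sketch below makes a loud assumption: that the plow must visit every town once and return home by the shortest round trip (a travelling-salesperson-style reading). All road lengths are invented for illustration. It contrasts brute force, which tries every ordering, with a greedy rule that always drives to the nearest unvisited town.

```python
from itertools import permutations

TOWNS = ["Burrsburg", "Frostbite City", "Shiversville", "Tundratown", "Freezeapolis"]

# Invented symmetric road lengths (not from the slides).
_ROADS = [
    ("Burrsburg", "Frostbite City", 1), ("Burrsburg", "Shiversville", 2),
    ("Burrsburg", "Tundratown", 3), ("Burrsburg", "Freezeapolis", 10),
    ("Frostbite City", "Shiversville", 1), ("Frostbite City", "Tundratown", 2),
    ("Frostbite City", "Freezeapolis", 3), ("Shiversville", "Tundratown", 1),
    ("Shiversville", "Freezeapolis", 2), ("Tundratown", "Freezeapolis", 1),
]
DIST = {frozenset((a, b)): d for a, b, d in _ROADS}

def tour_length(tour):
    """Length of the round trip that visits the towns in order and returns."""
    return sum(DIST[frozenset((tour[i], tour[(i + 1) % len(tour)]))]
               for i in range(len(tour)))

def brute_force(towns):
    """Try every ordering that starts at towns[0]; (n - 1)! candidates."""
    start, rest = towns[0], towns[1:]
    return min(((start,) + p for p in permutations(rest)), key=tour_length)

def greedy(towns):
    """Always drive to the nearest town not yet visited."""
    tour, left = [towns[0]], list(towns[1:])
    while left:
        nxt = min(left, key=lambda t: DIST[frozenset((tour[-1], t))])
        tour.append(nxt)
        left.remove(nxt)
    return tuple(tour)

print(tour_length(brute_force(TOWNS)), tour_length(greedy(TOWNS)))
```

On this made-up map the greedy tour is forced to take the long road home at the end, while brute force finds a shorter round trip at the price of examining every ordering.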

17-21 n^2 versus 2^n. The Ran-O-Matic performs 10^9 operations/sec.

Ran-O-Matic   n = 10          n = 30          n = 50          n = 70
n^2           100, < 1 sec    900, < 1 sec    2500, < 1 sec   4900, < 1 sec
2^n           1024, < 1 sec   about 1 sec     about 13 days   about 37 thousand years

Computers double in speed every 2 years. Let's just wait 10 years! 37 thousand years -> about 1,200 years!
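Figures of this kind are easy to recompute. The sketch below takes the 10^9 operations/sec rate from the slide; the helper names, the 365.25-day year, and the reading of "double in speed every 2 years" as a factor of 2^(years/2) are assumptions of this example.

```python
OPS_PER_SEC = 10**9  # speed quoted for the Ran-O-Matic

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: 365.25-day year

def run_seconds(operations, ops_per_sec=OPS_PER_SEC):
    """Seconds needed to perform the given number of operations."""
    return operations / ops_per_sec

def describe(seconds):
    """Render a duration in the largest convenient unit."""
    if seconds < 1:
        return "< 1 sec"
    if seconds < SECONDS_PER_DAY:
        return f"{seconds:.1f} sec"
    if seconds < SECONDS_PER_YEAR:
        return f"{seconds / SECONDS_PER_DAY:.1f} days"
    return f"{seconds / SECONDS_PER_YEAR:,.0f} years"

def speedup_after(years, doubling_period=2):
    """Speed factor gained if machines double in speed every doubling_period years."""
    return 2 ** (years / doubling_period)

for n in (10, 30, 50, 70):
    print(f"n = {n}: n^2 takes {describe(run_seconds(n**2))}, "
          f"2^n takes {describe(run_seconds(2**n))}")

faster = OPS_PER_SEC * speedup_after(10)
print("2^70 after waiting 10 years:", describe(run_seconds(2**70, faster)))
```

Changing OPS_PER_SEC or the waiting period shows how little a constant-factor speedup helps against exponential growth.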

22 Snowplows and Travelling Salesperson Revisited! NP-complete problems: Travelling Salesperson Problem, Snowplow Problem, Protein Folding, Phylogenetic trees by maximum likelihood, Multiple sequence alignment. Tens of thousands of other known problems go in this cloud!!

23 "I can't find an efficient algorithm. I guess I'm too dumb." Cartoon courtesy of Computers and Intractability: A Guide to the Theory of NP-Completeness by M. Garey and D. Johnson

24 "I can't find an efficient algorithm because no such algorithm is possible!"

25 "I can't find an efficient algorithm, but neither can all these famous people." Cartoon courtesy of Computers and Intractability: A Guide to the Theory of NP-Completeness by M. Garey and D. Johnson

26 $1 million Vinay Deolalikar

27 Coping with NP-completeness… Brute force; Ad hoc; Heuristics; Metaheuristics; Approximation algorithms

28 Obligate Mutualism of Figs and Fig Wasps (figure label: ovipositor) From Cophylogeny of the Ficus Microcosm, A. Jackson, 2004

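29 The Cophylogeny Problem… (figure: host tree labeled a, b, c; parasite tree labeled d, e)

30 The Cophylogeny Problem (figure: host tree labeled a, b, c; parasite tree labeled d, e; tip associations between them)

31 Possible Solutions (figure: two reconstructions, each drawn on host labels a, b, c and parasite labels d, e; label: Input)

32 Event Cost Model: cospeciation (figure: the event marked on both reconstructions)

33 Event Cost Model: duplication (figure: the event marked on both reconstructions)

34 Event Cost Model: host-switch (figure: the event marked on both reconstructions)

35 Event Cost Model: loss (figure: the event marked on both reconstructions)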

36 Event Cost Model (figure: one reconstruction marked cospeciation, loss, duplication; the other marked host-switch, loss, cospeciation) Cost = duplication + cospeciation + 3 * loss. Cost = cospeciation + host-switch + loss.

37 Some typical costs (same figure) Cost = 8. Cost = 5.
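The totals on these two slides follow from summing per-event costs. The slides do not state the individual event costs, so the values below are hypothetical: they are simply one assignment that reproduces totals of 8 and 5 for the two event lists, taken in the order the formulas appear.

```python
# Hypothetical event costs (not given in the slides); one of several
# assignments that yields the totals 8 and 5.
COSTS = {"cospeciation": 0, "duplication": 2, "host-switch": 3, "loss": 2}

def reconstruction_cost(events, costs=COSTS):
    """Total cost of a reconstruction given a count for each event type."""
    return sum(costs[event] * count for event, count in events.items())

# Event counts read off the two cost formulas.
first = {"duplication": 1, "cospeciation": 1, "loss": 3}
second = {"cospeciation": 1, "host-switch": 1, "loss": 1}

print(reconstruction_cost(first), reconstruction_cost(second))
```

With a different cost table the ranking of the two reconstructions can change, which is why the choice of event costs matters.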

38 This problem is hard! How hard? NP-complete! (Joint work with Charleston, Ovadia, Conow, Fielder) The host-switches are the culprits (figure labels: e, f, g, h)

39 Existing Methods

                  TreeMap        Tarzan/CoRe-PA
Technique         Brute force    Ignore timing incompatibilities
Solution          Optimal        Can be BETTER than optimal!
Running Time      Exponential    Polynomial, very fast
Tree Builder      No             Yes
Solution Viewer   Yes

40 A Metaheuristic Approach

41 Dynamic Programming (figure: host tree with nodes r, s, t, u, v, w, x, y drawn against times t = 0 to t = 4; parasite tree with nodes a, b, c) Compute Cost[a,su,2]

42 Dynamic Programming (same figure) Compute Cost[a,su,2]. Cost[b,tw,3]. Cost[c,y,4].

43 Dynamic Programming (same figure, with loss, host-switch, loss marked) Compute Cost[a,su,2]. Cost[b,tw,3]. Cost[c,y,4].

44 Dynamic Programming (same figure, with loss, host-switch, loss marked) Cost[b,tw,3]. Cost[c,y,4]. Candidate for Cost[a,su,2]: Cost[b, tw, 3] + Cost[c, uy, 4] + 2 * loss + host-switch
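The candidate on this slide combines two already-computed table entries with the events needed to connect them. A small sketch of that combination step follows. The table contents, the event costs, and the second placement are all invented for illustration, and the real algorithm's enumeration of child placements is not shown.

```python
# Hypothetical event costs and table entries (not from the slides).
LOSS, HOST_SWITCH = 2, 3
cost = {
    ("b", "tw", 3): 1,
    ("c", "uy", 4): 0,
    ("c", "tw", 3): 4,
}

def candidate(cost, first_child, second_child, losses, switches):
    """Cost of placing the two children as given, plus connecting events."""
    return (cost[first_child] + cost[second_child]
            + losses * LOSS + switches * HOST_SWITCH)

def best_cost(cost, placements):
    """A table cell is the minimum over all candidate child placements."""
    return min(candidate(cost, *p) for p in placements)

# The candidate from the slide: two losses and one host-switch.
slide_candidate = (("b", "tw", 3), ("c", "uy", 4), 2, 1)
# A second, made-up placement to show the minimum being taken.
other_candidate = (("b", "tw", 3), ("c", "tw", 3), 0, 0)

print(best_cost(cost, [slide_candidate, other_candidate]))
```

Filling a cell such as Cost[a,su,2] then amounts to calling best_cost over every pair of positions the two children may occupy.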

45-46 Dynamic Programming Running Time: O(n^3) cells to fill in; O(n^2) positions for first child; O(n^2) positions for second child; O(n) to count #losses from each child, but this is precomputable. O(n^3 x (n^2 x n^2)) = O(n^7) total. Can be improved to O(n^3).

47 Genetic Algorithm

48 Existing Software

                  TreeMap        Tarzan/CoRe-PA                        Jane 2
Technique         Brute force    DP, ignore timing incompatibilities   Genetic algorithm, DP
Solution          Optimal        Can be BETTER than optimal!           Sometimes suboptimal
Running Time      Exponential    Polynomial, very fast                 Polynomial, a lot faster! Can control running time
Tree Builder      No             Yes                                   No, but Jane 2 can read CoRe-PA's trees
Solution Viewer   Yes                                                  Also interactive

49 The Fig/Wasp Challenge

50 Results

51 The Fig/Wasp Dataset… Original Problem Instance -> Solve for optimal cost. Randomly Generated Problem Instances -> Solve for optimal cost.
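The comparison sketched on this slide resembles a randomization test: solve the real instance, solve many random ones, and ask how often a random instance does at least as well. The code below is a sketch under stated assumptions. The slide does not say how random instances are generated, so shuffling the tip mapping is a guess, and `solve` is a hypothetical stand-in for whatever routine returns the optimal cost.

```python
import random

def shuffled_tip_mapping(mapping, rng):
    """Randomly reassign the same hosts among the same parasites (assumed scheme)."""
    parasites = list(mapping)
    hosts = list(mapping.values())
    rng.shuffle(hosts)
    return dict(zip(parasites, hosts))

def randomization_test(mapping, solve, trials=100, seed=0):
    """Return (original cost, fraction of random instances costing no more).

    solve: hypothetical callable mapping a tip mapping to its optimal cost.
    """
    rng = random.Random(seed)
    original = solve(mapping)
    random_costs = [solve(shuffled_tip_mapping(mapping, rng))
                    for _ in range(trials)]
    at_least_as_good = sum(c <= original for c in random_costs)
    return original, at_least_as_good / trials
```

A small fraction suggests the real host and parasite trees fit together better than chance would explain; a large one suggests they do not.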

52

53 Paper recently completed… 30 Coauthors 18 Institutes 10 Countries

54 Results

55

56 Demo

57 Future Work… One parasite, many hosts (failure to diverge); Reticulate phylogenies; Multifurcations; Suggestions?

58 Questions/Comments

