Published by Jeffrey Larcom. Modified over 2 years ago.

1
Ran Libeskind-Hadas
Department of Computer Science
Harvey Mudd College
Joint work with: Mike Charleston (Univ. of Sydney), Chris Conow (USC), Ben Cousins (Clemson), Daniel Fielder (HMC), John Peebles (HMC), Tselil Schramm (HMC), Anak Yodpinyanee (HMC)

2
Integrated CS/Bio Course Send to:

3
Overview
A 75-minute research lecture to first-year students in our CS/Bio intro course
Show first-year students that what they've learned is relevant to current research
Showcase research done with senior students
What have they done so far?
– Biology: Genes, alignment, phylogenetic trees, RNA folding
– CS: Programming, recursion, memoization

4
Specifically…
Pairwise global alignment and RNA folding
– Why you should care
– Designed and implemented recursive solutions
– Why are they slow?
– How do we make them faster?
– Memoization idea
– Wow, that's fast! (but no actual analysis yet)
– Designed and implemented memoized versions
– Used their implementations to investigate questions
Around 10 lines of Python code!
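For readers who want to see the memoization idea concretely, here is a minimal sketch of a memoized global alignment scorer. It is not the course's code: the scoring values (+1 match, -1 mismatch, -1 gap) and the name `align_score` are assumptions chosen for illustration.

```python
from functools import lru_cache

# Hypothetical scoring scheme (not from the slides).
MATCH, MISMATCH, GAP = 1, -1, -1


def align_score(s: str, t: str) -> int:
    """Best global alignment score of s and t under the scoring above."""

    @lru_cache(maxsize=None)  # memoization: each (i, j) is solved once
    def best(i: int, j: int) -> int:
        # best(i, j) is the best score for aligning s[i:] with t[j:].
        if i == len(s):
            return GAP * (len(t) - j)  # only gaps remain
        if j == len(t):
            return GAP * (len(s) - i)
        pair = MATCH if s[i] == t[j] else MISMATCH
        return max(
            pair + best(i + 1, j + 1),  # align s[i] with t[j]
            GAP + best(i + 1, j),       # align s[i] with a gap
            GAP + best(i, j + 1),       # align t[j] with a gap
        )

    return best(0, 0)


score = align_score("ACGT", "AGT")
```

Removing the `lru_cache` line gives the plain recursive version, which returns the same score but repeats the same subproblems many times.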

5
Specifically…
Phylogenetic trees
– Why you should care
– Implemented simple algorithm (e.g. UPGMA)
– Used their implementation to answer questions…
– Existence and relative merits of other algorithms (mention maximum likelihood… but it's slow!)
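As a rough illustration of what a simple UPGMA implementation can look like, here is a short sketch. It returns only the tree shape as nested tuples (no branch lengths), and the function name `upgma`, the input format, and the example distances are all assumptions, not the course's code or data.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a tree by repeatedly merging the two closest clusters.

    labels: list of tip names.
    dist:   dict mapping frozenset({a, b}) to the distance between tips.
    Returns the tree as nested 2-tuples.
    """
    d = dict(dist)                        # working copy of the distances
    sizes = {name: 1 for name in labels}  # cluster -> number of tips in it
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Made-up distances for three taxa.
example = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
tree = upgma(["A", "B", "C"], example)
```

In this example A and B are merged first because they are closest, and C joins last.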

6
A 75-minute lecture in 30 minutes (or less)

7
Cophylogenetics
"I can understand how a flower and a bee might slowly become, either simultaneously or one after the other, modified and adapted in the most perfect manner to each other, by the continued preservation of individuals presenting mutual and slightly favourable deviations of structure." Charles Darwin, The Origin of Species
Actual 75-minute lecture starts here! (Also a chapter in new B4B)

8
Obligate Mutualism of Figs and Fig Wasps
From Cophylogeny of the Ficus Microcosm, A. Jackson, 2004
[Figure label: ovipositor]

9
The Cophylogeny Problem From Hafner MS and Nadler SA, Phylogenetic trees support the coevolution of parasites and their hosts. Nature 1988, 332:

10
Indigobirds and Finches High level of host specificity (e.g. mouth markings)

11
The Question… Given a host tree, parasite tree, and tip mapping, what is the most plausible mapping between the trees and is it suggestive of coevolution? This seems to be a hard problem!

12
Measuring the Hardness of Computational Problems
There are three kinds of problems…
1. Easy
2. Hard
3. Impossible!

13
Easy Problems
Sorting a list of n numbers: [42, 3, 17, 26, …, 100]
Multiplying two n x n matrices
[Figure: two n x n matrices multiplied to give an n x n matrix]

14
Global Alignment is easy!
Reminder of 2^n running time of alignment
Informally motivate n^2 running time of memoized version
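One informal way to motivate the difference is to count recursive calls. The sketch below is an illustration under assumptions, not the lecture's code: it counts calls made by the three-way alignment recursion on two strings of length n, with and without memoization. The exact unmemoized count is not 2^n, but it grows exponentially, while the memoized count is (n + 1)^2.

```python
from functools import lru_cache


def count_calls(n: int, memoize: bool) -> int:
    """Count calls made by the alignment recursion on two length-n strings."""
    calls = 0

    def solve(i, j):
        nonlocal calls
        calls += 1
        if i == n or j == n:
            return 0  # one string is used up
        # Same three choices as alignment: pair up, gap in one, gap in the other.
        return 1 + min(solve(i + 1, j + 1), solve(i + 1, j), solve(i, j + 1))

    if memoize:
        # Rebinding the name makes the recursive calls go through the cache,
        # so the body runs (and is counted) once per distinct (i, j).
        solve = lru_cache(maxsize=None)(solve)
    solve(0, 0)
    return calls


plain = count_calls(6, memoize=False)
memoized = count_calls(6, memoize=True)
```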

15
Hard Problems
Snowplows of Northern Minnesota
[Map: Burrsburg, Frostbite City, Shiversville, Tundratown, Freezeapolis]

16
Snowplows of Northern Minnesota
[Map: Burrsburg, Frostbite City, Shiversville, Tundratown, Freezeapolis]
Brute-force? Greed?
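To make the brute-force versus greedy question concrete, here is a small sketch. It assumes the snowplow must make a shortest round trip through every town, which the slide does not spell out, and the road lengths are invented for illustration. Brute force tries every ordering; the greedy rule always drives to the nearest unvisited town.

```python
from itertools import permutations

TOWNS = ["Burrsburg", "Frostbite City", "Shiversville", "Tundratown", "Freezeapolis"]

# Hypothetical road lengths (the slides give none).
ROADS = {
    frozenset(("Burrsburg", "Frostbite City")): 1,
    frozenset(("Burrsburg", "Shiversville")): 5,
    frozenset(("Burrsburg", "Tundratown")): 4,
    frozenset(("Burrsburg", "Freezeapolis")): 10,
    frozenset(("Frostbite City", "Shiversville")): 2,
    frozenset(("Frostbite City", "Tundratown")): 6,
    frozenset(("Frostbite City", "Freezeapolis")): 6,
    frozenset(("Shiversville", "Tundratown")): 3,
    frozenset(("Shiversville", "Freezeapolis")): 4,
    frozenset(("Tundratown", "Freezeapolis")): 2,
}


def tour_length(order):
    """Length of the round trip visiting the towns in the given order."""
    legs = zip(order, order[1:] + order[:1])  # wrap around to the start
    return sum(ROADS[frozenset(leg)] for leg in legs)


def brute_force(start):
    """Try every ordering of the other towns: (n - 1)! tours."""
    rest = [t for t in TOWNS if t != start]
    return min(tour_length([start, *p]) for p in permutations(rest))


def greedy(start):
    """Always drive to the nearest town not yet visited."""
    order = [start]
    unvisited = set(TOWNS) - {start}
    while unvisited:
        here = order[-1]
        nearest = min(unvisited, key=lambda t: ROADS[frozenset((here, t))])
        order.append(nearest)
        unvisited.remove(nearest)
    return tour_length(order)


best = brute_force("Burrsburg")
quick = greedy("Burrsburg")
```

With these made-up lengths the greedy route is longer than the best one, because the cheap early choices force an expensive trip home.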

17
n^2 versus 2^n
The Ran-O-Matic performs 10^9 operations/sec
Ran-O-Matic   n = 10          n = 30         n = 50          n = 70
n^2           100, < 1 sec    900, < 1 sec   2500, < 1 sec   4900, < 1 sec
2^n           1024, < 1 sec   1 sec

18
n^2 versus 2^n
The Ran-O-Matic performs 10^9 operations/sec
Ran-O-Matic   n = 10          n = 30         n = 50          n = 70
n^2           100, < 1 sec    900, < 1 sec   2500, < 1 sec   4900, < 1 sec
2^n           1024, < 1 sec   1 sec          13 days

19
n^2 versus 2^n
The Ran-O-Matic performs 10^9 operations/sec
Ran-O-Matic   n = 10          n = 30         n = 50          n = 70
n^2           100, < 1 sec    900, < 1 sec   2500, < 1 sec   4900, < 1 sec
2^n           1024, < 1 sec   1 sec          13 days         37 trillion years

20
n^2 versus 2^n
The Ran-O-Matic performs 10^9 operations/sec
Ran-O-Matic   n = 10          n = 30         n = 50          n = 70
n^2           100, < 1 sec    900, < 1 sec   2500, < 1 sec   4900, < 1 sec
2^n           1024, < 1 sec   1 sec          13 days         37 trillion years
Computers double in speed every 2 years. Let's just wait 10 years!
37 trillion years ->

21
n^2 versus 2^n
The Ran-O-Matic performs 10^9 operations/sec
Ran-O-Matic   n = 10          n = 30         n = 50          n = 70
n^2           100, < 1 sec    900, < 1 sec   2500, < 1 sec   4900, < 1 sec
2^n           1024, < 1 sec   1 sec          13 days         37 trillion years
Computers double in speed every 2 years. Let's just wait 10 years!
37 trillion years -> 37 billion years!

22
Snowplows and Travelling Salesperson Revisited!
NP-complete problems: Travelling Salesperson Problem, Snowplow Problem, Protein Folding, Phylogenetic trees by maximum likelihood, Multiple sequence alignment
Tens of thousands of other known problems go in this cloud!!

23
"I can't find an efficient algorithm. I guess I'm too dumb."
Cartoon courtesy of Computers and Intractability: A Guide to the Theory of NP-completeness by M. Garey and D. Johnson

24
"I can't find an efficient algorithm because no such algorithm is possible!"

25
"I can't find an efficient algorithm, but neither can all these famous people."
Cartoon courtesy of Computers and Intractability: A Guide to the Theory of NP-completeness by M. Garey and D. Johnson

26
$1 million Vinay Deolalikar

27
Coping with NP-completeness…
Brute force
Ad hoc Heuristics
Metaheuristics
Approximation algorithms

28
Obligate Mutualism of Figs and Fig Wasps
From Cophylogeny of the Ficus Microcosm, A. Jackson, 2004
[Figure label: ovipositor]

29
The Cophylogeny Problem…
[Figure: host tree with tips a, b, c; parasite tree with tips d, e]

30
The Cophylogeny Problem
[Figure: host tree with tips a, b, c; parasite tree with tips d, e; tip associations between them]

31
Possible Solutions
[Figure: two drawings of the host tree (tips a, b, c) with the parasite tree (tips d, e); one is labeled Input]

32
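Event Cost Model: cospeciation
[Figure: a cospeciation event marked in each of two drawings of the parasite tree (tips d, e) on the host tree (tips a, b, c)]

33
Event Cost Model: duplication
[Figure: a duplication event marked in each of the two drawings]

34
Event Cost Model: host-switch
[Figure: a host-switch event marked in each of the two drawings]

35
Event Cost Model: loss
[Figure: a loss event marked in each of the two drawings]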

36
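Event Cost Model
[Figure: two solutions on host tips a, b, c and parasite tips d, e. The first is labeled cospeciation, loss, duplication; the second is labeled host-switch, loss, cospeciation]
First solution: Cost = duplication + cospeciation + 3 * loss
Second solution: Cost = cospeciation + host-switch + loss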

37
Some typical costs
[Figure: the same two solutions, labeled cospeciation, loss, duplication and host-switch, loss, cospeciation]
First solution: Cost = 8
Second solution: Cost = 5
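The event cost model amounts to counting events and weighting each kind by its cost. The sketch below illustrates that arithmetic. The individual event costs are an assumption: the transcript shows only the totals, and the values here are one assignment that reproduces 8 and 5. Other assignments would too.

```python
from collections import Counter

# Hypothetical per-event costs, chosen so the two totals come out to 8 and 5.
COSTS = {"cospeciation": 0, "duplication": 2, "host-switch": 3, "loss": 2}


def solution_cost(events, costs=COSTS):
    """Total cost of a solution given the list of events it uses."""
    counts = Counter(events)
    return sum(costs[event] * k for event, k in counts.items())


# Cost = duplication + cospeciation + 3 * loss
first = solution_cost(["duplication", "cospeciation", "loss", "loss", "loss"])

# Cost = cospeciation + host-switch + loss
second = solution_cost(["cospeciation", "host-switch", "loss"])
```

Changing the entries of `COSTS` changes which solution is cheaper, which is why the choice of event costs matters.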

38
This problem is hard!
How hard? NP-complete! (Joint work with Charleston, Ovadia, Conow, Fielder)
The host-switches are the culprits
[Figure: tree with nodes e, f, g, h]

39
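Existing Methods
– Technique: Brute force (TreeMap); Ignore timing incompatibilities (Tarzan/CoRe-PA)
– Solution: Optimal (TreeMap); Can be BETTER than optimal! (Tarzan/CoRe-PA)
– Running Time: Exponential (TreeMap); Polynomial, Very fast (Tarzan/CoRe-PA)
– Tree Builder: No (TreeMap); Yes (Tarzan/CoRe-PA)
– Solution Viewer: Yes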

40
A Metaheuristic Approach

41
Dynamic Programming
[Figure: host tree with nodes s, t, r, u, v, w, x, y drawn against times t = 0 to t = 4; parasite tree with nodes a, b, c]
Compute Cost[a,su,2]

42
Dynamic Programming
[Figure: the same host tree and timeline, with parasite nodes a, b, c placed on it]
Compute Cost[a,su,2]
Cost[b,tw,3]
Cost[c,y,4]

43
Dynamic Programming
[Figure: the same placement, with the events between a and its children labeled loss, host-switch, loss]
Compute Cost[a,su,2]
Cost[b,tw,3]
Cost[c,y,4]

44
Dynamic Programming
[Figure: the same placement, with events labeled loss, host-switch, loss]
Cost[b,tw,3]
Cost[c,y,4]
Candidate for Cost[a,su,2]: Cost[b, tw, 3] + Cost[c, uy, 4] + 2 * loss + host-switch
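The candidate above can be read as a small formula: add the two child cells, then add the cost of the events needed to connect them to the parent. Here is a sketch of that one step, assuming a table indexed by (parasite node, host edge, time). The event costs and the child cell values are made up for illustration, and the helper names are hypothetical, not taken from any existing tool.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, str, int]  # (parasite node, host edge, time)

# Hypothetical event costs; the slides do not fix these values.
EVENT_COSTS = {"loss": 2, "host-switch": 3}


def candidate_cost(table: Dict[Cell, int], first: Cell, second: Cell,
                   losses: int, switches: int) -> int:
    """Cost of placing the parent given one placement of its two children."""
    return (table[first] + table[second]
            + losses * EVENT_COSTS["loss"]
            + switches * EVENT_COSTS["host-switch"])


def fill_cell(candidates: Iterable[int]) -> int:
    """A cell's value is the cheapest candidate over all child placements."""
    return min(candidates)


# Made-up values for the two child cells named on the slide.
table = {("b", "tw", 3): 1, ("c", "uy", 4): 0}

# Cost[b, tw, 3] + Cost[c, uy, 4] + 2 * loss + host-switch
one_candidate = candidate_cost(table, ("b", "tw", 3), ("c", "uy", 4),
                               losses=2, switches=1)
```

A full solver would generate one such candidate for every pair of child positions and keep the minimum in the parent's cell.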

45
Dynamic Programming Running Time
O(n^3) cells to fill in
O(n^2) positions for first child
O(n^2) positions for second child
O(n) to count #losses from each child, but this is precomputable
O(n^3 x (n^2 x n^2)) = O(n^7) total

46
Dynamic Programming Running Time
O(n^3) cells to fill in
O(n^2) positions for first child
O(n^2) positions for second child
O(n) to count #losses from each child, but this is precomputable
O(n^3 x (n^2 x n^2)) = O(n^7) total
Can be improved to O(n^3)

47
Genetic Algorithm

48
Existing Software
– Technique: Brute force (TreeMap); DP, Ignore timing incompatibilities (Tarzan/CoRe-PA); Genetic algorithm, DP (Jane 2)
– Solution: Optimal (TreeMap); Can be BETTER than optimal! (Tarzan/CoRe-PA); Sometimes suboptimal (Jane 2)
– Running Time: Exponential (TreeMap); Polynomial, Very fast (Tarzan/CoRe-PA); Polynomial, a lot faster! Can control running time (Jane 2)
– Tree Builder: No (TreeMap); Yes (Tarzan/CoRe-PA); No, but Jane 2 can read CoRe-PA's trees (Jane 2)
– Solution Viewer: Yes; Also Interactive (Jane 2)

49
The Fig/Wasp Challenge

50
Results

51
The Fig/Wasp Dataset…
Original Problem Instance: Solve for optimal cost
Randomly Generated Problem Instances: Solve for optimal cost
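The comparison on this slide can be sketched as a randomization test: solve the original instance, solve many random instances, and see how often a random instance does at least as well. The code below is an illustration under assumptions. It randomizes by shuffling the tip associations, which the slide does not specify, and `solve` is a hypothetical stand-in for a real cophylogeny solver.

```python
import random


def randomization_test(tip_map, solve, trials=100, seed=0):
    """Compare the original instance's cost with costs of random instances.

    tip_map: dict from parasite tip to host tip.
    solve:   function returning the optimal cost for a tip mapping
             (a stand-in here; a real solver would also take the trees).
    Returns (original cost, fraction of random instances costing no more).
    """
    rng = random.Random(seed)
    original = solve(tip_map)
    parasites = list(tip_map)
    hosts = [tip_map[p] for p in parasites]
    no_worse = 0
    for _ in range(trials):
        shuffled = hosts[:]
        rng.shuffle(shuffled)  # random tip associations
        if solve(dict(zip(parasites, shuffled))) <= original:
            no_worse += 1
    return original, no_worse / trials


def toy_cost(tip_map):
    """Toy stand-in: count parasites not on their like-numbered host."""
    return sum(1 for p, h in tip_map.items() if p[1:] != h[1:])


tips = {"p1": "h1", "p2": "h2", "p3": "h3", "p4": "h4"}
original_cost, fraction = randomization_test(tips, toy_cost)
```

A small fraction suggests the original associations fit the trees better than chance would explain.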

52

53
Paper recently completed… 30 Coauthors 18 Institutes 10 Countries

54
Results

55

56
Demo

57
Future Work…
One parasite, many hosts (failure to diverge)
Reticulate phylogenies
Multifurcations
Suggestions?

58
Questions/Comments
