MINATO ZDD Project Efficient Enumeration of the Directed Binary Perfect Phylogenies from Incomplete Data Toshiki Saitoh (ERATO) Joint work with Masashi.


1 MINATO ZDD Project
Efficient Enumeration of the Directed Binary Perfect Phylogenies from Incomplete Data
Toshiki Saitoh (ERATO)
Joint work with Masashi Kiyomi (JAIST) and Yoshio Okamoto (JAIST)
11th International Symposium on Experimental Algorithms, Bordeaux, France, June 7-9, 2012
Kobe University, Yokohama City University, The University of Electro-Communications

2 Directed Binary Perfect Phylogeny
Input: A species-character matrix M
– All characters are binary.
– m_sc = 1 iff the species s has the character c.
Output: A directed perfect phylogeny
– An unordered rooted tree in which each leaf carries one species label.
– Each character labels one node.
– A species s has a character c if and only if the leaf with label s is a descendant of the node with label c.
Allowed state change: 0 → 1. Not allowed: 1 → 0.

     c1 c2 c3 c4 c5 c6
s1    0  0  1  1  1  0
s2    0  1  1  0  0  0
s3    1  0  0  0  0  1
s4    0  0  1  1  0  0
s5    1  0  0  0  0  0

[Figure: the directed perfect phylogeny for M, with character nodes c1 to c6 and leaves s1 to s5.]

3 Directed Binary Perfect Phylogeny
Lemma [Jansson, 2008]: A matrix M admits a directed perfect phylogeny if and only if for every pair of columns i and j, either C_i and C_j are disjoint or one contains the other.
C_i: the set of species with the character c_i

     c1 c2 c3 c4 c5 c6
s1    0  0  1  1  1  0
s2    0  1  1  0  0  0
s3    1  0  0  0  0  1
s4    0  0  1  1  0  0
s5    1  0  0  0  0  0

Example: C_3 = {s1, s2, s4}, C_4 = {s1, s4}, C_6 = {s3}. C_3 contains C_4, and C_6 is disjoint from both.
We can construct a phylogeny in polynomial time.
[Figure: the directed perfect phylogeny for M, with character nodes c1 to c6 and leaves s1 to s5.]
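
The pairwise condition of the lemma can be checked directly. The following is a minimal sketch, assuming the matrix is given as a list of rows of 0/1 integers; the function name `admits_directed_perfect_phylogeny` is illustrative and does not come from the slides.

```python
from itertools import combinations


def admits_directed_perfect_phylogeny(matrix):
    """Return True if every pair of columns is disjoint or nested (the lemma)."""
    if not matrix:
        return True
    n_chars = len(matrix[0])
    # column_sets[j] is C_j: the set of species (row indices) having character j.
    column_sets = [
        {s for s, row in enumerate(matrix) if row[j] == 1}
        for j in range(n_chars)
    ]
    for ci, cj in combinations(column_sets, 2):
        if not (ci.isdisjoint(cj) or ci <= cj or cj <= ci):
            return False
    return True


# The complete matrix from the slide (rows s1..s5, columns c1..c6).
M = [
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
print(admits_directed_perfect_phylogeny(M))
```

This check compares all column pairs and is meant only to illustrate the lemma; it does not construct the tree itself.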

4 Incomplete Directed Perfect Phylogenies
Input: An incomplete species-character matrix
– The states of some characters are unknown.
Output: A directed perfect phylogeny
– The unknown states are completed.

Incomplete matrix:
     c1 c2 c3 c4 c5 c6
s1    0  0  1  ?  1  0
s2    0  1  1  0  0  0
s3    ?  0  ?  0  0  1
s4    0  0  1  1  0  0
s5    1  0  0  ?  0  0

One completion:
     c1 c2 c3 c4 c5 c6
s1    0  0  1  1  1  0
s2    0  1  1  0  0  0
s3    1  0  0  0  0  1
s4    0  0  1  1  0  0
s5    1  0  0  0  0  0

[Figure: the directed perfect phylogeny of the completed matrix.]
We can find one phylogeny in polynomial time. [Pe’er et al., 2004]

5 Incomplete Directed Perfect Phylogenies
Input: An incomplete species-character matrix
– The states of some characters are unknown.
Output: A directed perfect phylogeny
– The unknown states are completed.

Incomplete matrix:
     c1 c2 c3 c4 c5 c6
s1    0  0  1  ?  1  0
s2    0  1  1  0  0  0
s3    ?  0  ?  0  0  1
s4    0  0  1  1  0  0
s5    1  0  0  ?  0  0

Completion 1:
     c1 c2 c3 c4 c5 c6
s1    0  0  1  1  1  0
s2    0  1  1  0  0  0
s3    1  0  0  0  0  1
s4    0  0  1  1  0  0
s5    1  0  0  0  0  0

Completion 2:
     c1 c2 c3 c4 c5 c6
s1    0  0  1  1  1  0
s2    0  1  1  0  0  0
s3    0  0  1  0  0  1
s4    0  0  1  1  0  0
s5    1  0  0  0  0  0

[Figure: the two directed perfect phylogenies corresponding to the two completions.]
Goal: enumeration of all perfect phylogenies from incomplete data

6 Why Enumeration?
Data mining
– Extraction of characters from all objects
Indexing
– Counting
– Random sampling
– Searching
– Filtering
[Figure: the incomplete matrix from slide 5, its completions and their phylogenies, continuing with further completions.]

7 Our Contribution
Two enumeration algorithms
– Branch and bound (B&B)
  Outputs all perfect phylogenies one by one
  Runs in O(|M| k h) time, where k is the number of "?" entries in M and h is the number of perfect phylogenies
– ZDD approach
  Represents all perfect phylogenies compactly
  Supports many applications: counting, random sampling, filtering
Proof of #P-hardness of the counting problem
– By reduction from counting the matchings in a bipartite graph
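
To make the branch-and-bound idea concrete, here is a naive sketch that fills in the "?" entries one at a time and abandons a branch as soon as two columns are already incompatible. It is not the authors' algorithm and makes no claim to the O(|M| k h) bound. Representing "?" as `None` and the names `has_conflict` and `enumerate_completions` are assumptions made for this illustration.

```python
from itertools import combinations


def has_conflict(matrix):
    """True if some column pair already contains the rows (1,1), (1,0) and (0,1).

    Such a pair is neither disjoint nor nested, whatever the remaining
    unknown entries become, so the branch can be cut.
    """
    n_chars = len(matrix[0]) if matrix else 0
    for i, j in combinations(range(n_chars), 2):
        seen = {(row[i], row[j]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= seen:
            return True
    return False


def enumerate_completions(matrix):
    """Yield every completion of `matrix` (None = '?') satisfying the lemma."""
    work = [list(row) for row in matrix]
    unknowns = [
        (s, c)
        for s, row in enumerate(work)
        for c, value in enumerate(row)
        if value is None
    ]

    def branch(k):
        if has_conflict(work):
            return  # bound: no assignment of the remaining '?' can repair this
        if k == len(unknowns):
            yield [row[:] for row in work]
            return
        s, c = unknowns[k]
        for value in (0, 1):
            work[s][c] = value
            yield from branch(k + 1)
        work[s][c] = None  # restore the '?' before backtracking

    yield from branch(0)


# The incomplete matrix from slides 4 and 5 (None stands for '?').
incomplete = [
    [0, 0, 1, None, 1, 0],
    [0, 1, 1, 0, 0, 0],
    [None, 0, None, 0, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, None, 0, 0],
]
completions = list(enumerate_completions(incomplete))
print(len(completions))
```

Once no unknown entries remain, the same conflict test is exact, because a column pair violates the lemma precisely when all three row patterns occur.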

8 What is a ZDD?
ZDD: Zero-suppressed Binary Decision Diagram
– Proposed by Minato [Minato, 1993]
Compact representation for a boolean function
– A boolean function corresponds to a family of sets.
Reduction rules
1. Uniqueness
2. Zero-suppression
Example: F = {{x1, x2}, {x1, x3}, {x3}}
[Figure: the binary decision tree representing F and the ZDD of F over the variables x1, x2, x3.]

9 Reduction Rules
1. Uniqueness: merge duplicate nodes (isomorphic subgraphs).
2. Zero-suppression: eliminate redundant nodes, that is, nodes whose 1-edge points to the 0-terminal.
[Figure: the two reduction rules applied to nodes labeled x.]
A ZDD represents a family of sets in a compressed way.
There are algebraic operations for families of sets over ZDDs.
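
The two reduction rules can be illustrated with a very small node store. This is a sketch written for this transcript, not Minato's ZDD library; the class name `ZDD` and its methods are hypothetical, and `from_family` assumes every element of every set appears in `order`.

```python
class ZDD:
    """Tiny ZDD node store; the terminals are the integers 0 and 1."""

    def __init__(self):
        self.unique = {}  # (var, lo, hi) -> node id
        self.nodes = {}   # node id -> (var, lo, hi)

    def node(self, var, lo, hi):
        if hi == 0:
            # Zero-suppression: a node whose 1-edge leads to 0 is skipped.
            return lo
        key = (var, lo, hi)
        if key not in self.unique:
            # Uniqueness: identical (var, lo, hi) triples share one node.
            node_id = len(self.nodes) + 2  # ids 0 and 1 are the terminals
            self.unique[key] = node_id
            self.nodes[node_id] = key
        return self.unique[key]

    def from_family(self, family, order):
        """Build the ZDD of a family of sets, testing variables in `order`."""
        fam = frozenset(frozenset(s) for s in family)
        return self._build(fam, tuple(order))

    def _build(self, fam, order):
        if not fam:
            return 0  # empty family
        if not order:
            return 1  # family containing only the empty set
        var, rest = order[0], order[1:]
        lo = self._build(frozenset(s for s in fam if var not in s), rest)
        hi = self._build(frozenset(s - {var} for s in fam if var in s), rest)
        return self.node(var, lo, hi)

    def count(self, node_id):
        """Number of sets in the family rooted at `node_id`."""
        if node_id < 2:
            return node_id
        _, lo, hi = self.nodes[node_id]
        return self.count(lo) + self.count(hi)


zdd = ZDD()
F = [{"x1", "x2"}, {"x1", "x3"}, {"x3"}]
root = zdd.from_family(F, ["x1", "x2", "x3"])
print(len(zdd.nodes), zdd.count(root))
```

For the family F from the previous slide, the x3 node is shared by two parents (uniqueness) and the x2 and x3 tests that would lead only to 0 disappear (zero-suppression).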

10 Algebraic Operations on ZDDs
Family algebras
– Union, intersection, difference, join, quotient, remainder, etc.
– Filtering objects in ZDDs
Counting (random sampling) and optimization
Example (union):
{{x1, x2}, {x1, x3}, {x3}} ∪ {{x1}, {x1, x2, x3}} = {{x1}, {x3}, {x1, x2}, {x1, x3}, {x1, x2, x3}}
[Figure: the ZDDs of the two operand families and of their union.]
These operations can be performed in almost linear time.
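
The union example can be reproduced with a short recursive sketch. It assumes integer variables ordered from the root downward and represents nodes as immutable tuples `(var, lo, hi)`, so structurally equal subgraphs compare equal; a real ZDD library would use a unique table and an operation cache instead. All function names here are illustrative.

```python
from functools import lru_cache


def mk(var, lo, hi):
    """Make a node, applying zero-suppression (skip nodes whose 1-edge is 0)."""
    return lo if hi == 0 else (var, lo, hi)


def single(s):
    """ZDD of the family that contains only the set `s`."""
    node = 1
    for var in sorted(s, reverse=True):
        node = mk(var, 0, node)
    return node


@lru_cache(maxsize=None)
def union(f, g):
    """ZDD of the union of the families represented by f and g."""
    if f == 0:
        return g
    if g == 0 or f == g:
        return f
    if f == 1:
        var, lo, hi = g  # g is a non-terminal here
        return mk(var, union(1, lo), hi)
    if g == 1:
        return union(g, f)
    (fv, flo, fhi), (gv, glo, ghi) = f, g
    if fv < gv:
        return mk(fv, union(flo, g), fhi)
    if fv > gv:
        return mk(gv, union(f, glo), ghi)
    return mk(fv, union(flo, glo), union(fhi, ghi))


def family(sets):
    """ZDD of a family given as an iterable of sets."""
    node = 0
    for s in sets:
        node = union(node, single(s))
    return node


def to_sets(f):
    """Expand a ZDD back into an explicit set of frozensets."""
    if f == 0:
        return set()
    if f == 1:
        return {frozenset()}
    var, lo, hi = f
    return to_sets(lo) | {s | {var} for s in to_sets(hi)}


A = family([{1, 2}, {1, 3}, {3}])
B = family([{1}, {1, 2, 3}])
for s in sorted(to_sets(union(A, B)), key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

The recursion descends both operands in step with the variable order and memoizes each pair of subgraphs, which is the usual shape of ZDD family operations.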

11 Perfect Phylogenies and ZDD
Introduce a boolean variable x_sc for each species s and character c
– x_sc = 1 if and only if the species s has the character c.

     c1 c2
s1    1  0
s2    ?  1
s3    0  ?

[Figure: ZDD over the variables x_11, x_21, x_22, x_32, together with the completed matrices it represents.]
Lemma [Jansson, 2008]: A matrix M admits a directed perfect phylogeny if and only if for every pair of columns i and j, either C_i and C_j are disjoint or one contains the other.

12 Perfect Phylogenies and ZDD
Introduce a boolean variable x_sc for each species s and character c
– x_sc = 1 if and only if the species s has the character c.
Lemma [Jansson, 2008]: A matrix M admits a directed perfect phylogeny if and only if for every pair of columns i and j, either C_i and C_j are disjoint or one contains the other.
In terms of the variables: for every pair of distinct characters c_i and c_j, at least one of the following three conditions is satisfied.
A) For all species s, if x_sc_i = 1 then x_sc_j = 1 (C_i is contained in C_j).
B) For all species s, if x_sc_i = 0 then x_sc_j = 0 (C_j is contained in C_i).
C) For all species s, if x_sc_i = 1 then x_sc_j = 0 (C_i and C_j are disjoint).
[Figure: Venn diagrams of C_i and C_j for the three cases.]

13 Experiments
Instances:
– Incomplete data constructed from complete data
  Random data set [Hudson, 2002]
  Each "1" or "0" is replaced by "?" with probability p ∈ {0.1, 0.2, 0.3, 0.4, 0.5}
– Matrix size (n, m): n ∈ {50, 100}, m ∈ {50, 100}
– 100 instances for each triple (n, m, p)
Implementation
– The B&B algorithm is written in C.
– The ZDD approach is written in C++ (using the ZDD library developed by Minato).
Machine spec
– OS: SuSE Linux Enterprise Server 10
– CPU: Quad-Core AMD Opteron Processor 8393; #CPUs 16, #Processors 32, Clock Freq. 3092MHz
– Memory: 512GB
[Figure: a small complete matrix and the incomplete matrix obtained by replacing entries with "?".]

14 Experimental Results
The number of instances solved by B&B and by the ZDD approach. ("Solved" means that the algorithm halts successfully.) Timeout: 2 minutes.
Columns: B&B for (m, n) = (50, 50), (50, 100), (100, 50), (100, 100), followed by the ZDD approach for the same four sizes.
p=0.1 52170099 9390
p=0.2 0000573364

15 Experimental Results

16 Experimental Results

17 Experimental Results
The size of the ZDD is 10^17.77 times smaller than the number of perfect phylogenies.

18 Conclusion
Our results
– Proposed two enumeration algorithms
  Branch and bound algorithm (B&B)
  ZDD approach
– The ZDD approach solved more instances than B&B.
– The running time of the ZDD approach grows with the ZDD size.
– The ZDDs achieve a high compression rate on the random data.
– Proved #P-hardness of the counting problem
Thank you for your attention!

