MINATO ZDD Project: Efficient Enumeration of the Directed Binary Perfect Phylogenies from Incomplete Data


Toshiki Saitoh (ERATO), joint work with Masashi Kiyomi (JAIST) and Yoshio Okamoto (JAIST)
11th International Symposium on Experimental Algorithms, Bordeaux, France, June 7-9, 2012
Kobe University, Yokohama City University, The University of Electro-Communications

Directed Binary Perfect Phylogeny
Input: a species-character matrix M.
– All characters are binary: m_sc = 1 if and only if species s has character c.
Output: a directed perfect phylogeny.
– An unordered rooted tree in which each leaf carries exactly one species label.
– Each character labels exactly one node.
– A species s has a character c if and only if the leaf labeled s is a descendant of the node labeled c.
(Figure: an example matrix over species s1–s5 and characters c1–c6 together with its phylogeny; the matrix entries are not recoverable. Along the tree a character may only change from 0 to 1; a change from 1 to 0 is not allowed.)
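To make the definition concrete, here is a minimal Python sketch that checks whether a given rooted tree is a directed perfect phylogeny of a complete matrix. The encoding is a hypothetical choice for illustration: `M` maps each species to the set of characters it has, `parent` maps each node to its parent (the root maps to `None`), `char_node` gives the node labeled by each character, and `leaf_of` gives the leaf of each species. A node is treated as a descendant of itself.

```python
def is_descendant(parent, u, v):
    """Return True if node u equals v or lies below v in the tree."""
    while u is not None:
        if u == v:
            return True
        u = parent[u]  # climb toward the root, whose parent is None
    return False


def is_directed_pp(M, parent, char_node, leaf_of):
    """Check: species s has character c iff leaf(s) is a descendant of node(c)."""
    for s, chars in M.items():
        for c, node in char_node.items():
            if (c in chars) != is_descendant(parent, leaf_of[s], node):
                return False
    return True


# Toy example: s1 has c1; s2 has c1 and c2; s3 has neither.
M = {"s1": {"c1"}, "s2": {"c1", "c2"}, "s3": set()}
parent = {"root": None, "u": "root", "w": "u",
          "l1": "u", "l2": "w", "l3": "root"}
leaf_of = {"s1": "l1", "s2": "l2", "s3": "l3"}
print(is_directed_pp(M, parent, {"c1": "u", "c2": "w"}, leaf_of))  # True
print(is_directed_pp(M, parent, {"c1": "u", "c2": "u"}, leaf_of))  # False
```

In the second call c2 sits on the same node as c1, so the tree would wrongly give c2 to s1, and the check fails.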

Directed Binary Perfect Phylogeny
Lemma [Jansson, 2008]: A matrix M admits a directed perfect phylogeny if and only if, for every pair of columns i and j, the sets C_i and C_j are either disjoint or one contains the other, where C_i is the set of species that have character c_i.
– Example (figure): C3 = {s1, s2, s4}, C4 = {s1, s4}, C6 = {s3}; here C4 ⊆ C3, and C6 is disjoint from both.
– Given such a matrix, a phylogeny can be constructed in polynomial time.
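The lemma yields both a test and a construction. The following is a minimal Python sketch, not the construction used in the talk. It assumes the matrix is a list of '0'/'1' strings (one per species) and returns the tree as a parent map with hypothetical node names `c1, c2, …` for characters, `s1, s2, …` for species leaves, and `root`. Columns are processed from the largest set to the smallest. Each character therefore hangs below the smallest column that contains it, and each species hangs below the smallest column it belongs to.

```python
from itertools import combinations


def column_sets(rows):
    """C_i for every column i: indices of the species (rows) whose entry is '1'."""
    return [frozenset(s for s, r in enumerate(rows) if r[c] == "1")
            for c in range(len(rows[0]))]


def admits_dpp(rows):
    """Lemma test: every two columns are disjoint or nested."""
    return all(not (a & b) or a <= b or b <= a
               for a, b in combinations(column_sets(rows), 2))


def build_phylogeny(rows):
    """Return a parent map of a directed perfect phylogeny, or None if none exists."""
    if not admits_dpp(rows):
        return None
    C = column_sets(rows)
    # Largest columns first; ties keep the original column order.
    order = sorted(range(len(C)), key=lambda c: (-len(C[c]), c))
    parent, placed = {}, []
    for c in order:
        # The containers of C[c] form a chain, so the last one placed is the smallest.
        # (Assumption of this sketch: an all-zero column is attached to the root.)
        containers = [d for d in placed if C[c] <= C[d]] if C[c] else []
        parent[f"c{c + 1}"] = f"c{containers[-1] + 1}" if containers else "root"
        placed.append(c)
    for s in range(len(rows)):
        # Each species hangs below the smallest column that contains it.
        holders = [c for c in order if s in C[c]]
        parent[f"s{s + 1}"] = f"c{holders[-1] + 1}" if holders else "root"
    return parent


print(build_phylogeny(["10", "11", "00"]))
# {'c1': 'root', 'c2': 'c1', 's1': 'c1', 's2': 'c2', 's3': 'root'}
print(build_phylogeny(["10", "11", "01"]))  # None
```

The pairwise test costs O(m² n) for n species and m characters, and the construction stays within the same bound. This agrees with the slide's claim that a phylogeny can be built in polynomial time.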

Incomplete Directed Perfect Phylogenies
Input: an incomplete species-character matrix.
– The states of some characters are unknown (written "?").
Output: a directed perfect phylogeny.
– The unknown states are completed.
Example input (characters C1–C6; rows S2 and S4 are not recoverable): S1 = 0 0 1 ? 1 0, S3 = ? 0 ? 0 0 1, S5 = 1 0 0 ? 0 0. (Figure: one completion of this matrix and its phylogeny.)
One such phylogeny can be found in polynomial time [Pe'er et al., 2004].

Incomplete Directed Perfect Phylogenies
The same incomplete matrix can admit several completions, each with its own directed perfect phylogeny (figure: the example input with two different completions and their trees).
Problem: enumeration of all perfect phylogenies from incomplete data.
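The enumeration problem can be pinned down with a naive baseline. It tries all 2^k ways to fill the k "?" entries and keeps the completions that pass the lemma's pairwise test. The Python sketch below uses hypothetical function names and represents a matrix as a list of '0'/'1'/'?' strings. It only defines the set to be enumerated. It is exponential in k and is not one of the algorithms presented in this talk.

```python
from itertools import combinations, product


def admits_dpp(rows):
    """Lemma test on a complete matrix: all column pairs are disjoint or nested."""
    cols = [frozenset(s for s, r in enumerate(rows) if r[c] == "1")
            for c in range(len(rows[0]))]
    return all(not (a & b) or a <= b or b <= a for a, b in combinations(cols, 2))


def enumerate_completions(rows):
    """Yield every completion of the '?' entries that admits a directed perfect phylogeny."""
    holes = [(s, c) for s, r in enumerate(rows) for c, v in enumerate(r) if v == "?"]
    for bits in product("01", repeat=len(holes)):  # all 2^k fillings
        grid = [list(r) for r in rows]
        for (s, c), b in zip(holes, bits):
            grid[s][c] = b
        done = ["".join(r) for r in grid]
        if admits_dpp(done):
            yield done


for completion in enumerate_completions(["10", "?1", "0?"]):
    print(completion)
# ['10', '01', '00']
# ['10', '01', '01']
# ['10', '11', '00']
```

The fourth filling, ['10', '11', '01'], is rejected because C1 = {s1, s2} and C2 = {s2, s3} overlap without being nested.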

Why Enumeration?
– Data mining: extraction of characters from all objects.
– Indexing: counting, random sampling, searching, filtering.
(Figure: the incomplete example matrix next to its completions and their phylogenies.)

Our Contribution
Two enumeration algorithms:
– Branch and bound (B&B): outputs all perfect phylogenies one by one in O(|M| k h) time, where k is the number of "?" entries in M and h is the number of perfect phylogenies.
– ZDD approach: represents all perfect phylogenies compactly and supports many applications, such as counting, random sampling and filtering.
A proof that the counting problem is #P-hard, by a reduction from counting the matchings in a bipartite graph.
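Here is one way to picture the branch-and-bound idea: a simplified Python sketch of our own, not the authors' B&B. It fills the "?" entries one at a time and prunes a branch as soon as the fully known entries of some column pair contain all three patterns (1,1), (1,0) and (0,1). That pruning is safe because no completion can undo such a violation. Once every entry is filled, the same test coincides with the lemma, so exactly the valid completions are output. However, this bound can miss infeasible partial matrices, so the sketch explores dead-end branches and carries no O(|M| k h) guarantee. Function names are hypothetical.

```python
from itertools import combinations

FORBIDDEN = {("1", "1"), ("1", "0"), ("0", "1")}  # all three together break nesting


def has_conflict(grid):
    """True if the known entries already violate the lemma for some column pair."""
    for i, j in combinations(range(len(grid[0])), 2):
        seen = {(r[i], r[j]) for r in grid if r[i] != "?" and r[j] != "?"}
        if FORBIDDEN <= seen:
            return True
    return False


def branch_and_bound(rows):
    grid = [list(r) for r in rows]
    holes = [(s, c) for s, r in enumerate(grid) for c, v in enumerate(r) if v == "?"]
    out = []

    def rec(t):
        if has_conflict(grid):   # bound: no completion of this branch can work
            return
        if t == len(holes):      # every '?' is filled and no conflict remains
            out.append(tuple("".join(r) for r in grid))
            return
        s, c = holes[t]
        for v in "01":           # branch on the next unknown entry
            grid[s][c] = v
            rec(t + 1)
        grid[s][c] = "?"         # undo before returning to the parent branch

    rec(0)
    return out


print(branch_and_bound(["10", "?1", "0?"]))
# [('10', '01', '00'), ('10', '01', '01'), ('10', '11', '00')]
```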

What is a ZDD?
ZDD: Zero-suppressed Binary Decision Diagram
– Proposed by Minato [Minato, 1993].
– A compact representation of a Boolean function; a Boolean function corresponds to a family of sets.
– Reduction rules: 1. uniqueness, 2. zero-suppression.
(Figure: the ZDD of F = {{x1, x2}, {x1, x3}, {x3}}, with one node each for x1, x2 and x3 plus the terminals 0 and 1, beside the binary decision tree representing F, which has one x1 node, two x2 nodes and four x3 nodes.)
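A family of sets and a Boolean function correspond as follows. A set S over the variables x1, …, xn is identified with its indicator vector, and the function is 1 exactly on the vectors of sets in the family. A minimal Python sketch of both directions, using the slide's F (function names are hypothetical):

```python
from itertools import product


def characteristic(family, variables):
    """Boolean function f(b1, ..., bn): 1 iff {x_i : b_i = 1} belongs to the family."""
    fam = {frozenset(s) for s in family}

    def f(*bits):
        return frozenset(v for v, b in zip(variables, bits) if b) in fam

    return f


def family_of(f, variables):
    """Inverse direction: collect the sets whose indicator vector satisfies f."""
    return {frozenset(v for v, b in zip(variables, bits) if b)
            for bits in product((0, 1), repeat=len(variables)) if f(*bits)}


VARS = ["x1", "x2", "x3"]
F = [{"x1", "x2"}, {"x1", "x3"}, {"x3"}]
f = characteristic(F, VARS)
print([bits for bits in product((0, 1), repeat=3) if f(*bits)])
# [(0, 0, 1), (1, 0, 1), (1, 1, 0)]
```

The truth table has 2^3 rows, like the leaves of the binary decision tree on the slide. A ZDD stores the same function with far fewer nodes.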

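Reduction Rules
1. Uniqueness: merge duplicate nodes (isomorphic subgraphs).
2. Zero-suppression: eliminate redundant nodes, i.e., nodes whose 1-edge points to the 0-terminal.
(Figure: two identical x nodes merged into one; an x node whose 1-edge leads to 0 removed.)
A ZDD represents a family of sets in a compressed way, and there are algebraic operations for families of sets that work directly on ZDDs.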
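Both reduction rules can be enforced at node-creation time. The minimal Python sketch below is our own toy node store, not Minato's ZDD library. The method `node` applies zero-suppression by returning the 0-child when the 1-child is the 0-terminal, and applies uniqueness through a hash table keyed by (variable, 0-child, 1-child). The sketch builds the slide's F top-down over a fixed variable order and assumes every element of every set appears in that order.

```python
class ZDD:
    """Minimal ZDD node store (sketch). Node ids 0 and 1 are the 0- and 1-terminals."""

    def __init__(self):
        self.unique = {}           # (var, lo, hi) -> node id
        self.nodes = [None, None]  # node id -> (var, lo, hi); ids 0, 1 are terminals

    def node(self, var, lo, hi):
        if hi == 0:                 # rule 2, zero-suppression: 1-edge to 0, drop the node
            return lo
        key = (var, lo, hi)
        if key not in self.unique:  # rule 1, uniqueness: reuse an identical node
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def from_family(self, family, order):
        """Build the ZDD of a family of sets; every element must appear in `order`."""
        return self._build(frozenset(frozenset(s) for s in family), tuple(order))

    def _build(self, fam, order):
        if not fam:
            return 0                # empty family
        if not order:
            return 1                # only the empty set is left
        v, rest = order[0], order[1:]
        lo = frozenset(s for s in fam if v not in s)    # sets without v
        hi = frozenset(s - {v} for s in fam if v in s)  # sets with v, v removed
        return self.node(v, self._build(lo, rest), self._build(hi, rest))

    def size(self, root):
        """Number of non-terminal nodes reachable from root."""
        seen, stack = set(), [root]
        while stack:
            n = stack.pop()
            if n > 1 and n not in seen:
                seen.add(n)
                _, lo, hi = self.nodes[n]
                stack += [lo, hi]
        return len(seen)


z = ZDD()
root = z.from_family([{"x1", "x2"}, {"x1", "x3"}, {"x3"}], ["x1", "x2", "x3"])
print(z.size(root))  # 3, versus 7 decision nodes in the full binary decision tree
```

In the result, the x1 node and the x2 node share a single x3 node as their 0-child, which is the uniqueness rule at work.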

Algebraic Operations on ZDDs
– Family algebra: union, intersection, difference, join, quotient, remainder, etc.
– Filtering of the objects stored in ZDDs.
– Counting (random sampling) and optimization.
Example (figure): {{x1, x2}, {x1, x3}, {x3}} ∪ {{x1}, {x1, x2, x3}} = {{x1}, {x3}, {x1, x2}, {x1, x3}, {x1, x2, x3}}.
These operations typically run in time almost linear in the size of the ZDDs.
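The union example and counting can be reproduced with a toy ZDD manager. The Python sketch below is illustrative only and does not reproduce the API of Minato's library. Union recurses on the top variable with memoization. Counting visits each node once, so its cost is linear in the ZDD size. Class and method names are hypothetical.

```python
class ZDD:
    """Minimal ZDD manager (sketch) with union and counting."""

    def __init__(self, order):
        self.order = list(order)
        self.pos = {v: i for i, v in enumerate(self.order)}
        self.unique, self.nodes, self.memo = {}, [None, None], {}

    def node(self, var, lo, hi):
        if hi == 0:                   # zero-suppression rule
            return lo
        key = (var, lo, hi)
        if key not in self.unique:    # uniqueness rule
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def family(self, sets):
        def build(fam, i):
            if not fam:
                return 0
            if i == len(self.order):
                return 1
            v = self.order[i]
            lo = frozenset(s for s in fam if v not in s)
            hi = frozenset(s - {v} for s in fam if v in s)
            return self.node(v, build(lo, i + 1), build(hi, i + 1))
        return build(frozenset(frozenset(s) for s in sets), 0)

    def top(self, n):
        # Terminals sort after every variable.
        return len(self.order) if n < 2 else self.pos[self.nodes[n][0]]

    def union(self, a, b):
        if a == 0:
            return b
        if b == 0 or a == b:
            return a
        key = (min(a, b), max(a, b))  # union is commutative
        if key not in self.memo:
            ta, tb = self.top(a), self.top(b)
            if ta < tb:               # no set of b contains a's top variable
                v, lo, hi = self.nodes[a]
                r = self.node(v, self.union(lo, b), hi)
            elif tb < ta:
                v, lo, hi = self.nodes[b]
                r = self.node(v, self.union(a, lo), hi)
            else:
                v, alo, ahi = self.nodes[a]
                _, blo, bhi = self.nodes[b]
                r = self.node(v, self.union(alo, blo), self.union(ahi, bhi))
            self.memo[key] = r
        return self.memo[key]

    def count(self, root):
        cache = {0: 0, 1: 1}          # each node is evaluated once
        def go(n):
            if n not in cache:
                _, lo, hi = self.nodes[n]
                cache[n] = go(lo) + go(hi)
            return cache[n]
        return go(root)

    def sets(self, n):
        if n < 2:
            return [frozenset()] * n  # 0 -> [], 1 -> [empty set]
        v, lo, hi = self.nodes[n]
        return self.sets(lo) + [s | {v} for s in self.sets(hi)]


z = ZDD(["x1", "x2", "x3"])
F = z.family([{"x1", "x2"}, {"x1", "x3"}, {"x3"}])
G = z.family([{"x1"}, {"x1", "x2", "x3"}])
U = z.union(F, G)
print(z.count(U))  # 5
```

Because the node table is shared and canonical, the union result is the very same node id as the ZDD built directly from the five-set family.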

Perfect Phylogenies and ZDD
Introduce a Boolean variable x_sc for each species s and character c.
– x_sc = 1 if and only if species s has character c.
Example matrix (characters c1, c2): s1 = 1 0, s2 = ? 1, s3 = 0 ? (figure: the matrix annotated with the variables x_sc).
Recall the lemma [Jansson, 2008]: M admits a directed perfect phylogeny if and only if every two columns are disjoint or one contains the other.
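With these variables, every completed matrix becomes the set of variables x_sc whose value is 1. All valid completions then form a family of sets, which is the kind of object a ZDD stores. The Python sketch below computes that family for the slide's 3×2 example. It writes x_sc as the 1-based tuple (s, c), which is a hypothetical naming. The brute-force loop is only a stand-in for illustration; it is not claimed to be how the ZDD approach builds its diagram.

```python
from itertools import combinations, product


def to_vars(rows):
    """Encode a complete matrix as the set of variables x_sc equal to 1, as tuples (s, c)."""
    return frozenset((s + 1, c + 1) for s, r in enumerate(rows)
                     for c, v in enumerate(r) if v == "1")


def admits_dpp(rows):
    cols = [frozenset(s for s, r in enumerate(rows) if r[c] == "1")
            for c in range(len(rows[0]))]
    return all(not (a & b) or a <= b or b <= a for a, b in combinations(cols, 2))


def phylogeny_family(rows):
    """Family of variable sets, one per completion that admits a DPP (brute force)."""
    holes = [(s, c) for s, r in enumerate(rows) for c, v in enumerate(r) if v == "?"]
    family = set()
    for bits in product("01", repeat=len(holes)):
        grid = [list(r) for r in rows]
        for (s, c), b in zip(holes, bits):
            grid[s][c] = b
        done = ["".join(r) for r in grid]
        if admits_dpp(done):
            family.add(to_vars(done))
    return family


for member in sorted(phylogeny_family(["10", "?1", "0?"]), key=sorted):
    print(sorted(member))
```

Each printed set always contains x11 and x22, the known 1-entries. It differs only in the unknown variables x21 and x32, which are never both 1.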

Perfect Phylogenies and ZDD
With the variables x_sc (x_sc = 1 if and only if species s has character c), the lemma [Jansson, 2008] becomes: M admits a directed perfect phylogeny if and only if, for every two distinct characters c_i and c_j, at least one of the following three conditions holds.
A) For all species s, if x_{s,c_i} = 1 then x_{s,c_j} = 1 (C_i ⊆ C_j).
B) For all species s, if x_{s,c_i} = 0 then x_{s,c_j} = 0 (C_j ⊆ C_i).
C) For all species s, if x_{s,c_i} = 1 then x_{s,c_j} = 0 (C_i and C_j are disjoint).
(Figure: the three cases drawn as nested or disjoint sets C_i and C_j.)
The conditions are not mutually exclusive: when C_i = C_j, both A and B hold.
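Conditions A, B and C are per-species implications, so they can be evaluated column against column. The minimal Python sketch below (hypothetical names, matrix as a list of '0'/'1' strings) returns which of the three hold for a column pair. It then accepts a matrix when every pair satisfies at least one of them.

```python
def pair_cases(rows, i, j):
    """Evaluate conditions A, B, C for columns i and j of a complete 0/1 matrix."""
    xi = [r[i] == "1" for r in rows]
    xj = [r[j] == "1" for r in rows]
    a = all(not u or v for u, v in zip(xi, xj))     # A: x_{s,ci} = 1 implies x_{s,cj} = 1
    b = all(u or not v for u, v in zip(xi, xj))     # B: x_{s,ci} = 0 implies x_{s,cj} = 0
    c = all(not (u and v) for u, v in zip(xi, xj))  # C: x_{s,ci} = 1 implies x_{s,cj} = 0
    return a, b, c


def satisfies_lemma(rows):
    m = len(rows[0])
    return all(any(pair_cases(rows, i, j)) for i in range(m) for j in range(i + 1, m))


print(pair_cases(["10", "11", "00"], 0, 1))  # (False, True, False): C2 is inside C1
print(pair_cases(["11", "11", "00"], 0, 1))  # (True, True, False): equal columns
print(satisfies_lemma(["10", "11", "01"]))   # False: overlapping but not nested
```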

Experiments
Instances: incomplete data constructed from complete data.
– Complete data: random data sets [Hudson, 2002].
– Each "1" or "0" is replaced by "?" with probability p ∈ {0.1, 0.2, 0.3, 0.4, 0.5}.
– Matrix size (n, m) ∈ {50, 100} × {50, 100}.
– 100 instances for each triple (n, m, p).
Implementation: the B&B algorithm is written in C; the ZDD approach is written in C++ using the ZDD library developed by Minato.
Machine:
– OS: SuSE Linux Enterprise Server 10
– CPU: Quad-Core AMD Opteron Processor 8393; #CPUs 16, #processors 32, clock frequency 3092 MHz
– Memory: 512 GB
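The instance generation step can be written in a few lines. This Python sketch assumes that each entry is hidden independently with probability p and uses a seeded generator so instances are reproducible. The function name and the toy input are hypothetical; the random data sets of [Hudson, 2002] are not reproduced here.

```python
import random


def hide_entries(rows, p, seed=None):
    """Replace each '0'/'1' entry by '?' independently with probability p."""
    rng = random.Random(seed)  # seeded generator for reproducible instances
    return ["".join("?" if rng.random() < p else v for v in row) for row in rows]


complete = ["0101", "1100", "0011"]  # toy matrix, not data from the experiments
print(hide_entries(complete, 0.3, seed=42))
```

Since `rng.random()` returns a value in [0, 1), p = 0 leaves the matrix unchanged and p = 1 hides every entry.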

Experimental Results
Table: the number of solved instances for B&B and for the ZDD approach, with one column per matrix size (m, n) = (50, 50), (50, 100), (100, 50), (100, 100) for each method and one row per value of p (the numeric entries are not recoverable). "Solved" means that the algorithm halts successfully within the timeout of 2 minutes.

Experimental Results (two slides of plots; their contents are not recoverable)

Experimental Results
The size of the ZDD is … times smaller than the number of perfect phylogenies.

Conclusion
Our results:
– Two enumeration algorithms: a branch-and-bound algorithm (B&B) and a ZDD approach.
– The ZDD approach solved more instances than B&B.
– The running time of the ZDD approach grows with the size of the ZDD.
– The ZDD achieves a high compression rate on the random data.
– A proof that the counting problem is #P-hard.
Thank you for your attention!