An Algorithm for the Consecutive Ones Property Claudio Eccher.

2 An Algorithm for the Consecutive Ones Property Claudio Eccher

3 Outline
1. C1P definition
2. Biological background
   - Hybridization mapping
3. An algorithm for the C1P problem
   - Dividing in components
   - Taking care of a component
   - Joining the components together

4 The consecutive ones property
Definition: A binary matrix is said to have the consecutive ones property (C1P) if a permutation of its columns can be found such that all 1s in each row are consecutive.
Original matrix:
     A  B  C  D
1    1  0  0  1
2    0  1  0  1
3    1  0  1  0
After permuting the columns to C, A, D, B:
     C  A  D  B
1    0  1  1  0
2    0  0  1  1
3    1  1  0  0

5 The consecutive ones property
Observation: the C1P is closed under taking submatrices.
A bad matrix:
     C  A  D
1    0  1  1
2    1  0  1
3    1  1  0
Whichever column x is put in the middle, there is a row in which x is 0, so the two 1s of that row end up separated. Hence, every matrix containing this submatrix is 'bad'.
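The definition and the bad-matrix observation can be checked by brute force on small inputs. The sketch below is not the algorithm of this presentation: it simply tries every column order, so it is only usable for a handful of columns. The function names are illustrative.

```python
from itertools import permutations

def rows_consecutive(matrix, order):
    """Return True if every row's 1s are contiguous when columns follow `order`."""
    for row in matrix:
        ones = [pos for pos, col in enumerate(order) if row[col] == 1]
        if ones and ones[-1] - ones[0] + 1 != len(ones):
            return False
    return True

def brute_force_c1p(matrix):
    """Return one witnessing column order (tuple of column indices) or None."""
    for order in permutations(range(len(matrix[0]))):
        if rows_consecutive(matrix, order):
            return order
    return None

# Matrix of slide 4, with columns A, B, C, D as indices 0..3.
slide4 = [[1, 0, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0]]
# Bad matrix of slide 5.
bad = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(rows_consecutive(slide4, (2, 0, 3, 1)))  # column order C, A, D, B
print(brute_force_c1p(bad))
```

Because the C1P is closed under taking submatrices, appending any extra rows or columns to `bad` still leaves `brute_force_c1p` without a witness.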

6 Hybridization mapping (1)
- Copies of a DNA molecule are broken into several fragments (~$10^4$ bases), which are replicated by cloning (clones)
- The possible binding of small sequences (probes) to each clone is checked; the subset of probes bound (hybridized) to a clone becomes its fingerprint
- The clones' overlaps, and thus their relative order, are determined by comparing fingerprints

7 Hybridization mapping (2)
[Figure: Clone 1 and Clone 2 with probes A, D, C, B]
Two clones sharing part of their respective fingerprints are likely to have come from overlapping DNA regions.

8 Assumptions
- All “clones x probes” hybridization experiments have been done
- There are no errors
- Probes are unique

9 Model
- n clones and m probes
- n x m binary matrix M built from experimental data
- $M_{ij} = 1 \iff$ probe j hybridized to clone i
- $M_{ij} = 0 \iff$ probe j did not hybridize to clone i
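As a small illustration of the model, the matrix can be built from one fingerprint per clone. The data below is hypothetical and chosen only so that the result equals the matrix of slide 4; `build_matrix` is an illustrative name.

```python
def build_matrix(fingerprints, probes):
    """Build M: one row per clone, one column per probe.

    fingerprints: list of sets, the probes that hybridized to each clone.
    probes: list fixing the column order.
    """
    return [[1 if probe in fp else 0 for probe in probes] for fp in fingerprints]

probes = ["A", "B", "C", "D"]
fingerprints = [{"A", "D"}, {"B", "D"}, {"A", "C"}]  # hypothetical experiment
M = build_matrix(fingerprints, probes)
for row in M:
    print(row)
```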

10 Problem
- Determining if M has the C1P for rows
- Finding a permutation of the columns such that all 1s in each row are consecutive
- Obtaining a physical map from M

11 An algorithm for the C1P problem
- The problem belongs to P
- The algorithm is from Fulkerson and Gross (1965)
- Without loss of generality we can assume that:
  - All rows are different
  - No row is all zeros
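The two "without loss of generality" assumptions can be enforced by a preprocessing pass. This is a minimal sketch with an illustrative function name; duplicate and all-zero rows do not affect whether a column order makes the 1s consecutive, so dropping them is safe.

```python
def normalize_rows(matrix):
    """Drop all-zero rows and repeated rows, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for row in matrix:
        key = tuple(row)
        if any(key) and key not in seen:
            seen.add(key)
            kept.append(list(row))
    return kept

print(normalize_rows([[0, 0, 0], [1, 1, 0], [1, 1, 0], [0, 1, 1]]))
```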

12 Algorithm sketch
1. Separating the rows into components (subsets of rows)
2. Permuting the columns of each component
3. Joining the components together

13 Row relations
Definition: for each row i, $S_i = \{\text{columns } k \mid M_{i,k} = 1\}$
Given two rows i and j:
1. $S_i \cap S_j = \emptyset$, or
2. $S_i \subseteq S_j$ or $S_j \subseteq S_i$, or
3. $S_i \cap S_j \neq \emptyset$ and neither of them is a subset of the other
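The three relations translate directly into set operations. A minimal sketch, with illustrative names and 1-based column numbers; the sample rows are l1 to l4 of the example matrix used later in the presentation.

```python
def column_set(row):
    """S_i: the 1-based indices of the columns in which the row holds a 1."""
    return {k + 1 for k, bit in enumerate(row) if bit == 1}

def relation(s_i, s_j):
    """Classify two column sets as 'disjoint', 'contained' or 'overlap'."""
    if not (s_i & s_j):
        return "disjoint"          # case 1
    if s_i <= s_j or s_j <= s_i:
        return "contained"         # case 2
    return "overlap"               # case 3

l1 = column_set([1, 1, 0, 1, 1, 0, 1, 0, 1])
l2 = column_set([0, 1, 1, 1, 1, 1, 1, 1, 1])
l3 = column_set([0, 1, 0, 1, 1, 0, 1, 0, 1])
l4 = column_set([0, 0, 1, 0, 0, 0, 0, 1, 0])
print(relation(l1, l2), relation(l1, l3), relation(l1, l4))
```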

14 Dividing in components (1)
Let's initially lump together in the same component the rows with non-empty intersection.
If $\exists$ a row k s.t. $S_k \cap S_i = \emptyset$ or $S_k \subseteq S_i$ $\forall i \neq k$ in this component, then row k can be put in its own component.

15 Dividing in components (2)
- A graph $G_c = (V, E)$ is built from matrix M
- Each vertex in V is a row of M
- There is an undirected edge in E between $V_i$ and $V_j$ if $S_i \cap S_j \neq \emptyset$ and neither of them is a subset of the other
- The components we want are the connected components of $G_c$

16-19 Building $G_c$: an example
     c1 c2 c3 c4 c5 c6 c7 c8 c9
l1    1  1  0  1  1  0  1  0  1
l2    0  1  1  1  1  1  1  1  1
l3    0  1  0  1  1  0  1  0  1
l4    0  0  1  0  0  0  0  1  0
l5    0  0  1  0  0  1  0  0  0
l6    0  0  0  1  0  0  1  0  0
l7    0  1  0  0  0  0  1  0  0
l8    0  0  0  1  1  0  0  0  1
Edges of $G_c$, added one per slide: $(l_1, l_2)$, $(l_4, l_5)$, $(l_6, l_7)$, $(l_6, l_8)$.
Connected components of $G_c$: {l1, l2}, {l3}, {l4, l5}, {l6, l7, l8}.
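The construction of $G_c$ on the example matrix can be reproduced with a few lines of code. This sketch compares all pairs of rows directly and makes no attempt to match the running time quoted later in the presentation; the function names are illustrative and rows are 0-indexed (row 0 is l1).

```python
MATRIX = [
    "110110101",  # l1
    "011111111",  # l2
    "010110101",  # l3
    "001000010",  # l4
    "001001000",  # l5
    "000100100",  # l6
    "010000100",  # l7
    "000110001",  # l8
]

def column_sets(rows):
    """One set S_i of 1-based column indices per row."""
    return [{k + 1 for k, ch in enumerate(row) if ch == "1"} for row in rows]

def overlap_graph(sets):
    """Adjacency sets of G_c: an edge when two sets intersect and neither contains the other."""
    adj = {i: set() for i in range(len(sets))}
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            a, b = sets[i], sets[j]
            if a & b and not a <= b and not b <= a:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def connected_components(adj):
    """Connected components via an iterative depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps

adj = overlap_graph(column_sets(MATRIX))
print(connected_components(adj))
```

The edges found are exactly the four named on the slides, and l3 ends up alone in its component.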

20 Taking care of a component (1)
     c1 c2 c3 c4 c5 c6 c7 c8
l1    0  1  0  0  0  0  1  1
l2    0  1  0  0  1  0  1  0
l3    1  0  0  1  0  0  1  1
[Figure: graph on l1, l2, l3]
The 1s of the first row have to be placed consecutively. The possible solutions can be represented as follows:
{2,7,8}
l1 …01110…
The second row is adjacent to the first one. Hence, for the second row ($l_2$) there are 2 choices: its 1s can be placed to the left or to the right of those of row $l_1$. Either way, the direction does not really matter:
{5}{2,7}{8}
l1 …001110…
l2 …011100…

21 Taking care of a component (2)
For the third row ($l_3$) we have to consider its relations with the rows connected to it by edges.
Let's place $l_3$ with respect to $l_2$: we are not free to choose either direction (left or right), because of the relation of $l_3$ with $l_1$.
To take the relation between $l_1$ and $l_3$ into account, it is necessary to consider the number of elements in the intersections between $S_1$, $S_2$ and $S_3$.

22 Taking care of a component (3)
Definition: Let $x \cdot y = |S_x \cap S_y|$ be the internal product of rows x and y.
- If $l_1 \cdot l_3 < \min(l_1 \cdot l_2, l_2 \cdot l_3)$, then $l_3$ has to be placed in the same direction in which $l_2$ was placed with respect to $l_1$
- If $l_1 \cdot l_3 > \min(l_1 \cdot l_2, l_2 \cdot l_3)$, then $l_3$ has to be placed in the direction opposite to that in which $l_2$ was placed with respect to $l_1$
- If we have equality, it isn't possible to have the 1s of $l_3$ consecutive

23 Taking care of a component (4)
For $l_3$: $S_3 = \{1,4,7,8\}$, $l_1 \cdot l_3 = 2$, $l_1 \cdot l_2 = 2$, $l_2 \cdot l_3 = 1$, so $l_3$ has to be put to the right of $l_2$:
{5}{2}{7}{8}{1,4}
l1 …00111000…
l2 …01110000…
l3 …00011110…
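The internal-product rule is easy to state in code. A minimal sketch, assuming directions are encoded as the strings "left" and "right" (an encoding chosen here for illustration); it returns `None` in the equality case, where the slides say the 1s of the third row cannot be made consecutive.

```python
def internal_product(s_x, s_y):
    """x . y = |S_x intersected with S_y|."""
    return len(s_x & s_y)

def direction_for_third(s1, s2, s3, dir_l2_wrt_l1):
    """Direction in which l3 must be placed with respect to l2.

    dir_l2_wrt_l1 is the direction in which l2 was placed with respect to l1.
    """
    opposite = {"left": "right", "right": "left"}
    lhs = internal_product(s1, s3)
    rhs = min(internal_product(s1, s2), internal_product(s2, s3))
    if lhs < rhs:
        return dir_l2_wrt_l1            # same direction
    if lhs > rhs:
        return opposite[dir_l2_wrt_l1]  # opposite direction
    return None                         # equality: no valid placement

# Worked example of the slides: l2 was placed to the left of l1.
S1, S2, S3 = {2, 7, 8}, {2, 5, 7}, {1, 4, 7, 8}
print(direction_for_third(S1, S2, S3, "left"))
```

Here the three products are 2, 2 and 1, so the first exceeds the minimum of the other two and the direction is reversed, matching the placement of $l_3$ to the right of $l_2$.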

24 Taking care of a component (5)
- We had no choice in placing $l_3$
- The only choice made was in the placement of $l_2$ with respect to $l_1$, and both possibilities result in the same solutions up to reversal
- Therefore, if the component has the C1P, then $l_1$ and $l_3$ must end up properly placed
- If, on the contrary, $l_1$ and $l_3$ are not properly placed, then we conclude that the component (and hence the matrix) doesn't have the C1P

25 String generator
We have seen the following examples of string generators:
{2,7,8}
{{5}{2,7}{8}}
{{5}{2}{7}{8}{1,4}}
A permutation p of the probes is compatible with a string generator if, whenever A, B, C appear in this order in p and A and C are in a group G, then B is also included in G.
An invariant of the algorithm is that, after considering rows 1..k, a permutation p certifies the C1P of the submatrix on rows 1..k iff either p or its reversal is compatible with the string generator.
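One way to experiment with string generators is to enumerate the column orders they stand for. The sketch below reads a generator as a left-to-right sequence of groups whose internal order is free; this reading is an interpretation of the notation, not something stated on the slides, and the function names are illustrative. Only the columns mentioned in the generator are ordered.

```python
from itertools import permutations, product

def expand(generator):
    """Yield every column order: groups stay in sequence, columns inside a group vary."""
    per_group = [list(permutations(sorted(group))) for group in generator]
    for choice in product(*per_group):
        yield [col for block in choice for col in block]

def is_consecutive(column_set, order):
    """True if the columns of `column_set` occupy adjacent positions in `order`."""
    positions = sorted(order.index(col) for col in column_set)
    return positions[-1] - positions[0] + 1 == len(positions)

# Third example generator of this slide and the three rows it was built from.
generator = [{5}, {2}, {7}, {8}, {1, 4}]
rows = [{2, 7, 8}, {2, 5, 7}, {1, 4, 7, 8}]

orders = list(expand(generator))
for order in orders:
    print(order, all(is_consecutive(s, order) for s in rows))
```

Every enumerated order, and its reversal, keeps the 1s of all three rows consecutive, which is what the invariant promises for this component.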

26 Taking care of a component: a ‘bad’ component
     c1 c2 c3 c4 c5 c6 c7 c8
l1    0  1  1  0  0  0  1  1
l2    0  1  0  0  1  0  1  0
l3    1  0  0  1  0  0  1  1
The relations between the rows are the same as in the preceding component.
{5}{2,7}{8,3}
l1 …00110…
l2 …01100…
{5}{2}{7}{8}{3}{1,4}
l1 …00111100…
l2 …01110000…
l3 …00011011…

27 Taking care of a component (6)
- For a new row k in the same component, find two previously placed rows i and j s.t. $\exists$ edges $(k,i)$ and $(i,j)$ in $G_c$, and proceed as for the three-row case
- Check also the consistency with the solution generator
- The algorithm gives all possible permutations of a component having the C1P, up to reversal

28 Algorithm implementation
- Construct $G_c$ and traverse it using depth-first search
- When visiting a vertex, invoke procedure Place
- If column sets are not consistent, then the component doesn't have the C1P
Algorithm Place
  input: u, v, w vertices of $G_c = (V, E)$ s.t. $(u,v) \in E$ and $(v,w) \in E$
  output: A placement for row u, if possible
  if v = nil and w = nil then
    Place all 1s of u consecutively
  else if w = nil then
    Left- or right-place the 1s of u with respect to the 1s of v
    Record direction used
  else if u · w < min(u · v, v · w) then
    Place u with respect to v in the same direction used in v, w placement
    Record direction used
  else
    Place u with respect to v in the opposite direction used in v, w placement
    Record direction used
  Check consistency of column set
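The slide does not say how the depth-first search picks the arguments u, v, w of Place. The sketch below shows one possible choice, stated as an assumption: v is the DFS parent of u, and w is the parent of v, or, when v is the root, the first child of v that was already visited. It only produces the sequence of (u, v, w) triples and does not perform the placement itself; `dfs_triples` is an illustrative name.

```python
def dfs_triples(adj, root):
    """Return the (u, v, w) arguments for Place, in depth-first visiting order.

    adj maps each vertex of one component of G_c to the set of its neighbours.
    None plays the role of nil.
    """
    parent = {root: None}
    first_child = {}
    triples = []

    def visit(u):
        v = parent[u]
        if v is None:
            w = None                    # first row of the component
        elif parent[v] is not None:
            w = parent[v]               # (v, w) is a tree edge
        else:
            w = first_child.get(v)      # None for the second row placed
        triples.append((u, v, w))
        if v is not None:
            first_child.setdefault(v, u)
        for x in sorted(adj[u]):
            if x not in parent:
                parent[x] = u
                visit(x)

    visit(root)
    return triples

# Component {l6, l7, l8} of the example: edges (l6, l7) and (l6, l8).
component = {6: {7, 8}, 7: {6}, 8: {6}}
print(dfs_triples(component, 6))
```

In every triple with three vertices, (u, v) and (v, w) are edges of the component and both v and w were visited before u, which is what the input condition of Place requires.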

29 Algorithm running time
- For an n x m matrix, building graph $G_c$ takes O(nm) time
- Checking the consistency of column sets requires O(m) time per row, and there are n rows to process
- Total time is thus O(nm)

30 Joining components together (1)
Construct a new graph $G_M = (V, E)$ in which:
- Each component of M is a vertex in $G_M$
- For $\alpha, \beta \in V$, there is a directed edge from $\alpha$ to $\beta$ if, for every row $i \in \beta$, the set $S_i$ is contained in at least one set $S_j$ of $\alpha$
$G_M$ tells us how the components of M fit together.
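The edge rule of $G_M$ can be tried out on the example matrix. The sketch reads the rule as "every row set of the target component is contained in some row set of the source component", and takes the four components of the example as given; the names `SETS`, `COMPONENTS` and `component_edges` are illustrative.

```python
SETS = {
    "l1": {1, 2, 4, 5, 7, 9},
    "l2": {2, 3, 4, 5, 6, 7, 8, 9},
    "l3": {2, 4, 5, 7, 9},
    "l4": {3, 8},
    "l5": {3, 6},
    "l6": {4, 7},
    "l7": {2, 7},
    "l8": {4, 5, 9},
}
COMPONENTS = [("l1", "l2"), ("l3",), ("l4", "l5"), ("l6", "l7", "l8")]

def component_edges(sets, components):
    """Directed edges (a, b): each row set of b lies inside some row set of a."""
    edges = []
    for a in components:
        for b in components:
            if a != b and all(any(sets[i] <= sets[j] for j in a) for i in b):
                edges.append((a, b))
    return edges

for source, target in component_edges(SETS, COMPONENTS):
    print(source, "->", target)
```

On this input the only component without incoming edges is the one holding l1 and l2, so it is the single source of the graph.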

31 $G_M$ for the example matrix
[Figure: the example matrix (rows l1 to l8, columns c1 to c9) and the graph $G_M$ on its components]

32 Joining components together (2)
- For two sets $S_i \in \beta$, $S_j \in \alpha$, if $S_i \subseteq S_j$ then there is no row $k \in \alpha$ s.t. $S_i \not\subseteq S_k$ and $S_i \cap S_k \neq \emptyset$
- The exact same containment and disjointness relations hold for all other sets from $\beta$
- $G_M$ is acyclic

33 Joining components together (3)
- The joining of components depends on the way sets in one component contain or are contained in sets from other components
- Containment is specified by the directed edges in $G_M$
- Components having sets not contained anywhere else should be processed first

34 Joining components together (4)
- $G_M$ has to be processed in topological order
- Remove all sources from $G_M$ (in the example, the component {l1, l2}) and make the union of their string generators
- While $G_M$ is not empty, take the next source $\beta$, remove $\beta$ from $G_M$, and refine the current string generator with the string generator of $\beta$
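The processing order can be obtained by repeatedly removing sources, which is the standard queue-based topological sort (Kahn's algorithm). The sketch below covers only the ordering, not the refinement of string generators. The edge list is the one obtained by applying the containment rule to the example matrix; it is derived here rather than read off the figure, and the vertex names are illustrative.

```python
from collections import deque

def topological_order(vertices, edges):
    """Return the vertices in an order where every edge goes from earlier to later."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)  # the sources
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:        # w has just become a source
                ready.append(w)
    if len(order) != len(vertices):
        raise ValueError("graph has a cycle")
    return order

vertices = ["l1-l2", "l3", "l4-l5", "l6-l7-l8"]
edges = [
    ("l1-l2", "l3"),
    ("l1-l2", "l4-l5"),
    ("l1-l2", "l6-l7-l8"),
    ("l3", "l6-l7-l8"),
]
order = topological_order(vertices, edges)
print(order)
```

Several orders are valid for this graph; the worked example in the following slides joins the component of l6, l7, l8 before the component of l4, l5, which is equally acceptable.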

35 Example (1)
[Figure: the example matrix (rows l1 to l8, columns c1 to c9) and the graph $G_M$]
One topological order is {l1, l2}, {l3}, {l6, l7, l8}, {l4, l5}.

36 Example (2)
String generators of the components:
Component {l1, l2}: {1}{2,4,5,7,9}{3,6,8}
l1 …111111000…
l2 …011111111…
Component {l3}: {2,4,5,7,9}
l3 …11111…
Component {l6, l7, l8}: {9,5}{4}{7}{2}
l6 …00110…
l7 …00011…
l8 …11100…
Component {l4, l5}: {6}{3}{8}
l4 …011…
l5 …110…

37 Example (3)
Current string generator: {1}{2,4,5,7,9}{3,6,8}
l1 …111111000…
l2 …011111111…
l3 …011111000…
Next component: {9,5}{4}{7}{2}
l6 …00110…
l7 …00011…
l8 …11100…

38 Example (4)
Current string generator: {1}{9,5}{4}{7}{2}{3,6,8}
l1 …111111000…
l2 …011111111…
l3 …011111000…
l6 …000110000…
l7 …000011000…
l8 …011100000…
Next component: {6}{3}{8}
l4 …011…
l5 …110…

39 Example (5)
Final string generator: {1}{9,5}{4}{7}{2}{6}{3}{8}
l1 …111111000…
l2 …011111111…
l3 …011111000…
l6 …000110000…
l7 …000011000…
l8 …011100000…
l4 …000000011…
l5 …000000110…
In this particular case there are two solutions, corresponding to the permutation of the identical columns (5 and 9).
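The final result can be double-checked against the example matrix: both column orders described by the generator should make every row's 1s consecutive, and columns 5 and 9 should be the only identical pair. A minimal sketch with illustrative function names and 1-based column numbers:

```python
MATRIX = [
    "110110101",  # l1
    "011111111",  # l2
    "010110101",  # l3
    "001000010",  # l4
    "001001000",  # l5
    "000100100",  # l6
    "010000100",  # l7
    "000110001",  # l8
]

def ones_are_consecutive(row, order):
    """True if the row's 1s form a single block once columns follow `order`."""
    bits = "".join(row[col - 1] for col in order)
    return "0" not in bits.strip("0")

def identical_column_groups(rows):
    """Groups of 1-based column indices whose entries agree in every row."""
    groups = {}
    for k in range(len(rows[0])):
        pattern = "".join(row[k] for row in rows)
        groups.setdefault(pattern, []).append(k + 1)
    return [cols for cols in groups.values() if len(cols) > 1]

solutions = [
    [1, 9, 5, 4, 7, 2, 6, 3, 8],
    [1, 5, 9, 4, 7, 2, 6, 3, 8],
]
for order in solutions:
    print(order, all(ones_are_consecutive(row, order) for row in MATRIX))
print(identical_column_groups(MATRIX))
```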

40 Algorithm solution is not unique
In general multiple solutions may exist because:
- Each component may on its own have several solutions
- Each solution can be used in two ways: the permutation and its reversal

41 Algorithm running time
- Total time for processing each component $c_i$ is $O(n_i m)$
- Topological sorting of $G_M$ takes time O(n+m)
- If the entries of M are preprocessed, the queries needed for traversing $G_M$ can take constant time
- Preprocessing takes at most O(nm)
- Algorithm running time is O(nm)

42 Concluding remarks (1)
Even if a C1P permutation exists, it is not necessarily the true permutation:
- In general errors do exist, so the true permutation is not the C1P one
- The solution is not unique

43 Concluding remarks (2)
- Generalizations to account for errors yield NP-hard problems
- Relaxing the assumption of unique probes also yields NP-hard problems

44 Related works
- A considerably more complicated algorithm by Booth and Lueker (1976) exists that takes O(n+m+r) time (r is the total number of 1s)
- Quite recently a simple O(n+m+r)-time algorithm has been presented by Hsu, J. Algorithms 43 (2002), no. 1, 1-16

