1 Physical Mapping --An Algorithm and An Approximation for Hybridization Mapping Shi Chen CSE497 04Mar2004.

Presentation transcript:

1 Physical Mapping --An Algorithm and An Approximation for Hybridization Mapping Shi Chen CSE497 04Mar2004

2 Introduction Why physical mapping? -Physical mapping is a central problem in molecular biology. -DNA is cut into small fragments for replication and study, and information on their ordering is lost. -The goal of physical mapping is to reconstruct the relative ordering of the clones.

3 Introduction Two popular ways of obtaining fingerprints: - Restriction site analysis. Measure the fragment's length, which serves as its fingerprint. - Hybridization. Check whether a small sequence known as a probe binds (hybridizes) to the clone, which is a DNA fragment. Most often a probe is an STS (sequence tagged site): a DNA string of bp whose ends occur only once in the entire genome.

4 Models for Hybridization Mapping -Interval graph models: vertices represent clones and edges represent overlap information between clones. -Disadvantage: the resulting problems are NP-hard.

5 Models for Hybridization Mapping – C1P definition Definition: A binary matrix is said to have the consecutive ones property (C1P) if a permutation of its columns can be found such that all 1s in each row are consecutive. [Figure: a matrix with columns A B C D, shown again with its columns reordered as C A D B.]
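The definition can be checked directly on small matrices. The following is a minimal brute-force sketch, not the polynomial-time algorithm presented later in the slides; the function names `row_is_consecutive` and `find_c1p_permutation` are illustrative assumptions.

```python
from itertools import permutations


def row_is_consecutive(bits):
    """True if all 1s in the sequence form a single contiguous block."""
    ones = [k for k, v in enumerate(bits) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def find_c1p_permutation(matrix):
    """Try every column order; return the first one that makes each row's
    1s consecutive, or None if no such order exists (exponential time)."""
    m = len(matrix[0])
    for perm in permutations(range(m)):
        if all(row_is_consecutive([row[c] for c in perm]) for row in matrix):
            return perm
    return None


example = [[1, 0, 1, 0],
           [0, 1, 1, 0]]
perm = find_c1p_permutation(example)
```

Because it enumerates all m! column orders, this is only useful for checking tiny examples by hand.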

6 Models for Hybridization Mapping – C1P Assumptions for the Consecutive Ones Property (C1P) model: a. Probes are unique: a probe can bind to a clone in at most one place (use STSs, sequence tagged sites); b. No errors (a C1P permutation exists); c. All "clones × probes" hybridization experiments have been done, which is difficult to achieve. Advantage: polynomial-time solvable.

7 Models for Hybridization Mapping – C1P model With n clones and m probes, an n × m binary matrix M is built from experimental data: M_ij = 1 if probe j hybridized to clone i; M_ij = 0 if probe j did not hybridize to clone i. [Table: example binary matrix with rows l1 to l8 and columns c1 to c9.]
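As an illustration of how M could be assembled, here is a small sketch that assumes the experimental data arrives as a list of (clone, probe) index pairs; that input format and the name `build_matrix` are assumptions, not something the slides specify.

```python
def build_matrix(n_clones, m_probes, hybridizations):
    """Return the n x m binary matrix M with M[i][j] = 1 exactly when
    the pair (clone i, probe j) appears in the hybridization list."""
    M = [[0] * m_probes for _ in range(n_clones)]
    for clone, probe in hybridizations:
        M[clone][probe] = 1
    return M


# Hypothetical data: clone 0 binds probes 0 and 2, clone 1 binds probe 1.
M = build_matrix(2, 3, [(0, 0), (0, 2), (1, 1)])
```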

8 Algorithm for C1P - Introduction Goal – Find a permutation of the columns such that in each row all 1s are consecutive. Assumptions: All rows are different, i.e. no two clones have the same fingerprint. No row is all zeros, i.e. every clone is hybridized by at least one probe.
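The two input assumptions are easy to verify before running the algorithm. A minimal sketch follows; the helper name `check_assumptions` and its message strings are illustrative.

```python
def check_assumptions(matrix):
    """Report rows that violate the stated assumptions:
    all rows distinct, and no row consisting only of zeros."""
    problems = []
    first_seen = {}
    for i, row in enumerate(matrix):
        key = tuple(row)
        if not any(row):
            problems.append(f"row {i} is all zeros")
        if key in first_seen:
            problems.append(f"row {i} duplicates row {first_seen[key]}")
        else:
            first_seen[key] = i
    return problems
```

An empty list means the matrix satisfies both assumptions.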

9 Algorithm for C1P – Algorithm sketch Separation of the rows into components (subsets of rows). Permutation of the columns of each component. Join of the components together.

10 Algorithm for C1P – Row relations Definition: for each row i, S_i = {columns k | M_i,k = 1}. Given two rows i and j, exactly one of the following holds: 1. S_i ∩ S_j = ∅, or 2. S_i ⊆ S_j or S_j ⊆ S_i, or 3. S_i ∩ S_j ≠ ∅ and neither is a subset of the other. First case: i and j have no conflicts - they can be dealt with separately. Second case: i and j are compatible - any solution for the row with fewer 1s is acceptable. Third case: i and j have to be treated simultaneously - they are connected.
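The three cases, and the grouping of rows into components of connected rows, can be sketched as follows. This is a straightforward pairwise version for illustration (it compares every pair of rows, so it does not achieve the O(nm) bound quoted on a later slide); the names `column_set`, `relation` and `components` are assumptions.

```python
def column_set(row):
    """S_i: the set of column indices where the row has a 1."""
    return {k for k, v in enumerate(row) if v == 1}


def relation(si, sj):
    """Classify two rows by their column sets (cases 1, 2, 3 above)."""
    if not (si & sj):
        return "disjoint"      # case 1: no conflict
    if si <= sj or sj <= si:
        return "contained"     # case 2: compatible
    return "connected"         # case 3: must be treated together


def components(matrix):
    """Group row indices into components linked by 'connected' pairs."""
    sets = [column_set(r) for r in matrix]
    seen, result = set(), []
    for start in range(len(sets)):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(len(sets)):
                if j not in seen and relation(sets[i], sets[j]) == "connected":
                    seen.add(j)
                    stack.append(j)
        result.append(sorted(comp))
    return result


rows = [[1, 1, 0, 0],   # S = {0, 1}
        [0, 1, 1, 0],   # S = {1, 2}: overlaps row 0, neither contains the other
        [0, 0, 0, 1],   # S = {3}: disjoint from all others
        [1, 1, 1, 0]]   # S = {0, 1, 2}: contains rows 0 and 1
groups = components(rows)
```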

11 Algorithm for C1P – Taking care of a component Figure 5.7: Graph Gc corresponding to the matrix of Table 5.1. The vertices are the rows l1 to l8; the components are labeled α, β, γ, δ. Table 5.1: A binary matrix (rows l1 to l8, columns c1 to c9).

12 Algorithm for C1P – Example Matrix A section of a binary matrix (rows l1, l2, l3; columns c1 to c8). Column sets after placing l1: {2,7,8} {2,7,8} {2,7,8}. Column sets after placing l1 and l2: {5} {2,7} {2,7} {8}.

13 Algorithm for C1P – Example Matrix What will happen if we place 5 on the right? Column sets: {8} {7,2} {7,2} {5}.

14 Algorithm for C1P – Example Matrix How to place l3? Consider the number of elements in the intersections between S1, S2 and S3. Definition: Let x*y = |S_x ∩ S_y| be the internal product of rows x and y. -If l1*l3 < min(l1*l2, l2*l3), place l3 in the same direction that l2 was placed with respect to l1. -If l1*l3 > min(l1*l2, l2*l3), place l3 in the opposite direction that l2 was placed with respect to l1.

15 Algorithm for C1P – Example Matrix In our case S3 = {1,4,7,8}, so l1*l3 = 2, l1*l2 = 2, l3*l2 = 1. Since l1*l3 = 2 > min(l1*l2, l2*l3) = 1, l3 goes in the opposite direction: place l3 to the right of l2. Column sets: {5} {2} {7} {8} {1,4} {1,4}.
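The placement rule for a third row can be written out as a small function. In this sketch S3 is taken from the slide, while S1 = {2,7,8} and S2 = {2,5,7} are assumptions read off the column sets shown in the example; the function names and the 'left'/'right' encoding are also illustrative. The slides do not say what to do when the two quantities are equal, so the sketch refuses that case rather than guessing.

```python
def inner(sx, sy):
    """Internal product x*y = |S_x ∩ S_y|."""
    return len(sx & sy)


def place_third(s1, s2, s3, l2_side):
    """Side on which to place l3, given the side ('left' or 'right')
    on which l2 was placed with respect to l1."""
    opposite = {"left": "right", "right": "left"}
    a = inner(s1, s3)
    b = min(inner(s1, s2), inner(s2, s3))
    if a < b:
        return l2_side            # same direction as l2
    if a > b:
        return opposite[l2_side]  # opposite direction
    raise ValueError("equal values: case not covered by the rule as stated")


S1 = {2, 7, 8}      # assumed from the example's column sets
S2 = {2, 5, 7}      # assumed from the example's column sets
S3 = {1, 4, 7, 8}   # given on the slide
side = place_third(S1, S2, S3, l2_side="left")
```

With these sets the products are 2, 2 and 1, matching the slide, and the function returns "right".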

16 Algorithm for C1P – Complexity Building Graph Gc takes O(nm) time. Process n rows, spending O(m) per row to check consistency of column sets. Total time is O(nm).

17 Algorithm for C1P – Joining Components Together Figure 5.9: Graph G_M corresponding to the components (α, β, γ, δ) of the matrix from Table 5.1.

18 Algorithm for C1P – Joining Components Together Process G_M in topological order: -First process components that have sets that are not contained anywhere else. -When following an edge (α, β), find a "reference column" in component α that tells us how to place the rows of β. a. Choose the row l from β that has the leftmost 1, and call the column where this 1 is c_β. b. Find all rows from α that contain S_l, and find the leftmost column where all such rows have 1s; this column c_α is the reference column.

19 Algorithm for C1P – Joining Components Together Column sets for component α: {1} {2,4,5,7,9} {3,6,8}. Column sets for component β: {2,4,5,7,9}. After joining α and β (rows l1, l2, l3): {1} {2,4,5,7,9} {3,6,8}.

20 Algorithm for C1P – Joining Components Together Column sets for component δ (rows l6, l7, l8): {9,5} {4} {7} {2}. After joining δ (rows l1, l2, l3, l6, l7, l8): {1} {9,5} {4} {7} {2} {3,6,8}.

21 Algorithm for C1P – Joining Components Together Column sets for component γ (rows l4, l5): {6} {3} {8}. After joining all components α, β, δ, γ (rows l1 to l8): {1} {9,5} {4} {7} {2} {6} {3} {8}.

22 Algorithm for C1P – Joining Components Together Complexity: Topological sorting O(n+m); Preprocessing takes at most O(nm), e.g. store for each row the column where its leftmost 1 is; Total time O(nm).

23 Approximation for Hybridization Mapping with Errors A false negative splits a block of 1s into two blocks, creating another gap. Approach: find a permutation where the total number of gaps in the matrix is minimum.
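The objective can be made concrete with a gap counter. This sketch counts, for a given column order, the number of gaps as (blocks of consecutive 1s minus one) summed over rows; the name `count_gaps` and the example matrix are illustrative assumptions.

```python
def count_gaps(matrix, perm):
    """Total number of gaps over all rows when columns are ordered by perm.
    A row with b >= 1 blocks of consecutive 1s contributes b - 1 gaps."""
    total = 0
    for row in matrix:
        bits = [row[c] for c in perm]
        blocks = sum(
            1 for k, v in enumerate(bits)
            if v == 1 and (k == 0 or bits[k - 1] == 0)
        )
        total += max(blocks - 1, 0)
    return total


M = [[1, 0, 1],
     [1, 1, 0]]
gaps_identity = count_gaps(M, (0, 1, 2))   # row 0 reads 1 0 1: one gap
gaps_better = count_gaps(M, (1, 0, 2))     # rows read 0 1 1 and 1 1 0
```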

24 Approximation - Graph Model Gap minimization is equivalent to solving the traveling salesman problem (TSP). Table 5.3: A clones × probes matrix with added column p6* (rows c1 to c4, columns p1 to p5 and p6*).

25 Approximation - Graph Model Figure 5.10: TSP graph for the matrix of Table 5.3, with vertices p1 to p5 and p6*. The weight on each edge of G is the number of rows where the two corresponding columns differ.

26 Approximation - Graph Model -A gap: a transition from 1 to 0 and, further on, a transition from 0 to 1. There are two transitions for each gap, so each gap contributes 2 to the weight of the cycle. -Extremal transitions: transitions between elements in an extremal (1 or m) column. An extra column of zeros is included as column m+1 to ensure that every row has a pair of extremal transitions; this prevents consecutive 1s from wrapping around in a row.

27 Approximation - Graph Model -Relationship between cycles and permutations: cycle weight = number of gap transitions + 2n. For a given n, minimizing cycle weight is the same as minimizing the number of gaps. -Drawback: one or a few rows may have many gaps, while others may have none. That would mean one clone was subject to many more errors than the other clones, which contradicts laboratory experience. -Solution: minimize the number of gaps per row.
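The relationship between cycle weight and gaps can be checked numerically on a small matrix. The sketch below appends the extra zero column, weights each step of the cycle by the number of rows in which the two columns differ, and compares the result with 2 × (number of gaps) + 2n, since each gap accounts for two transitions. It assumes no row is all zeros; the matrix and function names are illustrative.

```python
from itertools import permutations


def column_distance(matrix, a, b):
    """Edge weight: number of rows where columns a and b differ."""
    return sum(1 for row in matrix if row[a] != row[b])


def cycle_weight(matrix, perm):
    """Weight of the cycle visiting columns in perm order, then the
    added all-zero column, then back to the start."""
    m = len(matrix[0])
    extended = [row + [0] for row in matrix]   # extra column m of zeros
    order = list(perm) + [m]
    return sum(
        column_distance(extended, order[k], order[(k + 1) % len(order)])
        for k in range(len(order))
    )


def count_gaps(matrix, perm):
    """Gaps = (blocks of consecutive 1s - 1), summed over rows."""
    total = 0
    for row in matrix:
        bits = [row[c] for c in perm]
        blocks = sum(1 for k, v in enumerate(bits)
                     if v == 1 and (k == 0 or bits[k - 1] == 0))
        total += max(blocks - 1, 0)
    return total


M = [[1, 0, 1],
     [1, 1, 0]]
n = len(M)
weight = cycle_weight(M, (0, 1, 2))
identity_holds = all(
    cycle_weight(M, p) == 2 * count_gaps(M, p) + 2 * n
    for p in permutations(range(3))
)
```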

28 Approximation - Guarantee -Assumptions: a. The number of probes is sufficiently large. b. The mapping process obeys a certain mathematical model. -Features: a. Each clone's position is an independent random variable; clone locations are distributed uniformly over [0, N-1]. b. Occurrences of a given probe obey a Poisson process with rate λ: Pr{a given probe occurs k times in a given clone} = e^(-λ) λ^k / k!.
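For reference, the Poisson probability above can be evaluated directly. This is a minimal sketch; the function name `poisson_pmf` and the rate used in the example are illustrative, not values from the slides.

```python
import math


def poisson_pmf(k, lam):
    """Pr{exactly k occurrences} = e^(-lam) * lam^k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Probability that a probe does not occur at all in a clone, for a
# hypothetical rate of 1.0 occurrences per clone.
p_zero = poisson_pmf(0, 1.0)
```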

29 Approximation - Guarantee The TSP permutation is a good approximation to the true permutation. The proof is in terms of graph weights or clone distances: t_ij = |l_j − l_i| + |r_j − r_i| = 2|l_j − l_i|, where t_ij is the true distance between clones i and j, l and r are a clone's left and right coordinates, and h_ij is the Hamming distance between clones i and j. Given any four clones i, j, r, and s: h_ij < h_rs implies t_ij < t_rs, and t_ij < t_rs implies h_ij < h_rs.

30 Approximation – Computational Practice Define hybridization graph H as a bipartite graph (U, V, E): Clones are the vertices of the U partition; Probes are the vertices of the V partition; There is an edge between two vertices if the corresponding probe hybridized to the corresponding clone.

31 Approximation – Computational Practice Figure 5.11: Hybridization graph H corresponding to the hybridization matrix from Table 5.3, without the added column (probe vertices p1 to p5, clone vertices c1 to c4). Table 5.3: A clones × probes matrix with added column p6*.

32 Approximation – Computational Practice Observations: a. H may not be connected; in that case we cannot tell the relative order of probes that belong to different components. b. A connected component may be as simple as a singleton vertex: a probe with no hybridization has a column of 0s. c. There may be redundant probes, i.e. probes that hybridize to exactly the same set of clones and so have identical columns of 1s and 0s.
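The three observations can be illustrated on a small matrix. The sketch below builds the bipartite graph H as an adjacency dictionary, finds its connected components, and groups probes with identical columns; the vertex labels ("c", i) and ("p", j), the function names, and the example matrix are assumptions made for illustration.

```python
from collections import defaultdict


def hybridization_graph(matrix):
    """Adjacency sets for H: clone i is ('c', i), probe j is ('p', j)."""
    adj = {("c", i): set() for i in range(len(matrix))}
    adj.update({("p", j): set() for j in range(len(matrix[0]))})
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v == 1:
                adj[("c", i)].add(("p", j))
                adj[("p", j)].add(("c", i))
    return adj


def connected_components(adj):
    """Connected components of H via depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def redundant_probes(matrix):
    """Groups of probe indices whose columns are identical."""
    groups = defaultdict(list)
    for j in range(len(matrix[0])):
        groups[tuple(row[j] for row in matrix)].append(j)
    return [g for g in groups.values() if len(g) > 1]


M = [[1, 1, 0, 0],
     [0, 0, 1, 0]]
comps = connected_components(hybridization_graph(M))
dupes = redundant_probes(M)
```

Here probes 0 and 1 are redundant, probe 3 is a singleton component, and the two clones fall in different components, so their probes cannot be ordered relative to each other.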

33 Approximation – Computational Practice Evaluation of a mapping algorithm is a difficult task. The fraction of strong adjacencies is used to measure a mapping algorithm. -Let b be the number of blocks of consecutive 1s present in a hybridization matrix with a given probe permutation π = p_1, p_2, …, p_m. -Translocations: operations that reverse the order of a set of consecutive probes. -Strong adjacency: two adjacent probes p_i and p_i+1 represent a strong adjacency if placing these probes apart by any translocation increases b in each row.

34 Approximation – Computational Practice Strong adjacency cost: 100 · (1/(m−1)) · Σ δ_i, summed over i = 1, …, m−1, where δ_i = 1 if p_i and p_i+1 form a strong adjacency in the true permutation but these probes are not adjacent in the proposed permutation, and δ_i = 0 otherwise.
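The cost can be computed as follows once it is known which adjacent pairs in the true permutation are strong. In this sketch that information is passed in as a list of booleans (`strong[i]` for the pair at positions i and i+1), which is an assumption about the input format; determining strong adjacencies from the matrix is not attempted here.

```python
def strong_adjacency_cost(true_perm, strong, proposed_perm):
    """100 * (1/(m-1)) * sum of delta_i, where delta_i = 1 when the i-th
    adjacent pair of true_perm is strong but the two probes are not
    adjacent in proposed_perm."""
    m = len(true_perm)
    position = {probe: k for k, probe in enumerate(proposed_perm)}
    missed = 0
    for i in range(m - 1):
        a, b = true_perm[i], true_perm[i + 1]
        if strong[i] and abs(position[a] - position[b]) != 1:
            missed += 1
    return 100.0 * missed / (m - 1)


true_order = ["p1", "p2", "p3", "p4", "p5"]
all_strong = [True, True, True, True]       # hypothetical: every pair strong
proposed = ["p1", "p2", "p3", "p5", "p4"]   # p3 and p4 are no longer adjacent
cost = strong_adjacency_cost(true_order, all_strong, proposed)
```

Only the pair (p3, p4) is lost in the proposed order, so one of the four strong adjacencies is missed.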

35 Approximation – Computational Practice Table 5.4: Strong adjacency costs for two algorithms on matrices with different kinds of errors. Error rates are indicated in the heading of each column (only one type of error per column). Coverage in all cases is 10, where coverage is the ratio between the total length of all clones and the target DNA length. Columns: C1P (0), Chimerism (0.5), False Positives (0.04), False Negatives (0.32). Rows: Greedy TSP, Random.

36 REFERENCES 1. Sections 5.3 and 5.4 in our textbook: Introduction to Computational Molecular Biology, Setubal/Meidanis. 2. On the Complexity of DNA Physical Mapping, Martin Charles Golumbic, Haim Kaplan and Ron Shamir, Advances in Applied Mathematics 15, (1994).
