Presentation is loading. Please wait.

Presentation is loading. Please wait.

Physical Mapping of DNA Shanna Terry March 2, 2004.

Similar presentations


Presentation on theme: "Physical Mapping of DNA Shanna Terry March 2, 2004."— Presentation transcript:

1 Physical Mapping of DNA Shanna Terry March 2, 2004

2 Overview Background Types of Mapping Mathematical Models Enhanced Double Digest Problem

3 Background Given a sequence of DNA, how do we figure out where on some larger chromosome the sequence lies? ?

4 Background II Look for markers that match in both the chromosome and the shorter sequence. –Markers: Usually short, precisely defined sequences match!

5 Background III How do we create the original map? Generate fingerprints (markers) with: –Restriction site mapping –Hybridization

6 Background IV But why? Can’t we just expand the sequence assembly techniques we’ve already learned? NO! (with one exception) Why not? A chromosome isn’t just 150k bps long. … Human chromosomes range in length from 51 million to 245 million base pairs.

7 Overview Background Types of Mapping Mathematical Models Enhanced Double Digest Problem

8 Restriction Site Mapping In this situation, the fingerprint is the length between restriction sites of given enzymes (recall from previous lectures). Make three copies of target DNA: strings A, B, C. Apply one enzyme (α) to string A, another (β) to string B, and both (α and β) to string C. Line up the fragments in A and B so they match C: this is the double digest problem. 6 14 7 5 8 3 15 4 6 2 3 9 7 1 4

9 Restriction Site Mapping II A variant is the partial digest approach: Use only one enzyme, but allow it to act for different time periods. Different restriction sites will be recognized. Fragment sites: 6, 20, 27, 32; 14, 21, 26; 7, 12; and 5 6 14 7 5

10 Restriction Site Mapping III 6 20 … 14 21 etc … Fragment sites: 6, 20, 27, 32; 14, 21, 26; 7, 12; and 5 6 14 7 5 6 6 14 14 14 7

11 Hybridization Mapping Check whether specific small sequences (called probes) bind (hybridize) to fragments (clones) The fingerprint is the subset of probes that successfully hybridize to the clone. If some portion of one clone’s fingerprint matches another, they are likely to be from overlapping regions of the target.

12 Hybridization Mapping II Probes x, y, z, bound to clone A; x, w and z bound to clone B… overlap in x and z. yx z w Except… we don’t know that much. We only know which probes bind to which clones. Not ordering or even relative lengths!

13 Background Types of Mapping Mathematical Models Enhanced Double Digest Problem

14 Restriction Site Models Back to the double digest problem: we’ve split the strings A, B, C into fragments with two enzymes. We have the multisets made up of the fragment lengths: –From previous example: A = {5, 6, 7, 14} B = {3, 4, 8, 15} C = {1, 2, 3, 4, 6, 7, 9} Find permutations of A and B such that there is a one-to-one correspondence between all the subintervals and C. Not too bad, right?

15 Restriction Site Models II BAD NEWS: The double digest problem is NP-complete. It is a generalization of the set-partition problem, already known to be NP-complete. To give you an idea…the number of solutions is (k-1)! for k = number of restriction sites. BUT… we will see a heuristic later…

16 Interval Graph Models Model hybridization mapping in terms of interval graphs –Interval graph: A graph G which is mapped from a series of intervals. For each interval there is a vertex in G. For each intersection of intervals there is an edge in G.

17 Interval Graph Models II Ex: a b c d e b a c e d

18 Interval Graph Models III To apply this to the hybridization mapping problem… We create graphs with vertices representing clones (fragments), and edges representing overlaps between clones. Two graphs: one for known overlaps and one with known and unknown overlap information (neither are necessarily interval graphs).

19 Interval Graph Models IV Now find the true interval graph (a subgraph of the known and unknown graph) given the two graphs. Known overlaps + Known/Unknown Actual Interval Overlaps Graph Hmm… Not too easy.

20 Interval Graph Models V MORE BAD NEWS! This is NP-hard. Maybe another model? –There are two other possible models for hybridization mapping (described in the book). But…. Those are NP-hard too!

21 Consecutive Ones Property We’re sick of NP hard problems. Give us something a little easier. The Consecutive Ones Property Model (C1P) can be solved in linear time! Assumptions: 1.The probes are unique. 2.There are no errors. (!!) 3.All of the correspondences of clones and probes have been found. (!!!)

22 Consecutive Ones Property II Build a matrix (n x m), n = # of clones, m = # of probes. Entry i,j is a binary code for whether probe j hybridized to clone i. above: probe 1 hybridized to clone 1, probe 2 hybridized to clone 1, probe 1 hybridized to clone 3, probe 4 hybridized to clone 3. 1101 0101 1011

23 Consecutive Ones Property III Find a permutation of the columns (probes) such that all the 1s in each row (clone) are consecutive. 1001 0101 1010 0110 0011 1100

24 Consecutive Ones Property IV This algorithm can be run in linear time! Unfortunately, the assumption that there are no errors isn’t useful because biology isn’t a mathematical model. Probes may not bind, DNA may be replicated incorrectly. And generalizations make the problem NP-hard again! We need a good heuristic…

25 Background Types of Mapping Mathematical Models Enhanced Double Digest Problem

26 The Enhanced Double Digest (EDD) problem is NP-hard in the general case, but if the lengths of fragments in C (the string acted upon by both types of enzymes) are distinct, it can be solved in linear time! Why do the fragments have to be distinct? … What if all the fragments are the same length?

27 Problem Formulation We have the multisets A and B. A = {6, 14, 7, 5} B = {8, 3, 15, 4} Take the actual fragments corresponding to each member of the either set (since the sets are only lengths). Apply the other enzyme to the fragment (i.e. apply enzyme β to fragments from A, and vice versa) to create subfragments. AB i is the multiset of subfragments created by applying enzyme β to fragments from A; BA j from applying enzyme α to fragments of B.

28 Problem Formulation II Example: 8 {2,6} 5 {1,4} A={5, 6, 7, 14} B={3, 4, 8, 15} AB 1 ={1,4}, AB 2 ={6}, AB 3 ={7}, AB 4 ={2,3,9} BA 1 ={3}, BA 2 ={4}, BA 3 ={2,6}, BA 4 ={1,7,9} 6 14 7 5 8 3 15 4 6 2 3 9 7 1 4

29 Algorithm Given A, B, AB i and BA j for all i, j, construct an undirected graph that connects each element of A and B to its corresponding AB/BA. Note that all elements in C will be covered A: 5 6 7 14 C: 1234679 B: 3 4 8 15

30 Algorithm II Create a spanning tree: Start at random node, follow all paths from the node, don’t repeat edges. 6 A 6 C 8 B 2 C 14 A 9 C 15 B 1 C 5 A 4 C 4 B 3 C 7 C 3 B 7 A

31 Properties The graph (G) will always be connected, and every node in A and B will only be adjacent to nodes from C. Each node from C connects to only one node each from A and B. If the problem can be solved: G will be a spanning tree, and any subtree that “hangs” on the longest path will be a 2-node length path (dangler).

32 Properties II Danglers: 6 A 6 C 8 B 2 C 14 A 9 C 15 B 1 C 5 A 4 C 4 B 3 C 7 C 3 B 7 A

33 Algorithm III If the graph G is not a spanning tree, and not all subtrees hanged off the longest path are danglers, then there is no valid permutation. Perform Dangler-first search on the graph G…

34 Dangler-First Search Traverse G starting at one end of a path S with the largest number of edges, reading only the nodes from C. Whenever reaching a node with degree greater than 2 (must have a dangler), read the nodes in C from the hanging danglers first, then continue to traverse S. This sequence is π c.

35 Dangler-First Search II π c = 6, 2, 3, 9, 7, 1, 4. 6 A 6 C 8 B 2 C 14 A 9 C 15 B 1 C 5 A 4 C 4 B 3 C 7 C 3 B 7 A

36 Algorithm IV The elements in each AB i form a consecutive subsequence in π c. Likewise, the elements in each BA j also form a consecutive subsequence in π c. –This permutation is a valid permutation… meaning: you have the answer!

37 Solution A= 6, 14, 7, 5 π c = 6, 2, 3, 9, 7, 1, 4 B= 8, 3, 15, 4 6 A 6 C 8 B 2 C 14 A 9 C 15 B 1 C 5 A 4 C 4 B 3 C 7 C 3 B 7 A 6 14 7 5 8 3 15 4 6 2 3 9 7 1 4

38 Enhanced Double Digest Problem II This can be solved in O(n) time! A generalization (assuming a constant number of duplicates) can also be solved in O(n) time. The general enhanced double digest problem is still NP-Hard. This can be shown by reduction from the Hamiltonian Path problem.


Download ppt "Physical Mapping of DNA Shanna Terry March 2, 2004."

Similar presentations


Ads by Google