Structural genomics includes the genetic mapping, physical mapping and sequencing of entire genomes.

1 Structural genomics includes the genetic mapping, physical mapping and sequencing of entire genomes

2 How to get a genomic library: breaking the DNA, cloning the fragments, and ordering them (fragments 1, …, 6 in the figure; labels: cloned DNA fragments, cleavage site). Let us cut the isolated DNA with a restriction enzyme at a low concentration, so that many sites remain uncut.
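Partial digestion can be pictured with a toy model: each restriction site is cut independently with some probability, so a low enzyme concentration (a low cut probability) leaves many sites uncut and produces long fragments, and digests of different genome copies produce fragments that overlap. This is a sketch under that independence assumption; `partial_digest`, its parameters and the site positions are hypothetical.

```python
import random


def partial_digest(genome_length, sites, cut_prob, rng):
    """Toy partial digest: each site is cut independently with probability cut_prob.

    Assumes `sites` are distinct positions strictly between 0 and genome_length.
    Returns the fragments as (start, end) intervals that tile the whole genome.
    """
    cuts = sorted(s for s in sites if rng.random() < cut_prob)
    bounds = [0] + cuts + [genome_length]
    return list(zip(bounds, bounds[1:]))


if __name__ == "__main__":
    sites = [120, 340, 515, 760, 905]  # hypothetical cleavage sites
    rng = random.Random(1)
    # Two independent digests of two genome copies: fragments from different
    # copies overlap, which is what later allows them to be ordered.
    for copy in (1, 2):
        print(copy, partial_digest(1000, sites, cut_prob=0.3, rng=rng))
```

With `cut_prob=1.0` every site is cut (complete digest, no overlaps between copies); lowering it lengthens the fragments.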

3 BAC Fingerprinting: Gel-based Fragment Separation. 96 samples and 25 marker lanes, with a marker every fifth lane (Marra et al., Genome Res., 7, 1072-1084 (1997)).
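The lane counts are mutually consistent: if a marker lane flanks every group of four sample lanes (an assumed layout that matches "a marker every fifth lane"), 96 samples need 96/4 + 1 = 25 markers, 121 lanes in all. A hypothetical helper that builds such a layout:

```python
def gel_lanes(n_samples, samples_per_group=4):
    """Lane layout with a marker lane ("M") before, between and after sample groups.

    Assumption: markers flank both ends of the gel, as in a marker-every-fifth-lane layout.
    """
    lanes = ["M"]
    for k in range(1, n_samples + 1):
        lanes.append(f"S{k}")
        if k % samples_per_group == 0:
            lanes.append("M")
    if lanes[-1] != "M":
        lanes.append("M")  # close a final partial group with a marker
    return lanes


lanes = gel_lanes(96)
print(len(lanes), lanes.count("M"))  # 121 25
```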

4 Distance functions. Clones as math vectors: each clone is a 0/1 vector over gel bins (A and B below). Hamming distance: $H(A,B) = \sum_{i=1}^{n} |A_i - B_i|$ (mutual overlap).
A: 0011101110110111
B: 1101110101111001
Limited fingerprinting resolution → bands shared by chance. Probability that at least one fragment will be shared by chance between clones A and B: $p = 1 - (1 - 1/t)^m$, where $t = L/(2R)$ is the number of bins on a gel of length L and R is the resolution.
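A minimal sketch of the distance functions on this slide, applied to the example vectors A and B. The function names are hypothetical, and `m` is simply passed in as a parameter because the slide does not define it.

```python
def hamming(a: str, b: str) -> int:
    """H(A, B) = sum_i |A_i - B_i| for equal-length 0/1 band vectors."""
    if len(a) != len(b):
        raise ValueError("band vectors must have the same length")
    return sum(abs(int(x) - int(y)) for x, y in zip(a, b))


def shared_bands(a: str, b: str) -> int:
    """Number of bins in which both clones have a band."""
    return sum(x == "1" and y == "1" for x, y in zip(a, b))


def chance_share_prob(m: int, gel_length: float, resolution: float) -> float:
    """p = 1 - (1 - 1/t)^m, with t = L / (2R) bins on a gel of length L."""
    t = gel_length / (2 * resolution)
    return 1.0 - (1.0 - 1.0 / t) ** m


A = "0011101110110111"
B = "1101110101111001"
print(hamming(A, B), shared_bands(A, B))  # 10 6
```

The two vectors disagree in 10 of 16 bins and share a band in 6; the coarser the resolution (fewer bins t), the larger the chance-sharing probability p.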

5 Genome physical mapping problems are computationally challenging: “… We have been looking at the assemblies of large genomes … and for every ‘draft’ genome we look at, we find hundreds - and sometimes thousands - of mis-assemblies.” Salzberg & Yorke (2005) Beware of mis-assembled genomes. Bioinformatics, 21: 4320-4322

6 Bioinformatics and Human Factors
- Reading the scores
- Clustering (contig assembly)
- Ordering the clusters
- Merging contigs
- Anchoring (getting genetic and physical maps together)
- Verification of mapping results (at each stage)
Which factors may affect the quality of a physical map? Where can bioinformatics help?

7 “Mapping” means “positioning” based on some distance. The major mapping steps: fingerprinted clones $C_k$, $k = 1, \dots, 100000$ → distances $d_{ij}$ for $(C_i, C_j)$ based on shared bands → clustering (high stringency) → ordering (high stringency) → merging (lower stringency) → anchoring and verification.

8 P-value of clone overlaps. Sulston score (Sulston et al., 1988): $p = 1 - (1 - 1/N)^{n(c_2)}$ is the probability that a given band of clone $c_1$ coincides by chance with a band of clone $c_2$; $n(c)$ is the number of bands in clone c, and N is the total number of distinguishable bands.
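The slide gives only the per-band coincidence probability p. The Sulston score is commonly stated as a binomial tail: the chance that at least M of the $n(c_1)$ bands of one clone coincide with bands of the other. The sketch below assumes that form, with $n_1$ the band count of the clone with fewer bands and bands falling independently into N positions; the function names are hypothetical.

```python
from math import comb


def coincidence_prob(n2, n_bins):
    """p = 1 - (1 - 1/N)^{n(c2)}: chance that one band of c1 matches some band of c2."""
    return 1.0 - (1.0 - 1.0 / n_bins) ** n2


def sulston_score(n1, n2, shared, n_bins):
    """Probability of at least `shared` coincident bands by chance (binomial tail).

    Assumption: n1 is the band count of the clone with fewer bands, n2 that of
    the other clone, and bands fall independently into n_bins positions.
    """
    p = coincidence_prob(n2, n_bins)
    return sum(
        comb(n1, j) * p**j * (1.0 - p) ** (n1 - j)
        for j in range(shared, n1 + 1)
    )


if __name__ == "__main__":
    # Hypothetical clones: 25 and 30 bands, 4000 distinguishable band positions.
    for m in (5, 10, 15):
        print(m, sulston_score(25, 30, m, 4000))
```

Smaller scores mean the observed overlap is less likely to be chance, so a cutoff such as 1e-25 keeps only highly significant overlaps.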

9 Approximation of the exact model of random clone overlap (figure: IoE approximation vs. Wendl’s exact theory, J. Comput. Biol. 2005, 12: 283-297)

10 Band abundances: an unexploited source for improving mapping quality (figure: 3B)

11 Adaptive Clustering. Varying the cutoff: increasing rather than decreasing stringency (figure labels: 1100; 244 protected clusters).
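One way to picture clustering with increasing stringency: form clusters as connected components of significant overlaps at a starting cutoff, then re-split any cluster that still looks problematic at a stricter cutoff. The trigger used here (cluster size above `max_size`) and the step `factor` are hypothetical stand-ins, not the criterion used on the slide.

```python
def components(clones, scores, cutoff):
    """Connected components of clones linked by overlaps with score < cutoff."""
    parent = {c: c for c in clones}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), score in scores.items():
        if a in parent and b in parent and score < cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for c in clones:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())


def adaptive_clusters(clones, scores, cutoff, max_size, factor=1e-5, floor=1e-300):
    """Re-cluster oversized clusters at progressively stricter cutoffs.

    Hypothetical criterion: a cluster larger than max_size is re-split with
    cutoff * factor, until it fits or the cutoff would drop below `floor`.
    """
    result = []
    for cluster in components(clones, scores, cutoff):
        stricter = cutoff * factor
        if len(cluster) > max_size and stricter >= floor:
            result.extend(adaptive_clusters(cluster, scores, stricter, max_size, factor, floor))
        else:
            result.append(cluster)
    return result


scores = {("a", "b"): 1e-40, ("b", "c"): 1e-12, ("c", "d"): 1e-40}
print(adaptive_clusters(["a", "b", "c", "d"], scores, cutoff=1e-10, max_size=2))
# [['a', 'b'], ['c', 'd']]
```

At 1e-10 the weak b-c overlap joins everything into one cluster; at the stricter 1e-15 it is dropped and the cluster splits.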

12 Network representation of significant clone overlaps: vertices correspond to clones and edges to significant clone overlaps.
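The net can be held as an adjacency map from each clone to the clones it significantly overlaps. A minimal sketch, assuming pairwise overlap scores are already available as a dictionary; the default cutoff of 1e-25 borrows the Sulston-score cutoff mentioned for the wheat 1B contigs, and `overlap_net` is a hypothetical name.

```python
def overlap_net(scores, cutoff=1e-25):
    """Vertices are clones; an edge joins two clones whose overlap score < cutoff."""
    net = {}
    for (a, b), score in scores.items():
        net.setdefault(a, set())  # every scored clone becomes a vertex
        net.setdefault(b, set())
        if score < cutoff:  # only significant overlaps become edges
            net[a].add(b)
            net[b].add(a)
    return net


scores = {("c1", "c2"): 1e-40, ("c2", "c3"): 1e-30, ("c1", "c3"): 1e-8}
net = overlap_net(scores)
print({clone: sorted(nbrs) for clone, nbrs in net.items()})
# {'c1': ['c2'], 'c2': ['c1', 'c3'], 'c3': ['c2']}
```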

13 Network representation of significant clone overlaps, wheat 1B (legend: clones; clones from the selected diametric path (MTP))

14 Identification of putative Q-clones and Q-overlaps

15 Identification of contig non-linearity (wheat 1BS, Ctg13; figure labels: diam, width). The net of significant clone overlaps is used to find the diametric path and calculate the width of the net. Width > 1 is diagnostic for a non-linear cluster.

16 Identification of contig non-linearity. Diametric path: calculate ranks $r_j = r_j(c_i)$ for all clones $c_j$ relative to clone $c_i$ (the number of significant-overlap steps from $c_i$ to $c_j$). The diametric path (MTP) is the shortest path through significant clone overlaps connecting clones $c_i$ and $c_j$ with maximal $r_j(c_i)$. Width of net: maximal rank relative to the diametric path. Width > 1 indicates a non-linear cluster.
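The rank, diametric-path and width definitions map directly onto breadth-first search: ranks are BFS levels from $c_i$, the diametric path is a shortest path between a pair with maximal rank, and the width is the largest BFS level when all path clones are used as sources at once. A sketch assuming the cluster's net is connected and given as an edge list; the function names and toy clones are hypothetical. It runs one BFS per clone, i.e. O(V·(V+E)).

```python
from collections import deque


def build_net(edges):
    """Undirected net of significant overlaps as an adjacency list."""
    net = {}
    for a, b in edges:
        net.setdefault(a, []).append(b)
        net.setdefault(b, []).append(a)
    return net


def ranks_from(net, source):
    """BFS: rank of every clone = number of overlap steps from `source`."""
    rank, parent = {source: 0}, {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in net[u]:
            if v not in rank:
                rank[v] = rank[u] + 1
                parent[v] = u
                queue.append(v)
    return rank, parent


def diametric_path(net):
    """Shortest path between the pair (c_i, c_j) with maximal rank r_j(c_i)."""
    best_rank, best_end, best_parent = -1, None, None
    for ci in net:
        rank, parent = ranks_from(net, ci)
        cj = max(rank, key=rank.get)
        if rank[cj] > best_rank:
            best_rank, best_end, best_parent = rank[cj], cj, parent
    path, node = [], best_end
    while node is not None:  # walk BFS parents back to c_i
        path.append(node)
        node = best_parent[node]
    return path[::-1]


def net_width(net, path):
    """Maximal rank of any clone relative to the path (multi-source BFS)."""
    rank = {c: 0 for c in path}
    queue = deque(path)
    while queue:
        u = queue.popleft()
        for v in net[u]:
            if v not in rank:
                rank[v] = rank[u] + 1
                queue.append(v)
    return max(rank.values())


linear = build_net([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("x", "b"), ("x", "c")])
branched = build_net([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("c", "p"), ("p", "q"), ("q", "r")])
for name, net in (("linear", linear), ("branched", branched)):
    path = diametric_path(net)
    print(name, path, net_width(net, path))
# linear ['a', 'b', 'c', 'd', 'e'] 1
# branched ['a', 'b', 'c', 'p', 'q', 'r'] 2
```

In the linear toy net the parallel clone x sits one step off the path (width 1); in the branched one clone e is two steps away (width 2), flagging non-linearity.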

17 Identification of contig non-linearity: example with a Q-clone

18 Identification of contig non-linearity. Using the net of significant clone overlaps, for each clone $c_i$ calculate ranks $r_{ij}$ for all clones $c_j$. Diametric path: for the pair of clones with maximal $r_{ij}$, identify the shortest path through significant clone overlaps (MTP). Width of net: maximal rank relative to the diametric path. Width > 1 is diagnostic for a non-linear cluster. (PAG-19, 2011)

19 “Linearization” by removing the clones responsible for cluster branching

20 Reducing genome mapping (linear ordering) problems to the traveling salesman problem (TSP). Order 1: a b c d e f g h k l m n, map length $l_1$; Order 2: b a c d e f g h k l m n, map length $l_2$; …; Order N: f c m h e a g n k l b d, map length $l_N$. For n = 60 clones there are $N = 60!/2 \approx 4 \cdot 10^{81}$ orders (an order and its reverse give the same map). The problem: how to choose the best (true) order, i.e., the one that gives the map of minimal length?
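A toy illustration of the objective: the map length of an order is the sum of distances between neighbouring clones (an open-path TSP), and exhaustive search has to score all n!/2 distinct orders, which is only feasible for a handful of clones. The clone positions and function names below are hypothetical.

```python
from itertools import permutations
from math import factorial


def map_length(order, dist):
    """Length of a map: sum of distances between neighbouring clones."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def best_order_bruteforce(dist):
    """Exhaustive search over the n!/2 distinct orders (toy sizes only)."""
    n = len(dist)
    best, best_len = None, float("inf")
    for perm in permutations(range(n)):
        if perm[0] > perm[-1]:
            continue  # an order and its reverse give the same map
        length = map_length(perm, dist)
        if length < best_len:
            best, best_len = perm, length
    return best, best_len


# Hypothetical clones with known positions on a line.
pos = [0, 3, 1, 7, 5]
dist = [[abs(p - q) for q in pos] for p in pos]
print(best_order_bruteforce(dist))  # ((0, 2, 1, 4, 3), 7)
print(f"{factorial(60) // 2:.3e}")  # 4.160e+81
```

The minimal-length order recovers the clones sorted by position; real mapping needs heuristic TSP solvers because the search space explodes.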

21 Example: A Contig

22 Re-sampling based order verification. Excluding parallel clones allows constructing a stable “skeleton” map and specifying the coordinates of all clones relative to this map.

23 Testing the FPC contigs using LTC: wheat 1B

24 Testing the FPC contigs using LTC: wheat 1B

25 Testing the FPC contigs using LTC. Wheat 1B: some FPC contigs have a non-linear topological structure, inconsistent with the linear structure of the chromosome. Q-clones?

26 Testing the FPC contigs using LTC. FPC contigs with non-linear topology, and even cycles (Ctg2). Edges represent significant overlaps (Sulston score cutoff 1e-25). Increasing the stringency up to 1e-75 does not help here in obtaining a non-trivial linearization!

27 Problematic contigs (simulated maize)

28 [Figure: items numbered 1–47 around the Yr15 region; markers Xuhw258, Xuhiuw264, Xuhiuw265, Xuhw259, Xuhw264-5-T7, Xuhw264-3-T7; contigs #3, #28, #4, #5, #6, #7; Brachypodium synteny-based markers; French clones-based markers; 450 Kb ?]

