Presentation is loading. Please wait.

Presentation is loading. Please wait.

A hierarchical approach to building contig scaffolds Mihai Pop Dan Kosack Steven L. Salzberg Genome Research 14(1), pp. 149-159, 2004.

Similar presentations


Presentation on theme: "A hierarchical approach to building contig scaffolds Mihai Pop Dan Kosack Steven L. Salzberg Genome Research 14(1), pp. 149-159, 2004."— Presentation transcript:

1 A hierarchical approach to building contig scaffolds Mihai Pop Dan Kosack Steven L. Salzberg Genome Research 14(1), pp. 149-159, 2004.

2 Sequencing pipeline Random sequencing un-related reads 500-700 base-pairs Assembly un-related contigs 5,000-10,000 base-pairs Scaffolding un-related scaffolds 30,000-50,000 base-pairs Finishing/gap closure completed genomes millions-billions of base- pairs

3 Scaffolding Given a set of non-overlapping contigs order and orient them along a chromosome II IIIIIV I II III IV

4 Clone-mates Clone Insert F R FR III R I F F R I

5 Scaffolder output Sequencing gaps Physical gaps

6 Problems with the data Incorrect sizing of inserts cut from gel – sizing is subjective error increases with size Chimeras (ends belong to different inserts) biological reasons (esp. for large sized inserts) sample tracking (human error) Software must handle a certain error rate.

7 Theoretical abstraction Given a set of entities (reads/contigs) and constraints between them (overlaps/mate pairs) provide a linear/circular embedding that preserves most constraints.

8 Graph representation Nodes: contigs Directed edges: constraints on relative placement of contigs – relative order and relative orientation Embedding: order (coordinate along chromosome) and orientation (strand sampled)

9 Challenges Orientation – node coloring problem (forward/reverse) feasibility – no cycles with odd number of “reversal” edges (blue edges) optimality – remove minimum number of edges such that a solution exists (NP-hard)

10 Challenges Ordering – generate a linear embedding feasibility – lengths of parallel DAG paths are consistent optimality – remove minimum number of edges such that DAG is feasible (NP-hard)

11 The real world Use of scaffolds Analysis – longest unambiguous sub-graphs Finishing – present all “reliable” relationships between contigs Sources of error mis-assemblies sizing errors (increases with library size) chimeras

12 Ambiguous scaffold I II III IIIIII I IIIII I I’ II

13 Hierarchical scaffolding 1. For each contig pair, consolidate all linking data into a single relationship – 2 correct links required

14 Hierarchical scaffolding 2. Use most reliable links to build scaffolds 3. Repeatedly build super-scaffolds based on less reliable linking data

15 Rationale Hierarchical step Problem complexity problem size (#nodes) error rate

16 Linking information Overlaps Mate-pair links Similarity links Physical markers Gene synteny reference genome physical map

17 BAMBOO (BAMBUS) Best effort Attempt Multiple Branches allowed Order, Orient

18 Inputs Set of contigs: names and lengths Groups of contig links: groups correspond to “quality” of links link: relative distance between contig origins relative orientation of contigs Priorities for each group – specify order in which links are considered BEEB min <= dist <= max

19 Outputs XML representation of layout: contig orientations contig position (x-coordinate of contig origin) links used to construct layout Graphical display of the layout uses GraphViz package from AT&T

20 1.0 release XML input not yet supported All scaffold placed in the same output file Only Linux executable released Hacker friendly

21 Current release: 2.33 XML input and more general output module Collection of input modules from common assembly formats Better handling of priority data Repeat masking features More platforms supported and source code released as open source http://amos.sourceforge.net

22 Future enhancements Option to generate un-ambiguous (non- branching) scaffolds Better layout algorithms Specialized drawing tools Interactive browser Represent/handle multiple haplotypes

23 Acknowledgements Dan Kosack Steven Salzberg Martin Shumway Hean Koo Luke Tallon Jessica Vamathevan


Download ppt "A hierarchical approach to building contig scaffolds Mihai Pop Dan Kosack Steven L. Salzberg Genome Research 14(1), pp. 149-159, 2004."

Similar presentations


Ads by Google