Advancing Science with DNA Sequence
Maize Missouri 17 chromosome 10 project update
Dan Rokhsar
3 October 2006

Aims: Plan A
Generate and annotate gene space for the ~180 Mbp chromosome 10 of Mo17 using a random shotgun approach from flow-sorted chromosomes.
This resource will complement the BAC-by-BAC sequencing of B73, informing our understanding of intra-species variation, from SNPs to chromosomal organization.
The project will serve as a pilot R&D study for chromosome-scale random shotgun sequencing of complex genomes.

Challenges
Produce high-quality shotgun library from a single chromosome (year 1)
  – Apply flow sorting methods to root tip preparations or oat-maize hybrid lines with maize Mo17-10
Assemble shotgun sequences and relevant mapping data to recover non-repetitive and distinguishable repetitive regions (years 1-2)
  – DuPont Mo17 BAC library, BAC-end sequence
  – Targeted mapping to link across complex repeats
Targeted finishing of gene space from whole-chromosome-shotgun draft (year 2)
  – Interplay of finishing with annotation

Project goals for researchers and breeders
Unlimited markers for mapping
Nearly complete gene set for Mo17-10
Conserved synteny/chromosome dynamics with sorghum
Evolutionary approaches empowered
Novel reagents begin to emerge
Framework for understanding strain differences

Milestones: Year 1
  – Produce test libraries from mock flow sorted material (JGI)
  – Produce preliminary flow sorting data for discussion at Advisory Committee meetings (NFCR)
  – Produce 1-10 micrograms of flow sorted chromosome 10 material (NFCR)
  – Complete library production (JGI)
  – Begin shotgun sequencing, with associated data deposition (JGI)

Milestones: Year 2
  – Complete initial shotgun assembly, with associated data deposition (JGI)
  – Integrate with physical map data from DuPont (JGI)
  – Complete two rounds of primer walking (SHGC)
  – Annotate initial draft assembly, with data release (JGI)
  – Complete subsequent rounds of targeted finishing reactions (SHGC)
  – Complete physical mapping of markers and release to public repositories (PGML)
  – Produce final assembly incorporating finishing data (JGI, SHGC)
  – Publish detailed analyses of Maize Genome Project outcomes (all)
  – Offer summer course on maize genome data (JGI)

Problems at first step
First milestone from plan A not met
  – Flow sorting system is going …
  – But no significant progress toward chromosome flow sorting at preparative scale
  – Some small-scale root tip chromosome preps have been done, but they are not ready to scale up
  – Three months of chromosome preps (~10,000 root tips) would be needed to obtain even a few tenths of a microgram of DNA for a first chromosome-specific cloning attempt, with no guaranteed outcome
  – JGI library group would prefer more material for robust shotgun library prep (minimum of several µg); previous chromosome-specific lambda cloning (Arumuganathan) is more forgiving but still gave low coverage (2X)
  – Attempted to contract to Dolezel's group in the Czech Republic, but their capacity is taken up with wheat BAC preps; they are willing to advise. Arumuganathan is now doing human cell sorting, not working with chromosome preps, and cannot take on the task.

Even in expert hands, purity of chromosome prep is 85-90%
Li, Arumuganathan, et al. Flow cytometric sorting of maize chromosome 9 from an oat-maize chromosome addition line. TAG (2001).

Proposal for Plan B
Continue development of flow sorting of chromosome 10, but decouple it from sequencing plans in the current project
Produce ~3/4 X random whole genome shotgun sequence of Mo17 in plasmid and fosmid paired ends (mix TBD)
  – ~3 months to bulk prep DNA, make libraries, do quality control testing/sampling (Jan 2007)
  – <3 months to schedule and perform production sequencing run (Apr 2007)
Note: JGI is not in a position to take on significant BAC-based shotgun from the B73 project
  – perhaps a few hundred clones, maybe ~1% of project

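As a rough guide to what ~3/4 X random shotgun coverage provides, the idealized Lander-Waterman (Poisson) model can be sketched as below. This is an assumption layered on the slide, not part of it: it treats reads as landing uniformly at random and ignores repeats, cloning bias, and read-length edge effects. The function names are illustrative.

```python
import math


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)


def fraction_covered_at_least_twice(coverage: float) -> float:
    """Expected fraction of bases hit by two or more reads (Poisson model)."""
    return 1.0 - math.exp(-coverage) * (1.0 + coverage)


if __name__ == "__main__":
    c = 0.75  # the ~3/4 X depth proposed for Plan B
    print(f"covered >=1x: {fraction_covered(c):.3f}")
    print(f"covered >=2x: {fraction_covered_at_least_twice(c):.3f}")
```

Under these assumptions, 0.75X touches roughly half of the bases at least once and under a fifth of them two or more times.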
Alignment of Mo17 gene space with B73 allele: ~97% identity
[Figure: BLAST-style pairwise alignment of a Mo17 sequence (Query, positions 1-640) against the B73 allele (Sbjct), with scattered mismatches and short indels; putative aminotransferase, Morgante et al.]
In unique genic regions (especially coding sequence), can easily align Mo17 and B73 to detect polymorphism.
Cf. comparable human-chimp alignments at ~98.5%

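The identity figure quoted for an alignment like this one can be computed from the two gapped rows. The sketch below is a minimal version that counts identical, non-gap columns over the full alignment length; other conventions (for example, excluding gap columns from the denominator) give slightly different numbers. The toy sequences are hypothetical, not taken from the slide.

```python
def percent_identity(query: str, subject: str, gap: str = "-") -> float:
    """Percent of alignment columns where both rows carry the same base.

    Both strings must be rows of the same alignment (equal length, gaps
    written as `gap`). Comparison is case-insensitive, since soft-masked
    bases are often shown in lowercase.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    if not query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(query.upper(), subject.upper())
        if q == s and q != gap
    )
    return 100.0 * matches / len(query)


if __name__ == "__main__":
    # Hypothetical 6-column alignment: one mismatch, one gap, four matches.
    print(round(percent_identity("ACGT-A", "ACCTTA"), 2))
```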
Likely outcomes of Plan B
Align Mo17 shotgun to emerging B73 draft (at quarterly intervals)
  – Should be easy to recognize allelic variants in non-repetitive (i.e., genic) regions, based on Morgante et al. results. Expect unique coverage of ~40% of B73 sequence. (Alternative: MeF, C0t)
  – In a typical genic locus of 5 kb, conservatively expect ~100 mismatches or indels. Dense polymorphism allows rapid development of multiple markers per gene. (Distribute via Gramene, NCBI)
  – Repeat copies within B73 are ~90-99% identical to one another, so identifying allelic repeats will be difficult given ~97% identity between Mo17 and B73 alleles. (Attempt to localize sisters of unique reads based on B73 map.)
  – In places where both ends of a clone are alignable, can confirm local colinearity of B73 and Mo17, or identify rearrangements and/or deletions (à la human-chimp comparison, but expect worse)
  – Mo17 fosmid clones with localized ends will be available for distribution and/or targeted sequencing of loci of interest
  – Potential start towards Mo17 WGS if desirable

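The marker-density estimate on this slide follows from simple arithmetic: at ~97% identity, about 3% of aligned positions differ. The sketch below reproduces that calculation; it assumes differences are spread evenly along the locus, which real loci will not obey exactly, and the function names are illustrative.

```python
def expected_differences(locus_bp: int, identity: float) -> float:
    """Expected number of differing positions in a locus of `locus_bp` bases."""
    return locus_bp * (1.0 - identity)


def mean_spacing_bp(locus_bp: int, n_differences: float) -> float:
    """Average distance between differences if they were evenly spaced."""
    return locus_bp / n_differences


if __name__ == "__main__":
    locus = 5_000  # typical genic locus size quoted on the slide
    at_97 = expected_differences(locus, 0.97)
    print(f"expected differences at 97% identity: {at_97:.0f}")
    # The slide's conservative figure of ~100 differences per 5 kb:
    print(f"mean spacing at 100 differences: {mean_spacing_bp(locus, 100):.0f} bp")
```

At 97% identity the naive expectation is about 150 differences per 5 kb, so the slide's ~100 is indeed on the conservative side and still implies a candidate marker roughly every 50 bp.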
JGI Sorghum update
Sorghum WGS currently at ~7X (in Trace Archive)
  – mostly small insert plasmids sequenced to date
BAC-end and fosmid-end sequences coming by end 2006
  – but uniformity of BAC library is in question, may limit assembly
Quick and dirty assemblies look good using skeleton of method proposed for maize
  – ~13 kb contigs and ~300 kb scaffolds (N50 #s) at ~5X
  – considerable scaffolding even without much BAC/fosmid data
  – recovering ~2/3 of genome is easy even setting aside difficult repeats, as predicted for maize
  – Expect full 8X assembly (with map integration) ready late Q
Quick and dirty annotation: ~42,000 genes in low copy families
  – plus >100K retrotransposon-ish genes even in easy-to-assemble regions

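The contig and scaffold sizes above are N50 values. For reference, N50 under its usual definition is the length L such that pieces of length L or longer contain at least half of the total assembled bases. A minimal sketch follows; the example lengths are made up for illustration and are not sorghum data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        # The first piece that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # total 20; 6 + 5 = 11 >= 10
```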
Early peek at Sorghum-rice comparison shows syntenic segments
[Figure: plot with axes labeled "Transversions/synonymous site" and "Loci in syntenic block"]
Sorghum-Rice syntenic segments are of uniform molecular age
Comparable to human-chicken divergence
Younger than Rice-Rice paralogs (from cereal-specific duplication)

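The divergence measure on this slide counts transversions, that is, substitutions between a purine (A, G) and a pyrimidine (C, T). The sketch below only classifies substitutions between two aligned sequences; it does not restrict the count to synonymous sites, which the slide's metric does and which would require a codon-aware alignment. The function name is illustrative.

```python
from typing import Tuple

PURINES = frozenset("AG")
BASES = frozenset("ACGT")


def classify_substitutions(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Count (transitions, transversions) between two aligned sequences.

    Columns containing gaps or ambiguity codes are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in BASES or b not in BASES:
            continue
        # Same chemical class (purine<->purine or pyrimidine<->pyrimidine)
        # is a transition; crossing classes is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    print(classify_substitutions("ACGT", "GCTT"))  # A>G transition, G>T transversion
```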
Maize divergences (transversions)
[Figure: maize divergence from rice, Arabidopsis, sugarcane, and sorghum]
Maize: 7,960 complete/29,922 partial peptides
Sorghum: 5,927 complete/19,681 peptides
Sugarcane: 6,566 complete/21,850 peptides
~16,000 gene families at base of grasses
~12,000 families defined by rice/arabidopsis/poplar