Solanum lycopersicum Chromosome 4 Sequencing Update UK-SOL– Dec 2008 Wellcome Trust Medical Photographic Library.


1 Solanum lycopersicum Chromosome 4 Sequencing Update UK-SOL– Dec 2008 Wellcome Trust Medical Photographic Library

2 Overview: summary of the project from WTSI; BAC selection, contig building and QC checking have moved to Imperial College London; call for your help in annotation.

3 Clone by Clone Sequencing Strategy: BAC library; mapped markers; overlapping clones anchored by mapped markers; minimal tiling path; subcloning and shotgun sequencing; computer assembly (paired plasmid reads); "finishing" (order contigs, gap closure, sequence quality); contiguous sequence with < 1 error in 10,000 (... TAGCTGTGTACGATGATC ...).
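The finishing target on this slide, fewer than 1 error in 10,000 bases, can be expressed on the Phred scale using the standard definition Q = -10 log10(P). The sketch below shows that conversion only; the function names are illustrative and are not taken from any WTSI pipeline.

```python
import math


def error_probability_to_phred(p_error: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(p_error)


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled quality score back to an error probability."""
    return 10.0 ** (-q / 10.0)


# Finishing standard quoted on the slide: 1 error in 10,000 bases.
finishing_error_rate = 1 / 10_000
finishing_q = error_probability_to_phred(finishing_error_rate)

print(f"Error rate {finishing_error_rate:g} is about Q{finishing_q:.0f}")
```

Under this definition the slide's threshold is equivalent to requiring finished sequence at roughly Q40 or better.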

4 Overview of Clone Pipeline: Sequences Uploaded to SGN; BAC Registry Updated; Ready for Annotation; Plasmid Prep; Sequencing & Processing; Clone DNA Prep; Digest Confirmation; Library Construction (plasmid); Clone Selection and Verification; Clones entered into pipeline; Shotgun Sequencing; Finished Sequence; Final EMBL submission; "Complete Sequence"; HTGS Phase 3; Mapping; Subcloning; Sequence Contigs >2Kb available on Sanger FTP site and Public Databases; "Sequencing in Progress"; BACs assigned to chr4 sequencing project on SGN BAC registry; Sequence Improvement; Contig Orientation and Gap Closure; Confirmation of Assembly (QC); Finishing; HTGS Phase 1; HTGS Phase 2.

5 BAC Library & Map Resources
BAC Libraries
Library | No. of clones | Average Insert | Genome equivalents | Fingerprints | End Sequenced?
LE_HBa | 129,024 | 117 kb | 15 X | 10x (88,000 AGI) | Yes (188,130)
SL_MboI | 52,992 | 135 kb | 7 X | 5x (43,000 WTSI) | Yes (112,507)
SL_EcoRI | 72,264 | 95-100 kb | 7 X | - | Yes (101,375)
Tomato EXPEN-2000 map
- 2585 mapped markers across genome
- 242 Chr4 mapped markers
- Overgo analysis at Cornell
FPC Map Construction
- 1st FPC map build of HindIII Library by Arizona Genomics Institute
- 2nd FPC map build incorporated MboI Library mid 2006 at WTSI
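The "genome equivalents" figures for the three BAC libraries can be approximated as (number of clones x average insert size) / genome size. The sketch below is a rough check under stated assumptions: the slide does not give a genome size, so 950 Mb is assumed here, and the midpoint of 95-100 kb is used for SL_EcoRI.

```python
# Assumption: the slide gives no genome size; ~950 Mb is used for illustration.
ASSUMED_GENOME_SIZE_BP = 950_000_000

# (number of clones, average insert size in bp), taken from the slide.
libraries = {
    "LE_HBa": (129_024, 117_000),
    "SL_MboI": (52_992, 135_000),
    "SL_EcoRI": (72_264, 97_500),  # assumption: midpoint of 95-100 kb
}


def genome_equivalents(n_clones: int, mean_insert_bp: int,
                       genome_size_bp: int = ASSUMED_GENOME_SIZE_BP) -> float:
    """Total cloned DNA divided by genome size."""
    return n_clones * mean_insert_bp / genome_size_bp


coverage = {name: genome_equivalents(n, insert)
            for name, (n, insert) in libraries.items()}

for name, fold in coverage.items():
    print(f"{name}: about {fold:.1f}x")
```

With these assumptions the estimates land close to the 15 X, 7 X and 7 X quoted on the slide; a different assumed genome size would shift all three proportionally.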

6 Fosmid Library End Sequencing. 150 plates (1-150) end sequenced at WTSI, December 2007: approx. 57,600 fosmids (115,200 FES). Fosmid analysis (January 2008): 107,681 reads with a total of 70,900,576 bp, giving an average length of 658.4 bp after quality and vector clipping (60.3% of bases repeat masked). Hits within existing chromosome 4 BACs: 380 fosmids with good read pair alignments (within expected size range); 31 fosmids with bad read pair alignments. Hits to a single end: 48 single ends from fosmids with only 1 end sequenced; 1027 single ends from fosmids where only 1 end is found.
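The fosmid figures on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes 384-well plates, which the slide does not state but which matches 150 plates giving about 57,600 fosmids; the variable names are illustrative.

```python
PLATES = 150
WELLS_PER_PLATE = 384  # assumption: 384-well plates (150 * 384 = 57,600)

fosmids = PLATES * WELLS_PER_PLATE
expected_end_reads = 2 * fosmids  # one read from each end of every fosmid

# Figures quoted on the slide (after quality and vector clipping).
reads_after_clipping = 107_681
total_bp = 70_900_576

mean_read_length = total_bp / reads_after_clipping
read_pass_rate = reads_after_clipping / expected_end_reads

# Read pairs that hit existing chromosome 4 BACs.
good_pairs, bad_pairs = 380, 31
good_pair_fraction = good_pairs / (good_pairs + bad_pairs)

print(f"Fosmids: {fosmids}, expected end reads: {expected_end_reads}")
print(f"Mean clipped read length: {mean_read_length:.1f} bp")
print(f"Reads surviving clipping: {read_pass_rate:.1%}")
print(f"Good read pairs among paired hits: {good_pair_fraction:.1%}")
```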

7 Selection of Minimum Tile Path: fingerprinted BACs and markers; framework markers in FPC; seed BAC anchored by marker (Cornell); overlaps identified by FPC and BES alignment; overlaps verified by colony PCR; further BACs anchored by hybridisation to marker sequences and FISH.
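Once clones have been placed on a map, choosing a minimum tile path is essentially an interval-covering problem. The sketch below uses a simple greedy rule (start from the leftmost clone, then repeatedly take the overlapping clone that reaches furthest right). It is an illustration only: the coordinates, clone names and `min_overlap` parameter are hypothetical, and the real selection relied on FPC fingerprints, BES alignments and colony PCR verification rather than this code.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end) in arbitrary map units


def minimal_tiling_path(clones: List[Clone], min_overlap: int = 0) -> List[Clone]:
    """Greedy tiling path: extend with the overlapping clone reaching furthest right."""
    if not clones:
        return []
    # Leftmost clone acts as the seed; on ties prefer the longer clone.
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    path = [ordered[0]]
    while True:
        current_end = path[-1][2]
        # Candidates must overlap the current clone by at least min_overlap
        # and must extend the path to the right.
        candidates = [c for c in ordered
                      if c[1] <= current_end - min_overlap and c[2] > current_end]
        if not candidates:
            break  # either the map is covered or there is a gap
        path.append(max(candidates, key=lambda c: c[2]))
    return path


# Hypothetical clones for illustration.
example_clones = [
    ("A", 0, 100), ("B", 20, 110), ("C", 80, 190),
    ("D", 150, 260), ("E", 185, 240),
]
tile_path = minimal_tiling_path(example_clones, min_overlap=10)
print([name for name, _, _ in tile_path])
```

In this example clone B is skipped because C overlaps A and extends further, and E is rejected because its overlap with C is below the required minimum.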

8 Increasing Map Coverage using PseudoGoldenPath (PGP) Analysis. Diagram labels: map; gap; sequenced clones; bridging clones identified from BES alignments to sequence.
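The idea of a bridging clone can be sketched as follows: a clone whose two end sequences align to two different sequenced contigs spans the gap between them. The data structure and names below are hypothetical and are not the PGP implementation; a real analysis would also check alignment quality, orientation and insert size.

```python
from typing import Dict, List, Optional, Tuple

# Clone name -> (contig hit by one end read, contig hit by the other end read).
# None means that end had no acceptable alignment.
EndHits = Dict[str, Tuple[Optional[str], Optional[str]]]


def find_bridging_clones(end_hits: EndHits) -> Dict[Tuple[str, str], List[str]]:
    """Group clones whose two ends align to two different contigs."""
    bridges: Dict[Tuple[str, str], List[str]] = {}
    for clone, (first, second) in end_hits.items():
        if first is None or second is None or first == second:
            continue  # single-end hit, or both ends inside the same contig
        pair = tuple(sorted((first, second)))
        bridges.setdefault(pair, []).append(clone)
    return bridges


# Hypothetical alignments for illustration.
example_hits: EndHits = {
    "clone_001": ("contig_A", "contig_B"),
    "clone_002": ("contig_A", "contig_A"),
    "clone_003": ("contig_B", None),
    "clone_004": ("contig_B", "contig_A"),
}
bridges = find_bridging_clones(example_hits)
print(bridges)
```

Gaps supported by several independent bridging clones would be the more convincing candidates for selecting a clone to sequence.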

9 FISH Map for Chr 4 on SGN. FISH is used to confirm BAC assignment to chr 4 and to confirm contig order along chr 4. Credits: Steve Stack; Dora Szinay, Hans de Jong.

10 FISH Map for Chr 4 on SGN. FISH is used to confirm BAC assignment to chr 4 and to confirm contig order along chr 4.

11 WTSI Tomato Clone Pipeline 2006-2008
Number of BACs by pipeline stage (values in column order Dec 2006, Dec 2007, Dec 2008)
Subcloning: 35, 17
Shotgun: 14, 3
Assembly Start: 18
Auto-prefinishing: 168
Finishing: 9, 10, 8
QC Checking: 12
Finished: 18, 86, 174
Total: 94, 134, 182
HTGS: Phase 1, Phase 2, Phase 3

12 Chr 4 Map and Sequence Update
- Chromosome 4 estimate: 19 Mb of euchromatin
- 80 contigs with sequence
Measure | December 2006 | December 2007 | December 2008
Total sequence | 5,007,106 bp | 12,590,598 bp | 19,018,752 bp
Unique sequence | 4,860,935 bp | 11,789,635 bp | 18,778,752 bp
Total Finished Length | 1,963,352 bp | 9,211,278 bp | 18,056,067 bp
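The three yearly snapshots can be compared more easily as derived quantities: redundant (overlapping) sequence, the finished fraction of the total, and unique sequence relative to the 19 Mb euchromatin estimate. The sketch below computes these from the slide's figures; the derived metrics and names are illustrative choices, not values reported in the presentation.

```python
EUCHROMATIN_ESTIMATE_BP = 19_000_000  # chromosome 4 estimate from the slide

# Figures from the slide, in bp.
snapshots = {
    "Dec 2006": {"total": 5_007_106, "unique": 4_860_935, "finished": 1_963_352},
    "Dec 2007": {"total": 12_590_598, "unique": 11_789_635, "finished": 9_211_278},
    "Dec 2008": {"total": 19_018_752, "unique": 18_778_752, "finished": 18_056_067},
}


def summarize(snapshot: dict) -> dict:
    """Derive overlap and percentage metrics for one snapshot."""
    return {
        "redundant_bp": snapshot["total"] - snapshot["unique"],
        "finished_pct": 100.0 * snapshot["finished"] / snapshot["total"],
        "unique_vs_estimate_pct": 100.0 * snapshot["unique"] / EUCHROMATIN_ESTIMATE_BP,
    }


summary = {date: summarize(values) for date, values in snapshots.items()}

for date, metrics in summary.items():
    print(f"{date}: redundant {metrics['redundant_bp']:,} bp, "
          f"finished {metrics['finished_pct']:.1f}% of total, "
          f"unique sequence {metrics['unique_vs_estimate_pct']:.1f}% of estimate")
```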

13 Distribution of Contigs
Legend: centromere; euchromatin; heterochromatin
{62 markers}: 29 contigs, 55 BACs (27 markers)
{41 markers}: 11 contigs, 59 BACs (16 markers)
{124 markers}: 36 contigs, 63 BACs (61 markers)
Unordered: 4 contigs (5 BACs)
227 markers mapped to Chr_4
Average Contig Length = 250Kb
Average BACs/Contig = 2.3
Largest Contigs = ~450-500Kb

14 Some facts and figures ~81 contigs (80 contigs with sequence available). Average contig length is just under 250 kb. The average number of BACs per contig is 2.3. The largest sequence contigs are in the range of 450kb-500kb with 5 or 6 BACs.
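These summary figures can be cross-checked against the per-region counts given on the contig distribution slide and the December 2008 total sequence. The sketch below does that arithmetic; combining the numbers this way is an assumption about how the averages were derived, not something the slides state.

```python
# Per-region counts from the contig distribution slide; the last entry
# in each list is the "Unordered" group.
contigs_per_group = [29, 11, 36, 4]
bacs_per_group = [55, 59, 63, 5]

# Total sequence in December 2008, in bp.
TOTAL_SEQUENCE_BP = 19_018_752

total_contigs = sum(contigs_per_group)
total_bacs = sum(bacs_per_group)
bacs_per_contig = total_bacs / total_contigs
mean_contig_length_bp = TOTAL_SEQUENCE_BP / total_contigs

print(f"Contigs with sequence: {total_contigs}")
print(f"BACs: {total_bacs}")
print(f"BACs per contig: {bacs_per_contig:.1f}")
print(f"Mean contig length: {mean_contig_length_bp / 1000:.0f} kb")
```

The results agree with the slide: 80 contigs, about 2.3 BACs per contig, and a mean contig length just under 250 kb.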

15 Summary of Progress on Chromosome 4
- 81 map contigs have been built.
- 119 BACs/44 contigs are definitely on chr4 (FISH/IL mapped).
- 57 BACs are under confirmation of chr4 location (28 on SGN; 29 to be placed once their location is confirmed).
- ~60 markers for which BACs have not been identified.
- ~13 BACs have been sequenced to HTGS3 and placed on chr0, definitely not on chr4 (others initiated, in same contig etc., but stopped in pipeline).
- 22 missing markers: missing sequence?

16 Summary of what we will do next
1) Confirm, by IL mapping, the chr4 location of BACs that lack chr4 marker sequence and/or have conflicting map locations.
2) Use missing marker sequences to identify further BACs (3D pools) and confirm chr4 location using IL mapping.
3) Use 3D BAC pools to identify BACs to extend current contigs.
4) Analyse output from GS-FLX and GA-Illumina sequencing runs on cDNA from chr4 IL and parental lines to identify SNPs and further chr4 markers.
5) Use any markers from (4) to isolate further BACs for sequencing.

17 Analysis of cDNA sequence to identify Chr4 specific sequences
Sample | 454 (GS-FLX) | Illumina (GA)
Chr4 IL lines | 22.11 MB | 0.45 GB
LA716 pennellii | 17.25 MB | 1.9 GB
Chr4 sub line | 8.47 MB | 1.5 GB
Heinz cDNA | 4.02 MB | 1 GB
IL4-1 | | 0.5 GB
IL4-2 | | 0.6 GB
IL4-3 | | 0.19 GB
IL4-4 | | 0.44 GB
Total | 51.85 MB | 6.9 GB

18 Call for your help. We need your help in checking and verifying the automated annotation. Please respond to e-mails in 2009 calling for help in annotating your favourite genes.

19 Acknowledgements
Wellcome Trust Sanger Institute: Carol Churcher, Jane Rogers, Sean Humphray, Clare Riddle and Mapping Core Group, Karen McLaren and Finishing Team 46, Stuart McLaren and Pre-finishing Team 58, Christine Lloyd and QC Team 57, Karen Oliver, Matt Jones, Carol Scott
Imperial College London: Gerard Bishop, Daniel Buchan, James Abbott, Sarah Butcher, Rosa Lopez-Cobollo
University of Nottingham: Graham Seymour
Scottish Crop Research Institute: Glenn Bryan
Cornell University: Lukas Mueller, Jim Giovannoni
MIPS/IBI Institute for Bioinformatics: Klaus Mayer, Remy Bruggmann
FISH Resources: Stephen Stack Group (Colorado), Hans de Jong (Wageningen), Dora Szinay (Wageningen)

