Mo17 shotgun project

Goal: sequence the Mo17 gene space with inexpensive new technologies.

Datasets in progress:
- Four phases of 454-FLX sequencing, to a maximum of ~12X
- ~3 kb paired-end sequencing (for short-range structural variation)
- Ultra-short-read Solexa or ABI SOLiD sequencing (for polishing)
- Preparation of methyl-spanning linkers to augment IBM map integration and detect rearrangements (Sanger end-sequencing)
- Ideally, Mo17 BAC ends from DuPont, if available

Shotgun: independent of tiling path
- Can detect non-repetitive gene space even within otherwise complex regions that may not be in the tiling path

Disadvantages of short reads
- Can't expect to recover repetitive sequences

Four phases of sequencing, complete in 2007
- Sequencing contract established with 454/Roche: four phases, including collaborative runs at no cost in Phases II-IV.
- Phase I underway (30 FLX runs): library QC and initial assessment of data quality.
  - 10 FLX runs totaling 1 Gb (~0.4X)
  - 20 FLX-pair runs spanning 12 Gb (~5X span in 3 kb inserts)
  - Assess quality, coverage, contamination, chimerism and accuracy
- Phase II (80 runs plus 30 runs from Roche, 110 runs in total): rough draft stage.
  - 40 FLX-pair runs spanning 36 Gb (total 48 Gb, ~10X span)
  - 70 FLX runs for 7 Gb (total 8 Gb, ~3.5X sequence)
  - Assess the rough draft assembly (3 methods); compare with B73 and sorghum
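The slide quotes two kinds of coverage. Sequence coverage is bases sequenced divided by genome size. Span (clone) coverage is the number of mate pairs times the insert length, divided by genome size. The following is a minimal sketch of both. It rests on one assumption: the slides never state the genome size, so 2.3 Gb is used here only because it roughly reproduces their Gb-to-X ratios. The function names are illustrative.

```python
# Sketch: sequence coverage vs. span (clone) coverage.
# ASSUMPTION: the slides do not state the genome size; 2.3 Gb is used here
# only because it roughly reproduces their Gb-to-X ratios.
GENOME_SIZE_BP = 2.3e9


def sequence_coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Fold coverage by sequenced bases: bases / genome size."""
    return total_bases / genome_size


def span_coverage(n_pairs, insert_size, genome_size=GENOME_SIZE_BP):
    """Fold clone (span) coverage: each mate pair spans roughly one insert."""
    return n_pairs * insert_size / genome_size


if __name__ == "__main__":
    # Phase II cumulative unpaired sequence from the slide: 8 Gb.
    print(f"8 Gb -> {sequence_coverage(8e9):.1f}X sequence")
    # Phase I pair runs "spanning 12 Gb" in 3 kb inserts imply ~4 million pairs.
    pairs = int(12e9 / 3_000)
    print(f"{pairs:,} pairs x 3 kb -> {span_coverage(pairs, 3_000):.1f}X span")
```

Under that assumption the script prints about 3.5X sequence and 5.2X span. These figures are close to the slide's ~3.5X and ~5X.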

Phases III and IV
- Phase III (50 runs + 20 contributed)
  - 20 FLX-pair runs (total spanning cover ~20X)
  - 50 FLX runs (total 13 Gb sequence, ~5.5X)
  - Draft assembly; rough annotation; assessment of structural variation based on 20X clone cover. Assessment complete by end of
- Phase IV (60 runs + 30 contributed)
  - 90 FLX runs (to reach a total of 22 Gb, ~10X)
  - Data collection complete by end of
  - Early '08: final assembly; integration with MSSL ends and the IBM map; proceed to annotation and full analysis.

Note: Later phases may use the next FLX release, which has longer read lengths. To be conservative, sequence from FLX-pair reads is not included in the sequence coverage estimates. Total sequencing cost for Phases I-IV: $1.6M.
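The run counts on the two phase slides can be cross-checked against their cumulative totals. The sketch below makes two assumptions:

- Each unpaired FLX run yields roughly 0.1 Gb. This is inferred from "10 FLX runs totaling 1 Gb", not stated as a per-run yield.
- Only unpaired FLX sequence counts toward coverage, as the note says.

The cost-per-run line also assumes that the $1.6M covers only the paid runs, which the slides do not say. The `Phase` class and its field names are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: ~0.1 Gb per unpaired FLX run, inferred from "10 FLX runs
# totaling 1 Gb" and "70 FLX runs for 7 Gb"; not stated directly.
GB_PER_FLX_RUN = 0.1


@dataclass
class Phase:
    name: str
    paid_runs: int
    contributed_runs: int
    flx_runs: int       # unpaired runs, counted toward sequence coverage
    flx_pair_runs: int  # paired runs, excluded from sequence coverage per the note


# Run counts copied from the phase slides.
PHASES = [
    Phase("I", 30, 0, 10, 20),
    Phase("II", 80, 30, 70, 40),
    Phase("III", 50, 20, 50, 20),
    Phase("IV", 60, 30, 90, 0),
]


def cumulative_sequence_gb(phases):
    """Running total of unpaired FLX sequence (Gb) after each phase."""
    total, out = 0.0, []
    for p in phases:
        # Consistency check: every run in a phase is either FLX or FLX-pair.
        if p.paid_runs + p.contributed_runs != p.flx_runs + p.flx_pair_runs:
            raise ValueError(f"run counts disagree in phase {p.name}")
        total += p.flx_runs * GB_PER_FLX_RUN
        out.append(round(total, 1))
    return out


if __name__ == "__main__":
    print(cumulative_sequence_gb(PHASES))
    paid = sum(p.paid_runs for p in PHASES)
    # ASSUMPTION: the $1.6M total covers only the paid runs.
    print(f"{paid} paid runs, ~${1.6e6 / paid:,.0f} per paid run")
```

Run as a script, it prints `[1.0, 8.0, 13.0, 22.0]`, which matches the slides' cumulative 1, 8, 13 and 22 Gb. It then reports 220 paid runs at about $7,273 each, under the cost assumption.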

454-FLX reads are typically either mostly masked or mostly clean
[Figure: percent of read positions masked by over-represented 16-mers]
- ~29% of reads have less than a quarter of their positions masked
- ~58% of reads have more than two-thirds of their positions masked
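The slide does not describe how the masking was done. Below is a minimal sketch of masking by over-represented k-mers. It counts forward-strand k-mers only and uses a hypothetical `max_count` threshold. A production version would count canonical k-mers over the full dataset with a dedicated counter. The sketch computes each read's masked fraction and the two bins the slide reports.

```python
from collections import Counter


def count_kmers(reads, k=16):
    """Count every k-mer across all reads (forward strand only, for simplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def masked_fraction(read, counts, k=16, max_count=10):
    """Fraction of read positions covered by at least one over-represented k-mer.

    `max_count` is a hypothetical threshold, not a value from the slides.
    """
    if not read:
        return 0.0
    masked = [False] * len(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] > max_count:
            for j in range(i, i + k):
                masked[j] = True
    return sum(masked) / len(read)


def summarize(reads, k=16, max_count=10):
    """Share of reads that are mostly clean (< 1/4 masked) or mostly masked (> 2/3)."""
    if not reads:
        return {"mostly_clean": 0.0, "mostly_masked": 0.0}
    counts = count_kmers(reads, k)
    fracs = [masked_fraction(r, counts, k, max_count) for r in reads]
    n = len(fracs)
    return {
        "mostly_clean": sum(f < 0.25 for f in fracs) / n,
        "mostly_masked": sum(f > 2 / 3 for f in fracs) / n,
    }
```

For example, take five copies of a repetitive read plus one unique read, with `k=4` and `max_count=3`. The repeats come out fully masked and the unique read stays clean.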

Mo17 unique full-length alignments vs. B73 MAGIs show the high quality of unique alignments
- Residual repeats in MAGIs show multiple hits in the 454 data
[Figure: unique full alignments]

SNPs and indels of 454 reads relative to MAGIs are consistent with a few percent variation between Mo17 and B73 (this figure combines true variation with sequencing errors)
[Figure: frequency of reads vs. SNPs or indels per base]
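The figure bins reads by SNPs or indels per base. The sketch below computes that rate for one read from a gapped pairwise alignment against a MAGI. The counting convention is an assumption, since the slide does not define it:

- Each mismatched column counts as one SNP.
- Each maximal run of gap columns counts as one indel.
- The total is divided by the number of alignment columns.

As the slide notes, the result mixes true Mo17/B73 variation with sequencing error.

```python
def variant_rate(read_aln: str, ref_aln: str) -> float:
    """SNPs plus indel events per alignment column for one gapped pairwise alignment.

    ASSUMED convention: mismatched columns count as SNPs, and each maximal
    run of gap columns (in either sequence) counts as one indel.
    """
    if len(read_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    if not read_aln:
        return 0.0
    snps = indels = 0
    prev_gap = None  # which sequence had a gap in the previous column, if any
    for r, g in zip(read_aln, ref_aln):
        if r == "-" or g == "-":
            gap_in = "read" if r == "-" else "ref"
            if gap_in != prev_gap:
                indels += 1  # start of a new gap run
            prev_gap = gap_in
        else:
            if r != g:
                snps += 1
            prev_gap = None
    return (snps + indels) / len(read_aln)
```

For instance, `variant_rate("ACGTTA-GCA", "ACCTTAAGCA")` has one mismatch and one single-base deletion over 10 columns, giving 0.2.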

Multiple alternate assembly plans
- Divide and conquer
  - Reduce ~100 million reads to ~50K unique gene spaces, each of ~thousands of reads (~10 kb), by clustering based on various comparisons
    - Plan A: de novo clustering of masked reads
    - Plan B: map to B73 and assemble (de novo for the remainder)
    - Plan C: sorghum-assisted
  - Use various assemblers to lay out and produce a consensus for each cluster (454 assembly team engaged)
  - Polish sequence with Solexa or SOLiD for accuracy
  - Link with MSSL pairs; integrate with the map
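Plan A's divide-and-conquer step can be illustrated by grouping reads that share a k-mer, using union-find. This is a hypothetical sketch, not the project's clustering method:

- `max_occurrences` stands in for masking. It drops k-mers seen in too many reads so that repeats do not glue unrelated gene spaces together.
- The sketch counts the forward strand only.
- Clustering ~100 million reads would need a disk-based or distributed k-mer index rather than an in-memory dictionary.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_reads(reads, k=16, max_occurrences=50):
    """Group reads that share at least one k-mer, ignoring over-represented k-mers.

    Returns clusters as sorted lists of read indices. Parameters are hypothetical.
    """
    index = defaultdict(list)  # k-mer -> indices of reads containing it
    for i, read in enumerate(reads):
        for kmer in {read[j:j + k] for j in range(len(read) - k + 1)}:
            index[kmer].append(i)

    ds = DisjointSet(len(reads))
    for members in index.values():
        if len(members) > max_occurrences:
            continue  # repeat-like k-mer: skip so it cannot merge unrelated clusters
        for other in members[1:]:
            ds.union(members[0], other)

    groups = defaultdict(list)
    for i in range(len(reads)):
        groups[ds.find(i)].append(i)
    return sorted(groups.values())
```

Each resulting cluster would then be handed to an assembler separately, as the slide describes.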

Backup analyses vs. B73 reference
- SNP/variation detection by alignment to B73 sequence
  - 454/Solexa/SOLiD (various successful models in other species at JGI and elsewhere)
- Structural variation detection via paired-end placements
  - Needs to be tolerant of the chimerism rate
  - Model: successful human structural analysis done with 454 (unpublished)
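Paired-end structural variation detection usually starts by labeling each mate pair's placement against the expected insert length. The sketch below is hypothetical and does that for ~3 kb inserts, using illustrative parameters:

- `tolerance` sets how far a pair's separation may stray from the expected insert before it counts as discordant.
- `same_strand_expected` is a stand-in for the library's mate orientation, which depends on the protocol.
- `bin_size` and `min_support` control how discordant pairs are pooled along the reference.

One simple way to tolerate chimeric clones is to require several independent discordant pairs per reference bin before reporting a candidate. This is an assumption, not the human 454 analysis the slide cites.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MatePlacement:
    chrom: str
    pos: int     # mapped coordinate on the B73 reference
    strand: str  # "+" or "-"


def classify_pair(a, b, insert=3_000, tolerance=1_000, same_strand_expected=False):
    """Label one mate pair's placement relative to the expected ~3 kb insert.

    ASSUMPTION: expected mate orientation depends on the library protocol;
    `same_strand_expected` is a hypothetical switch, not a 454 parameter.
    """
    if a.chrom != b.chrom:
        return "interchromosomal"
    if (a.strand == b.strand) != same_strand_expected:
        return "orientation"
    if abs(abs(b.pos - a.pos) - insert) > tolerance:
        return "distance"
    return "concordant"


def candidate_sv_bins(pairs, bin_size=10_000, min_support=3, **classify_kw):
    """Reference bins holding at least `min_support` discordant pairs.

    Isolated discordant pairs, such as those from chimeric clones, fall
    below the support threshold and are not reported.
    """
    support = Counter()
    for a, b in pairs:
        if classify_pair(a, b, **classify_kw) != "concordant":
            support[(a.chrom, a.pos // bin_size)] += 1
    return sorted(key for key, n in support.items() if n >= min_support)
```

In use, three pairs stretched to ~8 kb in the same bin are reported, while a single chimeric pair linking two chromosomes is not.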

Timeline
- Phase I in progress, complete by end of month; analysis to approve Phase II takes ~10 days
- Phase II: October
- Phase III: November
- Phase IV: December
- 454 sequencing complete by end of year

~58% of each BAC is masked by over-represented 16-mers

Outreach Dick McCombie

Types of outreach
- Public presentations
- Collaborations
- CSHL DNA Learning Center

Public Presentations

Collaborations
- The Maize Genetics and Genomics Database (MaizeGDB)
  - Letter for Carolyn Lawrence, MaizeGDB
  - MaizeGDB website text, links to data
- Gramene
- EBI Ensembl
- Affymetrix Maize Pilot Expression Array Project
- Optical map
- TWINSCAN
- Vmatch
- Full-Length cDNA Project

CSHL DNA Learning Center