CUGI Pilot Sequencing/Assembly Projects Christopher Saski.


Sequencing the Cacao Genome: 3 Megabases at a Time
– Pilot project to sequence and assemble a 3 Mbp segment of the cacao genome
– IBM in silico assembly project
  – Testing the assembly pipeline

Sequencing the Cacao Genome: 3 Megabases at a Time
– Combination of:
  – “Old School Genomics”: BAC libraries, physical mapping, and clone-by-clone sequencing
  – Roche 454 Titanium and FLX de novo sequencing
– Key:
  – No eukaryotic genome has yet been accurately assembled with NGS alone
  – Reduce assembly complexity

3 Megabase segments Rounsley et al., 2009

Advantages
– Reduce assembly complexity
– Limit number of sequencing libraries
– Prioritize critical genomic regions
– Outsource BAC pools for sequencing in parallel at any center that has a 454 Titanium/GS-FLX sequencer
– Flexibility
  – Start slow with minimal investment
  – Could redesign strategy to reduce sequence runs

Strategy Components
– Integrated Physical/Genetic framework
– Pool development and sequencing:
  – BAC-end
  – Titanium 454 (paired/non-paired)
  – Draft sequence
– Assembly and integration:
  – Newbler
  – Celera (CABOG)

Cacao Integrated Physical/Genetic Framework
– Represents ~29X coverage (3 BAC libraries)
– Assembled into a small number of large contigs
– Suggests reasonable levels of heterozygosity
– Manageable amounts of repetitive sequence
– 220 anchored genetic markers spanning 10 linkage groups
  – Resemble recombination-derived order

Pool Development
– Select contiguous BAC clones from MTP
– Pools will contain clones
  – 20-30kb overlap
– Complete Cacao MTP will require pools
– Repetitive-type regions:
  – BAC-end sequence and physical map data predictive tool
– Modify pools accordingly

Pool Development
– Estimate contig size using Consensus Band (CB) algorithm
– Example: Cacao cp genome is 160,604 bp
  – Hybridization revealed a cp-containing contig, estimated to be ~160 kb based on the CB algorithm
– Purified pool DNA can be produced at CUGI
  – Treat with ATP-dependent DNase

Sequencing
– 3 Levels of Sequence:
  – Paired BAC-end Sequence (20 kb increments)
  – End sequencing of pool members
  – 454 sequencing of BAC pools
    – Paired 3.5X-5.1X coverage (Roche 454/FLX)
    – Non-paired 17X-26X coverage (Titanium)
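The coverage targets above translate into read counts through coverage = reads × mean read length / region size. A minimal sketch of that arithmetic follows; the 400 bp mean read length is an assumption for illustration and is not stated in the slides.

```python
import math


def coverage(n_reads: int, mean_read_bp: int, region_bp: int) -> float:
    """Average depth: total sequenced bases divided by region size."""
    return n_reads * mean_read_bp / region_bp


def reads_needed(region_bp: int, target_coverage: float, mean_read_bp: int) -> int:
    """Smallest read count whose average depth reaches target_coverage."""
    return math.ceil(region_bp * target_coverage / mean_read_bp)


REGION_BP = 3_000_000   # one 3 Mbp pool, as in the slides
MEAN_READ_BP = 400      # ASSUMPTION: illustrative read length, not from the slides

for depth in (17, 26):  # non-paired coverage range quoted above
    print(f"{depth}X -> {reads_needed(REGION_BP, depth, MEAN_READ_BP)} reads")
```

Under a different read length the counts scale inversely, so the figures are only as good as the assumed mean.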

454 Runs: Whole Genome
– 454 Titanium non-paired
  – 26X coverage/pool
  – 4 pools per slide (up to 150 pools total)
  – Up to 38 slide runs
– 454 FLX paired-end (3kb)
  – 5X coverage/pool
  – 16 pools per slide (up to 150 pools total)
  – Up to 10 slide runs total
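The slide-run counts follow from dividing the pool total by the pools that fit on one slide and rounding up. A short sketch that reproduces the two figures (function and variable names are illustrative):

```python
import math


def slide_runs(total_pools: int, pools_per_slide: int) -> int:
    """Slides needed when each slide holds pools_per_slide pools."""
    return math.ceil(total_pools / pools_per_slide)


TOTAL_POOLS = 150  # upper bound quoted in the slides

titanium_runs = slide_runs(TOTAL_POOLS, 4)    # Titanium non-paired, 4 pools per slide
flx_runs = slide_runs(TOTAL_POOLS, 16)        # FLX paired-end, 16 pools per slide

print(titanium_runs, flx_runs)
```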

Assembly/Curation of 3Mbp Segment
– Preprocessing
  – Filter reads to remove:
    – Paired-end reads that did not contain both ends
    – BAC vector
    – E. coli (host DNA)
– Newbler Assembler (Roche)
– Celera Assembler (CABOG)
  – Improvements in homopolymer calls and heterogeneous read length issues
  – Recently shown to give N50 contig sizes double those of Newbler
    – Human (50% repetitive) and microbes
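The preprocessing filters can be sketched as below. This is a toy illustration, not the pipeline's actual implementation: contamination is detected by exact shared k-mers with the vector or host sequence (forward strand only), and a paired read is modelled as carrying the name of its mate, so "did not contain both ends" becomes "mate is absent". The Read class and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Read:
    name: str
    seq: str
    mate: Optional[str] = None  # name of the paired-end mate, None if unpaired


def kmers(seq: str, k: int) -> Set[str]:
    """All k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_contaminated(seq: str, bad_kmers: Set[str], k: int) -> bool:
    """True if the read shares at least one k-mer with a contaminant."""
    return any(seq[i:i + k] in bad_kmers for i in range(len(seq) - k + 1))


def preprocess(reads: Iterable[Read], contaminants: Iterable[str], k: int = 12) -> List[Read]:
    """Drop vector/host reads, then drop paired reads whose mate is gone."""
    bad: Set[str] = set()
    for contaminant in contaminants:  # e.g. BAC vector and E. coli sequence
        bad |= kmers(contaminant.upper(), k)
    clean = [r for r in reads if not is_contaminated(r.seq.upper(), bad, k)]
    surviving = {r.name for r in clean}
    return [r for r in clean if r.mate is None or r.mate in surviving]
```

A production filter would also check the reverse complement and tolerate mismatches, typically by aligning reads against the vector and host sequences.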

Assembly Curation of 3Mbp Segment
– Assembly at various depths (5X, 10X, 15X)
  – Determine optimal sequencing coverage
– Utilize available data to scaffold contigs:
  – BAC end sequences every 20kb
  – Genetic marker sequences
  – RNA-seq clusters
  – Arabidopsis – Cacao synteny
  – Draft Sequence (2X)
    – Augment approach by covering regions missed by clones
    – Assist in selecting MTP
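One way to produce the 5X, 10X, and 15X inputs is to subsample a deeper read set at random until the target depth is reached. The sketch below assumes only that read lengths and the region size are known; the function name and the seeded shuffle are illustrative choices, not the project's method.

```python
import random
from typing import List, Sequence


def subsample_to_depth(read_lengths: Sequence[int], region_bp: int,
                       target_depth: float, seed: int = 0) -> List[int]:
    """Return indices of a random read subset reaching target_depth.

    Reads are taken in shuffled order until their summed length is at
    least target_depth * region_bp, or the reads run out.
    """
    rng = random.Random(seed)            # seeded for reproducible subsets
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    budget = target_depth * region_bp    # bases needed for the target depth
    chosen: List[int] = []
    total = 0
    for idx in order:
        if total >= budget:
            break
        chosen.append(idx)
        total += read_lengths[idx]
    return chosen


# Toy data: 1000 reads of 400 bp over a 10 kb region (40X in total).
lengths = [400] * 1000
for depth in (5, 10, 15):
    subset = subsample_to_depth(lengths, region_bp=10_000, target_depth=depth)
    print(depth, len(subset))
```

Each subset can then be assembled separately and the resulting contig statistics compared to pick the lowest depth that still gives an acceptable assembly.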

Assembly Curation of 3Mbp Segment
– Deliverable will be a pseudomolecule sequence for the 3Mbp region
  – Gaps will be strings of N
– Assess and employ lab-based gap filling strategies
– Make every attempt to close gaps
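Building a pseudomolecule with N-filled gaps amounts to joining ordered, oriented contigs with runs of N. A minimal sketch follows; the 100 bp placeholder for gaps of unknown size is an assumption, not a value from the slides.

```python
from typing import List, Optional, Sequence


def build_pseudomolecule(contigs: Sequence[str],
                         gap_sizes: Sequence[Optional[int]],
                         default_gap: int = 100) -> str:
    """Join ordered, oriented contigs, filling each gap with Ns.

    gap_sizes[i] is the estimated gap between contigs[i] and
    contigs[i + 1]; None means the size is unknown and default_gap
    (ASSUMPTION: 100 bp placeholder) is used instead.
    """
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts: List[str] = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        n = default_gap if gap is None else max(gap, 1)  # keep at least one N
        parts.append("N" * n)
        parts.append(contig)
    return "".join(parts)


print(build_pseudomolecule(["ACGT", "GGCC", "TTAA"], [3, None], default_gap=5))
```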

Assembly Validation and Correction
– In-silico virtual digest of scaffold sequence and compare to physical map restriction fragments
  – Draft sequence integration (DSI) via FPC
– Integrate and visualize physical map, 3 Mbp segments, and draft sequence
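A virtual digest cuts the scaffold at every restriction site and compares the predicted fragment sizes with the bands in the physical map. The sketch below uses HindIII (A^AGCTT) purely as an example; the slides do not say which enzyme was used for fingerprinting, and the 5% size tolerance is also an assumption.

```python
from typing import List, Sequence


def virtual_digest(seq: str, site: str = "AAGCTT", cut_offset: int = 1) -> List[int]:
    """Fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the cut position within the site (1 for HindIII, A^AGCTT).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def matched_fragments(predicted: Sequence[int], observed: Sequence[int],
                      tolerance: float = 0.05) -> int:
    """Count predicted fragments with an unused observed band of similar size."""
    remaining = sorted(observed)
    hits = 0
    for size in sorted(predicted):
        for i, band in enumerate(remaining):
            if abs(band - size) <= tolerance * size:
                hits += 1
                del remaining[i]  # each observed band may match only once
                break
    return hits


scaffold = "GGGG" + "AAGCTT" + "CCCCCCCC" + "AAGCTT" + "TT"
print(virtual_digest(scaffold))
```

A low match count for a scaffold would flag a region where the assembly and the physical map disagree and need manual review.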

Sequence/Assembly Pipeline

IBM in silico Sequences
– IBM will provide a set of sequences that mimic the pilot cacao sequences
  – Input error
    – Indels, homopolymer calls, nucleotide substitutions
– Simulated data to test pipeline:
  – Physical map
  – Simulated BAC end sequences
  – Simulated pseudo-reads from pooled BACs
  – EST clusters
  – Indicate reference species for syntenic comparisons
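The three error classes listed above (indels, homopolymer miscalls, substitutions) can be injected into a clean sequence as sketched below. This is a hypothetical illustration only: the slides do not describe how IBM generated its data, and the error rates are made-up defaults.

```python
import random


def add_errors(seq: str, rng: random.Random,
               sub_rate: float = 0.005,
               indel_rate: float = 0.005,
               homopolymer_rate: float = 0.05) -> str:
    """Return a copy of seq with simulated sequencing errors.

    Homopolymer runs of 3+ bases are lengthened or shortened by one with
    probability homopolymer_rate; every base may then be substituted,
    deleted, or followed by a random inserted base. All rates are
    ASSUMED values for illustration.
    """
    out = []
    i, n = 0, len(seq)
    while i < n:
        base = seq[i]
        j = i
        while j < n and seq[j] == base:   # find the end of the current run
            j += 1
        run = j - i
        if run >= 3 and rng.random() < homopolymer_rate:
            run += rng.choice((-1, 1))    # homopolymer over- or under-call
        for _ in range(run):
            r = rng.random()
            if r < sub_rate:              # substitution
                out.append(rng.choice([b for b in "ACGT" if b != base]))
            elif r < sub_rate + indel_rate / 2:
                continue                  # deletion
            elif r < sub_rate + indel_rate:
                out.append(base)          # insertion after the base
                out.append(rng.choice("ACGT"))
            else:
                out.append(base)
        i = j
    return "".join(out)


noisy = add_errors("ACGTTTTTACGGGACGT" * 5, random.Random(42))
print(len(noisy))
```

Passing a seeded random.Random keeps the simulated reads reproducible between pipeline test runs.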

Pilot Project Budget
– BAC-end sequencing (30K BACs), 20Kb increments – $206,
– Assembly/curation/validation of cacao 3Mbp – $16,
– Assembly of IBM in-silico derived sequences – $15,400.00

ESTIMATED Budget – Whole Genome Assembly
– Assembly, curation, validation of , 3Mbp segments – $147,
– Automated structural/functional annotation – $8,800.00

Acknowledgements
– USDA-ARS
– Mars Inc.
– Dr. Alex Feltus
– Stephen Ficklin
– Dr. Keith Murphy
– Dr. Margaret Staton