Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome.


Why Is This Study Impressive?
- Lower sequencing cost + more sophisticated algorithms = more species with genome assemblies
- The problem: these assemblies are fragmented and contain gaps and errors, which makes downstream applications difficult
- This study: combines three technologies to give the most continuous de novo mammalian assembly
  1. Long reads for contig formation
  2. Short reads for consensus validation
  3. Scaffolding by optical and chromatin interaction mapping
- 400-fold improvement in continuity
- Only 649 gaps
Definitions:
- De novo = starting from the beginning, i.e. assembled without an existing reference genome
- Contig = a contiguous sequence built from overlapping sequence reads
- Consensus validation = confirmation (corroboration) that the consensus sequence is correct
- Scaffolds = ordered and oriented contigs separated by gaps
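Continuity of an assembly is commonly summarized with the N50 statistic. The sketch below assumes that the "improvement in continuity" on this slide refers to a metric of this kind; the function name and the toy lengths are illustrative only and are not values from the paper.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assemblies of the same total size (150 bases).
fragmented = [10, 20, 30, 40, 50]
continuous = [40, 110]

print(n50(fragmented), n50(continuous))
```

Both toy assemblies contain the same number of bases, but the one built from fewer, longer pieces has the larger N50, which is what a gain in continuity means in practice.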

How Is This Applicable in the Real World?
- Agriculture: an accurate reference genome is essential for plant and animal species
- Researchers studying such organisms
- GMO = the result of a laboratory process in which genes from the DNA of one species are extracted and artificially inserted into the genes of an unrelated plant or animal. The foreign genes may come from bacteria, viruses, insects, animals or even humans.

Current Progress of Gene Sequencing
- Progress has been made in techniques to generate contig regions
- Finishing the genome is the challenge: extremely difficult for repetitive genomes
- The human genome: draft in 2001, followed by 3 years of curation by 18 institutions
- Short-read sequencing: inexpensive and yields draft genome assemblies, but they are highly fragmented (hence this paper combining three techniques)

Current Progress of Gene Sequencing
- Repetition in the genome is the biggest challenge in its assembly: it leads to gaps
- Scaffolding technologies: order and orient the assembled contigs (refs. 8-12); used in this paper
  - Chromatin interaction mapping: identifies long-range chromosome interactions
  - Optical mapping: inexpensive, HD scaffolding data
- Both methods have limited ability to scaffold small contigs in fragmented short-read assemblies
- Scaffolds = ordered and oriented contigs separated by gaps
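To make "order and orient" concrete, here is a minimal sketch of how a scaffold can be written out once the order and strand of each contig are known. The contig names, the layout, and the fixed gap size are hypothetical; in real scaffolding the layout comes from the mapping data, and gap sizes are estimated rather than fixed.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the sequence of the opposite DNA strand."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs, layout, gap_size=10):
    """Join contigs in the given order and orientation, with runs of N as gaps.

    contigs: dict mapping contig name -> sequence
    layout:  list of (contig name, strand), strand being '+' or '-'
    """
    pieces = []
    for name, strand in layout:
        seq = contigs[name]
        if strand == "-":
            seq = reverse_complement(seq)
        elif strand != "+":
            raise ValueError(f"unknown strand {strand!r} for {name}")
        pieces.append(seq)
    return ("N" * gap_size).join(pieces)


# Hypothetical contigs and layout, for illustration only.
contigs = {"ctgA": "ACGTT", "ctgB": "GGCAT"}
layout = [("ctgB", "-"), ("ctgA", "+")]

scaffold = build_scaffold(contigs, layout, gap_size=3)
print(scaffold)
```

Here ctgB is placed first and flipped to its reverse complement, then ctgA follows after a gap of three Ns.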

Current Progress of Gene Sequencing
- Single-molecule sequencing: produces reads of tens of kb but has a high error rate
- The Pacific Biosciences sequencing platform: produces reads averaging 14 kb (peak over 60 kb); used to construct bacterial and continuous eukaryotic genomes
- Combination of long-read sequencing + long-range scaffolding = most efficient way to produce near-complete genome reference assemblies
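A high per-read error rate can still yield an accurate sequence when several reads cover each position, because scattered errors are outvoted. The sketch below uses a simple per-column majority vote over reads that are assumed to be already aligned and of equal length. It is a simplified stand-in for illustration, not the polishing procedure used in the paper, and the reads are made up.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the most common base at each column of pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical aligned reads; two of them carry a single error each.
reads = ["ACGTAC", "ACCTAC", "ACGTAC", "ACGTTC", "ACGTAC"]

print(majority_consensus(reads))
```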


Online Methods Listed
- Animals: under IACUC-approved protocol and other federal regulations
- Reference individual selection: a DNA panel of 96 US goats was assembled to find the most homozygous goat, determined by raw count of homozygous markers
- Genome sequencing, assembly and scaffolding
- Conflict resolution: to resolve misassemblies from prior steps
- Assembly polishing and contaminant identification
- Assembly annotation
- Gap resolution and repeat analysis
- Centromeric and telomeric repeat analysis
- Fosmid end sequencing and analysis
- Statistical analysis
- Code availability
- Data availability
- The online methods are detailed and above undergraduate level, but students may refer to them if they recognize the methods or steps listed and would like to know more
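Selecting the reference individual by a raw count of homozygous calls can be sketched as follows. The animal IDs and genotypes are made up, and the data layout (one pair of alleles per marker, `None` for a missing call) is an assumption for illustration rather than the format used in the study.

```python
def count_homozygous(genotypes):
    """Count markers whose two alleles are identical; skip missing calls."""
    return sum(1 for g in genotypes if g is not None and g[0] == g[1])


def most_homozygous(panel):
    """Return the ID of the animal with the most homozygous markers."""
    if not panel:
        raise ValueError("panel is empty")
    return max(panel, key=lambda animal: count_homozygous(panel[animal]))


# Hypothetical three-animal panel with four markers each.
panel = {
    "goat_01": [("A", "A"), ("C", "T"), ("G", "G"), ("T", "T")],
    "goat_02": [("A", "G"), ("C", "T"), ("G", "G"), None],
    "goat_03": [("A", "A"), ("C", "C"), ("G", "G"), ("T", "T")],
}

best = most_homozygous(panel)
print(best, count_homozygous(panel[best]))
```

A highly homozygous animal carries fewer positions where the two chromosome copies differ, so the assembler has fewer competing versions of the same region to reconcile.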


Results
- Adult male goat (San Clemente breed) sequenced
- The goat had a high degree of homozygosity, which minimizes heterozygous alleles and simplifies the genome assembly
- Long-read single-molecule sequencing
- High-fidelity short-read sequencing
- Optical mapping (scaffolding tech)
- Chromatin interaction mapping (scaffolding tech)
- Stepwise assembly of these complementary data, as shown in Table 1

Stepwise assembly of complementary data Validated with statistical methods

Research Limitations
- RH mapping was used to maximize the accuracy of the final reference assembly
  - Corrected 21 inversions, consisting of 83 scaffolds
  - Corrected 4 misplacements before final gap filling
- ARS1: the final assembly
- After error correction and validation, ARS1 contains 4 discrepancies with the RH map; further research is needed to resolve these (Figure 3)
- ARS1 compares favorably with the human genome!
Definitions:
- RH map = radiation hybrid mapping: a technique used to map mammalian chromosomes. Uses X-ray breakage of chromosomes to determine the distances between DNA markers as well as their order on the chromosomes
- ARS1 = the name of the final goat reference assembly (ARS refers to the USDA Agricultural Research Service)

            ARS1   Human Genome
Scaffolds   31     24
Gaps        649    832
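Gap counts like those in the comparison above can be obtained by splitting each scaffold at runs of N. The sketch below assumes gaps are written as runs of N in the scaffold sequence, which is a common convention and not something the slides state; the example sequence is made up.

```python
import re


def contigs_and_gaps(scaffold):
    """Split a scaffold at runs of N and return (contigs, number of gaps).

    Leading and trailing Ns are ignored, so only gaps between two
    contigs are counted.
    """
    trimmed = scaffold.strip("Nn")
    contigs = [piece for piece in re.split(r"[Nn]+", trimmed) if piece]
    return contigs, max(len(contigs) - 1, 0)


# Hypothetical scaffold with two internal gaps.
example = "ACGTNNNNGGCANTT"
pieces, gaps = contigs_and_gaps(example)
print(pieces, gaps)
```

Summing the gap counts over all scaffolds of an assembly gives a single total that can be compared between assemblies.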


Implications
- The paper presents a near-finished reference genome for the domestic goat using:
  - Long-read single-molecule sequencing
  - High-fidelity short-read sequencing
  - Optical mapping (scaffolding tech)
  - Chromatin interaction mapping (scaffolding tech)
- Unlike cattle, which come from two different subspecies, domestic goats appear to come from a single ancestor: the bezoar (ref. 33)
- This new assembly strategy is superior in accuracy and cost-effectiveness compared to past approaches
- Provides a new standard reference for ruminant genetics
- Creation of the reference goat genome could mean easier identification of adaptive variants in the sequence data of descendant breeds


Discussion
- Long-read sequencing: has improved mammalian genome assemblies
- Complex genomic regions continue to interfere with complete assembly
- Current long-read technologies still fall short: they cannot regularly produce completely assembled chromosomes
- Scaffolding technologies must be reliable and affordable: they are needed to generate an HD, finished reference genome because current long reads alone are not enough
- This is why this paper combined all three methods!
- Demonstrated that optical and chromatin interaction mapping are complementary and useful in conjunction with long-read assemblies

Discussion
- The methods of this study reduced the cost of genome finishing
- It would cost around $100,000 to perform a similar genome assembly using the current PacBio RS II and the scaffolding techniques used in this study
- That is 3X the cost of a short-read assembly, but it would provide an unparalleled gain in continuity and quality of the genome assembly
- From this study, it is expected that these methods will allow the de novo assembly of many vertebrate species without compromising quality!

Questions?

References
Bickhart, Derek M., et al. "Single-Molecule Sequencing and Chromatin Conformation Capture Enable De Novo Reference Assembly of the Domestic Goat Genome." Nature Genetics, 6 Mar. 2017, www.nature.com/ng/journal/v49/n4/full/ng.3802.html.