Comparative analyses of the potato and tomato transcriptomes

Comparative analyses of the potato and tomato transcriptomes
David Francis, Allen Van Deynze, John Hamilton, Walter De Jong, David Douches, Sanwen Huang, and C. Robin Buell
Supported by the AFRI Plant Breeding, Genetics, and Genomics Program of USDA's National Institute of Food and Agriculture

Questions
- International Sol Project: How can a common set of genes/proteins give rise to such a wide range of morphologically and ecologically distinct organisms?
- SolCAP: How can variation be harnessed to improve varieties that benefit consumers, processors, and the environment?
Sequence data available to address these questions:
- S. phureja draft genome sequence
- S. tuberosum, S. lycopersicum, and S. pimpinellifolium GAII transcriptomes
Technology:
- Next-generation sequencing
- SNP genotyping

What comparisons do we want to make?
- How well do S. tuberosum expressed sequences align to S. phureja genomic sequences?
- How well do S. lycopersicum expressed sequences align to S. phureja genomic sequences?
- How is variation distributed within a species? Within a market class? Within a variety? Within a gene?
- Which sequence variation is important to phenotypic variation?

- Library creation/QC
- GAII sequencing (single and paired end)
- Data collection
- Assembly
- Analysis: transcriptome complexity; SNP calling/validation; identification of genes under selection

Illumina GA II Output for Potato
Sample | Total clusters | Total PE reads | PF passed clusters | % PF passed clusters | Total PE PF reads | Actual PE reads
Atlantic 1 | 7,601,277 | 15,202,554 | 6,382,… | … | …,765,496 |
Atlantic 2 | 10,544,542 | 21,089,084 | 9,252,… | … | …,504,336 | 30,185,186
Premier 1 | 7,812,394 | 15,624,788 | 6,652,… | … | …,304,242 |
Premier 2 | 11,678,379 | 23,356,758 | 9,999,… | … | …,999,852 | 31,949,096
Snowden 1 | 7,996,418 | 15,992,836 | 6,837,… | … | …,675,106 |
Snowden 2 | 11,781,671 | 23,563,342 | 10,393,… | … | …,786,644 | 33,288,120
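In every row of this table the total paired-end (PE) read count is exactly twice the cluster count, because a paired-end run reads both ends of each cluster. The Python sketch below checks that bookkeeping against the rows above. The function names are hypothetical, and the % PF formula (PF clusters divided by total clusters) is an assumption about how that column was computed, since its values are not preserved here.

```python
def pe_reads(clusters: int) -> int:
    """Paired-end sequencing reads both ends of every cluster: two reads per cluster."""
    return 2 * clusters


def percent_pf(pf_clusters: int, total_clusters: int) -> float:
    """Assumed definition of '% PF': clusters passing filter as a share of all clusters."""
    if total_clusters <= 0:
        raise ValueError("total_clusters must be positive")
    return 100.0 * pf_clusters / total_clusters


# (total clusters, reported total PE reads) copied from the potato table.
POTATO_LANES = {
    "Atlantic 1": (7_601_277, 15_202_554),
    "Atlantic 2": (10_544_542, 21_089_084),
    "Premier 1": (7_812_394, 15_624_788),
    "Premier 2": (11_678_379, 23_356_758),
    "Snowden 1": (7_996_418, 15_992_836),
    "Snowden 2": (11_781_671, 23_563_342),
}

for sample, (clusters, reported) in POTATO_LANES.items():
    # Every row satisfies: total PE reads == 2 x total clusters.
    assert pe_reads(clusters) == reported, sample
```

The PF cluster counts are truncated in this transcript, so percent_pf is illustrated only with made-up inputs: percent_pf(850, 1_000) returns 85.0.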

Velvet Assemblies of Potato Illumina Sequences
With a minimum k-mer length of 31 and a minimum contig length of 150 bp:
Variety | Total Gb | Transcriptome size (Mb) | No. contigs | N50 (bp) | Maximum contig (kb)
Atlantic | | | | |
Premier | | | | |
Snowden | | | | |
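N50 is the usual contiguity statistic for assemblies like these. To compute it, sort contigs from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The minimal Python sketch below first applies the slide's 150 bp minimum contig length and then computes N50. The helper names and the contig lengths in the example are made up for illustration.

```python
def min_length_filter(contig_lengths: list[int], min_len: int = 150) -> list[int]:
    """Drop contigs shorter than the minimum reported length (150 bp on the slide)."""
    return [n for n in contig_lengths if n >= min_len]


def n50(contig_lengths: list[int]) -> int:
    """Contig length at which the running total, longest first, reaches half of all bases."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


print(n50(min_length_filter([120, 150, 300, 900])))  # 900
```

In the example, 1,350 bp remain after filtering. The 900 bp contig alone holds more than half of that, so it is the N50.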

Velvet Assemblies of Potato Illumina Sequences
Alignment of the S. tuberosum GAII-transcriptome contigs to the PGSC draft genome sequence from S. phureja:
Variety | Contigs | Align with GMAP (95% id, 50% cov) | Align with GMAP (95% id, 90% cov)
Atlantic | | |
Premier | | |
Snowden | | |
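The two GMAP columns count contigs that align to the S. phureja genome at 95% identity or better, with at least 50% or at least 90% of the contig covered. The Python sketch below reproduces that tally. The ContigAlignment record is hypothetical: it assumes identity and coverage have already been extracted from GMAP output (parsing GMAP is not shown) and that "coverage" means the share of the contig's length that aligns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContigAlignment:
    """Hypothetical record: one alignment of an assembled contig to the genome."""
    contig_id: str
    identity: float  # percent identity over the aligned region
    coverage: float  # percent of the contig's length covered by the alignment


def count_aligned(alignments, min_identity=95.0, min_coverage=50.0):
    """Count distinct contigs with at least one alignment meeting both thresholds."""
    return len({
        a.contig_id
        for a in alignments
        if a.identity >= min_identity and a.coverage >= min_coverage
    })


example = [
    ContigAlignment("c1", 97.0, 95.0),  # passes both cut-offs
    ContigAlignment("c2", 96.0, 60.0),  # passes 50% coverage but not 90%
    ContigAlignment("c3", 90.0, 99.0),  # fails the identity cut-off
    ContigAlignment("c3", 95.5, 40.0),  # second hit for c3, fails coverage
]
print(count_aligned(example, 95.0, 50.0), count_aligned(example, 95.0, 90.0))  # 2 1
```

Counting distinct contig IDs rather than alignment records keeps a contig with several hits from being counted twice.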

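Tomato Illumina GA II Output
Variety / insert size / read length | Total reads | PF reads | % PF passed | Total PF
FL …/47 | 22,491,304 | 20,685,… | … |
FL | …,025,976 | 14,382,… | … |
FL | …,645,164 | 13,985,… | … | …,053,794
NC …/61 | 27,079,946 | 22,687,… | … |
NC | …,058,431 | 10,366,… | … |
NC | …,401,240 | 12,687,… | … | …,539,617
OH …/47 | 26,960,898 | 24,874,… | … |
OH | …,316,775 | 9,671,… | … |
OH | …,676,814 | 12,879,… | … | …,954,487
T535061/47 | 26,799,944 | 24,677,… | … |
T | …,822,639 | 14,738,… | … |
T | …,726,257 | 13,744,… | … | …,348,840
PI …/47 | 17,721,226 | 16,422,… | … |
PI | …,115,349 | 14,902,… | … |
PI | …,890,649 | 15,248,… | … | …,727,224
PI …/47 | 17,631,906 | 16,450,… | … |
PI | …,238,179 | 15,354,… | … |
PI | …,829,622 | 18,500,… | … | …,699,707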

Velvet Assemblies of Tomato Illumina Sequences
With a k-mer length of 31 and a minimum contig length of 150 bp:
Variety | Total Gb | Transcriptome size (Mb) | No. contigs | N50 (bp) | Maximum contig (kb)
FL | | | | |
NC | | | | |
OH | | | | |
T | | | | |
PI | | | | |
PI | | | | |
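Velvet is a de Bruijn graph assembler. It cuts reads into overlapping k-mers (k = 31 here) and reads contigs off unbranched paths through the resulting graph. The toy Python sketch below shows only that core idea, using error-free reads and a small k. It is not Velvet's implementation, which also handles reverse complements, sequencing errors, coverage, and repeats. For brevity, it skips isolated cycles that have no branching entry point.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell contigs along maximal non-branching paths (isolated cycles are skipped)."""
    indeg = defaultdict(int)
    for targets in graph.values():
        for t in targets:
            indeg[t] += 1

    def one_in_one_out(node):
        return indeg.get(node, 0) == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in list(graph):
        if one_in_one_out(node):
            continue  # interior node: it will be reached from a path start
        for nxt in graph[node]:
            path = [node, nxt]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path.append(nxt)
            # Overlapping (k-1)-mers: the first node plus the last base of each next node.
            contigs.append(path[0] + "".join(n[-1] for n in path[1:]))
    return sorted(contigs)


def assemble(reads, k, min_len=0):
    """Build the graph, spell unitigs, and drop contigs shorter than min_len."""
    return [c for c in unitigs(de_bruijn(reads, k)) if len(c) >= min_len]


print(assemble(["ACGTTG", "GTTGCA"], 4))  # ['ACGTTGCA']
print(assemble(["ACGTA", "ACGTC"], 3))    # ['ACGT', 'GTA', 'GTC']
```

The second example shows a branch point (GT is followed by both A and C). At that point the graph splits, so the sketch ends one contig and starts two new ones. The slide's settings correspond to assemble(reads, 31, min_len=150).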

Sequence quality: Viewing an Atlantic potato contig from the Velvet assembly

Alignment of contigs relative to S. phureja
FL7600 (tomato): 93.7% identity; 94.4% coverage
Snowden (potato): 97.9% identity; 94.7% coverage

Identify intra-varietal SNPs
Query | SNPs | Filtered SNPs
Atlantic assembly | |
Premier assembly | |
Snowden assembly | |
Example shown: A/C SNP
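One way to read this slide: each variety's reads are aligned back to that variety's own assembly. Positions where the reads carry two alleles then mark SNPs within the variety. The Python sketch below makes that call for a single pileup column. The interpretation, the function name, and both thresholds (minimum depth 10, minimum minor-allele fraction 10%) are assumptions, not the study's settings.

```python
from collections import Counter


def call_intra_varietal_snp(bases: str, min_depth: int = 10, min_minor_fraction: float = 0.1):
    """Return (major, minor) alleles if one pileup column looks like a within-variety SNP.

    `bases` holds the read bases from one variety aligned at a single position of
    that variety's own assembly; the thresholds are illustrative placeholders.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")  # ignore N and gaps
    depth = sum(counts.values())
    if depth < min_depth or len(counts) < 2:
        return None
    (major, _), (minor, n_minor) = counts.most_common(2)
    if n_minor / depth < min_minor_fraction:
        return None  # likely a sequencing error rather than a second allele
    return major, minor


print(call_intra_varietal_snp("A" * 12 + "C" * 6))  # ('A', 'C'): an A/C SNP
print(call_intra_varietal_snp("A" * 95 + "C" * 5))  # None: minor allele below 10%
```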

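Filtered SNP counts
Filtering on SNP quality and 1 SNP per 150 bp window
Ref | Query | d 10 | d 20 | d 30 | d 40 | d 50 | d 60 | d 100
Atlantic | Atlantic | | | | | | |
Atlantic | Premier | | | | | | |
Atlantic | Snowden | | | | | | |
Premier | Atlantic | | | | | | |
Premier | Premier | | | | | | |
Premier | Snowden | | | | | | |
Snowden | Atlantic | | | | | | |
Snowden | Premier | | | | | | |
Snowden | Snowden | | | | | | |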
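The filter on this slide has two steps: drop low-quality SNPs, then keep 1 SNP per 150 bp window. The Python sketch below implements one reading of that rule. It assumes fixed, non-overlapping windows on each contig and keeps the highest-quality SNP in each window; the slide does not say which windowing or tie-break was used. The quality cut-off min_qual=20 and the example SNPs are hypothetical.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    contig: str
    pos: int     # 0-based position on the contig
    qual: float  # SNP quality score


def thin_snps(snps, min_qual=20.0, window=150):
    """Quality-filter SNPs, then keep at most one per fixed window on each contig.

    Within a window the highest-quality SNP is retained (an assumed tie-break).
    """
    best = {}
    for snp in snps:
        if snp.qual < min_qual:
            continue
        key = (snp.contig, snp.pos // window)  # which 150 bp bin the SNP falls in
        if key not in best or snp.qual > best[key].qual:
            best[key] = snp
    return sorted(best.values(), key=lambda s: (s.contig, s.pos))


example_snps = [
    Snp("c1", 10, 30.0), Snp("c1", 100, 50.0),  # same window: keep pos 100
    Snp("c1", 160, 25.0), Snp("c1", 170, 10.0),  # pos 170 fails quality
    Snp("c2", 5, 40.0),
]
print([(s.contig, s.pos) for s in thin_snps(example_snps)])
```

A sliding-window or "discard windows with more than one SNP" rule would give different counts. The choice should match whatever the original pipeline did.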

Genotyping platforms…
Comments on quality control…
Data…
Direct comparison of sequence analysis of SNPs across populations

Comparison of two genes on a tomato chromosome 9 BAC: a COS gene and an R-gene

COSII
Fresh market vs. fresh market: Identities = 573/573 (100%), Gaps = 0/573 (0%)
Fresh market vs. processing: Identities = 569/569 (100%), Gaps = 0/569 (0%)
S. lycopersicum vs. S. pimpinellifolium: Identities = 339/341 (99%), Gaps = 0/341 (0%)
Potato vs. potato: Identities = 606/612 (99%), Gaps = 0/612 (0%)
Tomato vs. potato: Identities = 914/948 (96%), Gaps = 6/948 (0%)

Diverged Sequence
Fresh market vs. fresh market: Identities = 959/959 (100%), Gaps = 0/959 (0%)
Fresh market vs. processing: Identities = 1560/1560 (100%), Gaps = 0/1560 (0%)
S. lycopersicum vs. S. pimpinellifolium: Identities = 612/613 (99%), Gaps = 0/613 (0%)
Tomato vs. potato: Identities = 223/280 (79%), Gaps = 11/280 (3%)
Potato vs. potato: Identities = 246/278 (88%), Gaps = 7/278 (2%)
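Each comparison reports identities and gaps over the aligned length, in the style of BLAST pairwise output. The percentages on the COSII and diverged-sequence slides are consistent with truncating to a whole percent rather than rounding: 612/613 is 99.8% but is shown as 99%. The small Python sketch below covers both the counting and the truncation. alignment_counts is a hypothetical helper for a gapped pairwise alignment, not BLAST's code.

```python
def alignment_counts(aligned_a: str, aligned_b: str) -> tuple[int, int, int]:
    """Identities, gaps, and alignment length for two gapped, equal-length rows ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    identities = sum(1 for a, b in pairs if a == b and a != "-")
    gaps = sum(1 for a, b in pairs if a == "-" or b == "-")
    return identities, gaps, len(pairs)


def whole_percent(count: int, length: int) -> int:
    """Percentage truncated to a whole number, matching the values on these slides."""
    return 100 * count // length


# Rows from the slides: truncation, not rounding, reproduces the reported percentages.
assert whole_percent(612, 613) == 99  # 99.8% shown as 99%
assert whole_percent(223, 280) == 79  # 79.6% shown as 79%
assert whole_percent(11, 280) == 3    # 3.9% shown as 3%

ident, gaps, length = alignment_counts("ACGT-A", "ACGTTA")
print(f"Identities = {ident}/{length} ({whole_percent(ident, length)}%), "
      f"Gaps = {gaps}/{length} ({whole_percent(gaps, length)}%)")
```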

What patterns do we expect to see for genes "under selection"?
- Low variation (fixed)
- High Ka/Ks (mutations affect the protein; possible diversifying selection)
- Mutations (loss of function)
- High FST (genes that distinguish populations)
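Ka/Ks compares nonsynonymous substitutions per nonsynonymous site (Ka) with synonymous substitutions per synonymous site (Ks). A ratio well above 1 is the usual signal of diversifying selection. The Python sketch below covers only the last step of a Nei-Gojobori-style estimate with the Jukes-Cantor correction. It assumes the site counts and observed differences have already been tallied from a codon alignment; that tally is the harder part and is not shown. It illustrates the formula and is not necessarily the method used in this study.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """Ka/Ks from nonsynonymous/synonymous site counts and observed differences."""
    ka = jukes_cantor(n_diffs / n_sites)  # nonsynonymous distance
    ks = jukes_cantor(s_diffs / s_sites)  # synonymous distance
    if ks == 0.0:
        raise ZeroDivisionError("no synonymous differences; Ka/Ks is undefined")
    return ka / ks


print(round(jukes_cantor(0.1), 4))  # 0.1073
print(ka_ks(300, 100, 0, 5))        # 0.0: no amino-acid changes
```

With few synonymous differences, Ks is small and noisy, so per-gene ratios from short alignments should be read cautiously.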

Population structure: coding vs. non-coding markers
All 173 markers (K = 6); 89 coding markers (K = 5); 84 non-coding markers (K = 6)
Groups: processing, fresh market, vintage, landrace
500K burn-in / 750K MCMC repetitions; 20 runs for each K from 3 to 8
Panel labels: CA & OH; OH; CA; OH; CN

Distribution of FST for genes
ovate: 0; fw2.2: 0; sp6: 0.14
ovate: 0.26; fw2.2: 0; sp6: 0.73
ovate: 0.31; fw2.2: 0; sp6: 0.47
ovate: 0; fw2.2: 0.5; sp6: 1
ovate: 0; fw2.2: 0.42; sp6: 0.74
ovate: 0.14; fw2.2: 0.46; sp6: 0.05
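FST summarizes how much of the total allelic diversity lies between populations. It is 0 when populations share the same allele frequencies and 1 when they are fixed for different alleles, as for sp6 in one of the comparisons above. The slides do not say which estimator was used. The Python sketch below uses Nei's heterozygosity-based form, FST = (HT - HS) / HT, for one biallelic locus with equally weighted populations. It omits sample-size corrections such as Weir and Cockerham's.

```python
def fst_nei(freqs: list[float]) -> float:
    """FST = (HT - HS) / HT for one biallelic locus.

    `freqs` holds the frequency of one allele in each population (equal weights).
    """
    if len(freqs) < 2:
        raise ValueError("need at least two populations")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                            # pooled expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population value
    if h_t == 0:
        return 0.0  # locus is monomorphic across all populations
    return (h_t - h_s) / h_t


print(fst_nei([1.0, 0.0]))            # 1.0: populations fixed for different alleles
print(round(fst_nei([0.8, 0.2]), 2))  # 0.36
```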

Examples of highly polymorphic genes within S. lycopersicum
Note: I am working on a replacement that compares Ka/Ks for selected tomato and potato genes.

Distribution of PM genes across populations is not random
Populations: processing, fresh market, vintage, wild

Conclusions
- ~5.7 Gb of PF potato transcriptome sequence (3 varieties)
- ~14.3 Gb of PF tomato transcriptome sequence (6 varieties)
- The S. phureja draft genome is an excellent scaffold for potato and tomato GAII transcriptome alignments
- SNPs are not evenly distributed in genes
- Genes with signatures of selection (Ka/Ks; high FST) tend to be genes associated with response to abiotic and biotic stress
- Breeders have selected for groups of genes, suggesting that co-adapted complexes

Acknowledgments
Collaborators, OSU: Matt Robbins, Sung-Chur Sim, Troy Aldrich
Collaborators, Cornell: Walter De Jong, Lukas Mueller, Joyce Van Eck
Collaborators, CAU: Wencai Yang
Collaborators, CAAS: Sanwen Huang
Collaborators, UCD: Allen Van Deynze, Kevin Stoffel, Alex Kozik
Collaborators, MSU: David Douches, C. Robin Buell, John Hamilton, Kelly Zarka
Funding: USDA/AFRI