Comparative analyses of the potato and tomato transcriptomes
David Francis, Allen Van Deynze, John Hamilton, Walter De Jong, David Douches, Sanwen Huang, and C. Robin Buell
Supported by the AFRI Plant Breeding, Genetics, and Genomics Program of USDA’s National Institute of Food and Agriculture
Questions
International Sol Project: How can a common set of genes/proteins give rise to such a wide range of morphologically and ecologically distinct organisms?
SolCAP: How can variation be harnessed to improve varieties that benefit the consumer, processors, and the environment?
Sequence data available to address these questions:
  S. phureja draft genome sequence
  S. tuberosum, S. lycopersicum, S. pimpinellifolium GAII transcriptomes
Technology:
  Next Generation Sequencing
  SNP genotyping
What comparisons do we want to make?
How well do S. tuberosum expressed sequences align to S. phureja genomic sequences?
How well do S. lycopersicum expressed sequences align to S. phureja genomic sequences?
How is variation distributed within a species? Within a market class? Within a variety? Within a gene?
Which sequence variation is important to phenotypic variation?
Data collection: library creation/QC; GAII sequencing (single and paired end)
Assembly
Analysis: transcriptome complexity; SNP calling/validation; identification of genes under selection
Illumina GA II Output for Potato
Sample      Total Clusters  Total PE Reads  PF Passed Clusters  % PF Passed Clusters  Total PE PF Reads  Actual PE Reads
Atlantic 1  7,601,277       15,202,554      6,382,[…]           […]                   […],765,496
Atlantic 2  10,544,542      21,089,084      9,252,[…]           […]                   […],504,336        30,185,186
Premier 1   7,812,394       15,624,788      6,652,[…]           […]                   […],304,242
Premier 2   11,678,379      23,356,758      9,999,[…]           […]                   […],999,852        31,949,096
Snowden 1   7,996,418       15,992,836      6,837,[…]           […]                   […],675,106
Snowden 2   11,781,671      23,563,342      10,393,[…]          […]                   […],786,644        33,288,120
Velvet Assemblies of Potato Illumina Sequences
With a minimum k-mer of 31 and a minimum contig length of 150 bp:
Variety  Total Gb  Transcriptome Size (Mb)  No. Contigs  N50 (bp)  Maximum Contig (Kb)
Atlantic
Premier
Snowden
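The N50 column summarizes contig length: it is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. A minimal Python sketch follows, assuming contig lengths are already available as a list. The function name `n50` and the example lengths are hypothetical; only the 150 bp minimum contig length comes from the slide.

```python
def n50(contig_lengths, min_length=150):
    """Return the N50 of contigs that are at least min_length bp long.

    Returns 0 when no contig passes the length cutoff.
    """
    # Apply the minimum contig length, then sort longest first.
    lengths = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (bp), not taken from the assemblies above.
example_lengths = [120, 150, 300, 450, 900, 1200]
print(n50(example_lengths))
```

The 120 bp contig is dropped by the cutoff before the statistic is computed, so the cutoff changes both the contig count and the N50.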
Velvet Assemblies of Potato Illumina Sequences
Alignment of the S. tuberosum GAII-transcriptome contigs to the PGSC draft genome sequence from S. phureja:
Atlantic: contigs; align with GMAP (95% id, 50% cov); align with GMAP (95% id, 90% cov)
Premier: contigs; align with GMAP (95% id, 50% cov); align with GMAP (95% id, 90% cov)
Snowden: contigs; align with GMAP (95% id, 50% cov); align with GMAP (95% id, 90% cov)
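Counting contigs that align at a given identity and coverage cutoff can be sketched as below. This assumes the alignments have already been parsed into records holding percent identity and percent coverage; parsing GMAP output is not shown. The `Alignment` class, the `count_aligned_contigs` function, and the example records are hypothetical. Only the 95%/50% and 95%/90% thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    contig_id: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the contig covered by the alignment


def count_aligned_contigs(alignments, min_identity=95.0, min_coverage=50.0):
    """Count distinct contigs with at least one alignment passing both cutoffs."""
    passing = {
        a.contig_id
        for a in alignments
        if a.identity >= min_identity and a.coverage >= min_coverage
    }
    return len(passing)


# Hypothetical alignment records for illustration only.
hits = [
    Alignment("contig_1", 98.2, 96.0),
    Alignment("contig_2", 95.5, 62.0),
    Alignment("contig_3", 91.0, 99.0),
    Alignment("contig_1", 97.0, 55.0),  # second alignment of the same contig
]

print(count_aligned_contigs(hits, 95.0, 50.0))
print(count_aligned_contigs(hits, 95.0, 90.0))
```

Contigs are counted once even when they have several passing alignments, which keeps the count comparable to the total number of contigs.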
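Tomato Illumina GA II Output
Variety  Insert Size  Read Length  Total Reads  PF Reads  % PF Passed  Total PF
FL[…]  […]  […]/47  22,491,304  20,685,[…]  […]
FL[…]  […]  […]  […],025,976  14,382,[…]  […]
FL[…]  […]  […]  […],645,164  13,985,[…]  […]  […],053,794
NC[…]  […]  […]/61  27,079,946  22,687,[…]  […]
NC[…]  […]  […]  […],058,431  10,366,[…]  […]
NC[…]  […]  […]  […],401,240  12,687,[…]  […]  […],539,617
OH[…]  […]  […]/47  26,960,898  24,874,[…]  […]
OH[…]  […]  […]  […],316,775  9,671,[…]  […]
OH[…]  […]  […]  […],676,814  12,879,[…]  […]  […],954,487
T5  350  61/47  26,799,944  24,677,[…]  […]
T[…]  […]  […]  […],822,639  14,738,[…]  […]
T[…]  […]  […]  […],726,257  13,744,[…]  […]  […],348,840
PI[…]  […]  […]/47  17,721,226  16,422,[…]  […]
PI[…]  […]  […]  […],115,349  14,902,[…]  […]
PI[…]  […]  […]  […],890,649  15,248,[…]  […]  […],727,224
PI[…]  […]  […]/47  17,631,906  16,450,[…]  […]
PI[…]  […]  […]  […],238,179  15,354,[…]  […]
PI[…]  […]  […]  […],829,622  18,500,[…]  […]  […],699,707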
Velvet Assemblies of Tomato Illumina Sequences
With a k-mer length of 31 and a minimum contig length of 150 bp:
Variety  Total Gb  Transcriptome Size (Mb)  No. Contigs  N50 (bp)  Maximum Contig (Kb)
FL[…]
NC[…]
OH[…]
T[…]
PI[…]
PI[…]
Sequence quality: Viewing an Atlantic potato contig from the Velvet assembly
Alignment of contigs relative to S. phureja
FL7600 (93.7% id; 94.4% coverage)
Snowden (97.9% id; 94.7% coverage)
Identify intra-varietal SNPs
Query  SNPs  Filtered SNPs
Atlantic Asm
Premier Asm
Snowden Asm
A/C SNP
Filtered SNP counts
Filtering on SNP quality and 1 SNP/150 bp window
Ref  Query  d 10  d 20  d 30  d 40  d 50  d 60  d 100
atlantic
atlantic  premier
atlantic  snowden
premier  atlantic
premier
premier  snowden
snowden  atlantic
snowden  premier
snowden
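One way to read the filter above is sketched here in Python, under stated assumptions: the "d" columns are taken to be minimum read-depth cutoffs, and "1 SNP/150 bp window" is taken to mean that at most one SNP is kept within any 150 bp stretch of a contig. Neither reading is confirmed by the slide. The function `filter_snps`, the quality cutoff of 20, and the example SNP records are hypothetical.

```python
def filter_snps(snps, min_quality=20, min_depth=10, window=150):
    """Filter SNPs by quality and depth, keeping at most one SNP per window.

    snps: iterable of (contig, position, quality, depth) tuples.
    Returns a list of (contig, position) for the SNPs that are kept.
    """
    kept = []
    last_kept = {}  # contig -> position of the most recently kept SNP
    # Sorting the tuples orders SNPs by contig, then by position.
    for contig, pos, qual, depth in sorted(snps):
        if qual < min_quality or depth < min_depth:
            continue
        prev = last_kept.get(contig)
        # Skip SNPs closer than `window` bp to the previously kept SNP.
        if prev is not None and pos - prev < window:
            continue
        kept.append((contig, pos))
        last_kept[contig] = pos
    return kept


# Hypothetical SNP calls: (contig, position, quality, depth).
example_snps = [
    ("contig_1", 100, 40, 25),
    ("contig_1", 180, 35, 30),
    ("contig_1", 400, 50, 12),
    ("contig_1", 700, 10, 50),
    ("contig_2", 50, 30, 8),
]

for depth_cutoff in (10, 20):
    print(depth_cutoff, filter_snps(example_snps, min_depth=depth_cutoff))
```

Raising the depth cutoff can only shrink the set of candidate SNPs, which is why a table of counts per cutoff is a useful way to pick a threshold.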
Genotyping platforms…. Comments on quality control… Data…. direct comparison of sequence analysis of SNPs across populations
Comparison of two genes on tomato chromosome 9 BAC: COS and R-gene
COSII
Fresh Market vs Fresh Market: Identities = 573/573 (100%), Gaps = 0/573 (0%)
Fresh Market vs Processing: Identities = 569/569 (100%), Gaps = 0/569 (0%)
S. lycopersicum vs S. pimpinellifolium: Identities = 339/341 (99%), Gaps = 0/341 (0%)
Potato vs Potato: Identities = 606/612 (99%), Gaps = 0/612 (0%)
Tomato vs Potato: Identities = 914/948 (96%), Gaps = 6/948 (0%)
DIVERGED SEQUENCE
Fresh Market vs Fresh Market: Identities = 959/959 (100%), Gaps = 0/959 (0%)
Fresh Market vs Processing: Identities = 1560/1560 (100%), Gaps = 0/1560 (0%)
S. lycopersicum vs S. pimpinellifolium: Identities = 612/613 (99%), Gaps = 0/613 (0%)
Tomato vs Potato: Identities = 223/280 (79%), Gaps = 11/280 (3%)
Potato vs Potato: Identities = 246/278 (88%), Gaps = 7/278 (2%)
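The whole-number percentages in these alignment summaries hide small differences (for example, 612/613 and 339/341 both display as 99%). A small Python sketch for recovering percent identity to more decimal places from the match counts is given below. The function name `percent_identity` is hypothetical; the example strings are taken from the comparisons above.

```python
import re

# Matches summaries such as "Identities = 914/948 (96%)" with or without spaces.
IDENTITY_PATTERN = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def percent_identity(summary):
    """Return percent identity computed from an 'Identities = a/b' summary."""
    match = IDENTITY_PATTERN.search(summary)
    if match is None:
        raise ValueError("no 'Identities = a/b' field found")
    matches, length = (int(value) for value in match.groups())
    return 100.0 * matches / length


cos_tomato_vs_potato = "Identities = 914/948 (96%), Gaps = 6/948 (0%)"
diverged_tomato_vs_potato = "Identities = 223/280 (79%), Gaps = 11/280 (3%)"

print(round(percent_identity(cos_tomato_vs_potato), 2))
print(round(percent_identity(diverged_tomato_vs_potato), 2))
```

Working from the raw counts also makes it possible to compare alignments of different lengths on the same scale.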
What patterns do we expect to see for genes “under selection”?
Low variation (fixed)
High Ka/Ks (mutations affect protein, possible diversifying selection)
Mutations (loss of function)
FST (genes that distinguish populations)
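As an illustration of the Ka/Ks criterion, the sketch below computes the ratio from counts of nonsynonymous and synonymous differences and sites, applying the Jukes-Cantor correction to each proportion. This is a simplified version of the counting approach, not necessarily the method used for the analyses in this talk. Counting the sites and differences from a codon alignment is not shown, and all function names and example counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion of differing sites p (p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return Ka/Ks from difference and site counts.

    Raises ValueError when Ks is zero, because the ratio is undefined.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero; Ka/Ks is undefined")
    return ka / ks


# Hypothetical counts: 10 nonsynonymous differences over 700 sites,
# 10 synonymous differences over 300 sites.
ratio = ka_ks(10, 700, 10, 300)
print(round(ratio, 3))
```

A ratio well below 1 is the usual signature of purifying selection, while values above 1 point toward changes that alter the protein being favored.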
Population structure: coding vs. non-coding
All 173 markers (K=6); 89 coding markers (K=5); 84 non-coding markers (K=6)
500K burn-in/750K MCMC reps, 20 runs for each K from 3 to 8
Populations: Processing, Fresh-market, Vintage, Landrace
Cluster labels: CA & OH, OH, CA, OH, CN
Distribution of FST for genes
ovate: 0; fw2.2: 0; sp6: 0.14
ovate: 0.26; fw2.2: 0; sp6: 0.73
ovate: 0.31; fw2.2: 0; sp6: 0.47
ovate: 0; fw2.2: 0.5; sp6: 1
ovate: 0; fw2.2: 0.42; sp6: 0.74
ovate: 0.14; fw2.2: 0.46; sp6: 0.05
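The range of values above, from 0 to 1, can be illustrated with a basic FST calculation for a single biallelic marker, FST = (HT − HS) / HT, where HT is the expected heterozygosity of the pooled populations and HS is the mean expected heterozygosity within populations. The sketch assumes equal population weights and known allele frequencies. It is not necessarily the estimator used for the values on the slide, and the function name `fst` and the example frequencies are hypothetical.

```python
def fst(allele_freqs):
    """Return FST for one biallelic marker.

    allele_freqs: frequency of the same allele in each population,
    with all populations weighted equally.
    """
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # marker is monomorphic overall, so there is no differentiation
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / n
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
print(fst([1.0, 0.0]))            # alternate alleles fixed in each population
print(fst([0.5, 0.5]))            # identical frequencies
print(round(fst([0.9, 0.2]), 3))  # intermediate differentiation
```

A value of 1 corresponds to alternate alleles fixed in the two populations and a value of 0 to identical allele frequencies, which matches the extremes seen for sp6 and fw2.2 above.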
Examples of highly polymorphic genes within S. lycopersicum Note: I am working on a replacement that compares Ka/Ks for selected tomato and potato genes
Distribution of PM genes across populations is not random
Populations: Processing, Fresh Market, Vintage, Wild
Conclusions
~5.7 Gb PF potato transcriptome sequence (3 varieties)
~14.3 Gb PF tomato transcriptome sequence (6 varieties)
S. phureja draft genome is an excellent scaffold for potato and tomato GAII transcriptome alignments
SNPs are not evenly distributed in genes
Genes with signatures of selection (Ka/Ks; high FST) tend to be genes associated with response to abiotic and biotic stress.
Breeders have selected for groups of genes suggesting that co-adapted complexes
Acknowledgments
Collaborators, OSU: Matt Robbins, Sung-Chur Sim, Troy Aldrich
Collaborators, Cornell: Walter De Jong, Lucas Mueller, Joyce van Eck
Collaborators, CAU: Wencai Yang
Collaborators, CAAS: Sanwen Huang
Collaborators, UCD: Allen Van Deynze, Kevin Stoffel, Alex Kozik
Collaborators, MSU: David Douches, C. Robin Buell, John Hamilton, Kelly Zarka
Funding: USDA/AFRI