Assessing Genetic Diversity in the Rare Sandhill Endemic Erysimum teretifolium Using Microsatellites and Next-Generation Sequencing
Julie A. Herman, Khaaliq DeJan, Justen B. Whittall
Santa Clara University, CA
SCU Biology. Artwork by Edward Rooks.

Background
Island biogeography predicts that populations occupying island-like habitats near genetic reservoirs will contain higher levels of diversity than more isolated populations (Vellend 2003). Genetic structure within such islands is then expected to reflect isolation by distance (Wright 1943). Genetic diversity is also predicted to be positively correlated with population size (Leimu et al. 2006). The Zayante Sandhills of Santa Cruz, California, are island-like xeric habitats separated by mesic redwood and mixed evergreen forests. These unique habitats are home to many endemic plant and animal species, including the Ben Lomond Wallflower (Erysimum teretifolium; Fig. 1A). This naturally patchy habitat is threatened by the sand quarrying industry (Fig. 1B) and residential development. An unknown number of populations of E. teretifolium remain, several of which contain fewer than 100 individuals. Using two distinct methods, microsatellite analysis and next-generation sequencing (NGS), this project investigates the distribution of genetic diversity within and among eight extant populations to determine whether E. teretifolium's island-like habitat influences its genetic distribution and to guide future conservation priorities. Such data will help land managers determine appropriate seed sources for establishing new populations of E. teretifolium. In particular, this project addresses the complexity of analyzing microsatellite data from a hexaploid plant species and discusses whether NGS may provide a viable alternative for estimating genetic diversity in such taxa.

Fig. 1. The Ben Lomond Wallflower. Erysimum teretifolium occupies inland sandhills of Santa Cruz Co. (A), which have been largely destroyed by sand quarrying (B).

Research Questions
Is there discernible population structure in E. teretifolium?
Is the distribution of genetic diversity within and among populations consistent with this species' insular habitat?
Do population size or geographic isolation impact genetic diversity within populations?
Can NGS complement traditional microsatellite approaches for conservation genetics?

Fig. 3. Average probability of group assignments. Pie diagrams depict the average group assignment probabilities in each population for the two genetic clusters identified by Structure for E. teretifolium.

Acknowledgements
Cindy Dick, Miranda Melen, & Devin Wakefield at SCU provided invaluable assistance, as did Inés Casimiro-Soriguer from Universidad Pablo de Olavide. Charles Nicolet from USC's Epigenome Center provided critical assistance with the NGS library preps & sequencing. Jodi McGraw, Ingrid Parker, Val Haley & Terris Kasteen provided essential field assistance. Funding was provided by an SCU ALZA Scholarship to JH and Section VI funds from the California Department of Fish and Wildlife to JW.

Methods
Samples were collected from 186 individuals representing 8 populations of E. teretifolium (11-32 individuals per population). DNA was extracted with a NucleoSpin Plant II kit using lysis buffer 1 (Macherey-Nagel). PCR amplification was carried out on 3 microsatellite loci (18 total alleles) developed for the European E. mediohispanicum according to the methods of Muñoz-Pajares et al. (2011). Alleles were separated on an ABI3730 with a LIZ600 size standard, and lengths were determined using PeakScanner Software v1.0 (Life Technologies). Due to hexaploidy in E. teretifolium, we could not confidently determine genotypes, so we analyzed the data with the restriction model in Structure (Pritchard et al. 2000).
A range of population clusters (k = 1-10) was tested using location priors and allowing for admixture (ngen = 10^6, burnin = 5×10^5, 5 replicates per k-value, lambda = 0.51202, determined empirically). The number of population clusters that best fit the data was calculated using the Δk method of Evanno et al. (2005) in Structure Harvester (Earl & von Holdt 2011). Runs with identical parameters were conducted including samples from the closely related wallflower, E. capitatum ssp. angustatum (ERCAAN), to ensure the model could differentiate these taxa. Average group assignments for E. teretifolium were used for later analyses. Samples were analyzed in Arlequin v3.5 (Excoffier et al. 2005) for AMOVA and F_ST using the groupings predicted by Structure. The total number of differences between each pair of individuals was calculated in PAUP* v4.0 (Swofford 2002). The distribution of genetic distances within and among populations was calculated from the resulting distance matrix. Geographic distances were determined in Google Earth based on GPS coordinates. A Mantel nonparametric test was used to compare the geographic and genetic distance matrices (Liedloff 1999). Population size estimates were based on censuses of juveniles, flowering individuals, and fruiting individuals at each site. Remaining analyses were carried out in Excel.

Population Structure
Two primary geographic clusters emerge based on Structure assignments: Northwest/South (QH, BD, AZA/Hwy17) and Central (OLY, GEY, SHGW), with MTH acting as a bridge between the Central and South groupings. These groupings may arise from a central versus peripheral division.

References
Earl D & von Holdt B (2011) Structure Harvester: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources:1-3.
Evanno G, Regnaut S, & Goudet J (2005) Detecting the number of clusters of individuals using the software Structure: a simulation study. Molecular Ecology 14(8):2611-2620.
Excoffier L, Laval G, & Schneider S (2005) Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1:47-50.
Leimu R, Mutikainen P, Koricheva J, & Fischer M (2006) How general are positive relationships between plant population size, fitness, and genetic variation? Journal of Ecology 94(5):942-952.
Liedloff AC (1999) Mantel Nonparametric Test Calculator. Version 2.0. School of Natural Resource Sciences, Queensland University of Technology, Australia.
Muñoz-Pajares AJ, Herrador MB, Abdelaziz M, Picó FX, Sharbel TF, Gómez JM, & Perfectti F (2011) Characterization of microsatellite loci in Erysimum mediohispanicum (Brassicaceae) and cross-amplification in related species. American Journal of Botany e287-e289.
Pritchard JK, Stephens M, & Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945-959.
Swofford DL (2002) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.
Vellend M (2003) Island Biogeography of Genes and Species. The American Naturalist 162(3):358-365.
Wright S (1943) Isolation by distance. Genetics 28(2):114.

Team Wallflower, Summer 2012

Fig. 4. Analysis of Molecular Variance (AMOVA). Populations were assigned to groups based on average group assignment probability from Structure (k=2 categories, without ERCAAN). 82% of the variation exists within populations.

Conclusions
Although AMOVA shows that most of the variation is contained within populations, F_ST reveals that most populations are significantly different from one another.
There is no correlation between geographic distance and genetic distance.
These results suggest that an island-like model is inappropriate for describing these populations, although they superficially resemble island habitats.
Most of the genetic diversity exists within populations and correlates weakly with population size.
Continental islands such as the Zayante Sandhills may not behave like oceanic islands, as seen in the case of E. teretifolium, which does not fit an isolation by distance model.

F_ST
24 of 28 pairwise comparisons between populations had F_ST significantly greater than 0 (p<0.05). Hwy17, one of the smallest, most disturbed, and most isolated populations, has the highest pairwise F_ST. AZA, one of the largest, least disturbed, and most central populations, has the lowest F_ST.

Fig. 5. Isolation by distance. Genetic distances are averages of all pairwise comparisons of individuals for each pair of populations. No correlation was found (Mantel test: 10^4 iterations, 8x8 half matrix, randomization, r = -0.3098, n.s.).

Next-Generation Sequencing Approach
25 individuals per population were pooled under a single barcode. Four populations in total were barcoded and sequenced on a single lane of Illumina HiSeq (the lane was shared, with a total of 8 barcodes per lane).

Workflow: plant tissue (fresh) -> DNA extraction -> library prep (Nextera) -> Illumina HiSeq (USC Epigenome Center), producing 50 bp paired-end microreads -> de novo assembly (Velvet) into contiguous sequences (contigs) -> contigs BLASTed to A. thaliana for identification -> identify SNPs.

Barcode  Microreads   k-mer Length  Median Depth of Coverage  N50  Max Contig Length (bp)
CAGGCG   50,044,686   23            3.375                     274  7969
CAGGCG   50,044,686   27            3.225                     395  9373
CAGGCG   50,044,686   31            3.116                     277  10583
CAGGCG   50,044,686   35            3.170                     151  13277
CAGGCG   50,044,686   39            4.539                     367  18453
CTAGCT   51,868,830   23            3.412                     226  8764
CTAGCT   51,868,830   27            3.259                     381  10764
CTAGCT   51,868,830   31            3.153                     263  10583
CTAGCT   51,868,830   35            3.203                     147  10587
CTAGCT   51,868,830   39            4.575                     363  16179
TAATCG   49,746,950   23            3.279                     240  11043
TAATCG   49,746,950   27            3.124                     342  11317
TAATCG   49,746,950   31            3.033                     191  9369
TAATCG   49,746,950   35            3.140                     133  13990
TAATCG   49,746,950   39            4.764                     384  16179
TGACCA   50,491,668   23            3.383                     234  8196
TGACCA   50,491,668   27            3.227                     377  9578
TGACCA   50,491,668   31            3.128                     247  10583
TGACCA   50,491,668   35            3.192                     147  15095
TGACCA   50,491,668   39            4.602                     368  18453

Fig. 8. De novo assembly of contigs for four populations of E. teretifolium across a range of k-mer lengths. All four of the longest contigs (k-mer length=39) are similar to known A. thaliana mitochondrial sequences but contain SNPs and indels (megablast, E=0.0).
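
Fig. 8 reports an N50 for each assembly without defining it. As a minimal sketch, assuming the common definition (the contig length at which contigs of that length or longer contain at least half of all assembled bases), N50 could be computed as shown here. The function name n50 and the example lengths are illustrative assumptions; they are not taken from the poster or from Velvet's output.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and
    return the length of the contig at which the running total first
    reaches at least half of the summed length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [100, 200, 300, 400, 500]
example_n50 = n50(example_lengths)  # total = 1500; 500 + 400 = 900 >= 750
print(example_n50)
```

In this toy example the two longest contigs (500 and 400 bp) together cover more than half of the 1,500 bp total, so the N50 is 400. A low N50 next to a long maximum contig, as in Fig. 8, indicates that most of the assembly is in short contigs.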