1 Targeted next generation sequencing for population genomics and phylogenomics in Ambystomatid salamanders
Eric M. O’Neill, David W. Weisrock
Photograph by Stephen Dalton/Animals Animals - Earth Scenes
Preliminary Results

2 Ambystoma tigrinum complex

3 Coalescent Processes
Stochastic
Incomplete lineage sorting
Gene tree incongruence
Capture variance
Many loci
Degnan and Rosenberg, 2006 PLOS Genetics

4 Goals
Sequence >100 independent loci from 100s of samples
  – both alleles
Population genetics
Species delimitation
Gene phylogenies
Species phylogeny
Jeremiah Smith

5 Past Option
Sanger Sequencing
  – expensive
  – cloning or computational phasing of alleles
  – low throughput

6 454 (Roche) Next Generation Sequencing
1 million reads × 400 bp each = 400 million bp

7 Meyer et al. 2008 Nature Protocols Barcoding

8

9 Methods
Screened ~250 EST loci across 16 representative samples
Found >100 variable loci that amplify well at the same temperature
Amplified 95 loci for one individual in one plate
94 individuals
  – 8930 amplicons
Pooled across 95 loci for each individual
Barcoded 94 individuals and pooled
UKY-AGTC: 454 libraries, emPCR, 454 sequencing

10 Preliminary Results
Two test runs: 1/8th picotiter plate
  – 65K + 20K sequences
One final run: 1/4th picotiter plate
  – 225K sequences
Total ~300K sequences
Coverage of about 34X per sample per locus
Sorted >95%

11 1664 seqs / 95 loci = 18X coverage
96% of loci have sequence
45 loci had >10X coverage
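The coverage figures on slides 10 and 11 can be checked with a few lines of arithmetic. This is a minimal sketch using only numbers quoted in the slides (95 loci, 94 individuals, ~300K total sequences, 1664 sequences for one individual); the function names are illustrative and not part of the authors' pipeline.

```python
def amplicon_count(loci, individuals):
    """Number of locus-by-individual targets in the pooled run."""
    return loci * individuals


def mean_coverage(total_reads, n_targets):
    """Average number of reads per target, assuming reads are spread evenly."""
    return total_reads / n_targets


# Values quoted on the Methods and Preliminary Results slides.
amplicons = amplicon_count(loci=95, individuals=94)
run_coverage = mean_coverage(total_reads=300_000, n_targets=amplicons)

# Slide 11: one individual with 1664 sequences across 95 loci.
individual_coverage = mean_coverage(total_reads=1664, n_targets=95)

print(f"{amplicons} amplicons, ~{run_coverage:.0f}X per sample per locus")
print(f"example individual: ~{individual_coverage:.0f}X per locus")
```

An even spread of reads is an idealization; slide 11 itself shows that only 45 loci exceeded 10X in the example individual.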

12 Genotyping
Clonal amplification through emPCR
Each sequence is derived from a single DNA strand
Identify both alleles without bacterial cloning

13 Errors
Homopolymer regions
Single nucleotide mismatches

14 Automated Statistical Genotyping Hohenlohe et al., 2010 PLOS Genetics

15 Genotyping
Let n be the total number of reads per site
Let n = n_1 + n_2 + n_3 + n_4, where n_i is the read count for each possible nucleotide at the site
For a diploid, there are 10 possible genotypes
  – 4 homozygous (AA, TT, GG, CC)
  – 6 heterozygous (AT, AG, AC, TG, TC, GC)
Calculate the likelihood of each possible genotype using a multinomial sampling distribution, which gives the probability of observing a set of read counts (n_1, n_2, n_3, n_4)

16 Likelihood of a Homozygote

17 Likelihood of a Heterozygote
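The equation images for slides 16 and 17 are not preserved in this transcript. As a hedged reconstruction, the forms below follow the multinomial model of Hohenlohe et al. (2010), which slide 14 cites. They assume that n_1 and n_2 are the read counts of the two candidate alleles and that ε, a symbol that does not appear in the surviving slide text, is the sequencing error rate.

```latex
\begin{align}
% Homozygote (genotype 1/1): reads of allele 1 are correct, all others are errors
L(1/1) &= P(n_1, n_2, n_3, n_4 \mid 1/1)
        = \frac{n!}{n_1!\, n_2!\, n_3!\, n_4!}
          \left(1 - \frac{3\varepsilon}{4}\right)^{n_1}
          \left(\frac{\varepsilon}{4}\right)^{n_2 + n_3 + n_4} \\
% Heterozygote (genotype 1/2): reads of alleles 1 and 2 are correct, the rest are errors
L(1/2) &= P(n_1, n_2, n_3, n_4 \mid 1/2)
        = \frac{n!}{n_1!\, n_2!\, n_3!\, n_4!}
          \left(\frac{1}{2} - \frac{\varepsilon}{4}\right)^{n_1 + n_2}
          \left(\frac{\varepsilon}{4}\right)^{n_3 + n_4}
\end{align}
```

In both cases the four per-read probabilities sum to one, and the multinomial coefficient is identical, so it cancels in any ratio of the two likelihoods.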

18 Assigning Genotypes
The 2 equations give the likelihoods of the two most likely hypotheses out of 10, taking n_1 and n_2 as the counts of the most and second-most observed nucleotides
Use a likelihood ratio test (LRT) to compare the Homo vs. Het hypotheses (df=1)
If the test is significant, we assign the most likely genotype at that site for that individual
If the test is not significant, we do not assign a genotype
This process tests each SNP independently, but we want to genotype the entire sequence
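The per-site decision rule described on this slide can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the likelihoods follow the multinomial form of Hohenlohe et al. (2010), the error rate `eps` is fixed at a hypothetical 0.01 rather than estimated from the data, and significance is judged against the chi-square critical value for df=1 at an assumed alpha of 0.05. All function names are made up for this example.

```python
import math

CHI2_CRIT_DF1 = 3.841  # chi-square critical value, df=1, alpha=0.05 (assumed threshold)


def log_multinomial_coeff(counts):
    """log of n! / (n1! n2! n3! n4!)."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def log_lik_homozygote(counts, eps):
    """Log-likelihood that the most observed nucleotide is the only true allele."""
    n1, n2, n3, n4 = counts
    return (log_multinomial_coeff(counts)
            + n1 * math.log(1 - 3 * eps / 4)
            + (n2 + n3 + n4) * math.log(eps / 4))


def log_lik_heterozygote(counts, eps):
    """Log-likelihood that the two most observed nucleotides are the true alleles."""
    n1, n2, n3, n4 = counts
    return (log_multinomial_coeff(counts)
            + (n1 + n2) * math.log(0.5 - eps / 4)
            + (n3 + n4) * math.log(eps / 4))


def call_genotype(read_counts, eps=0.01):
    """Return a genotype such as 'AA' or 'CT', or None if the LRT is not significant.

    read_counts maps nucleotide -> read count at one site; missing bases count as 0.
    """
    full = {base: read_counts.get(base, 0) for base in "ACGT"}
    ranked = sorted(full.items(), key=lambda kv: kv[1], reverse=True)
    counts = [count for _, count in ranked]

    ll_hom = log_lik_homozygote(counts, eps)
    ll_het = log_lik_heterozygote(counts, eps)
    statistic = 2 * abs(ll_hom - ll_het)
    if statistic < CHI2_CRIT_DF1:
        return None  # not significant: leave the site uncalled

    first, second = ranked[0][0], ranked[1][0]
    return first + first if ll_hom > ll_het else first + second


print(call_genotype({"A": 20}))           # many reads of a single base
print(call_genotype({"C": 10, "T": 9}))   # two bases at similar depth
print(call_genotype({"A": 2}))            # too few reads to decide
```

Working in log space avoids underflow at high read depth. Like the slide's procedure, the sketch treats each site independently, so it does not phase alleles across SNPs.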

19 8 ways to be Het at 3 SNPs:
C-T-C    G-T-C
C-C-C    G-C-C
C-T-T    G-T-T
C-C-T    G-C-T
We need to maintain the correct info.
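The eight sequences on this slide are every combination of the two alleles at each of three heterozygous SNPs. A short sketch, using the alleles shown on the slide (C/G, T/C, C/T) and hypothetical function names, enumerates them and pairs each haplotype with its complement to show the candidate phasings for one diploid individual:

```python
from itertools import product


def enumerate_haplotypes(snp_alleles):
    """All haplotypes formed by picking one allele at each heterozygous SNP."""
    return ["".join(combo) for combo in product(*snp_alleles)]


def enumerate_phasings(snp_alleles):
    """Unordered pairs of complementary haplotypes (one pair per possible phasing)."""
    phasings = set()
    for combo in product(*snp_alleles):
        # The partner haplotype carries the other allele at every site.
        complement = tuple(b if allele == a else a
                           for allele, (a, b) in zip(combo, snp_alleles))
        phasings.add(frozenset(("".join(combo), "".join(complement))))
    return sorted(tuple(sorted(pair)) for pair in phasings)


# Alleles at the three SNPs, as shown on the slide.
snp_alleles = [("C", "G"), ("T", "C"), ("C", "T")]

haplotypes = enumerate_haplotypes(snp_alleles)
phasings = enumerate_phasings(snp_alleles)

print(len(haplotypes), "haplotypes:", haplotypes)
print(len(phasings), "possible phasings:", phasings)
```

With k heterozygous sites there are 2^k haplotypes and 2^(k-1) phasings, which is why calling each SNP independently loses information that the single-strand 454 reads retain.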

20 Desired Workflow
454 data received as FASTA files
Sort by barcode
  – Tommy has some code for this
Assemble by locus (alignments)
  – Currently in Geneious, what other options?
Genotype (phase the alleles)
  – Need to implement automated method
  – Quality scores
Export data as sequences for phylogenetic analysis
Export data as alleles for population genetic analysis
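The "Sort by barcode" step can be sketched as follows. This is not the code mentioned on the slide; it is a minimal illustration that assumes each read begins with an exact copy of its sample barcode, and the barcode sequences, sample names, and function names are all hypothetical.

```python
from collections import defaultdict


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sort_by_barcode(records, barcode_to_sample):
    """Bin reads by their leading barcode and trim the barcode off.

    Reads that match no barcode exactly go to the 'unassigned' bin untrimmed.
    """
    bins = defaultdict(list)
    for header, seq in records:
        for barcode, sample in barcode_to_sample.items():
            if seq.startswith(barcode):
                bins[sample].append((header, seq[len(barcode):]))
                break
        else:
            bins["unassigned"].append((header, seq))
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
fasta = [">r1", "ACGTTTGACC", ">r2", "TGCAGGGTTT", ">r3", "NNNNAAAA"]

bins = sort_by_barcode(parse_fasta(fasta), barcodes)
for sample, reads in bins.items():
    print(sample, reads)
```

Exact prefix matching is the simplest choice. Real 454 reads contain homopolymer and mismatch errors, so a production sorter would likely need to tolerate mismatches in the barcode, which this sketch does not attempt.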

21 Current Challenges
Normalization
Sorting and Assembly
Genotyping
Data analysis
  – gene trees
  – species trees

22 Cost
454 Cost: ~$4,400
Sanger Cost: >$85,000 + BigDye, cloning, etc.
Sanger Cost for ~9K amplicons: $9,000

23 Communal Sequencing
10X of 9000 amplicons in <1/4th plate
Pool across loci, species, markers, whatever
  – Any targeted loci
nDNA and mtDNA
SNPs
Microsatellites

24 Luikart et al., 2003 Nature Reviews Genetics
The power and promise of population genomics: from genotyping to genome typing
“In 5-10 years, the generation of sequence data might be affordable enough to be used to study numerous loci in hundreds of individuals from non-model species. Sequences are desirable because ascertainment bias is avoided, haplotypes can be identified (or inferred), and coalescent times and allele relatedness (genealogies) can be estimated. Difficulties with sequencing include the analysis of heterozygous sites and insertion/deletion polymorphisms.”

25

26 David Weisrock, Yukie Kajita, Stephi Mitchell, Alex Noble, Ben Tuttle, Justin Kratovil, Josh Williams
http://sweb.uky.edu/~dweis2/
Chris Schardl, Jenny Webb

