GBS & GWAS using the iPlant Discovery Plant & Animal Genome XXI - San Diego, CA.


1 GBS & GWAS using the iPlant Discovery Plant & Animal Genome XXI - San Diego, CA

2 Overview: This training module demonstrates the Genotype by Sequencing Workflow and a Genome-Wide Association Study using a Mixed Linear Model.
Questions:
1. How can we determine genotypes using sequencing technology?
2. How can we find genetic variants (e.g. SNPs) associated with a phenotype?

3 Tools for Statistical Genetics in the DE
Tool: Purpose
Genotype by Sequencing Workflow: Automatic pipeline for extracting SNPs from GBS data (with genome from user or from iPlant database)
UNEAK pipeline: Automatic pipeline for extracting SNPs from GBS data without reference genomes
MLM workflow: Automatic workflow for fitting Mixed Linear Model
GLM workflow: Automatic workflow for fitting General Linear Model
QTLC workflow: Automatic workflow for composite interval mapping
QTL simulation workflow: Automatic workflow for simulating trait data with given linkage map
PLINK: PLINK implementation of various association models
Zmapqtl: Interval mapping and composite interval mapping with the option to perform a permutation test
LRmapqtl: Linear regression modeling
SRmapqtl: Stepwise regression modeling
AntEpiSeeker: Epistatic interaction modeling
Random Jungle: Random Forest implementation for GWAS
FaST-LMM: Factored Spectrally Transformed Linear Mixed Modeling
Qxpak: Versatile mixed modeling
gluH2P: Convert Hapmap format to Ped format
LD: Linkage Disequilibrium plot
Structure: Estimation of population structure
PGDSpider: Data conversion tool
GLMstructure: GLM with population structure as fixed effect

4 Elshire et al. PLoS One May 4;6(5):e doi: /journal.pone

5 Genotype By Sequencing Elshire et al. PLoS One May 4;6(5):e doi: /journal.pone Ed Buckler (Cornell University)

6 GBS Overview

7 Identification of markers with/without the reference genome
Marker types shown: SNPs and small INDELs; loss of cut site
Example lines: B73, Mo17

8 Reads -> Tags -> Aligned Tags -> SNPs/INDELs
CAGCAAAAAAAAAAAAGAGGGATG C GGCGGCTTGCGTGCATGGGACACAAGCGTGTAGACGGGC
CAGCAAAAAAAAAAAAGAGGGATG G GGCGGCTTGCGTGCATGGGACACAAGCGTGTAGACGGGC
Two ways of aligning tags:
a. Anchored to the reference genome
b. Pair-wise alignment between tags
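
The pair-wise comparison in option (b) can be illustrated with a minimal sketch. It assumes the two tags are already the same length and aligned base-for-base, so it only reports substitutions (candidate SNPs) and cannot detect INDELs. The function name `find_mismatches` is hypothetical and is not part of TASSEL or the DE.

```python
def find_mismatches(tag_a, tag_b):
    """Return (position, base_a, base_b) for every site where two equal-length tags differ."""
    if len(tag_a) != len(tag_b):
        raise ValueError("tags must have the same length for this simple comparison")
    return [(i, a, b) for i, (a, b) in enumerate(zip(tag_a, tag_b)) if a != b]


# The two tags from the slide, with the spaces around the variant base removed.
prefix = "CAGCAAAAAAAAAAAAGAGGGATG"
suffix = "GGCGGCTTGCGTGCATGGGACACAAGCGTGTAGACGGGC"
tag_1 = prefix + "C" + suffix
tag_2 = prefix + "G" + suffix

for pos, a, b in find_mismatches(tag_1, tag_2):
    print(f"candidate SNP at tag position {pos} (0-based): {a}/{b}")
```

A real pipeline would also allow for gaps and would weigh read counts before calling a variant; this sketch only shows the comparison step.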

9 GBS Lab Protocol From:

10

11 Input files: Sequence (QSEQ or FASTQ) Key file (bar-code to sample)

12

13 Input Key File
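
As a rough illustration of what a key file does (it maps each bar-code to a sample), the sketch below loads a small tab-delimited table into a lookup dictionary. The column names `Flowcell`, `Lane`, `Barcode`, and `Sample`, the flowcell ID, and the bar-codes are assumptions made for this example; consult the TASSEL documentation for the actual required layout. The sample names B73 and Mo17 are borrowed from slide 7.

```python
import csv
import io


def load_key(handle):
    """Map (flowcell, lane, barcode) -> sample name from a tab-delimited key file."""
    reader = csv.DictReader(handle, delimiter="\t")
    key = {}
    for row in reader:
        key[(row["Flowcell"], row["Lane"], row["Barcode"])] = row["Sample"]
    return key


# Hypothetical key file content; every value except the sample names is invented.
example = io.StringIO(
    "Flowcell\tLane\tBarcode\tSample\n"
    "FC001\t1\tCTCC\tB73\n"
    "FC001\t1\tTGCA\tMo17\n"
)
key = load_key(example)
print(key[("FC001", "1", "CTCC")])
```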

14

15 Trims and cleans reads to 64 bp tags
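
The trimming step can be sketched as follows. This is a simplified illustration, not the TASSEL implementation: it assumes the read begins with the sample bar-code, that the bar-code is followed by a cut-site remnant (here `CAGC` or `CTGC`, an assumption based on the ApeKI enzyme), and that reads that are too short or contain `N` are simply discarded. The function name `read_to_tag` is hypothetical.

```python
TAG_LENGTH = 64  # tag length stated on the slide


def read_to_tag(read, barcode, cut_site_remnants=("CAGC", "CTGC")):
    """Return the 64 bp tag for a read, or None if the read should be discarded."""
    if not read.startswith(barcode):
        return None  # read does not belong to this bar-code
    insert = read[len(barcode):]
    if not insert.startswith(cut_site_remnants):
        return None  # expected cut-site remnant is missing
    tag = insert[:TAG_LENGTH]
    if len(tag) < TAG_LENGTH or "N" in tag:
        return None  # too short after trimming, or contains an uncalled base
    return tag


# Hypothetical read: bar-code, cut-site remnant, then 70 bases of sequence.
read = "CTCC" + "CAGC" + "A" * 70
print(read_to_tag(read, "CTCC"))
```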

16

17 Locates tags on genome

18

19 Associates tags to germplasms

20 Saved as a binary file

21

22

23 “Genotype By Sequencing Workflow” in DE Individual steps strung together to run with a single click Some steps merged to reduce I/O

24 GBS Workflow Output in the DE Final filtered hapmap files in folder “filt”

25 Final Notes on GBS
If you do not have a reference genome: use “UNEAK” (also part of TASSEL)
If your reference genome is not supported by the DE: use “GBS Workflow with user genome”
tories/bioinformatics/TASSEL/uneak_pipeline_documentation.pdf

26 MLM Pipeline for GWAS
Pipeline steps (from the slide diagram): marker, trait, filter, convert, impute, K, GLM, MLM
Mixed Linear Model (MLM), an alternative to the General Linear Model (GLM):
- Reduces false positives by controlling for population structure
- Uses compression to decrease the effective sample size
- Uses the P3D protocol to eliminate the need to re-compute variance components
- Speeds compute time by up to ~7500x relative to a standard MLM without compression or P3D
Zhang et al. Nature Genetics. 2010; doi: /ng.546
Ed Buckler (Cornell University) TASSEL
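
For orientation, a mixed linear model of this kind is commonly written in the following textbook form. The notation is an assumption for illustration and is not taken from the slides: y is the phenotype vector, X\beta holds the fixed effects (the marker being tested and population structure), u is the random genetic effect with covariance proportional to the kinship matrix K, and e is the residual. Some formulations scale K by a factor of 2.

```latex
\[
  y = X\beta + Zu + e, \qquad
  u \sim N(0,\ \sigma_g^2 K), \qquad
  e \sim N(0,\ \sigma_e^2 I)
\]
```

In this notation, the P3D idea amounts to estimating \sigma_g^2 and \sigma_e^2 once and reusing them for every marker test instead of re-estimating them per marker.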

27 MLM Input Files
- Hapmap file
- Phenotype data (columns: strain, traits)
- Kinship matrix*
- Population structure* (columns: strain, 3 populations that sum to 1)
* Kinship matrix & population structure data can be generated using TASSEL or with the “MLM Workflow” App in the DE
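
The constraint that each strain's population proportions sum to 1 is easy to check before submitting a job. The sketch below is a minimal example under assumed inputs: the dictionary layout, the strain names, and the function name `check_structure_rows` are all hypothetical and do not reflect a TASSEL file format.

```python
import math


def check_structure_rows(rows, tol=1e-6):
    """Return the names of strains whose population proportions do not sum to 1."""
    bad = []
    for strain, proportions in rows.items():
        if not math.isclose(sum(proportions), 1.0, abs_tol=tol):
            bad.append(strain)
    return bad


# Hypothetical population structure table: strain -> proportions in 3 populations.
structure = {
    "strain_1": [0.7, 0.2, 0.1],
    "strain_2": [0.5, 0.5, 0.0],
    "strain_3": [0.6, 0.3, 0.2],  # sums to 1.1, so it should be flagged
}
print(check_structure_rows(structure))
```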

28 MLM Output
MLM1.txt:
- Marker
- “df”: degrees of freedom
- “F”: F statistic for the test of the marker
- “p”: p-value
- “errordf”: df used for the denominator of the F-test
- etc.
MLM2.txt: estimated effect for each allele of each marker
MLM3.txt: compression results, showing the likelihood, genetic variance, and error variance for each compression level tested during the optimization process
See TASSEL manual for details:
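
A common next step is to pull the markers with the smallest p-values out of MLM1.txt. The sketch below assumes the file is tab-delimited with a header row containing columns named `Marker` and `p`, as suggested by the field names on the slide; the exact layout should be checked against the TASSEL manual. The marker names, statistics, and threshold in the example are invented.

```python
import csv
import io


def significant_markers(handle, alpha):
    """Return (marker, p) pairs with p <= alpha, sorted by increasing p-value."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        try:
            p = float(row["p"])
        except (ValueError, TypeError):
            continue  # skip rows without a numeric p-value
        if p <= alpha:
            hits.append((row["Marker"], p))
    return sorted(hits, key=lambda item: item[1])


# Hypothetical excerpt of an MLM1.txt-style table; all values are made up.
example = io.StringIO(
    "Marker\tdf\tF\tp\terrordf\n"
    "snp_a\t1\t2.1\t0.15\t250\n"
    "snp_b\t1\t30.4\t1e-7\t250\n"
    "snp_c\t1\t18.9\t2e-5\t250\n"
    "snp_d\t1\tNA\tNA\t250\n"
)
top = significant_markers(example, alpha=1e-4)
for marker, p in top:
    print(marker, p)
```

The threshold `alpha` is left to the user; choosing it (for example, with a multiple-testing correction) is outside the scope of this sketch.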

29 THANKS!

