Association Modeling With iPlant

Presentation on theme: "Association Modeling With iPlant"— Presentation transcript:

1 Association Modeling With iPlant

2 Goals of this Section
Familiarize yourself with the basic concepts of quantitative genetics: traits, phenotypes, genotypes
Understand the basics of trait mapping
Understand the conceptual foundations of association studies
Learn how to perform a genome-wide association study in the iPlant Discovery Environment
  Obtain genotypes
  Run a Mixed Linear Model

3 Phenotype
Observable (measurable) trait (character) of an organism
Trait: eye color
Phenotype: wild type (red), white eyed, orange eyed

4 Qualitative Traits
Campbell, 8e

5 Controlled by One Locus

6 Co-segregation in Pedigree
Donahue, R. P., et al., Probable assignment of the Duffy blood group locus to chromosome 1 in man, Proceedings of the National Academy of Sciences 61, (1968).

7 Quantitative Trait
Carlos Harjes

8 Trait Varies on a Continuous Scale
[Figure: frequency plotted against trait value]

9 Quantitative Traits
Probably caused by multiple loci
Interaction effects
Environment
If the mean trait value for individuals with marker state MM differs from the mean trait value of individuals with marker state mm (i.e. the marker is associated with the phenotype), then the marker is linked to a quantitative trait locus.
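The comparison of mean trait values between marker states can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `marker_effect`, the sample data, and the choice of Welch's t statistic as the measure of difference are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance


def marker_effect(trait_values, marker_states):
    """Compare the mean trait value between the two states of one marker."""
    groups = {}
    for value, state in zip(trait_values, marker_states):
        groups.setdefault(state, []).append(value)
    if len(groups) != 2:
        raise ValueError("expected exactly two marker states")
    (state_a, a), (state_b, b) = sorted(groups.items())
    # Welch's t statistic: difference in means divided by its standard error
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    return {state_a: mean(a), state_b: mean(b), "t": t}


# Hypothetical data: six individuals scored for a single marker
traits = [98.0, 101.0, 99.0, 117.0, 120.0, 118.0]
states = ["MM", "MM", "MM", "mm", "mm", "mm"]
result = marker_effect(traits, states)
print(result)
```

A large absolute t value suggests the marker state is associated with the trait; a real analysis would also convert it to a p-value and correct for the number of markers tested.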

10 Individuals, trait value, markers
Marker #6    Mean Trait Value
Present      110 ± 10
Absent       115 ± 13
Marker #3    Mean Trait Value
Present      99 ± 5
Absent       118 ± 8

11 Quantitative Genetics
Exploring the Genetic Architecture* Underlying Quantitative Traits *Genetic Architecture How many loci? Which location? How strong?

12 Tools for Statistical Genetics in the DE
Tool: Purpose
Genotype by Sequencing Workflow: Automatic pipeline for extracting SNPs from GBS data (with genome from user or from iPlant database)
UNEAK pipeline: Automatic pipeline for extracting SNPs from GBS data without reference genomes
MLM workflow: Automatic workflow for fitting Mixed Linear Model
GLM workflow: Automatic workflow for fitting General Linear Model
QTLC workflow: Automatic workflow for composite interval mapping
QTL simulation workflow: Automatic workflow for simulating trait data with given linkage map
PLINK: PLINK implementation of various association models
Zmapqtl: Interval mapping and composite interval mapping with the option to perform a permutation test
LRmapqtl: Linear regression modeling
SRmapqtl: Stepwise regression modeling
AntEpiSeeker: Epistatic interaction modeling
Random Jungle: Random Forest implementation for GWAS
FaST-LMM: Factored Spectrally Transformed Linear Mixed Modeling
Qxpak: Versatile mixed modeling
gluH2P: Convert Hapmap format to Ped format
LD: Linkage Disequilibrium plot
Structure: Estimation of population structure
PGDSpider: Data conversion tool
GLMstrucutre: GLM with population structure as fixed effect

13 A Model for Quantitative Traits
Phenotype, Genotype, Environment
P = G + E + GG + GE
P = G + e
P = Phenotype
G = Genotype
E = Environment
GG = Interaction between genotypes
GE = Interaction between genotype and environment

14 A Statistical Model for QTLs
P = G + e
y_ij = β0 + β1·x_i + ε_ij
y_ij: trait value in individual j with genotype i
β0: population average of trait value
β1: effect of marker i on trait value
x_i: marker genotype i
ε_ij: error term
General Linear Model (in matrix notation): Y = Xb + e
Note: If errors are not normally distributed, use generalized linear models
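For a single marker, the linear model on this slide can be fit by ordinary least squares. The sketch below is only an illustration under stated assumptions: genotypes are coded as 0/1/2 copies of one allele (a common convention, not something the slides specify), and the function name `fit_single_marker` and the data are hypothetical.

```python
from statistics import mean


def fit_single_marker(x, y):
    """Least-squares fit of y = beta0 + beta1 * x + error for one marker."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx               # estimated marker effect
    beta0 = y_bar - beta1 * x_bar   # estimated intercept
    residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    return beta0, beta1, residuals


# Hypothetical data: genotype coded as allele count, one trait value each
genotypes = [0, 0, 1, 1, 2, 2]
trait = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
beta0, beta1, residuals = fit_single_marker(genotypes, trait)
print(beta0, beta1)
```

In a genome scan the same fit is repeated for every marker, and the residuals are what the error term in the model is meant to describe.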

15

16 Linkage Mapping (QTL Mapping)
Designed population
  F2
  Recombinant inbred (RIL)
  Double-Haploid (DH)
  Back-cross (B2)

17

18 Limitation of Linkage Mapping
Needs a large number of related individuals
Resolution is limited (interval contains 100s of genes)
QTL position and effect are confounded

19 Association Mapping
Use a random collection of individuals from a natural population
Very dense marker map = very high resolution

20 Linkage & Recombination
Recombination causes linkage decay
Other factors affecting LD:
  Selection (artificial or natural)
  Drift
  Mutations
  Population structure
  Demography
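The effect of recombination on linkage can be illustrated numerically. The sketch below is not from the slides: it assumes the standard two-locus measures D and r² computed from haplotype frequencies, and the textbook random-mating result that D shrinks by a factor of (1 - c) per generation for recombination rate c. Function names and data are hypothetical.

```python
def ld_from_haplotypes(haplotypes):
    """Return (D, r^2) for two biallelic loci.

    haplotypes: list of 2-character strings such as 'AB', 'Ab', 'aB', 'ab',
    where the first character is the allele at locus 1 and the second the
    allele at locus 2.
    """
    n = len(haplotypes)
    p_ab = sum(1 for h in haplotypes if h == "AB") / n
    p_a = sum(1 for h in haplotypes if h[0] == "A") / n
    p_b = sum(1 for h in haplotypes if h[1] == "B") / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


def ld_after_generations(d0, recomb_rate, generations):
    """Expected D after a number of generations of random mating."""
    return d0 * (1.0 - recomb_rate) ** generations


# Hypothetical sample of ten haplotypes
haps = ["AB"] * 4 + ["ab"] * 4 + ["Ab"] + ["aB"]
d, r2 = ld_from_haplotypes(haps)
d_later = ld_after_generations(d, recomb_rate=0.1, generations=10)
print(d, r2, d_later)
```

Tightly linked markers (small c) keep their association for many generations, which is why a dense marker map is needed for association mapping.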

21 Linkage Disequilibrium

22 Pitfalls: Population Structure
Difference in allele frequencies between subpopulations
Due to neutral or adaptive processes
Can create spurious association

23 No association within groups
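A toy data set makes the point concrete: when two subpopulations differ in both allele frequency and mean trait value, pooling them produces a marker-trait difference even though none exists within either group. The numbers and the helper `mean_difference` below are invented for illustration only.

```python
from statistics import mean


def mean_difference(records):
    """Mean trait of marker-present minus marker-absent individuals.

    records: list of (marker, trait) pairs with marker coded 1 or 0.
    """
    present = [trait for marker, trait in records if marker == 1]
    absent = [trait for marker, trait in records if marker == 0]
    return mean(present) - mean(absent)


# Hypothetical subpopulation 1: marker common, trait values high
pop1 = [(1, 20.0), (1, 22.0), (1, 20.0), (1, 22.0), (0, 20.0), (0, 22.0)]
# Hypothetical subpopulation 2: marker rare, trait values low
pop2 = [(1, 10.0), (1, 12.0), (0, 10.0), (0, 12.0), (0, 10.0), (0, 12.0)]

within_diffs = [mean_difference(pop1), mean_difference(pop2)]
pooled_diff = mean_difference(pop1 + pop2)
print(within_diffs, pooled_diff)
```

The within-group differences are zero while the pooled difference is not, which is the spurious association that population-structure covariates are meant to absorb.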

24 Similar effect due to presence of related individuals (esp. in plants)
Can be accounted for using the data:
  Estimate number of subpopulations
  Assign individuals to subpopulation
  Estimate kinship
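As a rough idea of what "estimate kinship" means in practice, the sketch below builds a pairwise similarity matrix from marker data using the average proportion of alleles shared (identity by state). This is a deliberately simple stand-in and should not be read as the algorithm TASSEL or any other tool uses; the 0/1/2 genotype coding and the function name `ibs_kinship` are assumptions.

```python
def ibs_kinship(genotypes):
    """Pairwise identity-by-state similarity.

    genotypes: one list per individual, each holding allele counts (0, 1, 2)
    at the same ordered set of markers.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # At each marker, two individuals share 2 - |difference| alleles
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            k[i][j] = k[j][i] = shared / (2 * m)
    return k


# Hypothetical genotypes: three individuals, four markers
g = [
    [0, 1, 2, 2],
    [0, 1, 2, 0],
    [2, 1, 0, 0],
]
kinship = ibs_kinship(g)
for row in kinship:
    print(row)
```

The resulting matrix is symmetric with ones on the diagonal; larger off-diagonal values indicate more closely related pairs.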

25 Accounting for Random Effects: Mixed Linear Models
"Cost" associated with estimating a parameter
We are not interested in the value of the parameter, only the variance
Q-K method (structured association)
y = Xβ + Sα + Qv + Zu + e
Fixed effects:
  β: Vector of fixed effects
  α: Vector of SNP effects
  v: Vector of subpopulation effects
Random effects:
  u: Vector of kinship effects
  e: Residuals
Q: Matrix of population association (STRUCTURE)
X, S, Z: Incidence matrices

26 [Diagram: Traits, Markers, Population Structure (STRUCTURE), and Kinship (TASSEL) as inputs to the MLM]

27 Obtain Markers
Genome Resequencing Workflow
Genotyping By Sequencing

28 MLM Pipeline for GWAS
Ed Buckler (Cornell University), TASSEL
Pipeline steps: marker, trait, filter, convert, impute, K, GLM, MLM
Zhang et al. Nature Genetics. 2010; doi: /ng.546

29 MLM Input Files
Hapmap file
Phenotype data (traits by strain)
Kinship matrix*
Population structure* (3 populations; values for each strain sum to 1)
* Kinship matrix & population structure data can be generated using TASSEL or with the “MLM Workflow” App in the DE
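Because the population-structure values for each strain are expected to sum to 1, a quick sanity check before running the model can catch malformed input. The sketch below assumes the structure data has already been read into a dictionary keyed by strain name; the strain names, values, and the helper `check_structure_rows` are hypothetical.

```python
def check_structure_rows(q_rows, tolerance=1e-6):
    """Return the strains whose membership proportions do not sum to 1."""
    bad = []
    for strain, proportions in q_rows.items():
        if abs(sum(proportions) - 1.0) > tolerance:
            bad.append(strain)
    return bad


# Hypothetical membership proportions for three populations
q_matrix = {
    "strain_1": [0.7, 0.2, 0.1],
    "strain_2": [0.0, 1.0, 0.0],
    "strain_3": [0.5, 0.3, 0.3],  # sums to 1.1, so it should be flagged
}
problems = check_structure_rows(q_matrix)
print(problems)
```

A small tolerance is used rather than exact equality because the proportions are floating-point values.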

30 Origin
Hapmap file:
  Download (e.g.
  Convert from PLINK (.map/.ped) using Tassel 3 Conversion
  Impute with NPUTE
  Transform to numerical format with NumericalTransform
Phenotype data
Kinship matrix:
  Generate from hapmap marker data with Kinship
Population structure:
  Generate using ParallelStructure
  Convert to matrix with Structure2Tassel
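To give a sense of what transforming genotypes to a numerical format involves, the sketch below recodes two-letter genotype calls for one marker as the count of non-major alleles. It is an illustration only and makes no claim about how the NumericalTransform app codes its output; the function name `to_numeric`, the `NN` missing-data code, and the sample calls are assumptions.

```python
from collections import Counter


def to_numeric(calls, missing="NN"):
    """Recode diploid calls such as 'AA', 'AG', 'GG' as minor-allele counts.

    Missing calls are returned as None so that a later imputation step can
    fill them in.
    """
    counts = Counter(allele
                     for call in calls if call != missing
                     for allele in call)
    if not counts:
        return [None] * len(calls)
    major = counts.most_common(1)[0][0]
    return [
        None if call == missing
        else sum(1 for allele in call if allele != major)
        for call in calls
    ]


# Hypothetical calls for one marker across five strains
calls = ["AA", "AG", "GG", "AA", "NN"]
numeric = to_numeric(calls)
print(numeric)
```

Numeric coding of this kind is what lets marker data enter a linear model as a regressor.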

31 MLM Output
MLM1.txt
  Marker
  “df”: degrees of freedom
  “F”: F distribution for test of marker
  “p”: p-value
  “errordf”: df used for denominator of F-test
  etc.
MLM2.txt
  Estimated effect for each allele for each marker
MLM3.txt
  The compression results show the likelihood, genetic variance, and error variance for each compression level tested during the optimization process.
See TASSEL manual for details:
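Once the marker statistics file is available, a common next step is to pick out markers whose p-value survives a multiple-testing correction. The sketch below assumes a tab-delimited table with a header row containing at least the `Marker` and `p` columns named on this slide, and applies a Bonferroni threshold; the correction method, the sample rows, and the function name `significant_markers` are assumptions rather than part of the workflow.

```python
import csv
import io


def significant_markers(table_text, alpha=0.05):
    """Return (marker, p) pairs passing a Bonferroni-corrected threshold."""
    rows = list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))
    tested = [r for r in rows if r["p"] not in ("", "NaN")]
    if not tested:
        return []
    threshold = alpha / len(tested)  # Bonferroni: divide alpha by test count
    return [(r["Marker"], float(r["p"]))
            for r in tested if float(r["p"]) < threshold]


# Hypothetical excerpt in the assumed layout
sample = (
    "Marker\tdf\tF\tp\terrordf\n"
    "m1\t2\t15.2\t0.00001\t200\n"
    "m2\t2\t1.1\t0.33\t200\n"
    "m3\t2\t4.0\t0.02\t200\n"
)
hits = significant_markers(sample)
print(hits)
```

With three markers tested the threshold is 0.05 / 3, so a marker with p = 0.02 is not reported even though it is below 0.05.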

32 THANKS!

