Introduction to QTL Mapping. Quantitative traits: many loci; the effect of each on the trait is small; effects combine to affect the phenotype; environmental sensitivity.
QTL Mapping. Genetic architecture of quantitative traits: Which loci? What is the distribution of effects on the trait? What is the distribution of pleiotropic effects (including on fitness)? What is the distribution of context-dependent effects (sex, environment, genetic background/epistasis)? What are the allele frequencies? What is the causal molecular variant?
QTL effects are too small to be detected by Mendelian segregation, so QTLs must be mapped by linkage to marker loci with genotypes that can be unambiguously scored. The principle dates back to 1923, but abundant, polymorphic molecular markers have only relatively recently become available. Most studies use single nucleotide polymorphism (SNP) markers and insertion/deletion (indel) markers. Massively parallel sequencing technology is revolutionizing our ability to rapidly map QTLs.
QTL Mapping Data. [Table: for each individual, genotypes at markers M1 to M8 and the trait phenotype.]
QTL Mapping: A Primer.
Linkage mapping: two (or more) parental strains that differ genetically for the trait; molecular markers that distinguish the parental strains; a mapping population in which all individuals are genotyped for the markers and measured for the trait phenotype; QTLs are mapped by linkage to markers (single marker analysis, interval mapping) in pedigrees or populations derived from crosses of inbred lines.
Association mapping: a population sample of individuals with genetic variation for the trait; molecular markers (whole genome or candidate gene); a mapping population in which all individuals are genotyped for the markers and measured for the trait phenotype; QTLs are mapped by linkage disequilibrium (LD) with markers in individuals from an outbred population.
Linkage Mapping: Find Parental Strains. [Figure: trait distributions in candidate strains; H² = 0.56, 0.23, 0.58 and 0.54.]
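Linkage Mapping: Create Mapping Population. Cross parental strains P1 × P2 to produce the F1. Mapping populations: BC1 (F1 × P1), BC2 (F1 × P2), F2 (F1 × F1), and recombinant inbred lines (RILs).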
Linkage Mapping: Test for Associations Between Markers and Trait. [Diagram: chromosomes carrying marker loci M, N and O and a QTL, A, between M and N; chromosomes carrying A1 are compared with chromosomes carrying A2.] Test for linkage of a QTL (A) to individual markers (M, N or O), which is single marker analysis, or for a QTL in each interval in turn (M-N and N-O), which is interval mapping. If the trait mean differs between marker genotype classes, a QTL is linked to the marker. From these tests, infer the chromosomal locations and effects (a, d) of QTLs.
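Single marker analysis amounts to comparing trait means between marker genotype classes. Below is a minimal Python sketch, assuming a Welch two-sample t statistic as the test (the slides also allow ANOVA, regression or ML methods); the function name, genotype labels and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def single_marker_test(genotypes, phenotypes, class1, class2):
    """Compare trait means of two marker genotype classes.

    Returns the difference in means (class1 - class2) and a Welch t statistic.
    Genotype labels are arbitrary values (e.g. "M1/M1").
    """
    y1 = [y for g, y in zip(genotypes, phenotypes) if g == class1]
    y2 = [y for g, y in zip(genotypes, phenotypes) if g == class2]
    if len(y1) < 2 or len(y2) < 2:
        raise ValueError("each class needs at least two individuals")
    diff = mean(y1) - mean(y2)
    # Welch standard error: unpooled sample variances of the two classes
    se = sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))
    return diff, diff / se


# Hypothetical toy data: genotypes at one marker and trait values.
geno = ["M1/M1", "M1/M1", "M1/M1", "M2/M2", "M2/M2", "M2/M2"]
pheno = [10.2, 9.8, 10.5, 8.1, 8.4, 7.9]
diff, t = single_marker_test(geno, pheno, "M1/M1", "M2/M2")
print(f"difference = {diff:.3f}, t = {t:.2f}")
```

A large |t| suggests a QTL linked to the marker; converting t to a P value needs the t distribution (for example from SciPy), which is left out to keep the sketch dependency-free.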
Line Cross Analysis: Single Markers. [Diagram: marker locus M and QTL A separated by recombination fraction c.] M = marker locus; A = QTL; c = recombination fraction between M and A.
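$$
\begin{array}{lll}
\text{Generation} & \text{Genotype} & \text{Value}\\ \hline
P_1 & M_1A_1/M_1A_1 & a\\
P_2 & M_2A_2/M_2A_2 & -a\\
F_1 & M_1A_1/M_2A_2 & d
\end{array}
$$
F1 gametes:
$$
\begin{array}{lll}
\text{Gamete} & \text{Frequency} & \\ \hline
M_1A_1 & (1-c)/2 & \text{non-recombinant}\\
M_2A_2 & (1-c)/2 & \text{non-recombinant}\\
M_1A_2 & c/2 & \text{recombinant}\\
M_2A_1 & c/2 & \text{recombinant}
\end{array}
$$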
Line Cross Analysis: Single Markers, F2 Mapping Population. Random mating of the F1 gives 10 possible F2 genotypic classes. The contribution of each marker genotype class to the F2 mean is obtained by multiplying the frequency of each genotype by its genotypic value, then summing within marker genotype classes. The actual means are obtained by dividing each contribution to the F2 mean by the frequency of that marker class, which is the Mendelian segregation ratio: ¼ for each homozygote and ½ for the heterozygote.
F2 Genotypes With One Marker Locus, M, and a Linked QTL, A.
$$
\begin{array}{llc|llll}
\text{Genotype} & \text{Freq.} & \text{Value} & \text{Marker class} & \text{Class freq.} & \text{Contribution to } F_2 \text{ mean} & \text{Actual mean}\\ \hline
M_1A_1/M_1A_1 & (1-c)^2/4 & a & & & & \\
M_1A_1/M_1A_2 & c(1-c)/2 & d & M_1/M_1 & 1/4 & a(1-2c)/4 + dc(1-c)/2 & a(1-2c) + 2dc(1-c)\\
M_1A_2/M_1A_2 & c^2/4 & -a & & & & \\ \hline
M_1A_1/M_2A_1 & c(1-c)/2 & a & & & & \\
M_1A_1/M_2A_2 & (1-c)^2/2 & d & M_1/M_2 & 1/2 & d[(1-c)^2 + c^2]/2 & d[(1-c)^2 + c^2]\\
M_1A_2/M_2A_1 & c^2/2 & d & & & & \\
M_1A_2/M_2A_2 & c(1-c)/2 & -a & & & & \\ \hline
M_2A_1/M_2A_1 & c^2/4 & a & & & & \\
M_2A_1/M_2A_2 & c(1-c)/2 & d & M_2/M_2 & 1/4 & -a(1-2c)/4 + dc(1-c)/2 & -a(1-2c) + 2dc(1-c)\\
M_2A_2/M_2A_2 & (1-c)^2/4 & -a & & & &
\end{array}
$$
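The tabulation of F2 marker-class means can be checked by brute force. This Python sketch enumerates every pair of F1 gametes (random union of gametes), assuming the F1 is M1A1/M2A2 with genotypic values A1A1 = a, A1A2 = d, A2A2 = −a; it is an illustrative check, and the function name is hypothetical.

```python
from itertools import product


def f2_marker_class_means(a, d, c):
    """Expected trait mean of each F2 marker class, by enumerating F1 gametes.

    a, d: QTL genotypic values (A1A1 = a, A1A2 = d, A2A2 = -a)
    c: recombination fraction between marker M and QTL A
    """
    # F1 (M1A1/M2A2) gametes and their frequencies
    gametes = {
        ("M1", "A1"): (1 - c) / 2,  # non-recombinant
        ("M2", "A2"): (1 - c) / 2,  # non-recombinant
        ("M1", "A2"): c / 2,        # recombinant
        ("M2", "A1"): c / 2,        # recombinant
    }
    value = {2: a, 1: d, 0: -a}  # genotypic value by number of A1 alleles
    freq, contrib = {}, {}
    # Ordered gamete pairs: each heterozygote is counted twice, as random union requires
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        marker = "/".join(sorted((g1[0], g2[0])))  # "M1/M1", "M1/M2" or "M2/M2"
        n_a1 = (g1[1], g2[1]).count("A1")
        freq[marker] = freq.get(marker, 0.0) + f1 * f2
        contrib[marker] = contrib.get(marker, 0.0) + f1 * f2 * value[n_a1]
    # Actual mean = contribution to the F2 mean / marker-class frequency
    return {m: contrib[m] / freq[m] for m in sorted(freq)}


means = f2_marker_class_means(a=1.0, d=0.5, c=0.1)
print(means)
```

The three means agree with the closed forms in the "Actual mean" column for any a, d and c.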
The following two contrasts of marker-class means (a bar denotes the mean of a marker class) are functions of a and d:
$$\text{Contrast 1:}\quad \tfrac{1}{2}\left(\overline{M_1/M_1} - \overline{M_2/M_2}\right) = a(1-2c)$$
$$\text{Contrast 2:}\quad \overline{M_1/M_2} - \tfrac{1}{2}\left(\overline{M_1/M_1} + \overline{M_2/M_2}\right) = d(1-2c)^2$$
Together with the first contrast, the second allows estimation of d/a. However, the ratio of the two contrasts is $(d/a)(1-2c)$, so d/a is underestimated by the factor $(1-2c)$ whenever c > 0.
In summary: a significant difference in the mean value of a quantitative trait between homozygous marker genotype classes indicates linkage between a QTL and the marker locus. Estimates of a and d/a from single marker analysis are confounded with the recombination fraction and will generally underestimate the true values by the factor $(1-2c)$. Example: if the true effect is a = 1 and d = 0, the expected estimate of a is $1-2c$. [Table: expected estimates of a for several values of c.]
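The two contrasts and their bias can be evaluated directly from the closed-form F2 marker-class means. A Python sketch (function names are hypothetical); the loop reproduces the slide's example with a = 1, d = 0 over an illustrative set of c values chosen here, not taken from the original table.

```python
def expected_marker_means(a, d, c):
    """Closed-form F2 means of marker classes M1/M1, M1/M2 and M2/M2."""
    m11 = a * (1 - 2 * c) + 2 * d * c * (1 - c)
    m12 = d * ((1 - c) ** 2 + c ** 2)
    m22 = -a * (1 - 2 * c) + 2 * d * c * (1 - c)
    return m11, m12, m22


def contrasts(m11, m12, m22):
    """Contrast 1 estimates a(1 - 2c); contrast 2 estimates d(1 - 2c)^2."""
    return (m11 - m22) / 2, m12 - (m11 + m22) / 2


# Slide example: true a = 1, d = 0; the expected estimate of a is 1 - 2c.
for c in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    a_hat, _ = contrasts(*expected_marker_means(1.0, 0.0, c))
    print(f"c = {c:.2f}   expected estimate of a = {a_hat:.2f}")
```

With d ≠ 0, contrast 2 divided by contrast 1 equals (d/a)(1 − 2c), which makes the downward bias in d/a explicit.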
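Interval Mapping Analysis. [Diagram: QTL A between flanking markers M and N, with recombination fraction c1 between M and A, c2 between A and N, and c between M and N.] With complete crossover interference, $c = c_1 + c_2$ (approximately true for c < 0.1, i.e. 10 cM).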
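Line Cross Analysis: Interval Mapping.
$$
\begin{array}{lll}
\text{Generation} & \text{Genotype} & \text{Value}\\ \hline
P_1 & M_1A_1N_1/M_1A_1N_1 & a\\
P_2 & M_2A_2N_2/M_2A_2N_2 & -a\\
F_1 & M_1A_1N_1/M_2A_2N_2 & d
\end{array}
$$
F1 gametes:
$$
\begin{array}{lll}
\text{Gamete} & \text{Frequency} & \\ \hline
M_1A_1N_1 & (1-c)/2 & \text{non-recombinant}\\
M_2A_2N_2 & (1-c)/2 & \text{non-recombinant}\\
M_1A_2N_2 & c_1/2 & \text{recombinant between M and A}\\
M_2A_1N_1 & c_1/2 & \text{recombinant between M and A}\\
M_1A_1N_2 & c_2/2 & \text{recombinant between A and N}\\
M_2A_2N_1 & c_2/2 & \text{recombinant between A and N}
\end{array}
$$
Example: backcross (BC) mapping population. Tabulate BC genotypes, frequencies and means, assuming no double recombination, and calculate the expected marker genotype means.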
BC Genotypes With Two Linked Markers, M and N, and a Linked QTL, A. F1 backcrossed to M1A1N1:
$$
\begin{array}{lcc|llll}
\text{Gamete type} & \text{Freq.} & \text{Value} & \text{Marker class} & \text{Class freq.} & \text{Contribution to BC mean} & \text{Actual mean}\\ \hline
M_1A_1N_1 & (1-c)/2 & a & M_1N_1/M_1N_1 & (1-c)/2 & a(1-c)/2 & a\\ \hline
M_1A_1N_2 & c_2/2 & a & M_1N_1/M_1N_2 & c/2 & (ac_2 + dc_1)/2 & (ac_2 + dc_1)/c\\
M_1A_2N_2 & c_1/2 & d & & & & \\ \hline
M_2A_1N_1 & c_1/2 & a & M_1N_1/M_2N_1 & c/2 & (ac_1 + dc_2)/2 & (ac_1 + dc_2)/c\\
M_2A_2N_1 & c_2/2 & d & & & & \\ \hline
M_2A_2N_2 & (1-c)/2 & d & M_1N_1/M_2N_2 & (1-c)/2 & d(1-c)/2 & d
\end{array}
$$
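The backcross table can also be generated by enumeration. A Python sketch under the slide's assumptions (F1 M1A1N1/M2A2N2 backcrossed to M1A1N1, no double recombinants, so c = c1 + c2); the function name and example values are hypothetical.

```python
def bc_marker_class_means(a, d, c1, c2):
    """Expected means of BC marker classes for an F1 backcrossed to M1A1N1.

    c1: recombination fraction between M and A; c2: between A and N.
    Assumes complete interference (no double recombinants), so c = c1 + c2.
    """
    c = c1 + c2
    # F1 gametes: (M allele, QTL allele, N allele) -> frequency
    gametes = {
        ("M1", "A1", "N1"): (1 - c) / 2,
        ("M2", "A2", "N2"): (1 - c) / 2,
        ("M1", "A2", "N2"): c1 / 2,  # crossover between M and A
        ("M2", "A1", "N1"): c1 / 2,
        ("M1", "A1", "N2"): c2 / 2,  # crossover between A and N
        ("M2", "A2", "N1"): c2 / 2,
    }
    freq, contrib = {}, {}
    for (m, q, n), f in gametes.items():
        # The recurrent parent contributes M1A1N1, so an A1 gamete gives
        # A1A1 (value a) and an A2 gamete gives A1A2 (value d).
        value = a if q == "A1" else d
        cls = f"M1N1/{m}{n}"
        freq[cls] = freq.get(cls, 0.0) + f
        contrib[cls] = contrib.get(cls, 0.0) + f * value
    # Actual mean = contribution to the BC mean / marker-class frequency
    return {cls: contrib[cls] / freq[cls] for cls in freq}


bc = bc_marker_class_means(a=1.0, d=0.2, c1=0.03, c2=0.07)
print(bc)
```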
As in the single marker example, contrasts between backcross marker-class means (γ and δ below) estimate the effects of the QTL. Unlike the single marker example, the map position of the QTL relative to the flanking markers can also be estimated:
$$\gamma = \overline{M_1N_1/M_1N_1} - \overline{M_1N_1/M_2N_2} = a - d$$
$$\delta = \overline{M_1N_1/M_1N_2} - \overline{M_1N_1/M_2N_1} = (a - d)\,\frac{c_2 - c_1}{c}$$
The contrast γ estimates a without bias only if d = 0, so recessive QTLs may not be detected. This problem can be overcome by backcrossing to both parental lines, or by using an F2 design. Note: c is assumed to be known, so c1 and c2 can be estimated from
$$\frac{\delta}{\gamma} = \frac{c_2 - c_1}{c} = \frac{c - 2c_1}{c}, \qquad\text{which gives}\qquad c_1 = \frac{c}{2}\left(1 - \frac{\delta}{\gamma}\right), \quad c_2 = c - c_1.$$
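Inverting the two contrasts is a short calculation. A Python sketch, assuming the four BC marker-class means have already been estimated and c is known; the dictionary keys and example means are hypothetical (they correspond to a = 1, d = 0.2, c1 = 0.03, c2 = 0.07).

```python
def interval_estimates(means, c):
    """Estimate gamma = a - d and the QTL position (c1, c2) from BC class means.

    means: marker-class means keyed "M1N1/M1N1", "M1N1/M1N2",
           "M1N1/M2N1" and "M1N1/M2N2"; c: known M-N recombination fraction.
    """
    gamma = means["M1N1/M1N1"] - means["M1N1/M2N2"]  # a - d
    delta = means["M1N1/M1N2"] - means["M1N1/M2N1"]  # (a - d)(c2 - c1)/c
    if gamma == 0:
        raise ValueError("gamma is zero (a = d); QTL position cannot be estimated")
    c1 = c * (1 - delta / gamma) / 2
    return gamma, c1, c - c1


example = {"M1N1/M1N1": 1.0, "M1N1/M1N2": 0.76, "M1N1/M2N1": 0.44, "M1N1/M2N2": 0.2}
gamma, c1, c2 = interval_estimates(example, c=0.1)
print(f"gamma = {gamma:.2f}, c1 = {c1:.3f}, c2 = {c2:.3f}")
```

The guard on γ reflects the point above: when a = d the backcross contrasts carry no information about the QTL.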
Association Mapping: Collect Population Phenotypes and Genotypes. [Figure: trait distributions; H² = 0.56, 0.23, 0.58 and 0.54.]
Association Mapping. Association mapping uses historical recombination in random mating populations to identify QTLs through linkage disequilibrium (LD) between markers and the loci affecting the trait. LD is a measure of the correlation in gene frequencies between two loci.
Linkage Disequilibrium (LD). Consider locus A with alleles A1 and A2 at frequencies p1 and p2, respectively, and locus B with alleles B1 and B2 at frequencies q1 and q2, respectively. If the gene frequencies at these loci are uncorrelated, the expected frequency of each gamete type is the product of the allele frequencies at each locus separately. The gamete types are called haplotypes because they describe the genetic constitution of a haploid gamete. For two loci there are only four gamete types: A1B1, A1B2, A2B1 and A2B2.
$$
\begin{array}{lll}
\text{Gamete type (haplotype)} & \text{Expected frequency} & \text{Observed frequency}\\ \hline
A_1B_1 & p_1q_1 & = P_{11}\\
A_1B_2 & p_1q_2 & = P_{12}\\
A_2B_1 & p_2q_1 & = P_{21}\\
A_2B_2 & p_2q_2 & = P_{22}
\end{array}
$$
where $p_1 + p_2 = 1$ and $q_1 + q_2 = 1$. If allele frequencies are uncorrelated, the population is in linkage equilibrium, and $P_{11}P_{22} - P_{12}P_{21} = 0$.
If allele frequencies are non-randomly associated, the gamete frequencies are not the simple product of the allele frequencies but depart from it by an amount D, the coefficient of linkage disequilibrium:
$$
\begin{array}{lll}
\text{Gamete type (haplotype)} & \text{Expected frequency (disequilibrium)} & \text{Observed frequency}\\ \hline
A_1B_1 & p_1q_1 + D & = P_{11}\\
A_1B_2 & p_1q_2 - D & = P_{12}\\
A_2B_1 & p_2q_1 - D & = P_{21}\\
A_2B_2 & p_2q_2 + D & = P_{22}
\end{array}
$$
and $P_{11}P_{22} - P_{12}P_{21} = D$.
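Computing D from observed haplotype frequencies follows directly from these definitions. A Python sketch (the function name is hypothetical); it also uses the equivalent form D = P11 − p1q1, which follows from the table.

```python
def ld_from_haplotypes(P11, P12, P21, P22):
    """Allele frequencies p1, q1 and LD coefficient D from haplotype frequencies."""
    if abs(P11 + P12 + P21 + P22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = P11 + P12              # frequency of A1
    q1 = P11 + P21              # frequency of B1
    D = P11 * P22 - P12 * P21   # equivalently P11 - p1 * q1
    return p1, q1, D


print(ld_from_haplotypes(0.25, 0.25, 0.25, 0.25))  # linkage equilibrium: D = 0
print(ld_from_haplotypes(0.5, 0.0, 0.0, 0.5))      # only A1B1 and A2B2: D = 0.25
```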
[Figure: two samples of haplotypes. Linkage disequilibrium: only A1B1 and A2B2 present. Linkage equilibrium: all four types, A1B1, A1B2, A2B1 and A2B2, present.] The numerical value of D depends on the gene frequencies at the two loci. The sign of D is arbitrary for molecular markers, so consider its absolute value. D is highest when p1 = p2 = q1 = q2 = 0.5 and gamete types A1B2 and A2B1 are missing (complete linkage disequilibrium); D is then 0.25.
Because of this dependence on gene frequency, values of D are typically scaled by the observed gene frequencies:
1. $D' = D/D_{\max}$, where (for D > 0) $D_{\max}$ is the smaller of $p_1q_2$ and $p_2q_1$, because
$$P_{12} = p_1q_2 - D \ge 0 \;\Rightarrow\; D \le p_1q_2, \qquad P_{21} = p_2q_1 - D \ge 0 \;\Rightarrow\; D \le p_2q_1.$$
The maximum value of D' is 1.
2. $r^2 = D^2/(p_1p_2q_1q_2)$. Its expected value in an equilibrium population is $E(r^2) = 1/(1 + 4Nc)$, where N is the effective population size and c is the recombination fraction between the two loci. In principle this relationship could be used to estimate c, but $r^2$ has very large statistical and genetic sampling variances, so in practice it is not very useful.
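The two scaled measures and the equilibrium expectation are sketched below in Python. The bound for negative D is not on the slide; it is derived here by the same non-negativity argument applied to P11 and P22 (giving |D| ≤ min(p1q1, p2q2)). Function names are hypothetical.

```python
def d_prime(p1, q1, D):
    """D' = D / D_max.

    For D >= 0, D_max = min(p1*q2, p2*q1) (the slide's bound). For D < 0,
    requiring P11 >= 0 and P22 >= 0 gives |D| <= min(p1*q1, p2*q2).
    """
    p2, q2 = 1 - p1, 1 - q1
    d_max = min(p1 * q2, p2 * q1) if D >= 0 else min(p1 * q1, p2 * q2)
    if d_max == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return D / d_max


def r_squared(p1, q1, D):
    """r^2 = D^2 / (p1 p2 q1 q2)."""
    return D ** 2 / (p1 * (1 - p1) * q1 * (1 - q1))


def expected_r_squared(N, c):
    """Expected r^2 at equilibrium, 1 / (1 + 4Nc); N = effective population size."""
    return 1 / (1 + 4 * N * c)


print(d_prime(0.5, 0.6, 0.1), r_squared(0.5, 0.6, 0.1), expected_r_squared(1000, 0.001))
```

The same D can correspond to very different D' and r² values depending on allele frequencies, which is why scaled measures are reported.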
Causes of LD: mutation (a new mutant allele is initially in complete linkage disequilibrium with all other loci in the genome); admixture between populations with different gene frequencies; natural selection for particular combinations of alleles; population bottlenecks (chance sampling of a small number of haplotypes).
[Figure: decay of D over generations of random mating for several recombination fractions, including c = 0.01, 0.05, 0.1 and 0.5.] D declines in successive generations in a random mating population by an amount that depends on the recombination fraction, c: after t generations of random mating, $D_t = D_0(1-c)^t$. With unlinked loci and free recombination (c = 0.5), D is halved in each generation of random mating; with linked loci, D decays more slowly.
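The decay formula can be evaluated directly, and solving $(1-c)^t = 1/2$ for t gives a "half-life" of LD. A Python sketch; the function names and the starting value D0 = 0.25 are illustrative choices made here.

```python
from math import log


def ld_after(D0, c, t):
    """D_t = D_0 (1 - c)^t after t generations of random mating."""
    return D0 * (1 - c) ** t


def generations_to_halve(c):
    """Generations for D to fall to half its initial value: solve (1 - c)^t = 1/2."""
    return log(0.5) / log(1 - c)


for c in (0.5, 0.1, 0.05, 0.01):
    print(f"c = {c}: D after 10 generations = {ld_after(0.25, c, 10):.4f}, "
          f"half-life = {generations_to_halve(c):.1f} generations")
```

For c = 0.5 the half-life is one generation, as stated above; for tightly linked loci (c = 0.01) it is roughly 69 generations.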
[Figure: haplotypes "then" and "now".] Disequilibrium between pairs of loci in random mating populations depends on population history, but is expected to be small unless the loci are very tightly linked.
Use molecular polymorphism and phenotypic information from samples of alleles from a random mating population to determine whether there is an association with the trait phenotype. This can be done for a candidate gene, a QTL region, or the whole genome. Depending on the scale of LD, association mapping can be used for fine-mapping QTLs, and even causal variants. LD is large in populations that have undergone recent bottlenecks in population size, from a founder event or artificial selection; LD is small in large, near-equilibrium outbred populations (e.g., Drosophila). Caveat: population admixture can cause false positive associations if marker frequencies and trait values differ between populations.
[Figure: trait frequency distributions by marker genotype, and genotype frequencies in cases vs. controls.] Quantitative traits: group the data by genotype for each marker and assess whether the trait mean differs between the genotypes of the marker; if so, the locus affecting the trait is in LD with the marker locus. Categorical traits: group the data according to whether individuals are affected (cases) or not affected (controls), and determine whether genotype or allele frequencies differ between cases and controls; if so, the locus affecting the trait is in LD with the marker locus.
Association mapping underestimates QTL effects unless the molecular marker genotyped is the causal variant. Let α be the effect attributable to the causal variant, and a the estimated effect. Then
$$\alpha = \frac{p(1-p)}{D}\,a,$$
where p is the frequency of the polymorphic site and D is the LD between the causal QTN and the polymorphic site associated with it. Because $D \le p(1-p)$ (and the maximum of $p(1-p)$ is 0.25), $a \le \alpha$.
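The relation can be checked numerically under a simple per-haplotype model, assumed here: a haplotype contributes α if it carries the causal allele and 0 otherwise, and a is the least-squares regression of that contribution on a 0/1 indicator of marker allele M1. The function name and haplotype frequencies are hypothetical.

```python
def estimated_marker_effect(alpha, P11, P12, P21, P22):
    """Effect a estimated at a marker in LD with a causal variant of effect alpha.

    P_ij: frequency of the haplotype carrying marker allele i and causal allele j
    (causal allele 1 has effect alpha, allele 2 has effect 0).
    Returns (a, p, D), with a = Cov(marker, trait) / Var(marker).
    """
    p = P11 + P12                  # frequency of marker allele M1
    q = P11 + P21                  # frequency of the causal allele
    D = P11 * P22 - P12 * P21      # LD between marker and causal site
    cov = alpha * (P11 - p * q)    # Cov(x, y) = alpha * D
    var = p * (1 - p)              # Var(x) for a 0/1 marker indicator
    return cov / var, p, D


alpha = 1.0
a, p, D = estimated_marker_effect(alpha, 0.4, 0.1, 0.2, 0.3)
print(f"estimated a = {a:.3f}; recovered alpha = {p * (1 - p) / D * a:.3f}")
```

When the marker is the causal site (no recombinant haplotypes), D = p(1 − p) and the estimate equals α.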
Linkage Mapping: Statistical Considerations. t-tests, ANOVA, marker regressions or more sophisticated maximum likelihood (ML) methods can be used to assess differences in trait phenotype between marker genotypes. The parental lines will differ at many loci affecting the trait of interest; therefore QTLs unlinked to the markers under consideration will segregate in the F2 or backcross generation. Methods that deal with multiple QTLs simultaneously (e.g., composite interval mapping) reduce the variance within marker genotype classes and improve estimates of map positions and effects.
Many markers are tested for linkage to a QTL in a genome scan, and the number of false positives increases with the number of tests. With n independent tests, the significance level for each should be set to α/n (a Bonferroni correction). The number of independent tests will be less than the number of markers because linked markers are correlated. Permutation tests are therefore typically used to determine appropriate experiment-wise significance levels, accounting for both multiple tests and correlated markers. [Figure: likelihood ratio test statistic.]
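Permutation Test. [Table: for each individual, genotypes at markers M1 to M8, the observed phenotype (O), and phenotypes permuted among individuals (P1, P2, P3, ...).]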
[Figure: −log P along the genome.]
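A permutation threshold can be sketched in a few lines: shuffle phenotypes among individuals (breaking any genotype-phenotype association while keeping the correlations among linked markers), record the genome-wide maximum test statistic for each shuffle, and take its upper (1 − α) quantile. This Python sketch assumes 0/1-coded marker genotypes and uses the absolute difference in class means as the statistic, a simplification chosen here; names and data are hypothetical.

```python
import random
from statistics import mean


def max_statistic(markers, phenotypes):
    """Largest |difference in class means| over all markers (genotypes coded 0/1)."""
    best = 0.0
    for geno in markers:
        y0 = [y for g, y in zip(geno, phenotypes) if g == 0]
        y1 = [y for g, y in zip(geno, phenotypes) if g == 1]
        if y0 and y1:
            best = max(best, abs(mean(y1) - mean(y0)))
    return best


def permutation_test(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Return (observed max statistic, experiment-wise threshold, P value)."""
    rng = random.Random(seed)
    observed = max_statistic(markers, phenotypes)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute phenotypes; genotypes stay intact
        null_max.append(max_statistic(markers, shuffled))
    null_max.sort()
    threshold = null_max[min(n_perm - 1, int(round((1 - alpha) * n_perm)))]
    p_value = (sum(m >= observed for m in null_max) + 1) / (n_perm + 1)
    return observed, threshold, p_value


# Hypothetical data: marker A is associated with the trait, marker B is not.
marker_a = [1] * 10 + [0] * 10
marker_b = [0, 1] * 10
pheno = [10 + 0.1 * i for i in range(10)] + [0.1 * i for i in range(10)]
obs, thr, pv = permutation_test([marker_a, marker_b], pheno)
print(f"observed = {obs:.2f}, threshold = {thr:.2f}, P = {pv:.4f}")
```

Taking the maximum over all markers in each permutation is what makes the threshold experiment-wise rather than per-marker.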
Linkage Mapping: Power and Sample Size. How large must the experiment be to detect a difference δ between the two homozygous marker genotypes? For simplicity, assume the QTL is completely linked to the marker (c = 0) and that a t-test is used to judge the significance of the difference between the two marker-class means. Then
$$n \ge \frac{2\,(z_\alpha + z_{2\beta})^2}{(\delta/\sigma_P)^2},$$
where $\sigma_P$ is the phenotypic standard deviation within marker classes, α is the false positive (Type I) error rate (0.05), β is the false negative (Type II) error rate (0.1), and z is the standard normal deviate whose two-tailed probability equals its subscript, so $z_\alpha = 1.96$ and $z_{2\beta} = 1.28$.
In the formula, n is the number of individuals per marker class; N is the total size of the mapping population. For strictly additive effects, $\sigma_A^2 = 2pq\alpha^2$. [Figure: required sample size for F2 and BC designs, plotted against $\sigma_A^2/\sigma_P^2$.] QTLs with large effects are easy to detect, but large samples are needed to detect QTLs with moderate to small effects. The power to detect a difference in mean between two marker genotypes depends on $\delta/\sigma_P$, so strategies that reduce $\sigma_P$ can increase power (e.g., progeny testing, RI lines).
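The sample-size formula is easy to evaluate. A Python sketch using the slide's rounded z values (z_α = 1.96, z_2β = 1.28) as defaults; the conversion to a total F2 or backcross population assumes the compared classes occur at their Mendelian frequencies (¼ for each F2 homozygote, ½ for each backcross class), and the function names are hypothetical.

```python
from math import ceil


def n_per_class(delta_over_sigma, z_alpha=1.96, z_2beta=1.28):
    """Individuals per marker class: n >= 2 (z_alpha + z_2beta)^2 / (delta/sigma_P)^2."""
    return ceil(2 * (z_alpha + z_2beta) ** 2 / delta_over_sigma ** 2)


def total_population(n, design):
    """Total mapping population N so that each compared class has about n individuals."""
    class_frequency = {"F2": 0.25, "BC": 0.5}[design]
    return ceil(n / class_frequency)


n = n_per_class(0.5)
print(n, total_population(n, "F2"), total_population(n, "BC"))
```

With δ/σ_P = 0.5 this gives 84 per class and 336 individuals for an F2 design, the figure used in the F2 sample-size example. Exact normal quantiles (e.g. from `statistics.NormalDist().inv_cdf`) give a slightly larger n than the rounded values.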
Linkage Mapping: Recombination and Sample Size. [Figures: number of individuals needed to detect at least one recombinant in an interval of size c (c = 100 cM); number of marker genotypes needed to localize QTLs per 100 cM.]
Linkage Mapping: Power, Recombination and Sample Size. Large numbers are necessary both to detect a QTL and to estimate its location. For an F2 design, 336 individuals are needed to detect a QTL with a large effect (δ/σ_P = 0.5); multiplying by the 59 individuals needed to ensure the QTL is mapped to a 5 cM region gives 336 × 59 = 19,824 individuals in total and 416,304 marker genotypes per 100 cM. In practice, QTL mapping is an iterative procedure: QTLs are first mapped to broad genomic regions in a genome scan, followed by high-resolution mapping to localize genes within each QTL region. Genotyping by sequencing is changing this strategy, facilitating rapid, fine mapping of QTLs.
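The arithmetic can be reproduced as follows. The figure of 59 individuals matches requiring at least one recombinant in a 5 cM interval with probability 0.95, i.e. the smallest n with 1 − (1 − c)^n ≥ 0.95 for c = 0.05. That probability, treating 5 cM as c = 0.05, and one marker every 5 cM including both ends of a 100 cM segment (21 markers) are assumptions made here to match the slide's numbers; names are hypothetical.

```python
from math import ceil, log


def n_for_recombinant(c, prob=0.95):
    """Smallest n with P(at least one recombinant) = 1 - (1 - c)^n >= prob."""
    return ceil(log(1 - prob) / log(1 - c))


n_detect = 336                          # F2 individuals to detect the QTL (slide value)
n_localize = n_for_recombinant(0.05)    # 5 cM interval, treated as c = 0.05
total = n_detect * n_localize
markers_per_100cM = 100 // 5 + 1        # assumed: a marker every 5 cM, both ends included
print(n_localize, total, total * markers_per_100cM)
```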
Association Mapping: Power and Sample Size. [Figure: curves for q = 0.1, 0.25 and 0.5, where q is the frequency of the rare allele.] LD mapping has the same power as linkage mapping in an F2 population for intermediate gene frequencies, but much reduced power as the frequency of the rare allele decreases (the frequency of rare homozygotes in the population is q²). This calculation assumes the marker is the causal variant; even larger samples are necessary if the marker is only in LD with the causal variant. Intermediate-frequency variants with large effects are easy to detect; rare variants with small effects are hard to detect.
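A rough illustration of why power falls with q: under Hardy-Weinberg proportions, and comparing the two homozygous classes, an association study needs about n/q² individuals to expect n rare homozygotes. This simplification, and reusing n = 84 per class (δ/σ_P = 0.5 with the linkage-mapping formula), are assumptions made here; the function name is hypothetical.

```python
from math import ceil


def association_sample_size(n_per_class, q):
    """Total sample so the rare homozygote class (frequency q^2) has about n individuals."""
    return ceil(n_per_class / q ** 2)


for q in (0.5, 0.25, 0.1):
    print(f"q = {q}: N = {association_sample_size(84, q)}")
```

At q = 0.5 this gives 336, the same as the F2 design, consistent with equal power at intermediate frequencies; at q = 0.1 the requirement rises 25-fold, to 8,400.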
Association Mapping: Recombination and Sample Size. [Figure: expected frequency of recombinants after t generations of recombination in a random mating population, for several values of c, including c = 0.01 and c = 0.005.] The higher frequency of recombinants in a random mating population means that smaller sample sizes are required for high-resolution mapping than in linkage studies.
The number of markers needed depends on the scale and pattern of LD. Small population size means large LD tracts: few markers are required for QTL detection, but localization is poor (dogs); this is a favorable situation for a whole-genome LD scan. Large population size means small LD tracts: many markers are required for QTL detection, but localization is precise, perhaps to the level of the QTN (Drosophila); this is a favorable situation for candidate gene re-sequencing. LD patterns are not constant across the genome, but vary with the local recombination rate and in regions under natural selection. Knowing the patterns of LD can guide experimental design.
Strategies to Increase Power. Selective genotyping: measure many individuals (several thousand), but genotype only the extreme tails. Selective genotyping with pooling: detect gene frequency differences between the tails of the distribution by pooling high and low samples (bulk segregant analysis), followed by next-generation sequencing of the pools.
Strategies to Increase Genetic Diversity. Estimates of the number of QTLs are minimum estimates: experiments are limited in their power to separate closely linked loci; there must always be other loci with effects too small to be detected by an experiment of a particular size; and the loci found are those differentiating the two strains compared, so other loci would probably be found in other strains. Genetic diversity can be increased by artificial selection for high and low trait values from a large heterogeneous base population, followed by inbreeding to construct parental stocks for mapping, or by using a mapping population derived from crosses of several inbred strains, either as RI lines or as a large outbred population maintained for many generations.
High Resolution Mapping. Construction of near-isogenic lines (NILs): backcross to one of the parental strains, selecting for markers flanking the QTL and against markers flanking other QTLs. Fine-scale recombination: backcross the NIL to one of the parental strains, select for recombinants within the NIL interval using additional markers, and progeny test the recombinant genotypes to map the QTL to 2 cM or less. Deficiency mapping (in Drosophila). Change strategy from linkage to association mapping.
QTL End Game: Proving a QTL Corresponds to a Candidate Gene. Supporting evidence: potentially functional DNA polymorphisms; differences in mRNA expression between alleles; expression of RNA/protein in relevant tissues; replicated associations in different populations; quantitative complementation tests between QTL alleles and a mutant allele. More concrete evidence: creating mutants in the candidate gene that affect the trait (transposon tagging); transgenic rescue; demonstrating functional differences between alleles by knocking in alternative alleles by homologous recombination.