Lecture 5: Major Genes, Polygenes, and QTLs


Major genes --- genes that have a significant effect on the phenotype
Polygenes --- a general term for genes of small effect that influence a trait
QTL (quantitative trait locus) --- a particular gene underlying the trait. Usually used when a gene underlying a trait is mapped to a particular chromosomal region
Candidate gene --- a particular known gene that is of interest as a potential contributor to the variation in a trait
Mendelizing allele --- an allele with a sufficiently large effect that its impact is obvious when looking at the phenotype

Major Genes
• Major morphological mutations of classical genetics that arose by spontaneous or induced mutation
• Genes of large effect have been found in selected lines:
  - pygmy, obese, dwarf and hg alleles in mice
  - booroola F in sheep
  - halothane sensitivity in pigs
• Major genes tend to be deleterious, are at very low frequencies in unselected populations, and contribute little to Var(A)

Genes for genetic modification of muscling: “natural” mutations in the myostatin gene in cattle

“Natural” mutation in the callipyge gene in sheep

“Booroola” gene in sheep, which increases ovulation rate (Merino sheep)

Major genes for mouse body size: the ob mutation causes a deficiency in leptin production; the db mutation causes a leptin receptor deficiency

Major Genes and Isoalleles
What is the genetic basis for quantitative variation? Honest answer --- don’t know.
One hypothesis: isoalleles. A locus that has an allele of major effect may also have alleles of much smaller effect (isoalleles) that influence the trait of interest.
Structural vs. regulatory changes
• Structural: change in an amino acid sequence
• Regulatory: change affecting gene regulation
General assumption: regulatory changes are likely more important

Cis vs. trans effects
Cis effect --- a regulatory change that only affects a gene (tightly) linked on the same chromosome
Trans effect --- a diffusible factor that can influence regulation of unlinked genes
Cis-acting locus --- the allele influences the regulation of a gene on the same DNA molecule
Trans-acting locus --- the locus influences genes on other chromosomes and at non-adjacent sites on the same chromosome

[Figure: genomic location of mRNA-level modifiers plotted against genomic location of genes on the array; labeled groups: cis-modifiers, trans-modifiers, master modifiers]

Polygenic Mutation
For “normal” genes (i.e., those with large effects), simply giving a mutation rate is sufficient (e.g., the rate at which a dwarfing allele appears).
For alleles contributing to quantitative variation, we must account for both the rate at which mutants appear and the phenotypic effect of each.
Mutational variance, $V_m$ or $\sigma_m^2$ --- the amount of new additive genetic variance introduced by mutation each generation.
Typically $V_m$ is on the order of $10^{-3} V_E$.

Simple Tests for the Presence of Major Genes
Simple visual tests:
• Phenotypes fall into discrete classes
• Multimodality --- distribution has several modes (peaks)
Simple statistical tests:
• Fit to a mixture model (LR test):
$$p(z) = \Pr(QQ)\,p(z \mid QQ) + \Pr(Qq)\,p(z \mid Qq) + \Pr(qq)\,p(z \mid qq)$$
• Heterogeneity of within-family variances
Select and backcross

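Mixture Models
The distribution of trait value z is the weighted sum of n underlying distributions:
$$p(z) = \sum_{i=1}^{n} \Pr(i)\, p(z \mid i)$$
• $\Pr(i)$: the probability that a random individual is from class i
• $p(z \mid i)$: the distribution of phenotypes conditional on the individual belonging to class i
The component distributions are typically assumed normal:
$$p(z) = \sum_{i=1}^{n} \Pr(i)\, \varphi(z; \mu_i, \sigma_i^2)$$
where $\varphi(z; \mu_i, \sigma_i^2)$ is the density of a normal with mean $\mu_i$ and variance $\sigma_i^2$:
$$\varphi(z; \mu_i, \sigma_i^2) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left[-\frac{(z-\mu_i)^2}{2\sigma_i^2}\right]$$
The full model has 3n-1 parameters: n-1 mixture proportions, n means, n variances.
Typically assume a common variance -> 2n parameters (n-1 mixture proportions, n means, 1 variance).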

In quantitative genetics, the underlying classes are typically different genotypes (e.g., QQ vs. Qq), although we could also model different environments in the same fashion.
Likelihood function for an individual under a mixture model:
$$\ell(z_j) = \Pr(QQ)\,\varphi(z_j; \mu_{QQ}, \sigma^2) + \Pr(Qq)\,\varphi(z_j; \mu_{Qq}, \sigma^2) + \Pr(qq)\,\varphi(z_j; \mu_{qq}, \sigma^2)$$
Mixture proportions follow from Hardy-Weinberg, e.g., $\Pr(QQ) = p_Q \cdot p_Q$.
Likelihood function for a random sample of m individuals:
$$\ell(\mathbf{z}) = \ell(z_1, z_2, \ldots, z_m) = \prod_{j=1}^{m} \ell(z_j)$$
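A minimal sketch of the mixture likelihood for a single major locus, assuming Hardy-Weinberg mixing proportions and a common within-genotype variance. The function names and the numerical values (sample, allele frequency, genotype means) are hypothetical and chosen only for illustration; they are not from the lecture.

```python
import math

def normal_pdf(z, mean, var):
    """Density of a normal with the given mean and variance, evaluated at z."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def hw_proportions(p_Q):
    """Hardy-Weinberg genotype frequencies (QQ, Qq, qq) given freq(Q) = p_Q."""
    q = 1.0 - p_Q
    return (p_Q * p_Q, 2.0 * p_Q * q, q * q)

def mixture_density(z, p_Q, means, var):
    """p(z) = sum over genotypes of Pr(genotype) * normal density.

    `means` is ordered (QQ, Qq, qq); a common variance is assumed.
    """
    weights = hw_proportions(p_Q)
    return sum(w * normal_pdf(z, mu, var) for w, mu in zip(weights, means))

def log_likelihood(sample, p_Q, means, var):
    """Log of the product over individuals of the mixture density."""
    return sum(math.log(mixture_density(z, p_Q, means, var)) for z in sample)

# Hypothetical illustration values (not from the lecture).
sample = [9.8, 10.4, 12.1, 13.9, 10.1]
print(log_likelihood(sample, p_Q=0.3, means=(14.0, 12.0, 10.0), var=1.0))
```

Working on the log scale avoids numerical underflow when the product runs over many individuals.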

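Likelihood Ratio test for Mixtures
Null hypothesis: a single normal distribution is adequate to fit the data.
The maximum of the likelihood function under the null hypothesis is
$$\max \ell_0(z_1, z_2, \ldots, z_m) = (2\pi S^2)^{-m/2} \exp\left[-\sum_{j=1}^{m} \frac{(z_j - \bar{z})^2}{2S^2}\right], \qquad S^2 = \frac{1}{m}\sum_{i=1}^{m}(z_i - \bar{z})^2$$
Since $\sum_j (z_j - \bar{z})^2 = mS^2$, the exponent equals $-m/2$.
The LR test for a significantly better fit under a mixture is given by
$$LR = 2 \ln\left(\frac{\max\{\text{likelihood under mixture}\}}{\max \ell_0}\right)$$
The LR is treated as approximately chi-square distributed with k-2 df, where k = number of fitted parameters for the mixture and 2 is the number of parameters (mean and variance) of the single normal.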
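As a rough sketch of the null side of this test: the maximized log-likelihood of a single normal has a closed form once the ML variance (dividing by m, not m-1) is plugged in. The helper names are hypothetical, and the maximized mixture log-likelihood is assumed to come from some separate fitting routine (e.g., EM) that is not shown.

```python
import math

def max_loglik_single_normal(sample):
    """Maximized log-likelihood of one normal fitted to the sample.

    Uses the ML estimates: the sample mean and S^2 = (1/m) * sum (z - zbar)^2.
    At these estimates the exponent sums to -m/2.
    """
    m = len(sample)
    zbar = sum(sample) / m
    s2 = sum((z - zbar) ** 2 for z in sample) / m
    return -0.5 * m * (math.log(2.0 * math.pi * s2) + 1.0)

def lr_statistic(max_loglik_mixture, max_loglik_null):
    """LR = 2 ln(max L_mixture / max L_0), computed from log-likelihoods."""
    return 2.0 * (max_loglik_mixture - max_loglik_null)

# Hypothetical sample, for illustration only.
print(max_loglik_single_normal([1.0, 2.0, 3.0, 4.0]))
```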

Complex Segregation Analysis
A significant fit to a mixture only suggests the possibility of a major gene.
A much more formal demonstration of a major gene is given by the likelihood-based method of Complex Segregation Analysis (CSA).
Testing the fit of a mixture model requires a sample of random individuals from the population. CSA requires a pedigree of individuals.
CSA uses likelihood to formally test for the transmission of a major gene in the pedigree.

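Building the likelihood for CSA
Start with a mixture model. The difference is that the mixing proportions are not the same for each individual, but rather are a function of its (presumed) parental genotypes:
$$\ell(z_{ij} \mid g_f, g_m) = \sum_{g_o=1}^{3} \Pr(g_o \mid g_f, g_m)\, \varphi(z_{ij}; \mu_{g_o}, \sigma^2)$$
• $z_{ij}$: phenotypic value of individual j in family i
• $g_f, g_m$: major-locus genotypes of the parents
• $\Pr(g_o \mid g_f, g_m)$: transmission probability, i.e., the probability of an offspring having genotype $g_o$ given that the parental genotypes are $g_f$, $g_m$
• $\mu_{g_o}$: mean of genotype $g_o$
• $\sigma^2$: phenotypic variance conditioned on major-locus genotype
The sum is over all possible genotypes, indexed by $g_o = 1, 2, 3$.
Example: code qq = 3, Qq = 2, QQ = 1. Then $\Pr(g_o = 3 \mid g_f = 1, g_m = 2) = \Pr(qq \mid g_f = QQ, g_m = Qq)$.
Conditional family likelihood (product over the $n_i$ offspring in family i):
$$\ell(z_{i\cdot} \mid g_f, g_m) = \prod_{j=1}^{n_i} \ell(z_{ij} \mid g_f, g_m)$$
Likelihood for family i (sum over all possible parental genotypes):
$$\ell(z_{i\cdot}) = \sum_{g_f=1}^{3} \sum_{g_m=1}^{3} \ell(z_{i\cdot} \mid g_f, g_m)\, \Pr(g_f, g_m)$$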

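Transmission Probabilities
Explicitly model the transmission probabilities. Let $\tau_{g_f}$ be the probability that the father (genotype $g_f$) transmits Q, and $\tau_{g_m}$ the probability that the mother (genotype $g_m$) transmits Q:
$$\Pr(QQ \mid g_f, g_m) = \tau_{g_f}\,\tau_{g_m}$$
$$\Pr(Qq \mid g_f, g_m) = \tau_{g_f}(1 - \tau_{g_m}) + \tau_{g_m}(1 - \tau_{g_f})$$
$$\Pr(qq \mid g_f, g_m) = (1 - \tau_{g_f})(1 - \tau_{g_m})$$
Formal CSA test of a major gene (three steps):
• Significantly better overall fit of a mixture model compared with a single normal
• Failure to reject the hypothesis of Mendelian segregation: $\tau_{QQ} = 1$, $\tau_{Qq} = 1/2$, $\tau_{qq} = 0$
• Rejection of the hypothesis of equal transmission for all genotypes ($\tau_{QQ} = \tau_{Qq} = \tau_{qq}$)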
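The transmission model can be sketched in a few lines, assuming each parent passes on Q independently with a probability (written tau here) that depends only on that parent's genotype. The function names, the dictionary layout, and the sibship helper are hypothetical illustrations, not the lecture's own implementation.

```python
import math

# Mendelian transmission probabilities: chance that a parent of each
# genotype passes on the Q allele.
MENDELIAN_TAU = {"QQ": 1.0, "Qq": 0.5, "qq": 0.0}

def offspring_probs(g_f, g_m, tau=MENDELIAN_TAU):
    """Pr(offspring genotype | father genotype g_f, mother genotype g_m)."""
    tf, tm = tau[g_f], tau[g_m]
    return {
        "QQ": tf * tm,
        "Qq": tf * (1.0 - tm) + tm * (1.0 - tf),
        "qq": (1.0 - tf) * (1.0 - tm),
    }

def normal_pdf(z, mean, var):
    """Density of a normal with the given mean and variance, evaluated at z."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def sibship_likelihood(phenotypes, g_f, g_m, means, var, tau=MENDELIAN_TAU):
    """Conditional family likelihood: product over sibs of the genotype mixture.

    `means` maps genotype label -> genotype mean; a common variance is assumed.
    """
    probs = offspring_probs(g_f, g_m, tau)
    lik = 1.0
    for z in phenotypes:
        lik *= sum(probs[g] * normal_pdf(z, means[g], var) for g in probs)
    return lik

print(offspring_probs("Qq", "Qq"))  # the familiar 1:2:1 Mendelian ratio
```

Passing a different `tau` dictionary (for example, equal values for all three genotypes) gives the alternative transmission hypotheses that the three-step test compares.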

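CSA Modification: Common Family Effects
Families can share a common environmental effect. The expected value for genotype $g_o$ in family i is $\mu_{g_o} + c_i$.
Likelihood conditioned on the common family effect $c_i$:
$$\ell(z_{i\cdot} \mid g_f, g_m, c_i) = \prod_{j=1}^{n_i} \left[ \sum_{g_o=1}^{3} \Pr(g_o \mid g_f, g_m)\, \varphi(z_{ij}; \mu_{g_o} + c_i, \sigma^2) \right]$$
Unconditional likelihood (average over all c, assumed normal with mean zero and variance $\sigma_c^2$):
$$\ell(z_{i\cdot} \mid g_f, g_m) = \int_{-\infty}^{\infty} \ell(z_{i\cdot} \mid g_f, g_m, c)\, \varphi(c; 0, \sigma_c^2)\, dc$$
Likelihood function with no major gene, but family effects:
$$\ell(z_{i\cdot}) = \int_{-\infty}^{\infty} \left[ \prod_{j=1}^{n_i} \varphi(z_{ij}; \mu + c, \sigma^2) \right] \varphi(c; 0, \sigma_c^2)\, dc$$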

Maps and Mapping Functions
The unit of genetic distance between two markers is the recombination frequency, c.
If the phase of a parent is AB/ab, then 1-c is the frequency of “parental” gametes (e.g., AB and ab), while c is the frequency of “nonparental” gametes (e.g., Ab and aB).
A parental gamete results from an EVEN number of crossovers between A & B, e.g., 0, 2, 4, etc.
A nonparental (also called a recombinant) gamete needs an ODD number of crossovers between A & B, e.g., 1, 3, 5, etc.

Hence, simply using the frequency of “recombinant” (i.e., nonparental) gametes UNDERESTIMATES the number m of crossovers, with E[m] > c.
In particular, c = Prob(odd number of crossovers).
Mapping functions attempt to estimate the expected number of crossovers m from observed recombination frequencies c.
When considering two linked loci, the phenomenon of interference must be taken into account: the presence of a crossover in one interval typically decreases the likelihood of a nearby crossover.

Suppose the order of the genes is A-B-C. Then
Probability(odd number of crossovers between A and C) = Probability of either
• an odd number of crossovers between A & B and an even number between B & C, or
• an even number in A-B and an odd number in B-C
If there is no interference (i.e., crossovers occur independently of each other), then
$$c_{AC} = c_{AB}(1 - c_{BC}) + (1 - c_{AB})\,c_{BC} = c_{AB} + c_{BC} - 2\,c_{AB}\,c_{BC}$$
We need to assume independence of crossovers in order to multiply these two probabilities.
When interference is present, we can write this as
$$c_{AC} = c_{AB} + c_{BC} - 2(1 - \delta)\,c_{AB}\,c_{BC}$$
where $\delta$ is the interference parameter:
• $\delta = 1$ --> complete interference: the presence of a crossover eliminates nearby crossovers
• $\delta = 0$ --> no interference: crossovers occur independently of each other
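A small numerical sketch of the three-locus relationship, using the interference parameter (called `delta` here) with 0 meaning no interference and 1 meaning complete interference. The function name and the recombination frequencies 0.1 and 0.2 are hypothetical.

```python
def recomb_AC(c_AB, c_BC, delta=0.0):
    """Recombination frequency between A and C for gene order A-B-C.

    delta = 0: crossovers in the two intervals are independent.
    delta = 1: complete interference, so frequencies simply add.
    """
    return c_AB + c_BC - 2.0 * (1.0 - delta) * c_AB * c_BC

# Hypothetical recombination frequencies, for illustration only.
print(recomb_AC(0.1, 0.2))             # no interference
print(recomb_AC(0.1, 0.2, delta=1.0))  # complete interference
```

With no interference the result is smaller than the simple sum because double crossovers (one in each interval) restore the parental combination of A and C.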

Mapping functions: moving from c to m
Haldane’s mapping function (gives Haldane map distances)
Assume the number k of crossovers in a region follows a Poisson distribution with parameter m. This makes the assumption of NO INTERFERENCE.
$$\Pr(\text{Poisson} = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad \lambda = \text{expected number of successes}$$
The recombination frequency is the probability of an odd number of crossovers:
$$c = \sum_{k=0}^{\infty} p(m, 2k+1) = e^{-m} \sum_{k=0}^{\infty} \frac{m^{2k+1}}{(2k+1)!} = \frac{1 - e^{-2m}}{2}$$
This gives the estimated Haldane distance as
$$m = -\frac{\ln(1 - 2c)}{2}$$
Usually reported in units of Morgans or centimorgans (cM). One Morgan --> m = 1.0. One cM --> m = 0.01.
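The following sketch checks Haldane's function numerically: it sums the Poisson probabilities of odd crossover counts and compares the result with the closed form c = (1 - exp(-2m))/2, then inverts it. Function names and the truncation at 40 terms are assumptions made for the illustration.

```python
import math

def prob_odd_crossovers(m, terms=40):
    """Sum of Poisson(m) probabilities over odd counts 1, 3, 5, ... (truncated)."""
    return sum(
        math.exp(-m) * m ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(terms)
    )

def haldane_c_from_m(m):
    """Recombination frequency implied by map distance m (in Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_m_from_c(c):
    """Haldane map distance (in Morgans) from recombination frequency c < 0.5."""
    if not 0.0 <= c < 0.5:
        raise ValueError("c must satisfy 0 <= c < 0.5")
    return -0.5 * math.log(1.0 - 2.0 * c)

# Hypothetical recombination frequency of 0.10, for illustration only.
m = haldane_m_from_c(0.10)
print(m, 100.0 * m)  # distance in Morgans and in cM
```

For small c the map distance is close to c itself; the two diverge as c approaches 0.5, where m grows without bound.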

Linkage disequilibrium mapping
The idea is to use a random sample of individuals from the population rather than a large pedigree.
Ironically, in the right settings this approach has more power for fine mapping than pedigree analysis. Why? The key is the expected number of recombinants.
• In a pedigree, Prob(no recombinants) in n individuals is $(1-c)^n$
• LD mapping uses the historical recombinants in a sample: Prob(no recomb) = $(1-c)^{2t}$, where t = time back to the most recent common ancestor

Expected number of recombinants in a sample of n sibs is cn.
Expected number of recombinants in a sample of n random individuals with a time t back to the MRCA (most recent common ancestor) is 2cnt.
Hence, if t is large, many more recombinants are expected in a random sample, giving more power for very fine mapping (i.e., c < 0.01).
Because so many recombinants are expected, the approach only works when c is very small.
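A quick numerical comparison of the two designs, using the expressions cn and 2cnt from the text. The function names and the values c = 0.001, n = 100 and t = 100 generations are hypothetical and serve only to show the difference in scale.

```python
def expected_recombinants_sibs(c, n):
    """Expected recombinants among n sibs: c * n."""
    return c * n

def expected_recombinants_ld(c, n, t):
    """Expected historical recombinants among n random individuals
    whose most recent common ancestor lies t generations back: 2 * c * n * t."""
    return 2.0 * c * n * t

# Hypothetical values, for illustration only.
c, n, t = 0.001, 100, 100
print(expected_recombinants_sibs(c, n))   # pedigree-style sample
print(expected_recombinants_ld(c, n, t))  # population sample
```

With these values the population sample is expected to carry 2t = 200 times as many informative recombinants as the sibship of the same size.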

Fine-mapping genes
Suppose an allele causing a large effect on the trait arose as a single mutation in a closed population.
[Figure: new mutation arises on the red chromosome]
Initially, the new mutation is largely associated with the red haplotype.
Hence, markers that define the red haplotype are likely to be associated (i.e., in LD) with the mutant allele.

This linkage disequilibrium decays slowly with time if c is small.
Let p = Prob(mutation associated with original haplotype) after t generations:
$$p = (1 - c)^t$$
Thus if we can estimate p and t, we can solve for c:
$$c = 1 - p^{1/t}$$

Diastrophic dysplasia (DTD) association with CSF1R marker locus alleles

Allele   Normal         DTD-bearing
1-1       4 (3.3%)      144 (94.7%)
1-2      28 (22.7%)       1 (0.7%)
2-1       7 (5.7%)        0 (0%)
2-2      84 (68.3%)       7 (4.6%)

The most frequent allele type differs between normal and DTD-bearing haplotypes.
Hence, allele 1-1 appears to be on the original haplotype in which the DTD mutation arose --> p = 0.947
$$c = 1 - p^{1/t} = 1 - 0.947^{1/100}$$
(100 generations to the MRCA used for the Finnish population)
This gives c ≈ 0.0005 between marker and DTD (0.00054 with p = 0.947; the reported 0.00051 corresponds to p rounded to 0.95).
Best estimate from pedigrees is c = 0.012 (1.2 cM).
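The DTD calculation can be reproduced directly from c = 1 - p^(1/t), taking p and t from the slide. The function name is hypothetical. Evaluated with p = 0.947 the formula gives roughly 0.00054, while p rounded to 0.95 gives roughly 0.00051; either way the estimate is on the order of 0.05 cM.

```python
def recomb_from_ld_decay(p, t):
    """Solve p = (1 - c)^t for c, given the fraction p of mutant chromosomes
    still on the ancestral haplotype after t generations."""
    return 1.0 - p ** (1.0 / t)

c_exact = recomb_from_ld_decay(0.947, 100)   # p as tabulated (144 of 152)
c_rounded = recomb_from_ld_decay(0.95, 100)  # p rounded to two decimals
print(c_exact, c_rounded)
```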

Candidate Loci and the TDT
Often try to map genes by using case/control contrasts, also called association mapping.
The frequencies of marker alleles are measured in both a
• case sample -- showing the trait (or extreme values)
• control sample -- not showing the trait
The idea is that if the marker is in tight linkage, we might expect LD between it and the particular DNA site causing the trait variation.
Problem with the case-control approach: population stratification can give false positives.

When the population being sampled actually consists of several distinct subpopulations that we have lumped together, marker alleles may provide information as to which group an individual belongs. If there are other risk factors in a group, this can create a false association between marker and trait.
Example. The Gm marker was thought (for biological reasons) to be an excellent candidate gene for diabetes in the high-risk population of Pima Indians in the American Southwest. Initially a very strong association was observed:

Gm+       Total    % with diabetes
Present     293     8%
Absent    4,627    29%

Problem: freq(Gm+) in Caucasians (a lower-risk diabetes population) is 67%, while Gm+ is rare in full-blooded Pima.
The association was re-examined in a population of Pima that were 7/8th (or more) full heritage:

Gm+       Total    % with diabetes
Present      17    59%
Absent    1,764    60%

Transmission-disequilibrium test (TDT)
The TDT accounts for population structure. It requires sets of relatives and compares the number of times a marker allele is transmitted (T) versus not transmitted (NT) from a marker-heterozygote parent to affected offspring.
Under the hypothesis of no linkage, these values should be equal, resulting in a chi-square test for lack of fit:
$$\chi^2_{td} = \frac{(T - NT)^2}{T + NT}$$

Scan for type I diabetes in humans. Marker locus D2S152

Allele    T    NT    χ²      p
228      81    45    10.29   0.001
230      59    73     1.48   0.223
240      36    24     2.40   0.121

For allele 228:
$$\chi^2 = \frac{(81 - 45)^2}{81 + 45} = 10.29$$
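The TDT statistic for the D2S152 alleles can be recomputed from the T and NT counts. The sketch below uses only the standard library; the p-value relies on the identity that the upper tail of a chi-square with 1 df at x equals erfc(sqrt(x/2)). The function names are hypothetical. For allele 240 the formula gives 144/60 = 2.40, which matches the tabulated p-value of 0.121.

```python
import math

def tdt_chi_square(T, NT):
    """TDT statistic: (T - NT)^2 / (T + NT), chi-square with 1 df under no linkage."""
    return (T - NT) ** 2 / (T + NT)

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Transmitted / not-transmitted counts for marker D2S152, from the slide.
counts = {228: (81, 45), 230: (59, 73), 240: (36, 24)}
for allele, (T, NT) in counts.items():
    stat = tdt_chi_square(T, NT)
    print(allele, round(stat, 2), round(chi2_1df_pvalue(stat), 3))
```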