Human Genetics Genetic Epidemiology.


Family trees can have a lot of nuts

Genetic Epidemiology - Aims: gene detection; gene characterization (mode of inheritance; allele frequencies → prevalence, attributable risk).

Genetic Epidemiology - Methods: aggregation, segregation, co-segregation, association.

Segregation: Affected and unaffected, or two distributions: determined by a dominant or recessive allele. Also possible: three distributions. Can the dichotomy or trichotomy be explained by Mendelian segregation?

Likelihood (parameter(s); data) ∝ Probability (data | parameter(s)). The joint probability of the genotypes and phenotypes of all the members of a pedigree can be written as
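The slide's own equation did not survive in the transcript. As a hedged sketch only, one standard way to write such a pedigree likelihood is shown below. The symbols are assumptions introduced here, not taken from the slide: x_i is the phenotype and g_i the genotype of member i, and f(k), m(k) denote the father and mother of non-founder k.

```latex
L = \sum_{g_1} \cdots \sum_{g_n}
    \prod_{i=1}^{n} P(x_i \mid g_i)
    \prod_{j \in \text{founders}} P(g_j)
    \prod_{k \in \text{non-founders}} P\bigl(g_k \mid g_{f(k)},\, g_{m(k)}\bigr)
```

In this form the founder terms depend on the allele frequencies, the phenotype terms on the penetrances, and the non-founder terms on the parent-to-offspring transmission probabilities.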

Transmission Probabilities
                                 Value if there is Mendelian segregation
P(AA transmits A) = τ_AA,A       1
P(Aa transmits A) = τ_Aa,A       ½
P(aa transmits A) = τ_aa,A       0

Ascertainment: We examine segregating sibships. The proportion of sibs affected is larger than expected on the basis of Mendelian inheritance. The likelihood must be conditional on the mode of ascertainment. We need to know the proband sampling frame.

Cosegregation: Chromosome segments are transmitted. Cosegregation is caused by linked loci. It is the ultimate statistical proof of genetic etiology.

Methods of Linkage Analysis: Trait model-based (parametric): assume a genetic model underlying the trait. Trait model-free (non-parametric): no assumptions about the genetic model underlying the trait. Ascertainment is often not an issue for locus detection by linkage analysis.

Model-based Linkage Analysis: If founder marker genotypes are known or can be inferred exactly: → no increase in Type 1 error; → smallest Type 2 error when the model is correct. If founder marker genotypes are unknown, we can 1) estimate them or 2) use a database. All parameters other than the recombination fraction are assumed known.

Model-free Linkage Analysis: Identity-in-state versus identity-by-descent. Two alleles are identical by descent (IBD) if they are copies of the same parental allele. (Example genotypes on the slide: A1A1, A1A2.)

Sib pairs share 0, 1 or 2 alleles identical by descent at a marker locus, and 0, 1 or 2 alleles identical by descent at a trait locus. Under linkage, sharing at the two loci is correlated. The average proportion shared at any particular locus is 1/2.
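The sharing probabilities behind the "average proportion 1/2" statement can be checked by brute-force enumeration. The sketch below is illustrative only: it assumes both parents are fully informative (four distinguishable alleles) and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def sib_pair_ibd_distribution():
    """Enumerate the equally likely transmissions to two sibs at one locus.

    Each sib receives one of two paternal alleles (0 or 1) and one of two
    maternal alleles (0 or 1). Alleles are IBD when both sibs received the
    same copy from a given parent.
    """
    counts = {0: 0, 1: 0, 2: 0}
    transmissions = list(product([0, 1], repeat=2))  # (paternal, maternal)
    for sib1, sib2 in product(transmissions, repeat=2):
        shared = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}


dist = sib_pair_ibd_distribution()
# Proportion of alleles shared IBD is (number shared) / 2.
mean_proportion = sum(Fraction(k, 2) * p for k, p in dist.items())
print(dist)
print(mean_proportion)
```

The enumeration gives probabilities 1/4, 1/2 and 1/4 for sharing 0, 1 and 2 alleles, so the mean proportion shared is 1/2 in the absence of any selection on the trait.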

Relative Pair Model-Free Linkage Analysis: We correlate relative-pair similarity (dissimilarity) for the trait of interest with relative-pair similarity (dissimilarity) for a marker. Linkage between a trait locus and a marker locus → positive correlation. Affected relative pair analysis: Do affected relative pairs share more marker alleles than expected if there is no linkage? No controls!

Association: Causes of association between a marker and a disease: chance; stratification, population heterogeneity; very close linkage; pleiotropy.

Causes of Allelic Association: Heterogeneity/stratification. This allelic association is nuisance association. Simpson's paradox: If we mix two populations that have both different disease prevalence and different marker allele prevalence, and there is no association between the disease and marker allele in each population, there will be an association between the disease and the marker allele in the mixed population. The best solution to avoid this confounding is to study only ethnically homogeneous populations.
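A small numeric illustration of this stratification effect follows. All numbers (population sizes, prevalences, carrier frequencies) are made up for the example, and expected counts are used instead of sampled data.

```python
def two_by_two(n, prevalence, carrier_freq):
    """Expected counts when disease and marker are independent within a population."""
    return {
        ("case", "carrier"): n * prevalence * carrier_freq,
        ("case", "non"): n * prevalence * (1 - carrier_freq),
        ("control", "carrier"): n * (1 - prevalence) * carrier_freq,
        ("control", "non"): n * (1 - prevalence) * (1 - carrier_freq),
    }


def odds_ratio(table):
    """Odds ratio of disease for carriers versus non-carriers."""
    return (table[("case", "carrier")] * table[("control", "non")]) / (
        table[("case", "non")] * table[("control", "carrier")]
    )


# Hypothetical populations: both prevalence and carrier frequency differ.
pop1 = two_by_two(10_000, prevalence=0.10, carrier_freq=0.8)
pop2 = two_by_two(10_000, prevalence=0.01, carrier_freq=0.2)
mixed = {cell: pop1[cell] + pop2[cell] for cell in pop1}

print(odds_ratio(pop1), odds_ratio(pop2), odds_ratio(mixed))
```

Within each population the odds ratio is 1 (no association), while the pooled table shows an odds ratio of about 3.1, purely because the population with the higher prevalence also has the higher carrier frequency.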

(Tight) Linkage: Imagine that, a number of generations ago, a normal allele d mutated to a disease allele D on a particular chromosome on which the allele at a marker locus was A1 (mutation: A1 d → A1 D). This chromosome is passed down through the generations, and now there are many copies. If the distance between D and A1 is small, recombinations are unlikely, so most D chromosomes carry A1. This is the type of allelic association we are interested in.
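To make "recombinations are unlikely" concrete, the sketch below computes the probability that a descendant copy of the original chromosome has had no recombination between D and the marker after a given number of generations. The recombination fractions and the generation count are illustrative assumptions, and the function name is hypothetical.

```python
def prob_no_recombination(theta, generations):
    """Probability that no recombination separated D from A1 in any generation.

    theta is the per-meiosis recombination fraction between the two loci.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return (1.0 - theta) ** generations


# Illustrative values only: roughly 0.1, 1 and 5 cM, using theta ≈ distance / 100
# (an approximation that is reasonable for short distances).
for theta in (0.001, 0.01, 0.05):
    print(theta, round(prob_no_recombination(theta, generations=50), 3))
```

The closer the marker is to the disease locus, the larger the fraction of D chromosomes that still carry the ancestral A1 allele, which is why this kind of association is informative for fine mapping.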

Guarding Against Stratification: Three solutions: use a homogeneous population; use family-based controls; use genomic control.

Matching on Ethnicity: Close relatives are the best controls, but can lead to overmatching. Cases and control family members must have the same family history of disease. Candidate controls: siblings, cousins.

Transmission Disequilibrium Test (TDT): A design that uses pseudosibs as controls. Cases and their parents are typed for markers. Example: father A1A2, mother A2A2. Transmitted genotype is A1A2; untransmitted genotype is A2A2. Father transmits A1, does not transmit A2. Mother transmits A2, does not transmit A2 (uninformative in terms of alleles).

Build up a 2 x 2 table:
                      Untransmitted
                      A1      A2
Transmitted   A1      a       b
              A2      c       d
The counts a and d come from homozygous parents. The counts b and c come from heterozygous parents.
McNemar's test: χ²(1 d.f.) = (b - c)² / (b + c)
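A minimal sketch of the McNemar statistic for the TDT follows. The counts in the usage line are invented for illustration, and the p-value uses the identity between the upper tail of a 1 d.f. chi-square and the complementary error function, so no external library is needed.

```python
import math


def tdt_mcnemar(b, c):
    """McNemar chi-square (1 d.f.) from the two discordant counts.

    b and c are the off-diagonal counts contributed by heterozygous parents.
    Homozygous parents (counts a and d) carry no information and are ignored.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a chi-square with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = tdt_mcnemar(b=60, c=40)  # hypothetical counts
print(chi2, p_value)
```

Because only b and c enter the statistic, the test is driven entirely by transmissions from heterozygous parents, which is what makes it robust to population stratification.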

Genomic Control: Calculate an association statistic for a candidate locus. Calculate the same association statistic, from the same sample, for a set of unlinked loci. Determine significance by reference to the results for the unlinked loci.
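One common way to "refer to the results for the unlinked loci" is to estimate a variance inflation factor from their statistics and rescale the candidate statistic by it. The sketch below assumes 1 d.f. chi-square statistics and a median-based estimator; it is an illustration under those assumptions rather than the procedure used on the slides, and the function names and data are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 d.f.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_stats):
    """Estimate lambda from statistics at loci assumed unlinked to the disease."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    # Lambda below 1 is usually not used to make the test more liberal.
    return max(lam, 1.0)


def genomic_control_adjust(candidate_stat, null_stats):
    """Rescale the candidate-locus statistic by the estimated inflation."""
    return candidate_stat / inflation_factor(null_stats)


null_stats = [0.2, 0.9098728, 3.0]  # made-up statistics at unlinked markers
print(inflation_factor(null_stats))
print(genomic_control_adjust(8.0, null_stats))
```

The adjusted statistic is then compared with the usual 1 d.f. chi-square reference distribution.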

Linkage Between a Marker and a Disease: Intrafamilial association. Typically no population association. Not affected by population stratification. Population association if very close.

Association versus Linkage
Allelic Association                          Linkage
Association at the population level          Intrafamilial association
Pinpoints alleles                            Pinpoints loci
More powerful                                Less powerful
More tests required                          Fewer tests required
More sensitive to mistyping                  Less sensitive to mistyping
Sensitive to population stratification       Not sensitive to population stratification
Which is better?

What is the Best Design and Analysis? If heterogeneity / stratification is a non-issue: unrelated cases and controls for association analysis (genome scan?). If heterogeneity / stratification could be an issue and a genome scan is desired: large extended pedigrees, type all (founders and non-founders) for 200-400 equi-spaced markers, for linkage analysis. Note: cost, burden of multiple testing. A wise investigator, like a wise investor, would hedge bets with a judicious mix.

Case-Control Data
Consider a particular marker allele, A1, and a sample of cases and controls:
             Number of A1 alleles
             0      1      2      Total
Cases        r0     r1     r2     R
Controls     s0     s1     s2     S
Total        n0     n1     n2     N

Consider the probability structure:
             Number of A1 alleles
             0      1      2
Cases        p0     p1     p2
Controls     q0     q1     q2
Cochran-Armitage trend: test the null hypothesis p2 + ½p1 = q2 + ½q1 without assuming the two alleles a person has are independent. Sasieni (1997) Biometrics 53:1253-1261

The test statistic asymptotically has a χ² distribution with 1 d.f.
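The statistic itself is not preserved in the transcript. As a hedged sketch, the usual form of the Cochran-Armitage trend statistic with additive scores (0, 1, 2) can be computed from the case-control genotype counts as follows. The function name and the example counts are made up for illustration.

```python
import math


def cochran_armitage_trend(cases, controls):
    """Trend chi-square (1 d.f.) with scores 0, 1, 2 for the number of A1 alleles.

    cases = (r0, r1, r2) and controls = (s0, s1, s2) are genotype counts.
    """
    r0, r1, r2 = cases
    s0, s1, s2 = controls
    R, S = sum(cases), sum(controls)
    N = R + S
    n1, n2 = r1 + s1, r2 + s2

    # Score-weighted difference between cases and the pooled sample.
    t = N * (r1 + 2 * r2) - R * (n1 + 2 * n2)
    var_term = R * S * (N * (n1 + 4 * n2) - (n1 + 2 * n2) ** 2)
    if var_term == 0:
        raise ValueError("marker is monomorphic or a group is empty")

    chi2 = N * t ** 2 / var_term
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 d.f.
    return chi2, p_value


chi2_trend, p_trend = cochran_armitage_trend(cases=(30, 50, 20), controls=(50, 40, 10))
print(chi2_trend, p_trend)
```

The statistic works on genotype counts per person, so it does not require the two alleles within a person to be independent.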

Cochran-Armitage Trend Test: Does not assume independence of alleles within a person. Does assume independence of genotypes from person to person. Is not valid if there is population stratification. The increased variance due to stratification can be estimated from a random set of markers that are independent of the disease (genomic control). Devlin and Roeder (1999) Biometrics 55:997-1004

Case-only Studies
Look at departure from Hardy-Weinberg proportions:
A*A*: (1-p)²     A1A*: 2p(1-p)     A1A1: p²
where p = P(A1) = p2 + ½p1.
Suggested as more powerful (only cases needed) and more precise (signal decreases faster with distance from the causative locus).
Hardy-Weinberg Disequilibrium (HWD) test statistic:
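The slide's HWD statistic is not preserved. As an illustration only, the sketch below computes a standard Hardy-Weinberg disequilibrium coefficient in cases and the corresponding 1 d.f. goodness-of-fit chi-square. This generic form is an assumption and may differ from the statistic the authors used; the function name and counts are hypothetical.

```python
def hwd_in_cases(r0, r1, r2):
    """Disequilibrium coefficient and chi-square from case genotype counts.

    r0, r1, r2 are the numbers of cases carrying 0, 1 and 2 copies of A1.
    """
    R = r0 + r1 + r2
    p = (r2 + 0.5 * r1) / R          # frequency of A1, i.e. p2 + p1/2
    if p in (0.0, 1.0):
        raise ValueError("marker is monomorphic in cases")
    delta = r2 / R - p ** 2          # excess of A1A1 over its HW expectation
    chi2 = R * delta ** 2 / (p ** 2 * (1 - p) ** 2)
    return delta, chi2


delta_hw, chi2_hw = hwd_in_cases(25, 50, 25)    # exact HW proportions
delta_dev, chi2_dev = hwd_in_cases(30, 40, 30)  # deficit of heterozygotes
print(delta_hw, chi2_hw, delta_dev, chi2_dev)
```

With p = 0.5 and 100 cases, the counts 30/40/30 give the same chi-square as the familiar observed-versus-expected calculation against 25/50/25.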

Case-only Studies: No power in the case of a multiplicative model. No controls. There must be a difference in HWD between cases and controls; therefore we consider this HWD trend test:

Weighted average of the Cochran-Armitage trend test and the HWD trend test statistics: We want to give more weight to b or d, whichever yields the larger signal. Therefore take

To investigate the null distribution of this average we simulate many different situations (sample sizes up to 10,000 cases and 10,000 controls) and generate [...]. For all situations considered, the distribution is well approximated by a Gamma distribution.

As the sample size and marker allele frequency increase, the largest mean and the smallest variance occur for 10,000 cases and 10,000 controls, and for a marker allele frequency 0.5. For 10,000 cases and 10,000 controls, and marker allele frequency 0.5, the upper tail of the distribution is well approximated by a Gamma distribution with mean μ = 1.78 and variance σ² = 3.45.
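For readers who want to work with this approximating distribution, the quoted mean and variance can be converted to the usual shape and scale parameters by moment matching (mean = shape × scale, variance = shape × scale²). The function name is hypothetical; only the two numbers come from the slide.

```python
def gamma_shape_scale(mean, variance):
    """Moment-matched Gamma parameters: shape k and scale theta."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


shape, scale = gamma_shape_scale(mean=1.78, variance=3.45)
print(round(shape, 3), round(scale, 3))
```

This gives a shape of about 0.92 and a scale of about 1.94.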

We develop a prediction equation to determine percentiles of the null distribution for smaller sample sizes and marker allele frequencies. We base goodness of fit on the root mean squared error (RMSE) of logeα, calculated for various sample size combinations, from the variance among 50 replicate samples:

With ~90% confidence, the true logeα lies in the interval logeα ± 1.645(RMSE), i.e., α is within e^(1.645(RMSE))-fold of the true α. For total sample size (R + S) 200 or larger and α = 0.0001 or larger, in the very worst case (R = S = 100, α = 0.0001), with 90% confidence α could differ from the true α by a factor of at most ~4.8. The average RMSE is 0.35, corresponding to being between 78% and 122% of the true α with 90% confidence.
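The fold-factor arithmetic can be sketched as follows, assuming the interval is two-sided on the loge scale with the normal quantile 1.645. The function names are hypothetical.

```python
import math

Z_90 = 1.645  # two-sided 90% normal quantile


def fold_factor(rmse):
    """Multiplicative factor implied by a +/- 1.645 * RMSE interval on the log scale."""
    return math.exp(Z_90 * rmse)


def rmse_for_fold(fold):
    """RMSE that would produce a given multiplicative factor."""
    return math.log(fold) / Z_90


avg_fold = fold_factor(0.35)
worst_rmse = rmse_for_fold(4.8)
print(round(avg_fold, 2), round(1 / avg_fold, 2), round(worst_rmse, 2))
```

On this reading, the worst-case factor of ~4.8 corresponds to an RMSE of about 0.95, and an RMSE of 0.35 corresponds to a factor of about 1.78, i.e. roughly 56% to 178% of the true α. That is wider than the 78% to 122% range quoted on the slide, which may rest on a different convention that the transcript does not record.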

Genetic Models Simulated (POWER)
Probability of being affected given genotype A1A1, A1A*, A*A* (values listed in the order they survive in the transcript):
1  Recessive 1:     1.00, 0.10
2  Recessive 2:     0.05
3  Additive:        0.50, 0.00
4  Multiplicative:  0.81, 0.045, 0.0025
Each simulated population contains 500,000 individuals allowed to randomly mate for 50 generations after the appearance of a disease mutation. Marker loci are placed at distances 0-6 cM from the disease susceptibility locus. For type I error, there is no association between the disease and marker loci.

Tests Performed
Homogeneous populations: HWD, cases only; allele test; allele test x HWD in cases; HWD trend test; Cochran-Armitage trend test; Cochran-Armitage trend test x HWD trend test; weighted average.
Population stratification: Cochran-Armitage trend test with genomic control; product of this and the HWD trend test; weighted average with genomic control.

Figure: Type I error, homogeneous population. Legend: ∆ HWD test, cases only; ▲ product of the allele test and HWD test.

Figure: Type I error, population stratification. Legend: ○ allele test; ◊ Cochran-Armitage trend test; ▲ product of the allele test and HWD test; ■ weighted average test; ● product of the Cochran-Armitage trend test and the HWD test.

Figure: Power, homogeneous population. Legend: ■ weighted average test.

Figure: Power, population stratification. Legend: □ HWD trend test; ♦ CA test with genomic control; ■ weighted average with genomic control.

Conclusions: Under recessive inheritance, the weighted average has better performance than either the Cochran-Armitage trend test or the HWD trend test. It has good performance for other models as well. The product of the Cochran-Armitage trend test statistic and the HWD test statistic (cases only) has better power, but has inflated Type I error if there is population stratification. The weighted average has good overall properties and automatically controls for marker mistyping.

With acknowledgment to Kijoung Song

Can we use evolutionary models, when we have large amounts of genetic data on a sample of cases and controls, to obtain a more powerful way of detecting loci involved in the etiology of disease? Will these models bear fruit or nuts?