
Analysis of whole genome association studies in pedigreed populations




1 Analysis of whole genome association studies in pedigreed populations
Goutam Sahana, Genetics and Biotechnology, Faculty of Agricultural Sciences, Aarhus University, 8830 Tjele, Denmark

2 Concept of mapping
Identify the genetic variant underlying disease susceptibility or a trait value. The goal is evidence for the location of the gene, i.e., the causal variant.

3 Approaches to mapping
Candidate gene studies: association; resequencing approaches.
Genome-wide studies: linkage analysis; genome-wide association studies (linkage disequilibrium, LD mapping).

4 Linkage mapping
Look for marker alleles that are correlated with the phenotype within a pedigree. Different alleles can be connected with the trait in different pedigrees.

5 Association mapping
Marker alleles are correlated with a trait at the population level, so association can be detected by studying unrelated individuals from a population. An association does not necessarily imply that the markers are linked to (i.e., close to) genes influencing the trait.

6 Linkage vs. association
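[Figure: effect size versus frequency of the causal variant. Linkage analysis covers rare variants of large effect; association studies cover common variants of modest effect; common variants of large effect are "unlikely to exist"; rare variants of small effect are "very difficult" to detect. Modified from D. Altshuler.]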

7 Linkage vs. association
Potential advantages on which linkage and association studies are compared: no prior information regarding gene function required; localization to a small genomic region; not susceptible to the effects of stratification; sufficient power to detect common alleles of modest effect (MAF > 5%); ability to detect rare alleles (MAF < 1%); availability of analysis tools. Hirschhorn & Daly, Nature Rev. Genet. 2005

8 Allelic association
Direct association: the allele of interest is itself involved in the phenotype.
Indirect association: the allele itself is not involved, but is associated with the phenotype through LD with the functional variant.
Spurious association: caused by confounding factors (e.g., population stratification).

9 Linkage disequilibrium
Non-random association between alleles at different loci. Loci are in LD if alleles occur together on haplotypes in proportions different from those expected from the allele frequencies. Two alleles in LD occur together more (or less) often than would be expected by chance.

10 Linkage disequilibrium
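Locus A has alleles A and a with frequencies $p_A$ and $p_a$; locus B has alleles B and b with frequencies $p_B$ and $p_b$. Possible haplotypes: AB, Ab, aB, ab.
Expected frequencies (no LD): $p_A p_B$, $p_A p_b$, $p_a p_B$, $p_a p_b$.
Observed frequencies: $p_{AB}$, $p_{Ab}$, $p_{aB}$, $p_{ab}$.
$$D = p_{AB} - p_A p_B \neq 0$$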
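A minimal sketch of the LD coefficient D above, assuming the four haplotype frequencies are already known (e.g., from phased data). It also returns the normalised D′ (D divided by its maximum possible magnitude given the allele frequencies) and r², two standard summaries not shown on the slide; the function name is illustrative.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for two biallelic loci from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Allele frequencies are marginal sums of the haplotype frequencies.
    p_A = p_AB + p_Ab
    p_a = 1.0 - p_A
    p_B = p_AB + p_aB
    p_b = 1.0 - p_B
    # D: observed minus expected (under independence) frequency of haplotype AB.
    D = p_AB - p_A * p_B
    # D' (signed): D scaled by its maximum magnitude for these allele frequencies.
    d_max = min(p_A * p_b, p_a * p_B) if D >= 0 else min(p_A * p_B, p_a * p_b)
    d_prime = D / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the alleles at the two loci.
    denom = p_A * p_a * p_B * p_b
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


complete_ld = ld_stats(0.5, 0.0, 0.0, 0.5)   # only AB and ab haplotypes
no_ld = ld_stats(0.25, 0.25, 0.25, 0.25)     # haplotypes at expected frequencies
repulsion = ld_stats(0.1, 0.4, 0.4, 0.1)     # AB rarer than expected, so D < 0
```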

11 LD variation across genome
The extent of LD varies greatly across the genome, and its determinants are not fully understood. Factors believed to influence LD: genetic drift, population growth, admixture or migration, selection, and variable recombination rates.

12 Haplotype
Genotypes (unphased): Locus1 2/4, Locus2 1/3, Locus3 3/2, Locus4 4/1.
Haplotypes: identification of phase, i.e., which alleles lie together on the same chromosome, with software such as PHASE or BEAGLE.
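To see why phase has to be inferred, the sketch below enumerates every haplotype pair consistent with an unphased genotype. With k heterozygous loci there are 2^(k-1) distinct pairs, so the four-locus genotype on the slide already allows 8. Programs such as PHASE and BEAGLE choose among such configurations statistically; this brute-force enumeration is only an illustration, and the function name is hypothetical.

```python
from itertools import product


def possible_haplotype_pairs(genotype):
    """List distinct haplotype pairs compatible with an unphased genotype.

    genotype: sequence of (allele1, allele2) tuples, one per locus.
    """
    het = [i for i, (a1, a2) in enumerate(genotype) if a1 != a2]
    pairs = set()
    # Fixing the orientation of the first heterozygous locus removes the
    # trivial symmetry (hap1, hap2) == (hap2, hap1).
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        hap1, hap2 = [], []
        for i, (a1, a2) in enumerate(genotype):
            if flip_at.get(i, False):
                a1, a2 = a2, a1
            hap1.append(a1)
            hap2.append(a2)
        pairs.add((tuple(hap1), tuple(hap2)))
    return sorted(pairs)


slide_genotype = [(2, 4), (1, 3), (3, 2), (4, 1)]
candidates = possible_haplotype_pairs(slide_genotype)
for hap1, hap2 in candidates:
    print(hap1, hap2)
```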

13 Haplotype-based analysis
Increased ability to identify regions that are shared identical by descent (IBD) among affected individuals. A haplotype may itself be the causative "composite allele", rather than a particular nucleotide at a particular SNP. Haplotype analysis is meaningful only if the SNPs are themselves in LD.

14 Monogenic versus complex traits

15 Monogenic trait
A mutation in a single gene is both necessary and sufficient to produce the phenotype or cause the disease. The gene's impact on genetic risk is the same in all families. Such traits follow a clear segregation pattern in families and are typically rare in the population.

16 Complex trait
Multiple genes contribute to genetic predisposition to the phenotype. Pedigrees reveal no Mendelian pattern. No particular gene mutation is either sufficient or necessary to explain the phenotype. The environment makes a major contribution. We study the relative impact of individual genes on the phenotype.

17 Some examples
Disease              Mendelian (M) / Complex (C)   No. of genes   Incidence (per 100,000)
Cystic fibrosis      M                             1              40
Huntington disease   M                             1              5-10
Diabetes, type 2     C                             ?              10,000-20,000
Alzheimer disease    C                             ?              20,000
Schizophrenia        C                             ?              1000

18 Quantitative trait
A biological trait that shows continuous variation rather than falling into distinct categories. Quantitative trait locus (QTL): a genetic locus associated with variation in such a quantitative trait.

19 Assessing genetic contributions to complex traits
Continuous characters (e.g., weight, blood pressure): heritability, the proportion of the observed phenotypic variance explained by genetic factors.
Discrete characters (disease): relative risk ratio, λ = risk to a relative of an affected individual / risk in the general population.
λ encompasses all genetic and environmental effects, not just those due to any single locus.
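Both measures on this slide are simple ratios. The sketch below spells them out; the numbers in the usage lines are hypothetical, and "heritability" here is the broad-sense ratio of genetic to phenotypic variance, as the slide defines it.

```python
def heritability(var_genetic, var_phenotypic):
    """Proportion of phenotypic variance explained by genetic factors."""
    return var_genetic / var_phenotypic


def relative_risk_ratio(risk_in_relatives, population_risk):
    """lambda = risk to relatives of affected individuals / population risk."""
    return risk_in_relatives / population_risk


# Hypothetical values: genetic variance 30 of a phenotypic variance of 100;
# 8% of siblings of affected individuals are affected vs 1% of the population.
h2 = heritability(30.0, 100.0)
lambda_sib = relative_risk_ratio(0.08, 0.01)
```

A λ of 8 says only that relatives are at eight times the population risk; as the slide notes, it cannot by itself separate shared genes from shared environment.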

20 Factors that influence identification of allelic association
Effect size; linkage disequilibrium; disease and marker allele frequencies; sample size. Reviewed by Zondervan & Cardon, Nature Rev. Genet. 2004

21 Odds ratio
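For a 2×2 table of risk-allele carriers and non-carriers among cases and controls, the odds ratio is (a·d)/(b·c). A minimal sketch with a Woolf (log-scale) 95% confidence interval; the function name, the cell labels and the example counts are illustrative assumptions.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf confidence interval.

    a: cases with the risk allele,     b: cases without it
    c: controls with the risk allele,  d: controls without it
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts: 30 of 100 cases and 15 of 100 controls carry the allele.
or_hat, ci_low, ci_high = odds_ratio(30, 70, 15, 85)
```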

22 Sample size
Number of cases required (no. of cases = no. of controls; D′ = 0.7; power 80%; α = 0.001):
Disease allele freq.   Marker allele freq.   OR 3.0   OR 2.0   OR 1.3
0.2                    0.2                   150      360      2900
0.2                    0.5                   430      1250     11,000
0.05                   0.05                  1170     4150     40,000
0.05                   0.5                   4200     15,000   160,000
Zondervan & Cardon (Nature Rev. Genet. 2004)
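The sketch below shows the standard normal-approximation calculation behind numbers of this kind, under simpler assumptions than the table: the genotyped marker is the causal variant itself (no LD decay, so it does not model D′ = 0.7), each person contributes two independent alleles, and the test is a two-sided comparison of allele frequencies. It will not reproduce the table's values; the function name and parameters are illustrative.

```python
import math
from statistics import NormalDist


def cases_needed(p_control, odds_ratio, alpha=0.001, power=0.80):
    """Approximate number of cases (= controls) for a 2x2 allelic test.

    Assumes the marker IS the causal variant and that p_control is the
    risk-allele frequency in controls.
    """
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)
    p_bar = (p_case + p_control) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p_case * (1 - p_case)
                                   + p_control * (1 - p_control)))
    alleles_per_group = spread ** 2 / (p_case - p_control) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per individual


for or_value in (3.0, 2.0, 1.3):
    print(or_value, cases_needed(0.2, or_value))
```

When the marker is only in LD with the causal variant, a common rough adjustment is to divide the result by the r² between marker and causal variant.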

23 Population stratification
Consider two case/control samples genotyped at a marker with alleles M and m.
Sample A      M      m      Freq.
Affected      50     50     0.10
Unaffected    450    450    0.90
Freq.         0.50   0.50
χ²: NS
Sample B      M      m      Freq.
Affected      1      9      0.01
Unaffected    99     891    0.99
Freq.         0.10   0.90
χ²: NS

24 Population stratification
Pooling samples A and B:
Pooled        M      m      Freq.
Affected      51     59     0.055
Unaffected    549    1341   0.945
Freq.         0.30   0.70
χ² = 14.8, P < 0.001
Neither sample shows an association on its own, but the pooled sample does, because both the frequency of M and the disease frequency differ between the two subpopulations.
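The stratification example can be checked directly. This sketch computes the Pearson χ² (without continuity correction) for each sample and for the pooled table, using the counts from the slide; the function name is illustrative.

```python
def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: affected, unaffected; columns: M, m (counts from the slide).
sample_a = [[50, 50], [450, 450]]
sample_b = [[1, 9], [99, 891]]
pooled = [[x + y for x, y in zip(row_a, row_b)]
          for row_a, row_b in zip(sample_a, sample_b)]

stat_a = chi_square_2x2(sample_a)
stat_b = chi_square_2x2(sample_b)
stat_pooled = chi_square_2x2(pooled)
print(stat_a, stat_b, round(stat_pooled, 2))  # prints 0.0 0.0 14.84
```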

25 Dealing with population structure
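Genomic control (Devlin and Roeder, 1999)
Population stratification inflates the distribution of the association test statistic by a factor λ, which is estimated from the data using unlinked "null" markers. Without stratification, the χ² values at null markers follow their expected distribution E(χ²); with stratification they are inflated, so the test statistic at the test locus is adjusted by dividing it by λ.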
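A minimal sketch of genomic control for 1-df χ² statistics, assuming λ is estimated robustly as the median of the null-marker statistics divided by the median of a χ² distribution with 1 df (about 0.455), and truncated at 1 (a common choice) so that statistics are never inflated. Function and constant names are illustrative.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_control(null_stats, test_stats):
    """Estimate lambda from unlinked null markers and deflate test statistics."""
    lam = max(median(null_stats) / CHI2_1DF_MEDIAN, 1.0)
    return lam, [s / lam for s in test_stats]


# Hypothetical case: null markers inflated to twice the expected median.
lam, adjusted = genomic_control([2 * CHI2_1DF_MEDIAN] * 5, [10.0])
```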

26 Dealing with population structure
Structured association (Pritchard et al., 2000): discover the population structure from a set of unlinked markers, i.e., assign each individual probabilities of ancestry from k populations, and then control for it in the association test.

27 Association analysis approaches
Case-control studies: marker allele frequencies are determined in a group of affected individuals and compared with allele frequencies in a control population.
Family-based methods: based on unequal transmission of alleles from parents to a single affected child in each family. Associations are summed over many unrelated families.

28 Case-control studies: χ² test
Alleles (2x2 contingency table)
        1     2     Total
Case    n1    n2    2N
Ctrl    m1    m2    2M
Total   T1    T2    2(N+M)
Genotypes (2x3 contingency table)
        11    12    22    Total
Case    n11   n12   n22   N
Ctrl    m11   m12   m22   M
Total   T11   T12   T22   N+M
Test of independence: $$\chi^2 = \sum \frac{(O - E)^2}{E}$$ with 1 df (2x2 allele table) or 2 df (2x3 genotype table).

29 Family-based tests
Genotypes come from independent family trios in which the child is affected. The non-transmitted genotypes or alleles serve as internal controls for the transmitted ones.

30 Family-based association studies
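[Figure: parents with genotypes 1/2 and 3/4 and an affected child; the alleles transmitted to the child are compared with the non-transmitted alleles, which serve as the control.]
Is an allele transmitted to affected offspring more often than it is not transmitted?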

31 TDT: Transmission Disequilibrium Test
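Each parent contributes one transmitted and one non-transmitted allele:
                      Non-transmitted
                      G       g
Transmitted    G      a       b
               g      c       d
Only heterozygous (G/g) parents, counted in cells b and c, are informative.
$$\mathrm{TDT}_G = \frac{(T_G - NT_G)^2}{T_G + NT_G} = \frac{(b - c)^2}{b + c} \sim \chi^2_1$$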
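A sketch of the TDT for a biallelic marker, assuming trio genotypes coded as the number of copies of allele G (0, 1 or 2). Homozygous parents transmit a known allele, so the number of G alleles that came from heterozygous parents is the child's G count minus the fixed contributions; this also handles the case where both parents and the child are heterozygous (one b and one c). Names and counts are hypothetical.

```python
import math


def tdt(trios):
    """Transmission disequilibrium test from (father, mother, child) G counts.

    Returns (b, c, statistic, p_value): b = G transmitted / g not transmitted,
    c = g transmitted / G not transmitted, summed over heterozygous parents.
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        het = sum(1 for p in parents if p == 1)
        # Homozygous parents transmit a known allele: 2 -> G, 0 -> g.
        fixed_g = sum(1 for p in parents if p == 2)
        g_from_het = child - fixed_g
        if not 0 <= g_from_het <= het:
            raise ValueError(f"Mendelian inconsistency in trio {(father, mother, child)}")
        b += g_from_het
        c += het - g_from_het
    stat = (b - c) ** 2 / (b + c) if b + c else 0.0
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square tail, 1 df
    return b, c, stat, p_value


# Hypothetical data: one heterozygous parent per trio; G transmitted 30 times,
# g transmitted 10 times.
trios = [(1, 0, 1)] * 30 + [(1, 0, 0)] * 10
b, c, stat, p = tdt(trios)
```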

32 TDT: Transmission Disequilibrium Test
Multiallelic markers: ETDT (Sham & Curtis, 1995)
Missing parental genotypes: TRANSMIT (Clayton, 1999)
Haplotypes: TDTHAP (Clayton & Jones, 1999)
Sibs: TDT/S-TDT (Spielman & Ewens, 1998)
Pedigrees: PBAT (Martin et al., 2000)
Quantitative traits: QTDT (Abecasis et al., 2000)

33 Some limitations
Subjects: random sample or structured families.
Parents may not be available.
Detection is difficult when there are very many genes, each of small effect.
Environmental influences may obscure genetic effects.
Genetic heterogeneity may underlie the disease phenotype.
Hidden (unaccounted-for) relationships.

34 Rare allele
[Figure: a single family segregating at loci A/a and B/b, with offspring divided into group I and group II.]

35 Complex pedigree & Quantitative traits

36 Complex pedigree
Pedigree members are not independent. Accounting only for the polygenic (pedigree) relationship is not sufficient: the association analysis should account for the point-wise relationship among individuals at the tested position, expressed as identical-by-descent (IBD) probabilities.

37

38 Methods
Combined linkage and LD; generalized linear models; mixed model (Yu et al., 2006); Bayesian approach.

39 Combined linkage and LD
Model: phenotype = fixed factors + polygene + haplotype.
Polygene: the whole pedigree relationship is used.
Identical-by-descent (IBD) coefficients are estimated for the point-wise relationship.
Phase determination: GDQTL. QTL mapping: DMU.
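The polygenic term uses the whole pedigree relationship. As an illustration of that relationship only, this sketch builds the additive (numerator) relationship matrix with the classic tabular method. It does not compute the point-wise, marker-based IBD coefficients used for the haplotype effect, which additionally require marker data. The pedigree format and function name are assumptions.

```python
def additive_relationship(pedigree):
    """Numerator relationship matrix A by the tabular method.

    pedigree: list of (animal, sire, dam), ordered so parents precede their
    offspring; unknown parents are None. Returns (ids, A) as lists.
    """
    ids = [animal for animal, _, _ in pedigree]
    index = {animal: i for i, animal in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s = index.get(sire)  # None when the sire is unknown
        d = index.get(dam)
        # Off-diagonal: average relationship of j with i's two parents.
        for j in range(i):
            value = 0.5 * ((A[j][s] if s is not None else 0.0)
                           + (A[j][d] if d is not None else 0.0))
            A[i][j] = A[j][i] = value
        # Diagonal: 1 + inbreeding coefficient (half the parents' relationship).
        inbreeding = 0.5 * A[s][d] if s is not None and d is not None else 0.0
        A[i][i] = 1.0 + inbreeding
    return ids, A


# Hypothetical pedigree: two paternal half-sibs and an offspring of their mating.
ped = [("S", None, None), ("D1", None, None), ("D2", None, None),
       ("O1", "S", "D1"), ("O2", "S", "D2"), ("I", "O1", "O2")]
ids, A = additive_relationship(ped)
```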

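40 QTL for clinical mastitis in cattle
[Figure: linkage analysis (LA) results]

41 QTL for clinical mastitis in cattle
[Figure: LA and linkage disequilibrium (LD) results]

42 QTL for clinical mastitis in cattle
[Figure: combined LD/LA, LA, and LD results]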

43 Simulation
100 half-sib families (dairy cattle pedigree), 2000 progeny.
5 chromosomes of 100 cM each; 5000 SNPs.
15 QTL (1 QTL explaining 10%, 4 QTL 5% each, 10 QTL 2% each), together 50% of the genetic variance.
Heritability: 30%.

44 Generalized linear models
Model: phenotype = sire family + genotype. Software: TASSEL.
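A single-SNP sketch of the model phenotype = sire family + genotype, fitted by least squares. Instead of building dummy variables, it absorbs the sire-family effect by centering phenotype and SNP dosage within each family, which gives the same SNP estimate as fitting family as a fixed factor. This is not TASSEL's implementation; the data are simulated, and all names are hypothetical.

```python
import random
from collections import defaultdict


def snp_glm(phenotypes, families, dosages):
    """Fit phenotype = family + b * dosage; return (b, F) with F on 1 and
    n - n_families - 1 degrees of freedom."""
    groups = defaultdict(list)
    for idx, fam in enumerate(families):
        groups[fam].append(idx)
    y_c, x_c = [], []
    for members in groups.values():
        y_mean = sum(phenotypes[i] for i in members) / len(members)
        x_mean = sum(dosages[i] for i in members) / len(members)
        y_c.extend(phenotypes[i] - y_mean for i in members)
        x_c.extend(dosages[i] - x_mean for i in members)
    sxx = sum(x * x for x in x_c)
    if sxx == 0:
        raise ValueError("SNP is monomorphic within every family")
    b = sum(x * y for x, y in zip(x_c, y_c)) / sxx
    rss_family_only = sum(y * y for y in y_c)
    rss_full = sum((y - b * x) ** 2 for x, y in zip(x_c, y_c))
    df_resid = len(phenotypes) - len(groups) - 1
    f_stat = (rss_family_only - rss_full) / (rss_full / df_resid)
    return b, f_stat


# Hypothetical half-sib data: 10 sire families x 20 progeny, SNP effect 2.0.
rng = random.Random(1)
fams, causal, null, pheno = [], [], [], []
for sire in range(10):
    sire_effect = rng.gauss(0, 1)
    for _ in range(20):
        dose = rng.choice([0, 1, 2])
        fams.append(sire)
        causal.append(dose)
        null.append(rng.choice([0, 1, 2]))
        pheno.append(sire_effect + 2.0 * dose + rng.gauss(0, 1))

b_causal, f_causal = snp_glm(pheno, fams, causal)
b_null, f_null = snp_glm(pheno, fams, null)
```

In a genome scan this fit is repeated for every SNP, which is where the multiple-testing problem discussed later arises.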

45 Generalized linear models

46 Generalized linear models

47 Generalized linear models

48 Mixed model (Yu et al., 2006)
Phenotype = fixed factors + SNP + population + polygene, where (1) population structure is estimated with STRUCTURE and (2) the polygene is modelled through the relationship matrix. Fitted with a SAS mixed model (Gael Pressoir).

49 Mixed-model

50 Mixed-model

51 Mixed-model

52 Bayesian approach
Model: phenotype = fixed factors + polygene + allele or haplotype.
All markers are fitted simultaneously, searching for the combination of markers that explains the trait variation; this avoids multiple testing.
Software: iBays (Janss LLG, 2007).

53 Bayesian approach

54 Bayesian approach

55 Multiple testing

56 Multiple testing
Performing one test at α = 0.05 implies a 5% chance of rejecting a true null hypothesis (a false positive, FP). Performing 100 tests at α = 0.05 when all 100 H0 are true, we expect 5 tests to give FP results.
$$\Pr(\text{at least one FP}) = 1 - \Pr(\text{no FP}) = 1 - 0.95^{100} \approx 0.994$$
(if the tests are independent)
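The arithmetic above, as a short check under the same assumption of independent tests of true null hypotheses; the function name is illustrative.

```python
def familywise_error(alpha, m):
    """P(at least one false positive) for m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


expected_fp = 0.05 * 100                 # expected number of false positives
fwer = familywise_error(0.05, 100)       # about 0.994
```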

57 Multiple testing
Bonferroni correction: the rejection level of each test is $\alpha_i = \alpha / m$.
Permutation test.
False discovery rate (FDR): what proportion of rejections occur when H0 is true? That is, of all the times H0 is rejected, how often is H0 actually true? q value (Storey et al., PNAS 2003)
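A sketch contrasting Bonferroni with an FDR procedure. For FDR it uses the Benjamini-Hochberg step-up rule, a widely used method that is simpler than Storey's q value cited on the slide (which additionally estimates the proportion of true nulls). The p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m; reject the k smallest.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = bonferroni(p)          # threshold 0.005: rejects only p = 0.001
bh = benjamini_hochberg(p)    # also rejects p = 0.008 (0.008 <= 2 * 0.05 / 10)
```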

58 Summary
Four methods: combined LD and linkage; GLM; mixed model; Bayesian approach.

59 Project team
Goutam Sahana, Bernt Guldbrandtsen, Luc Janss, Mogens Sandø Lund

