Principles of genetic epidemiology April 2008 course.

The post-genomic era
Now that the full human genome sequence has been published, we have access to genetic information in an unprecedented manner:
– 3 billion base pairs in the human genome
– c. 22 000 genes
– Tens of thousands of RNAs
– Hundreds of thousands of proteins
Thus, developments in molecular genetic analysis now make it possible to attempt identification of liability genes in complex, multifactorial traits, and to dissect with new precision the roles of genetic predisposition and environmental/lifestyle factors in these disorders.
New technologies and statistical tools are continuously introduced.
Nonetheless, there is often much hype and little real progress.

Genes, developmental history and environment as determinants of health
In complex disease, a person's susceptibility genotype and environmental history combine to establish present health status, and the genotype's norm of reaction determines future health trajectory.

Characteristics of complex traits
– Trait values are determined by complex interactions among numerous metabolic and physiological systems, as well as demographic and lifestyle factors
– Variation in a large number of genes can potentially influence interindividual variation of trait values
– The impact of any one gene is likely to be small to moderate in size
– For diseases: monogenic diseases that mimic complex diseases typically account for a small fraction of disease cases (examples in breast cancer, obesity, hypertension, osteoarthritis)
– Example: Ala-Kokko L et al. Single base mutation in the type II procollagen gene (COL2A1) as a cause of primary osteoarthritis associated with a mild chondrodysplasia. PNAS 1990;87:6565-8. One large family, mutation not found otherwise.

Steps in gene discovery, tracing and evaluation
– Phenotype: clinical definition; define genetic component; identify data sets and data sources
– Study design: case-control (association); family-based (multiplex families / sibpairs / trio-based); whole genome analysis
– Analysis: genotyping; statistical analysis; bioinformatics; variation detection
– Follow-up (gene tracing & evaluation): replication; functional studies; interactions (gene-gene & gene-environment)
Adapted from Fig 16.2 in Genetic Analysis of Complex Disease, Haines & Pericak-Vance, 2nd edition, 2006

Strategies for family studies:
– Does disease or behavior aggregate in families?
– What are the causes of familial aggregation?
– What is the model of genetic inheritance and which genes are responsible?
– How do genes interact with the environment?

Families are the basic unit

How to detect genetic effects and find genes?
Family studies:
– provide estimates of heritability
– information on mode of inheritance
– adoption and twin studies as special cases
Molecular genetic studies:
– candidate genes, genome-wide scans
– association studies & linkage
– animal studies (e.g. 'knockouts')

What is heritability?
– Heritability is the estimated proportion of the total variance of a trait, or of liability to a disease, that is accounted for by genetic variance, i.e. interindividual genetic differences.
– Genetic variance may arise from additive effects, due to different alleles at a locus, or from dominance, the interaction of alleles at the same locus.
– Heritability is a characteristic of populations, not of individuals or families, and its value is affected by both genetic and environmental effects.

Conceptual model of an individual's phenotype
$Y = \mu + G + \mathrm{Env}$, where $\mathrm{Env} = C + E$
Hence, the variance can be decomposed: $\sigma^2 = \sigma^2_G + \sigma^2_C + \sigma^2_E$
Heritability is $\sigma^2_G / \sigma^2$, and genetic variance has several components: $\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_I$
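A worked restatement of this decomposition may help. The labels "broad-sense" ($H^2$) and "narrow-sense" ($h^2$) heritability are standard quantitative-genetics terms that the slide itself does not use, and reading $\sigma^2_I$ as the between-locus interaction (epistatic) component is the conventional interpretation, assumed here:

```latex
\begin{align}
\sigma^2   &= \sigma^2_G + \sigma^2_C + \sigma^2_E, \\
\sigma^2_G &= \sigma^2_A + \sigma^2_D + \sigma^2_I, \\
H^2 &= \frac{\sigma^2_G}{\sigma^2}
     = \frac{\sigma^2_A + \sigma^2_D + \sigma^2_I}{\sigma^2_G + \sigma^2_C + \sigma^2_E}, \qquad
h^2 = \frac{\sigma^2_A}{\sigma^2}.
\end{align}
```

Under this reading, the heritability $\sigma^2_G/\sigma^2$ defined on the slide is the broad-sense quantity; the two coincide when dominance and interaction variance are zero.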

FAMILY STUDY
– Provides estimates of the degree of family aggregation
– Risks to siblings, parents, offspring as well as to other relatives can be estimated
– Similarity of different types of relatives can permit modelling of genetic versus non-genetic familial influences

To disentangle genes and experience, we study special family groups:
– either family members sharing experiences but differing in shared genes, e.g. twin studies
– or family members sharing genes but differing in their shared experience, e.g. adoption studies

ADOPTION DESIGN
– Test for association between trait in adoptees and trait in biological parents (genetic correlation)
– Test for association between trait in adoptees and trait in adoptive parents
STRENGTHS: relatively powerful
WEAKNESSES:
(1) poor generalizability
(2) adoptive parents likely to provide 'good homes'
(3) biological parents of adoptive children may have had multiple forms of psychopathology (selection)
(4) poor characterization of phenotypes of biological parents

The Classical Twin Study
– Monozygotic (MZ) pairs are genetically alike
– Dizygotic (DZ) pairs, like siblings, share on average half of their segregating genes
– DZ pairs can be same-sexed or opposite-sex (male-female)
– Increased similarity of twin pairs compared to unrelated subjects suggests familial factors
– Increased similarity of MZ pairs compared to DZ pairs provides evidence for genetic factors

The classical twin study modelling
– Model contribution of additive (A) and non-additive (D) genetic effects, environmental effects shared by family members (C) and unshared effects (E) (i.e. unique to each family member)
– Competing models, e.g. E, AE, ACE, can be statistically compared and tested against actual data
– Mx, a statistical program created by Mike Neale, is most commonly used in genetic modelling: http://views.vcu.edu/mx/
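As a rough illustration of how MZ and DZ similarity translate into A, C and E shares, the sketch below applies Falconer's approximations. This is a swapped-in shortcut, not the likelihood-based model fitting that Mx performs, and the function name is hypothetical. The input values are the MZ and DZ correlations for BMI level in male pairs reported in the Finnish Twin Cohort results table of this transcript.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Falconer's approximations from twin correlations.

    Assumes an ACE model (no dominance), equal shared environments for
    MZ and DZ pairs, and random mating. Illustrative only.
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share (A)
    c2 = 2.0 * r_dz - r_mz     # shared environment share (C)
    e2 = 1.0 - r_mz            # unique environment share (E)
    return {"A": a2, "C": c2, "E": e2}


# BMI level, male pairs: r_MZ = 0.79, r_DZ = 0.44 (from the transcript's table)
estimates = falconer_estimates(0.79, 0.44)
for component, share in estimates.items():
    print(f"{component}: {share:.2f}")
```

With these inputs the shares come out at roughly 0.70, 0.09 and 0.21. They need not match the model-based heritability in the table, which comes from a latent growth curve model rather than from this approximation.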

Different phenotypes, different effects of genes

| Phenotype | Genetic effects | Non-genetic family effects |
|---|---|---|
| Experimentation (age 12) | 11% | 73% |
| Initiation/ever smoker (adolescents) | 20-36% | 18-59% |
| Initiation/ever smoker (adults) | 28-80% | 4-50% |
| Persistence/cessation | 58-71% | None |
| Nicotine dependence (FTND or DSM-IV) | 60-75% | None |

Extensions of the classical twin study I
– Effect modification by age, sex and environmental factors, e.g. smoking or obesity
– Assess genetic covariance over time through longitudinal models
– Assess sex effects by comparison of like-sexed and opposite-sexed DZ pairs
– Assess social interaction effects

Genetic Influences on Change in BMI
A longitudinal study of Finnish twins
J.v.B. Hjelmborg, C. Fagnani, K. Silventoinen, M. McGue, M. Korkeila, K. Christensen, A. Rissanen, J. Kaprio

Finnish Twin Cohort
– Twins born 1930-1955 participating in three surveys in 1975, 1981 and 1990
– Weight and height asked in each questionnaire
– 10556 twins answered all questionnaires
– Same-sex pairs
– Age at baseline 20-45 y

Latent growth model for weight change in adults 1975-1990

Genetic modeling results for latent growth curve model of BMI, Finnish Twin Cohort 1975–1990

| Measure | Males (95% CI) | Females (95% CI) |
|---|---|---|
| N of pairs | 499 MZ, 1013 DZ | 735 MZ, 1265 DZ |
| MZ correlation of BMI level | 0.79 (0.79, 0.80) | 0.83 (0.82, 0.83) |
| DZ correlation of BMI level | 0.44 (0.44, 0.45) | 0.39 (0.38, 0.39) |
| MZ correlation of weight gain | 0.60 (0.56, 0.68) | 0.65 (0.61, 0.71) |
| DZ correlation of weight gain | 0.26 (0.24, 0.32) | 0.30 (0.28, 0.32) |
| Heritability of BMI level | 0.80 (0.79, 0.80) | 0.82 (0.81, 0.84) |
| Heritability of rate of weight gain | 0.58 (0.50, 0.69) | 0.64 (0.58, 0.69) |
| Additive genetic correlation of BMI level with rate of weight gain | -0.070 (-0.13, -0.068) | 0.041 (0.00, 0.076) |
| Unique environmental correlation of BMI level with rate of weight gain | 0.0094 (-0.020, 0.091) | 0.24 (0.14, 0.34) |

Summary of findings
– A longitudinal growth curve model provides better estimates of heritability
  – c. 80% for adult BMI
  – c. 60% for rate of weight gain over a 15-year period in young to middle-aged adults
– Genetic influences on baseline BMI and on rate of weight gain are weakly, if at all, correlated
– Genes regulating weight gain and loss are likely to be different from those affecting BMI
– Environmental effects on weight change appear to be larger than on BMI

Extensions of the classical twin study II
– Define phenotypes by assessing the combination of signs and symptoms with highest heritability
  – for example, broad vs. narrow definitions of low back pain (LBP)
– Define natural history of disease by assessing genetic commonality of different stages
  – for example, initiation, persistence, and dependence in smoking
– Common genetic pathways across phenotypes
  – for example, hip, knee and hand osteoarthritis (OA); bone density in weight-bearing & non-weight-bearing bones

How to detect genetic effects and genes?
Molecular genetic studies:
– candidate genes, genome-wide scans
– association studies & linkage
– animal studies (e.g. 'knockouts', 'knock-ins')
Family studies:
– provide estimates of heritability
– information on mode of inheritance
– adoption and twin studies as special cases

Increasing the genetic signal in the data...
– ascertain pedigree units that are likely to segregate genes of relevance
  – Ex: pedigrees with quasi-Mendelian disease transmission
  – affected sib pair approach of linkage analysis
– ascertain families on the basis of individuals with extreme or remarkable phenotypes
  – Ex: extremely discordant sibpairs
  – ascertain young individuals with the disease
– ascertain individuals from isolated populations:
  – more homogeneous genetically and culturally as well
– ascertain intermediate phenotypes
  – physiologic phenotype is "closer" to sequence variants
...at the cost of representativeness and ability to evaluate population risk

ISOLATED POPULATION
Wonderfully isolated Finnish population
– Small number of founders
– Subsequent isolation
– Rapid expansion
– Major bottlenecks
→ Genetic drift has moulded the gene pool
– Genetic homogeneity, longer LD blocks
– Valuable for genetic studies, especially of monogenic diseases

Two basic analysis strategies
1. candidate gene analysis
   motto: study a few good genes
2. whole-genome searches (genome scans)
   motto: cast out a net that catches all the big fish

Candidate Gene Studies
– statistically straightforward: test the association between genotypes and phenotype with contingency tables, chi-square test, regression
– principle: if an allele is more frequent in affecteds than unaffecteds → gene may be close to a disease gene
– candidacy of a gene can come from a number of different sources:
  – biological insights (e.g. gene expressed in a certain tissue)
  – homology to other genes
  – functional studies in model organisms
  – member of a relevant gene family
– Challenge: greater biological understanding of the genes
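The contingency-table test mentioned on this slide can be sketched in a few lines. The counts below are hypothetical (not taken from any study in this course), the function name is illustrative, and the p-value uses the closed form for a chi-square distribution with 1 degree of freedom, so no statistics library is needed.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square test (1 df, no continuity correction) for the
    2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = cases / controls,
# columns = carriers / non-carriers of the candidate allele.
stat, p_value = chi_square_2x2(60, 140, 40, 160)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value only says that carrier frequency differs between cases and controls; as the later slides on the causes of association point out, that difference may reflect causation, linkage disequilibrium, or population stratification.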

Allelic association studies test whether alleles are associated with the trait
2 types of association tests
– population-based association test
  – cases and controls are unrelated
  – cross-classify by genotype
  – use χ² test, ANOVA or logistic regression
– family-based association tests (e.g. TDT)
  – cases and controls are related: parents, sibs etc.
  – often based on allele transmission rates
Multivariate/data reduction approaches
– Multiple regression of all SNPs in gene
– Haplotype analyses
– False discovery rate and replication rather than p-values
Pathway analyses
– Combination of individual SNPs/genes and pathway constraints
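To make "based on allele transmission rates" concrete, here is a minimal sketch of the transmission disequilibrium test (TDT) statistic in its standard McNemar form. The transmission counts are hypothetical and the function name is illustrative; real analyses would use dedicated software and handle missing parents, multiple alleles and multiple offspring.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple:
    """TDT for one biallelic marker.

    transmitted / untransmitted: how often heterozygous parents did or did
    not pass the test allele to an affected child. Under no linkage or no
    association both are equally likely, giving a 1-df chi-square statistic.
    Returns (statistic, p_value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (transmitted - untransmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(stat / 2.0))  # 1-df chi-square tail
    return stat, p_value


# Hypothetical: 100 heterozygous parents, allele transmitted 60 times.
stat, p_value = tdt(60, 40)
print(f"TDT chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Because each transmitted allele is compared with the allele the same parent did not transmit, the comparison is made within families, which is why this design is protected against population stratification.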

The 3 possible causes for association
– best: allele increases disease susceptibility
  – candidate gene studies
– good: some subjects share common ancestor
  – linkage disequilibrium studies
– bad: association due to population stratification
  – family-based designs offer protection
[Figure: diagram of alleles and loci, showing disease allele d, marker allele A1, and loci M and K]
Slide by Steve Horvath, 2003

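POPULATION STRATIFICATION
Hypothetical example (by Andrew Heath)

Northern European ancestry (N=200): NO ASSOCIATION

| | Not A1 allele | A1 allele | Share of group |
|---|---|---|---|
| Non-Med diet | 162 | 18 | 90% |
| Med diet | 18 | 2 | 10% |
| Share of group | 90% | 10% | |

Southern European ancestry (N=200): NO ASSOCIATION

| | Not A1 allele | A1 allele | Share of group |
|---|---|---|---|
| Non-Med diet | 35 | 15 | 25% |
| Med diet | 105 | 45 | 75% |
| Share of group | 70% | 30% | |

Mingled in Australian population (N=400)

| | Not A1 allele | A1 allele |
|---|---|---|
| Non-Med diet | 197 | 33 |
| Med diet | 123 | 47 |

In the mingled population we would falsely infer that the A1 allele is a risk factor for following a traditional Mediterranean diet: OR = 2.28, 95% CI 1.39 - 3.73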
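The arithmetic behind this hypothetical example can be checked directly. The sketch below recomputes the odds ratio within each ancestry group and in the pooled table, using the counts from the slide. The Woolf (log-based) confidence interval and the Mantel-Haenszel stratified estimate are additions not named on the slide; since the slide does not state how its interval was computed, the Woolf bounds need not reproduce it exactly.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as
    a = A1 & Med diet,      b = A1 & non-Med diet,
    c = not-A1 & Med diet,  d = not-A1 & non-Med diet."""
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for the odds ratio on the log scale (Woolf)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def mantel_haenszel_or(strata):
    """Stratum-adjusted (Mantel-Haenszel) odds ratio."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Counts from the slide, as (A1&Med, A1&nonMed, notA1&Med, notA1&nonMed)
northern = (2, 18, 18, 162)
southern = (45, 15, 105, 35)
pooled = tuple(n + s for n, s in zip(northern, southern))

print("Northern OR:", odds_ratio(*northern))
print("Southern OR:", odds_ratio(*southern))
print("Pooled OR:  ", round(odds_ratio(*pooled), 2))
print("Pooled CI:  ", tuple(round(x, 2) for x in woolf_ci(*pooled)))
print("MH-adjusted:", mantel_haenszel_or([northern, southern]))
```

Within each ancestry group the odds ratio is exactly 1, and adjusting for ancestry returns the estimate to 1. The apparent association appears only when the two groups, which differ in both allele frequency and diet, are pooled.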

Population-based versus family-based association tests
– Family-based association tests avoid confounding due to ethnic stratification
  – These designs automatically match "controls" to cases on ethnic ancestry.
– Conventional wisdom:
  – family-based designs are generally less efficient than designs based on unrelated control subjects
  – population admixture effects are negligible
– Non-conventional wisdom:
  – family controls are better matched for environmental exposures
  – cryptic relatedness may be an important issue in isolate populations

Pathway approach Hung et al. Cancer Epid Biomarker Prev 2004 & Conti et al. Human Heredity 2003

Family-based Genome Scans
– involve anonymous markers, no candidate genes
– hundreds of evenly spaced genetic markers in the genome
– often hundreds of related individuals in small to large families
– linkage analysis is a statistical method to draw inferences about the co-transmission of marker locus alleles and trait-influencing alleles
– identifies chromosomal regions harboring the genes predisposing to a trait (such as nicotine dependence)
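The co-transmission idea behind linkage analysis can be illustrated with the simplest possible case: a two-point LOD score for phase-known, fully informative meioses. This is a teaching simplification with hypothetical counts and an illustrative function name; genome scans like those discussed here typically rely on multipoint or model-free methods in dedicated software.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score at recombination fraction theta.

    Compares the likelihood of the observed recombinants under linkage
    (theta < 0.5) with the likelihood under free recombination (theta = 0.5).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 2 recombinants observed in 20 informative meioses.
thetas = [i / 100 for i in range(1, 51)]
best_theta = max(thetas, key=lambda t: lod_score(2, 20, t))
print(f"max LOD = {lod_score(2, 20, best_theta):.2f} at theta = {best_theta}")
```

The LOD is maximised at the observed recombinant fraction (here 2/20) and is exactly zero at theta = 0.5, where the marker carries no information about the trait locus.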

Co-transmission of disease and alleles
[Figure: pedigree with marker genotypes Aa and aa illustrating co-transmission of the disease and allele A]

| Chromosome | Phenotype | LOD ≥2 / p-value | Author and year | Country | Number of families and individuals |
|---|---|---|---|---|---|
| 2 | FTQ | 2.61 | Straub et al. 1999 | New Zealand | 130 families, 343 individuals |
| 2 | FTQ | 2.53 | Sullivan et al. 2004 | New Zealand | 129 families |
| 5 | FTND | 3.04 | Gelernter et al. 2007 | US | 634 small nuclear families |
| 6 | FTND | 2.70 | Swan et al. 2006 | US | 158 nuclear families, 607 individuals |
| 7 | FTND | 2.70 | Swan et al. 2006 | US | 158 nuclear families, 607 individuals |
| 7 | FTND | 2.73 | Gelernter et al. 2007 | US | 634 small nuclear families |
| 7 | FTND | 2.50 | Loukola et al. 2007 | Finland | 153 families, 505 individuals |
| 8 | FTND | 2.7 | Swan et al. 2006 | US | 158 nuclear families, 607 individuals |
| 10 | HSI | 4.17 | Li et al. 2006 | US (AA) | 402 nuclear families, 1261 individuals |
| 10 | FTQ | 2.43 | Straub et al. 1999 | New Zealand | 130 families, 343 individuals |
| 10 | FTQ | 2.02 | Sullivan et al. 2004 | New Zealand | 129 families |
| 11 | FTND | 2.31 | Li et al. 2006 | US (AA) | 402 nuclear families, 1261 individuals |
| 11 | HSI | 2.15 | Li et al. 2006 | US (AA) | 402 nuclear families, 1261 individuals |
| 17 | FTND | 0.009* | Lou et al. 2007 | US (EA) | 200 families, 671 individuals |

AA = African-American sample, EA = European-American sample, * Lou et al. 2007 reported a p-value

SAMPLE COLLECTION
– Index cases are twins from pairs concordant for heavy smoking based on earlier questionnaires from the Finnish Twin Cohorts
– 1293 families (twin pairs) invited
– 762 families recruited with 2412 family members (1278 men, 1134 women)
– Data collection complete for 2143 persons
– Interview, blood sample, informed consent

STUDY SAMPLE
– Identified Finnish families with DZ smoking twins
  – Invited also siblings and parents to participate
– 153 affected twin-pair families, 505 individuals
– On average 3 individuals per family (range 2-9)
Phenotype definitions
1. Smoker (smoked ≥100 cigarettes during lifetime)
2. Nicotine dependent (Fagerström, FTND)
3. Nicotine dependent (DSM-IV)
4. Alcohol use (aiming for intoxication)
5. Co-morbid phenotype of FTND and alcohol use

Chromosome 11: candidate genes for nicotine withdrawal in Finnish and Australian families
[Figure: LOD score by cM position on chromosome 11 for nicotine withdrawal in the Finnish and Australian samples, with candidate gene locations marked 1-5]
1. DRD4
2. TH
3. CHRNA10
4. TPH1
5. ANKK1/DRD2, HTR3A, HTR3B

Genome-wide Case-Control Analyses
– involve anonymous markers, no candidate genes
– chips of 300,000 to 1,000,000 SNPs on a single array (Illumina, Affymetrix)
– hundreds to thousands of cases and unrelated controls
– high-throughput genotyping of common SNPs such as those identified from the HapMap project
– over the past two years many new genes in common diseases have been identified
– two recent GWAs on nicotine dependence (Uhl et al, 2007, Bierut et al, 2007)
– a new GWA on smoking cessation (Uhl G, et al, Arch Gen Psychiatr, in press) finds genes with very little overlap with earlier GWAs on nicotine dependence
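Testing this many SNPs at once has a direct consequence for significance thresholds. The sketch below works through the arithmetic for the two array sizes mentioned on the slide, using a Bonferroni correction. The slides do not prescribe a correction method, so Bonferroni is an assumption chosen for simplicity, and the function names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error
    rate at alpha when n_tests independent tests are performed."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of null SNPs passing an uncorrected threshold."""
    return alpha * n_tests


for n_snps in (300_000, 1_000_000):
    print(
        f"{n_snps:>9,} SNPs: "
        f"uncorrected p<0.05 gives ~{expected_false_positives(0.05, n_snps):,.0f} "
        f"false positives; Bonferroni threshold = "
        f"{bonferroni_threshold(0.05, n_snps):.2e}"
    )
```

Because neighbouring SNPs are correlated through linkage disequilibrium, the tests are not independent and Bonferroni is conservative; this is one reason the earlier slide points to false discovery rate and replication rather than raw p-values.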

Bioinformatics processing of existing information to discover biological pathways
Li C-Y et al, PLoS Comput Biol 2008

Li C-Y et al, PLoS Comput Biol 2008

Integration of information at different levels
– Developments in molecular genetics now make it possible to attempt identification of liability genes in complex, multifactorial traits, and to dissect with new precision the roles of genetic predisposition and environmental/lifestyle factors in these disorders.
– But an integrative framework is needed
– Complex picture (Gottesman I, Science 1997)

[Figure: path diagram linking measured genotypes (G1-G4), measured environments (E1-E4), endophenotypes (P1-P5) and latent genetic and environmental factors (G, E, G', E') to the outcome phenotype (P) over time. Eaves et al., 2005]

Information of genetic data will increase
Past:
– microsatellites
– incomplete knowledge of variants
– function barely known
– linkage analysis, genetic map, candidate genes
Present, future:
– millions of SNPs, bi-allelic
– all common genetic variants known
– common function known
– fast genotyping, sequencing, mutation detection
– new technology, statistical methods, linkage disequilibrium tests
Slide from Steve Horvath

To summarize
– Complex disease gene mapping is starting to fulfill its promise
– the distinction between candidate gene studies and whole genome scans diminishes as genotyping costs decrease
– when collecting pedigrees enriched with affecteds, always collect the DNA of good controls as well
– Put effort into high-quality and detailed phenotyping
  – multiple, longitudinal measures
  – use intermediate, physiological phenotypes as traits
  – imaging, metabolomics
  – gene expression and protein array measurements

Useful reading
– JL Haines, MA Pericak-Vance. Genetic Analysis of Complex Disease. Wiley, 2006
– DC Thomas. Statistical Methods in Genetic Epidemiology. Oxford, 2004
– MJ Khoury. Human Genome Epidemiology. Oxford, 2003
