Genetic Epidemiology Michèle Sale, Ph.D. Center for Public Health Genomics Tel: 982-0368.

1 Genetic Epidemiology Michèle Sale, Ph.D. Center for Public Health Genomics Tel:

2 Genetic epidemiology “A science which deals with the etiology, distribution, and control of disease in groups of relatives and with inherited causes of disease in populations.” Newton E. Morton, 1982

3 Model for Complex Diseases + = Disease Susceptibility

4 Genetics of a Complex Disease
(Diagram labels: Gene 1, Gene 2, Gene 3, Gene 4; Environment 1, Environment 2; Trait 1, Trait 2, Trait 3; Disease)

5 “Monogenic” vs “Complex” Disease
| Mendelian | Complex |
|---|---|
| 1 or small # of genes | Many |
| Often etiologic (severe phenotype) | Susceptibility / molecular pathology ? |
| Highly penetrant | Modest penetrance |
| High Odds Ratio | Modest/Low Odds Ratio |
| Strong selection => Low frequency/Rare | Weak/No selection => High frequency/Common |
| Coding Sequence | Non-coding/regulation (?) |

6 Overall steps for disease gene identification
Is there a genetic component?
Study design
Measurement of phenotype
Molecular analysis
Functional analysis

7 Is there a genetic component?
Twin studies
Familial aggregation
Segregation analysis
Race/ethnicity differences

8 Twin studies
Comparison of monozygotic (MZ) twin pairs (who share all of their genome) with dizygotic (DZ) twin pairs (who share half of their genome on average, the same as siblings)
Greater similarity of MZ twins than of DZ twins is considered evidence of genetic factors
The pairwise concordance (Pr) is the proportion of affected pairs that are concordant for the disease
–The proportion of twin pairs with both twins affected, out of all ascertained twin pairs with at least one affected twin
–Pr = C/(C+D), where C is the number of concordant pairs and D is the number of discordant pairs
The probandwise concordance (Cc) is the proportion of affected individuals among the co-twins of previously ascertained index cases
–Allows for double counting of doubly ascertained twin pairs and is interpretable as the recurrence risk in a co-twin of an affected individual
–Cc = 2C/(2C+D)
In theory, complete genetic determination of a disease would equate to MZ twins having 100% concordance and DZ twins having 50% concordance
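
The two concordance formulas on this slide can be sketched in a few lines of Python. The pair counts are hypothetical and chosen only to show how the two measures differ on the same data.

```python
def pairwise_concordance(concordant: int, discordant: int) -> float:
    """Pr = C / (C + D): share of ascertained pairs in which both twins are affected."""
    return concordant / (concordant + discordant)


def probandwise_concordance(concordant: int, discordant: int) -> float:
    """Cc = 2C / (2C + D): recurrence risk in the co-twin of an affected index case."""
    return 2 * concordant / (2 * concordant + discordant)


# Hypothetical counts (not from the slides): 20 concordant and 80 discordant pairs.
C, D = 20, 80
pr = pairwise_concordance(C, D)
cc = probandwise_concordance(C, D)
print(f"Pr = {pr:.3f}, Cc = {cc:.3f}")
```

Because each concordant pair contributes two affected individuals, Cc is never smaller than Pr for the same counts.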

9 Twin studies - assumptions
Random mating
No interactions between genes and environment
Equivalent environments for MZ and DZ twins

10 Concordance rates for some traits

11 Other types of twin studies
Twins discordant for disease have been used to examine possible environmental causes.
Adoption studies also permit the separation of childhood rearing effects from genetic effects by studying the similarity of adopted children with their biological and adoptive parents.

12 Bouchard et al. Science Oct 12;250(4978):223-8.

13 Familial aggregation
Sibling relative risk ratio:
$\lambda_s = \dfrac{\text{risk to a sibling of a person with the disease of interest}}{\text{population risk}}$

14 SNP Disease Variant & λs in Diabetes
| | Type 1 | Type 2 |
|---|---|---|
| Prob of Disease (Sibling) | 6% | 30-40% |
| Prob of Disease (Unrelated) | 0.4% | 7% |
| λs | | |
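
As a rough worked example, the sibling relative risk ratio can be computed directly from the sibling and population risks listed on this slide. The function name is illustrative, and the results are simply the arithmetic ratio of those figures, not necessarily the λs values shown on the original slide.

```python
def sibling_relative_risk(sibling_risk: float, population_risk: float) -> float:
    """lambda_s = risk to a sibling of an affected person / population risk."""
    if population_risk <= 0:
        raise ValueError("population_risk must be positive")
    return sibling_risk / population_risk


# Risks taken from the slide, written as proportions.
type1 = sibling_relative_risk(0.06, 0.004)      # 6% vs 0.4%
type2_low = sibling_relative_risk(0.30, 0.07)   # 30% vs 7%
type2_high = sibling_relative_risk(0.40, 0.07)  # 40% vs 7%

print(f"Type 1 lambda_s is about {type1:.0f}")
print(f"Type 2 lambda_s is about {type2_low:.1f} to {type2_high:.1f}")
```

On these numbers, familial clustering is considerably stronger for type 1 than for type 2 diabetes, even though the absolute sibling risk is higher for type 2.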

15 Recurrence risks for multiple sclerosis in families Compston and Coles. Lancet. 2002;359(9313):

16 Segregation analysis
Determines which specific model (genetic or environmental) best fits the familial aggregation
E.g.
–Major gene or many genes (polygenic)?
–Dominant, additive, recessive inheritance?

17 Differences in prevalence across race/ethnicities


19 Time trends in the percentage of African American adolescents and adults who were overweight,

20 Adiposity
However, rates of type 2 diabetes in African Americans are still higher than in Caucasian Americans after controlling for age, adiposity, and socio-economic status
Other factors must be involved

21 Epidemiological study designs

22 Study designs
Case series: what clinicians see
Case-control: compare people with and without a disease
Cohort: follow people over time to see who gets the disease
Randomized controlled trial (RCT)

23 Other terms
Retrospective vs. prospective
Cross-sectional vs. longitudinal

24 Measurement of Phenotype
What is the phenotype?
–e.g., diabetes, fasting glucose, oral glucose tolerance test
How is it diagnosed?
–Physician’s diagnosis, clinical measurements, questionnaire
How objective are the phenotypes?
–Physician’s diagnosis – somewhat variable
–Clinical measurements – most pretty good
*The more defined the phenotype, the easier to find the gene(s) that control it

25 Additional consideration in genetic studies: families or unrelated individuals?

26 Patient ascertainment
Sib-pair, families, or case-control (pedigree diagrams)

27 Molecular/Analytical approaches
Linkage
–Families
Association
–Candidate gene
–Genome wide
–Generally case-control
–There are family-based approaches

28 Effect and frequency of risk alleles dictate strategies
(Figure: magnitude of effect vs. frequency in population; regions labeled Linkage studies, Association studies, Unlikely to exist, Unlikely to be found)

29 Linkage analysis
Linkage = the proximity of two or more markers on a chromosome
Linkage analysis is a statistical method for detecting linkage between a disease and markers of known location by following their inheritance in families
Uses recombination to define the genomic interval likely to contain the gene(s)
Single large pedigree or multiple small pedigrees

30 Linkage analysis
Works well for Mendelian traits and more highly penetrant diseases
Low resolution = fewer markers needed and resilient to allelic heterogeneity
Apparently high Type I error rate for complex/non-Mendelian diseases – more loci, common variants, high phenocopy rate, and lower penetrance
Large pedigrees better for rare alleles – more likely to segregate the allele
Large pedigrees increase the probability of parental heterozygosity for frequent alleles
Most likely to detect intermediate frequency alleles
Strong pedigree signal may reflect rare Mendelian forms of complex disease
–e.g. BRCA1 & BRCA2 mutations in breast and ovarian cancer

31 Genotype markers across the genome Illumina's Linkage IVb Panel: >6,000 SNPs

32 LOD score
LOD score = Log of the Odds of linkage
$\mathrm{LOD} = \log_{10} \dfrac{\text{Likelihood of linkage}}{\text{Likelihood of not being linked}} = \log_{10} \dfrac{L(\theta < 0.5)}{L(\theta = 0.5)}$, where $\theta$ is the recombination fraction
The closer two markers are to each other, the lower the odds of a recombination (crossing-over event) occurring between them in meiosis.
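
A minimal sketch of the LOD calculation, assuming the simplest textbook case of phase-known, fully informative meioses, where the likelihood at recombination fraction θ is θ^R (1-θ)^(N-R) for R recombinants in N meioses. The counts and the grid of θ values are hypothetical. Real pedigree likelihoods are more involved and are normally computed with dedicated linkage software.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(theta = 0.5) for phase-known meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)  # no linkage: theta = 0.5
    return log_l_theta - log_l_null


# Hypothetical data: 2 recombinants observed in 20 informative meioses.
thetas = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
scores = {t: lod_score(2, 20, t) for t in thetas}
best_theta = max(scores, key=scores.get)

for t in thetas:
    print(f"theta = {t:<4}  LOD = {scores[t]:6.3f}")
print("Highest LOD on this grid at theta =", best_theta)
```

The score is zero at θ = 0.5 by construction and peaks near the observed recombinant fraction R/N.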

33 Linkage analysis Is there cosegregation of a chromosomal region with the phenotype?

34 Linkage analysis

35 Is there cosegregation of a chromosomal region with the phenotype?
Add additional markers to region
Add additional families to study

36 Association study
Best power for common variants of modest to low effect size
Search for specific genetic differences distinguishing cases from controls
(Figure: Cases vs. Controls)

37 Case-control is the most popular study design for complex disease genetics:
Cross-sectional - no follow-up
More efficient recruitment than families - easy to ascertain and recruit
Easy to analyze
Statistical power compared to family-based linkage
Cases and controls must be well-matched
–Drawn from same population
–Randomize non-genetic confounding factors
At risk for type 1 errors if incomplete matching (stratification)

38 Genome-wide association studies (GWAS): A paradigm shift in human genetics

39 How can we use SNPs to find diabetes genes?
Genome-Wide Association Study (GWAS)
–Examination of variation across the entire human genome to identify genetic correlations with the presence or absence of diabetes
Two groups: cases (have diabetes) vs. controls (don’t have diabetes)
Each participant’s genome is surveyed for markers of genetic variation (SNPs)
Groups compared to determine specific genetic differences between the two groups
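
The slides do not name a specific test for the case vs. control comparison. One common choice, used here as an assumption, is an allelic chi-square test on the 2x2 table of allele counts at a single SNP. The counts are hypothetical. The p-value uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def allelic_chi_square(case_risk: int, case_other: int,
                       control_risk: int, control_other: int):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts; returns (chi2, p)."""
    table = [[case_risk, case_other], [control_risk, control_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p_value


# Hypothetical SNP: 1,000 cases and 1,000 controls, i.e. 2,000 alleles per group.
chi2, p_value = allelic_chi_square(700, 1300, 600, 1400)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

In a genome-wide study the same comparison is repeated at every genotyped SNP, which is why the significance threshold has to account for the number of tests.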

40 GWAS approach
Does not assume knowledge of genes/biology
Investigate markers evenly spaced along genome
Investigate association: joint occurrence of two alleles (e.g. disease allele and marker allele) in a population > expected frequency

41 Why are GWAS now feasible?
SNP identification efforts → more SNPs in databases
Understanding of linkage disequilibrium in the human genome (HapMap project) → fewer “tagSNPs” to genotype
Lower cost of genotyping platforms

42 Products now use >1 million SNPs!

43 Pairwise tagging
(Figure: haplotypes across six SNPs, A/T 1, G/A 2, G/C 3, T/C 4, G/C 5, A/C 6, grouped by high r²)
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
After Carlson et al. (2004) AJHG 74:106
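
A rough sketch of pairwise tagging as a greedy selection: repeatedly pick the SNP that captures the most still-untagged SNPs at or above an r² threshold. This is written in the spirit of the approach cited on the slide, not as a reproduction of the published algorithm. The haplotypes are invented (alleles coded 0/1), and the 0.8 threshold is an assumption.

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs given 0/1 alleles on the same haplotypes."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tags(snps, threshold=0.8):
    """Return indices of tag SNPs; every SNP is captured by some tag at r^2 >= threshold."""
    n = len(snps)
    captures = [
        {j for j in range(n) if i == j or r_squared(snps[i], snps[j]) >= threshold}
        for i in range(n)
    ]
    untagged, tags = set(range(n)), []
    while untagged:
        # Ties go to the lowest index because max() returns the first maximum.
        best = max(range(n), key=lambda i: len(captures[i] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical data: rows are SNPs 1..6, columns are four equally frequent haplotypes.
snps = [
    [0, 0, 1, 1],  # SNP 1
    [0, 0, 1, 1],  # SNP 2 (perfectly correlated with SNP 1)
    [0, 1, 0, 1],  # SNP 3
    [0, 0, 0, 1],  # SNP 4
    [0, 1, 0, 1],  # SNP 5 (perfectly correlated with SNP 3)
    [0, 0, 0, 1],  # SNP 6 (perfectly correlated with SNP 4)
]
tags = greedy_tags(snps)
print("Tag SNPs:", [i + 1 for i in tags])
```

On this toy data the sketch selects three tags, one per group of perfectly correlated SNPs. Whether SNP 4 or SNP 6 represents the last group is arbitrary, since the two are interchangeable here.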

44 Use of haplotypes can improve genotyping efficiency
(Figure: haplotypes across six SNPs, A/T 1, G/A 2, G/C 3, T/C 4, G/C 5, A/C 6)
Tags: SNP 1, SNP 3 (2 in total)
Test for association:
–SNP 1 captures 1+2
–SNP 3 captures 3+5
–“AG” haplotype captures SNP 4+6

45 Efficiency and power
(Figure: relative power (%) vs. average marker density (per kb), for tag SNPs and random SNPs. P.I.W. de Bakker et al. (2005) Nat Genet)
~300,000 tag SNPs needed to cover common variation in whole genome in CEU

46 Genotyping platform: Illumina Cost: 370 Duo $ Y $ M $

47 Genotyping platform: Affymetrix

48 Completeness of dbSNP
Vast majority of common SNPs are contained in or highly correlated with a SNP in dbSNP
Nature 437, (figure panels: YRI, CEU, CHN+JPT)

49 Comparison of coverage Paul de Bakker, pers. comm.

50 Association Studies
Detect genes/genomic regions associated with a disease through allelic associations in case-control studies
–Causal variants are associated with disease phenotype
–Linked neutral variants are associated with the disease phenotype through LD with the causal variant
Younger disease variants (rarer variant)
–LD around the variant is stronger = better power
–Associated region containing variant is broad (low genome resolution)
Older disease variants (common variant)
–Weaker LD = worse power
–Better association map resolution

51 Phenotype-genotype association
Marker associated with disease could be:
1. False positive result (type 1 error)
2. Co-inherited with a true causative (functional) variant
3. A true functional or causative variant

52 Replication and follow-up
Many analytical tests – high probability of false positives
Replicate in additional studies (often requires cross-study collaboration)
Map the causal variant
–Denser marker map
–Evidence for other variants in the same gene with (perhaps smaller) independent effects (allelic heterogeneity)
–Haplotype analysis
–Resequencing
–Sequence / genome mapping bioinformatics to identify or predict genome features in the linkage disequilibrium region of the map SNP
–Expression or reporter assays
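
To make the "many tests, many false positives" point concrete, here is a small sketch of the arithmetic. The Bonferroni correction is one common way to control the family-wise error rate and is used here as an assumption, since the slides do not name a correction method. The number of tests is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of null tests with p < alpha when nothing is truly associated."""
    return alpha * n_tests


n_snps = 1_000_000  # hypothetical number of SNPs tested
alpha = 0.05

per_test = bonferroni_threshold(alpha, n_snps)
false_hits = expected_false_positives(alpha, n_snps)

print(f"Uncorrected: about {false_hits:,.0f} SNPs expected below p = {alpha} by chance")
print(f"Bonferroni per-test threshold: {per_test:.1e}")
```

Bonferroni treats the tests as independent, so it is conservative when neighboring SNPs are correlated through linkage disequilibrium. Replication in independent samples remains the main safeguard.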

53 Common Disease Common Variant Hypothesis (CDCV)
Genetic risk for common diseases (diabetes, CHD, hypertension, schizophrenia, asthma, ...) results from common variants/polymorphisms in multiple genes
The effect of each gene variant must be smaller than in monogenic disorders, otherwise the prevalence of the diseases would be very high
Since SNPs are a common mode of variation in the human genome, and coding SNPs lead to monogenic diseases, SNPs may be the variants that are associated with risk for common diseases

54 Do the Common Disease Variants Code?
Not necessarily (or usually?)
Protein coding SNPs (cSNPs) may disrupt protein fold, structure, activity
Variants in Mendelian diseases with high penetrance (50-100% penetrance) often disrupt proteins
But common diseases are not penetrant to the same level
–Genetic odds ratios: Prob (Disease | risk allele) ~ Prob (Not Disease | risk allele)
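
As an illustration of what a modest genetic odds ratio looks like, the sketch below computes an allelic odds ratio with an approximate 95% confidence interval (Woolf's log-based method) from a 2x2 table. The allele counts are hypothetical, and the choice of interval method is an assumption rather than something the slides specify.

```python
import math


def odds_ratio_ci(case_risk: int, case_other: int,
                  control_risk: int, control_other: int, z: float = 1.96):
    """Odds ratio and approximate CI (Woolf method) from 2x2 allele counts."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / case_risk + 1 / case_other
                          + 1 / control_risk + 1 / control_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: 2,000 case alleles and 2,000 control alleles.
odds_ratio, ci_lower, ci_upper = odds_ratio_ci(700, 1300, 600, 1400)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

An odds ratio this close to 1 means the risk allele shifts the odds of disease only slightly, in contrast to the large effects typical of highly penetrant Mendelian variants.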

55 Early successes
Klein R et al. Complement factor H polymorphism in age-related macular degeneration. Science Apr 15; 308:
–96 cases and 50 controls; 116,204 SNPs
Maraganore et al. High-resolution whole-genome association study of Parkinson disease. Am J Hum Genet Nov; 77:
–198,345 SNPs in 443 sibling pairs discordant for PD
–1,793 PD-associated SNPs (P<.01 in tier 1) and 300 genomic control SNPs in 332 matched case-unrelated control pairs

56 The future
Identify additional genes in diverse populations
Identify causal variant/s in these genes
Determine function of novel genes, and function of causative variants
Explore gene x gene interactions (epistasis) and gene x environment interactions (e.g. physical activity, diet)
Other technological advances:
–Animal models of disease
–Innovative imaging of target tissues
–Functional approaches to gene expression profiling
–Whole genome sequencing?
Era of “personalized medicine” and/or prevention

57 END Questions?
