Genetic Epidemiology Michèle Sale, Ph.D. Center for Public Health Genomics Tel: 982-0368.

Genetic Epidemiology Michèle Sale, Ph.D. Center for Public Health Genomics msale@virginia.edu Tel: 982-0368

Genetic epidemiology “A science which deals with the etiology, distribution, and control of disease in groups of relatives and with inherited causes of disease in populations.” Newton E. Morton, 1982

Model for Complex Diseases + = Disease Susceptibility

Genetics of a Complex Disease
[Figure: diagram linking Gene 1-4, Environment 1-2, Trait 1-3, and Disease]

“Monogenic” vs “Complex” Disease
Mendelian | Complex
1 or small # of genes | Many
Often etiologic (severe phenotype) | Susceptibility / molecular pathology ?
Highly penetrant | Modest penetrance
High Odds Ratio | Modest/Low Odds Ratio
Strong selection => Low frequency/Rare | Weak/No selection => High frequency/Common
Coding Sequence | Non-coding/regulation (?)

Overall steps for disease gene identification
Is there a genetic component?
Study design
Measurement of phenotype
Molecular analysis
Functional analysis

Is there a genetic component?
Twin studies
Familial aggregation
Segregation analysis
Race/ethnicity differences

Twin studies
Comparison of monozygotic (MZ) pairs (who share all of their genome) with dizygotic (DZ) twins (who share half of their genome on average, the same as sibs)
The greater similarity of MZ twins than DZ twins is considered evidence of genetic factors
The pairwise concordance (Pr) is the proportion of affected pairs that are concordant for the disease.
–The proportion of twin pairs with both twins affected, out of all ascertained twin pairs with at least one affected
–Pr=C/(C+D), where C is the number of concordant pairs and D is the number of discordant pairs.
The probandwise concordance (Cc) is the proportion of affected individuals among the co-twins of previously ascertained index cases.
–Allows for double counting of doubly ascertained twin pairs and is interpretable as the recurrence risk in a co-twin of an affected individual
–Cc=2C/(2C+D)
In theory, complete genetic determination of a disease would equate to MZ twins having 100% concordance and DZ twins having 50% concordance
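The two concordance formulas above can be sketched in a few lines of Python. The pair counts below are hypothetical and chosen only for illustration; the function names are likewise assumptions, not part of the original slides.

```python
def pairwise_concordance(concordant, discordant):
    """Pr = C / (C + D): share of ascertained pairs in which both twins are affected."""
    return concordant / (concordant + discordant)


def probandwise_concordance(concordant, discordant):
    """Cc = 2C / (2C + D): recurrence risk in the co-twin of an affected proband."""
    return 2 * concordant / (2 * concordant + discordant)


# Hypothetical counts of concordant (C) and discordant (D) twin pairs.
mz = {"concordant": 30, "discordant": 70}
dz = {"concordant": 10, "discordant": 90}

for label, counts in (("MZ", mz), ("DZ", dz)):
    pr = pairwise_concordance(**counts)
    cc = probandwise_concordance(**counts)
    print(f"{label}: Pr={pr:.3f}  Cc={cc:.3f}")
```

With these made-up counts the MZ concordance exceeds the DZ concordance on both measures, which is the pattern the slide treats as evidence of genetic factors.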

Twin studies - assumptions
Random mating
No interactions between genes and environment
Equivalent environments for MZ and DZ twins

Concordance rates for some traits

Other types of twin studies
Twins discordant for disease have been used to examine possible environmental causes.
Adoption studies also permit the separation of childhood rearing effects from genetic effects by studying the similarity of adopted children with their biological and adoptive parents.

Bouchard et al. Science. 1990 Oct 12;250(4978):223-8.

Familial aggregation
Sibling relative risk ratio: λs = (risk to a sibling of a person with the disease of interest) / (population risk)

SNP Disease Variant & λs in Diabetes
 | Type 1 | Type 2
Prob of Disease (Sibling) | 6% | 30-40%
Prob of Disease (Unrelated) | 0.4% | 7%
λs | 15 | 4-6
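As a quick arithmetic check, the λs values in the diabetes table follow directly from dividing the sibling risk by the risk in unrelated individuals. The sketch below uses only the percentages quoted on the slide; the function name is an assumption.

```python
def sibling_relative_risk(sibling_risk, population_risk):
    """lambda_s = risk to a sibling of an affected person / population risk."""
    return sibling_risk / population_risk


# Risks taken from the slide, expressed as proportions.
type1 = sibling_relative_risk(0.06, 0.004)
type2_low = sibling_relative_risk(0.30, 0.07)
type2_high = sibling_relative_risk(0.40, 0.07)

print(f"Type 1 lambda_s: {type1:.1f}")
print(f"Type 2 lambda_s: {type2_low:.1f} to {type2_high:.1f}")
```

The type 2 range comes out at roughly 4.3 to 5.7, which the slide rounds to 4-6.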

Recurrence risks for multiple sclerosis in families Compston and Coles. Lancet. 2002;359(9313):1221-31.

Segregation analysis
Determines which specific model (genetic or environmental) best fits the familial aggregation
E.g.
–Major gene or many genes (polygenic)?
–Dominant, additive, recessive inheritance?

Differences in prevalence across race/ethnicities

Time trends in the percentage of African American adolescents and adults who were overweight, 1988-94 http://www.niddk.nih.gov/health/diabetes/pubs/afam/afam.htm

Adiposity
However, rates of type 2 diabetes in African Americans are still higher than in Caucasian Americans after controlling for age, adiposity, and socio-economic status
Other factors must be involved

Epidemiological study designs http://en.wikipedia.org/wiki/Study_design

Study designs
Case series: what clinicians see
Case-control: compare people with and without a disease
Cohort: follow people over time to see who gets the disease
Randomized controlled trial (RCT)

Other terms
Retrospective vs. prospective
Cross-sectional vs. longitudinal

Measurement of Phenotype
What is the phenotype?
–e.g., diabetes, fasting glucose, oral glucose tolerance test
How is it diagnosed?
–Physician’s diagnosis, clinical measurements, questionnaire
How objective are the phenotypes?
–Physician’s diagnosis – somewhat variable
–Clinical measurements – mostly reliable
*The more precisely defined the phenotype, the easier it is to find the gene(s) that control it

Additional consideration in genetic studies: families or unrelated individuals?

Patient ascertainment: sib-pair, families, or case-control
[Figure: pedigree diagrams for each ascertainment scheme]

Molecular/Analytical approaches
Linkage
–Families
Association
–Candidate gene
–Genome wide
–Generally case-control
–There are family-based approaches

Effect and frequency of risk alleles dictate strategies
[Figure: magnitude of effect plotted against frequency in population, with regions labeled "Linkage studies", "Association studies", "Unlikely to exist", and "Unlikely to be found"]

Linkage analysis
Linkage = the proximity of two or more markers on a chromosome
Linkage analysis is a statistical method for detecting linkage between a disease and markers of known location by following their inheritance in families
Uses recombination to define the genomic interval likely to contain the gene/s
Single large pedigree or multiple small pedigrees

Linkage analysis
Works well for Mendelian traits and more highly penetrant diseases
Low resolution = fewer markers needed and resilient to allelic heterogeneity
Apparently high Type I error rate for complex/non-Mendelian diseases – more loci, common variants, high phenocopy rate, and lower penetrance
Large pedigrees better for rare alleles – more likely to segregate the allele
Large pedigrees increase the probability of parental heterozygosity for frequent alleles
Most likely to detect intermediate frequency alleles
Strong pedigree signal may reflect rare Mendelian forms of complex disease
–e.g. BRCA1 & BRCA2 mutations in breast and ovarian cancer

Genotype markers across the genome Illumina's Linkage IVb Panel: >6,000 SNPs

LOD score
LOD score = Log of the Odds of linkage = log10 [ Likelihood of linkage L(θ<0.5) / Likelihood of not being linked L(θ=0.5) ], where θ is the recombination fraction
The closer two markers are to each other, the lower the odds of a recombination (crossing over event) occurring between them in meiosis.
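To make the LOD score concrete, here is a minimal sketch for the simplest textbook case. It assumes phase-known, fully informative meioses, so the likelihood at recombination fraction theta is theta^k * (1 - theta)^(n - k) for k recombinants among n meioses. That simplification, the function name, and the counts are assumptions for illustration; real pedigree analysis uses dedicated linkage software.

```python
import math


def lod_score(theta, recombinants, meioses):
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    likelihood_linked = theta ** recombinants * (1.0 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** meioses
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical family data: 1 recombinant observed in 10 meioses.
for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"theta={theta:.2f}  LOD={lod_score(theta, 1, 10):.3f}")
```

In this toy example the LOD score peaks near theta = 0.1 (the observed recombinant fraction) and is exactly 0 at theta = 0.5, where the linked and unlinked likelihoods coincide.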

Linkage analysis Is there cosegregation of a chromosomal region with the phenotype?

Linkage analysis

Is there cosegregation of a chromosomal region with the phenotype? Add additional markers to region Add additional families to study

Association study
Best power for common variants of modest to low effect size
Search for specific genetic differences distinguishing cases from controls
[Figure: cases vs. controls]

Case-control is the most popular study design for complex disease genetics:
Cross-sectional - no follow-up
More efficient recruitment than families - easy to ascertain and recruit
Easy to analyze
Statistical power compared to family-based linkage
Cases and controls must be well-matched
–Drawn from same population
–Randomize non-genetic confounding factors
At risk for type I errors if matching is incomplete (stratification)

Genome-wide association studies (GWAS): A paradigm shift in human genetics

How can we use SNPs to find diabetes genes?
Genome-Wide Association Study (GWAS)
–Examination of variation across the entire human genome to identify genetic correlations with the presence or absence of diabetes
Two groups: cases (have diabetes) vs. controls (don’t have diabetes)
Each participant’s genome is surveyed for markers of genetic variation (SNPs)
Groups compared to determine specific genetic differences between the two groups
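One common way to compare the two groups at a single SNP is an allelic chi-square test on a 2x2 table of allele counts, together with an odds ratio. The sketch below is one such test, not necessarily the one used in any study cited here; the allele counts are hypothetical and the function name is an assumption.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Chi-square (1 df), p-value, and odds ratio for a 2x2 allele-count table."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts: 100 cases and 100 controls, two alleles each.
chi2, p_value, odds_ratio = allelic_association(120, 80, 90, 110)
print(f"chi2={chi2:.3f}  p={p_value:.4f}  OR={odds_ratio:.2f}")
```

In a genome-wide scan this comparison is repeated for every genotyped SNP, which is why the significance threshold has to be adjusted for the number of tests.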

GWAS approach
Does not assume knowledge of genes/biology
Investigate markers evenly spaced along genome
Investigate association: joint occurrence of two alleles (e.g. disease allele and marker allele) in a population at greater than the expected frequency
http://www.genizon.com/html/gestion/Research_Technology.jpg
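"Joint occurrence greater than expected" can be quantified with the standard linkage disequilibrium coefficient D, the observed haplotype frequency minus the product of the two allele frequencies, and with its normalized form r². The frequencies below are hypothetical and the function name is an assumption.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r^2) for alleles A and B at two biallelic loci.

    p_ab is the frequency of the haplotype carrying both A and B;
    p_a and p_b are the individual allele frequencies.
    """
    d = p_ab - p_a * p_b  # excess over the frequency expected under independence
    r2 = d ** 2 / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r2


# Hypothetical frequencies: both alleles at 50%, seen together on 40% of chromosomes.
d, r2 = linkage_disequilibrium(0.40, 0.50, 0.50)
print(f"D={d:.3f}  r^2={r2:.3f}")
```

Under independence the AB haplotype would be expected at 25%, so D is positive here; r² rescales that excess to the range 0 to 1.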

Why are GWAS now feasible?
SNP identification efforts → more SNPs in databases
Understanding of linkage disequilibrium in the human genome (HapMap project) → fewer “tagSNPs” to genotype
Lower cost of genotyping platforms

Products now use >1 million SNPs!

Pairwise tagging
[Figure: haplotypes across six SNPs (SNP 1 A/T, SNP 2 G/A, SNP 3 G/C, SNP 4 T/C, SNP 5 G/C, SNP 6 A/C), with groups of SNPs in high r²]
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
After Carlson et al. (2004) AJHG 74:106
http://www.hapmap.org/downloads/presentations/2_Daly.ppt
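The idea behind pairwise tagging can be sketched as a greedy selection over pairwise r² values: repeatedly pick the SNP that captures the most still-untagged SNPs at or above a threshold. This is a simplified illustration, not the published algorithm; the r² values, the 0.8 threshold, and the function name are all assumptions made for the example.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tag SNPs so that every SNP is a tag or has r^2 >= threshold with one."""

    def captured_by(snp, pool):
        # A SNP captures itself plus any SNP in the pool in high r^2 with it.
        return {
            other
            for other in pool
            if other == snp or r2.get(frozenset((snp, other)), 0.0) >= threshold
        }

    untagged = set(snps)
    tags = []
    while untagged:
        # Ties are broken in favor of the lowest-numbered SNP.
        best = max(sorted(untagged), key=lambda s: len(captured_by(s, untagged)))
        tags.append(best)
        untagged -= captured_by(best, untagged)
    return tags


# Hypothetical pairwise r^2 values; pairs not listed are treated as r^2 = 0.
toy_r2 = {
    frozenset((1, 2)): 1.0,
    frozenset((3, 5)): 1.0,
    frozenset((4, 6)): 1.0,
}
print(greedy_tag_snps([1, 2, 3, 4, 5, 6], toy_r2))
```

With this toy input three tags cover all six SNPs. The sketch returns SNP 4 for the third group where the slide lists SNP 6; because the two are in perfect r² in the toy data, either choice captures the same pair.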

Use of haplotypes can improve genotyping efficiency
[Figure: the same haplotypes across six SNPs (SNP 1 A/T, SNP 2 G/A, SNP 3 G/C, SNP 4 T/C, SNP 5 G/C, SNP 6 A/C)]
Tags: SNP 1, SNP 3 (2 in total)
Test for association:
–SNP 1 captures 1+2
–SNP 3 captures 3+5
–“AG” haplotype captures SNP 4+6
http://www.hapmap.org/downloads/presentations/2_Daly.ppt

Efficiency and power
[Figure: relative power (%) vs. average marker density (per kb), comparing tag SNPs with random SNPs. P.I.W. de Bakker et al. (2005) Nat Genet]
~300,000 tag SNPs needed to cover common variation in whole genome in CEU

Genotyping platform: Illumina
Cost:
–370 Duo: $240-280
–650Y: $480-520
–1M: $580-650
http://www.genengnews.com/articles/chtitem.aspx?tid=1862&chid=2

Genotyping platform: Affymetrix http://gmed.bu.edu/about/genotyping.html

Completeness of dbSNP
Vast majority of common SNPs are contained in or highly correlated with a SNP in dbSNP
[Figure panels: YRI, CEU, CHN+JPT]
Nature 437, 1299-1320. 2005

Comparison of coverage Paul de Bakker, pers. comm.

Association Studies
Detect genes/genomic regions associated with a disease through allelic associations in case-control studies
–Causal variants are associated with disease phenotype
–Linked neutral variants are associated with the disease phenotype through LD with the causal variant
Younger disease variants (rarer variant)
–LD around the variant is stronger = better power
–Associated region containing variant is broad (low genome resolution)
Older disease variants (common variant)
–Weaker LD = worse power
–Better association map resolution

Phenotype-genotype association
Marker associated with disease could be:
1. False positive result (type 1 error)
2. Co-inherited with a true causative (functional) variant
3. A true functional or causative variant

Replication and follow-up
Many analytical tests – high probability of false positives
Replicate in additional studies (often requires cross-study collaboration)
Map the causal variant
–Denser marker map
–Evidence for other variants in the same gene with (perhaps smaller) independent effects (allelic heterogeneity)
–Haplotype analysis
–Resequencing
–Sequence / genome mapping bioinformatics to identify or predict genome features in the linkage disequilibrium region of the map SNP
–Expression or reporter assays
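The false-positive problem from many tests is easy to see with a little arithmetic. The sketch below computes how many SNPs would pass a nominal threshold by chance if no SNP were truly associated, and the Bonferroni-corrected per-test threshold. Bonferroni is one standard correction, used here as an assumption since the slides do not name a method; the function names are also assumptions. The SNP counts echo figures quoted elsewhere in this deck.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests passing alpha by chance when no SNP is associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


for n_snps in (116_204, 1_000_000):
    print(
        f"{n_snps:>9,} SNPs: "
        f"~{expected_false_positives(n_snps):,.0f} chance hits at p<0.05, "
        f"Bonferroni threshold {bonferroni_threshold(n_snps):.1e}"
    )
```

Bonferroni treats the tests as independent, so it is conservative when neighboring SNPs are correlated through linkage disequilibrium; replication in independent samples remains the main safeguard.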

Common Disease Common Variant Hypothesis (CDCV)
Genetic risk for common diseases (diabetes, CHD, hypertension, schizophrenia, asthma, ...) results from common variants/polymorphisms in multiple genes
The effects for each gene variant must be smaller than in monogenic disorders, otherwise the prevalence of the diseases would be very high
Since SNPs are a common mode of variation in the human genome - and coding SNPs lead to monogenic diseases - SNPs may be the variants that are associated with risk for common diseases

Do the Common Disease Variants Code?
Not necessarily (or usually?)
Protein coding SNPs (cSNPs) may disrupt protein fold, structure, activity
Variants in Mendelian diseases with high penetrance (50-100% penetrance) often disrupt proteins
But common diseases are not penetrant to the same level - genetic odds ratios: Prob(Disease | risk allele) / Prob(Not Disease | risk allele) ~ 1.2-1.5

Early successes
Klein R et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005 Apr 15; 308:385-9.
–96 cases and 50 controls; 116,204 SNPs
Maraganore et al. High-resolution whole-genome association study of Parkinson disease. Am J Hum Genet. 2005 Nov; 77:685-93.
–198,345 SNPs in 443 sibling pairs discordant for PD
–1,793 PD-associated SNPs (P<.01 in tier 1) and 300 genomic control SNPs in 332 matched case-unrelated control pairs

The future
Identify additional genes in diverse populations
Identify causal variant/s in these genes
Determine function of novel genes, and function of causative variants
Explore gene x gene interactions (epistasis) and gene x environment interactions (e.g. physical activity, diet)
Other technological advances:
–Animal models of disease
–Innovative imaging of target tissues
–Functional approaches to gene expression profiling
–Whole genome sequencing?
Era of “personalized medicine” and/or prevention

END Questions?
