Genetic Epidemiology Michèle Sale, Ph.D. Center for Public Health Genomics Tel: 982-0368.

Genetic epidemiology “A science which deals with the etiology, distribution, and control of disease in groups of relatives and with inherited causes of disease in populations.” Newton E. Morton, 1982

Model for Complex Diseases [figure: genetic and environmental factors combine (+) to give (=) Disease Susceptibility]

Genetics of a Complex Disease [figure: diagram linking Genes 1-4 and Environments 1-2 to Traits 1-3 and the Disease]

“Monogenic” vs “Complex” Disease
                    Mendelian                          Complex
Number of genes     1 or a small number                Many
Role of variant     Often etiologic                    Susceptibility / molecular
                    (severe phenotype)                 pathology (?)
Penetrance          Highly penetrant                   Modest penetrance
Odds ratio          High                               Modest/low
Selection           Strong => low frequency/rare       Weak/none => high frequency/common
Location            Coding sequence                    Non-coding/regulation (?)

Overall steps for disease gene identification:
– Is there a genetic component?
– Study design
– Measurement of phenotype
– Molecular analysis
– Functional analysis

Is there a genetic component?
– Twin studies
– Familial aggregation
– Segregation analysis
– Race/ethnicity differences

Twin studies
– Compare monozygotic (MZ) twin pairs, who share all of their genome, with dizygotic (DZ) twin pairs, who on average share half of their genome (the same as ordinary siblings).
– Greater similarity of MZ than DZ twins is taken as evidence of genetic factors.
– Pairwise concordance (Pr) is the proportion of affected twin pairs that are concordant for the disease:
  – the number of pairs with both twins affected, divided by all ascertained pairs with at least one twin affected
  – Pr = C/(C+D), where C is the number of concordant pairs and D is the number of discordant pairs
– Probandwise concordance (Cc) is the proportion of affected individuals among the co-twins of previously ascertained index cases (probands):
  – allows for double counting of doubly ascertained twin pairs and can be interpreted as the recurrence risk in the co-twin of an affected individual
  – Cc = 2C/(2C+D), when every concordant pair is ascertained through both twins
– In theory, complete genetic determination of a disease would give 100% concordance in MZ twins and about 50% in DZ twins (as for a fully penetrant dominant trait).
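Both concordance measures can be computed directly from counts of concordant (C) and discordant (D) pairs. The Python sketch below is illustrative only: the function names and the MZ/DZ counts are hypothetical, and the probandwise formula assumes complete ascertainment (both members of each concordant pair are probands).

```python
def pairwise_concordance(c: int, d: int) -> float:
    """Pr = C / (C + D): fraction of affected pairs in which both twins are affected."""
    return c / (c + d)


def probandwise_concordance(c: int, d: int) -> float:
    """Cc = 2C / (2C + D): probability that the co-twin of an affected proband is affected.

    Assumes complete ascertainment, so each concordant pair contributes two probands.
    """
    return 2 * c / (2 * c + d)


# Hypothetical counts for illustration only (not real data).
twin_counts = {"MZ": {"c": 30, "d": 40}, "DZ": {"c": 10, "d": 60}}

for zygosity, counts in twin_counts.items():
    pr = pairwise_concordance(**counts)
    cc = probandwise_concordance(**counts)
    print(f"{zygosity}: pairwise = {pr:.2f}, probandwise = {cc:.2f}")
```

Probandwise concordance is never smaller than pairwise concordance for the same counts, so the two measures should not be compared across studies that report different ones.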

Twin studies - assumptions
– Random mating
– No interactions between genes and environment
– Equivalent environments for MZ and DZ twins

Concordance rates for some traits

Other types of twin studies
– Twins discordant for disease have been used to examine possible environmental causes.
– Adoption studies also permit the separation of childhood rearing effects from genetic effects by comparing adopted children with their biological and their adoptive parents.

Bouchard et al. Science Oct 12;250(4978):223-8.

Familial aggregation
Sibling relative risk ratio:
$$\lambda_s = \frac{\text{risk to a sibling of a person with the disease of interest}}{\text{population risk}}$$

SNP Disease Variant & λs in Diabetes
                                  Type 1    Type 2
Prob. of disease (sibling)        6%        30-40%
Prob. of disease (unrelated)      0.4%      7%
λs (sibling ÷ unrelated risk)
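As a rough check, plugging the table's risks into the λs definition gives an approximate value for each type of diabetes. This sketch treats the risk in unrelated individuals as the population risk (an assumption); the helper name is hypothetical.

```python
def sibling_relative_risk(sib_risk: float, population_risk: float) -> float:
    """lambda_s = risk to siblings of affected individuals / population risk."""
    if population_risk <= 0:
        raise ValueError("population risk must be positive")
    return sib_risk / population_risk


# Risks taken from the diabetes table (sibling vs. unrelated individuals).
lambda_t1 = sibling_relative_risk(0.06, 0.004)
lambda_t2_low = sibling_relative_risk(0.30, 0.07)
lambda_t2_high = sibling_relative_risk(0.40, 0.07)

print(f"Type 1 lambda_s ~ {lambda_t1:.1f}")
print(f"Type 2 lambda_s ~ {lambda_t2_low:.1f} to {lambda_t2_high:.1f}")
```

Type 1 diabetes shows the larger λs (about 15 vs. roughly 4 to 6) even though its sibling risk is lower, because its population risk is much lower.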

Recurrence risks for multiple sclerosis in families Compston and Coles. Lancet. 2002;359(9313):

Segregation analysis
– Determines which specific model (genetic or environmental) best fits the familial aggregation, e.g.:
  – A major gene or many genes (polygenic)?
  – Dominant, additive, or recessive inheritance?
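A very reduced version of this idea is the classical segregation-ratio test: compare the observed proportion of affected offspring of one mating type with the Mendelian expectation (1/2 for affected × unaffected under a dominant model, 1/4 for carrier × carrier under a recessive model). The sketch below is a simplification, not full (complex) segregation analysis: it ignores the ascertainment correction real data require, and the counts and function name are hypothetical.

```python
from math import erfc, sqrt


def segregation_test(affected: int, offspring: int, expected_ratio: float) -> tuple[float, float]:
    """Pearson chi-square (1 df) comparing the observed proportion of affected
    offspring with a Mendelian segregation ratio.

    Simplification: no correction for ascertainment bias, which a real
    segregation analysis must include.
    """
    unaffected = offspring - affected
    exp_affected = offspring * expected_ratio
    exp_unaffected = offspring * (1 - expected_ratio)
    chi2 = ((affected - exp_affected) ** 2 / exp_affected
            + (unaffected - exp_unaffected) ** 2 / exp_unaffected)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical: 30 affected among 120 offspring of one mating type.
for model, ratio in (("dominant (1/2)", 0.5), ("recessive (1/4)", 0.25)):
    chi2, p = segregation_test(30, 120, ratio)
    print(f"{model}: chi2 = {chi2:.1f}, p = {p:.2g}")
```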

Differences in prevalence across race/ethnicities

Time trends in the percentage of African American adolescents and adults who were overweight,

Adiposity
However, rates of type 2 diabetes in African Americans remain higher than in Caucasian Americans after controlling for age, adiposity, and socio-economic status, so other factors must be involved.

Epidemiological study designs

Study designs
– Case series: what clinicians see
– Case-control: compare people with and without a disease
– Cohort: follow people over time to see who gets the disease
– Randomized controlled trial (RCT)
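The designs also differ in the effect measure they can estimate. A cohort study observes risk in exposed and unexposed people directly and so yields a relative risk (RR); a case-control study samples on disease status and so yields an odds ratio (OR), which approximates the RR when the disease is rare. A minimal sketch with a hypothetical 2×2 table, where a and b are diseased and healthy exposed people and c and d are diseased and healthy unexposed people:

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Cohort design: risk in exposed (a / (a + b)) over risk in unexposed (c / (c + d))."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Case-control design: cross-product ratio (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical counts for a rare disease (1-2% risk).
table = (20, 980, 10, 990)
print(f"RR = {relative_risk(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```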

Other terms
– Retrospective vs. prospective
– Cross-sectional vs. longitudinal

Measurement of Phenotype
– What is the phenotype?
  – e.g. diabetes, fasting glucose, oral glucose tolerance test
– How is it diagnosed?
  – physician’s diagnosis, clinical measurements, questionnaire
– How objective are the phenotypes?
  – physician’s diagnosis: somewhat variable
  – clinical measurements: most are fairly reliable
*The more precisely defined the phenotype, the easier it is to find the gene(s) that control it

Additional consideration in genetic studies: families or unrelated individuals?

Patient ascertainment [figure: example sampling for sib-pair, family, and case-control designs]

Molecular/Analytical approaches
– Linkage
  – families
– Association
  – candidate gene
  – genome-wide
  – generally case-control
  – family-based approaches also exist

Effect and frequency of risk alleles dictate strategies [figure: magnitude of effect plotted against frequency in population, with regions labeled Linkage studies, Association studies, Unlikely to exist, and Unlikely to be found]

Linkage analysis
– Linkage = the proximity of two or more markers on a chromosome
– Linkage analysis is a statistical method for detecting linkage between a disease and markers of known location by following their inheritance in families
– Uses recombination to define the genomic interval likely to contain the gene(s)
– Uses a single large pedigree or multiple small pedigrees

Linkage analysis
– Works well for Mendelian traits and more highly penetrant diseases
– Low resolution: fewer markers are needed, and it is resilient to allelic heterogeneity
– Apparently high Type I error rate (poor replication) for complex/non-Mendelian diseases, which involve more loci, common variants, a high phenocopy rate, and lower penetrance
– Large pedigrees are better for rare alleles: they are more likely to segregate the allele
– Large pedigrees increase the probability of parental heterozygosity for frequent alleles
– Most likely to detect intermediate-frequency alleles
– A strong pedigree signal may reflect rare Mendelian forms of complex disease, e.g. BRCA1 and BRCA2 mutations in breast and ovarian cancer

Genotype markers across the genome: Illumina's Linkage IVb Panel (>6,000 SNPs)

LOD score
LOD score = log of the odds of linkage:
$$\text{LOD} = \log_{10}\frac{\text{likelihood of linkage}}{\text{likelihood of no linkage}} = \log_{10}\frac{L(\theta < 0.5)}{L(\theta = 0.5)}$$
where θ is the recombination fraction between the two loci. The closer two markers are to each other, the lower the probability that a recombination (crossing-over event) occurs between them in meiosis.
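For N phase-known, fully informative meioses with R recombinants, L(θ) = θ^R (1 - θ)^(N - R) and L(0.5) = 0.5^N, so the LOD score has a closed form. The sketch below assumes exactly that simplified setting (real pedigree analyses use dedicated software that handles unknown phase and missing genotypes); the counts and function name are hypothetical.

```python
from math import log10


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD(theta) = log10[theta^R * (1 - theta)^(N - R) / 0.5^N].

    Assumes N phase-known, fully informative meioses with R recombinants.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    return (recombinants * log10(theta)
            + non_recombinants * log10(1.0 - theta)
            - meioses * log10(0.5))


# Hypothetical data: 1 recombinant among 10 informative meioses.
grid = [i / 100 for i in range(1, 51)]  # theta from 0.01 to 0.50
theta_hat = max(grid, key=lambda t: lod_score(1, 10, t))
print(theta_hat, round(lod_score(1, 10, theta_hat), 2))  # 0.1 1.6
```

The maximum falls at θ = R/N, and here it stays below the traditional threshold of LOD ≥ 3 (odds of 1000:1) for declaring linkage, so more families or markers would be needed.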

Linkage analysis Is there cosegregation of a chromosomal region with the phenotype?

Linkage analysis

Is there cosegregation of a chromosomal region with the phenotype?
– Add more markers to the region
– Add more families to the study

Association study
– Best power for common variants of modest to low effect size
– Search for specific genetic differences distinguishing cases from controls
[figure: cases vs. controls]

Case-control is the most popular study design for complex disease genetics:
– Cross-sectional, with no follow-up
– More efficient recruitment than families: cases and controls are easy to ascertain and recruit
– Easy to analyze
– Greater statistical power than family-based linkage for common variants of modest effect
– Cases and controls must be well matched:
  – drawn from the same population
  – non-genetic confounding factors randomized between the groups
– At risk of Type 1 errors if matching is incomplete (population stratification)
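A common first analysis in a case-control study is a 2×2 allelic test: count risk and non-risk alleles in cases and controls, then compute an odds ratio and a Pearson chi-square statistic. The sketch below assumes allele counting (two alleles per person) and does nothing to correct for stratification; the counts and function name are hypothetical, and the genotype-based trend test is a frequently used alternative.

```python
from math import erfc, sqrt


def allelic_test(case_alleles: tuple[int, int], control_alleles: tuple[int, int]):
    """2x2 allelic association test.

    Each argument is (risk-allele count, other-allele count); every person
    contributes two alleles. Returns (odds ratio, chi-square with 1 df, p-value).
    """
    a, b = case_alleles
    c, d = control_alleles
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and case/control status are independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    odds_ratio = (a * d) / (b * c)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return odds_ratio, chi2, p_value


# Hypothetical: 1,000 cases and 1,000 controls (2,000 alleles each);
# risk-allele frequency 30% in cases vs. 25% in controls.
or_, chi2, p = allelic_test((600, 1400), (500, 1500))
print(f"OR = {or_:.2f}, chi2 = {chi2:.2f}, p = {p:.1e}")
```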

Genome-wide association studies (GWAS): A paradigm shift in human genetics

How can we use SNPs to find diabetes genes?
Genome-Wide Association Study (GWAS):
– Examination of variation across the entire human genome to identify genetic correlations with the presence or absence of diabetes
– Two groups: cases (have diabetes) vs. controls (do not have diabetes)
– Each participant’s genome is surveyed for markers of genetic variation (SNPs)
– The groups are compared to identify specific genetic differences between them

GWAS approach
– Does not assume knowledge of genes/biology
– Investigates markers spaced evenly along the genome
– Investigates association: joint occurrence of two alleles (e.g. a disease allele and a marker allele) in a population more often than expected if they occurred independently
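The "more often than expected" comparison is usually quantified with the linkage disequilibrium coefficient D = p_AB - p_A·p_B and its normalized forms D′ and r². A minimal sketch for two biallelic loci; the function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # keeps the sign of D
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


print(ld_measures(0.09, 0.3, 0.3))  # alleles occur together as expected by chance
print(ld_measures(0.30, 0.3, 0.3))  # A always occurs with B: complete association
```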

Why are GWAS now feasible?
– SNP identification efforts → more SNPs in databases
– Understanding of linkage disequilibrium in the human genome (HapMap project) → fewer “tagSNPs” to genotype
– Lower cost of genotyping platforms

Products now use >1 million SNPs!

Pairwise tagging
[figure: SNPs 1 (A/T), 2 (G/A), 3 (G/C), 4 (T/C), 5 (G/C) and 6 (A/C) shown on example haplotypes; SNP pairs in high r² are grouped together]
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
After Carlson et al. (2004) AJHG 74:106
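Pairwise tag selection can be sketched as a greedy loop: repeatedly pick the SNP that captures (r² ≥ threshold) the most still-untagged SNPs. This is a simplified greedy selection in the spirit of Carlson et al. (2004), not a reproduction of their software; the threshold, function names, and r² values are hypothetical.

```python
def captured(s: int, untagged: set[int], r2: list[list[float]], threshold: float) -> set[int]:
    """Untagged SNPs that SNP s would capture (including itself)."""
    return {t for t in untagged if t == s or r2[s][t] >= threshold}


def greedy_tags(r2: list[list[float]], threshold: float = 0.8) -> list[int]:
    """Greedily choose tag SNPs so that every SNP is either a tag or has
    r^2 >= threshold with at least one tag. r2 is a symmetric matrix."""
    n = len(r2)
    untagged = set(range(n))
    tags: list[int] = []
    while untagged:
        # SNP capturing the most still-untagged SNPs; the lowest index wins ties.
        best = max(range(n), key=lambda s: len(captured(s, untagged, r2, threshold)))
        tags.append(best)
        untagged -= captured(best, untagged, r2, threshold)
    return tags


# Six SNPs (index 0 = SNP 1, ..., index 5 = SNP 6). Hypothetical r^2 values in
# which SNP pairs 1-2, 3-5 and 4-6 are in high LD, mirroring the slide's grouping.
high_ld = {(0, 1), (2, 4), (3, 5)}
r2 = [[1.0 if i == j else 0.9 if (i, j) in high_ld or (j, i) in high_ld else 0.1
       for j in range(6)] for i in range(6)]
print([s + 1 for s in greedy_tags(r2)])  # [1, 3, 4]
```

The sketch picks SNP 4 where the slide picks SNP 6; because SNPs 4 and 6 tag each other equally well, either choice gives the same three-tag set size.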

Use of haplotypes can improve genotyping efficiency
[figure: the same six SNPs and haplotypes as in pairwise tagging]
Tags: SNP 1, SNP 3 (2 in total)
Test for association:
– SNP 1 captures SNPs 1 + 2
– SNP 3 captures SNPs 3 + 5
– the “AG” haplotype captures SNPs 4 + 6

Efficiency and power
[figure: relative power (%) vs. average marker density (per kb) for tag SNPs and random SNPs; P.I.W. de Bakker et al. (2005) Nat Genet]
~300,000 tag SNPs are needed to cover common variation across the whole genome in CEU

Genotyping platform: Illumina [figure: Illumina products, including the 370 Duo, and their costs]

Genotyping platform: Affymetrix

Completeness of dbSNP
The vast majority of common SNPs are contained in, or highly correlated with, a SNP in dbSNP. Nature 437 [figure panels: YRI, CEU, CHB+JPT]

Comparison of coverage Paul de Bakker, pers. comm.

Association Studies
– Detect genes/genomic regions associated with a disease through allelic associations in case-control studies:
  – causal variants are associated with the disease phenotype
  – linked neutral variants are associated with the disease phenotype through LD with the causal variant
– Younger disease variants (rarer variants):
  – LD around the variant is stronger = better power
  – the associated region containing the variant is broad (low genomic resolution)
– Older disease variants (common variants):
  – weaker LD = worse power
  – better association-mapping resolution
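The age effect follows from how LD decays: in a large randomly mating population, LD between a variant and a marker at recombination fraction θ shrinks each generation as D_t = D_0 (1 - θ)^t. The sketch ignores drift, mutation and selection, and the marker distance and variant ages are hypothetical.

```python
def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """D_t = D_0 * (1 - theta)^t: expected LD after t generations of random
    mating in a large population (drift, mutation and selection ignored)."""
    return d0 * (1 - theta) ** generations


# Hypothetical: marker about 1 cM from the disease variant (theta ~ 0.01);
# D_0 = 1.0 so the output is the fraction of the initial LD that remains.
for label, t in (("younger variant", 50), ("older variant", 2000)):
    print(label, f"{ld_after_generations(1.0, 0.01, t):.2e}")
```

After 50 generations roughly 60% of the initial LD remains at this distance, so the association extends over a broad region; after 2,000 generations almost none remains except at much shorter distances, which gives weaker signals but finer resolution.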

Phenotype-genotype association
A marker associated with disease could be:
1. a false-positive result (Type 1 error)
2. co-inherited with a true causative (functional) variant
3. a true functional or causative variant

Replication and follow-up
– Many analytical tests = high probability of false positives
– Replicate in additional studies (often requires cross-study collaboration)
– Map the causal variant:
  – denser marker map
  – evidence for other variants in the same gene with (perhaps smaller) independent effects (allelic heterogeneity)
  – haplotype analysis
  – resequencing
  – sequence/genome bioinformatics to identify or predict genomic features in the linkage disequilibrium region around the mapped SNP
  – expression or reporter assays
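The false-positive problem is easy to quantify. If no SNP is truly associated, testing m SNPs at a nominal α is expected to yield α·m "hits", and the Bonferroni correction tests each SNP at α/m to keep the family-wise error rate at or below α. A minimal sketch; the SNP count is an illustrative round number.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of nominally significant results if no SNP is truly associated."""
    return alpha * n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


n_snps = 1_000_000  # illustrative: on the order of current genotyping products
print(expected_false_positives(0.05, n_snps))  # 50000.0
print(f"{bonferroni_threshold(0.05, n_snps):.0e}")
```

The resulting threshold of 5 × 10⁻⁸ is the widely used genome-wide significance level; it is conservative because nearby SNPs in LD are not independent tests.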

Common Disease Common Variant (CDCV) Hypothesis
– Genetic risk for common diseases (diabetes, CHD, hypertension, schizophrenia, asthma, etc.) results from common variants/polymorphisms in multiple genes.
– The effect of each variant must be smaller than in monogenic disorders; otherwise the prevalence of these diseases would be very high.
– Since SNPs are a common mode of variation in the human genome, and coding SNPs can cause monogenic diseases, SNPs may be the variants associated with risk for common diseases.

Do the common disease variants alter protein-coding sequence?
– Not necessarily (or even usually?)
– Protein-coding SNPs (cSNPs) may disrupt protein folding, structure, or activity.
– Variants in Mendelian diseases with high penetrance (50-100%) often disrupt proteins.
– But common-disease variants are not penetrant to the same degree, and their genetic odds ratios are modest: Prob(Disease | risk allele) ~ Prob(Not Disease | risk allele)
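The link between penetrance and genetic odds ratios can be made concrete: the odds ratio compares the odds of disease in risk-allele carriers with the odds in non-carriers. The penetrance values below are hypothetical, chosen only to contrast a Mendelian-like variant with a complex-disease-like one.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)


def genotype_odds_ratio(penetrance_carrier: float, penetrance_noncarrier: float) -> float:
    """Odds of disease in risk-allele carriers divided by odds in non-carriers."""
    return odds(penetrance_carrier) / odds(penetrance_noncarrier)


# Hypothetical penetrances, for contrast only.
mendelian_like = genotype_odds_ratio(0.90, 0.001)
complex_like = genotype_odds_ratio(0.12, 0.10)
print(f"Mendelian-like OR ~ {mendelian_like:.0f}")
print(f"Complex-like OR ~ {complex_like:.2f}")
```

With penetrances of 12% vs. 10%, most carriers never develop the disease, so the odds ratio stays close to 1 even though the variant is a genuine risk factor.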

Early successes
– Klein R et al. Complement factor H polymorphism in age-related macular degeneration. Science Apr 15; 308:
  – 96 cases and 50 controls; 116,204 SNPs
– Maraganore et al. High-resolution whole-genome association study of Parkinson disease. Am J Hum Genet Nov; 77:
  – 198,345 SNPs in 443 sibling pairs discordant for PD
  – 1,793 PD-associated SNPs (P<.01 in tier 1) and 300 genomic control SNPs in 332 matched case-unrelated control pairs
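One common use of null ("genomic control") SNPs is to estimate how much population stratification inflates association statistics: the inflation factor λ is the median observed 1-df chi-square statistic divided by its expected null median (about 0.455), and statistics can be divided by λ. This is a generic sketch of genomic control, not the specific procedure of the study above; the statistics and function names are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats: list[float]) -> float:
    """Inflation factor: observed median test statistic / expected null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_adjust(chi2_stats: list[float]) -> list[float]:
    """Divide each statistic by lambda (no adjustment when lambda < 1)."""
    lam = max(1.0, genomic_control_lambda(chi2_stats))
    return [x / lam for x in chi2_stats]


# Hypothetical 1-df statistics from null (genomic control) SNPs.
null_stats = [0.02, 0.31, 0.58, 0.66, 1.2, 2.9, 0.12]
print(round(genomic_control_lambda(null_stats), 2))
print([round(x, 2) for x in gc_adjust(null_stats)])
```

A λ well above 1 suggests residual stratification (or widespread true signal); a λ near 1 suggests cases and controls are reasonably well matched.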

The future
– Identify additional genes in diverse populations
– Identify causal variant(s) in these genes
– Determine the function of novel genes and of causative variants
– Explore gene × gene interactions (epistasis) and gene × environment interactions (e.g. physical activity, diet)
– Other technological advances:
  – animal models of disease
  – innovative imaging of target tissues
  – functional approaches to gene expression profiling
  – whole-genome sequencing?
– Era of “personalized medicine” and/or prevention

END Questions?