National Human Genome Research Institute


1 Genetics for Epidemiologists Lecture 2: Measurement of Genetic Exposures
Teri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics, and Senior Advisor to the Director, NHGRI, for Population Genomics
National Human Genome Research Institute, National Institutes of Health, U.S. Department of Health and Human Services

2 Topics to be Covered
Measuring genetic variation
  Blood group markers
  Restriction fragment length polymorphisms (RFLPs)
  Variable number of tandem repeats (VNTRs: minisatellites and microsatellites)
  Single nucleotide polymorphisms (SNPs)
Linkage disequilibrium (LD)
Familial resemblance and family history

3 Larson, G. The Complete Far Side. 2003.

4 Measuring Genetic Variation: Blood Group and Enzymatic Markers
RBC COMT activity measured in 5 large families with hypertension (518 individuals in total)
Associations tested with 25 genetic markers: ABO, Rh, K, MNS, P, Fy, Jk, PGD, ADA, ACP1, PGM1, HBB, GPT, C3, HPA, TF, GC, OR, GM, KM, BF, ESD, GLO1, Le
LOD score of 1.27 and estimated recombination fraction of 0.1 found for phosphogluconate dehydrogenase (PGD)
Am J Med Genet 1984; 19:

5 Restriction Fragment Length Polymorphisms (RFLPs)
Define polymorphic marker loci that can be detected as differences in length of DNA fragments after digestion with DNA sequence-specific endonucleases
Establish linkage relationships using pedigree analysis
Am J Hum Genet 1980; 32:

6 Restriction Fragment Length Polymorphisms (RFLPs)
Since the RFLPs are being used simply as genetic markers, any trait… segregating in a pedigree can be mapped. Such a procedure would not require any knowledge of the biochemical nature of the trait or of the nature of the alterations in the DNA responsible for the trait. Am J Hum Genet 1980; 32:

7 RFLPs Used to Map Neurofibromatosis
Linkage analysis of 15 Utah kindreds showed that a gene responsible for von Recklinghausen neurofibromatosis (NF) is located near the centromere on chromosome 17
Science 1987; 236:

8 RFLPs Used to Map Neurofibromatosis
Cosegregation of NF with the A2 (1.9 kb) allele, and not A1 (2.4 kb), in each of four affected offspring.
Science 1987; 236:

9 Variable Numbers of Tandem Repeats (VNTRs): Minisatellites
Repetition in tandem of a short (6- to 100-bp) motif spanning 0.5 kb to several kb
Opened the way to DNA fingerprinting for individual identification
Provided the first highly polymorphic, multiallelic markers for linkage studies
Associated with many interesting features of human genome biology and evolution
A well-known minisatellite is the 5.5 kb kringle IV repeat in apolipoprotein(a) and plasminogen
Vergnaud G and Denoeud F, Genome Res 2000; 10:

10 Kringle-IV Encoding Sequences of Human apo(a) cDNA ApoA1 Alleles
Lackner et al, Hum Mol Genet 1993; 2:

11 Correlations of ApoA Molecular Weight with Lp(a) Levels and Number of Kringle-IV Repeats
Gavish et al, J Clin Invest 1989; 84:

12 Simple Sequence Repeats (also “VNTRs”): Microsatellites
Repetition in tandem of a short (2- to 6-bp) motif, 5 to 5,000 times
Most are di-, tri-, and tetra-nucleotide repeats repeated times
Most are highly polymorphic, making them enormously useful for mapping and linkage
Marshfield and similar maps placed ~400 microsatellites across the genome and provided primers for analysis
Could be highly automated: NHLBI and CIDR large-scale genotyping services

13 Multipoint LOD Scores for Long-term SBP and DBP on Chromosome 17
Levy et al, Hypertension 2000;36:

14 Larson, G. The Complete Far Side. 2003.

15 Single Nucleotide Polymorphisms (SNPs)
Single Nucleotide Polymorphisms (SNPs) GAAATAATTAATGTTTTCCTTCCTTCTCCTATTTTGTCCTTTACTTCAATTTATTTATTTATTATTAATATTATTATTTTTTGAGACGGAGTTTC/ACTCTTGTTGCCAACCTGGAGTGCAGTGGCGTGATCTCAGCTCACTGCACACTCCGCTTTCCTGGTTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGACTACAGTCACACACCACCACGCCCGGCTAATTTTTGTATTTTTAGTAGAGTTGGGGTTTCACCATGTTGGCCAGACTGGTCTCGAACTCCTGACCTTGTGATCCGCCAGCCTCTGCCTCCCAAAGAGCTGGGATTACAGGCGTGAGCCACCGCGCTCGGCCCTTTGCATCAATTTCTACAGCTTGTTTTCTTTGCCTGGACTTTACAAGTCTTACCTTGTTCTGCC/TTCAGATATTTGTGTGGTCTCATTCTGGTGTGCCAGTAGCTAAAAATCCATGATTTGCTCTCATCCCACTCCTGTTGTTCATCTCCTCTTATCTGGGGTCACA/CTATCTCTTCGTGATTGCATTCTGATCCCCAGTACTTAGCATGTGCGTAACAACTCTGCCTCTGCTTTCCCAGGCTGTTGATGGGGTGCTGTTCATGCCTCAGAAAAATGCATTGTAAGTTAAATTATTAAAGATTTTAAATATAGGAAAAAAGTAAGCAAACATAAGGAACAAAAAGGAAAGAACATGTATTCTAATCCATTATTTATTATACAATTAAGAAATTTGGAAACTTTAGATTACACTGCTTTTAGAGATGGAGATGTAGTAAGTCTTTTACTCTTTACAAAATACATGTGTTAGCAATTTTGGGAAGAATAGTAACTCACCCGAACAGTG/TAATGTGAATATGTCACTTACTAGAGGAAAGAAGGCACTTGAAAAACATCTCTAAACCGTATAAAAACAATTACATCATAATGATGAAAACCCAAGGAATTTTTTTAGAAAACATTACCAGGGCTAATAACAAAGTAGAGCCACATGTCATTTATCTTCCCTTTGTGTCTGTGTGAGAATTCTAGAGTTATATTTGTACATAGCATGGAAAAATGAGAGGCTAGTTTATCAACTAGTTCATTTTTAAAAGTCTAACACATCCTAGGTATAGGTGAACTGTCCTCCTGCCAATGTATTGCACATTTGTGCCCAGATCCAGCATAGGGTATGTTTGCCATTTACAAACGTTTATGTCTTAAGAGAGGAAATATGAAGAGCAAAACAGTGCATGCTGGAGAGAGAAAGCTGATACAAATATAAAT/GAAACAATAATTGGAAAAATTGAGAAACTACTCATTTTCTAAATTACTCATGTATTTTCCTAGAATTTAAGTCTTTTAATTTTTGATAAATCCCAATGTGAGACAAGATAAGTATTAGTGATGGTATGAGTAATTAATATCTGTTATATAATATTCATTTTCATAGTGGAAGAAATAAAATAAAGGTTGTGATGATTGTTGATTATTTTTTCTAGAGGGGTTGTCAGGGAAAGAAATTGCTTTTT SNPs 1 / 300 bases ~ 10 million across genome 2110 SNPs

16 Mapping the Relationships Among SNPs
Christensen and Murray, N Engl J Med 2007; 356:

17 Chromosome 9p21 Region Associated with MI
Samani N et al, N Engl J Med 2007; 357:

18 Distances Among East Coast Cities
              Boston  Providence  New York  Philadelphia  Baltimore
Providence        59
New York         210         152
Philadelphia     320         237        86
Baltimore        430         325       173            87
Washington       450         358       206           120         34

19 Distances Among East Coast Cities
              Boston  Providence  New York  Philadelphia  Baltimore
Providence        59
New York         210         152
Philadelphia     320         237        86
Baltimore        430         325       173            87
Washington       450         358       206           120         34
Shading key: < 100, > 400

21 Distances Among East Coast Cities
Boston Providence New York Philadelphia Baltimore Washington

22 Distances Among East Coast Cities
Boston Providence New York Philadelphia Baltimore Washington

23 One Tag SNP May Serve as Proxy for Many
} One Tag SNP May Serve as Proxy for Many } Block 1 Block 2 SNP1 CAGATCGCTGGATGAATCGCATCTGTAAGCAT CGGATTGCTGCATGGATCGCATCTGTAAGCAC CAGATCGCTGGATGAATCCCATCAGTACGCAT CGGATTGCTGCATGGATCCCATCAGTACGCAT CGGATTGCTGCATGGATCCCATCAGTACGCAC   SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8

24 One Tag SNP May Serve as Proxy for Many
} One Tag SNP May Serve as Proxy for Many } Block 1 Block 2 SNP1 CAGATCGCTGGATGAATCGCATCTGTAAGCAT CGGATTGCTGCATGGATCGCATCTGTAAGCAC CAGATCGCTGGATGAATCCCATCAGTACGCAT CGGATTGCTGCATGGATCCCATCAGTACGCAT CGGATTGCTGCATGGATCCCATCAGTACGCAC % SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8

25 One Tag SNP May Serve as Proxy for Many
} One Tag SNP May Serve as Proxy for Many } Block 1 Block 2 CAGATCGCTGGATGAATCGCATCTGTAAGCAT CGGATTGCTGCATGGATCGCATCTGTAAGCAC CAGATCGCTGGATGAATCCCATCAGTACGCAT CGGATTGCTGCATGGATCCCATCAGTACGCAT CGGATTGCTGCATGGATCCCATCAGTACGCAC % SNP3 SNP5 SNP6 SNP7 SNP8

26 One Tag SNP May Serve as Proxy for Many
} One Tag SNP May Serve as Proxy for Many } Block 1 Block 2 CAGATCGCTGGATGAATCGCATCTGTAAGCAT CGGATTGCTGCATGGATCGCATCTGTAAGCAC CAGATCGCTGGATGAATCCCATCAGTACGCAT CGGATTGCTGCATGGATCCCATCAGTACGCAT CGGATTGCTGCATGGATCCCATCAGTACGCAC % SNP3 SNP6 SNP8
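
The tag-SNP idea on these slides can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses the five haplotypes shown on the slide, numbers the variable sites 1, 2, ... in sequence order, and treats two SNPs as interchangeable only when they split the haplotypes identically (perfect LD among the observed haplotypes). The function names are hypothetical, and real tag-SNP selection typically works from haplotype frequencies with an r^2 threshold rather than exact identity.

```python
# The five haplotypes shown on the slide.
HAPLOTYPES = [
    "CAGATCGCTGGATGAATCGCATCTGTAAGCAT",
    "CGGATTGCTGCATGGATCGCATCTGTAAGCAC",
    "CAGATCGCTGGATGAATCCCATCAGTACGCAT",
    "CGGATTGCTGCATGGATCCCATCAGTACGCAT",
    "CGGATTGCTGCATGGATCCCATCAGTACGCAC",
]


def variable_sites(haplotypes):
    """Return 0-based positions where the haplotypes are not all identical."""
    length = len(haplotypes[0])
    return [i for i in range(length) if len({h[i] for h in haplotypes}) > 1]


def perfect_ld_groups(haplotypes):
    """Group SNPs (numbered from 1 along the sequence) that split the
    haplotypes in exactly the same way; one SNP per group can act as a tag."""
    groups = {}
    for snp_number, pos in enumerate(variable_sites(haplotypes), start=1):
        column = [h[pos] for h in haplotypes]
        # For a biallelic site, marking which haplotypes share the first
        # haplotype's allele identifies the split regardless of the bases.
        pattern = tuple(allele == column[0] for allele in column)
        groups.setdefault(pattern, []).append(snp_number)
    return list(groups.values())


groups = perfect_ld_groups(HAPLOTYPES)
for group in groups:
    print("SNPs", group, "can be represented by one tag SNP")
```

Any member of a group can serve as its tag; the slides choose SNP3, SNP6 and SNP8, one from each group.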

27 One Tag SNP May Serve as Proxy for Many
Block 1 Block 2 Singleton Frequency GTT % CTC % GTT % GAT % CAT % CAC %  other haplotypes %

28 Pair-Wise Linkage Disequilibrium (LD) Measures
Name                        Symbol   Definition
"Lewontin's D"              D        p_AB p_ab - p_Ab p_aB
"D prime"                   D'       D / max(D)
Correlation ("r-squared")   r^2      D^2 / (p_A p_a p_B p_b)
For a discussion and comparison of these LD measures, see Devlin B, Risch N, Genomics 1995; 29:
Courtesy K. Jacobs, NCI

29 Two Measures of LD: D' and r^2
D' varies from 0 (complete equilibrium) to 1 (complete disequilibrium)
When D' = 0, typing one SNP provides no information on the other SNP
D' does not adequately account for allele frequencies; r^2, the correlation between SNPs, is the preferred measure
When r^2 = 1, two SNPs are in perfect LD: allele frequencies are identical for both SNPs, and typing one SNP provides complete information on the other
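
A minimal Python sketch of the three pairwise LD measures may help make the contrast between D' and r^2 concrete. It assumes the four haplotype frequencies are known (in practice they are usually estimated from genotypes), takes max(D) to be the usual frequency-dependent bound on D, and reports D' as an absolute value so that it runs from 0 to 1 as on the slide. The function name and the example frequencies are hypothetical.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for two biallelic loci from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Allele frequencies are the margins of the haplotype table.
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab

    d = p_AB * p_ab - p_Ab * p_aB

    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_A * p_a * p_B * p_b
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Equal allele frequencies at both loci: D' is about 0.6, r^2 about 0.36.
print(ld_measures(0.4, 0.1, 0.1, 0.4))

# Unequal allele frequencies (0.1 vs 0.5): D' is about 1 but r^2 only about 0.11,
# so one SNP is a poor proxy for the other despite "complete" D'.
print(ld_measures(0.1, 0.0, 0.4, 0.5))
```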

30 What can LD do for me?
Knowledge of patterns of LD can be quite useful in the design and analysis of genetic data
Design:
  Estimation of theoretical power to detect associations
  Evaluation of degree of completeness of sampling of genetic variants
  Choice of most informative genetic variants to genotype
Sample size must increase by a factor of ~1/r^2 to achieve the same power to detect association with SNP2 as with SNP1
Courtesy K. Jacobs, NCI
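
The 1/r^2 rule of thumb is simple enough to express directly. The sketch below assumes SNP1 is the variant one would ideally genotype and SNP2 is a proxy in LD with it; the function name and the numbers are hypothetical.

```python
def sample_size_with_proxy(n_direct, r2):
    """Approximate sample size needed when genotyping a proxy SNP whose
    correlation with the SNP of interest is r2, given that n_direct
    subjects would suffice if the SNP of interest were typed directly."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


for r2 in (1.0, 0.8, 0.5):
    print(f"r^2 = {r2}: about {sample_size_with_proxy(1000, r2):.0f} subjects")
```

At r^2 = 0.5 the required sample doubles, which is one reason tagging thresholds such as r^2 > 0.8 are commonly used.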

31 Association Signal for Coronary Artery Disease on Chromosome 9
Samani N et al, N Engl J Med 2007; 357:

32 Region of Chromosome 1 Showing Strong Association with Inflammatory Bowel Disease
Duerr R et al. Science 2006; 314:

33 LD Patterns in TCF7L2 Association Region
Grant et al, Nat Genet 2006; 38:

34 LD in Three HapMap Populations
International HapMap Consortium, Nature 2005; 437:

35 A HapMap for More Efficient Association Studies: Goals
Use just the density of SNPs needed to find associations between SNPs and diseases
Do not miss chromosomal regions with disease association
Produce a tool to assist in finding genes affecting health and disease
Ancestral populations differ in their degree of LD; populations of recent African ancestry are older and have shorter stretches of LD, so they need more SNPs for complete genome coverage

36 SNPs as Gateway to Genome-Wide Association (GWA) Studies
SNPs are much more numerous than other markers and easier to assay
Genome-wide studies attempt to capture the majority of genomic variation (10M SNPs!)
Variation is inherited in groups, or blocks, so not all 10 million points have to be tested
Blocks are shorter (so more points need to be tested) the less closely people are related
SNP technology allows studies in unrelated persons, assuming shared segments of 5 kb to 10 kb (300,000 to 1,000,000 markers)

37 International HapMap Consortium, Nature 2005; 437:

38 International HapMap Consortium, Nature 2007; 449:

39 Progress in Genotyping Technology
[Figure: cost per genotype (cents, USD; log scale from 1 to 10^2) versus number of SNPs (log scale from 1 to 10^6), 2001 to 2005. Platforms shown: ABI TaqMan, ABI SNPlex, Illumina Golden Gate, Affymetrix MegAllele, Affymetrix 10K, Illumina Infinium/Sentrix, Perlegen, Affymetrix 100K/500K.]
Courtesy S. Chanock, NCI

40 Continued Progress in Genotyping Technology
[Figure: cost per person (USD), July 2005 to Oct 2006. Platforms shown: Affymetrix 500K, Illumina 317K, Illumina 550K, Illumina 650Y.]
Courtesy S. Gabriel, Broad/MIT

49 Cost of a Genome-Wide Association Study in 2,000 People
Year   Number of SNPs   Cost/SNP   Cost/Study
2001   10,000,000       $1.00      $20 billion
2008   1,000,000        0.05¢      $1 million
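
The study costs in the table follow from multiplying people by SNPs by cost per genotype. A short Python check, with a hypothetical function name and the per-SNP cost expressed in dollars (0.05 cents = $0.0005):

```python
def gwa_study_cost(n_people, n_snps, cost_per_snp_usd):
    """Total genotyping cost in USD: one genotype per person per SNP."""
    return n_people * n_snps * cost_per_snp_usd


# 2001: 10 million SNPs at $1.00 each in 2,000 people.
print(f"2001: ${gwa_study_cost(2000, 10_000_000, 1.00):,.0f}")

# 2008: 1 million SNPs at 0.05 cents ($0.0005) each in 2,000 people.
print(f"2008: ${gwa_study_cost(2000, 1_000_000, 0.0005):,.0f}")
```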

50 Coverage (% SNPs tagged at r^2 > 0.8) of Commercial Genotyping Platforms
Platform: HapMap population sample YRI / CEU / CHB+JPT
Affymetrix GeneChip 500K: 46 / 68 / 67
Affymetrix SNP Array 6.0: 66 / 82 / 81
Illumina HumanHap300: 33 / 77 / 63
Illumina HumanHap550: 55 / 88 / 83
Illumina HumanHap650Y: 89, 84 (one value missing)
Perlegen 600K: 47, 92 (one value missing)
Manolio et al, J Clin Invest 2008; 118:

51 Following the Polymorphism Literature
Sometimes named for:
  amino acid change (AGT M235T)
  nucleotide sequence (AGTR1 A1166C)
  promoter (AGT -6 G/A)
  restriction enzyme site (XbaI, PvuII, HindIII)
  gene product (APOE*e2)
  legacy system (DRB1*0104)
  reference SNP (rs709932) or submitted SNP (ss )
Good sources for information: OMIM, HUGO, dbSNP, UCSC Genome Browser
Courtesy S. Chanock, NCI

52 Other Genomic Technologies
Sequencing: measure variation at every point in a gene or candidate region in dozens to hundreds of people to find functional variants
Gene expression: measure changes in mRNA (transcribed) in cases and controls or in response to stimulation
Epigenetics: measure DNA methylation or histone deacetylation that turns genes on and off

53 Sidney Harris, http://www.sciencecartoonsplus.com/gallery.htm.

54 Summary Points: Genotyping Methods
Unbelievably rapid progress from a small number of blood group markers to >10M SNPs, CNVs, structural variants, sequence variants
Technology will continue to change and will be a challenge to keep up with; difficult to know when it is ready to apply to population studies
SNPs are currently the dominant technology (more to come in Lecture 4)
Quality control is a major issue

55 Familial Resemblance?

56 Evidence for Genetic Influence on Disease or Trait from Family Data
Familial resemblance: trait more similar among related than unrelated persons
Familial clustering: risk of disease in relative of case > risk in relative of non-case or of general population (sibling relative risk, Risch's λS)
Distributions of continuous trait: mixtures of distributions or commingling analysis
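
As a small illustration of the sibling relative risk, λS can be computed as the risk of disease in siblings of cases divided by the risk in the general population. The function name and the risks below are hypothetical and chosen only to show the arithmetic.

```python
def sibling_relative_risk(risk_in_sibs_of_cases, population_risk):
    """Risch's lambda_S: risk to siblings of affected individuals relative
    to the risk in the general population."""
    if population_risk <= 0:
        raise ValueError("population risk must be positive")
    return risk_in_sibs_of_cases / population_risk


# Hypothetical example: 7% risk in siblings of cases vs 2% in the population.
print(round(sibling_relative_risk(0.07, 0.02), 2))
```

A value near 1 suggests no familial clustering; larger values indicate clustering, which may reflect shared genes, shared environment, or both.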

57 Sibling Relative Risk of Living to Age 90: Centenarians vs. Those Dying at Age 73
102 centenarians (10 men, 92 women) in eastern Massachusetts; 77 controls (28 men, 49 women) born in 1896 who died at age 73
Centenarians had 456 sibs (233 men, 223 women); controls had 240 sibs (121 men, 119 women)
RR higher after age 70
For any age after 65, siblings of centenarians had a 42·4% lower hazard of death (95% CI 0·334–0·538, p<0·0005).
Perls TT et al, Lancet 1998; 351:1560.

58 Large Representative Pedigree Showing 69 Patients with Atrial Fibrillation
Arnar et al, Europ Heart J 2006; 27:

59 Strength of Extensive Genealogies
Common diseases do not show Mendelian inheritance patterns
Affected siblings are infrequent in common diseases, but many patients may have more distant relatives with the same disease
Degree of Relatives   Risk Ratio [95% CI]   P-Value
1                     1.77 [1.67, 1.88]     < 0.001
2                     1.36 [1.27, 1.44]
3                     1.18 [1.14, 1.23]
4                     1.10 [1.06, 1.13]
5                     1.05 [1.02, 1.07]
Arnar et al, Europ Heart J 2006; 27:

60 Familial Correlations
Phenotypic resemblance among relatives estimated by regression of one relative's value (offspring) on that of another (parent):
  Y_o = μ + β · [(Y_m + Y_f)/2] + ε
Twice the parent-offspring correlation is an estimate of heritability
If a trait is under genetic control, expect trait correlations among closer relatives to be greater than those among more distant relatives
Spouse correlations are often used to "control" for the effects of shared environment, though they may be increased by assortative mating or decreased by gender differences
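
The midparent regression above can be sketched with ordinary least squares in plain Python. The data are made up for illustration and the function names are hypothetical; under the usual additive model the slope β of offspring on midparent value is itself read as a heritability estimate, whereas a regression on a single parent's value would be doubled.

```python
def ols(x, y):
    """Least-squares intercept (mu) and slope (beta) of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx
    mu = mean_y - beta * mean_x
    return mu, beta


def midparent_regression(mothers, fathers, offspring):
    """Fit Y_o = mu + beta * (Y_m + Y_f) / 2 and return (mu, beta)."""
    midparent = [(m + f) / 2 for m, f in zip(mothers, fathers)]
    return ols(midparent, offspring)


# Made-up trait values for four families (one offspring each).
mothers = [100, 110, 120, 130]
fathers = [110, 120, 130, 140]
offspring = [112.5, 117.5, 122.5, 127.5]

mu, beta = midparent_regression(mothers, fathers, offspring)
print(f"mu = {mu:.1f}, beta = {beta:.2f}")
```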

61 Familial Correlations of Sex-Specific LV Mass, Multiply-Adjusted
Relative Pair      Pairs (n)   Correlation   Expected
Spouse             855         0.05
Parent-offspring   662         0.15          0.5
Sibling            1,486       0.16
Avuncular          369         0.06          0.25
Equal weight to pedigrees
Expect all to be zero if no genetic contribution
After Post W et al, Hypertension 1997; 30:

62 Assessing Familial and Genetic Nature of a Phenotypic Trait: Heritability
Often designated as H, h^2, or σ^2_G / σ^2_P
Proportion of total inter-individual variation in the trait (σ^2_P), or phenotypic variation, attributable to genetic variation (σ^2_G)
Population- and environment-specific parameter
Its value, high or low, does not indicate the role of genes in any specific individual
Does allow one to predict the expected degree of familial aggregation of a trait
Traits with high heritability should prove fruitful in identifying trait-related genes

63 Genetic Basis of Familial Clustering of Plasma ACE Activity
Relative   N     Mean (u/L)   Major Gene Effect   % Variance
Fathers    87    34.1         4.8                 29
Mothers          30.7         4.0
Siblings   169   43.1         10.8                75
Codominant major gene effect, twice as high in homozygotes as heterozygotes and in offspring as parents
Minor allele frequency 0.26; in HWE 7% homozygotes for minor, 38% heterozygotes, 55% homozygote wild
Cambien F, et al., Am J Hum Genet 1988; 43:

64 Estimated Heritability Explained by GWA Findings to Date
Trait       Estimated GWA σ^2_G   Estimated Total σ^2_G   Reference
Height      3%                    90%                     Weedon, Nat Genet 2008
T2DM        λs = 1.07             λs = 3.5                Zeggini/Scott, Science 2007
CRP         ? 10.5%               30-50%                  Reiner/Ridker, Nat Genet 2008
Psoriasis   ~1.3 OR               λs = 4-11               Liu, PLoS Genet 2008
NHGRI GWA Catalog,

65 Hardy-Weinberg Equilibrium
The occurrences of the two alleles of a SNP in the same individual are two independent events
Ideal conditions: random mating, no selection (equal survival), no migration, no mutation, no inbreeding, large population sizes, gene frequencies equal in males and females
If alleles A and a of SNP rs1234 have frequencies p and 1-p, expected frequencies of the three genotypes are:
  Freq AA = p^2
  Freq Aa = 2p(1-p)
  Freq aa = (1-p)^2
After G. Thomas, NCI
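
The Hardy-Weinberg expectations are easy to check numerically. The Python sketch below uses a hypothetical function name; as an example it takes a minor allele frequency of 0.26, the value quoted on the plasma ACE slide, so the major allele frequency is p = 0.74.

```python
def hwe_genotype_frequencies(p):
    """Expected (AA, Aa, aa) genotype frequencies under Hardy-Weinberg
    equilibrium when allele A has frequency p and allele a has 1 - p."""
    if not 0 <= p <= 1:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1 - p
    return p * p, 2 * p * q, q * q


freq_AA, freq_Aa, freq_aa = hwe_genotype_frequencies(0.74)
print(f"AA: {freq_AA:.1%}  Aa: {freq_Aa:.1%}  aa: {freq_aa:.1%}")
```

The result, roughly 55% major-allele homozygotes, 38% heterozygotes and 7% minor-allele homozygotes, agrees with the proportions given on the ACE slide.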

66 Summary Points: Familial Clustering
Indicator of possible genetic influence
May over-estimate genetic component due to poor assessment of and adjustment for shared environment
Methods include twin studies, parent-offspring correlation, "relative" relative risk, % variance explained
Current genes for complex disease explain only a tiny fraction of total heritability

67 Larson, G. The Complete Far Side. 2003.

69 Basic Definitions: Loci, Genes, Alleles
Locus: Place on a chromosome where a specific gene or set of markers resides
Quantitative trait locus (QTL): A genetic factor believed to influence a quantitative trait such as blood pressure, lipoprotein levels, etc.
Gene: Contiguous piece of DNA that can contain information to make or modify 'expression' of specific protein(s)
Allele: A variant form of a DNA sequence at a particular locus on a chromosome
Candidate gene: Gene believed to influence expression of complex phenotypes due to known biologic properties of its products
After S. Chanock, NCI

70 Basic Definitions: Parts of a Gene
Exon: A DNA sequence that usually specifies the sequence of amino acids in translation
Intron: An intervening DNA sequence that is removed from mRNA after transcription and thus does not encode protein in translation
Splice site: Junction of intron and exon
Promoter: Region of DNA to which an RNA polymerase binds and initiates transcription; the promoter regulates gene expression by controlling the amount of mRNA transcribed
Polymorphism: Variation in the sequence of DNA among individuals
After S. Chanock, NCI

71 SNPs and Function: We know so little…
Majority are "silent"
  No known functional change
Some alter gene expression/regulation
  Promoter/enhancer/silencer
  mRNA stability
  Small RNAs
Some alter function of gene product
  Change sequence of protein
Courtesy S. Chanock, NCI

72 SNPs within Genes Coding SNPs (cSNPs)
Synonymous: no change in amino acid
  previously termed "silent" but...
  Can alter mRNA stability: DRD2 (Duan et al 2002)
  Can alter speed of translation and protein folding: MDR1 (Gottesman et al 2007)
Nonsynonymous: changes amino acid (codon)
  conservative and radical
Nonsense: insertion of stop codon
Frameshift (insertion/deletion): disrupts codon sequence, rare but disruptive
After S. Chanock, NCI

73 SNPs Outside Genes
Majority distributed throughout genome are "silent" (excellent as markers)
Alter transcription
  Promoter, enhancer, silencer
Regulate expression
  Locus control region, mRNA stability
Most are assumed to be 'silent hitchhikers'
  No function by predictive models or analysis
Courtesy S. Chanock, NCI

74 Sample Collection and Processing
Obtaining samples for DNA preparation
  whole blood, buffy coat
  sputum
  buccal cells
  serum, urine
  pathology specimens
  placenta, excreta, other
Purifying and quantifying DNA
Transformed lymphocytes
Whole genome amplification (WGA)
'Barcode' individual DNAs (QC)
After S. Chanock, NCI

