Ho Kim School of Public Health Seoul National University


1 Introduction to SNP and Haplotype Analysis
Ho Kim, School of Public Health, Seoul National University

2 Contents
SNP (Single Nucleotide Polymorphism)
Haplotypes
Linkage & linkage disequilibrium
Association study design
SNP vs. haplotype for association study
Haplotype estimation
Data analysis

3 SNPs (pronounced snips)

4 Mutation

5 Polymorphism – Definition
Polymorphism: a sequence variation that occurs at a frequency of at least 1 percent (>= 1%). About 90% of variations are SNPs.
Mutation: a variation that is present at a frequency of less than 1 percent (< 1%).

6 SNPs in the Human Genome
All humans share 99.9% of the same genetic sequence.
SNPs occur about once every 1000 base pairs.
The human genome contains more than 2 million SNPs.
About 21,000 SNPs are found in genes.
SNPs are not evenly spaced along the sequence: there are SNP-rich regions and SNP-poor regions.

7 SNPs as DNA Landmarks
Help in DNA sequencing.
Help in the discovery of genes responsible for many major diseases: asthma, diabetes, heart disease, schizophrenia, and cancer, among others.

8 From SNP to Haplotype
DNA sequence        Haplotype (3 SNPs)   Frequency   Phenotype
GATATTCGTACGGA-T    AG-                  2/6         Black eye
GATGTTCGTACTGAAT    GTA                  3/6         Brown eye
GATATTCGTACGGAAT    AGA                  1/6         Blue eye
SNPs are simple to measure and understand. Haplotypes have the advantage, in the appropriate circumstances, of carrying more information about the genotype-phenotype link than the underlying SNPs do.

9 SNP & Haplotype
SNP: single nucleotide polymorphism.
Haplotype: a set of closely linked genetic markers present on one chromosome which tend to be inherited together (not easily separable by recombination).
Example: the alleles G, A, C at a set of SNPs on one chromosome form a SNP haplotype.

10 Linkage and Linkage Disequilibrium (1)
Linkage: the tendency of genes or other DNA sequences at specific loci to be inherited together as a consequence of their physical proximity on a single chromosome. Linkage disequilibrium (allelic association): particular alleles at two or more neighboring loci show allelic association if they occur together with frequencies significantly different from those predicted from the individual allele frequencies. Linkage is a relation between loci, but association is a relation between alleles.

11 Linkage and Linkage Disequilibrium (2)
$\theta$ = recombination fraction. No linkage: $\theta = 0.5$. Perfect linkage: $\theta = 0$.
$\rho$ = probability of allelic association. Linkage disequilibrium: $0 < \rho \le 1$. Linkage equilibrium: $\rho = 0$. Complete linkage disequilibrium: $\rho = 1$.

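12 Allelic Association (LD), Morton et al. (2001)
                    Locus B allele 1   Locus B allele 2   Allele frequency
Locus A allele 1    $\pi_{11}$         $\pi_{12}$         $Q$
Locus A allele 2    $\pi_{21}$         $\pi_{22}$         $1 - Q$
Allele frequency    $R$                $1 - R$            1
A, B: diallelic loci; 11, 12, 21, 22: haplotypes, with frequencies $\pi_{ij}$; $\rho$: association probability.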

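13 Measures of LD
Covariance: $D = |\pi_{11}\pi_{22} - \pi_{12}\pi_{21}|$, where $\pi_{ij}$ is the frequency of haplotype $ij$.
Association: $\rho = D / [Q(1 - R)]$, where $Q$ and $R$ are allele frequencies at the two loci.
All other measures are functions of $Q$, $R$, and $\rho$.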
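As a rough illustration of the two measures on this slide, the sketch below computes D and the association probability from four haplotype frequencies. The function name `ld_measures` and the convention that Q is the frequency of allele 1 at locus A and R the frequency of allele 1 at locus B are assumptions made for this example; the ratio only stays between 0 and 1 when the alleles are labeled so that Q(1 - R) is the largest value D can take.

```python
def ld_measures(p11, p12, p21, p22):
    """Return (D, rho) for a 2x2 table of haplotype frequencies.

    Assumption for this sketch: Q = p11 + p12 (allele 1 at locus A)
    and R = p11 + p21 (allele 1 at locus B).
    """
    total = p11 + p12 + p21 + p22
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    q = p11 + p12
    r = p11 + p21
    d = abs(p11 * p22 - p12 * p21)      # covariance D
    denom = q * (1.0 - r)
    if denom == 0:
        raise ValueError("Q(1 - R) is zero; rho is undefined")
    return d, d / denom                 # rho = D / [Q(1 - R)]


# Haplotype 12 is absent, so the association is complete.
d, rho = ld_measures(0.3, 0.0, 0.2, 0.5)
```

With independent loci (each haplotype frequency equal to the product of its allele frequencies), both D and the ratio come out as zero up to rounding.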

14 New Findings on Linkage Disequilibrium
In the chromosome, there are blocks of limited haplotype diversity in which more than 80% of a global human sample can typically be characterized by only three common haplotypes (Patil et al., Science 2001). Haplotype blocks are the more precise units to reflect genetic variation. Identification of haplotype structure, i.e., construction of a haplotype map, provides a basis for accurate and efficient association studies.

15 Daly et al. (2001). LD by distance from two markers

16 The Problem
It’s not yet easy to measure an individual’s (only two) haplotypes.
Molecular haplotyping (nucleotide sequencing) is the gold standard.
A more efficient strategy:
  Focus on regions, such as certain genes.
  Estimate haplotypes from SNP data (genotypes).
  Use an LD map, and reduce the number of loci needed to represent the haplotype.
  Use a haplotype map (DB) = key SNPs + haplotype blocks with strong LD.

17 Haplotyping: Phase Problem
Diploid genotype observed at two SNPs: SNP1 G/T, SNP2 A/C.
Possible haplotype pairs: GA and TC, or GC and TA.
$n$ SNPs $\rightarrow$ $2^n$ possible haplotypes.
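The phase ambiguity in the two-SNP example can be enumerated directly. The following is a minimal sketch, assuming each unphased genotype is given as a list of two-allele tuples (one tuple per SNP); the helper name `compatible_pairs` is made up for this illustration.

```python
from itertools import product


def compatible_pairs(genotype):
    """List every unordered haplotype pair consistent with an unphased genotype.

    genotype: list of 2-tuples of single-character alleles, one per SNP.
    """
    pairs = set()
    # Try both orientations at every SNP; homozygous sites give the
    # same haplotypes either way, and duplicates collapse in the set.
    for orient in product((0, 1), repeat=len(genotype)):
        h1 = "".join(site[o] for site, o in zip(genotype, orient))
        h2 = "".join(site[1 - o] for site, o in zip(genotype, orient))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


# SNP1 G/T and SNP2 A/C, as on the slide.
pairs = compatible_pairs([("G", "T"), ("A", "C")])
# pairs == [("GA", "TC"), ("GC", "TA")]
```

An individual who is heterozygous at k sites has 2^(k-1) compatible pairs, which is why exhaustive enumeration like this is only practical for a handful of SNPs.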

18 Molecular Haplotyping
Hetero-duplex analysis, mismatch detection, allele-specific PCR:
  Have the potential for high throughput.
  Only practical for short haplotypes (2-5 kb vs kb).
  Costly.
Rolling circle amplification method, etc.:
  Can handle larger sizes.
  Difficult to automate.

19 In-silico Haplotyping
Also known as: haplotype reconstruction, haplotype inference, computational haplotyping, statistical haplotyping, etc.
Advantages: cost-effective, high-throughput.
Difficulty: phase ambiguity. The number of possible haplotypes increases exponentially with the number of SNPs.

20 In-silico Haplotyping: Two Tasks
I. Reconstruction of the haplotypes of the sampled individuals
II. Estimation of haplotype frequencies in a population

21 In-silico Haplotyping: Approaches
Clark’s algorithm
E-M algorithm (expectation-maximization algorithm)
Bayesian algorithm

22 Clark’s Algorithm
1) Find individuals who are homozygous at every locus or heterozygous at only one locus; their haplotypes are unambiguously defined.
SNP1 T/T, SNP2 A/A, SNP3 C/C: T-A-C (both copies).
SNP1 T/T, SNP2 A/A, SNP3 C/G: T-A-C and T-A-G.

23 Clark’s Algorithm
2) Try to resolve each ambiguous genotype as a combination of an already solved haplotype and its complement.
SNP1 A/T, SNP2 A/A, SNP3 C/G: T-A-C (a solved one) and A-A-G (new).
Continue until either all haplotypes have been solved or no more haplotypes can be found in this way.
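The two steps of Clark’s algorithm can be sketched in a few lines. This is a simplified illustration, not Clark’s original implementation: the function names `complement` and `clark`, the genotype encoding (a list of two-allele tuples per individual), and the alphabetical order in which known haplotypes are tried are all assumptions of this example. The result can depend on that order.

```python
def complement(genotype, hap):
    """Return the haplotype that pairs with `hap` to give `genotype`, or None."""
    other = []
    for (a, b), allele in zip(genotype, hap):
        if allele == a:
            other.append(b)
        elif allele == b:
            other.append(a)
        else:
            return None          # hap is incompatible at this site
    return "".join(other)


def clark(genotypes):
    """Resolve haplotype pairs with Clark's parsimony rule (simplified)."""
    known, resolved = set(), {}
    # Step 1: individuals with at most one heterozygous site are unambiguous.
    for i, g in enumerate(genotypes):
        if sum(a != b for a, b in g) <= 1:
            pair = ("".join(a for a, _ in g), "".join(b for _, b in g))
            resolved[i] = pair
            known.update(pair)
    # Step 2: explain ambiguous genotypes with a known haplotype plus its complement.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for hap in sorted(known):
                comp = complement(g, hap)
                if comp is not None:
                    resolved[i] = (hap, comp)
                    known.add(comp)
                    progress = True
                    break
    return resolved, known


# The three individuals from the slides.
genotypes = [
    [("T", "T"), ("A", "A"), ("C", "C")],
    [("T", "T"), ("A", "A"), ("C", "G")],
    [("A", "T"), ("A", "A"), ("C", "G")],
]
resolved, known = clark(genotypes)
# resolved[2] == ("TAC", "AAG"), matching the slide.
```

Individuals that no known haplotype can explain are simply absent from `resolved`, which corresponds to the case where the chain stalls.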

24 Clark’s Algorithm problems
No homozygotes or single-SNP heterozygotes: the chain might never get started.
Many unsolved haplotypes may be left at the end.
Quite useful in practice!

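25 EM Algorithm
Use a multinomial likelihood under Hardy-Weinberg equilibrium (HWE):
$\Pr(AT//AA//CG) = \Pr(AAC/TAG) + \Pr(AAG/TAC) = \Pr(AAC)\Pr(TAG) + \Pr(AAG)\Pr(TAC)$
(up to a constant factor of 2 for the two orderings of each pair, which does not change the estimates).
Fallin and Schork (2000) showed that EM is better than Clark’s algorithm.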
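To make the likelihood above concrete, here is a minimal EM sketch for haplotype frequencies under HWE. The function names, the genotype encoding (a list of two-allele tuples per individual), the uniform starting frequencies, and the fixed iteration count are assumptions of this example rather than details given on the slide.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype."""
    pairs = set()
    for orient in product((0, 1), repeat=len(genotype)):
        h1 = "".join(site[o] for site, o in zip(genotype, orient))
        h2 = "".join(site[1 - o] for site, o in zip(genotype, orient))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=50):
    """Estimate haplotype frequencies by EM, assuming HWE."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}    # uniform start
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E-step: posterior weight of each compatible pair under HWE.
            # The factor 2 for heterozygous pairs cancels within an individual.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq


genotypes = [
    [("T", "T"), ("A", "A"), ("C", "C")],   # TAC / TAC
    [("T", "T"), ("A", "A"), ("C", "G")],   # TAC / TAG
    [("A", "T"), ("A", "A"), ("C", "G")],   # the ambiguous AT//AA//CG genotype
]
freq = em_haplotype_freqs(genotypes)
```

On this toy input the estimate for the ambiguous individual drifts toward the AAG/TAC pair, because TAC is already common among the unambiguous individuals.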

26 A Gibbs sampler, Stephens et al (2001)
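$G = (G_1, \ldots, G_n)$: observed multilocus genotypes.
$H = (H_1, \ldots, H_n)$: unknown haplotype pairs.
$F = (F_1, \ldots, F_M)$: $M$ unknown population haplotype frequencies.
At each iteration $t$:
Choose an individual $i$ at random from all ambiguous individuals.
Sample $H_i^{(t+1)}$ from $\Pr(H_i \mid G, H_{-i}^{(t)})$.
Set $H_j^{(t+1)} = H_j^{(t)}$ for $j = 1, 2, \ldots, i-1, i+1, \ldots, n$.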
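The update loop can be sketched as follows. Note the substitution: the conditional distribution used here is a simple Dirichlet-style weight, (count + alpha) for each haplotype in the pair, and not the coalescent-based approximation of Stephens et al. (2001). The names `gibbs_phase`, `n_updates`, `alpha`, and `seed` are assumptions of this example.

```python
import random
from collections import Counter
from itertools import product


def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype."""
    pairs = set()
    for orient in product((0, 1), repeat=len(genotype)):
        h1 = "".join(site[o] for site, o in zip(genotype, orient))
        h2 = "".join(site[1 - o] for site, o in zip(genotype, orient))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def gibbs_phase(genotypes, n_updates=200, alpha=0.1, seed=0):
    """Simplified Gibbs sampler over haplotype pairs (assumed Dirichlet-style conditional)."""
    rng = random.Random(seed)
    options = [compatible_pairs(g) for g in genotypes]
    # Start from an arbitrary compatible pair for every individual.
    current = [rng.choice(opts) for opts in options]
    ambiguous = [i for i, opts in enumerate(options) if len(opts) > 1]
    if not ambiguous:
        return current
    for _ in range(n_updates):
        i = rng.choice(ambiguous)                    # pick an ambiguous individual
        # Haplotype counts among everyone else, i.e. H_{-i}.
        counts = Counter(h for j, pair in enumerate(current) if j != i for h in pair)
        # Ambiguous individuals only have pairs of two distinct haplotypes.
        weights = [(counts[h1] + alpha) * (counts[h2] + alpha) for h1, h2 in options[i]]
        current[i] = rng.choices(options[i], weights=weights, k=1)[0]
        # All other H_j are left unchanged for this iteration.
    return current


genotypes = [
    [("T", "T"), ("A", "A"), ("C", "C")],
    [("T", "T"), ("A", "A"), ("C", "G")],
    [("A", "T"), ("A", "A"), ("C", "G")],
]
state = gibbs_phase(genotypes)
```

The returned state is a single draw; in practice one would collect many draws after a burn-in period and report the most frequently sampled pair for each individual.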

28 Haplotype Inference
A: SNP (genotype) data, coded per locus as 0 (MM), 1 (Mm), 2 (mm).
B: Haplotype data, coded per locus as 0 (M), 1 (m).

29
Individual   Haplotype pair   Haplotype codes shown
#1           1, 2             00000 00100
#2           1, 3             00010
#3           1, 4             01001
#4           1, 5             00001
#5           1, 1
#6           1, 1
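The two codings are related by simple addition: at each locus the genotype code (0, 1, or 2) is the sum of the two haplotype codes (0 or 1). A small sketch, using the haplotypes 00000 and 00100 shown for individual #1; the function name is an assumption of this example.

```python
def genotype_from_haplotypes(h1, h2):
    """Convert two 0/1-coded haplotype strings into per-locus genotype codes 0/1/2."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same loci")
    return [int(a) + int(b) for a, b in zip(h1, h2)]


# Individual #1: heterozygous (Mm) at the third locus only.
codes = genotype_from_haplotypes("00000", "00100")
# codes == [0, 0, 1, 0, 0]
```

Going the other way is the phase problem: a genotype with more than one locus coded 1 does not determine its haplotype pair uniquely.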

30 An Example Data Set
169 cases, 231 controls.
11 haplotypes.
Sex and age information.

32 Logistic Regression Results
Without adjusting for age and sex: haplotype 7 is most strongly associated, but not statistically significant (p=0.07).
Adjusting for age and sex: haplotype 11 is most strongly associated (p=0.03).
The association is slightly stronger when the repeated measures (2 haplotypes per person) are accounted for with a GEE procedure (p=0.02).

33 Other Examples

34 Drysdale et al. PNAS 2000, 97(19) 10483–10488

37 Wallenstein, Hodge, and Weston, Genetic Epidemiology 15:173–181 (1998)

38 Cohort study Case-control study

39 Shaw et al. Am J of Medical Genet 114 205-213 (2002)

40 References
Clark (1990). Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 7.
Excoffier and Slatkin (1995). Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12.
Stephens, Smith, and Donnelly (2001). A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68.
Niu, Qin, Xu, and Liu (2002). Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet 70.

41 Thank you! This file is available at /~hokim (Open Lecture Room, Seminar Materials).

