Ho Kim School of Public Health Seoul National University

Introduction to SNP and Haplotype Analysis

Contents: SNP (Single Nucleotide Polymorphism); Haplotypes; Linkage and linkage disequilibrium; Association study design; SNP vs. haplotype for association studies; Haplotype estimation; Data analysis

SNPs (pronounced "snips")

Mutation

Polymorphism: Definition
A polymorphism is a sequence variation whose less common allele occurs in at least 1% of the population (≥ 1%). SNPs account for about 90% of these variations.
Mutation: a variation present in less than 1% of the population (< 1%).

SNPs in the Human Genome
All humans share about 99.9% of their DNA sequence. SNPs occur roughly once every 1,000 base pairs, and the human genome contains more than 2 million SNPs, of which about 21,000 lie within genes. SNPs are not evenly spaced along the sequence: there are SNP-rich regions and SNP-poor regions.

SNPs as DNA Landmarks
SNPs help in DNA sequencing and in the discovery of genes responsible for many major diseases, including asthma, diabetes, heart disease, schizophrenia, and cancer.

From SNP to Haplotype
The figure aligns three DNA sequences; the sites at which they differ define each sequence's haplotype:
Black eye: GATATTCGTACGGA-T → haplotype AG- (frequency 2/6)
Brown eye: GATGTTCGTACTGAAT → haplotype GTA (3/6)
Blue eye: GATATTCGTACGGAAT → haplotype AGA (1/6)
SNPs are simple to measure and understand. In the appropriate circumstances, however, haplotypes carry more information about the genotype-phenotype link than the underlying SNPs do.
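The step from aligned sequences to haplotypes can be sketched in a few lines of Python: find the alignment columns where the sequences disagree, then read each sequence's alleles at those columns. This is an illustrative sketch with hypothetical function names; it treats the gap "-" as an allele, although strictly it is an insertion/deletion rather than a SNP.

```python
from typing import List


def variable_sites(seqs: List[str]) -> List[int]:
    """Return the 0-based alignment columns at which the sequences differ."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def haplotypes(seqs: List[str]) -> List[str]:
    """Read each sequence's alleles at the variable sites, left to right."""
    sites = variable_sites(seqs)
    return ["".join(s[i] for i in sites) for s in seqs]


# The three aligned sequences from the slide; "-" (a gap) is treated as an allele.
seqs = ["GATATTCGTACGGA-T", "GATGTTCGTACTGAAT", "GATATTCGTACGGAAT"]
print(variable_sites(seqs))  # [3, 11, 14]
print(haplotypes(seqs))      # ['AG-', 'GTA', 'AGA']
```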

SNP & Haplotype
SNP: Single Nucleotide Polymorphism.
Haplotype: a set of closely linked genetic markers on one chromosome that tend to be inherited together (they are not easily separated by recombination). For example, the alleles G, A, and C at a set of SNPs on the same chromosome form one SNP haplotype.

Linkage and Linkage Disequilibrium (1)
Linkage: the tendency of genes or other DNA sequences at specific loci to be inherited together because of their physical proximity on a single chromosome.
Linkage disequilibrium (allelic association): particular alleles at two or more neighboring loci show allelic association if they occur together with frequencies significantly different from those predicted from the individual allele frequencies.
Linkage is a relation between loci, whereas association is a relation between alleles.

Linkage and Linkage Disequilibrium (2)
$\theta$ = recombination fraction. No linkage: $\theta = 0.5$; perfect linkage: $\theta = 0$.
$\rho$ = probability of allelic association, with $0 \le \rho \le 1$. Linkage equilibrium: $\rho = 0$; complete linkage disequilibrium: $\rho = 1$.

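Allelic Association (LD), after Morton et al. (2001)

                     Locus B
Locus A              Allele 1     Allele 2     Allele frequency
Allele 1             $p_{11}$     $p_{12}$     $Q$
Allele 2             $p_{21}$     $p_{22}$     $1-Q$
Allele frequency     $R$          $1-R$        1

A, B: diallelic loci; 11, 12, 21, 22: the four two-locus haplotypes, with frequencies $p_{11}, p_{12}, p_{21}, p_{22}$; $\rho$: association probability.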

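Measures of LD
Covariance: $D = |p_{11}p_{22} - p_{12}p_{21}|$, where $p_{ij}$ is the frequency of haplotype $ij$.
Association probability: $\rho = D / [Q(1-R)]$, where $Q$ and $R$ are allele frequencies at loci A and B.
All other LD measures are functions of $Q$, $R$, and $\rho$.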
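These measures can be computed directly from the four haplotype frequencies. The Python sketch below uses hypothetical names and assumes that Q and R are the frequencies of allele 1 at loci A and B. The $r^2$ line is included only to show that another common LD measure is likewise a function of Q, R, and D (and hence of $\rho$, since $D = \rho Q(1-R)$). As written, $\rho$ is guaranteed to lie in [0, 1] only when the alleles are labeled so that $Q(1-R)$ is the relevant upper bound on D; the sketch applies the formula literally and does not relabel.

```python
def ld_measures(p11: float, p12: float, p21: float, p22: float) -> dict:
    """LD measures from the four two-locus haplotype frequencies.

    Assumes Q = frequency of allele 1 at locus A and R = frequency of
    allele 1 at locus B, as in the 2x2 haplotype table.
    """
    if abs(p11 + p12 + p21 + p22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    Q = p11 + p12  # allele 1 at locus A
    R = p11 + p21  # allele 1 at locus B
    D = abs(p11 * p22 - p12 * p21)  # covariance; equals |p11 - Q*R|
    rho = D / (Q * (1 - R))  # association probability, formula applied literally
    r2 = D ** 2 / (Q * (1 - Q) * R * (1 - R))  # r^2, also a function of Q, R, rho
    return {"Q": Q, "R": R, "D": D, "rho": rho, "r2": r2}


m = ld_measures(0.5, 0.1, 0.1, 0.3)  # hypothetical haplotype frequencies
print(round(m["D"], 4), round(m["rho"], 4), round(m["r2"], 4))  # 0.14 0.5833 0.3403
```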

New Findings on Linkage Disequilibrium
Chromosomes contain blocks of limited haplotype diversity, in which more than 80% of a global human sample can typically be characterized by only three common haplotypes (Patil et al., Science 2001). Haplotype blocks may therefore be a more precise unit than individual SNPs for describing genetic variation. Identifying haplotype structure, i.e., constructing a haplotype map, provides a basis for accurate and efficient association studies.
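The statement that three common haplotypes cover more than 80% of a sample is a simple coverage calculation. The Python sketch below runs it on hypothetical data; the counts are invented for illustration and are not taken from Patil et al.

```python
from collections import Counter
from typing import Iterable


def top_k_coverage(haps: Iterable[str], k: int = 3) -> float:
    """Fraction of sampled chromosomes that carry one of the k most common haplotypes."""
    counts = Counter(haps)
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(k)) / total


# Hypothetical block: 100 chromosomes carrying 5 distinct haplotypes.
sample = ["00000"] * 50 + ["01001"] * 20 + ["00100"] * 15 + ["00010"] * 10 + ["00001"] * 5
print(top_k_coverage(sample))  # 0.85
```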

Figure: LD as a function of the distance between two markers (Daly et al., 2001).

The Problem
It is not yet easy to measure an individual's two haplotypes directly. Molecular haplotyping (nucleotide sequencing) is the gold standard. A more efficient strategy is to focus on specific regions, such as certain genes; estimate haplotypes from SNP (genotype) data; use an LD map to reduce the number of loci needed to represent a haplotype; and use a haplotype map (database) consisting of key SNPs plus haplotype blocks with strong LD.

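Haplotyping: The Phase Problem
For a diploid individual with observed genotypes SNP1 G/T and SNP2 A/C, the possible haplotype pairs are GA/TC or GC/TA. With n biallelic SNPs there are $2^n$ possible haplotypes, and an individual who is heterozygous at $k \ge 1$ of them has $2^{k-1}$ haplotype pairs consistent with the observed genotype.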
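The phase problem can be made concrete by enumerating every haplotype pair consistent with a genotype. The Python sketch below uses hypothetical function and type names. It brute-forces all $2^n$ allele assignments, so it is only meant for a handful of SNPs, which also illustrates why the number of candidates grows exponentially.

```python
from itertools import product
from typing import List, Tuple

Genotype = List[Tuple[str, str]]  # one (allele, allele) pair per SNP; phase unknown


def phase_pairs(g: Genotype) -> List[Tuple[str, str]]:
    """All unordered haplotype pairs that could produce genotype g."""
    pairs = set()
    # Each flip decides which allele of the SNP goes on the first haplotype.
    for flips in product((False, True), repeat=len(g)):
        h1 = "".join(b if f else a for (a, b), f in zip(g, flips))
        h2 = "".join(a if f else b for (a, b), f in zip(g, flips))
        pairs.add(tuple(sorted((h1, h2))))  # GA/TC and TC/GA are the same pair
    return sorted(pairs)


print(phase_pairs([("G", "T"), ("A", "C")]))  # [('GA', 'TC'), ('GC', 'TA')]
```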

Molecular Haplotyping
Heteroduplex analysis, mismatch detection, and allele-specific PCR have the potential for high throughput, but they are practical only for short haplotypes (2-5 kb, versus 50-100 kb) and are costly. Rolling-circle amplification and similar methods can handle larger fragments but are difficult to automate.

In-silico Haplotyping
Also known as haplotype reconstruction, haplotype inference, computational haplotyping, or statistical haplotyping. Advantages: cost-effective and high-throughput. Difficulty: phase ambiguity, because the number of possible haplotypes grows exponentially with the number of SNPs.

In-silico Haplotyping: Two Tasks
I. Reconstruction of the haplotypes of the sampled individuals.
II. Estimation of haplotype frequencies in the population.

In-silico Haplotyping: Approaches
Clark's algorithm; the EM (expectation-maximization) algorithm; Bayesian algorithms.

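Clark's Algorithm
Step 1: Find individuals who are homozygous at every SNP or heterozygous at only one SNP; their haplotypes are unambiguously defined. For example, SNP1 T/T, SNP2 A/A, SNP3 C/C gives T-A-C on both chromosomes, and SNP1 T/T, SNP2 A/A, SNP3 C/G gives T-A-C and T-A-G.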

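Clark's Algorithm
Step 2: Try to resolve each ambiguous genotype as a combination of an already-solved haplotype and its complement. For example, SNP1 A/T, SNP2 A/A, SNP3 C/G is explained by the solved haplotype T-A-C together with the new haplotype A-A-G, which is then added to the solved set. Continue until either all haplotypes have been solved or no more haplotypes can be found in this way.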

Clark's Algorithm: Problems
If the sample contains no homozygotes or single-SNP heterozygotes, the chain may never get started, and many haplotypes may be left unresolved at the end. Even so, the algorithm is quite useful in practice.
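The two steps of Clark's algorithm can be sketched in Python as follows. This is an illustrative reading of Clark (1990), not a reference implementation: the names are hypothetical, and trying known haplotypes in sorted order is an assumption. The outcome of Clark's algorithm can depend on that order; in the slide's example, the genotype A/T, A/A, C/G could also be explained as T-A-G plus A-A-C.

```python
from typing import Dict, List, Optional, Set, Tuple

Genotype = List[Tuple[str, str]]  # one unordered allele pair per SNP


def complement(h: str, g: Genotype) -> Optional[str]:
    """Haplotype that pairs with h to give genotype g, or None if h is incompatible."""
    out = []
    for allele, (a, b) in zip(h, g):
        if allele == a:
            out.append(b)
        elif allele == b:
            out.append(a)
        else:
            return None
    return "".join(out)


def clark(genotypes: List[Genotype]) -> Tuple[Dict[int, Tuple[str, str]], Set[str]]:
    solved: Dict[int, Tuple[str, str]] = {}
    known: Set[str] = set()
    # Step 1: at most one heterozygous SNP means the phase is unambiguous.
    for i, g in enumerate(genotypes):
        if sum(a != b for a, b in g) <= 1:
            h1 = "".join(a for a, _ in g)
            h2 = "".join(b for _, b in g)
            solved[i] = (h1, h2)
            known.update((h1, h2))
    # Step 2: explain unsolved genotypes as known haplotype + complement; repeat.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in solved:
                continue
            for h in sorted(known):  # the result can depend on this order
                c = complement(h, g)
                if c is not None:
                    solved[i] = (h, c)
                    known.add(c)  # the new haplotype can resolve later genotypes
                    progress = True
                    break
    return solved, known


genos = [
    [("T", "T"), ("A", "A"), ("C", "C")],
    [("T", "T"), ("A", "A"), ("C", "G")],
    [("A", "T"), ("A", "A"), ("C", "G")],
]
solved, known = clark(genos)
print(solved[2])  # ('TAC', 'AAG')
```

A sample whose genotypes all have two or more heterozygous SNPs yields no starting haplotypes, so `clark` returns nothing solved, which matches the first problem listed for the algorithm.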

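EM Algorithm
Use the multinomial likelihood under Hardy-Weinberg equilibrium (HWE). For example,
$\Pr(A/T,\ A/A,\ C/G) = \Pr(AAC/TAG) + \Pr(AAG/TAC) = 2\Pr(AAC)\Pr(TAG) + 2\Pr(AAG)\Pr(TAC)$,
where the factor 2 counts both orderings of two distinct haplotypes (it cancels in the EM posterior weights). Fallin and Schork (2000) showed that EM performs better than Clark's algorithm.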
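The following Python sketch shows the EM idea for haplotype frequencies, in the spirit of Excoffier and Slatkin (1995). The E-step splits each individual across its phase-consistent haplotype pairs in proportion to their HWE probabilities. The M-step re-estimates the frequencies from the expected haplotype counts. The names, the uniform starting values, and the fixed iteration count (used here in place of a convergence test) are illustrative assumptions. Because it enumerates every phase, it is limited to a few heterozygous SNPs per individual.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Genotype = List[Tuple[str, str]]


def phase_pairs(g: Genotype) -> List[Tuple[str, str]]:
    """Unordered haplotype pairs consistent with genotype g."""
    pairs = set()
    for flips in product((False, True), repeat=len(g)):
        h1 = "".join(b if f else a for (a, b), f in zip(g, flips))
        h2 = "".join(a if f else b for (a, b), f in zip(g, flips))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes: List[Genotype], n_iter: int = 100) -> Dict[str, float]:
    candidates = [phase_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts: Dict[str, float] = defaultdict(float)
        for pairs in candidates:
            # E-step: under HWE, Pr(h1/h2) = p1*p2 if h1 == h2, else 2*p1*p2
            weights = [freq[h1] * freq[h2] * (1.0 if h1 == h2 else 2.0) for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total  # expected copies of each haplotype
                counts[h2] += w / total
        # M-step: new frequency = expected count / number of chromosomes
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq


genos = [
    [("T", "T"), ("A", "A"), ("C", "C")],  # TAC/TAC
    [("A", "A"), ("A", "A"), ("G", "G")],  # AAG/AAG
    [("A", "T"), ("A", "A"), ("C", "G")],  # TAC/AAG or AAC/TAG
]
freqs = em_haplotype_freqs(genos)
print({h: round(p, 3) for h, p in freqs.items()})
# {'AAC': 0.0, 'AAG': 0.5, 'TAC': 0.5, 'TAG': 0.0}
```

Because TAC and AAG already appear in the unambiguous individuals, EM quickly assigns the ambiguous individual to TAC/AAG and drives AAC and TAG toward zero.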

A Gibbs Sampler (Stephens et al., 2001)
$G = (G_1, \dots, G_n)$: observed multilocus genotypes; $H = (H_1, \dots, H_n)$: unknown haplotype pairs; $F = (F_1, \dots, F_M)$: unknown population frequencies of the $M$ haplotypes.
At each iteration: choose an individual $i$ from all ambiguous individuals; sample $H_i^{(t+1)}$ from $\Pr(H_i \mid G, H_{-i}^{(t)})$; set $H_j^{(t+1)} = H_j^{(t)}$ for $j = 1, \dots, i-1, i+1, \dots, n$.
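The Python sketch below follows the structure of the sampling loop described above. Note that Stephens et al. (2001) compute $\Pr(H_i \mid G, H_{-i})$ with a coalescent-based approximation. This sketch substitutes a much simpler conditional that assumes a symmetric Dirichlet prior (hypothetical parameter `alpha`) on haplotype frequencies. It therefore illustrates the Gibbs updating scheme, not the PHASE method itself. Function names and defaults are hypothetical.

```python
import random
from collections import Counter
from itertools import product
from typing import List, Tuple

Genotype = List[Tuple[str, str]]
Pair = Tuple[str, str]


def phase_pairs(g: Genotype) -> List[Pair]:
    """Unordered haplotype pairs consistent with genotype g."""
    pairs = set()
    for flips in product((False, True), repeat=len(g)):
        h1 = "".join(b if f else a for (a, b), f in zip(g, flips))
        h2 = "".join(a if f else b for (a, b), f in zip(g, flips))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def gibbs_phase(genotypes: List[Genotype], n_steps: int = 2000,
                alpha: float = 0.5, seed: int = 0) -> List[Pair]:
    rng = random.Random(seed)
    cands = [phase_pairs(g) for g in genotypes]
    H = [rng.choice(c) for c in cands]  # H^(0): any consistent pair per person
    ambiguous = [i for i, c in enumerate(cands) if len(c) > 1]
    if not ambiguous:
        return H
    for _ in range(n_steps):
        i = rng.choice(ambiguous)  # choose an ambiguous individual
        # Haplotype counts among all other individuals, i.e. in H_{-i}
        n = Counter(h for j, pair in enumerate(H) if j != i for h in pair)
        # Assumed Dirichlet(alpha) conditional, NOT Stephens et al.'s coalescent one
        w = [(n[a] + alpha) * (n[a] + 1 + alpha) if a == b
             else 2 * (n[a] + alpha) * (n[b] + alpha)
             for a, b in cands[i]]
        H[i] = rng.choices(cands[i], weights=w)[0]  # sample H_i^(t+1)
        # every H_j with j != i is left unchanged
    return H


genos = ([[("T", "T"), ("A", "A"), ("C", "C")]] * 5
         + [[("A", "A"), ("A", "A"), ("G", "G")]] * 5
         + [[("A", "T"), ("A", "A"), ("C", "G")]])
H = gibbs_phase(genos)
print(H[-1])
```

With ten TAC and ten AAG chromosomes among the other individuals, the weights for the last individual are 220.5 for TAC/AAG versus 0.5 for AAC/TAG, so the sampler almost always picks TAC/AAG.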

Haplotype Inference: Data Coding
A. SNP (genotype) data, for a single locus: 0 = MM, 1 = Mm, 2 = mm.
B. Haplotype data, for a single locus: 0 = M, 1 = m.

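Example (haplotypes coded 0 = M, 1 = m at each of five SNPs):
Individual   Haplotype pair   Haplotype patterns
#1           1, 2             1 = 00000, 2 = 00100
#2           1, 3             3 = 00010
#3           1, 4             4 = 01001
#4           1, 5             5 = 00001
#5           1, 1
#6           1, 1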
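Coding B maps onto coding A by adding the two haplotype codes at each locus, which is how a genotype arises from a haplotype pair. Recovering the pair from the sum is the inference problem. The small Python sketch below uses a hypothetical function name and applies it to the patterns 00000 and 00100 that appear next to individual #1.

```python
from typing import List


def genotype_codes(hap1: str, hap2: str) -> List[int]:
    """Sum two 0/1 haplotype codes per SNP, giving 0 = MM, 1 = Mm, 2 = mm."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must cover the same SNPs")
    return [int(a) + int(b) for a, b in zip(hap1, hap2)]


print(genotype_codes("00000", "00100"))  # [0, 0, 1, 0, 0]
```

Different haplotype pairs can give the same genotype codes (for example, 01 + 10 and 00 + 11 both give [1, 1]), which is exactly the phase ambiguity that haplotype inference must resolve.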

An Example
Data: 169 cases and 231 controls; 11 haplotypes; sex and age information.

Logistic Regression Results
Without adjusting for age and sex, haplotype 7 was the most strongly associated, but the association was not statistically significant (p = 0.07). After adjusting for age and sex, haplotype 11 was the most strongly associated (p = 0.03). Treating the two haplotypes per person as repeated measures in a GEE analysis gave a slightly stronger association (p = 0.02).

Other Examples

Drysdale et al., PNAS 97(19):10483-10488 (2000)

Wallenstein, Hodge, and Weston, Genetic Epidemiology 15:173-181 (1998)

Figure panels: cohort study; case-control study.

Shaw et al., Am J Med Genet 114:205-213 (2002)

References
Clark (1990). Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 7:111-122.
Excoffier and Slatkin (1995). Maximum likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12:921-927.
Stephens, Smith, and Donnelly (2001). A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978-989.
Niu, Qin, Xu, and Liu (2002). Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet 70:157-169.

Thank you! Email: hokim@snu.ac.kr. This file is available at http://plaza.snu.ac.kr/~hokim under Open Classroom (열린 강의실), Seminar Materials (세미나자료).