Biostatistics, Lecture 19: Linkage Disequilibrium and SNP Detection. Ruibin Xi, Peking University, School of Mathematical Sciences
Haplotype Frequencies
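As a concrete illustration, haplotype frequencies at two biallelic markers can be estimated from phased haplotypes by counting, and the allele frequencies are their marginal sums. This is a minimal sketch; the input format (two-letter strings such as "AB") and the function names are hypothetical:

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Relative frequencies of the four two-marker haplotypes.

    `haplotypes` is a list of phased two-letter strings (hypothetical format):
    first letter = allele at marker 1 (A/a), second = allele at marker 2 (B/b).
    """
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {h: counts[h] / n for h in ("AB", "Ab", "aB", "ab")}


def allele_frequencies(freqs):
    """Allele frequencies p_A and p_B are marginal sums of haplotype frequencies."""
    p_A = freqs["AB"] + freqs["Ab"]
    p_B = freqs["AB"] + freqs["aB"]
    return p_A, p_B


f = haplotype_frequencies(["AB", "AB", "AB", "Ab", "aB", "ab", "ab", "AB"])
print(f)                      # {'AB': 0.5, 'Ab': 0.125, 'aB': 0.125, 'ab': 0.25}
print(allele_frequencies(f))  # (0.625, 0.625)
```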
Linkage Equilibrium
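Written out for two biallelic markers, with allele frequencies p_A, p_a, p_B, p_b and haplotype frequencies p_AB, p_Ab, p_aB, p_ab (notation assumed here, since the slide's formulas did not survive extraction), linkage equilibrium means every haplotype frequency is the product of its allele frequencies:

```latex
p_{AB} = p_A p_B, \quad p_{Ab} = p_A p_b, \quad p_{aB} = p_a p_B, \quad p_{ab} = p_a p_b
```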
Linkage Disequilibrium
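Linkage disequilibrium means the haplotype frequencies differ from the products of the allele frequencies, e.g. p_AB ≠ p_A p_B (notation assumed here: p_A, p_a, p_B, p_b for alleles and p_AB, p_Ab, p_aB, p_ab for haplotypes). Because p_AB + p_Ab = p_A, p_AB + p_aB = p_B, and the four haplotype frequencies sum to 1, a deviation in one haplotype forces deviations of equal size in the other three, so a single number captures the departure for two biallelic markers:

```latex
p_{AB} - p_A p_B = -(p_{Ab} - p_A p_b) = -(p_{aB} - p_a p_B) = p_{ab} - p_a p_b
```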
Disequilibrium Coefficient D_AB
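The coefficient can be computed directly from the four haplotype frequencies as D_AB = p_AB - p_A p_B (notation assumed here). When the four frequencies sum to 1 this equals the cross-product p_AB p_ab - p_Ab p_aB; the sketch, with hypothetical function names, checks both forms on a small example:

```python
def disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """D_AB = p_AB - p_A * p_B, from the four haplotype frequencies."""
    p_A = p_AB + p_Ab  # marginal frequency of allele A
    p_B = p_AB + p_aB  # marginal frequency of allele B
    return p_AB - p_A * p_B


def disequilibrium_cross(p_AB, p_Ab, p_aB, p_ab):
    """Equivalent cross-product form, valid when the frequencies sum to 1."""
    return p_AB * p_ab - p_Ab * p_aB


print(disequilibrium(0.5, 0.125, 0.125, 0.25))        # 0.109375
print(disequilibrium_cross(0.5, 0.125, 0.125, 0.25))  # 0.109375
print(disequilibrium(0.25, 0.25, 0.25, 0.25))         # 0.0 (equilibrium)
```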
D_AB is hard to interpret. Its sign is arbitrary … A common convention is to set A and B to be the common alleles and a and b to be the rare alleles. Its range depends on the allele frequencies, which makes it hard to compare between markers.
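Both problems can be seen numerically. The sketch uses hypothetical function names and the standard bounds that keep all four haplotype frequencies non-negative: relabeling the alleles at one marker flips the sign of D_AB, and the attainable interval for D_AB, from -min(p_A p_B, p_a p_b) to min(p_A p_b, p_a p_B), shrinks as an allele becomes rare:

```python
def disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """D_AB = p_AB - p_A * p_B."""
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    return p_AB - p_A * p_B


def d_range(p_A, p_B):
    """Interval of D_AB values compatible with the given allele frequencies."""
    p_a, p_b = 1 - p_A, 1 - p_B
    return -min(p_A * p_B, p_a * p_b), min(p_A * p_b, p_a * p_B)


freqs = (0.5, 0.125, 0.125, 0.25)    # (p_AB, p_Ab, p_aB, p_ab)
# Swap the labels B <-> b at marker 2: AB becomes Ab, aB becomes ab, and so on.
swapped = (0.125, 0.5, 0.25, 0.125)
print(disequilibrium(*freqs), disequilibrium(*swapped))  # 0.109375 -0.109375

print(d_range(0.5, 0.5))    # (-0.25, 0.25)
print(d_range(0.875, 0.5))  # (-0.0625, 0.0625)
```

With p_A = p_B = 0.5 the interval is (-0.25, 0.25); with p_A = 0.875 it shrinks to (-0.0625, 0.0625), so the same value of D_AB means very different things at the two pairs of markers.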
r² (also called Δ²) ranges between 0 and 1: it equals 1 when the two markers provide identical information and 0 when they are in perfect equilibrium.
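Assuming the usual definition r² = D_AB² / (p_A p_a p_B p_b) (the slide's formula did not survive extraction), a sketch with a hypothetical function name:

```python
def r_squared(p_AB, p_Ab, p_aB, p_ab):
    """r^2 = D_AB^2 / (p_A * p_a * p_B * p_b), from the four haplotype frequencies."""
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    d = p_AB - p_A * p_B
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


print(r_squared(0.5, 0.0, 0.0, 0.5))      # 1.0: only AB and ab occur, identical information
print(r_squared(0.25, 0.25, 0.25, 0.25))  # 0.0: perfect equilibrium
```

Squaring removes the arbitrary sign, and dividing by the allele-frequency product puts all marker pairs on a common 0-to-1 scale. The formula is undefined when either marker is monomorphic (zero denominator).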
Raw r² data from chromosome 22
Comparing Populations. CEPH: Utah residents with ancestry from northern and western Europe (CEU)
Use LD for SNP imputation and detection: fastPHASE
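Model for haplotypes. We observe n haplotypes, each with M markers, b_ij ∈ {0, 1}. Assume each haplotype originates from one of K clusters, and let z_i denote the unknown cluster of origin of b_i. Since the clusters of origin are unknown, the probability of b_i is obtained by summing over the K clusters.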
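A minimal sketch of this mixture, assuming the standard form of the model (the slide's formula did not survive extraction): hypothetical cluster weights alpha, per-cluster allele probabilities theta[k][m] = P(b_im = 1 | z_i = k), and independent markers within a cluster:

```python
import math


def haplotype_likelihood(b, alpha, theta):
    """Marginal probability of one haplotype under the K-cluster mixture.

    b     : list of M alleles, each 0 or 1
    alpha : list of K cluster weights summing to 1 (hypothetical name)
    theta : K lists of M values, theta[k][m] = P(allele 1 at marker m | cluster k)
    """
    total = 0.0
    for weight, theta_k in zip(alpha, theta):
        # Given the cluster, each marker is an independent Bernoulli draw.
        p_given_k = math.prod(t if x == 1 else 1.0 - t for x, t in zip(b, theta_k))
        total += weight * p_given_k
    return total


alpha = [0.5, 0.5]
theta = [[0.9, 0.9], [0.1, 0.1]]
print(haplotype_likelihood([1, 1], alpha, theta))  # about 0.41 = 0.5*0.81 + 0.5*0.01
```

Because every haplotype is drawn from a single cluster across all M markers, this model cannot capture LD that decays with distance, which motivates the local clustering on the next slide.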
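Local clustering of haplotypes. Assume z_i = (z_i1, …, z_iM) forms a Markov chain on {1, …, K}, where z_im denotes the cluster of origin of b_im. The model is specified by initial probabilities, transition probabilities, and allele probabilities conditional on the cluster of origin; summing over the hidden chain gives the marginal probability of b_i.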
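The marginal probability of b_i is a sum over K^M hidden paths, but the forward algorithm computes it in O(M K²) time. A minimal sketch, assuming a generic initial vector init, per-marker transition matrices trans, and emission probabilities theta (all hypothetical names; the slide does not show how the transitions are parameterized):

```python
def forward_likelihood(b, init, trans, theta):
    """P(b_i) under the local-clustering HMM, by the forward algorithm.

    b     : list of M alleles in {0, 1}
    init  : list of K initial probabilities P(z_i1 = k)
    trans : list of M-1 K-by-K matrices, trans[m][j][k] = P(z_i,m+2 = k | z_i,m+1 = j)
            (0-based: trans[m - 1] moves from marker m-1 to marker m)
    theta : K-by-M matrix, theta[k][m] = P(b_im = 1 | z_im = k)
    """
    K, M = len(init), len(b)

    def emit(k, m):
        # P(b_im | z_im = k): Bernoulli with success probability theta[k][m]
        return theta[k][m] if b[m] == 1 else 1.0 - theta[k][m]

    # f[k] = P(b_i1, ..., b_im, z_im = k), starting at the first marker
    f = [init[k] * emit(k, 0) for k in range(K)]
    for m in range(1, M):
        T = trans[m - 1]
        f = [sum(f[j] * T[j][k] for j in range(K)) * emit(k, m) for k in range(K)]
    return sum(f)
```

With identity transition matrices the chain never switches clusters and the result reduces to the plain mixture of the previous slide. For realistic M, the forward vector should be rescaled at each marker (or kept in logs) to avoid underflow.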
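Local clustering of genotype data. We observe genotype data g_im, the genotype at marker m of individual i, taking values 0, 1, 2. The hidden state at each marker is the unordered pair of clusters of origin of the individual's two haplotypes, and the model again has initial probabilities (over unordered pairs of clusters of origin) and transition probabilities.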
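Assuming each of the individual's two haplotypes follows the haplotype chain independently (with a hypothetical single-haplotype initial vector init and transition matrix T), the unordered-pair chain can be sketched as follows. A heterozygous pair {k1, k2} can arise in two orders, hence the factor 2 in the initial probability and the two terms in the transition:

```python
def pair_states(K):
    """Unordered pairs {k1, k2} of clusters, stored as tuples with k1 <= k2."""
    return [(k1, k2) for k1 in range(K) for k2 in range(k1, K)]


def pair_initial(init):
    """Initial probability of each unordered pair of clusters of origin."""
    return {
        (k1, k2): init[k1] * init[k2] * (1 if k1 == k2 else 2)
        for (k1, k2) in pair_states(len(init))
    }


def pair_transition(T, src, dst):
    """P(next unordered pair = dst | current unordered pair = src).

    T[j][k] is the single-haplotype transition probability from cluster j to k;
    the two haplotypes are assumed to move independently.
    """
    j1, j2 = src
    k1, k2 = dst
    p = T[j1][k1] * T[j2][k2]
    if k1 != k2:
        # The destination pair can also be reached with the roles swapped.
        p += T[j1][k2] * T[j2][k1]
    return p


print(pair_initial([0.5, 0.5]))  # {(0, 0): 0.25, (0, 1): 0.5, (1, 1): 0.25}
```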
Local clustering of genotype data (continued): genotype probabilities conditional on the clusters of origin, and the joint likelihood.
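A minimal sketch of the genotype emission and the joint likelihood, assuming g_im counts copies of allele 1 and theta[k][m] = P(allele 1 at marker m | cluster k) (hypothetical names). For brevity it tracks ordered pairs of clusters with two independent chains; summing over ordered pairs gives the same likelihood as the unordered-pair chain, since each unordered pair {k1, k2} with k1 ≠ k2 corresponds to exactly two ordered pairs:

```python
def genotype_prob(g, t1, t2):
    """P(g_im = g | clusters of origin), g = number of copies of allele 1.

    t1, t2 are P(allele 1) at this marker in the two clusters of origin.
    """
    if g == 0:
        return (1 - t1) * (1 - t2)
    if g == 1:
        return t1 * (1 - t2) + (1 - t1) * t2
    return t1 * t2  # g == 2


def genotype_likelihood(g, init, trans, theta):
    """Joint likelihood P(g_i1, ..., g_iM) by the forward algorithm.

    init  : K initial cluster probabilities for a single haplotype
    trans : M-1 K-by-K matrices, trans[m - 1][j][k] = P(cluster k at m | cluster j at m-1)
    theta : K-by-M matrix, theta[k][m] = P(allele 1 at marker m | cluster k)
    """
    K, M = len(init), len(g)
    pairs = [(k1, k2) for k1 in range(K) for k2 in range(K)]

    def emit(pair, m):
        k1, k2 = pair
        return genotype_prob(g[m], theta[k1][m], theta[k2][m])

    # f[(k1, k2)] = P(g_i1..g_im, ordered clusters at marker m = (k1, k2))
    f = {p: init[p[0]] * init[p[1]] * emit(p, 0) for p in pairs}
    for m in range(1, M):
        T = trans[m - 1]
        f = {
            (k1, k2): sum(f[(j1, j2)] * T[j1][k1] * T[j2][k2] for (j1, j2) in pairs)
            * emit((k1, k2), m)
            for (k1, k2) in pairs
        }
    return sum(f.values())
```

This naive version costs O(K^4) per marker; it shows the structure of the likelihood rather than an efficient implementation.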
Algorithms for genotype imputation: fastPHASE, BEAGLE, IMPUTE, PLINK, MaCH
Picture taken from IMPUTE v2
SNP detection with LD information. MaCH model (G: genotype, S: cluster)
SNP detection with LD information (continued). For sequencing data the genotype G is not observed; instead the coverage of bases A and B is observed, giving an HMM in which the read counts are the observed data.
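A minimal sketch of the emission step, assuming reads are sampled independently, that G counts copies of allele B (0, 1, 2), and that each read shows the wrong base with a hypothetical symmetric error rate err. These are assumptions of the sketch, not necessarily the exact model in the cited work:

```python
from math import comb


def read_likelihood(n_a, n_b, g, err):
    """P(n_a reads showing base A, n_b reads showing base B | genotype g).

    g = number of B alleles; err = per-read error rate (assumed symmetric).
    """
    p_b = {0: err, 1: 0.5, 2: 1 - err}[g]  # chance a single read shows base B
    n = n_a + n_b
    return comb(n, n_b) * p_b ** n_b * (1 - p_b) ** n_a


def emission_given_cluster(n_a, n_b, genotype_probs, err):
    """HMM emission with the unobserved genotype summed out:
    P(reads | S) = sum over g of P(reads | G = g) * P(G = g | S).
    genotype_probs[g] = P(G = g | S) for g = 0, 1, 2.
    """
    return sum(read_likelihood(n_a, n_b, g, err) * genotype_probs[g] for g in range(3))


print(read_likelihood(2, 0, 1, 0.01))  # 0.25: a heterozygote gives two A reads
print(read_likelihood(1, 1, 1, 0.01))  # 0.5
```

In the HMM, this emission replaces the directly observed genotype: the hidden cluster state S determines P(G | S), and G itself is summed out, so low-coverage sites borrow strength from LD with neighbouring markers.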
SNP detection with LD information (Nielsen et al. 2011, Nature Reviews Genetics)