Single Nucleotide Polymorphism Copy Number Variations and SNP Array Xiaole Shirley Liu and Jun Liu.


1 Single Nucleotide Polymorphism, Copy Number Variations and SNP Array
Xiaole Shirley Liu and Jun Liu

2 Outline
Definition and motivation
SNP distribution and characteristics
–Allele frequency, LD, population stratification
SNP discovery (unknown) and genotyping (known)
–CNV detection

3 Polymorphism
Polymorphism: sites/genes with “common” variation, where the less common allele has frequency ≥1%; otherwise the variant is called rare and the site is not considered polymorphic
First discovered (early 1980s): restriction fragment length polymorphism
Some definitions:
–Locus: position on a chromosome where a sequence or gene is located
–Allele: alternative form of the DNA at a locus

4 Polymorphism
Single Nucleotide Polymorphism
–Occasionally short (1-3 bp) indels are considered SNPs too
–Arise from a DNA replication mistake in an individual germ-line cell, then transmitted
–~90% of human genetic variation
Copy number variations
–May or may not be genetic

5 Why Should We Care
Disease gene discovery
–Association studies: certain SNPs are associated with susceptibility to diabetes
–Chromosome aberrations: duplication / deletion might cause cancer
Personalized medicine
–A drug may be effective only if you carry a particular allele

8 SNP Distribution
Most common type of variation, about 1 SNP per 100-300 bp
–Balance between the rate at which mutations are introduced and the rate at which polymorphisms are lost
–Most mutations are lost within a few generations
2/3 are C/T differences
In non-coding regions, often fewer SNPs in more conserved regions
In coding regions, often more synonymous than non-synonymous SNPs

9 SNP Characteristics: Allele Frequency Distribution
Most alleles are rare (minor allele frequency < 10%)

10 Mode of inheritance

11 SNP Characteristics: Allele Frequency Distribution
Nucleotide diversity
–Average fraction of nucleotides that differ between a pair of randomly chosen alleles
AACCG GCTTA GCCGA GTTAT
AAGCG GCTTA GCCGA GATAT
AACCG GCTAA GCCGA GTTAT
AAGCG GCTTA GCCGA GTTAT
AACCG GCTTA GCCGA GATAT
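As a worked example of the definition on this slide, the sketch below computes nucleotide diversity for the five sequences shown, assuming each row is one sampled allele and that the spaces are only visual grouping. The function name nucleotide_diversity is illustrative and does not come from the slides.

```python
from itertools import combinations

# The five aligned sequences from the slide, with grouping spaces removed.
sequences = [
    "AACCGGCTTAGCCGAGTTAT",
    "AAGCGGCTTAGCCGAGATAT",
    "AACCGGCTAAGCCGAGTTAT",
    "AAGCGGCTTAGCCGAGTTAT",
    "AACCGGCTTAGCCGAGATAT",
]


def nucleotide_diversity(seqs):
    """Average fraction of sites that differ between all pairs of sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every unordered pair.
    total_differences = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_differences / (len(pairs) * length)


pi = nucleotide_diversity(sequences)
print(f"nucleotide diversity = {pi:.3f}")
```

The ten pairs differ at 16 positions in total over 20 sites each, so the estimate is 16 / 200 = 0.08.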

12 SNP Characteristics: Hardy-Weinberg equilibrium (HWE)
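The transcript keeps only the title of this slide. As a hedged illustration of what a Hardy-Weinberg check usually looks like, the sketch below compares observed genotype counts with the expected proportions p², 2pq and q² using a chi-square statistic with 1 degree of freedom. The function name and the example counts are made up for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if not 0 < p < 1:
        raise ValueError("both alleles must be observed")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, expected, chi2, p_value


# Hypothetical counts: the first set fits HWE, the second lacks heterozygotes.
for counts in [(360, 480, 160), (500, 200, 300)]:
    p, expected, chi2, p_value = hwe_chi_square(*counts)
    print(counts, f"p={p:.2f} chi2={chi2:.2f} p_value={p_value:.3g}")
```

A strong deviation from HWE in a control sample is often treated as a warning sign for genotyping error or population structure, but the threshold to use is a study-specific choice.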

13 SNP Characteristics: Linkage Disequilibrium
(Figure labels: Equilibrium, Disequilibrium)
LD: if alleles at two loci occur together more often than can be accounted for by chance, this indicates that the two alleles are physically close on the DNA
–In mammals, LD is often lost at ~100 kb
–In fly, LD often decays within a few hundred bases

14 SNP Characteristics: Linkage Disequilibrium
Statistical significance of LD
–Chi-square test with 1 df
–e_ij = (n_i. × n_.j) / n_T

        B1     B2     Total
A1      n_11   n_12   n_1.
A2      n_21   n_22   n_2.
Total   n_.1   n_.2   n_T
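The test on this slide can be sketched as follows, using the same expected-count formula e_ij = (n_i. × n_.j) / n_T. The haplotype counts in the example are hypothetical, and the function name ld_chi_square is illustrative.

```python
import math


def ld_chi_square(table):
    """Chi-square statistic and 1-df p-value for a 2x2 haplotype count table.

    table[i][j] is the number of haplotypes carrying allele A(i+1) and B(j+1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n_total
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: A1 mostly travels with B1, A2 mostly with B2.
counts = [[40, 10],
          [10, 40]]
chi2, p_value = ld_chi_square(counts)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2g}")
```

Every expected count here is 25, so the statistic is 4 × (15² / 25) = 36, far beyond the 3.84 cutoff for p = 0.05 with 1 df.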

15 SNP Characteristics: Linkage Disequilibrium
Three ways to calculate LD
(Formulas shown on the slide; labels: Observed, Expected)
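The transcript does not preserve the three formulas. The sketch below assumes they are the three commonly used LD measures, D, D' and r², each built from the gap between the observed haplotype frequency and the frequency expected under independence. That assumption, the function name and the example frequencies are not from the slides.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: observed frequency of the haplotype carrying alleles A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    # Observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b
    # Largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: haplotype AB is seen in 40% of chromosomes.
d, d_prime, r2 = ld_measures(p_ab=0.4, p_a=0.5, p_b=0.5)
print(f"D = {d:.2f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

For a 2x2 table of n haplotypes, the chi-square statistic equals n × r², which ties r² to the significance test for LD.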

16 SNP Characteristics: Linkage Disequilibrium
Haplotype block: a cluster of linked SNPs
Haplotype boundaries: sequence falls into blocks with strong LD within blocks and no LD between blocks; the boundaries reflect recombination hotspots
Haplotype size distribution

17 SNP Characteristics: Linkage Disequilibrium
Can see haplotype block: a cluster of linked SNPs

18 SNP Characteristics: Linkage Disequilibrium
[C/T] [A/G] T X C [A/C] [T/A]
–Possible haplotypes: 2^4 = 16
–In reality, a few common haplotypes explain 90% of the variation
Tagging SNPs:
–SNPs that capture most of the variation in haplotypes
–Removes redundancy
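To make the counting and the tagging idea concrete, the sketch below enumerates all haplotypes over the four biallelic SNPs on this slide, then picks tag SNPs from a small set of common haplotypes. The three common haplotypes and the function name pick_tag_snps are hypothetical, and the rule used (treat two SNPs as redundant only when they are perfectly correlated) is a deliberately simple stand-in for real tag-SNP selection.

```python
from itertools import product

# The four biallelic SNPs from the slide: [C/T] [A/G] [A/C] [T/A].
snp_alleles = [("C", "T"), ("A", "G"), ("A", "C"), ("T", "A")]
all_haplotypes = ["".join(h) for h in product(*snp_alleles)]
print(len(all_haplotypes), "possible haplotypes")

# Hypothetical set of haplotypes actually observed in a population.
common_haplotypes = ["CAAT", "TGCA", "CACA"]


def pick_tag_snps(haplotypes):
    """Group perfectly correlated SNP columns and keep one SNP per group."""
    groups = {}
    for index, column in enumerate(zip(*haplotypes)):
        # Encode each column relative to its first allele so that two
        # biallelic SNPs carrying the same information get the same pattern.
        pattern = tuple(allele == column[0] for allele in column)
        groups.setdefault(pattern, []).append(index)
    tags = [members[0] for members in groups.values()]
    return tags, list(groups.values())


tags, groups = pick_tag_snps(common_haplotypes)
print("tag SNP indices:", tags)
print("redundant groups:", groups)
```

With these three haplotypes, SNPs 0 and 1 always change together, as do SNPs 2 and 3, so typing SNPs 0 and 2 is enough to tell the haplotypes apart.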

19 SNP Characteristics: Population Stratification
Population stratification: individuals are selected from two genetically different populations; the stratification may be environmental, cultural, or genetic
Could give spurious results in case-control association studies
–The example of “chopstick genes”
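The arithmetic behind a spurious “chopstick gene” can be sketched with made-up counts. In the hypothetical data below, carrying the allele is independent of chopstick use inside each population, but the two populations differ in both allele frequency and chopstick use, so pooling them creates an association. All numbers and names are illustrative.

```python
def odds_ratio(carrier_user, carrier_nonuser, noncarrier_user, noncarrier_nonuser):
    """Odds ratio for allele carriage versus chopstick use in a 2x2 table."""
    return (carrier_user * noncarrier_nonuser) / (carrier_nonuser * noncarrier_user)


# Hypothetical population 1: 80% carriers, 90% chopstick users, independent.
pop1 = (720, 80, 180, 20)
# Hypothetical population 2: 20% carriers, 10% chopstick users, independent.
pop2 = (20, 180, 80, 720)
# A case-control study that ignores ancestry sees only the pooled table.
pooled = tuple(a + b for a, b in zip(pop1, pop2))

print("population 1 odds ratio:", odds_ratio(*pop1))
print("population 2 odds ratio:", odds_ratio(*pop2))
print("pooled odds ratio:", round(odds_ratio(*pooled), 2))
```

Within each population the odds ratio is 1, yet the pooled table suggests a strong effect. Analyzing within strata, or adjusting for ancestry, removes the artifact.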

20 Using genetic variation to study populations

21 SNP Discovery Methods
Sequencing individuals for differences: too costly
First check whether big regions have SNPs
–Basic idea: denature and re-anneal two samples, detect heteroduplex
–Can pool samples (e.g. 10 African with 10 Caucasian) to speed screening
Resequence to verify
dbSNP: 12M RefSNPs, 6M validated

22 SNP Genotyping
For a known locus TT[C/A]AG, does this individual have CC, AA or AC?
Many methods
Hybridization-based methods
–Dynamic allele-specific hybridization
–Molecular beacons
–SNP-array chip (simultaneously genotype thousands of SNPs)
Enzyme-based methods
–RFLP
–PCR-based methods
–Flap endonuclease
–Primer extension
–Oligonucleotide ligase assay
Other methods (based on physical properties of DNA)

23 SNP Array
One SNP at a time or genome-wide (SNP array)

24 40 Probes Used Per SNP
Allele call
–AA, BB, AB
Signal
–Theoretically 1A+1B, 2A, 2B
–But could have 1A+3B: amplified!

25 SNP Chip for LOH
Loss of heterozygosity (LOH): tumor suppressor gene inactivation by allelic loss in cancers
(Figure panels: Normal, First genetic hit, Cancer; SNP genotypes A/B labeled, with LOH marked)

26 Making LOH Calls
Compare the cancer and normal SNP profiles of the same individual
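A minimal sketch of the comparison, assuming genotype calls of AA, AB, BB or NoCall per SNP for a matched normal/tumor pair. The calling rule is a simplification for illustration and is not necessarily the method behind the slides; the function name and example calls are hypothetical.

```python
def call_loh(normal, tumor):
    """Classify one SNP from paired normal and tumor genotype calls."""
    if "NoCall" in (normal, tumor):
        return "no call"
    if normal != "AB":
        # A homozygous normal genotype cannot reveal loss of one allele.
        return "uninformative"
    if tumor in ("AA", "BB"):
        return "LOH"
    return "retained"


# Hypothetical calls along one chromosome arm for the same individual.
normal_calls = ["AB", "AA", "AB", "AB", "BB", "AB", "NoCall"]
tumor_calls = ["AB", "AA", "AA", "BB", "BB", "AA", "AB"]

loh_calls = [call_loh(n, t) for n, t in zip(normal_calls, tumor_calls)]
for position, (n, t, call) in enumerate(zip(normal_calls, tumor_calls, loh_calls)):
    print(position, n, t, call)
```

Only SNPs that are heterozygous in the normal sample are informative, which is why the matched normal profile from the same individual is needed.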

27 SNP Array for CNV
Collect normal / diseased samples on SNP arrays
Probe normalization, background subtraction
Use HMM to infer CNV
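The slide names an HMM but gives no details. The sketch below is one possible toy version: three hidden copy-number states, Gaussian emissions on the log2 ratio of diseased to normal signal, a high probability of staying in the same state between neighboring probes, and Viterbi decoding. The state means, the standard deviation, the transition probability and the example data are all assumptions made for illustration.

```python
import math

STATES = ["loss", "normal", "gain"]
# Assumed emission means: log2 of 1, 2 and 3 copies relative to 2 copies.
MEANS = {"loss": -1.0, "normal": 0.0, "gain": 0.58}
SIGMA = 0.2   # assumed noise level of the log2 ratios
STAY = 0.98   # assumed probability that adjacent probes share a state


def log_emission(x, state):
    """Log density of a Gaussian centered on the state's mean."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    """Log probability of moving from one state to the next."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(log_ratios):
    """Most likely copy-number state for each probe, in genomic order."""
    if not log_ratios:
        return []
    score = {s: math.log(1 / len(STATES)) + log_emission(log_ratios[0], s)
             for s in STATES}
    backpointers = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(x, cur))
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical normalized log2 ratios along a chromosome.
log_ratios = [0.05, -0.1, 0.0, 0.6, 0.5, 0.7, 0.55, 0.02, -0.05, -0.9, -1.1, -1.0, 0.1]
for ratio, state in zip(log_ratios, viterbi(log_ratios)):
    print(f"{ratio:+.2f}  {state}")
```

The high stay probability is what makes the HMM smooth over single noisy probes while still picking up runs of consistently shifted signal.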

28 Integrate CNV with Expression to Identify oncogene MITF in melanoma

29 Summary
SNP and CNV
SNP distribution and characteristics
–Allele frequency (minor allele > 1%)
–LD: linkage ~ physical proximity
–Population stratification
SNP discovery: heteroduplex
SNP genotyping
–SNP array
–CNV detection: HMM

30 Acknowledgement
Stefano Monti
Tim Niu
Kenneth Kidd, Judith Kidd and Glenys Thomson
Joel Hirschhorn
Greg Gibson & Spencer Muse

