Methods in genome wide association studies. Norú Moreno


1 Methods in genome wide association studies. Norú Moreno
CS374: Algorithms in Biology. Professor: Serafim Batzoglou

2 Agenda
GWA
Polymorphisms
HapMap Project
Genotyping chip
Integrating CNVs and SNPs
Imputation
Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays

3 Genome-wide Association Study (GWA study or GWAS)
Completion of the Human Genome Project in 2003.
Examination of genetic variation across a given genome.
Objective: identify genetic associations with observable traits.

4 GWAS
Scan SNPs across many individuals to associate alleles with a particular disease.
Use a detected association to detect, treat and prevent the disease.
Pharmacogenomics.
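As a rough illustration of what "associating alleles with a disease" can mean at a single SNP, the sketch below runs a chi-square test on a 2x2 table of allele counts (cases vs. controls, allele A vs. allele B). The slides do not say which test is used, so the test itself, the function name and the counts are all assumptions made for the example.

```python
import math

def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Chi-square statistic (1 d.f.) and p-value for a 2x2 table of allele counts.

    Hypothetical helper: rows are cases/controls, columns are alleles A/B.
    """
    n = case_a + case_b + control_a + control_b
    row_cases = case_a + case_b
    row_controls = control_a + control_b
    col_a = case_a + control_a
    col_b = case_b + control_b
    denom = row_cases * row_controls * col_a * col_b
    if denom == 0:
        raise ValueError("every row and column needs at least one allele")
    # Closed form of the Pearson chi-square statistic for a 2x2 table.
    chi2 = n * (case_a * control_b - case_b * control_a) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Made-up counts: allele A is more frequent among cases than among controls.
chi2, p = allelic_chi_square(case_a=120, case_b=80, control_a=90, control_b=110)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

In a genome-wide scan this test would be repeated for every SNP, which is why the significance threshold has to be far stricter than the usual 0.05.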

5 Polymorphisms
A specific sequence variation that some individuals possess.
Some variations are common, others are rare.
Examples: blood types, height, skin color, etc.

6 Types of polymorphisms
1. Copy Number Variation (CNV)
Segments of DNA that are found in different numbers of copies among individuals.
Substantial regions, not single nucleotides.
Example: A B C, A C, A B B B C (segment B present in one, zero and three copies).

7 Types of polymorphisms
2. Single Nucleotide Polymorphism (SNP) (Murray 2007)

8 HapMap Two unrelated people share about 99.5% of their DNA sequence.
HapMap focuses only on common SNPs: those found in at least 1% of the population.
269 individuals, ~4M SNPs.
Genotyped the individuals for these SNPs, and published the results.

9 Genotyping chip
[Figure: a genomic sequence, ACTGGGCTAATCGATCGACTAGCTAGCTAGTCTCGATCAAT, with a short probe, ACTGGGCTAA, aligned to it. Label: Probes]

10 Genotyping chip (Liu 2007) (Affymetrix)

11 Genotyping chip (Affymetrix)

12 Genotyping chip
[Figure: samples plotted by allele A signal against allele B signal, forming three clusters: BB (0), AB (0.5), AA (1)]

13 Genotyping chip
Affymetrix 100k chip set: entire genome with SNPs (low density).
Affymetrix 500k chip (SNP array 5.0): entire genome with SNPs (high density).
Affymetrix 1M chip (SNP array 6.0): entire genome with SNPs (very high density).

14 Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs (Birdsuite) Korn, et al.

15 Birdsuite
Takes both CNVs and SNPs into account.
Input: raw data from the genotyping chip.
Output: an integrated CNV and SNP genotype per locus.
CNVs and SNPs coexist; both common and rare variants are needed to understand the role of genetic variation in disease.

16 Birdsuite
New genotypes: A-null, AAAB, BBBB.
Combines SNPs (AA, AB, BB) and CNPs.

17 Birdsuite – 4 Stages
Canary: ‘genotypes’ common copy-number polymorphisms (CNPs).
Birdseed: genotypes SNPs using the classical AA, AB, and BB genotypes.
Birdseye: identifies rare CNVs via HMMs.
Fawkes: integrates CNV information to produce mutually consistent SNP genotypes (i.e. including genotypes such as A-null and AAB).

18 Birdsuite - Canary
Determines the copy number of each individual at each predefined CNP locus.
CNP = copy number polymorphism: a CNV with > 1% frequency in the population.
Example for the segment sequence A B B B C:
Locus   Number of copies
A       1
B       3
C

19 Canary (Korn, p.1255)

20 Birdsuite - Birdseed
We expect only AA, AB or BB.
From Canary, use only loci with copy number 2 (no fewer or extra copies).
Use HapMap as a prior model to represent the expected allele intensity for each genotype.
Algorithm based on expectation-maximization to determine the AA, AB and BB clusters per SNP.
Gives a score reflecting the confidence of each call.
Result: has been used to genotype over 50,000 samples at the Broad Institute with an average call rate > 99%. (Korn, p.1257)
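The clustering idea can be sketched in a much-reduced form. The code below is not Birdseed: it assumes each sample has already been reduced to a single A-allele ratio between 0 and 1, fits three one-dimensional Gaussian clusters with expectation-maximization, and starts from prior means of 0, 0.5 and 1 standing in for the HapMap-derived priors. The function name, the starting spread and the variance floor are all illustrative choices.

```python
import math

def em_genotype_clusters(ratios, prior_means=(0.0, 0.5, 1.0), sd=0.1, n_iter=25):
    """Fit three 1-D Gaussian clusters (BB, AB, AA) with EM and call each sample.

    Returns (calls, means), where calls is a list of (genotype, confidence).
    """
    labels = ("BB", "AB", "AA")
    means = list(prior_means)   # assumed prior cluster centres
    sds = [sd] * 3              # assumed starting spread per cluster
    weights = [1.0 / 3] * 3
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each sample.
        # The 1/sqrt(2*pi) constant cancels in the normalisation.
        resp = []
        for x in ratios:
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / 3] * 3)
        # M-step: re-estimate weight, mean and spread of each cluster.
        for k in range(3):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # empty cluster: keep its prior parameters
            means[k] = sum(r[k] * x for r, x in zip(resp, ratios)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, ratios)) / nk
            sds[k] = max(math.sqrt(var), 0.02)  # floor avoids collapsing clusters
            weights[k] = nk / len(ratios)
    # Call each sample from the last E-step; the posterior is the confidence.
    calls = []
    for r in resp:
        k = max(range(3), key=lambda i: r[i])
        calls.append((labels[k], r[k]))
    return calls, means

# Made-up ratios for ten samples at one SNP.
ratios = [0.02, 0.05, 0.03, 0.48, 0.52, 0.55, 0.51, 0.97, 0.95, 0.99]
calls, means = em_genotype_clusters(ratios)
for ratio, (genotype, confidence) in zip(ratios, calls):
    print(f"{ratio:.2f} -> {genotype} (confidence {confidence:.3f})")
```

Starting from prior cluster positions is what lets a SNP with few or no samples in one cluster (for example a rare BB homozygote) still be called sensibly.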

21 Birdsuite - Birdseye Using Canary and Birdseed:
Identify rare and de novo CNVs.
Small number of real CNVs at unknown sites.
Search for consistent evidence of copy number variation across multiple neighboring probes.
Implement an HMM-based algorithm to find strong, consistent evidence for altered copy number states.

22 Birdsuite - Birdseye
HMM to find regions of variable copy number in a sample.
Hidden state: The true copy number of the individual’s genome.
Observed states: The normalized intensity measurements of each probe on the array.
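A toy version of such an HMM can be decoded with the Viterbi algorithm. The sketch below is an illustration, not Birdseye's model: it assumes intensities are normalized so that two copies give 1.0, that emissions are Gaussian with the same spread in every state, and that the copy number rarely changes between neighboring probes. All parameter values are made up.

```python
import math

def viterbi_copy_number(intensities, states=(0, 1, 2, 3, 4), sd=0.2,
                        stay_prob=0.99):
    """Most likely copy-number path under a simple Gaussian-emission HMM."""
    if not intensities:
        return []
    n_states = len(states)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)

    def log_emit(x, copies):
        # Assumed emission model: mean intensity is proportional to copy number.
        # The shared normalising constant is dropped (same sd in every state).
        mean = copies / 2.0
        return -0.5 * ((x - mean) / sd) ** 2

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # Uniform start distribution, so its constant is dropped as well.
    score = [log_emit(intensities[0], c) for c in states]
    back = []
    for x in intensities[1:]:
        new_score, pointers = [], []
        for j, copies in enumerate(states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best_i] + log_trans(best_i, j)
                             + log_emit(x, copies))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace the best path back from the best final state.
    k = max(range(n_states), key=lambda i: score[i])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [states[k] for k in path]

# Made-up probe intensities: six neighboring probes sit near half the normal signal.
probes = [1.0, 0.95, 1.05, 0.5, 0.55, 0.45, 0.52, 0.48, 0.5, 1.0, 1.02, 0.98]
print(viterbi_copy_number(probes))
```

The high `stay_prob` is what encodes "consistent evidence across multiple neighboring probes": a single low probe is cheaper to explain as noise than as a copy-number change, while a run of low probes is not.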

23 Birdsuite - Fawkes Merge all the results.
Show the CNVs within each SNP.
Utilize the imputed locations (in A/B intensity space) of copy-variable clusters.
Assign an allele-specific copy number genotype at each SNP (e.g. AAB, ABBB, A or B).
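To make "allele-specific copy number genotype" concrete, here is a deliberately simplified sketch. Fawkes works with imputed cluster locations in A/B intensity space; the code below instead assumes we already know the total copy number and the fraction of signal coming from allele A, and simply rounds to the nearest whole number of A copies. The function name and inputs are hypothetical.

```python
def allele_specific_genotype(copy_number, a_fraction):
    """Combine a total copy number with the A-allele signal fraction.

    Simplified stand-in for the merging step: returns strings such as
    "AAB", "ABBB", "A" or "B", or "null" when no copy is present.
    """
    if copy_number < 0 or not 0.0 <= a_fraction <= 1.0:
        raise ValueError("copy_number must be >= 0 and a_fraction in [0, 1]")
    if copy_number == 0:
        return "null"
    # Nearest whole number of A copies, clamped to the valid range.
    n_a = max(0, min(copy_number, round(a_fraction * copy_number)))
    return "A" * n_a + "B" * (copy_number - n_a)

print(allele_specific_genotype(3, 0.67))  # three copies, about two thirds A
print(allele_specific_genotype(4, 0.25))  # four copies, about one quarter A
print(allele_specific_genotype(1, 0.97))  # one copy, almost all signal from A
```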

24 Fawkes (Korn, p. 1254,1257)

25 (Affymetrix website screenshot)

26 Imputation
Dealing with missing data points by filling in values.
In SNPs: T A G G T ? T G C C T A G C G T
Why?
Cost-saving.
Avoid re-genotyping.
Keep effective sample size.
SNP comparisons between existing platforms.

27 Imputation
‘Direct’ imputation: fill in the allele with the highest rate of occurrence.
Before: T A G G T ? T G C C T A G C G T
After:  T A G G T A T G C C T A G C G T
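One simple reading of 'direct' imputation is to fill each missing call with the allele seen most often at that position in other samples. The sketch below takes that reading; the reference panel is invented for the example, and real imputation tools use far richer models.

```python
from collections import Counter

def impute_most_common(target, reference_panel, missing="?"):
    """Fill missing alleles with the most frequent allele at the same position.

    target: list of alleles, with `missing` marking unknown calls.
    reference_panel: list of allele lists of the same length (hypothetical data).
    """
    filled = []
    for pos, allele in enumerate(target):
        if allele != missing:
            filled.append(allele)
            continue
        observed = [ref[pos] for ref in reference_panel if ref[pos] != missing]
        if not observed:
            filled.append(missing)  # nothing to learn from: leave the gap
            continue
        filled.append(Counter(observed).most_common(1)[0][0])
    return filled

target = "T A G G T ? T G C C T A G C G T".split()
# Made-up panel: two samples carry A at the missing position, one carries G.
panel = [
    "T A G G T A T G C C T A G C G T".split(),
    "T A G G T A T G C C T A G C G T".split(),
    "T A G G T G T G C C T A G C G T".split(),
]
print(" ".join(impute_most_common(target, panel)))
```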

28 Imputation
Linkage disequilibrium (LD): non-random association of alleles at two or more loci.
[Figure: a SNP of interest and the SNPs in LD with it]
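The strength of that non-random association is commonly summarized by the coefficient D and by r squared. The sketch below computes both from phased two-locus haplotypes; the haplotype encoding ("AB", "Ab", "aB", "ab") and the counts are assumptions made for the example.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    haplotypes: two-character strings, first locus A/a, second locus B/b.
    r_squared is None when either locus shows no variation.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(1 for h in haplotypes if h[0] == "A") / n
    p_b = sum(1 for h in haplotypes if h[1] == "B") / n
    p_ab = sum(1 for h in haplotypes if h == "AB") / n
    # D measures how far the AB haplotype frequency is from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else None
    return d, r_squared

# Made-up sample of 100 haplotypes in which A and B tend to travel together.
sample = ["AB"] * 40 + ["ab"] * 40 + ["Ab"] * 10 + ["aB"] * 10
d, r2 = linkage_disequilibrium(sample)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

A high r squared between a typed SNP and an untyped SNP of interest is what makes imputation from neighboring markers possible.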

29 Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays Homer, et al.

30 The DNA Detective
Is an individual genome present in a DNA mixture?
[Figure: Mixed DNA, Population, Query]

31 DNA Detective
Current situation:
Different laboratories reach different conclusions.
Results are usually not accurate at all.
The analysis is hard and cannot be automated.

32 DNA Detective - Methodology
Summary: cumulative sum of allele shifts over all available SNPs.
The sign of each shift indicates whether the individual of interest is closer to a reference sample or closer to a given mixture.
First genotype a single SNP for a single person, then adapt it to all mixtures and pooled data.

33 DNA Detective – Single SNP, Single person
Raw preprocessed data gives allele intensities (how much of A and how much of B we have).
Transform the normalized data into a ratio.
Yi is the estimate of allele frequency: ~0 for BB, ~0.5 for AB, ~1 for AA.
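The intensity-to-ratio step can be sketched as below. The slide does not give the exact transformation, so this assumes the ratio is the A signal divided by the total signal; the optional factor `k` is a hypothetical stand-in for any per-SNP calibration between the two probe signals and defaults to no correction.

```python
def allele_frequency_estimate(a_intensity, b_intensity, k=1.0):
    """Estimate the A-allele frequency at one SNP from two probe intensities.

    Assumed form: A / (A + k * B), where k is an illustrative calibration factor.
    """
    total = a_intensity + k * b_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return a_intensity / total

# Made-up normalized intensities for three individuals at one SNP.
print(allele_frequency_estimate(980.0, 20.0))   # close to 1, consistent with AA
print(allele_frequency_estimate(500.0, 500.0))  # 0.5, consistent with AB
print(allele_frequency_estimate(15.0, 985.0))   # close to 0, consistent with BB
```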

34 DNA Detective - Methodology
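Use relative probe intensity data.
Compare the individual's allele frequency estimates with those from the mixture (M) and from a reference population (Pop).
Assume the reference population (Pop) has ancestral components similar to those of the mixture, so that the two are interchangeable.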

35 DNA Detective - Methodology
Distance measure for individual Yi
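The formula shown on this slide did not survive in the transcript. As I understand the definition in Homer et al., the distance measure for individual i at SNP j compares how far the individual's allele frequency estimate lies from the reference population and from the mixture, where Pop_j and M_j are the allele frequency estimates of the reference population and the mixture at SNP j:

```latex
D(Y_{i,j}) = \left| Y_{i,j} - \mathrm{Pop}_j \right| - \left| Y_{i,j} - M_j \right|
```

A positive value means the individual is closer to the mixture than to the reference population at that SNP, and a negative value means the opposite.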

36 DNA Detective - Methodology
Null hypothesis: the individual is not in the mixture, D(Yi,j) ~ 0.
Alternative hypothesis:
D(Yi,j) > 0: Yi,j is more similar to M than to Pop.
D(Yi,j) < 0: Yi,j is ancestrally more similar to Pop than to M.
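A minimal sketch of how these per-SNP distances could be combined into one test is given below. It assumes D at each SNP is the individual's distance to Pop minus the distance to M (so that D > 0 means closer to M, as stated above) and summarizes the values with a one-sample t statistic against a mean of 0. The simulated frequencies, the 10% contribution and all names are illustrative, not taken from the paper's data.

```python
import math
import random

def membership_statistic(y, pop, mix):
    """One-sample t statistic of D_j = |y_j - pop_j| - |y_j - mix_j| against 0."""
    d = [abs(yj - pj) - abs(yj - mj) for yj, pj, mj in zip(y, pop, mix)]
    s = len(d)
    if s < 2:
        raise ValueError("need at least two SNPs")
    mean_d = sum(d) / s
    var_d = sum((x - mean_d) ** 2 for x in d) / (s - 1)
    if var_d == 0:
        raise ValueError("D shows no variation across SNPs")
    return mean_d / math.sqrt(var_d / s)

# Simulated example (all values are assumptions): 2000 SNPs with known
# population allele frequencies, one person inside the mixture, one outside.
rng = random.Random(0)
freqs = [rng.uniform(0.1, 0.9) for _ in range(2000)]

def draw_genotype(p):
    # Genotype as an A-allele frequency: 0 (BB), 0.5 (AB) or 1 (AA).
    return (int(rng.random() < p) + int(rng.random() < p)) / 2.0

person_in = [draw_genotype(p) for p in freqs]
person_out = [draw_genotype(p) for p in freqs]
# The mixture is 90% population background plus 10% of person_in.
mixture = [0.9 * p + 0.1 * y for p, y in zip(freqs, person_in)]

t_in = membership_statistic(person_in, freqs, mixture)
t_out = membership_statistic(person_out, freqs, mixture)
print(f"T for the contributor: {t_in:.1f}")
print(f"T for the outsider:    {t_out:.1f}")
```

Each SNP contributes only a tiny shift, but summing over thousands of SNPs makes the contributor's statistic strongly positive, which is the intuition behind detecting trace contributions.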

37 (Homer, p.4)

38 DNA Detective - Results
Accurate findings.
Determined whether a trace amount (<1%) of an individual's DNA is present in a DNA mixture.
Tested with different kinds of mixtures built from publicly available data.

39 DNA Detective - Implications
Forensics application.
Traceability.
Leak of private information.
Public data from many studies: summary statistics of allele frequency.
Political implications.
How to share the data now?

40 Thank You!

41 References
Korn J, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature genetics Oct;40(10):
Homer N, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet Aug 29;4(8):e
Liu Y, DPhil, Prchal F. SNP-Chip-Based Genome-Wide Analysis of Genetic Alterations in Hematologic Disorders: The Way Forward?. The Hematologist. 2007
Murray, E. IST 341 Issues in Human Genetics. 0Genetics%20MTHFR%20SNP%20Page.html

