Methods in genome wide association studies. Norú Moreno

Presentation transcript:

Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou

Agenda: GWA; Polymorphisms; HapMap Project; Genotyping chip; Integrating CNVs and SNPs; Imputation; Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays

Genome-wide Association Study (GWA study or GWAS) Completion of the Human Genome Project in 2003 Examination of genetic variation across a given genome. Objective: Identify genetic associations with observable traits

GWAS Scan SNPs across many individuals to associate alleles with a particular disease Use a detected association to detect, treat and prevent the disease Pharmacogenomics.
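
As an illustration of how a single SNP can be tested for association with a disease, the sketch below runs a Pearson chi-square test on a 2x2 table of allele counts in cases and controls. This is a generic textbook test, not a method taken from the slides; the function name and the example counts are hypothetical.

```python
import math

def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Assumes every row and column total is non-zero.
    """
    table = [[case_a, case_b], [control_a, control_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + control_a, case_b + control_b]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: allele A is enriched in cases.
chi2, p = allelic_chi_square(case_a=60, case_b=40, control_a=40, control_b=60)
print(f"chi2={chi2:.2f}, p={p:.4f}")
```

In a real scan this test would be repeated for every SNP, so the p-value threshold has to be corrected for the number of tests.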

Polymorphisms A specific sequence variation that some individuals possess Some variations are common, others are rare Examples: Blood types Height Skin Color Etc…

Types of polymorphisms 1. Copy Number Variation (CNV) Segments of DNA that are found in different numbers of copies among individuals. Substantial regions, not single nucleotides. A B C A C A B B B C

Types of polymorphisms Single Nucleotide Polymorphism (SNP) (Murray 2007)

HapMap Two unrelated people share about 99.5% of their DNA sequence. HapMap focuses only on common SNPs: those present in at least 1% of the population. 269 individuals, ~4M SNPs. Genotyped the individuals for these SNPs and published the results.

Genotyping chip ACTGGGCTAATCGATCGACTAGCTAGCTAGTCTCGATCAAT ACTGGGCTAA Probes

Genotyping chip (Liu 2007) (Affymetrix)

Genotyping chip (Affymetrix)

Genotyping chip [Plot of allele A intensity against allele B intensity, with genotype clusters BB (0), AB (0.5) and AA (1)]

Genotyping chip Affymetrix 100k chip set Entire genome with 100 000 SNPs (low density). Affymetrix 500k chip (SNP array 5.0) Entire genome with 500 000 SNPs (high density) Affymetrix 1M chip (SNP array 6.0) Entire genome with 1 000 000 SNPs (very high density)

Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs (Birdsuite) Korn, et al.

Birdsuite Takes both CNVs and SNPs into account. Input: raw data from the genotyping chip. Output: integrated CNV and SNP genotype per locus. CNVs and SNPs coexist. Both common and rare variants are needed to understand the role of genetic variation in disease.

Birdsuite New genotypes: A-null, AAAB, BBBB. Combines SNPs (AA, AB, BB) with CNPs.

Birdsuite – 4 Stages Canary - ‘Genotypes’ common copy-number polymorphisms (CNPs). Birdseed - Genotypes SNPs using the classical AA, AB, and BB genotypes. Birdseye - Identifies rare CNVs via HMMs. Fawkes - Integrates CNV information to produce mutually consistent SNP genotypes (i.e. including genotypes such as A-null and AAB).

Birdsuite - Canary Determines the copy number of each individual at each predefined CNP locus. CNP = copy number polymorphism: a CNV with >1% frequency in the population. Example (A B B B C): locus A, 1 copy; locus B, 3 copies; locus C, 1 copy.

Canary (Korn, p.1255)

Birdsuite - Birdseed We expect only AA, AB or BB. From Canary, use only regions with copy number 2 (no fewer or extra copies). Use HapMap as a prior model to represent the expected allele intensity for each genotype. Algorithm based on expectation-maximization to determine the AA, AB and BB clusters per SNP. Gives a score reflecting the confidence of each call. Result: has been used to genotype over 50,000 samples at the Broad Institute with an average call rate > 99%. [Figure: AA, AB and BB intensity clusters] (Korn, p.1257)
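
The expectation-maximization idea can be sketched in one dimension. The code below is a simplified illustration, not Birdseed itself: it clusters the A-allele ratio of each sample for one SNP into BB, AB and AA, starting from assumed prior cluster centres (0, 0.5, 1) that stand in for the HapMap-derived priors. The function names, the starting spread and the variance floor are all assumptions.

```python
import math

GENOTYPES = ("BB", "AB", "AA")  # clusters ordered by expected A-allele ratio

def _responsibilities(x, weights, means, sd):
    """Posterior probability of each genotype cluster for one ratio x."""
    dens = [w * math.exp(-0.5 * ((x - m) / sd) ** 2)
            for w, m in zip(weights, means)]
    total = sum(dens)
    if total == 0.0:  # numerical underflow: fall back to "no information"
        return [1.0 / len(dens)] * len(dens)
    return [d / total for d in dens]

def em_genotype_calls(ratios, prior_means=(0.0, 0.5, 1.0), n_iter=50):
    """Fit a 3-component Gaussian mixture by EM; return (genotype, confidence)."""
    means = list(prior_means)
    weights = [1.0 / 3.0] * 3
    sd = 0.15  # assumed starting spread, shared by all clusters
    n = len(ratios)
    for _ in range(n_iter):
        # E-step: soft assignment of every sample to every cluster
        resp = [_responsibilities(x, weights, means, sd) for x in ratios]
        # M-step: re-estimate cluster weights, centres and the shared spread
        for k in range(3):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            if nk > 1e-9:  # keep the prior centre for an empty cluster
                means[k] = sum(r[k] * x for r, x in zip(resp, ratios)) / nk
        var = sum(r[k] * (x - means[k]) ** 2
                  for r, x in zip(resp, ratios) for k in range(3)) / n
        sd = max(math.sqrt(var), 0.02)  # floor avoids a degenerate zero variance
    calls = []
    for x in ratios:
        post = _responsibilities(x, weights, means, sd)
        best = max(range(3), key=lambda k: post[k])
        calls.append((GENOTYPES[best], post[best]))
    return calls

# Hypothetical A-allele ratios for nine samples at one SNP.
samples = [0.02, 0.05, 0.03, 0.48, 0.52, 0.50, 0.97, 0.95, 0.99]
for ratio, (genotype, confidence) in zip(samples, em_genotype_calls(samples)):
    print(ratio, genotype, round(confidence, 3))
```

The posterior probability of the winning cluster plays the role of the confidence score mentioned on the slide.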

Birdsuite - Birdseye Using Canary and Birdseed: Identify rare and de novo CNVs. Small number of real CNVs at unknown sites. Search for consistent evidence of copy number variation across multiple neighboring probes. Implements an HMM-based algorithm to find strong, consistent evidence for altered copy number states.

Birdsuite - Birdseye HMM to find regions of variable copy number in a sample. Hidden state: The true copy number of the individual’s genome. Observed states: The normalized intensity measurements of each probe on the array.
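
A minimal Viterbi sketch of this HMM follows. It is not the Birdseye implementation: the Gaussian emission model, the assumption that expected intensity grows linearly with copy number (`unit` per copy), and the transition probabilities are all illustrative choices.

```python
import math

def viterbi_copy_number(intensities, states=(0, 1, 2, 3, 4),
                        stay_prob=0.99, sd=0.15, unit=0.5):
    """Most likely copy-number path for a run of neighboring probes.

    Hidden state: copy number. Observation: normalized probe intensity,
    assumed Gaussian around unit * copy_number with standard deviation sd.
    """
    n_states = len(states)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))

    def log_emit(x, cn):
        # Gaussian log-density, dropping the constant shared by all states
        return -0.5 * ((x - unit * cn) / sd) ** 2

    def log_trans(i, j):
        return log_stay if i == j else log_switch

    # Uniform start distribution (a constant, so it can be omitted)
    score = [log_emit(intensities[0], cn) for cn in states]
    back = []
    for x in intensities[1:]:
        new_score, pointers = [], []
        for j, cn in enumerate(states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best_i] + log_trans(best_i, j)
                             + log_emit(x, cn))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[i] for i in path]

# Hypothetical probes: normal (2 copies) with a four-probe one-copy deletion.
probes = [1.0, 1.05, 0.95, 0.5, 0.45, 0.55, 0.5, 1.0, 1.02, 0.98]
print(viterbi_copy_number(probes))
```

Because leaving the current state is penalized, a single noisy probe is not enough to call a CNV; several neighboring probes must agree, which matches the "consistent evidence" requirement above.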

Birdsuite - Fawkes Merge all the results. Show the CNVs within each SNP. Utilize the imputed locations (in A/B intensity space) of copy-variable clusters. Assign an allele-specific copy number genotype at each SNP. (e.g. AAB, ABBB, A or B)

Fawkes (Korn, p. 1254,1257)

(Affymetrix website screenshot)

Imputation Dealing with missing data points by filling in values. In SNPs: T A G G T ? T G C C T A G C G T Why? Cost-saving Avoid re-genotyping Keep effective sample size SNP comparisons between existing platforms.

Imputation High rate of occurrence. ‘Direct’ imputation. T A G G T ? T G C C T A G C G T T A G G T A T G C C T A G C G T
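
One way to read "direct" imputation is to fill each missing call with the allele seen most often at that position in the other samples. The sketch below assumes that reading; the function name and the reference sequences are hypothetical, and only the first sequence comes from the slide.

```python
from collections import Counter

def impute_most_frequent(sequences, missing="?"):
    """Fill missing calls with the most common allele at the same position."""
    n_snps = len(sequences[0])
    fill = []
    for pos in range(n_snps):
        observed = [s[pos] for s in sequences if s[pos] != missing]
        # If no sample has a call at this position, leave it missing
        fill.append(Counter(observed).most_common(1)[0][0] if observed else missing)
    return ["".join(fill[pos] if allele == missing else allele
                    for pos, allele in enumerate(s))
            for s in sequences]

panel = [
    "TAGGT?TGCCTAGCGT",  # sample with one missing call (from the slide)
    "TAGGTATGCCTAGCGT",  # hypothetical additional samples
    "TAGGTATGCCTAGCGT",
    "TAGGTCTGCCTAGCGT",
]
print(impute_most_frequent(panel)[0])
```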

Imputation Linkage disequilibrium Non-random association of alleles at two or more loci. LD SNP of interest
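
Linkage disequilibrium between two loci is commonly summarized by D = p(AB) - p(A)p(B) and by r² = D² / (p(A)(1 - p(A))p(B)(1 - p(B))). The sketch below computes both from phased two-locus haplotypes; the function name and the example haplotypes are hypothetical.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """Return (D, r_squared) for two loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) pairs.
    """
    n = len(haplotypes)
    p_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    p_ab = sum(1 for h in haplotypes
               if h[0] == allele_a and h[1] == allele_b) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either locus is monomorphic
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2

# Alleles always travel together: perfect LD (r^2 = 1).
linked = [("A", "G")] * 5 + [("T", "C")] * 5
print(linkage_disequilibrium(linked, "A", "G"))
```

When r² between a typed SNP and an untyped SNP of interest is high, the typed SNP carries most of the information needed to impute the untyped one.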

Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays Homer, et al.

The DNA Detective Is an individual genome present in a DNA mixture? Mixed DNA // Population Query

DNA Detective We have: Different laboratories reach different conclusions. Usually not accurate at all. Hard, and cannot be automated.

DNA Detective - Methodology Summary: Cumulative sum of allele shifts over all available SNPs. The sign of the shift indicates whether the individual of interest is closer to a reference sample or closer to a given mixture. First genotype a single SNP for a single person, then adapt it to all mixtures and pooled data.

DNA Detective – Single SNP, Single person Raw preprocessed data > allele intensity (how much of A and how much of B we have). Transform the normalized data into a ratio. Yi is the estimate of allele frequency: BB ~0, AB ~0.5, AA ~1.
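
A minimal sketch of this transformation, assuming the ratio is simply A / (A + B) and ignoring any platform-specific correction. The function names and intensity values are hypothetical.

```python
def allele_frequency_estimate(intensity_a, intensity_b):
    """Estimate the A-allele frequency Y from normalized probe intensities."""
    total = intensity_a + intensity_b
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return intensity_a / total

def nearest_genotype(y):
    """Map a ratio to the closest expected value: BB ~0, AB ~0.5, AA ~1."""
    centers = {"BB": 0.0, "AB": 0.5, "AA": 1.0}
    return min(centers, key=lambda g: abs(y - centers[g]))

y = allele_frequency_estimate(900.0, 100.0)
print(y, nearest_genotype(y))
```

For a single person the ratio falls near one of three values, but for a mixture of many people it can take any value between 0 and 1, which is what makes it usable as an allele frequency estimate.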

DNA Detective - Methodology Use relative probe intensity data. Compare allele frequency estimates from the mixture (M) with those from a reference population (Pop). Assume the reference population (Pop) has ancestral components similar to, and interchangeable with, those of the mixture.

DNA Detective - Methodology Distance measure for individual Yi

DNA Detective - Methodology Null hypothesis: the individual is not in the mixture, D(Yi,j) ~ 0. Alternative hypotheses: D(Yi,j) > 0, Yi,j is more similar to M than to Pop; D(Yi,j) < 0, Yi,j is more ancestrally similar to Pop than to M.
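
The distance measure and test can be sketched as follows, using the per-SNP distance D(Yi,j) = |Yi,j - Popj| - |Yi,j - Mj| described by Homer et al. and a one-sample t-statistic over all SNPs. The function names and the four-SNP example values are hypothetical, and a real analysis would use hundreds of thousands of SNPs.

```python
import math

def distance(y, mixture, population):
    """Per-SNP D = |Y_j - Pop_j| - |Y_j - M_j|.

    Positive values mean the individual is closer to the mixture M,
    negative values mean the individual is closer to the reference Pop.
    """
    return [abs(yj - pj) - abs(yj - mj)
            for yj, mj, pj in zip(y, mixture, population)]

def t_statistic(d_values, mu0=0.0):
    """One-sample t-statistic of the mean distance against mu0.

    Assumes at least two SNPs with non-identical distances.
    """
    s = len(d_values)
    mean = sum(d_values) / s
    var = sum((d - mean) ** 2 for d in d_values) / (s - 1)
    return (mean - mu0) / math.sqrt(var / s)

# Hypothetical allele frequency estimates at four SNPs.
individual = [1.0, 0.5, 0.0, 1.0]
mixture = [0.9, 0.5, 0.2, 0.8]
population = [0.5, 0.4, 0.5, 0.5]

d = distance(individual, mixture, population)
print(d, t_statistic(d))
```

A large positive statistic is evidence against the null hypothesis, suggesting the individual contributed to the mixture.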

(Homer, p.4)

DNA Detective - Results Accurate findings. Determined whether a trace amount (<1%) of DNA is present in a DNA mixture. Tested with different kinds of mixtures built from publicly available data.

DNA Detective - Implications Forensics application. Traceability Leak of privacy information. Public data from many studies. Summary statistics of Allele Frequency. Political implications. How to share the data now?

Thank You!

References
Korn J, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics. 2008 Oct;40(10):1253-60.
Homer N, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008 Aug 29;4(8):e1000167.
Liu Y, DPhil, Prchal F. SNP-Chip-Based Genome-Wide Analysis of Genetic Alterations in Hematologic Disorders: The Way Forward? The Hematologist. 2007.
Murray, E. IST 341 Issues in Human Genetics. http://www.science.marshall.edu/murraye/341/snps/Human%20Genetics%20MTHFR%20SNP%20Page.html