Genome-Wide Association Studies Using SNPs


The three GBS analysis steps
1. Data processing
   A) Raw reads
   B) Read cleaning and quality control
2. Read mapping
   A) Aligning reads to the reference genome
   B) Selecting the best hit from multi-mapped reads
   C) BAM conversion
3. Variant discovery
   A) SNP/indel mining
   B) SNP/indel filtering
   C) Practical applications
      i) Diversity analysis
      ii) GWAS

Experimental design
Sample size: Aim for the largest sample size that your budget allows and that you can phenotype.
Rare alleles: If the trait of interest is associated with a rare allele in the population, you will need a much larger sample to detect effects at these SNPs.
Coverage: GWAS relies on the marker set being dense enough that some SNP is in complete, or nearly complete, linkage disequilibrium with the causative variant.

Statistical methods for GWAS
GWAS analyses exploit linkage disequilibrium (LD). LD is a population-level association between markers and a quantitative trait locus (QTL). These associations arise because small chromosome segments in the current population descend from the same common ancestor. Segments inherited from that ancestor without recombination carry identical marker alleles or haplotypes. A number of methodologies exploit these associations.
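The strength of LD between a marker and a second locus is commonly summarised by D and r². The slide does not name these measures, so the sketch below is an illustration under that assumption; the function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at the marker
          and allele B at the second locus (for example the QTL)
    p_a:  frequency of allele A
    p_b:  frequency of allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D is the excess of the AB haplotype over what independence predicts.
    d = p_ab - p_a * p_b
    # r^2 scales D^2 by the allele frequencies so that it lies in [0, 1].
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies: allele A always travels with allele B,
# so the marker is a perfect proxy for the second locus.
d_full, r2_full = ld_measures(p_ab=0.3, p_a=0.3, p_b=0.3)

# Hypothetical frequencies: the two loci are independent (linkage equilibrium).
d_none, r2_none = ld_measures(p_ab=0.09, p_a=0.3, p_b=0.3)
```

An r² close to 1 means the marker captures almost all the information at the second locus; an r² close to 0 means an association test at the marker says little about it.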

Binary trait (for example, disease state present or absent)
Odds ratio, Pearson chi-square test, Fisher exact test, correlation, trend test
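As a minimal sketch of three of these tests, assume the data for one SNP are summarised as a 2x2 table of allele carriers versus non-carriers in cases and controls. The function names and the counts are hypothetical, and the chi-square test is computed without a continuity correction.

```python
import math


def odds_ratio(a, b, c, d):
    """a, b = cases with/without the allele; c, d = controls with/without."""
    return (a * d) / (b * c)


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction) and P value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value: sum of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k carriers among the cases.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Hypothetical counts: 60 of 100 cases and 40 of 100 controls carry the allele.
or_est = odds_ratio(60, 40, 40, 60)
chi2, p_chi2 = pearson_chi_square(60, 40, 40, 60)
p_fisher = fisher_exact_two_sided(60, 40, 40, 60)
```

The Fisher exact test is usually preferred when some cell counts are small, where the chi-square approximation becomes unreliable.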

Quantitative trait
Univariate models: t-test, Wilcoxon rank-sum test, ANOVA, Kruskal-Wallis test
Multivariate models: generalized linear model, mixed model

Testing a marker on a trait
Null hypothesis: the marker has no effect on the trait.
Alternative hypothesis: the marker does affect the trait.
Reject the null hypothesis using the F-statistic.
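One simple way to obtain an F-statistic for a single marker is a one-way ANOVA of the phenotype across genotype classes. The sketch below assumes genotypes coded 0, 1, 2 (copies of one allele) and uses made-up phenotype values; the function name is hypothetical.

```python
def marker_f_statistic(genotypes, phenotypes):
    """One-way ANOVA F-statistic for phenotype grouped by genotype class.

    Returns (F, df_between, df_within).
    """
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)

    n, k = len(phenotypes), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two genotype classes and n > classes")

    grand_mean = sum(phenotypes) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

    # Variation explained by genotype class versus variation left within classes.
    ss_between = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    if ss_within == 0:
        raise ValueError("no within-class variation; F is undefined")

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: three individuals per genotype class.
genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
phenotypes = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0]
f_stat, df1, df2 = marker_f_statistic(genotypes, phenotypes)
```

The null hypothesis is rejected when the F-statistic exceeds the critical value of the F distribution with (df_between, df_within) degrees of freedom at the chosen alpha; computing that tail probability needs a statistics library and is left out here.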

Choice of significance level (alpha)
What value of alpha should be used?
Bonferroni: The Bonferroni correction is an adjustment made to P values when several dependent or independent statistical tests are performed simultaneously on a single data set. To perform a Bonferroni correction, divide the critical P value (α) by the number of comparisons being made.
Permutation testing: Sets an appropriate significance threshold under multiple testing by repeatedly shuffling the phenotypes and recording the most extreme test statistic obtained by chance.
False discovery rate (FDR): The FDR is the expected proportion of detected QTL that are in fact false positives.
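The Bonferroni rule described above, and one common way of controlling the FDR, can be sketched as follows. The slide does not say which FDR procedure is meant; the Benjamini-Hochberg step-up procedure is assumed here, and the function names and P values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests declared significant by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P value is below its step-up threshold.
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Made-up P values for five markers.
p_values = [0.001, 0.012, 0.02, 0.03, 0.6]

threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p <= threshold]
fdr_hits = benjamini_hochberg(p_values, fdr=0.05)
```

On these values Bonferroni keeps only the first marker while the FDR procedure keeps the first four, which illustrates why FDR control is the less conservative of the two.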

Population structure and confounding
Care needs to be taken to avoid spurious or inflated associations due to population structure. The main causes of confounding in GWAS are:
a) Population structure, or the existence of major subpopulations within a population.
b) Cryptic relatedness, i.e. the existence of small groups (often pairs) of highly related individuals.
c) Environmental differences between subpopulations or geographical locations.
d) Differences in allele call rates between subpopulations.

Avoiding false positives due to population structure
Population structure that is not accounted for in the analysis will result in false-positive associations in GWAS. One way to avoid this is to remove the effect of population structure by fitting a model that includes it.

Controlling population structure
Genomic control: the chi-square test statistics are rescaled by an inflation factor estimated from their genome-wide distribution.
Structured association: a Bayesian population model assigns individuals to populations, and population membership is then fitted as covariates.
Regression on a set of markers: a set of fairly widely spaced markers covering the genome is used as covariates.
Principal components: a number of components, e.g. 10, are used as covariates.
Mixed model method: a set of random effects is fitted for each individual, with covariance based on an estimated kinship matrix or a known pedigree.
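Of the methods listed, genomic control is the simplest to illustrate. A minimal sketch, assuming 1-df chi-square statistics and the usual median-based estimate of the inflation factor; the function names and the statistics are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Lambda: observed median statistic divided by the median expected under the null."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide every statistic by lambda; a lambda below 1 is treated as 1 (no adjustment)."""
    lam = max(1.0, genomic_inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]


# Made-up statistics whose median is twice the null median, so lambda is about 2.
chi2_stats = [0.2, 0.9098728, 3.0]
lam = genomic_inflation_factor(chi2_stats)
adjusted = genomic_control(chi2_stats)
```

A lambda close to 1 suggests little inflation; values well above 1 point to unaccounted structure or relatedness. In a real analysis lambda would be estimated from many thousands of SNPs, not three.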

TASSEL: General Linear Model (GLM)
TASSEL utilizes a fixed effects linear model to test for association between segregating sites and phenotypes. The analysis optionally accounts for population structure using covariates that indicate degree of membership in underlying populations. A main-effects-only model is automatically built using all variables in the input data. A separate model is built and solved for each trait and marker combination.

TASSEL: Mixed Linear Model (MLM)
This analysis tests for association using a mixed linear model (MLM). A mixed model is one that includes both fixed and random effects. Including random effects gives the MLM the ability to incorporate information about relationships among individuals. When a marker-based kinship matrix (K) is used jointly with population structure (Q), the "Q+K" approach improves statistical power.

QQ plot explanation
The QQ plot shows the expected distribution of association test statistics (X-axis) across the SNPs compared to the observed values (Y-axis). A deviation from the X=Y line along most of its length implies a consistent difference between cases and controls across the whole genome, suggesting a bias (false-positive associations). A clean QQ plot should show a solid line matching X=Y until it sharply curves upward at the end (representing the small number of true associations among thousands of unassociated SNPs).
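The coordinates of such a plot can be computed directly from the P values. The sketch below assumes the common convention of plotting -log10(P) and of taking (i - 0.5)/n as the expected i-th smallest P value under the null; the function name and inputs are hypothetical, and drawing the plot is left to a plotting library.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, sorted from smallest to largest."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, P values are uniform, so the i-th smallest is about (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Made-up P values that follow the null exactly: every point falls on the X=Y line.
null_p = [(i - 0.5) / 10 for i in range(1, 11)]
null_points = qq_points(null_p)

# Replacing the smallest P value with a strong signal lifts only the last point.
signal_p = [1e-8] + null_p[1:]
signal_points = qq_points(signal_p)
```

In the second example only the final point rises above the line, which is the clean pattern described above; inflation from population structure would instead lift the points along most of the line.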

Manhattan plot