Disease Models and Association Statistics
Nicolas Widman
CS 224 - Computational Genetics

1 Disease Models and Association Statistics
Nicolas Widman
CS 224 - Computational Genetics

2 Introduction
Certain SNPs within genes may be associated with a disease phenotype.
The statistical model used in class considers inheritance of only a single copy of a SNP location: the Single Chromosome Model.
This project expands the statistic to a diploid model and takes into account different expression patterns of a SNP.

3 Basic Statistic- Haploid Model
γ: Relative risk
p_A: Probability of the disease-associated allele
F: Disease prevalence (for this project, F is assumed to be very small)
+/-: Disease state
Derivation of case (p^+) and control (p^-) frequencies:
P(A) = p_A
p_A^+ = P(A|+)
p_A^- = P(A|-)
F = P(+)
P(A|+) = P(+|A) P(A) / P(+)
P(+|A) = γ P(+|¬A)

4 Derivation- Continued (γ: relative risk)
P(+) = F = p_A P(+|A) + (1 - p_A) P(+|¬A)
P(+) = F = p_A P(+|A) + (1 - p_A) P(+|A)/γ
P(+) = F = P(+|A) (p_A + (1 - p_A)/γ) = P(+|A) (p_A(γ - 1) + 1)/γ
P(+|A) = γF / (p_A(γ - 1) + 1)
P(A|+) = P(+|A) P(A) / P(+) = P(+|A) p_A / F = γ p_A / (p_A(γ - 1) + 1)
P(-|A) = 1 - P(+|A) = 1 - γF / (p_A(γ - 1) + 1)
P(A|-) = P(-|A) P(A) / P(-)
If F is small, then P(-) = 1 - F ≈ 1 and P(-|A) ≈ 1, so P(A|-) ≈ P(A) = p_A
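The end result of the derivation above can be sketched as a small Python function. The function name `case_control_frequencies` and the parameter name `gamma` (for the relative risk) are illustrative choices, not names from the slides, and the control frequency relies on the slides' assumption that the prevalence F is close to 0.

```python
def case_control_frequencies(p_a: float, gamma: float) -> tuple[float, float]:
    """Return (p_plus, p_minus): allele frequency in cases and in controls.

    Assumes a very small disease prevalence F, so the control frequency
    is approximated by the population frequency p_a.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be a probability in [0, 1]")
    if gamma <= 0.0:
        raise ValueError("gamma (relative risk) must be positive")
    # P(A|+) = gamma * p_A / (p_A * (gamma - 1) + 1)
    p_plus = gamma * p_a / (p_a * (gamma - 1.0) + 1.0)
    # P(A|-) is approximately p_A when F is close to 0
    p_minus = p_a
    return p_plus, p_minus


if __name__ == "__main__":
    p_plus, p_minus = case_control_frequencies(p_a=0.2, gamma=2.0)
    print(f"cases: {p_plus:.4f}, controls: {p_minus:.4f}")
```

With a relative risk of 1 the two frequencies coincide, which is a quick sanity check on the formula.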

5 Haploid Model
The relative risk formula: p_A^+ = γ p_A / (p_A(γ - 1) + 1)
Association Power: [formula not preserved in this transcript]
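The transcript does not give the power formula itself, so the following is only a sketch under stated assumptions: a two-sided test at significance level `alpha`, a balanced design with `n_per_group` observations in each of the case and control groups, and the usual normal approximation for the difference of two proportions. The function name and all parameter names are hypothetical, and the values it produces are not guaranteed to match the plots in the original slides.

```python
from math import sqrt
from statistics import NormalDist


def association_power(p_plus: float, p_minus: float,
                      n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided case-control association test.

    Assumption (not from the slides): the statistic is the difference in
    allele frequency divided by a pooled standard error, treated as normal.
    """
    if n_per_group <= 0:
        raise ValueError("n_per_group must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    p_bar = (p_plus + p_minus) / 2.0
    if not 0.0 < p_bar < 1.0:
        raise ValueError("average frequency must be strictly between 0 and 1")
    std_normal = NormalDist()
    # Pooled standard error of the frequency difference
    std_err = sqrt(2.0 * p_bar * (1.0 - p_bar) / n_per_group)
    # Mean of the test statistic under the alternative
    shift = (p_plus - p_minus) / std_err
    # Lower critical value; negative for alpha < 1
    z_crit = std_normal.inv_cdf(alpha / 2.0)
    # Probability of landing in either rejection tail
    return std_normal.cdf(z_crit + shift) + std_normal.cdf(z_crit - shift)
```

When the case and control frequencies are equal, this expression reduces to `alpha`, as expected for a test with no true effect.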

6 Assumptions
Low disease prevalence: F ≈ 0, which allows p_A^- ≈ p_A
Uses the Hardy-Weinberg Principle (A- Major Allele, a- Minor Allele):
P(AA) = P(A)^2
P(Aa) = 2*P(A)*(1-P(A))
P(aa) = (1-P(A))^2
Uses a balanced case-control study
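As a minimal illustration of the Hardy-Weinberg step, the genotype probabilities can be computed from the major allele frequency. The function name `hardy_weinberg` and the dictionary keys are illustrative choices, not part of the original slides.

```python
def hardy_weinberg(p_major: float) -> dict[str, float]:
    """Genotype probabilities under Hardy-Weinberg for major allele A."""
    if not 0.0 <= p_major <= 1.0:
        raise ValueError("p_major must be a probability in [0, 1]")
    q_minor = 1.0 - p_major
    return {
        "AA": p_major * p_major,        # homozygous major
        "Aa": 2.0 * p_major * q_minor,  # heterozygous (Aa or aA)
        "aa": q_minor * q_minor,        # homozygous minor
    }


if __name__ == "__main__":
    for genotype, prob in hardy_weinberg(0.7).items():
        print(genotype, round(prob, 4))
```

The three probabilities always sum to 1, since they expand (P(A) + (1-P(A)))^2.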

7 Diploid Disease Models
When inheriting two copies of a SNP site, there are three common relationships between major and minor alleles:
Dominant: Particular phenotype requires one major allele
Recessive: Particular phenotype requires both minor alleles
Additive: Particular phenotype varies based on whether there are one or two major alleles

8 Diploid Disease Models
AA- Homozygous major
Aa, aA- Heterozygous
aa- Homozygous minor

9 Modifying the Calculation for Relative Risk
The previous relative risk formula only considered the haploid case of having a SNP or not having a SNP.
Approach: Create a virtual SNP which replaces p_A in the formula.

10 Virtual SNPs
Use the Hardy-Weinberg Principle to calculate a new p_A, the virtual SNP, using the characteristics of the diploid disease models.
Recessive: p_A = p_d*p_d
Dominant: p_A = p_d*p_d + 2*p_d*(1-p_d)
Additive: p_A = p_d*p_d + c*p_d*(1-p_d)
p_d: Probability of the disease-associated allele.
In the calculations used to determine the association power, c was set to sqrt(2).
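The three virtual-SNP formulas above translate directly into code. The sketch below uses a hypothetical function name, `virtual_snp_frequency`, and string labels for the models; only the formulas and the default c = sqrt(2) come from the slide.

```python
from math import sqrt


def virtual_snp_frequency(p_d: float, model: str, c: float = sqrt(2.0)) -> float:
    """Virtual SNP frequency p_A for a diploid disease model.

    p_d is the probability of the disease-associated allele; c weights the
    heterozygous term in the additive model (sqrt(2) in the slides).
    """
    if not 0.0 <= p_d <= 1.0:
        raise ValueError("p_d must be a probability in [0, 1]")
    both_copies = p_d * p_d               # p_d * p_d
    mixed_term = p_d * (1.0 - p_d)        # p_d * (1 - p_d)
    if model == "recessive":
        return both_copies
    if model == "dominant":
        return both_copies + 2.0 * mixed_term
    if model == "additive":
        return both_copies + c * mixed_term
    raise ValueError(f"unknown model: {model!r}")


if __name__ == "__main__":
    for name in ("recessive", "additive", "dominant"):
        print(name, round(virtual_snp_frequency(0.2, name), 4))
```

The returned value is what replaces p_A in the haploid relative risk formula. For any c between 0 and 2, the additive value lies between the recessive and dominant values.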

11 Diploid Disease Models: relative risk γ=1.5 [plots not preserved]

14 Diploid Disease Models: relative risk γ=2 [plots not preserved]

17 Diploid Disease Models: relative risk γ=3 [plots not preserved]

18 Results
Achieving significant association power with low relative risk SNPs (relative risk γ=1.5):
A minimum of 200 cases and 200 controls is required to reach 80% power within the strongest p_d intervals for each type of SNP.
At a sample size of 1000 cases and 1000 controls, dominant and additive SNPs show very high power for almost all SNP probabilities below 50%.
It is difficult to obtain significant association for low probability recessive SNPs regardless of sample size.

19 Results
SNP probability ranges for greatest association power:
Dominant: .10 - .30
Recessive: .45 - .70
Additive: .15 - .40
Higher relative risk SNPs require fewer cases and controls to achieve the same power.
As the relative risk γ approaches 1, the association power to detect a recessive allele with probability p is the same as the power to detect a dominant allele with probability 1-p.
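One way to see why the last statement is plausible is a short worked derivation. It is a sketch that assumes the virtual-SNP frequencies from the Virtual SNPs slide, the low-prevalence case frequency from the haploid derivation, and that power depends on the case-control frequency difference and on the virtual frequency only through the product q(1-q). The symbols q and \Delta are introduced here for illustration.

```latex
% Recessive allele with probability p gives virtual frequency q:
q = p^2
% Dominant allele with probability 1-p gives the complementary frequency:
(1-p)^2 + 2(1-p)p = 1 - p^2 = 1 - q
% Case-control difference for a virtual frequency q (controls \approx q):
\Delta(q) = \frac{\gamma q}{q(\gamma - 1) + 1} - q
          = \frac{q(1-q)(\gamma - 1)}{q(\gamma - 1) + 1}
% Same quantity for the complementary frequency 1-q:
\Delta(1-q) = \frac{q(1-q)(\gamma - 1)}{(1-q)(\gamma - 1) + 1}
% The numerators agree and both denominators tend to 1, so
\lim_{\gamma \to 1} \frac{\Delta(q)}{\Delta(1-q)} = 1
```

Both cases therefore have nearly the same frequency difference and the same q(1-q) when the relative risk is close to 1, which is consistent with the equal power stated on the slide.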

20 Results
Diseases with higher relative risk have their range of highest association power skewed toward lower probability SNPs.
Challenges in obtaining high association power:
Low probability recessive SNPs
Low relative risk diseases, especially with small sample sizes
High probability dominant SNPs; however, these are unlikely because of natural selection and because the majority of the population would be affected by such diseases.

