Published by Ibrahim Gyles. Modified over 3 years ago.

1 CIS: Compound Importance Sampling for Binding Site p-value Estimation
Yoseph Barash, Gal Elidan, Tommy Kaplan, Nir Friedman
The Hebrew University, Jerusalem, Israel

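2 Detecting Target Genes
Figure: a promoter upstream of a gene, with candidate locations marked "binding site?".
Probabilistic framework: a log-odds score over a k-letter sequence such as ACGTACGT (positions 1, 2, ..., k), where p[i,c] is the probability of letter c at position i.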
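The log-odds score named on this slide can be sketched as below. This is a minimal illustration under assumptions, not the authors' code: the 3-position matrix `PSSM`, the uniform `BACKGROUND`, the natural-log base, and the helper names `log_odds_score` and `best_window` are all hypothetical.

```python
import math

# Hypothetical 3-position PSSM: PSSM[i][c] plays the role of p[i,c].
PSSM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
# Assumed uniform background distribution over letters.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_score(seq, pssm=PSSM, background=BACKGROUND):
    """Sum over positions i of log(p[i, c_i] / background[c_i])."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must equal the motif length k")
    return sum(math.log(pssm[i][c] / background[c]) for i, c in enumerate(seq))


def best_window(promoter, pssm=PSSM, background=BACKGROUND):
    """Score every length-k window of a promoter; return (best score, start)."""
    k = len(pssm)
    return max(
        (log_odds_score(promoter[s:s + k], pssm, background), s)
        for s in range(len(promoter) - k + 1)
    )
```

Calling `best_window("TTACGTT")` scans the five length-3 windows and picks the one starting at index 2, where the sequence matches the toy motif's preferred letters.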

3 Detecting Target Genes (2)
Figure: candidate binding sites marked "?".

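4 p-value of Scores
Figure: probability distribution of scores (x-axis: score, y-axis: probability) with a score S marked; the p-value of S is the probability of scoring S or higher.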

5 Detecting Target Genes (3)
p-value score: universal, interpretable, controls the false-positive error rate.
Figure: scores (ticks 7, 9, 11, 13, 15) against p-values (ticks 10^-2 down to 10^-7), with a Bonferroni-corrected p-value of 0.01 marked.
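A short sketch of the Bonferroni correction mentioned on this slide. The function names are hypothetical, and the figure of 100,000 candidate sites used below is an assumption chosen only so that the arithmetic lands on the 10^-7 range shown on the slide's axis.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_corrected(p_value, n_tests):
    """Corrected p-value: the raw p-value times the number of tests, capped at 1."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)
```

With a corrected level of 0.01 and an assumed 100,000 candidate sites, `bonferroni_threshold(0.01, 100_000)` gives a per-site cutoff of about 10^-7, which is why very small raw p-values have to be estimated accurately.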

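6 p-value Estimation
Problem 1: naïve enumeration is infeasible, since #seq = 4^k.
Estimate the p-value by sampling from P_0: draw samples and compute their scores s_1, ..., s_n.
Figure: score distribution (x-axis: score, y-axis: probability) with the threshold S* marked.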
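The contrast between enumeration and sampling from P_0 can be sketched as follows. Everything here is a toy under stated assumptions: the 3-position `MOTIF`, the uniform `BACKGROUND`, the natural-log score, and the function names are hypothetical, and k = 3 is chosen only so that enumeration over 4^k sequences stays feasible for comparison.

```python
import itertools
import math
import random

LETTERS = "ACGT"
BACKGROUND = {c: 0.25 for c in LETTERS}  # assumed uniform background P_0
MOTIF = [  # hypothetical PSSM with k = 3
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def score(seq):
    """Log-odds score of seq under MOTIF versus BACKGROUND."""
    return sum(math.log(MOTIF[i][c] / BACKGROUND[c]) for i, c in enumerate(seq))


def exact_p_value(threshold):
    """Sum P_0 over all 4^k sequences scoring >= threshold (tiny k only)."""
    total = 0.0
    for letters in itertools.product(LETTERS, repeat=len(MOTIF)):
        if score(letters) >= threshold:
            total += math.prod(BACKGROUND[c] for c in letters)
    return total


def naive_p_value(threshold, n_samples, seed=0):
    """Fraction of n_samples draws from P_0 whose score is >= threshold."""
    rng = random.Random(seed)
    weights = [BACKGROUND[c] for c in LETTERS]
    hits = 0
    for _ in range(n_samples):
        seq = rng.choices(LETTERS, weights=weights, k=len(MOTIF))
        if score(seq) >= threshold:
            hits += 1
    return hits / n_samples
```

For this toy motif only ACG reaches the top score, so `exact_p_value(score("ACG"))` is (1/4)^3 = 1/64, and `naive_p_value` approaches that value as `n_samples` grows. The naive estimator works here only because the p-value is large; it returns 0 whenever no sample reaches the threshold.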

7 p-value Estimation
Problem 2: multiple hypothesis testing requires low p-values (10^-7).
Need ~10^7 attempts to get a sample with p-value < 10^-7.
Figure: score distribution (x-axis: score, y-axis: probability) with the threshold S* far in the tail.

8 Importance Sampling Approach
1. Cheat: sample from Q(s_1 ... s_k) to get high-scoring samples.
2. Get absolution: weigh each sample.
Figure: score distribution (x-axis: score, y-axis: probability) with S* marked; empirical p-value ~ 10^-8 with N ~ 10^4.
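The two steps above can be sketched as a plain importance-sampling estimator. This is an illustration under assumptions, not the CIS implementation: Q is simply taken to be the motif model itself, each sample is weighed by P_0(x)/Q(x), and the toy `MOTIF`, uniform `BACKGROUND`, and function names are hypothetical.

```python
import math
import random

LETTERS = "ACGT"
BACKGROUND = {c: 0.25 for c in LETTERS}  # assumed uniform background P_0
MOTIF = [  # hypothetical PSSM with k = 3, also used here as Q
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def score(seq):
    """Log-odds score of seq under MOTIF versus BACKGROUND."""
    return sum(math.log(MOTIF[i][c] / BACKGROUND[c]) for i, c in enumerate(seq))


def importance_p_value(threshold, n_samples, seed=0):
    """Estimate P_0(score >= threshold) from samples drawn from Q = MOTIF."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Step 1 ("cheat"): draw each position from the motif column.
        seq = [
            rng.choices(LETTERS, weights=[col[c] for c in LETTERS], k=1)[0]
            for col in MOTIF
        ]
        if score(seq) >= threshold:
            # Step 2 ("absolution"): weigh the sample by P_0(x) / Q(x).
            p0 = math.prod(BACKGROUND[c] for c in seq)
            q = math.prod(MOTIF[i][c] for i, c in enumerate(seq))
            total += p0 / q
    return total / n_samples
```

For this toy motif the true p-value of the top score is (1/4)^3 = 1/64, since only ACG reaches it. Sampling from Q hits that sequence about a third of the time, so far fewer samples are needed than when sampling from P_0.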

9 Importance Sampling: Why is this allowed?
x = subsequence.
Desired estimate: an expectation, under P_0, of a function of the log-odds score.
Sample from P_0(x) and count.
Multiply and divide by Q(x).
Sample from Q(x) and reweight.
How to choose Q?
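The steps on this slide can be written out as below. This is a reconstruction under assumptions, since the slide's own formulas are not preserved in this transcript: S(x) denotes the log-odds score of subsequence x, S* the score threshold, N the number of samples, and the desired estimate is taken to be the p-value, that is, the expectation under P_0 of the indicator that the score reaches S*.

```latex
\begin{align*}
p(S^*) &= \sum_x P_0(x)\,\mathbf{1}[S(x) \ge S^*] \\
       &= \sum_x Q(x)\,\frac{P_0(x)}{Q(x)}\,\mathbf{1}[S(x) \ge S^*] \\[6pt]
\hat{p}_{P_0} &= \frac{1}{N}\sum_{j=1}^{N} \mathbf{1}[S(x_j) \ge S^*],
  \qquad x_j \sim P_0 \\
\hat{p}_{Q} &= \frac{1}{N}\sum_{j=1}^{N} \frac{P_0(x_j)}{Q(x_j)}\,\mathbf{1}[S(x_j) \ge S^*],
  \qquad x_j \sim Q
\end{align*}
```

Both estimators have expectation p(S*), provided Q(x) > 0 for every x with P_0(x) > 0 and S(x) >= S*. They differ in variance, which is why the choice of Q matters.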

10 Choosing Sampling Distribution
Figure: score densities (x-axis: score, y-axis: density) for Q_1 = Background, Q_5, and Q_10 = Motif, with an under-sampled region marked.

11 Choosing Sampling Distribution
- Rescale
- Combine
Comprehensive coverage.
Figure: the combined sampling distribution (x-axis: score, y-axis: density), annotated with the mixing ratio.
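The slide names two steps, rescale and combine, but the transcript does not preserve how either is carried out. The sketch below is therefore one possible reading and not the authors' method: `rescale` uses a geometric interpolation between background and motif (an assumption), and the components are combined as a mixture with fixed mixing ratios. The toy `MOTIF`, uniform `BACKGROUND`, and all function names are hypothetical.

```python
import itertools
import math
import random

LETTERS = "ACGT"
BACKGROUND = {c: 0.25 for c in LETTERS}  # assumed uniform background P_0
MOTIF = [  # hypothetical PSSM with k = 3
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def rescale(motif, background, t):
    """Assumed rescaling: Q_t[i][c] proportional to p0[c]**(1-t) * p[i][c]**t."""
    rows = []
    for col in motif:
        raw = {c: background[c] ** (1 - t) * col[c] ** t for c in LETTERS}
        z = sum(raw.values())
        rows.append({c: v / z for c, v in raw.items()})
    return rows


def prob(seq, rows):
    """Probability of seq under a position-independent model."""
    return math.prod(rows[i][c] for i, c in enumerate(seq))


def score(seq):
    """Log-odds score of seq under MOTIF versus BACKGROUND."""
    return sum(math.log(MOTIF[i][c] / BACKGROUND[c]) for i, c in enumerate(seq))


def mixture_prob(seq, components, mix):
    """Q(x) for the combined distribution; mix is assumed to sum to 1."""
    return sum(w * prob(seq, comp) for w, comp in zip(mix, components))


def compound_p_value(threshold, components, mix, n_samples, seed=0):
    """Importance-sampling p-value estimate using the mixture as Q."""
    rng = random.Random(seed)
    p0_rows = [BACKGROUND] * len(MOTIF)
    total = 0.0
    for _ in range(n_samples):
        # Pick a component by its mixing ratio, then sample a sequence from it.
        rows = rng.choices(components, weights=mix, k=1)[0]
        seq = "".join(
            rng.choices(LETTERS, weights=[row[c] for c in LETTERS], k=1)[0]
            for row in rows
        )
        if score(seq) >= threshold:
            total += prob(seq, p0_rows) / mixture_prob(seq, components, mix)
    return total / n_samples


def all_sequences():
    """All 4^k sequences, for sanity checks on tiny k."""
    return ("".join(s) for s in itertools.product(LETTERS, repeat=len(MOTIF)))
```

A usage sketch: `components = [rescale(MOTIF, BACKGROUND, j / 9) for j in range(10)]` gives ten distributions running from the background (t = 0) to the motif (t = 1), and `compound_p_value(score("ACG"), components, [0.1] * 10, 20000)` estimates the toy p-value of 1/64. Because the background is one of the components, every weight P_0(x)/Q(x) is bounded by 1 over its mixing ratio.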

12 PSSM Example
Figure: p-value (y-axis, 0 to 6e-5) against score (x-axis, 10 to 22) for Naive (10 000 000 samples), MAST (Bailey et al. 98), Normal, and CIS (40 000 samples).
What if we want something else?

13 Beyond PSSM Models
- Dependency models: many possible variants, such as trees, mixtures of PSSMs, mixtures of trees, etc.
- Suggested by several recent papers: Barash et al. (2003), King & Roth (2003), Zhou & Liu (2004), …
- Main point: capture dependencies between binding site positions to improve site predictions.
Tree example (figure): a tree over positions X1, X2, X3, X4, X5.
Challenge: compute p-values for general models.
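To make the idea of a dependency model concrete, here is a minimal sketch of a tree model over five positions and its log-odds score. The tree shape in `PARENT`, the root distribution, and the conditional probabilities are all hypothetical; the slide's actual tree structure is not recoverable from this transcript.

```python
import itertools
import math

LETTERS = "ACGT"
BACKGROUND = {c: 0.25 for c in LETTERS}  # assumed uniform background P_0
# Hypothetical tree over X1..X5 (0-based): X1 is the root, X2 and X3 are
# its children, X4 and X5 are children of X2.
PARENT = [None, 0, 0, 1, 1]
ROOT = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}  # hypothetical P(X1)


def conditional(child, parent):
    """Hypothetical P(child | parent): copy the parent's letter with prob 0.7."""
    return 0.7 if child == parent else 0.1


def tree_prob(seq):
    """P(x) = P(x_root) * product over other positions of P(x_i | x_parent(i))."""
    p = 1.0
    for i, c in enumerate(seq):
        if PARENT[i] is None:
            p *= ROOT[c]
        else:
            p *= conditional(c, seq[PARENT[i]])
    return p


def tree_log_odds(seq):
    """Log-odds score of seq under the tree model versus the background."""
    p0 = math.prod(BACKGROUND[c] for c in seq)
    return math.log(tree_prob(seq) / p0)
```

Unlike a PSSM, this score does not decompose into independent per-position terms: the contribution of X4 depends on the letter at X2. That dependence is what makes per-position p-value methods inapplicable, while a sampling-based estimate only needs to evaluate `tree_log_odds` on sampled sequences.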

14 Tree Model Example
- Naïve sampling: not efficient
- MAST (Bailey et al. 98): not applicable
- Normal approximation: not accurate
Figure: p-value (y-axis, 0 to 1e-4) against score (x-axis, 10 to 20) for Naive (10 000 000 samples), Normal, and CIS (40 000 samples).

15 Decreased Estimator Variability
Figure: p-value (y-axis, 0 to 1e-4) against score (x-axis, 10 to 20) over 10 repeats of sampling, for Naive (10 x 10 000 000 samples), Normal, and CIS (10 x 40 000 samples).

16 CIS: Summary
General form: applies to a wide range of probabilistic models.
Computationally efficient.
Handles low p-values accurately.
Available online at: http://compbio.cs.huji.ac.il/CIS

17 Thank you
http://compbio.cs.huji.ac.il/CIS
Joint work with: Nir Friedman, Gal Elidan, Tommy Kaplan
