1 Cis/TF discovery for Arabidopsis
Aristotelis Tsirigos
email: tsirigos@cs.nyu.edu
NYU Computer Science

2 Outline
– Input data
– The proposed model
– Results on yeast
– Results on arabidopsis
– Unsupervised pattern discovery

3 Input data

4
– ~23,000 genes, 25 points
– 1,500bp upstream: gctaagc...

5 Normalization
– ~23,000 genes, 25 points: normalize columns (mean=0)
– 1,500bp upstream: gctaagc...

6 Filtering
– ~23,000 genes, 25 points: normalize columns (mean=0, stdev=1)
– filter out low-variance genes: ~5,000 genes, 25 points
– 1,500bp upstream gctaagc...: motif bitmap 001011…
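
The preprocessing on this slide can be sketched roughly as below. This is a minimal illustration, not the talk's code: the function names, the toy matrix, the use of population standard deviation, and the "keep the top N genes by variance" rule are all assumptions, since the slides only say "normalize columns (mean=0, stdev=1)" and "filter out low-variance".

```python
import statistics

def normalize_columns(matrix):
    """Scale each column (time point) to mean 0 and standard deviation 1."""
    columns = list(zip(*matrix))
    means = [statistics.fmean(c) for c in columns]
    # Assumption: population stdev; a constant column is left unscaled.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)]
            for row in matrix]

def filter_low_variance(matrix, gene_ids, keep):
    """Keep the `keep` genes whose profiles vary most across the points."""
    ranked = sorted(range(len(matrix)),
                    key=lambda i: statistics.pvariance(matrix[i]),
                    reverse=True)
    kept = sorted(ranked[:keep])  # restore the original gene order
    return [matrix[i] for i in kept], [gene_ids[i] for i in kept]

def motif_bitmap(upstream, motifs):
    """One bit per motif: 1 if the motif occurs in the upstream sequence."""
    return [1 if m in upstream else 0 for m in motifs]

# Toy data (hypothetical): 4 genes x 3 points instead of ~23,000 x 25.
expr = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [5.0, 0.0, 9.0], [0.0, 4.0, -2.0]]
genes = ["g1", "g2", "g3", "g4"]
norm = normalize_columns(expr)
kept_expr, kept_genes = filter_low_variance(norm, genes, keep=2)
print(kept_genes)
print(motif_bitmap("gctaagc", ["gctaa", "ttttt"]))
```

On the real data, `keep` would be chosen so that roughly 5,000 genes survive; the slides do not state the actual cutoff.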

7 The proposed model

8 Assumption 1
A single TF binds to a single cis element (motif)
Source: U.S. Department of Energy Genomics (http://doegenomestolife.org)

9 Assumption 2
TFs regulate genes sharing a motif only under a subset of conditions

10 Assumption 2 (cont’d)
TFs regulate genes sharing a motif only under a subset of conditions

11 Assumption 3
The TF expression correlates with the sum of the partially correlating expression patterns

12 Objective
For each cis element (motif):
– discover groups of co-regulated genes
– compute aggregate motif expression
For each TF:
– find best correlating motifs

13 The algorithm – step 1
– ~5,000 genes, 25 points
– step 1: clustering

14 The algorithm – step 2
– ~5,000 genes, 25 points
– step 1: clustering
– step 2: for any motif compute its gene set

15 The algorithm – step 3
– ~5,000 genes, 25 points
– step 1: clustering
– step 2: for any motif compute its gene set
– step 3: compute the distribution of its genes into the clusters

16 The algorithm – step 4
– ~5,000 genes, 25 points
– step 1: clustering
– step 2: for any motif compute its gene set
– step 3: compute the distribution of its genes into the clusters
– step 4: determine overrepresented clusters using t-test
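
Steps 2 to 4 might look roughly like the sketch below. Cluster labels from step 1 are taken as given. The slides name a t-test for step 4 without saying how it is set up, so this sketch swaps in a simpler stand-in: a z-score of the observed count against a binomial expectation. The function names, the `z_cutoff` parameter, and the toy data are hypothetical.

```python
import math
from collections import Counter

def gene_set(motif, upstream_by_gene):
    """Step 2: genes whose upstream region contains the motif."""
    return {g for g, seq in upstream_by_gene.items() if motif in seq}

def cluster_distribution(genes, cluster_of):
    """Step 3: number of the motif's genes falling in each cluster."""
    return Counter(cluster_of[g] for g in genes)

def overrepresented(genes, cluster_of, z_cutoff=2.0):
    """Step 4 (stand-in, not the t-test from the slides): clusters that
    hold more of the motif's genes than cluster size alone predicts."""
    sizes = Counter(cluster_of.values())
    total = len(cluster_of)
    n = len(genes)
    counts = cluster_distribution(genes, cluster_of)
    hits = []
    for cluster, size in sizes.items():
        p = size / total                   # chance of landing in this cluster
        expected = n * p
        sd = math.sqrt(n * p * (1 - p))    # binomial standard deviation
        if sd > 0 and (counts[cluster] - expected) / sd >= z_cutoff:
            hits.append(cluster)
    return sorted(hits)

# Toy data (hypothetical): 12 genes in 3 clusters of 4; the motif occurs
# only upstream of the four genes in cluster 0.
upstream_by_gene = {f"g{i}": ("ttgactcgaa" if i < 4 else "ttttaaaacc")
                    for i in range(12)}
cluster_of = {f"g{i}": i // 4 for i in range(12)}
motif_genes = gene_set("gactcg", upstream_by_gene)
print(cluster_distribution(motif_genes, cluster_of))
print(overrepresented(motif_genes, cluster_of))
```

The sketch matches motifs on one strand only; handling the reverse complement would be a separate decision.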

17 The algorithm – final step
– ~5,000 genes, 25 points
– final step: compute motif aggregate expression (25 points)
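
One way to read the final step together with Assumption 3: sum the profiles of the motif's selected genes into a single aggregate profile, then correlate that profile with the TF's own expression. The sketch below assumes a plain point-by-point sum and Pearson correlation; the slides do not specify either. All names and numbers are hypothetical, and the motif strings are used only as labels.

```python
import math

def aggregate_expression(genes, expression):
    """Sum the expression profiles of the given genes, point by point."""
    profiles = [expression[g] for g in genes]
    return [sum(values) for values in zip(*profiles)]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_motifs(tf_profile, genes_by_motif, expression):
    """Sort motifs by how well their aggregate profile tracks the TF."""
    scored = [(pearson(tf_profile, aggregate_expression(genes, expression)), motif)
              for motif, genes in genes_by_motif.items() if genes]
    return sorted(scored, reverse=True)

# Toy data (hypothetical): 4 points per profile instead of 25.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 0.0, 1.0, 0.0],
}
tf_profile = [1.0, 2.0, 3.0, 4.0]
genes_by_motif = {"gagtca": ["g1", "g2"], "atttga": ["g3", "g4"]}
ranking = rank_motifs(tf_profile, genes_by_motif, expression)
print(ranking)
```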

18 Yeast

19 Example TF: BAS1
Using cis/TF version 1:
RANK  MOTIF   OCCUR  corr    score
1     gactcg  46     0.6446  66
2     cgagtc  46     0.6446  16
3     gactaa  163    0.6381  66
4     ttagtc  163    0.6381  33
5     tcggct  87     0.6374  33
...
12    gctagt  110    0.6268  33
13    agtcac  137    0.6262  83   p-value=0.079
...
27    gagtca  136    0.6192  100  p-value=0.004

20 Example TF: BAS1
Using cis/TF version 2:
RANK  MOTIF   OCCUR  signf  corr  score
1     ctgact  122    0.62   0.66  33
2     agtcag  122    0.62   0.66  83
3     ggttta  187    0.62   0.63  50
4     taaacc  187    0.62   0.63  33
5     gagtca  136    0.68   0.63  100  p-value=0.002
6     tgactc  136    0.68   0.63  33
7     atttga  378    0.64   0.63  33
8     tcaaat  378    0.64   0.63  50
9     agtggc  126    0.66   0.61  50
10    gccact  126    0.66   0.61  50

26 Conclusions
Advantages of version 2:
– gives the ability to focus on the gene cluster that correlates best with a given TF
– thus increases overall correlation and motif rank
– offers a measure of motif significance
– can be extended to pairs of TFs/motifs

27 Arabidopsis

28 Procedure
– Permute gene cluster assignment
– Compile list of putative motifs
– Compute significance score of known motifs
– Repeat 1000 times
– Compute p-value of the score
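
The permutation procedure above can be sketched as follows. The slides do not define the significance score, so `concentration_score` is a hypothetical placeholder (the largest share of a motif's genes found in a single cluster). The p-value is taken here as the fraction of permutations that score at least as high as the real assignment, which is also an assumption.

```python
import random
from collections import Counter

def concentration_score(genes, cluster_of):
    """Hypothetical score: largest share of the motif's genes in one cluster."""
    counts = Counter(cluster_of[g] for g in genes)
    return max(counts.values()) / len(genes)

def permutation_p_value(genes, cluster_of, score, repeats=1000, seed=0):
    """Shuffle cluster labels across genes `repeats` times and report how
    often the shuffled score reaches the observed one."""
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    observed = score(genes, cluster_of)
    gene_ids = list(cluster_of)
    labels = [cluster_of[g] for g in gene_ids]
    at_least = 0
    for _ in range(repeats):
        rng.shuffle(labels)            # permute the gene-to-cluster assignment
        shuffled = dict(zip(gene_ids, labels))
        if score(genes, shuffled) >= observed:
            at_least += 1
    return observed, at_least / repeats

# Toy data (hypothetical): 40 genes in 4 clusters of 10; the motif's
# 8 genes all sit in cluster 0.
cluster_of = {f"g{i}": i // 10 for i in range(40)}
motif_genes = [f"g{i}" for i in range(8)]
observed, p_value = permutation_p_value(motif_genes, cluster_of,
                                        concentration_score)
print(observed, p_value)
```

Shuffling labels rather than reclustering keeps the cluster sizes fixed, so only the gene-to-cluster assignment changes between repeats.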

30 TF discovery?
Need data for training! (TFs and their associated binding sites)
Parameters to be estimated:
– number of clusters
– motif size & degeneracy

31 Pattern discovery

32 TF-driven pattern discovery
Unsupervised pattern discovery
– Find groups of genes partially correlating with TF
– Apply statistical filter
– Look for over-represented motifs in genes’ upstream regions
Data for validation?

34 Pattern discovery example

35 “Predicting Gene Expression from Sequence”
Beer & Tavazoie, Cell 2004
– Group genes into 49 clusters
– Predict gene cluster using motifs discovered in its upstream region

37 Conclusions

38 Conclusions
Two options:
Supervised training:
– uses background knowledge to construct model
– needs more training data
Unsupervised pattern discovery:
– minimal model bias (no prior knowledge)
– needs more ‘expert’ help to filter results
