Cis-regulatory Modules and Module Discovery


1 Special Topics in Genomics Cis-regulatory Modules and Phylogenetic Footprinting

2 Cis-regulatory Modules and Module Discovery
The slides on module discovery are provided by Prof. Qing, UCLA.

3 Motif Discovery
[Figure: mixture modeling of a sequence as background plus motif sites described by a weight matrix; labels 1 to 5.]

4 Difficulties in motif discovery in higher organisms
Upstream sequences are longer. Motifs are less conserved and shorter. Background sequence structures are more complicated.
To solve the problem, utilize more biological knowledge in our model:
1) module structure
2) multiple species conservation

5 Cis-regulatory module
Combinatorial control of genes: cis-regulatory modules.
[Figure: a cis-regulatory module.]

6 CisModule: modeling module structure (Zhou and Wong, PNAS 2004)
Module structure: consider co-localization of motif sites.
Hierarchical mixture modeling. K: # of motifs.
[Figure: hierarchy with labels B, M, S, Motif 1, Motif 2, Motif 3.]

7 Parameters and missing data
Missing data problem.
Given: K (# of motifs); l (module length).
Observed data: S (set of sequences).
Missing data: M (indicators for a module start); A (indicators for a motif site start).
Parameters Ψ: background model; weight matrices for motifs; W (motif widths); r (probability of a module start); q (probability of starting a motif site).

8 Bayesian inference by posterior sampling
Module-motif detection: given Θ, r, q, and W,
1) Sample modules (M=1 or M=0 at each position);
2) Within each module, sample motif sites.
Parameter update: given M and A,
1) Infer Θ from aligned sites.
2) Update r, q and W.
Aligned sites (example): TTTGC, TATCC, CTTGC, TTTAC, GTTGC
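The step "infer Θ from aligned sites" can be illustrated with the five example sites on this slide. The sketch below is a simplified point estimate, assuming a pseudocount of 0.5 per base (an assumption, not a value from the slides); a posterior sampler would normally draw each column from its posterior distribution rather than use the smoothed frequencies directly. The function name `weight_matrix` is hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"

def weight_matrix(sites, pseudocount=0.5):
    """Estimate per-column base frequencies from equal-length aligned sites.

    pseudocount is an assumed smoothing value, not taken from the slides.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("aligned sites must all have the same width")
    total = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for j in range(width):
        # Count the bases observed in column j across all sites.
        counts = Counter(site[j] for site in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix

# Aligned sites shown on the slide.
sites = ["TTTGC", "TATCC", "CTTGC", "TTTAC", "GTTGC"]
theta = weight_matrix(sites)
for position, column in enumerate(theta, start=1):
    print(position, {base: round(p, 2) for base, p in column.items()})
```

Each column is a probability distribution over A, C, G, T; the last column is dominated by C because all five sites end in C.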

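9 Module sampling
Want to sample from P(M | S, Ψ); need to calculate [equation lost in transcript].
Forward summation, with one term for the module case and one for the background case [equations lost in transcript].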

10 Module sampling
Backward sampling. How to calculate [equation lost in transcript].
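The equations for these two slides did not survive, so the following is only a sketch of a generic forward summation and backward sampling scheme, not the formulas from the slides. It assumes a simplified model: at each step either one background base is emitted (weight 1 - r) or a module of fixed length l is placed (weight r). The input `ratio[i]` is a hypothetical precomputed likelihood ratio (module model over background model) for the l bases ending at position i, 1-based; in the full model that ratio would itself sum over motif-site configurations inside the module.

```python
import random

def forward(ratio, l, r):
    """Forward summation relative to the all-background likelihood.

    g[i] sums over all module/background segmentations of the first i bases.
    ratio has length n + 1; ratio[0] and entries with i < l are unused.
    """
    n = len(ratio) - 1
    g = [0.0] * (n + 1)
    g[0] = 1.0
    for i in range(1, n + 1):
        g[i] = (1 - r) * g[i - 1]            # base i is background
        if i >= l:
            g[i] += r * ratio[i] * g[i - l]  # bases i-l+1..i form a module
    return g

def backward_sample(g, ratio, l, r, rng):
    """Draw module start positions (1-based) from the end of the sequence."""
    starts = []
    i = len(g) - 1
    while i > 0:
        # Probability that the last l bases of the prefix are a module.
        p_module = r * ratio[i] * g[i - l] / g[i] if i >= l else 0.0
        if rng.random() < p_module:
            starts.append(i - l + 1)
            i -= l
        else:
            i -= 1
    return sorted(starts)

# Toy input: 12 bases, module length 4, strong module signal ending at base 8.
example_ratio = [1.0] * 13
example_ratio[8] = 50.0
example_g = forward(example_ratio, l=4, r=0.1)
print(backward_sample(example_g, example_ratio, 4, 0.1, random.Random(1)))
```

Working with ratios rather than raw likelihoods keeps the numbers in a moderate range for short sequences; a production implementation would still rescale or work in log space.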

11 Posterior inference Motif sites: marginal posterior probability of being a motif start position > 0.5. Modules: marginal posterior probability of being within a module > 0.5.

12 Simulation study
Generate 30 data sets independently, each contains:
1) 20 sequences, each of length 1000;
2) 25 modules, with length 150;
3) each module contains 1 E2F site, 1 YY1 site, and 1 cMyc site.

Motif   CisModule (Fail / TP / FP)   Do not consider module (Fail / TP / FP)
E2F     0.03 / 17.9 / 7.5            0.37 / 17.1 / 11.6
YY1     0.07 / 16.0 / 8.7            0.20 / 11.0 (TP or FP; one value lost in transcript)
cMyc    (Fail lost) / 15.7 / 9.9     0.63 / 13.6 / 12.4

13 Example: Discovery of tissue-specific modules in Ciona
Sidow lab. Collected 21 genes that are co-expressed during the development of muscle tissue in Ciona. Want to find motifs and modules in the upstream sequences (average length = 1330) of these genes. Found 3 motifs in 28 modules (4860 bps). Are they real motifs that determine the gene expression?

14 Experimental validation
Positive element: the shortest sufficient and non-overlapping sequence that drives strong expression in muscle: average length of 289 bps.

15 Experimental validation
70% of our predicted motif sites are located in the positive elements!

16 Other tools Gibbs Module Sampler (Thompson et al. Genome Res. 2004)
EMCMODULE (Gupta and Liu, PNAS, 2005)

17 Phylogenetic Footprinting

18 Functional elements tend to be conserved across species
For example, exons are conserved due to selection pressure. Introns and intergenic regions are less likely to be conserved.

19 Phylogenetic footprinting
Miller et al. Annu. Rev. Genomics Hum. Genet. 2004

20 Incorporating cross-species conservation into motif discovery
A threshold method (Wasserman et al. Nature Genetics, 2000)
STEP1: construct cross-species alignment
STEP2: compute conservation measure from the alignment
STEP3: Non-conserved regions are filtered out
STEP4: Gibbs motif sampler is applied to conserved regions of the target genome
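Steps 2 and 3 can be sketched as below. This is a minimal illustration, not the published procedure: it assumes a gapped pairwise alignment given as two equal-length strings, uses percent identity in a sliding window as the conservation measure, and masks target bases below an assumed cutoff of 0.7 with N. The function names, window size, and cutoff are all hypothetical choices.

```python
def window_identity(target, other, half_window=5):
    """Fraction of matching columns within half_window of each alignment column."""
    if len(target) != len(other):
        raise ValueError("aligned sequences must have equal length")
    # A column matches when both rows carry the same base (gaps never match).
    match = [int(a == b and a != "-") for a, b in zip(target, other)]
    n = len(match)
    scores = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        scores.append(sum(match[lo:hi]) / (hi - lo))
    return scores

def mask_nonconserved(target, other, half_window=5, cutoff=0.7):
    """Replace target bases in poorly conserved windows with N."""
    scores = window_identity(target, other, half_window)
    return "".join(b if s >= cutoff else "N" for b, s in zip(target, scores))

# Toy alignment: a conserved block flanked by diverged sequence.
target = "ACGTTGACCTGATCGATTACG"
other = "TGCATGACCTGATCGTAGGCA"
print(mask_nonconserved(target, other, half_window=3))
```

A motif sampler would then be run only on the unmasked stretches of the target sequence.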

21 Phylogenetic footprinting & motif discovery
CompareProspector (Liu Y. et al. Genome Res. 2004)
STEP1: construct cross-species alignment
STEP2: compute conservation measure (window percent identity, WPID) from the alignment
STEP3: multiply the likelihood ratio at a position by the corresponding WPID, thus likelihood landscape is changed to favor conserved sites
STEP4: apply a Gibbs motif sampler based algorithm
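Step 3 can be illustrated with a toy calculation. The sketch below assumes a width-2 weight matrix, a uniform background, and one WPID value per candidate start position; all numbers and function names are made up for illustration and are not from CompareProspector itself.

```python
def likelihood_ratio(site, pwm, background):
    """Motif likelihood over background likelihood for one candidate site."""
    ratio = 1.0
    for base, column in zip(site, pwm):
        ratio *= column[base] / background[base]
    return ratio

def conservation_weighted_scores(sequence, pwm, background, wpid):
    """Likelihood ratio at each start position, multiplied by its WPID."""
    width = len(pwm)
    n_starts = len(sequence) - width + 1
    return [
        likelihood_ratio(sequence[i:i + width], pwm, background) * wpid[i]
        for i in range(n_starts)
    ]

# Hypothetical width-2 motif that prefers "AC".
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
sequence = "ACAC"
wpid = [0.2, 0.5, 0.9]  # assumed conservation at start positions 0, 1, 2

scores = conservation_weighted_scores(sequence, pwm, background, wpid)
print(scores)
```

The two "AC" occurrences have the same likelihood ratio, but after weighting the one in the more conserved window scores higher, which is the change in the likelihood landscape that the slide describes.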

22 Phylogenetic footprinting & motif discovery
Evolutionary model based approach:
EMnEM (Moses et al. 2004)
PhyME (Sinha et al. 2004)
PhyloGibbs (Siddharthan et al. 2005)
Tree Sampler (Li and Wong, 2005)

23 Incorporating cross-species conservation into motif discovery
PhyloCon (Wang and Stormo, Bioinformatics, 2003)
STEP 1: construct alignment among orthologous sequences;
STEP 2: convert conserved regions into profiles;
STEP 3: use profiles in the first sequence as seeds;
STEP 4: find matches of each seed in the second sequence;
STEP 5: update seeds;
STEP 6: repeat step 2 and 3 for all sequences.

24 Phylogenetic footprinting & module discovery
Multimodule (Zhou and Wong, The Annals of Applied Statistics, 2007)

25 Multimodule
Module structure of each sequence is modeled by an HMM.
Couple HMMs via multiple alignment: aligned states are coupled and collapsed into one common state.
Uncoupled states: similar to single species model.
Coupled states: evolutionary model.

26 Comparing with other methods
Three data sets with experimental validation reported previously, which contain 9 known motifs with 152 validated sites.
CompareProspector (Liu et al. 2004): conservation score
PhyloCon (Wang and Stormo 2003): progressive alignment of profiles
EMnEM (Moses et al. 2004): phylogenetic motif discovery
CisModule (Zhou and Wong 2004): single-species module discovery

27 Comparing with other methods
# of known sites = 152. The last four columns refer to the motifs correctly identified by each method.

Method              # known motifs identified   # predicted sites   # overlaps   Sensitivity (%)   Specificity (%)
CompareProspector   7                           75                  36           24                48
PhyloCon            3                           50                  26           17                52
EMnEM               6                           130                 44           29                34
CisModule           5                           110                 35           23                32
MultiModule         8                           157                 79           (lost)            (lost)
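The percentages reported above appear consistent with sensitivity = overlaps / known sites and specificity = overlaps / predicted sites. That reading is an inference from the four methods with complete figures, not a definition stated on the slide. The sketch below recomputes the percentages from the counts; the function name `site_metrics` is hypothetical.

```python
KNOWN_SITES = 152

# (predicted sites, overlaps with known sites) as listed on the slide.
results = {
    "CompareProspector": (75, 36),
    "PhyloCon": (50, 26),
    "EMnEM": (130, 44),
    "CisModule": (110, 35),
    "MultiModule": (157, 79),
}

def site_metrics(predicted, overlaps, known=KNOWN_SITES):
    """Return (sensitivity %, specificity %) rounded to whole percents.

    Assumed definitions: sensitivity = overlaps / known sites,
    specificity = overlaps / predicted sites.
    """
    sensitivity = 100.0 * overlaps / known
    specificity = 100.0 * overlaps / predicted
    return round(sensitivity), round(specificity)

for method, (predicted, overlaps) in results.items():
    print(method, site_metrics(predicted, overlaps))
```

For the first four methods this reproduces the listed values. For MultiModule, whose percentages are missing from the transcript, the same formulas give roughly 52 and 50; these are derived from the counts, not copied from the slide.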

