Two études on modularity

Ron Shamir, Tel Aviv University
Simons Networks workshop, April

Integrated analysis of two biological networks
David Amar. NAR, Feb 2014

A module map
Input: networks H, G with the same vertex set
Goal: summarize the key information in both networks in a module map
Node: a module, i.e., a gene set highly connected in H
Link: two modules highly interconnected in G
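As an illustration of the two ingredients of a module map, the sketch below measures how densely a gene set is connected in H and how densely two gene sets are interconnected in G. Plain edge density is an assumption made here for simplicity (the method itself relies on a probabilistic score), and the function names and toy networks are hypothetical.

```python
from itertools import combinations

def as_edge_set(edge_list):
    """Undirected edges stored as frozensets so (u, v) and (v, u) match."""
    return {frozenset(e) for e in edge_list}

def module_density(edges, module):
    """Fraction of vertex pairs inside `module` that are edges (node criterion, in H)."""
    pairs = list(combinations(sorted(module), 2))
    if not pairs:
        return 0.0
    return sum(frozenset(p) in edges for p in pairs) / len(pairs)

def link_density(edges, mod_a, mod_b):
    """Fraction of cross pairs (a, b) that are edges (link criterion, in G)."""
    pairs = [(a, b) for a in mod_a for b in mod_b if a != b]
    if not pairs:
        return 0.0
    return sum(frozenset(p) in edges for p in pairs) / len(pairs)

# Toy networks H and G on the same vertex set {1..6}
H = as_edge_set([(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)])
G = as_edge_set([(1, 4), (1, 5), (2, 4), (2, 5), (3, 6)])
modules = [{1, 2, 3}, {4, 5, 6}]

print([module_density(H, m) for m in modules])  # → [1.0, 1.0]
print(link_density(G, modules[0], modules[1]))   # → 0.5555555555555556 (5 of 9 pairs)
```

Both modules are cliques in H, and 5 of the 9 cross pairs are edges in G, so this map would draw a link between the two nodes under any reasonable density threshold.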

Generalizes clustering and biclustering

The winning algorithm
Initial solution based on enumeration of maximal bicliques in G
Global refinement based on a probabilistic score of the whole map
Successfully tested on:
Yeast, H: PPI, G: GI
Yeast, H: PPI+, G: DDR-specific GI
Lung cancer patients, H: co-expression, G: differential co-expression
Software available
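The slide does not say how the maximal bicliques of G are enumerated. As an illustration only, the brute-force sketch below relies on the closure property: two disjoint vertex sets (A, B) form a maximal biclique exactly when B is the set of common neighbours of A and A is the set of common neighbours of B. It is exponential in the number of vertices, so it only suits toy graphs; the function names are hypothetical.

```python
from itertools import combinations

def adjacency(edges):
    """Undirected adjacency sets from an edge list (no self-loops assumed)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbors(adj, vertices):
    """Vertices adjacent to every vertex in `vertices`."""
    sets = [adj[v] for v in vertices]
    return set.intersection(*sets) if sets else set()

def maximal_bicliques(adj):
    """All maximal bicliques {A, B}, found by checking every vertex subset for closure."""
    nodes = sorted(adj)
    found = set()
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            b = common_neighbors(adj, subset)   # disjoint from subset: no self-loops
            if not b:
                continue
            a = common_neighbors(adj, b)        # closure of subset, always contains it
            if a == set(subset):                # closed on both sides, hence maximal
                found.add(frozenset([frozenset(a), frozenset(b)]))
    return found

print(maximal_bicliques(adjacency([(1, 3), (1, 4), (2, 3), (2, 4)])))
```

Each biclique is reached from both of its sides, so it is stored as an unordered pair of sides to avoid duplicates.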

Flexible module discovery in 3-way data
David Amar, Daniel Yekutieli, Adi Maron-Katz, Talma Hendler
Proc. ISMB/ECCB 2015, Bioinformatics, July 15

Outline
Intro: input and goals
Probabilistic model
Algorithm
Results: simulations, gene expression, brain fMRI
Comparison to other approaches

Module discovery
A fundamental task. The standard case: a matrix
Cluster: a subset of rows
Bicluster: a subset of rows and a subset of columns
[Figure: a cluster and a bicluster in a matrix with genes/voxels/parcels as rows and subjects as columns]

Three-way data
A matrix for each subject
Rows: measured objects
Columns: time points
Measured objects: gene expression; activity at voxels/parcels in fMRI; …
[Figure: stacked per-subject matrices, genes/voxels/parcels × time points. Image credits: Kim J, Matthews NL, Park S., via Wikimedia Commons; Kim J et al., via Wikimedia Commons]

A single subject vs. multiple subjects
Single subject: a genes/voxels/parcels × time points matrix
A module (e.g., a bicluster): a gene set and a time point set with an “interesting signal” within it
Examples: ISA (Bergmann et al., Physical Review E), Samba (Tanay et al., PNAS), Plaid (Lazzeroni and Owen 2002, Stat. Sinica), Bimax (Prelic et al. 2006, Bioinformatics)
Multiple subjects: is there a consistent signal across them?

Consistent signal across multiple subjects?
Only a subset of the subjects is relevant
[Figure: a genes/voxels/parcels × time points matrix per subject]

Consistent signal across multiple subjects?
Only a subset of the subjects is relevant
Time points: subject-specific (asynchronicity)
Gene set: the same set across subjects is too rigid
What if some of the genes are missing in a subject?
What if subject s has additional activated pathways?
[Figure: subjects 1, 2 and 5, with blocks A, B, C, D marking modules]

Modules in 3-way data
Core module:
A subset of the subjects
A subset of rows
A (possibly different) subset of time points for every subject in the module
Subject-specific part of the module:
A subset of the core module rows
Additional rows per subject

Modules in 3-way data II
Want modules with unusually high/low values
$F_0$: background distribution; $F_1$: distribution within modules
Binary data: the probability of 1 within modules is much higher
Real-valued data: distinct normal distributions
If (v,t,s) is in a subject-specific module: $Z_{v,t,s} \sim F_1$; otherwise $Z_{v,t,s} \sim F_0$
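The F0/F1 rule translates directly into a data log-likelihood for a given assignment of cells to modules. The sketch below, with hypothetical function names, does this for binary data (Bernoulli) and for real-valued data (normal); a shared standard deviation sigma is an assumption made here for brevity, and the Bernoulli version requires 0 < p0, p1 < 1.

```python
import numpy as np

def bernoulli_loglik(Z, mask, p1, p0):
    """log P(Z): cells where mask is True ~ Bernoulli(p1) (F1), others ~ Bernoulli(p0) (F0)."""
    p = np.where(mask, p1, p0)
    return float(np.sum(Z * np.log(p) + (1 - Z) * np.log(1 - p)))

def normal_loglik(Z, mask, mu1, mu0, sigma=1.0):
    """Real-valued variant: N(mu1, sigma^2) inside modules, N(mu0, sigma^2) elsewhere."""
    mu = np.where(mask, mu1, mu0)
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                        - (Z - mu) ** 2 / (2 * sigma ** 2)))

# A planted block of ones is far better explained with the module than without it
Z = np.zeros((4, 3, 2))
Z[:2, :2, 0] = 1
with_module = bernoulli_loglik(Z, Z.astype(bool), 0.9, 0.01)
without = bernoulli_loglik(Z, np.zeros(Z.shape, dtype=bool), 0.9, 0.01)
print(with_module > without)  # → True
```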

The generative model: scheme
Toy example
Relevant subjects: 1, 2, 5
Core rows: shared by the relevant subjects
Subject-specific information: rows (core + subject-specific) and time points for each relevant subject
Observed data: $Z_{v,t,s}$
[Figure: subjects 1, 2 and 5, with blocks A, B, C, D marking modules]

The posterior
Variables:
$H_v$: is row v in the (core) module?
$P_s$: is subject s relevant to the module?
$HS_{vs}$: is row v in the module of subject s?
$C_{st}$: is time point t in the module of subject s?
θ: hyper-parameters (priors on core rows and subject relevancy)
Structure: subject rows depend on the core and θ; a Markov model for the time points; the data likelihood of Z
f (data likelihood): in the current implementation, Bernoulli (binary data) or normal
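Putting the listed dependencies together gives one factorization of the posterior that is consistent with the slide. The exact priors and parameterization used by TWIGS are not stated here, so treat this as a sketch: subject rows depend on the core rows and θ, time points follow a first-order Markov chain per subject, and each cell is drawn from $F_1$ only when the subject, the row and the time point are all in the module.

```latex
\Pr(H, P, HS, C \mid Z, \theta) \;\propto\;
  \prod_{v,t,s} f\!\left(Z_{v,t,s} \mid HS_{vs}, C_{st}, P_s\right)
  \prod_{v,s} \Pr(HS_{vs} \mid H_v, \theta)
  \prod_{s} \Big[\Pr(C_{s1} \mid \theta) \prod_{t=2}^{T} \Pr(C_{st} \mid C_{s,t-1}, \theta)\Big]
  \prod_{v} \Pr(H_v \mid \theta) \prod_{s} \Pr(P_s \mid \theta),
\qquad
f = \begin{cases} F_1 & \text{if } P_s = HS_{vs} = C_{st} = 1,\\ F_0 & \text{otherwise.} \end{cases}
```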

The algorithm
TWIGS: Three-Way module Inference via Gibbs Sampling
Initialization: use a biclustering solution (Bimax, ISA)
Improvement: Gibbs sampling
Requires deriving all conditionals
~50 iterations suffice (<10 sec)
Local optimum
Notes: The algorithm starts from a solution produced by a standard biclustering algorithm and then applies iterative improvement steps. In each step, all parameters are fixed except one, which is sampled from its conditional probability. The parameters are visited in a fixed order, and this order is repeated cyclically k times. The output is the set of sampled values of each parameter across all iterations, from which the core modules and the subject-specific modules are extracted.
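The alternating "fix everything except one parameter, then sample it" step can be illustrated on a drastically simplified, single-subject model: one binary matrix, row indicators and column (time point) indicators with independent Bernoulli priors, and fixed p1, p0. This is not TWIGS itself; it omits subject relevancy, the core versus subject-specific split and the Markov prior on time points, and all names and default values are assumptions. Given the columns, the row indicators are conditionally independent (and vice versa), so updating a whole block at once is equivalent to sweeping the single-site conditionals.

```python
import numpy as np

def _llr(z, p1, p0):
    # Per-cell log-likelihood ratio of F1 = Bernoulli(p1) versus F0 = Bernoulli(p0).
    return z * np.log(p1 / p0) + (1 - z) * np.log((1 - p1) / (1 - p0))

def _sample(log_odds, rng):
    # Draw Bernoulli indicators from their full-conditional log-odds.
    return rng.random(log_odds.shape) < 1.0 / (1.0 + np.exp(-log_odds))

def gibbs_bicluster(Z, rows, cols, n_iter=50, prior=0.1, p1=0.9, p0=0.01, seed=0):
    """Refine an initial bicluster (boolean row/column masks) by blocked Gibbs sampling."""
    rng = np.random.default_rng(seed)
    L = _llr(Z, p1, p0)                    # V x T matrix of per-cell log-likelihood ratios
    logit = np.log(prior / (1 - prior))    # prior log-odds of membership
    rows, cols = rows.copy(), cols.copy()
    for _ in range(n_iter):
        rows = _sample(logit + L[:, cols].sum(axis=1), rng)  # each row given the columns
        cols = _sample(logit + L[rows, :].sum(axis=0), rng)  # each column given the rows
    return rows, cols

# A planted 8 x 6 block; the initial solution has 2 wrong rows and misses 1 column
Z = np.zeros((30, 20), dtype=int)
Z[:8, :6] = 1
r0 = np.zeros(30, dtype=bool); r0[:10] = True
c0 = np.zeros(20, dtype=bool); c0[:5] = True
rows, cols = gibbs_bicluster(Z, r0, c0)
print(np.flatnonzero(rows), np.flatnonzero(cols))
```

The usage example mimics the initialization step: an imperfect bicluster (as Bimax or ISA might return) is refined toward the planted module.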

Finding multiple modules
Algorithm filter:
1. Get a set of biclusters
2. Run Gibbs sampling on each one to get a set of modules M
3. Remove redundancies
4. Select large modules
Algorithm masker:
1. Get the best bicluster
2. Run Gibbs sampling on it to get a module m
3. Remove the signal of m from the data Z
4. Repeat until no module is found
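The masker loop can be sketched independently of how each module is found. Below, `find_module` is a pluggable, hypothetical callback standing in for "best bicluster + Gibbs refinement", and removing a module's signal by zeroing its cells is an assumption made for binary data; the toy greedy finder exists only to make the loop runnable.

```python
import numpy as np

def masker(Z, find_module, max_modules=10):
    """Find a module, record it, mask its cells in a copy of Z, and repeat."""
    Z = Z.copy()                      # leave the caller's data untouched
    modules = []
    for _ in range(max_modules):
        mask = find_module(Z)         # boolean array of Z's shape, or None
        if mask is None or not mask.any():
            break                     # no module found: stop
        modules.append(mask)
        Z[mask] = 0                   # assumption: remove the signal by zeroing its cells
    return modules

def toy_find_module(Z, min_rows=2):
    """Hypothetical greedy finder: rows with a 1 in the densest column, and their all-ones columns."""
    col = int(np.argmax(Z.sum(axis=0)))
    rows = Z[:, col] == 1
    if rows.sum() < min_rows:
        return None
    cols = Z[rows].all(axis=0)
    return rows[:, None] & cols[None, :]

Z = np.zeros((10, 8), dtype=int)
Z[0:4, 0:3] = 1   # planted module 1
Z[5:8, 4:8] = 1   # planted module 2
mods = masker(Z, toy_find_module)
print(len(mods))  # → 2
```

Masking is what lets the second module surface: once module 1 is zeroed, the densest column belongs to module 2.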

Simulation results
Setting: V=500 rows, T=50 time points, S=10 subjects, 5 core modules, ps=0.9, p0=0.01 (+ additional subject-specific rows in each module)
TWIGS (= Bimax-Gibbs-masker) improves performance even when no subject-specific information is present
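A generator for synthetic data of this kind can be sketched as follows. It uses the slide's V, T, S, number of core modules, ps and p0, but assumes that ps is the probability of a 1 inside a module and p0 the background probability; the module sizes, subject counts and time point counts are illustrative choices, not the settings used in the simulations.

```python
import numpy as np

def simulate(V=500, T=50, S=10, n_modules=5, ps=0.9, p0=0.01,
             core_size=30, n_times=8, keep_frac=0.8, n_extra=5, seed=0):
    """Plant core + subject-specific modules in binary V x T x S background noise.

    Assumptions: ps = P(1) inside a module, p0 = P(1) elsewhere; all size
    parameters are illustrative, not the values used in the slides.
    """
    rng = np.random.default_rng(seed)
    in_module = np.zeros((V, T, S), dtype=bool)
    non_core_pool = np.arange(V)
    for _ in range(n_modules):
        core = rng.choice(V, size=core_size, replace=False)
        subjects = rng.choice(S, size=rng.integers(2, S + 1), replace=False)
        for s in subjects:
            # Each subject keeps part of the core rows and adds a few extra rows.
            kept = rng.choice(core, size=int(keep_frac * core_size), replace=False)
            extra = rng.choice(np.setdiff1d(non_core_pool, core), size=n_extra, replace=False)
            rows = np.concatenate([kept, extra])
            # Each subject gets its own (asynchronous) set of time points.
            times = rng.choice(T, size=n_times, replace=False)
            in_module[np.ix_(rows, times, [s])] = True
    Z = (rng.random((V, T, S)) < np.where(in_module, ps, p0)).astype(np.int8)
    return Z, in_module

Z, truth = simulate()
print(Z.shape, round(float(Z[truth].mean()), 2), round(float(Z[~truth].mean()), 3))
```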

Gene expression analysis
Septic shock data (Parnell et al. 2013, Shock): 14 subjects, 4 measurements each
Binarized matrix: fold change > 2 compared to time zero, genes
TWIGS found 2 core modules:
Module 1: 11 subjects, 53 core genes; enriched with response to bacteria
Module 2: 7 subjects, 62 core genes; enriched with T-cell activity
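The binarization step can be sketched as below. The array layout (genes × time points × subjects, with time zero first), the use of positive linear-scale expression values, and the optional two-sided rule (also flagging fold change below 1/2) are assumptions; the slide only states "fold change > 2 compared to time zero".

```python
import numpy as np

def binarize_fold_change(X, threshold=2.0, two_sided=False):
    """Binarize a genes x time x subjects array against each subject's time-zero sample.

    Assumes positive, linear-scale values with X[:, 0, :] as time zero; the
    time-zero column itself is dropped from the output.
    """
    ratio = X[:, 1:, :] / X[:, :1, :]   # fold change of each later time point
    up = ratio > threshold
    if two_sided:                        # optional: also flag strong down-regulation
        return (up | (ratio < 1.0 / threshold)).astype(np.int8)
    return up.astype(np.int8)

# One gene, one subject, 4 measurements: time zero, then 3x, 1.5x and 0.4x
X = np.array([[[1.0], [3.0], [1.5], [0.4]]])
print(binarize_fold_change(X)[0, :, 0].tolist())                  # → [1, 0, 0]
print(binarize_fold_change(X, two_sided=True)[0, :, 0].tolist())  # → [1, 0, 1]
```

For log2-scale data the same rule would compare differences against log2(2) = 1 instead of ratios against 2.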

Core module 1: non-core enrichments
Red stripes: relevant time points
Highlighted subjects did not survive.

Brain fMRI analysis
Rest data: 20 subjects, 464 parcels, 94 time points (Vaisvaser 2013, Frontiers in Human Neuroscience)
TWIGS found 4-5 core modules (depending on the desired intensity; results are robust)
The modules partition the brain into functional regions known to be related to rest
Subject-specific enrichments reveal additional connections among regions
Example: 4 subjects had additional enrichment for attention, which suggests a tendency to be more attentive to sensory stimuli during thought processes
[Figure: core modules 1-4]

Brain fMRI results
Subjects cover different percentages of the core module
Subjects have different time points
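Both observations can be quantified with simple set arithmetic: the fraction of the core rows that each subject's module contains, and the overlap between two subjects' time point sets. The function names are hypothetical and Jaccard similarity is an assumed choice of overlap measure.

```python
def core_coverage(core_rows, rows_by_subject):
    """Fraction of the core module's rows contained in each subject's module."""
    core = set(core_rows)
    return {s: len(core & set(rows)) / len(core) for s, rows in rows_by_subject.items()}

def time_overlap(times_a, times_b):
    """Jaccard similarity of two subjects' time point sets."""
    a, b = set(times_a), set(times_b)
    return len(a & b) / len(a | b) if a | b else 0.0

cov = core_coverage({1, 2, 3, 4}, {"s1": {1, 2, 3, 4, 9}, "s2": {2, 3}})
print(cov)                                # → {'s1': 1.0, 's2': 0.5}
print(time_overlap({1, 2, 3}, {2, 3, 4}))  # → 0.5
```

Row 9 counts toward s1's subject-specific part but not toward its core coverage.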

Brain fMRI vs. GE results
GE: a small, highly enriched core module; subject-specific modules of 100+ additional genes
fMRI: large core modules, each subject covering a part; subject-specific modules are relatively small
Integrated analysis is crucial in both cases: subject-specific signal was found in both, though it is less extensive in fMRI

Comparison to other methods: subject coverage
> 2-fold improvement in subject coverage

Comparison to other methods: Enrichment tests (GE)
Rank summarizes enrichment and redundancy

Summary
A probabilistic model for analyzing 3-way data: core + subject-specific modules
Developed a Gibbs sampling algorithm
Outperforms extant methods, even in “pure” biclustering tasks
2-fold improvement in subject coverage
Successfully applied to gene expression and fMRI; integrative analysis was crucial
R implementation available: acgt.cs.tau.ac.il/twigs/
Lots of room to improve

Postdocs available
