Modeling Perturbations using Gene Networks

Nirmalya Bandyopadhyay, Manas Somaiya, Tamer Kahveci and Sanjay Ranka
Bioinformatics Lab., University of Florida

Gene interaction through regulatory networks
Genes interact through regulatory networks.
Gene networks: the genes are nodes and the interactions are directed edges.
Neighbors: each gene has incoming neighbors and outgoing neighbors.
A gene can change the state of other genes through:
  Activation
  Inhibition
[Figure: example gene network over K-Ras, Raf, MEK, ERK, JNK, RalGDS, Ral, RalBP1, PLD1 and Cdc42/Rac, with an incoming neighbor and an outgoing neighbor marked.]
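The directed-graph view above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the gene names come from the slide's figure, but the specific edges, the effect labels, and the helper name `build_neighbors` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical edge list: (regulator, target, effect). Gene names are taken
# from the slide; the edges and effects themselves are illustrative only.
EDGES = [
    ("K-Ras", "Raf", "activation"),
    ("Raf", "MEK", "activation"),
    ("MEK", "ERK", "activation"),
    ("K-Ras", "RalGDS", "activation"),
    ("RalGDS", "Ral", "activation"),
]


def build_neighbors(edges):
    """Return (incoming, outgoing) neighbor sets for every gene."""
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for regulator, target, _effect in edges:
        outgoing[regulator].add(target)   # regulator -> target
        incoming[target].add(regulator)   # target is regulated by regulator
    return incoming, outgoing


incoming, outgoing = build_neighbors(EDGES)
print(sorted(incoming["MEK"]))    # incoming neighbors of MEK
print(sorted(outgoing["K-Ras"]))  # outgoing neighbors of K-Ras
```

Storing both directions up front keeps later neighbor lookups cheap, at the cost of holding each edge twice.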

Perturbation experiments
Perturbation experiment: stimulants (radiation, toxic elements, medications) are applied.
Gene expressions are measured before and after the perturbation:
  Control group
  Non-control group
Two sets of genes:
  Differentially expressed (DE)
  Equally expressed (EE)
[Figure: Genes A to F before perturbation and after perturbation, with the differentially expressed genes highlighted.]
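As a rough illustration of the DE/EE split, the sketch below labels genes with a two-sample Student's t statistic, the baseline this deck later compares against. It is not the authors' method. The expression values, the function names, and the 5% cut-off are assumptions chosen for the example.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5% critical value of Student's t for 4 + 4 - 2 = 6 degrees of freedom.
T_CRITICAL = 2.447


def t_statistic(control, non_control):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(control), len(non_control)
    pooled = ((n1 - 1) * variance(control) + (n2 - 1) * variance(non_control)) / (n1 + n2 - 2)
    return (mean(non_control) - mean(control)) / sqrt(pooled * (1 / n1 + 1 / n2))


def label(control, non_control, threshold=T_CRITICAL):
    """Return 'DE' if the groups differ beyond the threshold, else 'EE'."""
    return "DE" if abs(t_statistic(control, non_control)) > threshold else "EE"


# Made-up expression values: (control samples, non-control samples).
expression = {
    "Gene A": ([5.0, 5.2, 4.9, 5.1], [5.1, 5.0, 5.0, 5.2]),
    "Gene B": ([5.0, 5.2, 4.9, 5.1], [7.9, 8.1, 8.0, 8.2]),
}
labels = {gene: label(c, n) for gene, (c, n) in expression.items()}
print(labels)
```

A per-gene test like this ignores the network entirely, which is the limitation the rest of the deck addresses.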

Primary and secondary effects of perturbation
Primarily affected genes: directly affected by the perturbation.
Secondarily affected genes: affected in turn by the primarily affected genes.
[Figure: primarily affected genes ERK, K-Ras, Raf, MEK, JNK; secondarily affected genes RalGDS, Ral, RalBP1, Cdc42/Rac, PLD1.]

Problem & method
Input: Gene expression (control and non-control).
Problem: Analyze the primary and secondary effects of the perturbation.
  Estimate the probability that a gene is DE because of the perturbation or because of other genes (its incoming neighbors).
  What are the primarily affected genes?
Method: A probabilistic Bayesian method that employs a Markov random field to leverage domain knowledge.
Metagene: The perturbation is modeled as a new gene.

Notation
Observed variables
  Microarray datasets (control and non-control):
  A single gene gi:
  All genes:
  Neighborhood variables
Hidden variable
  State variable
  Interaction variable:

Problem formulation
Input to the problem:
  Microarray expression: Y
  Gene network: V = {G, W}
  G = {g0, g1, g2, …, gM}, where g0 is the metagene.
Goal: Estimate the density p(Xij | X - Xij, Y, V, Wij = 1) for all Wij.
Note: A higher value of p(Xij = 1 | X - Xij, Y, V, Wij = 1) indicates a higher chance that gj is affected by gi.

Bayesian distribution
We propose a Bayesian model because it allows us to incorporate our beliefs into the model.
The joint probability distribution over X is a posterior density that combines a likelihood density with a prior density.
We can derive the density of Xij, p(Xij | X - Xij, Y, V, Wij = 1), from the joint density function.
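For orientation, the relation between the three densities can be written with Bayes' rule. The form below is a generic sketch: conditioning every term on the network V is an assumption made here, and it is not a reproduction of the slide's own equation.

```latex
\underbrace{p(X \mid Y, V)}_{\text{posterior density}}
  \;=\; \frac{p(Y \mid X, V)\, p(X \mid V)}{p(Y \mid V)}
  \;\propto\;
  \underbrace{p(Y \mid X, V)}_{\text{likelihood density}}
  \;\underbrace{p(X \mid V)}_{\text{prior density}}
```

Because the denominator does not depend on X, it can be dropped when comparing candidate assignments of X.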

Prior density function : Markov random field
The MRF is an undirected graph Ψ = (X, E).
X = {Xij}: each node Xij represents an edge in the gene network.
E = {(Xij, Xpj) | Wpi = Wij = 1} ∪ {(Xij, Xik) | Wjk = Wij = 1}
An edge in the MRF corresponds to two edges in the gene network. For example, (X23, X25) corresponds to (g2, g3) and (g3, g5).
[Figure: (a) Perturbation experiment over g0, g1, g2, g3, g4, g5. (b) Markov random field graph over X01, X02, X03, X04, X05, X12, X13, X14, X23, X25, X35.]
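The definition of E can be turned into a small construction routine. The sketch below takes the gene-network edges from the node labels of panel (b) and applies the two rules literally. One assumption is added and flagged in the code: an MRF edge is kept only when both of its endpoints are themselves MRF nodes (that is, both are gene-network edges). The function name `build_mrf` is hypothetical.

```python
def build_mrf(W, n_genes):
    """Build MRF nodes and edges from gene-network edges.

    W is a set of directed pairs (i, j), meaning W_ij = 1 (g_i -> g_j).
    Self-loops are assumed to be absent.
    """
    nodes = sorted(W)  # one MRF node X_ij per gene-network edge
    edges = set()
    for (i, j) in nodes:
        # Rule 1: (X_ij, X_pj) when W_pi = W_ij = 1.
        for p in range(n_genes):
            # Assumption: also require (p, j) in W so that X_pj is a node.
            if (p, i) in W and (p, j) in W:
                edges.add(frozenset({(i, j), (p, j)}))
        # Rule 2: (X_ij, X_ik) when W_jk = W_ij = 1.
        for k in range(n_genes):
            # Assumption: also require (i, k) in W so that X_ik is a node.
            if (j, k) in W and (i, k) in W:
                edges.add(frozenset({(i, j), (i, k)}))
    return nodes, edges


# Gene-network edges read off the MRF node labels X01 ... X35 in panel (b);
# gene 0 is the metagene g0, connected to every other gene.
W = {(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
     (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 5)}
nodes, edges = build_mrf(W, n_genes=6)
print(len(nodes), "MRF nodes")
print(frozenset({(2, 3), (2, 5)}) in edges)  # the slide's example pair
```

Using `frozenset` for each pair makes the edges undirected, so (X23, X25) and (X25, X23) are stored once.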

Prior density function: Feature functions
Two beliefs are relevant to our model:
  A gene can affect the state of its outgoing neighbors.
  The metagene g0 can affect the states of all other genes.
We incorporate these beliefs into the MRF graph using four feature functions.
Feature function: a Boolean function over the nodes of the MRF.
  It encapsulates the properties of the graph.
  It allows us to introduce our beliefs on the graph.

Feature functions
Unary: capture the frequency of Xij.
Binary (left equality and right equality): capture the two beliefs.
The prior density function is built from these feature functions.

Binary feature functions
(a) Left equality: f3(Xij, Xpj)
(b) Right equality: f4(Xij, Xik)
[Figure: a gene network over g1, g2, g3, g4 and the corresponding MRF network over X12, X13, X23, X24, X34, with the left-equality and right-equality neighbors of X23 marked.]
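As an illustration only, a prior built from Boolean feature functions is commonly written in log-linear form. The weights θ_k, the normalizer Z(θ), and the names f_1, f_2 for the two unary features are assumed symbols rather than the authors' notation; only f_3 and f_4 are named on the slide.

```latex
p(X \mid V) \;=\; \frac{1}{Z(\theta)}
  \exp\!\Big(
      \sum_{X_{ij}} \big[\theta_1 f_1(X_{ij}) + \theta_2 f_2(X_{ij})\big]
    + \sum_{(X_{ij},\, X_{pj}) \in E} \theta_3\, f_3(X_{ij}, X_{pj})
    + \sum_{(X_{ij},\, X_{ik}) \in E} \theta_4\, f_4(X_{ij}, X_{ik})
  \Big)
```

Under this form, a larger θ_3 or θ_4 makes assignments in which neighboring MRF nodes satisfy the corresponding feature more probable a priori.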

Likelihood density function
Assumption:
  For a DE gene, the data points in yi and yi’ follow different distributions.
  For an EE gene, the data points in Yi = yi ∪ yi’ follow a single distribution.
Likelihood hierarchy:
  For a control/non-control group
  For a gene
  For an interaction
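The assumption above can be made concrete with a toy comparison. The sketch below assumes Gaussian distributions fitted by maximum likelihood, which is a choice made for illustration; the slides do not state the distribution family. The sample values and function names are likewise hypothetical.

```python
from math import log, pi
from statistics import pvariance


def gaussian_loglik(points):
    """Maximized log-likelihood of the points under one Gaussian (MLE fit)."""
    n = len(points)
    var = max(pvariance(points), 1e-9)  # floor avoids log(0) for constant data
    # At the MLE, the squared-error term sums to n, leaving this closed form.
    return -0.5 * n * (log(2 * pi * var) + 1.0)


def de_minus_ee(y_control, y_non_control):
    """Log-likelihood of the DE hypothesis minus that of the EE hypothesis."""
    ll_de = gaussian_loglik(y_control) + gaussian_loglik(y_non_control)
    ll_ee = gaussian_loglik(y_control + y_non_control)
    return ll_de - ll_ee


# Made-up expression values for two genes.
gap_a = de_minus_ee([5.0, 5.2, 4.9, 5.1], [5.1, 5.0, 5.0, 5.2])
gap_b = de_minus_ee([5.0, 5.2, 4.9, 5.1], [7.9, 8.1, 8.0, 8.2])
print(gap_a, gap_b)
```

With maximum-likelihood fits, the two-distribution hypothesis never scores lower than the single-distribution one, so the raw gap cannot decide DE versus EE by itself. It is informative only in relative terms, or when combined with a prior such as the MRF described earlier.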

Objective function optimization
Direct optimization of the objective function is very difficult, so we optimize an approximate version called the pseudo-likelihood.
Steps:
  Obtain an initial estimate of the state variables.
  Estimate parameters that maximize the data likelihood.
  Estimate parameters that maximize the prior density.
  Estimate parameters that maximize the pseudo-likelihood density.
  Rank the DE genes based on the likelihood w.r.t. the metagene.
Techniques used within these steps: differential evolution, Student’s t, and ICM.
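ICM (iterated conditional modes) updates one variable at a time to its locally best state while holding its MRF neighbors fixed. The sketch below shows the generic algorithm on binary variables. The scoring (a unary score plus a reward for agreeing with neighbors), the `coupling` parameter, and the toy chain are assumptions for illustration and do not reproduce the authors' objective.

```python
def icm(unary, neighbors, coupling=1.0, max_sweeps=20):
    """Iterated conditional modes on binary variables.

    unary[v]     = (score of state 0, score of state 1), e.g. log-likelihoods.
    neighbors[v] = variables adjacent to v in the MRF.
    coupling     = reward per neighbor that shares v's state (assumed form).
    """
    # Initial estimate: pick each variable's best state from its unary scores.
    state = {v: int(scores[1] > scores[0]) for v, scores in unary.items()}
    for _ in range(max_sweeps):
        changed = False
        for v in sorted(unary):
            local = []
            for s in (0, 1):
                agree = sum(1 for n in neighbors.get(v, []) if state[n] == s)
                local.append(unary[v][s] + coupling * agree)
            best = int(local[1] > local[0])  # ties resolve to state 0
            if best != state[v]:
                state[v] = best
                changed = True
        if not changed:  # a full sweep with no change is a local optimum
            break
    return state


# Toy chain a - b - c: b weakly prefers 0, but both neighbors strongly prefer 1.
unary = {"a": (0.0, 2.0), "b": (0.5, 0.0), "c": (0.0, 2.0)}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = icm(unary, neighbors)
print(result)
```

ICM converges quickly but only to a local optimum, which is one reason a good initial estimate of the state variables matters.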

Dataset and experimental setup
Dataset
  Real: Adapted from Smirnov et al.; generated by applying 10 Gy of ionizing radiation to immortalized B cells obtained from 155 donors.
  Real: The genetic interactions were collected from the KEGG database.
  Real/Synthetic: We created synthetic data, based on the real dataset, to simulate the perturbation experiment.
Experimental setup
  Implemented our method in MATLAB and Java.
  Ran our code on an AMD Opteron 2.4 GHz workstation with 4 GB of memory.
  Compared our method to SSEM and Student’s t test.

Evaluation of biological significance
Investigate the support for primarily affected genes.
Nine of the ten highest-ranked genes have significant biological evidence that they are impacted by radiation.
List of the top 25 genes most affected by the external perturbation:
PGF, IL8RB, FOSL1, F2R, PPM1D, MDM2, CDKN1A, TNC, PLXNB2, EPHA2, DDB2, TP53I3, PLK1, TNFSF9, ADRB2, MAP3K12, JUN, SORBS1, LRDD, SDC1, MYC, PRKAB1, EI24, DDIT4, FAS

Evaluation of the ranking of neighbor genes
Train our model on the training dataset and apply the learned parameters to the test dataset.
For each gene, rank its incoming neighbors in terms of responsibility.
Estimate the difference between the rankings obtained from the training and test datasets.
The figure shows that the difference is very close to zero.
[Figure: Frequency of distance of rankings over training and testing data.]
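One simple way to compute such a ranking difference is sketched below. The signed position difference used here, the responsibility scores, and the function names are assumptions for illustration; the slides do not specify the distance measure.

```python
from collections import Counter


def rank_positions(scores):
    """Map each neighbor to its rank position (0 = most responsible)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered)}


def rank_differences(train_scores, test_scores):
    """Signed change in rank position (test minus training) per neighbor."""
    train = rank_positions(train_scores)
    test = rank_positions(test_scores)
    return {name: test[name] - train[name] for name in train}


# Made-up responsibility scores for the incoming neighbors of one gene.
train_scores = {"g1": 0.9, "g2": 0.5, "g3": 0.1}
test_scores = {"g1": 0.8, "g2": 0.2, "g3": 0.3}

differences = rank_differences(train_scores, test_scores)
histogram = Counter(differences.values())  # frequency of each rank distance
print(differences)
print(histogram)
```

Pooling the `histogram` counts over all genes would give a frequency plot of the kind the slide describes, where mass concentrated at zero indicates stable rankings.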

Comparison with other methods
Created synthetic data by hypothetically perturbing some genes as primarily and secondarily affected.
Obtained a ranking from each method and designated the top-ranked genes as primarily affected genes.
The two graphs demonstrate that our method performs best among the three.
[Figure: (a) Gap = 0.2 × σ. (b) Gap = 0.6 × σ.]

Conclusions
Our method could find primarily affected genes with high accuracy.
It achieved significantly better accuracy than SSEM and the Student’s t test method.
Our method produces a probability distribution rather than a fixed binary decision.

Acknowledgement
This work was supported partially by NSF under grants CCF and IIS

Thank you!

Appendix

Sensitivity to the gap between primary and secondary effects
Changed the gap between the primary and secondary effects.
A higher gap implies a more pronounced distinction between the two kinds of effects.
Our method performs better than the other two methods in most cases.
[Figure: Comparison of accuracies with SSEM and Student’s t test while varying the ratio of gaps of primarily and secondarily affected genes.]
