Functional genomics and inferring regulatory pathways with gene expression data
Principle of Epistasis Analysis: determines the order of influence; used to reconstruct pathways
Experimental Design: Single vs Double-Gene Deletions
Epistasis Analysis Using Microarrays to Determine the Molecular Phenotypes. Van Driessche et al., Epistasis analysis with global transcriptional phenotypes, Nature Genetics 37, 471-477 (2005). Time-series expression (0-24 hrs), sampled every 2 hrs
Pathway Reconstruction: expression data, known pathway, inferred pathway
Expression Profiling in 276 Yeast Single-Gene Deletion Strains: “The Rosetta Compendium”. Only 19% of yeast genes are essential in rich media (Giaever et al., Nature 2002)
Clustered Rosetta Compendium Data
Gene Deletion Profiles Identify Gene Function and Pathways
Systematic phenotyping. Barcode (UPTAG): CTAACTC, TCGCGCA, TCATAAT. Deletion strain: yfg1Δ, yfg2Δ, yfg3Δ … Growth 6 hrs in minimal media (how many doublings?); rich media. Harvest and label genomic DNA
Microarrays for functional genomics (Hillenmeyer M, et al., Science 2008)
Explaining deletion effects: this is how we display one effect, but there are 276 deletion profiles
Relevant Relationships (that need to be explained): Rosetta compendium used; 28 of the deletions were TFs (red circles); 355 differentially expressed genes (white boxes), P < 0.005; 755 TF-deletion effects (grey squiggles)
Evidence for pathway inference. Step 1: Physical interaction network (Y2H, ChIP-chip). Step 2: Integrate state data: measure variables that are a function of the network (gene expression) and monitor these effects after perturbing the network (TF knockouts). The kinds of models; not an original idea; which active; disease (later); consequence of loss
Inferring regulatory paths (legend: direct, indirect)
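One way to picture the path search is as an enumeration of short directed paths through the physical interaction network, from a deleted TF to a gene whose expression changed. The sketch below is illustrative only: the network, the gene names, the `max_len` cutoff, and the reading of "direct" as a single edge and "indirect" as a longer path are all assumptions, not details taken from the slides.

```python
from collections import defaultdict


def find_paths(edges, source, target, max_len=3):
    """Enumerate simple directed paths from source to target using at most max_len edges."""
    graph = defaultdict(list)
    for regulator, regulated in edges:
        graph[regulator].append(regulated)

    paths = []
    stack = [(source, [source])]  # depth-first search over partial paths
    while stack:
        node, path = stack.pop()
        if node == target and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 >= max_len:  # path already uses max_len edges
            continue
        for nxt in graph[node]:
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + [nxt]))
    return paths


def classify(path):
    """Assumed convention: one edge is 'direct', anything longer is 'indirect'."""
    return "direct" if len(path) == 2 else "indirect"


# Hypothetical toy network of physical interactions (regulator, target).
edges = [
    ("TF_A", "gene_X"),
    ("TF_A", "TF_B"),
    ("TF_B", "gene_X"),
    ("TF_B", "gene_Y"),
]

for path in find_paths(edges, "TF_A", "gene_X"):
    print(classify(path), " -> ".join(path))
```

Bounding the path length keeps the candidate set small; every path returned is a possible explanation for the observed deletion effect.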
Annotate: inducer or repressor
Computational methods. Problem statement: find regulatory paths consisting of physical interactions that “explain” functional relationships. Method: a probabilistic inference approach (Yeang, Ideker et al., J Comp Bio 2004). To assign annotations: formalize the problem using a factor graph; solve using the max-product algorithm (Kschischang, IEEE Trans. Information Theory 2001). Mathematically similar to Bayesian inference, Markov random fields, belief propagation
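The annotation problem can be illustrated on a toy network. The sketch below does not implement a factor graph or the max-product algorithm; it swaps in exhaustive enumeration of sign assignments, which finds the same kind of best-scoring configuration on a network small enough to enumerate. The scoring rule (an observation counts as explained when some candidate path has a sign product equal to the required net sign), the toy edges, and all names are assumptions made for illustration.

```python
from itertools import product
from math import prod


def annotate_signs(edges, observations):
    """
    edges: list of (regulator, target) physical interactions with unknown sign.
    observations: list of (paths, required_sign); each path is a list of edge
        indices, and required_sign is +1 (net induction) or -1 (net repression).
    Returns (best_score, consensus), where consensus maps each edge to +1, -1,
    or 0 when the best-scoring assignments disagree (ambiguous annotation).
    """
    best_score, best = -1, []
    for signs in product((+1, -1), repeat=len(edges)):
        # Count observations explained by at least one candidate path.
        score = sum(
            any(prod(signs[i] for i in path) == required for path in paths)
            for paths, required in observations
        )
        if score > best_score:
            best_score, best = score, [signs]
        elif score == best_score:
            best.append(signs)

    consensus = {}
    for i, edge in enumerate(edges):
        values = {assignment[i] for assignment in best}
        consensus[edge] = values.pop() if len(values) == 1 else 0
    return best_score, consensus


# Hypothetical toy network; edge indices are 0, 1, 2 in this order.
edges = [("TF_A", "TF_B"), ("TF_B", "gene_X"), ("TF_A", "gene_Y")]

# Assumed convention: a target that goes down after a deletion implies net
# induction (+1) along the path; a target that goes up implies net repression (-1).
observations = [
    ([[2]], +1),     # deleting TF_A lowers gene_Y
    ([[1]], -1),     # deleting TF_B raises gene_X
    ([[0, 1]], -1),  # deleting TF_A raises gene_X via TF_B
]

score, consensus = annotate_signs(edges, observations)
print(score, consensus)
```

Dropping the second observation leaves two equally good assignments for the TF_A -> TF_B -> gene_X path, so both of its edges come back as 0. That is the kind of ambiguity a further deletion experiment could resolve.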
Test & Refine
Inferred Network Annotations: genes go down; GLN3? A network with ambiguous annotation
Inferring Regulatory Role: 50 of 132 protein-DNA interactions had been confirmed in low-throughput assays (Proteome BioKnowledge Library). Inferred regulatory roles (induction or repression) for 48 of these 50 interactions agreed with their experimentally determined roles (96%, binomial p-value < 1.22 × 10^-7)
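The agreement rate and a binomial tail probability can be checked with a few lines of arithmetic. The slide does not state which null model was used, so the sketch below assumes a one-sided test with a 50% chance of guessing each role correctly; `p_null` is a hypothetical parameter and the result need not equal the bound quoted above.

```python
from math import comb


def binomial_tail(successes, trials, p_null=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


agreement = 48 / 50
p_value = binomial_tail(48, 50)  # assumed null: each role guessed with probability 0.5
print(f"agreement = {agreement:.0%}, one-sided p = {p_value:.3g}")
```

Under this assumed null the tail probability is far below the quoted bound of 1.22 × 10^-7, so the two are consistent.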
Target experiments to one network region. Expression for: sok2, hap4, msn4, yap6
Expression of Msn4 targets: average Z-score of Msn4 target genes; a number of randomized trials were used to determine significance. Squigglies; negative control
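A randomization test of this kind can be sketched as follows: compare the average Z-score of the target set with the averages of randomly drawn gene sets of the same size. The slides do not give the number of trials or the exact statistic, so the two-sided comparison, `n_trials`, the seed, and the toy data are assumptions.

```python
import random
from statistics import mean


def target_set_pvalue(z_scores, targets, n_trials=2000, seed=0):
    """
    Empirical two-sided p-value for the mean Z-score of `targets`, estimated
    against random gene sets of the same size drawn from `z_scores`.
    """
    rng = random.Random(seed)
    genes = sorted(z_scores)
    observed = mean(z_scores[g] for g in targets)
    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(genes, len(targets))
        if abs(mean(z_scores[g] for g in sample)) >= abs(observed):
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_trials + 1)


# Hypothetical toy data: 200 genes, the first 10 shifted down to mimic
# targets that lose expression when their TF is deleted.
data_rng = random.Random(1)
z_scores = {f"gene{i}": data_rng.gauss(0.0, 1.0) for i in range(200)}
targets = [f"gene{i}" for i in range(10)]
for g in targets:
    z_scores[g] -= 3.0

observed, p_value = target_set_pvalue(z_scores, targets)
print(f"mean Z = {observed:.2f}, empirical p = {p_value:.4f}")
```

Running the same function on a gene set that the TF is not expected to regulate gives a simple negative control.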
Expression of Hap4 targets Show msn4
Yap6 targets are unaffected
Refined Network Model. Caveats: assumes target genes are correct; only models linear paths; combinatorial effects missed; measurements are for rich media growth
Using this method of choosing the next experiment: Is it better than other methods? How many experiments? Run simulations vs. random and hubs. Simulated performance of our method for picking experiments
Simulation results: number of simulated deletion profiles used to learn a “true” network
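The two baseline strategies named here, random choice and hub-first choice, are easy to sketch. The slides do not describe how the proposed method picks experiments or how performance was scored, so the code below covers only the baselines. The toy network, the function names, and the `covered_edges` count (a hypothetical stand-in for a real performance measure) are assumptions.

```python
import random


def pick_random(candidates, k, seed=0):
    """Baseline 1: choose k TFs to delete uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(sorted(candidates), k)


def pick_hubs(network, candidates, k):
    """Baseline 2: choose the k TFs with the most targets (ties broken by name)."""
    ranked = sorted(candidates, key=lambda tf: (-len(network.get(tf, ())), tf))
    return ranked[:k]


def covered_edges(network, chosen):
    """Hypothetical score: number of TF-target edges probed by the chosen deletions."""
    return sum(len(network.get(tf, ())) for tf in chosen)


# Hypothetical toy network mapping each TF to its target genes.
network = {
    "TF_A": {"g1", "g2", "g3"},
    "TF_B": {"g1"},
    "TF_C": {"g2", "g4"},
    "TF_D": set(),
}

hubs = pick_hubs(network, network.keys(), k=2)
rand = pick_random(network.keys(), k=2)
print("hubs:", hubs, covered_edges(network, hubs))
print("random:", rand, covered_edges(network, rand))
```

A full comparison would repeat the selection over many simulated networks and track how quickly each strategy recovers the known annotations as more deletion profiles are added.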