Activist Data Mining (as Applied to Carbon:Nitrogen Sensing in Plants) Dennis Shasha.


New York University
Department of Biology; Courant Institute of Math & Computer Sciences

Gloria Coruzzi, Mike Chou, Andrew Kouranov, Laurence Lejay, Bud Mishra, Marco Antoniotti, Marc Rejali, Dennis Shasha

[Diagram labels: Light, Photosynthesis, Sugar, NH4+, Amino Acids (Asp, Glu, Asn, Gln)]

Light, Carbon and Amino acids differentially regulate N-assimilation genes

[Diagram labels: GS2, Gln (C:N = C5:N2), Light, Carbon; AS1, Asn (C:N = C4:N2), Light, Carbon, Amino acids]

Goal: Figure out the Circuit for many genes

Identify Arabidopsis mutants defective in C:N sensing
* Forward genetics: Selections for C:N sensing mutants
* Reverse genetics: Mutants in candidate C:N signaling genes

Ultimate Goal: Virtual plant… (frankenfoods)

A Multi-factor Approach to C:N sensing in plants: identify how a combination of interactions of “inputs” (Light, Carbon, & Nitrogen) affects gene regulation, using Combinatorial Design and Genome Chip analysis.

A Combinatorial Approach to discovering interactions

Inputs:
* Light
* Starvation to Various Nutrients
* Carbon
* Inorganic N (NO3/NH4)
* Organic N (Glu)
* Organic N (Gln)

If the inputs take binary values (a first approximation), 6 binary (+/-) inputs give 2^6 = 64 input combinations (or treatments). Use combinatorial design to reduce the number of treatment combinations required to effectively cover the experimental space.

Combinatorial design generates a subset of the 64 treatments that gives a “good” approximation of the entire experimental space. For every pair of “inputs”, all four combinations of their binary values are tested. Example: NO3 and Carbon have four possible combinations: +NO3 +Carbon; +NO3 -Carbon; -NO3 +Carbon; -NO3 -Carbon. Each such pairwise combination is present in at least one treatment of the experiments chosen by combinatorial design.

ACTIVIST DATA MINING: Don’t study the experiments (only). Change them.
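The pairwise-coverage idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the greedy selection strategy and the names `all_pairs`, `pairs_of` and `greedy_pairwise_design` are hypothetical and are not the procedure used in the talk; only the six binary inputs come from the slide.

```python
from itertools import combinations, product

# The six binary inputs named on the slide (0 = "-", 1 = "+").
INPUTS = ["Light", "Starvation", "Carbon", "InorganicN", "Glu", "Gln"]

def all_pairs(inputs):
    """Every (input_i, value_i, input_j, value_j) a pairwise design must cover."""
    return {
        (i, vi, j, vj)
        for i, j in combinations(range(len(inputs)), 2)
        for vi, vj in product((0, 1), repeat=2)
    }

def pairs_of(treatment):
    """The pairwise value combinations that one treatment covers."""
    return {
        (i, treatment[i], j, treatment[j])
        for i, j in combinations(range(len(treatment)), 2)
    }

def greedy_pairwise_design(inputs):
    """Assumed greedy heuristic: repeatedly pick the treatment that covers
    the most still-uncovered pairs until every pair is covered."""
    remaining = all_pairs(inputs)
    candidates = list(product((0, 1), repeat=len(inputs)))
    design = []
    while remaining:
        best = max(candidates, key=lambda t: len(pairs_of(t) & remaining))
        design.append(best)
        remaining -= pairs_of(best)
    return design

design = greedy_pairwise_design(INPUTS)
print(len(design), "treatments instead of", 2 ** len(INPUTS))
for t in design:
    print(" ".join("+" if v else "-" for v in t))
```

A greedy pass like this does not promise the smallest possible design; it only guarantees that every pair of inputs appears in all four +/- combinations.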

EXPT 1: PIVOT LIGHT

LANE  LIGHT  STARVE  CARBON  NO3NH4  GLU  GLN
1     LIGHT  N       0       0       0    0
2     LIGHT  Y       0       L       0    H
3     LIGHT  Y       L       0       H    0
4     LIGHT  N       L       L       H    H
5     LIGHT  N       0       0       H    H
6     LIGHT  N       L       L       0    0
7     DARK   N       0       0       0    0
8     DARK   Y       0       L       0    H
9     DARK   Y       L       0       H    0
10    DARK   N       L       L       H    H
11    DARK   N       0       0       H    H
12    DARK   N       L       L       0    0

“Combinatorial design” predicts 12 conditions to test the effect of Light in all combinations of Starvation, Carbon, and Nitrogen.

“Pivot” analysis of gene expression data from C:N treatments

Find “minimal pairs” of treatments that are the same except in one input (e.g. Light) to measure its effect on a dependent variable (gene) (e.g. AS1).

Analyze a series of minimal pair treatments using one input (e.g. Light) as a “pivot”, to determine the effect of light on a dependent variable (e.g. AS1) under a variety of carbon and nitrogen combinations. If consistent, likely always true.

PIVOT  Dependent Variable (Gene)  EFFECT   Evidence = Minimal pair treatments  LITE  STARVE  CARBON  NO3  GLU
LIGHT  AS1                        repress  4_8                                 L_D   N       L       0    H
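A minimal sketch of the minimal-pair search follows, assuming treatments are stored as dictionaries keyed by lane number. The lanes, the expression values and the helper names `minimal_pairs` and `pivot_effects` are all hypothetical and chosen for illustration; they are not data from the experiments.

```python
from itertools import combinations

def minimal_pairs(treatments, pivot):
    """Return lane pairs whose settings differ only in the pivot input."""
    pairs = []
    for (a, ta), (b, tb) in combinations(sorted(treatments.items()), 2):
        differing = [name for name in ta if ta[name] != tb[name]]
        if differing == [pivot]:
            pairs.append((a, b))
    return pairs

def pivot_effects(treatments, expression, pivot, low):
    """Label each minimal pair by the sign of (non-low lane) - (low lane)."""
    effects = {}
    for a, b in minimal_pairs(treatments, pivot):
        lo_lane, hi_lane = (a, b) if treatments[a][pivot] == low else (b, a)
        delta = expression[hi_lane] - expression[lo_lane]
        effects[(hi_lane, lo_lane)] = (
            "induce" if delta > 0 else "repress" if delta < 0 else "none"
        )
    return effects

# Made-up lanes and made-up AS1 expression levels (assumptions, not real data).
treatments = {
    1: {"LIGHT": "L", "STARVE": "N", "CARBON": "0", "GLU": "0"},
    2: {"LIGHT": "L", "STARVE": "Y", "CARBON": "L", "GLU": "H"},
    3: {"LIGHT": "D", "STARVE": "N", "CARBON": "0", "GLU": "0"},
    4: {"LIGHT": "D", "STARVE": "Y", "CARBON": "L", "GLU": "H"},
}
as1 = {1: 2.0, 2: 3.5, 3: 5.0, 4: 7.0}

print(minimal_pairs(treatments, "LIGHT"))           # [(1, 3), (2, 4)]
print(pivot_effects(treatments, as1, "LIGHT", "D"))  # both pairs: 'repress'
```

Each key in the result is a (light lane, dark lane) pair, mirroring the "L_D" evidence notation in the table.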

PIVOT  Dependent  EFFECT   Evidence = Minimal pair treatments  LITE  STARVE  CARBON  NO3/NH4  GLU
LIGHT  AS1        repress  1_5                                 L_D   Y       0       0        0
LIGHT  AS1        repress  2_6                                 L_D   Y       L       0        0
LIGHT  AS1        repress  3_7                                 L_D   Y       L       L        0
LIGHT  AS1        repress  4_8                                 L_D   N       L       0        H
LIGHT  AS1        repress  10_14                               L_D   N       0       0        0
LIGHT  AS1        repress  11_15                               L_D   Y       L       0        0
LIGHT  AS1        repress  12_16                               L_D   Y       L       L        0
LIGHT  AS1        repress  13_17                               L_D   Y       L       0        H
LIGHT  GS2        induce   1_5                                 L_D   Y       0       0        0
LIGHT  GS2        induce   2_6                                 L_D   Y       L       0        0
LIGHT  GS2        induce   3_7                                 L_D   Y       L       L        0
LIGHT  GS2        induce   4_8                                 L_D   Y       L       0        H
LIGHT  GS2        induce   10_14                               L_D   N       0       0        0
LIGHT  GS2        induce   11_15                               L_D   N       L       0        0
LIGHT  GS2        induce   12_16                               L_D   N       L       L        0
LIGHT  GS2        induce   13_17                               L_D   Y       L       0        H

LITE represses AS1 & induces GS2 under a variety of C:N conditions.

PIVOT  Gene  EFFECT   Evidence = Minimal pair treatments  LIGHT  STARVE  CARBON  NO3/NH4  GLU
GLU    AS1   induce   2_4                                 L      Y       L       0        0_H
GLU    AS1   induce   6_8                                 D      Y       L       0        0_H
GLU    AS1   induce   15_17                               D      Y       L       0        0_H
GLU    AS1   induce   19_21                               D      N       L       0        0_H
GLU    AS1   induce   23_25                               L      N       L       0        0_L
GLU    AS1   induce   26_28                               L      Y       L       0        0_L
GLU    AS1   induce   30_32                               L      Y       L       0        0_L
GLU    GS2   repress  2_4                                 L      Y       L       0        0_H
GLU    GS2   repress  6_8                                 D      Y       L       0        0_H
GLU    GS2   repress  11_13                               L      Y       L       0        0_H
GLU    GS2   repress  15_17                               D      Y       L       0        0_H
GLU    GS2   repress  19_21                               D      N       L       0        0_H
GLU    GS2   repress  20_22                               D      N       L       L        0_H
GLU    GS2   repress  23_25                               L      N       L       0        0_L
GLU    GS2   repress  30_32                               L      Y       L       0        0_L

GLU induces AS1 & represses GS2 under a variety of conditions.

Underlying Method: combinatorial design

Combinatorial design: inspired by work in software testing by David Cohen, Siddhartha Dalal, Michael Fredman and Gardner Patton at Bellcore/Telcordia.

Their problem: how to choose a good set of inputs to a program to discover whether there are any bugs. Not program coverage, but input coverage. Not all input combinations, but all value combinations of every pair of input variables.

Hypothesis: every input combination should give the same output: no error. If true for the designed subset, then the program is ok.

Underlying Method: combinatorial design 2

Scientific question: does input X induce (resp. repress) the output? If so, then, regardless of the other inputs, X should induce. So, choose X = low and then a combinatorial design of the other inputs. Then choose X = high and then the same combinatorial design of the other inputs. If for each context c in the design (high, c) has more output than (low, c), i.e. in every minimal pair, then X is inductive.
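The decision rule can be written out as a short check, given here only as a sketch. It assumes the measurements are held in a dictionary keyed by (X level, context); the context labels, the numbers and the function name `classify_input` are hypothetical.

```python
def classify_input(output, contexts):
    """Compare (high, c) against (low, c) for every context c in the design.

    Returns "inductive" if high exceeds low in every context, "repressive"
    if low exceeds high in every context, and "mixed" otherwise.
    """
    verdicts = set()
    for c in contexts:
        diff = output[("high", c)] - output[("low", c)]
        verdicts.add("induce" if diff > 0 else "repress" if diff < 0 else "flat")
    if verdicts == {"induce"}:
        return "inductive"
    if verdicts == {"repress"}:
        return "repressive"
    return "mixed"

# Made-up measurements for three contexts of the other inputs.
contexts = ["c1", "c2", "c3"]
output = {
    ("low", "c1"): 1.0, ("high", "c1"): 2.5,
    ("low", "c2"): 0.8, ("high", "c2"): 1.9,
    ("low", "c3"): 1.2, ("high", "c3"): 3.0,
}
print(classify_input(output, contexts))  # inductive
```

A "mixed" verdict is the case that adaptive design is meant to handle.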

Underlying Methods: adaptive design

What happens when X isn’t uniformly inductive or repressive? Suppose X shows induction normally, but repression occasionally. That is, for most c values (low, c) vs. (high, c) shows induction, but for one c’ (low, c’) vs. (high, c’) shows repression. Then study the difference between c’ and those c values showing induction that are closest to c’, and design experiments to reduce those differences.
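One way to make "closest to c’" concrete is Hamming distance over the input settings, which is an assumption of this sketch rather than something the slide specifies. The contexts, the effect labels and the names `hamming` and `nearest_normal_contexts` below are hypothetical.

```python
INPUT_NAMES = ("STARVE", "CARBON", "GLU")  # assumed ordering of context inputs

def hamming(c1, c2):
    """Number of inputs on which two contexts differ."""
    return sum(a != b for a, b in zip(c1, c2))

def nearest_normal_contexts(effects, odd_context):
    """Find the contexts with the opposite (normal) effect that are closest
    to the exceptional context, with the names of the inputs that differ."""
    normal = [c for c, e in effects.items() if e != effects[odd_context]]
    best = min(hamming(c, odd_context) for c in normal)
    return {
        c: [n for n, a, b in zip(INPUT_NAMES, c, odd_context) if a != b]
        for c in normal
        if hamming(c, odd_context) == best
    }

# Made-up effects of X per context: induction everywhere except one c'.
effects = {
    ("N", "0", "0"): "induce",
    ("Y", "L", "0"): "induce",
    ("Y", "0", "H"): "induce",
    ("N", "L", "0"): "induce",
    ("N", "L", "H"): "repress",  # the exceptional context c'
}
print(nearest_normal_contexts(effects, ("N", "L", "H")))
# {('N', 'L', '0'): ['GLU']}
```

In this toy case the follow-up experiments would vary only GLU between the two contexts, since that is the single input separating the normal case from the abnormal one.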

Conclusions About Methodology

Design/don’t wait: Use the data you are given, sure, but don’t be shy to ask for more.

Combinatorial Design can help test a hypothesis: e.g. 10 three-valued variables require 3^10 = 59,049 experiments to cover the whole space. Combinatorial design can reduce this to 27.

Adaptation is easy: Study differences between normal cases and abnormal ones to discover fine structure.
