Published by Rebecca Rowe. Modified over 3 years ago.

1
Computational Discovery of Explanatory Process Models
Pat Langley
School of Computing and Informatics
Arizona State University
Tempe, Arizona
Thanks to K. Arrigo, D. Billman, S. Borrett, M. Bravo, W. Bridewell, S. Dzeroski, and L. Todorovski for their contributions to this research, which is funded by a grant from the National Science Foundation.

2
Computational Models of Scientific Discovery
The past three decades have seen multiple accounts of scientific discovery cast in terms of problem-space search, including:
- numeric laws in physics and chemistry (Langley et al., 1983)
- qualitative conjectures in number theory (Colton et al., 2000)
- structural models of molecules (Langley et al., 1987)
- causal models of urea biochemistry (Kulkarni & Simon, 1988)
- reaction pathways in catalytic chemistry (Valdes-Perez, 1994)
This work has focused on discovering knowledge cast in the same formalisms as used by scientists themselves.

3
Ecosystem Dynamics in the Ross Sea
Formal accounts of ecosystem dynamics are often cast as sets of differential equations. Here four equations describe the concentrations of phytoplankton, zooplankton, nitrogen, and detritus in the Ross Sea over time:
  d[phyto,t,1] = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
  d[zoo,t,1] = -0.251 * zoo + 0.615 * 0.495 * zoo
  d[detritus,t,1] = 0.307 * phyto + 0.251 * zoo + 0.385 * 0.495 * zoo - 0.005 * detritus
  d[nitro,t,1] = -0.098 * 0.411 * phyto + 0.005 * detritus
Such models can match observed variables with some accuracy.
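To see how such a set of equations produces trajectories, the four equations can be integrated numerically. The following is a minimal sketch, not the authors' simulator: it uses forward-Euler integration, takes the sign of each term to agree with the process description (losses and grazing deplete a population, uptake depletes nitrogen, remineralization depletes detritus), and the step size, step count, and initial concentrations are hypothetical values chosen only for illustration.

```python
def derivatives(state):
    """Right-hand sides of the four Ross Sea equations.

    state is (phyto, zoo, detritus, nitro); term signs are an assumption
    based on the process description, coefficients come from the slide.
    """
    phyto, zoo, detritus, nitro = state
    d_phyto = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
    d_zoo = -0.251 * zoo + 0.615 * 0.495 * zoo
    d_detritus = (0.307 * phyto + 0.251 * zoo
                  + 0.385 * 0.495 * zoo - 0.005 * detritus)
    d_nitro = -0.098 * 0.411 * phyto + 0.005 * detritus
    return (d_phyto, d_zoo, d_detritus, d_nitro)


def simulate(initial, dt=0.01, steps=1000):
    """Forward-Euler integration; returns every state, including the start."""
    state = tuple(initial)
    trajectory = [state]
    for _ in range(steps):
        rates = derivatives(state)
        state = tuple(x + dt * r for x, r in zip(state, rates))
        trajectory.append(state)
    return trajectory


# Hypothetical initial concentrations (phyto, zoo, detritus, nitro).
trajectory = simulate((1.0, 0.1, 0.0, 5.0), dt=0.01, steps=100)
```

Forward Euler is used here only because it keeps the sketch short; a production simulator would normally use an adaptive solver.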

4
A Deeper Account of Ross Sea Dynamics
  d[phyto,t,1] = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
  d[zoo,t,1] = -0.251 * zoo + 0.615 * 0.495 * zoo
  d[detritus,t,1] = 0.307 * phyto + 0.251 * zoo + 0.385 * 0.495 * zoo - 0.005 * detritus
  d[nitro,t,1] = -0.098 * 0.411 * phyto + 0.005 * detritus
As phytoplankton takes up nitrogen, its concentration increases and the nitrogen decreases. This continues until the nitrogen is exhausted, which leads to a phytoplankton die-off. This produces detritus, which gradually remineralizes to replenish nitrogen. Zooplankton grazes on phytoplankton, which slows the latter's increase and also produces detritus.

5
Processes in Ross Sea Dynamics
  d[phyto,t,1] = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
  d[zoo,t,1] = -0.251 * zoo + 0.615 * 0.495 * zoo
  d[detritus,t,1] = 0.307 * phyto + 0.251 * zoo + 0.385 * 0.495 * zoo - 0.005 * detritus
  d[nitro,t,1] = -0.098 * 0.411 * phyto + 0.005 * detritus
As phytoplankton takes up nitrogen, its concentration increases and the nitrogen decreases. This continues until the nitrogen is exhausted, which leads to a phytoplankton die-off. This produces detritus, which gradually remineralizes to replenish nitrogen. Zooplankton grazes on phytoplankton, which slows the latter's increase and also produces detritus.

7
A Process Model for the Ross Sea
We can formalize these links by recasting the equations as a quantitative process model:
  model Ross_Sea_Ecosystem
    variables: phyto, zoo, nitro, detritus
    observables: phyto, nitro
    process phyto_loss
      equations: d[phyto,t,1] = -0.307 * phyto
                 d[detritus,t,1] = 0.307 * phyto
    process zoo_loss
      equations: d[zoo,t,1] = -0.251 * zoo
                 d[detritus,t,1] = 0.251 * zoo
    process zoo_phyto_grazing
      equations: d[zoo,t,1] = 0.615 * 0.495 * zoo
                 d[detritus,t,1] = 0.385 * 0.495 * zoo
                 d[phyto,t,1] = -0.495 * zoo
    process nitro_uptake
      equations: d[phyto,t,1] = 0.411 * phyto
                 d[nitro,t,1] = -0.098 * 0.411 * phyto
    process nitro_remineralization
      equations: d[nitro,t,1] = 0.005 * detritus
                 d[detritus,t,1] = -0.005 * detritus
Such a model is equivalent to a standard differential equation model, but it makes explicit assumptions about the processes that are involved. Each process indicates that certain terms in the equations must stand or fall together.
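The equivalence between a process model and a set of differential equations can be made concrete: each variable's derivative is the sum of the terms contributed by the processes that mention it. The sketch below is an illustration under assumptions, not the authors' representation. The dictionary layout and the `derivatives` helper are hypothetical, and the term signs follow the process description (loss and grazing terms negative for the depleted variable).

```python
from collections import defaultdict

# Each process maps a variable to its contribution to that variable's
# derivative. Coefficients are from the slide; signs are assumed.
PROCESSES = {
    "phyto_loss": {
        "phyto": lambda s: -0.307 * s["phyto"],
        "detritus": lambda s: 0.307 * s["phyto"],
    },
    "zoo_loss": {
        "zoo": lambda s: -0.251 * s["zoo"],
        "detritus": lambda s: 0.251 * s["zoo"],
    },
    "zoo_phyto_grazing": {
        "zoo": lambda s: 0.615 * 0.495 * s["zoo"],
        "detritus": lambda s: 0.385 * 0.495 * s["zoo"],
        "phyto": lambda s: -0.495 * s["zoo"],
    },
    "nitro_uptake": {
        "phyto": lambda s: 0.411 * s["phyto"],
        "nitro": lambda s: -0.098 * 0.411 * s["phyto"],
    },
    "nitro_remineralization": {
        "nitro": lambda s: 0.005 * s["detritus"],
        "detritus": lambda s: -0.005 * s["detritus"],
    },
}


def derivatives(state, active=None):
    """Sum the contributions of the active processes for each variable."""
    names = list(PROCESSES) if active is None else active
    totals = defaultdict(float)
    for name in names:
        for variable, term in PROCESSES[name].items():
            totals[variable] += term(state)
    return dict(totals)


state = {"phyto": 1.0, "zoo": 2.0, "detritus": 3.0, "nitro": 4.0}
full = derivatives(state)
# Dropping one process removes all of its terms at once.
no_grazing = derivatives(
    state, active=[n for n in PROCESSES if n != "zoo_phyto_grazing"]
)
```

Removing `zoo_phyto_grazing` deletes three terms from three different equations in one step, which is what "stand or fall together" means in practice.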

8
Inductive Process Modeling
[Diagram: generic processes, variables, and observations feed into heuristic search, which outputs a process model.]
Generic processes:
  process exponential_growth
    variables: P {population}
    equations: d[P,t] = [0, 1, ∞] * P
  process logistic_growth
    variables: P {population}
    equations: d[P,t] = [0, 1, ∞] * P * (1 - P / [0, 1, ∞])
  process constant_inflow
    variables: I {inorganic_nutrient}
    equations: d[I,t] = [0, 1, ∞]
  process consumption
    variables: P1 {population}, P2 {population}, nutrient_P2
    equations: d[P1,t] = [0, 1, ∞] * P1 * nutrient_P2
               d[P2,t] = -[0, 1, ∞] * P1 * nutrient_P2
  process no_saturation
    variables: P {number}, nutrient_P {number}
    equations: nutrient_P = P
  process saturation
    variables: P {number}, nutrient_P {number}
    equations: nutrient_P = P / (P + [0, 1, ∞])
Variables: phyto, nitro, zoo, nutrient_nitro, nutrient_phyto
Process model:
  model AquaticEcosystem
    variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto
    observables: nitro, phyto, zoo
    process phyto_exponential_growth
      equations: d[phyto,t] = 0.1 * phyto
    process zoo_logistic_growth
      equations: d[zoo,t] = 0.1 * zoo * (1 - zoo / 1.5)
    process phyto_nitro_consumption
      equations: d[nitro,t] = -1 * phyto * nutrient_nitro
                 d[phyto,t] = 1 * phyto * nutrient_nitro
    process phyto_nitro_no_saturation
      equations: nutrient_nitro = nitro
    process zoo_phyto_consumption
      equations: d[phyto,t] = -1 * zoo * nutrient_phyto
                 d[zoo,t] = 1 * zoo * nutrient_phyto
    process zoo_phyto_saturation
      equations: nutrient_phyto = phyto / (phyto + 0.5)

9
Generic Processes for Aquatic Ecosystems
  generic process exponential_loss
    variables: S{species}, D{detritus}
    parameters: α [0, 1]
    equations: d[S,t,1] = -1 * α * S
               d[D,t,1] = α * S
  generic process remineralization
    variables: N{nutrient}, D{detritus}
    parameters: π [0, 1]
    equations: d[N,t,1] = π * D
               d[D,t,1] = -1 * π * D
  generic process grazing
    variables: S1{species}, S2{species}, D{detritus}
    parameters: ρ [0, 1], γ [0, 1]
    equations: d[S1,t,1] = γ * ρ * S1
               d[D,t,1] = (1 - γ) * ρ * S1
               d[S2,t,1] = -1 * ρ * S1
  generic process constant_inflow
    variables: N{nutrient}
    parameters: ν [0, 1]
    equations: d[N,t,1] = ν
  generic process nutrient_uptake
    variables: S{species}, N{nutrient}
    parameters: τ [0, ∞], β [0, 1], μ [0, 1]
    conditions: N > τ
    equations: d[S,t,1] = μ * S
               d[N,t,1] = -1 * β * μ * S
Our current library contains about 30 generic processes, including ones with alternative functional forms for loss and grazing processes. These form the building blocks from which to compose models.
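One way to read the type annotations on generic processes is as a filter on which variables may fill each role. The sketch below illustrates that reading under stated assumptions: the `GenericProcess` class, the `instantiate` function, and the type assignments for the four Ross Sea variables are hypothetical, and equations and parameters are omitted to keep the focus on typed binding.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class GenericProcess:
    name: str
    roles: tuple  # pairs of (role name, required variable type)


def instantiate(generic, variables):
    """Yield every binding of roles to distinct variables of the right type.

    variables maps a variable name to its type.
    """
    role_names = [role for role, _ in generic.roles]
    for combo in permutations(variables, len(generic.roles)):
        if all(variables[v] == t for v, (_, t) in zip(combo, generic.roles)):
            yield (generic.name, dict(zip(role_names, combo)))


# Assumed typing of the Ross Sea variables.
VARIABLES = {
    "phyto": "species",
    "zoo": "species",
    "nitro": "nutrient",
    "detritus": "detritus",
}

LIBRARY = [
    GenericProcess("exponential_loss", (("S", "species"), ("D", "detritus"))),
    GenericProcess("remineralization", (("N", "nutrient"), ("D", "detritus"))),
    GenericProcess(
        "grazing", (("S1", "species"), ("S2", "species"), ("D", "detritus"))
    ),
    GenericProcess("constant_inflow", (("N", "nutrient"),)),
    GenericProcess("nutrient_uptake", (("S", "species"), ("N", "nutrient"))),
]

instances = [inst for g in LIBRARY for inst in instantiate(g, VARIABLES)]
```

Typing alone admits instances that are well formed but ecologically odd, such as phytoplankton grazing on zooplankton, so type checks narrow the space without guaranteeing plausibility.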

10
An Approach to Process Model Construction
We have developed a number of IPM systems that construct process models from generic components in four stages:
1. Instantiate known generic processes with specific variables, subject to type specifications;
2. Combine these instantiated processes into candidate model structures, limited by size and connectivity;
3. For each model structure, carry out search through parameter space to find good coefficients;
4. Return the parameterized model with the best overall score (e.g., lowest squared error).
We have reported variants on this approach in numerous papers (Todorovski et al., AAAI-2005; Bridewell et al., MLj, 2008).
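Stages 2 and 4 can be sketched as enumeration followed by selection. The code below is an illustrative sketch rather than the IPM implementation. It assumes a process is a `(name, variables)` pair, interprets "connectivity" as requiring that the processes in a structure form a single group when linked by shared variables, and takes the parameter-fitting step as a caller-supplied `fit` function. All three choices are assumptions.

```python
from itertools import combinations


def is_connected(structure):
    """True if the processes form one group when linked by shared variables."""
    reached = set(structure[0][1])
    pending = list(structure[1:])
    progress = True
    while pending and progress:
        progress = False
        for process in list(pending):
            if reached & set(process[1]):
                reached |= set(process[1])
                pending.remove(process)
                progress = True
    return not pending


def candidate_structures(instantiated, max_size):
    """Stage 2: subsets of processes, limited by size and connectivity."""
    for size in range(1, max_size + 1):
        for structure in combinations(instantiated, size):
            if is_connected(structure):
                yield structure


def best_model(instantiated, max_size, fit):
    """Stages 3 and 4: fit each structure, keep the lowest-error one.

    fit(structure) must return (parameters, error).
    """
    best = None
    for structure in candidate_structures(instantiated, max_size):
        parameters, error = fit(structure)
        if best is None or error < best[2]:
            best = (structure, parameters, error)
    return best


# Hypothetical instantiated processes: (name, variables involved).
EXAMPLE = [
    ("phyto_loss", ("phyto", "detritus")),
    ("zoo_decay", ("zoo",)),
    ("nitro_uptake", ("phyto", "nitro")),
]
structures = list(candidate_structures(EXAMPLE, max_size=2))
```

Exhaustive enumeration is the simplest search strategy; it becomes impractical as the number of instantiated processes grows, which is why limits on size and connectivity matter.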

11
Estimating Parameters in Process Models
To estimate the parameters for each generic model structure, the IPM algorithm:
1. Selects random initial values that fall within the ranges specified in the generic processes;
2. Improves these parameters using the Levenberg-Marquardt method until it reaches a local optimum;
3. Generates new candidate values through random jumps along dimensions of the parameter vector and continues the search;
4. If no improvement occurs after N jumps, restarts the search from a new random initial point.
This multi-level method gives reasonable fits to time-series data from a number of domains, but it is computationally intensive.
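The four steps can be sketched on a toy problem. The example below is not the IPM code: it fits a single decay rate `a` in the hypothetical model x(t) = x0 * exp(-a * t), uses a one-parameter Levenberg-Marquardt update written out by hand, and assumes Gaussian jumps, a fixed number of restarts, and a fixed random seed. Those choices, and all function names, are assumptions made for illustration.

```python
import math
import random


def sse(a, times, data, x0):
    """Sum of squared errors for the toy model x(t) = x0 * exp(-a * t)."""
    return sum((x0 * math.exp(-a * t) - y) ** 2 for t, y in zip(times, data))


def levenberg_marquardt(a, times, data, x0, bounds, iters=50):
    """Damped Gauss-Newton for one parameter, clipped to its allowed range."""
    lo, hi = bounds
    lam = 1e-3
    err = sse(a, times, data, x0)
    for _ in range(iters):
        resid = [x0 * math.exp(-a * t) - y for t, y in zip(times, data)]
        jac = [-t * x0 * math.exp(-a * t) for t in times]
        jtj = sum(j * j for j in jac)
        jtr = sum(j * r for j, r in zip(jac, resid))
        cand = min(hi, max(lo, a - jtr / (jtj + lam)))
        cand_err = sse(cand, times, data, x0)
        if cand_err < err:
            a, err, lam = cand, cand_err, lam / 10  # accept, trust more
        else:
            lam *= 10  # reject, damp more heavily
    return a, err


def estimate(times, data, x0, bounds=(0.0, 1.0), max_jumps=5, restarts=3, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    best = None
    for _ in range(restarts):  # step 4: restart from a new random point
        # Steps 1 and 2: random start inside the range, then local search.
        a, err = levenberg_marquardt(rng.uniform(lo, hi), times, data, x0, bounds)
        failures = 0
        while failures < max_jumps:  # step 3: random jumps, then re-optimize
            jump = min(hi, max(lo, a + rng.gauss(0.0, 0.1 * (hi - lo))))
            cand, cand_err = levenberg_marquardt(jump, times, data, x0, bounds)
            if cand_err < err - 1e-12:
                a, err, failures = cand, cand_err, 0
            else:
                failures += 1
        if best is None or err < best[1]:
            best = (a, err)
    return best


# Synthetic, noise-free data generated with a = 0.3.
TIMES = list(range(10))
DATA = [math.exp(-0.3 * t) for t in TIMES]
fitted_a, fitted_err = estimate(TIMES, DATA, x0=1.0)
```

A real process model has many parameters and requires simulating the equations to compute the error, so each evaluation is far more expensive than in this toy case, consistent with the method being computationally intensive.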

12
Results on Training Data from Ross Sea
We provided IPM with 188 samples of phytoplankton, nitrate, and ice measures taken from the Ross Sea. From 2035 distinct model structures, it found accurate models that limited phyto growth by the nitrate and the light available. Some high-ranking models incorporated zooplankton, whereas others did not.

13
Results on Test Data from Ross Sea
Generalization to a second year's data benefited from treating initial zooplankton concentration as a free model parameter. Another good-fitting model suggested that the nitrogen-to-carbon ratio varies as a function of available light.

14
Results on a Protist Ecosystem
We also ran the system on protist data from Veilleux (1979), using 54 samples of two variables (Paramecium aurelia and Didinium nasutum). In this run, IPM considered a space of 470 distinct model structures and reproduced basic trends.

15
Results on Ringkøbing Fjord
Data from a Danish fjord included measurements of fjord height, sea level, water inflow, and wind direction and speed. We used 1100 samples for training and 551 samples for testing over a space of 32 model structures.

16
Results on Biochemical Kinetics
We also ran IPM on 14 samples of six chemicals involved in glycolysis from a pulse response study. Here the system considered some 172 model structures. The best model fit the data but reproduced only part of the known pathway.

17
Results on Battery Data from Space Station
Data from the Space Station batteries included current, voltage, and temperature, with resistance and state of charge unobserved. We used 6000 samples for training and 2640 samples for testing over a space of 162 model structures.

18
Interfacing with Scientists
Because few scientists want to be replaced, we are developing an interactive environment, PROMETHEUS, that lets users:
- specify a quantitative process model of the target system;
- display and edit the model's structure and details graphically;
- simulate the model's behavior over time and situations;
- compare the model's predicted behavior to observations;
- invoke a revision module in response to detected anomalies.
The environment offers computational assistance in forming and evaluating models but lets the user retain control. For more details about PROMETHEUS, see Bridewell et al. (K-CAP-2007).

19
Viewing a Process Model Graphically

20
Viewing a Process Model as Equations

21
Adding a Process Manually

22
Requesting Automatic Model Revision

23
Results of Automatic Model Revision

24
An Initial User Study
We asked three oceanographers to use PROMETHEUS to revise a model of the Ross Sea ecosystem; this study revealed that they:
- liked the ability to alter models at an abstract process level;
- appreciated the ability to simulate models and plot variables;
- believed the environment would be a good teaching tool;
- had some interference from STELLA's graphical notation;
- felt the system would be more useful if it supported PDE models;
- indicated that some model structures were implausible.
We are addressing the last two issues in our next version of the modeling environment.

25
Knowledge and Search in Science
Traditional treatments of problem solving hold that knowledge reduces search. But adding generic processes leads to a combinatorial increase in the number of model structures. Yet scientists are not overwhelmed by the size of their model spaces, and they reject many model structures as unacceptable. This suggests two distinct forms of scientific knowledge: components of models and constraints on their combination. This distinction seldom occurs in the literature, but it appears central to understanding scientific explanation.

26
Inducing Process Models with Constraints
We have extended our framework for process model induction to:
- represent modular constraints on process combinations;
- use these constraints to eliminate unacceptable models;
- reduce search through the model space, which
  - leads to far more efficient model construction,
  - produces little or no increase in generalization error, and
  - improves the comprehensibility of generated models.
The resulting system (SC-IPM) offers a more complete account of explanatory scientific discovery.

27
Four Types of Process Constraints
The SC-IPM system supports four distinct types of constraints:
- two processes must always occur together:
    producer(P) => growth(P), loss(P)
- exactly one of a set of processes must be in a model:
    producer(P), grazer(G) => lotka-volterra(P, G), ivlev(P, G), watts(P, G)
- at most one of a set of processes can appear in a model:
    producer(P), energy(E) => photoinhibition(P, E)
- processes must always appear in a model:
    nutrient(N), detritus(D) => nutrient-mixing(N), remineralization(N, D)
We hypothesize that these constraint types are sufficient to guide process modeling in many scientific domains.
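The effect of the four constraint types on the size of the model space can be illustrated by treating each constraint as a predicate over sets of process names. This is a sketch under assumptions, not SC-IPM's representation: the function names are hypothetical, entity bindings such as `P` and `G` are ignored, and a model is reduced to the set of processes it contains.

```python
from itertools import combinations


def always_together(processes):
    """All of the listed processes appear, or none of them do."""
    group = frozenset(processes)
    return lambda model: len(group & model) in (0, len(group))


def exactly_one(processes):
    group = frozenset(processes)
    return lambda model: len(group & model) == 1


def at_most_one(processes):
    group = frozenset(processes)
    return lambda model: len(group & model) <= 1


def necessary(processes):
    """Every listed process must appear in the model."""
    group = frozenset(processes)
    return lambda model: group <= model


def all_structures(processes):
    """Every subset of the process names, as frozensets."""
    for size in range(len(processes) + 1):
        for subset in combinations(processes, size):
            yield frozenset(subset)


def acceptable(model, constraints):
    return all(check(model) for check in constraints)


PROCESS_NAMES = [
    "growth", "loss", "lotka_volterra", "ivlev", "watts",
    "photoinhibition", "nutrient_mixing", "remineralization",
]

CONSTRAINTS = [
    always_together(["growth", "loss"]),
    exactly_one(["lotka_volterra", "ivlev", "watts"]),
    at_most_one(["photoinhibition"]),
    necessary(["nutrient_mixing", "remineralization"]),
]

accepted = [m for m in all_structures(PROCESS_NAMES) if acceptable(m, CONSTRAINTS)]
```

With these eight process names there are 256 unconstrained subsets, and the four example constraints leave 12 of them, which illustrates how modular constraints can prune the space before any parameter fitting takes place.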

28
Experimental Benefits of Constraints
[Figure: two panels, "no constraints" and "constraints", reporting variance explained of R^2 = 0.92 and R^2 = 0.97.]

29
Intellectual Influences
Our approach to computational discovery incorporates ideas from many traditions:
- computational scientific discovery (e.g., Langley et al., 1983)
- qualitative physics and simulation (e.g., Forbus, 1984)
- constraint-satisfaction methods (Freuder & Mackworth, 1994)
- languages for scientific simulation (e.g., STELLA, MATLAB)
- interactive tools for data analysis (e.g., Shneiderman, 2001)
However, it combines these ideas in novel ways to assist in the construction of explanatory scientific models.

30
Directions for Future Research
Despite our progress to date, we need further work in order to:
- produce additional results on other scientific data sets
- develop more efficient methods for fitting model parameters
- extend the framework to handle partial differential equations
- explore evaluation metrics like match to trajectory shape
- use knowledge of subsystems to support large-scale modeling
- improve usability of the PROMETHEUS modeling environment
Taken together, these will make inductive process modeling a more robust approach to scientific model construction.

31
Key Contributions
In summary, our work on computational model construction has produced an approach that:
- incorporates a formalism that is familiar to many scientists;
- takes into account background knowledge about the domain;
- produces meaningful results from moderate amounts of data;
- generates models that explain rather than describe observations;
- provides an interactive environment for model construction.
Our IPM systems search differently from human scientists, but they rely on the same kinds of processes and constraints.

32
End of Presentation

33
Results with Inductive Process Modeling
[Figure panels: aquatic ecosystems, protist dynamics, hydrology, biochemical kinetics.]

34
The Task of Inductive Process Modeling
We can use these ideas to reformulate the modeling problem:
- Given: A set of variables of interest to the scientist;
- Given: Observations of how these variables change over time;
- Given: Background knowledge about plausible processes;
- Find: A process model that explains these variations and that generalizes well to future observations.
Background knowledge takes the form of generic processes that provide the building blocks for models.

35
Contributions of the Research
In summary, our work on computational model construction has produced an approach that:
- incorporates a formalism that is familiar to many scientists;
- takes into account background knowledge about the domain;
- produces meaningful results from small amounts of data;
- generates models that explain rather than describe observations;
- provides an interactive environment for model construction.
We need much more research in computational systems science that addresses these challenges.
