Interactive Software Environments for Computational Modeling and Discovery

Pat Langley
Institute for the Study of Learning and Expertise, Palo Alto, California
and Center for the Study of Language and Information, Stanford University, Stanford, California
http://www.isle.org/~langley
langley@isle.org

Thanks to S. Bay, V. Brooks, L. Chrisman, S. Klooster, A. Pohorille, C. Potter, K. Saito, H. Spencer, J. Shrager, M. Schwabacher, and A. Torregrosa.

The Challenge of Systems Science

As a field of science matures, researchers move beyond accounts of simple, isolated phenomena to:
  develop models of complex systems with many components;
  compare these models to observational data from the systems;
  evaluate their models' ability to fit these observations; and
  improve their models in response to detected anomalies.

Developing, testing, and revising such models is a challenging endeavor that would benefit from computational aids.

Our research goal is to design, construct, evaluate, and understand such computational tools for systems science.

Lessons about Scientific Knowledge Discovery

Our research collaborations in Earth science and microbiology have suggested some important lessons:
  1. Traditional notations from machine learning and data mining are not communicated easily to domain scientists.
  2. Scientists often want models that move beyond description to provide explanations of their data.
  3. Scientists often have initial models and background knowledge that should influence the discovery process.
  4. Scientific data are often rare and difficult to obtain rather than being plentiful, making variance reduction a key issue.
  5. Scientists often want computational assistance rather than automated discovery systems.

These observations suggest clear needs for additional research in computational approaches to scientific knowledge discovery.

Inductive Process Modeling

[Diagram: training data and background knowledge feed Induction, which produces learned knowledge.]

Background knowledge (generic processes):

process exponential_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ] * P

process logistic_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ] * P * (1 - P / [0, 1, ])

process constant_inflow
  variables: I {inorganic_nutrient}
  equations: d[I,t] = [0, 1, ]

process consumption
  variables: P1 {population}, P2 {population}, nutrient_P2
  equations: d[P1,t] = [0, 1, ] * P1 * nutrient_P2,
             d[P2,t] = [0, 1, ] * P1 * nutrient_P2

process no_saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P

process saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P / (P + [0, 1, ])

Learned knowledge (induced process model):

model AquaticEcosystem
  variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto
  observables: nitro, phyto, zoo

process phyto_exponential_growth
  equations: d[phyto,t] = 0.1 * phyto

process zoo_logistic_growth
  equations: d[zoo,t] = 0.1 * zoo / (1 - zoo / 1.5)

process phyto_nitro_consumption
  equations: d[nitro,t] = 1 * phyto * nutrient_nitro,
             d[phyto,t] = 1 * phyto * nutrient_nitro

process phyto_nitro_no_saturation
  equations: nutrient_nitro = nitro

process zoo_phyto_consumption
  equations: d[phyto,t] = 1 * zoo * nutrient_phyto,
             d[zoo,t] = 1 * zoo * nutrient_phyto

process zoo_phyto_saturation
  equations: nutrient_phyto = phyto / (phyto + 0.5)
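To make the AquaticEcosystem model concrete, the following is a minimal Python sketch that evaluates its equations and steps them forward with a simple Euler scheme. Several things here are assumptions rather than content from the slides: the function names `derivatives` and `simulate` are hypothetical, the contributions of different processes to the same variable are assumed to add, and the minus signs (in the logistic term and on the consumed variable in each consumption process) are assumed to have been dropped when the slide text was extracted. The Euler integrator is a stand-in chosen for brevity, not the environment's simulator.

```python
def derivatives(nitro, phyto, zoo):
    """Return (d_nitro, d_phyto, d_zoo) for the AquaticEcosystem model."""
    # Algebraic processes define the intermediate "nutrient" variables.
    nutrient_nitro = nitro                   # phyto_nitro_no_saturation
    nutrient_phyto = phyto / (phyto + 0.5)   # zoo_phyto_saturation

    # ASSUMPTION: each consumption process decreases the consumed
    # variable and increases the consumer; process effects are summed.
    d_nitro = -1.0 * phyto * nutrient_nitro              # phyto_nitro_consumption
    d_phyto = (0.1 * phyto                               # phyto_exponential_growth
               + 1.0 * phyto * nutrient_nitro            # phyto_nitro_consumption
               - 1.0 * zoo * nutrient_phyto)             # zoo_phyto_consumption
    # ASSUMPTION: the growth term reads 0.1 * zoo / (1 - zoo / 1.5),
    # which is undefined at zoo == 1.5.
    d_zoo = (0.1 * zoo / (1.0 - zoo / 1.5)               # zoo_logistic_growth
             + 1.0 * zoo * nutrient_phyto)               # zoo_phyto_consumption
    return d_nitro, d_phyto, d_zoo


def simulate(nitro, phyto, zoo, dt=0.01, steps=100):
    """Forward-Euler integration; returns a list of (nitro, phyto, zoo) states."""
    trajectory = [(nitro, phyto, zoo)]
    for _ in range(steps):
        d_nitro, d_phyto, d_zoo = derivatives(nitro, phyto, zoo)
        nitro += dt * d_nitro
        phyto += dt * d_phyto
        zoo += dt * d_zoo
        trajectory.append((nitro, phyto, zoo))
    return trajectory
```

Calling `simulate(1.0, 0.5, 0.3)` yields a short trajectory of the three observables; the initial values are arbitrary illustrations, and long runs may approach the singularity noted in the comments.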

Why Are Process Models Interesting?

Process models are good targets for knowledge discovery because:
  they incorporate scientific formalisms, rather than AI notations, that are easily communicable to scientists and engineers;
  they move beyond descriptive generalization to explanation while retaining the modularity needed to support induction.

These reasons point to process models as an ideal representation for scientific and engineering knowledge.

Process models are an important alternative to formalisms used currently in machine learning and data mining.

Three Challenging Scientific Domains

[Figure: three panels. "Earth ecosystem": the NPPc equations of the CASA model. "Gene regulation": a signed (+/-) influence diagram over Light, DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, and cpcB. "Human activities": a diagram over activity level, lung capacity, GSR, resp. rate, heart rate, heart capacity, heart activity, and lung activity.]

Challenges of Inductive Process Modeling

Process model induction differs from typical learning tasks in that:
  process models characterize behavior of dynamical systems;
  variables are mainly continuous and data are unsupervised;
  observations are not independently and identically distributed;
  process models contain unobservable processes and variables;
  multiple processes can interact to produce complex behavior.

Compensating factors include a focus on deterministic systems and the availability of background knowledge.

An Environment for Interactive Process Modeling

We plan to develop an interactive environment that lets users:
  specify process models of static and dynamic systems;
  display and edit a model's structure and details graphically;
  utilize a model to simulate a system's behavior over time;
  incorporate background knowledge cast as generic processes;
  indicate which processes to consider during model revision;
  invoke a revision module that improves a model's fit to data.

Our initial implementation focuses on quantitative processes, but future versions should also support qualitative models.

A Process Model for Carbon Production

model npp;
variables NPPc, E, IPAR, T1, T2, W, Topt, tempc, eet, PET, PETTWM, ahi, A, FPARFAS, monthlySolar, SolConver, MONFASNDVI, umd_veg;
observable ahi, eet, tempc, Topt, MONFASNDVI, monthlySolar, PETTWM, umd_veg;

process CarbonProd;
  equations NPPc = E * IPAR;

process PhotoEfficiency;
  equations E = (0.389 * (T1 * (T2 * W)));

process TempStress1;
  equations T1 = (0.8 + ((0.02 * Topt) - (0.0005 * (Topt ^ 2))));

process TempStress2;
  equations T2 = ((1.1814 / (1 + (2.718281828 ^ (0.2 * (Topt - 10 - tempc))))) / (1 + (2.718281828 ^ (0.3 * (tempc - 10 - Topt)))));

process WaterStress;
  conditions PET!=0;
  equations W = (0.5 + (0.5 * (eet / PET)));

process WSNoEvapoTrans;
  conditions PET==0;
  equations W = 0.5;

process EvapoTrans;
  conditions tempc>0;
  equations PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM;
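As an illustration of how the conditional processes in this model combine, here is a minimal Python sketch of the photosynthetic-efficiency portion (TempStress1, TempStress2, WaterStress or WSNoEvapoTrans, and PhotoEfficiency). The function name `photo_efficiency` and the dictionary it returns are hypothetical; the constants are taken from the model listing, and `math.exp` stands in for the literal 2.718281828 base.

```python
import math


def photo_efficiency(topt, tempc, eet, pet):
    """Evaluate T1, T2, W, and E from the npp process model."""
    # TempStress1
    t1 = 0.8 + 0.02 * topt - 0.0005 * topt ** 2
    # TempStress2
    t2 = (1.1814
          / (1 + math.exp(0.2 * (topt - 10 - tempc)))
          / (1 + math.exp(0.3 * (tempc - 10 - topt))))
    # WaterStress applies when PET != 0, WSNoEvapoTrans when PET == 0.
    if pet != 0:
        w = 0.5 + 0.5 * (eet / pet)
    else:
        w = 0.5
    # PhotoEfficiency
    e = 0.389 * t1 * t2 * w
    return {"T1": t1, "T2": t2, "W": w, "E": e}
```

For example, `photo_efficiency(20, 20, 1.0, 2.0)` evaluates the stress terms when the temperature equals the optimum; the input values are arbitrary and chosen only for illustration.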

Viewing and Editing a Process Model

Initial Results on Ecosystem Model Revision

Initial model:
  $E = 0.56 \cdot T1 \cdot T2 \cdot W$
  $T2 = 1.18 / [(1 + e^{0.2 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 10)}) \cdot (1 + e^{0.3 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 10)})]$
  $\mathrm{PET} = 1.6 \cdot (10 \cdot \mathrm{Tempc} / \mathrm{AHI})^{A} \cdot \text{PET-TW-M}$
  SR: {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05}
  Cross-validated RMSE = 465.212 and $r^2$ = 0.799

Revised model:
  $E = 0.353 \cdot T1^{0.00} \cdot T2^{0.08} \cdot W^{0.00}$
  $T2 = 0.83 / [(1 + e^{1.0 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 6.34)}) \cdot (1 + e^{1.0 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 11.52)})]$
  $\mathrm{PET} = 1.6 \cdot (10 \cdot \mathrm{Tempc} / \mathrm{AHI})^{A} \cdot \text{PET-TW-M}$
  SR: {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61}
  Cross-validated RMSE = 397.306 and $r^2$ = 0.853 [15% reduction]
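The reported 15% figure can be checked from the two RMSE values. The sketch below shows one way to compute an RMSE and the relative reduction between two error scores; the function names are hypothetical, and the slides do not describe how the cross-validation folds were formed, so only the final arithmetic is reproduced here.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(before, after):
    """Fraction by which an error score dropped, relative to `before`."""
    return (before - after) / before


# RMSE values reported for the initial and revised models.
reduction = relative_reduction(465.212, 397.306)
print(f"RMSE reduction: {reduction:.1%}")
```

The computed reduction is a little under 15%, consistent with the rounded figure on the slide.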

A Qualitative Model of Gene Regulation

How do plants modify their photosynthetic apparatus in high light?

[Figure: signed (+/-) influence diagram over Light, dspA, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, and cpcB.]

This model is qualitative but relates continuous variables, much as formalisms from qualitative physics do (e.g., Forbus, 1984).

Fields Contributing to the Proposed Research

  computational scientific discovery
  qualitative reasoning
  simulation languages, numerical analysis
  human-computer interaction
  biology, physiology, Earth science

Plans for Experimental Evaluation

Our plans for evaluation include a variety of methods, including:
  demonstrating new functionality in each of three domains
  collecting and analyzing traces of users' interactions
  formulation of hypotheses about the human-computer system
  lesion studies with synthetic data to test those hypotheses
  revision of environment based on results of experiments

Taken together, these studies should uncover the design principles that produce successful modeling and discovery environments.

The methodology for evaluating intelligent assistants is not yet mature, so we must develop it along the way.

Some Legitimate Reviewer Concerns

  1. A general-purpose modeling environment may not be justified given the differences in the proposed application domains.
  2. We should take a closer look at existing modeling environments like STELLA and link our work to them if possible.
  3. The research plan for modeling human activities is vague.
  4. The schedule of work follows a standard software life cycle, rather than giving detail about tasks relevant to the project.

Less Legitimate Reviewer Concerns

  5. We may not need to develop a new modeling formalism, since inductive logic programming can handle most of our needs.
  6. The proposed research program will not use a "cutting-edge AI approach" because it relies on the heuristic search metaphor.
  7. We should not incorporate qualitative physics because it did not scale well, has made little progress, and has had little impact.
  8. The proposal reads like a CYC project for scientists.
  9. The work plan is sketchy and, since the main task is developing the modeling environment, one postdoc may not be enough.

Less Legitimate Reviewer Concerns

  10. The proposal makes little commitment to data-mining methods and it does not offer timely advances. There is no conceptual novelty, and the framework is not "radically new".
  11. No work is cited for keeping qualitative, quantitative, verbal, and visual representations consistent.
  12. There is a fundamental assumption that automated discovery tools are inferior to interactive ones.
  13. We should take advantage of recent advances in genetic methods and ones for learning generative models.
  14. The research seems unlikely to have a big commercial impact.

Planned Collaborations

Likely collaborations with current UCC researchers include:
  using constraints to control search for models (Freuder et al.)
  learning numeric constraints from observations (Freuder et al.)
  using methods for case adaptation to revise models (Bridge)
  modeling regulation of apoptotic cell death (Cotter, Higgins)
  modeling behavior of Irish ecosystems (O'Kane)

We also plan to continue ongoing collaborations with scientists at:
  Stanford University, ISLE, and NASA Ames (USA)
  Josef Stefan Institute (Slovenia)
  NTT Communication Science Laboratories (Japan)

Proposed Research Staff

  Principal Investigator – Oversight of entire research project
  Senior Scientist – Oversight of environment design/implementation
  Postdoc – Implementing and maintaining modeling environment
  Postdocs – One for each scientific application domain
  Postdoc – Experimental evaluation of modeling environment
  PhD students – Two for each scientific application domain
  Laboratory manager – Responsible for general operations
  Computer manager – Responsible for computing environment
  Technical writer – Prepare manuals and co-author research reports

Concluding Remarks

In summary, unlike work in the data-mining paradigm, our research on computational modeling and discovery:
  moves beyond description and prediction to explanatory models;
  uses domain knowledge to initialize and constrain search for improved models;
  provides an interactive environment that lets the user specify initial models and direct the revision process;
  presents the revised knowledge in some communicable notation that is familiar to domain experts.

This approach holds great potential to aid the modeling of complex systems in science and engineering.

The NPPc Portion of CASA

  $\mathrm{NPPc} = \sum_{\mathrm{month}} \max(E \cdot \mathrm{IPAR},\ 0)$
  $E = 0.56 \cdot T1 \cdot T2 \cdot W$
  $T1 = 0.8 + 0.02 \cdot \mathrm{Topt} - 0.0005 \cdot \mathrm{Topt}^2$
  $T2 = 1.18 / [(1 + e^{0.2 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 10)}) \cdot (1 + e^{0.3 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 10)})]$
  $W = 0.5 + 0.5 \cdot \mathrm{EET} / \mathrm{PET}$
  $\mathrm{PET} = 1.6 \cdot (10 \cdot \mathrm{Tempc} / \mathrm{AHI})^{A} \cdot \text{PET-TW-M}$ if $\mathrm{Tempc} > 0$
  $\mathrm{PET} = 0$ if $\mathrm{Tempc} < 0$
  $A = 0.00000068 \cdot \mathrm{AHI}^3 - 0.000077 \cdot \mathrm{AHI}^2 + 0.018 \cdot \mathrm{AHI} + 0.49$
  $\mathrm{IPAR} = 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver}$
  $\text{FPAR-FAS} = \min[(\text{SR-FAS} - 1.08) / \mathrm{SR}(\text{UMD-VEG}),\ 0.95]$
  $\text{SR-FAS} = (\text{Mon-FAS-NDVI} + 1000) / (\text{Mon-FAS-NDVI} - 1000)$
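The radiation and evapotranspiration side of these equations can be sketched in Python as follows. This is an illustration under stated assumptions, not the CASA implementation: all function and parameter names are hypothetical, "month max" is read as a sum over months of max(E · IPAR, 0), the superscripts lost in extraction are restored as exponents, and the case Tempc = 0 (which the slide leaves undefined) is grouped with Tempc < 0. The SR-FAS formula is transcribed as it appears on the slide.

```python
def exponent_a(ahi):
    """A as a cubic polynomial in the annual heat index AHI."""
    return 0.00000068 * ahi ** 3 - 0.000077 * ahi ** 2 + 0.018 * ahi + 0.49


def pet(tempc, ahi, pet_tw_m):
    """Potential evapotranspiration; zero when the temperature is not positive."""
    # ASSUMPTION: tempc == 0 is treated like tempc < 0.
    if tempc <= 0:
        return 0.0
    return 1.6 * (10 * tempc / ahi) ** exponent_a(ahi) * pet_tw_m


def sr_fas(mon_fas_ndvi):
    """SR-FAS from the monthly NDVI value, as written on the slide."""
    return (mon_fas_ndvi + 1000) / (mon_fas_ndvi - 1000)


def fpar_fas(sr_fas_value, sr_veg):
    """FPAR-FAS, capped at 0.95; sr_veg is SR for the vegetation type."""
    return min((sr_fas_value - 1.08) / sr_veg, 0.95)


def ipar(fpar, monthly_solar, sol_conver):
    """Intercepted photosynthetically active radiation."""
    return 0.5 * fpar * monthly_solar * sol_conver


def nppc(monthly_e, monthly_ipar):
    """ASSUMPTION: sum over months of max(E * IPAR, 0)."""
    return sum(max(e * i, 0.0) for e, i in zip(monthly_e, monthly_ipar))
```

Keeping each equation as its own small function mirrors the modular, one-equation-per-process structure of the model and makes individual terms easy to replace during revision.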

The NPPc Portion of CASA

[Figure: dependency diagram of the NPPc computation, with nodes NPPc, IPAR, PET, T1, T2, W, e_max, E, EET, Tempc, Topt, NDVI, SOLAR, AHI, A, PET TWM, SR, FPAR, and VEG.]

History of Research on Computational Scientific Discovery

[Figure: timeline from 1979 to 2000 of discovery systems, coded by a legend with four categories: numeric laws, qualitative laws, structural models, and process models. Systems shown: Bacon.1–Bacon.5, Abacus, Coper, Fahrenheit, E*, Tetrad, IDS N, Hume, ARC, DST, GP N, LaGrange, SDS, SSF, RF5, LaGramge, Dalton, Stahl, RL, Progol, Gell-Mann, BR-3, Mendel, Pauli, Stahlp, Revolver, Dendral, AM, Glauber, NGlauber, IDS Q, Live, IE, Coast, Phineas, AbE, Kekada, Mechem, CDP, Astra, GP M, HR, and BR-4.]
