Javier Sanchez Pat Langley Computational Learning Laboratory Center for the Study of Language and Information Stanford University, Stanford, California.

Javier Sanchez Pat Langley Computational Learning Laboratory Center for the Study of Language and Information Stanford University, Stanford, California An Interactive Environment for Scientific Model Construction Thanks to N. Asgharbeygi, K. Arrigo, S. Bay, J. Fitzgerald, D. George, S. Klooster, C. Potter, K. Saito, and T. Shinar.

Lessons about Scientific Knowledge Discovery
Our research collaborations in Earth science and microbiology have revealed some important lessons:
1. Scientists are more comfortable with their own notations than with ones from machine learning and data mining.
2. Scientific data are often rare and difficult to obtain, indicating a need for additional constraints.
3. Scientists often have initial models and knowledge that should influence the discovery process.
4. Scientists typically want computational assistance rather than automated discovery systems.
These observations suggest a need for alternative computational approaches to scientific model construction.

Data, Knowledge, and the Scientist
[Figure: diagram linking the Scientist, an Initial model, Observations, Model Revision, and a Revised model, accompanied by the following generic processes and model.]

process exponential_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ] P
process logistic_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ] P (1 - P / [0, 1, ])
process constant_inflow
  variables: I {inorganic_nutrient}
  equations: d[I,t] = [0, 1, ]
process consumption
  variables: P1 {population}, P2 {population}, nutrient_P2
  equations: d[P1,t] = [0, 1, ] P1 nutrient_P2,
             d[P2,t] = [0, 1, ] P1 nutrient_P2
process no_saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P
process saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P / (P + [0, 1, ])

model AquaticEcosystem
  variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto
  observables: nitro, phyto, zoo
  process phyto_exponential_growth
    equations: d[phyto,t] = 0.1 phyto
  process zoo_logistic_growth
    equations: d[zoo,t] = 0.1 zoo (1 - zoo / 1.5)
  process phyto_nitro_consumption
    equations: d[nitro,t] = 1 phyto nutrient_nitro,
               d[phyto,t] = 1 phyto nutrient_nitro
  process phyto_nitro_no_saturation
    equations: nutrient_nitro = nitro
  process zoo_phyto_consumption
    equations: d[phyto,t] = 1 zoo nutrient_phyto,
               d[zoo,t] = 1 zoo nutrient_phyto
  process zoo_phyto_saturation
    equations: nutrient_phyto = phyto / (phyto + 0.5)

The PROMETHEUS Modeling Environment
PROMETHEUS is an interactive environment that lets its users:
- specify process models of static and dynamic systems;
- display and edit a model's structure and details graphically;
- utilize a model to simulate a system's behavior over time;
- incorporate background knowledge cast as generic processes;
- indicate which processes to consider during model revision;
- invoke a revision module that improves a model's fit to data.
Our initial results focused on static models, but in this talk we illustrate the system's use on dynamic models.

A Process Model for an Aquatic Ecosystem
model AquaticEcosystem;
  variables phyto, zoo, nitro, residue;
  observables phyto, nitro;
  process zoo_exponential_decay;
    equations d[zoo,t,1] = zoo;
              d[residue,t,1] = 0.251;
  process zoo_phyto_predation;
    equations d[zoo,t,1] = zoo;
              d[residue,t,1] = zoo;
              d[phyto,t,1] = zoo;
  process nitro_uptake;
    conditions nitro > 1.25;
    equations d[phyto,t,1] = phyto;
              d[nitro,t,1] = phyto;
  process nitro_remineralization;
    equations d[nitro,t,1] = residue;
              d[residue,t,1] = residue;

Advantages of Quantitative Process Models
Process models are a good target for modeling systems because:
- they embed quantitative relations within a qualitative structure that refers to notations and mechanisms familiar to scientists;
- they support both algebraic and dynamical relationships;
- they offer causal and explanatory accounts of phenomena, while retaining the modularity needed to support induction.
Quantitative process models provide an important alternative to formalisms used currently in scientific modeling.

Viewing a Process Model Graphically

Simulation and Prediction in PROMETHEUS
To utilize a given process model, PROMETHEUS simulates its behavior over time or across samples by:
- accepting initial values for input variables and a time step size;
- on each time step, determining which processes are active;
- solving active static/differential equations with known values;
- propagating values and solving other active equations;
- when multiple processes influence the same variable, assuming their effects are additive.
This module makes specific predictions that the user can compare to observations.
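
A minimal sketch of such a simulation loop, assuming forward Euler integration and only differential equations (the propagation of static equations is omitted for brevity). The `Process` record, the `simulate` function, and the two example processes with their rate constants are hypothetical illustrations, not PROMETHEUS code or values from the models in this talk. The sketch shows the two points the slide stresses: conditions decide which processes are active on each step, and the effects of active processes on a shared variable are summed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Process:
    """A hypothetical process record: an activation test plus derivative terms."""
    name: str
    condition: Callable[[State], bool]
    derivatives: Callable[[State], Dict[str, float]]


def simulate(processes: List[Process], initial: State, dt: float, steps: int) -> List[State]:
    """Forward-Euler simulation in which the effects of active processes add up."""
    state = dict(initial)
    trajectory = [dict(state)]
    for _ in range(steps):
        totals = {var: 0.0 for var in state}
        for proc in processes:
            if proc.condition(state):  # decide which processes are active this step
                for var, rate in proc.derivatives(state).items():
                    totals[var] += rate  # effects on the same variable are summed
        state = {var: value + dt * totals[var] for var, value in state.items()}
        trajectory.append(dict(state))
    return trajectory


# Illustrative two-process model; all constants are made up for the example.
growth = Process("growth", lambda s: True, lambda s: {"phyto": 0.1 * s["phyto"]})
uptake = Process(
    "uptake",
    lambda s: s["nitro"] > 1.25,  # a condition gates this process
    lambda s: {"phyto": 0.2 * s["phyto"], "nitro": -0.2 * s["phyto"]},
)

trajectory = simulate([growth, uptake], {"phyto": 1.0, "nitro": 2.0}, dt=0.1, steps=50)
print(trajectory[-1])
```

Once `nitro` drops to 1.25 or below, the uptake process switches off and `nitro` stays constant while `phyto` keeps growing through the unconditional process alone.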

Predictions from Aquatic Ecosystem Model

A User-Guided Method for Model Revision
The PROMETHEUS system revises a process model in five stages:
1. Specify all ways to alter the initial model in terms of revising parameters in, removing, and adding processes;
2. Find all ways to instantiate candidate additions with specific variables, subject to type constraints;
3. Generate candidate model structures by removing and adding the indicated processes, with limits on the total number of processes;
4. For each model structure, search for parameter values that provide a good fit to the data;
5. Return a list of the best N parameterized models, ranked by their mean squared error.
The user can inspect these revisions and select the one they find most plausible.
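
Stages 3 to 5 can be sketched in a few lines of Python. To stay short, this sketch assumes a static model whose prediction is the sum of per-process terms, y = Σ θ_p f_p(x), and fits parameters by exhaustive grid search over [0, 1]; the process library `TERMS`, the function names, and the synthetic data are hypothetical, and grid search is a simple stand-in for whatever parameter search PROMETHEUS actually uses.

```python
from itertools import combinations, product

# Hypothetical process library for a static model y = sum of process terms.
TERMS = {
    "constant": lambda theta, x: theta,
    "linear": lambda theta, x: theta * x,
    "quadratic": lambda theta, x: theta * x * x,
}


def candidate_structures(initial, removable, addable, max_size):
    """Stage 3: keep fixed processes, drop any subset of removable ones, add any subset of additions."""
    fixed = tuple(p for p in initial if p not in removable)
    for k in range(len(removable) + 1):
        for kept in combinations(removable, k):
            for j in range(len(addable) + 1):
                for added in combinations(addable, j):
                    model = fixed + kept + added
                    if 0 < len(model) <= max_size:  # limit on the total number of processes
                        yield model


def mse(model, thetas, data):
    errors = [(y - sum(TERMS[p](th, x) for p, th in zip(model, thetas))) ** 2 for x, y in data]
    return sum(errors) / len(errors)


def fit(model, data, grid):
    """Stage 4: exhaustive grid search over parameters (a stand-in for a real optimizer)."""
    best = min(product(grid, repeat=len(model)), key=lambda thetas: mse(model, thetas, data))
    return best, mse(model, best, data)


def revise(initial, removable, addable, data, max_size=3, top_n=3):
    """Stage 5: return the top_n parameterized models, ranked by MSE (ties favor fewer processes)."""
    grid = [i / 10 for i in range(11)]  # every parameter assumed to lie in [0, 1]
    results = []
    for model in candidate_structures(initial, removable, addable, max_size):
        thetas, error = fit(model, data, grid)
        results.append((error, model, thetas))
    results.sort(key=lambda r: (r[0], len(r[1])))
    return results[:top_n]


# Synthetic data from y = 0.3 x + 0.5 x^2; the initial model wrongly uses a constant term.
data = [(x, 0.3 * x + 0.5 * x * x) for x in range(5)]
for error, model, thetas in revise(["linear", "constant"], ["constant"], ["quadratic"], data):
    print(f"{error:.4f}", model, thetas)
```

Breaking ties by the number of processes is our choice here; it lets the sketch prefer the revision that drops the constant term when two structures fit equally well.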

Marking Processes to Revise or Remove

Indicating Processes to Consider Adding

Generic Processes as Background Knowledge
PROMETHEUS casts background knowledge about a domain as generic processes that specify:
- the variables involved in a process and their types;
- the parameters appearing in a process and their ranges;
- the forms of conditions on the process; and
- the forms of associated equations and their parameters.
Generic processes are the building blocks that PROMETHEUS uses in its revision of specific process models.

Generic Processes for Aquatic Ecosystems
generic process exponential_decay;
  variables S{species}, D{detritus};
  parameters [0, 1];
  equations d[S,t,1] = 1 S;
            d[D,t,1] = S;
generic process remineralization;
  variables N{nutrient}, D{detritus};
  parameters [0, 1];
  equations d[N,t,1] = D;
            d[D,t,1] = 1 D;
generic process predation;
  variables S1{species}, S2{species}, D{detritus};
  parameters [0, 1], [0, 1];
  equations d[S1,t,1] = S1;
            d[D,t,1] = (1 ) S1;
            d[S2,t,1] = 1 S1;
generic process constant_inflow;
  variables N{nutrient};
  parameters [0, 1];
  equations d[N,t,1] = ;
generic process nutrient_uptake;
  variables S{species}, N{nutrient};
  parameters [0, ], [0, 1], [0, 1];
  conditions N > ;
  equations d[S,t,1] = S;
            d[N,t,1] = 1 S;
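
Stage 2 of the revision method instantiates generic processes with specific variables subject to type constraints. The sketch below enumerates every binding of each generic process's typed slots to the aquatic ecosystem variables. Assigning the types species, nutrient, and detritus to `phyto`, `zoo`, `nitro`, and `residue`, and requiring slots within one process to bind distinct variables, are our assumptions for illustration; the function and dictionary names are hypothetical.

```python
from itertools import permutations

# Typed variable slots for the generic processes on the slide.
GENERIC_PROCESSES = {
    "exponential_decay": [("S", "species"), ("D", "detritus")],
    "remineralization": [("N", "nutrient"), ("D", "detritus")],
    "predation": [("S1", "species"), ("S2", "species"), ("D", "detritus")],
    "constant_inflow": [("N", "nutrient")],
    "nutrient_uptake": [("S", "species"), ("N", "nutrient")],
}

# Variables of the aquatic ecosystem model, with types assigned for illustration.
MODEL_VARIABLES = {"phyto": "species", "zoo": "species", "nitro": "nutrient", "residue": "detritus"}


def instantiations(slots, variables):
    """Yield each binding of slots to distinct variables whose types match."""
    for combo in permutations(variables, len(slots)):
        if all(variables[var] == vtype for var, (_, vtype) in zip(combo, slots)):
            yield {slot: var for var, (slot, _) in zip(combo, slots)}


for name, slots in GENERIC_PROCESSES.items():
    for binding in instantiations(slots, MODEL_VARIABLES):
        print(name, binding)
```

Typing prunes the space sharply: of the 24 ordered triples of distinct variables, only two satisfy the predation slots (zoo eating phyto, or phyto eating zoo), and the scientist can then rule out the implausible one.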

Specifying Data and Search Parameters

Inspecting Revised Process Models

Best Fit to Nitrate Data from Ross Sea

Best Fit to Phytoplankton Data from Ross Sea

Intellectual Influences
Our approach to scientific model construction incorporates ideas from many traditions:
- computational scientific discovery (e.g., Langley et al., 1983);
- theory revision in machine learning (e.g., Towell, 1991);
- qualitative physics and simulation (e.g., Forbus, 1984);
- languages for scientific simulation (e.g., STELLA, MATLAB);
- interactive tools for data analysis (e.g., Shneiderman, 2001).
The PROMETHEUS environment combines insights from machine learning, AI, programming languages, and HCI in novel ways.

Specific Precursors
A few earlier systems support the interactive discovery of scientific models, including:
- Valdes-Perez's (1995) MECHEM [chemistry]
- Rickel and Porter's (1997) TRIPEL [biology]
- Sleeman et al.'s (1997) DAVICCAND [metallurgy]
- Mahidadia and Compton's (2001) JUSTAID [endocrinology]
PROMETHEUS adapts their ideas to scientific domains that involve quantitative explanatory models, such as Earth science.

Directions for Future Research
Despite our progress to date, we need further work in order to:
- produce additional results on other scientific data sets;
- develop more robust methods for fitting model parameters;
- implement interactive methods for searching the model space;
- introduce models with subsystems to handle complexity; and
- carry out usability studies with the modeling environment.
Interactive environments for model construction and revision have great potential to speed progress in science and engineering.

Contributions of the Research
In summary, our work on the PROMETHEUS system has led to:
- a new formalism for representing scientific process models;
- a graphical interface for displaying and editing these models;
- a computational method for simulating these models' behavior;
- an encoding for background knowledge as generic processes;
- an interactive method for revising process models given data.
We have demonstrated this approach to model construction and revision on Earth science problems with encouraging results.

Hierarchical Model of a Power Grid

End of Presentation

Steps in Applying Computational Scientific Discovery
- problem formulation
- representation engineering
- data collection/manipulation
- algorithm manipulation
- filtering and interpretation
- algorithm invocation

A Process Model for Carbon Production
model npp;
  variables NPPc, E, IPAR, T1, T2, W, Topt, tempc, eet, PET, PETTWM, ahi, A, FPARFAS, monthlySolar, SolConver, MONFASNDVI, umd_veg;
  observables ahi, eet, tempc, Topt, MONFASNDVI, monthlySolar, PETTWM, umd_veg;
  process CarbonProd;
    equations NPPc = E * IPAR;
  process PhotoEfficiency;
    equations E = (0.389 * (T1 * (T2 * W)));
  process TempStress1;
    equations T1 = (0.8 + ((0.02 * Topt) - ( * (Topt ^ 2))));
  process TempStress2;
    equations T2 = (( / (1 + ( ^ (0.2 * (Topt - tempc))))) / (1 + ( ^ (0.3 * (tempc - Topt)))));
  process WaterStress;
    conditions PET != 0;
    equations W = (0.5 + (0.5 * (eet / PET)));
  process WSNoEvapoTrans;
    conditions PET == 0;
    equations W = 0.5;
  process EvapoTrans;
    conditions tempc > 0;
    equations PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM;
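
The conditional processes in this model can be read as ordinary functions whose conditions pick which equation applies. A sketch in Python of the processes whose equations survive intact (EvapoTrans, WaterStress/WSNoEvapoTrans, PhotoEfficiency, CarbonProd); the function names are ours, and returning PET = 0 when `tempc > 0` fails is an assumption, since no process on the slide defines PET in that case. The inputs in the usage line are hypothetical.

```python
def evapo_trans(tempc, ahi, a, pettwm):
    """EvapoTrans: PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM, active when tempc > 0."""
    if tempc > 0:
        return 1.6 * (10 * tempc / ahi) ** a * pettwm
    return 0.0  # assumption: no process defines PET otherwise, so treat it as zero


def water_stress(eet, pet):
    """WaterStress (PET != 0) and WSNoEvapoTrans (PET == 0) have mutually exclusive conditions."""
    if pet != 0:
        return 0.5 + 0.5 * (eet / pet)
    return 0.5


def photo_efficiency(t1, t2, w):
    """PhotoEfficiency: E = 0.389 * T1 * T2 * W."""
    return 0.389 * (t1 * (t2 * w))


def carbon_prod(e, ipar):
    """CarbonProd: NPPc = E * IPAR."""
    return e * ipar


# Hypothetical inputs for a single site and month.
pet = evapo_trans(tempc=15.0, ahi=60.0, a=1.5, pettwm=1.0)
w = water_stress(eet=2.0, pet=pet)
print(pet, w, carbon_prod(photo_efficiency(0.9, 0.95, w), ipar=100.0))
```

Because the conditions `PET != 0` and `PET == 0` partition every case, exactly one water-stress process is active at a time, so W is always defined.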

Viewing a Process Model Graphically

Results of Revising the NPP Model
Initial model:
$E = 0.56 \cdot T_1 \cdot T_2 \cdot W$
$T_2 = 1.18 / \left[\left(1 + e^{0.2 \cdot (\mathit{Topt} - \mathit{Tempc} - 10)}\right) \cdot \left(1 + e^{0.3 \cdot (\mathit{Tempc} - \mathit{Topt} - 10)}\right)\right]$
$\mathit{PET} = 1.6 \cdot (10 \cdot \mathit{Tempc} / \mathit{AHI})^{A} \cdot \text{PET-TW-M}$
$\mathit{SR} = \{3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05\}$
RMSE on training data = and $r^2$ =
Revised model:
$E = \;\cdot\, T_1 \cdot T_2 \cdot W$
$T_2 = 0.83 / \left[\left(1 + e^{1.0 \cdot (\mathit{Topt} - \mathit{Tempc} - 6.34)}\right) \cdot \left(1 + e^{1.0 \cdot (\mathit{Tempc} - \mathit{Topt} - 11.52)}\right)\right]$
$\mathit{PET} = 1.6 \cdot (10 \cdot \mathit{Tempc} / \mathit{AHI})^{A} \cdot \text{PET-TW-M}$
$\mathit{SR} = \{0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61\}$
Cross-validated RMSE = and $r^2$ = [15% reduction]
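
To see what the revision did to the temperature-stress term, both parameterizations of T2 can be evaluated side by side. The Python function `temp_stress2` and its argument names are ours; the constants are the initial (1.18, 0.2, 10, 0.3, 10) and revised (0.83, 1.0, 6.34, 1.0, 11.52) values from the slide, and the optimum temperature of 20 used below is a hypothetical input.

```python
import math


def temp_stress2(tempc, topt, scale, k1, off1, k2, off2):
    """T2 = scale / [(1 + e^{k1 (Topt - Tempc - off1)}) (1 + e^{k2 (Tempc - Topt - off2)})]."""
    low = 1.0 + math.exp(k1 * (topt - tempc - off1))   # grows when Tempc is well below Topt
    high = 1.0 + math.exp(k2 * (tempc - topt - off2))  # grows when Tempc is well above Topt
    return scale / (low * high)


INITIAL = dict(scale=1.18, k1=0.2, off1=10.0, k2=0.3, off2=10.0)
REVISED = dict(scale=0.83, k1=1.0, off1=6.34, k2=1.0, off2=11.52)

for tempc in (0.0, 10.0, 20.0, 30.0):
    print(tempc,
          round(temp_stress2(tempc, 20.0, **INITIAL), 3),
          round(temp_stress2(tempc, 20.0, **REVISED), 3))
```

At the optimum the initial term gives roughly 0.99 and the revised one roughly 0.83, while the revised slopes of 1.0 make the stress fall off much more sharply away from the optimum.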

The Challenge of Systems Science
Disciplines like Earth science and computational biology differ from traditional fields in that they:
- focus on synthesis rather than analysis in their operation;
- rely on computer modeling as one of their central methods;
- develop system-level models with many variables and relations;
- evaluate their models on observational, not experimental, data.
Developing and testing such models are complex tasks that would benefit from computational aids. Our research goal is to design, construct, evaluate, and understand such computational tools for systems science.

Fields Contributing to the Proposed Research
- computational scientific discovery
- qualitative reasoning
- simulation languages, numerical analysis
- human-computer interaction
- biology, physiology, Earth science

Issues in Process Model Induction
Inductive process modeling raises a number of issues that have clear analogues in other paradigms:
- identifying conditions on processes (parameter optimization);
- inferring initial values of unobservables (parameter optimization);
- keeping the search space tractable (typing on variables);
- reducing variance to mitigate overfitting (minimum description length).
We have demonstrated promising responses to these four problems within the IPM framework.
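
The slide does not give IPM's description-length formula, so the following is only a sketch of one common two-part approximation: under Gaussian errors, the cost of encoding the data given the model is about (n/2)·log₂(MSE) bits, and each parameter costs about (1/2)·log₂(n) bits. This has the same form as BIC up to a constant factor, and it is defined only up to an additive constant, so only differences between candidates matter. The function name and the three candidate fits are hypothetical.

```python
import math


def description_length(mse: float, n_points: int, n_params: int) -> float:
    """Two-part code length in bits (up to a constant): data given model plus parameters.

    Assumes Gaussian errors with mse > 0; a BIC-style approximation, not IPM's own score.
    """
    data_bits = 0.5 * n_points * math.log2(mse)
    model_bits = 0.5 * n_params * math.log2(n_points)
    return data_bits + model_bits


# Hypothetical candidates fit to the same 100 observations.
simple = description_length(mse=1.00, n_points=100, n_params=2)
complex_ = description_length(mse=0.98, n_points=100, n_params=6)
better = description_length(mse=0.50, n_points=100, n_params=3)
print(round(simple, 2), round(complex_, 2), round(better, 2))
```

Adding four parameters to shave 2% off the error costs more bits than it saves, whereas one extra parameter that halves the error pays for itself; this is how such a score reduces variance without forbidding genuinely useful processes.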

Best Model Fit to Data from Ross Sea

Best Model Fit to Data on Protozoan Predation

Collecting Data on Photosynthetic Processes
[Figure: a continuous culture (chemostat) exposed to external stimuli (e.g., light), with an equilibrium period, an adaptation period, sampling of mRNA/cDNA, microarray traces, and the health of the culture plotted over time. Source: /wwwscience.murdoch.edu.au/teach]

Gene Expressions for Cyanobacteria

Generic Processes for Photosynthesis Regulation
generic process translation
  variables: P{protein}, M{mRNA}
  parameters: [0, 1]
  equations: d[P,t,1] = M
generic process transcription
  variables: M{mRNA}, R{rate}
  parameters:
  equations: d[M,t,1] = R
generic process regulate_one
  variables: R{rate}, S{signal}
  parameters: [-1, 1]
  equations: R = S
generic process regulate_two
  variables: R{rate}, S{signal}
  parameters: [-1, 1], [0, 1]
  equations: R = S
             d[S,t,1] = 1 S
generic process automatic_degradation
  variables: C{concentration}
  conditions: C > 0
  parameters: [0, 1]
  equations: d[C,t,1] = 1 C
generic process controlled_degradation
  variables: D{concentration}, E{concentration}
  conditions: D > 0, E > 0
  parameters: [0, 1]
  equations: d[D,t,1] = 1 E
             d[E,t,1] = 1 E
generic process photosynthesis
  variables: L{light}, P{protein}, R{redox}, S{ROS}
  parameters: [0, 1], [0, 1]
  equations: d[R,t,1] = L P
             d[S,t,1] = L P

A Process Model for Photosynthetic Regulation
model photo_regulation
  variables: light, mRNA, protein, ROS, redox, transcription_rate
  observables: light, mRNA
  process photosynthesis
    equations: d[redox,t,1] = light protein
               d[ROS,t,1] = light protein
  process protein_translation
    equations: d[protein,t,1] = 7.54 mRNA
  process mRNA_transcription
    equations: d[mRNA,t,1] = transcription_rate
  process regulate_one_1
    equations: transcription_rate = 0.99 light
  process regulate_two_2
    equations: transcription_rate = redox
               d[redox,t,1] = redox
  process automatic_degradation_1
    conditions: protein > 0
    equations: d[protein,t,1] = 1.91 protein
  process controlled_degradation_1
    conditions: redox > 0, ROS > 0
    equations: d[redox,t,1] = ROS
               d[ROS,t,1] = ROS

Predictions from Best Parameterized Model

Electric Power on the International Space Station

Telemetry Data from Space Station Batteries

Induced Process Model for Battery Behavior
model Battery
  variables: Rs, Vcb, soc, Vt, i, temperature
  observables: soc, Vt, i, temperature
  process voltage_charge
    conditions: i >= 0
    equations: Vt = Vcb Rs i
  process voltage_discharge
    conditions: i < 0
    equations: Vt = Vcb 1.0 / (Rs + 1.0)
  process charge_transfer
    equations: d[soc,t,1] = i Vcb/
  process quadratic_influence_Vcb_soc
    equations: Vcb = soc soc
  process linear_influence_Vcb_temp
    equations: Vcb = temperature
  process linear_influence_Rs_soc
    equations: Rs = soc

Results on Battery Test Data

Hierarchical Model of a Power Grid

Challenge 5: Interfacing with Scientists
Because scientists do not want to be replaced, we are developing an interactive environment that lets users:
- specify a quantitative process model of the target system;
- display and edit the model's structure and details graphically;
- simulate the model's behavior over time and situations;
- compare the model's predicted behavior to observations;
- invoke a revision module in response to detected anomalies.
The environment offers computational assistance in forming and evaluating models but lets the user retain control.

In Memoriam
Two years ago, computational scientific discovery lost two of its founding fathers:
- Herbert A. Simon (1916–2001)
- Jan M. Zytkow (1945–2001)
Both contributed to the field in many ways: posing new problems, inventing methods, training students, and organizing meetings. Moreover, both were interdisciplinary researchers who contributed to computer science, psychology, philosophy, and statistics. Herb Simon and Jan Zytkow were excellent role models that we should all aim to emulate.

Data Mining vs. Scientific Discovery
There exist two computational paradigms for discovering explicit knowledge from data:
- Data mining generates knowledge cast as decision trees, logical rules, or other notations invented by AI researchers;
- Computational scientific discovery instead uses equations, structural models, reaction pathways, or other formalisms invented by scientists and engineers.
Both approaches draw on heuristic search to find regularities in data, but they differ considerably in their emphases.

Time Line for Research on Computational Scientific Discovery
[Figure: time line of discovery systems, with a legend distinguishing numeric laws, qualitative laws, structural models, and process models. Systems shown: Bacon.1–Bacon.5; Abacus, Coper; Fahrenheit, E*, Tetrad, IDS_N; Hume, ARC; DST, GP_N; LaGrange; SDS; SSF, RF5, LaGramge; Dalton, Stahl; RL, Progol; Gell-Mann; BR-3, Mendel; Pauli; Stahlp, Revolver; Dendral; AM; Glauber; NGlauber; IDS_Q, Live; IE; Coast, Phineas, AbE, Kekada; Mechem, CDP; Astra, GP_M; HR; BR-4.]

Why Are Process Models Interesting?
Process models are a crucial target for machine learning because:
- they incorporate scientific formalisms, rather than AI notations, that are easily communicable to scientists and engineers;
- they move beyond descriptive generalization to explanation, while retaining the modularity needed to support induction.
These reasons point to process models as an ideal representation for scientific and engineering knowledge. Process models are an important alternative to formalisms used currently in machine learning.

Challenges of Inductive Process Modeling
Process model induction differs from typical learning tasks in that: process models characterize the behavior of dynamical systems; variables are mainly continuous and data are unsupervised; observations are not independently and identically distributed; process models contain unobservable processes and variables; and multiple processes can interact to produce complex behavior. Compensating factors include a focus on deterministic systems and the availability of background knowledge.

Making Predictions with Process Models
1. Specify initial values for input variables and the size of the time steps.
2. On each time step, check conditions to decide which processes are active.
3. Solve algebraic and differential equations with known values.
4. Propagate values and recurse to solve other equations.
5. Add the effects of different processes on each variable.
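The prediction loop above can be sketched as a forward-Euler simulator. This is a minimal illustration under stated assumptions, not IPM's simulator: the `Process` class and `simulate` function are hypothetical, each process holds only differential rate terms (the algebraic-equation and recursion steps are omitted), and a condition is any Boolean function of the current state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Process:
    """Hypothetical specific process: an activation condition plus one rate
    term (a contribution to d[var]/dt) for each variable it affects."""
    name: str
    rates: Dict[str, Callable[[State], float]]
    condition: Callable[[State], bool] = lambda s: True


def simulate(processes: List[Process], init: State,
             dt: float, steps: int) -> List[State]:
    """Forward-Euler prediction from a process model."""
    state = dict(init)
    trajectory = [dict(state)]
    for _ in range(steps):
        # Check conditions to decide which processes are active now.
        active = [p for p in processes if p.condition(state)]
        # Add the effects of all active processes on each variable.
        deriv = {v: 0.0 for v in state}
        for p in active:
            for var, rate in p.rates.items():
                deriv[var] += rate(state)
        # Propagate the summed effects forward by one time step.
        state = {v: state[v] + dt * deriv[v] for v in state}
        trajectory.append(dict(state))
    return trajectory


# Usage: x decays continuously; a second, made-up process adds a constant
# inflow only while x < 1.0.
decay = Process("x_decay", {"x": lambda s: -0.5 * s["x"]})
inflow = Process("x_inflow", {"x": lambda s: 1.0}, condition=lambda s: s["x"] < 1.0)
traj = simulate([decay, inflow], {"x": 2.0}, dt=0.1, steps=50)
```

Summing contributions per variable is what lets several processes act on the same variable at once; forward Euler is simply the easiest integrator to show, and a real system would likely use a more accurate solver.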

Predictions from IPMs
[Figure: predictions of the induced model]

Inductive Process Modeling
Given training data (observed values for a set of continuous variables as they vary over time or situations) and background knowledge (generic processes that characterize causal relationships among variables in terms of conditional equations), induction produces a learned model: a specific process model that explains the observed values and predicts future data accurately.
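To make the link between generic processes (background knowledge) and the specific processes in a learned model concrete, here is a minimal sketch of binding typed variable slots to concrete variables. The class names, the type labels (`plankton`, `nutrient`, `detritus`), and the small process library are all hypothetical; the slides do not specify IPM's data structures.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GenericProcess:
    # Hypothetical generic process with typed slots, e.g. ("Prey", "plankton").
    name: str
    slots: Tuple[Tuple[str, str], ...]  # (slot name, required variable type)


@dataclass(frozen=True)
class InstantiatedProcess:
    generic: str
    binding: Tuple[Tuple[str, str], ...]  # (slot name, concrete variable)


def instantiate(generics: List[GenericProcess],
                var_types: Dict[str, str]) -> List[InstantiatedProcess]:
    """Find all ways to bind each generic process's slots to distinct
    variables of the required types."""
    result = []
    for g in generics:
        candidates = [[v for v, t in var_types.items() if t == slot_type]
                      for _, slot_type in g.slots]
        for combo in itertools.product(*candidates):
            if len(set(combo)) < len(combo):
                continue  # a variable may fill at most one slot
            binding = tuple((slot, v) for (slot, _), v in zip(g.slots, combo))
            result.append(InstantiatedProcess(g.name, binding))
    return result


# Usage with made-up types for the aquatic variables.
var_types = {"phyto": "plankton", "zoo": "plankton",
             "nitro": "nutrient", "residue": "detritus"}
library = [
    GenericProcess("exponential_decay", (("S", "plankton"),)),
    GenericProcess("predation", (("Predator", "plankton"), ("Prey", "plankton"))),
    GenericProcess("nutrient_uptake", (("P", "plankton"), ("N", "nutrient"))),
]
procs = instantiate(library, var_types)
```

Typing the slots is what keeps the number of instantiations manageable: `predation` can only bind plankton variables, so it yields two candidates here rather than every ordered pair of variables.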

Inductive Process Modeling as Search
To construct a quantitative process model, we need an algorithm that searches the space of models and specifies: an initial state from which to start search; some operators that generate new states; an evaluation function that selects among states; an overall control regime for the search; and a halting criterion for ending the search. We have implemented a four-stage method that takes positions on these design decisions.
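The five design decisions listed above map directly onto the arguments of a generic search routine. The sketch below is illustrative only: beam search as the control regime, "no improvement" as the halting criterion, and the toy states and error function are assumptions, not necessarily the positions IPM takes.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, TypeVar

S = TypeVar("S", bound=Hashable)


def beam_search(initial: S,
                operators: Callable[[S], Iterable[S]],
                evaluate: Callable[[S], float],
                beam_width: int = 3,
                max_iters: int = 20) -> S:
    """Generic heuristic search: `initial` is the start state, `operators`
    generates successor states, `evaluate` scores states (lower is better),
    beam search is the control regime, and the loop halts when no successor
    improves on the best state so far or after `max_iters` rounds."""
    beam: List[S] = [initial]
    best, best_score = initial, evaluate(initial)
    for _ in range(max_iters):
        successors = {s for state in beam for s in operators(state)}
        if not successors:
            break
        beam = heapq.nsmallest(beam_width, successors, key=evaluate)
        top_score = evaluate(beam[0])
        if top_score >= best_score:
            break  # halting criterion: no improvement
        best, best_score = beam[0], top_score
    return best


# Usage: states are sets of process names; the operator adds one process;
# the made-up "error" counts differences from a hidden target model.
TARGET = frozenset({"a", "c"})


def add_one(s: frozenset) -> List[frozenset]:
    return [s | {p} for p in ("a", "b", "c") if p not in s]


result = beam_search(frozenset(), add_one, lambda s: len(s ^ TARGET))
```

Keeping the five components as separate arguments makes it easy to swap one decision (say, exhaustive enumeration instead of a beam) without touching the others.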

The IPM Method for Process Model Induction
1. Find all ways to instantiate known generic processes with specific variables.
2. Combine subsets of instantiated processes into generic models.
3. Remove candidates that are too complex or are not connected graphs.
4. For each generic model, search for good parameter values.
5. Return the parameterized model with the smallest error.
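Steps 2 through 5 can be sketched on a deliberately tiny problem. Everything here is a simplifying assumption rather than IPM itself: each instantiated process is a hypothetical linear rate term `k * source` added to `d[target]/dt`, "too complex" means more than `max_size` processes, step 1 is assumed already done (`procs` is the list of instantiated processes), and parameters are fitted by a coarse grid search where a real system would use a numerical optimizer.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

# Hypothetical instantiated process: (name, target, source); it adds
# k * source to d[target]/dt, where the rate k is a free parameter.
Proc = Tuple[str, str, str]


def connected(model: Sequence[Proc]) -> bool:
    """True if the variables linked by the model's processes form one component."""
    adj: Dict[str, set] = {}
    for _, tgt, src in model:
        adj.setdefault(tgt, set()).add(src)
        adj.setdefault(src, set()).add(tgt)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()] - seen:
            seen.add(nb)
            stack.append(nb)
    return len(seen) == len(adj)


def simulate(model, ks, init, dt, steps) -> List[Dict[str, float]]:
    """Forward-Euler trajectory of a linear process model."""
    state, traj = dict(init), [dict(init)]
    for _ in range(steps):
        deriv = {v: 0.0 for v in state}
        for (_, tgt, src), k in zip(model, ks):
            deriv[tgt] += k * state[src]
        state = {v: state[v] + dt * deriv[v] for v in state}
        traj.append(dict(state))
    return traj


def sse(model, ks, init, dt, observed) -> float:
    """Sum of squared errors between simulated and observed values."""
    traj = simulate(model, ks, init, dt, len(observed) - 1)
    return sum((traj[i][v] - obs[v]) ** 2
               for i, obs in enumerate(observed) for v in obs)


def ipm_sketch(procs, init, dt, observed, max_size=2,
               grid=(-1.0, -0.5, 0.5, 1.0)):
    best = (float("inf"), None, None)
    for size in range(1, max_size + 1):
        for model in itertools.combinations(procs, size):     # step 2: subsets
            if not connected(model):                          # step 3: prune
                continue
            for ks in itertools.product(grid, repeat=size):   # step 4: fit
                err = sse(model, ks, init, dt, observed)
                if err < best[0]:
                    best = (err, model, ks)
    return best                                               # step 5: best fit


# Usage on synthetic data generated from a known two-process model.
procs = [("x_decay", "x", "x"), ("y_from_x", "y", "x"), ("y_decay", "y", "y")]
init = {"x": 1.0, "y": 0.0}
observed = simulate(procs[:2], (-0.5, 0.5), init, 0.1, 10)
err, model, ks = ipm_sketch(procs, init, 0.1, observed)
```

Here the pair (`x_decay`, `y_decay`) is pruned in step 3 because it leaves `x` and `y` in separate components, and the generating model is recovered with zero error.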

A Biologist's Depiction of Photosynthesis

Predictions from Best Parameterized Model

The NPPc Portion of CASA

$$
\begin{aligned}
\mathit{NPPc} &= \sum_{\text{month}} \max(E \cdot \mathit{IPAR},\, 0) \\
E &= 0.56 \cdot T_1 \cdot T_2 \cdot W \\
T_1 &= [\ldots] \cdot \mathit{Topt} - [\ldots] \cdot \mathit{Topt}^2 \\
T_2 &= \frac{1.18}{\left(1 + e^{0.2\,(\mathit{Topt} - \mathit{Tempc} - 10)}\right)\left(1 + e^{0.3\,(\mathit{Tempc} - \mathit{Topt} - 10)}\right)} \\
W &= [\ldots] \cdot \mathit{EET} / \mathit{PET} \\
\mathit{PET} &= \begin{cases} 1.6 \cdot \left(10 \cdot \mathit{Tempc} / \mathit{AHI}\right)^{A} \cdot \text{PET-TW-M} & \text{if } \mathit{Tempc} > 0 \\ 0 & \text{if } \mathit{Tempc} < 0 \end{cases} \\
A &= [\ldots] \cdot \mathit{AHI}^3 - [\ldots] \cdot \mathit{AHI}\;[\ldots]\;\cdot \mathit{AHI} \\
\mathit{IPAR} &= 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver} \\
\text{FPAR-FAS} &= \min\left[(\text{SR-FAS} - 1.08) / \mathit{SR}(\text{UMD-VEG}),\, 0.95\right] \\
\text{SR-FAS} &= (\text{Mon-FAS-NDVI}\;[\ldots]) / (\text{Mon-FAS-NDVI} - 1000)
\end{aligned}
$$
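The fully specified parts of these equations translate directly into code. This is a sketch under stated assumptions: because the coefficients of T1, W, and A are not given above, those quantities are passed in as inputs rather than computed, `sr_umd_veg` stands for the vegetation-dependent value SR(UMD-VEG), and the function names are hypothetical.

```python
import math
from typing import Sequence


def t2(tempc: float, topt: float) -> float:
    """Temperature stress term T2."""
    return 1.18 / ((1 + math.exp(0.2 * (topt - tempc - 10)))
                   * (1 + math.exp(0.3 * (tempc - topt - 10))))


def pet(tempc: float, ahi: float, a: float, pet_tw_m: float) -> float:
    """Potential evapotranspiration; zero unless Tempc > 0 (at exactly 0 the
    formula would also give 0 for positive A)."""
    if tempc <= 0:
        return 0.0
    return 1.6 * (10 * tempc / ahi) ** a * pet_tw_m


def fpar_fas(sr_fas: float, sr_umd_veg: float) -> float:
    """Fraction of absorbed PAR, capped at 0.95."""
    return min((sr_fas - 1.08) / sr_umd_veg, 0.95)


def ipar(fpar: float, monthly_solar: float, sol_conver: float) -> float:
    return 0.5 * fpar * monthly_solar * sol_conver


def e_factor(t1: float, t2_val: float, w: float) -> float:
    """E = 0.56 * T1 * T2 * W, with T1 and W supplied by the caller."""
    return 0.56 * t1 * t2_val * w


def nppc(e_by_month: Sequence[float], ipar_by_month: Sequence[float]) -> float:
    """Sum over months of max(E * IPAR, 0)."""
    return sum(max(e * i, 0.0) for e, i in zip(e_by_month, ipar_by_month))
```

Keeping each equation as its own small function mirrors the modular structure of the model: any one term could be replaced by an alternative form without touching the rest.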

A Process Model for an Aquatic Ecosystem

    model AquaticEcosystem;
    variables phyto, zoo, nitro, residue;
    observables phyto, nitro;

    process phyto_exponential_decay;
        equations d[phyto,t,1] = phyto;
                  d[residue,t,1] = phyto;

    process zoo_exponential_decay;
        equations d[zoo,t,1] = zoo;
                  d[residue,t,1] = 0.251;

    process zoo_phyto_predation;
        equations d[zoo,t,1] = zoo;
                  d[residue,t,1] = zoo;
                  d[phyto,t,1] = zoo;

    process nitro_uptake;
        conditions nitro > 1.25;
        equations d[phyto,t,1] = phyto;
                  d[nitro,t,1] = phyto;

    process nitro_remineralization;
        equations d[nitro,t,1] = residue;
                  d[residue,t,1] = residue;
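The listing gives the model's structure, but most of its fitted rate constants and signs are not shown, so the sketch below encodes the same five processes with placeholder values. Every rate in `RATES` is hypothetical, and the signs are an assumption chosen so that each process moves material from one pool to another (which makes the total conserved); only the 1.25 threshold on `nitro_uptake` comes from the listing.

```python
from typing import Dict, List

State = Dict[str, float]

# Hypothetical rate constants: placeholders, not the induced coefficients.
RATES = {
    "phyto_decay": 0.3,
    "zoo_decay": 0.25,
    "predation_growth": 0.1,
    "predation_waste": 0.05,
    "uptake": 0.2,
    "remineralization": 0.05,
}
NITRO_THRESHOLD = 1.25  # condition on nitro_uptake, from the listing


def derivatives(s: State, r: Dict[str, float] = RATES) -> State:
    """Sum the contributions of the five processes to each derivative."""
    d = {"phyto": 0.0, "zoo": 0.0, "nitro": 0.0, "residue": 0.0}
    # phyto_exponential_decay: dead phytoplankton becomes residue.
    d["phyto"] -= r["phyto_decay"] * s["phyto"]
    d["residue"] += r["phyto_decay"] * s["phyto"]
    # zoo_exponential_decay: dead zooplankton becomes residue.
    d["zoo"] -= r["zoo_decay"] * s["zoo"]
    d["residue"] += r["zoo_decay"] * s["zoo"]
    # zoo_phyto_predation: rates depend on zoo only, as in the listing.
    growth = r["predation_growth"] * s["zoo"]
    waste = r["predation_waste"] * s["zoo"]
    d["zoo"] += growth
    d["residue"] += waste
    d["phyto"] -= growth + waste
    # nitro_uptake: active only while nitro exceeds the threshold.
    if s["nitro"] > NITRO_THRESHOLD:
        d["phyto"] += r["uptake"] * s["phyto"]
        d["nitro"] -= r["uptake"] * s["phyto"]
    # nitro_remineralization: residue is converted back into nitrogen.
    d["nitro"] += r["remineralization"] * s["residue"]
    d["residue"] -= r["remineralization"] * s["residue"]
    return d


def simulate(init: State, dt: float = 0.1, steps: int = 100) -> List[State]:
    """Forward-Euler trajectory; only phyto and nitro are observable."""
    state, traj = dict(init), [dict(init)]
    for _ in range(steps):
        d = derivatives(state)
        state = {v: state[v] + dt * d[v] for v in state}
        traj.append(dict(state))
    return traj
```

When fitting such a model, only the simulated `phyto` and `nitro` values would be compared with data, since `zoo` and `residue` are unobserved.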