Discovering Descriptive Knowledge Lecture 18. Descriptive Knowledge in Science In an earlier lecture, we introduced the representation and use of taxonomies.


1 Discovering Descriptive Knowledge Lecture 18

2 Descriptive Knowledge in Science In an earlier lecture, we introduced the representation and use of taxonomies and laws. Informatics tools for working with taxonomies represent them as a collection of hypotheses about categories and their is-a relationships, and use them to organize knowledge and to classify new observations. Informatics tools for working with laws represent them as hypotheses about quantitative and/or qualitative relationships among an object’s properties, and use them to predict the static or dynamic properties of an entity or an interconnected system.

3 The Taxonomy Formation Task Taxonomy formation consists of three tasks that may be solved separately or simultaneously: the construction of categories; the organization of the categories into a hierarchy; and the explicit definition of the categories. Informatics tools for taxonomy formation fall into two general categories: those that analyze finite batches of observations and create separate taxonomies for each batch; and those that incrementally construct and refine taxonomies based on an effectively continuous stream of data.

4 Cluster 3.0 Cluster is designed to construct and organize categories from a batch of gene expression data. As input, Cluster takes gene expression levels from multiple experiments. The program clusters genes based on their expression patterns across experiments. Scientists can select the clustering method and set the available parameters. Cluster produces a text file that contains the taxonomy.
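To make the idea of building a taxonomy from a batch of expression data concrete, here is a minimal sketch of agglomerative clustering. It assumes average linkage and a correlation-based distance, which is only one possible choice of method and parameters; the gene names and expression values are hypothetical, and the code is not Cluster's implementation or its output file format.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    distances = [pearson_distance(profiles[g], profiles[h])
                 for g in cluster_a for h in cluster_b]
    return sum(distances) / len(distances)


def leaves(tree):
    """Flatten a nested-tuple tree into a list of gene names."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def build_taxonomy(profiles):
    """Repeatedly merge the two closest clusters into a binary tree."""
    trees = list(profiles)  # each gene starts as its own cluster
    while len(trees) > 1:
        a, b = min(
            combinations(trees, 2),
            key=lambda pair: average_linkage(
                leaves(pair[0]), leaves(pair[1]), profiles),
        )
        trees.remove(a)
        trees.remove(b)
        trees.append((a, b))
    return trees[0]


# Hypothetical expression levels for four genes across four experiments.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}
taxonomy = build_taxonomy(expression)
print(taxonomy)
```

The nested tuples play the role of the taxonomy: genes whose expression rises together across experiments end up under one branch, and genes whose expression falls end up under the other.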

5 Cluster 3.0: Results Viewing the taxonomy produced by Cluster requires a separate program, such as TreeView. [Figure: viewer screenshot with regions labeled taxonomy, selected section of the taxonomy, data, and gene annotations.]

6 ReTAX ReTAX is an interactive environment that helps scientists revise taxonomies in response to new observations. A taxonomy in ReTAX includes hierarchically organized categories and their definitions. Each datum for ReTAX consists of a set of features (such as the size of a plant’s leaf or the type of its fruit) and a category. As a scientist enters data, ReTAX ensures that the new item’s features match or specialize the category’s defining features, and that they distinguish it from other categories in the taxonomy. If the new item violates either of these rules, then ReTAX attempts to revise its taxonomy.
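The two consistency rules can be sketched as a small check. This is an illustration under simplifying assumptions, not ReTAX's actual representation: each category definition is modeled as a mapping from a feature to its admissible values, the hierarchy is ignored, and the taxon names and feature values are invented for the example.

```python
def matches(item, definition):
    """True if every defining feature of a category admits the item's value."""
    return all(item.get(feature) in allowed
               for feature, allowed in definition.items())


def check_item(item, category, taxonomy):
    """Return the list of rule violations for an item assigned to a category."""
    problems = []
    # Rule 1: the item must match the defining features of its own category.
    if not matches(item, taxonomy[category]):
        problems.append(f"does not match the definition of {category}")
    # Rule 2: the item must not also satisfy any other category's definition.
    for other, definition in taxonomy.items():
        if other != category and matches(item, definition):
            problems.append(f"is not distinguished from {other}")
    return problems


# Hypothetical definitions: feature -> set of admissible values.
taxonomy = {
    "TaxonA": {"fruit": {"capsule"}, "calyx": {"fleshy"}},
    "TaxonB": {"fruit": {"berry"}, "calyx": {"dry"}},
}

ok_item = {"fruit": "capsule", "calyx": "fleshy"}
odd_item = {"fruit": "berry", "calyx": "fleshy"}

ok_problems = check_item(ok_item, "TaxonA", taxonomy)
odd_problems = check_item(odd_item, "TaxonA", taxonomy)
print(ok_problems)
print(odd_problems)
```

A non-empty result is the point at which a system like the one described would attempt a revision, for example by looking for additional features or reconsidering the boundary between the two taxa.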

7 ReTAX [Figure: botanical taxonomy with nodes labeled Ericaceae; Andromeda, Gaultheria, Pernettya, …; and A. uva-ursi, P. tasmanica, G. oppositifolia, G. rupestris, G. antipoda.] Working in the context of a botanical taxonomy like this one, ReTAX replicated historical revisions. In the course of its use, ReTAX identified descriptive features that were insufficient for distinguishing members of two taxa; searched for new features to refine the taxa; and eventually suggested that the genera Pernettya and Gaultheria should be merged.

8 Qualitative Law Discovery Qualitative laws fall into two primary categories: those involving categorical statements about objects, such as “all ravens are black”; and those describing qualitative changes, such as “temperature and pressure increase proportionately”. Informatics tools that discover categorical relationships have received the majority of the attention in this area. These tools typically address a supervised learning task: data are described by multiple features (color = black, wings = present); one of these features serves as a target for classification (species = C. corax); and the tool relates the features to the target.

9 RL RL addresses the supervised learning task to produce qualitative laws that are expressed as logical rules. Each rule states that if all of its conditions are true of a datum, then the datum is assigned to the target class. As input, RL takes a data set and information that controls the characteristics of the rules, such as taxonomies of the values for features, constraints among features in each rule, the minimum accuracy, and the maximum number of features.
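As a rough illustration of rule induction under a minimum accuracy and a maximum number of features, the sketch below enumerates conjunctions of feature = value conditions exhaustively. The exhaustive search, the function name `learn_rules`, and the toy data are assumptions made for the example; this is not RL's search procedure, and it ignores value taxonomies and feature constraints.

```python
from itertools import combinations


def learn_rules(data, target, target_value, min_accuracy=0.9, max_features=2):
    """Enumerate conjunctive rules that predict target == target_value.

    Returns (conditions, support, accuracy) triples, where support is the
    number of data covered by the rule.
    """
    conditions = sorted({(f, v) for row in data
                         for f, v in row.items() if f != target})
    rules = []
    for size in range(1, max_features + 1):
        for conjunction in combinations(conditions, size):
            features = {f for f, _ in conjunction}
            if len(features) < size:  # skip two values of the same feature
                continue
            covered = [row for row in data
                       if all(row.get(f) == v for f, v in conjunction)]
            if not covered:
                continue
            correct = sum(row[target] == target_value for row in covered)
            accuracy = correct / len(covered)
            if accuracy >= min_accuracy:
                rules.append((conjunction, len(covered), accuracy))
    return rules


# Hypothetical observations; "species" is the classification target.
data = [
    {"color": "black", "wings": "present", "species": "C. corax"},
    {"color": "black", "wings": "present", "species": "C. corax"},
    {"color": "white", "wings": "present", "species": "other"},
    {"color": "black", "wings": "absent", "species": "other"},
]
rules = learn_rules(data, target="species", target_value="C. corax")
for conjunction, support, accuracy in rules:
    print(conjunction, support, accuracy)
```

On this toy data neither condition alone reaches the accuracy threshold, so the only rule returned is the two-condition conjunction, together with its support and accuracy.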

10 RL As an example, consider the task of finding law-like relationships that link medical findings to a disease class. The data are patient findings, and the target is a syndrome that covers several ailments (lower respiratory syndrome). RL produces rules that relate the findings to the syndrome. Each rule has numeric measures of support. RL has been applied to identify carcinogens, and to determine parameters for crystallographic experiments.

11 Quantitative Law Discovery Quantitative laws may describe: algebraic relationships such as Newton’s second law of motion, a = F/m; and dynamic responses such as the unbounded growth of a population, dP/dt = kP. Informatics tools address both classes of laws through a variety of techniques. BACON discovers quantitative, algebraic laws through problem space search guided by declarative heuristics. Cubist discovers conditional, algebraic laws using techniques for linear regression.
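The flavor of heuristic-guided search for algebraic laws can be shown with a much-simplified sketch: if two variables rise together, try their ratio; if one rises as the other falls, try their product; if the resulting term is constant, report it as a candidate law. This is only a toy inspired by that style of heuristic, not BACON itself, and the function names, tolerance, and measurements are assumptions.

```python
def is_constant(values, tolerance=0.01):
    """True if all values lie within a relative tolerance of their mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tolerance * abs(mean) for v in values)


def find_invariant(xs, ys):
    """Look for a constant ratio or product of two measured variables."""
    steps = [(x2 - x1) * (y2 - y1)
             for (x1, y1), (x2, y2) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))]
    if all(step > 0 for step in steps):
        # Variables increase together: consider their ratio.
        name, term = "y / x", [y / x for x, y in zip(xs, ys)]
    elif all(step < 0 for step in steps):
        # One increases as the other decreases: consider their product.
        name, term = "y * x", [y * x for x, y in zip(xs, ys)]
    else:
        return None
    if is_constant(term):
        return name, sum(term) / len(term)
    return None


# Hypothetical measurements for a body of mass 2: force and acceleration.
forces = [2.0, 4.0, 6.0, 8.0]
accelerations = [1.0, 2.0, 3.0, 4.0]
result = find_invariant(forces, accelerations)
print(result)
```

Here the constant ratio a / F equals 1 / m for the assumed mass, which is the form a = F/m expressed as an invariant of the data.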

12 LAGRAMGE LAGRAMGE and its precursor LAGRANGE were the first in a line of law discovery systems for differential equations. LAGRAMGE takes as input time series for multiple variables, an indicator that identifies the dependent variable, and knowledge about the structure of plausible solutions. As output, the system produces an algebraic or differential equation for the dependent variable. LAGRAMGE has been applied in ecosystem dynamics, fjord hydrodynamics, and other domains.
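The step from a time series to a differential equation can be illustrated with a small sketch. It assumes the equation structure is already fixed by hand to dP/dt = kP and only estimates the parameter k, using finite differences and least squares; it does not search over candidate structures as the system described above does, and the data are synthetic.

```python
from math import exp


def estimate_growth_rate(times, population):
    """Least-squares estimate of k in dP/dt = k * P from a time series."""
    numerator = denominator = 0.0
    for i in range(1, len(times) - 1):
        # A central difference approximates dP/dt at interior points.
        derivative = ((population[i + 1] - population[i - 1])
                      / (times[i + 1] - times[i - 1]))
        numerator += population[i] * derivative
        denominator += population[i] ** 2
    return numerator / denominator


# Synthetic series generated from P(t) = 10 * exp(0.3 * t); hypothetical data.
times = [0.1 * step for step in range(50)]
population = [10.0 * exp(0.3 * t) for t in times]
k = estimate_growth_rate(times, population)
print(round(k, 3))
```

The estimate lands close to the rate used to generate the data, with a small bias from the finite-difference approximation of the derivative.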

13 Discovering Descriptive Knowledge: Summary Computational scientific discovery has a long history, particularly in the context of descriptive knowledge. Such systems have played a large role in exploring, analyzing, and understanding data. Work in this area laid the foundations for the field of data mining both in terms of research and applications. However, the discovery of descriptive knowledge can lead to a shallow interpretation of data; generally avoids statements of causality; and makes limited contact with the rich, theoretical content of a scientific discipline. Next we will discuss systems that address these concerns.

