

Discovering Descriptive Knowledge Lecture 18

Descriptive Knowledge in Science In an earlier lecture, we introduced the representation and use of taxonomies and laws. Informatics tools for working with taxonomies: represent them as a collection of hypotheses about categories and their is-a relationships; and use them to organize knowledge and to classify new observations. Informatics tools for working with laws: represent them as hypotheses about quantitative and/or qualitative relationships among an object’s properties; and use them to predict the static or dynamic properties of an entity or an interconnected system.

The Taxonomy Formation Task Taxonomy formation consists of three tasks that may be solved separately or simultaneously: the construction of categories; the organization of the categories into a hierarchy; and the explicit definition of the categories. Informatics tools for taxonomy formation fall into two general categories: those that analyze finite batches of observations and create separate taxonomies for each batch; and those that incrementally construct and refine taxonomies based on an effectively continuous stream of data.

Cluster 3.0 Cluster is designed to construct and organize categories from a batch of gene expression data. As input, Cluster takes gene expression levels from multiple experiments. The program clusters genes based on their expression patterns across experiments. Scientists can select the clustering method and set the available parameters. Cluster produces a text file that contains the taxonomy.
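The idea of building a taxonomy from a batch of expression data can be sketched with a small agglomerative (bottom-up) clustering routine. This is a minimal illustration, not Cluster's implementation: the Euclidean distance, the average-linkage rule, the gene names, and the expression values are all assumptions chosen for the example.

```python
from itertools import combinations
from math import sqrt

def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leaves(tree):
    """Gene names under a tree node (a name or a pair of subtrees)."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def average_linkage(t1, t2, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    pairs = [(g1, g2) for g1 in leaves(t1) for g2 in leaves(t2)]
    total = sum(distance(profiles[g1], profiles[g2]) for g1, g2 in pairs)
    return total / len(pairs)

def cluster(profiles):
    """Repeatedly merge the two closest clusters until one tree remains."""
    trees = sorted(profiles)
    while len(trees) > 1:
        i, j = min(
            combinations(range(len(trees)), 2),
            key=lambda p: average_linkage(trees[p[0]], trees[p[1]], profiles),
        )
        merged = (trees[i], trees[j])
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [merged]
    return trees[0]

# Hypothetical expression levels for four genes across three experiments.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [5.0, 0.5, 0.2],
    "geneD": [4.8, 0.4, 0.1],
}
taxonomy = cluster(profiles)
print(taxonomy)
```

The nested tuples play the role of the hierarchy: genes with similar expression patterns end up as siblings, and each merge adds one internal node to the taxonomy.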

Cluster 3.0: Results Viewing the taxonomy produced by Cluster requires a separate program, such as TreeView. [Figure: panels labeled "taxonomy", "selected section of the taxonomy", "data", and "gene annotations".]

ReTAX ReTAX is an interactive environment that helps scientists revise taxonomies in response to new observations. A taxonomy in ReTAX includes hierarchically organized categories and their definitions. Each datum for ReTAX consists of a set of features (such as the size of a plant’s leaf or the type of its fruit) and a category. As a scientist enters data, ReTAX ensures that the new item’s features match or specialize the category’s defining features, and that they distinguish it from other categories in the taxonomy. If the new item violates either of these rules, then ReTAX attempts to revise its taxonomy.
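The two consistency rules can be sketched as a small check. This is only an illustration under stated assumptions, not ReTAX's actual representation: each category definition is modeled as a mapping from a feature to its set of allowed values, and the taxon names, features, and values are hypothetical.

```python
def satisfies(item, definition):
    """True if every defining feature admits the item's value."""
    return all(item.get(f) in allowed for f, allowed in definition.items())

def check_item(item, category, taxonomy):
    """List the rule violations raised by assigning item to category."""
    problems = []
    # Rule 1: the item must match or specialize the category's definition.
    if not satisfies(item, taxonomy[category]):
        problems.append(f"does not match definition of {category}")
    # Rule 2: the item's features must distinguish it from other categories.
    for other, definition in taxonomy.items():
        if other != category and satisfies(item, definition):
            problems.append(f"not distinguished from {other}")
    return problems

# Hypothetical definitions: feature -> set of allowed values.
taxonomy = {
    "TaxonA": {"fruit": {"capsule", "berry"}, "leaf_size": {"small"}},
    "TaxonB": {"fruit": {"berry"}, "leaf_size": {"small", "medium"}},
}

ok_item = {"fruit": "capsule", "leaf_size": "small"}
ambiguous_item = {"fruit": "berry", "leaf_size": "small"}
odd_item = {"fruit": "drupe", "leaf_size": "small"}

print(check_item(ok_item, "TaxonA", taxonomy))
print(check_item(ambiguous_item, "TaxonA", taxonomy))
print(check_item(odd_item, "TaxonA", taxonomy))
```

A non-empty result is the point at which a revision step would be triggered, for example by searching for a further feature that separates the two taxa or by proposing that they be merged.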

ReTAX [Figure: botanical taxonomy under Ericaceae showing the genera Andromeda, Gaultheria, Pernettya, … and the species A. uva-ursi, P. tasmanica, G. oppositifolia, G. rupestris, and G. antipoda.] Working in the context of a botanical taxonomy like this one, ReTAX replicated historical revisions. In the course of its use, ReTAX identified descriptive features that were insufficient for distinguishing members of two taxa; searched for new features to refine the taxa; and eventually suggested that the genera Pernettya and Gaultheria should be merged.

Qualitative Law Discovery Qualitative laws fall into two primary categories: those involving categorical statements about objects, such as “all ravens are black”; and those describing qualitative changes, such as “temperature and pressure increase proportionately”. Informatics tools that discover categorical relationships have received the majority of the attention in this area. These tools typically address a supervised learning task: data are described by multiple features (color = black, wings = present); one of these features serves as a target for classification (species = C. corax); and the tool relates the features to the target.

RL RL addresses the supervised learning task, producing qualitative laws that are expressed as logical rules. Each rule states that if all of its conditions are true of a datum, then the datum is assigned to the target class. As input, RL takes a data set and information that controls the characteristics of the rules, such as taxonomies of the values for features, constraints among the features in each rule, a minimum accuracy, and a maximum number of features.

RL As an example, consider the task of finding law-like relationships that link medical findings to a disease class. The data are patient findings, and the target is a syndrome that covers several ailments (lower respiratory syndrome). RL produces rules that relate the findings to the syndrome. Each rule has numeric measures of support. RL has been applied to identify carcinogens, and to determine parameters for crystallographic experiments.
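A rule of this kind, together with numeric measures for it, can be sketched as follows. This is a minimal illustration and not RL's implementation: the patient records are invented, and coverage (how many data the rule matches) and accuracy (the fraction of those that belong to the target class) are assumed here as plausible examples of such measures.

```python
def matches(conditions, datum):
    """True if every condition of the rule holds for the datum."""
    return all(datum.get(f) == v for f, v in conditions.items())

def evaluate_rule(conditions, target, data, target_feature="syndrome"):
    """Return (coverage, accuracy) of the rule 'IF conditions THEN target'."""
    covered = [d for d in data if matches(conditions, d)]
    correct = [d for d in covered if d[target_feature] == target]
    accuracy = len(correct) / len(covered) if covered else 0.0
    return len(covered), accuracy

# Invented patient findings; the target feature is the syndrome.
patients = [
    {"cough": "yes", "fever": "yes", "syndrome": "lower_respiratory"},
    {"cough": "yes", "fever": "no", "syndrome": "lower_respiratory"},
    {"cough": "yes", "fever": "yes", "syndrome": "other"},
    {"cough": "no", "fever": "yes", "syndrome": "other"},
]

general = evaluate_rule({"cough": "yes"}, "lower_respiratory", patients)
specific = evaluate_rule(
    {"cough": "yes", "fever": "no"}, "lower_respiratory", patients
)
print(general, specific)
```

Adding a condition makes the rule more specific: it covers fewer patients but, in this toy data, classifies them more accurately. A minimum accuracy threshold would decide which of the two rules is kept.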

Quantitative Law Discovery Quantitative laws may describe: algebraic relationships such as Newton’s second law of motion, a = F/m; and dynamic responses such as the unbounded growth of a population, dP/dt = kP. Informatics tools address both classes of laws through a variety of techniques. BACON discovers quantitative, algebraic laws through problem space search guided by declarative heuristics. Cubist discovers conditional, algebraic laws using techniques for linear regression.
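One way to picture heuristic search for an algebraic law is to look for a combination of two variables that stays constant across observations. The sketch below tries only a ratio and a product. It is loosely inspired by the kind of heuristics associated with BACON and is not its actual procedure; the function names, tolerance, and measurements are assumptions, and the inputs are assumed to be nonzero.

```python
def is_constant(values, tolerance=1e-6):
    """True if all values lie within a relative tolerance of their mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tolerance * max(1.0, abs(mean)) for v in values)

def find_invariant(xs, ys):
    """Look for a constant ratio y/x or a constant product x*y."""
    ratios = [y / x for x, y in zip(xs, ys)]
    if is_constant(ratios):
        return "ratio", sum(ratios) / len(ratios)
    products = [x * y for x, y in zip(xs, ys)]
    if is_constant(products):
        return "product", sum(products) / len(products)
    return None

# Hypothetical measurements with the mass held fixed at 2.0.
forces = [1.0, 2.0, 4.0, 8.0]
accelerations_fixed_mass = [0.5, 1.0, 2.0, 4.0]

# Hypothetical measurements with the force held fixed at 4.0.
masses = [1.0, 2.0, 4.0, 8.0]
accelerations_fixed_force = [4.0, 2.0, 1.0, 0.5]

print(find_invariant(forces, accelerations_fixed_mass))
print(find_invariant(masses, accelerations_fixed_force))
```

With the mass fixed, a/F is constant and equals 1/m; with the force fixed, m·a is constant and equals F. Combining the two invariants recovers a = F/m.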

LAGRAMGE LAGRAMGE and its predecessor LAGRANGE were the first in a line of law discovery systems for differential equations. LAGRAMGE takes as input time series for multiple variables, an indicator that identifies the dependent variable, and knowledge about the structure of plausible solutions. As output, the system produces an algebraic or differential equation for the dependent variable. LAGRAMGE has been applied in ecosystem dynamics, fjord hydrodynamics, and other domains.
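The parameter-fitting part of this task can be sketched for the simplest case. The example below assumes the equation structure dP/dt = kP is already given and estimates only k from a time series, using central finite differences and least squares. It does not reproduce LAGRAMGE's search over candidate equation structures; the function name and the synthetic data are assumptions.

```python
from math import exp

def estimate_rate(times, values):
    """Estimate k in dP/dt = k*P by least squares on finite differences."""
    derivs, levels = [], []
    for i in range(1, len(times) - 1):
        # Central difference approximates dP/dt at the interior point i.
        dt = times[i + 1] - times[i - 1]
        derivs.append((values[i + 1] - values[i - 1]) / dt)
        levels.append(values[i])
    # Least-squares slope of a line through the origin: dP/dt = k * P.
    return sum(d * p for d, p in zip(derivs, levels)) / sum(p * p for p in levels)

# Synthetic time series generated from P(t) = 10 * exp(0.3 * t).
times = [0.1 * i for i in range(50)]
population = [10.0 * exp(0.3 * t) for t in times]

k = estimate_rate(times, population)
print(k)
```

The estimate lands close to the generating value of 0.3; the small discrepancy comes from the finite-difference approximation of the derivative and shrinks with a finer sampling interval.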

Discovering Descriptive Knowledge: Summary Computational scientific discovery has a long history, particularly in the context of descriptive knowledge. Such systems have played a large role in exploring, analyzing, and understanding data. Work in this area laid the foundations for the field of data mining, both in terms of research and applications. However, the discovery of descriptive knowledge can lead to a shallow interpretation of data; generally avoids statements of causality; and makes limited contact with the rich, theoretical content of a scientific discipline. Next we will discuss systems that address these concerns.