Relative Information Capacity of Simple Relational Database Schemata Paper by: Richard Hull Presented by: Jose Picado.

Presentation transcript:

Relative Information Capacity of Simple Relational Database Schemata Paper by: Richard Hull Presented by: Jose Picado

Outline
– Problem: data relativism and information capacity
  – Definition
  – Examples
  – Importance
– Hierarchy of dominance measures
– Basic results
– Discussion

Data relativism
– The same data can be represented in different ways
– In particular, the same data can be represented under different schemas
Schema 1: Person(name, sex, spouse)
Schema 2: Female(name), Male(name), Marriage(husband, wife)
Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.

Relative information capacity
– Expressiveness of a schema
– Different schemas representing the same data may have different information capacity
Schema 1: Person(name, sex, spouse)
– Does not require that the spouse attribute of a man refers to a woman
– Does not require that for each spouse attribute in one direction there is a corresponding spouse attribute in the other direction
Schema 2: Female(name), Male(name), Marriage(husband, wife)
– Allows unmarried people to be represented in the database
Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.

Relative information capacity
Possible solution:
– Transform the existing schema into the new schema by structural manipulations
– Is the transformation information capacity preserving?
Person(name, sex, spouse) → [transformation] → Female(name), Male(name), Marriage(husband, wife)

Importance
Schema evolution
– None of the information stored in the initial database is lost
Initial schema: Person(name, sex, spouse)
New schema: Female(name), Male(name), Marriage(husband, wife)

Importance
Data integration
– All information in each of the component databases is reflected in the integrated database
Component schema 1: City(name, state), State(name, capital)
Component schema 2: City(name, isCapital, country), Country(name, language, currency)
Integrated schema: City(name, place), Country(name, language, currency, capital), State(name, capital)
Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.

Importance
– Database normalization theory
– User view construction
– Schema simplification
– Translation between data models

Hull's paper
– Introduces theoretical tools for studying measures of relative information capacity
  – Theoretical frameworks at the time were complex
  – There was no clear definition of the concept
  – Hull introduced clean ways of comparing schemata and their information capacity
– Defines a hierarchy of measures to compare the information capacity of schemata

Hull's paper
– Gives some basic results concerning these measures
– Considers only non-keyed relations
Relation: Person(id, name)
Instance: {(123, John), (123, Mary)}
– Non-keyed: the instance is allowed
– Keyed (assuming id is the key): the instance is not allowed, because both tuples share the id 123

Definitions
– A schema P is a set of relations
– Relations are composed of attributes, which may be of different basic types
– Basic types are domain designators (each has a fixed domain of possible values)
– I(P) is the set of instances of P, usually infinite
Schema P: Person(id, name)
Instances in I(P): {(111, John), (222, Mary)}, {(123, Anne), (234, Joe)}, {(aaa, Jack), (bbb, Ted)}, …

Transformation
– P and Q are relational schemata
– A transformation from P to Q is a map σ: I(P) → I(Q)
P: Person(id, name), Birth(id, date)
Q: PersonInfo(id, name, bdate)
Example transformation:
PersonInfo(x,y,z) :- Person(x,y), Birth(x,z).
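
The rule for PersonInfo can be read as a join of Person and Birth on id. A minimal Python sketch, assuming relations are modelled as sets of tuples; the function name person_info and the sample data are illustrative and not taken from the slides:

```python
def person_info(person, birth):
    """PersonInfo(x, y, z) :- Person(x, y), Birth(x, z)."""
    return {
        (x, y, z)
        for (x, y) in person
        for (x2, z) in birth
        if x == x2  # both atoms share the variable x
    }


# Hypothetical instance of schema P = {Person(id, name), Birth(id, date)}.
person = {(1, "John"), (2, "Mary")}
birth = {(1, "1980-05-01"), (2, "1975-11-23"), (3, "1990-01-01")}

result = person_info(person, birth)
print(sorted(result))
```

In this sample the Birth tuple with id 3 has no matching Person tuple, so it does not appear in the result. A transformation on its own may therefore lose information, which is what the notion of dominance is meant to rule out.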

Dominance
– P and Q are relational schemata
– Q dominates P via (σ, τ), where σ: I(P) → I(Q) and τ: I(Q) → I(P) are transformations, if the composition of σ followed by τ is the identity on I(P)

Dominance
P: Person(name, sex, spouse)
Q: Female(name), Male(name), Marriage(husband, wife)

Dominance
1. Take an instance of P from I(P):
Person = {(John, male, Mary), (Mary, female, John), (Anne, female, Joe), (Joe, male, Anne)}

Dominance
2. Apply σ (the transformation from P to Q) to the instance:
Male(x) :- Person(x,y,z), y=male.
Female(x) :- Person(x,y,z), y=female.
Marriage(x,y) :- Person(x,u,y), Person(y,v,x), u=male, v=female.
Result:
Male = {(John), (Joe)}
Female = {(Mary), (Anne)}
Marriage = {(John, Mary), (Joe, Anne)}

Dominance
3. Apply τ (the transformation from Q back to P) to the result of step 2:
Person(x,male,z) :- Male(x), Marriage(x,z).
Person(x,female,z) :- Female(x), Marriage(z,x).
Result:
Person = {(John, male, Mary), (Mary, female, John), (Anne, female, Joe), (Joe, male, Anne)}

Dominance
4. Compare the original instance with the result of applying σ and then τ:
Original: Person = {(John, male, Mary), (Mary, female, John), (Anne, female, Joe), (Joe, male, Anne)}
After σ and then τ: Person = {(John, male, Mary), (Mary, female, John), (Anne, female, Joe), (Joe, male, Anne)}
The two are identical.
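
The four steps can be sketched in Python, assuming relations are modelled as sets of tuples. The names sigma and tau are labels chosen here for the two rule sets (from P to Q, and from Q back to P), and the rule for female persons reads Marriage as (husband, wife):

```python
def sigma(person):
    """Map an instance of P = {Person} to an instance of Q = {Male, Female, Marriage}."""
    male = {x for (x, sex, _) in person if sex == "male"}
    female = {x for (x, sex, _) in person if sex == "female"}
    # Marriage(x, y) :- Person(x, u, y), Person(y, v, x), u = male, v = female.
    marriage = {
        (x, y)
        for (x, u, y) in person
        for (y2, v, x2) in person
        if u == "male" and v == "female" and y == y2 and x == x2
    }
    return male, female, marriage


def tau(q_instance):
    """Map an instance of Q back to an instance of P."""
    male, female, marriage = q_instance
    husbands = {(h, "male", w) for (h, w) in marriage if h in male}
    wives = {(w, "female", h) for (h, w) in marriage if w in female}
    return husbands | wives


person = {
    ("John", "male", "Mary"),
    ("Mary", "female", "John"),
    ("Anne", "female", "Joe"),
    ("Joe", "male", "Anne"),
}
round_trip = tau(sigma(person))
print(round_trip == person)

# Hypothetical instance in which the spouse attribute is not reciprocated.
lopsided = {("John", "male", "Mary")}
print(tau(sigma(lopsided)) == lopsided)
```

On the instance from the slides the round trip returns the original instance. On the hypothetical lopsided instance these particular rules drop the tuple, so the check only succeeds for instances whose spouse entries are reciprocal.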

Dominance
– P and Q are relational schemata
– Q dominates P via (σ, τ), where σ: I(P) → I(Q) and τ: I(Q) → I(P) are transformations, if the composition of σ followed by τ is the identity on I(P)
– Q has at least as much capacity for storing information as P
– Information structured according to P can be restructured to fit into Q, and then restructured again to fit back into P

Equivalence
– P and Q are equivalent (xxx) if they have equivalent information capacity, where xxx names the dominance measure used
– P and Q are equivalent (xxx) if
  – Q dominates P (xxx) and
  – P dominates Q (xxx)

Information dominance measures (from more restrictive to less restrictive)
1. Calculous dominance
2. Generic dominance
3. Internal dominance
4. Absolute dominance

Types of equivalence (from more restrictive to less restrictive)
1. P and Q are equivalent (calc)
2. P and Q are equivalent (gen)
3. P and Q are equivalent (int)
4. P and Q are equivalent (abs)

Level 1: Calculous dominance
– Only allow transformations that are relational calculus expressions
– Relational calculus:
  – First-order logic (predicate calculus)
  – Predicates: atom, …
  – Each query Q(x1, …, xn) is a predicate P
– If both transformations (from P to Q and back) are relational calculus expressions, then Q dominates P calculously

Level 2: Generic dominance
– Only allow transformations that treat domain elements as essentially uninterpreted objects
– All elements are treated as equals, except for some set of constants
– This is a property of all query languages, such as SQL and Datalog
– If both transformations (from P to Q and back) treat all elements as equals, then Q dominates P generically
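
One way to make "treats all elements as equals" concrete is the usual notion of genericity: renaming the domain elements before or after the transformation gives the same result. The sketch below tests this on a single small instance, with no constants fixed. It is a necessary check rather than a proof, and the helper names and both sample transformations are assumptions made for illustration:

```python
from itertools import permutations


def apply_perm(perm, instance):
    """Rename every value in every tuple according to the mapping perm."""
    return {tuple(perm[v] for v in t) for t in instance}


def commutes_with_permutations(f, instance, domain):
    """Check f(perm(I)) == perm(f(I)) for every permutation of the domain."""
    for image in permutations(domain):
        perm = dict(zip(domain, image))
        if f(apply_perm(perm, instance)) != apply_perm(perm, f(instance)):
            return False
    return True


def swap_columns(instance):
    """Only rearranges values, so it never looks inside them."""
    return {(b, a) for (a, b) in instance}


def keep_increasing(instance):
    """Uses the ordering of the values, so it interprets them."""
    return {(a, b) for (a, b) in instance if a < b}


domain = ["a", "b", "c"]
instance = {("a", "b"), ("b", "c")}

print(commutes_with_permutations(swap_columns, instance, domain))
print(commutes_with_permutations(keep_increasing, instance, domain))
```

Swapping columns passes the check, while the filter that compares values fails it, because renaming a to b and b to a changes which tuples count as increasing.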

Level 3: Internal dominance
– Only allow transformations that do not invent any data
– Inventing data: numerical computations or string manipulations
– Example of invented data: (player, goals, games) is transformed into (player, performance), where performance = goals/games
– If both transformations (from P to Q and back) do not invent data, then Q dominates P internally
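
A rough way to test whether a transformation invents data on a given instance is to compare the values in its output with the values in its input, plus any allowed constants. The following is a sketch under that reading; the helper names and the sample statistics are hypothetical:

```python
def values(instance):
    """All values that occur anywhere in the instance."""
    return {v for t in instance for v in t}


def invents_data(f, instance, constants=frozenset()):
    """True if f's output contains a value absent from the input and the constants."""
    return not values(f(instance)) <= values(instance) | set(constants)


def project_player(stats):
    """Keeps only the player column; every output value was already present."""
    return {(player,) for (player, goals, games) in stats}


def performance(stats):
    """Computes goals/games, a value that need not occur in the input."""
    return {(player, goals / games) for (player, goals, games) in stats}


# Hypothetical (player, goals, games) instance.
stats = {("Ann", 6, 4), ("Bob", 3, 2)}

print(invents_data(project_player, stats))
print(invents_data(performance, stats))
```

The projection introduces no new values, whereas the performance computation produces 1.5, which does not occur in the input instance.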

Level 4: Absolute dominance
– Y: some finite set of values
– I_Y(P): the instances of P that contain only values in Y
– |I_Y(P)|: the number of instances of P containing only values in Y
– If |I_Y(P)| ≤ |I_Y(Q)| for every sufficiently large Y, then Q dominates P absolutely
– Easy to compute: based on counting instances instead of on transformations
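
For non-keyed relations the counting is direct: a relation whose attributes range over domains of sizes n1, …, nk has n1·…·nk possible tuples, and any subset of them is an instance. A minimal sketch for one fixed choice of values, assuming a schema is written as a list of relations and each relation as a list of basic type names (the schemas P and Q below are hypothetical):

```python
from math import prod


def count_instances(schema, domain_sizes):
    """Number of instances of a non-keyed schema over finite domains.

    schema: list of relations, each a list of basic type names.
    domain_sizes: number of available values for each basic type.
    """
    total = 1
    for relation in schema:
        possible_tuples = prod(domain_sizes[t] for t in relation)
        total *= 2 ** possible_tuples  # any subset of the tuples is an instance
    return total


# Hypothetical schemas over a single basic type B.
P = [["B", "B"]]       # one binary relation
Q = [["B", "B", "B"]]  # one ternary relation

sizes = {"B": 2}
print(count_instances(P, sizes), count_instances(Q, sizes))
```

With two values available, P has 2^4 instances and Q has 2^8, so the counting condition holds for this particular choice of values. The definition requires the comparison for more than one choice, so a single count like this is only evidence.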

Basic results
Q dominates P calculously
⇒ Q dominates P generically
⇒ Q dominates P internally
⇒ Q dominates P absolutely

Basic results
– Sometimes absolute and internal dominance hold, but generic and calculous dominance don't
[Diagram: schemas Q and P built from the relations AA, BB and AB]
– Q dominates P (abs, int), and the transformation (int) does not invent data
– Q does not dominate P (gen, calc): there is no transformation (gen, calc) that takes instances of P to Q and then back to P

Basic results
– Absolute dominance is useful for verifying calculous (non-)dominance
[Diagram: schemas Q and P built from the relations AB, AC and ABC]
– Q dominates P calculously, hence Q dominates P absolutely
– P does not dominate Q absolutely, hence P does not dominate Q calculously
* under certain constraints

Basic results
– Dominance is preserved by renamings of basic types (homomorphisms)
  – h(P): the image of P under a homomorphism h
  – If Q dominates P, then h(Q) dominates h(P), for any measure of dominance (calc, gen, int, abs)

Basic results
– Calculous dominance does not accurately measure the presence of semantic correspondence
[Diagram: schemas P and Q over the basic types NAME and NUMBER. Relations R1 and S1 have the attributes (name, position, goals) and (name, goals, minutes); relations R2 and S2 have the attributes (title, publisher, pages) and (title, pages, edition); a further relation T is over NAME and NUMBER.]
– Q dominates P (calc), but there is no semantic mapping from P to Q

Basic results
– For non-keyed relational schemata with only one basic type, all types of dominance are equivalent
Theorem: Let P and Q be non-keyed relational schemata over a single basic type B. Then the following are equivalent:
a. Q dominates P (calc)
b. Q dominates P (gen)
c. Q dominates P (int)
d. Q dominates P (abs)

Basic results
– With any reasonable measure of relative information capacity, two non-keyed relational schemata are equivalent iff they are identical
– In the (non-keyed) relational model, there is essentially at most one way to represent a given data set

Discussion
Strong points:
1. Provides a theory to study relative information capacity
2. Data relativism is important, as it arises in many areas
3. Defines a hierarchy of dominance measures
4. Gives important results about the relational model

Discussion
Weak points:
1. Does not support dependencies/constraints
  – Hierarchy of dominance measures
  – Basic results

Discussion
Functional dependency (FD): Given sets of attributes X and Y in relation R, the functional dependency X → Y means that all tuples in R that agree on the attributes X must also agree on Y.
Example relation (id, name, address): (123, John, 21 Kings St.), (234, Mary, 31 Kings St.)
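
The definition can be checked mechanically on a given instance. A minimal sketch, assuming tuples are modelled as dictionaries from attribute names to values; the function name and the choice of dependency (id determines name and address) are assumptions for illustration:

```python
def satisfies_fd(rows, lhs, rhs):
    """True if rows that agree on the lhs attributes also agree on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True


rows = [
    {"id": 123, "name": "John", "address": "21 Kings St."},
    {"id": 234, "name": "Mary", "address": "31 Kings St."},
]
print(satisfies_fd(rows, ["id"], ["name", "address"]))

# Hypothetical extra tuple that reuses id 123 with a different name.
conflicting = rows + [{"id": 123, "name": "Jack", "address": "21 Kings St."}]
print(satisfies_fd(conflicting, ["id"], ["name", "address"]))
```

The instance from the slide satisfies the assumed dependency; adding a second tuple with id 123 and a different name violates it.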

Discussion
Multivalued dependency (MVD): For an MVD X ↠ Y, if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation.
Example relation (course, book, lecturer): (Machine Learning, Pattern Recognition, John), (Artificial Intelligence, AIMA, Mary)
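
The swapping condition can be tested directly on an instance. The sketch below assumes tuples are dictionaries and checks an MVD from course to book, which is a choice made here for illustration rather than one stated on the slide:

```python
def satisfies_mvd(rows, attrs, x, y):
    """True if swapping the y components of tuples that agree on x stays in the relation."""
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Take t2 and replace its y components with those of t1.
                swapped = {**t2, **{a: t1[a] for a in y}}
                if tuple(swapped[a] for a in attrs) not in present:
                    return False
    return True


attrs = ["course", "book", "lecturer"]
rows = [
    {"course": "Machine Learning", "book": "Pattern Recognition", "lecturer": "John"},
    {"course": "Artificial Intelligence", "book": "AIMA", "lecturer": "Mary"},
]
print(satisfies_mvd(rows, attrs, ["course"], ["book"]))

# Hypothetical tuple: a second book and lecturer for the same course.
extended = rows + [
    {"course": "Machine Learning", "book": "AIMA", "lecturer": "Mary"},
]
print(satisfies_mvd(extended, attrs, ["course"], ["book"]))
```

The two tuples from the slide satisfy the assumed MVD trivially, since no two of them share a course. The extended instance violates it, because the swapped tuple (Machine Learning, Pattern Recognition, Mary) is missing.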

Discussion
Inclusion dependency (IND): For an IND R1[X] ⊆ R2[Y], for any tuple t1 in R1 there must exist a tuple t2 in R2 such that t1[X] = t2[Y].
Book (id, title): (111, Pattern Recognition), (222, AIMA)
Order (bookid, customer): (111, John), (222, Mary)
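
An inclusion dependency can be checked by projecting both relations and testing containment. A minimal sketch, assuming tuples are dictionaries and that the intended dependency is that every bookid in Order occurs as an id in Book (the direction is an assumption; the slide does not state it):

```python
def satisfies_ind(r1, x, r2, y):
    """True if every projection of r1 onto x occurs as a projection of r2 onto y."""
    targets = {tuple(t[a] for a in y) for t in r2}
    return all(tuple(t[a] for a in x) in targets for t in r1)


book = [
    {"id": 111, "title": "Pattern Recognition"},
    {"id": 222, "title": "AIMA"},
]
order = [
    {"bookid": 111, "customer": "John"},
    {"bookid": 222, "customer": "Mary"},
]
print(satisfies_ind(order, ["bookid"], book, ["id"]))

# Hypothetical order that refers to a book id with no Book tuple.
dangling = order + [{"bookid": 333, "customer": "Ann"}]
print(satisfies_ind(dangling, ["bookid"], book, ["id"]))
```

The instances from the slide satisfy the assumed dependency, while the order for the unknown book id 333 violates it.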

Discussion
Weak points:
1. Does not support dependencies/constraints
  – Hierarchy of dominance measures
  – Basic results
  – Dependencies change the final result of the paper
2. Open questions:
  – Does absolute dominance imply internal dominance?
  – Does generic dominance imply calculous dominance?
  – Is there a measure for semantic correspondence?

Thank you

Quiz
What are the four formal measures of relative information capacity defined by Hull? Write them in order from most restrictive to least restrictive.