IDEAS 2011, International Database Engineering & Applications Symposium, September 21-23, Lisbon, Portugal. Aggregates and Priorities in P2P Data Management Systems.

Presentation transcript:

IDEAS 2011, International Database Engineering & Applications Symposium, September 21-23, Lisbon, Portugal
Aggregates and Priorities in P2P Data Management Systems
Luciano Caroprese, Ester Zumpano
DEIS, University of Calabria, Italy

P2P Systems
[Figure: a P2P system with peers P1, P2, P3 and P4; labels: Peer, D, IC, Query, Import, Export.]
Each peer is an autonomous system that imports/exports data from/to other peers.
Imported data should not 'violate' local integrity constraints (IC).

An example
Two peers, P1 and P2, are connected by a mapping rule.
Mapping rule: q(X) ← r(X)
Integrity constraint: X=Y ← q(X), q(Y)
Facts in the source peer: r(a), r(b)
FOL semantics: the mapping rule imports both q(a) and q(b), which together violate the integrity constraint. The whole system is inconsistent!
To "isolate" the inconsistency: the P2P system is consistent after removing the inconsistent P2.

An example
Mapping rule: q(X) ← r(X)
Integrity constraint: X=Y ← q(X), q(Y)
Facts in the source peer: r(a), r(b)
Which are the 'true' atoms? There are 2 possible scenarios:
M1 = {r(a), r(b), q(a)}
M2 = {r(a), r(b), q(b)}
The first step is modeling mapping rules to capture this semantics.

Our proposed semantics for mapping rules
Mapping rule: q(X) ← r(X), with integrity constraint X=Y ← q(X), q(Y) and facts r(a), r(b).
FOL semantics: q(X) ← r(X) is satisfied if Val(q(X)) ≥ Val(r(X)).
New semantics: q(X) ← r(X) is satisfied if Val(q(X)) ≤ Val(r(X)).
Possible scenarios shown on the slide: {r(a), r(b), q(b)}; {r(a), r(b), q(a)}; {r(a), r(b)}; {r(a), r(b), q(a), q(b)}.

Maximal weak model semantics
Our system PS:
q(X) ← r(X)
← q(X), q(Y), X≠Y (equivalently, X=Y ← q(X), q(Y))
r(a)
r(b)
Its weak models:
M1 = {r(a), r(b)}
M2 = {r(a), r(b), q(a)}
M3 = {r(a), r(b), q(b)}
In each weak model we look for imported atoms. Maximal models are those that contain a maximal subset of imported atoms (M2, M3).
Its maximal weak models: MWM(PS) = {M2, M3}
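The selection of maximal weak models in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' algorithm: atoms are plain strings, the mapping rule is reduced to "q(x) may be imported whenever r(x) holds", and the helper names (`satisfies_ic`, `weak_models`, `maximal_weak_models`) are hypothetical.

```python
from itertools import combinations

# Facts of the source peer and the atoms that q(X) <- r(X) could import.
source_facts = frozenset({"r(a)", "r(b)"})
importable = ["q(a)", "q(b)"]

def satisfies_ic(model):
    """Integrity constraint X=Y <- q(X), q(Y): at most one q-atom may hold."""
    return sum(1 for atom in model if atom.startswith("q(")) <= 1

def weak_models():
    """Enumerate every set of imported atoms that keeps the system consistent."""
    models = []
    for k in range(len(importable) + 1):
        for chosen in combinations(importable, k):
            model = source_facts | frozenset(chosen)
            if satisfies_ic(model):
                models.append(model)
    return models

def maximal_weak_models(models):
    """Keep the models whose imported atoms are not strictly contained in another model's."""
    imported = {m: m - source_facts for m in models}
    return [m for m in models if not any(imported[m] < imported[o] for o in models)]

wm = weak_models()
mwm = maximal_weak_models(wm)
for m in mwm:
    print(sorted(m))
```

Brute-force enumeration is enough here because only two atoms can be imported; it is not meant to scale.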

Modeling a P2P system with a disjunctive logic program with priorities: an equivalent characterization
A (positive) exclusive disjunctive Datalog rule is of the form: A1 ⊕ … ⊕ Am ← B1, …, Bn
The program:
a ⊕ b ← c
c
Its minimal models: M1 = {a, c}, M2 = {b, c}.
A priority rule is of the form: a ≥ b
The preference rule intuitively reads: a is preferable over b. Thus M1 is the preferred minimal model.

Modeling a P2P system with a disjunctive logic program with priorities: an equivalent characterization
If r(X) is true in the source peer, then it is possible either to import or not to import q(X) in the target peer. The mapping rule
q(X) ← r(X)
is therefore rewritten as
q(X) ⊕ q'(X) ← r(X)
where q'(X) stands for "q(X) is not imported". Obviously, we prefer to import as much knowledge as possible in each peer. Thus:
q(X) ≥ q'(X)

Preferred Minimal Model Semantics
The original system: q(X) ← r(X); X=Y ← q(X), q(Y); r(a); r(b).
Our system becomes PS:
q(X) ⊕ q'(X) ← r(X)
← q(X), q(Y), X≠Y
r(a)
r(b)
q(X) ≥ q'(X) (used to select preferred models)
Its minimal models:
M1 = {r(a), r(b), q'(a), q'(b)}
M2 = {r(a), r(b), q(a), q'(b)}
M3 = {r(a), r(b), q'(a), q(b)}
Its preferred minimal models: PMM(PS). Deleting primed atoms from them, we obtain MWM(PS).
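A small Python sketch can make the rewriting concrete for this example. It rests on assumptions that are not in the slides: the comparison `beats` is a simplified reading of "preferred" (a model wins if a priority pair favours it and none favours the other model), and all function and variable names are hypothetical.

```python
from itertools import product

constants = ["a", "b"]
facts = frozenset(f"r({c})" for c in constants)

def minimal_models():
    """For each r(c), choose exactly one of q(c) (import) or q'(c) (do not import)."""
    models = []
    for choice in product([True, False], repeat=len(constants)):
        heads = {f"q({c})" if imp else f"q'({c})" for c, imp in zip(constants, choice)}
        if sum(choice) <= 1:  # integrity constraint: at most one q-atom
            models.append(facts | heads)
    return models

# Priority q(X) >= q'(X), instantiated for every constant.
priorities = {(f"q({c})", f"q'({c})") for c in constants}

def beats(m, n):
    """Simplified preference: some pair favours m over n, and no pair favours n over m."""
    forward = any((x, y) in priorities for x in m - n for y in n - m)
    backward = any((x, y) in priorities for x in n - m for y in m - n)
    return forward and not backward

models = minimal_models()
preferred = [m for m in models if not any(beats(o, m) for o in models)]
# Deleting primed atoms yields the maximal weak models of the original system.
stripped = {frozenset(atom for atom in m if "'" not in atom) for m in preferred}
```

Under this reading, the model that imports nothing is discarded and the two remaining models, once their primed atoms are deleted, coincide with M2 and M3 of the maximal weak model slide.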

The problem with this framework is that it does not allow setting preferences among maximal weak models. Examples of preferences we would like to express:
"In the case of conflicting information, it is preferable to import data from the neighbor peer that can provide the maximum number of tuples."
"In the case of conflicting information, it is preferable to import data from the neighbor peer such that the sum of the values of an attribute is minimum."
Example (peers P1, P2, P3):
DB1 = {emp(john,200), emp(mary,50), emp(tom,50)}
DB2 = {emp(dan,200), emp(lucy,50)}
Mapping rules: cons(1,N,S) ← emp(N,S) and cons(2,N,S) ← emp(N,S)
Integrity constraint: Pa=Pb ← cons(Pa,Na,Sa), cons(Pb,Nb,Sb)
M1 = {cons(1,john,200), cons(1,mary,50), cons(1,tom,50)} ∪ DB1 ∪ DB2
M2 = {cons(2,dan,200), cons(2,lucy,50)} ∪ DB1 ∪ DB2

New framework
We introduce in our framework: 1) aggregate functions; 2) priorities.
Imported tuples in M1: cons(1,john,200), cons(1,mary,50), cons(1,tom,50), on top of DB1 ∪ DB2.
Imported tuples in M2: cons(2,dan,200), cons(2,lucy,50), on top of DB1 ∪ DB2.
Bag: b(Source, <<Salary>>) ← cons(Source,Name,Salary), giving b(1, {200,50,50}) and b(2, {200,50}).
Set: s(Source, <Salary>) ← cons(Source,Name,Salary), giving s(1, {200,50}) and s(2, {200,50}).

New framework
To bags and sets we can apply many aggregate functions: 1) MIN / MAX; 2) AVG (average); 3) COUNT; 4) SUM.
S(Source, SUM<<Salary>>) ← cons(Source,Name,Salary)
With the tuples cons(1,john,200), cons(1,mary,50), cons(1,tom,50) of M1 this gives S(1, 300); with the tuples cons(2,dan,200), cons(2,lucy,50) of M2 it gives S(2, 250).
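The difference between bag and set aggregation on the cons tuples can be checked with a short Python sketch. For illustration only, the tuples of both models are pooled into one list (in the framework they belong to different models), and the variable names are hypothetical.

```python
from collections import defaultdict

# cons(Source, Name, Salary) tuples from the slide.
cons = [
    (1, "john", 200), (1, "mary", 50), (1, "tom", 50),
    (2, "dan", 200), (2, "lucy", 50),
]

# Bag: keep duplicates. Set: collapse them.
bags = defaultdict(list)
for source, _name, salary in cons:
    bags[source].append(salary)
sets = {source: set(values) for source, values in bags.items()}

# Aggregates grouped by Source.
sum_bag = {source: sum(values) for source, values in bags.items()}
sum_set = {source: sum(values) for source, values in sets.items()}
count_bag = {source: len(values) for source, values in bags.items()}

print(sum_bag)    # SUM over the bag of salaries
print(sum_set)    # SUM over the set of salaries
print(count_bag)  # COUNT over the bag of salaries
```

The bag sum for source 1 matches S(1, 300) on the slide, while summing the set {200, 50} would give a different value, which is why the choice between bag and set matters.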

Our goal
Using aggregate functions we can derive aggregate data. Then we can apply priority rules.
Peers P1, P2, P3, with facts emp(john,200), emp(mary,50), emp(tom,50) in one source peer and emp(dan,200), emp(lucy,50) in the other.
Mapping rules: cons(1,N,S) ← emp(N,S) and cons(2,N,S) ← emp(N,S)
LP: S(Source, SUM<<Salary>>) ← cons(Source,Name,Salary)
IC: Pa=Pb ← cons(Pa,Na,Sa), cons(Pb,Nb,Sb)
M1 = {cons(1,john,200), cons(1,mary,50), cons(1,tom,50), S(1,300)} ∪ DB1 ∪ DB2
M2 = {cons(2,dan,200), cons(2,lucy,50), S(2,250)} ∪ DB1 ∪ DB2

Levels of preferences
The complete framework allows defining many levels of preferences!
Peers P1, P2, P3, with facts emp(john,200), emp(mary,50) in one source peer and emp(dan,150), emp(lucy,50) in the other.
Mapping rules: cons(1,N,S) ← emp(N,S) and cons(2,N,S) ← emp(N,S)
LP: C(Source, COUNT<<…>>) ← cons(Source,Name,Salary)
LP: S(Source, SUM<<Salary>>) ← cons(Source,Name,Salary)
IC: Pa=Pb ← cons(Pa,Na,Sa), cons(Pb,Nb,Sb)
Priorities: <{C(P1,Count1) ≥ C(P2,Count2) | Count1 > Count2}, {S(P1,Sum1) ≥ S(P2,Sum2) | Sum1 < Sum2}>
We prefer to import tuples from the peer that can provide the maximum number of tuples. If the peers provide the same number of tuples, we prefer to import tuples from the peer whose tuples have the minimum total salary.

With these data the candidate models are:
M1 = {cons(1,john,200), cons(1,mary,50), C(1,2), S(1,250)} ∪ DB1 ∪ DB2
M2 = {cons(2,dan,150), cons(2,lucy,50), C(2,2), S(2,200)} ∪ DB1 ∪ DB2
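The two preference levels described here (maximum number of tuples first, minimum total salary second) can be sketched as successive filters in Python. This is an illustrative reading of the slide, not the authors' computation; `candidates`, `refine` and the other names are hypothetical.

```python
# Imported tuples of the two candidate models, as cons(Source, Name, Salary).
candidates = {
    "M1": [(1, "john", 200), (1, "mary", 50)],
    "M2": [(2, "dan", 150), (2, "lucy", 50)],
}

def count(tuples):
    """COUNT aggregate: number of imported tuples."""
    return len(tuples)

def total_salary(tuples):
    """SUM aggregate over the Salary attribute."""
    return sum(salary for _source, _name, salary in tuples)

def refine(names, aggregate, prefer):
    """Keep only the models whose aggregate value is the preferred one (max or min)."""
    best = prefer(aggregate(candidates[name]) for name in names)
    return [name for name in names if aggregate(candidates[name]) == best]

level1 = refine(list(candidates), count, max)  # level 1: maximum number of tuples
level2 = refine(level1, total_salary, min)     # level 2: minimum total salary
print(level1, level2)
```

Both models provide two tuples, so the first level keeps both; the second level then keeps the one with the smaller total salary, in line with S(1,250) and S(2,200).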

Extended Prioritized Logic Program
We allow many levels of priorities.
The program:
a ⊕ b ← c
d ⊕ e ← c
c
Its minimal models: M1 = {a, c, d}, M2 = {b, c, d}, M3 = {a, c, e}, M4 = {b, c, e}.
A preference rule is of the form: [formula missing in the transcript]
We first apply the first level, containing a ≥ b, and then the second level, containing d ≥ e.
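Applying the two levels one after the other can be traced with a short Python sketch. It assumes a simplified notion of preference (a model is discarded at a level if another surviving model is favoured by a priority pair of that level and not the other way round); the names `levels`, `beats` and `select` are hypothetical.

```python
# Minimal models of the program a (+) b <- c, d (+) e <- c, c.
models = {
    "M1": frozenset({"a", "c", "d"}),
    "M2": frozenset({"b", "c", "d"}),
    "M3": frozenset({"a", "c", "e"}),
    "M4": frozenset({"b", "c", "e"}),
}

# Priority levels, applied in order: first a >= b, then d >= e.
levels = [{("a", "b")}, {("d", "e")}]

def beats(m, n, priorities):
    """Simplified preference: a pair favours m over n and none favours n over m."""
    forward = any((x, y) in priorities for x in m - n for y in n - m)
    backward = any((x, y) in priorities for x in n - m for y in m - n)
    return forward and not backward

def select(names, priorities):
    """Keep the models that no other surviving model beats at this level."""
    return [
        n for n in names
        if not any(beats(models[o], models[n], priorities) for o in names)
    ]

survivors = list(models)
trace = []
for priorities in levels:
    survivors = select(survivors, priorities)
    trace.append(list(survivors))
print(trace)
```

Under this reading the first level keeps the models containing a, and the second level then keeps, among those, the model containing d.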

Computation
We extend the previous rewriting, allowing levels of priorities. Let us suppose that our P2P system has just one mapping rule and the following levels of priorities.
Mapping rule: q(a) ← r(a), rewritten as q(a) ⊕ q'(a) ← r(a)
Priorities: <{q(a) ≥ q'(a)}, [level 1], ..., [level n]>
The levels of priorities are used sequentially to select the preferred stable models of the logic program. The priorities derived from mapping rules are the most important (the first level)!

Conclusions
This work enhances a previous semantics for P2P systems by introducing aggregate functions and priorities in order to define preferences among maximal weak models.
It also presents an alternative characterization of the proposed semantics, obtained by rewriting the P2P system into an extended prioritized logic program.

Complexity Results
The problem of deciding whether an atom is true in some preferred weak model is Σ^p_2-complete.
The problem of deciding whether an atom is true in all preferred weak models is Π^p_2-complete.