Peer Data Management, Concluded and Model Management Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 18, 2005.

1 Peer Data Management, Concluded and Model Management Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 18, 2005

2 Administrivia
- Next readings and summaries:
  - Dong and Halevy on Personal Info Management
  - 2-paragraph summary of the problems they focus on, key contributions
- From Piazza to pizza … and scheduling

3 Today’s Trivia Question

4 Our Discussion
- The Semantic Web (SW) as originally posed:
  - RDF as “semantic” format
  - Also RDFS schema format
  - Ontologies as the standard way of defining concepts
  - Description logics are the way most ontologies are defined (OWL language)
- Piazza PDMS:
  - Relations and views
  - Query language as mapping language
  - Transitive closure of composition of mappings

5 Peer Data Management: Decentralized Mediation for Ad Hoc Extensibility
[Figure: peers labeled DB Projects, UPenn, UW, Stanford, IIT Mumbai]
Data integration: 1 mediated schema, m mappings to sources
Peer data management system (PDMS):
- n mediated “peer schemas,” as few as (n - 1) mappings between them, evaluated transitively
- m mappings to sources
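The claim that (n - 1) mappings can suffice when they are evaluated transitively can be illustrated with a small graph search. This is only a sketch, not Piazza's reformulation algorithm: the peer names come from the slide, but the chain topology, the assumption that every mapping can be followed in both directions, and the function names are all hypothetical.

```python
from collections import deque

# Assumed topology: a chain of n - 1 = 3 mappings over n = 4 peers.
MAPPINGS = [("UPenn", "UW"), ("UW", "Stanford"), ("Stanford", "IIT Mumbai")]


def build_graph(mappings):
    """Build an adjacency map, treating each mapping as usable in both directions."""
    graph = {}
    for a, b in mappings:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def mapping_paths(mappings, start):
    """Breadth-first search: for each reachable peer, the chain of peers leading to it."""
    graph = build_graph(mappings)
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for neighbor in sorted(graph.get(peer, ())):
            if neighbor not in paths:
                paths[neighbor] = paths[peer] + [neighbor]
                queue.append(neighbor)
    return paths


paths = mapping_paths(MAPPINGS, "UPenn")
print(paths["IIT Mumbai"])  # ['UPenn', 'UW', 'Stanford', 'IIT Mumbai']
```

A query posed at UPenn reaches IIT Mumbai by chaining three mappings, even though no direct mapping between the two peers exists.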

6 Example Rule-Goal Tree Expansion
q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)
[Figure: rule-goal tree rooted at q. Goal nodes: SameProject(a1,a2,p), Author(a1,w), Author(a2,w); ProjMember(a1,p), ProjMember(a2,p); CoAuthor(a1,a2), CoAuthor(a2,a1); S1(a1,p,_), S1(a2,p,_); S2(a1,a2), S2(a2,a1). Rule nodes: q, r0, r1, r3, r2.]
Q’(a1,a2) :- S1(a1,p,_), S1(a2,p,_), S2(a1,a2) ∪ S1(a1,p,_), S1(a2,p,_), S2(a2,a1)
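The reformulated query Q’ mentions only the stored relations S1 and S2, so it can be evaluated directly against source data. The sketch below assumes the two rule bodies are combined by union, and it uses made-up tuples; the column meanings for S1 and S2 are also assumptions, since the slide does not define them.

```python
# Hypothetical stored relations.
# S1(person, project, other): assumed project-membership data.
S1 = {
    ("alice", "piazza", "x"),
    ("bob", "piazza", "y"),
    ("carol", "tukwila", "z"),
}
# S2(author1, author2): assumed co-authorship data.
S2 = {("alice", "bob"), ("carol", "alice")}


def evaluate_reformulation(s1, s2):
    """Evaluate Q'(a1,a2) as the union of two conjunctive queries over S1 and S2."""
    answers = set()
    for (a1, p1, _) in s1:
        for (a2, p2, _) in s1:
            if p1 != p2:
                continue  # both S1 atoms must share the project variable p
            # First disjunct uses S2(a1,a2); second disjunct uses S2(a2,a1).
            if (a1, a2) in s2 or (a2, a1) in s2:
                answers.add((a1, a2))
    return answers


result = evaluate_reformulation(S1, S2)
print(sorted(result))  # [('alice', 'bob'), ('bob', 'alice')]
```

The pair (bob, alice) is produced only by the second disjunct, which shows why the expansion keeps both orderings of S2.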

7 RDF vs. XML
- RDF explicitly names relationships:
  (book, title, “ABC”)
  (book, writtenBy, author)
  (author, name, “John Smith”)
- XML does not always:
  1. ABC John Smith
  2. ABC John Smith
  [XML markup not preserved]
[Figure: graph labeled title, name, book, author, writtenBy]

8 RDF vs. XML 2
- RDF is subject-neutral (a graph)
- XML centers around a subject (a tree):
  1. ABC John Smith
  2. John Smith ABC
  [XML markup not preserved]
- This may result in duplication of contained objects
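The duplication point can be made concrete by nesting a small triple set into a book-centered tree. This is an illustrative sketch only: the second book "DEF", the identifiers book1, book2, and author1, and the element names are hypothetical, and the code assumes the graph is acyclic.

```python
import xml.etree.ElementTree as ET

# Triples in the style of the slide, extended with a hypothetical second book.
TRIPLES = [
    ("book1", "title", "ABC"),
    ("book1", "writtenBy", "author1"),
    ("book2", "title", "DEF"),
    ("book2", "writtenBy", "author1"),
    ("author1", "name", "John Smith"),
]


def subtree(node, triples):
    """Nest everything reachable from node beneath it (assumes no cycles)."""
    elem = ET.Element("node", {"id": node})
    for s, p, o in triples:
        if s != node:
            continue
        edge = ET.SubElement(elem, p)
        if any(s2 == o for s2, _, _ in triples):
            edge.append(subtree(o, triples))  # object is itself a subject: copy it in
        else:
            edge.text = o  # object is a literal value
    return elem


def book_centered(triples):
    """Build a tree whose top-level children are the nodes that have a title."""
    root = ET.Element("books")
    for book in sorted({s for s, p, _ in triples if p == "title"}):
        root.append(subtree(book, triples))
    return root


xml_text = ET.tostring(book_centered(TRIPLES), encoding="unicode")
name_copies = xml_text.count("John Smith")
print(name_copies)  # 2: the author is stored once as a triple, twice in the tree
```

Choosing the author as the root instead would avoid this particular copy, but would duplicate any book shared by several authors.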

9 An XML Version of the Semantic Web
Data model: XML + Schema
- Vast volumes of data already in XML (or exported as XML)
- CAVEAT: not all relationships are labeled in XML (“XML has no semantics.”)
Concepts: Views ≈ classes; schemas ≈ ontologies
- Views define membership via queries; can reason about containment
- CAVEAT: less expressive than OWL classes
Schema mappings: target schema as query over source
Sophisticated reasoning about mappings is possible by extending existing data integration techniques
- Can use mappings in “forward” and “reverse” directions
- Allows for “chaining” of mappings to answer queries

10 Piazza with XML (WWW03)
Goals:
- Build on XQuery and XML (extended with RDF-style identity, following lead of [Patel-Schneider & Simeon 02])
- Remain computationally inexpensive
- Capture the common mapping types
Directional mapping language based on templates:
  {: $var IN document(“doc”)/path WHERE condition :} $var
- Translates between parts of data instances
- Restricted subset of XQuery that’s decidable to reason about
- Supports special annotations and object fusion
Can map XML-XML, XML-RDF, RDF-XML (at data level)

11 Mapping Example between XML Schemas
Target:
  pubs
    book*
      title
      author*
        name
Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type
[Figure: graph labeled pub-type, name, publication, author, writtenBy, title]

12 Example Piazza Mapping
{: $a IN document(“…”)/authors/author,
   $an IN $a/full-name,
   $t IN $a/publication/title,
   $typ IN $a/publication/pub-type
   WHERE $typ = “book”
   PROPERTY $t >= ‘A’ AND $t {$t} {$an}
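To show what such a mapping produces, the sketch below performs the same kind of translation in plain Python rather than in Piazza's template language. It binds the same paths (authors/author, full-name, publication/title, publication/pub-type), keeps only publications whose pub-type is "book", and emits pubs/book/title and author/name. The sample document, the choice to group authors by book title (a crude stand-in for object fusion), and the function name are assumptions; the PROPERTY clause is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical source instance following the source schema on the previous slide.
SOURCE = """
<authors>
  <author>
    <full-name>John Smith</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
    <publication><title>XYZ</title><pub-type>article</pub-type></publication>
  </author>
  <author>
    <full-name>Jane Doe</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
  </author>
</authors>
"""


def map_to_pubs(source_xml):
    """Translate an authors document into a pubs document, keeping books only."""
    authors = ET.fromstring(source_xml.strip())
    books = {}  # title -> author names, in order of first appearance
    for author in authors.findall("author"):
        name = author.findtext("full-name")
        for pub in author.findall("publication"):
            if pub.findtext("pub-type") == "book":  # WHERE $typ = "book"
                books.setdefault(pub.findtext("title"), []).append(name)

    pubs = ET.Element("pubs")
    for title, names in books.items():
        book = ET.SubElement(pubs, "book")
        ET.SubElement(book, "title").text = title
        for name in names:
            author_elem = ET.SubElement(book, "author")
            ET.SubElement(author_elem, "name").text = name
    return pubs


pubs = map_to_pubs(SOURCE)
print(ET.tostring(pubs, encoding="unicode"))
```

The article "XYZ" is filtered out, and the two source entries for "ABC" end up as one book element with two author children.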

13 Challenges
- Query reformulation for XML is significantly harder
  - Hierarchy, 1:n schema constraints, ability to map from values to tags, …
  - Redundant paths
- Can only handle roughly the XML equivalent of conjunctive queries
- See the WWW03 paper (plus later work by Yu and Popa, Deutsch et al., many others) for details

14 What about Values?
- Thus far, we’ve focused on schema mappings
- Almost as important in the real world: mappings of values to values
  - Proteins to binding sites
  - SSNs to customer IDs
  - etc.
- The Hyperion system (KAM 03) focuses on computing transitive relationships between mappings
  - In many cases, we only have partial transitive mappings
  - Key idea: divide all of the mappings into partitions, each of which can compute transitive closures separately
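A value-to-value mapping can be pictured as a table of pairs, and a transitive relationship as the join of two such tables. The following is a minimal sketch of that idea, not Hyperion's algorithm: the SSN-to-customer-ID table echoes the slide's example, while the account table, all values, and the function name are made up.

```python
# Hypothetical mapping tables, each a set of (source value, target value) pairs.
SSN_TO_CUSTOMER = {("111-11-1111", "C1"), ("222-22-2222", "C2")}
CUSTOMER_TO_ACCOUNT = {("C1", "A9"), ("C3", "A7")}


def compose(first, second):
    """Join two mapping tables on the shared middle value."""
    return {(x, z) for (x, y) in first for (y2, z) in second if y == y2}


composed = compose(SSN_TO_CUSTOMER, CUSTOMER_TO_ACCOUNT)
print(composed)  # {('111-11-1111', 'A9')}

# Source values with no transitive image: the composed mapping is only partial.
unmapped = {x for (x, _) in SSN_TO_CUSTOMER} - {x for (x, _) in composed}
print(unmapped)  # {'222-22-2222'}
```

Customer C2 has no entry in the second table, so its SSN drops out of the composition, which is one way a transitive mapping ends up partial.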

15 Assessment: The Semantic Web
- The KB world focuses on expressively capturing concepts
- The DB world focuses on integrating and restructuring data (but views are less expressive in certain ways)
- Does either of these seem likely to change the world?
- What barriers need to be removed?

16 From Managing the Web as a Database to Managing Databases of Databases
- Many common operations in:
  - Data integration
  - Data interchange
  - Schema design
  - Semantic Web
  - Schema maintenance/evolution
- For instance:
  - Creating a mediated schema
  - Defining mappings between schemas
  - Seeing what’s different between schemas
- The vision: let’s build a system to manage metadata, not data!

17 Metadata Management
- The challenges:
  - There are lots of metadata representations
    - Different data models; different definition types (e.g., Java classes, XML Schemas, SQL DDL, …)
  - Many of the problems are unsolvable in the abstract
    - e.g., schema matching
    - But maybe we can customize tools for each task
    - And maybe we can get user input to help
- We want to create a clean, composable model of operators
  - Should be “algebraic” in some sense, with nice properties
  - Operators need to be generic but extensible

18 Data vs. Metadata vs. …
Data
- We know what this is
Metadata (models)
- Schemas, types, classes, etc.
Metamodels
- Things like the relational model, O-R model, …
- Bernstein focuses on managing models, with customization for each metamodel (and perhaps special domains)

19 Models
- A model is a set of objects with identity
- Objects have at least extended ER-style traits:
  - attributes/properties
  - is-a, has-a relationships
  - loose associations
- All of these are assumed to have types

20 Mappings
A mapping describes a correspondence between parts of two models; it may be annotated with information about computing the transformation
[Figure: model Emp (Emp#, Name, Address) linked through mapping Map ee, with correspondences labeled 1 (=) and 2 (≈), to model Employee (EmployeeID, FirstName, LastName, Phone)]

21 The Basic Algebraic Operators
Match
- Basically, schema matching: takes two models and returns a mapping between them
- Elementary vs. complex match; reliance on morphisms
Compose
- Takes two mappings and composes them
Diff
- Takes a model A, a mapping A → B, and returns the part of A that’s not mapped
ModelGen
- Takes model A, creates new model B plus mapping A → B
Merge
- Takes models A, B, mapping between them, returns the union C, plus mappings A → C, B → C
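Three of these operators can be sketched over a deliberately simplified representation in which a model is a set of element names and a mapping is a set of (source, target) pairs. Everything below is an assumption made for illustration: the specific Emp/Employee correspondences, the StaffNo element, and the treatment of approximate (≈) correspondences as plain pairs. Match and ModelGen are omitted because they depend on heuristics and on the metamodel.

```python
# Models as sets of element names (taken from the Emp/Employee mapping slide).
EMP = {"Emp#", "Name", "Address"}
EMPLOYEE = {"EmployeeID", "FirstName", "LastName", "Phone"}

# Hypothetical correspondences; the slides do not spell these out.
MAP_EE = {("Emp#", "EmployeeID"), ("Name", "FirstName"), ("Name", "LastName")}
# Hypothetical mapping into a third model, used only to demonstrate Compose.
MAP_EMPLOYEE_STAFF = {("EmployeeID", "StaffNo")}


def compose(map_ab, map_bc):
    """Compose: chain A -> B and B -> C correspondences into A -> C."""
    return {(a, c) for (a, b) in map_ab for (b2, c) in map_bc if b == b2}


def diff(model_a, map_ab):
    """Diff: the elements of A that the mapping does not mention."""
    return model_a - {a for (a, _) in map_ab}


def merge(model_a, model_b, map_ab):
    """Merge: union C of both models plus mappings A -> C and B -> C.

    Each mapped B element is represented in C by its A counterpart, which
    assumes every B element has at most one counterpart in A.
    """
    b_to_a = {b: a for (a, b) in map_ab}
    model_c = set(model_a) | {b for b in model_b if b not in b_to_a}
    map_ac = {(a, a) for a in model_a}
    map_bc = {(b, b_to_a.get(b, b)) for b in model_b}
    return model_c, map_ac, map_bc


print(compose(MAP_EE, MAP_EMPLOYEE_STAFF))  # {('Emp#', 'StaffNo')}
print(diff(EMP, MAP_EE))                    # {'Address'}
merged, emp_to_merged, employee_to_merged = merge(EMP, EMPLOYEE, MAP_EE)
print(sorted(merged))                       # ['Address', 'Emp#', 'Name', 'Phone']
```

Because each operator takes and returns only models and mappings, the results can be fed into one another, which is the composability the algebra is after.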

22 Model Management in Action

23 Schematic of Changes
[Figure: schematic annotated with “the new parts in S2 that need to be propagated to d2”, “Dest. w/o deleted items from s1”, and “the XML version of s2”]

24 Actual Operations

25 What’s Hard?
- Match
  - We saw that LSD is far from perfect, and it’s the best out there…
- Merge
  - Can we make (A merge B) merge C = A merge (B merge C)?
  - (Buneman, Davidson, Kosky 92)
- With Diff, how do we ensure a well-formed model as the result?
  - They return a copy of the model, plus mappings showing what is actually part of the diff
- Composition: it isn’t always closed within the mapping language!

26 More Challenges
- What about:
  - Semantics of the meta-model: how do we handle, e.g., constraints?
  - What to do about approximate correspondences?
- Can we actually make these things generic but expressive enough to be useful?
- Do you think this vision is feasible?

