The National Centres of Competence in Research are managed by the Swiss National Science Foundation on behalf of the Federal Authorities 1 ICDE 06 04.05.06.

Presentation transcript:

1 Probabilistic Message Passing in Peer Data Management Systems
Philippe Cudré-Mauroux, EPFL
Joint work with: Karl Aberer (EPFL), Andras Feher (T.U. Darmstadt)
ICDE 06, 04.05.06
The National Centres of Competence in Research are managed by the Swiss National Science Foundation on behalf of the Federal Authorities

2 Overview of the talk
Data Integration in Large-Scale Information Systems
  – Peer Data Management Systems (PDMS)
Query Routing in PDMS
  – Precision / Recall tradeoff
Probabilistic Message Passing
  – Deriving quality measures for the mappings
Conclusions

3 Classical Data Integration: LAV/GAV
Traditional database techniques (e.g., LAV/GAV) rely on centralized schemas to integrate data sources
Not applicable to large-scale, decentralized contexts
  – Scale (upper ontologies?)
  – Churn
  – Autonomy
How can we foster semantic interoperability in decentralized settings?
[Figure: attributes Date, myDate, yourDate, with mappings m(myDate) = Date and m(yourDate) = Date]

4 Peer Data Management Systems (1)
Photoshop (own schema): 178A8CD8865 Robinson Tunbridge Wells Royal Council …
WinFS (known schema): 178A8CD8866 Henry Peach Robinson Photographer Tunbridge Council …
Q1 = $p/GUID FOR $p IN /Photoshop_Image WHERE $p/Creator LIKE "%Robi%"
T12 = $fs/GUID $fs/Author/DisplayName FOR $fs IN /WinFSImage
Q2 = $p/GUID FOR $p IN T12 WHERE $p/Creator LIKE "%Robi%"
→ Extending data integration techniques to decentralized settings

5 Peer Data Management Systems (2)
Pairwise mappings
Local mappings overcome global heterogeneity
  – Iterative query reformulation
[Figure: peers holding article and weather data with differently formatted dates (2001-12-19T18:49:03Z, 2001-12-19T20:09:28Z, 05/08/2004, Jan 1, 2005; "date?"), connected by the mappings:]
  es:cDate → xap:CreateDate
  es:cDate → myRDF:Date
  myRDF:Date → xap:ModifyDate
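Iterative query reformulation can be sketched as a breadth-first traversal of the pairwise mappings. The snippet below is a minimal illustration, not the mechanism of any particular PDMS: the `MAPPINGS` table encodes the three attribute mappings listed on this slide, and the function name `reformulate` is hypothetical.

```python
from collections import deque

# Attribute-level mappings from the slide, stored as source -> list of targets.
MAPPINGS = {
    "es:cDate": ["xap:CreateDate", "myRDF:Date"],
    "myRDF:Date": ["xap:ModifyDate"],
}


def reformulate(attribute, mappings):
    """Return every attribute reachable from `attribute`, with the mapping path used."""
    paths = {attribute: [attribute]}
    queue = deque([attribute])
    while queue:
        current = queue.popleft()
        for target in mappings.get(current, []):
            if target not in paths:  # keep the first (shortest) path only
                paths[target] = paths[current] + [target]
                queue.append(target)
    return paths


for target, path in reformulate("es:cDate", MAPPINGS).items():
    print(target, "via", " -> ".join(path))
```

In this toy graph, `es:cDate` reaches `xap:CreateDate` directly but `xap:ModifyDate` through `myRDF:Date`, so two paths from the same attribute end on different attributes of the same schema.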

6 PDMS Examples
Some academic systems
  – Piazza
  – Hyperion
  – BestPeer
  – GridVine
  – …
Out there on the Internet
  – The Sequence Retrieval System (SRS)
      388 schemata (May 05, EBI repository)
      518 mappings (ID → ID)
      Power-law distribution of node degrees
      Clustering coefficient = 0.32
      Diameter = 9
  – Semantic Overlay Networks
      P2P + semi-structured data
  – The Semantic Web

7 Data in large-scale PDMS
Large-Scale PDMS
  – Number of sources > 100
  – Unreliable data
      Autonomy
  – Semi-structured data
      E.g., XML/RDF
  – No integrity constraints
  – No transactions
  – Simple SP queries
      E.g., triple patterns, ranking
  – Schemata created by end users
  – Network churn
VS
Distributed Databases
  – Number of sources < 100
  – Consistent data
      Coordination
  – Structured data
      E.g., relational data model
  – Integrity constraints
  – Transactions
  – Powerful queries
      E.g., SQL, aggregation
  – Schemas created by administrators
  – Relatively fixed topology

8 Problem: Precision/Recall Tradeoff (1)
Semantic query routing
  – To whom shall I forward a query posed against my local schema?
Some (most) mappings will be (partially) faulty
  – Low expressive power of mapping languages
      samePropertyAs / sameClassAs / subclassOf
      … or even worse (Microformats)
  – Automatic schema alignment techniques
  – Different views on conceptualizations
Local query resolution
  – Low recall
Flooding (PDMS so far)
  – Low precision

9 Problem: Precision/Recall Tradeoff (2)
Standard deductive integration is not sufficient
  – Uncertainty on mappings and conceptualizations
Probabilistic Message Passing
  – Deriving quality measures for the mappings
      Reduces uncertainty
      Used to route queries / optimize mappings
  – Based on a notion of agreement on conceptualizations
      Decentralized decision making, Emergent Semantics
From Schema Matching to Probabilistic Message Passing
  1. Automatic Schema Matching
      INPUT: 2 schemas + data
      OUTPUT: 1 mapping
  2. Probabilistic Message Passing
      INPUT: n schemas and m mappings
      OUTPUT: quality measures for the mappings

10 Probabilistic Message Passing
Link-based analysis of the PDMS
  – Automatically deriving quality measures for the mappings
Transitive closures of mapping operations
  – Mapping Cycles
  – Parallel Paths
[Figure: a mapping cycle with mappings m_0, m_3, m_4 and feedback f_0. The query q: art/Creator? is compared with the query obtained after traversing the cycle: q VS m_3(m_4(m_0(q))), i.e. art/Creator? VS art/creatDate?]
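The comparison of q with m_3(m_4(m_0(q))) can be sketched as follows. This is an illustrative sketch only: each mapping is assumed to be a plain attribute-to-attribute dictionary, and the intermediate attribute names (`img/Author`, `pic/Author`, `pic/creatDate`) are made up for the example. Only `art/Creator` and `art/creatDate` come from the slide.

```python
def cycle_feedback(attribute, cycle):
    """Push `attribute` through each mapping of the cycle, in order.

    Returns "+" if the attribute comes back unchanged, "-" if it comes back
    as a different attribute, and None if some mapping has no image for it.
    """
    current = attribute
    for mapping in cycle:
        current = mapping.get(current)
        if current is None:
            return None
    return "+" if current == attribute else "-"


# Hypothetical attribute-level mappings, applied in the order m0, m4, m3.
m0 = {"art/Creator": "img/Author"}
m4_faulty = {"img/Author": "pic/creatDate"}  # assumed erroneous mapping
m4_fixed = {"img/Author": "pic/Author"}
m3 = {"pic/creatDate": "art/creatDate", "pic/Author": "art/Creator"}

print(cycle_feedback("art/Creator", [m0, m4_faulty, m3]))  # "-": negative feedback
print(cycle_feedback("art/Creator", [m0, m4_fixed, m3]))   # "+": positive feedback
```

A negative feedback only says that at least one mapping in the cycle is wrong, not which one. That ambiguity is what the probabilistic treatment in the following slides addresses.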

11 On Cycles / parallel paths
[Figure: mapping graph with mappings m_0, m_1, m_2, m_3, m_4, m_5 and feedback f_0]

12 Computing a Marginal for one cycle
$P(m_0, m_3, m_4, f_0) = P(m_0)\, P(m_3)\, P(m_4)\, P(f_0 \mid m_0, m_3, m_4)$
$P(m_0 \mid f_0) = P(f_0)^{-1} \sum_{m_3, m_4} P(m_0, m_3, m_4, f_0)$
[Figure labels: observed, unknown]
But: feedbacks on different cycles are correlated
  – One wrong mapping will affect several cycles/paths
  – Need to express a global probabilistic model for the mapping graph
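The marginal for a single cycle can be computed by brute-force enumeration. The sketch below assumes a feedback model that the slide does not spell out: positive feedback has probability 1 when all mappings in the cycle are correct, 0 when exactly one is incorrect, and `delta` when two or more are incorrect (errors compensating each other). The values prior 0.7 and delta 0.1 are borrowed from the convergence experiment later in the talk; the function names are hypothetical.

```python
from itertools import product


def p_feedback(positive, states, delta):
    """P(f | mapping states) under the assumed model (True = mapping correct)."""
    wrong = sum(1 for s in states if not s)
    p_pos = 1.0 if wrong == 0 else (0.0 if wrong == 1 else delta)
    return p_pos if positive else 1.0 - p_pos


def posterior_m0(positive, prior=0.7, delta=0.1):
    """P(m0 = correct | f0) for a cycle m0, m3, m4 with identical priors."""
    def p_m(state):
        return prior if state else 1.0 - prior

    joint = {True: 0.0, False: 0.0}
    for m0 in (True, False):
        # Marginalize over the two other mappings of the cycle.
        for m3, m4 in product((True, False), repeat=2):
            joint[m0] += (p_m(m0) * p_m(m3) * p_m(m4)
                          * p_feedback(positive, (m0, m3, m4), delta))
    # Dividing by P(f0) = joint[True] + joint[False] normalizes the result.
    return joint[True] / (joint[True] + joint[False])


print(posterior_m0(positive=True))   # higher than the prior
print(posterior_m0(positive=False))  # lower than the prior
```

Under these assumptions a positive feedback raises the belief in m_0 and a negative one lowers it, but each cycle is treated in isolation, which is the limitation the slide points out.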

13 A Brief Intro to Factor-Graphs
$g(x_1, x_2, x_3, x_4) = f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)$
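The benefit of such a factorization is that sums can be pushed inside the product. The sketch below checks this numerically for the marginal of x1, using binary variables and two arbitrary, made-up factor tables `f_a` and `f_b` (the slide gives only the factorization, not the factor values).

```python
from itertools import product


def f_a(x1, x2):
    # Arbitrary illustrative factor: favors x1 == x2.
    return 0.9 if x1 == x2 else 0.1


def f_b(x2, x3, x4):
    # Arbitrary illustrative factor: favors x2 == x3 XOR x4.
    return 0.8 if x2 == (x3 ^ x4) else 0.2


def marginal_brute_force(x1):
    """Sum g(x1, x2, x3, x4) over x2, x3, x4 directly."""
    return sum(f_a(x1, x2) * f_b(x2, x3, x4)
               for x2, x3, x4 in product((0, 1), repeat=3))


def marginal_sum_product(x1):
    """Same marginal, computed with messages along the factor graph."""
    # Message from f_B to x2: x3 and x4 are summed out locally.
    msg_fb_to_x2 = {
        x2: sum(f_b(x2, x3, x4) for x3, x4 in product((0, 1), repeat=2))
        for x2 in (0, 1)
    }
    # Message from f_A to x1: x2 is summed out, weighted by the incoming message.
    return sum(f_a(x1, x2) * msg_fb_to_x2[x2] for x2 in (0, 1))


for x1 in (0, 1):
    print(x1, marginal_brute_force(x1), marginal_sum_product(x1))
```

Because this factor graph is a tree, the message-based result is exact. On graphs with cycles the same local computations only give an approximation.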

14 Deriving PDMS Factor-Graphs
Abductive reasoning on transitive closures of mappings
[Figure label: a priori information on mapping]

15 PDMS Factor-Graphs
Cyclic graph
  – Junction Tree?
      Clustering / stretching of variables?
      Centralization
      Computational + communication overhead
  – Iterative Sum-Product
      Approximate results
How to perform iterative sum-product by message passing on the mapping graph?
  – Message passing in the factor graph does not correspond to the connectivity of the mapping graph
  – We want to rely on decentralized computations only
Locality VS globality of nodes in the factor graph
  – Mappings: local
  – Feedback factor: common, global knowledge
  – Observed feedback variables: neighborhood

16 Embedded Message-Passing (1)

17 Embedded Message-Passing (2)

18 Message Passing
Decentralized computations
Computationally inexpensive
  – Sums and products
Message-passing schedules
  – Periodic
  – Lazy (piggybacking on query forwarding)
      No message overhead
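A periodic schedule can be simulated in a single process as synchronous rounds of sum-product updates. The sketch below is a centralized simulation under stated assumptions, not the embedded, decentralized scheme of the talk: it assumes the feedback model in which positive feedback has probability 1 with no incorrect mapping in the cycle, 0 with exactly one, and `delta` with two or more. It also assumes priors strictly between 0 and 1, and a made-up graph of two cycles sharing m0. All function names are hypothetical.

```python
from itertools import product


def p_feedback(positive, states, delta):
    """P(f | states) under the assumed model; state 1 = correct, 0 = incorrect."""
    wrong = sum(1 for s in states if s == 0)
    p_pos = 1.0 if wrong == 0 else (0.0 if wrong == 1 else delta)
    return p_pos if positive else 1.0 - p_pos


def normalize(pair):
    total = pair[0] + pair[1]
    return (pair[0] / total, pair[1] / total)


def sum_product(priors, feedbacks, delta=0.1, rounds=20):
    """priors: {mapping: P(correct)}; feedbacks: [(cycle_tuple, positive_bool)]."""
    # Factor-to-variable messages, indexed [0] = incorrect, [1] = correct.
    f2v = {(i, m): (0.5, 0.5)
           for i, (cycle, _) in enumerate(feedbacks) for m in cycle}

    def belief(m, skip=None):
        # Prior times all incoming messages, except the one from factor `skip`.
        b = (1.0 - priors[m], priors[m])
        for (i, n), msg in f2v.items():
            if n == m and i != skip:
                b = (b[0] * msg[0], b[1] * msg[1])
        return normalize(b)

    for _ in range(rounds):
        updated = {}
        for i, (cycle, positive) in enumerate(feedbacks):
            v2f = {m: belief(m, skip=i) for m in cycle}
            for k, m in enumerate(cycle):
                out = [0.0, 0.0]
                for states in product((0, 1), repeat=len(cycle)):
                    weight = p_feedback(positive, states, delta)
                    for n, s in zip(cycle, states):
                        if n != m:
                            weight *= v2f[n][s]
                    out[states[k]] += weight
                updated[(i, m)] = normalize(out)
        f2v = updated  # synchronous update: one "period" of the schedule
    return {m: belief(m)[1] for m in priors}


# Hypothetical example: two cycles sharing m0, one positive, one negative.
priors = {m: 0.7 for m in ("m0", "m1", "m2", "m3", "m4")}
feedbacks = [(("m0", "m1", "m2"), True), (("m0", "m3", "m4"), False)]
posteriors = sum_product(priors, feedbacks)
for m in sorted(posteriors):
    print(m, round(posteriors[m], 3))
```

Each update uses only sums and products of small tables, which is what keeps the per-peer cost low. In this example the factor graph happens to be a tree, so the values are exact; with overlapping cycles that form loops, the same iteration yields approximate results.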

19 Implemented System
Schemas
  – Import from OWL (Web Ontology Language)
Mappings
  – KnowledgeWeb Ontology Alignment API
  – Import from RDF/XML
  – Automated on-the-fly creation
  – Comparison to standard alignments
→ Automatic derivation of quality measures P(m=correct | {F}) for the mappings using iterative message-passing
→ Query routing based on the quality measures
      Precision / recall tradeoff
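One simple way to turn the quality measures into a routing decision is a threshold on P(m=correct | {F}). The slides do not specify the routing rule, so the sketch below is an assumption made for illustration; the peer names, quality values, and the function `route_query` are hypothetical.

```python
def route_query(mapping_quality, threshold):
    """Return the neighbors a query is forwarded to.

    mapping_quality maps each neighboring peer to the estimated probability
    that the mapping towards it is correct.
    """
    return sorted(peer for peer, quality in mapping_quality.items()
                  if quality >= threshold)


# Hypothetical quality measures for three outgoing mappings.
quality = {"peer_a": 0.95, "peer_b": 0.55, "peer_c": 0.10}

print(route_query(quality, 0.0))  # ['peer_a', 'peer_b', 'peer_c']: flooding
print(route_query(quality, 0.5))  # ['peer_a', 'peer_b']
print(route_query(quality, 1.1))  # []: local resolution only
```

Lowering the threshold moves towards flooding (higher recall, lower precision); raising it moves towards local query resolution (higher precision, lower recall).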

20 Some (Preliminary) Results: Convergence
(undirected example graph, prior 0.7, delta 0.1)

21 Fault-tolerance (faulty links)
(undirected example graph, prior 0.8, delta 0.1)

22 Detecting Erroneous Mappings
(random network of 50 schemas and 200 mappings, no prior information)

23 Conclusions
Deriving quality measures for PDMS mappings
  – Automated process
  – Decentralized computations
  – Based on agreements on conceptualizations
      Emergent Semantics
Current work
  – More expressive mappings
      E.g., subsumption
  – Integration in the GridVine semantic overlay network
  – Application to other domains
      Web Services composition?

24 Thank you for your attention
Web page: lsirpeople.epfl.ch/cudre
Questions?
