1/27 Replication and Query Processing in the APPA Data Management System Reza AKBARINIA Vidal MARTINS Esther PACITTI Patrick VALDURIEZ.

2/27 Motivation. Advanced applications must deal with semantically rich data and use a high-level SQL-like query language. Example applications: epidemiological studies, astronomical data sharing. Little work has addressed managing data replication in the presence of updates. Gnutella and Kazaa: static files (no updates). Freenet: update propagation downward to closely connected peers. ActiveXML: on demand (web services). P-Grid: rumor spreading (probabilistic guarantees for consistency).

3/27 Motivation. Replication in distributed systems: synchronous replication (ROWA), asynchronous replication, preventive replication, optimistic replication, rumor spreading. We propose a new P2P system to address (1) data replication in the context of advanced applications and (2) query processing in the presence of advanced replication capabilities.

4/27 Outline Motivation APPA Architecture Data Replication Query Processing Validation Conclusion

5/27 APPA Architecture [Figure: APPA layered architecture, with layers Advanced Services (Replication, Caching, Query Processing, ...), Basic Services (Consensus, P2P Data Management, ...), and P2P Network (Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication), over the Internet.]

6/27 Outline Motivation APPA Architecture Data Replication Query Processing Validation Conclusion

7/27 Data Replication: Replication Model. Assumptions: frequent and unpredictable network changes; small world. The model is based on a lazy multi-master scheme, with log-based reconciliation to resolve replica divergence. Schema management: r-lsd is the local schema description of relation r; r-csd is the common schema description of relation r; each peer defines mapping functions between r-lsd and r-csd. Data storage: each peer stores tuples using both the r-lsd and r-csd schemas, and updates in one schema are mapped to the other. Multi-master groups.
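
The mapping between r-lsd and r-csd can be illustrated with a minimal Python sketch. The attribute names, the `to_csd`/`to_lsd`/`apply_update` helpers, and the dictionary-based tuples are assumptions made for illustration; the slides do not say how APPA represents its mapping functions.

```python
# Hypothetical attribute mapping for one relation r on one peer:
# local schema description (r-lsd) name -> common schema description (r-csd) name.
LSD_TO_CSD = {"patient": "patient_id", "dx": "diagnosis", "when": "visit_date"}
CSD_TO_LSD = {csd: lsd for lsd, csd in LSD_TO_CSD.items()}


def to_csd(local_tuple):
    """Translate a tuple stored under r-lsd into its r-csd form."""
    return {LSD_TO_CSD[attr]: value for attr, value in local_tuple.items()}


def to_lsd(common_tuple):
    """Translate a tuple stored under r-csd back into its r-lsd form."""
    return {CSD_TO_LSD[attr]: value for attr, value in common_tuple.items()}


def apply_update(lsd_store, csd_store, key, local_tuple):
    """Apply an update expressed in r-lsd and mirror it into the r-csd copy."""
    lsd_store[key] = dict(local_tuple)
    csd_store[key] = to_csd(local_tuple)
```

In this sketch the two stores stay in step because every local update is immediately translated; an update arriving in r-csd form would go through `to_lsd` in the same way.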

8/27 Data Replication: Reconciliation Properties. Eventual consistency: when all clients stop submitting update actions, all replicas eventually reach the same values. Mergeability: any arbitrary collection of log operations can be scheduled while respecting constraints. Eventual decision: a decision is taken for each submitted action. Eventual propagation: actions and constraints known at peer p at time t are eventually known by an arbitrary peer of the group. Safe decisions: peers may not make conflicting decisions.

9/27 Data Replication: Reconciliation Solutions. IceCube (Microsoft Research): centralized conflict detection and resolution; resolution based on application semantics; non-deterministic resolution. APPA: distributed conflict detection and resolution; resolution based on application semantics; deterministic resolution, which enables parallelism; considers dynamic connections and disconnections.

10/27 Data Replication: APPA Distributed Reconciliation. Foundation: use a common action log (P2P data). All tentative actions are stored in the action log. Actions in the log are grouped by time interval (log unit). Resolution is deterministic and made on demand; it covers one log unit and produces a schedule. The schedules are available (as P2P data) to all peers. Parallelism and distribution give scalability.
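
The idea of grouping actions into log units and reconciling a unit deterministically can be sketched as follows. This is only an illustration under stated assumptions: the 60-second interval, the `(timestamp, peer_id, action_id)` tuple layout, and the use of plain sorting as the "resolution" are all hypothetical. APPA's actual resolution relies on application semantics and constraints, which this sketch does not model.

```python
from collections import defaultdict

LOG_UNIT_SECONDS = 60  # assumed length of one log-unit time interval


def log_unit_of(timestamp):
    """Map an action timestamp (in seconds) to the index of its log unit."""
    return int(timestamp // LOG_UNIT_SECONDS)


def group_into_log_units(actions):
    """Group (timestamp, peer_id, action_id) tuples by log unit."""
    units = defaultdict(list)
    for action in actions:
        units[log_unit_of(action[0])].append(action)
    return dict(units)


def reconcile(log_unit_actions):
    """Stand-in for deterministic resolution: impose one total order.

    Sorting the tuples gives the same schedule whatever order the
    actions were received in, so every peer computes the same result.
    """
    return sorted(log_unit_actions)
```

Because the result depends only on the contents of the log unit, two peers that reconcile the same unit independently obtain the same schedule, which is what lets one peer reuse a schedule produced by another.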

11/27 Data Replication Distributed Reconciliation

12/27 Data Replication: Distributed Reconciliation. Log units ensure a unique view over unordered actions. The log unit life cycle must be managed. A decision factor eliminates non-determinism. Several peers can reconcile the same log unit concurrently. A peer can reuse the reconciliation made by another peer. A peer can finish a reconciliation started by another peer. The reconciliation properties are ensured. Multi-master replication in a P2P environment is achieved.

13/27 Data Replication Service Architecture

14/27 Outline Motivation APPA Architecture Data Replication Query Processing Validation Conclusion

15/27 Query Processing: Problem definition. Consider that each peer has a local schema to describe its data, peers agree on a Common Schema Description (CSD), and each peer maps its local schema to the CSD. Given a user query on a peer schema, the problem is (1) to find the minimum set of peers that should answer the query and (2) to execute the query at these peers and return a list of (ranked) answers to the user. Assumption: a query answer includes data from several multi-master groups (all those that store relevant data).

16/27 Query Processing Proposed Solution

17/27 Query Processing: Proposed Solution. Query reformulation: given the mapping p:r(A,B,D) → csd:r1(A,B,C), csd:r2(C,D,E), the query "select A,D from r where B=b" is reformulated as "select A,D from r1,r2 where B=b and r1.C=r2.C". Query matching: let P be the set of peers in the P2P system, and let ps(p,r) hold when the peer schema of peer p involves relation r. Problem: find P’ ⊆ P where each p in P’ has relevant data. Result: P’ = { p | p ∈ P ∧ ∃ r ∈ R : ps(p,r) }.
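
The query matching step can be written as a set comprehension. The sketch below assumes that R is the set of relations referenced by the query (the slide does not define R) and that each peer schema is available as a set of relation names; `match_peers` and the peer names are hypothetical.

```python
def match_peers(peer_schemas, query_relations):
    """Return P' = {p in P | there is an r in R with ps(p, r)}.

    peer_schemas: dict mapping each peer p in P to the relations in its schema.
    query_relations: R, assumed here to be the relations the query refers to.
    """
    wanted = set(query_relations)
    return {p for p, rels in peer_schemas.items() if wanted & set(rels)}


# Example: only peers whose schema involves r, s, or v are relevant.
peers = {"p1": {"r", "s"}, "p2": {"t", "u"}, "p3": {"v"}}
relevant = match_peers(peers, {"r", "s", "v"})
```

Here `relevant` contains `p1` and `p3`; `p2` is dropped because its schema involves none of the query's relations.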

18/27 Query Processing: Proposed Solution. Query matching. [Figure: for Q = join(r, s, v), the relevant peers P’ are selected from the set P of peers, which hold replicas such as r1, s1, (r1, s1), r2, s2, (r2, s2), (r3, s3), v, (t1, u1), and (t2, u2). Replica subscripts: 1 = European data, 2 = American data, 3 = African data.]

19/27 Query Processing: Proposed Solution. Query optimization: consider P’, a set of relevant peers. Goal: obtain P’’ ⊆ P’ such that (1) no two peers in P’’ hold replicas of the same relevant data; (2) the relevant data of the peers in P’’ equal those of P’; (3) the cost of query execution by the peers in P’’ is minimum. Cost function: a function of communication, computing power, etc. Phases of optimization: determining the relevant replicas for Q's relations and their peers; determining the best peer per replica.

20/27 Query Processing: Proposed Solution. Query optimization. [Figure: for Q = join(r, s, v), the set P’’ is selected from the relevant peers P’; P’’ contains peers holding (r1, s1), r2, s2, (r3, s3), and v. Replica subscripts: 1 = European data, 2 = American data, 3 = African data.]

21/27 Query Processing: Proposed Solution. Algorithms, cost parameters: t_com(r,p) is the time to send the results of Q concerning replica r from peer p to the query originator; t_resp(r,p) is the time p needs to execute the part of Q concerning replica r and start sending the results to the query originator; t_djoin(S) is the time to join the set of replicas S in a distributed way. Example with relations r and s and peers p1, p2, p3: t_com(r,p1) + t_resp(r,p1) = 4; t_com(r,p2) + t_resp(r,p2) = 6; t_com(s,p2) + t_resp(s,p2) = 7; t_com(s,p3) + t_resp(s,p3) = 5; t_djoin({r,s}) = 6. Using p1 for r and p3 for s: total cost = 4 + 5 + 6 = 15. Using p2 for both r and s: total cost = 6 + 7 = 13.
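
The two totals in the example can be reproduced with a short sketch. The `total_cost` function is hypothetical, and it rests on one assumption read off the slide's numbers: the distributed join time t_djoin is only added when r and s are served by different peers.

```python
# Edge labels from the slide: t_com(rel, peer) + t_resp(rel, peer).
ACCESS_COST = {("r", "p1"): 4, ("r", "p2"): 6, ("s", "p2"): 7, ("s", "p3"): 5}
T_DJOIN = 6  # t_djoin({r, s}) from the slide


def total_cost(assignment):
    """assignment maps each relation to the peer chosen to serve it."""
    cost = sum(ACCESS_COST[(rel, peer)] for rel, peer in assignment.items())
    # Assumption: the distributed join is paid only when the relations
    # come from different peers; one peer holding both joins locally.
    if len(set(assignment.values())) > 1:
        cost += T_DJOIN
    return cost


split = total_cost({"r": "p1", "s": "p3"})   # 4 + 5 + 6
single = total_cost({"r": "p2", "s": "p2"})  # 6 + 7
```

The cheapest access per relation (p1 for r, p3 for s) is therefore not the cheapest plan overall, which is why peer selection is treated as an optimization problem instead of a per-replica minimum.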

22/27 Query Processing: Proposed Solution. A non-linear programming model. Minimize: [formula not preserved in the transcript]. Complexity: [not preserved in the transcript].

23/27 Query Processing: Proposed Solution. Algorithms. Branch and bound: optimal selection of peers; complexity (worst case): O( ). A heuristic solution: while there is an edge in the graph, (1) select the edge with the minimum label; (2) set the peer p as the selected peer for the replica r; (3) update the edge labels of the other peers that hold the replica r; (4) remove the replica r and its edges from the graph. Complexity: O((m × a)^2).
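
A simplified version of the heuristic loop can be sketched in Python. This is not the authors' algorithm as implemented: the graph is assumed to be a dictionary of (replica, peer) edges with numeric labels, and because the slide does not say how labels are updated, that step is left as an optional, hypothetical `update_labels` hook.

```python
def greedy_select(edges, update_labels=None):
    """Greedy peer selection over a replica-peer graph.

    edges: dict mapping (replica, peer) -> label (assumed to be a cost).
    update_labels: optional hypothetical hook called as
        update_labels(remaining_edges, replica, peer) after each choice;
        the slide does not specify the update rule.
    Returns a dict mapping each replica to its selected peer.
    """
    remaining = dict(edges)
    selected = {}
    while remaining:
        # Select the edge with the minimum label (ties broken by edge key).
        (replica, peer), _ = min(remaining.items(), key=lambda kv: (kv[1], kv[0]))
        selected[replica] = peer
        if update_labels is not None:
            update_labels(remaining, replica, peer)
        # Remove the replica and all of its edges from the graph.
        remaining = {k: v for k, v in remaining.items() if k[0] != replica}
    return selected


example = greedy_select({("r", "p1"): 4, ("r", "p2"): 6, ("s", "p2"): 7, ("s", "p3"): 5})
```

With these four edge labels and no label update, the sketch picks p1 for r and then p3 for s. A label update rule that accounts for join costs could change that choice, which is presumably the purpose of step (3).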

24/27 Outline Motivation APPA Architecture Data Replication Query processing Validation Conclusion

25/27 Implementation [Figure: the APPA architecture alongside the JXTA architecture. JXTA: JXTA Applications (JXTA Community Applications, Sun JXTA Applications); JXTA Shell and Peer Commands; JXTA Services (JXTA Community Services; Sun JXTA Services: Indexing, Discover, Search, Membership); JXTA Core (Peer Groups, Peer Pipes, Peer Monitoring, Peer Advertisements, Peer IDs, Security); Any Connected Device. APPA: Advanced Services (Replication, Caching, Query Processing, ...); Basic Services (Consensus, P2P Data Management); P2P Network (Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication); GISP.]

26/27 Simulation [Figure: the same APPA and JXTA stacks as on the Implementation slide, plus an APPA Simulation stack: Internet Simulation (GT/ITM); P2P Simulation (Chord Simulator); P2P Network (Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication).]

27/27 Conclusion. Summary: advanced cooperative applications (multi-master replication); a new P2P network-independent data management system; a distributed optimistic multi-master replication solution; eventual consistency guarantee; a query processing solution based on replication; validation. Future work: consider secondary copies; consider replica quality in query optimization; data caching; implementation over other P2P architectures (e.g., flooding).
