1/27 Replication and Query Processing in the APPA Data Management System Reza AKBARINIA Vidal MARTINS Esther PACITTI Patrick VALDURIEZ.



2/27 Motivation
Advanced applications
  - They must deal with semantically rich data
  - They use a high-level SQL-like query language
  - Applications: epidemiological studies, astronomical data sharing
Little work on managing data replication in the presence of updates
  - Gnutella and KaZaA: static files (no updates)
  - Freenet: update propagation downward to closely connected peers
  - ActiveXML: on demand (web services)
  - P-Grid: rumor spreading (probabilistic guarantees for consistency)

3/27 Motivation
Replication in distributed systems
  - Synchronous replication (ROWA)
  - Asynchronous replication
  - Preventive replication
  - Optimistic replication
  - Rumor spreading
We propose a new P2P system to address
  - Data replication in the context of advanced applications
  - Query processing in the presence of advanced replication capabilities

4/27 Outline
  - Motivation
  - APPA Architecture
  - Data Replication
  - Query Processing
  - Validation
  - Conclusion

5/27 APPA Architecture
Figure: layered architecture of APPA, running over the Internet.
  - Advanced Services: Replication, Caching, Query Processing, ...
  - Basic Services: Consensus, P2P Data Management, …
  - P2P Network: Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication

6/27 Outline
  - Motivation
  - APPA Architecture
  - Data Replication
  - Query Processing
  - Validation
  - Conclusion

7/27 Data Replication Replication Model
Assumptions
  - Frequent and unpredictable network changes
  - Small world
Based on a lazy multi-master scheme
Log-based reconciliation to resolve replica divergence
Schema management
  - r-lsd: local schema description of relation r
  - r-csd: common schema description of relation r
  - Each peer defines mapping functions between r-lsd and r-csd
Data storage
  - Each peer stores tuples using the r-lsd and r-csd schemas
  - Updates in one schema are mapped to the other
Multi-master groups

8/27 Data Replication Reconciliation Properties
  - Eventual consistency: when all clients stop submitting update actions, all replicas eventually reach the same values
  - Mergeability: it is possible to schedule any arbitrary collection of log operations respecting constraints
  - Eventual decision: a decision is taken for each submitted action
  - Eventual propagation: actions and constraints known at peer "p" at time "t" are eventually known by an arbitrary peer of the group
  - Safe decisions: peers may not make conflicting decisions

9/27 Data Replication Reconciliation Solutions
IceCube (Microsoft Research)
  - Centralized conflict detection and resolution
  - Resolution based on application semantics
  - Non-deterministic resolution
APPA
  - Distributed conflict detection and resolution
  - Resolution based on application semantics
  - Deterministic resolution (enables parallelism)
  - Considers dynamic connections and disconnections

10/27 Data Replication APPA Distributed Reconciliation
Foundation
  - Use a common action log (P2P data)
  - All tentative actions are stored in the action log
  - Actions in the action log are grouped by time interval (log unit)
  - The resolution is deterministic and made on demand; it covers one log unit and produces a schedule
  - The schedules are available (P2P data) to all peers
  - Parallelism and distribution: scalability
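The grouping of tentative actions into log units, and the point that deterministic resolution lets any peer compute the same schedule, can be sketched as follows. This is a minimal illustration under stated assumptions, not APPA's resolution procedure: the function names, the `(timestamp, peer_id, action)` triple, the fixed `unit_length`, and the use of a plain sort as the deterministic rule are all hypothetical. The slides say the real resolution is based on application semantics and constraints.

```python
from collections import defaultdict

def group_into_log_units(actions, unit_length):
    """Group (timestamp, peer_id, action) triples by time interval.

    Hypothetical model: log unit k holds every action whose timestamp
    falls in [k * unit_length, (k + 1) * unit_length).
    """
    units = defaultdict(list)
    for timestamp, peer_id, action in actions:
        units[timestamp // unit_length].append((timestamp, peer_id, action))
    return dict(units)

def reconcile(log_unit):
    """Produce a schedule for one log unit.

    Stand-in for the real resolution: a total order on the triples, so the
    result does not depend on the order in which actions arrived.
    """
    return sorted(log_unit)

actions = [
    (12, "p2", "update x"),
    (3, "p1", "insert y"),
    (12, "p1", "delete z"),
    (7, "p3", "update y"),
]
units = group_into_log_units(actions, unit_length=10)
schedule = reconcile(units[1])
```

Because `reconcile` depends only on the contents of the log unit, two peers that read the same unit in different arrival orders obtain the same schedule, which is what allows one peer to reuse or finish a reconciliation started by another.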

11/27 Data Replication Distributed Reconciliation

12/27 Data Replication Distributed Reconciliation
  - Log units ensure a single view over unordered actions
  - The log unit life cycle must be managed
  - The decision factor eliminates non-determinism
  - Several peers can reconcile the same log unit concurrently
  - A peer can reuse the reconciliation made by another one
  - A peer can finish the reconciliation started by another one
  - Reconciliation properties are ensured
  - Multi-master replication is achieved in a P2P environment

13/27 Data Replication Service Architecture

14/27 Outline
  - Motivation
  - APPA Architecture
  - Data Replication
  - Query Processing
  - Validation
  - Conclusion

15/27 Query Processing Problem definition
Consider that
  - Each peer has a local schema to describe its data
  - Peers agree on a Common Schema Description (CSD)
  - Each peer maps its local schema to the CSD
Given a user query on a peer schema, the problem is
  - To find the minimum set of peers that should answer the query
  - To execute the query at these peers and return a list of (ranked) answers to the user
Assumption
  - A query answer includes data from several multi-master groups (all those that store relevant data)

16/27 Query Processing Proposed Solution

17/27 Query Processing Proposed Solution
Query reformulation
  - Mapping: p:r(A,B,D) corresponds to csd:r1(A,B,C), csd:r2(C,D,E)
  - Query on the peer schema: select A,D from r where B=b
  - Query reformulated on the CSD: select A,D from r1,r2 where B=b and r1.C=r2.C
Query matching
  - P: set of peers in the P2P system
  - ps(p,r): the peer schema of peer "p" involves relation "r"
  - Problem: find P' ⊆ P where each p in P' has relevant data
  - Result: P' = { p | p ∈ P ∧ ∃ r ∈ R : ps(p,r) }
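The query matching step can be sketched in a few lines. This is a minimal sketch, assuming that a peer is relevant as soon as its schema involves at least one relation of the query; the function name `match_peers`, the dictionary representation of peer schemas, and the peer names are illustrative and not part of APPA.

```python
def match_peers(peer_schemas, query_relations):
    """Return the peers whose schema involves at least one queried relation.

    peer_schemas: dict mapping a peer name to the set of relation names
                  its schema involves (a hypothetical encoding of ps(p, r)).
    query_relations: set of relation names referenced by the query.
    """
    return {
        peer
        for peer, relations in peer_schemas.items()
        if relations & query_relations  # non-empty intersection
    }

# Hypothetical peers; the query joins relations r, s and v.
peer_schemas = {
    "pa": {"r"},
    "pb": {"r", "s"},
    "pc": {"t", "u"},
    "pd": {"v"},
}
relevant = match_peers(peer_schemas, {"r", "s", "v"})
```

Here `relevant` contains `pa`, `pb` and `pd`; the peer that holds only `t` and `u` is left out because none of its relations appear in the query.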

18/27 Query Processing Proposed Solution
Figure: query matching for Q = join(r, s, v). From the set of peers P, which hold replicas such as r1, s1, r2, s2, r3, s3, t1, u1, t2, u2 and v, the relevant peers are selected as P'. Legend: 1 = European data, 2 = American data, 3 = African data.

19/27 Query Processing Proposed Solution
Query optimization
  - Consider P', a set of relevant peers
  - Goal: obtain P'' ⊆ P' such that
      - For any two peers in P'', their relevant data are not replicated
      - The relevant data of the peers in P'' are equal to those in P'
      - The cost of query execution by the peers in P'' is minimum
Cost function
  - A function of communication, computing power, etc.
Phases of optimization
  - Determining the relevant replicas for Q's relations and their peers
  - Determining the best peer per replica

20/27 Query Processing Proposed Solution
Figure: query optimization for Q = join(r, s, v). From the relevant peers P', a subset P'' is selected that holds the replicas (r3, s3), (r1, s1), r2, s2 and v. Legend: 1 = European data, 2 = American data, 3 = African data.

21/27 Query Processing Proposed Solution Algorithms
Cost parameters
  - t_com(r,p): time to send the results of Q concerning replica r from peer p to the query originator
  - t_resp(r,p): time that p needs to execute the part of Q concerning replica r and start sending the results to the query originator
  - t_djoin(S): time to join the set of replicas S in a distributed way
Example (replicas r and s; peers p1, p2, p3)
  - t_com(r,p1) + t_resp(r,p1) = 4
  - t_com(s,p2) + t_resp(s,p2) = 7
  - t_com(r,p2) + t_resp(r,p2) = 6
  - t_com(s,p3) + t_resp(s,p3) = 5
  - t_djoin({r,s}) = 6
  - Total cost of the two plans shown: 15 and 13
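The example totals can be reproduced with a small calculation. The sketch below rests on an assumption that the slide text does not state explicitly: the distributed join time t_djoin is charged only when the selected replicas sit on different peers. Under that reading, choosing r at p1 and s at p3 costs 4 + 5 + 6 = 15, while choosing both r and s at p2 costs 6 + 7 = 13. All names in the code are illustrative.

```python
from itertools import product

# t_com + t_resp for each (replica, peer) pair, taken from the example.
SEND_AND_RESPOND = {
    ("r", "p1"): 4,
    ("r", "p2"): 6,
    ("s", "p2"): 7,
    ("s", "p3"): 5,
}
T_DJOIN = 6  # t_djoin({r, s})

def plan_cost(plan):
    """Cost of a plan given as {replica: chosen peer}.

    Assumption: the distributed join cost applies only when the
    replicas are fetched from more than one peer.
    """
    total = sum(SEND_AND_RESPOND[(replica, peer)] for replica, peer in plan.items())
    if len(set(plan.values())) > 1:
        total += T_DJOIN
    return total

def all_plans():
    """Enumerate every assignment of one holding peer to each replica."""
    holders = {}
    for replica, peer in SEND_AND_RESPOND:
        holders.setdefault(replica, []).append(peer)
    replicas = sorted(holders)
    for peers in product(*(holders[r] for r in replicas)):
        yield dict(zip(replicas, peers))

best = min(all_plans(), key=plan_cost)
```

With these numbers, exhaustive enumeration picks the plan that colocates r and s on p2, even though p2 is not the cheapest holder of either replica taken alone.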

22/27 Query Processing Proposed Solution
A nonlinear programming model
Minimize
Complexity

23/27 Query Processing Proposed Solution Algorithms
Branch and bound
  - Optimal selection of peers
  - Complexity (worst case): O( )
A heuristic solution
  - While there is an edge in the graph
      - Select the edge with minimum label
      - Set the peer p as the selected peer for the replica r
      - Update the edge labels of the other peers that hold the replica r
      - Remove the replica r and its edges from the graph
  - Complexity: O((m × a)^2)
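The loop of the heuristic can be sketched as a greedy selection over a graph whose edges connect replicas to the peers that hold them. The slide does not say how edge labels are updated after a selection, so the sketch takes that rule as an optional caller-supplied function `update_labels`; this parameter, the function name `greedy_select`, and the dictionary encoding of the graph are assumptions made for illustration.

```python
def greedy_select(edges, update_labels=None):
    """Greedy peer selection, one peer per replica.

    edges: dict mapping (replica, peer) -> label (estimated cost).
    update_labels: optional function (edges, replica, peer) -> edges that
        adjusts the remaining labels after a selection. The update rule is
        not specified in the source, so none is applied by default.
    """
    remaining = dict(edges)
    selected = {}
    while remaining:
        # Select the edge with minimum label; ties broken by (replica, peer).
        (replica, peer), _ = min(remaining.items(), key=lambda kv: (kv[1], kv[0]))
        selected[replica] = peer
        if update_labels is not None:
            remaining = update_labels(remaining, replica, peer)
        # Remove the replica and all of its edges from the graph.
        remaining = {
            (r, p): label for (r, p), label in remaining.items() if r != replica
        }
    return selected

edges = {("r", "p1"): 4, ("r", "p2"): 6, ("s", "p2"): 7, ("s", "p3"): 5}
choice = greedy_select(edges)
```

Without a label update, the loop reduces to picking the cheapest holder of each replica, here `p1` for `r` and `p3` for `s`. A heuristic of this kind is not guaranteed to find the optimal selection, which the slide attributes to branch and bound.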

24/27 Outline
  - Motivation
  - APPA Architecture
  - Data Replication
  - Query Processing
  - Validation
  - Conclusion

25/27 Implementation
Figure: the APPA layers shown together with the JXTA architecture.
  - APPA Advanced Services: Replication, Caching, Query Processing, ...
  - APPA Basic Services: Consensus, P2P Data Management
  - APPA P2P Network: Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication
  - JXTA Applications: JXTA Community Applications, Sun JXTA Applications, JXTA Shell, Peer Commands
  - JXTA Services: JXTA Community Services, Sun JXTA Services (Indexing, Discovery, Search, Membership)
  - JXTA Core: Peer Groups, Peer Pipes, Peer Monitoring, Peer Advertisements, Peer IDs, Security
  - Any Connected Device
  - GISP

26/27 Simulation
Figure: the JXTA-based implementation stack of the previous slide (APPA Advanced Services, Basic Services and P2P Network, with GISP, over the JXTA Applications, Services and Core layers) shown next to the simulation stack.
  - Internet Simulation: GT/ITM
  - P2P Simulation: Chord Simulator
  - P2P Network: Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication
  - APPA Simulation

27/27 Conclusion
Summary
  - Advanced cooperative applications (multi-master replication)
  - A new P2P network-independent data management system
  - A distributed optimistic multi-master replication solution
  - Eventual consistency guarantee
  - A query processing solution based on replication
  - Validation
Future work
  - Consider secondary copies
  - Consider replica quality in query optimization
  - Data caching
  - Implementation over other P2P architectures (e.g., flooding)