

1 PeerDB: A P2P-based System for Distributed Data Sharing
Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou
Course Number: CSI 5311
Course Name: Distributed Databases and Transaction Processing
Professor: Iluju Kiringa
Presented by: Zhihong Li, Rasha Tawhid
Winter 2004

2 Outline
● Introduction
● What is PeerDB?
● Architecture of PeerDB
● Features of PeerDB
● A Performance Study
● Related works
● Conclusion and future work

3 Introduction
PeerDB and related concepts: P2P systems vs. distributed database systems vs. BestPeer vs. PeerDB.
Architecture and features of PeerDB:
● Sharing data without a shared schema: information retrieval approach and matching strategy.
● Agent-assisted query processing: mobile agent technology.
● Monitoring statistics for reconfiguration.
● Cache management.
A performance study shows the effectiveness of PeerDB.

4 What is PeerDB?
A P2P-based system for distributed data sharing.
A database application implemented on top of BestPeer.
Concepts to review:
● P2P system
● Distributed database system
● BestPeer

5 Concepts review: Peer-to-peer (P2P) systems
A large number of nodes are pooled together to share their resources (they provide and consume data or services).
These nodes can join and leave the P2P network at any time.
Limitations:
● They provide only file-level sharing (coarse granularity).
● There is no easy way to extend their applications quickly to meet new user needs.
● A node’s peers are statically defined.

6 Concepts review: Distributed database systems
● Nodes are added to and removed from the network in a controlled manner.
● Data can be shared using a shared schema.
● They provide the complete set of answers that satisfy a query.
● The exact location to which a query should be directed is known.

7 The Features of BestPeer Systems
● An adaptive platform for P2P applications: P2P applications can be developed easily and efficiently on BestPeer.
● Integrates two technologies: mobile agents and P2P.
● Facilitates a finer granularity of data sharing.
● Nodes can also share computational power.
● A node can dynamically reconfigure its own neighbors in the network.
● Introduces a Location Independent Global Names Lookup Server (LIGLO) to provide each node with a unique global identity.

8 The Features of PeerDB
● Each node is a data management system.
● Supports a finer granularity of data sharing.
● Data can be shared without a shared global schema.
● It brings the power of mobile agents into P2P systems to perform operations at peers’ sites.
● A node in the network can dynamically reconfigure its own neighbors.

9 Architecture of PeerDB
Four components:
● Data management system: DBMS, Local Dictionary, Export Dictionary.
● DBAgent: mobile agents (master agent and worker agents).
● Cache manager: caches remote data in secondary storage.
● User interface: users search for data using SQL-like queries.

10 Features of PeerDB: Sharing Data Without Shared Schema
Objective
● Users manage their (private and sharable) data using a DBMS.
● Users share the sharable data of interest without sharing a schema.
Problem
● There is no predetermined, uniform schema that nodes share.
● Relation names differ: different users may name a “protein” relation after the protein (e.g., Kinases, annexin) or after the species (e.g., human, zebrafish).
● Attribute names differ in the same way: some users call the length of a sequence “length”, while others use “len”.
Solution
● Adopt an Information Retrieval (IR) based approach.

11 Information Retrieval (IR) Based Approach
Create meta-data for each relation
● The meta-data (schema, keywords, etc.) should be provided by the user when the table is created.
● Meta-data should be maintained for each relation name and for each attribute.
● Relevant data may share the same keywords.
Locate matching relations
● Apply the relation-matching strategy to determine relevant relations.
● The matching relations and their meta-data are returned to the user first; the user then decides which relations to query further.

12 Relation-matching Strategy
● Given a query Q(R, A, C), where R is the set of relations, A the attributes and C the conditions.
● Also given a relation D with attributes T.
● The relations that potentially contain answers to Q are those whose Match(Q, D) is above a certain threshold value.
● wtr: relation weight; wta: attribute weight.
● r = 1 if the relations match; r = 0 otherwise.
● N_match(A ∪ C, T): the number of matching keywords between the attributes in A ∪ C and the attributes in T.
● N(A ∪ C): the number of distinct keywords for the attributes in Q.
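
The slide text does not reproduce the Match(Q, D) formula itself, so the following is only a minimal sketch of one weighted form that is consistent with the symbols listed above: (wtr·r + wta·N_match/N) / (wtr + wta). The exact formula, the case-insensitive keyword comparison, the default weights and the function name `match_score` are all assumptions made for illustration.

```python
def match_score(query_relations, query_attrs, relation_keywords, attr_keywords,
                wtr=0.5, wta=0.5):
    """Assumed form of Match(Q, D); not taken verbatim from the slides.

    query_relations:   relation names in the query (R)
    query_attrs:       attribute names in the select list and conditions (A ∪ C)
    relation_keywords: keywords describing the candidate relation D
    attr_keywords:     {attribute of D: [keywords]} for the attributes T
    """
    q_rel = {k.lower() for k in query_relations}
    q_att = {k.lower() for k in query_attrs}
    d_rel = {k.lower() for k in relation_keywords}
    d_att = {k.lower() for kws in attr_keywords.values() for k in kws}

    r = 1 if q_rel & d_rel else 0            # r = 1 when the relations match
    n_total = len(q_att)                     # N(A ∪ C)
    n_match = len(q_att & d_att)             # N_match(A ∪ C, T)
    attr_part = n_match / n_total if n_total else 0.0
    return (wtr * r + wta * attr_part) / (wtr + wta)


# Query: SELECT SeqId, ProteinSeq FROM Kinases WHERE length > 30
score = match_score(
    query_relations=["Kinases"],
    query_attrs=["SeqId", "ProteinSeq", "length"],
    relation_keywords=["kinases", "protein"],
    attr_keywords={"SeqId": ["seqid", "id"], "len": ["length", "len"]},
)
```

Here the relation keyword matches and two of the three query attributes match, so a relation whose owner stored "length" as a keyword for the attribute "len" can still be found.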

13 Illustrating the Strategy with an Example
Suppose we have peers P1, P2, P3 and P4, and a query Q issued from P1:
SELECT SeqId, ProteinSeq FROM Kinases WHERE length > 30;
Applying the matching strategy:
● P2, P3 and P4 all match query Q.
● P4 is ranked lower than P2 and P3.
● Semantically, P2’s data are not of interest to P1, so the user needs to make the selection.
● Multiple relations can be returned from P3.

14 Features of PeerDB: Agent Assisted Query Processing
Two-phase query processing strategy.
Phase I:
● Locate potential relations using the relation-matching strategy.
● The user selects the more relevant relations.
Benefits: minimizes information overload and makes better use of network bandwidth.
Phase II:
● Begins after the user has selected the desired relations.
● Directs the query to the nodes containing the desired relations.
● Answers are finally returned.
Mobile agents perform the operations at peers’ sites.
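
The two phases can be sketched as a single function, assuming the peers, scores and callbacks are plain in-memory Python objects. The names `two_phase_query`, `choose` and `fetch` are hypothetical and stand in for the user's selection step and for the data retrieval performed by agents; no real networking is involved.

```python
def two_phase_query(peer_relations, score, threshold, choose, fetch):
    """Phase I finds candidate relations; Phase II queries only the selected ones."""
    # Phase I: keep (peer, relation, score) for relations above the threshold.
    candidates = []
    for peer, relations in peer_relations.items():
        for relation in relations:
            s = score(relation)
            if s >= threshold:
                candidates.append((peer, relation, s))
    candidates.sort(key=lambda c: c[2], reverse=True)

    # The user sees only relation descriptions and picks the relevant ones.
    selected = choose(candidates)

    # Phase II: contact only the peers that hold a selected relation.
    return {(peer, relation): fetch(peer, relation) for peer, relation, _ in selected}


peer_relations = {"P2": ["Kinases"], "P3": ["Protein", "Annexin"], "P4": ["Enzyme"]}
scores = {"Kinases": 0.9, "Protein": 0.8, "Annexin": 0.2, "Enzyme": 0.5}
contacted = []

def fetch(peer, relation):
    contacted.append(peer)
    return [f"{relation} rows from {peer}"]

answers = two_phase_query(
    peer_relations, scores.get, 0.4,
    choose=lambda cands: [c for c in cands if c[0] == "P3"],
    fetch=fetch,
)
```

Only P3 is contacted in Phase II, which is the bandwidth saving the slide refers to: data moves only for relations the user actually chose.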

15 Query Processing on PeerDB Nodes
The DBAgent component is responsible for query processing.
● Local query: a query is local to a node if it is initiated there.
● Remote query: any other query.
Query processing is carried out entirely with the help of mobile agents.
● Master agent: when a query is issued, a master agent is created on the user's node to oversee the evaluation of the query. The master agent clones worker agents (relation matching agents or data retrieval agents) and dispatches them to all neighbors of the node.
● Worker agent: a worker agent works on a neighbor node and returns its results to the master agent.

16 Processing Local Query: Phase I
Diagram components: User Interface, DBAgent, Cache Manager, Object Management System (DBMS), Local Dictionary, Export Dictionary, neighbor PeerDB nodes (each with a DBAgent).
1. The user query is sent to the DBAgent.
2. A master agent (MA) is created.
3. The MA extracts the Q(R, A, C) list.
4.1 Match(Q, D) is applied to the local dictionary.
4.11 Matching relations are returned.
4.2 The MA clones relation matching agents (RMAs).
4.21 The RMAs are dispatched to all neighbors, each carrying (a) the IP address of the query node and (b) a TTL (time-to-live) that indicates the lifetime of the agent.
4.22 Relevant relations and meta-data are returned by the RMAs.
4.23 Answers are returned to the user.

17 Processing Local Query: Phase II
Diagram components: User Interface, DBAgent, Cache Manager, Object Management System (DBMS), Local Dictionary, Export Dictionary, neighbor PeerDB nodes (each with a DBAgent).
1. The user selects the semantically relevant relations.
2. The selected relations are sent to the MA.
3. The MA clones a data retrieval agent (DRA) for each selected relation.
4. The DRA reformulates the query for its selected relation.
4.1 The DRA retrieves data from the local DBMS if the selected relation is local.
4.2 Data are returned to the agent.
4.3 Formulated data are returned to the user.
5. The DRA is dispatched to the relevant nodes, carrying (a) the IP address of the query node.
6. Data are returned.
7. Data are returned to the user.

18 Processing Remote Query: Phase I (Relation Matching Agent)
Diagram components: User Interface, DBAgent, Cache Manager, Object Management System (DBMS), Local Dictionary, Export Dictionary, query node, neighbor PeerDB nodes (each with a DBAgent).
1. A relation matching agent (RMA) arrives from the query node (TTL is decremented by 1; first visit to this node).
2. The RMA searches the export dictionary.
3. Matched relations are returned.
4. Matched relations are returned to the query node.
5. If TTL > 0, the RMA clones more RMAs and dispatches them to the current node’s neighbors; otherwise, the RMA is dropped.
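
The TTL-limited spread of relation matching agents can be sketched as a breadth-first traversal over an in-memory neighbor graph. This is a simulation under stated assumptions, not the agent implementation: `dispatch_rma`, the dictionary-based graph and the `is_match` callback are hypothetical, and dropping an agent that reaches an already visited node is an assumed reading of "first time visit".

```python
from collections import deque


def dispatch_rma(neighbors, export_dicts, query_node, ttl, is_match):
    """Return {node: [matched relations]} for nodes reached within `ttl` hops."""
    results = {}
    visited = {query_node}
    # The master agent dispatches one RMA to each neighbor of the query node.
    queue = deque((n, ttl) for n in neighbors.get(query_node, []))
    while queue:
        node, remaining = queue.popleft()
        if node in visited:
            continue                      # assumed: agent is dropped on a repeat visit
        visited.add(node)
        remaining -= 1                    # TTL is decremented on arrival
        matched = [rel for rel in export_dicts.get(node, []) if is_match(rel)]
        if matched:
            results[node] = matched       # sent back to the query node
        if remaining > 0:                 # clone and forward, otherwise drop
            for nxt in neighbors.get(node, []):
                if nxt not in visited:
                    queue.append((nxt, remaining))
    return results


neighbors = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2", "P4"], "P4": ["P3"]}
export_dicts = {"P2": ["Kinases"], "P3": ["Protein", "Annexin"], "P4": ["Kinases"]}
wanted = {"Kinases", "Protein"}

one_hop = dispatch_rma(neighbors, export_dicts, "P1", 1, lambda rel: rel in wanted)
two_hops = dispatch_rma(neighbors, export_dicts, "P1", 2, lambda rel: rel in wanted)
```

With a TTL of 1 only the direct neighbor P2 is searched; raising the TTL to 2 also reaches P3.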

19 Processing Remote Query: Phase II (Data Retrieval Agent)
Diagram components: User Interface, DBAgent, Cache Manager, Object Management System (DBMS), Local Dictionary, Export Dictionary, query node.
1. A data retrieval agent (DRA) arrives from the query node.
2. The DRA formulates an SQL query and submits it to the DBMS.
3. Answers are retrieved and processed.
4. Answers are returned to the query node.
5. The DRA is dropped.

20 Features of PeerDB: Monitoring Statistics for Reconfiguration
Performed by the master agent on the query node, for reconfiguring the network.
Monitors two types of statistics:
● Relation information (schemas, keywords) obtained from relation matching agents, used for exchanging the keywords of selected relations.
● The number of answer objects obtained from data retrieval agents, used for determining which nodes should be connected directly.
Reconfiguration policy:
● Favored nodes are those that have most recently provided answers.
● The notion of stack distance is used to measure temporal locality.
● The top K peers in the stack are retained as the K directly connected peers.
Figure: a stack of peers (P2, P3, P4, …, Pk), of which the top K are retained.
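
The reconfiguration policy can be sketched as a most-recently-used stack: a peer that returns answers moves to the top, and the top K entries become the direct peers. This is a minimal sketch; the class name `PeerStack` and the rule that a peer returning zero answers keeps its position are assumptions.

```python
class PeerStack:
    """Most-recently-answering peers first; the top k are kept as direct peers."""

    def __init__(self, k):
        self.k = k
        self._stack = []                  # index 0 is the top of the stack

    def record_answers(self, peer, n_answers):
        """Move `peer` to the top if it returned at least one answer."""
        if n_answers <= 0:
            return                        # assumed: no answers, no promotion
        if peer in self._stack:
            self._stack.remove(peer)
        self._stack.insert(0, peer)

    def direct_peers(self):
        return self._stack[:self.k]


stack = PeerStack(k=2)
for peer, n in [("P2", 5), ("P3", 1), ("P4", 3), ("P2", 2), ("P3", 0)]:
    stack.record_answers(peer, n)
direct = stack.direct_peers()
```

P2 answered most recently, then P4, so those two are retained; P3's last query returned nothing and it stays below them.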

21 Features of PeerDB: Cache Management
The Cache Manager component caches answers returned from remote nodes, reducing the response time for subsequent queries.
However, caching raises complicated issues:
● Problem: the cached copy may be outdated. Solution: answers are kept only for a fixed period of time.
● Problem: cache storage space is limited. Solution: the least recently used data are replaced when space runs out.
● Problem: several PeerDB nodes may be caching the same data. Solution: all relations but one with the same keywords from the same source node are pruned away during Phase I of query processing.
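
The first two solutions above (fixed lifetime and least-recently-used replacement) can be combined in a small cache. This is a sketch only: the class `AnswerCache`, counting capacity in entries rather than bytes, and passing the current time explicitly as `now` are assumptions made to keep the example self-contained and deterministic.

```python
from collections import OrderedDict


class AnswerCache:
    """Cache with a fixed entry lifetime and LRU replacement when full."""

    def __init__(self, capacity, lifetime):
        self.capacity = capacity          # maximum number of cached entries
        self.lifetime = lifetime          # how long an entry stays valid
        self._items = OrderedDict()       # key -> (stored_at, answers), oldest use first

    def put(self, key, answers, now):
        self._items[key] = (now, answers)
        self._items.move_to_end(key)      # newest entry is most recently used
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used

    def get(self, key, now):
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, answers = entry
        if now - stored_at >= self.lifetime:
            del self._items[key]          # expired: the copy may be outdated
            return None
        self._items.move_to_end(key)      # mark as recently used
        return answers


cache = AnswerCache(capacity=2, lifetime=10)
cache.put("q1", ["a"], now=0)
cache.put("q2", ["b"], now=1)
hit = cache.get("q1", now=2)              # q1 becomes the most recently used
cache.put("q3", ["c"], now=3)             # cache is full, so q2 is evicted
```

Reading q1 before inserting q3 is what saves it from eviction; it is later dropped anyway once its fixed lifetime has passed.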

22 A Performance Study
The experimental environment:
● 32 PCs, each with an Intel Pentium 200 MHz processor and 64 MB of RAM.
● All the PCs run the Windows NT 4.0 operating system.
● The physical network layout is shown in the figure.

23 A Performance Study (cont’d)
Studies the relation-matching strategy.
● The lifetime (TTL) of a worker agent is 1.
Looks at the performance of PeerDB.
● Effect of storage capacity on caching
● PeerDB vs. CS (client-server system)
● Benefits of agent-based querying
Remark: the extensive experimental studies show that PeerDB is a promising system for distributed processing.

24 Conclusion and Future Work
PeerDB is a P2P-based distributed data sharing system with several useful features:
● Employs a data management system and shares data without a shared schema.
● Query processing is assisted by mobile agents.
● A node reconfigures its peers dynamically by itself.
● Cache management for efficiency.
Experimental studies show that PeerDB is a promising system for distributed processing.
Future work extends PeerDB in two directions:
● Making a node more intelligent by adopting code-shipping or data-shipping technology.
● Looking for “similar” schemas by integrating keyword-based search into PeerDB.

25 References
[1] W. Ng, B. Ooi, K. Tan and A. Zhou. PeerDB: A P2P-based System for Distributed Data Sharing. The 19th International Conference on Data Engineering (ICDE 2003), 2003.
[2] N. Karnik. Security in Mobile Agent System. http://www.cs.umn.edu/Ajanta/defense/
[3] C. Rijsbergen. Information Retrieval. London: Butterworths, 1979. http://www.dcs.gla.ac.uk/Keith/Preface.html

26 Thanks
Questions are welcome.

