Distributed Query Processing and Catalogs for Peer-to-Peer Systems Professor: Iluju Kiringa Student: Fan Yang, Libin Cai.


2 Agenda
–About P2P
–Mutant Query Plans
–Distributed Catalogs
–Intentional Statements
–Security and Privacy
–Conclusions

3 About P2P
Advantages:
–Ease of deployment
–Ease of use
–Fault tolerance
–Scalability
Limitations:
–Weak query capabilities
–No infrastructure for distributed queries
–Limitations in index scalability and result quality

4 A Query Example
User Bob wants to see a movie tonight. He visits his favorite portal, BobsPortal.com, and uses its GUI front end to compose an XML query over three XML documents: film reviews, preferences, and film showings [2]:
FOR $r IN document("film_reviews")//review,
    $g IN document("preferences")//genre,
    $s IN document("film_showings")/showing[date = "15 March 2002"]
WHERE $r/genre = $g AND $r/title = $s/title
RETURN ($r/title, $r/rating, $s/theater)
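The query's semantics can be illustrated with a small Python sketch that evaluates the same FOR/WHERE/RETURN join by nested loops over in-memory documents. The document contents are invented for illustration, and the showings document is assumed to have a root element whose children are showing elements; this is not how BobsPortal.com or the authors' system evaluates the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents standing in for the three XML sources.
FILM_REVIEWS = """<reviews>
  <review><title>Alien</title><genre>sci-fi</genre><rating>4</rating></review>
  <review><title>Heat</title><genre>crime</genre><rating>5</rating></review>
  <review><title>Up</title><genre>animation</genre><rating>4</rating></review>
</reviews>"""
PREFERENCES = "<prefs><genre>sci-fi</genre><genre>animation</genre></prefs>"
FILM_SHOWINGS = """<showings>
  <showing><title>Alien</title><date>15 March 2002</date><theater>Rex</theater></showing>
  <showing><title>Up</title><date>16 March 2002</date><theater>Lux</theater></showing>
  <showing><title>Heat</title><date>15 March 2002</date><theater>Odeon</theater></showing>
</showings>"""

def run_query(reviews_xml, prefs_xml, showings_xml, date="15 March 2002"):
    """Nested-loop evaluation of the FOR/WHERE/RETURN query."""
    reviews = ET.fromstring(reviews_xml).iter("review")                # //review
    genres = [g.text for g in ET.fromstring(prefs_xml).iter("genre")]  # //genre
    # showing[date = ...]: assumed to be children of the root element
    showings = [s for s in ET.fromstring(showings_xml).findall("showing")
                if s.findtext("date") == date]
    results = []
    for r in reviews:
        for g in genres:
            for s in showings:
                # WHERE $r/genre = $g AND $r/title = $s/title
                if r.findtext("genre") == g and r.findtext("title") == s.findtext("title"):
                    # RETURN ($r/title, $r/rating, $s/theater)
                    results.append((r.findtext("title"), r.findtext("rating"),
                                    s.findtext("theater")))
    return results

print(run_query(FILM_REVIEWS, PREFERENCES, FILM_SHOWINGS))  # [('Alien', '4', 'Rex')]
```

"Up" matches Bob's preferences but is not showing on 15 March, and "Heat" is showing but is not in a preferred genre, so only "Alien" survives both join conditions.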

5 A Query Example (cont’d)
The logical query plan contains three kinds of elements:
–Regular query operators: select, join
–Pseudo-operators: document, display
–References to XML fragments
Query processing: logical query plan → physical query plan → execution by the query processor [2]
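As a rough illustration, the logical plan for Bob's query might be represented as a tree with the display pseudo-operator at the root, document pseudo-operators at the leaves, and select/join operators in between. The Op class, the join order and the display argument are hypothetical; the slide's figure is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Op:
    """A node in a logical query plan (hypothetical representation)."""
    kind: str                 # "document", "select", "join" or "display"
    arg: str = ""             # document name, predicate, join condition or target
    children: List["Op"] = field(default_factory=list)

def show(op, depth=0):
    """Render the plan tree top-down, one operator per line."""
    lines = ["  " * depth + f"{op.kind}({op.arg})"]
    for child in op.children:
        lines.extend(show(child, depth + 1))
    return lines

# One plausible logical plan for Bob's query.
plan = Op("display", "BobsPortal.com", [
    Op("join", "$r/title = $s/title", [
        Op("join", "$r/genre = $g", [
            Op("document", "film_reviews"),
            Op("document", "preferences"),
        ]),
        Op("select", 'date = "15 March 2002"', [Op("document", "film_showings")]),
    ]),
])
print("\n".join(show(plan)))
```

A physical plan would then replace each logical operator with a concrete algorithm (for example, a particular join method) before execution.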

6 Advent of Mutant Query Plans
Why MQPs? They:
–can cope with incomplete metadata
–can decentralize query optimization and execution
–respect the autonomy and the local policies of sites
–adapt to server and network conditions even while being evaluated
What is an MQP?
–An algebraic query plan graph, encoded in XML, that can contain references to resource locations (URLs), references to abstract resource names (URNs), and verbatim XML fragments
–Each MQP is tagged with a target: the address to which the result is sent once the MQP is fully evaluated
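To make the idea of an XML-encoded plan concrete, here is a toy MQP built with Python's xml.etree.ElementTree. The element and attribute names (mqp, union, resource, data, target) and the example URL and URN are assumptions for illustration only, not the schema used in [1].

```python
import xml.etree.ElementTree as ET

def make_mqp(target):
    """Build a toy MQP. Element and attribute names are hypothetical."""
    mqp = ET.Element("mqp", target=target)  # where the result goes when done
    union = ET.SubElement(mqp, "union")
    # A reference to a resource location (URL)
    ET.SubElement(union, "resource", url="http://example.org/cds.xml")
    # A reference to an abstract resource name (URN), still to be resolved
    ET.SubElement(union, "resource", urn="urn:ForSale:Portland-CDs")
    # A verbatim XML fragment carried inside the plan itself
    data = ET.SubElement(union, "data")
    ET.SubElement(data, "cd", price="8").text = "Blue Train"
    return mqp

def unresolved_urns(mqp):
    """URNs that some server still has to bind to concrete locations."""
    return [r.get("urn") for r in mqp.iter("resource") if r.get("urn")]

def fully_evaluated(mqp):
    """True once the whole plan has been reduced to a single XML fragment."""
    return len(mqp) == 1 and mqp[0].tag == "data"

mqp = make_mqp("http://bobsportal.example/results")
print(ET.tostring(mqp, encoding="unicode"))
```

Because the plan and its partial results travel together as one XML document, any server that receives it can inspect, rewrite, and forward it without shared state.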

7 Mutant Query Processing [1]

8 Mutant Query Plan Example
Garage sale example
Query: CDs for $10 or less in the Portland area.
The MQP contains:
–Regular query operators: select, join
–Pseudo-operator: display
–A constant piece of XML
–URNs [1]

9 Mutant Query Plan Example (cont’d)
(a) Resolution and rewriting (b) Reduction [1]
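The resolve/rewrite/reduce cycle for the garage sale query can be sketched in Python as follows. This is a minimal sketch under stated assumptions: plans are nested tuples rather than XML, each server has a hypothetical catalog mapping URNs to URLs and a local store of the data it hosts, and only select and union are supported (the slide's plan also uses join). The server URLs and CD listings are invented.

```python
from typing import Dict, List

# Plan nodes (a hypothetical encoding):
#   ("urn", name) | ("url", address) | ("data", rows)
#   ("select", predicate, child) | ("union", [children])

def mutate(plan, catalog: Dict[str, List[str]], local: Dict[str, list]):
    """One server's pass over an MQP: resolve, rewrite, then reduce."""
    kind = plan[0]
    if kind == "urn" and plan[1] in catalog:
        # Resolution: bind an abstract name to the URLs this server knows of.
        urls = [("url", u) for u in catalog[plan[1]]]
        return mutate(("union", urls), catalog, local)
    if kind == "url" and plan[1] in local:
        # Rewriting: a URL this server hosts is replaced by the data itself.
        return ("data", list(local[plan[1]]))
    if kind == "select":
        child = mutate(plan[2], catalog, local)
        if child[0] == "data":
            # Reduction: evaluate the selection and splice in the result.
            return ("data", [row for row in child[1] if plan[1](row)])
        return ("select", plan[1], child)
    if kind == "union":
        children = [mutate(c, catalog, local) for c in plan[1]]
        if all(c[0] == "data" for c in children):
            return ("data", [row for c in children for row in c[1]])
        return ("union", children)
    return plan  # nothing to do here; leave the node for a later server

def cheap(row):
    """Garage sale predicate: CDs for $10 or less."""
    return row["price"] <= 10

plan = ("select", cheap, ("urn", "urn:ForSale:Portland-CDs"))

# Server 1 knows where Portland CDs are listed and hosts one of the lists.
catalog1 = {"urn:ForSale:Portland-CDs": ["http://a.example/cds", "http://b.example/cds"]}
local1 = {"http://a.example/cds": [{"title": "Blue Train", "price": 8},
                                   {"title": "Kind of Blue", "price": 15}]}
# Server 2 hosts the other list.
local2 = {"http://b.example/cds": [{"title": "Giant Steps", "price": 10}]}

step1 = mutate(plan, catalog1, local1)  # partially evaluated: one URL left
step2 = mutate(step1, {}, local2)       # fully evaluated: only data remains
print(step2)
```

After server 1 the plan is still a select over a union of one constant XML fragment and one unresolved URL; server 2 supplies the missing data, after which the select can be reduced to a constant result that would be shipped to the MQP's target.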

10 Comparison of a Pipelined Plan and a Mutant Plan
(a) Pipelined plan (b) Mutant plan [2]

11 Distributed Catalogs
Question: how do peers find out which resources are available at other peers?
Answer: build distributed catalogs to route queries efficiently.
Procedure:
–Peers use multi-hierarchic namespaces to categorize data;
–Data providers use multi-hierarchic namespaces to describe the data they serve;
–Data consumers use them to formulate queries.

12 Multi-hierarchic Namespaces
Multi-hierarchic namespace: the set of categorization hierarchies relevant to an application's domain [1].
Interest area: for example, second-hand armchairs in the Portland area: [USA/OR/Portland, Furniture/Chairs]
Figure: a multi-hierarchic namespace with two categorization dimensions and two highlighted interest areas: (a) Vancouver-Portland furniture, (b) items in Portland [1]
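An interest area can be modeled as one category path per hierarchy, so that containment reduces to a path-prefix test in every dimension. The sketch assumes "/"-separated paths, "*" for a whole hierarchy, and that both areas list the hierarchies in the same order; the covers and path_covers functions are hypothetical.

```python
def path_covers(outer, inner):
    """True if category path `inner` lies at or below `outer` ("*" = whole hierarchy)."""
    if outer == "*":
        return True
    if inner == "*":
        return False  # a whole hierarchy is never inside a proper sub-category
    outer_parts, inner_parts = outer.split("/"), inner.split("/")
    return inner_parts[:len(outer_parts)] == outer_parts

def covers(outer_area, inner_area):
    """Interest-area containment: every dimension must be covered."""
    return all(path_covers(o, i) for o, i in zip(outer_area, inner_area))

armchairs = ("USA/OR/Portland", "Furniture/Chairs")
portland_items = ("USA/OR/Portland", "*")  # like interest area (b) on the slide
print(covers(portland_items, armchairs))  # True
print(covers(("USA/OR/Portland", "Furniture"), ("USA/OR", "Furniture/Chairs")))  # False
```

Comparing whole path segments rather than raw string prefixes keeps a category such as "USA/OR" from accidentally matching a sibling like "USA/ORX".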

13 Peer Roles

14 Resource Resolution
Authoritative server:
–Strives to know about all base servers within its interest area.
–Through an authoritative index or meta-index server, the known base servers in a particular interest area can be found.
Resource resolution:
1. Seek an authoritative index or meta-index server.
2. Recursively follow the index references.
3. Find all the relevant base servers and data items.
4. Resolve the URN.

15 Example of Resource Resolution
URN: ForSale:Portland-CDs
URLs:
Interest area: [USA/OR/Portland, Music/CDs]
Authoritative meta-index server A: [USA, *]
Index server B: [USA, Music]
Index server C: [USA/OR, Music]
Index server G: replaces the URN with URLs
Query plan route: A → B → C → … → G
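The recursive resolution steps can be sketched as a depth-first walk over index references that follows only servers whose interest areas overlap the requested one. The server table mirrors the example above, but server E, G's interest area, and the base-server URLs are hypothetical.

```python
def path_overlaps(a, b):
    """Two category paths overlap if one is a prefix of the other ("*" = anything)."""
    if a == "*" or b == "*":
        return True
    pa, pb = a.split("/"), b.split("/")
    n = min(len(pa), len(pb))
    return pa[:n] == pb[:n]

def overlaps(x, y):
    return all(path_overlaps(a, b) for a, b in zip(x, y))

SERVERS = {
    "A": {"area": ("USA", "*"), "refs": ["B"]},               # meta-index
    "B": {"area": ("USA", "Music"), "refs": ["C", "E"]},
    "C": {"area": ("USA/OR", "Music"), "refs": ["G"]},
    "E": {"area": ("USA/WA", "Music"), "refs": []},           # hypothetical
    "G": {"area": ("USA/OR/Portland", "Music/CDs"), "refs": [],
          "urls": ["http://base1.example/cds", "http://base2.example/cds"]},
}

def resolve(name, area, servers, route):
    """Follow index references from `name`, collecting base-server URLs."""
    if name in route:  # guard against reference cycles
        return []
    route.append(name)
    urls = list(servers[name].get("urls", []))
    for ref in servers[name].get("refs", []):
        if overlaps(servers[ref]["area"], area):  # prune irrelevant indexes
            urls += resolve(ref, area, servers, route)
    return urls

route = []
urls = resolve("A", ("USA/OR/Portland", "Music/CDs"), SERVERS, route)
print(route)  # ['A', 'B', 'C', 'G']
print(urls)
```

Server E is never contacted because its interest area (Washington) cannot contain Portland CDs; the URN in the MQP would then be replaced by the URLs returned from G.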

16 Intentional Statements
Questions addressed:
–How can index and meta-index servers convey the relationships between the data they cover?
–How can mutant queries use this information to make intelligent choices about completeness, currency and latency tradeoffs?
Intentional statements:
–describe relationships between the data covered by index and meta-index servers, and can be expressed as coordination formulas. For example:
Server R replicates everything from server S for the Portland category of the Location hierarchy:
$\mathrm{base}_R[\text{Portland}, *] = \mathrm{base}_S[\text{Portland}, *]$
The only Oregon sporting goods information that R holds is for Portland and Eugene golf clubs at S:
$\mathrm{base}_R[\text{Oregon}, \text{Sporting Goods}] = \mathrm{base}_S[\text{Portland}, \text{Golf Clubs}] \cup \mathrm{base}_S[\text{Eugene}, \text{Golf Clubs}]$
R indexes several base servers:
$\mathrm{Index}_R[\text{Oregon}, \text{Golf Clubs}] = \mathrm{base}_{S_1}[\text{Oregon}, \text{Golf Clubs}] \cup \mathrm{base}_{S_2}[\text{Oregon}, \text{Golf Clubs}] \cup \mathrm{base}_{S_3}[\text{Oregon}, \text{Golf Clubs}]$
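The meaning of such a statement can be checked on sample data by treating base_X[loc, cat] as the set of server X's items that fall inside the interest area (loc, cat). The holdings below are invented, and the sketch only illustrates the semantics of the second statement; it is not a mechanism described on the slide.

```python
def under(path, area_path):
    """True if `path` lies at or below `area_path` in its hierarchy ("*" = root)."""
    return area_path == "*" or (path + "/").startswith(area_path + "/")

def extent(data, area):
    """base_X[loc, cat]: the items of server X inside interest area (loc, cat)."""
    loc, cat = area
    return {item for item in data if under(item[0], loc) and under(item[1], cat)}

# Hypothetical holdings: (location path, category path, item)
S = {
    ("Oregon/Portland", "Sporting Goods/Golf Clubs", "driver"),
    ("Oregon/Eugene", "Sporting Goods/Golf Clubs", "putter"),
    ("Oregon/Salem", "Sporting Goods/Golf Clubs", "wedge"),
    ("Oregon/Portland", "Sporting Goods/Tennis", "racket"),
}
R = {
    ("Oregon/Portland", "Sporting Goods/Golf Clubs", "driver"),
    ("Oregon/Eugene", "Sporting Goods/Golf Clubs", "putter"),
    ("Oregon/Portland", "Furniture/Chairs", "armchair"),
}

# base_R[Oregon, Sporting Goods] = base_S[Portland, Golf Clubs] ∪ base_S[Eugene, Golf Clubs]
lhs = extent(R, ("Oregon", "Sporting Goods"))
rhs = (extent(S, ("Oregon/Portland", "Sporting Goods/Golf Clubs"))
       | extent(S, ("Oregon/Eugene", "Sporting Goods/Golf Clubs")))
print(lhs == rhs)  # True
```

With these holdings the first statement (full replication of Portland) does not hold, since R lacks S's Portland tennis racket; each statement constrains the data independently.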

17 Utilizing Intentional Statements (cont’d)
Process:
–Whenever a server registers an interest area with the meta-index server, it provides intentional statements.
–Servers can then use this information when binding and routing MQPs.
Assumptions: meta-index server M knows about servers R and S, with interest areas
R: [Portland, Recreation]
S: [Oregon, Sporting Goods]
M receives an MQP that contains the resource name [Portland, Golf Clubs]. The name could then be bound to:
$\mathrm{base}_R[\text{Portland}, \text{Golf Clubs}] \cup \mathrm{base}_S[\text{Portland}, \text{Golf Clubs}]$
If M knows the intentional statement
$\mathrm{base}_R[\text{Portland}, \text{Sporting Goods}] = \mathrm{base}_S[\text{Portland}, \text{Sporting Goods}]$
then it could bind the name to:
$\mathrm{base}_R[\text{Portland}, \text{Golf Clubs}] \mid \mathrm{base}_S[\text{Portland}, \text{Golf Clubs}]$
Conclusion: the MQP could be routed to either R or S, but it need not go to both.
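M's binding decision in this example can be sketched as follows: without a relevant statement the name is bound to a union over all covering servers, and with an equality over a covering area it becomes a choice. The placement of Golf Clubs under Sporting Goods under Recreation, the full location paths, and the bind/routes helpers are assumptions of this sketch.

```python
def covers(outer, inner):
    """Interest area `outer` contains `inner` if each path is a prefix ("*" = all)."""
    return all(o == "*" or (i + "/").startswith(o + "/") for o, i in zip(outer, inner))

def bind(area, interest_areas, statements):
    """Bind a resource name to a union (all needed) or a choice (any one suffices)."""
    candidates = sorted(s for s, a in interest_areas.items() if covers(a, area))
    terms = [("base", s, area) for s in candidates]
    for servers, stmt_area in statements:
        # An equality among the candidates over a covering area: any one suffices.
        if candidates and set(candidates) <= servers and covers(stmt_area, area):
            return ("choice", terms)
    return ("union", terms)

def routes(binding):
    """Alternative sets of servers the MQP could be sent to."""
    kind, terms = binding
    if kind == "choice":
        return [[t[1]] for t in terms]
    return [[t[1] for t in terms]]

INTEREST_AREAS = {
    "R": ("Oregon/Portland", "Recreation"),
    "S": ("Oregon", "Recreation/Sporting Goods"),
}
query = ("Oregon/Portland", "Recreation/Sporting Goods/Golf Clubs")
# base_R[Portland, Sporting Goods] = base_S[Portland, Sporting Goods]
STATEMENTS = [({"R", "S"}, ("Oregon/Portland", "Recreation/Sporting Goods"))]

print(routes(bind(query, INTEREST_AREAS, [])))          # [['R', 'S']]
print(routes(bind(query, INTEREST_AREAS, STATEMENTS)))  # [['R'], ['S']]
```

The statement does not change which servers hold relevant data; it only tells M that contacting one of them is enough.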

18 Utilizing Intentional Statements (cont’d)
When replication is not instantaneous:
Suppose server R replicates everything for Portland at S and possibly keeps additional data about Portland, but R polls S only every 30 minutes to update the data it replicates, so its copy can be up to 30 minutes out of date.
Intentional statement:
$\mathrm{base}_R[\text{Portland}, *] \geq \mathrm{base}_S[\text{Portland}, *]$
A binding for the resource [Portland, CDs] might then be:
$\mathrm{base}_R[\text{Portland}, \text{CDs}] \mid (\mathrm{base}_R[\text{Portland}, \text{CDs}] \cup \mathrm{base}_S[\text{Portland}, \text{CDs}])$
Explanation: one can get an answer quickly by routing the MQP only to R, but that answer could be up to 30 minutes out of date. By routing the MQP to both R and S, one gets a complete and current answer.
Conclusions:
–It is impossible to guarantee that queries run instantly.
–Queries must compromise among latency, completeness and currency.
–Replication cannot be both scalable and instantaneous.
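The latency/currency tradeoff in this binding can be sketched as picking, among the alternatives, the one with the fewest servers whose answer is fresh enough for the user. The Alternative class, the staleness figures (taken from the 30-minute polling interval on the slide), and the use of server count as a latency proxy are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alternative:
    servers: tuple      # servers the MQP must visit
    staleness_min: int  # worst-case age of the answer, in minutes

# base_R[Portland, CDs] | (base_R[Portland, CDs] ∪ base_S[Portland, CDs])
BINDING = [
    Alternative(("R",), 30),     # R alone: fast, but R polls S only every 30 minutes
    Alternative(("R", "S"), 0),  # R and S: complete and current, but more hops
]

def choose(alternatives, max_staleness_min):
    """Fewest-servers alternative whose answer is fresh enough.

    Using server count as the latency proxy is an assumption of this sketch."""
    fresh = [a for a in alternatives if a.staleness_min <= max_staleness_min]
    if not fresh:
        raise ValueError("no alternative meets the currency requirement")
    return min(fresh, key=lambda a: len(a.servers))

print(choose(BINDING, 60).servers)  # ('R',)
print(choose(BINDING, 5).servers)   # ('R', 'S')
```

A user who can tolerate an hour-old listing gets the single-hop plan; one who needs near-current data pays for the extra hop to S.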

19 What Else Could Be in MQPs?
–Accumulated catalog and statistics information
–Provenance, which supports a rewards system, meta-index updating, and detection of spoofing
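As an illustration of carrying provenance in the plan, the sketch below appends one entry per server hop and tallies reduction steps per server, which could feed a rewards system. The dictionary layout and the action names are hypothetical.

```python
from collections import Counter

def record_hop(mqp, server, action):
    """Append a provenance entry as the MQP passes through `server`."""
    mqp.setdefault("provenance", []).append({"server": server, "action": action})
    return mqp

def contributions(mqp):
    """How many reduction steps each server performed, e.g. as input to rewards."""
    return Counter(p["server"] for p in mqp.get("provenance", [])
                   if p["action"] == "reduce")

mqp = {"plan": None}  # the plan itself is omitted in this sketch
for server, action in [("A", "resolve"), ("B", "reduce"), ("C", "reduce"), ("B", "reduce")]:
    record_hop(mqp, server, action)
print(contributions(mqp))  # Counter({'B': 2, 'C': 1})
```

The same trail of hops could also be compared against what the meta-index expects, which is one way provenance might help detect spoofing.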

20 Security and Privacy
Issue:
–With MQPs, partial results may be divulged to undesirable servers.
Solutions:
–MQPs need to incorporate ordering and transfer policies.
–Encrypt data or individual data elements with a public key.
–MQPs can obtain answers while respecting given server security policies.
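A transfer policy can be enforced by checking it before the MQP, together with any partial results it carries, is forwarded. The policy format below (allowed and denied server sets) and the host names are hypothetical; encryption is not shown.

```python
def can_forward(mqp, next_server):
    """Check the MQP's transfer policy before sending it (and its partial results) on."""
    policy = mqp.get("transfer_policy", {})
    allowed = policy.get("allowed")  # None means "any server not denied"
    if allowed is not None and next_server not in allowed:
        return False
    return next_server not in policy.get("denied", set())

def pick_next_hop(mqp, candidates):
    """First candidate the policy permits, or None if the MQP must stop here."""
    return next((s for s in candidates if can_forward(mqp, s)), None)

mqp = {"transfer_policy": {"denied": {"untrusted.example"}}}
print(pick_next_hop(mqp, ["untrusted.example", "partner.example"]))  # partner.example
```

Returning None rather than forwarding anyway lets the current server fall back to another route or send what it has directly to the target.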

21 Conclusions
MQPs enable peers to independently optimize and partially evaluate queries without global knowledge, and with a minimum of coordination overhead.

22 References
[1] V. Papadimos, D. Maier and K. Tufte. Distributed Query Processing and Catalogs for Peer-to-Peer Systems. OGI School of Science & Engineering, Oregon Health & Science University.
[2] V. Papadimos and D. Maier. Distributed Queries without Distributed State. In Proc. of WebDB 2002, pages

23 Thanks! Questions?...

