
1 Distributed Query Processing and Catalogs for Peer-to-Peer Systems
Presented by: Πάνος Σκυβαλίδας, Κώστας Σταμκόπουλος, Κώστας Στεφανίδης
Authors: V. Papadimos, D. Maier, K. Tufte
Proceedings of the 2003 CIDR Conference

2 Overview
P2P systems offer limited querying functionality
 – i.e., simple selection on a predefined set of index attributes
These limitations are acceptable for file-sharing applications
 – there are several ways to encode metadata about a file in the filename
General P2P applications require a richer query model
 – content publishers export structured or semi-structured views of their data (for example, using XML)
 – users query them using a full-featured query language

3 Overview
This work presents a peer-to-peer architecture for distributed querying
 – content providers have specific affinities for storing, replicating, or indexing different subsets of a global data namespace
Peers express their preferences for the data they are serving or looking for using a namespace of multiple hierarchical categories
Queries are routed efficiently, and intelligent choices can be made about the tradeoffs among query latency, data completeness, and data currency

4 Motivating Example
The running example is a distributed garage sale
People sell and buy things without middlemen or predetermined seller/buyer (server/client) roles
Each for-sale item has associated information: item name, seller location, description, condition, images, quantity, price, etc.
Each seller exports these data in XML
A seller can run their own server to publish items for sale, or can post them to a server run by a consignment shop

5 Mutant Query Plan (MQP)
A mutant query plan is an algebraic query plan graph, encoded in XML, that includes:
 – verbatim XML-encoded data
 – references to resource locations (URLs)
 – references to abstract resource names (URNs)
Each MQP is tagged with a target
 – a network address (IP) to send the result to
MQP processing
 – starts as a regular query operator tree at the client
 – is passed from server to server, accumulating partial results
 – once the query is fully evaluated into a constant piece of XML, the result is returned to the client
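The structure described on this slide can be sketched in a few lines of Python. This is only an illustration: the slides do not show the actual MQP schema, so the element names (`mqp`, `display`, `select`, `join`, `urn`, `url`, `data`), the attribute names, the target address, and the example URL are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names; the real MQP schema is not given in the slides.
PLAN_XML = """
<mqp target="192.0.2.10">
  <display>
    <select predicate="price &lt;= 10">
      <join on="sellerId">
        <urn name="urn:InterestArea:(USA.OR.Portland,Music.CDs)"/>
        <url href="http://example.org/sellers.xml"/>
      </join>
    </select>
  </display>
</mqp>
"""

# Tags that still require work before the plan is a constant piece of XML.
PENDING_TAGS = {"select", "join", "urn", "url"}


def parse_plan(text):
    """Parse an XML-encoded plan into an element tree."""
    return ET.fromstring(text.strip())


def unresolved_urns(plan):
    """Return the abstract resource names that still appear in the plan."""
    return [node.get("name") for node in plan.iter("urn")]


def is_fully_evaluated(plan):
    """True when no operators or resource references remain in the plan."""
    return not any(node.tag in PENDING_TAGS for node in plan.iter())


plan = parse_plan(PLAN_XML)
print(plan.get("target"), unresolved_urns(plan), is_fully_evaluated(plan))
```

A plan whose `display` node contains only a `data` element would count as fully evaluated here and could be sent to the address in `target`.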

6 MQP Processing
The MQP arrives at a server encoded in XML
The server parses the plan and determines the URNs that can be resolved
Using the catalog, the server maps them to the corresponding URLs
The optimizer finds the locally evaluable sub-plans, optimizes them, and estimates their costs
The policy manager decides which sub-plans to evaluate locally, and the new plan is forwarded to the next server
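The resolve-then-reduce cycle at one server can be sketched as follows. This is a simplified illustration, not the authors' implementation: the `Node` class, the operator names, the dictionary-based catalog, and the example URLs are assumptions, and the optimizer and policy manager are omitted (every locally evaluable sub-plan is simply evaluated).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                      # "urn", "url", "data", "select", or "union"
    arg: object = None           # URN/URL string, list of items, or predicate
    kids: list = field(default_factory=list)


def resolve(node, catalog):
    """Replace each URN known to the catalog with a union of URL leaves."""
    if node.op == "urn" and node.arg in catalog:
        return Node("union", kids=[Node("url", u) for u in catalog[node.arg]])
    return Node(node.op, node.arg, [resolve(k, catalog) for k in node.kids])


def reduce_plan(node, local):
    """Evaluate every sub-plan whose inputs are all available at this server."""
    kids = [reduce_plan(k, local) for k in node.kids]
    if node.op == "url" and node.arg in local:
        return Node("data", local[node.arg])
    if kids and all(k.op == "data" for k in kids):
        if node.op == "union":
            return Node("data", [item for k in kids for item in k.arg])
        if node.op == "select":
            return Node("data", [item for item in kids[0].arg if node.arg(item)])
    return Node(node.op, node.arg, kids)


# Hypothetical catalog and data for two servers, A and B.
catalog = {"urn:cds": ["http://a.example/cds", "http://b.example/cds"]}
data_a = {"http://a.example/cds": [{"title": "X", "price": 8}, {"title": "Y", "price": 15}]}
data_b = {"http://b.example/cds": [{"title": "Z", "price": 5}]}

plan = Node("select", lambda item: item["price"] <= 10, [Node("urn", "urn:cds")])
at_a = reduce_plan(resolve(plan, catalog), data_a)   # partial result, still a plan
at_b = reduce_plan(at_a, data_b)                     # fully reduced to data
print(at_a.op, at_b.op)
```

After server A the plan still contains a URL that only server B can read, so it is forwarded; after server B the whole plan has collapsed into a single data node.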

7 MQP Example
Garage sale example
Query: CDs for $10 or less in the Portland area
The MQP contains:
 – regular query operators: select, join
 – a pseudo-operator: display
 – a constant piece of XML
 – URNs

8 MQP Example
[Figure: (a) resolution and rewriting; (b) reduction]

9 Pipelined Plan and Mutant Plan
[Figure: (a) pipelined plan; (b) mutant plan]

10 Distributed Catalogs
The P2P network maintains distributed catalogs that can efficiently route queries to peers with relevant data
Peers use multi-hierarchic namespaces to categorize data
 – data providers use them to describe the kind of data they serve
 – data consumers use them to formulate queries
Peers can play different roles in the system

11 Multi-Hierarchic Namespaces
Categorization hierarchies: categories are specified at different levels
 – USA/OR/Portland (all items located in Portland) is a city-level category
Each item belongs to one category, called its most-specific category, and to all of that category's ancestors
 – every item in the USA/OR/Portland category also belongs to the USA/OR and USA categories
Multi-hierarchic namespace: a set of categorization hierarchies relevant to an application domain
 – each hierarchy is called a dimension
 – each dimension has a top category (*)
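Category membership within one dimension can be illustrated with a short sketch. It assumes categories are written as slash-separated paths, as in the slide's examples; the function names `ancestors` and `belongs_to` are hypothetical.

```python
TOP = "*"  # the top category of a dimension


def ancestors(category):
    """Return the category followed by all of its ancestors, ending with the top."""
    if category == TOP:
        return [TOP]
    parts = category.split("/")
    chain = ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return chain + [TOP]


def belongs_to(most_specific, category):
    """True if an item with this most-specific category is also in `category`."""
    return category in ancestors(most_specific)


print(ancestors("USA/OR/Portland"))
print(belongs_to("USA/OR/Portland", "USA/OR"))
```

An item whose most-specific category is USA/OR/Portland therefore belongs to USA/OR, USA, and the top category, but not to USA/WA.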

12 Example
A multi-hierarchic namespace with two categorization dimensions and two interest areas:
a) Vancouver-Portland furniture ([USA/WA/Vancouver, Furniture] AND [USA/OR/Portland, Furniture])
b) Items in Portland ([USA/OR/Portland, *])

13 Multi-Hierarchic Namespaces
Data providers use interest areas to describe the kind of data they serve
Data consumers also use interest areas to form queries
Example: look for second-hand armchairs in the Portland area
 – interest area: [USA/OR/Portland, Furniture/Chairs]
 – contact the servers whose interest areas overlap with this one to find all pertinent items
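One plausible way to test whether two interest areas overlap is sketched below. The representation is an assumption: an interest area is a list of cells, and a cell is a tuple with one slash-separated category per dimension. Two cells are treated as overlapping when, in every dimension, one category is equal to or an ancestor of the other.

```python
def covers(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    return (general == "*" or general == specific
            or specific.startswith(general + "/"))


def cells_overlap(cell_a, cell_b):
    """Cells overlap if their categories are comparable in every dimension."""
    return all(covers(a, b) or covers(b, a) for a, b in zip(cell_a, cell_b))


def areas_overlap(area_a, area_b):
    """Interest areas overlap if any pair of their cells overlaps."""
    return any(cells_overlap(a, b) for a in area_a for b in area_b)


# The query from the slide, plus two hypothetical servers.
query = [("USA/OR/Portland", "Furniture/Chairs")]
portland_server = [("USA/OR/Portland", "*")]
vancouver_server = [("USA/WA/Vancouver", "Furniture")]

print(areas_overlap(query, portland_server))
print(areas_overlap(query, vancouver_server))
```

Under these assumptions the armchair query would contact the Portland server and skip the Vancouver one.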

14 Peer Roles
Base servers: maintain data within an interest area
Index servers: keep track of base and index servers whose interest areas overlap their own; they can also maintain indices on data attributes not used for categorization
Meta-index servers: maintain mappings from interest areas to servers with relevant data (they do not index non-hierarchic attributes and typically cover larger areas than index servers)
Category servers: answer queries about the dimensions themselves
Clients: peers that are interested in a query's results

15 Index and Meta-Index Servers
The main difference between an index and a meta-index server is the amount of information stored:
 – the richer extra indices of an index server are better for routing a query
 – extra indices must be updated whenever the underlying base data change, which limits their scalability
Peers cache the index and meta-index servers they used in the past

16 Authoritative Servers
Authoritative servers strive to know about all base servers within their area of interest
 – an authoritative index or meta-index server can be used to find the known base servers in a particular interest area
Base servers joining the network register with index or meta-index servers whose interest areas intersect their own
Alternative idea:
 – a group of servers chooses to stay authoritative for an area, guaranteeing that the union of their answers includes paths to the relevant base servers

17 Resource Resolution
To form queries, interest areas are encoded into a namespace-specific string
 – e.g., "urn:InterestArea:(USA.OR.Portland,Furniture)+(USA.WA.Vancouver,Furniture)"
A server trying to resolve a URN
 – seeks an authoritative index or meta-index server that covers it
 – recursively follows the index references until it finds the relevant base servers
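The URN format shown in the slide's example can be produced and parsed with a small sketch. The grammar is inferred from that single example, so it is an assumption: cells are joined with `+`, categories within a cell with `,`, and path levels with `.`. The sketch also assumes category names contain none of those characters or parentheses.

```python
PREFIX = "urn:InterestArea:"


def encode(area):
    """Encode a list of cells (tuples of slash-separated categories) as a URN."""
    cells = ["(" + ",".join(cat.replace("/", ".") for cat in cell) + ")"
             for cell in area]
    return PREFIX + "+".join(cells)


def decode(urn):
    """Decode an interest-area URN back into a list of cells."""
    if not urn.startswith(PREFIX):
        raise ValueError("not an interest-area URN")
    area = []
    for cell in urn[len(PREFIX):].split("+"):
        categories = cell.strip("()").split(",")
        area.append(tuple(cat.replace(".", "/") for cat in categories))
    return area


area = [("USA/OR/Portland", "Furniture"), ("USA/WA/Vancouver", "Furniture")]
print(encode(area))
```

Decoding the resulting string yields the original list of cells, so the two functions round-trip under the stated assumptions.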

18 Resource Resolution
Example: resolve the URN for the interest area [USA/OR/Portland, Music/CDs]
The client knows an authoritative meta-index server for [USA, *]
 – the client sends the query plan to the server for [USA, *]
 – this server forwards the query plan to a server for [USA, Music]
 – that server forwards it to a server that knows about [USA/OR, Music], and so on
 – until the plan reaches an index server that replaces the URN with a combination of URLs
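The forwarding chain in this example can be simulated with a small in-memory table. Everything in the table is hypothetical: the server names, the `refs` and `urls` fields, and the example URLs. Real servers would exchange the query plan over the network rather than look each other up in a dictionary.

```python
def covers(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    return (general == "*" or general == specific
            or specific.startswith(general + "/"))


def cell_covers(general_cell, specific_cell):
    """True if the first cell covers the second in every dimension."""
    return all(covers(g, s) for g, s in zip(general_cell, specific_cell))


# Hypothetical meta-index chain ending at an index server that knows the URLs.
SERVERS = {
    "meta-usa": {"area": ("USA", "*"), "refs": ["meta-usa-music"]},
    "meta-usa-music": {"area": ("USA", "Music"), "refs": ["meta-or-music"]},
    "meta-or-music": {"area": ("USA/OR", "Music"), "refs": ["index-pdx-cds"]},
    "index-pdx-cds": {
        "area": ("USA/OR/Portland", "Music/CDs"),
        "urls": ["http://seller1.example/cds.xml", "http://seller2.example/cds.xml"],
    },
}


def resolve(target_cell, start):
    """Follow references that cover the target until a server supplies URLs."""
    path = [start]
    server = SERVERS[start]
    while "urls" not in server:
        nxt = next((name for name in server["refs"]
                    if cell_covers(SERVERS[name]["area"], target_cell)), None)
        if nxt is None:
            raise LookupError("no referenced server covers the target area")
        path.append(nxt)
        server = SERVERS[nxt]
    return path, server["urls"]


path, urls = resolve(("USA/OR/Portland", "Music/CDs"), "meta-usa")
print(" -> ".join(path))
```

Each hop moves to a server with a narrower interest area that still covers the target, mirroring the [USA, *] to [USA, Music] to [USA/OR, Music] sequence on the slide.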

19 Category Servers
Category servers maintain data about the categorization hierarchies
Categorization hierarchies can be administered independently of each other
The system uses categories for index construction and query formulation, so they must be:
 – stable (country and state names change less frequently than zip codes or road names)
 – consistent (a server can rewrite USA/OR/Portland into USA/OR with a possible loss of precision but no loss of recall)
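The consistency property can be checked on toy data: rewriting a category into its parent may return extra items (lower precision) but never drops an item that the original category matched (no loss of recall). The item list and helper names below are made up for illustration.

```python
# Hypothetical items, each tagged with its most-specific location category.
ITEMS = [
    {"name": "armchair", "category": "USA/OR/Portland"},
    {"name": "lamp", "category": "USA/OR/Salem"},
    {"name": "desk", "category": "USA/WA/Vancouver"},
]


def parent(category):
    """Return the parent category, or the top category (*) for a first-level one."""
    head, sep, _ = category.rpartition("/")
    return head if sep else "*"


def matches(category):
    """Names of all items that belong to `category` or to one of its descendants."""
    return {
        item["name"] for item in ITEMS
        if category == "*"
        or item["category"] == category
        or item["category"].startswith(category + "/")
    }


narrow = matches("USA/OR/Portland")
broad = matches(parent("USA/OR/Portland"))
print(sorted(narrow), sorted(broad))
```

Here the rewritten query over USA/OR also returns the Salem lamp, which the Portland query would not have matched, while still returning everything the Portland query did.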

20 END
