Efficient processing of XPath queries with structured overlay networks
Gleb Skobeltsyn, Manfred Hauswirth, Karl Aberer
Ecole Polytechnique Fédérale de Lausanne, Switzerland
OnTheMove - OTM 2005 Federated Conferences, Agia Napa, Cyprus, 2 November 2005
Presenter: Gleb Skobeltsyn
This work was (partially) funded by the EU project BRICKS (http://www.brickscommunity.org).

Contents
Motivation & problem statement
1. P-Grid: short overview
2. Indexing strategy
 – Basic index
 – Caching strategy
3. Simulation results
Conclusions

Motivation
Complex queries are easy to answer in unstructured P2P networks (e.g., Edutella), but this approach does not scale because of its high bandwidth consumption.
Structured P2P networks typically offer logarithmic search complexity, but they require a special index.
Open question: what indexing structure can support XPath queries over a distributed XML warehouse?

Problem statement (1/2)
Problem: answer structured queries (e.g., XPath) over an XML warehouse distributed in a structured P2P network.
We assume two different indices:
 – one for indexing structure (e.g., pure XML paths);
 – one for indexing values.
In this paper we concentrate on the first one.

Problem statement (2/2)
We support XPath {*,//} queries, i.e., queries containing:
 – the child axis ("/");
 – the descendant axis ("//");
 – wildcards ("*").
Example: //A/B/*/C
We propose an indexing structure to answer such queries in a large distributed P2P XML warehouse.
Our goal is to minimize the consumed bandwidth, measured in P2P overlay hops.
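To make the query class concrete, here is a minimal Python sketch that checks whether a stored path such as "store/book/title" matches an XPath {*,//} query by translating the query into a regular expression. The helper names `query_to_regex` and `matches` are hypothetical, and the semantics are an assumption: a query without a leading "/" is anchored at the root, and the query must match the whole stored path.

```python
import re

def query_to_regex(query):
    """Translate an XPath {*,//} query into a regex over '/'-prefixed label paths.

    Assumption: a query without a leading '/' is anchored at the root and must
    match the whole stored path.
    """
    if not query.startswith("/"):
        query = "/" + query
    # Keep separators in the output; '//' is tried before '/'.
    parts = re.split(r"(//|/)", query)
    out = []
    for part in parts:
        if part == "//":
            out.append("(?:/[^/]+)*/")   # descendant axis: zero or more labels in between
        elif part == "/":
            out.append("/")              # child axis
        elif part == "*":
            out.append("[^/]+")          # wildcard: exactly one arbitrary label
        elif part:
            out.append(re.escape(part))  # ordinary element label
    return re.compile("".join(out))

def matches(query, path):
    """True if the stored path (e.g. 'store/book/title') matches the query."""
    return query_to_regex(query).fullmatch("/" + path) is not None

print(matches("//A/B/*/C", "X/A/B/Y/C"))  # True
print(matches("//A/B/*/C", "A/B/C"))      # False: '*' needs exactly one label
```

A peer that holds candidate data items can use such a check to filter them against the full query, since each data item stores its original path.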

P-Grid (1/3): introduction
P-Grid is a trie-based DHT, similar to Chord, Pastry, etc. (more info at http://www.p-grid.org/).
In P-Grid each peer is responsible for the set of binary keys that start with the peer's prefix (its path in the trie).
Routing is based on longest prefix matching, and the search cost stays logarithmic even for skewed trees.
[Figure: an example P-Grid with peers A–F, each annotated with its routing references; a query for '100' is forwarded from peer to peer until the responsible peer is found.]
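Longest-prefix routing can be sketched as follows. This is a simplified, hypothetical model rather than P-Grid's implementation: each peer keeps, for every level i of its path, references to peers whose path agrees on the first i bits and differs at bit i. Each hop therefore resolves at least one more bit of the key. The four-peer trie and the names A–D are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    name: str
    path: str                                 # binary prefix this peer is responsible for
    refs: dict = field(default_factory=dict)  # level -> peers on the other side of that bit

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(start, key):
    """Greedy longest-prefix routing; returns the names of the visited peers."""
    peer, visited = start, [start.name]
    while True:
        l = common_prefix_len(peer.path, key)
        if l == len(peer.path) or l == len(key):
            # Either the peer's path is a prefix of the key (peer is responsible),
            # or the key is a prefix of the path (peer lies in the key's sub-tree).
            return visited
        peer = peer.refs[l][0]  # jump to the other half of the trie at bit l
        visited.append(peer.name)

# Hypothetical 4-peer trie: 00*, 01*, 10*, 11*
A, B, C, D = Peer("A", "00"), Peer("B", "01"), Peer("C", "10"), Peer("D", "11")
A.refs = {0: [C], 1: [B]}
B.refs = {0: [D], 1: [A]}
C.refs = {0: [A], 1: [D]}
D.refs = {0: [B], 1: [C]}
print(route(A, "100"))  # ['A', 'C']
```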

P-Grid (2/3): storing indexing information
Information is stored in data items; a data item is a {key, data} tuple.
Each peer in a P-Grid network stores the data items whose keys start with the peer's prefix.
[Figure: an example P-Grid trie (prefixes such as 00*, 010*, 011*, 10*, 1100*, 1101*, 111*) in which each peer stores the data items whose keys fall under its prefix, e.g. key 00001 with data1 and key 0011 with data2.]
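The storage rule can be illustrated with a small sketch: group data items by the peer path that is a prefix of each key. This assumes a complete, prefix-free set of peer paths and one peer per path; replication of a path across several peers is ignored. The function name `assign` and the sample items are hypothetical.

```python
def assign(items, peer_paths):
    """Group {key: data} items under the peer path that is a prefix of each key."""
    store = {p: {} for p in peer_paths}
    for key, data in items.items():
        for p in peer_paths:
            if key.startswith(p):
                store[p][key] = data
    return store

peers = ["00", "010", "011", "10", "110", "111"]
items = {"00001": "data1", "0011": "data2", "11000": "x", "110010": "y", "11011": "z"}
store = assign(items, peers)
print(store["00"])   # {'00001': 'data1', '0011': 'data2'}
```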

P-Grid (3/3): order-preserving hash function
Keys are generated with P-Grid's order-preserving hash function h(). It also preserves prefixes: for example, h("comp") is a prefix of h("computer"), of h("complexity") and of every h("comp*").
Routing to the key h("comp") may therefore end in one of two cases:
1. A single peer is responsible for h("comp"), e.g. a peer with prefix h("co")* that also stores h("complexity"), h("computer"), h("corporation"), …
2. A whole sub-tree is responsible for h("comp"), e.g. peers with prefixes h("compl")* and h("compu")*, which store h("complexity") and h("computer") respectively.
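A simple stand-in with both properties encodes the UTF-8 bytes of a string as a bit string. This is not P-Grid's actual hash function, only an illustration. Because every byte takes exactly 8 bits, h(s) is a prefix of h(s + t), and bit-string order follows string order. Such keys are heavily skewed, which is why the logarithmic cost on skewed tries matters.

```python
def h(s):
    """Order- and prefix-preserving stand-in hash: UTF-8 bytes of s as a bit string."""
    return "".join(format(b, "08b") for b in s.encode("utf-8"))

words = ["corporation", "computer", "complexity", "comp"]
print(h("comp"))  # '01100011011011110110110101110000'
print(all(h(w).startswith(h("comp")) for w in ["computer", "complexity"]))  # True
```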

Basic Index (1/4): introduction
We index the XML paths found in the documents. Given a path $P = l_1/l_2/\dots/l_m$, $m$ data items are stored in P-Grid, using the hashes of the following sub-paths (suffixes) as keys:
$\{l_1/l_2/\dots/l_m,\ l_2/\dots/l_m,\ \dots,\ l_m\}$
Each data item stores the original path and the URI of the document.
Example: for the path P = "store/book/title", 3 data items are created:
Key | Original path | URI
h("store/book/title") | "store/book/title" | link to the document
h("book/title") | "store/book/title" | link to the document
h("title") | "store/book/title" | link to the document
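Generating the basic-index data items for one path is a one-liner over its suffixes. In this sketch the keys are shown before hashing; in the index they would be passed through h(). The function name and the URI are hypothetical.

```python
def basic_index_items(path, uri):
    """Return one (key, original path, URI) item per suffix of the path.

    Keys are shown unhashed; the index would store h(key).
    """
    labels = path.split("/")
    return [("/".join(labels[i:]), path, uri) for i in range(len(labels))]

for item in basic_index_items("store/book/title", "http://example.org/books.xml"):
    print(item)
# ('store/book/title', 'store/book/title', 'http://example.org/books.xml')
# ('book/title', 'store/book/title', 'http://example.org/books.xml')
# ('title', 'store/book/title', 'http://example.org/books.xml')
```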

Basic Index (2/4): search
Given an XPath query $Q = l_1 s_1 l_2 \dots s_{k-1} l_k$, where $s_i \in \{/, //, *\}$, we define $q_B$ as the longest sequence of labels separated only by "/" (the first one, if several have the same length).
Example: for "A//C/D//E", $q_B$ = "C/D".
Every path that matches Q contains $q_B$, so one of its indexed suffixes starts with $q_B$. Because h() preserves prefixes, the key of that suffix starts with $h(q_B)$. The query is therefore answered by routing to the peer(s) responsible for $h(q_B)$. There are 2 cases:
 – a single peer is responsible for $h(q_B)$: it answers the query;
 – a set (sub-tree) of peers is responsible for $h(q_B)$: a shower broadcast is executed over this set.
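Extracting $q_B$ can be sketched by splitting the query into runs of labels joined only by child steps. A descendant step ("//") or a wildcard ("*") ends a run. The helper names are hypothetical, and ties are broken by taking the first longest run, which matches the "first longest sequence" wording. The sketch assumes the query contains at least one label.

```python
def child_runs(query):
    """Split a query into runs of labels connected only by child steps '/'."""
    runs, current = [], []
    for tok in query.split("/"):
        if tok in ("", "*"):      # '' comes from '//' (or a leading '/'); '*' is a wildcard
            if current:
                runs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        runs.append(current)
    return runs

def q_b(query):
    # max() returns the first maximal run, i.e. the first longest sequence
    return "/".join(max(child_runs(query), key=len))

print(q_b("A//C/D//E"))   # C/D
print(q_b("//A/B/*/C"))   # A/B
```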

Basic index (3/4): shower broadcast
A shower broadcast propagates a message (query) to all peers in a sub-tree:
 – it is a recursive algorithm that works in parallel;
 – each peer in the sub-tree is visited only once.
[Figure: a shower broadcast spreading through a sub-tree of an example P-Grid trie.]
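The slides do not give the shower algorithm in detail. The following is a hedged sketch of one recursive scheme with the two listed properties on a complete, prefix-free trie. A peer inside the target sub-tree forwards the message to the other side of every deeper bit of its own path, and each recipient covers only the levels below the one it was reached at. The sub-trees handed out are disjoint, so every peer is visited once. `ref` is a hypothetical stand-in for a routing-table lookup.

```python
def ref(peers, path, level):
    """Hypothetical routing-table lookup: a peer agreeing with `path` on bits
    [0, level) and differing at bit `level`."""
    target = path[:level] + ("1" if path[level] == "0" else "0")
    return next(p for p in peers if p.startswith(target))

def shower(peers, start, prefix):
    """Deliver a message to every peer in the sub-tree of `prefix`, starting at a
    peer `start` inside that sub-tree. Returns peers in visiting order."""
    visited = []

    def deliver(peer, from_level):
        visited.append(peer)
        # In a real network these forwards happen in parallel.
        for level in range(from_level, len(peer)):
            deliver(ref(peers, peer, level), level + 1)

    deliver(start, len(prefix))
    return visited

peers = ["00", "010", "011", "10", "1100", "1101", "111"]
print(shower(peers, "1100", "11"))  # ['1100', '111', '1101']
```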

Basic Index (4/4): properties
The basic index is sufficient to answer XPath {*,//} queries.
The shower broadcast is efficient in time and distributes the computation, but it consumes bandwidth.
An improvement is to cache the results of the most frequent queries locally and avoid shower broadcasts for them.

Caching strategy (1/4): introduction
Types of queries:
1. Queries that can be answered locally by one peer. Example: "A/B/C//E" at the peer responsible for h("A/B").
2. Queries that require an additional broadcast and consist of a single sub-path ($q = q_B$). Example: "A" at the peer responsible for h("A/B").
3. Queries that require an additional broadcast and contain more than one sub-path ($q \neq q_B$). Example: "A//C//E" at the peer responsible for h("A/C").
We suggest caching the results of the most popular type-3 queries to reduce the number of shower broadcasts.
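The three query types can be told apart with a small classifier, sketched below under a simplifying assumption. The peer's responsibility is written as an unhashed path prefix such as "A/B", standing for h("A/B"). This is equivalent at character granularity because h() preserves prefixes, although real P-Grid prefixes are binary. All names are hypothetical.

```python
def child_runs(query):
    """Runs of labels connected only by '/'; '//' and '*' end a run."""
    runs, current = [], []
    for tok in query.split("/"):
        if tok in ("", "*"):
            if current:
                runs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        runs.append(current)
    return runs

def q_b(query):
    return "/".join(max(child_runs(query), key=len))

def query_type(query, peer_prefix):
    """Classify a query at the peer responsible for h(peer_prefix)."""
    qb = q_b(query)
    if qb.startswith(peer_prefix):
        return 1                  # this peer covers every key starting with q_B
    if not peer_prefix.startswith(qb):
        raise ValueError("peer is outside the region responsible for h(q_B)")
    if query.strip("/") == qb:
        return 2                  # broadcast needed, query is a single sub-path
    return 3                      # broadcast needed, several sub-paths

print(query_type("A/B/C//E", "A/B"), query_type("A", "A/B"), query_type("A//C//E", "A/C"))
# 1 2 3
```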

Caching strategy (2/4): search
Let $P = (P_{l_1}, P_{l_2}, \dots, P_{l_k})$ be the sub-paths of the query, with $P_{l_1} = q_B$. The key used for routing is no longer $h(q_B)$ but $h(q_C)$, where $q_C = \mathrm{concat}(P_{l_1}, P_{l_2}, \dots, P_{l_k})$. Since $q_C$ starts with $q_B$, $h(q_C)$ still has $h(q_B)$ as a prefix.
[Figure: example of P and $q_C$ for a query over the labels A, C, D, E.]
The query is routed to the relevant peer, which may (or may not) be able to answer it from its cache. If the query is of type 3 and cannot be answered locally, its result can be cached. Similarly, an existing cache entry can be deleted.
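Building $q_C$ can be sketched by moving $q_B$ to the front of the query's sub-paths. Two details are assumptions, since the slides only say "concat". First, the remaining sub-paths keep their order from the query. Second, they are joined with a hypothetical separator `SEP`. Because $q_B$ comes first, the resulting key still starts with $q_B$.

```python
SEP = "|"  # hypothetical separator; the slides only say "concat"

def child_runs(query):
    """Runs of labels connected only by '/'; '//' and '*' end a run."""
    runs, current = [], []
    for tok in query.split("/"):
        if tok in ("", "*"):
            if current:
                runs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        runs.append(current)
    return runs

def q_c(query):
    runs = child_runs(query)
    i = max(range(len(runs)), key=lambda j: len(runs[j]))  # index of first longest run (q_B)
    ordered = [runs[i]] + runs[:i] + runs[i + 1:]           # q_B first, others in query order
    return SEP.join("/".join(r) for r in ordered)

print(q_c("A//C/D//E"))  # C/D|A|E
```

Different queries with the same sub-paths (e.g. "A//B" and "A/*/B") map to the same $q_C$ here, so a cache entry would also need to record the original query.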

Caching strategy (3/4): example
[Figure: caching example for the query A//C/D//E.]

Caching strategy (4/4): analysis
A query is profitable to cache if:
$$\mathit{UpdateCost} \cdot \mathit{UpdateRate}(\mathit{subtree}) < \mathit{SearchCost}(\mathit{subtree}) \cdot \mathit{SearchRate}(\mathit{query})$$
where:
 – UpdateCost is the cost of one cache update ($\log N$ for $N$ peers);
 – UpdateRate is the average update rate in the sub-tree (gathered from neighbours);
 – SearchCost is the cost of a search (routing + shower broadcast);
 – SearchRate is the query's frequency (estimated locally).
The indexing strategy thus adapts to the search/update ratio and tries to keep the messaging costs minimal.
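The profitability test can be written as a small function. UpdateCost follows the slide ($\log N$, here with base 2 as an assumption). SearchCost is a hypothetical model: about $\log_2 N$ routing hops plus one message for each additional peer reached by the shower broadcast.

```python
import math

def update_cost(n_peers):
    # Slide: one cache update costs about log N messages (base 2 assumed here).
    return math.log2(n_peers)

def search_cost(n_peers, subtree_size):
    # Assumption: log2(N) routing hops plus one message per additional peer
    # in the sub-tree (the shower broadcast visits each peer once).
    return math.log2(n_peers) + (subtree_size - 1)

def profitable_to_cache(n_peers, subtree_size, update_rate, search_rate):
    return update_cost(n_peers) * update_rate < search_cost(n_peers, subtree_size) * search_rate

# N = 1024 peers, sub-tree of 21 peers, 5 updates per time unit:
print(profitable_to_cache(1024, 21, update_rate=5, search_rate=1))  # False (50 vs 30)
print(profitable_to_cache(1024, 21, update_rate=5, search_rate=2))  # True  (50 vs 60)
```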

Simulations (1/4): testbed
The simulator is a Java application that stores data locally in a DBMS.
Data set: 50 XML documents, more than 5k unique paths, about 20k data items.
In each experiment we used 10k queries randomly generated from these paths.

Simulations (2/4): search cost
Parameter t: the fraction of cacheable queries. All cacheable queries are cached.

Simulations (3/4): search cost
Setup: 1000 peers, t = 0.5 (50% of the queries can be cached).

Simulations (4/4): average costs
Setup: 1000 peers, t = 0.5, Zipf-distributed queries with s = 1.2.
For a given search/update ratio there is an optimal point at which the average cost is lowest.
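A Zipf-skewed query workload like the one in this setup can be produced by weighting the k-th most popular query by $1/k^s$. This is a hedged sketch and not necessarily how the simulator generated its queries. The function names and the fixed seed are hypothetical; `random.Random.choices` is the standard-library weighted sampler.

```python
import random

def zipf_weights(n, s):
    """P(rank k) proportional to 1 / k**s for k = 1..n, normalized to sum to 1."""
    raw = [1.0 / k ** s for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def sample_queries(queries, n_samples, s=1.2, seed=0):
    """Draw queries so that the first list entry is the most popular."""
    rng = random.Random(seed)
    return rng.choices(queries, weights=zipf_weights(len(queries), s), k=n_samples)

queries = ["A//C/D//E", "//A/B/*/C", "A//C//E", "store//title"]
workload = sample_queries(queries, 10_000)
```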

Conclusions
We proposed an efficient solution for indexing XML structure in structured overlay networks.
The presented solution can be used in a P2P XML query engine to answer structural (sub)queries.

Thank you for your attention! Questions?
