1 Early Profile Pruning on XML-aware Publish-Subscribe Systems
Mirella M. Moro, Petko Bakalov, Vassilis J. Tsotras
University of California
VLDB 2007
Presented by Lee Jae-won (SNU)

2 Introduction
• Publish-subscribe (pub-sub) applications are an important class of asynchronous, content-based dissemination systems
  – Notification websites: users can subscribe to events of interest and are automatically notified when relevant events arrive in the system
• Events are announced by messages generated outside the system by third-party applications, referred to as publishers
• These messages are then selectively delivered to the interested subscribers, who have announced their interest by submitting profiles
Copyright 2006 by CEBT | IDS Lab. Seminar | Center for E-Business Technology

3 Introduction
• Architecture of a pub-sub system
• Matching process
  – Finding (filtering) which messages satisfy which profile subscriptions
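To make the roles concrete, here is a deliberately naive sketch of the matching step. It assumes a profile is an arbitrary predicate over a message represented as a dict; the class and method names (`Broker`, `subscribe`, `publish`) are hypothetical. Every published message is tested against every profile, which is the linear cost the matching strategies discussed in this talk try to reduce.

```python
from typing import Callable, Dict, List


class Broker:
    """Toy pub-sub broker (hypothetical; profiles are assumed to be predicates)."""

    def __init__(self) -> None:
        # subscriber name -> profile predicate over an incoming message
        self.profiles: Dict[str, Callable[[dict], bool]] = {}

    def subscribe(self, subscriber: str, profile: Callable[[dict], bool]) -> None:
        # A subscriber announces its interest by submitting a profile.
        self.profiles[subscriber] = profile

    def publish(self, message: dict) -> List[str]:
        # Matching process: find which profiles the incoming message satisfies,
        # by brute force over all registered profiles.
        return [name for name, profile in self.profiles.items() if profile(message)]
```

For example, after subscribing `alice` to `topic == "sports"` and `bob` to `topic == "weather"`, publishing `{"topic": "sports"}` delivers the message only to `alice`.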

4 Introduction
• Three predominant strategies to design the matching process
  – Standard relational approach: profiles and messages are translated to the relational model, so the matching can be expressed as join operations
  – Index techniques: profiles are aggregated using some indexing technique; the matching reads the input message and traverses the index in order to select the satisfied profiles
  – Finite State Machine (FSM): existing FSM approaches work top-down
• This paper proposes a bottom-up approach, because an XML document has its more selective elements located at its leaves
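A toy illustration of the relational strategy, using Python's built-in `sqlite3`. The schema is an assumption made for illustration (the slide does not specify one): each profile is a single absolute path, each message is stored as one row per element path, and "match" means path equality. The table names `profile` and `message` and the function `match_by_join` are hypothetical.

```python
import sqlite3


def match_by_join(profiles, messages):
    """Match messages to profiles with a relational join.

    Assumptions (illustrative only): a profile is one absolute path, a message
    is stored as one row per element path, and a match is path equality.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE profile (pid TEXT, path TEXT)")
        conn.execute("CREATE TABLE message (mid TEXT, path TEXT)")
        conn.executemany("INSERT INTO profile VALUES (?, ?)", profiles)
        conn.executemany("INSERT INTO message VALUES (?, ?)", messages)
        # The whole matching step is a single equi-join on the path column.
        return conn.execute(
            "SELECT DISTINCT m.mid, p.pid FROM message AS m "
            "JOIN profile AS p ON m.path = p.path ORDER BY m.mid, p.pid"
        ).fetchall()
    finally:
        conn.close()
```

Real profiles with descendant axes or predicates would need a richer encoding than plain path strings; this only shows why matching reduces to a join once both sides are relations.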

5 Bottom-up Filtering FSM (BUFF)
• If we change the order in which the document is evaluated, we must also change the order in which the query is evaluated
• For query 1:
  – The NFA executes transitions to states 1, 2, and 3 eleven times before reaching the final state 4
  – BUFF executes transitions to states 1, 2, and 3 only once, and then reaches the final state 4

6 BUFF – Automaton Matching Process
• Algorithm
  – Keep a runtime stack (RS) that stores the document path currently being processed
  – For each opening tag in the XML document, the respective element e is pushed onto RS
  – For each closing tag, an element e is popped from RS

7 BUFF – Automaton Matching Process
• Example
  – When a final state (in this case, state 4) is reached, the algorithm ends
  – The document and the query (BUFF) are matched
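The runtime-stack loop can be sketched with the standard library's `xml.etree.ElementTree.iterparse`, whose `start`/`end` events play the role of opening and closing tags. This is a simplified stand-in, not the BUFF automaton itself: it assumes linear, absolute path queries with child steps only (e.g. `/a/b/c`), and instead of compiling an automaton it compares each query's steps in reverse, leaf step first, against RS from the top down. `bottom_up_match` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET


def bottom_up_match(xml_text, queries):
    """Return the ids of the queries matched by the document.

    Assumption: each query is an absolute, child-only path such as "/a/b/c".
    This is a simplified stand-in for the BUFF automaton, not the paper's code.
    """
    # Store each query's steps leaf-first, i.e. in bottom-up order.
    bottom_up = {qid: q.strip("/").split("/")[::-1] for qid, q in queries.items()}
    rs = []          # runtime stack RS: path of the element being processed
    matched = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            rs.append(elem.tag)                  # opening tag: push e onto RS
            for qid, steps in bottom_up.items():
                if qid in matched or len(steps) != len(rs):
                    continue
                # Compare the leaf step first, walking down RS toward the root;
                # all() stops at the first mismatch.
                if all(step == tag for step, tag in zip(steps, reversed(rs))):
                    matched.add(qid)             # "final state" reached
        else:
            rs.pop()                             # closing tag: pop e from RS
    return matched
```

Because the leaf step is compared first, a query whose last step names an element that is absent at the current position is rejected after a single comparison, which is the intuition behind evaluating bottom-up.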

8 Bounding-based XML Filtering (BoXFilter)
• BoXFilter is an XML filtering approach based on index-based filtering techniques
• BoXFilter core module

9 Prufer Sequence – Background
• Example
  – Initially, vertex 1 is the leaf with the smallest label, so it is removed first and "4" is put in the Prufer sequence
  – Vertices 2 and 3 are removed next, so "4" is added twice more
  – Vertex 4 is now a leaf and has the smallest label, so it is removed and we append "5" to the sequence
  – We stop when only two vertices are left, so a tree with n vertices yields a sequence of length n − 2 (here, 4)
  – The tree's sequence is 4445
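The leaf-removal procedure in the example can be written with a min-heap of the current leaves. This sketch handles an unrooted tree given as an edge list with comparable vertex labels; the XML-specific variant used to encode profiles is not detailed on the slide, and `prufer_sequence` is a hypothetical name. Assuming the example tree has edges 1–4, 2–4, 3–4, 4–5, 5–6 (consistent with the removal order described above), it reproduces the sequence 4445.

```python
import heapq
from collections import defaultdict


def prufer_sequence(edges):
    """Prufer sequence of an unrooted tree given as an edge list.

    Repeatedly remove the leaf with the smallest label and record its
    neighbour, until only two vertices remain (so the result has n - 2 items).
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]
    heapq.heapify(leaves)                    # min-heap: smallest label on top
    seq = []
    remaining = len(adj)
    while remaining > 2:
        leaf = heapq.heappop(leaves)
        (neighbor,) = adj.pop(leaf)          # a leaf has exactly one neighbour
        seq.append(neighbor)
        adj[neighbor].discard(leaf)
        remaining -= 1
        if len(adj[neighbor]) == 1:          # the neighbour just became a leaf
            heapq.heappush(leaves, neighbor)
    return seq
```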

10 BoXFilter
• Sequence Envelope
  – The BoXFilter index tree is based on the concept of a Sequence Envelope
  – Assume that all profile sequences have the same length l
  – Consider a set of k Prufer sequences of profiles, S_1, …, S_k
  – We derive two new sequences (of length l), called the upper and the lower bound, U and L respectively:
    $L_i = \min(S_{1,i}, \dots, S_{k,i})$ and $U_i = \max(S_{1,i}, \dots, S_{k,i})$
  – Hence $L_i \le S_{j,i} \le U_i$ for every position $i \in \{1, \dots, l\}$ and every sequence $j \in \{1, \dots, k\}$
  – The sequence envelope is SE ≡ (L, U)
  – Example: L = ABCABABABAB, U = DEDEEEDEDED
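The envelope definition translates directly into a position-wise min and max. This sketch assumes each Prufer sequence is a string of single-character labels compared alphabetically, as in the slide's example; `sequence_envelope` and `inside` are hypothetical names.

```python
def sequence_envelope(seqs):
    """Return SE = (L, U) for equal-length sequences S_1..S_k.

    Assumption: sequences are strings of single-character labels that are
    compared alphabetically, as in the slide's example.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length l")
    columns = list(zip(*seqs))                     # columns[i] = (S_1[i], ..., S_k[i])
    lower = "".join(min(col) for col in columns)   # L_i = min over the column
    upper = "".join(max(col) for col in columns)   # U_i = max over the column
    return lower, upper


def inside(seq, envelope):
    """True if L_i <= seq_i <= U_i at every position i."""
    lower, upper = envelope
    return len(seq) == len(lower) and all(
        lo <= ch <= hi for lo, ch, hi in zip(lower, seq, upper)
    )
```

For instance, the envelope of `ABCA`, `DBCE`, and `BECA` is `("ABCA", "DECE")`, and each of the three sequences lies inside it.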

11 BoXFilter
• Sequence Envelope

12 Filtering Algorithms
• We assume that 8 profiles, represented in XML format, have been submitted to the system
• The profiles are transformed into Prufer sequences, which are then inserted into the BoXFilter tree

13 Filtering Algorithms
• For each index node, the figure shows the sequence envelope that contains the envelopes of all its children
• Input document D = ABCFABABABABF

14 Filtering Algorithm
• Sequential Processing (input document D = ABCFABABABABF)
  – We start by comparing document D with the sequence envelope at the root of the BoXFilter tree
  – Since there is a match, we examine the root's three children
  – Only the first child (node 2) yields a subsequence match, so we ignore the subtrees of the second and third children
  – We examine the children of node 2, which are leaf nodes
  – We perform subsequence matching between the document and each of the strings in these leaf nodes
  – There is a match between the document and leaf number 1
• Batch Processing
  – The matching process is performed by joining the BoXFilter tree with a tree built over the documents in the same (BoXFilter) form
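The sequential processing above can be sketched as a recursive descent that prunes a subtree as soon as the document cannot match its envelope. The following are assumptions, not details from the slides: a document matches an envelope (L, U) if it contains a subsequence d_1 … d_l with L_i ≤ d_i ≤ U_i; a profile matches if its sequence is a plain subsequence of the document; and each node's envelope is the position-wise min/max over its children. Under these assumptions the pruning is safe, because any profile that is a subsequence of the document also satisfies every envelope above it. `Node`, `merge`, `envelope_subsequence`, and `filter_document` are hypothetical names.

```python
from dataclasses import dataclass, field


def merge(envelopes):
    """Smallest envelope containing every (L, U): min of the Ls, max of the Us."""
    lowers, uppers = zip(*envelopes)
    return ("".join(min(col) for col in zip(*lowers)),
            "".join(max(col) for col in zip(*uppers)))


@dataclass
class Node:
    profiles: list = field(default_factory=list)  # leaf: (profile_id, sequence)
    children: list = field(default_factory=list)  # internal: child Node objects

    def __post_init__(self):
        # A single sequence s is the degenerate envelope (s, s).
        parts = [(s, s) for _, s in self.profiles] + [c.env for c in self.children]
        self.env = merge(parts)


def envelope_subsequence(doc, envelope):
    """True if doc has a subsequence d_1..d_l with L_i <= d_i <= U_i.

    A greedy left-to-right scan suffices because each position is tested
    only against its own bounds.
    """
    lower, upper = envelope
    i = 0
    for ch in doc:
        if i < len(lower) and lower[i] <= ch <= upper[i]:
            i += 1
    return i == len(lower)


def filter_document(node, doc, out=None):
    """Collect ids of profiles whose sequence is a subsequence of doc."""
    out = [] if out is None else out
    if not envelope_subsequence(doc, node.env):
        return out                              # prune this whole subtree
    for pid, seq in node.profiles:              # leaf: test each profile string
        if envelope_subsequence(doc, (seq, seq)):
            out.append(pid)
    for child in node.children:                 # internal: descend into children
        filter_document(child, doc, out)
    return out
```

With one leaf holding `P1 = ABB` and `P2 = ACB` and another holding `P3 = DEE` and `P4 = EED`, the slide's document `ABCFABABABABF` returns `["P1", "P2"]`; the second leaf is pruned at its envelope because D contains no label between D and E. An envelope match only shows that a match is possible, so the leaf-level test is still needed.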

15 Experiments
• Setup
  – Datasets with 1,000, 10,000, and 100,000 small documents (up to 8 KB each)
  – Queries were specified for these datasets with paths of 3 to 10 elements
• Varying the Number of Documents

16 Experiments
• Varying the Number of Queries
  – NFA and BUFF scale linearly with the number of documents and queries evaluated
  – BoXFilter has constant performance for a relatively small number of queries

17 Experiments
• Varying the Selectivity
  – Selectivity measures how many documents satisfy at least one of the queries

18 Experiments
• Batching BoXFilter
  – B-BoXFilter is advantageous when the time spent on the extra step (tree creation) is less than the time it saves in matching

19 Conclusion
• We proposed an FSM-based approach (BUFF) that evaluates documents in bottom-up order
• We introduced the idea of early profile pruning and proposed a sequence-based index (BoXFilter) that prunes out queries very efficiently
  – First, documents and queries are transformed into sequences and grouped into envelopes
  – Then, queries can be pruned out by evaluating the lower and upper bounds of their envelopes
  – This is the first time that the concept of envelopes is employed for XML query processing

