High-Performance XML Filtering with YFilter
A publish-subscribe system for filtering XML streams, as used in SDI (selective dissemination of information) systems
Problem with XFilter
- filtering large numbers of query specifications (a separate Finite State Machine for each query)
YFilter uses a Nondeterministic Finite Automaton (NFA)
- exploits commonality among path queries
- combines all queries in a single machine
Constructing a Combined NFA
- the four basic location steps in XPath are "/a", "//a", "/*" and "//*"
- construct an NFA fragment for each of these steps and combine the fragments
NFA fragments of the four basic location steps (figure: one state diagram each for /a, //a, /* and //*)
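
The four fragments can also be written down as transition lists. This is a minimal sketch, assuming the usual construction in which a "//" step adds an empty transition into a helper state that loops on any element; the function name `fragment`, the label `#eps` and the triple layout are illustrative choices, not taken from the slides.

```python
# Label for the empty (epsilon) transition. '#' cannot occur in an XML
# element name, so this label never clashes with a real element.
EPSILON = "#eps"

def fragment(step):
    """Return the NFA fragment for one location step as (source, label, target) triples.

    State 0 is the entry state of the fragment; the highest-numbered
    state is its exit state.
    """
    if step.startswith("//"):
        name = step[2:]
        # "//name": an epsilon transition into a helper state that loops on
        # any element, followed by the transition on the name test itself.
        return [(0, EPSILON, 1), (1, "*", 1), (1, name, 2)]
    name = step[1:]
    # "/name": a single transition; "*" is kept as the wildcard label.
    return [(0, name, 1)]

for step in ("/a", "//a", "/*", "//*"):
    print(step, fragment(step))
```
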
Combining NFA Fragments (figure: example fragments with a, b and * transitions merged into one machine)
Important note: new queries can easily be added to an existing system
NFA Example
- Q1=/a/b
- Q2=/a/c
- Q3=/a/b/c
- Q4=/a//b/c
- Q5=/a/*/c
- Q6=/a//c
- Q7=/a/*/*/c
- Q8=/a/b/c
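
The example queries can be merged with a small sketch that inserts each query into one shared machine and reuses an existing transition wherever a prefix is already present. The names `build_nfa` and `QUERIES`, the `#eps` label and the dict-based layout are assumptions made for illustration; a "//" step is modelled as an empty transition into a helper state that loops on any element.

```python
import re

# Label for the empty (epsilon) transition; '#' cannot occur in an XML name.
EPSILON = "#eps"

QUERIES = {
    "Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
    "Q5": "/a/*/c", "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c",
}

def build_nfa(queries):
    """Merge all path queries into one NFA, sharing common prefixes."""
    transitions = [{}]   # transitions[state_id] maps symbol -> next state_id
    loops = set()        # "//" helper states that loop on any element
    accepting = {}       # state_id -> IDs of the queries accepted there

    def step(state, symbol):
        # Reuse the transition if it exists, otherwise create a new state.
        if symbol not in transitions[state]:
            transitions.append({})
            transitions[state][symbol] = len(transitions) - 1
        return transitions[state][symbol]

    for qid, path in queries.items():
        state = 0
        for axis, name in re.findall(r"(//|/)([^/]+)", path):
            if axis == "//":
                state = step(state, EPSILON)
                loops.add(state)
            state = step(state, name)
        accepting.setdefault(state, []).append(qid)
    return transitions, loops, accepting

transitions, loops, accepting = build_nfa(QUERIES)
print(len(transitions), "states;", "accepting:", accepting)
```

For the eight queries above the sketch ends up with 13 states, and Q3 and Q8 share one accepting state because they are the same path. Adding a further query is just one more pass through the loop body, which is why new queries are cheap to add.
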
Some Comments on Efficiency
- reduction in machine size
- path sharing
- easy addition of new queries
Implementation
Creating a data structure for each state with
- ID of the state
- type information
- hash table for transitions [symbol | ID]
- for accepting states, ID list of queries
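
The per-state record listed above could look roughly as follows. This is a sketch only: the field names, the `kind` labels and the hand-wired example states (for Q1=/a/b and Q2=/a/c) are assumptions, since the slides name the four parts of the structure but not their concrete form.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    state_id: int                                     # ID of the state
    kind: str = "normal"                              # type information (labels are assumed)
    transitions: dict = field(default_factory=dict)   # hash table: symbol -> target state ID
    query_ids: list = field(default_factory=list)     # filled only for accepting states

# Hand-wired states for the two queries Q1=/a/b and Q2=/a/c.
states = {
    0: State(0, transitions={"a": 1}),
    1: State(1, transitions={"b": 2, "c": 3}),
    2: State(2, kind="accepting", query_ids=["Q1"]),
    3: State(3, kind="accepting", query_ids=["Q2"]),
}

def follow(state_id, symbol):
    """Look up one transition; None means no transition exists for that symbol."""
    return states[state_id].transitions.get(symbol)

target = follow(follow(0, "a"), "b")
print(states[target].query_ids)
```

Keeping the transitions in a hash table means that following an element name is a single lookup per active state, independent of how many queries share that state.
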
NFA Implementation
Execution
Execution in an event-driven fashion:
- start of document
- start of element
- end of element
Run-time stack for backtracking
Important note: NFA execution continues until all potential accepting states have been reached
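
The three events and the run-time stack can be sketched with Python's standard SAX parser. This is an illustration under stated assumptions, not the YFilter implementation: the names `build_nfa`, `NFAFilter`, `filter_document` and the `#eps` label are hypothetical, a compact NFA builder is included so that the example runs on its own, and predicates are ignored.

```python
import re
import xml.sax

# Label for the empty (epsilon) transition; '#' cannot occur in an XML name.
EPSILON = "#eps"

def build_nfa(queries):
    """Merge all path queries into one NFA, sharing common prefixes."""
    transitions, loops, accepting = [{}], set(), {}

    def step(state, symbol):
        if symbol not in transitions[state]:
            transitions.append({})
            transitions[state][symbol] = len(transitions) - 1
        return transitions[state][symbol]

    for qid, path in queries.items():
        state = 0
        for axis, name in re.findall(r"(//|/)([^/]+)", path):
            if axis == "//":
                state = step(state, EPSILON)
                loops.add(state)  # helper state that stays active at any depth
            state = step(state, name)
        accepting.setdefault(state, []).append(qid)
    return transitions, loops, accepting

class NFAFilter(xml.sax.ContentHandler):
    def __init__(self, queries):
        super().__init__()
        self.transitions, self.loops, self.accepting = build_nfa(queries)
        self.stack, self.matched = [], set()

    def startDocument(self):
        self.stack = [{0}]            # only the initial state is active
        self.matched = set()

    def startElement(self, name, attrs):
        targets, seen = set(), set()
        pending = list(self.stack[-1])
        while pending:
            state = pending.pop()
            if state in seen:
                continue
            seen.add(state)
            table = self.transitions[state]
            for symbol in (name, "*"):        # element name and wildcard
                if symbol in table:
                    targets.add(table[symbol])
            if state in self.loops:           # self-loop of a "//" helper state
                targets.add(state)
            if EPSILON in table:              # process the "//" helper state right away
                pending.append(table[EPSILON])
        # Do not stop at the first accepting state: collect every query reached.
        for state in targets:
            self.matched.update(self.accepting.get(state, []))
        self.stack.append(targets)

    def endElement(self, name):
        self.stack.pop()                      # backtrack to the parent's active states

def filter_document(queries, xml_bytes):
    handler = NFAFilter(queries)
    xml.sax.parseString(xml_bytes, handler)
    return sorted(handler.matched)

QUERIES = {
    "Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
    "Q5": "/a/*/c", "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c",
}
print(filter_document(QUERIES, b"<a><b><c/></b></a>"))
```

For the document `<a><b><c/></b></a>` the sketch reports Q1, Q3, Q4, Q5, Q6 and Q8. Q2 and Q7 are not matched because there is no `c` directly under `a` and no `c` at depth four.
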
Predicate Evaluation
- value-based predicates
- nested paths
Methods:
- Inline
- Selection Postponed (SP)
Performance Results
- YFilter is faster than XFilter and the hybrid approach
- cost and machine-size overheads are small
- for value-based predicates, the SP approach was found to perform much better than the Inline approach
Performance Test (figure: multi-query processing time (MQPT) in ms plotted against the number of queries (x1000))
Conclusions
- YFilter provides high-performance XML filtering
- the dominant costs are document parsing and result collection