Spring 2005, Daria Barger, DB Seminar: Efficient Incremental Validation of XML Documents. Denilson Barbosa, Alberto O. Mendelzon, Leonid Libkin, Laurent Mignet.


1 Spring 2005, Daria Barger, DB Seminar
Efficient Incremental Validation of XML Documents
Denilson Barbosa, Alberto O. Mendelzon, Leonid Libkin, Laurent Mignet, Marcelo Arenas
Presented by Daria Barger

2 Outline
• Introduction
• Types of constraints
• Update operations
• Incremental validation
• Experiments
• Conclusions
• Future work

3 Introduction
• The problems of storing and querying XML documents have attracted a great deal of interest.
• Other aspects of XML data management, however, have not yet been satisfactorily explored.
• Among them is the problem of checking that documents are valid with respect to their specifications, and that they remain valid after updates.

4 DTD
• One popular form of XML document specification is the Document Type Definition (DTD).
• A DTD D is a grammar that defines a set of documents L(D).
• Each document in L(D) is said to be valid with respect to D.

5 The Validation Problem
The validation problem: given a DTD D and an XML document X, is it the case that X ∈ L(D)?
The incremental validation problem: let U be some update operation. Given X ∈ L(D), is it the case that U(X) ∈ L(D)?

6 Validation of structural constraints
Elements are declared in a DTD by rules of the form <!ELEMENT e E>, where e is the element name and E is its content model.
• Content model is a regular expression E over element names: the element is valid iff the string formed by concatenating the names of its child elements belongs to L(E), the language denoted by E.
• Content model is #PCDATA: validation is trivial.
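The membership test on this slide can be sketched in a few lines. The example below is only an illustration: it assumes single-letter element names so that Python's `re` module can stand in for the content-model automaton, and the function name `content_is_valid` and the sample model are hypothetical, not taken from the presentation.

```python
import re

def content_is_valid(child_names, content_model):
    """Return True if the concatenated child names belong to L(content_model).

    Assumption: every child name is a single letter, so a Python regex
    can play the role of the DTD content model E.
    """
    word = "".join(child_names)
    return re.fullmatch(content_model, word) is not None

# Hypothetical content model E = a(b*|cb*)
model = r"a(b*|cb*)"
print(content_is_valid(["a", "b", "b"], model))  # children a b b
print(content_is_valid(["a", "c", "b"], model))  # children a c b
print(content_is_valid(["b", "a"], model))       # children b a
```

A real validator would compile each content model once into an automaton over element names rather than over characters.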

7 Validation of attributes
Attribute validation is trivial, except for the ID and IDREF attribute types. In a valid XML document:
• The values of all ID attributes are unique.
• The value of each IDREF attribute equals the value of some ID attribute.
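The two ID/IDREF rules can be checked over a whole document as in the following sketch. The node representation (a list of dicts with optional `id` and `idrefs` keys) and the function name `check_id_idref` are assumptions made for illustration only.

```python
def check_id_idref(nodes):
    """Check the two attribute rules over a full document.

    nodes: list of dicts, each with an optional 'id' value and an
    optional 'idrefs' list (hypothetical representation).
    Returns a list of violation messages, empty when both rules hold.
    """
    seen = set()
    errors = []
    # Rule 1: all ID values are unique.
    for node in nodes:
        node_id = node.get("id")
        if node_id is not None:
            if node_id in seen:
                errors.append(f"duplicate ID: {node_id}")
            seen.add(node_id)
    # Rule 2: every IDREF value matches some ID value.
    for node in nodes:
        for ref in node.get("idrefs", []):
            if ref not in seen:
                errors.append(f"dangling IDREF: {ref}")
    return errors

doc = [{"id": "a"}, {"id": "b", "idrefs": ["a"]}, {"idrefs": ["z"]}]
print(check_id_idref(doc))
```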

8 1-unambiguous regular expressions
The XML specification restricts the regular expressions used to define element content in a DTD to be 1-unambiguous (deterministic).
Marking: every symbol occurrence in E is given a distinct subscript, producing the marked expression E'.
Position: a subscripted symbol in E'. The set of positions is pos(E'). For a given position x, Χ(x) denotes the corresponding (unmarked) symbol in Σ.
For example: pos(E') = {a, b₁, b₂, c} and Χ(b₁) = b.

9 1-unambiguous regular expressions
A regular expression E is 1-unambiguous if and only if, for all words u, v, w over the subscripted alphabet pos(E') and all x, y in pos(E'), the conditions uxv, uyw ∈ L(E') and x ≠ y imply Χ(x) ≠ Χ(y).
Which regular expression is deterministic?
• (ab)|(ac)
• a(b|c)
• a(a+b)*ac

10 The Glushkov automaton for Regular Expressions
• first(E'): the set of positions that appear as the first symbol of some word in L(E').
• last(E'): the set of positions that appear as the last symbol of some word in L(E').
• follow(E', x): the set of positions that appear immediately after position x in some word in L(E').
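The first, last and follow sets can be computed by one recursive pass over the marked expression, and they give a direct test for 1-unambiguity: E is 1-unambiguous exactly when no two distinct positions with the same symbol occur together in first(E') or in any follow(E', x). The sketch below is a minimal illustration, not the presenters' code. The tuple encoding of expressions and all function names are assumptions, and the `+` in `a(a+b)*ac` is read as alternation.

```python
from itertools import combinations

# Hypothetical constructors for a marked expression E'.
def sym(s, i):  return ("sym", s, i)   # position: symbol s with subscript i
def cat(l, r):  return ("cat", l, r)
def alt(l, r):  return ("alt", l, r)
def star(e):    return ("star", e)

def analyse(e, follow):
    """Return (nullable, first, last) of e and fill follow[x] for each position x."""
    kind = e[0]
    if kind == "sym":
        pos = (e[1], e[2])
        follow.setdefault(pos, set())
        return False, {pos}, {pos}
    if kind == "star":
        _, first, last = analyse(e[1], follow)
        for x in last:                     # loop back to the start of the body
            follow[x] |= first
        return True, first, last
    n1, f1, l1 = analyse(e[1], follow)
    n2, f2, l2 = analyse(e[2], follow)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    for x in l1:                           # concatenation: last(l) is followed by first(r)
        follow[x] |= f2
    first = (f1 | f2) if n1 else f1
    last = (l1 | l2) if n2 else l2
    return n1 and n2, first, last

def is_one_unambiguous(e):
    follow = {}
    _, first, _ = analyse(e, follow)
    for candidates in [first, *follow.values()]:
        for x, y in combinations(candidates, 2):
            if x[0] == y[0]:               # distinct positions, same unmarked symbol
                return False
    return True

e1 = alt(cat(sym("a", 1), sym("b", 1)), cat(sym("a", 2), sym("c", 1)))   # (ab)|(ac)
e2 = cat(sym("a", 1), alt(sym("b", 1), sym("c", 1)))                     # a(b|c)
e3 = cat(sym("a", 1), cat(star(alt(sym("a", 2), sym("b", 1))),
                          cat(sym("a", 3), sym("c", 1))))                # a(a+b)*ac
for name, e in [("(ab)|(ac)", e1), ("a(b|c)", e2), ("a(a+b)*ac", e3)]:
    print(name, is_one_unambiguous(e))
```

Under this check only a(b|c) is deterministic: in (ab)|(ac) two a-positions compete in first(E'), and in a(a+b)*ac two a-positions can follow the leading a.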

11 Update operations
• Append(p, y): insert element y as the last child of element p.
[Figure: document tree before and after Append(p, y)]

12 Update operations (2)
• InsertBefore(x, y): insert element y as the immediate left sibling of element x. (This operation is not defined if x is the root of the document.)
[Figure: document tree before and after InsertBefore(x, y)]

13 Update operations (3)
• Delete(x): delete element x from the document. Note that if x is the root of the document, the operation is trivially valid.
[Figure: document tree before and after Delete(x)]

14 Observation
Incremental validation concerns only the content of the element where the update takes place. For example, after an Append(p, y) operation only the content of p needs to be revalidated.

15 The approach
• Together with the i-th child of p we store the state that the automaton validating the content model of p reaches after reading w₁ … wᵢ.
• This requires auxiliary storage of size O(n log d), where n is the size of the XML document and d is the size of the DTD.
[Figure: element p with children w₁, w₂, w₃, …, wₖ]

16 Append at the end
Append(p, y) operation
[Figure: element p with children w₁, w₂, w₃, …, wₖ and the new last child y]
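With one automaton state stored per child, an append only needs a single transition from the state stored with the current last child. The sketch below illustrates that idea under stated assumptions: the class `ValidatedElement`, the transition table `DELTA` (a hand-written automaton for the hypothetical content model a(b*|cb*)) and the state names are all invented for this example.

```python
# Hypothetical deterministic automaton for the content model a(b*|cb*).
DELTA = {
    ("q0", "a"): "a",
    ("a", "b"): "b1", ("b1", "b"): "b1",
    ("a", "c"): "c",
    ("c", "b"): "b2", ("b2", "b"): "b2",
}
FINALS = {"a", "b1", "c", "b2"}

class ValidatedElement:
    """Element p whose children carry the automaton state reached after them."""

    def __init__(self, delta, start, finals):
        self.delta, self.start, self.finals = delta, start, finals
        self.children = []   # child labels w1..wk
        self.states = []     # states[i] = state after reading w1..w(i+1)

    def append(self, label):
        """Apply Append(p, y) only if the content of p stays valid."""
        prev = self.states[-1] if self.states else self.start
        nxt = self.delta.get((prev, label))
        if nxt is None or nxt not in self.finals:
            return False     # update rejected, nothing is modified
        self.children.append(label)
        self.states.append(nxt)
        return True

p = ValidatedElement(DELTA, "q0", FINALS)
print(p.append("a"), p.append("c"), p.append("b"))  # accepted appends
print(p.append("c"))                                # rejected: no transition on c
print(p.states)
```

Only the state stored with the previous last child is read, so the cost of the check does not depend on the number of children.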

17 Arbitrary insertions and deletions
Delete(x) operation
Problem: complexity
[Figure: element p with children w₁, w₂, …, wᵢ, …, wₖ]

18 1,2 Conflict Free Regular Expression
Consider E = a(b₁*|cb₂*) and w = acb…b.
• All b's match state b₂.
• Delete c from w to obtain w' = ab…b.
• Now all b's match state b₁.
• We must revalidate the entire string.
Possible solution:
This condition does not always hold, e.g.
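The state flip in this example can be reproduced with a small trace. The sketch below assumes a hand-written transition table `DELTA` for a(b₁*|cb₂*) with hypothetical state names, and compares the states stored with the b children before and after c is deleted.

```python
# Hypothetical automaton for a(b1*|cb2*); states are named after positions.
DELTA = {
    ("q0", "a"): "a",
    ("a", "b"): "b1", ("b1", "b"): "b1",
    ("a", "c"): "c",
    ("c", "b"): "b2", ("b2", "b"): "b2",
}

def state_trace(word, start="q0"):
    """Return the state reached after each symbol of word."""
    states, state = [], start
    for symbol in word:
        state = DELTA[(state, symbol)]
        states.append(state)
    return states

before = state_trace("acbbb")   # w  = a c b b b
after = state_trace("abbb")     # w' = a b b b, after Delete(c)

# Compare the states stored with the b children before and after the deletion.
changed = sum(old != new for old, new in zip(before[2:], after[1:]))
print(before, after, changed)
```

Every stored state to the right of the deleted child changes, which is why a single deletion can force revalidation of the whole string.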

19 Definition of 1,2 Conflict-free
Let E be a regular expression over alphabet Σ.
Follow(E, x): the set of positions in E that can follow x in some path through E.
Define such that E is a 1,2 conflict-free regular expression if:

20 Restricted forms of DTD
• 1,2 Conflict Free DTD
  • There is no "flipping" between automaton states after the update.
  • The per-update complexity for a 1,2 Conflict Free DTD is O(log n + log d) time and O(n log d) auxiliary space.
• Conflict-free DTD
  • No repeated symbols.
  • The per-update complexity is O(log n + log d) time and constant auxiliary space.

21 Incremental validation of ID and IDREF when adding an element
• Append(p, y) and InsertBefore(x, y) require checking that no two ID attributes have the same value and that every IDREF attribute in y refers to an existing ID value in the document.
• Complexity: O(|y| log n) time and linear auxiliary space, where |y| is the size of the added subtree.
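The insertion check can be sketched as a lookup of each ID and IDREF of the new subtree against an index of the ID values already in the document. The code below is an illustration under assumptions: it uses a Python `set` (hash lookups) where the slide's O(log n) bound suggests an ordered index, it accepts IDREFs that point to IDs defined inside y itself, and the name `can_insert` and the subtree encoding are hypothetical.

```python
def can_insert(existing_ids, subtree):
    """Check the ID/IDREF rules for a subtree y that is about to be inserted.

    existing_ids: set of ID values already present in the document.
    subtree: list of (id_or_None, list_of_idrefs), one entry per node of y.
    """
    new_ids = set()
    # No ID in y may clash with the document or with another node of y.
    for node_id, _ in subtree:
        if node_id is None:
            continue
        if node_id in existing_ids or node_id in new_ids:
            return False
        new_ids.add(node_id)
    # Every IDREF in y must resolve (assumption: IDs inside y also count).
    for _, refs in subtree:
        for ref in refs:
            if ref not in existing_ids and ref not in new_ids:
                return False
    return True

ids = {"a", "b"}
print(can_insert(ids, [("c", ["a"]), (None, ["c"])]))  # fresh ID, resolvable refs
print(can_insert(ids, [("a", [])]))                    # ID clash with the document
print(can_insert(ids, [(None, ["z"])]))                # dangling IDREF
```

The work is proportional to the number of nodes in y, matching the |y| factor in the stated bound.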

22 Incremental validation of ID and IDREF when deleting an element
• After a Delete(x) operation we have to check that the subtree rooted at x contains no node whose ID attribute is referenced by a node that is not a descendant of x.
• Checking the reference counter on delete requires O(log n) time.
• Updating the reference counter when inserting or removing an IDREF attribute requires O(h log n) time.
[Figure: example tree with nodes a, b, c]
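One way to picture the reference-counter check is to compare, for every ID defined inside the subtree of x, the document-wide number of references to it with the number of references that come from inside the subtree. This is a simplified sketch, not the data structure used in the presentation; the function name `can_delete` and its arguments are assumptions.

```python
from collections import Counter

def can_delete(subtree_ids, subtree_refs, ref_count):
    """Return True if deleting the subtree leaves no dangling IDREF.

    subtree_ids:  ID values defined inside the subtree rooted at x.
    subtree_refs: IDREF values occurring inside that subtree.
    ref_count:    Counter mapping each ID to the number of IDREFs
                  in the whole document that point to it.
    """
    internal = Counter(subtree_refs)
    # Safe only if every reference to an ID in the subtree is itself inside it.
    return all(ref_count[i] == internal[i] for i in subtree_ids)

ref_count = Counter({"a": 2, "b": 1})
print(can_delete({"a"}, ["a", "a"], ref_count))  # both references are deleted too
print(can_delete({"a"}, ["a"], ref_count))       # one reference remains outside
```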

23 Valid Insertion
[Plot: time in microseconds (log scale, 100 to 1e+08) against document size (64K to 2G). Series: Incr CF, Incr 1,2 CF, Incr Arb, Full Arb, Full CF]

24 Valid Deletion
[Plot: time in microseconds (log scale, 100 to 1e+08) against document size (64K to 2G). Series: Incr CF, Incr 1,2 CF, Incr Arb, Full Arb, Full CF]

25 Invalid Deletion
[Plot: time in microseconds (10 to 1000) against document size (64K to 2G). Series: Incr CF, Incr 1,2 CF, Incr Arb, Full Arb, Full CF]

26 Conclusions
1. Handled insertion and deletion of subtrees (not leaf nodes only).
2. Validated ID and IDREF attributes.
3. Characterized a class of DTDs that appears to capture most real-life DTDs and that admits a logarithmic-time, constant-space incremental validation algorithm.
4. Conducted experiments showing that the method is practical for large documents and behaves much better than full revalidation.

27 Future Work
Handling complex updates that involve several insertions and deletions as a single transaction.

