Efficient Processing of Updates in Dynamic XML Data Changqing Li, Tok Wang Ling, Min Hu.


1 Efficient Processing of Updates in Dynamic XML Data Changqing Li, Tok Wang Ling, Min Hu

2 Outline (ICDE'06, Efficient Processing of Updates in XML)
Background and related work
Our proposals
–Lexicographical order
–A compact dynamic binary string encoding (CDBS)
–Applying CDBS to different labeling schemes for update processing
–Experimental evaluation
Conclusion

3 Background and related work: Labeling schemes
Three main categories of labeling schemes are used to process XML queries:
–(1) Containment labeling scheme [Zhang et al SIGMOD01 etc.]
–(2) Prefix labeling scheme [Tatarinov et al SIGMOD02 etc.]
–(3) Prime number labeling scheme [Wu et al ICDE04]
In this talk, we focus on labeling schemes that process updates efficiently.

4 (1) Containment scheme
Each node is assigned three values: “start”, “end”, and “level”.
The relationships between nodes are determined from “start”, “end”, and “level”.
Example tree, labels written as start,end,level:
1,18,1
–2,3,2
–4,9,2 (children: 5,6,3 and 7,8,3)
–10,11,2
–12,17,2 (children: 13,14,3 and 15,16,3)
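The relationship tests mentioned on this slide can be sketched in Python. The `Label` tuple and the function names are assumptions made for illustration; the slide only states that “start”, “end”, and “level” determine the relationships.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """Containment label of one node (hypothetical helper type)."""
    start: int
    end: int
    level: int


def is_ancestor(a: Label, d: Label) -> bool:
    """a is an ancestor of d when a's interval strictly contains d's interval."""
    return a.start < d.start and d.end < a.end


def is_parent(a: Label, d: Label) -> bool:
    """A parent is an ancestor exactly one level above."""
    return is_ancestor(a, d) and a.level + 1 == d.level


def precedes(x: Label, y: Label) -> bool:
    """x comes before y in document order when x starts first."""
    return x.start < y.start


root = Label(1, 18, 1)
node = Label(4, 9, 2)
leaf = Label(5, 6, 3)
print(is_ancestor(root, leaf), is_parent(node, leaf), is_parent(root, leaf))
```

The labels used here are taken from the example tree on the slide.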

5 Containment is bad for processing updates
An insertion requires re-labeling all the ancestor nodes and all the nodes after the inserted node in document order.
Tree before the insertion (start,end,level): root 1,18,1; its children 2,3,2; 4,9,2; 10,11,2; 12,17,2; children of 4,9,2: 5,6,3 and 7,8,3; children of 12,17,2: 13,14,3 and 15,16,3.

6 Containment is bad for processing updates (cont.)
An insertion requires re-labeling all the ancestor nodes and all the nodes after the inserted node in document order.
Tree after the insertion (start,end,level): root 1,20,1; its children 2,3,2; 4,9,2; 10,11,2 (inserted); 12,13,2; 14,19,2; children of 4,9,2: 5,6,3 and 7,8,3; children of 14,19,2: 15,16,3 and 17,18,3.

7 Existing approaches to process updates in the containment scheme
Increase the interval size and leave some values unused for future insertions [Li et al VLDB01]
–When the unused values are used up, the nodes have to be re-labeled
Use floating-point values [Amagasa et al ICDE03]
–A floating-point value is represented in a computer with a fixed number of bits
–Once the floating-point precision is exhausted, the nodes have to be re-labeled
Neither approach can avoid re-labeling.
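The precision limit of floating-point labels can be illustrated with a small Python experiment. This is only a sketch under an assumed insertion pattern (always inserting the midpoint just after the same left neighbor); it is not the procedure of [Amagasa et al ICDE03], and the function name is hypothetical.

```python
def count_midpoint_insertions(lo: float, hi: float) -> int:
    """Count how many values fit strictly between lo and hi when each new
    value is the midpoint and then becomes the next upper bound."""
    count = 0
    while True:
        mid = (lo + hi) / 2
        if not (lo < mid < hi):
            # No representable value is left between the neighbors,
            # so a real system would have to re-label here.
            return count
        count += 1
        hi = mid  # the next insertion goes before the value just inserted


insertions = count_midpoint_insertions(1.0, 2.0)
print(insertions)
```

With 64-bit floats the loop stops after a few dozen insertions, which is why a fixed number of bits cannot avoid re-labeling in the worst case.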

8 (2) Prefix scheme
Three main prefix schemes:
–DeweyID [Tatarinov et al SIGMOD02]
–BinaryString [Cohen et al PODS02]
–OrdPath [O'Neil et al SIGMOD04]

9 DeweyID (Cont.)
The relationships between nodes are determined from the prefix property.
Example tree: the children of the root are labeled 1, 2, 3, and 4; the children of 2 are 2.1 and 2.2; the children of 4 are 4.1 and 4.2.
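A minimal Python sketch of the prefix property follows. The function names are illustrative assumptions; the comparison is done on whole dot-separated components, so that a label such as 21.1 is not mistaken for a descendant of 2.

```python
from typing import List


def components(label: str) -> List[str]:
    """Split a DeweyID-style label such as '2.1' into its components."""
    return label.split(".")


def is_ancestor(a: str, d: str) -> bool:
    """a is an ancestor of d when a's components are a proper prefix of d's."""
    ca, cd = components(a), components(d)
    return len(ca) < len(cd) and cd[:len(ca)] == ca


def is_parent(a: str, d: str) -> bool:
    """A parent is an ancestor with exactly one component fewer."""
    return is_ancestor(a, d) and len(components(d)) == len(components(a)) + 1


def is_sibling(x: str, y: str) -> bool:
    """Siblings share every component except the last one."""
    cx, cy = components(x), components(y)
    return x != y and len(cx) == len(cy) and cx[:-1] == cy[:-1]


print(is_parent("4", "4.2"), is_sibling("2.1", "2.2"), is_ancestor("2", "21.1"))
```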

10 DeweyID is bad for processing order-sensitive updates
Order-sensitive updates maintain the document order when updates are performed.
–All the sibling nodes after the inserted node, and all the descendants of these siblings, need to be re-labeled
Tree before the insertion: the children of the root are 1, 2, 3, and 4; the children of 2 are 2.1 and 2.2; the children of 4 are 4.1 and 4.2.

11 DeweyID is bad for processing order-sensitive updates (cont.)
Order-sensitive updates maintain the document order when updates are performed.
–All the sibling nodes after the inserted node, and all the descendants of these siblings, need to be re-labeled
Tree after the insertion: the children of the root are 1, 2, 3 (inserted), 4, and 5; the children of 2 are 2.1 and 2.2; the children of 5 are 5.1 and 5.2.

12 Existing approaches to process updates in the prefix scheme: OrdPath
OrdPath [O'Neil et al SIGMOD04]
–Similar to DeweyID
–But initially uses odd numbers only
Example tree: the children of the root are 1, 3, 5, and 7; the children of 3 are 3.1 and 3.3; the children of 7 are 7.1 and 7.3.

13 Existing approaches to process updates in the prefix scheme: OrdPath (cont.)
Nodes a, b, c, and d are inserted into the OrdPath tree whose root has the children 1, 3, 5, and 7 (with 3.1 and 3.3 under 3, and 7.1 and 7.3 under 7).
Label of node a: “-1”
Label of node b: “4.1”
Label of node c: “4.3”
Label of node d: “4.2.1”
They are siblings, but their labels look very different.

14 (3) Prime number scheme [Wu et al ICDE04]
Prime re-calculates the SC (simultaneous congruence) values to maintain the document order instead of re-labeling.
However, the re-calculation is much more expensive.

15 Our CDBS encoding
(1) Lexicographical order
(2) Encoding
(3) Applications and processing of updates
(4) Experimental results

16 (1) Lexicographical order of binary strings
Given two binary strings “0011” and “01”, “0011” ≺ “01” lexicographically because the comparison runs from left to right, and the 2nd bit of “0011” is “0” while the 2nd bit of “01” is “1”. Hence “0011” < “01”.
Given two binary strings “01” and “0101”, “01” ≺ “0101” lexicographically because “01” is a prefix of “0101”. Hence “01” < “0101”.
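The two rules on this slide can be written as a short Python function. The name `lex_less` is an assumption for illustration; for strings made only of “0” and “1”, Python's built-in `<` on `str` follows the same two rules, which the last line checks.

```python
def lex_less(left: str, right: str) -> bool:
    """Return True when left is lexicographically smaller than right."""
    for l_bit, r_bit in zip(left, right):
        if l_bit != r_bit:
            # The first differing bit decides: "0" is smaller than "1".
            return l_bit == "0"
    # No differing bit: left is smaller only if it is a proper prefix of right.
    return len(left) < len(right)


pairs = [("0011", "01"), ("01", "0101"), ("01", "01"), ("01", "0011")]
print([lex_less(a, b) for a, b in pairs])
print(all(lex_less(a, b) == (a < b) for a, b in pairs))
```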

17 Find a binary string between two binary strings lexicographically
To insert a binary string between “0011” and “01”:
–The size of “0011” is 4, which is larger than the size 2 of “01”; this is Case (a) (larger than or equal)
–Therefore we directly concatenate one more “1” after “0011”
–The inserted binary string is “00111”, and “0011” < “00111” < “01” lexicographically
To insert a binary string between “01” and “0101”:
–The size of “01” is 2, which is smaller than the size 4 of “0101”; this is Case (b) (smaller than)
–Therefore we change the last bit “1” of “0101” to “01”, i.e. the inserted binary string is “01001”, and “01” < “01001” < “0101” lexicographically
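Cases (a) and (b) can be sketched as one Python function. The name `insert_between` is hypothetical. The sketch assumes that both inputs end with “1” and that the left string is lexicographically smaller than the right one; treating a missing neighbor as the empty string is a further assumption, not something this slide states.

```python
def insert_between(left: str, right: str) -> str:
    """Return a binary string strictly between left and right.

    Assumes left < right lexicographically and that non-empty inputs end
    with "1". An empty string stands for "no neighbor on that side".
    """
    if len(left) >= len(right):
        # Case (a): append one more "1" to the left string.
        return left + "1"
    # Case (b): replace the last bit "1" of the right string with "01".
    return right[:-1] + "01"


print(insert_between("0011", "01"))
print(insert_between("01", "0101"))
```

Because only the neighbor's last bit is extended or changed, neither existing string has to be modified.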

18 (2) Compact encoding
The lexicographical insertion achieves the dynamic objective.
Further, we propose a Compact Dynamic Binary String encoding, called CDBS.

19 Example illustration of CDBS
We show how to encode 18 numbers with our CDBS encoding.
This is only an example; any other numbers can be encoded with our CDBS.
Figure: the containment-labeled example tree (root 1,18,1; its children 2,3,2; 4,9,2; 10,11,2; 12,17,2; grandchildren 5,6,3; 7,8,3; 13,14,3; 15,16,3).

20 Integer numbers 1 to 18 in V-Binary (variable-length binary). Total size: 64 bits.

21 The F-Binary (fixed-length binary) column is added: every code is 5 bits long. Total size: 90 bits.

22 The V-CDBS column is started: number 10 is assigned the code “1”.

23 V-CDBS: numbers 5 and 15 are assigned “01” and “11”.

24 V-CDBS: numbers 3, 8, 13, and 17 are assigned “001”, “011”, “101”, and “111”.

25 V-CDBS: numbers 2, 4, 7, 9, 12, 14, 16, and 18 are assigned “0001”, “0011”, “0101”, “0111”, “1001”, “1011”, “1101”, and “1111”.

26 V-CDBS: the remaining numbers are assigned, e.g. 6 and 11 receive “01001” and “10001”.

27 The F-CDBS column is added. Complete table:
Integer number     V-Binary  V-CDBS  F-Binary  F-CDBS
1                  1         00001   00001     00001
2                  10        0001    00010     00010
3                  11        001     00011     00100
4                  100       0011    00100     00110
5                  101       01      00101     01000
6                  110       01001   00110     01001
7                  111       0101    00111     01010
8                  1000      011     01000     01100
9                  1001      0111    01001     01110
10                 1010      1       01010     10000
11                 1011      10001   01011     10001
12                 1100      1001    01100     10010
13                 1101      101     01101     10100
14                 1110      1011    01110     10110
15                 1111      11      01111     11000
16                 10000     1101    10000     11010
17                 10001     111     10001     11100
18                 10010     1111    10010     11110
Total size (bits)  64        64      90        90
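The order in which the table is filled (10 first, then 5 and 15, and so on) suggests a recursive middle-first assignment. The Python sketch below reproduces the V-CDBS and F-CDBS columns under two assumptions that the slides do not spell out: the middle position is rounded half up, and F-CDBS pads each V-CDBS code with trailing zeros. All function names are hypothetical.

```python
from typing import List


def insert_between(left: str, right: str) -> str:
    """Cases (a) and (b) of the lexicographical insertion; "" means no neighbor."""
    if len(left) >= len(right):
        return left + "1"
    return right[:-1] + "01"


def cdbs_encode(n: int) -> List[str]:
    """Assign V-CDBS codes to the numbers 1..n, middle number first."""
    codes = [""] * (n + 2)  # positions 0 and n + 1 are empty sentinels

    def fill(lo: int, hi: int) -> None:
        if lo + 1 >= hi:
            return  # nothing left between lo and hi
        mid = (lo + hi + 1) // 2  # assumed rounding: half up
        codes[mid] = insert_between(codes[lo], codes[hi])
        fill(lo, mid)
        fill(mid, hi)

    fill(0, n + 1)
    return codes[1:n + 1]


def to_fixed(codes: List[str]) -> List[str]:
    """Pad every code with trailing zeros to the longest code (assumed F-CDBS)."""
    width = max(len(code) for code in codes)
    return [code.ljust(width, "0") for code in codes]


v_cdbs = cdbs_encode(18)
f_cdbs = to_fixed(v_cdbs)
print(v_cdbs)
print(sum(len(code) for code in v_cdbs), sum(len(code) for code in f_cdbs))
```

Padding with trailing zeros keeps the lexicographical order of the codes, so the fixed-length variant compares the same way as the variable-length one.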

28 (3) Applying CDBS to the containment scheme
Replace the “start” and “end” values 1 to 18 with our CDBS codes.
Relationships are determined by lexicographical order comparison.
The level stays the same.
Example tree, labels written as start,end,level:
00001,1111,1
–0001,001,2
–0011,0111,2 (children: 01,01001,3 and 0101,011,3)
–1,10001,2
–1001,111,2 (children: 101,1011,3 and 11,1101,3)

29 Applying CDBS to the prefix scheme
The CDBS codes for 4 numbers are “001”, “01”, “1”, and “11”.
The CDBS codes for 2 numbers are “01” and “1”.
Example tree: the children of the root are 001, 01, 1, and 11; the children of 01 are 01.01 and 01.1; the children of 11 are 11.01 and 11.1.

30 Applying CDBS to the prime scheme
Store the document order with our CDBS codes.
The order of the nodes is determined from the lexicographical order of the codes.
The label size and the query performance of Prime are poor, so we do not show the details.

31 Processing updates based on CDBS: for the containment scheme
To insert two binary strings between “0011” and “01”, the two inserted binary strings are “00111” and “001111”.
The complete label of the inserted node is “00111,001111,3”.
No existing node needs to be re-labeled, yet relationships such as ancestor-descendant can still be determined and the document order is kept.
Tree before the insertion (start,end,level): root 00001,1111,1; its children 0001,001,2; 0011,0111,2; 1,10001,2; 1001,111,2; children of 0011,0111,2: 01,01001,3 and 0101,011,3; children of 1001,111,2: 101,1011,3 and 11,1101,3.
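The insertion on this slide can be replayed in Python. This is a sketch: `insert_between` restates the two insertion cases from slide 17, `new_leaf_label` and `is_ancestor` are hypothetical helpers, and labels are modeled as plain (start, end, level) tuples of CDBS strings.

```python
from typing import Tuple

CdbsLabel = Tuple[str, str, int]  # (start, end, level)


def insert_between(left: str, right: str) -> str:
    """Cases (a) and (b) of the lexicographical insertion."""
    if len(left) >= len(right):
        return left + "1"
    return right[:-1] + "01"


def new_leaf_label(left: str, right: str, level: int) -> CdbsLabel:
    """Create start and end codes for a new leaf between two existing codes."""
    start = insert_between(left, right)
    end = insert_between(start, right)
    return (start, end, level)


def is_ancestor(a: CdbsLabel, d: CdbsLabel) -> bool:
    """Interval containment, using Python's lexicographical str comparison."""
    return a[0] < d[0] and d[1] < a[1]


parent = ("0011", "0111", 2)
first_child = ("01", "01001", 3)
inserted = new_leaf_label(parent[0], first_child[0], 3)
print(inserted)
print(is_ancestor(parent, inserted), inserted[1] < first_child[0])
```

The existing labels are only read, never rewritten, which matches the claim that no re-labeling is needed.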

32 Processing updates based on CDBS: for the prefix scheme
To insert a binary string before “01”, the inserted binary string is “001”.
The complete label of the inserted node is “01.001”.
No existing node needs to be re-labeled, yet relationships such as ancestor-descendant can still be determined and the document order is kept.
Tree before the insertion: the children of the root are 001, 01, 1, and 11; the children of 01 are 01.01 and 01.1; the children of 11 are 11.01 and 11.1.
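A corresponding Python sketch for the prefix scheme is shown below. The helper names are hypothetical, and representing a missing left or right sibling by the empty string is an assumption that reproduces the label “01.001” from this slide.

```python
def insert_between(left: str, right: str) -> str:
    """Cases (a) and (b) of the lexicographical insertion; "" means no sibling."""
    if len(left) >= len(right):
        return left + "1"
    return right[:-1] + "01"


def new_child_label(parent: str, left_sibling: str, right_sibling: str) -> str:
    """Build the full label of a new child from the last components of its
    neighboring siblings."""
    own_code = insert_between(left_sibling, right_sibling)
    return parent + "." + own_code


def last_component(label: str) -> str:
    """Return the node's own code, i.e. the part after the last dot."""
    return label.rsplit(".", 1)[-1]


before_first = new_child_label("01", "", "01")    # insert before 01.01
in_between = new_child_label("01", "01", "1")     # insert between 01.01 and 01.1
print(before_first, in_between)
```

Sibling order is then decided by comparing the last components lexicographically, so the labels 01.01 and 01.1 stay untouched.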

33 Problem with CDBS
The size of V-CDBS and F-CDBS may encounter the overflow problem when many nodes are inserted.
To solve the overflow problem, we proposed QED in [Li & Ling CIKM05].
QED uses four quaternary symbols, i.e. 0, 1, 2, and 3, and each is stored with 2 bits
–0 is used as the separator (delimiter), so QED never encounters the overflow problem
–QED is not as compact as CDBS, and its update cost is higher

34 (4) Experimental results
Experimental setup
Performance study on static XML
Performance study on updates

35 Experimental setup
All the schemes are implemented in Java, and all the experiments are carried out on a 3.0 GHz Pentium 4 processor with 1 GB RAM running Windows XP Professional.

36 Experimental setup (cont.)
The following table shows the datasets we used.
Dataset  Topic               # of files  Max/average fan-out for a file  Max/average depth for a file  Total # of nodes for each dataset
D1       Movie               490         14/6                            5/5                           26044
D2       Department          19          233/81                          4/4                           48542
D3       Actor               480         37/11                           5/5                           56769
D4       Company             24          529/135                         5/3                           161576
D5       Shakespeare’s play  37          434/48                          6/5                           179689
D6       NASA                18821188/9                                  7/5                           370292

37 Performance study on static XML
Our V-CDBS and F-CDBS are the most compact variable-length and fixed-length dynamic encodings.
Figure: Label sizes of different schemes

38 The 5 cases of node updates in experiments
We select one XML file, Hamlet, in dataset D5 to test the update performance (the results are similar for other XML files).
Hamlet has 5 act elements. We test the following 5 cases:
–inserting an act element before act[1],
–inserting an act element before act[2],
–···,
–and inserting an act element before act[5].

39 Number of nodes to re-label in updates
Number of nodes to re-label (5 cases):
Labeling scheme          Case 1  Case 2  Case 3  Case 4  Case 5
Float-point-Containment  0       0       0       0       0
V-Binary-Containment     6596    5121    3932    2431    1300
F-Binary-Containment     6596    5121    3932    2431    1300
V-CDBS-Containment       0       0       0       0       0
F-CDBS-Containment       0       0       0       0       0
BinaryString-Prefix      6595    5120    3931    2430    1299
DeweyID(UTF8)-Prefix     6595    5120    3931    2430    1299
OrdPath1-Prefix          0       0       0       0       0
OrdPath2-Prefix          0       0       0       0       0
QED-Prefix               0       0       0       0       0
Prime                    1320    1025    787     487     261

40 Total time for node updates
When only several nodes are inserted, most of the time is I/O time, and our approaches are still the best at processing updates.
When considering processing time only, our approaches are much better: more than 300 times faster.
Our approaches are therefore more appropriate for updates that insert many nodes.
Figure: Log2(Update time) of different schemes

41 Conclusion
Our CDBS is dynamic.
Our CDBS is the most compact.
Its update cost is the cheapest: only the last 1 bit of the neighbor label needs to be modified.

