Updating SF-Tree Speaker: Ho Wai Shing.

1 Updating SF-Tree Speaker: Ho Wai Shing

2 Contents
Introduction and summary of SF-Tree
Possible SF-Tree update techniques and their limitations
Future work

3 Introduction
SF-Tree stands for Signature File Tree
It was originally designed for storing XML SPE selectivity
It can be generalized to store an "object-to-count" mapping

4 SF-Tree Basics
Divide objects into groups with the same (or similar) counts
Finding the count of an object is equivalent to finding the group containing the object
Signature files (SFs) are used to summarize groups
SFs are organized in a tree

5 Signature Files
Are bit vectors
Computed by hashing objects into bit positions and setting those bits
Existence of an object can be checked by checking its hashed bit positions
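
As a minimal sketch of the idea on this slide, the class below keeps a bit vector and sets m hashed positions per object. The class name `SignatureFile`, the method names, and the salted SHA-256 hashing are assumptions made for illustration; the slides do not say which hash functions are used.

```python
import hashlib


class SignatureFile:
    """Bit-vector signature: each object sets m hashed bit positions."""

    def __init__(self, num_bits: int, m: int) -> None:
        self.num_bits = num_bits
        self.m = m
        self.bits = [0] * num_bits

    def positions(self, obj: str) -> list[int]:
        # Assumed hashing scheme: m salted SHA-256 digests reduced modulo
        # the signature length. Two salts may map to the same position.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{obj}".encode()).digest()[:8], "big"
            ) % self.num_bits
            for i in range(self.m)
        ]

    def insert(self, obj: str) -> None:
        for p in self.positions(obj):
            self.bits[p] = 1

    def may_contain(self, obj: str) -> bool:
        # True may be a false drop (the bits were set by other objects).
        # False is definitive as long as no bit is ever reset.
        return all(self.bits[p] for p in self.positions(obj))


sf = SignatureFile(num_bits=64, m=3)
sf.insert("//name")
print(sf.may_contain("//name"))
```

Because the original objects are not stored, a positive answer only says that all of the object's bit positions are set, which is why the counts returned by an SF-Tree are not 100% accurate.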

6 Signature Files
e.g., F has 10 bits, m = 3
"//name" is hashed to bits 2, 3, 8
"//buyer" is hashed to bits 2, 4, 9

7 SF-Tree

8 SF-Tree

9 Introduction
Advantages:
  Time efficient: independent of database size
  Space efficient: independent of original object size
  Accurate: has a statistical accuracy guarantee on the returned counts
  Flexible: parameters can be tuned to trade off space, accuracy, and speed

10 Introduction
Disadvantages:
  Not 100% accurate, since the original objects are discarded
  All information must be available before building a Shannon-Fano SF-Tree
  Updates may reduce its accuracy

11 Introduction
Possible applications:
  support count storage in stream data mining
  data cube storage
Periodic reconstructions are not always feasible (esp. for data streams), so we need a good update strategy for SF-Tree

12 Updating SF-Tree
Basic idea: when the count of an object changes, the object moves from one group to another
i.e., the object is deleted from one group and inserted into another group

13 Updating SF-Tree
Problems:
  No algorithm for deleting objects from signature files
  No algorithm for creating a new group in an SF-Tree
  No way to compute new signatures for the existing objects

14 Updating SF-Tree
Solutions to the problems
For deletion:
  Counter-based signature files
  "No-deletion" scheme
  Look-ahead retrieval
For insertion:
  Dynamically expanding signature files
  Precomputed signature files
  Negative signature files

15 Counter-Based Signature Files
Motivation:
  deletion of an object involves resetting some bits
  resetting bits may cause "false negatives" (objects that are present in the group can no longer be retrieved)

16 Counter-Based Signature Files
e.g., "//name" is hashed to bits 2, 3, 8
"//buyer" is hashed to bits 2, 4, 9
when we remove "//buyer", bits 2, 4, and 9 are reset, so "//name" can't be retrieved (since bit 2 = 0)

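17 Counter-Based Signature Files
Use counters instead of bits in the signature file
e.g., "//name" is hashed to bits 2, 3, 8
"//buyer" is hashed to bits 2, 4, 9
removing "//buyer" decrements counters 2, 4, and 9; counter 2 stays at 1, so "//name" can still be retrieved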
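
The contrast between the two slides above can be sketched as follows, using the bit positions from the example. The 0-based indexing into a 10-slot list and the helper name `contains` are assumptions for illustration.

```python
F = 10  # signature length, as in the 10-bit example
POSITIONS = {"//name": (2, 3, 8), "//buyer": (2, 4, 9)}


def contains(sig, obj):
    # An object is reported present when all of its positions are non-zero.
    return all(sig[p] > 0 for p in POSITIONS[obj])


# Bit-based signature: deleting "//buyer" by resetting its bits.
bits = [0] * F
for obj in POSITIONS:
    for p in POSITIONS[obj]:
        bits[p] = 1
for p in POSITIONS["//buyer"]:
    bits[p] = 0  # also clears bit 2, which "//name" needs
bit_result = contains(bits, "//name")  # False: a false negative

# Counter-based signature: deleting "//buyer" by decrementing.
counters = [0] * F
for obj in POSITIONS:
    for p in POSITIONS[obj]:
        counters[p] += 1
for p in POSITIONS["//buyer"]:
    counters[p] -= 1  # counter 2 drops from 2 to 1, not to 0
counter_result = contains(counters, "//name")  # True

print(bit_result, counter_result)
```

The sketch uses unbounded Python integers; a real counter of fixed width could overflow when many objects share a position.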

18 Counter-Based Signature Files
Advantages:
  deleting an object that is in F won't cause "false negatives"
Disadvantages:
  space requirement increases
  deletions triggered by false drops still cause trouble
  counters may overflow

19 "No-Deletion" Scheme
Motivation:
  Deletion causes trouble (false negatives)
  There is no way to avoid it completely
  False negatives may cause big errors in counts

20 "No-Deletion" Scheme
Don't reset the bits when deleting an object (i.e., no deletion)
For retrieval, return the count of the matching group with the largest count
Advantages:
  completely removes the trouble caused by deletion (no more false negatives)
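
A minimal sketch of this scheme, assuming counts only increase. It scans a flat dictionary of groups instead of walking a tree, and the names `NoDeletionGroups`, `set_count`, and `get_count` as well as the SHA-256 hashing are hypothetical, not taken from the slides.

```python
import hashlib

NUM_BITS, M = 256, 3  # assumed signature length and bits per object


def positions(obj):
    # Assumed hashing scheme: M salted SHA-256 digests modulo NUM_BITS.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{obj}".encode()).digest()[:8], "big")
        % NUM_BITS
        for i in range(M)
    ]


class NoDeletionGroups:
    """One bit signature per count group; bits are never reset."""

    def __init__(self):
        self.groups = {}  # count -> bit list

    def set_count(self, obj, count):
        # Insert into the new group only. The bits that the object set in
        # its previous group are deliberately left untouched.
        sig = self.groups.setdefault(count, [0] * NUM_BITS)
        for p in positions(obj):
            sig[p] = 1

    def get_count(self, obj):
        # Several groups may match (stale entries or false drops); with
        # monotonically increasing counts the largest match is the newest.
        pos = positions(obj)
        matches = [c for c, sig in self.groups.items() if all(sig[p] for p in pos)]
        return max(matches, default=None)


groups = NoDeletionGroups()
groups.set_count("//name", 3)
groups.set_count("//name", 5)  # count grew from 3 to 5
print(groups.get_count("//name"))
```

For monotonically decreasing counts the same idea would return the smallest matching count instead.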

21 "No-Deletion" Scheme
Disadvantages:
  may reduce the accuracy of the signature (if the signature size is unchanged)
  may increase the space requirement significantly (if we maintain the accuracy)
  can be applied only to applications whose counts are monotonically increasing (or decreasing)

22 "No-Deletion" Scheme
Still useful for updates between periodic reconstructions, since:
  retrieval time is unchanged (for positive queries)
  it is more space efficient than counters if updates are not frequent (since we can use bit-based signature files)

23 Look-Ahead Retrieval
Motivation:
  Reduce errors due to false negatives
Retrieval stops only when two consecutive levels of signature files do not contain the query
Re-insert the object if we think that it's a false negative

24 Look-Ahead Retrieval e.g.,

25 Look-Ahead Retrieval
Advantages:
  Reduces the probability that the SF-Tree is affected by false negatives
  The self-healing property removes some false negatives

26 Look-Ahead Retrieval
Disadvantages:
  slower (more signatures to be examined)
  false drop rate increases
    more 1s are created in the self-healing process
    one less level acts as a false drop safeguard

27 Insertion
Involves 2 parts:
  insertion within a signature file
  inserting new signature files into the SF-Tree
A signature file has a capacity under an error bound
  more objects -> more 1s -> higher false drop rate
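
The slide does not give a formula for the capacity. As a rough sketch, the code below uses the standard estimate for a bit vector in which each of n objects sets m positions chosen uniformly and independently: the expected fraction of set bits is 1 - (1 - 1/F)^(mn), and a false drop needs m set bits. The function names and the independence assumption are illustrative and not taken from the slides.

```python
def false_drop_rate(num_bits: int, m: int, n: int) -> float:
    """Estimated false drop probability after inserting n objects."""
    # Expected fraction of bits set to 1 after m * n independent settings.
    fill = 1.0 - (1.0 - 1.0 / num_bits) ** (m * n)
    # A false drop requires all m probed positions to be set.
    return fill ** m


def capacity(num_bits: int, m: int, bound: float) -> int:
    """Largest n whose estimated false drop rate stays within the bound."""
    if not 0.0 < bound < 1.0:
        raise ValueError("bound must be strictly between 0 and 1")
    n = 0
    while false_drop_rate(num_bits, m, n + 1) <= bound:
        n += 1
    return n


for n in (1, 2, 4):
    print(n, round(false_drop_rate(10, 3, n), 4))
```

The printed rates grow with n, which is the "more objects, more 1s, higher false drop rate" chain in numeric form.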

28 Dynamically Expanding Signature Files
To maintain the error (false drop probability) bound, it must be possible to increase the size of the SFs
However, objects are dropped -> we can't recompute the signatures of previous objects
Add new signatures to represent new objects

29 Dynamically Expanding Signature Files
e.g., create a new signature if the first one is full doubling the size at each creation
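
A minimal sketch of this expansion policy. The rule for when a signature counts as "full" (a fixed number of bits per stored object) is an assumption, as are the class name `ExpandingSignature`, its parameters, and the SHA-256 hashing.

```python
import hashlib


def positions(obj, num_bits, m):
    # Assumed hashing scheme: m salted SHA-256 digests modulo num_bits.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{obj}".encode()).digest()[:8], "big")
        % num_bits
        for i in range(m)
    ]


class ExpandingSignature:
    """A chain of signatures; each new one is twice as large as the last."""

    def __init__(self, initial_bits=64, m=3, bits_per_object=16):
        self.m = m
        self.bits_per_object = bits_per_object  # hypothetical fullness rule
        self.segments = [{"bits": [0] * initial_bits, "stored": 0}]

    def insert(self, obj):
        last = self.segments[-1]
        if last["stored"] >= len(last["bits"]) // self.bits_per_object:
            # The newest signature is full: old objects cannot be rehashed,
            # so a new signature of double the size is appended instead.
            last = {"bits": [0] * (2 * len(last["bits"])), "stored": 0}
            self.segments.append(last)
        for p in positions(obj, len(last["bits"]), self.m):
            last["bits"][p] = 1
        last["stored"] += 1

    def may_contain(self, obj):
        # The object may have been recorded in any signature of the chain.
        return any(
            all(seg["bits"][p] for p in positions(obj, len(seg["bits"]), self.m))
            for seg in self.segments
        )


sig = ExpandingSignature()
for i in range(10):
    sig.insert(f"obj{i}")
print([len(seg["bits"]) for seg in sig.segments])
```

Lookups have to probe every signature in the chain, which is one way to see why the scheme is less space and time efficient than a single correctly sized signature.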

30 Dynamically Expanding Signature Files
Advantages:
  the signature files can now store an arbitrary number of objects
Disadvantages:
  less space efficient

31 Precomputed Signature Files
The previous method solves insertion within a signature file
The next two methods concentrate on inserting new count groups

32 Precomputed Signature Files
Problem: adding a new count group (i.e., leaf node) may involve adding new internal nodes e.g.,

33 Precomputed Signature Files
Solution: the signature file is precomputed, i.e.,

34 Negative Signature Files
Alternative solution to the previous problem
A "negative signature file" stores the deleted objects
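
The slide gives no detail on how the negative signature file is consulted. One possible reading, sketched below under that assumption, is that each group keeps a second signature recording deleted objects, and an object counts as a member only if it matches the positive signature and does not match the negative one. The bit positions are reused from the earlier example; the function names are hypothetical.

```python
F = 10  # signature length, as in the earlier 10-bit example
POSITIONS = {"//name": (2, 3, 8), "//buyer": (2, 4, 9)}


def add(sig, obj):
    for p in POSITIONS[obj]:
        sig[p] = 1


def matches(sig, obj):
    return all(sig[p] for p in POSITIONS[obj])


positive = [0] * F  # objects ever inserted into the group
negative = [0] * F  # objects deleted from the group
add(positive, "//name")
add(positive, "//buyer")


def delete(obj):
    # Bits in the positive signature are never reset; the deletion is
    # recorded in the negative signature instead.
    add(negative, obj)


def in_group(obj):
    return matches(positive, obj) and not matches(negative, obj)


delete("//buyer")
print(in_group("//name"), in_group("//buyer"))
```

Under this reading a false drop in the negative signature would hide an object that is still present, so the negative signature needs its own error bound.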

35 Summary
For deletion:
  Counter-based signature files
  "No-deletion" scheme
  Look-ahead retrieval
For insertion:
  Dynamically expanding signature files
  Precomputed signature files
  Negative signature files

36 Future Work
Implement and check the performance of different update strategies
Identify the requirements of various applications (e.g., support counting, data cube storage) and choose a suitable SF-Tree strategy

