# Fusion Trees (Advanced Data Structures, Aris Tentes)


## Goal: Fixed-Universe Successor Problem

We have a set of $n$ numbers, each at most $\log u$ bits long ($u$ = size of the fixed universe). We want to perform the following operations in time better than $O(\log n)$:

1. Predecessor/Successor
2. Insertion/Deletion

## Model: Transdichotomous RAM

Memory is composed of words, each of length $w = \log u$ bits, and every item we store must fit in a word. The following operations take constant time:

1. Addition, subtraction
2. Multiplication, division
3. AND, OR, XOR
4. Left/right shift
5. Comparison

## Main Idea

A fusion tree is a B-tree with fan-out $B = w^{1/5}$ and, therefore, has height $O(\log_B n) = O(\log n / \log w)$. If we find a way to determine, in constant time, where a query fits among the $B$ keys of a node, then we have an $O(\log n / \log w)$ solution to our problem.
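A worked version of the height bound, assuming the fan-out $B = w^{1/5}$ (the choice under which $k$ sketches of at most $r^4$ bits fit in one word) and using $w = \log u \ge \log n$, which holds because the $n$ stored numbers are distinct elements of a universe of size $u$:

$$
\begin{aligned}
\text{height} &= O(\log_B n) = O\!\left(\frac{\log n}{\tfrac{1}{5}\log w}\right) = O\!\left(\frac{\log n}{\log w}\right), \\
\frac{\log n}{\log w} &\le \frac{\log n}{\log \log n} \qquad \text{since } w \ge \log n.
\end{aligned}
$$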

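## In the Nodes

Suppose that the keys $K$ in a node are $x_0 < x_1 < \dots < x_{k-1}$. If we view them as the leaves of a binary tree (the trie of their binary representations), then we get the following picture:

*[Figure: the keys drawn as leaves of a binary trie; the branching nodes are shown in black.]*

The black nodes are the branching nodes. For $k$ keys there are exactly $k-1$ branching nodes. However, some of them may lie on the same level. Thus, at most $k-1$ bits are required to distinguish the $x_i$'s.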

We construct the set $B(K)$ of branching levels, namely the bit positions required to distinguish the keys. Let $B(K) = \{b_0, b_1, \dots, b_{r-1}\}$ with $b_0 < b_1 < \dots < b_{r-1}$ and $r \le k-1$.

Def.: PerfectSketch($x$) = the bits of $x$ extracted according to $B(K)$, namely the bits of $x$ at positions $b_0, \dots, b_{r-1}$.

If we collect the perfect sketches of all $k$ keys, then we reduce the node representation to $k$ strings of $r$ bits each. That means that $kr < k^2$ bits would be sufficient, which for $k = w^{1/5}$ is $w^{2/5}$ bits. Less than a word!
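A minimal Python sketch of $B(K)$ and PerfectSketch, assuming the keys are distinct non-negative integers. The helper names `branching_positions` and `perfect_sketch` are hypothetical, and the bit-by-bit loop is deliberately not constant-time: it only shows which bits are kept. It relies on the fact that every branching node of the trie is the point where two adjacent keys first differ, so $B(K)$ is the set of msb($x_i$ XOR $x_{i+1}$).

```python
from typing import List


def branching_positions(keys: List[int]) -> List[int]:
    """B(K), most significant first: each branching node of the trie is where
    two adjacent sorted keys first differ, i.e. msb(x_i XOR x_{i+1})."""
    ks = sorted(keys)
    return sorted({(a ^ b).bit_length() - 1 for a, b in zip(ks, ks[1:])}, reverse=True)


def perfect_sketch(x: int, positions: List[int]) -> int:
    """Concatenate the bits of x found at `positions` (given most significant first)."""
    s = 0
    for b in positions:
        s = (s << 1) | ((x >> b) & 1)
    return s
```

For the keys 3, 9, 10, 40 this gives the positions [5, 3, 1] and the perfect sketches [1, 2, 3, 6]: three bits per key instead of six, in the same order as the keys.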

However, computing PerfectSketch($x$) in constant time is difficult. Therefore, we compute an approximation, called Sketch($x$). Sketch($x$) contains the same bits as PerfectSketch($x$), in the same order, with some extra 0's in between, but at positions that are the same for every $x$. This is done by multiplying $x$ by a number $m$; we will see later how to choose it.

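Firstly, we compute $x' = x \text{ AND } \sum_{i=0}^{r-1} 2^{b_i}$, leaving only the bits that correspond to $B(K)$. If $m = \sum_{j=0}^{r-1} 2^{m_j}$, then we observe that

$$x' \cdot m = \sum_{i=0}^{r-1} \sum_{j=0}^{r-1} x_{b_i} \, 2^{b_i + m_j},$$

where $x_{b_i}$ denotes the bit of $x$ at position $b_i$. All we need is to find an $m$ such that:

1. all $b_i + m_j$ are distinct (no collisions, hence no carries);
2. $b_0 + m_0 < b_1 + m_1 < \dots < b_{r-1} + m_{r-1}$ (to preserve order);
3. the $b_i + m_i$ are concentrated in a small range: $(b_{r-1} + m_{r-1}) - (b_0 + m_0) < r^4$.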

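If we find such an $m$, then we compute

$$\text{Sketch}(x) = \left[ (x' \cdot m) \text{ AND } \sum_{i=0}^{r-1} 2^{b_i + m_i} \right] \gg (b_0 + m_0),$$

which is at most $r^4$ bits long. Note that $k$ sketches fit in a word: since $r \le k-1$, we have $k r^4 < k^5 = w$ for $k = w^{1/5}$.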

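Can we find such an $m$? Firstly, we show how to find $m'_0, \dots, m'_{r-1} < r^3$ such that $b_i + m'_j$ and $b_l + m'_t$ are distinct modulo $r^3$ whenever $j \ne t$ (for $j = t$ the sums differ simply because the $b_i$ differ). Suppose we have found $m'_0, \dots, m'_{t-1}$ with the desired property. We observe that $b_i + m'_t \equiv b_l + m'_j \pmod{r^3}$ implies $m'_t \equiv m'_j + b_l - b_i \pmod{r^3}$. Thus we can choose $m'_t$ to be the least residue not represented among the at most $t r^2 < r^3$ residues of the form $m'_j + b_l - b_i$ with $j < t$. Then, by adding suitable multiples of $r^3$ to the $m'_i$, we obtain the final values $m_i$: the residues do not change, so condition 1 still holds, and each $b_i + m_i$ can be placed in its own window of length $r^3$, which gives conditions 2 and 3.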
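The construction of $m$ and the computation of Sketch($x$) can be sketched in Python under a few assumptions of our own: Python's unbounded integers stand in for the machine word (so we do not model how the relevant part of the double-word product is kept), the window for $b_i + m_i$ is placed at $[w + i r^3, w + (i+1) r^3)$, and the names `SketchParams`, `sketch_params` and `approx_sketch` are hypothetical. The residue search is the greedy choice described above.

```python
from typing import List, NamedTuple


class SketchParams(NamedTuple):
    mask: int           # ones exactly at the branching positions b_i
    m: int              # the multiplier m = sum of 2^{m_i}
    collect: int        # ones exactly at the target positions b_i + m_i
    shift: int          # b_0 + m_0, used to right-justify the sketch
    offsets: List[int]  # where bit b_i ends up inside Sketch(x)


def sketch_params(positions: List[int], w: int) -> SketchParams:
    """Choose m for branching positions b_0 < ... < b_{r-1}, all below w."""
    b = sorted(positions)
    r = len(b)
    if r == 0:
        return SketchParams(0, 0, 0, 0, [])
    R = r ** 3
    mp: List[int] = []
    for _ in range(r):
        # Residues m'_t must avoid so that b_i + m'_t never meets b_l + m'_j (j < t)
        # modulo R; there are at most t * r^2 < r^3 of them, so one is always free.
        forbidden = {(mj + bl - bi) % R for mj in mp for bl in b for bi in b}
        mp.append(next(v for v in range(R) if v not in forbidden))
    m_list = []
    for i, (bi, mpi) in enumerate(zip(b, mp)):
        # Smallest multiple of R that lifts b_i + m'_i into [w + i*R, w + (i+1)*R).
        c = -(-(w + i * R - bi - mpi) // R)
        m_list.append(mpi + R * c)
    targets = [bi + mi for bi, mi in zip(b, m_list)]
    return SketchParams(
        mask=sum(1 << bi for bi in b),
        m=sum(1 << mi for mi in m_list),
        collect=sum(1 << t for t in targets),
        shift=targets[0],
        offsets=[t - targets[0] for t in targets],
    )


def approx_sketch(x: int, p: SketchParams) -> int:
    """Sketch(x) = (((x AND mask) * m) AND collect) >> (b_0 + m_0)."""
    return (((x & p.mask) * p.m) & p.collect) >> p.shift
```

Because all exponents $b_i + m_j$ are distinct, the product has no carries, and because the windows are increasing, the sketch bits appear in the order of the $b_i$; so sketches of keys compare exactly like the keys.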

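The set of the sketched keys of a node is denoted by $S(K)$.

Def.: We define the sketch of an entire node by concatenating the sketches of its keys, each preceded by a 1 bit:

$$\text{Sketch}(\text{node}) = 1\,\text{Sketch}(x_{k-1})\;1\,\text{Sketch}(x_{k-2})\;\cdots\;1\,\text{Sketch}(x_0).$$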

## Lemma

Suppose $y$ is an arbitrary number and $x_i$ an element of $S$ (the set of keys). Let $b_1, \dots, b_r$ be the elements of $B(S)$, and let $b_m$ be the most significant of these positions at which PerfectSketch($y$) and PerfectSketch($x_i$) differ. Assume that $p > b_m$ is the most significant position in which $y$ and $x_i$ differ. Then the rank of $y$ in $S$ is uniquely determined by the interval between consecutive elements of $B(S)$ that contains $p$, together with the relative order of $y$ and $x_i$.

Using the previous lemma, we can reduce the computation of rank($y$) in $K$ to computing rank(Sketch($y$)) in $S(K)$. Having computed rank(Sketch($y$)), we have determined the predecessor Sketch($x_i$) and the successor Sketch($x_{i+1}$) of Sketch($y$) in $S(K)$. If $x_i \le y \le x_{i+1}$, then we are done. Otherwise, we pick the one of $x_i$, $x_{i+1}$ that shares the longest prefix of significant bits with $y$ and apply the previous lemma, using a lookup table.
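The desketchifying step can be illustrated with a small Python model. To keep it short, it makes several substitutions of our own: PerfectSketch computed bit by bit instead of the multiplicative Sketch, `bisect` instead of the parallel comparison, and a helper value `e` built directly (the common prefix of $y$ followed by 0111… or 1000…) instead of a lookup table. The function names are hypothetical, and the keys must be sorted and distinct.

```python
from bisect import bisect_left, bisect_right
from typing import List


def branching_positions(keys: List[int]) -> List[int]:
    """B(K), most significant first (keys sorted and distinct)."""
    return sorted({(a ^ b).bit_length() - 1 for a, b in zip(keys, keys[1:])}, reverse=True)


def perfect_sketch(x: int, positions: List[int]) -> int:
    s = 0
    for b in positions:
        s = (s << 1) | ((x >> b) & 1)
    return s


def node_rank(keys: List[int], y: int) -> int:
    """Number of keys strictly smaller than y, using sketch comparisons
    plus a single msb computation."""
    pos = branching_positions(keys)
    sk = [perfect_sketch(x, pos) for x in keys]     # sorted, since sketches preserve order
    r = bisect_left(sk, perfect_sketch(y, pos))
    # One of the two sketch neighbours shares the longest common prefix with y.
    cands = [i for i in (r - 1, r) if 0 <= i < len(keys)]
    j = min(cands, key=lambda i: (keys[i] ^ y).bit_length())
    if keys[j] == y:
        return j
    p = (keys[j] ^ y).bit_length() - 1              # where y leaves the trie of the keys
    prefix = (y >> (p + 1)) << (p + 1)
    if (y >> p) & 1:
        e = prefix | ((1 << p) - 1)                 # prefix, then 0, then all ones
        return bisect_right(sk, perfect_sketch(e, pos))
    e = prefix | (1 << p)                           # prefix, then 1, then all zeros
    return bisect_left(sk, perfect_sketch(e, pos))
```

Comparing `node_rank` with `bisect_left` on the keys themselves is a quick way to check the claim that one of the two sketch neighbours always shares the longest common prefix with $y$.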

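## Finding rank(Sketch($y$)) in $S(K)$

Let $\ell$ be the length of a sketch, so that the node sketch consists of $k$ fields of $\ell + 1$ bits, each holding a 1 followed by Sketch($x_i$). Firstly, we compute $Y = \text{Sketch}(y) \cdot \sum_{i=0}^{k-1} 2^{i(\ell+1)}$, which places $0\,\text{Sketch}(y)$ in every field. Then we perform the subtraction $D = \text{Sketch}(\text{node}) - Y$, and finally the masking $D \text{ AND } \sum_{i=0}^{k-1} 2^{i(\ell+1)+\ell}$. Observe that no borrow crosses a field boundary, so the leading bit of field $i$ remains 1 if and only if $\text{Sketch}(x_i) \ge \text{Sketch}(y)$.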
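The replicate, subtract and mask steps, together with the summing multiplication that the next slide mentions, can be sketched in Python as below. The names `pack_node` and `parallel_rank` are hypothetical, field 0 holds $x_0$ (the lowest field), and Python integers stand in for a machine word; the only requirement we check is that the final count fits in one $(\ell+1)$-bit field.

```python
from typing import List


def pack_node(sketches: List[int], l: int) -> int:
    """Sketch(node): field i (l+1 bits, x_0 lowest) holds a 1 followed by Sketch(x_i)."""
    word = 0
    for i, s in enumerate(sketches):
        word |= ((1 << l) | s) << (i * (l + 1))
    return word


def parallel_rank(node: int, k: int, l: int, sy: int) -> int:
    """Number of the k packed sketches that are smaller than sy (all values < 2^l)."""
    assert k < (1 << (l + 1))                          # the final sum must fit in one field
    ones = sum(1 << (i * (l + 1)) for i in range(k))   # a 1 at the bottom of every field
    replicated = sy * ones                             # 0 Sketch(y) in every field
    diff = node - replicated                           # field i: 2^l + s_i - sy, never borrows
    survivors = (diff >> l) & ones                     # 1 in field i iff s_i >= sy
    # Multiplying by `ones` accumulates all survivors into the top field.
    count_ge = ((survivors * ones) >> ((k - 1) * (l + 1))) & ((1 << (l + 1)) - 1)
    return k - count_ge
```

For example, with sketches [1, 2, 3, 6] and $\ell = 3$, querying Sketch($y$) = 3 leaves the separators of the fields holding 3 and 6, so the rank is $4 - 2 = 2$.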

A suitable multiplication sums these ones; since they count the keys whose sketch is at least Sketch($y$), subtracting the sum from $k$ gives the desired rank. What remains is to compute, in constant time, the most significant bit in which two numbers $u, v$ differ. This problem reduces to finding the most significant bit of $u \text{ XOR } v$, so we want to compute msb($x$).

## Lemma

We call a number $x$ $d$-sparse if the positions of its one bits belong to a set of the form $\{a + di : 0 \le i < d\}$; not all these positions have to be occupied by ones. If $x$ is $d$-sparse, then there exist constants $y, y'$ such that, for $z = (y \cdot x) \text{ AND } y'$, the $i$-th bit of $z$ (after a right shift) equals the bit at position $a + di$ of $x$. Namely, $z$ is a perfect compression of $x$.
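One possible choice of the constants, written as a Python sketch: the particular $y$ (ones at multiples of $d-1$), $y'$ and the final right shift are our construction, since the lemma only asserts that such constants exist, and the names `compress_constants` and `compress` are hypothetical.

```python
def compress_constants(a: int, d: int):
    """Constants (y, y_mask, shift) that gather bits a, a+d, ..., a+d(d-1).

    Bit a+d*i meets the copy shifted by (d-1)*(d-1-i) at position a+(d-1)^2+i.
    All exponents a+d*i+(d-1)*j with 0 <= i, j < d are distinct, so the
    multiplication produces no carries."""
    y = sum(1 << ((d - 1) * j) for j in range(d))
    shift = a + (d - 1) ** 2
    y_mask = ((1 << d) - 1) << shift              # plays the role of y'
    return y, y_mask, shift


def compress(x: int, a: int, d: int) -> int:
    """Perfect compression of a d-sparse x into its d low-order bits."""
    y, y_mask, shift = compress_constants(a, d)
    return ((y * x) & y_mask) >> shift
```

Distinctness of the exponents follows from $d(i - i') = (d-1)(j' - j)$ with $|i - i'| < d$ and $\gcd(d, d-1) = 1$, which forces $i = i'$ and $j = j'$.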

## msb($x$)

At first, consider a partitioning of the $w$ bits of our word $x$ into consecutive blocks of $s = \sqrt{w}$ bits. The computation is divided into two phases:

1. We find the leftmost block containing a one and extract this block.
2. We find the leftmost one in this extracted block.

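## First Phase

Let $C$ be the number that has ones precisely in the leftmost position of each block, namely $C = \sum_{i=0}^{w/s-1} 2^{is + s - 1}$, and let $\bar{C}$ denote its complement. We compute lead($x$), the word whose leftmost bit of each block is one iff $x$ contains a one in this block. It is given by

$$\text{lead}(x) = C \text{ AND } \left( x \text{ OR } \overline{C - (x \text{ AND } \bar{C})} \right).$$

We observe that lead($x$) is $d$-sparse (with $a = s - 1$ and $d = s$), so we can apply the previous lemma and obtain compress($x$).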
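The lead($x$) formula can be checked with a short Python sketch; the function name `lead` and the explicit parameters `w` and `s` are our own, and Python's `~` on non-negative integers yields negative numbers, which is harmless here because the result is ANDed with $C$.

```python
def lead(x: int, w: int, s: int) -> int:
    """Top bit of each s-bit block of the w-bit word x is set iff that block is nonzero."""
    C = sum(1 << (i * s + s - 1) for i in range(w // s))   # ones at each block's top bit
    low = x & ~C & ((1 << w) - 1)                         # x AND NOT C: lower bits only
    # In C - low, a block's top bit survives iff its lower bits are all zero;
    # no borrow crosses a block, so NOT flags blocks with nonzero lower bits.
    return C & (x | ~(C - low))
```

For instance, with $w = 16$ and $s = 4$, the word `0x0580` has nonzero blocks 1 and 2, and `lead(0x0580, 16, 4)` is `0x0880`.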

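Let $P = \{2^0, 2^1, \dots, 2^{w/s-1}\}$ be the set of the first $w/s$ powers of two. We compute $b' = \text{rank}(\text{compress}(x))$ in $P$, i.e. the number of powers of two not exceeding compress($x$), in the same way as before. Note that $b'$ identifies the block number (counting from the right, starting at 1) of the leftmost block of $x$ containing a one.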

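The position of the most significant one in lead($x$), counting bit positions from 1, is $f = sb'$. To extract the desired block, we multiply $x$ by $2^{w-f}$, which moves this block to the top of the word (all bits above it are zero), and then right-justify the significant portion by shifting right by $w - s$.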

## Second Phase

We want to find the position of the leftmost one in the extracted block. As before, we compute the rank $b''$ of these $s$ bits among the first $s$ powers of two. Now we have all the information needed to compute msb($x$): counting positions from 1, it is $s(b' - 1) + b''$.
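Putting both phases together gives the following Python sketch of msb($x$). The assumptions are ours: $w = 64$ and $s = 8$; the helper names are hypothetical; the rank among powers of two uses the separator-bit subtraction in the form that counts the powers not exceeding the query; and Python's unbounded integers hide the fact that the packed powers of two take slightly more than $w$ bits. Positions are 0-based in the code, so it returns $s(b'-1) + b'' - 1$.

```python
W, S = 64, 8                                   # hypothetical word length w and block size s
FULL = (1 << W) - 1
C = sum(1 << (i * S + S - 1) for i in range(W // S))   # ones at the top bit of each block


def count_le(values, q, l):
    """How many values are <= q (values and q below 2^l), using one subtraction
    on (l+1)-bit fields and one summing multiplication."""
    k = len(values)
    ones = sum(1 << (i * (l + 1)) for i in range(k))
    packed = sum(v << (i * (l + 1)) for i, v in enumerate(values))
    diff = ((1 << l) | q) * ones - packed       # field i: 2^l + q - v_i, never borrows
    survivors = (diff >> l) & ones              # 1 in field i iff q >= v_i
    return ((survivors * ones) >> ((k - 1) * (l + 1))) & ((1 << (l + 1)) - 1)


def lead(x):
    low = x & ~C & FULL
    return C & (x | ~(C - low))                 # a block's top bit is set iff the block is nonzero


def compress(z, a, d):
    y = sum(1 << ((d - 1) * j) for j in range(d))
    shift = a + (d - 1) ** 2
    return ((y * z) & (((1 << d) - 1) << shift)) >> shift


def msb(x):
    """0-based position of the most significant one of a nonzero W-bit word x."""
    # First phase: find and extract the leftmost nonzero block.
    c = compress(lead(x), S - 1, S)             # bit i set iff block i contains a one
    b1 = count_le([1 << i for i in range(W // S)], c, W // S)   # b', counted from 1
    f = S * b1                                  # 1-based position of that block's top bit
    block = ((x * (1 << (W - f))) & FULL) >> (W - S)
    # Second phase: the leftmost one inside the extracted block.
    b2 = count_le([1 << i for i in range(S)], block, S)         # b'', counted from 1
    return S * (b1 - 1) + b2 - 1
```

For `0x0580`, the leftmost nonzero block is block 2 (counting from 1) with value 5, so $b' = 2$, $b'' = 3$, and the code returns $8 + 3 - 1 = 10$.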

## Conclusions

In the static case, the successor and predecessor problems are clearly solvable in $O(\log n / \log w)$ time, since this is the height of our B-tree and the computation in each node requires constant time (the data we need is precomputed). In the dynamic case, the total time to update a node is polynomial in $B$, since the sketch parameters of the node must be recomputed. The amortized number of node updates per insertion/deletion in a B-tree is constant. Therefore, sorting $n$ numbers requires $O(n \log n / \log \log n)$ time.