# Paolo Ferragina, Università di Pisa Compressed Rank & Select on general strings Paolo Ferragina Dipartimento di Informatica, Università di Pisa.

## Presentation on theme: "Paolo Ferragina, Università di Pisa Compressed Rank & Select on general strings Paolo Ferragina Dipartimento di Informatica, Università di Pisa."— Presentation transcript:

Paolo Ferragina, Università di Pisa Compressed Rank & Select on general strings Paolo Ferragina Dipartimento di Informatica, Università di Pisa

Paolo Ferragina, Università di Pisa Generalised Rank and Select Rank(c,i) = #c in L[1,i] Select(c,i) = position of the i-th c in L L = a b a a a c b c d a b e c d... Rank( a, 7 ) = 4Select( a, 2 ) = 3

Paolo Ferragina, Università di Pisa Generalised Rank and Select  If  is small (i.e. constant)  Build binary Rank data structure for each symbol of  Rank takes O(1) time and small space  If  is large ( words ?)  Need a smarter solution: Wavelet Tree data structure Algorithmic reduction: >> Reduce Rank&Select over arbitrary strings... to Rank&Select over binary strings

Paolo Ferragina, Università di Pisa The Wavelet Tree ac br d abracadabra (Alphabetic ?) Tree

Paolo Ferragina, Università di Pisa The Wavelet Tree ac br d abracadabra aacaaabrdbr brbr rr ? aaaaa ? bb ? d ?

Paolo Ferragina, Università di Pisa The Wavelet Tree ac br d abracadabra aacaaa brdbr brbr abracadabra 01100010110 aacaaa 001000 brdbr 00100 brbr 0101 01100010110 00100000100 0101 Fact. Given the tree and the binary strings, we can recover the original string !! In any case, O(|  | log |  |) bits. Easier Alphabetic order + Heap structure

Paolo Ferragina, Università di Pisa brdbr 00100 abracadabra 01100010110 brbr 0101 aacaaa 001000 The Wavelet Tree ac br d Rank(b,8) Rank(b,3) Rank(b,2) Reduce to right symbols Reduce to left symbols It’s binary Every step can be turned to binary

Paolo Ferragina, Università di Pisa abracadabra 01100010110 Rank 1 (8)=3 Rank 0 (2) = 2 – Rank 1 (1)= 1 Rank 0 (3) = 3 – Rank 1 (3)= 2 brbr 0101 brdbr 00100 aacaaa 001000 The Wavelet Tree ac br d Generalised R&S implemented with log |  | binary R&S Rank(b,8) Right move = Rank 1 Left move = Rank 0 Left move = Rank 0 Select is similar

Paolo Ferragina, Università di Pisa Representing Trees Paolo Ferragina Dipartimento di Informatica, Università di Pisa

Standard representation Binary tree: each node has two pointers to its left and right children An n-node tree takes 2n pointers or 2n lg n bits. Supports finding left child or right child of a node (in constant time). For each extra operation (eg. parent, subtree size) we have to pay additional n lg n bits each. x xxxx xxxx

Can we improve the space bound? There are less than 2 2n distinct binary trees on n nodes. 2n bits are enough to distinguish between any two different binary trees. Can we represent an n node binary tree using 2n bits?

Binary tree representation A binary tree on n nodes can be represented using 2n+o(n) bits to support: parent left child right child in constant time.

Heap-like notation for a binary tree 1 111 11 1 1 0000 0000 0 Add external nodes Label internal nodes with a 1 and external nodes with a 0 Write the labels in level order 1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 One can reconstruct the tree from this sequence An n node binary tree can be represented in 2n+1 bits. What about the operations?

Heap-like notation for a binary tree 1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 8 5764 32 1 9 17161514 13 12 1110 1 87 654 3 2 1 2 3 4 5 6 7 8 parent(x) = On red ( ⌊ x/2 ⌋ ) left child(x) = On green(2x) right child(x) = On green(2x+1) x  x: # 1’s up to x (Rank) x  x: position of x-th 1 (Select)

Download ppt "Paolo Ferragina, Università di Pisa Compressed Rank & Select on general strings Paolo Ferragina Dipartimento di Informatica, Università di Pisa."

Similar presentations