On the Usefulness of Backspace. Shmuel Tomi Klein (Bar Ilan University), Dana Shapira (Ashkelon Academic College).


1 On the Usefulness of Backspace. Shmuel Tomi Klein (Bar Ilan University), Dana Shapira (Ashkelon Academic College).

2 Extension of the study of NEGATION in large IR systems. Examples: United - Nations, Edgar (-1:2) Po. Backspace: not really a character, but can be useful.

3 Three applications: (1) Handling large numbers, (2) Text compression in IR, (3) Blockwise Huffman decoding

4 Handling large numbers (1). Syntax: A (1:3) -B (1:5) C-D (1:1) E. In use at the Responsa Project.

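5 Handling large numbers (1). Too many large numbers: break them into blocks of k digits, e.g. 1234567 → 1234 567. Problem with precision: a query for 5678 also retrieves 123456789, because with k = 4 that number is stored as the blocks 1234 5678 9.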
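The precision problem can be reproduced with a small sketch. The function names `split_blocks` and `naive_match` are hypothetical, and k = 4 is assumed from the slide's example; this is only an illustration of plain block splitting, not the Responsa Project's code.

```python
def split_blocks(number: str, k: int = 4) -> list[str]:
    """Break a digit string into consecutive blocks of at most k digits."""
    return [number[i:i + k] for i in range(0, len(number), k)]


def naive_match(query: str, numbers: list[str], k: int = 4) -> list[str]:
    """Return every stored number that has the query as one of its blocks."""
    return [n for n in numbers if query in split_blocks(n, k)]


if __name__ == "__main__":
    print(split_blocks("1234567"))      # blocks of the slide's example number
    # The query 5678 matches the number 5678, but also 123456789,
    # whose blocks are 1234, 5678 and 9: a false hit.
    print(naive_match("5678", ["5678", "123456789", "1234567"]))
```

Without a marker that distinguishes a standalone block from a block inside a longer number, the index has no way to reject the second hit.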

6 Handling large numbers (1)
Each word includes a trailing blank: "House " "of " "Lords "
Long numbers use Backspace (BS): 1234567890 → 1234 BS 5678 BS 90
Original:  I declared an income of 1000000 on my last 10 1040 forms
Indexed:   I declared an income of 1000 BS 000 on my last 10 1040 forms
Position:  1 2        3  4      5  6    7  8   9  10 11   12 13   14
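A minimal sketch of this indexing convention follows. It assumes k = 4, uses the literal string "BS" as a stand-in for the backspace marker, and the names `tokenize` and `render` are hypothetical; a real index would need a marker that cannot collide with an ordinary word.

```python
BS = "BS"  # assumed stand-in for the backspace marker


def tokenize(text: str, k: int = 4) -> list[str]:
    """Split text into words; numbers longer than k digits become blocks joined by BS."""
    tokens: list[str] = []
    for word in text.split():
        if word.isdigit() and len(word) > k:
            for i in range(0, len(word), k):
                if i:
                    tokens.append(BS)
                tokens.append(word[i:i + k])
        else:
            tokens.append(word)
    return tokens


def render(tokens: list[str]) -> str:
    """Rebuild the text: every token carries a trailing blank, BS erases it."""
    out = ""
    for tok in tokens:
        if tok == BS:
            out = out[:-1]      # cancel the blank left by the previous token
        else:
            out += tok + " "
    return out


if __name__ == "__main__":
    sentence = "I declared an income of 1000000 on my last 10 1040 forms"
    toks = tokenize(sentence)
    for position, tok in enumerate(toks, start=1):
        print(position, tok)
    print(render(toks))
```

Because BS occupies a position of its own, the blocks of one number stay adjacent in the index while ordinary neighbours such as 10 and 1040 remain separate words.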

7 Handling large numbers (1)
To search for     submit query
234               -BS 234
2000 1040 -BS
12345678          -BS 1234 BS 5678 -BS
1234567           -BS 1234 BS 567
user@addr.com     user @ BS addr. BS com
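The query examples on this slide suggest a simple rewriting rule, sketched below. The rule is inferred from the examples rather than stated in the source: a leading -BS excludes hits that continue a longer number, blocks are joined by BS, and a trailing -BS is added only when the last block is a full k digits (a shorter block cannot be followed by more digits). The name `number_query` and the default k = 4 are assumptions.

```python
def number_query(number: str, k: int = 4) -> str:
    """Rewrite a number as a block query with negated backspaces (inferred rule)."""
    blocks = [number[i:i + k] for i in range(0, len(number), k)]
    terms = ["-BS"]                  # the number must not continue a longer one
    for i, block in enumerate(blocks):
        if i:
            terms.append("BS")       # consecutive blocks are glued by a backspace
        terms.append(block)
    if len(blocks[-1]) == k:
        terms.append("-BS")          # a full last block must not be continued
    return " ".join(terms)


if __name__ == "__main__":
    for n in ("234", "12345678", "1234567"):
        print(n, "->", number_query(n))
```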

8 Text Compression in IR (2)
Huffword: alternating words and non-words
Use a single Huffman tree for:
- words, including a trailing blank
- punctuation signs: BS ;
- Backspace, to handle exceptions

9 Text Compression in IR (2)
File     Size   Huffword  BSHuff  gzip  bzip
English  3.1MB  3.91      3.97    3.28  4.41
French   7.1MB  3.98      4.03    3.27  4.63

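10 Blockwise Huffman decoding (3)
Given an alphabet with probabilities, find codeword lengths such that the average length is minimized.
Symbol       A    B    C    D     E
Probability  0.4  0.3  0.1  0.1   0.1
Length       1    2    3    4     4
HUFFMAN      0    11   101  1000  1001
[Figure: the corresponding Huffman tree with leaves A, B, C, D, E and edges labeled 0 and 1.]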
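The lengths on this slide can be checked with a standard Huffman construction. The sketch below uses integer weights (the slide's probabilities times 10) to avoid floating-point ties; the function name `huffman_lengths` is hypothetical. Tie-breaking among the three symbols of weight 1 may assign the length 3 to a different symbol than the slide does, but the multiset of lengths and the average are the same.

```python
import heapq


def huffman_lengths(weights: dict[str, int]) -> dict[str, int]:
    """Return the codeword length of each symbol in a Huffman code."""
    # Heap items: (weight, tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(weights, 0)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1          # every symbol below the new node gets one more bit
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths


if __name__ == "__main__":
    weights = {"A": 4, "B": 3, "C": 1, "D": 1, "E": 1}   # probabilities times 10
    lengths = huffman_lengths(weights)
    average = sum(weights[s] * lengths[s] for s in weights) / sum(weights.values())
    print(lengths, average)
```

For the slide's distribution the weighted sum of lengths is 21 over a total weight of 10, so the average codeword length is 2.1 bits per symbol.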

11 Decoding k bits together: partial decoding tables
[Figure: Huffman tree with leaves A, B, C, D, E and internal nodes numbered 0 to 3.]
Table  Entry  Pattern  Decoding
0      1      001      0 0 1 → A A Rem
1      6      1110     11 10 → B Rem
3      3      100011   1000 11 → D B

12 Decoding k bits together: partial decoding tables
[Figure: Huffman tree with leaves A, B, C, D, E and internal nodes numbered 0 to 3.]
Prefix:              Λ         1         10        100
       Pattern for   Table 0   Table 1   Table 2   Table 3
Entry  Table 0       W    l    W    l    W    l    W    l
0      000           AAA  0    D    0    DA   0    DAA  0
1      001           AA   1    E    0    D    1    DA   1
2      010           A    2    CA   0    EA   0    D    2
3      011           AB   0    C    1    E    1    DB   0
4      100           -    3    BAA  0    CAA  0    EAA  0
5      101           C    0    BA   1    CA   1    EA   1
6      110           BA   0    B    2    C    2    E    2
7      111           B    1    BB   0    CB   0    EB   0
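The tables on this slide can be derived mechanically from the code A 0, B 11, C 101, D 1000, E 1001. The sketch below is one way to do it, assuming a complete prefix code; `build_tables` is a hypothetical name. Table j corresponds to the j-th proper prefix of a codeword (Λ, 1, 10, 100), and each entry holds the decoded string W and the index l of the table for the leftover bits.

```python
from itertools import product

CODE = {"A": "0", "B": "11", "C": "101", "D": "1000", "E": "1001"}


def build_tables(code: dict[str, str], k: int):
    """Build one table per proper codeword prefix; entry = (decoded text, next table)."""
    decode = {cw: sym for sym, cw in code.items()}
    # Proper prefixes of codewords are the internal nodes of the Huffman tree.
    prefixes = sorted({cw[:i] for cw in code.values() for i in range(len(cw))},
                      key=lambda p: (len(p), p))
    index = {p: j for j, p in enumerate(prefixes)}
    tables = []
    for prefix in prefixes:
        table = []
        for pattern in product("01", repeat=k):      # 00..0 up to 11..1
            out, pending = "", prefix
            for bit in pattern:
                pending += bit
                if pending in decode:                # reached a leaf: emit a symbol
                    out += decode[pending]
                    pending = ""
            table.append((out, index[pending]))      # leftover bits select the next table
        tables.append(table)
    return prefixes, tables


if __name__ == "__main__":
    prefixes, tables = build_tables(CODE, 3)
    for j, table in enumerate(tables):
        print(j, repr(prefixes[j]), table)
```

An empty decoded string plays the role of the "-" entry: for pattern 100 in table 0, no codeword is completed and decoding continues in table 3.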

13 Decoding Algorithm (tables 0 to 3 as on the previous slide)
Code: A 0, B 11, C 101, D 1000, E 1001
j ← 0
for f ← 1 to EOI, in steps of k bits
    (output, j) ← T(j, M[f; f + k - 1])
Example with k = 3, input 100 101 110 000 101:
Block   100  101  110  000  101
j       0    3    1    2    0
output  -    EA   B    DA   C
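The decoding loop can be tried out directly on the slide's tables. In the sketch below the four tables are copied from the previous slide for k = 3, with an empty string for the "-" entry; the name `decode` is hypothetical, and the input length is assumed to be a multiple of k.

```python
K = 3

# TABLES[j][entry] = (W, l): decoded text and index of the next table.
TABLES = {
    0: [("AAA", 0), ("AA", 1), ("A", 2), ("AB", 0), ("", 3), ("C", 0), ("BA", 0), ("B", 1)],
    1: [("D", 0), ("E", 0), ("CA", 0), ("C", 1), ("BAA", 0), ("BA", 1), ("B", 2), ("BB", 0)],
    2: [("DA", 0), ("D", 1), ("EA", 0), ("E", 1), ("CAA", 0), ("CA", 1), ("C", 2), ("CB", 0)],
    3: [("DAA", 0), ("DA", 1), ("D", 2), ("DB", 0), ("EAA", 0), ("EA", 1), ("E", 2), ("EB", 0)],
}


def decode(bits: str, k: int = K) -> tuple[str, int]:
    """Decode k bits per table lookup; returns the text and the final table index."""
    if len(bits) % k:
        raise ValueError("this sketch assumes the input length is a multiple of k")
    out, j = [], 0
    for f in range(0, len(bits), k):
        block = int(bits[f:f + k], 2)        # the k-bit block selects the table entry
        text, j = TABLES[j][block]
        out.append(text)
    return "".join(out), j


if __name__ == "__main__":
    print(decode("100101110000101"))
```

Each iteration consumes exactly k bits with a single lookup, which is what makes the method faster than bit-by-bit decoding at the price of one table per internal node.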

14 Looking for new tradeoffs: reduced partial decoding tables, including backspaces. Only tables 0 and 3 are kept.
[Figure: Huffman tree with leaves A, B, C, D, E and internal nodes numbered 0 to 3.]

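15 Revised Decoding Algorithm (reduced tables)
Code: A 0, B 11, C 101, D 1000, E 1001
       Pattern for   Table 0      Table 3
Entry  Table 0       W    l  b    W    l  b
0      000           AAA  0  0    DAA  0  0
1      001           AA   0  1    DA   0  1
2      010           A    0  2    D    0  2
3      011           AB   0  0    DB   0  0
4      100           -    3  0    EAA  0  0
5      101           C    0  0    EA   0  1
6      110           BA   0  0    E    0  2
7      111           B    0  1    EB   0  0
j ← 0
for f ← 1 to EOI
    (output, j, back) ← T(j, M[f; f + k - 1])
    f ← f + k - back
Example with k = 3, input 1 0 0 1 0 1 1 1 0 0 0 0 1 0 1:
f       1    4    6    8    11   13
Block   100  101  111  100  001  101
j       0    3    0    0    3    0
output  -    EA   B    -    DA   C
back    0    1    1    0    1    0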
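The revised loop can be sketched in the same way, using only tables 0 and 3 from this slide. Each entry is read here as (W, l, b), where b is taken to be the number of bits to step back before the next lookup; that reading, the name `decode_reduced`, and the omission of any special handling for a final block shorter than k bits are assumptions of this sketch.

```python
K = 3

# REDUCED[j][entry] = (W, l, b): decoded text, next table, bits to back up.
REDUCED = {
    0: [("AAA", 0, 0), ("AA", 0, 1), ("A", 0, 2), ("AB", 0, 0),
        ("", 3, 0), ("C", 0, 0), ("BA", 0, 0), ("B", 0, 1)],
    3: [("DAA", 0, 0), ("DA", 0, 1), ("D", 0, 2), ("DB", 0, 0),
        ("EAA", 0, 0), ("EA", 0, 1), ("E", 0, 2), ("EB", 0, 0)],
}


def decode_reduced(bits: str, k: int = K) -> tuple[str, int]:
    """Decode with the reduced tables; returns the text and the number of lookups."""
    out, j, f, lookups = [], 0, 0, 0
    while f + k <= len(bits):            # a shorter final block is not handled here
        text, j, back = REDUCED[j][int(bits[f:f + k], 2)]
        out.append(text)
        f += k - back                    # backspace: re-read the undecoded leftover bits
        lookups += 1
    return "".join(out), lookups


if __name__ == "__main__":
    print(decode_reduced("100101110000101"))
```

On the slide's 15-bit input this takes six lookups instead of five, in exchange for storing two tables rather than four.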

16 Input: 1 0 0 1 0 1 1 1 0 0 0 0 1 0 1
Regular Huffman:                E A B D A C
Partial decoding tables:        - EA B DA C
Reduced tables with backspace:  - EA B - DA C

17 Experimental results (WSJ)
        Bit   partial decode tables   reduced tables
k       1     8                       8
bpa     1     8                       6.4
MB/sec  6.6   -                       7.6
RAM     2.119734.1

18 Experimental results (KJV)
        Bit   partial decode tables   reduced tables
k       1     8                       8
bpa     1     8                       6.4
MB/sec  10.10.413.7
RAM     0.21178.7

19 Conclusion: three examples of IR applications. The use of conceptual elements, like backspaces, may improve algorithms.


