Published by Erika Reed. Modified over 8 years ago.
1
On the Usefulness of Backspace
Shmuel Tomi Klein, Bar Ilan University
Dana Shapira, Ashkelon Academic College
2
Backspace
Not really a character, but can be useful.
Extension of a study of NEGATION in large IR systems.
Examples: United -Nations; Edgar (-1:2) Po
3
Three applications
1. Handling large numbers
2. Text compression in IR
3. Blockwise Huffman decoding
4
1. Handling large numbers
Syntax: A (1:3) -B (1:5) C -D (1:1) E
In use at the Responsa Project.
5
1. Handling large numbers
Too many large numbers.
Break into blocks of k digits: 1234567 → 1234 567
Problem with precision: 5678 also retrieves 123456789
6
1. Handling large numbers
Each word includes a trailing blank: House of Lords
Long numbers use Backspace (BS): 1234567890 → 1234 BS 5678 BS 90
Text:      I declared an income of 1000000 on my last 10 1040 forms
Indexed:   I declared an income of 1000 BS 000 on my last 10 1040 forms
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
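The block splitting on this slide can be sketched in a few lines. This is only an illustration under assumptions: the marker string `<BS>`, the function names, and the choice k = 4 are not from the slides (k = 4 is inferred from the 1234 BS 5678 BS 90 example).

```python
BS = "<BS>"  # hypothetical marker standing in for the Backspace token


def split_number(digits: str, k: int = 4) -> list[str]:
    """Break a digit string into blocks of k digits joined by BS tokens."""
    out = []
    for i in range(0, len(digits), k):
        if i:
            out.append(BS)
        out.append(digits[i:i + k])
    return out


def index_tokens(text: str, k: int = 4) -> list[str]:
    """Split text on blanks; numbers longer than k digits become BS-joined blocks."""
    out = []
    for word in text.split():
        if word.isdigit() and len(word) > k:
            out.extend(split_number(word, k))
        else:
            out.append(word)
    return out


sentence = "I declared an income of 1000000 on my last 10 1040 forms"
tokens = index_tokens(sentence)
print(len(tokens), tokens)
```

On the slide's sentence this yields 14 tokens, which agrees with the positions 1 to 14 listed above: BS occupies a position of its own.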
7
1. Handling large numbers
To search for → submit query:
234 → -BS 234
2000 1040 -BS
12345678 → -BS 1234 BS 5678 -BS
1234567 → -BS 1234 BS 567
user@addr.com → user @ BS addr. BS com
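One way to read the queries above is that a number matches only when its blocks are neither preceded nor followed by a BS token, so that a block belonging to a longer number is not retrieved on its own. The sketch below encodes that reading; it is an interpretation, not the Responsa Project implementation, and `<BS>`, `split_number`, and `find_number` are hypothetical names.

```python
BS = "<BS>"  # hypothetical marker standing in for the Backspace token


def split_number(digits: str, k: int = 4) -> list[str]:
    """Break a digit string into blocks of k digits joined by BS tokens."""
    out = []
    for i in range(0, len(digits), k):
        if i:
            out.append(BS)
        out.append(digits[i:i + k])
    return out


def find_number(tokens: list[str], digits: str, k: int = 4) -> list[int]:
    """Return start positions where digits occurs as a complete number."""
    pattern = split_number(digits, k)
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != pattern:
            continue
        preceded = i > 0 and tokens[i - 1] == BS            # the "-BS" before
        followed = i + n < len(tokens) and tokens[i + n] == BS  # the "-BS" after
        if not preceded and not followed:
            hits.append(i)
    return hits


tokens = ["income", "of", "1000", BS, "000", "on", "my", "last", "10", "1040", "forms"]
print(find_number(tokens, "1040"), find_number(tokens, "1000"))
```

With this check, the query 1000 no longer retrieves the first block of 1000000, which is the precision problem that plain block splitting has.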
8
2. Text compression in IR
Huffword: alternating words and non-words.
Use a single Huffman tree for:
- words, including a trailing blank
- punctuation signs: BS ;
- Backspace, to handle exceptions
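The token stream that such a single Huffman tree would encode can be sketched as follows. The assumptions here are mine: every token (word or punctuation sign) carries an implicit trailing blank, a BS token cancels the blank before the next token, the input uses single blanks with no leading or trailing whitespace, and `<BS>`, `tokenize`, and `detokenize` are hypothetical names.

```python
import re

BS = "<BS>"  # hypothetical marker standing in for the Backspace token


def tokenize(text: str) -> list[str]:
    """Split into words and punctuation; emit BS where no blank precedes a token."""
    tokens = []
    for m in re.finditer(r"\w+|[^\w\s]", text):
        if tokens and not text[m.start() - 1].isspace():
            tokens.append(BS)  # previous token's trailing blank must be undone
        tokens.append(m.group())
    return tokens


def detokenize(tokens: list[str]) -> str:
    """Rebuild the text: each token adds a trailing blank, BS deletes one."""
    out = ""
    for t in tokens:
        if t == BS:
            out = out[:-1]
        else:
            out += t + " "
    return out[:-1]  # drop the final trailing blank


sample = "Hello, world."
print(tokenize(sample))
print(detokenize(tokenize(sample)) == sample)
```

The point of the design is that the common case (a word followed by a blank) costs one token, and only the exceptions pay for an extra BS.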
9
2. Text compression in IR
File     Size    Huffword  BSHuff  gzip  bzip
English  3.1MB   3.91      3.97    3.28  4.41
French   7.1MB   3.98      4.03    3.27  4.63
10
3. Blockwise Huffman decoding
HUFFMAN: given an alphabet with probabilities, find codeword lengths such that the average length is minimized.
Symbol  Probability  Length  Codeword
A       0.4          1       0
B       0.3          2       11
C       0.1          3       101
D       0.1          4       1000
E       0.1          4       1001
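The lengths on this slide can be checked with a standard heap-based Huffman construction. This is a generic sketch, not the authors' code, and the function names are hypothetical. Because C, D, and E are tied at 0.1, which of them receives the length-3 codeword depends on tie-breaking; the multiset of lengths and the average are the same either way.

```python
import heapq
from itertools import count


def huffman_lengths(probs: dict[str, float]) -> dict[str, int]:
    """Return a codeword length per symbol by repeatedly merging the two lightest nodes."""
    tie = count()  # unique tie-breaker so the symbol lists are never compared
    heap = [(p, next(tie), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every symbol under the merged node gets one bit deeper
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


def average_length(probs: dict[str, float], lengths: dict[str, int]) -> float:
    return sum(probs[s] * lengths[s] for s in probs)


probs = {"A": 0.4, "B": 0.3, "C": 0.1, "D": 0.1, "E": 0.1}
lengths = huffman_lengths(probs)
print(lengths, average_length(probs, lengths))
```

For the slide's distribution the average works out to 0.4·1 + 0.3·2 + 0.1·3 + 0.1·4 + 0.1·4 = 2.1 bits per symbol.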
11
Decoding k bits together: partial decoding tables
Table  Entry  Pattern   Decoding
0      1      001       0 0 1 → A A, Rem 1
1      6      1 110     11 10 → B, Rem 10
3      3      100 011   1000 11 → D B
12
Decoding k bits together: partial decoding tables
Prefix for Tables 0 to 3: Λ, 1, 10, 100
Entry  Pattern for   Table 0   Table 1   Table 2   Table 3
       Table 0       W    l    W    l    W    l    W    l
0      000           AAA  0    D    0    DA   0    DAA  0
1      001           AA   1    E    0    D    1    DA   1
2      010           A    2    CA   0    EA   0    D    2
3      011           AB   0    C    1    E    1    DB   0
4      100           -    3    BAA  0    CAA  0    EAA  0
5      101           C    0    BA   1    CA   1    EA   1
6      110           BA   0    B    2    C    2    E    2
7      111           B    1    BB   0    CB   0    EB   0
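The tables above can be generated mechanically from the code. The sketch below assumes a complete prefix code (so every leftover bit string is a proper prefix of some codeword) and numbers the tables by sorting the proper prefixes by length, which reproduces the Λ, 1, 10, 100 order of the slide; `build_tables` is a hypothetical name.

```python
from itertools import product


def build_tables(code: dict[str, str], k: int) -> list[dict[str, tuple[str, int]]]:
    """tables[j][pattern] = (W, l): decoded symbols and index of the next table."""
    decode = {cw: sym for sym, cw in code.items()}
    # Proper prefixes of codewords correspond to the internal nodes of the tree.
    prefixes = sorted({cw[:i] for cw in code.values() for i in range(len(cw))},
                      key=lambda p: (len(p), p))
    index = {p: j for j, p in enumerate(prefixes)}
    tables = []
    for prefix in prefixes:
        table = {}
        for bits in product("01", repeat=k):
            pattern = "".join(bits)
            out, cur = "", ""
            for b in prefix + pattern:  # decode the leftover prefix plus k new bits
                cur += b
                if cur in decode:
                    out += decode[cur]
                    cur = ""
            table[pattern] = (out, index[cur])  # cur is the new remainder
        tables.append(table)
    return tables


code = {"A": "0", "B": "11", "C": "101", "D": "1000", "E": "1001"}
tables = build_tables(code, 3)
print(tables[0]["100"], tables[1]["110"], tables[3]["011"])
```

An empty W corresponds to the "-" entry of the slide: the three bits 100 complete no codeword, so decoding continues in Table 3.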
13
Decoding Algorithm
Codewords: A 0, B 11, C 101, D 1000, E 1001
(Tables 0 to 3 for k = 3, as on the previous slide.)
for f from the start of the input to EOI, in steps of k
    (output, j) ← T(j, M[f ; f + k - 1])
Example:
Block   100  101  110  000  101
j       0    3    1    2    0
output  -    EA   B    DA   C
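The loop above can be run directly on the four tables from the slides. In this sketch the table entries are copied from the slide in a compact string form; the parsing helper, the names `ROWS`, `T`, and `decode`, and the requirement that the input length be a multiple of k are assumptions made to keep the example short.

```python
K = 3
PATTERNS = ["000", "001", "010", "011", "100", "101", "110", "111"]

# One string per table; each entry is W followed by l, with "-" for an empty W.
ROWS = [
    "AAA0 AA1 A2 AB0 -3 C0 BA0 B1",       # Table 0, prefix Λ
    "D0 E0 CA0 C1 BAA0 BA1 B2 BB0",       # Table 1, prefix 1
    "DA0 D1 EA0 E1 CAA0 CA1 C2 CB0",      # Table 2, prefix 10
    "DAA0 DA1 D2 DB0 EAA0 EA1 E2 EB0",    # Table 3, prefix 100
]
T = [
    {p: (e[:-1].replace("-", ""), int(e[-1])) for p, e in zip(PATTERNS, row.split())}
    for row in ROWS
]


def decode(bits: str) -> str:
    """Decode k bits per table lookup: (output, j) <- T(j, M[f ; f + k - 1])."""
    if len(bits) % K:
        raise ValueError("this sketch assumes the input length is a multiple of k")
    out, j = [], 0
    for f in range(0, len(bits), K):
        w, j = T[j][bits[f:f + K]]
        out.append(w)
    if j != 0:
        raise ValueError("input ends inside a codeword")
    return "".join(out)


print(decode("100101110000101"))
```

Each iteration costs one table lookup regardless of how many codewords the block contains, which is the source of the speedup over bit-by-bit decoding.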
14
Looking for new tradeoffs
Reduced partial decoding tables, including backspaces: only Tables 0 and 3 are kept.
15
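Revised Decoding Algorithm (reduced tables)
Codewords: A 0, B 11, C 101, D 1000, E 1001
Entry  Pattern for   Table 0      Table 3
       Table 0       W    l  b    W    l  b
0      000           AAA  0  0    DAA  0  0
1      001           AA   0  1    DA   0  1
2      010           A    0  2    D    0  2
3      011           AB   0  0    DB   0  0
4      100           -    3  0    EAA  0  0
5      101           C    0  0    EA   0  1
6      110           BA   0  0    E    0  2
7      111           B    0  1    EB   0  0
for f from the start of the input to EOI
    (output, j, back) ← T(j, M[f ; f + k - 1])
    f ← f + k - back
Example: 1 0 0 1 0 1 1 1 0 0 0 0 1 0 1
Block read  100  101  111  100  001  101
output      -    EA   B    -    DA   C
back        0    1    1    0    1    0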
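A runnable sketch of the revised loop, using only Tables 0 and 3 with their back column as given on the slide. The compact row encoding, the names, and the bit-by-bit handling of a final block shorter than k are assumptions; the slides do not say how the end of the input is treated.

```python
K = 3
PATTERNS = ["000", "001", "010", "011", "100", "101", "110", "111"]
CODE = {"0": "A", "11": "B", "101": "C", "1000": "D", "1001": "E"}
PREFIX = {0: "", 3: "100"}  # bits already consumed when entering each table

# Each entry is W, l, b written together, with "-" for an empty W.
ROWS = {
    0: "AAA00 AA01 A02 AB00 -30 C00 BA00 B01",
    3: "DAA00 DA01 D02 DB00 EAA00 EA01 E02 EB00",
}
T = {
    j: {p: (e[:-2].replace("-", ""), int(e[-2]), int(e[-1]))
        for p, e in zip(PATTERNS, row.split())}
    for j, row in ROWS.items()
}


def decode_reduced(bits: str) -> str:
    """Decode with reduced tables; back bits are re-read instead of kept as state."""
    out, j, f = [], 0, 0
    while f + K <= len(bits):
        w, j, back = T[j][bits[f:f + K]]
        out.append(w)
        f += K - back  # back < K, so f always advances
    # Assumed tail handling: finish the last (short) block bit by bit.
    cur = PREFIX[j]
    for b in bits[f:]:
        cur += b
        if cur in CODE:
            out.append(CODE[cur])
            cur = ""
    if cur:
        raise ValueError("input ends inside a codeword")
    return "".join(out)


print(decode_reduced("100101110000101"))
```

The tradeoff is fewer tables in exchange for re-reading up to two bits per lookup: on the example input the reduced decoder needs six lookups where the full tables need five.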
16
Input: 1 0 0 1 0 1 1 1 0 0 0 0 1 0 1
Regular Huffman:                E A B D A C
Partial decoding tables:        - EA B DA C
Reduced tables with backspace:  - EA B - DA C
17
Experimental results: WSJ
          Bit    partial decode tables   reduced tables
k         1      8                       8
bpa       1      8                       6.4
MB/sec    6.6    -                       7.6
RAM       2.119734.1
18
Experimental results: KJV
          Bit    partial decode tables   reduced tables
k         1      8                       8
bpa       1      8                       6.4
MB/sec    10.10.413.7
RAM       0.21178.7
19
Conclusion
3 examples of IR applications.
Use of conceptual elements, like backspaces, may improve algorithms.