Information Distance from a Question to an Answer

Question & Answer  Practical concerns: Partial match only, often do not satisfy triangle inequality. When x is very popular, and y is not, x contains a lot of irrelevant information w.r.t. y, then C(x|y) << C(y|x), and d(x,y) prefers y. Neighborhood density -- there are answers that are much more popular than others. Nothing to compress: a question and an answer.

Partial match: the triangle inequality does not hold. For example, d(man, horse) ≥ d(man, centaur) + d(centaur, horse): a centaur is a good partial match for both a man and a horse, yet a man and a horse are far apart.

Separate Irrelevant Information
In the max theory, we wanted the smallest program p converting between x and y. Now let's remove the irrelevant information from p: split it into a program q together with strings s and t, and minimize q + s + t.
[Diagram: x converted to y by program p; x converted to y by program q, with s and t shown separately.]

The Min Theory (Li, Int'l J. TCS, 2007; Zhang et al., KDD 2007)
E_min(x,y) = the smallest program p needed to convert between x and y, keeping irrelevant information out of p. All other development is similar to E(x,y).
Define: d_min(x,y) = min{C(x|y), C(y|x)} / min{C(x), C(y)}
Fundamental Theorem II: E_min(x,y) = min{C(x|y), C(y|x)}
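As a quick illustration with hypothetical numbers (not from the slides): suppose C(x) = 1000, C(y) = 400, C(x|y) = 700 and C(y|x) = 100 bits. Then d_max(x,y) = max{700, 100} / max{1000, 400} = 0.7, while d_min(x,y) = min{700, 100} / min{1000, 400} = 0.25. The information in x that is irrelevant to y inflates d_max but not d_min.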

Other Properties
Theorem 1. d_min(x,y) ≤ d_max(x,y).
Theorem 2. d_min(x,y) is universal, does not satisfy the triangle inequality, is symmetric, and has the required density properties: good guys have more neighbors.

How to approximate d_max(x,y) and d_min(x,y)
Each term C(x|y) may be approximated by one of the following (a compression-based sketch follows below):
1. Compression.
2. Shannon-Fano code (Cilibrasi, Vitanyi): an object with probability p may be encoded in -log p + 1 bits.
3. Mixed usage of (1) and (2), as in the question-and-answer application. This is especially useful for Q&A systems.
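A minimal sketch of the compression-based estimate, assuming C(s) is approximated by the zlib-compressed length of s and C(x|y) by C(yx) - C(y); the function names and the choice of compressor are illustrative, not from the slides:

```python
# A minimal sketch, assuming C(s) is estimated by zlib-compressed length
# and C(x|y) by C(yx) - C(y); not the authors' implementation.
import zlib

def c(s: bytes) -> int:
    """Approximate C(s) by the length of the zlib-compressed string."""
    return len(zlib.compress(s, 9))

def c_cond(x: bytes, y: bytes) -> int:
    """Approximate the conditional complexity C(x|y) by C(yx) - C(y)."""
    return max(c(y + x) - c(y), 1)

def d_max(x: bytes, y: bytes) -> float:
    """d_max(x,y) = max{C(x|y), C(y|x)} / max{C(x), C(y)}."""
    return max(c_cond(x, y), c_cond(y, x)) / max(c(x), c(y))

def d_min(x: bytes, y: bytes) -> float:
    """d_min(x,y) = min{C(x|y), C(y|x)} / min{C(x), C(y)}."""
    return min(c_cond(x, y), c_cond(y, x)) / min(c(x), c(y))

if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy cat " * 20
    # Both distances should come out fairly small: a and b share most content.
    print(d_max(a, b), d_min(a, b))
```

For very short strings such as a bare question or answer there is little for the compressor to work with (the "nothing to compress" concern above), which is where the Shannon-Fano estimate is used instead.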

Shannon-Fano Code
Consider N symbols 1, 2, …, N with decreasing probabilities p_1 ≥ p_2 ≥ … ≥ p_N. Let P_r = ∑_{i<r} p_i, the total probability of the symbols preceding r. The binary code E(r) for r is obtained by truncating the binary expansion of P_r at length |E(r)| such that -log p_r ≤ |E(r)| < -log p_r + 1.
Highly probable symbols are mapped to shorter codewords, and 2^(-|E(r)|) ≤ p_r < 2^(-|E(r)|+1).
Near optimal: let H = -∑_r p_r log p_r be the entropy and L = ∑_r p_r |E(r)| the average number of bits used to encode 1…N. Then H ≤ L < ∑_r (-log p_r + 1) p_r = 1 - ∑_r p_r log p_r = H + 1.
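A small sketch of the construction above, assuming the codeword for symbol r is the binary expansion of the cumulative probability P_r truncated to ⌈-log2 p_r⌉ bits; the function name and the toy probabilities are illustrative, not from the source:

```python
# Sketch of the Shannon code: truncate the binary expansion of P_r
# (probability mass of the preceding symbols) to ceil(-log2 p_r) bits.
import math

def shannon_code(probs):
    """probs: probabilities sorted in decreasing order; returns the codewords."""
    codes = []
    cumulative = 0.0                         # P_r: mass of the preceding symbols
    for p in probs:
        length = math.ceil(-math.log2(p))    # -log p_r <= |E(r)| < -log p_r + 1
        bits, frac = [], cumulative
        for _ in range(length):              # truncate the binary expansion of P_r
            frac *= 2
            bits.append('1' if frac >= 1 else '0')
            frac -= int(frac)
        codes.append(''.join(bits))
        cumulative += p
    return codes

if __name__ == "__main__":
    p = [0.4, 0.3, 0.2, 0.1]
    for prob, code in zip(p, shannon_code(p)):
        print(prob, code)    # more probable symbols get shorter codewords
```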

Query-Answer System (X. Zhang, Y. Hao, X. Zhu, M. Li, KDD 2007)
Adding conditions to the normalized information distance, we built a query-answer system. The information distance naturally measures: good pattern matches, via compression; frequently occurring items, via the Shannon-Fano code; and a mixture of the two.
Compared with state-of-the-art systems: on 109 benchmark questions, ARANEA (Lin and Katz) answers 54% correctly, QUANTA 75%.
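An illustrative sketch of the "mixed usage" idea only, not the QUANTA system: the complexity of a short, frequently occurring answer is bounded by its Shannon-Fano code length -log2 p + 1 from corpus counts, compression supplies the other estimates, and the smaller upper bound wins. The corpus counts, the scoring details, and all names below are assumptions made for illustration.

```python
# Mixing compression-based and frequency-based (Shannon-Fano) estimates of C
# in a d_min-style score; an illustrative sketch, not the authors' system.
import math
import zlib

def c_bits(s: bytes) -> int:
    """Approximate C(s) in bits via 8 * zlib-compressed length."""
    return 8 * len(zlib.compress(s, 9))

def c_cond_bits(x: bytes, y: bytes) -> int:
    """Approximate C(x|y) in bits by C(yx) - C(y)."""
    return max(c_bits(y + x) - c_bits(y), 1)

def shannon_fano_bits(term: str, counts: dict, total: int) -> float:
    """Approximate C(term) by the Shannon-Fano length -log2 p(term) + 1."""
    p = counts.get(term, 1) / total
    return -math.log2(p) + 1

def score(question: str, answer: str, counts: dict, total: int) -> float:
    """A d_min-style score between a question and a candidate answer."""
    q, a = question.encode(), answer.encode()
    c_q = c_bits(q)
    # Take the smaller of the two upper bounds on C(answer).
    c_a = min(c_bits(a), shannon_fano_bits(answer, counts, total))
    c_a_given_q = min(c_cond_bits(a, q), c_a)   # conditioning can only help
    c_q_given_a = min(c_cond_bits(q, a), c_q)
    return min(c_a_given_q, c_q_given_a) / min(c_q, c_a)

if __name__ == "__main__":
    counts = {"1969": 50_000, "1971": 8_000}    # hypothetical corpus frequencies
    total = 10_000_000
    question = "in which year did Apollo 11 land on the moon"
    for candidate in ("1969", "1971"):
        print(candidate, round(score(question, candidate, counts, total), 3))
```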