Basic Concepts in Information Theory

ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign

Background on Information Theory
- Developed by Claude Shannon in the 1940s
- Shannon was interested in the problem of maximizing the amount of information that can be transmitted over an imperfect communication channel
- Shannon wanted to determine theoretical maxima for:
  1) data compression (entropy)
  2) transmission rate (channel capacity)
Claude E. Shannon: A Mathematical Theory of Communication, Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, 1948

Basic Concepts in Information Theory
- Entropy: measuring the uncertainty of a random variable
- Kullback-Leibler divergence: comparing two distributions
- Mutual information: measuring the correlation of two random variables

Entropy: Motivation
- Feature selection: if we use only a few words to classify docs, what kind of words should we use? P(Topic | "computer"=1) vs. P(Topic | "the"=1): which is more random?
- Text compression: some documents (less random) can be compressed more than others (more random). Can we quantify the "compressibility"?
- In general, given a random variable X following distribution p(X):
  - How do we measure the "randomness" of X?
  - How do we design optimal coding for X?

Entropy: Definition
Entropy H(X) measures the uncertainty/randomness of random variable X:

H(X) = -Σ_x p(x) log2 p(x)   (0 log 0 is taken as 0; log base 2 gives bits)

Example: a coin flip with P(Head) = p has H(X) = -p log2 p - (1-p) log2 (1-p), which is 0 at p = 0 or 1.0 and peaks at 1 bit when p = 0.5. [Figure: H(X) plotted against P(Head)]
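As an illustration (not part of the original slides), a minimal Python sketch of this definition; the `entropy` helper and the coin distributions below are my own:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)  # terms with p(x) = 0 contribute 0

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))   # biased coin: ~0.47 bits
print(entropy([1.0, 0.0]))   # completely biased coin: 0.0 bits
```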

Entropy: Properties
- Minimum value of H(X): 0. What kind of X has the minimum entropy?
- Maximum value of H(X): log M, where M is the number of possible values for X. What kind of X has the maximum entropy?
- Both bounds are related to coding.
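A quick numeric check of these bounds, repeating the helper from the sketch above so the snippet runs on its own; the value of M is an arbitrary example:

```python
import math

def entropy(p):  # same helper as in the earlier sketch: Shannon entropy in bits
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

M = 8
print(entropy([1.0] + [0.0] * (M - 1)))       # minimum: 0.0, a deterministic X
print(entropy([1.0 / M] * M), math.log2(M))   # maximum: 3.0 bits = log2(M), a uniform X
```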

Interpretations of H(X)
- Measures the "amount of information" in X
  - Think of each value of X as a "message"
  - Think of X as a random experiment (20 questions)
- Minimum average number of bits needed to compress values of X
  - The more random X is, the harder it is to compress
  - A fair coin has the maximum information and is hardest to compress
  - A biased coin has some information and can be compressed to <1 bit on average
  - A completely biased coin has no information and needs 0 bits

Conditional Entropy
The conditional entropy of a random variable Y given another X expresses how much extra information one still needs to supply on average to communicate Y, given that the other party already knows X:

H(Y|X) = Σ_x p(x) H(Y|X=x) = -Σ_x Σ_y p(x,y) log2 p(y|x)

Example: H(Topic | "computer") vs. H(Topic | "the")?
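A sketch of how H(Y|X) can be computed from a joint distribution table; the `conditional_entropy` helper and the toy word/topic probabilities are made up purely for illustration:

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(Y|X) in bits, from joint probabilities given as {(x, y): p(x, y)}."""
    px = defaultdict(float)
    for (x, _), p in joint.items():
        px[x] += p                                 # marginal p(x)
    return -sum(p * math.log2(p / px[x])           # -sum_{x,y} p(x,y) log2 p(y|x)
                for (x, y), p in joint.items() if p > 0)

# Toy joint distribution over (word occurrence, topic) -- made-up numbers
joint = {("computer", "tech"): 0.25, ("computer", "sports"): 0.05,
         ("no computer", "tech"): 0.25, ("no computer", "sports"): 0.45}
print(conditional_entropy(joint))  # extra bits needed for the topic once the word is known
```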

Cross Entropy H(p,q)
What if we encode X with a code optimized for a wrong distribution q? Expected # of bits:

H(p,q) = -Σ_x p(x) log2 q(x)

Intuitively, H(p,q) ≥ H(p); mathematically, H(p,q) = H(p) + D(p||q), where D(p||q) ≥ 0 is the KL-divergence defined on the next slide.
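A small sketch of this quantity; the `cross_entropy` helper and the distributions p and q are illustrative choices, not from the slides:

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x), in bits; p and q map outcomes to probabilities."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

p = {"H": 0.5, "T": 0.5}    # true distribution
q = {"H": 0.9, "T": 0.1}    # wrong distribution the code is optimized for
print(cross_entropy(p, p))  # 1.0 bit  = H(p)
print(cross_entropy(p, q))  # ~1.74 bits >= H(p): the penalty for using the wrong code
```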

Kullback-Leibler Divergence D(p||q)
What if we encode X with a code optimized for a wrong distribution q? How many bits would we waste?

D(p||q) = Σ_x p(x) log2 (p(x)/q(x)) = H(p,q) - H(p)

Also called relative entropy; KL-divergence is often used to measure the distance between two distributions.
Properties:
- D(p||q) ≥ 0
- D(p||q) ≠ D(q||p) in general (not symmetric)
- D(p||q) = 0 iff p = q
Interpretation:
- For fixed p, D(p||q) and H(p,q) vary in the same way
- If p is an empirical distribution, minimizing D(p||q) or H(p,q) is equivalent to maximizing the likelihood
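The same toy distributions can be used to check these properties; the `kl_divergence` helper is my own sketch:

```python
import math

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)), in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

p = {"H": 0.5, "T": 0.5}
q = {"H": 0.9, "T": 0.1}
print(kl_divergence(p, q))  # ~0.74 bits = H(p, q) - H(p): the bits wasted
print(kl_divergence(q, p))  # ~0.53 bits: D is not symmetric
print(kl_divergence(p, p))  # 0.0: D(p||q) = 0 iff p = q
```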

Cross Entropy, KL-Divergence, and Likelihood
Example: X ∈ {"H", "T"}, observed data Y = (H, H, T, T, H), empirical distribution p̂ with p̂(H) = 3/5, p̂(T) = 2/5.
For a candidate model q,

log2 q(Y) = Σ_i log2 q(y_i) = -N · H(p̂, q)   (N = number of observations)

so maximizing the likelihood of Y under q, minimizing the cross entropy H(p̂, q), and minimizing D(p̂||q) are equivalent criteria for selecting/evaluating a model. Perplexity(p) = 2^H(p); for evaluating a model q on the data, 2^H(p̂,q) is the corresponding equivalent criterion.
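A small numeric check of this equivalence on the coin data from the slide; the two candidate models tried below are my own choices:

```python
import math
from collections import Counter

data = ["H", "H", "T", "T", "H"]                        # Y = (HHTTH) from the slide
N = len(data)
p_hat = {x: c / N for x, c in Counter(data).items()}   # empirical distribution: H -> 0.6, T -> 0.4

def log_likelihood(q):
    return sum(math.log2(q[y]) for y in data)

def cross_entropy(p, q):
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

for q in ({"H": 0.6, "T": 0.4}, {"H": 0.5, "T": 0.5}):
    # per-observation log-likelihood equals -H(p_hat, q): maximizing one minimizes the other
    print(log_likelihood(q) / N, -cross_entropy(p_hat, q))
```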

Mutual Information I(X;Y)
Comparing two distributions: p(x,y) vs. p(x)p(y)

I(X;Y) = Σ_x Σ_y p(x,y) log2 [ p(x,y) / (p(x) p(y)) ] = D( p(x,y) || p(x)p(y) ) = H(X) - H(X|Y)

Properties:
- I(X;Y) ≥ 0
- I(X;Y) = I(Y;X)
- I(X;Y) = 0 iff X and Y are independent
Interpretations:
- Measures the reduction in uncertainty of X given information about Y
- Measures the correlation between X and Y
- Related to the "channel capacity" in information theory
Examples: I(Topic; "computer") vs. I(Topic; "the")? I("computer"; "program") vs. I("computer"; "baseball")?
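A sketch of I(X;Y) computed from a joint table; the `mutual_information` helper and the toy probabilities are assumptions for illustration only:

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(X;Y) in bits, from joint probabilities given as {(x, y): p(x, y)}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Toy joint distribution over (word occurrence, topic) -- made-up numbers
joint = {("computer", "tech"): 0.25, ("computer", "sports"): 0.05,
         ("no computer", "tech"): 0.25, ("no computer", "sports"): 0.45}
print(mutual_information(joint))                                       # > 0: word and topic are correlated
print(mutual_information({(x, y): 0.25 for x in "ab" for y in "cd"}))  # 0.0 for independent X, Y
```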

What You Should Know
- Information theory concepts: entropy, cross entropy, conditional entropy, relative entropy (KL-divergence), mutual information
- Know their definitions and how to compute them
- Know how to interpret them
- Know their relationships