Information & Entropy

Shannon Information Axioms
Low-probability events should carry more information than high-probability events.
– “the nice person” (common words → lower information)
– “philanthropist” (rarer word → more information)
Information from two independent events should add.
– “engineer” → information I₁
– “stuttering” → information I₂
– “stuttering engineer” → information I₁ + I₂

Shannon Information
[Figure: plot of the information I against the probability p.] The measure satisfying these axioms is I(p) = −log(p): information grows without bound as p → 0 and equals 0 when p = 1.
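A minimal Python sketch of this information measure; the word probabilities below are made-up illustrative values, not figures from the slides:

```python
import math

def info_bits(p: float) -> float:
    """Shannon information (self-information) of an event with probability p, in bits."""
    return -math.log2(p)

# Hypothetical probabilities, chosen only for illustration.
p_common_phrase = 0.1     # something frequent, like "the nice person"
p_rare_word = 0.0001      # something rare, like "philanthropist"

print(info_bits(p_common_phrase))   # ~3.32 bits
print(info_bits(p_rare_word))       # ~13.29 bits: the rarer event carries more information

# Additivity for independent events: I(p1 * p2) = I(p1) + I(p2)
p1, p2 = 0.05, 0.02
assert abs(info_bits(p1 * p2) - (info_bits(p1) + info_bits(p2))) < 1e-9
```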

Information Units
– log₂ – bits
– logₑ – nats
– log₁₀ – bans, or hartleys
Ralph Vinton Lyon Hartley (1888–1970), inventor of the electronic oscillator circuit that bears his name and a pioneer in the field of information theory.
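The three units differ only in the base of the logarithm, so they are related by constant factors (1 nat = log₂ e ≈ 1.443 bits, 1 hartley = log₂ 10 ≈ 3.322 bits). A minimal sketch of the conversion, assuming the self-information definition above:

```python
import math

def info(p: float, base: float = 2.0) -> float:
    """Self-information of an event with probability p; the unit is set by the log base."""
    return -math.log(p) / math.log(base)

p = 0.25
bits = info(p, 2)            # 2.0 bits
nats = info(p, math.e)       # ~1.386 nats
hartleys = info(p, 10)       # ~0.602 hartleys (bans)

# Conversions between units are just changes of logarithm base.
assert abs(bits - nats * math.log2(math.e)) < 1e-9
assert abs(bits - hartleys * math.log2(10)) < 1e-9
```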

Illustration
Q: We flip a fair coin 10 times. What is the probability of coming up with one particular sequence of heads and tails?
Answer: (1/2)¹⁰ = 1/1024.
How much information do we have? I = −log₂(2⁻¹⁰) = 10 bits.

Illustration: 20 Questions
Interval halving: each yes/no question halves the set of remaining possibilities, so identifying one of 2⁴ = 16 equally likely items needs 4 questions, i.e., 4 bits of information.
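A minimal sketch of interval halving: binary search over 16 equally likely items always takes log₂ 16 = 4 yes/no questions (the items are labeled 0 to 15 only for convenience):

```python
import math

def count_questions(target: int, n: int = 16) -> int:
    """Identify `target` in range(n) by repeatedly halving the candidate interval."""
    lo, hi = 0, n            # candidates are lo, ..., hi - 1
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1       # ask: "is the target >= mid?"
        if target >= mid:
            lo = mid
        else:
            hi = mid
    return questions

assert all(count_questions(t) == 4 for t in range(16))
print(math.log2(16))         # 4.0 bits of information
```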

Entropy
Consider a Bernoulli trial with parameter p.
– Information from a success = −log₂ p
– Information from a failure = −log₂(1 − p)
– (Weighted) average information = −p log₂ p − (1 − p) log₂(1 − p)
Average information = entropy.

The Binary Entropy Function
[Figure: plot of the binary entropy H(p) = −p log₂ p − (1 − p) log₂(1 − p) against p, rising from 0 at p = 0 to a maximum of 1 bit at p = 1/2 and falling back to 0 at p = 1.]
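A minimal sketch of the binary entropy function, confirming that it peaks at 1 bit when p = 1/2 and falls to 0 as p approaches 0 or 1:

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of a Bernoulli(p) trial, in bits."""
    if p in (0.0, 1.0):
        return 0.0           # the term 0 * log2(0) is taken to be 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"H({p}) = {binary_entropy(p):.4f} bits")   # maximum is H(0.5) = 1.0000
```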

Entropy Definition
For a discrete random variable X with pmf p₁, p₂, …, p_K, the entropy is
H(X) = −Σₖ pₖ log₂ pₖ = average information.

Entropy of a Uniform Distribution
If all K outcomes are equally likely, pₖ = 1/K, then
H = −Σₖ (1/K) log₂(1/K) = log₂ K.

Entropy as an Expected Value
H(X) = E[I(X)], where I(x) = −log₂ p(x) is the information of the outcome x.
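A minimal sketch that computes entropy directly from this definition and checks the uniform case against log₂ K (the example pmf is arbitrary):

```python
import math

def entropy_bits(pmf: list[float]) -> float:
    """H(X) = E[-log2 p(X)] for a discrete pmf; zero-probability outcomes contribute 0."""
    return sum(-p * math.log2(p) for p in pmf if p > 0)

print(entropy_bits([0.5, 0.25, 0.125, 0.125]))            # 1.75 bits

K = 8
uniform = [1 / K] * K
assert abs(entropy_bits(uniform) - math.log2(K)) < 1e-9   # log2(8) = 3 bits
```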

Entropy of a Geometric RV
If P(X = k) = (1 − p)^(k−1) p for k = 1, 2, 3, …, then
H(X) = [−p log₂ p − (1 − p) log₂(1 − p)] / p,
so H = 2 bits when p = 0.5.
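A minimal sketch that checks this value numerically by summing the series term by term, assuming the support k = 1, 2, 3, … used above:

```python
import math

def geometric_entropy_series(p: float, terms: int = 10_000) -> float:
    """Numerically sum -P(k) log2 P(k) for P(k) = (1 - p)**(k - 1) * p, k = 1, 2, ..."""
    total = 0.0
    for k in range(1, terms + 1):
        pk = (1 - p) ** (k - 1) * p
        if pk > 0:
            total += -pk * math.log2(pk)
    return total

def geometric_entropy_formula(p: float) -> float:
    h_b = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy
    return h_b / p

print(geometric_entropy_series(0.5))    # ~2.0 bits
print(geometric_entropy_formula(0.5))   # 2.0 bits
```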

Relative Entropy
For two pmfs p and q on the same outcomes,
D(p‖q) = Σₖ pₖ log₂(pₖ / qₖ) (also called the Kullback–Leibler divergence).

Relative Entropy Property
D(p‖q) ≥ 0, with equality iff p = q.

Relative Entropy Property Proof
Since ln x ≤ x − 1 (with equality only at x = 1),
−D(p‖q) = Σₖ pₖ log₂(qₖ / pₖ) ≤ (1/ln 2) Σₖ pₖ (qₖ/pₖ − 1) = (1/ln 2)(Σₖ qₖ − Σₖ pₖ) = 0,
so D(p‖q) ≥ 0, with equality iff qₖ = pₖ for every k.

Uniform Probability is Maximum Entropy
Relative to the uniform pmf uₖ = 1/K:
D(p‖u) = Σₖ pₖ log₂(K pₖ) = log₂ K − H(p) ≥ 0.
Thus, for K fixed, H(p) ≤ log₂ K, with equality iff p is uniform.
How does this relate to thermodynamic entropy?
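A minimal sketch checking both facts numerically: D(p‖q) ≥ 0, and D(p‖u) = log₂ K − H(p) when u is uniform (the example pmf is arbitrary):

```python
import math

def entropy(p: list[float]) -> float:
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

def relative_entropy(p: list[float], q: list[float]) -> float:
    """D(p||q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.15, 0.10]
u = [0.25, 0.25, 0.25, 0.25]                             # uniform over K = 4 outcomes

d = relative_entropy(p, u)
assert d >= 0.0                                          # nonnegativity
assert abs(d - (math.log2(4) - entropy(p))) < 1e-9       # D(p||u) = log2 K - H(p)
assert relative_entropy(u, u) == 0.0                     # equality iff p = q
print(d)
```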

Entropy as an Information Measure: Like 20 Questions
There are 16 balls and Bill chooses one. You must find which ball he chose using binary (yes/no) questions. Minimize the expected number of questions.

One Method
[Figure: a binary decision tree of yes/no questions identifying the chosen ball.]

Another (Better) Method...
[Figure: an unbalanced binary decision tree of yes/no questions.] Longer paths have smaller probabilities.

Relation to Entropy...
The problem's entropy is H = −Σₖ pₖ log₂ pₖ, where pₖ is the probability that ball k was chosen.

Principle...
The expected number of questions will equal or exceed the entropy. There can be equality only if all of the probabilities are powers of ½.
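A minimal sketch illustrating the principle with a hypothetical dyadic distribution (every probability a power of ½): an optimal question tree asks exactly −log₂ pₖ questions for ball k, and the expected number of questions equals the entropy:

```python
import math

# Hypothetical probabilities for 8 balls; all are powers of 1/2 and they sum to 1.
p = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/128]

# For a dyadic pmf, an optimal yes/no tree places ball k at depth -log2(p_k).
depths = [int(-math.log2(pk)) for pk in p]

expected_questions = sum(pk * d for pk, d in zip(p, depths))
entropy = sum(-pk * math.log2(pk) for pk in p)

print(expected_questions, entropy)    # both 1.984375, so equality holds here
assert abs(expected_questions - entropy) < 1e-12
```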

Principle Proof
Lemma: If there are K solutions and the length of the path to the kth solution is ℓₖ, then Σₖ 2^(−ℓₖ) ≤ 1 (the Kraft inequality).

Principle Proof
Let qₖ = 2^(−ℓₖ) / Σⱼ 2^(−ℓⱼ). Then
(expected number of questions) − H = Σₖ pₖ ℓₖ + Σₖ pₖ log₂ pₖ ≥ Σₖ pₖ log₂(pₖ / qₖ) = the relative entropy of p with respect to q,
where the inequality uses the lemma (Σⱼ 2^(−ℓⱼ) ≤ 1). Since the relative entropy always is nonnegative, the expected number of questions is at least H, with equality only when every pₖ = 2^(−ℓₖ), a power of ½.
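A minimal sketch checking the proof's chain of inequalities numerically for a hypothetical pmf and a valid set of path lengths:

```python
import math

p = [0.4, 0.3, 0.2, 0.1]      # hypothetical probabilities for 4 solutions
lengths = [1, 2, 3, 3]        # path lengths of a yes/no tree; Kraft sum is exactly 1 here

assert sum(2.0 ** -l for l in lengths) <= 1 + 1e-12       # lemma (Kraft inequality)

H = sum(-pk * math.log2(pk) for pk in p)
expected_questions = sum(pk * l for pk, l in zip(p, lengths))

c = sum(2.0 ** -l for l in lengths)
q = [2.0 ** -l / c for l in lengths]
D = sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q))  # relative entropy D(p||q)

assert D >= -1e-12                                        # relative entropy is nonnegative
assert expected_questions - H >= D - 1e-12                # so E[questions] >= entropy
print(expected_questions, H, D)
```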