
CSI 661 - Uncertainty in A.I., Lecture 20, Slide 1
Basic Information Theory Review
- Measuring the uncertainty of an event
- Measuring the uncertainty in a probability distribution over a set of events
- Other interpretations of entropy
  – Expected number of yes/no questions needed to remove the uncertainty when behaving optimally/rationally
- Exploring the connection that learning is compression
  – Only covering lossless compression in a noise-free environment

CSI 661 - Uncertainty in A.I., Lecture 20, Slide 2
Coding for Compression
- Message: "ABBCABCB", with P(A) = ¼, P(B) = ½, P(C) = ¼
- How should we optimally encode the message given these probabilities?
  – Use a non-ambiguous set of code "words" (a prefix code)
  – When using a binary encoding scheme, the length of the code word for each symbol/event corresponds to how many yes/no questions we would need to ask to remove the uncertainty about its identity. Why?
  – If using an n-ary code, we are allowed to ask questions with n outcomes
- Fortunately we know the formula for H[X]: set the code length of symbol x to approximately -log_k P(x)
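
To make slide 2 concrete, here is a small Python sketch (mine, not from the slides) that computes H[X] for this distribution and encodes the message with one prefix code achieving the lengths -log2 P(x); the particular codewords A -> 10, B -> 0, C -> 11 are an illustrative choice, not the only valid one.

    import math

    # Symbol probabilities from the slide: P(A) = 1/4, P(B) = 1/2, P(C) = 1/4
    probs = {"A": 0.25, "B": 0.5, "C": 0.25}
    message = "ABBCABCB"

    # Entropy H[X] = -sum_x P(x) log2 P(x), in bits per symbol
    entropy = -sum(p * math.log2(p) for p in probs.values())

    # Ideal code length for each symbol: -log2 P(x)
    lengths = {s: -math.log2(p) for s, p in probs.items()}

    # One prefix code that achieves exactly these lengths (illustrative choice)
    code = {"A": "10", "B": "0", "C": "11"}
    encoded = "".join(code[s] for s in message)

    print(entropy)                  # 1.5 bits/symbol
    print(lengths)                  # A: 2.0, B: 1.0, C: 2.0
    print(encoded, len(encoded))    # 12 bits for 8 symbols = 1.5 bits/symbol

The coded length matches H[X] exactly here only because every probability is a power of 1/2.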

CSI 661 - Uncertainty in A.I., Lecture 20, Slide 3
Entropies of Two Variables
- Joint entropy
- Conditional entropy
- Chain rule
- Mutual information
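
The defining equations on slide 3 did not survive the transcript. In the H[X] notation used above, the standard definitions (which I am assuming are what the slide showed) are:

    H[X,Y]   = -\sum_{x,y} P(x,y) \log P(x,y)
    H[Y|X]   = -\sum_{x,y} P(x,y) \log P(y|x)
    H[X,Y]   = H[X] + H[Y|X]                          (chain rule)
    I[X;Y]   = H[X] - H[X|Y] = H[X] + H[Y] - H[X,Y]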

CSI 661 - Uncertainty in A.I., Lecture 20, Slide 4
Parts of the Theory of Compression
- To obtain efficient data compression:
  – Problem context: what are we trying to compress?
  – Possible data compression schemes
  – Measurement of how good the compression scheme is (note, once again, average-case behavior, not worst-case behavior)
- Formalizing the process

CSI 661 - Uncertainty in A.I., Lecture 20, Slide 5
Creating Decodable Codes
- Instantly decodable
- Bounded delay
- Uniquely decodable
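
The slide names the classes without examples; the following Python sketch (my illustration, not from the slides) contrasts an instantaneous (prefix) code with a code that is uniquely decodable but forces the decoder to wait before committing to a symbol.

    # Two binary codes over {A, B, C} (illustrative choices)
    instantaneous = {"A": "0", "B": "10", "C": "11"}   # prefix code: no codeword is a prefix of another
    delayed       = {"A": "0", "B": "01", "C": "011"}  # uniquely decodable, but needs look-ahead

    def is_prefix_free(code):
        """True if no codeword is a prefix of a different codeword."""
        words = list(code.values())
        return not any(w1 != w2 and w2.startswith(w1) for w1 in words for w2 in words)

    print(is_prefix_free(instantaneous))  # True:  each symbol is known as soon as its last bit arrives
    print(is_prefix_free(delayed))        # False: after reading "0" we must see the next bit(s) to decide

The second code can still be decoded unambiguously, since a new codeword begins at every 0 bit, which is why unique decodability is a strictly weaker requirement than instantaneous decodability.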

CSI 661 - Uncertainty in A.I., Lecture 20, Slide 6
Kraft and McMillan's Inequalities
- Kraft: there exists an instantaneous r-ary code with code words of lengths l_1 … l_m iff Σ_i r^(-l_i) ≤ 1
- McMillan: there exists a uniquely decodable r-ary code with code words of lengths l_1 … l_m iff the same inequality Σ_i r^(-l_i) ≤ 1 holds
- What does combining the two inequalities mean? This implies …
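
A quick Python check of the inequality (my sketch, not from the slides): a set of lengths is achievable by some instantaneous binary code exactly when the Kraft-McMillan sum is at most 1.

    from fractions import Fraction

    def kraft_sum(lengths, r=2):
        """The Kraft-McMillan sum: sum of r**(-l) over the proposed codeword lengths."""
        return sum(Fraction(1, r ** l) for l in lengths)

    print(kraft_sum([1, 2, 2]))         # 1     -> the lengths from slide 2 are admissible
    print(kraft_sum([1, 1, 2]) <= 1)    # False -> no binary prefix (or uniquely decodable) code exists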

CSI 661 - Uncertainty in A.I., Lecture 20, Slide 7
Building an Instantaneous Prefix Code Tree
- Assume the Kraft inequality holds
- Order the lengths so that l_1 ≤ … ≤ l_m
- Loop through the symbols from shortest to longest:
  – Allocate a node at depth l_i that isn't in a previously used subtree
  – Tag all nodes "above" the selected node as unusable
- Proof of why this construction is always possible
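
A minimal Python sketch of this idea (not the slide's tree drawing itself, but the equivalent "take the next free node at the required depth" construction, assuming the lengths satisfy the Kraft inequality):

    def prefix_code_from_lengths(lengths):
        """Assign binary codewords to the given lengths, shortest first.
        Assumes the Kraft inequality holds for these lengths."""
        order = sorted(range(len(lengths)), key=lambda i: lengths[i])
        codes = [None] * len(lengths)
        next_code, prev_len = 0, 0
        for i in order:
            l = lengths[i]
            next_code <<= (l - prev_len)        # descend to depth l in the code tree
            codes[i] = format(next_code, "0{}b".format(l))
            next_code += 1                      # skip past the subtree just allocated
            prev_len = l
        return codes

    print(prefix_code_from_lengths([2, 1, 2]))  # ['10', '0', '11'] for lengths 2, 1, 2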

CSI 661 - Uncertainty in A.I., Lecture 20, Slide 8
Implications
- The Kraft-McMillan inequality and the prefix code tree construction imply what about code-word lengths?
- Average code word length; average coded message length
- For a given source, there may be multiple optimal/non-optimal codes
- An optimal code is one with the minimum average code length
- The K-M inequality implies that if there is an optimal code then there is an optimal instantaneous code
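
For reference, the standard answer to the first bullet (a known result, not visible on the slide as scraped): writing p_i for the symbol probabilities and l_i for the codeword lengths, the average code length of any uniquely decodable r-ary code is bounded below by the entropy,

    \bar{L} = \sum_{i=1}^{m} p_i l_i  \ge  H_r[X] = -\sum_{i=1}^{m} p_i \log_r p_i

with equality exactly when l_i = -\log_r p_i, which is the code-length rule quoted on slide 2.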

CSI 661 - Uncertainty in A.I., Lecture 20, Slide 9
Constructing Various Codes
- Shannon-Fano code
- Huffman code
  – Conditions for optimality
- Arithmetic coding
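
A compact Huffman construction in Python (a standard textbook version, not code from the course; tie-breaking choices are arbitrary, so the codewords may differ from the lecture's by relabelling):

    import heapq

    def huffman_code(probs):
        """Build a binary Huffman code for a dict mapping symbols to probabilities."""
        # Heap entries: (probability, tie_breaker, {symbol: partial codeword})
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)     # merge the two least probable subtrees
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (p1 + p2, counter, merged))
            counter += 1
        return heap[0][2]

    probs = {"A": 0.25, "B": 0.5, "C": 0.25}            # distribution from slide 2
    code = huffman_code(probs)
    print(code)                                         # e.g. {'B': '0', 'A': '10', 'C': '11'}
    print("".join(code[s] for s in "ABBCABCB"))         # 12 bits = 1.5 bits/symbol = H[X]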

CSI 661 - Uncertainty in A.I., Lecture 20, Slide 10
Efficient Coding
- So, given a string or body of data, we can optimally compress it
- What does "optimally" really mean?
- How can we improve on optimality?
- When considering learning, why don't we want to overly compress the data? How can we overcome this?

