Chapter 11: The Data Survey Supplemental Material Jussi Ahola Laboratory of Computer and Information Science
Contents: information-theoretic measures and their calculation; features used in the data survey; cases
Good references: Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication; Thomas M. Cover and Joy A. Thomas, Elements of Information Theory; David J.C. MacKay, Information Theory, Probability and Neural Networks
Entropy: a measure of information content or "uncertainty" of a discrete distribution p_1, …, p_n: H(X) = -Σ_i p_i log2 p_i. H(X) ≥ 0, with equality iff p_i = 1 for one i; H(X) is maximal when p_i is the same for every i
Calculating entropy
[Figure: worked example of entropy calculation from binned signals, listing P(X) and P(Y) per bin and the actual and normalized measures H(X) and H(Y), with H_max(X) = H_max(Y)]
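The per-bin entropy calculation in the figure can be sketched in Python. This is an illustrative implementation, not the survey tool itself; the function names are my own. The normalized measure divides by the maximum entropy log2(k) for k distinct states, as on the slide.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of already-binned states."""
    n = len(values)
    counts = Counter(values)
    # H(X) = -sum_i p_i * log2(p_i), with p_i estimated as count/n
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def normalized_entropy(values):
    """Entropy divided by its maximum, log2(number of distinct states)."""
    k = len(set(values))
    if k <= 1:
        return 0.0  # a constant signal carries no information
    return entropy(values) / math.log2(k)

x = ["a", "a", "b", "b"]        # uniform over two states
print(entropy(x))               # 1.0 bit: maximal for two states
print(normalized_entropy(x))    # 1.0
```

With a skewed distribution the normalized value drops below 1, which is exactly the sH/sHmax ratio used as a survey feature later.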
Joint and conditional entropies and mutual information: joint entropy H(X,Y) describes the information content of the data as a whole; conditional entropy H(X|Y) measures the average uncertainty that remains about X when Y is known; mutual information I(X;Y) = H(X) - H(X|Y) measures the amount of information that Y conveys about X, or vice versa
Calculating conditional entropy
[Figure: worked example over binned signals, listing P(y) and P(x|y), P(x) and P(y|x) per bin, and the actual and normalized measures H(X,Y), H(X|Y), H(Y|X), and I(X;Y)]
Relationships of entropies: H(X,Y) = H(X|Y) + I(X;Y) + H(Y|X); H(X) = H(X|Y) + I(X;Y); H(Y) = H(Y|X) + I(X;Y)
[Figure: Venn-style diagram showing H(X,Y) as the union of H(X) and H(Y), with H(X|Y), I(X;Y), and H(Y|X) as its three regions]
Features: entropies calculated from raw input and output signal states. Signal H(X), H(Y): indicates how much entropy there is in a single input/output signal of the data set, without regard to the output/input signal(s); ratio: sH/sHmax
Channel H(X), H(Y): measures the average information per signal at the input/output of the communication channel; ratio: cH/sHmax. Channel H(X|Y), H(Y|X): reverse/forward entropy, measuring how much uncertainty remains about the input/output when the output/input is known; ratio: cH(|)/sHmax
Channel H(X,Y): the average uncertainty over the data set as a whole; ratio: cH(X,Y)/(cH(X)+cH(Y)). Channel I(X;Y): the amount of mutual information between input and output; ratio: cI(X;Y)/cH(Y)
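Pulling the above together, the survey features for one input/output signal pair could be computed as below. This is a sketch under stated assumptions: the function and key names follow the slides' sH/cH notation but are otherwise my own, and the maximum entropy is taken as log2 of the number of bins.

```python
import math
from collections import Counter

def entropy(vals):
    n = len(vals)
    return -sum((c / n) * math.log2(c / n) for c in Counter(vals).values())

def joint_entropy(xs, ys):
    n = len(xs)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(zip(xs, ys)).values())

def survey_features(xs, ys, n_bins):
    """Entropy-based survey features for binned signals xs (input)
    and ys (output); n_bins gives the maximum entropy log2(n_bins)."""
    h_max = math.log2(n_bins)
    h_x, h_y = entropy(xs), entropy(ys)
    h_xy = joint_entropy(xs, ys)
    h_x_given_y = h_xy - h_y      # reverse entropy H(X|Y)
    h_y_given_x = h_xy - h_x      # forward entropy H(Y|X)
    i_xy = h_x - h_x_given_y      # mutual information I(X;Y)
    return {
        "H(X)/Hmax": h_x / h_max,
        "H(Y)/Hmax": h_y / h_max,
        "H(X|Y)/Hmax": h_x_given_y / h_max,
        "H(Y|X)/Hmax": h_y_given_x / h_max,
        "H(X,Y)/(H(X)+H(Y))": h_xy / (h_x + h_y),
        "I(X;Y)/H(Y)": i_xy / h_y,
    }
```

For a perfectly informative input (ys identical to xs over 2 bins), I(X;Y)/H(Y) is 1 and the joint-entropy ratio is 0.5, its minimum; values nearer 0 and 1 respectively signal a harder modelling problem.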
Case 1: CARS 8 variables about different car properties (brand, weight, cubic inch size, production year etc.) Three subtasks: predicting origin, brand and weight
Entropic analysis confirmed a number of intuitions about the data that would be difficult to obtain by other means Only a simple model is needed
Requires a complex model, and even then the prediction cannot be made with complete certainty Different brands have different levels of certainty
Some form of generalized model has to be built The survey provides the information needed for designing the model
Case 2: CREDIT Included information from a credit card survey The objective was to build an effective credit card solicitation program
It was possible to determine that a model good enough to solve the problem could be built This model would have to be rather complex, even with the balanced data set
Case 3: SHOE Data was about the behaviour of buyers of a running shoe manufacturer Objective was to predict and target customers who fit the profile of potential members in the manufacturer's buyers program
A moderately good, but quite complex, model could be built It is not a useful predictor in the real world, because new shoe styles are introduced frequently