Chapter 11: The Data Survey (Supplemental Material)
Jussi Ahola, Laboratory of Computer and Information Science

Contents
- Information-theoretic measures and their calculation
- Features used in the data survey
- Cases

Good references
- Claude E. Shannon and Warren Weaver: The Mathematical Theory of Communication
- Thomas M. Cover and Joy A. Thomas: Elements of Information Theory
- David J. C. MacKay: Information Theory, Inference, and Learning Algorithms

Entropy
Measure of information content or "uncertainty":
H(X) = -Σ_i p_i log₂ p_i
- H(X) ≥ 0, with equality iff p_i = 1 for exactly one i
- H(X) is maximal when all p_i are equal (uniform distribution)

Calculating entropy

[Worked example table: bins of X and Y with their probabilities P(X), P(Y); measures shown both as actual and normalized values: H_max(X) = H_max(Y), H(X), H(Y)]
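For concreteness, here is a minimal sketch of how the actual and normalized entropies in such a table could be computed from binned samples; the function names and the example data are illustrative, not from the original material.

```python
# Minimal sketch: estimating H(X) and its normalized value H(X)/H_max(X)
# from samples that have already been assigned to discrete bins.
import math
from collections import Counter

def entropy(samples, base=2):
    """Shannon entropy H(X) = -sum_i p_i * log(p_i), with p_i from bin counts."""
    counts, n = Counter(samples), len(samples)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def normalized_entropy(samples, n_bins, base=2):
    """H(X) / H_max(X), where H_max(X) = log(number of bins)."""
    return entropy(samples, base) / math.log(n_bins, base)

# Illustrative data: 10 samples assigned to 4 bins
x_bins = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
print(entropy(x_bins))                # actual entropy in bits
print(normalized_entropy(x_bins, 4))  # normalized to [0, 1]
```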

Joint and conditional entropies and mutual information
- Joint entropy H(X,Y) describes the information content of the whole data
- Conditional entropy H(X|Y) measures the average uncertainty that remains about X when Y is known
- Mutual information I(X;Y) = H(X) - H(X|Y) measures the amount of information that Y conveys about X, or vice versa

Calculating conditional entropy
[Worked example table: bins with P(y) and P(x|y), bins with P(x) and P(y|x); measures shown both as actual and normalized values: H(X,Y), H(X|Y), H(Y|X), I(X;Y)]
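A sketch of the joint and conditional measures, estimated from the same kind of binned, paired samples, might look as follows; the helper names and example data are again illustrative.

```python
# Sketch: H(X,Y), H(X|Y), H(Y|X) and I(X;Y) estimated from paired binned samples.
import math
from collections import Counter

def entropy(samples, base=2):
    counts, n = Counter(samples), len(samples)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def joint_entropy(xs, ys, base=2):
    """H(X,Y): entropy of the paired samples (x, y)."""
    return entropy(list(zip(xs, ys)), base)

def conditional_entropy(xs, ys, base=2):
    """H(X|Y) = H(X,Y) - H(Y): uncertainty left about X once Y is known."""
    return joint_entropy(xs, ys, base) - entropy(ys, base)

def mutual_information(xs, ys, base=2):
    """I(X;Y) = H(X) - H(X|Y): information Y conveys about X (and vice versa)."""
    return entropy(xs, base) - conditional_entropy(xs, ys, base)

# Illustrative paired data
x = [0, 0, 1, 1, 2, 2, 3, 3]
y = [0, 0, 0, 1, 1, 1, 1, 1]
print(joint_entropy(x, y), conditional_entropy(x, y), mutual_information(x, y))
```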

Relationships of entropies
[Figure: diagram decomposing H(X,Y) into H(X|Y), I(X;Y) and H(Y|X), with H(X) and H(Y) shown as overlapping spans]
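The relationships the diagram illustrates are the standard chain-rule and mutual-information identities, which follow directly from the definitions above:

```latex
\begin{aligned}
H(X,Y) &= H(X) + H(Y \mid X) = H(Y) + H(X \mid Y) \\
I(X;Y) &= H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y) \\
H(X,Y) &= H(X \mid Y) + I(X;Y) + H(Y \mid X)
\end{aligned}
```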

Features
Entropies calculated from raw input and output signal states:
- Signal H(X), H(Y): indicates how much entropy there is in one input/output signal of the data set, without regard to the output/input signal(s); ratio: sH/sH_max

Features
- Channel H(X), H(Y): measures the average information per signal at the input/output of the communication channel; ratio: cH/sH_max
- Channel H(X|Y), H(Y|X): reverse/forward entropy, measuring how much uncertainty remains about the input/output once the output/input is known; ratios: cH(X|Y)/sH_max, cH(Y|X)/sH_max

Features
- Channel H(X,Y): the average uncertainty over the data set as a whole; ratio: cH(X,Y)/(cH(X)+cH(Y))
- Channel I(X;Y): the amount of mutual information between input and output; ratio: cI(X;Y)/cH(Y)
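As a rough illustration of how the listed feature ratios fit together, here is a hedged sketch that computes them for a single input/output signal pair. The function name, the dictionary keys, and the simplifying assumptions (signal and channel entropies computed from the same binned states, and both signals using the same number of bins so that H_max(X) = H_max(Y)) are mine, not from the original material.

```python
# Hedged sketch of the feature ratios listed on the slides, for one input signal X
# and one output signal Y, both already assigned to discrete bins.
import math
from collections import Counter

def _entropy(samples, base=2):
    counts, n = Counter(samples), len(samples)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def survey_features(x_bins, y_bins, n_bins, base=2):
    h_max = math.log(n_bins, base)                    # sH_max = log2(#bins)
    hx, hy = _entropy(x_bins, base), _entropy(y_bins, base)
    hxy = _entropy(list(zip(x_bins, y_bins)), base)   # H(X,Y)
    hx_given_y, hy_given_x = hxy - hy, hxy - hx       # H(X|Y), H(Y|X)
    ixy = hx - hx_given_y                             # I(X;Y)
    return {
        "H(X)/H_max": hx / h_max,
        "H(Y)/H_max": hy / h_max,
        "H(X|Y)/H_max": hx_given_y / h_max,
        "H(Y|X)/H_max": hy_given_x / h_max,
        "H(X,Y)/(H(X)+H(Y))": hxy / (hx + hy),
        "I(X;Y)/H(Y)": ixy / hy,
    }

# Illustrative usage with two binned signals over 4 bins
print(survey_features([0, 0, 1, 1, 2, 2, 3, 3], [0, 0, 0, 1, 1, 1, 1, 1], n_bins=4))
```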

Case 1: CARS
- 8 variables describing car properties (brand, weight, cubic-inch engine size, production year, etc.)
- Three subtasks: predicting origin, brand and weight

Case 1: CARS

- Entropic analysis confirmed a number of intuitions about the data that would be difficult to obtain by other means
- Only a simple model is needed

Case 1: CARS

- Requires a complex model, and even then the prediction cannot be made with complete certainty
- Different brands can be predicted with different levels of certainty

Case 1: CARS

- Some form of generalized model has to be built
- The survey provides the information needed for designing the model

Case 2: CREDIT
- Included information from a credit card survey
- Objective was to build an effective credit card solicitation program

Case 2: CREDIT

- It was possible to determine that a model good enough to solve the problem could be built
- This model would need to be rather complex, even with the balanced data set

Case 3: SHOE
- Data described the buying behaviour of customers of a running shoe manufacturer
- Objective was to predict and target customers who fit the profile of potential members of the manufacturer's buyers' program

Case 3: SHOE

- A moderately good, but quite complex, model could be built
- Not a useful predictor in the real world, because new shoe styles are introduced frequently