1 Empirical Similarity and Objective Probabilities Joint works of subsets of A. Billot, G. Gayer, I. Gilboa, O. Lieberman, A. Postlewaite, D. Samet, D. Schmeidler

2 What is the probability that… This coin will come up Heads? My car will be stolen tonight? I will survive the operation? War will erupt over the next year?

3 Methods for assigning probabilities The “classical” – Laplace’s Principle of Insufficient Reason “Objective” – empirical frequencies “Subjective” – degree of belief - Observe that the first two rely on a primitive notion of similarity

4 The subjective approach Beautiful and axiomatically based Problems: –In many situations, preferences are not complete until probabilities are assessed. –Says nothing about the formation of beliefs and allows for beliefs we would consider ridiculous. (Bayesian updating only aggravates the problem) –In “Rationality of Belief” and “Is It Always Rational to Satisfy Savage’s Axioms?” w/ Postlewaite and Schmeidler, we argue that the Savage axioms are neither necessary nor sufficient for rationality

5 Our goal To extend the definition of empirical frequencies so that they cover a larger domain of applications To retain the claim to objectivity By doing this we hope to get “objective probabilities” in more applications, but by no means in all!

6 Similarity-weighted frequencies – Set-up The data: (x_1, y_1), …, (x_n, y_n), where x_i ∈ ℝ^m and y_i ∈ {0, 1} We are asked about the probability that y_{n+1} = 1 for a new data point x_{n+1}

7 Similarity-weighted frequencies – Formula Choose a similarity function s : ℝ^m × ℝ^m → ℝ_{++} Given observations (x_1, y_1), …, (x_n, y_n) and a new data point x_{n+1}, estimate y_{n+1} by ŷ_{n+1} = Σ_i s(x_i, x_{n+1}) y_i / Σ_i s(x_i, x_{n+1})
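A minimal Python sketch of this estimate (not from the slides; the names are illustrative and the similarity function is passed in as a parameter):

```python
import numpy as np

def similarity_weighted_frequency(X, y, x_new, sim):
    """Estimate the probability that y_{n+1} = 1 as a similarity-weighted
    average of the observed outcomes.

    X     : (n, m) array of observed characteristics x_1, ..., x_n
    y     : (n,) array of observed outcomes in {0, 1}
    x_new : (m,) array, the new data point x_{n+1}
    sim   : function returning a positive similarity weight s(x_i, x_new)
    """
    weights = np.array([sim(x_i, x_new) for x_i in X])
    return float(weights @ y / weights.sum())
```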

8 Similarity-weighted frequencies – Interpretation Special cases of the formula: –If s is constant: an estimate of the expectation, i.e., the empirical frequency (in fact, “repeated experiment” is always a matter of subjective judgment of equal similarity) –If s(x, x') = 1 when x = x' and (practically) 0 otherwise: an estimate of the conditional expectation Useful when precise updating leaves us with a sparse database Akin to interpolation But not to extrapolation!
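Continuing the sketch above (and reusing the function it defines), the two special cases can be checked on made-up numbers:

```python
# Illustrative (made-up) data: x is a single characteristic, y the outcome.
X = np.array([[0.0], [0.0], [1.0], [1.0], [1.0]])
y = np.array([1, 0, 1, 1, 0])
x_new = np.array([1.0])

constant_sim = lambda a, b: 1.0                                   # every case equally similar
indicator_sim = lambda a, b: 1.0 if np.allclose(a, b) else 1e-9   # only identical cases count (kept positive)

print(similarity_weighted_frequency(X, y, x_new, constant_sim))   # 0.6  = 3/5, the overall frequency
print(similarity_weighted_frequency(X, y, x_new, indicator_sim))  # ~0.667 = 2/3, the conditional frequency
```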

9 Axiomatization – Setup Observations (case types): a finite set M = {1, 2, …, m} A database is a multi-set of observations, i.e., a vector of counts I = (I_1, …, I_m) We will refer to a database as a sequence or a multi-set interchangeably.

10 Axiomatization I: Observables A state space Ω = {1, 2, …, s} Fix a new data point Databases I over the case types M A probability assignment function p, mapping each database I to a probability p(I) ∈ Δ(Ω)

11 The combination axiom (diagram): databases I, J, and their concatenation I + J, each a count vector over the case types M = {1, 2, …, m}, are mapped by p into Δ(Ω), the simplex over the states of the world Ω = {1, 2, …, s}; p(I + J) lies between p(I) and p(J).

12 The combination axiom Formally: p(I + J) = λ p(I) + (1 − λ) p(J) for some λ ∈ (0, 1)

13 Theorem I The combination axiom holds, and not all p(I) are collinear, if and only if for each case type j ∈ M there are p_j ∈ Δ(Ω), not all collinear, and s_j > 0, such that p(I) = Σ_j I_j s_j p_j / Σ_j I_j s_j –In “Probabilities as Similarity-Weighted Frequencies” w/ Billot, Samet, Schmeidler
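A small numerical check of this representation (a sketch with arbitrary case-type points, weights, and counts, not taken from the paper): under the rule p(I) = Σ_j I_j s_j p_j / Σ_j I_j s_j, the prediction for a combined database is automatically a convex combination of the two predictions, as the combination axiom requires.

```python
import numpy as np

def p(counts, s, points):
    """Theorem-I-style rule: similarity-weighted average of the case-type points.

    counts : (m,) counts I_j of each case type in the database
    s      : (m,) positive similarity weights s_j
    points : (m, k) rows are the probability vectors p_j over the k states
    """
    w = counts * s
    return (w @ points) / w.sum()

points = np.array([[0.8, 0.1, 0.1],   # p_1
                   [0.1, 0.8, 0.1],   # p_2
                   [0.1, 0.1, 0.8]])  # p_3  (not all collinear)
s = np.array([1.0, 2.0, 0.5])
I = np.array([3, 0, 1])
J = np.array([0, 2, 2])

lam = (I * s).sum() / ((I + J) * s).sum()
print(np.allclose(p(I + J, s, points),
                  lam * p(I, s, points) + (1 - lam) * p(J, s, points)))  # True
```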

14 The perspective

15 (Diagram) Probability = frequency “in perspective”: a database with case-type frequencies F = (F_1, F_2, F_3) and case points p_1, p_2, p_3 in Δ(Ω) is mapped to p(F) = p(I) ∝ F_1 s_1 p_1 + F_2 s_2 p_2 + F_3 s_3 p_3, i.e., the frequency vector over cases is viewed in perspective as a probability over states.

16 What about a single dimension? The perspective only works for three or more states (otherwise all the p_j are collinear) Evidently, probability is also interesting with two states

17 Axiomatization II – Observables Fix a new datapoint For each database I, we assume a binary relation ≽_I on [0, 1] (x ≽_I y is interpreted as “given database I, and the new datapoint, x is a more likely estimate of the probability than is y”)

18 Axioms Weak order: ≽_I is complete and transitive Combination: x ≽_I y and x ≽_J y imply x ≽_{I+J} y, and similarly for strict preference Archimedean: x ≻_I y implies that for every database J there is an n s.t. x ≻_{J + nI} y

19 Axioms – cont. Averaging: if all the x_i’s are constant over the database I, then ≽_I ranks values by their proximity to the empirical frequency of y in I

20 Theorem II The axioms hold iff there exists a function s > 0 such that ≽_I ranks values by their proximity to ŷ = Σ_i s(x_i, x) y_i / Σ_i s(x_i, x), where the sum is over the observations in I and x is the new datapoint The function s is unique up to multiplication by a positive number In “Empirical Similarity” w/ Lieberman and Schmeidler

21 Exponential similarity – Axiomatization Generic notation: consider any component of the vector of estimates – hence a similarity-weighted average of real numbers Shift: translating all data points and the new point by the same vector leaves the estimate unchanged Ray Monotonicity: an observation’s influence on the estimate decreases in its distance from the new point along a ray

22 Exponential similarity – Axiomatization (cont.) Symmetry, Ray Invariance, Self-Relevance

23 Theorem III The axioms hold iff there exists a norm ‖·‖ such that s(x, y) = e^{−‖x − y‖} Satisfies “multiplicative transitivity”: s(x, y) · s(y, z) ≤ s(x, z) In “Exponential Similarity” w/ Billot and Schmeidler
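Given this exponential form, the multiplicative transitivity property is a one-line consequence of the triangle inequality for the norm:

```latex
s(x,y)\,s(y,z) \;=\; e^{-\|x-y\|}\,e^{-\|y-z\|}
             \;=\; e^{-(\|x-y\|+\|y-z\|)}
             \;\le\; e^{-\|x-z\|} \;=\; s(x,z),
\qquad\text{since } \|x-z\| \le \|x-y\| + \|y-z\|.
```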

24 The Similarity – whence? –In “Empirical Similarity” w/Lieberman and Schmeidler we propose an empirical approach: –Estimate the similarity function from the data –A parametrized approach: Consider a certain functional form –Choose a criterion to measure goodness of fit –Find the best parameters

25 A functional form Consider a weighted Euclidean distance d_w(x, x') = sqrt( Σ_j w_j (x_j − x'_j)² ), with weights w_j ≥ 0, and the corresponding similarity function s_w(x, x') = e^{−d_w(x, x')}
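A short Python sketch of this functional form (the exponential-of-weighted-distance composition follows the reconstruction above; names are illustrative):

```python
import numpy as np

def weighted_distance(x, x_prime, w):
    """Weighted Euclidean distance d_w(x, x') = sqrt(sum_j w_j (x_j - x'_j)^2), with w_j >= 0."""
    return np.sqrt(np.sum(w * (x - x_prime) ** 2))

def weighted_similarity(x, x_prime, w):
    """Similarity built from the weighted distance via the exponential form of Theorem III."""
    return np.exp(-weighted_distance(x, x_prime, w))
```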

26 Selection criteria Find weights w that would minimize the sum of squared errors Σ_i (y_i − ŷ_i)², where ŷ_i is the similarity-weighted average computed from the other observations Or: round off ŷ_i to get a 0/1 prediction –and then minimize the number of prediction errors
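A sketch of the first criterion, read as a leave-one-out sum of squared errors under the weighted exponential similarity (one plausible reading; the slides do not spell out the details):

```python
import numpy as np

def loo_sse(w, X, y):
    """Leave-one-out sum of squared errors of the similarity-weighted estimate."""
    n = len(y)
    sse = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        d = np.sqrt(((X[mask] - X[i]) ** 2 * w).sum(axis=1))  # weighted Euclidean distances
        s = np.exp(-d)                                         # exponential similarity weights
        y_hat = s @ y[mask] / s.sum()
        sse += (y[i] - y_hat) ** 2
    return sse

# "Round off" variant: count prediction errors instead, i.e. replace the last
# line of the loop with  sse += float(y[i] != round(y_hat)).
```

The best-fitting weights are then those that minimize loo_sse over w, e.g. via a numerical optimizer.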

27 How objective is it? Modeling choices that can affect the “probability”: –Choice of X’s and of sample –Choice of functional form –Choice of goodness-of-fit criterion As usual, objectivity may be an unattainable ideal But that doesn’t mean we shouldn’t try.

28 Statistical inference –In “Empirical Similarity” w/ Lieberman and Schmeidler we also develop statistical inference tools for our estimation procedure –Assume that the data were generated by a DGP of the type: each Y_i equals the similarity-weighted average of the other observations’ Y_j’s plus a random error ε_i –Estimate the similarity function from the data –Perform statistical inference

29 Statistical inference – cont. Estimate the weights by maximum likelihood Test hypotheses of the form w_j = 0 (variable j is irrelevant to the similarity judgment) Predict out-of-sample by the maximum likelihood estimators (via the similarity-weighted average formula)
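A sketch of the maximum-likelihood step, assuming (as one concrete reading of the DGP above) that each Y_i equals the similarity-weighted average of the preceding observations plus Gaussian noise; the Gaussian error, the optimizer, and all numbers are illustrative assumptions rather than the paper’s exact specification:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, X, y):
    """Negative Gaussian log-likelihood of the similarity-weighted-average DGP."""
    w, sigma = np.exp(params[:-1]), np.exp(params[-1])  # reparametrize to keep w_j, sigma > 0
    nll = 0.0
    for i in range(1, len(y)):
        s = np.exp(-np.sqrt(((X[:i] - X[i]) ** 2 * w).sum(axis=1)))  # similarity to earlier points
        mean_i = s @ y[:i] / s.sum()
        nll += 0.5 * np.log(2 * np.pi * sigma ** 2) + (y[i] - mean_i) ** 2 / (2 * sigma ** 2)
    return nll

# Hypothetical data and fit, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = rng.binomial(1, 0.5, size=50).astype(float)
fit = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1] + 1), args=(X, y), method="Nelder-Mead")
w_hat, sigma_hat = np.exp(fit.x[:-1]), np.exp(fit.x[-1])
```

Hypothesis tests of the form w_j = 0 can then be based on standard likelihood-ratio or Wald statistics for the fitted model.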

30 Failures of the combination axiom Integration of induction and deduction –Learning the parameter of a coin –Linear regression Limited to case-to-case induction, generalizing empirical frequencies

31 Failures of the combination axiom – cont. Second order induction –Learning the similarity function In particular, doesn’t allow the similarity function to get more concentrated for large databases Combination restricted to periods of “no learning”.

32 Future Directions Integrate empirical similarity with: Bayesian networks – to capture Bayesian reasoning such as a chain of conditional probabilities. Logistic regression – to allow the identification of trends.

33 How close is rationality to objectivity? Rationality – behaving in a way that doesn’t lead to regret or embarrassment when faced with an analysis of one’s own choices. Objectivity – has to do with the ability to convince others. Both “rational” and “objective” have to do with reasoning and convincing.