Legacy of Ed Jaynes -- approaches to uncertainty management. Stefan Arnborg, KTH.

Applications of Uncertainty: Medical Imaging/Research (Schizophrenia); Land Use Planning; Environmental Surveillance and Prediction; Finance and Stock; Marketing into Google; Robot Navigation and Tracking; Security and Military; Performance Tuning.

Project Aims: Support transformation of tasks and solutions in a generic fashion. Integrate different command levels and services in a dynamic organization. Facilitate consistent situation awareness.

Particle filter: general tracking
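A minimal sketch of a bootstrap particle filter may help make the idea concrete. It assumes a 1-D random-walk motion model and Gaussian observation noise, neither of which comes from the slides; the function name `particle_filter` and all parameter values are illustrative only.

```python
import math
import random

def particle_filter(observations, n_particles=500, process_std=1.0, obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D random walk observed in Gaussian noise."""
    rng = random.Random(seed)
    # Initial particles: a broad Gaussian guess around the first observation (assumption).
    particles = [rng.gauss(observations[0], 5.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: move each particle according to the assumed motion model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All particles are far from the observation: fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        # Estimate: weighted mean of the particles approximates the posterior mean.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Track a target that drifts 0.5 units per step (noise-free observations for simplicity).
track = [0.5 * t for t in range(30)]
estimates = particle_filter(track)
```

The predict, weight, resample loop is the whole method; more realistic trackers mostly change the motion and observation models.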

Endsley:
* Inference -> Situation awareness
* Information picture
* Understanding effects of actions
* Understanding situation implies understanding best response

WIRED on Total Information Awareness: the WIRED article "Total Info System Totally Touchy" (Dec 2, 2002) discusses the Total Information Awareness system. The Total Information Awareness System and related efforts received ~~~ Quote: "People have to move and plan before committing a terrorist act. Our hypothesis is their planning process has a signature." (Jan Walker, Pentagon spokeswoman, in Wired, Dec 2, 2002.) "What's alarming is the danger of false positives based on incorrect data." (Herb Edelstein, in Wired, Dec 2, 2002.)

Sun Zi: If he sets up camp in an easily accessible place, it is to gain other advantages. If there is movement in the forest, he is on his way. Many obstacles set up on open ground mean that the enemy wants to mislead. When birds take flight, the enemy is lying in ambush. Startled animals mean that the enemy is on the move. When the dust swirls in high, distinct columns, chariots are on their way. When the dust lies low and even, it is foot soldiers. When the dust is spread out in thin strands, the enemy is gathering firewood. When the dust is thin and swirls back and forth, the enemy is making camp.

Sun Zi: He who knows himself and his opponent goes through a hundred battles without danger. He who knows himself but not his opponent loses one battle for every victory. He who knows neither himself nor his opponent is doomed to lose every battle.

Methods for Inference. Visualisation: Florence Nightingale. Expert-based, CSCW. Probability-based methods: Bayes, hypothesis testing, fiducial, distribution-independent methods, … Game theory: Harsanyi, Bayesian games. Ad hoc: typically bio-inspired (how does the brain or DNA work?).

Methods for Inference. All inference methods are based on assumptions. "The most common method to cope with uncertainty is to make assumptions, and then to forget that they were made" (Arnborg, Brynielsson, 2004), (Thunholm 1999). Death by Assumption: Why Great Planning Strategies Fail (latest management fad).

Visualization. Visualize data in such a way that the important aspects are obvious. "A good visualization strikes you as a punch between your eyes" (Tukey, 1970). Pioneered by Florence Nightingale, first female member of the Royal Statistical Society, who popularised the polar area diagram (a pie-chart variant) and performance metrics.

Probabilistic approaches. Bayes: probability conditioned by observation. Cournot: an event with very small probability will not happen. Kolmogorov: a sequence is random if it cannot be compressed.

Foundations for Bayesian Inference. Bayes' method is the first documented method based on probability: the plausibility of an event depends on observation. Bayes' rule: f(λ|D) ∝ f(D|λ) f(λ), for parameter λ and observation D. Parameter and observation spaces can be extremely complex, and so can priors and likelihoods. MCMC is the current approach, often but not always applicable (it is difficult when the posterior has many local maxima separated by low-density regions). Better than numerics??
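As a small illustration of Bayes' rule on a finite parameter space, the sketch below multiplies a prior by a likelihood and normalises. The coin hypotheses and their numbers are invented for the example and do not come from the slides.

```python
def posterior(prior, likelihood):
    """Bayes' rule on a finite parameter space: posterior is proportional to likelihood times prior."""
    unnormalised = {k: likelihood[k] * prior[k] for k in prior}
    evidence = sum(unnormalised.values())
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the prior")
    return {k: v / evidence for k, v in unnormalised.items()}

# Hypothetical question: is a coin fair, or biased towards heads with P(heads) = 0.8?
prior = {"fair": 0.5, "biased": 0.5}
# Likelihood of observing 3 heads in 3 tosses under each hypothesis.
likelihood = {"fair": 0.5 ** 3, "biased": 0.8 ** 3}
post = posterior(prior, likelihood)
```

With these numbers the biased hypothesis ends up at 0.512 / 0.637, roughly 0.80. For large or continuous parameter spaces the normalising sum becomes the hard part, which is where MCMC comes in.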

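Spectacular application: PET camera (and any other camera or radar device). Bayes' rule: f(scene|film) ∝ f(film|scene) f(scene), where the likelihood f(film|scene) captures camera geometry and noise, and the prior f(scene) captures scene regularity.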

Thomas Bayes, amateur mathematician. If we have a probability model of the world, we know how to compute probabilities of events. But is it possible to learn about the world from events we see? Bayes’ proposal was forgotten but rediscovered by Laplace.

An alternative to Bayes’ method, hypothesis testing, is based on ’Cournot’s Bridge’: an event with very small probability will not happen. Antoine Augustin Cournot (1801--1877): pioneer in stochastic processes, market theory and structural post-modernism. Predicted the demise of the academic system due to discourses of administration and excellence (cf. Readings).
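Cournot's principle can be sketched as a one-sided binomial test: if the observed outcome, or anything more extreme, has very small probability under a hypothesis, the hypothesis is rejected. The threshold `alpha = 0.05` and the coin-tossing setting are assumptions made for the example.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def reject(n, k, p=0.5, alpha=0.05):
    """Reject the hypothesis 'success probability is p' if the tail event is too improbable."""
    return binomial_upper_tail(n, k, p) < alpha

# 9 heads in 10 tosses of a supposedly fair coin: tail probability 11/1024.
nine_heads = reject(10, 9)
# 7 heads in 10 tosses: tail probability 176/1024, not small enough to reject.
seven_heads = reject(10, 7)
```

Note that no prior over hypotheses appears anywhere, which is the main contrast with the Bayesian approach.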

Fiducial Inference. R A Fisher (1890--1962). In his paper Inverse Probability, he rejected Bayesian analysis on grounds of its dependency on priors and scaling. He launched an alternative concept, 'fiducial analysis'. Although this concept was not developed after Fisher's time, the standard definition of confidence intervals has a similar flavor. The fiducial argument was apparently the starting point for Dempster in developing evidence theory.

Kolmogorov and randomness. Andrei Kolmogorov (1903-1987) is the mathematician best known for shaping probability theory into a modern axiomatized theory. His axioms of probability tell how probability measures are defined, also on infinite and infinite-dimensional event spaces and complex product spaces. Kolmogorov complexity characterizes a random string by the smallest size of a description of it. It is used to explain the Vovk/Gammerman scheme of hedged prediction, and also in MDL (Minimum Description Length) inference.
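Kolmogorov complexity itself is not computable, but the size of a compressed file gives a crude upper bound. The sketch below uses zlib as a stand-in compressor; this is an assumption made for illustration, not the construction used by Vovk/Gammerman or in MDL.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size: a rough upper-bound proxy for complexity."""
    return len(zlib.compress(data, 9)) / len(data)

# A highly regular 10000-byte string compresses to almost nothing.
regular = b"01" * 5000
# Pseudo-random bytes (fixed seed chosen for the example) barely compress at all.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10000))

regular_ratio = compression_ratio(regular)
noisy_ratio = compression_ratio(noisy)
```

The proxy has a known blind spot: the pseudo-random bytes are generated by a short program, so their true Kolmogorov complexity is low even though zlib cannot find the pattern.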

Combining Bayesian and frequentist inference: draw from the posterior for the parameter, then generate a testing set from each draw (Gelman et al., 2003).

Graphical posterior predictive model checking
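A numerical version of a posterior predictive check can be sketched as follows, assuming a Bernoulli model with a uniform prior and using the number of switches between 0 and 1 as the test statistic. The data sequence, the statistic and the function names are illustrative choices, not taken from the slides.

```python
import random

def switches(seq):
    """Test statistic: number of positions where the sequence changes value."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def posterior_predictive_pvalue(data, n_rep=2000, seed=0):
    """Fraction of replicated data sets with at least as many switches as the observed data."""
    rng = random.Random(seed)
    n, s = len(data), sum(data)
    observed = switches(data)
    at_least = 0
    for _ in range(n_rep):
        # Draw the success probability from its Beta posterior (uniform prior assumed).
        theta = rng.betavariate(1 + s, 1 + n - s)
        # Generate a replicated data set of the same length from the model.
        rep = [1 if rng.random() < theta else 0 for _ in range(n)]
        if switches(rep) >= observed:
            at_least += 1
    return at_least / n_rep

# An illustrative sequence with long runs: only 3 switches in 20 observations.
data = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
p_value = posterior_predictive_pvalue(data)
```

A p-value close to 0 or 1 signals that the model fails to reproduce that aspect of the data; here independent draws produce far more switches than were observed. The graphical variant plots the replicated statistics as a histogram with the observed value marked.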

Bayesian Decision Theory (Savage). Outcome R depends on uncertain λ with prior f(λ) and on action a. Utility of R is u(R). Observe D with likelihood f(D|λ). Choose a maximizing expected utility, E[u(R) | D, a] = ∫ u(R(λ, a)) f(λ|D) dλ. Estimating probability: use Laplace’s estimator.
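The decision rule can be sketched for a finite state space. Laplace's estimator is taken here to be the rule of succession, (s + 1) / (n + 2); the umbrella scenario and its payoffs are invented for the example.

```python
def laplace_estimate(successes, trials):
    """Laplace's rule of succession: (s + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)

def best_action(actions, states, posterior, utility):
    """Return the action with maximal posterior expected utility, and all expected utilities."""
    expected = {a: sum(posterior[s] * utility(s, a) for s in states) for a in actions}
    return max(expected, key=expected.get), expected

# Hypothetical data: it rained on 3 of the last 8 comparable days.
p_rain = laplace_estimate(3, 8)  # (3 + 1) / (8 + 2) = 0.4
posterior = {"rain": p_rain, "dry": 1 - p_rain}
# Hypothetical utilities for each (state, action) pair.
payoff = {("rain", "umbrella"): 0, ("dry", "umbrella"): -1,
          ("rain", "none"): -5, ("dry", "none"): 0}
choice, expected = best_action(["umbrella", "none"], ["rain", "dry"],
                               posterior, lambda s, a: payoff[(s, a)])
```

Here taking the umbrella has expected utility -0.6 against -2.0 for going without, so it is the chosen action.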

Generalisation of Bayes/Kalman. What if: You have no prior? The likelihood is infeasible to compute (imprecision)? The parameter space is vague, i.e., not the same for all likelihoods (fuzziness, vagueness)? The parameter space has complex structure (a simple structure is, e.g., a Cartesian product of reals, R, and some finite sets)?

Some approaches... Robust Bayes: replace distributions by convex sets of distributions (Berger et al.). Dempster/Shafer/TBM: describe imprecision with random sets. DSm: transform parameter space to capture vagueness (Dezert/Smarandache, controversial). FISST (FInite Set STatistics): generalises observation and parameter space to a product of spaces described as random sets (Goodman, Mahler, Nguyen).

Ellsberg’s Paradox: Ambiguity Avoidance. Urn A contains 4 white and 4 black balls, and 4 of unknown colour (black or white). Urn B contains 6 white and 6 black balls. You get one krona if you draw a black ball. From which urn do you want to draw it? A precise Bayesian should first assume how the ?-balls are coloured and then answer. But a majority prefers urn B even if black is exchanged for white.
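The arithmetic behind the paradox can be checked directly. The sketch below enumerates every possible colouring of the 4 unknown balls in urn A (4 white, 4 black, 4 unknown) and compares with urn B (6 white, 6 black); the function name is illustrative.

```python
from fractions import Fraction

def p_colour_urn_a(unknown_black, colour="black"):
    """Chance of drawing `colour` from urn A when `unknown_black` of the 4 unknown balls are black."""
    black = 4 + unknown_black
    white = 4 + (4 - unknown_black)
    count = black if colour == "black" else white
    return Fraction(count, 12)

# Range of P(black) and P(white) for urn A over all colourings of the unknown balls.
black_range = [p_colour_urn_a(k, "black") for k in range(5)]
white_range = [p_colour_urn_a(k, "white") for k in range(5)]
p_urn_b = Fraction(6, 12)
```

P(black) for urn A ranges over [1/3, 2/3], but for any fixed colouring P(black) + P(white) = 1. So no single colouring makes urn A worse than urn B for both colours, which is why preferring B in both versions of the bet cannot be explained by one precise prior.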

How are imprecise probabilities used? The expected utility of each decision alternative becomes an interval instead of a point: maximax, maximin, maximean? (Figure: utility u against alternative a, with the choices of the pessimist, the optimist and the Bayesian marked.)
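The three decision rules for interval-valued expected utility can be sketched as below. Reading the mean-based rule as the interval midpoint is an assumption, and the alternatives and their intervals are made up.

```python
def choose(intervals, rule):
    """Pick an alternative from {name: (lower, upper)} expected-utility intervals."""
    score = {
        "maximin": lambda lo, hi: lo,              # pessimist: best worst case
        "maximax": lambda lo, hi: hi,              # optimist: best best case
        "maximean": lambda lo, hi: (lo + hi) / 2,  # midpoint (one possible reading)
    }[rule]
    return max(intervals, key=lambda name: score(*intervals[name]))

# Hypothetical expected-utility intervals for three decision alternatives.
intervals = {"a1": (3.0, 9.0), "a2": (4.0, 5.0), "a3": (0.0, 10.0)}
pessimist = choose(intervals, "maximin")
optimist = choose(intervals, "maximax")
midpoint = choose(intervals, "maximean")
```

With these intervals the three rules disagree (a2, a3 and a1 respectively), which shows that imprecision in the probabilities leaves the decision dependent on the attitude to ambiguity.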

Ed Jaynes devoted a large part of his career to promoting Bayesian inference. He also championed the use of Maximum Entropy in physics. Outside physics, he met resistance from people who had already invented other methods. Why should statistical mechanics say anything about our daily human world??

Cox approach to Bayesianism. Let A|C be the real-valued plausibility of A, given that we know C to be true. AB|C = F(A|BC, B|C): the plausibility of a conjunction depends only on the plausibilities of its constituents. F is strictly monotone. Introduce S(A|B), the plausibility of not A given B. The Cox/Jaynes argument has the flavour of (somewhat imprecise) theoretical physics. Using several unstated assumptions, it is shown that plausibility can be rescaled to probability: w(F(x,y)) = w(x)w(y), w(S(x)) = 1 - w(x).
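To make the rescaling concrete, here is one hypothetical plausibility scale: log-probability. On that scale the conjunction rule F is addition, and the rescaling w is the exponential. The choice of scale is purely an illustration of the functional equations, not part of Cox's argument.

```python
import math

def F(x, y):
    """Conjunction rule on the log scale: plausibilities add."""
    return x + y

def S(x):
    """Negation rule on the log scale."""
    return math.log(1.0 - math.exp(x))

def w(x):
    """Rescaling from log-plausibility to probability."""
    return math.exp(x)

x, y = math.log(0.3), math.log(0.5)
conjunction = w(F(x, y))   # equals w(x) * w(y)
negation = w(S(x))         # equals 1 - w(x)
```

Any strictly monotone relabelling of probability gives another valid plausibility scale with its own F and S; the content of the theorem is that the converse also holds under the stated assumptions.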

Related Work. Michael Hardy: Scaled Boolean Algebras, Advances in Applied Mathematics, 2002. C.H. Kraft, J.H. Pratt and A. Seidenberg: Intuitive Probability on Finite Sets, Ann. Math. Stat., 1959. (Similar outlook, heavier math, but not the same conclusions.)

Halpern’s Example: 4 Worlds. (Figure: events A, B, C, D, E, G, H, I, J, K, L, M.) Constraints: D|E = H|J; B|C = L|M; A|C = I|J; E|G = A|B; H|J ≈ K|M; D|G = K|LM.

Example: F(F(x,y),z) ≈ F(x,F(y,z)). (Figure: events A, B, C, D, E, G, H, I, J, K, L, M.) Constraints: D|E = H|J = x; B|C = L|M = z; A|C = I|J; E|G = A|B = y; H|J ≈ K|M; D|G = K|LM. (Halpern 2000)

Refine: A’|A = D|E: INCONSISTENCY. (Figure: events A, B, C, D, E, G, H, I, J, K, L, M, with the new event A’.) Constraints: D|E = H|J = x; B|C = L|M = z; A|C = I|J; E|G = A|B = y; H|J ≈ K|M; D|G = K|LM. With the refinement: H|J = A’AB|C = K|M!!

Proof structure: Rescalability = Consistent Refinability. (i)->(ii): rescaling on a discrete set can be interpolated smoothly over (0,1). (ii)->(i) is trickier: assume that rescalability is impossible and show that the existence of an inconsistent refinement follows. Find L such that ML = 0 and DL > 0.

Duality explained. If L is such that ML = 0, then not DL > 0. F = {L : ML = 0}; DF has a non-negative normal d! d1L1+…+d(n-1)L(n-1) = d1L2+…+d(n-1)Ln translates to F(a1,…,ak,c1,…,cm) = F(b1,…,bk,c1,…,cm) with ai < bi, and can be interpreted as an inconsistent refinement!!

Inconsistency of Example. Equations, each with its coefficient c: F(x4,x4) = F(x3,x5) = a (c = +1); F(x2,x4) = F(x1,x5) = b (c = -1); F(x4,x6) = F(x3,x7) = c (c = -1); F(x2,x6) = F(x1,x8) = d (c = +1). The linear system turns out non-solvable; from the dual solution we obtain c. Composing equations as indicated by c yields an inconsistency: F(x7,q) = F(x8,q), where q = F(x1,F(x2,F(x3,F(x4,F(x4,F(x5,x6)))))). This corresponds to an inconsistent refinement consisting of 9 information-independent new cases with plausibilities x1, x2, x3, x4, x4,…,x8 relative to an existing event.

INFINITE CASE: NON-SEPARABILITY. (Figure: probability model and counterexample, log probability plotted against i.)

Finite model (finite number of events): every consistent real ordered plausibility measure can be rescaled to probability, using duality ‘like’ Purdom-Freedman (Arnborg, Sjödin, ECCAI 2000). However, this was difficult to extend to infinite models. After several failed approaches, the reason was found: it is not possible, because the needed theorem is not true. However, for any model (finite, enumerable, or a continuous family), its plausibility measure can be embedded in an ordered field (where conjunction and disjunction correspond to * and +) (Arnborg, Sjödin, MaxEnt 2000).

Arnborg, Sjödin ca 2001. Introduce: AB|C = F(A|C, B|AC); A+B|C = G(A|C, B-A|C); ~A|C = S(A|C). The properties of propositional logic entail that F and G satisfy the axioms for · and + of a ring! And truth and falsity (T and ⊥) are the 1 and 0 of an integral domain. Assuming the domain is ordered and · and + are (strictly) increasing gives us an ordered field, because inversion of · and + is possible (unless one operand of · is ⊥). Standard quotient constructions apply (the first defines negative numbers and multiplication by integers, the second defines rationals), but be careful since + is a partial function! By MacLane-Birkhoff, an ordered ring can be embedded in an ordered field, and there is a minimal such embedding field (a superset of Q). If the embedding field is a subset of R, we have standard probability. If it is a superset of R, we have extended probability. Conway, in ”On Numbers and Games”, showed that there is also a maximal ordered field, No. This field contains all infinitesimals and infinite numbers.

Infinitesimal probability (Adams). If Obama wins the election, McCain will retire. If McCain dies before the election, Obama will win. Syllogism: if McCain dies, Obama wins and McCain retires? Solution: ‘McCain dies’ has infinitesimal probability. Non-monotonic logic in AI (McCarthy) is just infinitesimal probability!!

Cox approach to Bayesianism. Let A|C be the real-valued plausibility of A, given that we know C to be true. AB|C = F(A|BC, B|C): the plausibility of a conjunction depends only on the plausibilities of its constituents. F is strictly monotone. A similar rule G holds for disjunction. The Cox/Jaynes argument has the flavour of (somewhat imprecise) theoretical physics. With some assumptions, F and G can be shown to inherit the algebraic laws of a ring from logical ’and’ and ’or’, and the monotonicity assumptions imply that F and G are * and + of a monotone field (Körper, kropp). These assumptions entail Bayesianism (possibly with infinitesimal probability) (Arnborg, Sjödin, 2000; Cox 1946). This argument does not exclude partially ordered plausibility measures like intervals of probabilities.

Robust Bayes. Priors and likelihoods are convex sets of probability distributions (Berger, de Finetti, Walley, ...): imprecise probability. Every member of the posterior is a ’parallel combination’ of one member of the likelihood and one member of the prior. For decision making, Jaynes recommends using the member of the posterior with maximum entropy (Maxent estimate).
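A rough sketch of the idea, under strong simplifying assumptions: the prior set is the line segment between two hypothetical extreme priors, the likelihood is precise, and the maximum-entropy posterior is found by a simple grid scan rather than by proper optimisation. All names and numbers are illustrative.

```python
import math

def parallel_combination(prior, likelihood):
    """Pointwise product of prior and likelihood, normalised (assumes a non-zero total)."""
    product = [p * l for p, l in zip(prior, likelihood)]
    total = sum(product)
    return [v / total for v in product]

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def maxent_posterior(prior_a, prior_b, likelihood, steps=100):
    """Scan the segment between two extreme priors; return the posterior with maximum entropy."""
    best = None
    for i in range(steps + 1):
        t = i / steps
        prior = [(1 - t) * a + t * b for a, b in zip(prior_a, prior_b)]
        post = parallel_combination(prior, likelihood)
        if best is None or entropy(post) > entropy(best):
            best = post
    return best

# Hypothetical prior set over three alternatives and a precise likelihood.
best = maxent_posterior([0.1, 0.45, 0.45], [0.5, 0.25, 0.25], [0.5, 0.25, 0.25])
```

In this example the scan finds the prior (0.2, 0.4, 0.4), whose posterior is uniform and therefore has the largest possible entropy. With an imprecise likelihood as well, the scan would range over pairs of members.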

How are imprecise probabilities used? The expected utility of each decision alternative becomes an interval instead of a point: maximax, maximin, maximean? (Figure: utility u against alternative a, with the choices of the pessimist, the optimist and the Bayesian marked.)

Dempster/Shafer/Smets. Evidence is a random set over Λ, i.e., a probability distribution over 2^Λ. Probability of a singleton: ‘belief’ allocated to that alternative, i.e., probability. Probability of a non-singleton: ‘belief’ allocated to a set of alternatives, but not to any part of it. Evidences are combined by random intersection conditioned to be non-empty (Dempster’s rule).
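Dempster's rule can be sketched in a few lines, representing a body of evidence as a dictionary from sets of alternatives to masses. The two bodies of evidence below are invented for the example.

```python
from itertools import product

def dempster(m1, m2):
    """Combine two mass assignments {frozenset: mass} by Dempster's rule."""
    combined = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        c = a & b
        if c:  # condition on non-empty intersection: conflicting mass is dropped
            combined[c] = combined.get(c, 0.0) + x * y
    total = sum(combined.values())
    if total == 0.0:
        raise ValueError("totally conflicting evidence")
    # Renormalise the remaining mass.
    return {s: v / total for s, v in combined.items()}

# Hypothetical evidence over the alternatives {A, B, C}.
m1 = {frozenset("A"): 0.6, frozenset("ABC"): 0.4}
m2 = {frozenset("AB"): 0.7, frozenset("ABC"): 0.3}
result = dempster(m1, m2)
```

Here no intersection is empty, so nothing is renormalised: A receives 0.60, {A, B} receives 0.28 and the full set keeps 0.12.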

Correspondence DS-structure -- set of probability distributions. For a pdf (bba) m over 2^Λ, consider all ways of reallocating the probability mass of non-singletons to their member atoms: this gives a convex set of probability distributions over Λ. Example: Λ = {A,B,C}. bba: A: 0.1, B: 0.3, C: 0.1, AB: 0.5. Set of pdfs: A: 0.1+0.5*x, B: 0.3+0.5*(1-x), C: 0.1, for all x ∈ [0,1]. Can we regard any set of pdf:s as a bba? Answer is NO!! There are more convex sets of pdf:s than DS-structures.
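The slide's example can be checked numerically. The sketch below computes belief and plausibility (the lower and upper probability bounds implied by the bba) and one member of the convex set for a given x; the function names are illustrative.

```python
def belief(m, event):
    """Total mass of sets contained in the event (lower probability)."""
    return sum(v for s, v in m.items() if s <= event)

def plausibility(m, event):
    """Total mass of sets intersecting the event (upper probability)."""
    return sum(v for s, v in m.items() if s & event)

def reallocate(x):
    """One member of the convex set: share x of the mass on {A, B} goes to A, the rest to B."""
    return {"A": 0.1 + 0.5 * x, "B": 0.3 + 0.5 * (1 - x), "C": 0.1}

# The bba from the example: A: 0.1, B: 0.3, C: 0.1, AB: 0.5.
m = {frozenset("A"): 0.1, frozenset("B"): 0.3, frozenset("C"): 0.1, frozenset("AB"): 0.5}
bounds_a = (belief(m, frozenset("A")), plausibility(m, frozenset("A")))
bounds_b = (belief(m, frozenset("B")), plausibility(m, frozenset("B")))
```

The bounds come out as [0.1, 0.6] for A and [0.3, 0.8] for B, and `reallocate(x)` stays within them for every x in [0, 1].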

Representing a probability set as a bba: 3-element universe. Rounding up: use the lower envelope. Rounding down: linear programming. Rounding is not unique!! (Figure legend: black, convex set; blue, rounded up; red, rounded down.)

Another appealing conjecture. A precise pdf can be regarded as a (singleton) random set. Bayesian combination of precise pdf:s corresponds to random set intersection (conditioned on non-emptiness). A DS-structure corresponds to a Choquet capacity (set of pdf:s). Is it reasonable to combine Choquet capacities by (non-empty) random set intersection (Dempster’s rule)?? Answer is NO!! Counterexample: Dempster’s combination cannot be obtained by combining members of prior and likelihood (Arnborg: JAIF vol 1, No 1, 2006).

Consistency of fusion operators. (Figure: operands (evidence) and the results of robust fusion, rounded robust fusion, Dempster’s rule (DS rule) and modified Dempster’s rule (MDS rule). Axes are the probabilities P(A) and P(B) in a 3-element universe, with P(C) = 1 - P(A) - P(B).)

Zadeh’s Paradoxical Example. Patient has headache; possible explanations are M (Meningitis), C (Concussion) and T (Tumor). Expert 1: P(M) = 0, P(C) = 0.9, P(T) = 0.1. Expert 2: P(M) = 0.9, P(C) = 0, P(T) = 0.1. Parallel combination: (0, 0, 0.01). What is the combined conclusion? Parallel normalized: (0, 0, 1)? Is there a paradox??
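The numbers in the example are easy to reproduce. The sketch below forms the pointwise product of the two experts' distributions and normalises it; the function name `parallel` is an illustrative choice.

```python
def parallel(p, q):
    """Pointwise product of two distributions, before and after normalisation."""
    product = {k: p[k] * q[k] for k in p}
    total = sum(product.values())
    normalised = {k: v / total for k, v in product.items()}
    return product, normalised

expert1 = {"M": 0.0, "C": 0.9, "T": 0.1}
expert2 = {"M": 0.9, "C": 0.0, "T": 0.1}
raw, combined = parallel(expert1, expert2)
```

Only 1% of the joint mass survives the combination, all of it on T, so normalisation turns an alternative both experts considered unlikely into a certainty.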

Zadeh’s Paradox (ctd). One expert (at least) made an error. Experts do not know what probability zero means. Experts made correct inferences based on different observation sets, and T is indeed the correct answer: f(λ|o1, o2) = c f(o1|λ) f(o2|λ) f(λ), but this assumes f(o1, o2|λ) = f(o1|λ) f(o2|λ), which need not be true if the granularity of Λ is too coarse (not taking the variability of f(oi|λ) into account). One reason (among several) to look at Robust Bayes.

That’s all, folks!
