Cosmological Model Selection David Parkinson (with Andrew Liddle & Pia Mukherjee)


Outline
- The Evidence: the Bayesian model selection statistic
- Methods
- Nested Sampling
- Results

Concordance Cosmology
- A flat universe composed of baryons, cold dark matter and dark energy
- Gaussian, adiabatic and nearly scale-invariant initial perturbations

Model Extensions
- Do we really need only 5 numbers to describe the universe (Ω_b, Ω_CDM, H_0, A_s, τ)?
- Extra dynamic properties: curvature (Ω_k), massive neutrinos (M_ν), dynamic dark energy (w(z)), etc.
- More complex initial conditions: tilt (n_s) and running (n_run) of the adiabatic power spectrum, entropy perturbations, etc.
- How do we decide if these extensions are justified?

Bayesian Statistics Two identical urns A and B: A contains 99 black balls and 1 white; B has 99 white and 1 black, so P(black|urn A) = 0.99. Now shuffle the two urns, pick one at random and draw a ball from it. Suppose it is black. What is the probability it came from urn A? Bayesian statistics allows probabilities to be assigned not just to data, but also to parameters and models.
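A worked version of the urn calculation, assuming each urn is equally likely to be picked after the shuffle (prior probability 1/2 each):

\[
P(A \mid \text{black}) = \frac{P(\text{black} \mid A)\,P(A)}{P(\text{black} \mid A)\,P(A) + P(\text{black} \mid B)\,P(B)}
 = \frac{0.99 \times 0.5}{0.99 \times 0.5 + 0.01 \times 0.5} = 0.99 .
\]

So the observation shifts the odds from the 50:50 prior to 99:1 in favour of urn A.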

Bayes’ Theorem Bayes’ theorem gives the posterior probability of the parameters (θ) of a model (H) given data (D): P(θ|D,H) = P(D|θ,H) P(θ|H) / P(D|H). Marginalizing over θ, the evidence (the normalizing denominator) is E ≡ P(D|H) = ∫ P(D|θ,H) P(θ|H) dθ. Evidence = average likelihood of the data over the prior parameter space of the model.
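A minimal numerical illustration of "evidence = average likelihood over the prior" for a toy one-parameter model; the Gaussian likelihood, the prior range and all variable names are assumptions made purely for this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy setup (illustrative): data generated from theta_true, a Gaussian
# likelihood, and a uniform prior on theta over [-5, 5].
theta_true, sigma = 1.0, 0.5
data = theta_true + sigma * rng.normal(size=20)

def likelihood(theta):
    # Product of Gaussian likelihoods for the 20 data points
    return np.exp(-0.5 * np.sum((data - theta) ** 2) / sigma**2) \
           / (np.sqrt(2 * np.pi) * sigma) ** len(data)

prior_lo, prior_hi = -5.0, 5.0

# Evidence = average of the likelihood over draws from the prior.
theta_samples = rng.uniform(prior_lo, prior_hi, size=20_000)
evidence = np.mean([likelihood(t) for t in theta_samples])
print(f"Monte Carlo estimate of the evidence: {evidence:.3e}")
```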

Jeffreys’ Scale The evidence (or model likelihood) updates the prior model probability through Bayes’ theorem to give the posterior probability of the model. For equal prior model probabilities, the posterior odds of two models are given by the Bayes factor, the ratio of their evidences: B_10 = E(M_1) / E(M_0). Jeffreys’ scale:
  0 < ln B_10 < 1 : No evidence
  1 < ln B_10 < 2.5 : Weak evidence
  2.5 < ln B_10 < 5 : Strong evidence
  ln B_10 > 5 : Decisive evidence
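As a small illustration of how a Bayes factor translates into a model probability when both models are given equal prior probability (the function name and example value below are ours, not from the talk):

```python
import math

def posterior_prob_model1(ln_B10: float) -> float:
    """Posterior probability of model 1 versus model 0, assuming equal
    prior model probabilities, from the log Bayes factor ln B_10."""
    return 1.0 / (1.0 + math.exp(-ln_B10))

# ln B_10 = 2 corresponds to odds of e^2 ~ 7.4:1, roughly the "8:1" quoted
# later for the tilted spectrum versus Harrison-Zel'dovich.
print(posterior_prob_model1(2.0))   # ~0.88
```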

Occam’s Razor Models are rewarded for fitting the data well, and also for their predictiveness. Schematically, Evidence ≈ (best-fit likelihood) × (Occam factor).
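A sketch of how this factorization arises, assuming a single parameter θ with a likelihood peak of width Δθ_posterior sitting inside a broad uniform prior of width Δθ_prior (a standard heuristic, not a result specific to this talk):

\[
E = \int \mathcal{L}(\theta)\,\pi(\theta)\,d\theta
  \;\approx\; \mathcal{L}_{\max} \times \frac{\Delta\theta_{\mathrm{posterior}}}{\Delta\theta_{\mathrm{prior}}},
\]

where the second factor is the Occam factor: a model whose prior allows a much larger parameter volume than the data actually select is penalized, however good its best fit.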

Lindley’s Paradox Consider three data sets measuring a parameter θ. By sampling statistics, all three rule out θ = θ_0 (the simpler model) at 95% confidence. But B_01 = 0.5, 1.8, 18 respectively: as the data improve, the Bayes factor increasingly favours the simpler model. (Trotta 2007)

Methods
- The Laplace Approximation: assumes that the posterior P(θ|D,M) is a multi-dimensional Gaussian (see the sketch below).
- The Savage-Dickey Density Ratio: needs separable priors and nested models (and the reference value to lie in the high-likelihood region of the more complex model for accuracy).
- Thermodynamic Integration: needs a series of MCMC runs at different temperatures; accurate but computationally very intensive.
- VEGAS: the likelihood surface needs to be “not too far” from Gaussian.
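A minimal sketch of the Laplace approximation to the log evidence, assuming a posterior that is well approximated by a Gaussian around its peak; the two-parameter toy likelihood, the flat prior box, and all variable names are illustrative, not the pipeline used for the results in this talk:

```python
import numpy as np
from scipy.optimize import minimize

# Toy 2-parameter Gaussian likelihood and a uniform prior (illustrative only).
data_mean = np.array([1.0, -0.5])
data_cov = np.array([[0.04, 0.01], [0.01, 0.09]])
inv_cov = np.linalg.inv(data_cov)
prior_volume = 10.0 * 10.0            # uniform prior over a 10 x 10 box

def neg_log_posterior(theta):
    # -ln[ L(theta) * pi(theta) ], with pi flat inside the prior box
    r = theta - data_mean
    return 0.5 * r @ inv_cov @ r \
           + 0.5 * np.log(np.linalg.det(2 * np.pi * data_cov)) \
           + np.log(prior_volume)

# Find the posterior peak; the curvature (Hessian of -ln posterior) is
# known analytically here and equals inv_cov.
fit = minimize(neg_log_posterior, x0=np.zeros(2))
hessian = inv_cov
d = len(fit.x)

# ln E ~ ln(L*pi at peak) + (d/2) ln(2*pi) - 0.5 ln det(Hessian)
ln_evidence = -fit.fun + 0.5 * d * np.log(2 * np.pi) \
              - 0.5 * np.log(np.linalg.det(hessian))
print(f"Laplace-approximation ln E: {ln_evidence:.3f}")
```

For this fully Gaussian toy the result is exact, ln E = -ln(prior volume) ≈ -4.61; for realistic non-Gaussian posteriors the Laplace value is only a first estimate.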

Nested Sampling
Nested sampling (Skilling 2004/5) performs the evidence integral using Monte Carlo samples to trace the variation in likelihood with prior mass (X), peeling away thin nested iso-surfaces of equal likelihood:
- The prior mass is sampled uniformly.
- The evidence is incremented using the minimum-likelihood point.
- Discarding this point reduces X by a known factor.
- A new random point is found with L greater than the previous minimum likelihood.

Nested Sampling Each iteration reduces X by a factor N/(N+1) on average, this factor being the expectation value of the largest of N values drawn from U(0,1). The N ‘live’ points migrate to the high-likelihood regions, always sampling uniformly from the remaining prior volume (X).
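A toy, self-contained version of the scheme just described, for a one-dimensional Gaussian likelihood and a uniform prior; the brute-force constrained resampling and the simple termination rule are illustrative assumptions, not the implementation used for the results in this talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem (illustrative only): Gaussian likelihood, uniform prior on [-5, 5].
prior_lo, prior_hi = -5.0, 5.0
sigma = 0.5
def log_likelihood(theta):
    return -0.5 * (theta / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma**2)

N = 200                                   # number of live points
live = rng.uniform(prior_lo, prior_hi, N)
live_logL = np.array([log_likelihood(t) for t in live])

log_E = -np.inf                           # accumulated log evidence
log_X = 0.0                               # log of the remaining prior mass
log_shrink = np.log(N / (N + 1.0))        # average compression per iteration

for _ in range(4000):
    worst = np.argmin(live_logL)
    L_min = live_logL[worst]

    # Increment the evidence with the minimum-likelihood point:
    # weight = L_min * (X_before - X_after) = L_min * X / (N + 1)
    log_E = np.logaddexp(log_E, L_min + log_X - np.log(N + 1.0))
    log_X += log_shrink

    # Replace the discarded point with a new prior draw with L > L_min
    # (brute-force rejection; real codes use constrained MCMC or ellipsoids).
    while True:
        trial = rng.uniform(prior_lo, prior_hi)
        if log_likelihood(trial) > L_min:
            break
    live[worst], live_logL[worst] = trial, log_likelihood(trial)

    # Stop once the live points can only add a small fraction of the total.
    if np.max(live_logL) + log_X < log_E - 3.0:
        break

# Final contribution: average live-point likelihood times the remaining X.
log_E = np.logaddexp(log_E, np.logaddexp.reduce(live_logL) - np.log(N) + log_X)
print(f"ln E ~ {log_E:.3f}   (analytic: {np.log(0.1):.3f})")
```

With N = 200 live points the estimate typically lands within roughly sqrt(H/N) ≈ 0.1 of the analytic answer ln(1/10) ≈ -2.30.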

Movie

Stopping Criterion We stop when some accuracy criterion is met on the sum of the accumulated evidence from the discarded points and the evidence estimated from the remaining live points. Numerical uncertainty is dominated by the Poisson variability in the number of steps needed to reach the posterior, NH ± √(NH), where H is the logarithm of the prior-to-posterior compression ratio; this corresponds to an uncertainty of about √(H/N) on ln E.

Posterior Samples The nested sampling algorithm also generates a set of posterior samples for parameter estimation: each discarded point θ_i is kept with weight p_i = L_i ΔX_i / E, where ΔX_i is the prior-mass shrinkage at that iteration.
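Continuing the toy sketch above, and assuming the loop also stored each discarded point's parameter value, log-likelihood and log-weight (hypothetical arrays discarded_theta, discarded_logL, discarded_logw), posterior samples and the √(H/N) uncertainty of the previous slide could be obtained as follows:

```python
import numpy as np

# Assumes the nested sampling loop above also recorded, at each iteration:
#   discarded_theta.append(live[worst])
#   discarded_logL.append(L_min)
#   discarded_logw.append(L_min + log_X - np.log(N + 1.0))

def posterior_resample(discarded_theta, discarded_logw, log_E, rng, n_samples=1000):
    """Draw equally weighted posterior samples using p_i = L_i * dX_i / E."""
    p = np.exp(np.asarray(discarded_logw) - log_E)
    p /= p.sum()                 # absorb the small leftover live-point mass
    idx = rng.choice(len(p), size=n_samples, p=p)
    return np.asarray(discarded_theta)[idx]

def ln_evidence_error(discarded_logL, discarded_logw, log_E, N):
    """Uncertainty on ln E, roughly sqrt(H/N), with H the information
    (logarithm of the prior-to-posterior compression ratio)."""
    p = np.exp(np.asarray(discarded_logw) - log_E)
    H = np.sum(p * (np.asarray(discarded_logL) - log_E))
    return np.sqrt(max(H, 0.0) / N)
```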

Applications: WMAP3
- WMAP alone cannot distinguish between the Harrison-Zel'dovich (HZ) spectrum and a tilted (n_s) model.
- Some evidence for n_s ≠ 1 from WMAP3+extra data, but only at odds of 8:1.
- Inflation predicts both scalar (n_s) and tensor (r) perturbations. HZ is preferred, unless a log prior is used on r.

Dataset       Model (prior)                    ln B_01
WMAP3 only    n_s (…)                          0.34 ± 0.26
WMAP3+ext     n_s (…)                          1.99 ± 0.26
              n_s + r (uniform prior: 0-1)     … ± 0.45
              n_s + r (log prior: …)           1.90 ± 0.24

Dark Energy Models
w(a) = w_0 + (1-a) w_a, for z = 0 to 2

Model                                     Δ ln E        Prob
I:   ΛCDM                                 0.0           63%
II:  -1 ≤ w ≤ …                           … ± …         …%
III: -2 ≤ w ≤ …                           … ± …         …%
IV:  -2 ≤ w_0 ≤ -0.33, … ≤ w_a ≤ …        … ± 0.1       9%
V:   -1 ≤ w(a) ≤ 1                        -4.1 ± 0.1    1%

Or 78%, 21% and 1% for models I, II & V. (Liddle, Mukherjee, Parkinson & Wang 2006)

Conclusions
- Model selection (via Bayesian evidences) and parameter estimation are two levels of inference.
- The nested sampling scheme computes evidences accurately and efficiently, and also gives parameter posteriors.
- Applications: simple models still favoured; model selection based forecasting; Bayesian model averaging; many others (foreground contamination, cosmic topology, cosmic strings, …).