Estimation, Variation and Uncertainty. Simon French, firstname.lastname@example.org
Aims of Session: to gain a greater understanding of the estimation of parameters and variables; to gain an appreciation of point estimation; to gain an appreciation of how to assess the uncertainty and confidence levels in estimates.
Cynefin and statistics [Figure: the Cynefin framework annotated with 'Repeatable events', 'Unique events' and 'Events?', alongside 'Estimation and confirmatory analysis' and 'Exploratory analyses'.]
Frequentist Statistics. Key point: probability represents a long-run frequency of occurrence.
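A minimal simulation of the frequentist reading of probability: repeat a trial that succeeds with probability p many times and watch the relative frequency of success. The value p = 0.3, the trial counts and the helper name `relative_frequency` are arbitrary choices for illustration.

```python
import random

def relative_frequency(p, n_trials, seed=0):
    """Fraction of n_trials independent trials, each succeeding with probability p, that succeed."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    successes = sum(1 for _ in range(n_trials) if rng.random() < p)
    return successes / n_trials

# As the number of repetitions grows, the relative frequency settles down near p.
for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(0.3, n))
```

Short runs can stray well away from 0.3; only in the long run does the frequency pin down the probability.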
Frequentist Statistics. The scientific method is based upon the repeatability of experiments. Parameters in a (scientific) model or theory are fixed, so we cannot talk of the probability of an objective quantity or parameter value. Data come from repeatable experiments, so we can talk of the probability of a data value.
Measurement and Variation of Objective Quantities. Ideally we would simply perform an experiment and measure the quantities that interest us. But variation and experimental error mean that we cannot simply do this. So we need to make multiple measurements, learn about the variation and estimate the quantity of interest.
Estimation. Try to find a function of the data that is tightly distributed about the quantity of interest. [Figure: the distribution of individual data points about the quantity of interest, and the narrower distribution of the data mean about the same quantity.]
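A small simulation sketch of this point, assuming normally distributed measurement errors with hypothetical values θ = 10, σ = 2 and n = 25 measurements per data set. A single data point scatters about θ with standard deviation σ, while the data mean scatters with standard deviation σ/√n, so the mean is the more tightly distributed function of the data. The function name `simulate` is illustrative.

```python
import random
import statistics

def simulate(theta=10.0, sigma=2.0, n=25, n_datasets=2000, seed=1):
    """Return one raw measurement and the mean from each of many simulated data sets."""
    rng = random.Random(seed)
    single_points = []
    means = []
    for _ in range(n_datasets):
        data = [rng.gauss(theta, sigma) for _ in range(n)]
        single_points.append(data[0])         # one raw measurement
        means.append(statistics.fmean(data))  # the estimator: the data mean
    return single_points, means

points, means = simulate()
print("spread of single data points:", statistics.stdev(points))  # about sigma = 2
print("spread of data means:        ", statistics.stdev(means))   # about sigma/sqrt(n) = 0.4
```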
Confidence intervals. These are intervals defined from the data. For 95% confidence intervals: calculate the interval for each of 100 data sets, and about 95 will contain the quantity of interest.
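The repeated-sampling interpretation can be checked by simulation. This sketch assumes normal data with a known standard deviation σ, so the 95% interval is the sample mean ± 1.96σ/√n; the parameter values and the function name `ci_coverage` are illustrative assumptions.

```python
import math
import random
import statistics

def ci_coverage(theta=5.0, sigma=1.0, n=20, n_datasets=10_000, seed=2):
    """Fraction of 95% intervals (mean +/- 1.96*sigma/sqrt(n), sigma known) that contain theta."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_datasets):
        data = [rng.gauss(theta, sigma) for _ in range(n)]
        xbar = statistics.fmean(data)
        if xbar - half_width <= theta <= xbar + half_width:
            hits += 1
    return hits / n_datasets

print(ci_coverage())
```

The fraction printed should be close to 0.95. The 95% is a property of the procedure over repeated data sets, not a probability statement about θ for any single interval, in keeping with the frequentist view that parameters are fixed.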
Uncertainty. But there is more uncertainty in what we do than just variation and experimental error. We do our calculations in a statistical model, but the model is not the real world. So there is also modelling error, which covers a multitude of sins!
Uncertainty. So the real uncertainty may be much greater than a 95% confidence interval suggests! Studies have shown that the uncertainty bounds given by scientists (and others!) are often overconfident by a factor of 10.
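One way modelling error can make an interval overconfident is a systematic offset in the measurements that the statistical model leaves out. This sketch computes the usual known-σ 95% interval (mean ± 1.96σ/√n) but generates the data with a hypothetical unmodelled bias; as the bias grows, the fraction of intervals containing the true value falls well below the nominal 95%. The bias values are illustrative only and say nothing about the size of real modelling errors.

```python
import math
import random
import statistics

def coverage_with_unmodelled_bias(bias, theta=5.0, sigma=1.0, n=20, n_datasets=10_000, seed=3):
    """Coverage of the model's 95% interval when the data carry an offset the model ignores."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)  # what the (wrong) model says
    hits = 0
    for _ in range(n_datasets):
        # The 'real world' adds a systematic offset that the statistical model leaves out.
        data = [rng.gauss(theta + bias, sigma) for _ in range(n)]
        xbar = statistics.fmean(data)
        if xbar - half_width <= theta <= xbar + half_width:
            hits += 1
    return hits / n_datasets

for bias in (0.0, 0.25, 0.5):
    print(bias, coverage_with_unmodelled_bias(bias))
```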
Estimation of model parameters. Sometimes the quantities that we wish to estimate do not exist! Parameters may only have existence within a model, e.g. transfer coefficients; release height in atmospheric dispersion; risk aversion.
Why do we want estimates? [Remember our exhortations that you should be clear on your research objectives or questions.] To measure 'something out there'. To find the parameter to use for some purpose in a model: evaluation of systems; prediction of some effect; we may use estimates of parameters and their uncertainty to predict how a complex system may evolve, e.g. through Monte Carlo methods.
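A minimal Monte Carlo sketch of propagating parameter estimates and their uncertainty through a model. The model (exponential decay of an initial amount), the normal distributions for its two parameters and all the numbers are hypothetical; in practice the distributions would come from the estimation step.

```python
import math
import random
import statistics

def predict(amount0, rate, t):
    """Hypothetical model: exponential decay of an initial amount at a given rate."""
    return amount0 * math.exp(-rate * t)

def monte_carlo_prediction(t=10.0, n_samples=20_000, seed=4):
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        # Draw parameters from distributions expressing the estimates and their uncertainty.
        amount0 = rng.gauss(100.0, 5.0)
        rate = rng.gauss(0.10, 0.02)
        outputs.append(predict(amount0, rate, t))
    outputs.sort()
    lower = outputs[int(0.025 * n_samples)]  # empirical 2.5% point
    upper = outputs[int(0.975 * n_samples)]  # empirical 97.5% point
    return statistics.fmean(outputs), (lower, upper)

mean, (lo, hi) = monte_carlo_prediction()
print(f"predicted amount at t=10: mean {mean:.1f}, 95% interval ({lo:.1f}, {hi:.1f})")
```

The spread of the simulated outputs expresses the uncertainty in the prediction. Because the model is nonlinear in the rate, the Monte Carlo mean comes out slightly above the value 100e⁻¹ ≈ 36.8 obtained by plugging in the point estimates.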
Independence. Many estimation methods assume that each error is probabilistically independent of the other errors… and often they are far from independent. Examples: 1700 2 'independent' samples; IPCC work on climate change. Dependence in the data changes (increases!) the uncertainty in the estimates.
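A sketch of how ignoring dependence understates uncertainty, assuming errors that follow a first-order autoregressive (AR(1)) process with correlation φ between successive errors. This is an illustrative choice of dependence, not one taken from the slides. The usual 95% interval, mean ± 1.96s/√n, treats the n errors as independent; with positively correlated errors the sample mean varies much more than that interval allows.

```python
import math
import random
import statistics

def naive_coverage_ar1(phi, theta=0.0, sigma=1.0, n=50, n_datasets=5_000, seed=5):
    """Coverage of the usual 95% interval (which assumes independent errors) when
    the errors actually follow an AR(1) process with autocorrelation phi."""
    rng = random.Random(seed)
    innovation_sd = sigma * math.sqrt(1 - phi ** 2)  # keeps each error's sd equal to sigma
    hits = 0
    for _ in range(n_datasets):
        e = rng.gauss(0.0, sigma)  # start in the stationary distribution
        data = []
        for _ in range(n):
            data.append(theta + e)
            e = phi * e + rng.gauss(0.0, innovation_sd)  # next error depends on this one
        xbar = statistics.fmean(data)
        half_width = 1.96 * statistics.stdev(data) / math.sqrt(n)
        if xbar - half_width <= theta <= xbar + half_width:
            hits += 1
    return hits / n_datasets

for phi in (0.0, 0.5, 0.9):
    print(phi, naive_coverage_ar1(phi))
```

With φ = 0 the coverage is close to 0.95; with strongly positive φ it is far lower, because n correlated measurements carry less information than n independent ones.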
Rev. Thomas Bayes (1701?–1761). Main work published posthumously: T. Bayes (1763) An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc. 53, 370–418. Bayes theorem: inverse probability.
Bayes theorem. $p(\theta \mid x) \propto p(x \mid \theta) \times p(\theta)$, i.e. posterior probability ∝ likelihood × prior probability, where $p(\theta)$ is the probability distribution of the parameters, $p(x \mid \theta)$ is the likelihood of the data given the parameters, and $p(\theta \mid x)$ is the probability distribution of the parameters given the data. There is a constant of proportionality, $p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$, but it is 'easy' to find because probability adds (integrates) to one.
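A grid-based sketch of Bayes theorem for a single parameter θ, assuming hypothetical data xᵢ ~ N(θ, 1) and a N(0, 2²) prior. It evaluates likelihood × prior on a fine grid and then finds the constant by making the posterior sum (integrate) to one. For this conjugate normal case the exact posterior mean is (Σxᵢ)/(1/4 + n), which gives a check on the numerics.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

data = [1.2, 0.8, 1.9, 1.4, 0.7]  # hypothetical observations x
prior_mean, prior_sd, noise_sd = 0.0, 2.0, 1.0

step = 0.001
grid = [-5 + step * i for i in range(10_001)]  # candidate values of theta on [-5, 5]
unnormalised = []
for theta in grid:
    likelihood = math.prod(normal_pdf(x, theta, noise_sd) for x in data)  # p(x | theta)
    prior = normal_pdf(theta, prior_mean, prior_sd)                        # p(theta)
    unnormalised.append(likelihood * prior)

# The constant of proportionality: chosen so the posterior integrates to one.
normaliser = sum(unnormalised) * step
posterior = [u / normaliser for u in unnormalised]                         # p(theta | x)
posterior_mean = sum(t * p for t, p in zip(grid, posterior)) * step

exact_mean = sum(data) / (1 / prior_sd ** 2 + len(data) / noise_sd ** 2)
print(f"grid posterior mean {posterior_mean:.4f}, conjugate formula {exact_mean:.4f}")
```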
On the treatment of negative intensity measurements. Simon French, email@example.com, firstname.lastname@example.org
Crystallography data. Roughly, X-rays shone at a crystal diffract into many rays radiating out from the crystal in a fixed pattern. The intensities of these diffracted rays are proportional to the squared modulus of the coefficients (the structure factors) in the Fourier expansion of the electron density of the molecule. So getting hold of the intensities gives structural information.
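A toy one-dimensional illustration of this relationship, assuming point atoms at hypothetical fractional positions xⱼ with scattering weights fⱼ. The Fourier coefficient for reflection h is taken as F_h = Σⱼ fⱼ exp(2πi h xⱼ) and the intensity as |F_h|², with the constant of proportionality set to 1. Real crystals are three-dimensional and have atomic scattering factors that vary with h; this sketch ignores both.

```python
import cmath
import math

# Hypothetical 1D unit cell: (fractional position x_j, scattering weight f_j) for point atoms.
atoms = [(0.10, 6.0), (0.35, 7.0), (0.70, 8.0)]

def structure_factor(h, atoms):
    """Fourier coefficient F_h = sum_j f_j * exp(2*pi*i*h*x_j) of the point-atom density."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

# Intensities are proportional to |F_h|^2 (constant taken as 1 here).
intensities = {h: abs(structure_factor(h, atoms)) ** 2 for h in range(6)}
for h, intensity in intensities.items():
    print(h, round(intensity, 2))
```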
Intensity measurement. Measure the X-ray intensity in a diffracted ray and subtract the background 'near to it': measured intensity, I = ray strength - background. But in protein crystallography most intensities are small relative to the background, so some are 'measured' as negative, and theory says they are non-negative… Approaches in the early 1970s simply set negative measurements to zero… and got biased data sets.
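A simulation sketch of why zeroing negative measurements biases the data, assuming the background-subtracted measurement is the true intensity plus a normal error whose standard deviation is large compared with the intensity. The values (true intensity 0.5, σ = 1) and the function name are hypothetical.

```python
import random
import statistics

def mean_after_zeroing(true_intensity, sigma, n=100_000, seed=6):
    """Average of simulated background-subtracted measurements, before and after
    setting the negative ones to zero."""
    rng = random.Random(seed)
    # Hypothetical error model: measurement = true intensity + normal error.
    measured = [rng.gauss(true_intensity, sigma) for _ in range(n)]
    raw_mean = statistics.fmean(measured)
    zeroed_mean = statistics.fmean([max(0.0, m) for m in measured])
    return raw_mean, zeroed_mean

raw, zeroed = mean_after_zeroing(true_intensity=0.5, sigma=1.0)
print(f"true 0.5, raw mean {raw:.3f}, mean after zeroing negatives {zeroed:.3f}")
```

The raw measurements average close to the true value even though many are negative; replacing the negative ones by zero can only raise values, so the zeroed average is pushed systematically upwards.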
A Bayesian approach. There is good reason to think the likelihood for intensity measurements is near normal: the measurement is a difference of Poisson counts ('counting statistics'), with further 'corrections' applied. Theory gives the prior: "Wilson's statistics" (A.J.C. Wilson 1949). Estimate with the posterior mean. [Figure: normal likelihood; Wilson's statistics.]
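A sketch of the posterior-mean estimate for one reflection, assuming the acentric form of Wilson's statistics, i.e. an exponential prior p(J) = exp(−J/Σ)/Σ for the true intensity J ≥ 0 with mean intensity Σ, and a normal likelihood I ~ N(J, σ²). Multiplying the two and completing the square gives a normal distribution with mean I − σ²/Σ and standard deviation σ, truncated to J ≥ 0, whose mean has a closed form. The centric case uses a different prior and is not covered; Σ is treated as a known input, and the function names are hypothetical.

```python
import math

def _inverse_mills_ratio(z):
    """phi(z) / Phi(z) for the standard normal, stable for very negative z."""
    if z < -30.0:
        # Asymptotic expansion as z -> -infinity (direct evaluation would underflow to 0/0).
        return -z / (1.0 - 1.0 / z ** 2 + 3.0 / z ** 4)
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return phi / Phi

def posterior_mean_intensity(measured, sigma, mean_intensity):
    """Posterior mean of the true intensity J >= 0 given a measurement I.

    Assumes a normal likelihood I ~ N(J, sigma^2) and the acentric Wilson prior
    p(J) = exp(-J / mean_intensity) / mean_intensity for J >= 0, so the posterior
    is N(I - sigma^2 / mean_intensity, sigma^2) truncated to J >= 0.
    """
    mu = measured - sigma ** 2 / mean_intensity
    z = mu / sigma
    # Mean of a normal(mu, sigma) truncated below at zero.
    return mu + sigma * _inverse_mills_ratio(z)

for I in (-2.0, -0.5, 0.0, 1.0, 5.0):
    print(I, round(posterior_mean_intensity(I, sigma=1.0, mean_intensity=2.0), 3))
```

Every estimate is positive: negative measurements become small positive intensities rather than zeros, and for measurements much larger than σ the estimate approaches I − σ²/Σ.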
Simon French and Keith Wilson (1978) On the treatment of negative intensity measurements. Acta Crystallographica A34, 517–525.