BlueFin: Best Linear Unbiased Estimate Fisher Information aNalysis – Andrea Valassi (IT-SDC), based on the work done with Roberto Chierici – TOPLHCWG meeting.


BlueFin: Best Linear Unbiased Estimate Fisher Information aNalysis
Andrea Valassi (IT-SDC), based on the work done with Roberto Chierici
TOPLHCWG meeting on Statistical Combination Tools, 11 November 2013

Why BlueFin? – the original goals

- The code was prepared over the last ~20 months to test various ideas we got while working on our paper "Information and treatment of unknown correlations in the combination of measurements using the BLUE method":
  - Why are BLUE weights negative? Are correlations overestimated? What is the most "conservative" choice of correlations? How much "information" does each measurement contribute (what is its "relative importance")?
- The C++ code has changed enormously over time as new ideas appeared and old "new" ideas were abandoned:
  - It started off as a migration to C++ of the Fortran code used for the LEPEWWG.
  - The focus was initially on splitting up the information contributions among measurements (e.g. via integrals) – now moved out of the way.
  - The SVN repository (https://svnweb.cern.ch/trac/bluefin) is now a cleaned-up version that only contains the ideas that made it into the paper.
- BlueFin was initially born as a small private test (largely out of curiosity in my free time), not as user-oriented software to be supported.

BlueFin – what does it do?

- One executable: bluefin [options] [<dir>/]<name>.bfin
  - Builds an output pdf report from an input text file.
  - For N-parameter combinations: prints the BLUE results.
  - In addition, for 1-parameter combinations: information analysis (information weights and derivatives).
- A small library of C++ classes behind this executable:
  - BlueFish – one N-parameter combination: exact BLUE (central value, weights, error split-up) via matrix algebra.
  - BlueFish1Obs – the more specific 1-parameter case.
  - Plus some helpers: InfoAnalyzer, InfoMinimizer, ...
- Some examples (internally used as tests): LEP (W branching ratio, σ_WW), TOP (m_t), ad-hoc examples.
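The exact BLUE formula that BlueFish implements via matrix algebra can be sketched in a few lines. This is a standalone Python illustration, not BlueFin code (BlueFin is C++); it spells out the two-measurement case with an explicit 2x2 matrix inverse:

```python
def blue_combine(y1, s1, y2, s2, rho):
    """BLUE combination of two measurements y1 +- s1 and y2 +- s2
    with correlation rho between their total errors."""
    # Covariance matrix V and its explicit 2x2 inverse.
    v11, v22, v12 = s1 * s1, s2 * s2, rho * s1 * s2
    det = v11 * v22 - v12 * v12
    i11, i22, i12 = v22 / det, v11 / det, -v12 / det
    # Central value weights: w = V^-1 u / (u^T V^-1 u), with u = (1, 1);
    # the combined variance is 1 / (u^T V^-1 u).
    norm = i11 + i22 + 2 * i12
    w1, w2 = (i11 + i12) / norm, (i22 + i12) / norm
    return w1 * y1 + w2 * y2, (1.0 / norm) ** 0.5, (w1, w2)

# With rho = 0 this reduces to the inverse-variance weighted average:
val, err, (w1, w2) = blue_combine(10.0, 1.0, 12.0, 2.0, 0.0)
# weights (0.8, 0.2), combined value 10.4
```

The same formula generalises to N measurements by replacing the hand-written inverse with full matrix algebra, which is what the C++ classes above do.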

Example – input file

Example – "nominal" correlations

- Information derivatives w.r.t. the correlations (only for 1-observable combinations).
- Central value weights < 0 indicate a "high-correlation regime".
- The correlations with the highest derivatives (red in the table) are those most responsible for this effect.
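The "high-correlation regime" can be illustrated with a minimal numerical sketch (standalone Python, not BlueFin code): for two measurements with errors s1 < s2, the central value weight of the less precise one flips sign as soon as the correlation exceeds s1/s2:

```python
def blue_weights(s1, s2, rho):
    """Central value weights of the two-measurement BLUE combination."""
    v12 = rho * s1 * s2            # off-diagonal covariance
    # Unnormalised weights from the row sums of V^-1 (det cancels).
    w1_raw, w2_raw = s2 * s2 - v12, s1 * s1 - v12
    norm = w1_raw + w2_raw
    return w1_raw / norm, w2_raw / norm

# Below the threshold rho = s1/s2 = 0.5 both weights are positive;
# above it the less precise measurement gets a negative weight.
w_low = blue_weights(1.0, 2.0, 0.4)    # second weight > 0
w_high = blue_weights(1.0, 2.0, 0.7)   # second weight < 0
```

This is why a negative weight is a direct hint that the assumed correlations may be too large relative to the error ratio.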

Example – modified correlations

- Details for each "modified correlation" scenario are in the following pages of the BlueFin report:
  - Modified input covariances (full and partial)
  - Detailed BLUE results, CVW, MIW, IIW
  - Details about minimization, onionization, ...

Internals – and some limitations

- The fact that this was a small test that grew larger over time can be seen in its many limitations:
  - Matrix algebra is done via Boost and (hence?) is slow.
  - ROOT only appeared later, to add minimizations.
  - There are still many "assertion" exceptions to check hypotheses ("is this sum the same as this other sum?") – precision is limited.
  - Random use of triangular vs. symmetric matrices.
  - No CppUnit – tests are done by reproducing full real examples.
  - External dependencies assume CERN's CVMFS or AFS.
- All of these points could easily be addressed if needed.

Internals – some of the good points

- The executable exists and is very easy to use: write an input text file and you get lots of useful information.
- Automatic creation of tables in a pdf report via pdflatex.
- The software is tested on some real-life examples:
  - These are used for regression testing (requiring reproducible results).
  - The examples also serve as simple documentation for users.

Beyond BLUE and Gaussian errors? (1)

- BLUE results are Unbiased as long as the individual measurements are Unbiased.
- But it is true that we traditionally treat our systematic biases as randomly (and Gaussian!) distributed variables.
- Could we think of treating random (e.g. statistical) errors and unknown systematic biases in different ways?
  - In my opinion, understanding the inter-measurement correlations of those systematic biases would remain (and be even more!) essential in that case.
  - And splitting the total errors into fine-grained correlated sources of uncertainty remains the first step.
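That "first step" – splitting total errors into fine-grained correlated sources – can be sketched as follows (standalone Python, not BlueFin code or its input format): the statistical errors fill only the diagonal of the covariance, while each systematic source is assumed here to be 100% correlated between measurements:

```python
def covariance_from_sources(stat, systs):
    """Build the full covariance from per-source error lists.

    stat:  statistical error of each measurement, e.g. [s1, s2].
    systs: one list of per-measurement errors per systematic source;
           each source is taken as fully correlated across measurements.
    """
    n = len(stat)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] += stat[i] ** 2              # uncorrelated part
    for src in systs:
        for i in range(n):
            for j in range(n):
                cov[i][j] += src[i] * src[j]   # fully correlated part
    return cov

# Two measurements, two fully correlated systematic sources:
cov = covariance_from_sources([1.0, 2.0], [[0.5, 0.5], [0.3, 0.6]])
```

The finer the split into sources, the better one can discuss (and vary) the correlation assumed for each source individually, which is exactly where the information derivatives above come in.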

Beyond BLUE and Gaussian errors? (2)

- Two different problems are often being discussed:
  1. Why are central value weights in BLUE negative? That is to say, are correlations being ("conservatively") overestimated?
  2. Should we eventually move beyond BLUE and Gaussian approximations?
- Addressing the second issue will probably involve a more complex (hence more opaque) procedure:
  - This may "hide" the first issue, but we should not ignore it!
  - BLUE weights (especially the ugly negative ones) may disappear, but correlations may still remain overestimated!
- IMO, the analysis provided by BlueFin is useful even if one should move beyond BLUE and Gaussian errors!