Statistics in ROOT
René Brun, Anna Kreshuk, Lorenzo Moneta
PH/SFT group, CERN
PHYSTAT 05, Oxford, 15th September 2005
ftp://root.cern.ch/root/phystat05.ppt

Contents
- User interface
- Data storage and access
- Analysis
- Visualization
- New Math libraries
- Future plans

ROOT's user interface
- C++ in batch mode:
      root -b -q myMacro.C > myMacro.log
- C++ interpreted code with CINT, the C++ interpreter:
  - on the command line:
      root [0] for (int i=0; i<10; i++) cout << "hello " << i << endl;
  - loading a macro:
      root [1] .L mySmallMacro.C
      root [2] myFunction(1, 2, 3);
- C++ compiled code via CINT:
      root [3] .L myScript.C+
      Creating shared library /home/…/myScript_C.so
- Python:
  - access to ROOT from Python:
      >>> from ROOT import TLorentzVector
      >>> l = TLorentzVector()
  - access to Python from ROOT:
      root [0] TPython::LoadMacro("MyPyClass.py");
      root [1] MyPyClass mpc;

ROOT and external libraries
Using external libraries from ROOT:
- rootcint: utility that links compiled C/C++ objects with CINT, the C/C++ interpreter
- Example: in the Makefile of MyLibrary, rootcint generates the dictionary for MyClass. MyLibrary can then be loaded and used in a ROOT session:
      root [0] .L MyLibrary.so
      root [1] MyClass *mc = new MyClass();

Data storage and access
- Allows the analysis of terabytes of data
- Entries can be selected from different physical locations (TTree1, TTree2, …, TTreeN) and collected into the dataset to analyze
- The branches of a TTree (V1, V2, …, V23, …, V99) are read independently, so variables not needed for the analysis are not loaded into memory

Histograms
- 1-, 2- and 3-dimensional histograms
  - errors for each bin can be computed:
    - by default, as sqrt(bin content)
    - as sqrt(sum of squares of the weights in the bin)
- 1- and 2-dimensional profile histograms
  - mean value of Y and its standard deviation for each bin in X
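
The following standalone C++ sketch makes the two bin-error conventions concrete. It has no ROOT dependency, and the `WeightedHist` and `ProfileBin` types are hypothetical, not ROOT classes. The histogram stores the sum of weights and the sum of squared weights per bin. The error sqrt(sum of w^2) therefore reduces to sqrt(bin content) for unit weights. The profile bin shows the per-bin mean and standard deviation of Y.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal 1-D histogram keeping, per bin, the sum of weights and the sum of
// squared weights. Bin error = sqrt(sum w^2); for unit weights this equals
// sqrt(number of entries) = sqrt(bin content). Illustration only, not TH1.
struct WeightedHist {
    double xmin, xmax;
    std::vector<double> sumw, sumw2;

    WeightedHist(int nbins, double lo, double hi)
        : xmin(lo), xmax(hi), sumw(nbins, 0.0), sumw2(nbins, 0.0) {
        assert(nbins > 0 && hi > lo);
    }

    // Bin index for x, or -1 if x lies outside [xmin, xmax).
    int findBin(double x) const {
        if (x < xmin || x >= xmax) return -1;
        int n = static_cast<int>(sumw.size());
        int bin = static_cast<int>((x - xmin) / (xmax - xmin) * n);
        return bin < n ? bin : n - 1;
    }

    void fill(double x, double w = 1.0) {
        int bin = findBin(x);
        if (bin < 0) return;  // under/overflow simply ignored in this sketch
        sumw[bin] += w;
        sumw2[bin] += w * w;
    }

    double content(int bin) const { return sumw[bin]; }
    double error(int bin) const { return std::sqrt(sumw2[bin]); }
};

// One profile bin: accumulates the Y values falling in an X bin and reports
// their mean and standard deviation (spread, not error on the mean).
struct ProfileBin {
    double n = 0, sumy = 0, sumy2 = 0;
    void fill(double y) { n += 1; sumy += y; sumy2 += y * y; }
    double mean() const { return n > 0 ? sumy / n : 0.0; }
    double stddev() const {
        if (n == 0) return 0.0;
        double m = mean();
        double var = sumy2 / n - m * m;
        return var > 0 ? std::sqrt(var) : 0.0;
    }
};
```

For example, four unit-weight entries in one bin give a content of 4 and an error of 2. Two entries of weight 0.5 give a content of 1 and an error of sqrt(0.5).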

Analysis of TTrees
The TTree::Draw method and the TTreeViewer provide an easy way to examine a tree:
- histograms of user-defined expressions in up to 4 dimensions
- expressions: C++ formulas
- selections: expressions, user-defined macros or graphical cuts
Examples:
    Tree.Draw("sqrt(x):y", "x>0 && y<1");
    Tree.Draw("2*TMath::Log(x)", cut1 || cut2);

Fitting: interface
Minimization packages: Minuit and Fumili
Fitting can be done:
- directly in those packages, with a user-defined function to minimize
- through the general fitting interface:
  - TH1::Fit (binned data): chi-square and log-likelihood methods
  - TGraph::Fit (unbinned data)
  - TGraphErrors::Fit (data with errors)
  - TGraphAsymmErrors::Fit (taking into account the asymmetry of the errors)
  - TTree::Fit and TTree::UnbinnedFit
- with the RooFit package for object-oriented data modeling, distributed with ROOT starting from version
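
The following standalone sketch illustrates the two binned fit criteria mentioned for TH1::Fit. It computes a chi-square and a Poisson negative log-likelihood from bin contents and model predictions. The error convention used here (sqrt(n_i) per bin, empty bins skipped) is one common choice and is an assumption; it may not match ROOT's behaviour for every fit option. A minimizer such as Minuit would vary the model parameters to minimize either quantity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// observed[i] = bin content n_i, expected[i] = model prediction f_i.

// Neyman-style chi-square: sum over non-empty bins of (n_i - f_i)^2 / n_i,
// i.e. the bin error is taken as sqrt(n_i); empty bins are skipped.
double chiSquare(const std::vector<double>& observed,
                 const std::vector<double>& expected) {
    assert(observed.size() == expected.size());
    double chi2 = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        if (observed[i] <= 0) continue;
        double d = observed[i] - expected[i];
        chi2 += d * d / observed[i];
    }
    return chi2;
}

// Poisson negative log-likelihood without the parameter-independent
// log(n_i!) term: sum of f_i - n_i * log(f_i). Requires every f_i > 0.
double poissonNLL(const std::vector<double>& observed,
                  const std::vector<double>& expected) {
    assert(observed.size() == expected.size());
    double nll = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        assert(expected[i] > 0);
        nll += expected[i] - observed[i] * std::log(expected[i]);
    }
    return nll;
}
```

Unlike the chi-square above, the likelihood also uses the empty bins. This is one reason it is usually preferred for histograms with few entries per bin.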

Linear Fitting (1)
New class TLinearFitter:
- used to fit functions that are linear in the parameters
- faster than Minuit, by a factor that depends on the fitting function
- simple to use in the multidimensional case
Example:
    lfitter.SetFormula("1 ++ x0 ++ sqrt(x1) ++ exp(x2) ++ x3 ++ x4");
Each "++"-separated term is multiplied by its own free parameter. Expressions with this syntax can be used in all the Fit interface functions.
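
A fit that is linear in the parameters reduces to a linear system, so no iterative minimizer is needed. The following sketch is a standalone illustration, not TLinearFitter's implementation, and the function name `linearFit` is hypothetical. It treats each basis function like one "++" term and solves the normal equations (A^T A) p = A^T y.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// model(x) = sum_k p_k * basis_k(x); each basis function plays the role of
// one "++"-separated term. Solves the normal equations by Gaussian
// elimination with partial pivoting.
using Basis = std::function<double(double)>;

std::vector<double> linearFit(const std::vector<Basis>& basis,
                              const std::vector<double>& x,
                              const std::vector<double>& y) {
    const std::size_t m = basis.size(), n = x.size();
    assert(m > 0 && n == y.size() && n >= m);
    // Augmented matrix [A^T A | A^T y], accumulated point by point.
    std::vector<std::vector<double>> M(m, std::vector<double>(m + 1, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<double> row(m);
        for (std::size_t k = 0; k < m; ++k) row[k] = basis[k](x[i]);
        for (std::size_t r = 0; r < m; ++r) {
            for (std::size_t c = 0; c < m; ++c) M[r][c] += row[r] * row[c];
            M[r][m] += row[r] * y[i];
        }
    }
    // Forward elimination with partial pivoting.
    for (std::size_t col = 0; col < m; ++col) {
        std::size_t piv = col;
        for (std::size_t r = col + 1; r < m; ++r)
            if (std::abs(M[r][col]) > std::abs(M[piv][col])) piv = r;
        std::swap(M[col], M[piv]);
        assert(M[col][col] != 0.0);  // singular: degenerate basis functions
        for (std::size_t r = col + 1; r < m; ++r) {
            double factor = M[r][col] / M[col][col];
            for (std::size_t c = col; c <= m; ++c) M[r][c] -= factor * M[col][c];
        }
    }
    // Back substitution.
    std::vector<double> p(m);
    for (std::size_t k = m; k-- > 0;) {
        double s = M[k][m];
        for (std::size_t c = k + 1; c < m; ++c) s -= M[k][c] * p[c];
        p[k] = s / M[k][k];
    }
    return p;
}
```

Fitting exact data generated as 2 + 3*sqrt(x) + 0.5*exp(x) with the basis {1, sqrt(x), exp(x)} returns parameters close to (2, 3, 0.5). The normal equations are the shortest route to write down. A QR decomposition is numerically safer when basis functions are nearly collinear.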

Linear Fitting (2)
Robust fitting: least trimmed squares (LTS)
- based on the subset of h cases (out of n) whose least-squares fit has the smallest sum of squared residuals
- high breakdown point (the breakdown point is the smallest proportion of outliers that can cause the estimator to produce values arbitrarily far from the true parameters)
Example:
    Graph.Fit("pol3", "rob=0.75", "", -2, 2);
The option "rob=0.75" (2nd parameter) sets the fraction h of good points.
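
The least trimmed squares criterion can be illustrated for a straight line with a brute-force standalone sketch. The names are hypothetical, the approach suits small n only, and it is not ROOT's algorithm. Every pair of points seeds a candidate line. "Concentration steps" then refit ordinary least squares on the h points with the smallest residuals, until the trimmed sum stops decreasing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

struct Line { double a, b; };  // y = a + b * x

// Ordinary least squares on the points whose indices are in idx.
static Line olsFit(const std::vector<double>& x, const std::vector<double>& y,
                   const std::vector<std::size_t>& idx) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0, n = double(idx.size());
    for (std::size_t i : idx) { sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i]; }
    double den = n * sxx - sx * sx;
    double b = den != 0 ? (n * sxy - sx * sy) / den : 0.0;
    return {(sy - b * sx) / n, b};
}

// Selects the h points with the smallest squared residuals w.r.t. L and
// returns the sum of those h squared residuals (the LTS objective).
static double trimmedSubset(const std::vector<double>& x, const std::vector<double>& y,
                            Line L, std::size_t h, std::vector<std::size_t>& idx) {
    std::vector<std::size_t> order(x.size());
    std::iota(order.begin(), order.end(), std::size_t(0));
    auto r2 = [&](std::size_t i) { double r = y[i] - L.a - L.b * x[i]; return r * r; };
    std::partial_sort(order.begin(), order.begin() + h, order.end(),
                      [&](std::size_t i, std::size_t j) { return r2(i) < r2(j); });
    idx.assign(order.begin(), order.begin() + h);
    double s = 0;
    for (std::size_t i : idx) s += r2(i);
    return s;
}

// LTS line: minimize the sum of the h smallest squared residuals.
Line ltsLine(const std::vector<double>& x, const std::vector<double>& y, std::size_t h) {
    const std::size_t n = x.size();
    assert(n == y.size() && h >= 2 && h <= n);
    Line best{0, 0};
    double bestObj = std::numeric_limits<double>::infinity();
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            if (x[i] == x[j]) continue;
            Line L{0, (y[j] - y[i]) / (x[j] - x[i])};  // line through points i, j
            L.a = y[i] - L.b * x[i];
            double obj = trimmedSubset(x, y, L, h, idx);
            for (int step = 0; step < 20; ++step) {  // concentration steps
                Line refit = olsFit(x, y, idx);
                double next = trimmedSubset(x, y, refit, h, idx);
                if (next >= obj) break;
                L = refit;
                obj = next;
            }
            if (obj < bestObj) { bestObj = obj; best = L; }
        }
    return best;
}
```

Consider ten points on y = 1 + 2x, with two of them replaced by gross outliers. With h = 8 the sketch recovers a = 1 and b = 2, whereas a plain least-squares fit would be pulled toward the outliers.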

Smoothing and peak finding
TSpectrum class:
- 1- and 2-dimensional background estimation
- smoothing
- deconvolution
- peak search and fitting
Graph smoothers:
- kernel smoother
- LOWESS
- "super smoother"
Splines: cubic and quintic
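
A kernel smoother replaces each value by a locally weighted average of its neighbours. The following standalone sketch uses the Nadaraya-Watson form with a Gaussian kernel. The kernel shape and the bandwidth convention are assumptions here, and ROOT's graph smoother may define them differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Smoothed value at x0: weighted mean of the y_i with Gaussian weights
// w_i = exp(-0.5 * ((x_i - x0) / bandwidth)^2). x0 should lie within the
// data range, otherwise all weights may underflow to zero.
double kernelSmooth(const std::vector<double>& x, const std::vector<double>& y,
                    double x0, double bandwidth) {
    assert(x.size() == y.size() && !x.empty() && bandwidth > 0);
    double sumw = 0.0, sumwy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double u = (x[i] - x0) / bandwidth;
        double w = std::exp(-0.5 * u * u);
        sumw += w;
        sumwy += w * y[i];
    }
    return sumwy / sumw;
}

// Smooths a whole graph by evaluating the smoother at every x_i.
std::vector<double> kernelSmoothGraph(const std::vector<double>& x,
                                      const std::vector<double>& y, double bandwidth) {
    std::vector<double> out;
    out.reserve(x.size());
    for (double xi : x) out.push_back(kernelSmooth(x, y, xi, bandwidth));
    return out;
}
```

The bandwidth controls the trade-off between noise suppression and bias. A wide kernel flattens genuine peaks, which is why peak finding (as in TSpectrum) is treated separately from smoothing.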

Multivariate methods (1)
Class TRobustEstimator: Minimum Covariance Determinant (MCD) estimator, a highly robust estimator of multivariate location and scatter
- high breakdown point
- algorithm similar to least trimmed squares regression
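
The following one-dimensional standalone sketch shows what the MCD criterion optimizes; the name `mcd1D` is hypothetical. In one dimension the covariance determinant is just the variance. The h-subset with the smallest variance is always a run of consecutive sorted values, so a sliding window finds it exactly. The multivariate case handled by TRobustEstimator needs an iterative search instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct RobustLocScale { double location, scale; };

// Univariate MCD: among all subsets of h points, find the one with minimum
// variance and return its mean and standard deviation. No consistency
// correction factor is applied to the scale in this sketch.
RobustLocScale mcd1D(std::vector<double> v, std::size_t h) {
    assert(h >= 2 && h <= v.size());
    std::sort(v.begin(), v.end());
    RobustLocScale best{0.0, 0.0};
    double bestVar = -1.0;
    for (std::size_t start = 0; start + h <= v.size(); ++start) {
        double mean = 0.0;
        for (std::size_t i = start; i < start + h; ++i) mean += v[i];
        mean /= h;
        double var = 0.0;
        for (std::size_t i = start; i < start + h; ++i) var += (v[i] - mean) * (v[i] - mean);
        var /= h;
        if (bestVar < 0 || var < bestVar) {
            bestVar = var;
            best = {mean, std::sqrt(var)};
        }
    }
    return best;
}
```

For {1, 2, 3, 4, 5, 100, 200} with h = 5, the estimate is location 3 and scale sqrt(2), unaffected by the two outliers.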

Multivariate methods (2)
- TPrincipal: principal components analysis
- TMultiDimFit: approximates a multidimensional function with monomials, Chebyshev or Legendre polynomials
- TMultiLayerPerceptron: a neural network class
All multivariate methods can take their input data from a TTree.
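
Principal components analysis diagonalizes the sample covariance matrix. The following standalone sketch does this for two variables, where the 2x2 eigenproblem has a closed form; the struct and function names are hypothetical. TPrincipal handles the general N-dimensional case.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Variances along the principal axes (var1 >= var2) and the unit direction
// (dirX, dirY) of the first principal axis.
struct PCA2D { double var1, var2, dirX, dirY; };

PCA2D pca2D(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    assert(n == y.size() && n >= 2);
    double mx = 0, my = 0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxx = 0, syy = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        double dx = x[i] - mx, dy = y[i] - my;
        sxx += dx * dx; syy += dy * dy; sxy += dx * dy;
    }
    sxx /= (n - 1); syy /= (n - 1); sxy /= (n - 1);  // sample covariance
    // Eigenvalues of [[sxx, sxy], [sxy, syy]].
    double half = 0.5 * (sxx + syy), diff = sxx - syy;
    double root = std::sqrt(0.25 * diff * diff + sxy * sxy);
    double l1 = half + root, l2 = half - root;
    // Eigenvector for l1 is (sxy, l1 - sxx); if both vanish, the matrix is
    // already diagonal with sxx >= syy, so the first axis is x.
    double vx = sxy, vy = l1 - sxx;
    if (vx == 0.0 && vy == 0.0) { vx = 1.0; vy = 0.0; }
    double norm = std::hypot(vx, vy);
    return {l1, l2, vx / norm, vy / norm};
}
```

For points lying exactly on the line y = x, all the variance is carried by the first component along (1, 1)/sqrt(2), and the second variance is zero.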

Confidence intervals
- TLimit: computes 95% C.L. limits using the likelihood-ratio semi-Bayesian method
- TRolke: computes confidence intervals for a Poisson rate in the presence of background and efficiency, with a fully frequentist treatment of the uncertainties
- TFeldmanCousins: calculates the C.L. upper limit using the Feldman-Cousins method
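
The following standalone C++ sketch shows the Feldman-Cousins construction for a Poisson count n with known background b. For each signal mean mu on a grid, counts are added to the acceptance region in decreasing order of the likelihood ratio P(n | mu + b) / P(n | mu_best + b), where mu_best = max(0, n - b). Counts are added until the region covers the requested C.L. The interval is the set of mu whose region contains the observed count. The grid step, the truncation of n and the function names are assumptions of this sketch, not TFeldmanCousins internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Poisson probability P(n | lambda), with P(0 | 0) = 1.
static double poisson(int n, double lambda) {
    if (lambda <= 0) return n == 0 ? 1.0 : 0.0;
    return std::exp(n * std::log(lambda) - lambda - std::lgamma(n + 1.0));
}

// True if n0 is in the acceptance region for signal mean mu.
static bool accepted(int n0, double mu, double b, double cl, int nMax) {
    std::vector<std::pair<double, int>> ranked;  // (likelihood ratio, n)
    for (int n = 0; n <= nMax; ++n) {
        double muBest = std::max(0.0, n - b);   // best physically allowed mu
        ranked.push_back({poisson(n, mu + b) / poisson(n, muBest + b), n});
    }
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<double, int>& l, const std::pair<double, int>& r) {
                  return l.first > r.first;
              });
    double covered = 0.0;
    for (const auto& rn : ranked) {
        if (rn.second == n0) return true;        // added while coverage < cl
        covered += poisson(rn.second, mu + b);
        if (covered >= cl) return false;         // region complete without n0
    }
    return false;
}

// Interval [lo, hi] of grid values of mu whose acceptance region contains n0.
std::pair<double, double> feldmanCousins(int n0, double b, double cl = 0.9,
                                         double muMax = 20.0, double step = 0.005) {
    assert(n0 >= 0 && b >= 0 && cl > 0 && cl < 1);
    const int nMax = static_cast<int>(muMax + b + 10 * std::sqrt(muMax + b) + 20);
    double lo = -1, hi = -1;
    for (int k = 0; k * step <= muMax; ++k) {
        double mu = k * step;
        if (accepted(n0, mu, b, cl, nMax)) {
            if (lo < 0) lo = mu;
            hi = mu;
        }
    }
    return {lo, hi};
}
```

For n0 = 0 and b = 0 at 90% C.L., this gives an interval starting at 0 with an upper end close to 2.44, the value tabulated by Feldman and Cousins, up to the grid step.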

Small useful algorithms
In the namespace TMath:
- most probability distribution functions: their densities and inverses
- special functions
- mean and median (also for weighted datasets), variance and k-th order statistic
- Kolmogorov-Smirnov test
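
The following standalone sketch shows the core of the two-sample Kolmogorov-Smirnov test: the statistic D, the largest distance between the two empirical distribution functions. Converting D into a probability needs the Kolmogorov distribution, which this sketch does not compute. The function name is hypothetical and does not mirror the TMath interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// D = max over t of |F_a(t) - F_b(t)|, evaluated at every distinct data value.
double ksStatistic(std::vector<double> a, std::vector<double> b) {
    assert(!a.empty() && !b.empty());
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::size_t i = 0, j = 0;
    double d = 0.0;
    while (i < a.size() && j < b.size()) {
        double t = std::min(a[i], b[j]);
        // Step past every value equal to t in both samples (handles ties).
        while (i < a.size() && a[i] == t) ++i;
        while (j < b.size() && b[j] == t) ++j;
        double fa = double(i) / a.size(), fb = double(j) / b.size();
        d = std::max(d, std::abs(fa - fb));
    }
    // Once one sample is exhausted its ECDF is 1 and the gap can only shrink.
    return d;
}
```

Identical samples give D = 0 and completely separated samples give D = 1.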

Linear algebra and quadratic programming
Linear algebra package:
- general, symmetric and sparse matrices
- matrix decompositions
- eigenvalue analysis
Quadratic programming library:
- dense and sparse data
- Gondzio and Mehrotra solving methods
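
Cholesky factorization is an example of a matrix decomposition for symmetric matrices. It is the standard tool for inverting covariance matrices and for generating correlated random numbers. The following textbook standalone sketch is for illustration and is not the interface of ROOT's decomposition classes; the `Matrix` alias and function name are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// A = L * L^T for symmetric positive-definite A. Returns the lower-triangular
// factor L, or an empty matrix if A turns out not to be positive definite.
Matrix cholesky(const Matrix& A) {
    const std::size_t n = A.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        assert(A[j].size() == n);
        double diag = A[j][j];
        for (std::size_t k = 0; k < j; ++k) diag -= L[j][k] * L[j][k];
        if (diag <= 0.0) return {};  // not positive definite
        L[j][j] = std::sqrt(diag);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    return L;
}
```

For A = [[4, 2], [2, 3]] the factor is L = [[2, 0], [1, sqrt(2)]]. The symmetric but indefinite matrix [[1, 2], [2, 1]] is rejected.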

Graphs
1-d:
- TGraph
- TGraphErrors
- TGraphAsymmErrors
- TMultiGraph: a collection of graphs
2-d:
- TGraph2D
- TGraph2DErrors

ROOT Math Packages

MathCore
Library with the basic math functionality, buildable as a standalone library:
- no dependency on other ROOT packages
- no external dependency
Main content of MathCore:
- basic and commonly used mathematical functions: special and statistical (pdf, cdf) functions
- interfaces to function and algorithm classes, with a basic implementation of some numerical algorithms
- 3D vectors and LorentzVectors
- random numbers

MathMore
Library with extra mathematical functionality. Current content:
- C++ interface to functions and algorithms from the GNU Scientific Library (GSL)
  - mathematical functions implemented using GSL
  - algorithms currently present: adaptive numerical integration, differentiation, root finders, interpolation, 1D minimization
- repository for needed and useful extra math functionality; it could include other useful math libraries
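
The following standalone sketch shows what a bracketing root finder does, using plain bisection. It is not one of the GSL-based MathMore classes: GSL also provides faster bracketing solvers such as Brent's method. The function name and default tolerances are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Finds a root of f in [a, b], assuming f(a) and f(b) have opposite signs,
// by halving the bracketing interval until it is shorter than tol.
double bisect(const std::function<double(double)>& f, double a, double b,
              double tol = 1e-12, int maxIter = 200) {
    assert(a < b && tol > 0);
    double fa = f(a);
    if (fa == 0.0) return a;
    assert(fa * f(b) <= 0.0 && "root must be bracketed");
    for (int it = 0; it < maxIter && (b - a) > tol; ++it) {
        double m = 0.5 * (a + b), fm = f(m);
        if (fm == 0.0) return m;
        if ((fa < 0) == (fm < 0)) { a = m; fa = fm; }  // root in [m, b]
        else b = m;                                    // root in [a, m]
    }
    return 0.5 * (a + b);
}
```

Bisection gains one bit of precision per function call regardless of f. This robustness is why it is often used as the fallback step inside faster methods.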

Summary and Future plans
- First versions of the MathCore and MathMore libraries are being released; the transition phase will be over in 2-3 months
- Next addition will be a new random number package
- Improvement of the fitting interface
- Statistical algorithms to add:
  - sPlot
  - Loess: locally weighted polynomial regression
  - cluster analysis
  - boxplot and spiderplot
- Interface with R?

Mathematical Functions
Special functions: use the interface proposed for the C++ standard, e.g.
    double cyl_bessel_i(double nu, double x);
Statistical functions:
- probability density functions (pdf)
- cumulative distributions (lower tail and upper tail)
- inverses of the cumulative distributions
- coherent naming scheme (also proposed to the C++ standard):
    chisquared_pdf, chisquared_prob, chisquared_quant, chisquared_prob_inv, chisquared_quant_inv
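
The following standalone sketch shows how the pdf, the two tails and the inverse of one distribution fit together, using the chi-square distribution with k degrees of freedom. The function names are hypothetical and are not the MathCore names listed above. The lower tail is the regularized incomplete gamma function P(k/2, x/2), evaluated here by its power series. The upper tail is taken as 1 minus the lower tail, which loses relative precision far in the tail.

```cpp
#include <cassert>
#include <cmath>

// Lower regularized incomplete gamma P(a, x) by its power series
// x^a e^-x / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n)).
static double gammaP(double a, double x) {
    assert(a > 0 && x >= 0);
    if (x == 0) return 0.0;
    double term = 1.0 / a, sum = term;
    for (int n = 1; n < 10000; ++n) {
        term *= x / (a + n);
        sum += term;
        if (term < sum * 1e-15) break;
    }
    return sum * std::exp(-x + a * std::log(x) - std::lgamma(a));
}

// Density; x <= 0 is treated as outside the support (the x = 0 point,
// finite for k >= 2, is not handled specially).
double chisq_pdf(double x, double k) {
    if (x <= 0) return 0.0;
    double h = 0.5 * k;
    return std::exp((h - 1) * std::log(x) - 0.5 * x - h * std::log(2.0) - std::lgamma(h));
}

double chisq_cdf(double x, double k) { return x <= 0 ? 0.0 : gammaP(0.5 * k, 0.5 * x); }  // P(X <= x)
double chisq_cdf_c(double x, double k) { return 1.0 - chisq_cdf(x, k); }                  // P(X > x)

// Quantile: inverse of the lower tail, found by bisection.
double chisq_quantile(double p, double k) {
    assert(p > 0 && p < 1 && k > 0);
    double lo = 0.0, hi = k + 10.0 * std::sqrt(2.0 * k) + 10.0;
    while (chisq_cdf(hi, k) < p) hi *= 2;
    for (int it = 0; it < 200; ++it) {
        double mid = 0.5 * (lo + hi);
        if (chisq_cdf(mid, k) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

For k = 2 the distribution reduces to an exponential, so the lower tail at x = 2 is 1 - e^-1. The 95% quantile for k = 1 is about 3.8415.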

Mathematical Functions (cont.)
- new functions with better precision than the old ones in ROOT
- extensive tests of numerical accuracy
- comparison with other libraries (NAG, Mathematica)

Numerical Algorithms
- new C++ classes and interfaces for describing algorithms and functions
- integrator classes: implementation based on GSL (QAGS) for definite and indefinite integration
- functionality currently in ROOT's TF1 to be moved into new classes in MathCore, making it easier to use for all clients
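
Adaptive integration refines the subdivision only where the integrand needs it. The following standalone sketch illustrates the idea with adaptive Simpson quadrature. The GSL QAGS routine works differently: it uses Gauss-Kronrod rules with extrapolation. The names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

using Func = std::function<double(double)>;

// Simpson's rule on [a, b] from the function values at a, midpoint and b.
static double simpson(double fa, double fm, double fb, double a, double b) {
    return (b - a) / 6.0 * (fa + 4.0 * fm + fb);
}

// Splits [a, b] in two and recurses until the two half-interval estimates
// agree with the whole-interval estimate within 15 * eps.
static double adapt(const Func& f, double a, double b, double fa, double fm, double fb,
                    double whole, double eps, int depth) {
    double m = 0.5 * (a + b);
    double flm = f(0.5 * (a + m)), frm = f(0.5 * (m + b));
    double left = simpson(fa, flm, fm, a, m), right = simpson(fm, frm, fb, m, b);
    double delta = left + right - whole;
    if (depth <= 0 || std::abs(delta) <= 15.0 * eps)
        return left + right + delta / 15.0;  // Richardson extrapolation
    return adapt(f, a, m, fa, flm, fm, left, eps / 2, depth - 1) +
           adapt(f, m, b, fm, frm, fb, right, eps / 2, depth - 1);
}

double integrate(const Func& f, double a, double b, double eps = 1e-10, int maxDepth = 50) {
    assert(eps > 0);
    double fa = f(a), fb = f(b), fm = f(0.5 * (a + b));
    return adapt(f, a, b, fa, fm, fb, simpson(fa, fm, fb, a, b), eps, maxDepth);
}
```

Simpson's rule is exact for cubics, so a polynomial such as x^3 converges immediately. Integrands like sin(x) on [0, pi] need a few levels of subdivision.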

Physics and Geometry Vectors
Classes for 3D vectors and LorentzVectors with their operations and transformations:
- merge of the old ROOT and CLHEP classes
- new classes with cleaner interfaces, generic on the scalar type and on the coordinate system (Cartesian, polar, cylindrical, etc.)
Classes for 3D rotations and Lorentz transformations:
- rotations based on quaternions are also available
Work done in collaboration with the Fermilab group
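
The following hypothetical minimal class is a standalone sketch of what "generic on the scalar type" means. It is not the MathCore or TLorentzVector API. The Lorentz vector is templated on its scalar and stored in Cartesian (px, py, pz, E) coordinates, with the invariant mass and a boost along z as one example of a Lorentz transformation. Returning a negative mass for space-like vectors is a design choice of this sketch.

```cpp
#include <cassert>
#include <cmath>

template <typename T>
struct LorentzVec {
    T px, py, pz, e;

    LorentzVec operator+(const LorentzVec& o) const {
        return {px + o.px, py + o.py, pz + o.pz, e + o.e};
    }
    T p2() const { return px * px + py * py + pz * pz; }
    // Invariant mass squared with metric (+,-,-,-): E^2 - |p|^2.
    T m2() const { return e * e - p2(); }
    // Invariant mass; a space-like vector (m2 < 0) gives -sqrt(-m2).
    T mass() const { T mm = m2(); return mm >= 0 ? std::sqrt(mm) : -std::sqrt(-mm); }
    T pt() const { return std::sqrt(px * px + py * py); }
    // Boost along z: the vector acquires velocity +beta along z (|beta| < 1).
    LorentzVec boostZ(T beta) const {
        assert(beta > -1 && beta < 1);
        T gamma = 1 / std::sqrt(1 - beta * beta);
        return {px, py, gamma * (pz + beta * e), gamma * (e + beta * pz)};
    }
};
```

Two back-to-back photons of energy 1 combine into a system of mass 2. Boosting a particle at rest with beta = 0.6 gives E = 1.25 and pz = 0.75 while leaving its mass unchanged. The same template works with float as the scalar.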

Minimization
New C++ version of Minuit being introduced in ROOT:
- same algorithms translated into C++, plus some added functionality
- Fumili minimizer, single-sided bounds
- undergoing extensive validation tests
[Figure panels: "before" and "after"]