Overview of Non-Parametric Probability Density Estimation Methods Sherry Towers State University of New York at Stony Brook.

S.Towers All kernel PDF estimation methods (PDEs) are built on a simple idea: if a data point lies in a region where the signal MC clusters tightly and the background MC is spread loosely, the point is likely to be signal.
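A minimal sketch of this idea (assumed toy data, with scipy's gaussian_kde standing in for the kernel density estimator rather than the TerraFerMA code): estimate the signal and background PDFs from MC samples, then classify a point by comparing the two estimated densities at that point.

```python
# Minimal sketch of the kernel-PDE classification idea (assumed toy setup;
# scipy's gaussian_kde stands in for the kernel density estimator).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy 2-D "Monte Carlo" samples, shape (n_dims, n_points) as gaussian_kde expects.
signal_mc = rng.multivariate_normal([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]], size=2000).T
background_mc = rng.multivariate_normal([-1.0, -1.0], [[2.0, 0.0], [0.0, 2.0]], size=2000).T

f_sig = gaussian_kde(signal_mc)      # estimated signal PDF
f_bkg = gaussian_kde(background_mc)  # estimated background PDF

def discriminant(x):
    """D(x) = f_s(x) / (f_s(x) + f_b(x)); near 1 where signal MC clusters tightly."""
    s, b = f_sig(x), f_bkg(x)
    return s / (s + b)

print(discriminant([[1.0], [1.0]]))    # signal-like region -> close to 1
print(discriminant([[-2.0], [-2.0]]))  # background-like region -> close to 0
```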

S.Towers  To estimate a PDF, PDE’s use the idea that any continuous function can be modelled by sum of some “kernal” function  Gaussian kernals are a good choice for particle physics  So, a PDF can be estimated by sum of multi-dimensional Gaussians centred about MC generated points

S.Towers  Best form of Gaussian kernal is a matter of debate:  Static-kernal PDE method uses a kernal with covariance matrix obtained from entire sample  The Gaussian Expansion Method (GEM), uses an adaptive kernal; the covariance matrix used for the Gaussian at each MC point comes from “local” covariance matrix.

GEM vs Static-Kernel PDE GEM gives an unbiased estimate of the PDF, but it is slower to use because a local covariance must be calculated for each MC point. Static-kernel PDE methods have smaller variance and are faster to use, but yield biased estimates of the PDF.
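A rough usage note for the two sketches above (same assumed toy signal sample): the adaptive version pays for its lower bias with one local-covariance computation per MC point, which is where the extra cost comes from.

```python
# Compare the two toy estimators at a single test point (assumed setup,
# reusing signal_mc, static_kernel_pde and adaptive_kernel_pde from above).
pdf_static = static_kernel_pde(signal_mc.T)      # fast, single shared covariance
pdf_adaptive = adaptive_kernel_pde(signal_mc.T)  # one local covariance per MC point
print(pdf_static([1.0, 1.0]), pdf_adaptive([1.0, 1.0]))
```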

S.Towers Comparison of GEM and static-kernel PDE:

S.Towers PDE vs Neural Networks  Both PDE’s and Neural Networks can take into account non-linear correlations in parameter space  Both methods are, in principle, equally powerful  For most part they perform similarly in an “average” analysis

S.Towers PDE vs Neural Networks  But, PDE’s have far fewer parameters, and algorithm is more intuitive in nature (easier to understand)

S.Towers Plus, the PDE estimate of the PDF can be visually examined:
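A sketch of what such a visual check could look like (assumed example reusing the toy f_sig estimate from the first sketch, not a plot from the talk): evaluate the estimated PDF on a grid and draw its contours.

```python
# Visual check of a 2-D density estimate: contours of the estimated signal PDF.
import numpy as np
import matplotlib.pyplot as plt

xs, ys = np.meshgrid(np.linspace(-4.0, 4.0, 100), np.linspace(-4.0, 4.0, 100))
grid = np.vstack([xs.ravel(), ys.ravel()])   # shape (2, n_grid), as gaussian_kde expects
z = f_sig(grid).reshape(xs.shape)            # estimated signal PDF on the grid

plt.contour(xs, ys, z)
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Kernel estimate of the toy signal PDF")
plt.show()
```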

S.Towers PDEs vs Neural Nets… There are some problems that are particularly well suited to PDEs:

S.Towers PDEs vs Neural Nets… (plot-only slides)

S.Towers Summary PDE methods are as powerful as neural networks and offer an interesting alternative: very few parameters, easy to use, easy to understand, and they yield an unbinned estimate of the PDF that the user can examine in the multidimensional parameter space!