Richard Kass / F02 P416, Lecture 6: Chi Square Distribution (χ²) and Least Squares Fitting


Chi Square Distribution (χ²) (see Taylor Ch. 8, 12, App. D, or Barlow pp. 106-108, 150-153)

Suppose we have a set of measurements {x_1, x_2, ..., x_n} and we know the true value of each x_i (x_t1, x_t2, ..., x_tn). We would like some way to measure how "good" these measurements really are. Obviously, the closer the measured values are to the true values, the better (or more accurate) the measurements. Can we get more specific?

Assume:
- The measurements are independent of each other.
- The measurements come from a Gaussian distribution.
- (σ_1, σ_2, ..., σ_n) are the standard deviations associated with each measurement.

Consider the following two possible measures of the quality of the data:

    R = Σ_{i=1}^{n} (x_i − x_ti) / σ_i        χ² = Σ_{i=1}^{n} (x_i − x_ti)² / σ_i²

Which of the above gives more information on the quality of the data? Both R and χ² are zero if the measurements agree exactly with the true values. R looks good because, via the Central Limit Theorem, the sum becomes Gaussian as n → ∞. However, χ² is better: in R, large positive and negative deviations can cancel each other, while in χ² every deviation contributes positively.
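
A minimal numerical sketch of the two measures (the arrays below are hypothetical, not data from the lecture):

```python
import numpy as np

# Hypothetical measurements, true values, and standard deviations
x      = np.array([10.2,  9.7, 10.5,  9.9])   # measured values x_i
x_true = np.array([10.0, 10.0, 10.0, 10.0])   # true values x_ti
sigma  = np.array([ 0.3,  0.3,  0.3,  0.3])   # errors sigma_i

# R: signed deviations, so positive and negative errors can cancel
R = np.sum((x - x_true) / sigma)

# chi^2: squared deviations, so every error contributes positively
chi2 = np.sum(((x - x_true) / sigma) ** 2)

print(f"R = {R:.2f}, chi^2 = {chi2:.2f}")
```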

One can show that the probability distribution function for χ² is exactly

    p(χ², n) = (χ²)^{n/2 − 1} e^{−χ²/2} / (2^{n/2} Γ(n/2)),   0 ≤ χ² < ∞

This is called the "Chi Square" (χ²) probability distribution function; Γ is the Gamma function. It is a continuous probability distribution that is a function of two variables: χ² itself and n, the number of degrees of freedom, where

    n = (# of data points) − (# of parameters calculated from the data points)

[Figure: p(χ², n) versus χ² for several different degrees of freedom n.]

For n ≥ 20, the tail probability P(χ² > y) can be approximated using a Gaussian pdf: the quantity y = (2χ²)^{1/2} − (2n − 1)^{1/2} is approximately a Gaussian random variable with mean 0 and standard deviation 1.
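
A quick numerical check of both statements (a sketch using scipy, not part of the lecture):

```python
import numpy as np
from scipy.stats import chi2, norm
from scipy.special import gamma

def chi2_pdf(x, n):
    """The chi-square pdf written out explicitly, as above."""
    return x ** (n / 2 - 1) * np.exp(-x / 2) / (2 ** (n / 2) * gamma(n / 2))

# The explicit formula agrees with scipy's implementation
print(chi2_pdf(5.0, 4), chi2.pdf(5.0, 4))

# Gaussian approximation to the tail probability for large n (here n = 25)
n, x = 25, 35.0
exact  = chi2.sf(x, n)                                 # P(chi^2 > x)
approx = norm.sf(np.sqrt(2 * x) - np.sqrt(2 * n - 1))  # treat y as N(0, 1)
print(exact, approx)                                   # ~0.088 vs ~0.086
```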

Example: We collect N events in an experiment and histogram the data in m bins before performing a fit to the data points. We have m data points!

Example: We count cosmic ray events in 15-second intervals and sort the data into 5 bins:

    Number of counts in a 15-second interval:   0   1   2   3   4
    Number of intervals:                        2   7   6   3   2

Although there were 36 cosmic rays in our sample, we have only m = 5 data points. Suppose we want to compare our data with the expectations of a Poisson distribution. To make this comparison we need to know A, the total number of intervals (= 20). Since A is calculated from the data, we give up one degree of freedom: dof = m − 1 = 5 − 1 = 4. If we also calculate the mean of our Poisson from the data, we lose another dof: dof = m − 2 = 5 − 2 = 3.

Example: We have 10 data points. How many degrees of freedom (n) do we have? Let μ and σ be the mean and standard deviation of the data.
- If we calculate both μ and σ from the 10 data points, then n = 8.
- If we know μ and calculate σ, then n = 9.
- If we know σ and calculate μ, then n = 9.
- If we know both μ and σ, then n = 10.

In general, the maximum number of degrees of freedom equals the number of data points; each parameter calculated from the data costs one degree of freedom.
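
A sketch of the cosmic ray comparison in code. The transcript did not preserve the lecture's prediction formula, so the standard one is assumed here: the predicted number of intervals with m counts is A · e^{−μ} μ^m / m!, with Poisson errors σ² ≈ predicted counts.

```python
import numpy as np
from scipy.stats import poisson, chi2

counts    = np.array([0, 1, 2, 3, 4])   # counts per 15-second interval
intervals = np.array([2, 7, 6, 3, 2])   # observed number of intervals

A  = intervals.sum()                    # total intervals = 20
mu = (counts * intervals).sum() / A     # sample mean = 36/20 = 1.8

# Poisson prediction for the number of intervals with m counts
predicted = A * poisson.pmf(counts, mu)

# Pearson chi^2, using sigma_i^2 ~ predicted counts
chi2_val = np.sum((intervals - predicted) ** 2 / predicted)
dof = len(counts) - 2                   # both A and mu came from the data
print(chi2_val, dof, chi2.sf(chi2_val, dof))
```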

Like the Gaussian probability distribution, the χ² probability integral cannot be done in closed form:

    P(χ² > y, n) = ∫_y^∞ p(χ², n) dχ²

We must use a table (or graph) to look up the probability of exceeding a given χ² for a given number of dof.

Example: What is the probability of getting χ² > 10 with n = 4 degrees of freedom? Using Table D of Taylor: P(χ² > 10, n = 4) ≈ 0.04. Thus we say that the probability of getting χ² > 10 with 4 degrees of freedom by chance is 0.04 (4%). If instead of 4 degrees of freedom we had only one, the probability of getting χ² > 10 by chance would be only ~0.0016 (≈ 0.2%).
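
Today the table lookup is a one-line call to scipy's survival function (sf = 1 − CDF); a sketch:

```python
from scipy.stats import chi2

# Probability of exceeding chi^2 = 10 by chance, for various dof
for n in (1, 4):
    print(f"P(chi^2 > 10 | {n} dof) = {chi2.sf(10, n):.4f}")
# n = 4 gives ~0.0404 (4%); n = 1 gives ~0.0016 (0.2%)
```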

Least Squares Fitting (Taylor Ch. 8)

Suppose we have n data points (x_i, y_i, σ_i). Assume that we know a functional relationship between the points,

    y = f(x; a, b, ...)

and that for each y_i we know x_i exactly. The parameters a, b, ... are constants that we wish to determine from our data. A procedure for obtaining a and b is to minimize the following χ² with respect to a and b:

    χ² = Σ_{i=1}^{n} [y_i − f(x_i; a, b)]² / σ_i²

This is very similar to the Maximum Likelihood Method; for the Gaussian case, MLM and least squares are identical. Technically this quantity follows a χ² distribution only if the y's are from a Gaussian distribution. Since most of the time the y's are not Gaussian, we call the procedure "least squares" rather than χ² fitting (although least squares fitting is sometimes called χ² fitting).

Example: We have a function with one unknown parameter,

    f(x) = 1 + bx

Find b using the least squares technique. We need to minimize

    χ² = Σ_{i=1}^{n} (y_i − 1 − b x_i)² / σ_i²

To find the b that minimizes this function, set the derivative with respect to b to zero:

    dχ²/db = −2 Σ_{i=1}^{n} x_i (y_i − 1 − b x_i) / σ_i² = 0

Solving for b gives

    b = [Σ x_i (y_i − 1) / σ_i²] / [Σ x_i² / σ_i²]
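
A sketch of this one-parameter fit (the data are hypothetical; the closed-form b is the expression just derived, and a numerical minimizer reproduces it):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical data points (x_i, y_i) with errors sigma_i
x     = np.array([1.0, 2.0, 3.0, 4.0])
y     = np.array([2.1, 3.0, 4.2, 5.1])
sigma = np.array([0.2, 0.2, 0.2, 0.2])

# Closed form: b = sum(x (y - 1) / sigma^2) / sum(x^2 / sigma^2)
b_exact = np.sum(x * (y - 1) / sigma**2) / np.sum(x**2 / sigma**2)

# Direct numerical minimization of chi^2(b) gives the same answer
chi2_of_b = lambda b: np.sum((y - 1 - b * x)**2 / sigma**2)
b_numeric = minimize_scalar(chi2_of_b).x

print(b_exact, b_numeric)
```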

Each measured data point (y_i) is allowed to have a different standard deviation (σ_i). The LS technique can be generalized to two or more parameters, and to simple or complicated (e.g. non-linear) functions. One especially nice case is a polynomial function that is linear in the unknowns:

    f(x; a, b, c, ...) = a + bx + cx² + ...

Here we can always recast the problem as solving m simultaneous linear equations, where m is the number of parameters: we use the techniques of linear algebra and invert an m × m matrix (see the sketch below).

Example: Given the following data, perform a least squares fit to find the value of b.

[The table of data points (x, y, σ) was not preserved in the transcript.]

Using the above expression for b, we calculate b = 1.05.

[Figure: the data points and the line y = 1 + 1.05x from the least squares fit.]
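
A sketch of the linear-algebra route for the two-parameter case f(x) = a + bx (hypothetical data). Setting ∂χ²/∂a = ∂χ²/∂b = 0 gives two simultaneous linear "normal equations", solved here with a 2 × 2 matrix:

```python
import numpy as np

# Hypothetical data points with errors
x     = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y     = np.array([2.0, 3.1, 3.9, 5.2, 6.1])
sigma = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

w = 1.0 / sigma**2                  # weights 1/sigma_i^2

# Normal equations for f(x) = a + b x:
#   [ sum(w)    sum(w x)  ] [a]   [ sum(w y)   ]
#   [ sum(w x)  sum(w x^2)] [b] = [ sum(w x y) ]
M = np.array([[np.sum(w),     np.sum(w * x)],
              [np.sum(w * x), np.sum(w * x**2)]])
v = np.array([np.sum(w * y), np.sum(w * x * y)])

a, b = np.linalg.solve(M, v)        # equivalent to inverting M
cov  = np.linalg.inv(M)             # diagonal holds var(a), var(b)
print(a, b, np.sqrt(np.diag(cov)))
```

The inverse matrix is also where the errors on the fitted parameters come from: its diagonal elements are the variances of a and b.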

If we assume that the data points are from a Gaussian distribution, we can calculate a χ² and the probability associated with the fit:

    χ² = Σ_i (y_i − 1 − b x_i)² / σ_i² = 1.04, with 3 degrees of freedom

From Table D of Taylor: the probability to get χ² > 1.04 for 3 degrees of freedom is ≈ 80%. We call this a "good" fit since the probability is close to 100%. If instead the χ² had been large (e.g. 15), the probability would have been small (≈ 0.2% for 3 dof), and we would call it a "bad" fit.

RULE OF THUMB: A "good" fit has χ²/dof ≈ 1.
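
The fit probability is again a survival-function lookup (a sketch):

```python
from scipy.stats import chi2

dof = 3
print(chi2.sf(1.04, dof))   # ~0.79: a "good" fit
print(chi2.sf(15.0, dof))   # ~0.002: a "bad" fit
```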

Some not-so-good things about the χ² distribution and test:

- Given a set of data points, two different functions can have the same value of χ². The method does not specify a unique form of the solution, nor how many parameters we should fit for: we will get a different value for the slope (b) if we fit 1 + bx instead of a + bx.
- The value of χ² does not depend on the order of the data points, so the test ignores trends in the data.
- The test ignores the sign of the differences between the data points and the "true" values, since it uses only the squares of the differences.

There are other distributions/statistical tests that are in some cases more powerful than the χ² test, for example "run tests" and the "Kolmogorov test". These tests do use the order of the points.
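
To illustrate a test that uses the order of the points, here is a sketch of a simple Wald-Wolfowitz runs test on the signs of fit residuals; this implementation is my own illustration under the usual normal approximation, not code from the lecture:

```python
import numpy as np
from scipy.stats import norm

def runs_test(residuals):
    """Runs test on residual signs: a trend produces long runs of
    same-sign residuals, which chi^2 (squares only) cannot see."""
    signs = np.sign(residuals)
    signs = signs[signs != 0]                    # drop exact zeros
    n_pos = np.sum(signs > 0)
    n_neg = np.sum(signs < 0)
    runs  = 1 + np.sum(signs[1:] != signs[:-1])  # 1 + number of sign changes

    n    = n_pos + n_neg
    mean = 2.0 * n_pos * n_neg / n + 1
    var  = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n**2 * (n - 1))
    z    = (runs - mean) / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))                # z-score, two-sided p-value

# All-negative residuals followed by all-positive ones: an obvious trend
# that leaves chi^2 unchanged but fails the runs test.
print(runs_test(np.array([-0.3, -0.2, -0.1, 0.1, 0.2, 0.3, 0.4, 0.5])))
```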