ME 411/511: General Rules for Dealing with Outlier Data

General Rules for Dealing with Outlier Data
Rule 1: Do NOT discard data just because "they look bad".
Rule 2: Apply a consistent rule and document it.
Rule 3: Be cautious about discarding ANY data.

ME 411/511, Prof. Sailor

Outlier Data Detection: one approach
Calculate the probability that a single point would fall in the suspect range. Multiply this probability by the number of measurements in the sample to determine the expected number of measurements in this range. If this number is less than 0.1, then the point is an outlier.

Outlier Data Detection
[Figure labels: suspected outlier, mean]
Keep in mind that if our sample is large enough we DO expect some points out beyond 3 sigma. So it is not just how far out a point appears that matters, but rather the probability (for the given sample size) that at least one point would be that far out.
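
To make the sample-size point concrete, here is a minimal sketch that assumes normally distributed data and counts how many points we would expect at least 3 sigma above the mean for several sample sizes. The function names `upper_tail` and `expected_beyond` are illustrative, not from the course material.

```python
import math

def upper_tail(z):
    """One-sided standard normal tail probability P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def expected_beyond(z, n):
    """Expected number of points, out of n, lying at least z sigma above the mean."""
    return n * upper_tail(z)

# Same distance from the mean, very different expectations as n grows.
for n in (10, 100, 1000, 10000):
    print(n, round(expected_beyond(3.0, n), 3))
```

For 10 measurements the expected count is far below 0.1, so a 3-sigma point would be flagged. For 1000 measurements we expect more than one such point, so the same distance from the mean is unremarkable.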

Outlier Example 1
Consider the case of 12 replicate measurements.
X = 0.45, 0.46, 0.46, 0.47, 0.47, 0.47, 0.47, 0.48, 0.48, 0.50, 0.53, and 0.58
Question: Are any of these data outliers?
– By definition you suspect points at either end of the spectrum of values ... perhaps 0.45
– ...more likely 0.58
– ...or possibly both 0.53 and 0.58...
– ...but how do we decide?

Outlier Example 1
Consider the case of 12 replicate measurements.
X = 0.45, 0.46, 0.46, 0.47, 0.47, 0.47, 0.47, 0.48, 0.48, 0.50, 0.53, and 0.58
With N = 12, compute the sample mean and standard deviation, find P(x >= 0.58) from Table 4.3 (one-sided integral), and multiply by N. The resulting N*P is less than 0.1. Thus, 0.58 IS an OUTLIER!
In general you would test other points WITHOUT recalculating statistics. No other points are outliers. We would then recalculate the statistics for presentation of results.
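
The steps of this example can be sketched in a few lines. This is a hedged illustration, not the course's own code: it assumes a normal distribution, uses the sample (n - 1) standard deviation, and replaces the lookup in Table 4.3 with the complementary error function. The helper name `is_outlier` is hypothetical.

```python
import math
import statistics

def is_outlier(data, suspect, threshold=0.1):
    """Apply the N*P rule to one suspect point.

    Returns (expected_count, flag), where expected_count = N * P and
    P is the one-sided normal tail probability beyond the suspect point.
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)                 # sample standard deviation
    z = abs(suspect - mean) / s                # distance from mean in sigmas
    p = 0.5 * math.erfc(z / math.sqrt(2.0))    # one-sided tail probability
    expected = n * p
    return expected, expected < threshold

x = [0.45, 0.46, 0.46, 0.47, 0.47, 0.47, 0.47, 0.48, 0.48, 0.50, 0.53, 0.58]

# Test every suspect point against the SAME statistics (no recalculation).
for point in (0.58, 0.53, 0.45):
    expected, flag = is_outlier(x, point)
    print(point, round(expected, 3), flag)
```

Only 0.58 falls below the 0.1 threshold. The statistics are computed once from the full sample, matching the advice to test other points without recalculating.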

Outlier Example 2
Consider Example 4.11 from the text.
X = 28, 31, 27, 28, 29, 24, 29, 28, 18, 27
Mean = 26.9, N = 10. Compute the sample standard deviation, find P(x <= 18) from Table 4.3 (one-sided integral), and multiply by N. The resulting N*P is less than 0.1. Thus, 18 IS an OUTLIER!
(The book gets the same end result, but is casual with its roundoff and has different intermediate numbers!)
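
For a suspect point on the low side, the same check can be written with the standard library's `statistics.NormalDist`, which stands in for the one-sided table lookup here. This is a sketch under the assumption that the readings are normally distributed. The variable names are illustrative.

```python
from statistics import NormalDist, mean, stdev

x = [28, 31, 27, 28, 29, 24, 29, 28, 18, 27]
n = len(x)
x_bar = mean(x)
s = stdev(x)  # sample standard deviation (n - 1 in the denominator)

# Probability that a single reading falls at or below the suspect value 18.
p_low = NormalDist(mu=x_bar, sigma=s).cdf(18)
expected = n * p_low

print(f"mean={x_bar:.1f}, s={s:.2f}, P={p_low:.4f}, N*P={expected:.3f}")
print("outlier" if expected < 0.1 else "keep")
```

Because the probability comes from the exact normal CDF rather than a rounded table entry, intermediate values may differ slightly from hand calculations even when the conclusion is the same.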

More on Outlier Analysis
Chauvenet's criterion is also often used for outlier detection. It is similar to the approach just presented, but with a critical number P*N of 0.5 rather than 0.1.
Peirce's criterion is more rigorous than Chauvenet's criterion and useful for multiple suspect points.
For further options and details see various statistics texts such as:
– Taylor, John R. An Introduction to Error Analysis. 2nd edition. Sausalito, California: University Science Books, 1997.

Definition of Uncertainty (Ch. 5 in Figliola and Beasley)
In most experiments, the "correct value" is not known. Rather, we are attempting to measure a quantity with less than perfect instrumentation. The uncertainty is an estimate of the likely error. As a rule of thumb, use a 95% confidence interval. In other words, if I state that I have measured the height of my desk to be 38 +/- 1 inch, I am suggesting that I am 95% sure that the desk is between 37 and 39 inches tall.

Uncertainty ...
The producer of a particular alloy claims a modulus of elasticity of 40 kPa +/- 2 kPa. What does this mean?
Answer: The general rule of thumb is that the +/- 2 kPa would represent a 95% confidence interval. That is, if you randomly select many samples of this manufacturer's alloy you should find that 95% of the samples meet the stated limit of 40 +/- 2 kPa. This does not mean that you couldn't get a sample that has a modulus of elasticity of 43 kPa; it just means that it is very unlikely.

Uncertainty
Uncertainty vs. Error
Design Stage Uncertainty
– Zero-order uncertainty: Uo = ½ resolution
– Instrument uncertainty: Uc. Can be the combination (root sum squares) of individual error components (e.g., linearity & hysteresis)
– Design stage uncertainty is the combination of Uo and Uc: Ud = sqrt(Uo^2 + Uc^2)
Propagation of Uncertainty
– Euclidean Norm approach (similar to RSS)
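
As a rough illustration of the root-sum-squares combination, the sketch below builds a design-stage uncertainty from a zero-order term and two instrument error components. All numeric values (resolution, linearity and hysteresis errors) are hypothetical and chosen only to show the arithmetic. The helper name `rss` is an assumption as well.

```python
import math

def rss(*components):
    """Root sum of squares of the given uncertainty components."""
    return math.sqrt(sum(c * c for c in components))

# Hypothetical instrument, arbitrary but consistent units.
resolution = 0.1
linearity_error = 0.2
hysteresis_error = 0.1

u_0 = 0.5 * resolution                        # zero-order uncertainty
u_c = rss(linearity_error, hysteresis_error)  # instrument uncertainty
u_d = rss(u_0, u_c)                           # design-stage uncertainty

print(f"u_0={u_0:.3f}, u_c={u_c:.3f}, u_d={u_d:.3f}")
```

Because the terms add in quadrature, the largest component dominates: here the linearity error contributes most of the design-stage value.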

Calculation Uncertainty and the Euclidean Norm
– In most experiments, several quantities are measured in order to calculate a desired quantity. For instance, if one wanted to estimate the gravitational acceleration g by dropping a ball from a known height L and timing the fall t, the correct equation would be: g = 2L/t^2

Gravity Example and Propagating Uncertainties
– Suppose we measure L = 50.00 m and t = 3.1 sec.
– How do we estimate the uncertainty in our calculation of g?
– Suppose the uncertainties in the measurements are +/- 0.01 m and +/- 0.5 sec.
– Based on the equation we have g = 2(50.00)/[(3.1)(3.1)], or g = 10.4 m/s^2.

Worst Case Uncertainties
One way of looking at the uncertainty is to immediately calculate the "worst cases".
– g = (2)(50.01)/[(2.6)(2.6)] = 14.8 m/s^2
– g = (2)(49.99)/[(3.6)(3.6)] = 7.7 m/s^2
These would yield a confidence interval around g of: 7.7 <= g <= 14.8 m/s^2
This is generally an OVERESTIMATION of uncertainty, and NOT a very good approach.
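
The worst-case bounds are simple to reproduce. The sketch below assumes the values used in the two worst-case lines (L = 50.00 +/- 0.01 m, t = 3.1 +/- 0.5 s). The function name `g_from_drop` is illustrative.

```python
def g_from_drop(L, t):
    """Gravitational acceleration from drop height L (m) and fall time t (s)."""
    return 2.0 * L / t**2

L, u_L = 50.00, 0.01   # height and its uncertainty, m
t, u_t = 3.1, 0.5      # fall time and its uncertainty, s

g_nominal = g_from_drop(L, t)
# Largest g: longest height with shortest time; smallest g: the opposite.
g_high = g_from_drop(L + u_L, t - u_t)
g_low = g_from_drop(L - u_L, t + u_t)

print(f"nominal={g_nominal:.1f}, low={g_low:.1f}, high={g_high:.1f} m/s^2")
```

Note that the interval is not symmetric about the nominal value, which is one more reason the worst-case approach is awkward to report as a single +/- figure.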

Need for a Norm
It is unlikely for all individual measurement uncertainties in a system to simultaneously be the worst possible. So, the "worst case" approach is NOT a good one. Some average or "norm" of the uncertainties must be used in estimating a combined uncertainty for the calculation of g. The norm that we use is called the Euclidean Norm.

Euclidean Norm Defined
In general, if the quantity Y is determined by an equation involving n independent variables Xi: Y = f(X1, X2, X3, ..., Xn), and the uncertainty in each independent measurement variable Xi is called Ui, then the uncertainty in Y is given by:
UY = sqrt( [(dY/dX1)*U1]^2 + [(dY/dX2)*U2]^2 + ... + [(dY/dXn)*Un]^2 )
where each dY/dXi is the partial derivative of Y with respect to Xi.
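
When the partial derivatives are tedious to derive by hand, the Euclidean norm can be approximated numerically. The following is a minimal sketch that estimates each partial derivative by a central difference. The function name `propagate`, the step size, and the product example used to check it are all assumptions made for illustration.

```python
import math

def propagate(f, values, uncertainties, rel_step=1e-6):
    """Estimate U_Y = sqrt(sum((dY/dX_i * U_i)^2)) for Y = f(X_1, ..., X_n).

    Partial derivatives are approximated with central differences, so this
    is only as good as the local linearity of f around the measured values.
    """
    total = 0.0
    for i, (x, u) in enumerate(zip(values, uncertainties)):
        h = rel_step * (abs(x) if x != 0 else 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = x + h
        lower[i] = x - h
        dfdx = (f(*upper) - f(*lower)) / (2.0 * h)
        total += (dfdx * u) ** 2
    return math.sqrt(total)

# Hypothetical check: Y = X1 * X2 with X1 = 2 +/- 0.1 and X2 = 3 +/- 0.2.
# By hand: sqrt((3 * 0.1)^2 + (2 * 0.2)^2) = 0.5
u_y = propagate(lambda a, b: a * b, [2.0, 3.0], [0.1, 0.2])
print(round(u_y, 6))
```

An analytic derivative is preferable when it is easy to write down; the numerical version is mainly useful as a cross-check.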

Propagation of Uncertainty
In many instances we will simply use the design-stage uncertainty Ud,i for each (of n) measurement to assess uncertainty in calculated variables:
UY = sqrt( [(dY/dX1)*Ud,1]^2 + [(dY/dX2)*Ud,2]^2 + ... + [(dY/dXn)*Ud,n]^2 )

Euclidean Norm Applied to Our Example
For g = 2L/t^2 the partial derivatives are dg/dL = 2/t^2 and dg/dt = -4L/t^3, so
Ug = sqrt( [(2/t^2)*UL]^2 + [(-4L/t^3)*Ut]^2 )
So g = 10 +/- 3 m/s^2. This is an example of a bad experiment. A much better in-home experiment for estimating g is to use the physics behind an ideal pendulum.
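
The +/- 3 m/s^2 result can be checked with a short calculation. This sketch assumes the measurement values used in the worst-case slide (L = 50.00 +/- 0.01 m, t = 3.1 +/- 0.5 s) and analytic partial derivatives of g = 2L/t^2. Variable names are illustrative.

```python
import math

L, u_L = 50.00, 0.01   # drop height and its uncertainty, m
t, u_t = 3.1, 0.5      # fall time and its uncertainty, s

g = 2.0 * L / t**2

# Analytic partial derivatives of g = 2L/t^2
dg_dL = 2.0 / t**2
dg_dt = -4.0 * L / t**3

term_L = dg_dL * u_L   # contribution of the height measurement
term_t = dg_dt * u_t   # contribution of the timing measurement
u_g = math.sqrt(term_L**2 + term_t**2)

print(f"g = {g:.1f} +/- {u_g:.1f} m/s^2")
print(f"height term = {abs(term_L):.4f}, timing term = {abs(term_t):.2f}")
```

The timing term is larger than the height term by roughly three orders of magnitude, which suggests the poor result comes almost entirely from the +/- 0.5 s timing uncertainty.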

Euclidean Norm Example 1
– Example: Suppose Y = A*X^4, where A is some known constant and X is a measured quantity (X = 300 K +/- 10%). What is Y and the uncertainty in Y?
– Answer: First note that we could just as easily have specified X = 300 K +/- 30 K. The estimate for Y is given by Y = A*(300 K)^4 = A*8.1e9 K^4.
– For the Euclidean norm we need to calculate one partial derivative: dY/dX.
– dY/dX = 4*A*X^3.
– The uncertainty in Y then is UY = sqrt( [4*A*X^3*30 K]^2 )
– or UY = sqrt( [4*A*(300 K)^3*30 K]^2 )
– so UY = sqrt( 1.050e19*A^2 ) = A*3.24e9 K^4
– Thus, Y = 8.1e9*A +/- 3.2e9*A, or Y = 8.1e9*A +/- 40% (units here are in K^4)
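
The arithmetic in this example is easy to verify. In the sketch below the constant A is set to 1.0 purely as a placeholder, since the example leaves it unspecified. The function name `power_law_uncertainty` is likewise illustrative.

```python
import math

def power_law_uncertainty(A, X, u_X, n=4):
    """Return (Y, U_Y) for Y = A * X**n with a single measured variable X."""
    Y = A * X**n
    dY_dX = n * A * X**(n - 1)
    u_Y = math.sqrt((dY_dX * u_X) ** 2)   # Euclidean norm with one term
    return Y, u_Y

# A = 1.0 is a placeholder; X = 300 K +/- 30 K as in the example.
Y, u_Y = power_law_uncertainty(A=1.0, X=300.0, u_X=30.0)
print(f"Y = {Y:.2e}, U_Y = {u_Y:.2e}, relative = {u_Y / Y:.0%}")
```

The relative uncertainty in Y is n times the relative uncertainty in X (4 x 10% = 40%), which is a quick sanity check for any single-variable power law.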