Chapter 7 Statistical Data Treatment and Evaluation


Chapter 7 Statistical Data Treatment and Evaluation

Experimentalists use statistical calculations to sharpen their judgments concerning the quality of experimental measurements. These applications include:

1. Defining a numerical interval around the mean of a set of replicate analytical results within which the population mean can be expected to lie with a certain probability. This interval is called the confidence interval (CI).

2. Determining the number of replicate measurements required to ensure, at a given probability, that an experimental mean falls within a certain confidence interval.
3. Estimating the probability that (a) an experimental mean and a true value or (b) two experimental means are different.
4. Deciding whether what appears to be an outlier in a set of replicate measurements is the result of a gross error or is a legitimate result.
5. Using the least-squares method for constructing calibration curves.

Finding the Confidence Interval when s Is a Good Estimate of σ

CONFIDENCE LIMITS

Confidence limits define a numerical interval around x̄ that contains μ with a certain probability. A confidence interval is the numerical magnitude of the confidence limit. The size of the confidence interval, which is computed from the sample standard deviation, depends on how accurately we know s, that is, how close the sample standard deviation is to the population standard deviation σ.

A general expression for the confidence limits (CL) of a single measurement is

CL = x ± zσ

For the mean of N measurements, the standard error of the mean, σ/√N, is used in place of σ:

CL for μ = x̄ ± zσ/√N
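A minimal sketch in Python of the z-based interval, assuming σ is reliably known from extensive prior data (the measurements, σ, and confidence level below are hypothetical):

```python
import math

# Hypothetical replicate results (e.g., % analyte) and a population
# standard deviation assumed known from extensive prior experience.
data = [3.15, 3.21, 3.18, 3.26, 3.20]
sigma = 0.05          # assumed known population standard deviation
z = 1.96              # z value for the 95% confidence level

N = len(data)
mean = sum(data) / N
half_width = z * sigma / math.sqrt(N)   # z * sigma / sqrt(N)

print(f"95% CI for the mean: {mean:.3f} +/- {half_width:.3f}")
```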

Finding the Confidence Interval when σ Is Unknown

Limitations in time or in the amount of available sample often prevent us from accurately estimating σ. In such cases, a single set of replicate measurements must provide not only a mean but also an estimate of precision. A value of s calculated from a small set of data may be quite uncertain. Thus, confidence limits are necessarily broader when a good estimate of σ is not available.

…continued…

To account for the variability of s, we use the important statistical parameter t, which is defined in the same way as z except that s is substituted for σ:

t = (x − μ)/s

t depends on the desired confidence level, but it also depends on the number of degrees of freedom in the calculation of s. t approaches z as the number of degrees of freedom approaches infinity. The confidence limits for the mean x̄ of N replicate measurements can be calculated from t by the equation

CL for μ = x̄ ± ts/√N
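A minimal sketch of the t-based interval using SciPy's t distribution (the replicate values are hypothetical):

```python
import statistics
from scipy import stats

# Hypothetical replicate measurements
data = [3.15, 3.21, 3.18, 3.26, 3.20]

N = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)        # sample standard deviation
dof = N - 1                       # degrees of freedom

# Two-tailed critical t for the 95% confidence level
t_crit = stats.t.ppf(0.975, dof)
half_width = t_crit * s / N ** 0.5

print(f"95% CI for the mean: {mean:.3f} +/- {half_width:.3f}")
```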

Comparing an Experimental Mean with the True Value

A common way of testing for bias in an analytical method is to use the method to analyze a sample whose composition is accurately known. Bias in an analytical method is illustrated by the two curves shown in Fig. 7-3, which show the frequency distribution of replicate results in the analysis of identical samples by two analytical methods. Method A has no bias, so its population mean μA is the true value xt. Method B has a systematic error, or bias, that is given by

bias = μB − xt = μB − μA

Bias affects all the data in the set in the same way, and it can be either positive or negative.

…continued…

The difference x̄ − xt is compared with the difference that could be caused by random error. If the observed difference is less than that computed for a chosen probability level, the null hypothesis that x̄ and xt are the same cannot be rejected; this says only that whatever systematic error is present is so small that it cannot be distinguished from random error. If x̄ − xt is significantly larger than either the expected or the critical value, we may assume that the difference is real and that the systematic error is significant. The critical value for rejecting the null hypothesis is calculated from

x̄ − xt = ± ts/√N
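A minimal sketch of this test against a known value (the measurements and the accepted value are hypothetical):

```python
import statistics
from scipy import stats

# Hypothetical results from analyzing a certified reference material
data = [3.15, 3.21, 3.18, 3.26, 3.20]
x_true = 3.19                      # accepted (true) value

N = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)
dof = N - 1

t_exp = abs(mean - x_true) / (s / N ** 0.5)   # experimental t value
t_crit = stats.t.ppf(0.975, dof)              # 95% confidence, two-tailed

if t_exp > t_crit:
    print("Significant bias detected at the 95% level.")
else:
    print("No significant difference from the true value.")
```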

Comparing Two Experimental Means

The results of chemical analyses are frequently used to determine whether two materials are identical. The chemist must judge whether a difference in the means of two sets of identical analyses is real and constitutes evidence that the samples are different, or whether the discrepancy is simply a consequence of random errors in the two sets. Let us assume that N1 replicate analyses of material 1 yielded a mean value of x̄1 and that N2 analyses of material 2 obtained by the same method gave a mean of x̄2. If the data were collected in an identical way, it is usually safe to assume that the standard deviations of the two sets of measurements are the same. We invoke the null hypothesis that the samples are identical and that the observed difference in the results, (x̄1 − x̄2), is the result of random errors.

…continued…

The standard deviation of the mean x̄1 is sm1 = s1/√N1, and likewise for x̄2, sm2 = s2/√N2. Thus, the variance s²d of the difference (d = x̄1 − x̄2) between the means is given by

s²d = s²m1 + s²m2

…continued…

By substituting the values of sd, sm1, and sm2 into this equation, we have

s²d = s²1/N1 + s²2/N2

If we then assume that the pooled standard deviation spooled is a good estimate of both s1 and s2, then

s²d = s²pooled/N1 + s²pooled/N2

and

sd = spooled √(1/N1 + 1/N2) = spooled √((N1 + N2)/(N1N2))

…continued…

Substituting into this equation, we find that the test value of t is given by

t = (x̄1 − x̄2) / (spooled √((N1 + N2)/(N1N2)))

We then compare our test value of t with the critical value obtained from the table for the particular confidence level desired. If the absolute value of the test statistic is smaller than the critical value, the null hypothesis is accepted, and no significant difference between the means has been demonstrated. A test value of t greater than the critical value indicates that there is a significant difference between the means.
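A minimal sketch of the pooled two-sample test (both data sets are hypothetical; scipy.stats.ttest_ind with its default equal-variance setting performs the same pooled test directly):

```python
import statistics
from scipy import stats

# Hypothetical replicate analyses of two materials
set1 = [14.2, 14.5, 14.4, 14.3]
set2 = [14.8, 14.9, 14.6, 15.0, 14.7]

N1, N2 = len(set1), len(set2)
m1, m2 = statistics.mean(set1), statistics.mean(set2)
s1, s2 = statistics.stdev(set1), statistics.stdev(set2)

# Pooled standard deviation (assumes equal population variances)
s_pooled = (((N1 - 1) * s1**2 + (N2 - 1) * s2**2) / (N1 + N2 - 2)) ** 0.5

t_exp = abs(m1 - m2) / (s_pooled * ((N1 + N2) / (N1 * N2)) ** 0.5)
t_crit = stats.t.ppf(0.975, N1 + N2 - 2)   # 95% confidence, two-tailed

print(f"t_exp = {t_exp:.2f}, t_crit = {t_crit:.2f}")
print("Means differ significantly." if t_exp > t_crit
      else "No significant difference demonstrated.")
```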

DETECTING GROSS ERRORS A data point that differs excessively from the mean in a data set is termed an outlier. When a set of data contains an outlier, the decision must be made whether to retain or reject it. The choice of criterion for the rejection of a suspected result has its perils. If we set a stringent standard that makes the rejection of a questionable measurement difficult, we run the risk of retaining results that are spurious and have an inordinate effect on the mean of the data. If we set lenient limits on precision and thereby make the rejection of a result easy, we are likely to discard measurements that rightfully belong in the set, thus introducing a bias to the data. No universal rule can be invoked to settle the question of retention or rejection.

Using the Q Test

The Q test is a simple and widely used statistical test. In this test, the absolute value of the difference between the questionable result xq and its nearest neighbor xn is divided by the spread w of the entire set to give the quantity Qexp:

Qexp = |xq − xn| / w

This ratio is then compared with rejection values Qcrit found in the table. If Qexp is greater than Qcrit, the questionable result can be rejected with the indicated degree of confidence.
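A minimal sketch of the Q test (the data are hypothetical, and the Qcrit shown is an assumed value for N = 5 at the 95% confidence level; consult a published Qcrit table for real work):

```python
# Hypothetical replicate results; the largest value looks suspect
data = sorted([55.95, 56.00, 56.04, 56.08, 56.23])

x_q = data[-1]                 # questionable result (largest value)
x_n = data[-2]                 # its nearest neighbor
w = data[-1] - data[0]         # spread of the entire set

Q_exp = abs(x_q - x_n) / w
Q_crit = 0.710                 # assumed Qcrit for N = 5, 95% confidence

if Q_exp > Q_crit:
    print(f"Q_exp = {Q_exp:.3f} > {Q_crit}: reject the outlier.")
else:
    print(f"Q_exp = {Q_exp:.3f} <= {Q_crit}: retain the value.")
```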

How Do We Deal with Outliers?

1. Reexamine carefully all data relating to the outlying result to see if a gross error could have affected its value.
2. If possible, estimate the precision that can be reasonably expected from the procedure to be sure that the outlying result actually is questionable.
3. Repeat the analysis if sufficient sample and time are available.
4. If more data cannot be obtained, apply the Q test to the existing set to see if the doubtful result should be retained or rejected on statistical grounds.
5. If the Q test indicates retention, consider reporting the median of the set rather than the mean, as in the sketch below.
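For the last point, Python's statistics module makes the mean/median comparison immediate (hypothetical data):

```python
import statistics

data = [55.95, 56.00, 56.04, 56.08, 56.23]
print("mean  :", round(statistics.mean(data), 3))
print("median:", statistics.median(data))   # less sensitive to the extreme value
```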

ANALYZING TWO-DIMENSIONAL DATA: THE LEAST-SQUARES METHOD

Many analytical methods are based on a calibration curve in which a measured quantity y is plotted as a function of the known concentration x of a series of standards. A typical calibration curve is shown in Fig. 8-9. The ordinate is the dependent variable, and the abscissa is the independent variable. As is typical (and desirable), the plot approximates a straight line. However, because of the indeterminate errors in the measurement process, not all the data fall exactly on the line. Thus, the investigator must try to draw the "best" straight line among the points. A statistical technique called regression analysis provides the means for objectively obtaining such a line and also for specifying the uncertainties associated with its subsequent use.

Assumptions of the Least-Squares Method

When the method of least squares is used to generate a calibration curve, two assumptions are required. The first is that there is actually a linear relationship between the measured variable (y) and the analyte concentration (x). The mathematical relationship that describes this assumption is called the regression model, which may be represented as

y = mx + b

where b is the y intercept (the value of y when x is zero) and m is the slope of the line. We also assume that any deviation of individual points from the straight line results from error in the measurement; that is, we assume there is no error in the x values of the points, because the exact concentrations of the standards are known. Both of these assumptions are appropriate for many analytical methods.

Computing the Regression Coefficients and Finding the Least-Squares Line The vertical deviation of each point from the straight line is called a residual. The line generated by the least-squares method is the one that minimizes the sum of the squares of the residuals for all the points. In addition to providing the best fit between the experimental points and the straight line, the method gives the standard deviations for m and b.

…continued…

We define three quantities, Sxx, Syy, and Sxy, as follows:

Sxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/N
Syy = Σ(yi − ȳ)² = Σyi² − (Σyi)²/N
Sxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − (Σxi)(Σyi)/N

where xi and yi are individual pairs of data for x and y, N is the number of pairs of data used in preparing the calibration curve, and x̄ and ȳ are the average values for the variables; that is,

x̄ = Σxi/N and ȳ = Σyi/N
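From these sums, the least-squares slope and intercept follow as m = Sxy/Sxx and b = ȳ − m·x̄ (the standard results, stated here for completeness). A minimal sketch with hypothetical calibration data:

```python
# Hypothetical calibration standards: concentration (x) vs. signal (y)
x = [0.0, 2.0, 4.0, 6.0, 8.0]
y = [0.04, 0.41, 0.80, 1.22, 1.58]

N = len(x)
x_bar = sum(x) / N
y_bar = sum(y) / N

# Sxx and Sxy via the computational (sum-of-squares) forms above
S_xx = sum(xi**2 for xi in x) - sum(x)**2 / N
S_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / N

m = S_xy / S_xx            # least-squares slope
b = y_bar - m * x_bar      # least-squares intercept

print(f"Calibration line: y = {m:.4f} x + {b:.4f}")
```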