Announcements
Today’s Lecture
–Chapter 4 Material
   Probability within Limits
   Confidence Intervals
   Statistical Tests
Chapter 4 – Gaussian Distributions
Now for a “real” limit problem example:
A man wants to get life insurance. If his measured cholesterol level is over 240 mg/dL (2,400 mg/L), his premium will be 25% higher. His level is measured and found to be 249 mg/dL. His uncle, a biochemist who developed the test, tells him that a typical standard deviation for the measurement is 25 mg/dL.
What is the chance that a second measurement (with no crash diet or extra exercise) will give a value under 240 mg/dL (i.e., beat the test)?
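Graphical view of example
[Figure: Gaussian curve of measured values with 249 and 240 mg/dL marked on the x-axis; regions labeled "Desired area", "Table area", and "Equivalent area"]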
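The cholesterol question can be checked numerically. This is a minimal sketch, assuming the single measured value (249 mg/dL) is the best estimate of the true level and that repeat measurements follow a Gaussian with the quoted standard deviation. The variable names are illustrative, and `statistics.NormalDist` stands in for the Z table.

```python
from statistics import NormalDist

measured = 249.0  # mg/dL, taken here as the center of the distribution (assumption)
sigma = 25.0      # mg/dL, standard deviation of the measurement
limit = 240.0     # mg/dL, threshold for the higher premium

# Number of standard deviations between the limit and the center
z = (limit - measured) / sigma

# Area under the standard Gaussian below z = chance of a result under the limit
p_below = NormalDist().cdf(z)

print(f"z = {z:.2f}, chance of a result under {limit:.0f} mg/dL = {p_below:.2f}")
```

By symmetry this is the same area that a Z table gives as 0.5 minus the tabulated area between 0 and |z|.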
Chapter 4 – Calculation of Confidence Interval
1. Confidence Interval = x ± uncertainty
2. Calculation of uncertainty depends on whether σ is “well known”
3. When σ is not well known (covered later)
4. When σ is well known (not in text)
   Value ± uncertainty = mean ± Zσ/√n
   Z depends on area or desired probability
   At Area = 0.45 (90% both sides), Z = 1.65
   At Area = 0.475 (95% both sides), Z = 1.96 => larger confidence interval
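The Z values quoted on this slide can be reproduced without a table. This is a small sketch using `statistics.NormalDist`; the helper name `z_for_confidence` is illustrative and not from the lecture.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided Z for a confidence level given as a fraction (0.95 for 95%)."""
    # The one-sided table area is confidence/2, so the cumulative area is 0.5 + confidence/2
    return NormalDist().inv_cdf(0.5 + confidence / 2)

z_90 = z_for_confidence(0.90)  # table area 0.45
z_95 = z_for_confidence(0.95)  # table area 0.475

print(f"Z(90%) = {z_90:.3f}, Z(95%) = {z_95:.3f}")
```

The 90% value comes out slightly below the 1.65 on the slide, which is a rounded table value.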
Chapter 4 – Calculation of Uncertainty
Example: The concentration of NO₃⁻ in a sample is measured twice, giving 18.6 and 19.0 ppm. The method is known to have a constant relative standard deviation of 2.0% (from past work). Determine the concentration and the 95% confidence interval.
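One way to work this example is sketched below. It assumes σ is taken as 2.0% of the mean and that the Z-based form mean ± Zσ/√n applies with Z = 1.96 for 95%. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean

values = [18.6, 19.0]  # ppm, the two measurements
rsd = 0.020            # relative standard deviation known from past work
z_95 = 1.96            # Z for 95% confidence (both sides)

x_bar = mean(values)                              # best estimate of the concentration
sigma = rsd * x_bar                               # absolute standard deviation, ppm (assumption: 2.0% of the mean)
half_width = z_95 * sigma / sqrt(len(values))     # Z-based uncertainty

print(f"NO3- = {x_bar:.1f} +/- {half_width:.1f} ppm (95% CI)")
```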
Chapter 4 – Calculation of Confidence Interval with σ Not Known
Value ± uncertainty = mean ± tS/√n (S = standard deviation of the n measurements)
t = Student’s t value
t depends on:
- the number of samples (more samples => smaller t)
- the probability of including the true value (larger probability => larger t)
Chapter 4 – Calculation of Uncertainties Example
Measurement of lead in a drinking water sample:
–values = 12.3, 9.8, 11.4, and 13.0 ppb
What is the 95% confidence interval?
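The lead example can be worked as follows. This is a sketch assuming the t-based form mean ± tS/√n. The t value of 3.182 (95%, 3 degrees of freedom) is hard-coded from a standard Student's t table and is not taken from the slide.

```python
from math import sqrt
from statistics import mean, stdev

values = [12.3, 9.8, 11.4, 13.0]  # ppb Pb
t_95_df3 = 3.182                  # standard table value for 95% and n - 1 = 3 (assumption)

n = len(values)
x_bar = mean(values)              # sample mean
s = stdev(values)                 # sample standard deviation (n - 1 in the denominator)
half_width = t_95_df3 * s / sqrt(n)

print(f"Pb = {x_bar:.1f} +/- {half_width:.1f} ppb (95% CI)")
```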
Chapter 4 – Ways to Reduce Uncertainty
1. Decrease the standard deviation in measurements (usually requires more skill in analysis or better equipment)
2. Analyze each sample more times (this increases n and decreases t)
3. Understand variability better (so that σ is known and Z-based uncertainty can be used)
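To see how strongly repeat analyses help, the sketch below tabulates the 95% uncertainty in units of S, that is t/√n, for n = 2 to 6. The t values are hard-coded from a standard two-sided 95% Student's t table, and the name `relative_half_width` is illustrative.

```python
from math import sqrt

# Two-sided 95% Student's t values from a standard table, keyed by degrees of freedom (n - 1)
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def relative_half_width(n):
    """95% uncertainty divided by S for n replicate measurements: t / sqrt(n)."""
    return T_95[n - 1] / sqrt(n)

widths = {n: relative_half_width(n) for n in range(2, 7)}
for n, w in widths.items():
    print(f"n = {n}: uncertainty = {w:.2f} x S")
```

The largest gain is from n = 2 to n = 3, because both t and 1/√n drop at the same time. With σ well known, the factor would instead be 1.96/√n for every n.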
Overview of Statistical Tests
t-Tests: Determine if a systematic error exists in a method or between methods, or if a difference exists between sample sets
F-Test: Determine if there is a significant difference in standard deviations between two methods or sample sets (which method is more precise/which set is more variable)
Grubbs Test: Determine if a data point can be excluded on a statistical basis
Statistical Tests Possible Outcomes
Outcome #1 – There is a statistically significant result (e.g. a systematic error)
–this is at some probability (e.g. 95%)
–can occasionally be wrong (up to 5% of the time if the test is only barely significant at 95% confidence)
Outcome #2 – No significant result can be detected
–this doesn’t mean there is no systematic error
–it does mean that the systematic error, if it exists, is not detectable (e.g. not observable due to larger random errors)
–it is not possible to prove a null hypothesis beyond any doubt
Statistical Tests Example from Research This Week
Goal of Work: be able to consistently use a high resolution mass spectrometer to measure mass with an error of less than 5 ppm (limit set for publication in several journals)
Measurement is challenging and could be subject to poor data treatment (e.g. selection of “good” vs. “bad” data)
Does any measurement within the 5 ppm limit meet the requirement? No. We couldn’t just pick 1 out of 4 repeated measurements that meets the standard.
We want to be 95+% certain that the true measured value is within the 5 ppm limit
So we need to use statistics to set rules for meeting the limit
In this case (different from tests in this class), the measured value is acceptable if the furthest 90% limit is within the 5 ppm limit and the closest 95% limit is within the 5 ppm limit
Example compound: expected mass = 809.4587 amu
To meet 5 ppm limit, meas. mass = 809.4547 to 809.4628
Measured Mass = 809.4569 amu
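The ppm arithmetic in this example can be sketched as below. It covers only the acceptance window and the error of the mean measured mass. The 90% and 95% limits need the standard deviation and number of measurements, which are not given here. Variable names are illustrative.

```python
expected = 809.4587  # amu, expected mass
measured = 809.4569  # amu, mean measured mass
limit_ppm = 5.0      # allowed mass error, parts per million

# Convert the ppm limit into an absolute mass window around the expected mass
window = expected * limit_ppm * 1e-6
low, high = expected - window, expected + window

# Mass error of the measured value, in ppm
error_ppm = (measured - expected) / expected * 1e6

print(f"allowed range: {low:.4f} to {high:.4f} amu")
print(f"measured error: {error_ppm:.1f} ppm")
```

This reproduces the quoted range to within rounding in the last digit. The mean alone sits about 2 ppm low, so whether the result is acceptable depends on how wide its confidence limits are.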
Statistical Tests Example from Research This Week
Graphical Explanation of Mass Measurement
–multiple mass measurements made, giving: mean value +/- 90% and 95% CIs
–not only the mean but also the 90%/95% limits need to be within the limit
–in example, >5% chance of error
[Figure: expected distribution (based on SD) around the mean measured mass, with the expected mass (from mass of each atom) and its + and – 5 ppm limits marked; the 90% high limit is out of range]
Statistical Tests t Tests
Case 1
–used to determine if there is a significant bias by measuring a test standard and determining if there is a significant difference between the known and measured concentration
Case 2
–used to determine if there is a significant difference between two methods (or samples) by measuring one sample multiple times by each method (or each sample multiple times)
Case 3
–used to determine if there is a significant difference between two methods (or sample sets) by measuring multiple samples once by each method (or each sample in each set once)
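For reference, the test statistics usually associated with these three cases are written out below. These are the standard textbook forms, offered as an assumption about what the lecture uses, since the equations themselves do not appear in the slide text. Here \mu is the known value, s is a sample standard deviation, and d_i are the paired differences.

```latex
% Case 1: measured mean vs. known value (n - 1 degrees of freedom)
t_{\mathrm{calc}} = \frac{|\bar{x} - \mu|}{s}\,\sqrt{n}

% Case 2: two sets of replicate measurements (n_1 + n_2 - 2 degrees of freedom)
s_{\mathrm{pooled}} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}},
\qquad
t_{\mathrm{calc}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{\mathrm{pooled}}}\,\sqrt{\frac{n_1 n_2}{n_1 + n_2}}

% Case 3: paired differences d_i (n - 1 degrees of freedom)
s_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}},
\qquad
t_{\mathrm{calc}} = \frac{|\bar{d}|}{s_d}\,\sqrt{n}
```

In each case the difference is judged significant when t_calc exceeds the tabulated t for the chosen confidence level and degrees of freedom.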
Case 1 t test
Methylmannopyranoside (MMP) example
–Added as an internal standard at 5 ppm
–Analysis will tell if the sample causes a bias compared to the standard
Case 2 t test Example
A winemaker found a barrel of wine that was labeled as a merlot, but was suspected of being part of a chardonnay wine batch and was obviously mislabeled. To see if it was part of the chardonnay batch, the mislabeled barrel wine and the chardonnay batch were analyzed for alcohol content. The results were as follows:
–Mislabeled wine: n = 6, mean = 12.61%, S = 0.52%
–Chardonnay wine: n = 4, mean = 12.53%, S = 0.48%
Determine if there is a statistically significant difference in the ethanol content.
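The wine comparison can be sketched as follows, assuming the two standard deviations are similar enough to pool (the usual Case 2 form). The t Table value of 2.306 (95%, 8 degrees of freedom) is hard-coded from a standard Student's t table and is not given on the slide.

```python
from math import sqrt

n1, mean1, s1 = 6, 12.61, 0.52  # mislabeled wine (% alcohol)
n2, mean2, s2 = 4, 12.53, 0.48  # chardonnay batch (% alcohol)
t_table = 2.306                 # standard table value, 95%, n1 + n2 - 2 = 8 (assumption)

# Pooled standard deviation, weighting each variance by its degrees of freedom
s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Case 2 t statistic for two sets of replicate measurements
t_calc = abs(mean1 - mean2) / s_pooled * sqrt(n1 * n2 / (n1 + n2))

significant = t_calc > t_table
print(f"s_pooled = {s_pooled:.3f}, t_calc = {t_calc:.2f}, significant: {significant}")
```

Here t Calc is far below t Table, so no significant difference in ethanol content is detected. That is consistent with the barrel belonging to the chardonnay batch but does not prove it.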
Case 3 t Test Example
Case 3 t Test used when multiple samples are analyzed by two different methods (only once by each method)
Useful for establishing if there is a constant systematic error
Example: Cl⁻ in Ohio rainwater measured by Dixon and PNL (14 samples)
Case 3 t Test Example – Data Set and Calculations
Conc. of Cl⁻ in Rainwater (Units = µM)
Sample #   Dixon Cl⁻   PNL Cl⁻   Difference
 1           9.9        17.0       7.1
 2           2.3        11.0       8.7
 3          23.8        28.0       4.2
 4           8.0        13.0       5.0
 5           1.7         7.9       6.2
 6           2.3        11.0       8.7
 7           1.9         9.9       8.0
 8           4.2        11.0       6.8
 9           3.2        13.0       9.8
10           3.9        10.0       6.1
11           2.7         9.7       7.0
12           3.8         8.2       4.4
13           2.4        10.0       7.6
14           2.2        11.0       8.8
Calculations
Step 1 – Calculate difference (PNL − Dixon) for each sample
Step 2 – Calculate mean and standard deviation of the differences
   ave d = (7.1 + 8.7 + ...)/14
   ave d = 7.49
   S d = 2.44
Step 3 – Calculate t value: t Calc = 11.5
Case 3 t Test Example – Rest of Calculations
Step 4 – Look up t Table
–(t(95%, 13 degrees of freedom) = 2.17)
Step 5 – Compare t Calc with t Table, draw conclusion
–t Calc >> t Table so difference is significant
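Steps 1 through 5 can be sketched in a few lines. The data are the 14 Dixon/PNL pairs as they appear in this transcript, and t Table = 2.17 is the value quoted on the slide. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Cl- concentrations (uM) for the 14 rainwater samples, as transcribed
dixon = [9.9, 2.3, 23.8, 8.0, 1.7, 2.3, 1.9, 4.2, 3.2, 3.9, 2.7, 3.8, 2.4, 2.2]
pnl = [17.0, 11.0, 28.0, 13.0, 7.9, 11.0, 9.9, 11.0, 13.0, 10.0, 9.7, 8.2, 10.0, 11.0]
t_table = 2.17  # 95%, 13 degrees of freedom, as quoted on the slide

# Step 1: difference for each sample (PNL - Dixon)
diffs = [p - d for d, p in zip(dixon, pnl)]

# Step 2: mean and standard deviation of the differences
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)

# Step 3: paired (Case 3) t statistic
t_calc = abs(d_bar) / s_d * sqrt(n)

# Steps 4 and 5: compare with the table value
print(f"ave d = {d_bar:.2f}, S d = {s_d:.2f}, t_calc = {t_calc:.1f}, significant: {t_calc > t_table}")
```

Run on the values as transcribed, this gives a mean difference near 7.0 and S d near 1.7, not the 7.49 and 2.44 quoted on the slide, so some of the tabulated numbers may differ from the original data set. The conclusion is the same either way: t Calc is far above t Table.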
t-Tests
Note: These (Case 2 and Case 3) can be applied to two different scenarios:
–samples (e.g. sample A and sample B, do they have the same % Ca?)
–methods (analysis method A vs. analysis method B)
F-Test
Similar methodology to the t tests, but it compares standard deviations between two methods to determine if there is a statistical difference in precision between the two methods (or in variability between two sample sets)
As with t tests, if F Calc > F Table, the difference is statistically significant
F Calc = S1²/S2², where S1 > S2
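As an illustration only, the F test can be applied to the standard deviations from the wine example (0.52% with n = 6 and 0.48% with n = 4). The lecture does not work this case. The F Table value of 9.01 is the upper 5% point for 5 and 3 degrees of freedom from a standard F table, and the table used in the course may differ.

```python
# Standard deviations and sample sizes reused from the wine example (illustration only)
s_a, n_a = 0.52, 6
s_b, n_b = 0.48, 4
f_table = 9.01  # standard table value for 5 (numerator) and 3 (denominator) degrees of freedom (assumption)

# Put the larger standard deviation in the numerator so that F >= 1
(s_1, n_1), (s_2, n_2) = sorted([(s_a, n_a), (s_b, n_b)], reverse=True)
f_calc = s_1**2 / s_2**2

significant = f_calc > f_table
print(f"F_calc = {f_calc:.2f} (df = {n_1 - 1}, {n_2 - 1}), significant: {significant}")
```

F Calc is well below F Table here, so no difference in variability between the two sets is detected.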