Regression and Calibration
EPP 245 Statistical Analysis of Laboratory Data
October 18, 2007

Quantitative Prediction

Regression analysis is the statistical name for the prediction of one quantitative variable (e.g., fasting blood glucose level) from another (e.g., body mass index). Items of interest include whether there is in fact a relationship, and what the expected change in one variable is when the other changes.

Assumptions

Inference about whether there is a real relationship or not depends on a number of assumptions, many of which can be checked. When these assumptions are substantially incorrect, alterations in method can rescue the analysis. No assumption is ever exactly correct.

Linearity

This is the most important assumption. If x is the predictor and y is the response, then we assume that the average response for a given value of x is a linear function of x:

E(y) = a + bx
y = a + bx + ε

where ε is the error or variability.
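
As an illustration, here is a minimal sketch (not from the original slides; the values of a and b, the sample size, and the error SD are made up) of simulating this model and fitting it in Stata:

* Hypothetical sketch: simulate y = a + b*x + error and fit by least squares
clear
set obs 50
set seed 245
generate x = 10*runiform()                // predictor
generate y = 3 + 1.5*x + rnormal(0, 2)    // a = 3, b = 1.5, error SD = 2 (made up)
regress y x                               // _cons estimates a; the x coefficient estimates b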


In general, it is important to get the model right, and the most important of these issues is that the mean function matches what is specified. If a linear function does not fit, various types of curves can be used, but whatever is used should fit the data; otherwise predictions are biased.

Independence

It is assumed that different observations are statistically independent. If this is not the case, inference and prediction can be completely wrong: there may appear to be a relationship even though there is not. Randomization and control prevent this in general.


Note that there is no relationship between x and y. These data were generated as follows:
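
A sketch of one common way to produce such data, assuming the demonstration used two independent random walks (an assumption, not taken from the slides):

* Hypothetical sketch: x and y are independent random walks, so the observations
* are serially dependent and any apparent relationship between x and y is spurious
clear
set obs 100
set seed 2007
generate x = sum(rnormal())     // running sum of independent normal draws
generate y = sum(rnormal())     // generated independently of x
scatter y x
regress y x                     // can look "significant" despite no real relationship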

Constant Variance

Constant variance, or homoscedasticity, means that the variability is the same in all parts of the prediction function. If this is not the case, the predictions may be correct on average, but the uncertainties associated with the predictions will be wrong. Heteroscedasticity is non-constant variance.
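
A minimal sketch (made-up data, not from the slides) of what heteroscedasticity looks like in a residuals-vs-fitted plot:

* Hypothetical sketch: the error spread grows with x (non-constant variance)
clear
set obs 100
set seed 245
generate x = 10*runiform()
generate y = 3 + 1.5*x + rnormal(0, 0.5*x)   // error SD proportional to x
regress y x
rvfplot                                      // residuals vs. fitted values show a fan shape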


Consequences of Heteroscedasticity

Predictions may be unbiased (correct on average). Prediction uncertainties are not correct: too small in some places, too large in others. Inferences are incorrect (is there any relationship, or is it random?).

Normality of Errors

Mostly this is not particularly important, but very large outliers can be problematic. Graphing the data often helps. If, in a gene expression array experiment, we do 40,000 regressions, graphical analysis of each one is not possible; significant relationships should then be examined in detail.
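
A minimal sketch of graphical residual checks in Stata (y, x, and the residual variable r are placeholder names, not from the slides):

* Hypothetical sketch: graphical checks of the residuals after fitting a regression
regress y x
predict r, residuals     // save the residuals
qnorm r                  // normal quantile plot; large outliers stand out
histogram r, normal      // histogram with an overlaid normal density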


Statistical Lab Books

You should keep track of the things you try. The eventual analysis is best recorded in a file of commands so it can later be replicated. Plots should also be produced this way, at least in final form, and not done on the fly.
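
A sketch of what such a command (do) file might look like; the file names and the use of a saved dataset are assumptions, not taken from the course materials:

* Hypothetical do-file skeleton for a reproducible analysis
log using fluor_analysis.log, replace text    // record all commands and output
use fluor, clear                              // assumed dataset name
regress intensity concentration
scatter intensity concentration || lfit intensity concentration
graph export fluor2.wmf, replace
log close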

Example Analysis

Standard aqueous solutions of fluorescein (in pg/ml) are examined in a fluorescence spectrometer and the intensity (in arbitrary units) is recorded. What is the relationship of intensity to concentration? The fitted relationship is later used to infer the concentration of a labeled analyte.

Stata Regression Commands

list concentration intensity                                       // show the raw data
scatter intensity concentration                                    // plot the data
graph export fluor1.wmf, replace
regress intensity concentration                                    // fit the least-squares line
scatter intensity concentration || lfit intensity concentration    // data with the fitted line
graph export fluor2.wmf, replace
rvfplot                                                            // residuals vs. fitted values
graph export fluor3.wmf, replace

. do fluor1

. list concentration intensity

     +---------------------+
     | concen~n   intens~y |
     |---------------------|
  1. |                     |
  2. |        2          5 |
  3. |        4          9 |
  4. |                     |
  5. |                     |
     |---------------------|
  6. |                     |
  7. |                     |
     +---------------------+


. regress intensity concentration

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  1,     5) =
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
   intensity |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
concentrat~n |
       _cons |
------------------------------------------------------------------------------

Reading this output:

- Slope: the coefficient on concentration is the change in intensity for a unit change in concentration.
- Intercept: the coefficient on _cons is the intensity at zero concentration.
- The Source / Model / Residual / Total panel is the ANOVA table.
- The F statistic and Prob > F give the test of the overall model.
- The Root MSE measures the variability around the regression line.

scatter intensity concentration || lfit intensity concentration
graph export fluor2.wmf, replace
rvfplot
graph export fluor3.wmf, replace

The first of these plots shows the data points and the regression line. The second shows the residuals vs. fitted values, which is better for detecting nonlinearity.


Use of the calibration curve
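
A minimal sketch (not from the slides) of the inverse-prediction step in Stata, using the coefficients stored by regress; the observed intensity of 300 for the unknown sample is a made-up value:

* Hypothetical sketch: estimate the concentration of an unknown sample from its
* observed intensity by inverting the fitted calibration line
regress intensity concentration
scalar y0 = 300                                    // made-up observed intensity
scalar x0 = (y0 - _b[_cons]) / _b[concentration]   // solve y0 = a + b*x0 for x0
display "estimated concentration = " x0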


Measurement and Calibration

Essentially everything we measure is measured indirectly. The thing we wish to measure produces an observed, transduced value that is related to the quantity of interest but is not itself directly the quantity of interest. Calibration takes known quantities, observes the transduced values, and uses the inferred relationship to quantitate unknowns.

Measurement Examples

Weight is observed via the deflection of a spring (calibrated). The concentration of an analyte in mass spectrometry is observed through the electrical current integrated over a peak (possibly calibrated). Gene expression is observed via the fluorescence of a spot to which the analyte has bound (usually not calibrated).

Correlation

The Wright peak-flow data set has two measures of peak expiratory flow rate (in l/min) for each of 17 patients; both are subject to measurement error. In ordinary regression, we assume the predictor is known. For two measures of the same thing with no error-free gold standard, one can use correlation to measure agreement.

input std mini
  ...
end

. correlate std mini
(obs=7)

             |      std     mini
-------------+------------------
         std |
        mini |


Issues with Correlation

For any given relationship between two measurement devices, the correlation will depend on the range over which the devices are compared. If we restrict the Wright data to a narrower range, the correlation falls from its full-range value of 0.94. Correlation only measures linear agreement.
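
A minimal sketch of how the range-restriction effect could be checked; the cutoffs of 200 and 400 l/min are made-up values, not the range used on the slide:

* Hypothetical sketch: correlation over the full range vs. a restricted range
correlate std mini                                // full data
correlate std mini if std >= 200 & std <= 400     // restricted range (made-up cutoffs)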


Exercises

1. Download the data on measurement of zinc in water by ICP/MS ("Zinc.raw").
2. Conduct a regression analysis in which you predict peak area from concentration.
3. Which of the usual regression assumptions appear to be satisfied, and which do not?
4. What would the estimated concentration be if the peak area of a new sample were 1850?
5. From the blanks part of the data, how big should a result be to indicate the presence of zinc with some degree of certainty?
