FTP Biostatistics II
Model parameter estimation: Confronting models with measurements



2 Fitting models to data
We all know how to do this. But do we all know how to do this?

3 Overview
Objectives:
- Introduce the basic concept and the basic procedure for fitting a mathematical model to a set of measurements
- The method for obtaining the parameter estimates ("best fit") is the same irrespective of the model complexity
  (The small print: this applies only when the error model is the same)
Outline:
- Goodness of fit: the concept of minimum sums of squares
- The concept of minimization, explained with linear regression examples
- Model violation

4 Model fitting procedure
Three essential requirements:
- Observations from a population
- A formal predictive model with parameters to be estimated
- A criterion to judge the goodness of fit of the model to the observations for any combination of parameter values. This criterion is often called an objective function.
Y_i = Ŷ_i + ε_i
where Y_i is the value of observation i, Ŷ_i is the predicted value of observation i ("the mathematical model"), and ε_i is the residual of observation i; the residuals are used to calculate a criterion for judging the goodness of fit.
Parameter estimation: statistical analysis of observations (variables) in which the parameters of a model are estimated such that the predicted values are as close as possible to the measurements, given some criterion for goodness of fit.

5 Goodness of fit
Observed value = Predicted value + residual
Predicted value: based on some formal mathematical model.
Note the synonyms:
- observed value = measurement
- predicted value = fitted value = expected value
- residual = deviation = random error = error = noise
Observed value = Predicted value + ε_i, i.e. Y_i = Ŷ_i + ε_i
Each residual is added to the predicted value; i stands for a certain observation, i = 1, 2, 3, …, n.
ε_i = Observed value − Predicted value = Y_i − Ŷ_i
Since the residual measures the distance of the prediction from the observation, it is an obvious candidate for a measure of goodness of fit.

6 A visual representation of residuals
[Figure: scatter plot with fitted line; for the observation at X_i, the residual ε_i is the vertical distance between the observed value Y_i and the expected value Ŷ_i.]

7 Goodness of fit: Sum of squared residuals
ε_i = Observed value − Predicted value
The deviations are both positive and negative, so we cannot simply minimize their sum. Squaring each difference solves the problem of negative deviations:
SS = Σ ε_i² = Σ (Observed − Predicted)²
The criterion in model fitting is to minimize SS.
There are two major assumptions when using the sum of squares as the criterion of fit. The residuals are:
- normally distributed about the predicted value
- with equal variance (σ²) for all values of the observed variable
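As an illustration of the SS criterion (a Python sketch with made-up observed and predicted values, not the fish data; the slides themselves work in Excel):

```python
# Hypothetical observed values and model predictions, just to show
# that SS is nothing more than the sum of squared residuals.
observed  = [4.0, 6.0, 5.0, 9.0]
predicted = [4.5, 5.5, 6.0, 8.0]

# Residual = observed - predicted; squaring removes the signs.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
ss = sum(e ** 2 for e in residuals)
print(ss)  # 2.5
```

Note that positive and negative residuals of the same size contribute equally to SS, which is exactly why squaring is used.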

8 Variance of residuals independent of the value of x
We often do not have sufficient data to test this assumption.

9 The squared residuals

10 The sum of the squared residuals

11 The principle behind the criterion
Whether the model is simple or complex, the principle of the least-squares criterion for goodness of fit is always the same, i.e. minimize:
SS = Σ (Observed − Predicted)²
The only complexity is the algorithm used to obtain the best parameter estimates that describe the predicted value. With computers it is easy to search numerically for the parameter values that fulfil the condition of minimum sums of squares.
- Grid search: try different values of the model parameters and calculate SS for each case. The condition is still the same: find the parameter values that give the lowest SS.
- Built-in minimization routines: most statistical programs have these. They are for all practical purposes "black boxes"; how the minimization is done is not important, the principal point is to understand what it does. In Excel the black box is called Solver.
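A minimal grid search of the kind described above can be sketched in Python (hypothetical x/y data; the slides do this in Excel): evaluate SS on a lattice of (a, b) values and keep the pair with the smallest SS.

```python
# Hypothetical data lying roughly on a straight line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def ss(a, b):
    # Sum of squared residuals for the line y = a + b*x.
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Grid search: a from -5 to 5 and b from 0 to 5 in steps of 0.1,
# keeping the combination with the lowest SS.
best_a, best_b = min(
    ((a / 10, b / 10) for a in range(-50, 51) for b in range(0, 51)),
    key=lambda p: ss(*p),
)
```

A finer grid around (best_a, best_b) would refine the estimate; built-in minimizers such as Excel's Solver automate exactly this kind of search.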

Example 1: Linear regression model with two parameters

13 The observations
Ten fish (n = 10) with measurements of two variables: body weight and egg number.
Simple observation: the heavier the fish, the more eggs.
It is proposed that a simple linear model would suffice to describe the relationship of egg number to fish weight.

14 The proposed model
In mathematical notation we have:
Observed = Predicted + residual
Y_i = Ŷ_i + ε_i
Y_i = a + b·X_i + ε_i
No. eggs_i = a + b·body weight_i + residual
Thus for this model the goodness of fit is:
SS = Σ (Observed_i − Predicted_i)² = Σ (Y_i − Ŷ_i)² = Σ (Y_i − [a + b·X_i])² = Σ (No. eggs_i − [a + b·weight_i])²
Different values of the parameters a and b result in different values of SS. The objective is to find the combination of a and b that gives the lowest SS value. (SS: sum of squares)

15 Pencil and paper calculation

16 The values of a and b affect the value of SS

17 The value of SS as a function of b For a given intercept (a), there is only one value for the slope (b) that gives the lowest SS. Changing the value of the slope (b) results in a different SS value.

18 The value of SS as a function of a For a given slope (b), there is only one value for the intercept (a) that gives the lowest SS. Changing the value of the intercept (a) results in a different SS value.

19 The value of SS as a function of a and b
There is only one combination of a and b that gives the lowest SS.

20 The best fit
The lowest SS (= 1455) is obtained for Intercept = …, Slope = 2.74

Technical implementation: Find the parameter values that fulfil the condition of minimum sum of squared differences

22 Analytical solution for linear regression
It can be shown that the values of a and b that fulfil the condition of minimum sums of squares are:
b = Σ (X_i − X̄)(Y_i − Ȳ) / Σ (X_i − X̄)²
a = Ȳ − b·X̄
Although an analytical solution for the best-fit parameter values is available for a simple linear model, this is not always the case for more complex models.
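The closed-form estimates above can be written out directly. A Python sketch (the helper name `ols_fit` is invented for illustration):

```python
def ols_fit(xs, ys):
    """Closed-form least-squares estimates for y = a + b*x:
    b = sum((x - xbar)*(y - ybar)) / sum((x - xbar)**2),  a = ybar - b*xbar.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    return ybar - b * xbar, b

# On data that lie exactly on y = 1 + 2x, the formulas recover a = 1, b = 2.
a, b = ols_fit([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
```

No search is needed here: for the simple linear model the minimum of SS can be written down in one step.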

23 Numerical search routines
An analytical solution is normally available only for the simplest models. In other cases the only way is a numerical search for the parameter values that fulfil the condition of least sum of squares.
In Excel this is done with Solver; other programs use similar methodology under a different name.
Disadvantages of numerical searches:
- Can sometimes take some time. This is not really a problem today: computer speed is high and search routines are efficient.
- Can give false estimates due to local minima. Try different starting values for the parameters to see whether the search gives the same or different answers.
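The advice to try several starting values can be illustrated with a deliberately naive, hand-rolled pattern search (a sketch, not any particular program's routine; the name `minimize_ss` is invented):

```python
def minimize_ss(ss, start, step=0.5, shrink=0.5, tol=1e-8):
    """Naive pattern search: probe each parameter up and down by `step`,
    accept any move that lowers SS, and shrink the step when stuck."""
    params = list(start)
    best = ss(params)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                val = ss(trial)
                if val < best:
                    params, best, improved = trial, val, True
        if not improved:
            step *= shrink
    return params, best

# Hypothetical exact-fit data: y = 1 + 2x, so the global minimum has SS = 0.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

def sum_sq(p):
    return sum((y - (p[0] + p[1] * x)) ** 2 for x, y in zip(xs, ys))

# Run the search from several starting values and compare the results.
fits = [minimize_ss(sum_sq, s) for s in [(0.0, 0.0), (10.0, -10.0), (-5.0, 5.0)]]
```

For this convex least-squares problem every starting point reaches (essentially) the same minimum; in problems with local minima they may not, which is exactly why restarting from different values is worthwhile.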

24 Local minima
[Figure: SS surface with several dips; a numerical search can stall in a local minimum instead of reaching the global minimum.]

So how good is the goodness of fit?

26 Partitioning the sums of squares
[Figure: for each observation, the total deviation Y_i − Ȳ splits into the part explained by the regression, Ŷ_i − Ȳ, and the residual ε_i = Y_i − Ŷ_i.]

27 Partitioning the total variance
- Total variance in the y-variable
- Variance explained by the regression
- Variance about the regression line (unexplained)
Note: Total variance = Explained variance + Unexplained variance

28 Partitioning the total variance
[Figure: total variance split into the part explained by the linear model (green) and the unexplained part (red).]

29 r²: Coefficient of determination
The coefficient of determination is the fraction of the total variance of the dependent variable that is explained by the model. The value of r² is between 0 and 1:
- If the explained part is low compared with the total variance, r² will be close to 0.
- If the explained part is a high proportion of the total variance, r² will approach 1.
Note: for other statistics of interest, e.g. confidence limits of the slope, see statistical textbooks.
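Numerically (a Python sketch with invented observed and fitted values), r² can be computed as one minus the unexplained fraction of the total sum of squares:

```python
# Hypothetical observed values and model predictions.
observed  = [2.0, 4.0, 5.0, 7.0]
predicted = [2.5, 3.5, 5.5, 6.5]

mean_y   = sum(observed) / len(observed)
ss_total = sum((y - mean_y) ** 2 for y in observed)                 # total
ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))   # unexplained
r2 = 1 - ss_resid / ss_total
```

Here the unexplained part is small relative to the total, so r² comes out close to 1 (about 0.92).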

30 Goodness of fit, standard error and r 2

31 Multiple regression
Instead of having only one variable x explaining the observations y, we may have two variables, x and z. A possible model is:
Y_i = a + b·X_i + c·Z_i + ε_i
The goodness of fit for this model is:
SS = Σ (Observed_i − Predicted_i)² = Σ (Y_i − Ŷ_i)² = Σ (Y_i − [a + b·X_i + c·Z_i])²
Different values of the parameters a, b and c result in different values of SS. The objective is to find the combination of a, b and c that gives the lowest SS value.

32 The first example extended
Let's imagine that in our first example we also had information on temperature, and that there were only two temperature regimes, 20 and 25 °C.
Ten fish (n = 10) with measurements of three variables: body weight, temperature and egg number.
Simple observations:
- The heavier the fish, the more eggs.
- For a given body weight, fish collected at the lower temperature have fewer eggs.

33 The proposed model
In mathematical notation we have:
Observed = Predicted + residual
Y_i = Ŷ_i + ε_i
Y_i = a + b·X_i + c·Z_i + ε_i
No. eggs_i = a + b·body weight_i + c·temperature_i + residual
Thus for this model the goodness of fit is:
SS = Σ (Observed_i − Predicted_i)² = Σ (Y_i − Ŷ_i)² = Σ (Y_i − [a + b·X_i + c·Z_i])² = Σ (No. eggs_i − [a + b·weight_i + c·temp_i])²
Different values of the parameters a, b and c result in different values of SS. The objective is to find the combination of a, b and c that gives the lowest SS value. (SS: sum of squares)
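The three-parameter SS can be sketched the same way as before (Python; the numbers are invented and constructed so that eggs = 2 + 10·weight + 0.5·temp holds exactly, they are not the slide's data):

```python
# Hypothetical data generated exactly from eggs = 2 + 10*weight + 0.5*temp.
weights = [1.0, 2.0, 3.0, 4.0]        # X_i: body weight
temps   = [20.0, 25.0, 20.0, 25.0]    # Z_i: temperature
eggs    = [22.0, 34.5, 42.0, 54.5]    # Y_i: egg number

def ss(a, b, c):
    # Sum of squared residuals for the three-parameter model a + b*X + c*Z.
    return sum((y - (a + b * x + c * z)) ** 2
               for x, z, y in zip(weights, temps, eggs))
```

At the true parameters SS is zero; any other combination gives a larger value, which is exactly what the minimization exploits.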

34 Results from the model

35 What changed?
The simple model is a special case of the more complex model: the former effectively assumes that c is zero. Note that the slope b (the weight parameter) has not changed much, but the intercept has (why?). The biggest change is a much lower SS in the more complex model. The question is whether the more complex model is an improvement over the simpler model.

36 Is the added parameter significant?
Full (more complex) model vs. reduced (simpler) model.
Note that we always have SS_reduced ≥ SS_full, with equality only if the added parameter has not managed to reduce the unexplained variance around the regression line.
A formal test compares the reduction in SS per added parameter with the unexplained variance of the full model.
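The usual formal comparison for nested least-squares models of this kind is an F-test on the drop in SS. A sketch (the function name is invented; df is the residual degrees of freedom, n minus the number of estimated parameters):

```python
def f_statistic(ss_full, df_full, ss_reduced, df_reduced):
    """F statistic for comparing nested least-squares models.

    The reduced model has fewer parameters, so df_reduced > df_full,
    and ss_reduced >= ss_full always holds.
    """
    drop_per_param = (ss_reduced - ss_full) / (df_reduced - df_full)
    resid_variance = ss_full / df_full
    return drop_per_param / resid_variance

# E.g. adding one parameter (df 8 -> 7) that drops SS from 240 to 100:
f = f_statistic(ss_full=100.0, df_full=7, ss_reduced=240.0, df_reduced=8)
```

The resulting F (about 9.8 here) is then compared against the F distribution with (df_reduced − df_full, df_full) degrees of freedom; a large value indicates the added parameter is worth keeping.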

Model violations, model transformations

38 Model violations: Residuals vs. the independent variable
[Figure: four residual plots against x. Two show randomly scattered residuals (OK); one shows residuals that are a systematic function of x (problem); one shows residual variance increasing with x (problem).]

39 Beyond simple statistical models
If model assumptions appear to be violated there are sometimes remedies:
- Try an alternative model if there is a systematic pattern in the residuals (often by adding more parameters).
- Transform the data if the constant-variance assumption is violated.
- Use an alternative formulation of the objective function, i.e. some criterion other than the minimum sum of squares.
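As an example of the transformation remedy (a Python sketch; the data are generated exactly from a power model U = q·N^b with q = 0.5 and b = 0.8, chosen for illustration): taking logarithms turns the power model into a straight line, so ordinary least squares on the logs recovers the parameters.

```python
import math

# Survey-index data generated exactly from the power model U = q * N**b.
q_true, b_true = 0.5, 0.8
Ns = [100.0, 200.0, 400.0, 800.0]
Us = [q_true * n ** b_true for n in Ns]

# log U = log q + b * log N: a linear model in log-log space.
xs = [math.log(n) for n in Ns]
ys = [math.log(u) for u in Us]
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
     / sum((x - xbar) ** 2 for x in xs))
q = math.exp(ybar - b * xbar)
```

The log transform also stabilises the variance when the error is multiplicative; if the error on U is additive instead, fitting on the original scale may be preferable.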

40 Example of alternative model fitting
N: population numbers; U: survey index.
[Figure: the measurements of U against N.]

41 Let's check three alternative models

42 Models confronted with data Which model has the least detectable residual problems?

43 Residuals as a function of N
The power model tends to have the "best" residual scatter pattern.

44 Suggested further reading
Haddon, M. Modeling and Quantitative Methods in Fisheries.