Slide 1 Non-linear regression
All regression analyses are for finding the relationship between a dependent variable (y) and one or more independent variables (x), by estimating the parameters that define the relationship.
Functional form known:
– Non-linear relationships whose parameters can be estimated by linear regression, e.g., y = ax^b, y = ab^x, y = ae^(bx)
– Non-linear relationships whose parameters must be estimated by non-linear regression, e.g., the logistic growth model (next slide)
Functional form unknown: lowess/loess. While lowess and loess are often treated as synonyms, some people do insist that they are different, as described below:
– lowess: a locally weighted linear least squares regression, generally involving a single IV (independent variable)
– loess: a locally weighted linear or quadratic least squares regression, involving one or more IVs
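A minimal R sketch of the first case (hypothetical numbers, not from the lecture): y = ae^(bx) becomes linear after a log transform, ln(y) = ln(a) + bx, and can then be fitted with ordinary lm().
# Hypothetical data for illustration
set.seed(1)
x <- 1:10
y <- 2.5*exp(0.3*x)*exp(rnorm(10, sd=0.05))  # true a = 2.5, b = 0.3, plus noise
fit <- lm(log(y) ~ x)                        # ln(y) = ln(a) + b*x
a <- exp(coef(fit)[1])                       # back-transform the intercept to recover a
b <- coef(fit)[2]                            # the slope estimates b directly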

Xuhua Xia Slide 2 Commonly Encountered Functions: Logistic growth
[Figure: logistic growth curve, population size N plotted against Time]
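For reference, the logistic model fitted with nls on slide 7 has the form
N(t) = K·N0 / (N0 + (K − N0)·e^(−r·t))
where K is the carrying capacity, N0 the initial population size, and r the growth rate; N(0) = N0 and N(t) → K as t → ∞.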

Rationale of nonlinear regression
Both linear and non-linear regression aim to find the parameter values that minimize the residual sum of squared deviations, RSS = Σ[y – E(y)]². For linear regression, an analytical solution exists for the intercept (a) and slope (b); for non-linear regression, such a solution often does not exist and we need to try various combinations of parameter values. Let us first pretend that we do not know the solution for a and b in linear regression and try different a and b to find the parameter estimates that minimize RSS.
Xuhua Xia Slide 3

Get slope and intercept the hard way
Xuhua Xia Slide 4
The data set has been used before in our first lecture on regression. X is humidity and Y is weight loss. Double-click it and copy it to an EXCEL sheet. We will try different combinations of intercept (a) and slope (b) to find the combination that minimizes RSS. From the plot we can guess that a ≈ 9 and that b is small and negative. The 3rd column is the predicted value: E(Y) = a – bx. The 4th column is the squared deviation: [Y – E(Y)]². You may first try different a and b values by hand; better ones will make RSS smaller. Then use the EXCEL Solver to automate this process. You may run an ordinary linear regression to check the parameter estimates.
Summary: guesstimate parameter values, then try different parameter values to minimize RSS. The EXCEL Solver will try parameter values from 0 up, so if a parameter is negative, as the slope is in our case, express the predicted value as E(Y) = a – bx so that b itself stays positive.
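The same exercise can be sketched in R (with made-up humidity/weight-loss numbers, since the slide's data table is not reproduced in this transcript): write RSS as a function of (a, b) and let optim() search for the minimum, which is exactly what the Solver automates.
# Hypothetical stand-in for the humidity (X) / weight-loss (Y) data
X <- c(10,20,30,40,50,60,70,80)
Y <- c(8.9,8.3,7.8,7.0,6.4,5.7,5.2,4.4)
rss <- function(p) sum((Y - (p[1] - p[2]*X))^2)   # E(Y) = a - b*x, so b stays positive
sol <- optim(c(9, 0.05), rss)                     # start from the guesstimates
sol$par                                           # compare with coef(lm(Y ~ X))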

By using nls
Xuhua Xia Slide 5
[Data table and plot: population size N observed over Time]
Initial values of the parameters to estimate:
K (carrying capacity): ?
N0: 10?
r: 1.35?
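A side note not on the slide: R also has a self-starting logistic, SSlogis, which removes the need for hand-picked start values (assuming the Time/N data are in the data frame md read in on slide 7). Its parameterization (Asym, xmid, scal) differs from K, N0, r: Asym is the carrying capacity, xmid the time of half-maximum, and scal an inverse growth-rate scale.
# Self-starting alternative: no start= argument needed
fit2 <- nls(N ~ SSlogis(Time, Asym, xmid, scal), data = md)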

Use EXCEL solver to do estimates
Xuhua Xia Slide 6
These (K, N0, and r) are our guesstimates. Now refine them by using the EXCEL Solver (or by hand if you so wish).

nls output
Xuhua Xia Slide 7
md<-read.table("nlinLogistic.txt",header=T)
attach(md)
fit<-nls(N~N0*K/(N0+(K-N0)*exp(-r*Time)),start=c(K=150000,N0=10,r=1.35))
plot(Time,N)
lines(Time,fitted(fit))
Parameters:
     Estimate   Std. Error  t value  Pr(>|t|)
K    1.232e+05  …           …        …e-14
N0   …          …           …        …
r    1.151e+00  …           …        …e-08
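A small usage sketch (assuming the fit object above): the estimates and predictions can be extracted programmatically instead of being read off the printout.
coef(fit)                                      # named vector of K, N0, r
predict(fit, newdata = data.frame(Time = 25))  # predicted N at Time = 25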

Xuhua Xia Slide 8 Fitting another equation
In rapidly replicating unicellular eukaryotes such as the yeast, highly expressed intron-containing genes require more efficient splicing sites than lowly expressed genes. GE: gene expression. Natural selection will operate on the mutations at the splicing sites to optimize splicing efficiency (SE). Observation: SE increases with GE non-linearly, then levels off and appears to have reached a maximum.
[Data table: GE, SE]
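The functional form fitted on slide 11 is the rational function
E(SE) = (α + β·GE) / (1 + γ·GE)
which equals α at GE = 0 and approaches β/γ as GE grows large, exactly the levelling-off behaviour described above.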

Xuhua Xia Slide 9 Guesstimate initial values
The minimum of E(SE) is α, when GE = 0: α ≈ 4.
The maximum of E(SE) is β/γ, when GE is large (e.g., 15): β/γ ≈ 8, i.e., β ≈ 8γ.
The relationship is almost linear when GE is small. When GE = 6, SE ≈ 6.5, so (α + 6β)/(1 + 6γ) = (4 + 48γ)/(1 + 6γ) ≈ 6.5, which gives γ ≈ 0.278 and hence β ≈ 8γ ≈ 8 × 0.278 ≈ 2.22.
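As a quick check (not from the slide), the same γ can be solved for numerically in R:
# Solve (4 + 48g)/(1 + 6g) = 6.5 for g; the root is about 0.278
uniroot(function(g) (4 + 48*g)/(1 + 6*g) - 6.5, c(0.01, 1))$root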

Using EXCEL Solver
Xuhua Xia Slide 10
These (α, β, and γ) are our guesstimates. Now refine them by using the EXCEL Solver (or by hand if you so wish).

R functions and output
Xuhua Xia Slide 11
md<-read.table("nlinGESE.txt",header=T)
attach(md)
fit<-nls(SE~(a+b*GE)/(1+g*GE),start=c(a=4,b=2.22,g=0.278))
summary(fit)
plot(GE,SE)
lines(GE,fitted(fit))
Parameters:
   Estimate  SE  t  P
a  …         …   …  …
b  …         …   …  …
g  …         …   …  …
[Figure: SE plotted against GE with the fitted curve]
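A brief follow-up (assuming the fit above): the estimated asymptote β/γ can be computed from the coefficients and compared with the ≈ 8 guessed on slide 9.
est <- coef(fit)        # named vector: a, b, g
est["b"]/est["g"]       # estimated maximum E(SE) = beta/gamma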

Xuhua Xia Slide 12 A general approach
Sometimes we do not know the functional form, so here is a general approach. Same problem as before, but we are not sure of the exact relationship between SE and GE.
[Data table: GE, SE]

A general approach
Xuhua Xia Slide 13
1. y increases with x at a decreasing rate: use a polynomial to approximate, e.g., y = a + bx + cx² when x < x0
2. When x reaches a certain level (x0), y reaches its maximum and does not increase any more: y = y_max for x ≥ x0
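Written out with the α, β, γ notation of the next slides (slide 13's a, b, c correspond to α, β, γ, and the plateau value is written c), the model is
E(SE) = α + β·GE + γ·GE²  for GE < GE0
E(SE) = c                 for GE ≥ GE0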

Xuhua Xia Slide 14 Guesstimate initial values
When GE = 0, SE = α, so α ≈ 4.
For a short segment of GE, the relationship between SE and GE is approximately linear, i.e., SE ≈ α + β·GE. When GE increases from 2 to 8, SE increases from 4.7 to 7.5, so β ≈ (7.5 − 4.7)/(8 − 2) ≈ 0.47.
Given the linear approximation, with α ≈ 4.0 and β ≈ 0.47, SE for GE = 12 should be ≈ 4 + 0.47 × 12 ≈ 9.6, but the actual SE is only about 7.7. This must be due to the quadratic term γ·GE², i.e., (7.7 − 9.6) = γ × 12², so γ ≈ −0.013.

Xuhua Xia Slide 15 A few more twists
The continuity condition requires that α + β·GE0 + γ·GE0² = c.
The smoothness condition requires that the slope be zero at GE0: β + 2γ·GE0 = 0, i.e., GE0 = −β/(2γ).
The two conditions imply that c = α − β²/(4γ) (substituting GE0 into the quadratic: α + β·(−β/2γ) + γ·β²/(4γ²) = α − β²/(4γ)).
We will find the α, β, and γ that minimize RSS = Σ[SE − E(SE)]².
We tell R to substitute various values for α, β, and γ, and find the set of values that minimizes RSS.
Note that GE0 and c are not parameters, because they are functions of α, β, and γ.

R statements to do the job
md<-read.table("nlinGESE.txt",header=T)
attach(md)
# Function for estimating the parameters by minimizing RSS
# a: alpha, b: beta, g: gamma, x0: GE0
myF <- function(x) {
  a <- x[1]
  b <- x[2]
  g <- x[3]
  x0 <- -b/2/g                # GE0 = -beta/(2*gamma), from the smoothness condition
  c <- a-b^2/4/g              # plateau value, from the continuity condition
  # RSS over the quadratic segment (GE < GE0)
  seg1Data <- subset(md,subset=(md$GE < x0))
  EY <- a+b*seg1Data$GE+g*seg1Data$GE*seg1Data$GE
  sumD2 <- sum((seg1Data$SE-EY)^2)
  # add the RSS over the flat segment (GE >= GE0)
  seg2Data <- subset(md,subset=(md$GE >= x0))
  sumD2 <- sumD2 + sum((seg2Data$SE-c)^2)
  return(sumD2)
}
# obtain solution by supplying the initial values for a, b, g, and the function
sol <- optim(c(4,0.47,-0.02),myF)
a <- sol$par[1]
b <- sol$par[2]
g <- sol$par[3]
x0 <- -b/2/g
c <- a-b^2/4/g
seg1Data <- subset(md,subset=(md$GE < x0))
EY1 <- a+b*seg1Data$GE+g*seg1Data$GE*seg1Data$GE
PredY <- c(EY1,rep(c,length(GE)-length(seg1Data$GE)))  # c() the function still resolves even though c is also a variable
plot(GE,SE)
lines(GE,PredY, col="red")
abline(v=x0)

Output
$par
[1] …          # α, β, and γ
$value
[1] …          # RSS
$counts
function gradient
     150       NA
$convergence
[1] 0          # 0 means success
c
[1] …
x0
[1] …
[Figure: SE plotted against GE with the fitted piecewise curve and a vertical line at x0]

Xuhua Xia Slide 18 Robust regression
LOWESS: robust local regression between Y and X, with linear fitting.
LOESS: robust local regression between Y and one or more Xs, with linear or quadratic fitting.
Used with relationships that cannot be expressed in functional forms. SAS: proc loess.
Data:
– Data set: monthly averaged atmospheric pressure differences between Easter Island, Chile and Darwin, Australia for a period of 168 months (NIST, 1998), suspected to exhibit 12-month (annual), 42-month (El Niño), and 25-month (Southern Oscillation) cycles (from Robert Cohen of SAS Institute)

lowess in R
Xuhua Xia Slide 19
md<-read.table("nlinGESE.txt",header=T)
attach(md)
fit<-loess(SE~GE,span=0.75,degree=1)   # degree may be 1 (linear) or 2 (quadratic)
summary(fit)
pred<-predict(fit,GE,se=TRUE)          # or: pred<-predict(fit,c(3,6),se=TRUE)
plot(GE,SE)
lines(GE,pred$fit,col="red")
par(mfrow=c(2,3))
for(span in seq(0.4,0.9,0.1)) {
  fit<-loess(SE~GE,span=span)
  pred<-predict(fit,GE)
  sTitle<-paste0("span = ",span)
  plot(GE,SE,main=sTitle)
  lines(GE,pred,col="red")
}
Notes: span is the smoothing parameter α, the proportion of data points used in each local fit (larger = smoother, default = 0.75). degree selects linear or quadratic local fitting; the default in loess is 2. Points are given tricube weights, proportional to (1 − (dist/maxdist)³)³.
How would I know which span value to use?
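One common answer to that question (not covered on the slides) is cross-validation: refit loess leaving out one point at a time and choose the span with the smallest prediction error. A minimal sketch, assuming the md data frame from above:
# Leave-one-out cross-validation over candidate span values
cvError <- function(s) {
  errs <- sapply(1:nrow(md), function(i) {
    f <- loess(SE ~ GE, data = md[-i,], span = s)
    p <- predict(f, newdata = md[i,])   # NA if GE[i] falls outside the training range
    md$SE[i] - p
  })
  mean(errs^2, na.rm = TRUE)
}
spans <- seq(0.4, 0.9, 0.1)
cv <- sapply(spans, cvError)
spans[which.min(cv)]                    # span with the smallest leave-one-out MSE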

Plotting the fitted values
> fit<-loess(SE~GE,span=0.8)
> pred<-predict(fit,GE,se=T)
> pred
$fit
[1] …
$se.fit
[1] …
$residual.scale
[1] …
$df
[1] …
t<-qt(0.975,pred$df)          # critical t value for 95% limits
ub<-pred$fit+t*pred$se.fit    # upper confidence band
lb<-pred$fit-t*pred$se.fit    # lower confidence band
plot(GE,SE)
lines(GE,pred$fit)
lines(GE,lb,col="red")
lines(GE,ub,col="red")
plot(GE,SE,ylim=c(min(lb),max(ub)))   # better: widen the y-axis so the bands fit
[Figure: SE against GE with the loess fit and 95% confidence bands]