Xuhua Xia Slide 1 Non-linear regression
All regression analyses are for finding the relationship between a dependent variable (y) and one or more independent variables (x), by estimating the parameters that define the relationship.
– Non-linear relationships whose parameters can be estimated by linear regression, e.g., $y = ax^b$, $y = ab^x$, $y = ae^{bx}$
– Non-linear relationships whose parameters can be estimated by non-linear regression
– Non-linear relationships that cannot be represented by a function: loess
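The first group can be handled by linear regression because taking logarithms turns each equation into a straight line. A short derivation, assuming natural logarithms and positive y (and positive x for the power function):

```latex
\begin{align*}
y = a x^{b}   &\;\Longrightarrow\; \ln y = \ln a + b \ln x  && \text{(regress } \ln y \text{ on } \ln x\text{)} \\
y = a b^{x}   &\;\Longrightarrow\; \ln y = \ln a + x \ln b  && \text{(regress } \ln y \text{ on } x\text{; slope is } \ln b\text{)} \\
y = a e^{bx}  &\;\Longrightarrow\; \ln y = \ln a + b x      && \text{(regress } \ln y \text{ on } x\text{; slope is } b\text{)}
\end{align*}
```

In each case the intercept of the fitted line is $\ln a$, so $a$ is recovered by exponentiating the intercept.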
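Commonly Encountered Functions
[Figure: two plots of Y against X. One panel shows the power functions $y = x^{1.5}$, $y = x^{1}$, and $y = x^{0.5}$; the other shows the exponential functions $y = e^{x}$, $y = 100e^{-x}$, and $y = 100 \cdot 0.5^{x}$.]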
Slide 3 Growth curve of E. coli
A researcher wishes to estimate the growth curve of E. coli. He put a very small number of E. coli cells into a large flask with rich growth medium and took samples every half an hour to estimate the density (n/L). 14 data points over 7 hours were obtained. What is the instantaneous rate of growth (r)? What is the initial density ($N_0$)? As the flask is very large, he assumed that the growth should be exponential, i.e., $y = ae^{bx}$. (Which parameter corresponds to r and which to $N_0$?)
Three approaches:
– Log-transform to a linear relationship
– Direct least-squares solution (EXCEL solver)
– Direct least-absolute-difference solution (EXCEL solver)

Time  Density (n/L)
1     20.023
2     39.833
3     80.571
4     161.102
5     317.923
6     635.672
7     1284.54
8     2569.43
9     5082.65
10    10220.8
11    20673.9
12    40591.4
13    81374.6
14    163964
Slide 4 Scatter plot
In EXCEL:
– Log-transform D
– Run linear regression
– Obtain $D_0$ and r
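The same log-transform steps can be sketched outside EXCEL. The following is a minimal Python illustration, not the lecturer's own workflow; it uses the 14 density readings from the growth-curve slide, and the helper name `linear_fit` is hypothetical. Time is taken as the sample index 1 to 14, exactly as tabulated.

```python
import math

# Density readings from the growth-curve slide; time is the tabulated index 1..14.
time = list(range(1, 15))
density = [20.023, 39.833, 80.571, 161.102, 317.923, 635.672, 1284.54,
           2569.43, 5082.65, 10220.8, 20673.9, 40591.4, 81374.6, 163964]

def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Step 1: log-transform D.  Step 2: regress ln(D) on time.
log_density = [math.log(d) for d in density]
intercept, r = linear_fit(time, log_density)

# Step 3: the slope is r and exp(intercept) is D0.
d0 = math.exp(intercept)
print(f"r = {r:.4f} per time unit, D0 = {d0:.3f}")
```

For these data the slope comes out close to ln 2 (roughly 0.69) and the back-transformed intercept close to 10, i.e., the density roughly doubles with each sample. If the Time column counts half-hour samples, r per hour would be twice the fitted slope.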
Slide 5 EXCEL solver
– Get an initial value for r
– The initial value for $D_0$ is obtained with t = 0
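As a rough stand-in for what the solver does in the direct least-squares approach, the sketch below minimises the sum of squared differences between observed and predicted density on the original scale. It is an illustration under stated assumptions, not the EXCEL procedure itself: the starting value `r0 = ln(D_last / D_first) / (t_last - t_first)` is an assumed formula, and the search method (closed-form $D_0$ for each trial r, plus a golden-section search over r) is swapped in for the solver's optimiser. All function names are hypothetical.

```python
import math

time = list(range(1, 15))
density = [20.023, 39.833, 80.571, 161.102, 317.923, 635.672, 1284.54,
           2569.43, 5082.65, 10220.8, 20673.9, 40591.4, 81374.6, 163964]

def sse(d0, r):
    """Sum of squared errors of D = d0 * exp(r * t) on the original scale."""
    return sum((d - d0 * math.exp(r * t)) ** 2 for t, d in zip(time, density))

def best_d0(r):
    """For a fixed r, the least-squares d0 has a closed form."""
    num = sum(d * math.exp(r * t) for t, d in zip(time, density))
    den = sum(math.exp(2 * r * t) for t in time)
    return num / den

def profile_sse(r):
    """SSE as a function of r alone, with d0 set to its best value."""
    return sse(best_d0(r), r)

def golden_section(f, lo, hi, tol=1e-9):
    """Minimise a unimodal function f on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2

# Assumed starting value for r, from the first and last readings.
r0 = math.log(density[-1] / density[0]) / (time[-1] - time[0])

# Search a bracket around the starting value.
r_hat = golden_section(profile_sse, r0 - 0.2, r0 + 0.2)
d0_hat = best_d0(r_hat)
print(f"r = {r_hat:.4f}, D0 = {d0_hat:.3f}, SSE = {sse(d0_hat, r_hat):.1f}")
```

Because the squared errors are on the original scale, the largest densities dominate the fit, so the estimates need not match those from the log-transformed regression. Replacing the squared difference with an absolute difference would give the least-absolute-difference criterion, but that objective is not smooth and has no closed form for $D_0$, so it would need a different search.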
Slide 6 Body weight of wild elephants
A researcher wishes to estimate the body weight of wild elephants. He measured the body weight of 13 captured elephants of different sizes as well as a number of predictor variables, such as leg length, trunk length, etc. Through stepwise regression, he found that the inter-leg distance (shown in the figure) is the best predictor of body weight. He learned from his former biology professor that the allometric law governing the body weight (W) and the length of a body part (L) states that $W = aL^b$. Use the three approaches to fit the equation.
Slide 7 Scatter plot
$W = aL^b$
In EXCEL:
– Log-transform W and L
– Run linear regression
– Obtain a and b
Slide 9 DNA and protein gel electrophoresis
How to estimate the molecular mass of a protein?
– A ladder: proteins with known molecular mass
– Derive a calibration curve relating molecular mass (M) to migration distance (D): D = F(M)
– Measure D and obtain M
The calibration curve is obtained by fitting a regression model.
Slide 10 Protein molecular mass
The equation $D = ae^{bM}$ appears to describe the relationship between D and M quite well. This relationship is better than some published relationships, e.g., $D = a - b\ln(M)$. The data are my measurements of D and M for a subset of secreted proteins from the gastric pathogen Helicobacter pylori (Bumann et al., 2002).
Homework: use the data and the three approaches to estimate parameters a and b. (You don't need to submit.)

Mass  D
5     14.5
10    12.6
20    9.4
30    7.1
40    5.3
50    3.9
60    3.05
70    2.3
80    1.75

Bumann, D., Aksu, S., Wendland, M., Janek, K., Zimny-Arndt, U., Sabarth, N., Meyer, T.F., and Jungblut, P.R., 2002. Proteome analysis of secreted proteins of the gastric pathogen Helicobacter pylori. Infect. Immun. 70: 3396-3403.
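One way to check the claim that the exponential form fits better is to fit both calibration curves to the nine (Mass, D) pairs and compare their residual sums of squares on the D scale. The sketch below covers only the log-transform approach and makes its own choices: the helper names (`linear_fit`, `sse`, `mass_from_distance`) are hypothetical, and comparing the models by residual sum of squares is an assumed criterion rather than one named in the slides.

```python
import math

# Calibration data from the slide: molecular mass M and migration distance D.
mass = [5, 10, 20, 30, 40, 50, 60, 70, 80]
dist = [14.5, 12.6, 9.4, 7.1, 5.3, 3.9, 3.05, 2.3, 1.75]

def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sse(predicted):
    """Residual sum of squares on the original D scale."""
    return sum((d - p) ** 2 for d, p in zip(dist, predicted))

# Model 1: D = a * exp(b * M), linearised as ln D = ln a + b * M.
ln_a, b_exp = linear_fit(mass, [math.log(d) for d in dist])
a_exp = math.exp(ln_a)
sse_exp = sse([a_exp * math.exp(b_exp * m) for m in mass])

# Model 2: D = a - b * ln(M), which is already linear in ln M.
a_log, slope = linear_fit([math.log(m) for m in mass], dist)
b_log = -slope
sse_log = sse([a_log - b_log * math.log(m) for m in mass])

def mass_from_distance(d):
    """Invert D = a * exp(b * M) to read M off the calibration curve."""
    return math.log(d / a_exp) / b_exp

print(f"exponential: a = {a_exp:.3f}, b = {b_exp:.5f}, SSE = {sse_exp:.4f}")
print(f"logarithmic: a = {a_log:.3f}, b = {b_log:.3f}, SSE = {sse_log:.4f}")
```

The function `mass_from_distance` corresponds to the last step of the calibration procedure: measure D for an unknown protein, then solve the fitted curve for M.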
Area and Radius
What is the functional relationship between the area and the radius?
Homework (you do not need to submit): Measure the area A (by counting the squares) and radius r for each circle and estimate the parameters c and d in the equation $A = cr^d$ by using the three approaches.
Slide 12 Toxicity study: pesticide
What transformation to use?
Slide 13 Probit and probit transformation
Probit has two names/definitions, both associated with the standard normal distribution:
– the inverse cumulative distribution function (CDF)
– the quantile function
The CDF is denoted by $\Phi(z)$, which is a continuous, monotonically increasing sigmoid function with values in the range (0, 1), e.g.,
$\Phi(z) = p$
$\Phi(-1.96) = 0.025 = 1 - \Phi(1.96)$
The probit function gives the 'inverse' computation, formally denoted $\Phi^{-1}(p)$, i.e.,
$\mathrm{probit}(p) = \Phi^{-1}(p)$
$\mathrm{probit}(0.025) = -1.96 = -\mathrm{probit}(0.975)$
$\Phi[\mathrm{probit}(p)] = p$, and $\mathrm{probit}[\Phi(z)] = z$.
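These identities can be checked numerically. A minimal sketch using Python's standard-library `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of $\Phi$ and $\Phi^{-1}$; the wrapper name `probit` is introduced here only for illustration:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist()

def probit(p):
    """Quantile function (inverse CDF) of the standard normal distribution."""
    return std_normal.inv_cdf(p)

# CDF values at -1.96 and 1.96 are complementary.
print(round(std_normal.cdf(-1.96), 3))      # 0.025
print(round(1 - std_normal.cdf(1.96), 3))   # 0.025

# probit undoes the CDF, and is antisymmetric about p = 0.5.
print(round(probit(0.025), 2))              # -1.96
print(round(-probit(0.975), 2))             # -1.96

# Round trips: CDF(probit(p)) = p and probit(CDF(z)) = z, up to rounding error.
print(abs(std_normal.cdf(probit(0.3)) - 0.3) < 1e-9)
print(abs(probit(std_normal.cdf(1.2)) - 1.2) < 1e-9)
```

In a toxicity study, the probit transformation would be applied to the observed proportions (for example, proportion killed at each dose) before running a linear regression; `probit` requires 0 < p < 1, so proportions of exactly 0 or 1 cannot be transformed directly.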
Commonly Encountered Functions
[Figure: logistic growth curve, N plotted against Time.]
Slide 16 Non-linear regression
In rapidly replicating unicellular eukaryotes such as yeast, highly expressed intron-containing genes require more efficient splicing sites than lowly expressed genes. Natural selection will operate on mutations at the splicing sites to optimize splicing efficiency. Designate splicing efficiency as SE and gene expression as GE. Certain biochemical reasoning suggests that SE and GE will follow a specific functional relationship.

GE  SE
1   0.46
2   0.47
3   0.57
4   0.61
5   0.62
6   0.68
7   0.69
8   0.78
9   0.7
10  0.74
11  0.77
12  0.78
13  0.74
13  0.8
15  0.8
16  0.78
Slide 17 Scatter plot
Initial values:
– 0.4 (inferred when GE = 0)
– the two coefficients of GE have a ratio of 1, i.e., they are equal (inferred when GE is very large)
When GE = 8, we have $(0.4 + 8\beta)/(1 + 8\beta) = 0.78$
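The last equation can be solved for the remaining coefficient to obtain its starting value. In this short worked step the unknown is written as $\beta$, which is an assumed symbol, and 0.78 is the observed SE at GE = 8:

```latex
\begin{align*}
\frac{0.4 + 8\beta}{1 + 8\beta} &= 0.78 \\
0.4 + 8\beta &= 0.78 + 6.24\beta \\
1.76\beta &= 0.38 \\
\beta &\approx 0.216
\end{align*}
```

A value near 0.22, together with 0.4 for the constant term, would then serve as the initial guess handed to the non-linear fitting routine.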