Unit 2b: Dealing “Rationally” with Nonlinear Relationships © Andrew Ho, Harvard Graduate School of Education (Unit 2b – Slide 1)


1 Unit 2b: Dealing “Rationally” with Nonlinear Relationships (title slide; http://xkcd.com/314/)

2 Course Roadmap: Unit 2b
Today's topic area:
- Introducing a theory-driven approach to fitting nonlinear models to data
- Fitting a nonlinear model and interpreting results
- Polynomial regression
Where this sits in the Multiple Regression Analysis (MRA) roadmap:
- Do your residuals meet the required assumptions? Test for residual normality; use influence statistics to detect atypical datapoints.
- If your residuals are not independent: replace OLS with GLS regression analysis, use individual growth modeling, or specify a multi-level model. If time is a predictor, you need discrete-time survival analysis.
- If your outcome is categorical: use binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome).
- If you have more predictors than you can deal with: create taxonomies of fitted models and compare them, or form composites of the indicators of any common construct (conduct a Principal Components Analysis, use Cluster Analysis, or use Factor Analysis: EFA or CFA?).
- If your outcome vs. predictor relationship is non-linear: transform the outcome or predictor, or use non-linear regression analysis. (Today's topic area.)

3 Two General Approaches to Fitting Nonlinear Relationships
Theory-Driven, “Rational” Approach (this class; harder to apply, easier to interpret):
- Use theory, or knowledge of the field, to postulate a non-linear model for the hypothesized relationship between outcome and predictor.
- Use nonlinear regression analysis to fit the postulated trend, and conduct all of your statistical inference there.
- Interpret the parameter estimates directly, and produce plots of findings.
Data-Driven, “Empirical” Approach (last class; easier to apply, harder to interpret):
- Find an ad-hoc transformation of either the outcome or the predictor, or both, that renders their relationship linear.
- Use regular linear regression analysis to fit a linear trend in the transformed world, and conduct all statistical inference there.
- De-transform the fitted model to produce plots of findings, and tell the substantive story in the untransformed world.

4 Theory-Driven, “Rational” Approach
Theory: Pioneers in mathematical psychology, in the mid-20th century, theorized that human learning was state-dependent: the rate at which individuals learned was proportional to the amount that they had left to learn. This led psychologists, like Nancy Bayley, to hypothesize that IQ had a negative exponential trajectory with age:

IQ = λ(1 − e^(−γ·AGE))

Under this theory, the IQ-vs-AGE trend in the BAYLEY data rises steeply at first and then levels off toward an upper asymptote. Because the meaning of the model parameters is not immediately obvious, we need to build intuition about the shape of negative exponential curves by sketching a few plots. (λ is lambda, the Greek “l”; γ is gamma, the Greek “g”. See http://www.foundalis.com/lan/hw/grkhandw.htm and http://www.livingwaterbiblegames.com/greek-alphabet-handwriting.html.)
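The state-dependence idea can be checked numerically: under this model, the instantaneous learning rate at any AGE equals γ times the amount left to learn, λ − IQ. A minimal Python sketch (the unit's own code is STATA; Python is used here purely for illustration, with the prototype parameter values from the sketches that follow):

```python
import math

def neg_exponential(age, lam, gamma):
    """Hypothesized trajectory: IQ = lambda * (1 - exp(-gamma * AGE))."""
    return lam * (1.0 - math.exp(-gamma * age))

# Prototype values from the sketches that follow: lambda = 200, gamma = .04.
for age in (0, 25, 50, 100, 200):
    print(f"AGE={age:3d}  IQ={neg_exponential(age, 200, 0.04):6.1f}")

# State dependence: the numerical slope at any AGE matches gamma*(lambda - IQ),
# i.e., the rate of learning is proportional to the amount left to learn.
h = 1e-6
slope = (neg_exponential(50 + h, 200, 0.04) - neg_exponential(50, 200, 0.04)) / h
print(round(slope, 4), round(0.04 * (200 - neg_exponential(50, 200, 0.04)), 4))
```

The curve starts at 0, rises fastest early on, and flattens as it nears the asymptote λ.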

5 You can build intuition about how the shape of a negative exponential curve depends on the values of its parameters by sketching curves for prototypical parameter values, fixing all but one parameter and varying the other. Let's start with parameter λ, holding γ = .04:
- IQ when λ = 200, γ = .04
- IQ when λ = 250, γ = .04
- IQ when λ = 300, γ = .04
Conclusion? Parameter λ is the upper asymptote: the larger λ, the higher the asymptote. (In Excel, sliders with a Linked Cell and Min/Max properties make this sketching interactive.)

6 And here's how the value of parameter γ affects the shape, holding λ = 200:
- IQ when λ = 200, γ = .01
- IQ when λ = 200, γ = .04
- IQ when λ = 200, γ = .07
Conclusion? Parameter γ determines the rate at which the asymptote is approached: the higher the value of γ, the more rapid the approach (see later).
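Both conclusions can be verified numerically. A small Python sketch (all values hypothetical); a convenient summary of approach speed is the "half-life" ln(2)/γ, the AGE at which the curve reaches half its asymptote:

```python
import math

def iq_hat(age, lam, gamma):
    # Negative exponential curve from the sketches above.
    return lam * (1 - math.exp(-gamma * age))

# Varying lambda with gamma fixed at .04: only the upper asymptote changes.
for lam in (200, 250, 300):
    print(f"lambda={lam}: IQ at AGE=200 is {iq_hat(200, lam, 0.04):.2f} (asymptote {lam})")

# Varying gamma with lambda fixed at 200: the AGE at which half the asymptote
# is reached, ln(2)/gamma, shrinks as gamma grows -- a faster approach.
for gamma in (0.01, 0.04, 0.07):
    print(f"gamma={gamma}: half of the asymptote is reached at AGE={math.log(2) / gamma:.1f}")
```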

7 Fitting a hypothesized negative exponential curve to the BAYLEY data, using nl, proceeds by an iterative process of informed guessing. If you were to do it by hand, you would start from an initial (pretty bad) guess for the fitted IQ curve, overlaid on the observed child IQ and AGE data. What might the next step be? Would you increase or decrease the initial estimate of λ? Increase or decrease the initial estimate of γ? (For an interactive least-squares demonstration, see http://www.dynamicgeometry.com/JavaSketchpad/Gallery/Other_Explorations_and_Amusements/Least_Squares.html.)

8 There's another useful way of looking at the iterative journey to a final fitted model: think of it as a hike through a mountainous region of SSELAND, whose map grid is laid out in units of λ and γ, with SSE as the elevation. We keep going downhill, from Step 0 through Step 5 and beyond. The problem: how do we know our “local minimum” is our “global minimum”? You might try a number of different starting points and see if you converge to the same answer. Also, always visualize fit if you can.
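The "try several starting points" advice can be sketched in Python with scipy's curve_fit, a rough analogue of STATA's nl (the data below are a synthetic stand-in; the real BAYLEY values are not reproduced here):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(age, lam, gamma):
    return lam * (1 - np.exp(-gamma * age))

# Synthetic stand-in for the BAYLEY data (hypothetical, 21 observations):
rng = np.random.default_rng(0)
age = np.arange(5.0, 106.0, 5.0)
iq = model(age, 248.0, 0.041) + rng.normal(0, 6, size=age.size)

# Start the downhill hike from several different trailheads; if every hike
# ends at the same spot, the local minimum is more plausibly global.
fits = []
for start in [(100.0, 0.01), (200.0, 0.05), (300.0, 0.1)]:
    est, _ = curve_fit(model, age, iq, p0=start, maxfev=10000)
    fits.append(est)
    print(f"start={start} -> lambda={est[0]:.3f}, gamma={est[1]:.5f}")
```

Here all three hikes end at essentially the same (λ, γ), which is reassuring but still not a proof of global optimality.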

9 Unit 2b.do File: programming STATA to conduct a non-linear regression analysis.

*--------------------------------------------------------------------------------
* Hypothesize and fit a nonlinear relationship directly.
*--------------------------------------------------------------------------------
* Specify the hypothesized non-linear model and conduct nonlinear regression
* analysis, providing some sensible initial guesses ("starting values") for the
* parameter estimates.
nl (IQ = {lambda}*(1-exp(-{gamma}*AGE))), initial(lambda 225 gamma 1)
* Output the predicted values and raw residuals for brief diagnosis:
predict PREDICTED, yhat
predict RESID, resid
* Other standard diagnostic statistics can also be output.

Notes on the code:
- nl is the STATA routine for fitting nonlinear regression models by least squares.
- You not only have to identify the outcome and predictors, you also have to provide the hypothesized model. STATA recognizes the variable names in the model (here “IQ” and “AGE”) and assumes that other “names” in the model (here “lambda” and “gamma”) are parameters you want to estimate.
- You have to provide some sensible initial guesses (“starting values”) for the parameter estimates: where your hike begins.
- You can output diagnostic datasets, as in linear regression analysis, including diagnostic statistics, although they are limited due to the nonlinear fit. (I choose not to do a full accounting and output only residuals and fits, to retain focus on the nonlinear modeling itself, but much of what you already know still applies.)
- Warning: the hypothesized model is fitted to the data ITERATIVELY, by a process of guessing parameter estimates and then successively refining that guess, while attending to a best-fit criterion. The process stops when parameter estimates have “converged” on the “best” answer. With difficult problems, this can sometimes take a lot of steps, lead to loops, or, worse, lead you to a suboptimal answer. Adjusting starting values and convergence criteria can help.

10 Here is the actual sequence of refinements to the Sum of Squared Residuals (SSE) made by STATA as it iterated towards a final fitted negative exponential curve for the BAYLEY data:

Iteration 0: residual SS = 70735.96
Iteration 1: residual SS = 13439.45
Iteration 2: residual SS = 5794.627
Iteration 3: residual SS = 695.1685
Iteration 4: residual SS = 670.2241
Iteration 5: residual SS = 670.2171
Iteration 6: residual SS = 670.2171

STATA began the iterative fitting process at “Step Zero” by computing the SSE associated with the initial guesses that I had provided. Clearly, my initial guesses were not good! Over the next three steps, STATA homed in rapidly on better estimates of the parameters, and SSE plummeted from over 70,000 to just under 700. Then, STATA spent a couple of steps trying to refine the final estimates, without much luck, making only a marginal improvement to SSE. It quit when, between Step #5 and Step #6, it could not reduce SSE any further. The computer regards the fitting process as having “converged” when SSE is reduced by less than one millionth between any two contiguous steps; you can modify this criterion and choose your own.
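This kind of downhill refinement of SSE can be imitated with a hand-rolled Gauss-Newton iteration in Python (a sketch on noise-free hypothetical data, not STATA's actual algorithm or the BAYLEY data; a step-halving line search keeps every step downhill, and iteration stops when SSE improves by less than a one-millionth-style criterion):

```python
import numpy as np

def f(p, a):
    lam, g = p
    return lam * (1 - np.exp(-g * a))

def jacobian(p, a):
    lam, g = p
    e = np.exp(-g * a)
    return np.column_stack([1 - e, lam * a * e])  # d/d lambda, d/d gamma

age = np.arange(5.0, 106.0, 5.0)                  # hypothetical AGE grid
iq = f((248.0, 0.041), age)                       # noise-free hypothetical data

p = np.array([100.0, 0.01])                       # starting values: the trailhead
sse = float(np.sum((iq - f(p, age)) ** 2))
for step in range(100):
    print(f"Iteration {step}: residual SS = {sse:.6f}")
    resid = iq - f(p, age)
    # Gauss-Newton step: solve the linearized least-squares problem J*delta = resid.
    delta, *_ = np.linalg.lstsq(jacobian(p, age), resid, rcond=None)
    t = 1.0
    while True:                                   # step-halving: stay downhill
        trial = p + t * delta
        trial_sse = float(np.sum((iq - f(trial, age)) ** 2))
        if trial_sse < sse or t < 1e-8:
            break
        t /= 2
    converged = sse - trial_sse < 1e-6            # tiny improvement: stop hiking
    p, sse = trial, trial_sse
    if converged:
        break
print(f"lambda={p[0]:.4f}, gamma={p[1]:.5f}")
```

As in the log above, SSE falls steeply over the first few steps and then flattens out near the minimum.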

11 Here is the fitted-model output:

      Source |       SS        df       MS           Number of obs =        21
-------------+------------------------------        R-squared     =    0.9983
       Model |  388067.783      2  194033.891       Adj R-squared =    0.9981
    Residual |  670.217063     19  35.2745823       Root MSE      =  5.939241
-------------+------------------------------        Res. dev.     =  132.3201
       Total |      388738     21  18511.3333
------------------------------------------------------------------------------
          IQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     /lambda |   248.0641   6.051146    40.99   0.000     235.3989    260.7293
      /gamma |   .0412756   .0019789    20.86   0.000     .0371337    .0454174
------------------------------------------------------------------------------

The final parameter estimates appear with approximate standard errors and 95% confidence intervals on each regression parameter. The t-statistics and p-values for each predictor test the usual marginal null hypotheses of no population effect on the outcome, for the respective parameter, given all else in the model. A familiar quantity, R-squared, appears at the top right. The “rational approach” provides parameter estimates that have an intuitive meaning in the context of the theory that provided the hypothesized regression model.
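The printed confidence intervals can be reconstructed by hand from the coefficients and approximate standard errors, using the t distribution with 19 residual degrees of freedom. A quick Python check:

```python
from scipy import stats

t_crit = stats.t.ppf(0.975, df=19)  # critical value for a 95% CI with 19 df
for name, coef, se in [("lambda", 248.0641, 6.051146),
                       ("gamma", 0.0412756, 0.0019789)]:
    lo, hi = coef - t_crit * se, coef + t_crit * se
    print(f"/{name}: t = {coef / se:.2f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

The t-statistics are simply coefficient/SE, and the recomputed intervals agree with the output above to rounding.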

12 Does it fit? The R² statistic is 0.9983. Pretty darn good, but don't forget this is one individual with time-series data.
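The R² statistic can be recovered directly from the sums of squares in the nl output, where the Model and Residual SS sum to the Total SS. A one-line Python check:

```python
model_ss, resid_ss = 388067.783, 670.217063   # from the nl output above
total_ss = model_ss + resid_ss                # 388738, with n = 21 observations
r_squared = 1 - resid_ss / total_ss
root_mse = (resid_ss / 19) ** 0.5             # 19 residual degrees of freedom
print(f"R-squared = {r_squared:.4f}, Root MSE = {root_mse:.6f}")
# R-squared = 0.9983, Root MSE = 5.939241
```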

13 Residual Diagnostics: Normality. Insufficient evidence to reject the null hypothesis that the residuals are normally distributed in the population. A bit of a heavy lower tail in the residual distribution, but there's not much to say given the low sample size.

14 Residual Diagnostics: Heteroscedasticity and Autocorrelation. Because we have time-series data, we might begin to ask about autocorrelation. A look at the residuals seems to hint at heteroscedasticity, but it is difficult to claim with this small sample size. Is this consistent with greater measurement error at the center of raw-score test scales (test theory), with error reduced towards the asymptote? Adjacent residuals do show signs of being correlated: negatives tend to predict adjacent negatives, and positives tend to predict adjacent positives.
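The "adjacent residuals predict each other" pattern is what the lag-1 autocorrelation of the residual series measures. A small Python sketch on a hypothetical residual series with the runs of same-signed values described above:

```python
import numpy as np

def lag1_autocorr(resid):
    """Sample lag-1 autocorrelation of a residual series."""
    r = np.asarray(resid, dtype=float) - np.mean(resid)
    return float((r[:-1] @ r[1:]) / (r @ r))

# Hypothetical residuals with runs of negatives and positives (positive
# autocorrelation), versus a strictly alternating series (negative):
runs = [-3, -2, -4, -1, 2, 3, 1, 4, 2, -1, -3, -2]
alternating = [2, -2, 2, -2, 2, -2, 2, -2, 2, -2, 2, -2]
print(f"runs: {lag1_autocorr(runs):.3f}, alternating: {lag1_autocorr(alternating):.3f}")
```

The conventional Durbin-Watson statistic is approximately 2(1 − ρ̂₁), so values well below 2 signal the positive autocorrelation seen here.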

15 Polynomial Regression: Interacting Variables with Themselves
“I test the following hypotheses… wives' percentage of income is associated with divorce in an inverted U-shaped curve such that the odds of divorce are highest when spouses' economic contributions are similar.” Source: Rogers, S. J. (2004). Dollars, dependency, and divorce: Four perspectives on the role of wives' income. Journal of Marriage and Family, 66, 59-74.
The quadratic model, Y = β0 + β1X + β2X², allows a predictor's effect to differ according to levels of that predictor. The test on β2 provides a test of whether the quadratic term (model) is necessary. All quadratics are non-monotonic: they both rise and fall (or fall and rise). However, quadratic regression can fit monotonic curves as well, when the curve's turning point falls outside the observed range of X. As with all interactions, we have to be careful about extrapolation.
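A minimal Python sketch of the quadratic model on synthetic inverted-U data (variable names and numbers are hypothetical, not Rogers's data); the turning point at X = −β1/(2β2) is where the fitted outcome peaks:

```python
import numpy as np

# Synthetic inverted-U: outcome peaks when the wife's share of income is near 50%.
x = np.linspace(0.0, 100.0, 51)                 # hypothetical % of couple income
y = 10.0 - 0.02 * (x - 50.0) ** 2               # equals -0.02*x**2 + 2*x - 40

b2, b1, b0 = np.polyfit(x, y, deg=2)            # fit Y = b0 + b1*X + b2*X^2
print(f"b2 = {b2:.4f} (negative: inverted U), peak at X = {-b1 / (2 * b2):.1f}")

# "A predictor interacting with itself": the marginal effect dY/dX = b1 + 2*b2*X
# depends on the level of X, rising before the peak and falling after it.
for xv in (25.0, 50.0, 75.0):
    print(f"marginal effect at X = {xv}: {b1 + 2 * b2 * xv:.2f}")
```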

16 Residual Diagnostics: Heteroscedasticity and Autocorrelation (continued).

17 Higher-Order Polynomials: Less Rational Than Empirical
(Fitted curves of increasing order: linear, quadratic, cubic, quartic.)
A quadratic model may have a loose argument for being theory-driven, but polynomial regression is largely a data-driven exercise. An advantage of polynomial regression over Box-Cox is a built-in framework for testing the hypothesis that an additional order added to the polynomial is useful for prediction.
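That built-in framework is the nested-model F-test: compare the polynomial of order k − 1 against order k and test whether the one added term reduces SSE more than chance would. A Python sketch on synthetic data that is truly quadratic (all names and numbers hypothetical):

```python
import numpy as np
from scipy import stats

def poly_sse(x, y, deg):
    """Residual sum of squares for a polynomial fit of the given order."""
    coef = np.polyfit(x, y, deg)
    return float(np.sum((y - np.polyval(coef, x)) ** 2))

def added_order_test(x, y, deg):
    """F-test of order deg-1 (restricted) vs. order deg (full): one added term."""
    sse_r, sse_f = poly_sse(x, y, deg - 1), poly_sse(x, y, deg)
    df_f = len(x) - (deg + 1)                   # residual df of the full model
    F = (sse_r - sse_f) / (sse_f / df_f)
    return F, float(stats.f.sf(F, 1, df_f))

rng = np.random.default_rng(1)
x = np.linspace(-3.0, 3.0, 60)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 1, x.size)  # truly quadratic

results = {}
for deg in (2, 3):                              # linear->quadratic, quadratic->cubic
    results[deg] = added_order_test(x, y, deg)
    print(f"order {deg - 1} -> {deg}: F = {results[deg][0]:.2f}, p = {results[deg][1]:.4g}")
```

On data like these, the quadratic term is overwhelmingly significant while the cubic term adds little, which is the empirical stopping rule in action.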

