
Introduction to Template Model Builder, an improved tool for maximum likelihood and mixed-effects models in R. James Thorson.


1 Introduction to Template Model Builder, an improved tool for maximum likelihood and mixed-effects models in R. James Thorson

2 Initial acknowledgements
Kasper Kristensen1, Hans Skaug2, Anders Nielsen3, Casper Berg3
1 DTU Compute, Technical University of Denmark
2 Department of Mathematics, University of Bergen
3 DTU Aqua, Technical University of Denmark
[Show GitHub]

3 Outline Likelihood statistics Automatic differentiation TMB examples

4 Likelihood statistics
Outline Likelihood statistics Automatic differentiation TMB examples
Thorson, J.T., Hicks, A.C., and Methot, R. In press. Random effect estimation of time-varying factors in Stock Synthesis. ICES J. Mar. Sci. doi: /icesjms/fst211.
Thorson, J.T., and Minto, C. In press. Mixed effects: a unifying framework for modelling in fisheries biology. ICES J. Mar. Sci.

5 Likelihood statistics
1. Postulate the probability of data D given model M and parameters θ:
Pr(D | M, θ) = f(D, M, θ)
2. Switch the conditioning around to define the likelihood:
L(θ | D, M) = f(D, M, θ) + c (the constant c does not affect maximization)
3. Maximize the likelihood with respect to the fixed parameters:
θ̂ = argmax_θ L(θ | D, M)
Uncertainty measures: information (inverse-Hessian) matrix; likelihood profile; bootstrapping
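The deck's examples use R and TMB; purely as a standalone sketch of step 3 (maximize a likelihood, then read approximate standard errors off the inverse Hessian), here is a minimal Python/SciPy illustration with simulated data. All names and values are hypothetical, not from the deck.

```python
import numpy as np
from scipy.optimize import minimize

# Simulate data D from a Normal(5, 2) model (hypothetical example)
rng = np.random.default_rng(1)
D = rng.normal(loc=5.0, scale=2.0, size=500)

def nll(par):
    """Negative log-likelihood; par = (mu, log_sigma)."""
    mu, log_sigma = par
    sigma = np.exp(log_sigma)  # log link keeps sigma positive
    return 0.5 * np.sum(((D - mu) / sigma) ** 2) + D.size * log_sigma

# Maximize the likelihood (minimize the negative log-likelihood)
opt = minimize(nll, x0=[0.0, 0.0], method="BFGS")
mu_hat, sigma_hat = opt.x[0], np.exp(opt.x[1])

# Approximate standard errors from the inverse Hessian (information matrix)
se = np.sqrt(np.diag(opt.hess_inv))
```

The MLEs should recover the sample mean and (divide-by-n) standard deviation.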

6 Likelihood statistics
Sometimes (almost always!) parameters have a hierarchy
Hierarchical modelling – formulating a model in which parameters are estimated to arise from a process governed by other parameters
Parameters arising from the hierarchy are no longer “fixed”!
It is then unclear how to write the probability of the data

7 Likelihood statistics
Solution → “data augmentation”
Introduce “latent” variables:
Pr(D | M, θ, ε) = f(D, M, θ, ε), where ε is a latent (augmented) variable
Calculate the marginal likelihood of the parameters by integrating across the random effects:
Pr(D | M, θ) = ∫ f(D, M, θ, ε) dε
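As a numeric check of this marginalization (a Python sketch with hypothetical values, not from the deck): take one observation y ~ Normal(ε, 1) with latent ε ~ Normal(θ, τ²). The integral then has a closed form, Normal(θ, 1 + τ²), against which quadrature can be verified.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

y, theta, tau = 1.3, 0.0, 0.5  # hypothetical data and parameters

# Pr(D | M, theta) = integral of f(D | eps) * Pr(eps | theta) d(eps)
integrand = lambda eps: norm.pdf(y, loc=eps, scale=1.0) * norm.pdf(eps, loc=theta, scale=tau)
marginal_numeric, _ = quad(integrand, -10, 10)

# Closed form for this Normal-Normal model: y ~ Normal(theta, 1 + tau^2)
marginal_exact = norm.pdf(y, loc=theta, scale=np.sqrt(1 + tau**2))
```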

8 Likelihood statistics
Given the joint probability of the data and random effects, f(D, M, θ, ε)
Estimate the ‘fixed’ parameters θ via maximization, marginalizing across the random variables:
θ̂ = argmax_θ ( ∫ L(θ1 | D, M, ε) Pr(ε | M, θ2) dε )
Marginalize – take a weighted average of likelihoods, where the weights are given by the probability of the random effects, Pr(ε | M, θ)

9 Likelihood statistics
Why would you make a hierarchy of parameters?
Stein’s paradox and shrinkage – pooling parameters towards a mean will be more accurate on average (e.g., batting averages and proportions of US car sales; Efron and Morris 1977)
Biological intuition – formulate models based on knowledge of constituent parts (Burnham and Anderson 2008)
Variance partitioning – separate different sources of variability (e.g., measurement errors!)

10 Likelihood statistics
Predicting random variables
Empirical Bayes – predict the random variables ε using fixed values for θ:
ε̂ = argmax_ε ( Pr(D | ε, M, θ̂1) Pr(ε | M, θ̂2) )
…so fisheries has historically used “penalized likelihood” (Ludwig and Walters 1981):
(θ̂, ε̂) = argmax_{θ,ε} ( L(θ1, ε | D, M) Pr(ε | M, θ2) )
…but this has very little statistical justification (de Valpine and Hilborn 2005)
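The empirical Bayes prediction can be sketched numerically (Python, hypothetical values, not from the deck): with θ held fixed, maximize Pr(D | ε) Pr(ε | θ) over ε. For a Normal-Normal model the maximizer is the familiar precision-weighted (shrinkage) average, which gives us something to check against.

```python
from scipy.optimize import minimize_scalar
from scipy.stats import norm

y, theta, tau = 2.0, 0.0, 0.5  # hypothetical observation and fixed parameters

# Empirical Bayes: maximize the joint density over the latent variable eps
neg_log_joint = lambda e: -(norm.logpdf(y, loc=e, scale=1.0)
                            + norm.logpdf(e, loc=theta, scale=tau))
eps_hat = minimize_scalar(neg_log_joint).x

# Closed-form posterior mode: precision-weighted average of y and theta
shrinkage = (y + theta / tau**2) / (1 + 1 / tau**2)
```

Note how ε̂ is pulled from y = 2.0 towards the hyper-mean θ = 0, illustrating the shrinkage idea from the previous slide.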

11 Likelihood statistics
Estimation
L(θ | D) = ∫ L(θ1, ε | D) π(ε | θ2) dε
L(θ | D) is the (marginal) likelihood
π(ε | θ2) is the hyper-distribution
L(θ1, ε | D) π(ε | θ2) is the “penalized likelihood”

12 Likelihood statistics
Separability – factoring 1 integral into 2+ smaller integrals:
∫ L(θ1, ε | D) π(ε | θ2) dε = ∏_{i=1}^{n} ∫ L(θ1, ε_i | D) π(ε_i | θ2) dε_i
Uses:
Meta-analysis: species are often independent
Time series: years are often “conditionally” independent
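Separability can be verified numerically (a Python sketch with hypothetical values): with two independent groups, one 2-D integral over (ε1, ε2) equals the product of two 1-D integrals.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad, dblquad

y = [0.7, -1.2]          # hypothetical: one observation per independent group
theta, tau = 0.0, 0.8

def g(eps, yi):
    """Joint density contribution for one group: L(eps | y_i) * pi(eps | theta)."""
    return norm.pdf(yi, loc=eps, scale=1.0) * norm.pdf(eps, loc=theta, scale=tau)

# One 2-D integral over both latent variables ...
joint = dblquad(lambda e2, e1: g(e1, y[0]) * g(e2, y[1]), -8, 8, -8, 8)[0]

# ... versus the product of two 1-D integrals (separability)
factored = np.prod([quad(g, -8, 8, args=(yi,))[0] for yi in y])
```

In higher dimensions this factoring is what makes the integral tractable: n one-dimensional integrals instead of one n-dimensional integral.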

13 Likelihood statistics
Laplace approximation
f(z) ≈ f(ẑ) + f′(ẑ)(z − ẑ) + ½ f″(ẑ)(z − ẑ)²
where f(z) is the function to be approximated and ẑ is its mode, so f′(ẑ) = 0 and f″(ẑ) < 0:
e^{f(z)} ≈ e^{f(ẑ)} e^{−½ |f″(ẑ)| (z − ẑ)²}
If f(z) is the log of the joint likelihood, then:
∫ e^{f(z)} dz ≈ e^{f(ẑ)} ∫ e^{−½ |f″(ẑ)| (z − ẑ)²} dz

14 Likelihood statistics
Chi-squared example
Pr(x) = x^{k/2 − 1} e^{−x/2} / c
Taking derivatives of the log-density:
l(x) ∝ (k/2 − 1) log(x) − x/2
l′(x) = (k/2 − 1)/x − 1/2
l″(x) = −(k/2 − 1)/x²
Solving for the mode and Hessian:
l′(x̂) = 0 → x̂ = k − 2
l″(x̂) = −1 / (2(k − 2))
Hence: p(x) ≈ Normal(k − 2, 2(k − 2))
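A quick numeric check of this slide (Python sketch, not from the deck): for k degrees of freedom, the log-kernel's maximum sits at k − 2, matching the derivation, and the Laplace approximation is Normal(k − 2, 2(k − 2)).

```python
import numpy as np
from scipy.stats import chi2, norm

k = 20.0
x = np.linspace(1.0, 60.0, 100_000)

# Log-kernel of the chi-squared density, constants dropped
logpdf = (k / 2 - 1) * np.log(x) - x / 2
mode = x[np.argmax(logpdf)]          # should be near k - 2 = 18

# Laplace (Normal) approximation versus the exact chi-squared density
approx = norm.pdf(x, loc=k - 2, scale=np.sqrt(2 * (k - 2)))
exact = chi2.pdf(x, df=k)
```

For moderate k the two curves are close near the mode; the approximation degrades in the tails and for small k.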

15 [Figure slide] James Thorson (Feb. 28, 2010)

16 Likelihood statistics
Bottom line
log L(θ | D) ≈ log Pr(D | θ, ε̂) − ½ log det(H) (up to an additive constant)
where Pr(D | θ, ε) = L(θ1, ε | D) π(ε | θ2)
and H = −∂²/∂ε² log Pr(D | θ, ε), evaluated at ε = ε̂
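This bottom line can be verified on a model where it is exact (a Python sketch with hypothetical values): when the joint log-likelihood is quadratic in ε, as in a Normal-Normal model, the Laplace approximation recovers the exact marginal log-likelihood.

```python
import numpy as np
from scipy.stats import norm

y, theta, tau = 1.3, 0.0, 0.5  # hypothetical data and parameters

def log_joint(eps):
    """log[ L(theta, eps | D) * pi(eps | theta) ]."""
    return (norm.logpdf(y, loc=eps, scale=1.0)
            + norm.logpdf(eps, loc=theta, scale=tau))

eps_hat = (y + theta / tau**2) / (1 + 1 / tau**2)  # inner optimum (closed form here)
H = 1 + 1 / tau**2                                  # -d^2 log_joint / d eps^2

# Laplace approximation, including the (1/2) log(2*pi) constant
laplace = log_joint(eps_hat) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(H)

# Exact marginal: y ~ Normal(theta, 1 + tau^2)
exact = norm.logpdf(y, loc=theta, scale=np.sqrt(1 + tau**2))
```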

17 Likelihood statistics
Examples of hierarchical models from ecology
Avoiding pseudo-replication: split-plot designs
Tag-resighting: tag histories are “random” and marginalized across
State-space models: R. Millar, de Valpine, etc.
Meta-analyses: Myers’ demonstrations in the late 1990s

18 Automatic differentiation and TMB
Outline Likelihood statistics Automatic differentiation and TMB TMB examples
Fournier, D.A., Skaug, H.J., Ancheta, J., Ianelli, J., Magnusson, A., Maunder, M.N., Nielsen, A., and Sibert, J. 2012. AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optim. Methods Softw. 27: 1–17.

19 Automatic differentiation
Automatic differentiation is NOT:
Symbolic differentiation (e.g., what we learned in intro. calculus)… because this is computationally inefficient
Numerical differentiation (e.g., finite differences)… because this results in rounding errors that propagate in complicated models
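To make the distinction concrete, here is a minimal forward-mode AD sketch using dual numbers (Python, illustrative only; TMB itself builds on CppAD's taped forward/reverse modes). Each Dual carries a value and its derivative, and the chain rule is applied one operation at a time, so the derivative is exact, with no differencing error.

```python
import math

class Dual:
    """A value paired with its derivative; arithmetic applies the chain rule."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def exp(d):
    # chain rule: d/dx exp(f) = exp(f) * f'
    return Dual(math.exp(d.val), math.exp(d.val) * d.der)

# d/dx [x * exp(x) + 3x] at x = 1 is 2e + 3, exactly
x = Dual(1.0, 1.0)          # seed derivative dx/dx = 1
y = x * exp(x) + 3 * x
```

Forward mode propagates one derivative per sweep; reverse mode (what TMB leans on for the gradient of a scalar likelihood) instead propagates all partial derivatives back from the output in a single sweep.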

20 Automatic differentiation
Automatic differentiation has two modes…
Forward mode: the intuitive way of calculating derivatives via the chain rule
Reverse mode: programming voodoo
…and TMB uses both

21 [Figure slide]

22 TMB Steps during optimization
1. Choose initial values for the fixed effects θ and random effects ε
2. “Inner optimization” – optimize the random effects with θ held constant:
ε̂ = argmax_ε L(θ1, ε | D) π(ε | θ2)
3. Calculate the Laplace approximation to the marginal likelihood of the fixed effects:
θ̂ = argmax_θ ( log Pr(D | θ, ε̂) − ½ log det(H) )
TMB also provides the gradient of this approximated marginal likelihood with respect to the fixed effects
4. “Outer optimization” – repeat steps 2–3; the outer optimization is done in R using the function value and gradient provided by TMB
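The inner/outer scheme can be sketched end-to-end on a toy model (Python, hypothetical values; TMB would supply exact AD gradients, whereas this sketch lets the outer optimizer difference numerically): y_i ~ Normal(ε_i, 1) with ε_i ~ Normal(θ, τ²) and τ known, where the inner optimum and curvature have closed forms.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
tau = 1.0
eps_true = rng.normal(0.5, tau, size=200)   # latent random effects
y = rng.normal(eps_true, 1.0)               # one observation per effect

def neg_marginal(theta):
    """Negative Laplace-approximated marginal log-likelihood (constants in theta kept)."""
    # inner optimization: closed form for this Normal-Normal model
    eps_hat = (y + theta / tau**2) / (1 + 1 / tau**2)
    H = 1 + 1 / tau**2                       # curvature, same for every group
    log_joint = (-0.5 * np.sum((y - eps_hat) ** 2)
                 - 0.5 * np.sum((eps_hat - theta) ** 2) / tau**2)
    n = y.size
    return -(log_joint + 0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(H))

# outer optimization over the fixed effect theta
theta_hat = minimize_scalar(neg_marginal).x
```

Because the exact marginal here is y_i ~ Normal(θ, 1 + τ²), the outer optimum should land on the sample mean of y.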

23 Outline Likelihood statistics Automatic differentiation and TMB TMB examples
Examples:
Linear mixed model
Spatial Gompertz model
Catch-curve stock reduction analysis

24 TMB examples [See example 1 – linear mixed model] SEE: “Example1_linear_mixed_model.R”

25 TMB examples [See example 2 – spatial Gompertz model]
SEE: [github]\adcomp\tmb_examples\spatial_gompertz_estimation_example.R
Thorson, J.T., Skaug, H., Kristensen, K., Shelton, A.O., Ward, E.J., Harms, J., and Benante, J. In press. The importance of spatial models for estimating the strength of density dependence. Ecology.

26 Random fields Correlogram!

27 Geostatistical tool
“Conventional” state-space models:
n_{t+1} = f(n_t) + ε_t, with ε_t ~ Normal(0, σ²)
State-space models using random fields:
N_{t+1} = f(N_t) + E_t, with E_t ~ GRF(0, C)
Which means that…
n_{t+1}(x, y) = f(n_t(x, y)) + ε_t(x, y), with ε_t(x, y) ~ MultivariateNormal(0, Σ(x, y))

28 E_t ~ MultivariateNormal(0, Σ)
where the inverse-covariance matrix Σ⁻¹ is
Σ⁻¹ = K M0⁻¹ K, with K = κ² M0 + M1
where:
K is a diffusion operator
M0 is the area around each point
M1 is a measure of adjacency and distance
This reduces to a linear form:
Σ⁻¹ = κ⁴ M0 + 2κ² M1 + M2, where M2 = M1 M0⁻¹ M1
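The expansion of the precision matrix can be checked numerically on a tiny example (Python sketch; the matrices below are toy stand-ins for a 4-node chain, not the finite-element matrices TMB/R-INLA would assemble): K M0⁻¹ K should equal κ⁴ M0 + 2κ² M1 + M2 with M2 = M1 M0⁻¹ M1.

```python
import numpy as np

kappa = 1.5
M0 = np.diag([1.0, 2.0, 1.5, 1.0])            # "area" around each point (diagonal)
M1 = np.array([[ 1, -1,  0,  0],              # adjacency/distance structure (chain graph)
               [-1,  2, -1,  0],
               [ 0, -1,  2, -1],
               [ 0,  0, -1,  1]], dtype=float)

M0_inv = np.linalg.inv(M0)
K = kappa**2 * M0 + M1                        # diffusion operator

# Precision computed directly as a quadratic form ...
Q_direct = K @ M0_inv @ K
# ... versus the expanded linear form
M2 = M1 @ M0_inv @ M1
Q_expanded = kappa**4 * M0 + 2 * kappa**2 * M1 + M2
```

The linear form matters in practice because, with M0 diagonal, all three terms are sparse, so the precision of the Gaussian random field stays sparse even though its covariance Σ is dense.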

29 TMB examples [See example 3 – catch curve stock reduction analysis]
Thorson, J.T., and Cope, J.M. In press. Catch curve stock-reduction analysis: an alternative solution to the catch equation. Fish. Res.

30 Acknowledgements Kasper Kristensen and Hans Skaug for development of core TMB code Dave Fournier for original application of AD in fisheries Eric Ward, Kotaro Ono, Ole Shelton, Darcy Webber, and others for help with spatial modelling and TMB

