1 Introduction to Program MARK Stephen J. Dinsmore, Iowa State University
2 Lecture outline Introduction – modeling and inference; Parameters; Model structure; The input file; PIMs, design matrices and more in MARK; Analysis tips
3 Motivation for modeling What are your goals? To just “analyze data”, or do you seek a deeper understanding of a complex process? What questions are you interested in answering? How will you use the information? By definition, a model is an approximation of truth and not truth itself!
4 Introduction Models, modeling, and estimation. Process: capture, tag, release, recapture. The “art” is balancing effort across each of these categories.
5 Population characteristics Open versus closed populations. Assumptions. Results of assumption violations. Understanding this distinction is a critical step in the modeling process.
6 Methods for marking Leg bands, neck collars; Standard i.d. tags; PIT tags; Radio collars/transmitters; Camera “traps”
7 Encounter techniques Live resightings (mainly birds); Live captures (sturgeon, many others); Dead recoveries (waterfowl)
8 Summarizing encounters Release and recapture data for each animal are summarized in an encounter history. A separate encounter history should be constructed for each animal. Encounter histories consist of strings of 1’s (animal was encountered) and 0’s (not encountered) in most cases.
9 What can we estimate? Survival (S; or apparent survival φ); Population size (N); Emigration/immigration (γ″, γ′); Movement probabilities (δ); Reproduction/recruitment (F); Rate of population change (λ); Occupancy rate (ψ)
10 Models in MARK Live encounters (Cormack-Jolly-Seber); Dead recoveries (band recovery); Joint live and dead encounters; Known fate (radio telemetry); Closed captures; Robust design; Multi-strata; Pradel lambda models; Patch occupancy; Nest survival; And the list is still growing…
11 Features of MARK Parameter estimation (model averaging); Multiple attribute groups (age, sex classes); Individual, group, and time covariates; Unequal time intervals; AIC model selection; Quasi-likelihood theory (over-dispersion)
13 Basic data – encounter histories LLLL: Live recaptures, known fate. This example codes for 4 occasions. LDLD: Joint live-dead recoveries. This example codes for 2 occasions.
14 Live encounters Example of possible outcomes: after release, an animal either stays alive (φ) or is dead or emigrated (1-φ); if alive, it is either seen (p) or not seen (1-p).
15 Live encounters Example – 5 encounter occasions: LLLLL (1=encountered, 0=not encountered). Estimate φ (apparent survival; time 1 to time t-1) and p (conditional capture probability; time 2 to time t). The last φ and the last p are confounded without some constraint on one of them (MARK reports them as a product).
16 Live encounters Example encounter histories 11111 - φ1p2φ2p3φ3p4φ4p5
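The history probability above can be sketched in a few lines of Python. This is only an illustration of the CJS cell probability, conditional on first release; the function name and the parameter values are made up for the example and are not part of MARK.

```python
def cjs_history_probability(history, phi, p):
    """Probability of a CJS encounter history, conditional on first release.

    history: string of '1'/'0', e.g. "11111"
    phi[i]:  apparent survival from occasion i+1 to occasion i+2
    p[i]:    capture probability at occasion i+2
    (both lists have length len(history) - 1)
    """
    n_occasions = len(history)
    first = history.index("1")
    last = history.rindex("1")

    # Between the first and last encounter the animal is known to be alive.
    prob = 1.0
    for i in range(first, last):
        seen = history[i + 1] == "1"
        prob *= phi[i] * (p[i] if seen else 1.0 - p[i])

    # chi: probability of never being seen again after the last encounter
    # (either dead/emigrated, or alive but missed on every later occasion).
    chi = 1.0
    for i in range(n_occasions - 2, last - 1, -1):
        chi = (1.0 - phi[i]) + phi[i] * (1.0 - p[i]) * chi

    return prob * chi


# Hypothetical parameter values, for illustration only.
phi = [0.8, 0.8, 0.8, 0.8]
p = [0.5, 0.5, 0.5, 0.5]
print(cjs_history_probability("11111", phi, p))  # phi1*p2*phi2*p3*phi3*p4*phi4*p5
print(cjs_history_probability("11000", phi, p))
```

For a history with trailing zeros the extra chi term is needed, because the animal may have died, emigrated, or simply been missed.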
17 Model assumptions Tagged individuals are representative of the population of interest. Numbers of releases are known. Tagging is accurate, no tag loss, no misread tags, etc. Releases are “instantaneous” (relative to time interval between releases). Fates of individuals and cohorts are independent. Individuals in each identifiable group (age or sex class, etc.) have the same survival and capture probability.
18 Dead recoveries Example of possible outcomes: after release, an animal either lives (S) or dies (1-S); if it dies, it is either reported (r) or not reported (1-r).
19 Dead recoveries Example – 3 encounter occasions: LDLDLD. Estimate survival (S) and reporting probability (r).
20 Dead recoveries Example encounter histories 100001 - S1S2(1-S3)r3 (1-S3)(1- r3)+S3
21 Joint live-dead Example of possible outcomes: after release, an animal either lives (S) or dies (1-S); if it dies, it is either reported (r) or not reported (1-r); if it lives, it is either seen (p) or not seen (1-p).
22 Joint live-dead Example – 3 encounter occasions: LDLDLD. Estimate survival (S), reporting probability (r), capture probability (p), and fidelity (F).
23 Known fate Mainly used for radio telemetry data where capture probability is 1.0. Example of possible outcomes: after release, an animal either lives (S) or dies (1-S).
24 Known fate Example – 4 encounter occasions: LDLDLDLD. Estimate survival (S) only.
25 Known fate Example encounter histories 10101010 - S1S2S3S4
26 Closed captures Example of possible outcomes: after release, an animal that has not yet been captured is either seen (p) or not seen (1-p) on each occasion; once it has been captured, it is seen again with probability c.
27 Closed captures Example – 4 encounter occasions: LLLL. Estimate initial capture (p) and recapture (c) probabilities and population size (N).
28 Closed captures Example encounter histories 1111 - p1c2c3c4; 0101 - (1-p1)p2(1-c3)c4
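As a rough sketch, the closed-capture cell probabilities can be computed by switching from p to c after the first capture. The function name and the numeric values below are hypothetical; this covers only the history probability, not the full likelihood that MARK maximizes to estimate N.

```python
def closed_history_probability(history, p, c):
    """Probability of a closed-capture encounter history.

    p[i]: initial capture probability on occasion i+1
    c[i]: recapture probability on occasion i+1 (c[0] is never used,
          because an animal cannot be a recapture on the first occasion)
    """
    prob = 1.0
    caught_before = False
    for i, code in enumerate(history):
        rate = c[i] if caught_before else p[i]
        if code == "1":
            prob *= rate
            caught_before = True
        else:
            prob *= 1.0 - rate
    return prob


# Hypothetical values, for illustration only.
p = [0.2, 0.3, 0.4, 0.5]
c = [0.0, 0.6, 0.7, 0.8]
print(closed_history_probability("1111", p, c))  # p1*c2*c3*c4
print(closed_history_probability("0101", p, c))  # (1-p1)*p2*(1-c3)*c4
```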
29 Robust design Useful model that incorporates features of open and closed C-R theory. Can estimate all survival rates (i-1), not just i-2 as with CJS model. Estimate of population size for each primary sampling period. Can estimate temporary emigration (γ).
33 Getting started in MARK The input file. Required: encounter history, frequency, always ends with a ;. Optional: comment area (/* comment */), covariates. Can be coded as 1 individual per line, or summarized with multiple individuals per line or in m-arrays.
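A small Python sketch of the "summarized" form described above: identical encounter histories are collapsed into one line with a frequency and a closing semicolon. It assumes a single group and no covariates; the function name and the example histories are invented for illustration.

```python
from collections import Counter


def summarize_histories(histories):
    """Collapse one-history-per-animal data into 'history frequency;' lines.

    Assumes a single attribute group and no individual covariates.
    """
    counts = Counter(histories)
    # Sort so that the output order is reproducible.
    return [f"{history} {n};" for history, n in sorted(counts.items(), reverse=True)]


# Hypothetical encounter histories for five animals over three occasions.
animals = ["101", "110", "101", "100", "110"]
for line in summarize_histories(animals):
    print(line)
```

With more than one group, each line would carry one frequency column per group before the semicolon.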
34 Examples of input data CJS model 1111110 1 0 ; Robust design ;Robust design;Nest survival/*BYWG B */ ;
35 Program MARK Many of the procedures I’ll demonstrate can be done in more than one way in MARK! Examples include building models using PIMs or design matrices (I prefer the latter) and selecting model(s) for inference.
36 MARK vocabulary PIMs (Parameter Index Matrices); Design matrices; Link functions; AIC (Akaike’s Information Criterion); Model selection
37 Model building in MARK A priori biological hypotheses → PIM → design matrix (β) → link function → real parameters
38 About PIMs PIMs provide one means of constraining the parameters in a model. Each PIM indexes a different parameter for each group (for live recaptures data with 3 groups, there will be 3 PIMs for apparent survival and 3 PIMs for recapture probability). Remember: the values in the PIMs correspond to estimable parameters and NOT the number of occasions. I recommend leaving the PIMs alone, unless you need to add age effects.
43 Design matrices A useful way to further “constrain” the parameters as they appear in the PIMs. The only way to introduce time trends (linear, etc.) and covariates into models. The structure of the design matrix will depend on constraints placed in the PIMs.
44 Design matrices Basic concept: MARK allows the user to apply a linear regression model as a constraint on any parameter (e.g., survival) with the use of a design matrix. Here, the response variable (a rate such as survival) is expressed as a linear regression function of 1 or more factors.
45 Design matrices Basic linear model is Y = Xβ + ε. Y is the response variable (e.g., survival). X is a matrix of “dummy” variables (1’s and 0’s). β is the vector of coefficients (intercept and slopes). ε is a vector of random error terms.
46 Design matrix - example Suppose we want to determine if male and female Mourning Doves have different survival rates. In linear regression, we have Yi = β0 + β1xi + εi. Each variate Yi is the sum of the intercept (β0), the product of the slope (β1) and the variable x (xi), and the random error term (εi). But what is xi? It is a “dummy” variable specifying sex (for example, 0 for male, 1 for female). The test is whether the slope (β1) is different from zero. If β1 is not different from 0, then no sex effect.
47 Design matrices In MARK, rows in the design matrix correspond to the parameters set in the PIMs and columns correspond to the βi. You cannot add any structure to the design matrix that is missing from the PIMs (hence my preference to leave the PIMs alone except to add age effects).
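To make the row/column layout concrete, here is a minimal Python sketch for the dove example with two parameters (male and female survival) and two β's (intercept and sex effect). The β values are invented, and the function is an illustration of the matrix product only, not MARK's internals.

```python
def linear_predictor(design_matrix, betas):
    """Multiply each design-matrix row by the beta vector (X times beta)."""
    return [sum(x * b for x, b in zip(row, betas)) for row in design_matrix]


# Rows = parameters from the PIMs, columns = betas.
# Column 1 is the intercept; column 2 is the sex dummy (0 = male, 1 = female).
design_matrix = [
    [1, 0],  # survival, males
    [1, 1],  # survival, females
]
betas = [0.5, -0.2]  # hypothetical estimates of beta0 and beta1

print(linear_predictor(design_matrix, betas))  # values on the link scale
```

The results are on the link scale; a link function is still needed to turn them into survival rates between 0 and 1.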
48 Link functions The rates (e.g., survival) in the linear regression model must be transformed in MARK. Several transformations are available, each having different properties. In MARK, we will primarily use the logit and sin link functions. MARK is good at setting the default link function for you.
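A short sketch of how the two links map a value on the regression scale back to a rate between 0 and 1. The formulas are the usual logit and sin back-transformations; the function names and the input values are chosen only for illustration.

```python
import math


def inverse_logit(x):
    """Back-transform from the logit scale: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def inverse_sin(x):
    """Back-transform from the sin scale: (sin(x) + 1) / 2."""
    return (math.sin(x) + 1.0) / 2.0


# Hypothetical values on the link scale.
for value in (-2.0, 0.0, 1.5):
    print(value, inverse_logit(value), inverse_sin(value))
```

Both functions return values between 0 and 1, but the logit only approaches 0 and 1 as the input becomes very large in magnitude, while the sin link reaches them at finite values.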
50 Covariates Individual – some unique characteristic of each individual in the population such as body mass at capture or fork length. Group – a characteristic of the entire group such as sex or age class. Time – a unique time-specific characteristic such as river flow data or temperature.
51 Covariates Every study should be incorporating covariates! Some recommendations: Individual covariates often apply to survival (e.g., mass at capture, size measures, fitness measures, habitat, etc.). Group covariates can affect survival (e.g., weather) or capture probability (e.g., effort). Time covariates often influence capture.
52 Goodness-of-fit This is an area where further work is needed. Best overall goodness-of-fit test is in Program RELEASE (included in MARK). GOF for CJS model based on results of Tests 2 and 3 in RELEASE. No good GOF tests for complex models. Ad hoc procedure for robust design.
53 AIC model selection AIC = Akaike’s Information Criterion. It comes from information theory and is one of many ways to objectively assess the relative support for each model in a set. Remember – the AIC best model is not “the model”, but rather is the model within the set that had the best support, given the data.
54 AIC model selection AIC = -2ln(L) + 2K. L is the likelihood of the model and K is the number of parameters in that model. A smaller -2ln(L) means a better fit. The +2K term is a “penalty” for adding more parameters, which offsets the improved fit that extra parameters bring. Message: There is an important trade-off between fit and # of parameters, and AIC provides an objective means of balancing this.
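The formula is simple enough to sketch directly in Python. The log-likelihoods and parameter counts below are invented, and the model names are only labels for the example.

```python
def aic(log_likelihood, k):
    """AIC = -2 ln(L) + 2K."""
    return -2.0 * log_likelihood + 2.0 * k


def delta_aic(aic_values):
    """Difference between each model's AIC and the smallest AIC in the set."""
    best = min(aic_values)
    return [value - best for value in aic_values]


# Hypothetical (log-likelihood, number of parameters) pairs.
models = {"phi(.) p(.)": (-160.0, 2), "phi(t) p(.)": (-158.5, 5)}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
print(scores)
print(delta_aic(list(scores.values())))
```

In this made-up case the time-varying model fits slightly better but pays a larger penalty for its extra parameters.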
55 Quasi-likelihood theory QAIC - a way to account for over-dispersion in the data. Over-dispersion results from a lack of independence, e.g., animals that travel in family groups. In MARK, we use ad hoc procedures to estimate ĉ (a variance inflation factor). Result: variance is inflated.
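A minimal sketch of the adjustment, assuming the standard form QAIC = -2ln(L)/ĉ + 2K and standard errors inflated by the square root of ĉ. The function names and numbers are hypothetical.

```python
import math


def qaic(log_likelihood, k, c_hat):
    """QAIC = -2 ln(L) / c_hat + 2K, for a variance inflation factor c_hat."""
    return -2.0 * log_likelihood / c_hat + 2.0 * k


def inflated_se(se, c_hat):
    """Standard error after inflating the variance by c_hat."""
    return se * math.sqrt(c_hat)


# Hypothetical values: log-likelihood -50, 3 parameters, c_hat = 2.
print(qaic(-50.0, 3, 2.0))
print(inflated_se(0.1, 2.0))
```

With ĉ = 1 the QAIC reduces to the ordinary AIC and the standard errors are unchanged.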
56 Model averaging Which parameter estimate do you report when you have estimates from 10 models? The estimate from the best model? All estimates? Or, some “average”? Model averaging incorporates this model selection uncertainty into parameter estimates. Best used when there are several competing models (Δ-AIC <2).
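One common way to form the “average” is to weight each model's estimate by its Akaike weight. The sketch below assumes that approach; the AIC values and survival estimates are invented, and the variance of the averaged estimate is not computed here.

```python
import math


def akaike_weights(aic_values):
    """Relative support for each model, scaled to sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (value - best)) for value in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def model_averaged_estimate(estimates, aic_values):
    """Weighted mean of per-model estimates, using Akaike weights."""
    weights = akaike_weights(aic_values)
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical survival estimates from three competing models.
aic_values = [324.0, 325.2, 329.0]
estimates = [0.62, 0.66, 0.70]
print(akaike_weights(aic_values))
print(model_averaged_estimate(estimates, aic_values))
```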
57 Number of parameters With complex models, MARK has a hard time correctly counting the number of parameters when parameter estimates are close to a boundary (e.g., near 0 or 1). Sin link function is best, logit link function sometimes performs poorly. Message: always check MARK to be sure parameters are counted correctly.
58 Model notation Describe models concisely (limited space in MARK). Some basic nomenclature: Full time variation (t); Linear time trends (T); No variation (.); Additive effects (t+temperature); Multiplicative effects (group*t)
59 Model notation Examples: φ(.) means φ1 = φ2 = φ3 …; φ(t) means φ1, φ2, … φt-1; φ(t+Mass) means φ1, φ2, … φt-1 are each a function of body mass
60 Model notation Keep model names simple, but descriptive. Parameters are written with sources of variation listed in parentheses: φ(t) p(t); φ(T+Weight) p(t+effort); φ(t*group) p(T)
61 How do we “test” for effects? For example, how would we know that weight influenced the survival of a bird? Need to consider models with and without weight. Model selection results: Are models with weight among the “best” in the model set? Look at the β for weight – does its confidence interval overlap zero? Likelihood ratio tests. Remember this is all conditional on the model set.
62 Developing candidate models Inference is conditional on the set of models we consider. Considerable effort should go into developing a concise set of models for consideration. How many? Typically, 5-20 models will suffice. Models should address realistic questions and should not include factors known to be unimportant. Message: use what you already know.
63 Study design considerations Trade-off between sample marked and recapture probability. What is an adequate sample size? Consider question to be asked – estimate population size, or survival, or lambda?
64 Discussion Additional discussion topics: Model assumptions and the results of assumption violations. Developing the set of candidate models. Selecting the appropriate model for analyses. Others? This afternoon – an example in MARK.
66 Patch occupancy Presence-absence data. Define a “patch” – ponds, islands, plots, etc. Multiple visits to each site. Assumes closure during sampling period. Parameters: ψ, p, and ε and γ (robust design only)
67 Patch occupancy Modeling details: Handles missing visits (coded as “.” in EH). Covariates. Robust design formulations: Psi and epsilon; Psi and gamma; Psi(1), epsilon, gamma
68 Nest survival Required data for each nest: The day the nest was found (k). The last day the nest was checked alive (l). The last day the nest was checked (m). Nest fate (0 = successful, 1 = failure) (f). The number of nests with this encounter history.
70 Coding the data Coding the triplet k, l, and m: k=1, l=3, m=5, fate=1 → S1S2[1-S3S4]; k=1, l=3, m=3, fate=0 → S1S2; k=1, l=3, m=3, fate=1 is invalid (can’t be alive and dead on day 3); k=1, l=1, m=3, fate=1 → [1-S1S2]; k=1, l=1, m=1, fate=0 or 1 is invalid (nest was active only on day 1). See MARK help file for more details.
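The coding rules above can be sketched as a small Python function that returns the probability of a nest's record or rejects an invalid triplet. The function name and the daily survival rates are hypothetical, and the rule that a successful nest must have l equal to m is an assumption made for this sketch.

```python
import math


def nest_history_probability(k, l, m, fate, dsr):
    """Probability of one nest record given daily survival rates.

    k, l, m: day found, last day checked alive, last day checked
    fate:    0 = successful, 1 = failure
    dsr[d-1]: survival from day d to day d+1
    """
    if not (1 <= k <= l <= m):
        raise ValueError("need 1 <= k <= l <= m")
    if k == m:
        raise ValueError("nest was active on one day only")
    if fate == 1 and l == m:
        raise ValueError("nest cannot be alive and dead on the same day")
    if fate == 0 and l != m:
        raise ValueError("assumed here: a successful nest has l == m")

    # Known alive from day k to day l.
    survived = math.prod(dsr[d - 1] for d in range(k, l))
    if fate == 0:
        return survived
    # Failed at some point between day l and day m.
    return survived * (1.0 - math.prod(dsr[d - 1] for d in range(l, m)))


# Hypothetical daily survival rates S1..S4.
dsr = [0.9, 0.8, 0.7, 0.6]
print(nest_history_probability(1, 3, 5, 1, dsr))  # S1*S2*[1 - S3*S4]
print(nest_history_probability(1, 3, 3, 0, dsr))  # S1*S2
print(nest_history_probability(1, 1, 3, 1, dsr))  # [1 - S1*S2]
```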
71 Model assumptions Homogeneity of daily nest survival rates. Nest fates are independent. All visits to nests are recorded. Nest discovery and subsequent checks do not influence nest survival. Nest checks are independent of fate. Nest fates are correctly determined. Age of nest at discovery.
72 Estimate nesting success? For constant nest survival, period success is the daily survival rate (DSR) raised to the power of the period length. What happens if there is: Temporal variation in nest survival? Covariates? A combination of both?
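A brief sketch of the two cases in Python. The constant case follows the statement above; for temporal variation it assumes period success is the product of the day-specific rates, so the answer depends on which days the nest is active. The function names and rates are made up.

```python
import math


def period_success_constant(dsr, period_length):
    """Period success when daily survival is constant: DSR ** days."""
    return dsr ** period_length


def period_success_varying(daily_rates):
    """Period success when daily survival varies: product of daily rates."""
    return math.prod(daily_rates)


# Hypothetical example: a 10-day period.
print(period_success_constant(0.95, 10))

# Hypothetical season in which daily survival improves over time.
season = [0.90 + 0.005 * day for day in range(20)]
print(period_success_varying(season[0:10]))   # early 10-day interval
print(period_success_varying(season[10:20]))  # late 10-day interval
```

The two 10-day intervals give different answers, which is why the start date matters once survival varies over time.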
73 Temporal variation Which 10-day interval provides the “best” estimate of nest success? (Figure: two candidate 10-day intervals.)
74 Getting the “best” estimate Need a start date for “best” estimate – when? Does simple mean work? What about bias between observed and true nest initiation dates? Use Horvitz-Thompson estimator to correct for this bias.
75 Other considerations Stage-specific survival. Divide EH into parts: Incubation – ; Nestling – ; Nest age
76 Model-based predictions MARK provides a regression equation that can be used for predictions. Logit Smale = *1, Smale = ; Logit Sfemale = *0, Sfemale =