1 Introduction to Program MARK. Stephen J. Dinsmore, Iowa State University
2 Lecture outline: Introduction (modeling and inference); Parameters; Model structure; The input file; PIMs, design matrices and more in MARK; Analysis tips
3 Motivation for modeling: What are your goals? To just “analyze data”, or do you seek a deeper understanding of a complex process? What questions are you interested in answering? How will you use the information? By definition, a model is an approximation of truth and not truth itself!
4 Introduction: Models, modeling, and estimation. Process: capture, tag, release, recapture. The “art” is balancing effort across each of these categories.
5 Population characteristics: Open versus closed populations. Assumptions. Results of assumption violations. Understanding this distinction is a critical step in the modeling process.
6 Methods for marking: Leg bands, neck collars; Standard i.d. tags; PIT tags; Radio collars/transmitters; Camera “traps”
7 Encounter techniques: Live resightings (mainly birds); Live captures (sturgeon, many others); Dead recoveries (waterfowl)
8 Summarizing encounters: Release and recapture data for each animal are summarized in an encounter history. A separate encounter history should be constructed for each animal. In most cases, encounter histories consist of strings of 1’s (animal was encountered) and 0’s (not encountered).
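As a rough illustration of how encounter histories are assembled, the sketch below turns a list of sightings into one 0/1 string per animal. The record layout (animal id, occasion number) and the ids are hypothetical, not something MARK prescribes.

```python
from collections import defaultdict


def build_encounter_histories(records, n_occasions):
    """Turn (animal_id, occasion) records into 0/1 encounter-history strings.

    Occasions are numbered 1..n_occasions (an assumption of this sketch).
    """
    seen = defaultdict(set)
    for animal_id, occasion in records:
        if not 1 <= occasion <= n_occasions:
            raise ValueError(f"occasion {occasion} outside 1..{n_occasions}")
        seen[animal_id].add(occasion)
    # One history per animal: "1" if encountered on that occasion, else "0".
    return {
        animal_id: "".join(
            "1" if t in occasions else "0" for t in range(1, n_occasions + 1)
        )
        for animal_id, occasions in seen.items()
    }


# Hypothetical field records: (animal id, occasion on which it was encountered).
records = [("A01", 1), ("A01", 3), ("A02", 2), ("A02", 3), ("A02", 4)]
histories = build_encounter_histories(records, n_occasions=4)
for animal_id, history in sorted(histories.items()):
    print(animal_id, history)
```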
9 What can we estimate? Survival (S, or apparent survival φ); Population size (N); Emigration/immigration (γ″, γ′); Movement probabilities (δ); Reproduction/recruitment (F); Rate of population change (λ); Occupancy rate (ψ)
10 Models in MARK: Live encounters (Cormack-Jolly-Seber); Dead recoveries (band recovery); Joint live and dead encounters; Known fate (radio telemetry); Closed captures; Robust design; Multi-strata; Pradel lambda models; Patch occupancy; Nest survival; and the list is still growing…
11 Features of MARK: Parameter estimation (model averaging); Multiple attribute groups (age, sex classes); Individual, group, and time covariates; Unequal time intervals; AIC model selection; Quasi-likelihood theory (over-dispersion)
13 Basic data (encounter histories): LLLL: live recaptures, known fate; this example codes for 4 occasions. LDLD: joint live-dead recoveries; this example codes for 2 occasions.
14 Live encounters: Example of possible outcomes. Release → Live (φ) → Seen (p) or Not seen (1-p). Release → Dead or emigrated (1-φ).
15 Live encounters: Example with 5 encounter occasions, LLLLL (1=encountered, 0=not encountered). Estimate φ (apparent survival; time 1 to time t-1) and p (conditional capture probability; time 2 to time t). The last φ and the last p are confounded without some constraint on one of them (MARK reports them as a product).
16 Live encounters: Example encounter histories. 11111: φ1 p2 φ2 p3 φ3 p4 φ4 p5
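The probability of any live-encounter history can be written the same way as the 11111 example. The sketch below is an illustration under the standard CJS setup, conditioning on first capture; the function name and the parameter values are made up. Here `phi[j]` is apparent survival over interval j+1 and `p[j]` is the capture probability at occasion j+2, so `phi[0]` is φ1 and `p[0]` is p2.

```python
def cjs_history_probability(history, phi, p):
    """Probability of a CJS encounter history, conditional on first capture."""
    n = len(history)
    if len(phi) != n - 1 or len(p) != n - 1:
        raise ValueError("phi and p must each have len(history) - 1 entries")
    first = history.index("1")
    last = history.rindex("1")
    prob = 1.0
    # Between first and last capture the animal is known to be alive.
    for j in range(first, last):
        prob *= phi[j]
        prob *= p[j] if history[j + 1] == "1" else 1.0 - p[j]
    # chi: probability of never being seen again after the last capture
    # (died or emigrated, or survived but was missed on every later occasion).
    chi = 1.0
    for j in range(n - 2, last - 1, -1):
        chi = (1.0 - phi[j]) + phi[j] * (1.0 - p[j]) * chi
    return prob * chi


# Hypothetical parameter values for 5 occasions (4 intervals).
phi = [0.8, 0.8, 0.8, 0.8]
p = [0.5, 0.5, 0.5, 0.5]
print(cjs_history_probability("11111", phi, p))  # phi1*p2*phi2*p3*phi3*p4*phi4*p5
print(cjs_history_probability("11000", phi, p))
```

A quick sanity check on a function like this is that the probabilities of all histories sharing the same release occasion sum to 1.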
17 Model assumptions: Tagged individuals are representative of the population of interest. Numbers of releases are known. Tagging is accurate, no tag loss, no misread tags, etc. Releases are “instantaneous” (relative to time interval between releases). Fates of individuals and cohorts are independent. Individuals in each identifiable group (age or sex class, etc.) have the same survival and capture probability.
18 Dead recoveries: Example of possible outcomes. Release → Live (S). Release → Die (1-S) → Reported (r) or Not reported (1-r).
19 Dead recoveries: Example with 3 encounter occasions, LDLDLD. Estimate survival (S) and reporting probability (r).
20 Dead recoveries: Example encounter histories. 100001: S1 S2 (1-S3) r3. 000010: (1-S3)(1-r3) + S3
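The dead-recovery probabilities can be sketched in code as well. This is a minimal illustration assuming the S and r parameterization used on these slides and LD-coded histories; the helper names and numbers are hypothetical.

```python
def parse_ld(history):
    """Split an LD history into (release occasion, recovery occasion or None)."""
    pairs = [history[i:i + 2] for i in range(0, len(history), 2)]
    release = next(i + 1 for i, pair in enumerate(pairs) if pair[0] == "1")
    recovery = next((i + 1 for i, pair in enumerate(pairs) if pair[1] == "1"), None)
    return release, recovery


def recovery_probability(history, S, r):
    """Probability of an LD dead-recovery history given S[t-1], r[t-1]."""
    release, recovery = parse_ld(history)
    n = len(S)

    def recovered_in(j):
        # Survive intervals release..j-1, die in interval j, and be reported.
        alive = 1.0
        for t in range(release, j):
            alive *= S[t - 1]
        return alive * (1.0 - S[j - 1]) * r[j - 1]

    if recovery is not None:
        return recovered_in(recovery)
    # Never recovered: everything except a reported death in some interval.
    return 1.0 - sum(recovered_in(j) for j in range(release, n + 1))


# Hypothetical values for 3 occasions.
S = [0.7, 0.8, 0.9]
r = [0.1, 0.2, 0.3]
print(recovery_probability("100001", S, r))  # S1*S2*(1-S3)*r3
print(recovery_probability("000010", S, r))  # (1-S3)*(1-r3) + S3
```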
21 Joint live-dead: Example of possible outcomes. Release → Live (S) → Seen (p) or Not seen (1-p). Release → Die (1-S) → Reported (r) or Not reported (1-r).
22 Joint live-dead: Example with 3 encounter occasions, LDLDLD. Estimate survival (S), reporting probability (r), capture probability (p), and fidelity (F).
23 Known fate: Mainly used for radio telemetry data where capture probability is 1.0. Example of possible outcomes: Release → Live (S). Release → Die (1-S).
24 Known fate: Example with 4 encounter occasions, LDLDLDLD. Estimate survival (S) only.
25 Known fate: Example encounter histories. 10101010: S1 S2 S3 S4
26 Closed captures: Example of possible outcomes. Release → Seen (p) → on the next occasion, Seen (c) or Not seen (1-c). Release → Not seen (1-p) → on the next occasion, Seen (p) or Not seen (1-p).
27 Closed captures: Example with 4 encounter occasions, LLLL. Estimate initial capture (p) and recapture (c) probabilities and population size (N).
28 Closed captures: Example encounter histories. 1111: p1 c2 c3 c4. 0101: (1-p1) p2 (1-c3) c4
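The rule behind these closed-capture expressions is that p applies until an animal is first caught and c applies afterwards. A small sketch of that rule, with hypothetical values (`c[0]` is a placeholder because no recapture is possible on the first occasion):

```python
def closed_history_probability(history, p, c):
    """Probability of a closed-capture history with initial capture p[t]
    and recapture c[t] probabilities (index 0 is occasion 1)."""
    prob = 1.0
    caught_before = False
    for t, h in enumerate(history):
        rate = c[t] if caught_before else p[t]
        prob *= rate if h == "1" else 1.0 - rate
        if h == "1":
            caught_before = True
    return prob


# Hypothetical values for 4 occasions; c[0] is never used.
p = [0.3, 0.4, 0.5, 0.6]
c = [0.0, 0.7, 0.8, 0.9]
print(closed_history_probability("1111", p, c))  # p1*c2*c3*c4
print(closed_history_probability("0101", p, c))  # (1-p1)*p2*(1-c3)*c4
```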
29 Robust design: Useful model that incorporates features of open and closed C-R theory. With i primary occasions, can estimate all i-1 survival rates, not just i-2 as with the CJS model. Estimate of population size for each primary sampling period. Can estimate temporary emigration (γ).
33 Getting started in MARK: The input file. Required: encounter history, frequency; each record always ends with a semicolon (;). Optional: comment area (/* comment */), covariates. Can be coded as 1 individual per line, or summarized with multiple individuals per line or in m-arrays.
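A summarized input file can be generated from individual histories with a few lines of code. The sketch below assumes the layout described above (history, one frequency column per group, closing semicolon); the group names, data, and function name are hypothetical.

```python
from collections import Counter


def format_inp_records(histories_by_group, group_order):
    """Summarize histories into 'history freq_group1 freq_group2 ;' records."""
    counts = {g: Counter(histories_by_group.get(g, [])) for g in group_order}
    all_histories = sorted({h for c in counts.values() for h in c}, reverse=True)
    lines = []
    for h in all_histories:
        # A Counter returns 0 for histories a group never produced.
        freqs = " ".join(str(counts[g][h]) for g in group_order)
        lines.append(f"{h} {freqs} ;")
    return lines


# Hypothetical data: one history string per animal, keyed by group.
data = {
    "male": ["1111110", "1010000", "1010000"],
    "female": ["1010000"],
}
lines = ["/* hypothetical data: history, males, females */"]
lines += format_inp_records(data, ["male", "female"])
print("\n".join(lines))
```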
34 Examples of input data: CJS model: 1111110 1 0 ; Robust design: ; Robust design: ; Nest survival: /*BYWG B */ ;
35 Program MARK: Many of the procedures I’ll demonstrate can be done in more than one way in MARK! Examples include building models using PIMs or design matrices (I prefer the latter) and selecting model(s) for inference.
36 MARK vocabulary: PIMs (Parameter Index Matrices); Design matrices; Link functions; AIC (Akaike’s Information Criterion); Model selection
37 Model building in MARK: A priori biological hypotheses → PIM → design matrix (β) → link function → real parameters
38 About PIMs: PIMs provide one means of constraining the parameters in a model. Each PIM indexes a different parameter for each group (for live recaptures data with 3 groups, there will be 3 PIMs for apparent survival and 3 PIMs for recapture probability). Remember: the values in the PIMs correspond to estimable parameters and NOT the number of occasions. I recommend leaving the PIMs alone, unless you need to add age effects.
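To make the idea of a PIM concrete, the sketch below builds the upper-triangular index matrices for a live-recapture data set with 5 occasions: a fully time-dependent PIM and a constant PIM. The function names are illustrative; the point is that each entry is a parameter index, not an occasion number.

```python
def time_pim(n_occasions, start=1):
    """Fully time-dependent PIM: each column (interval) gets its own index."""
    n = n_occasions - 1
    return [[start + col for col in range(row, n)] for row in range(n)]


def constant_pim(n_occasions, index=1):
    """Constant PIM: every cell points at the same parameter index."""
    n = n_occasions - 1
    return [[index] * (n - row) for row in range(n)]


def format_pim(pim):
    """Right-align rows so the triangle looks like a PIM window."""
    width = len(pim[0])
    return "\n".join(
        "   " * (width - len(row)) + " ".join(f"{v:2d}" for v in row)
        for row in pim
    )


phi_pim = time_pim(5, start=1)  # survival indexed 1..4
p_pim = time_pim(5, start=5)    # recapture indexed 5..8
print(format_pim(phi_pim))
print(format_pim(p_pim))
print(format_pim(constant_pim(5, index=1)))
```

Rows are release cohorts and columns are intervals, so setting all cells in a column to the same index is what makes the parameter depend on time only.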
43 Design matrices: A useful way to further “constrain” the parameters as they appear in the PIMs. The only way to introduce time trends (linear, etc.) and covariates into models. The structure of the design matrix will depend on constraints placed in the PIMs.
44 Design matrices: Basic concept: MARK allows the user to apply a linear regression model as a constraint on any parameter (e.g., survival) with the use of a design matrix. Here, the response variable (a rate such as survival) is expressed as a linear regression function of 1 or more factors.
45 Design matrices: Basic linear model is Y = Xβ + ε. Y is the response variable (e.g., survival). X is a matrix of “dummy” variables (1’s and 0’s). β is a vector of coefficients (intercept and slopes). ε is a vector of random error terms.
46 Design matrix - example: Suppose we want to determine if male and female Mourning Doves have different survival rates. In linear regression, we have Yi = β0 + β1xi + εi. Each variate Yi is the sum of the intercept (β0), the product of the slope (β1) and the variable x (xi), and the random error term (εi). But what is xi? It is a “dummy” variable specifying sex (for example, 0 for male, 1 for female). The test is whether the slope (β1) is different from zero. If β1 is not different from 0, then there is no evidence of a sex effect.
47 Design matrices: In MARK, rows in the design matrix correspond to the parameters set in the PIMs and columns correspond to the βi. You cannot add any structure to the design matrix that is missing from the PIMs (hence my preference to leave the PIMs alone except to add age effects).
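A short sketch of the row and column logic: each row of a design matrix stands for one real parameter from the PIM, each column for one β, and the linear predictor for a row is the sum of its entries times the βs. The matrices and β values below are hypothetical examples for 4 survival intervals, not MARK output.

```python
def linear_predictor(design_matrix, beta):
    """Multiply each design-matrix row by the beta vector (X times beta)."""
    for row in design_matrix:
        if len(row) != len(beta):
            raise ValueError("each row needs one entry per beta")
    return [sum(x * b for x, b in zip(row, beta)) for row in design_matrix]


n_intervals = 4
# Constant model, phi(.): one column of 1's, one beta.
dm_constant = [[1] for _ in range(n_intervals)]
# Linear trend, phi(T): intercept column plus a column counting intervals.
dm_trend = [[1, t] for t in range(1, n_intervals + 1)]
# Full time variation, phi(t): identity matrix, one beta per interval.
dm_time = [[1 if i == j else 0 for j in range(n_intervals)]
           for i in range(n_intervals)]

# Hypothetical beta values on the link scale.
eta_trend = linear_predictor(dm_trend, [0.5, -0.1])
print(eta_trend)
```

The values returned are on the link scale and still need to be back-transformed to obtain rates between 0 and 1.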
48 Link functions: The rates (e.g., survival) in the linear regression model must be transformed in MARK. Several transformations are available, each having different properties. In MARK, we will primarily use the logit and sin link functions. MARK is good at setting the default link function for you.
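For reference, the two links mentioned here can be written out as back-transformations from the linear predictor to a rate between 0 and 1. This is a sketch using the usual definitions (logit: exp(x)/(1+exp(x)); sin: (sin(x)+1)/2); the function names are illustrative.

```python
import math


def inv_logit(x):
    """Back-transform a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(theta):
    """Logit link: log odds of a probability strictly between 0 and 1."""
    return math.log(theta / (1.0 - theta))


def inv_sin(x):
    """Back-transform a sin-link value to a probability."""
    return (math.sin(x) + 1.0) / 2.0


# A linear predictor of 0 maps to 0.5 under both links.
print(inv_logit(0.0), inv_sin(0.0))
# Hypothetical beta on the logit scale, back-transformed to a survival rate.
print(inv_logit(1.3))
```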
50 Covariates: Individual: some unique characteristic of each individual in the population such as body mass at capture or fork length. Group: a characteristic of the entire group such as sex or age class. Time: a unique time-specific characteristic such as river flow data or temperature.
51 Covariates: Every study should be incorporating covariates! Some recommendations: Individual covariates often apply to survival (e.g., mass at capture, size measures, fitness measures, habitat, etc.). Group covariates can affect survival (e.g., weather) or capture probability (e.g., effort). Time covariates often influence capture.
52 Goodness-of-fit: This is an area where further work is needed. Best overall goodness-of-fit test is in Program RELEASE (included in MARK). GOF for the CJS model is based on results of Tests 2 and 3 in RELEASE. No good GOF tests for complex models. Ad hoc procedure for robust design.
53 AIC model selection: AIC = Akaike’s Information Criterion. It comes from information theory and is one of many ways to objectively assess the relative support for each model in a set. Remember: the AIC best model is not “the model”, but rather is the model within the set that had the best support, given the data.
54 AIC model selection: AIC = -2ln(L) + 2K. L is the likelihood of the model and K is the number of parameters in that model. A smaller -2ln(L) (that is, a larger log-likelihood) means a better fit. The +2K term is a “penalty” for adding more parameters, which is offset by the improved fit those parameters provide. Message: There is an important trade-off between fit and # of parameters, and AIC provides an objective means of balancing this.
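The AIC arithmetic is easy to reproduce by hand. The sketch below computes AIC, the small-sample version AICc = AIC + 2K(K+1)/(n-K-1), ΔAIC, and Akaike weights for a hypothetical set of three models; the model names, -2ln(L) values, and sample size are made up for illustration.

```python
import math


def aic(neg2lnl, k):
    """AIC = -2ln(L) + 2K."""
    return neg2lnl + 2 * k


def aicc(neg2lnl, k, n):
    """Small-sample AIC; n is the effective sample size."""
    return aic(neg2lnl, k) + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(aic_values):
    """Relative support for each model, summing to 1 across the set."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [x / total for x in rel]


# Hypothetical model set: (name, -2ln(L), number of parameters K).
models = [("phi(.) p(.)", 512.4, 2), ("phi(t) p(.)", 505.1, 5), ("phi(t) p(t)", 503.9, 8)]
n = 300  # hypothetical effective sample size
scores = [aicc(neg2lnl, k, n) for _, neg2lnl, k in models]
weights = akaike_weights(scores)
for (name, _, k), score, w in zip(models, scores, weights):
    print(f"{name:12s} K={k} AICc={score:.2f} dAICc={score - min(scores):.2f} w={w:.3f}")
```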
55 Quasi-likelihood theory: QAIC is a way to account for over-dispersion in the data. Over-dispersion results from a lack of independence, e.g., animals that travel in family groups. In MARK, we use ad hoc procedures to estimate c (a variance inflation factor). Result: variances are inflated by c, and models are ranked by QAIC instead of AIC.
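A minimal sketch of the quasi-likelihood adjustment, assuming c is estimated as a goodness-of-fit chi-square divided by its degrees of freedom (one common ad hoc choice), that QAIC = -2ln(L)/c + 2K, and that standard errors are multiplied by the square root of c. All numbers are hypothetical.

```python
import math


def c_hat(chi_square, df):
    """Variance inflation factor from a GOF chi-square and its df."""
    return chi_square / df


def qaic(neg2lnl, k, chat):
    """QAIC = -2ln(L)/c + 2K."""
    return neg2lnl / chat + 2 * k


def inflate_se(se, chat):
    """Inflate a standard error: variance scales by c, so SE by sqrt(c)."""
    return se * math.sqrt(chat)


chat = c_hat(30.0, 20)           # hypothetical GOF result
print(chat)
print(qaic(300.0, 5, chat))      # hypothetical -2ln(L) and K
print(inflate_se(0.04, chat))    # hypothetical SE of a survival estimate
```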
56 Model averaging: Which parameter estimate do you report when you have estimates from 10 models? The estimate from the best model? All estimates? Or, some “average”? Model averaging incorporates this model selection uncertainty into parameter estimates. Best used when there are several competing models (ΔAIC < 2).
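One common way to compute a model-averaged estimate is to weight each model's estimate by its Akaike weight and to use an unconditional standard error that adds the between-model spread to each model's own variance. The sketch below follows that formula; the estimates, standard errors, and weights are hypothetical.

```python
import math


def model_average(estimates, ses, weights):
    """Weighted-average estimate and unconditional SE across models."""
    if abs(sum(weights) - 1.0) > 1e-8:
        raise ValueError("Akaike weights must sum to 1")
    avg = sum(w * e for w, e in zip(weights, estimates))
    # Each model contributes its sampling variance plus its squared
    # deviation from the averaged estimate.
    se = sum(
        w * math.sqrt(s ** 2 + (e - avg) ** 2)
        for w, e, s in zip(weights, estimates, ses)
    )
    return avg, se


# Hypothetical survival estimates from three competing models.
estimates = [0.82, 0.79, 0.85]
ses = [0.03, 0.04, 0.05]
weights = [0.5, 0.3, 0.2]
avg, se = model_average(estimates, ses, weights)
print(round(avg, 4), round(se, 4))
```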
57 Number of parameters: With complex models, MARK has a hard time correctly counting the number of parameters when parameter estimates are close to a boundary (e.g., near 0 or 1). The sin link function is best; the logit link function sometimes performs poorly. Message: always check MARK to be sure parameters are counted correctly.
58 Model notation: Describe models concisely (limited space in MARK). Some basic nomenclature: Full time variation (t); Linear time trends (T); No variation (.); Additive effects (t+temperature); Multiplicative effects (group*t)
59 Model notation: Examples. φ(.) means φ1 = φ2 = φ3 … φ(t) means φ1, φ2, … φt-1 are estimated separately. φ(t+Mass) means φ1, φ2, … φt-1 are each a function of body mass.
60 Model notation: Keep model names simple, but descriptive. Parameters are written with sources of variation listed in parentheses: φ(t) p(t); φ(T+Weight) p(t+effort); φ(t*group) p(T)
61 How do we “test” for effects? For example, how would we know that weight influenced the survival of a bird? Need to consider models with and without weight. Model selection results: Are models with weight among the “best” in the model set? Look at the β for weight: does its confidence interval overlap zero? Likelihood ratio tests. Remember this is all conditional on the model set.
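Two of these checks are simple to compute by hand. The sketch below gives an approximate 95% confidence interval for a β (estimate ± 1.96 SE) and a likelihood ratio test for nested models that differ by exactly one parameter, where the chi-square tail probability with 1 degree of freedom equals erfc(sqrt(x/2)). The β, SE, and -2ln(L) values are hypothetical.

```python
import math


def wald_ci(beta, se, z=1.96):
    """Approximate 95% confidence interval for a beta on the link scale."""
    return beta - z * se, beta + z * se


def lrt_p_value_1df(neg2lnl_reduced, neg2lnl_general):
    """Likelihood ratio test for nested models differing by ONE parameter."""
    stat = max(neg2lnl_reduced - neg2lnl_general, 0.0)
    # Chi-square survival function with 1 df.
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical beta for weight and its standard error.
low, high = wald_ci(0.42, 0.15)
print(low, high, "overlaps zero" if low <= 0.0 <= high else "excludes zero")

# Hypothetical -2ln(L): model without weight vs. model with weight.
print(lrt_p_value_1df(512.4, 505.9))
```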
62 Developing candidate models: Inference is conditional on the set of models we consider. Considerable effort should go into developing a concise set of models for consideration. How many? Typically, 5-20 models will suffice. Models should address realistic questions and should not include factors known to be unimportant. Message: use what you already know.
63 Study design considerations: Trade-off between sample marked and recapture probability. What is an adequate sample size? Consider the question to be asked: estimate population size, or survival, or lambda?
64 Discussion: Additional discussion topics: Model assumptions and the results of assumption violations. Developing the set of candidate models. Selecting the appropriate model for analyses. Others? This afternoon: an example in MARK.
66 Patch occupancy: Presence-absence data. Define a “patch” (ponds, islands, plots, etc.). Multiple visits to each site. Assumes closure during sampling period. Parameters: ψ, p, and ε and γ (robust design only)
67 Patch occupancy: Modeling details: Handles missing visits (coded as “.” in EH); Covariates. Robust design formulations: Psi and epsilon; Psi and gamma; Psi(1), epsilon, gamma
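For the simplest (single-season) case, the probability of a detection history can be sketched as follows, with ψ the probability a patch is occupied and p[t] the detection probability on visit t. Missing visits coded “.” simply drop out of the product. The function name and values are hypothetical, and the robust design parameters ε and γ are not covered here.

```python
def occupancy_history_probability(history, psi, p):
    """Probability of a single-season detection history.

    history uses "1" (detected), "0" (not detected), "." (visit missing).
    """
    detect = 1.0
    for h, p_t in zip(history, p):
        if h == "1":
            detect *= p_t
        elif h == "0":
            detect *= 1.0 - p_t
        # "." contributes nothing: no visit, no information.
    prob = psi * detect
    if "1" not in history:
        # Never detected: the patch may also simply be unoccupied.
        prob += 1.0 - psi
    return prob


# Hypothetical values: 3 visits, constant detection.
psi = 0.6
p = [0.5, 0.5, 0.5]
print(occupancy_history_probability("010", psi, p))
print(occupancy_history_probability("000", psi, p))
print(occupancy_history_probability("0.1", psi, p))
```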
68 Nest survival: Required data for each nest: The day the nest was found (k). The last day the nest was checked alive (l). The last day the nest was checked (m). Nest fate (f; 0 = successful, 1 = failure). The number of nests with this encounter history.
70 Coding the data: Coding the triplet k, l, and m: k=1, l=3, m=5, fate=1 → S1S2[1-S3S4]. k=1, l=3, m=3, fate=0 → S1S2. k=1, l=3, m=3, fate=1 is invalid (can’t be alive and dead on day 3). k=1, l=1, m=3, fate=1 → [1-S1S2]. k=1, l=1, m=1, fate=0 or 1 is invalid (nest was active only on day 1). See MARK help file for more details.
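The coding rules above can be checked with a small function that returns the likelihood contribution of one nest. This is a sketch, not MARK's implementation: `dsr[d-1]` is the daily survival rate from day d to day d+1, and the requirement that successful nests have l = m is an assumption of this sketch.

```python
def nest_likelihood(k, l, m, fate, dsr):
    """Likelihood contribution of one nest from k, l, m and fate.

    fate: 0 = successful, 1 = failure. dsr[d-1] is survival from day d to d+1.
    """
    if not k <= l <= m:
        raise ValueError("need k <= l <= m")
    if k == m:
        raise ValueError("invalid: nest was active on only one day")
    if fate == 1 and l == m:
        raise ValueError("invalid: nest cannot be alive and dead on the same day")
    if fate == 0 and l != m:
        # Assumption of this sketch: successful nests are coded with l = m.
        raise ValueError("invalid: successful nests need l == m")

    def survive(start, end):
        prob = 1.0
        for d in range(start, end):
            prob *= dsr[d - 1]
        return prob

    alive = survive(k, l)                # known alive from day k to day l
    if fate == 0:
        return alive
    return alive * (1.0 - survive(l, m)) # failed somewhere between l and m


dsr = [0.9, 0.9, 0.9, 0.9]               # hypothetical daily survival rates
print(nest_likelihood(1, 3, 5, 1, dsr))  # S1*S2*(1 - S3*S4)
print(nest_likelihood(1, 3, 3, 0, dsr))  # S1*S2
print(nest_likelihood(1, 1, 3, 1, dsr))  # 1 - S1*S2
```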
71 Model assumptions: Homogeneity of daily nest survival rates. Nest fates are independent. All visits to nests are recorded. Nest discovery and subsequent checks do not influence nest survival. Nest checks are independent of fate. Nest fates are correctly determined. Age of nest at discovery is known.
72 Estimate nesting success? For constant nest survival, period success is the daily survival rate (DSR) raised to the power of the period length in days. What happens if there is: Temporal variation in nest survival? Covariates? A combination of both?
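A short sketch of the two cases: with a constant DSR, period success is DSR raised to the period length; when DSR varies from day to day, period success is the product of the daily rates over the period. The logit-linear time trend and its β values below are hypothetical.

```python
import math


def period_success_constant(dsr, days):
    """Nest success over a period when daily survival is constant."""
    return dsr ** days


def period_success_varying(daily_rates):
    """Nest success as the product of day-specific survival rates."""
    success = 1.0
    for rate in daily_rates:
        success *= rate
    return success


print(period_success_constant(0.95, 10))

# Hypothetical time trend on the logit scale: logit(DSR_d) = b0 + b1 * d.
b0, b1 = 2.5, 0.05
daily = [1.0 / (1.0 + math.exp(-(b0 + b1 * d))) for d in range(1, 11)]
print(period_success_varying(daily))
```

With a trend like this, the answer depends on which days the period covers, so a start date has to be chosen before a single “nest success” figure can be reported.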
73 Temporal variation: Which 10-day interval provides the “best” estimate of nest success? (Figure: two intervals, each labeled “10 days”.)
74 Getting the “best” estimate: Need a start date for “best” estimate. When? Does simple mean work? What about bias between observed and true nest initiation dates? Use Horvitz-Thompson estimator to correct for this bias.
75 Other considerations: Stage-specific survival. Divide EH into parts: Incubation – ; Nestling – ; Nest age
76 Model-based predictions: MARK provides a regression equation that can be used for predictions. Logit Smale = *1, Smale = ; Logit Sfemale = *0, Sfemale =