1 Sampling designs for spawning data on the Middle Fork Salmon River. What sampling design should be used for estimating the number of chinook redds on a river network* (*a lot like the Middle Fork Salmon R.)? –Estimation of status: the number of spring-chinook redds in the Middle Fork Salmon River in one year. –Measurement design: we are not really thinking about the measurement design; we assume we have some way to identify and count redds once we get to a location.
3 Redd data: the truth. The IDFG dataset (Russ Thurow) counted the number of redds in the Middle Fork Salmon River by helicopter. All spawning reaches were censused each year; sampling was done by helicopter and, where necessary, on foot. Six years of data: 1995-1998, 2001, and 2002. These data can be considered the truth.
year          1995  1996  1997  1998  2001  2002
Total redds     20    83   424   661  1789  1730
4 Objectives: Compare several designs to see which one best estimates the number of redds (and only redds). –Unbiased designs (estimators). –"Best" is determined by: the standard error of the estimator; the coverage probability (how often the 95% confidence interval actually contains the true number of redds); cost. –Keep things fair by sampling the same total length of stream as the index, which covers 976 segments, or 195.2 km of stream. This does not imply equal cost. Although some standard errors can be calculated analytically, coverage needs to be addressed via simulation.
5 Methods: Compare sampling strategies using the IDFG data as the truth. A sampling strategy comprises a sample design, an estimator for the total, and a confidence interval.
6 Methods: Use simulation, resampling the known population over and over.
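A minimal sketch of the resampling loop, assuming a simple random sample of segments without replacement, the usual expansion estimator N times the sample mean, and a normal-approximation 95% interval with a finite-population correction. The population built by make_population is a hypothetical stand-in, not the IDFG census, and "normalized standard error" is assumed here to mean the standard deviation of the simulated estimates as a percentage of the true total.

```python
import math
import random
import statistics


def srs_total_ci(sample, N, z=1.96):
    """Expansion estimator of a total from a simple random sample drawn
    without replacement, with a normal-approximation confidence interval."""
    n = len(sample)
    total_hat = N * statistics.fmean(sample)
    s2 = statistics.variance(sample)              # sample variance, n - 1 divisor
    se = N * math.sqrt((1 - n / N) * s2 / n)      # finite population correction
    return total_hat, se, total_hat - z * se, total_hat + z * se


def simulate(population, n, reps=2000, seed=1):
    """Resample a fully known population. Returns the mean estimate, the
    normalized SE (assumed: SD of estimates / true total, in %) and the
    coverage of the nominal 95% interval."""
    rng = random.Random(seed)
    N, truth = len(population), sum(population)
    estimates, hits = [], 0
    for _ in range(reps):
        t, _, lo, hi = srs_total_ci(rng.sample(population, n), N)
        estimates.append(t)
        hits += lo <= truth <= hi                 # True counts as 1
    return (statistics.fmean(estimates),
            100 * statistics.stdev(estimates) / truth,
            hits / reps)


def make_population(N=1000, p_redds=0.1, seed=7):
    """Hypothetical stand-in population (redds per segment), NOT the IDFG data."""
    rng = random.Random(seed)
    return [rng.randint(1, 10) if rng.random() < p_redds else 0 for _ in range(N)]


population = make_population()
mean_est, norm_se, coverage = simulate(population, n=100)
print(sum(population), round(mean_est), round(norm_se, 1), coverage)
```

Comparing another design only means replacing the rng.sample call and the estimator; the bookkeeping for bias, spread and coverage stays the same.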
7 Cost & crew trips: Each segment is assigned an access point. Travel to access sites is by –airplane or –auto. Travel from an access site to the sampling reaches is the maximum distance from the access site to the furthest sampling reach in each direction along a tributary. Cost = Fn(km by foot). 4 round trips are required.
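The foot-travel rule above can be sketched as follows. Everything here is a hypothetical model: each sampled segment is described by its access point, its tributary and a signed along-stream distance (positive upstream, negative downstream); each (access point, tributary, direction) costs an out-and-back walk to its furthest sampled reach; and "4 round trips" is read as a simple multiplier. Air and road travel to the access sites are not included.

```python
from collections import defaultdict


def hiking_km(sampled, n_trips=4):
    """Hypothetical foot-travel cost in km.

    sampled: iterable of (access_id, tributary_id, signed_km) per sampled
    segment; signed_km > 0 is upstream of the access point, < 0 downstream.
    n_trips: assumed number of round trips (read from "4 round trips").
    """
    furthest = defaultdict(float)  # (access, tributary, direction) -> max km
    for access, trib, km in sampled:
        key = (access, trib, "up" if km >= 0 else "down")
        furthest[key] = max(furthest[key], abs(km))
    one_visit = sum(2 * d for d in furthest.values())  # walk out and back
    return n_trips * one_visit


sampled = [("A", "t1", 3.0), ("A", "t1", 5.0), ("A", "t1", -2.0),
           ("A", "t2", 1.0), ("B", "t3", 1.5)]
print(hiking_km(sampled))  # 76.0
```

Only the furthest reach in each direction matters, so a design that spreads the same number of segments over more tributaries costs more than one that clusters them.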
8 [Figure: hiking distances in 5 km intervals.] Many areas require a hike of over 20 km; the maximum distance is 33 km.
9 The sampling designs: –Index: sample the index reaches. –Simple random sampling: using the unbiased estimator. –Systematic sampling: sort tributaries in random order and sample systematically along the resulting line. –Stratify by index: sample independently within and outside the index regions. –Adaptive cluster sampling: choose segments with a simple random sample; if sampled sites have redds, also sample the adjacent segments. –Spatially balanced design: based on the EMAP design, though selecting segments within primary sampling units rather than points (not yet implemented).
10 Index sampling: When the sample size is smaller than the overall size of the index region, a simple random sample of the segments within the index is assumed. There are two ways to estimate the number of redds from the index sample: 1. Assume there are no redds outside the index: estimates will be too small. 2. Assume that the average number of redds per segment outside the index is the same as inside, and simply inflate the index estimator: estimates will be too large.
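A sketch of the two index-based estimators, assuming the whole index is counted. The inflated version scales the index count by network length over index length; the total number of segments in the network is not given on the slides, so n_network_segments is a hypothetical parameter. The zero-outside bias is computed from the census figures on slide 30 (index count versus total redds).

```python
def index_estimates(index_count, n_index_segments, n_network_segments):
    """Two ways to turn an index count into a network-wide total.

    zero_outside: assumes no redds outside the index (biased low).
    inflated: assumes the same redds per segment outside the index as
    inside, i.e. scales by network length / index length.
    n_network_segments is a hypothetical input (not given on the slides).
    """
    zero_outside = index_count
    inflated = index_count * n_network_segments / n_index_segments
    return zero_outside, inflated


# Census figures from the slide-30 table.
years = [1995, 1996, 1997, 1998, 2001, 2002]
index_counts = [19, 62, 290, 448, 1178, 1199]
totals = [20, 83, 424, 661, 1789, 1730]

biases = [idx - tot for idx, tot in zip(index_counts, totals)]
for year, b in zip(years, biases):
    print(year, "zero-outside bias:", b)
```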
11 Bias of the inflating estimator from the index sample [Figure: bias, in redds.]
12 Systematic sampling: Order the tributaries in random order along a line. Choose the sampling interval, k, so that the final sample size is approximately n. Select a random number, r, between 1 and k. Sample reaches r, r+k, r+2k, …, r+(n-1)k. Systematic sampling is cluster sampling in which each cluster is made up of units far apart in space, and only one cluster is sampled. [Figure: sampled positions r, r+k, r+2k, r+3k, r+4k spaced k apart along the line.]
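The steps above can be sketched as follows, assuming each tributary is supplied as a list of segment ids in along-stream order; the tributary names in the example are hypothetical. When the line length is not a multiple of k, the realized sample size differs from n by at most one.

```python
import random


def systematic_sample(tributaries, n, rng):
    """tributaries: list of lists of segment ids, each in along-stream order.
    Concatenates the tributaries in random order and takes every k-th
    segment from a random start r in 1..k."""
    order = list(tributaries)
    rng.shuffle(order)                                # random tributary order
    line = [seg for trib in order for seg in trib]
    k = max(1, round(len(line) / n))                  # sampling interval
    r = rng.randint(1, k)                             # random start, 1-based
    return [line[i] for i in range(r - 1, len(line), k)]  # r, r+k, r+2k, ...


rng = random.Random(42)
# Hypothetical network: 10 tributaries of 20 segments each (200 segments).
tribs = [[f"T{t}-{s}" for s in range(20)] for t in range(10)]
sample = systematic_sample(tribs, n=20, rng=rng)
print(len(sample))  # 20
```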
13 Stratify by index: Stratify by index and oversample index reaches, with a simple random sample in each stratum. Allocation: –Equal allocation: usually does not perform well. –Proportional allocation: does not oversample index sites, so will probably not have good precision. –Optimal allocation: need to know the standard deviation in each stratum.
year                  1995  1996  1997  1998  2001  2002
proportion in index   0.76  0.54  0.48  0.48  0.42  0.46
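A sketch of the allocation rules and the stratified estimator, assuming optimal allocation means Neyman allocation (n_h proportional to N_h S_h) and an independent simple random sample without replacement in each stratum. The stratum sizes and standard deviations in the example are hypothetical; in the simulation the true per-stratum SDs come from the census.

```python
import math
import statistics


def neyman_allocation(N_h, S_h, n):
    """Optimal (Neyman) allocation: n_h proportional to N_h * S_h.
    Returns unrounded sizes; rounding is left to the caller."""
    weights = [N * S for N, S in zip(N_h, S_h)]
    total = sum(weights)
    return [n * w / total for w in weights]


def proportional_allocation(N_h, n):
    """n_h proportional to stratum size only."""
    return [n * N / sum(N_h) for N in N_h]


def stratified_total(samples, N_h):
    """Stratified estimator of the total and its SE (SRS within strata)."""
    t_hat, var = 0.0, 0.0
    for ys, N in zip(samples, N_h):
        n = len(ys)
        t_hat += N * statistics.fmean(ys)
        var += N**2 * (1 - n / N) * statistics.variance(ys) / n
    return t_hat, math.sqrt(var)


# Hypothetical strata: [index, outside index] sizes and SDs.
print(neyman_allocation([100, 300], [4.0, 1.0], 70))  # [40.0, 30.0]
print(proportional_allocation([100, 300], 70))        # [17.5, 52.5]
print(stratified_total([[1, 3], [2, 2, 2]], [10, 30]))
```

Because redds are more variable inside the index, Neyman allocation pushes the sample into the index stratum, which proportional allocation does not do.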
14 Adaptive cluster sampling: The original sample is a simple random sample. If a sampled site meets the criterion, also sample the sites in its neighborhood. –Criterion: presence of redds. –Neighborhood: the segments directly upstream and downstream. Continue until newly added sites do not meet the criterion. –At confluences, both legs are included. [Figure: worked example on segments 1 to 6, marking sites in the original sample, sites that meet the criterion, neighbors that were included, and sites that do not meet the criterion; the final sample includes segments 2, 1, 3, 4 and 6.]
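A sketch of the adaptive step, assuming the river network is given as an adjacency dict (upstream and downstream neighbors, with both legs listed at a confluence) and the criterion is y > 0. The six-segment network in the example is hypothetical and is not the network drawn on the slide.

```python
import random
from collections import deque


def grow_adaptive(initial, y, neighbors):
    """Adaptive cluster sampling on a segment network.
    initial: segments in the original simple random sample
    y: dict segment -> redd count (criterion: y > 0)
    neighbors: dict segment -> adjacent segments
    Returns the final set of sampled segments."""
    sampled = set(initial)
    queue = deque(s for s in initial if y[s] > 0)
    while queue:  # keep adding neighbors until new sites fail the criterion
        s = queue.popleft()
        for t in neighbors[s]:
            if t not in sampled:
                sampled.add(t)
                if y[t] > 0:
                    queue.append(t)
    return sampled


def adaptive_cluster_sample(y, neighbors, n, rng):
    initial = rng.sample(sorted(y), n)  # original SRS of segments
    return initial, grow_adaptive(initial, y, neighbors)


# Hypothetical network: segment 2 is a confluence (legs 3 and 4), and so is 4.
neighbors = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5, 6], 5: [4], 6: [4]}
y = {1: 0, 2: 3, 3: 0, 4: 1, 5: 0, 6: 0}
print(sorted(grow_adaptive([2], y, neighbors)))  # [1, 2, 3, 4, 5, 6]
```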
15 Results: normalized standard error of estimators, by run size (total redds)
Design               20     83    424    661   1789   1730
SRS                39.0   18.6    9.9    8.2    7.3    7.1
Cluster (1km)      47.7   22.7   15.1   12.7   12.5   12.1
SYS                24.9   17.1    9.2    8.0    5.0    6.1
STRS (optimal)     (values illegible)
ADAPT              39.3   18.8   10.5    7.6    8.7    8.4
Coverage probability (nominal .95), %
Design               20     83    424    661   1789   1730
SRS                86.6   93.0   94.5   95.3   93.3   94.3
Cluster (1km)      89.0   91.2   92.5   92.6   92.9   94.3
SYS                96.4   95.2   97.0   95.0   99.0   97.3
STRS (optimal)     92.0   95.0   94.3   95.5   92.9   93.5
ADAPT              87.6   94.8   94.7   94.9   94.7   93.9
17 Precision per cost (10% sampling fraction). Bigger is better: high precision per km traveled. [Figure: precision per cost plotted against run size.]
18 Conclusions: Stratifying by index results in the most precise estimates, except in the large runs, where systematic sampling seems to work best. The index sites should be oversampled in the stratified design; proportional allocation (based on the size of the strata) results in poor precision. Although the systematic sampling strategy is often the most precise, there is no good estimator of its variance; the estimator that assumes a simple random sample is conservative. The same pattern holds for different sampling fractions.
19 Conclusions: The cluster sampling design is not very precise but reduces costs significantly. Adaptive cluster sampling is not as precise as the other designs. –It is optimal for rare, clustered populations. –During small years, the redds are not clustered enough. –During large years, they are not rare enough. –Only during the medium years does it compete with the other designs. When cost and precision are analyzed together: –small runs: either stratified by index or SRS-1km works best; –large runs: either systematic or stratified by index works best.
20 Not yet finished: –EMAP-type design. –Successive-difference variance estimator for systematic sampling. –Adaptive sampling with the same initial sample size (the cost function does not penalize this much). –Cost function: including road travel; crew-trips/day units.
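One standard form of the successive-difference variance estimator (the one usually attributed to Wolter) is sketched below next to the SRS formula, assuming the systematic sample is listed in the order it falls along the line. When redd density trends smoothly along the line, successive differences are small, which is why the SRS formula tends to be conservative for systematic samples. The sample values in the example are hypothetical.

```python
import statistics


def srs_var_total(ys, N):
    """Variance estimate of the expanded total N * mean(ys) if the sample
    were a simple random sample without replacement."""
    n = len(ys)
    return N**2 * (1 - n / N) * statistics.variance(ys) / n


def successive_difference_var_total(ys, N):
    """Successive-difference variance estimate of the expanded total for a
    systematic sample; ys must be in line order."""
    n = len(ys)
    ssd = sum((b - a) ** 2 for a, b in zip(ys, ys[1:]))
    return N**2 * (1 - n / N) * ssd / (2 * n * (n - 1))


ys = [1, 2, 3, 4]  # hypothetical counts with a smooth trend along the line
print(srs_var_total(ys, 40), successive_difference_var_total(ys, 40))
```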
21 Points vs. lines: Pick points: points are picked along the stream continuum and the measurement unit is constructed around each point. Advantages: –different-size measurement units are easily implemented. Disadvantages: –difficulty with overlapping units; –an inadvertent variable-probability design because of confluences and headwaters; –analysis may be complicated. Pick segments: the universe is segmented before sampling and segments are picked from the population of segments. Advantages: –simple to implement; –simple estimators. Disadvantages: –difficult frame construction before sampling; –cannot accommodate varying lengths of sampling unit.
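22 Adaptive cluster sampling: Use the draw-by-draw probability estimator (Thompson 1992). Let $w_i$ be the average number of redds in the network to which segment $i$ belongs, $n$ the size of the initial simple random sample, and $N$ the number of segments. Then
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} w_i,$$
with variance
$$\widehat{\operatorname{var}}(\hat{\mu}) = \frac{N-n}{N\,n\,(n-1)}\sum_{i=1}^{n}\left(w_i-\hat{\mu}\right)^2,$$
and the total number of redds is estimated by $N\hat{\mu}$.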
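A sketch of this estimator, assuming the standard definitions from Thompson's adaptive cluster sampling: a network is a connected set of segments that all meet the criterion (y > 0), a segment with no redds is its own network (so its w_i is 0), and edge units do not enter w_i. The network and initial sample below are hypothetical.

```python
import math


def network_means(y, neighbors):
    """w_i: mean redd count of the network containing segment i."""
    w = {}
    for start in y:
        if start in w:
            continue
        if y[start] == 0:              # a zero segment is its own network
            w[start] = 0.0
            continue
        members, stack = {start}, [start]
        while stack:                   # connected segments with y > 0
            s = stack.pop()
            for t in neighbors[s]:
                if y[t] > 0 and t not in members:
                    members.add(t)
                    stack.append(t)
        mean = sum(y[s] for s in members) / len(members)
        for s in members:
            w[s] = mean
    return w


def acs_total(initial, w, N):
    """Draw-by-draw (modified Hansen-Hurwitz) estimate of the total and its
    SE, from an initial SRS without replacement of n out of N segments."""
    n = len(initial)
    ws = [w[i] for i in initial]
    mu = sum(ws) / n
    var_mu = (N - n) / (N * n * (n - 1)) * sum((wi - mu) ** 2 for wi in ws)
    return N * mu, N * math.sqrt(var_mu)


# Hypothetical 6-segment network; segments 2 and 4 form one network.
neighbors = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5, 6], 5: [4], 6: [4]}
y = {1: 0, 2: 3, 3: 0, 4: 1, 5: 0, 6: 0}
w = network_means(y, neighbors)
print(acs_total([2, 5], w, N=6))
```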
23 COSTS: Our costs are based on the number of kilometers traveled on foot. Each segment in the MF is assigned to an access point (this is not optimized; in some rare instances the assigned access point is not the closest), and the distance along the stream from that access point is calculated. There are two types of access points, airfields and trailheads; for this exercise, both have the same price. Because we tally the number of km along the streams, this cost function also models other types of sampling, including by helicopter and raft.
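A sketch of the along-stream distance calculation, assuming the network is an adjacency dict of equal-length segments. The 0.2 km segment length is derived from slide 4 (976 segments = 195.2 km); the small network in the example is hypothetical. On a tree-shaped river network, the breadth-first hop count is the unique along-stream path length.

```python
from collections import deque

SEGMENT_KM = 195.2 / 976  # 0.2 km per segment (976 segments = 195.2 km)


def along_stream_km(neighbors, access_segment):
    """Along-stream distance (km) from an access point's segment to every
    reachable segment, counting SEGMENT_KM per segment step."""
    hops = {access_segment: 0}
    queue = deque([access_segment])
    while queue:  # breadth-first search over the segment network
        s = queue.popleft()
        for t in neighbors[s]:
            if t not in hops:
                hops[t] = hops[s] + 1
                queue.append(t)
    return {s: h * SEGMENT_KM for s, h in hops.items()}


# Hypothetical network: access at segment 0, confluence at segment 1.
nb = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(along_stream_km(nb, 0))
```

Assigning each segment to its closest access point would mean running this from every access point and taking the minimum; the slides note that their assignment was not optimized in that way.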
30 Index sample: Not sure how to build estimates of the total number of redds in the Middle Fork: –expand the current estimator (assume the same density outside the index); –use the current estimate (assume 0 redds outside the index).
year                      1995  1996  1997  1998  2001  2002
Number counted in index     19    62   290   448  1178  1199
Total number of redds       20    83   424   661  1789  1730
32 Stratify by index: Oversample index sites, where most redds are located, with a simple random sample in each stratum.
Equal allocation:
year                1995    1996    1997    1998    2001    2002
standard error      5.33   12.68   36.61   47.34  121.43  106.98
coverage (%)        90.4    94.6    94.2    94.8    92.9    93.4
Proportional allocation:
year                1995    1996    1997    1998    2001    2002
standard error      7.77   15.26   41.08   52.37  124.90  115.56
coverage (%)        88.0    94.7    95.0    94.9    94.4    93.6
33 Stratify by index: optimal allocation
year                  1995  1996  1997  1998  2001  2002
proportion in index   0.76  0.54  0.48  0.48  0.42  0.46
n index                746   530   475   464   407   445
n other                230   446   501   512   569   531
year                1995    1996    1997    1998    2001    2002
standard error      5.49   12.76   36.60   47.26  120.50  106.58
coverage (%)        92.0    95.0    94.3    95.5    92.9    93.5