
Comparison of Design-Based and Model-Based Techniques for Selecting Spatially Balanced Samples of Environmental Resources. Don L. Stevens, Jr. Department.

1 Comparison of Design-Based and Model-Based Techniques for Selecting Spatially Balanced Samples of Environmental Resources. Don L. Stevens, Jr., Department of Statistics, Oregon State University

2 The research described in this presentation has been funded by the U.S. Environmental Protection Agency through the STAR Cooperative Agreement CR Program on Designs and Models for Aquatic Resource Surveys at Oregon State University. It has not been subjected to the Agency's review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.

3 Preview: Two conceptual frameworks support inference from sample properties to population characteristics: model-based and design-based. Both encompass inference and sample selection methodologies. Both sets of selection methodologies have techniques to incorporate prior information and knowledge.

4 Preview: Conjecture: With the same prior information and knowledge, probabilistic samples can be near-optimal when judged by model-based criteria. Conjecture: Probabilistic samples can be more robust than optimal model-based samples.

5 Preview: Claim: With the same prior information and knowledge, probabilistic samples can be near-optimal when judged by model-based criteria. Claim: Probabilistic samples can be more robust than optimal model-based samples. There’s a catch: what is “optimal”?

6 Study Context: Environmental monitoring and assessment applications, particularly aquatic resources. The response is a condition measure: –Water quality –Chemical contamination –Biological quantity, e.g., IBI (index of biotic integrity) –Physical habitat metric –Salmon population levels

7 Study Context: Populations are distributed over space. Sample sites will be visited more than once, possibly over a period of many years. The overall sample may be split into panels.

8 Study Context: Environmental populations have spatial structure: –Things close together tend to be influenced by the same set of factors –Things close together tend to share similar substrates

9 Study Context: Environmental populations have spatial structure: –Things close together tend to be influenced by the same set of factors –Things close together tend to share similar substrates. But the spatial structure is almost certainly not stationary.

10 Study Context: Structure may be patchy rather than smoothly changing: –Localized management practices –Localized contamination –Localized development –Natural discontinuities: slope; substrate (soil, geology); watercourse

11 Study Context: Most populations of interest have existing samples in place: –Frequently convenience samples –Preservation of historical continuity is important. Most large populations (e.g., covering a substantial portion of a state) will have accessibility issues.

12 Study Context: Some portions of the population will require a higher-intensity sample: –Scientific, economic, or political interest. Sample allocation may need to be modified: –Emerging issues –Problems solved

13 Study Strategy: Compare some techniques for optimal model-based design to Generalized Random Tessellation Stratified (GRTS) design. Various scenarios: –Variety of optimality criteria –Existing sample points –Variable interest → variable spatial density –Inaccessible regions, a priori and a posteriori

14 Optimality Criteria: Statisticians think “best” means minimum variance, so the optimum design is the one that gives the minimum-variance estimator, but...

15 Optimality Criteria: Statisticians think “best” means minimum variance, so the optimum design is the one that gives the minimum-variance estimator, but... It is not always straightforward to decide on the appropriate variance!

16 Optimality Criteria: For example, suppose we need a value for –The usual approach in spatial statistics is to use kriging to “predict” a mean, so we need the prediction variance –But we need a variogram to krige, which usually has to be estimated assuming some model, so we should include variogram parameter uncertainty –But what about the variogram model itself?

17 Optimality Criteria: For example, suppose we need a value for –The usual design-based approach is to minimize sampling variance over repeated sample selections –But even the design-based variance depends on spatial structure –So we could adopt a super-population model and minimize the expected variance, which puts us back in the spatial stats arena

18 Optimality Criterion: Minimal assumptions: points that are close together contain redundant information, so we want a design that gives maximal dispersion. A point pattern that is “regular” in the stochastic point process sense gives maximal dispersion. Thus, we need to look at regularity criteria to select an optimality criterion.

19 Study Strategy: Compare using several optimality criteria: –Regularity of point process: K-function; Van Groenigen & Stein MMSD; fractal dimension; mean square deviation of distance to side, vertex, boundary of Voronoi polygon –Variance of estimated population mean: over replicated sample selection; over replicated population realizations; with models for non-stationary mean structure

20 Optimal Design: A number of recent papers have used spatial simulated annealing (SSA) to locate optimal sampling points: –Begin with a random set of points –Cycle through the points, perturbing one at a time –At each step, calculate an optimality criterion –If better than the old optimum, keep –If worse, accept with some probability that decreases with the number of cycles
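
The annealing loop on this slide can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code used in the cited papers: the domain is taken to be the unit square, moves are uniform perturbations that shrink over the run, the temperature follows a geometric cooling schedule, and `min_pairwise_distance` is a hypothetical stand-in criterion (maximising the smallest inter-point distance). Worse moves are accepted with probability exp(-Δ/T), so the chance of accepting them falls as the cycles proceed.

```python
import math
import random


def min_pairwise_distance(points):
    """Smallest distance between any two points; larger means more dispersed."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])


def spatial_annealing(n, criterion, cycles=100, step=0.1, t0=0.05, seed=0):
    """Minimise criterion(points) over n points in the unit square.

    The perturbation width, geometric cooling schedule and default
    parameters are assumptions of this sketch, not taken from the slides.
    """
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]  # random starting set
    cur = criterion(pts)
    best, best_val = list(pts), cur
    for c in range(cycles):
        temp = t0 * 0.95 ** c                # acceptance of worse moves falls with cycle number
        width = step * (1 - c / cycles)      # perturbations also shrink over time
        for i in range(n):                   # cycle through the points, one at a time
            old = pts[i]
            pts[i] = (min(1.0, max(0.0, old[0] + rng.uniform(-width, width))),
                      min(1.0, max(0.0, old[1] + rng.uniform(-width, width))))
            new = criterion(pts)
            if new <= cur or rng.random() < math.exp((cur - new) / temp):
                cur = new                    # keep the move (always, if it is no worse)
                if cur < best_val:
                    best, best_val = list(pts), cur
            else:
                pts[i] = old                 # reject: restore the old location
    return best, best_val
```

For example, `spatial_annealing(25, lambda p: -min_pairwise_distance(p))` returns 25 dispersed points; any of the criteria on the following slides could be passed as `criterion`, provided smaller values are better.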

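21 Optimal Design: Van Groenigen & Stein MMSD –Minimize the Mean Shortest Distance: for S a set of sample points and x a point in the target domain D, let d(x,S) be the distance from x to the nearest point in S. Then $\mathrm{MSD}(S) = \frac{1}{|D|}\int_D d(x,S)\,dx$ –Note that for C(s) the Voronoi polygon of s, $d(x,S) = \lVert x - s \rVert$ for $x \in C(s)$, so $\mathrm{MSD}(S) = \frac{1}{|D|}\sum_{s \in S}\int_{C(s)} \lVert x - s \rVert\,dx$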
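
The mean shortest distance can be approximated numerically by averaging d(x,S) over a fine grid of evaluation points, a midpoint-rule version of the integral. The sketch below assumes D is the unit square; the function name and the grid resolution are illustrative choices.

```python
import math


def mean_shortest_distance(sample, grid_size=100):
    """Approximate MSD(S) = (1/|D|) * integral over D of d(x, S) dx, with D the
    unit square, by averaging d(x, S) over the centres of a grid_size x grid_size grid."""
    total = 0.0
    for i in range(grid_size):
        for j in range(grid_size):
            x = ((i + 0.5) / grid_size, (j + 0.5) / grid_size)
            # d(x, S): distance from the evaluation point to its nearest sample point
            total += min(math.dist(x, s) for s in sample)
    return total / grid_size ** 2
```

Because every evaluation touches every grid cell and every sample point, repeated calls inside an annealing loop are expensive, which is consistent with MMSD being slow to compute.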

22 Optimal Design: Ripley’s K function: K(r) is the average number of additional sites within radius r of a site, divided by the intensity of the process.
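
A naive estimator follows directly from this definition: count, for each site, the other sites within distance r, average those counts, and divide by the estimated intensity n/|D|. For a completely random (Poisson) pattern K(r) = πr², while regular patterns give smaller values at short range. The sketch below applies no edge correction, an assumption that biases the estimate downward near the boundary of D.

```python
import math
from itertools import combinations


def ripley_k(points, r, area=1.0):
    """Naive estimate of Ripley's K(r) for points in a region of the given area.

    K(r) = (mean number of additional points within distance r of a point)
           / (intensity n / area).  No edge correction is applied.
    """
    n = len(points)
    close_pairs = sum(1 for p, q in combinations(points, 2) if math.dist(p, q) <= r)
    mean_neighbours = 2 * close_pairs / n     # each close pair is a neighbour of both points
    return mean_neighbours / (n / area)
```

On a regular 5 by 5 lattice with spacing 0.2, for instance, K(0.15) is 0 because no two sites are that close, which is the short-range deficit that marks a regular pattern.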

23 Optimal Design: Di Zio, Fontanella & Ippoliti used a measure related to the fractal dimension: let D be the slope of the best-fitting line produced when log(K(r)) is regressed against log(r). As sites become more evenly dispersed, D should approach 2, so 2 - D is a measure of irregularity.
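
The slope D can be computed by ordinary least squares on the pairs (log r, log K(r)) over a set of radii. In the sketch below, the choice of radii, the naive K estimator without edge correction, and the helper names are assumptions; radii where the estimate is zero are skipped because the logarithm is undefined there. Note that D here is the regression slope, not the target domain.

```python
import math
from itertools import combinations


def k_hat(points, r, area=1.0):
    """Naive Ripley's K estimate (no edge correction): 2 * pairs * area / n^2."""
    n = len(points)
    pairs = sum(1 for p, q in combinations(points, 2) if math.dist(p, q) <= r)
    return 2 * pairs * area / n ** 2


def fractal_irregularity(points, radii):
    """Return (D, 2 - D), where D is the least-squares slope of log K(r) on log r."""
    xs, ys = [], []
    for r in radii:
        k = k_hat(points, r)
        if k > 0:                              # log K(r) is undefined when no pairs lie within r
            xs.append(math.log(r))
            ys.append(math.log(k))
    if len(xs) < 2:
        raise ValueError("need at least two radii with K(r) > 0")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, 2 - slope
```

The fitted slope depends on the range of radii chosen, so comparisons between designs would need to use the same radii throughout.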

24 Optimal Design: Proposed criterion: Let B(C(s)) be the boundary of the Voronoi polygon of s, and define SVB from the distances between s and B(C(s)). SVB is approximated by the mean square deviation of the distances from a sample point to the Sides, Vertices, and Boundaries of its Voronoi polygon, relative to a nominal value. (A side is an edge that separates two sample points; a boundary is an edge determined by the domain.)

26 Point Pattern Comparison: Examine point patterns with 50 points in the unit square, and show the Voronoi polygons for each point pattern.

27 Optimal Design: MMSD seemed very slow to compute; SVB seems to be comparable to MMSD, but much quicker to compute.

37 Existing Points: SSA can optimize the placement of new sample points given some existing points. We can do something similar with GRTS: –Determine limits on grid resolution and placement such that the existing points all fall in distinct cells –Do the GRTS design conditional on those limits, and “select” the cells containing existing points
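
The first step, finding a grid resolution at which the existing points occupy distinct cells, can be sketched for points in the unit square as below. This hypothetical helper searches only over nested 2^k by 2^k grids and ignores the placement (offset) limits and the hierarchical randomisation that a full GRTS implementation performs; the returned cell set is what would be marked as already “selected”.

```python
def separating_level(existing, max_level=20):
    """Smallest level k such that a 2**k x 2**k grid on the unit square places
    every existing point in a distinct cell (hypothetical helper, not GRTS software)."""
    for level in range(max_level + 1):
        m = 2 ** level
        # min(..., m - 1) keeps points lying on the top/right edge inside the grid
        cells = {(min(int(x * m), m - 1), min(int(y * m), m - 1)) for x, y in existing}
        if len(cells) == len(existing):
            return level, cells
    raise ValueError("existing points cannot be separated at max_level")
```

Coincident existing points can never be separated, which is why the search is capped at `max_level`.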

38 Existing Points: Illustrate with a 25-point design: –Unrestricted –5 points fixed, 20 unrestricted

47 Simulation Study: Model-based approach: vary the surface, not the sample. We created a patchy surface by “mixing” 3 smooth surfaces (a plane, a normal density, and a surface with several bumps) plus random noise.

51 Simulation Study: The patches were the cells of a random tessellation of the unit square, generated as the Voronoi polygons of 10 random points.
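
A surface of this kind can be sketched as below. The slides do not give the component formulas or how surfaces were assigned to patches, so the plane, the normal-density-shaped bump, the sinusoidal bumps, the uniform random choice of component for each Voronoi patch, and the noise level are all illustrative assumptions.

```python
import math
import random


def make_patchy_surface(n_patches=10, noise_sd=0.1, seed=None):
    """Return f(x, y) on the unit square built from Voronoi patches.

    The component formulas, the uniform random choice of component per
    patch, and noise_sd are illustrative assumptions.
    """
    rng = random.Random(seed)
    components = [
        lambda x, y: 1.0 + x + 2.0 * y,                                     # a plane
        lambda x, y: math.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.045),  # normal-density shape, sd 0.15
        lambda x, y: 1.0 + 0.5 * math.sin(6 * math.pi * x) * math.sin(6 * math.pi * y),  # bumps
    ]
    # Generators of the random tessellation: each one's Voronoi cell is a patch
    generators = [(rng.random(), rng.random()) for _ in range(n_patches)]
    patch_component = [rng.choice(components) for _ in generators]

    def surface(x, y):
        # The nearest generator identifies the Voronoi polygon containing (x, y)
        k = min(range(n_patches), key=lambda i: math.dist((x, y), generators[i]))
        # Fresh noise is drawn on every call (an assumption of this sketch)
        return patch_component[k](x, y) + rng.gauss(0.0, noise_sd)

    return surface
```

Assigning whole patches to different smooth surfaces produces the abrupt jumps at patch edges that a single smooth trend cannot represent.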

54 Simulation Study: Generated 1000 replicates of the random surface. Sampled each replicate with the Uniform Random, Fractal, SVB, and GRTS design points. Calculated the mean for each replicate and the variance of the estimated mean over all replicates.
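
The replicate-and-summarise step can be sketched generically: hold the design fixed, draw a new surface for each replicate, record the sample mean at the design points, and take the variance of those means across replicates. Here `random_plane` is a hypothetical stand-in for the patchy surfaces used in the study, and the designs mentioned afterwards are illustrative.

```python
import random
import statistics


def design_variance(design, surface_factory, n_reps=1000, seed=0):
    """Mean and variance of the sample-mean estimator for a fixed design,
    taken over n_reps independently generated surfaces."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        surface = surface_factory(rng)       # new realisation; the design stays fixed
        estimates.append(statistics.fmean(surface(x, y) for x, y in design))
    return statistics.fmean(estimates), statistics.variance(estimates)


def random_plane(rng):
    """Hypothetical stand-in surface: random tilt about (0.5, 0.5) plus N(0, 0.1^2) noise."""
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return lambda x, y: 1.0 + a * (x - 0.5) + b * (y - 0.5) + rng.gauss(0.0, 0.1)
```

With a balanced four-point design such as (0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75), the random tilt cancels in the sample mean and only the noise contributes, whereas four points clustered in one corner inherit the full tilt variance.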

56 Simulation Study: Mean and Variance of the estimated mean, by design: Uniform, GRTS, LogK, SVB

58 Mean Structure Model: Express the response as $Y(s) = m(s) + z(s)$, where m(s) is the mean structure and z(s) is a random field (hopefully stationary). Following a suggestion by Cressie, we’ll use a model based on applying a median polish to determine the mean structure.

59 Mean Structure Model: Median polish is analogous to ANOVA, in that the mean is expressed as a sum of overall, row, and column effects. The effects are estimated in an iterative procedure: –Extract row-wise medians –Extract column-wise medians –Add the sum of the median of the row medians and the median of the column medians to the overall effect –Iterate several times.
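
The iteration can be sketched as follows for a table of binned responses. This version follows the sweep order of R's `medpolish` (row medians, then column medians, re-centring the effects into the overall term after each sweep), which may differ in detail from the variant used in the talk; a fixed number of iterations replaces a convergence test.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Tukey's median polish of a 2-D table.

    Returns (overall, row_effects, col_effects, residuals) such that
    table[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        # Extract row-wise medians into the row effects
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
        # Move the median of the column effects into the overall effect
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Extract column-wise medians into the column effects
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        # Move the median of the row effects into the overall effect
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
    return overall, row_eff, col_eff, resid
```

Because the decomposition is maintained at every step, table[i][j] always equals overall + row[i] + col[j] + resid[i][j]; the fitted part without the residual plays the role of the mean structure m(s).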

60 Mean Structure Model: Median polish will extract some kinds of structure, but doesn’t handle a patch-like response. Try CART, with x,y coordinates as “classifying” variables.

61 Example Data Set: ODFW (Oregon Department of Fish and Wildlife) Coho Salmon spawners –The basic response is the density (fish/km) of adult fish at a site –Data set pooled over five years –Normalized each year by the total number of fish counted that year –The response is then the proportion of the total run at the site
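
The per-year normalisation can be sketched as below. It assumes raw per-site counts are available and that the yearly total is the sum over the surveyed sites; the site labels and counts are hypothetical. How the five years were pooled is not stated, so the sketch simply keeps one proportion per (year, site).

```python
def proportions_of_run(counts_by_year):
    """Convert {year: {site: count}} into {(year, site): proportion of that year's total}."""
    result = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())          # total fish counted that year (assumed definition)
        for site, count in counts.items():
            result[(year, site)] = count / total
    return result
```

For example, `proportions_of_run({2001: {"A": 30, "B": 10}})` gives 0.75 for site A and 0.25 for site B, so differences in overall run size between years no longer dominate the response.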

63 Example: To fit the data into the median polish/CART framework, we binned the x,y coordinates and straightened the coastline.
