Simulation of spatially correlated discrete random variables
Dan Dalthorp and Lisa Madsen
Department of Statistics, Oregon State University

Outline
I. Generating one pair of correlated discrete random variables
(a) Lognormal-Poisson hierarchy
(b) Overlapping sums
II. Generating a vector of correlated discrete random variables by overlapping sums
III. Examples

Introduction
Generate $Y_1, Y_2$, where
▪ $Y_1, Y_2$ have specified means $\mu_1, \mu_2$, variances $\sigma_1^2, \sigma_2^2$, and correlation $\rho_Y \ge 0$
▪ $Y_1, Y_2$ are count r.v.'s, i.e., $y = 0, 1, 2, \ldots$
▪ the distributions of $Y_1, Y_2$ are unimodal and Poisson-like
▪ if $\sigma^2 < \mu$, then both $\sigma^2$ and $\mu$ are small

Lognormal-Poisson Method for Generating $Y_1$ and $Y_2$
1. Generate correlated normal RVs $Z_1, Z_2$.
2. Transform to lognormals $X_i = \exp(Z_i)$.
3. Generate conditionally independent $Y_i \sim \text{Poisson}(X_i)$.
$Y_1$ and $Y_2$ resemble negative binomial RVs.

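Obtaining the Right Moments
To get $E(Y_i) = \mu_i$ and $\mathrm{Var}(Y_i) = \sigma_i^2$ with $\mathrm{corr}(Y_1, Y_2) = \rho_Y$, generate lognormals $X_1, X_2$ with
$$E(X_i) = \mu_i, \qquad \mathrm{Var}(X_i) = \sigma_i^2 - \mu_i, \qquad \mathrm{Cov}(X_1, X_2) = \rho_Y \sigma_1 \sigma_2,$$
because the conditionally independent Poisson draws give $E(Y_i) = E(X_i)$, $\mathrm{Var}(Y_i) = E(X_i) + \mathrm{Var}(X_i)$, and $\mathrm{Cov}(Y_1, Y_2) = \mathrm{Cov}(X_1, X_2)$. This requires normals $Z_1, Z_2$ with
$$\tau_i^2 = \mathrm{Var}(Z_i) = \ln\!\left(1 + \frac{\sigma_i^2 - \mu_i}{\mu_i^2}\right), \qquad E(Z_i) = \ln \mu_i - \tfrac{1}{2}\tau_i^2$$
and
$$\mathrm{corr}(Z_1, Z_2) = \frac{1}{\tau_1 \tau_2} \ln\!\left(1 + \frac{\rho_Y \sigma_1 \sigma_2}{\mu_1 \mu_2}\right).$$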
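A minimal Python sketch of the lognormal-Poisson hierarchy, assuming only the standard library. It matches moments through $E(Y_i) = E(X_i)$, $\mathrm{Var}(Y_i) = E(X_i) + \mathrm{Var}(X_i)$, $\mathrm{Cov}(Y_1, Y_2) = \mathrm{Cov}(X_1, X_2)$ and the standard lognormal moment formulas. The function names and the Knuth-style Poisson sampler are illustrative choices, not the authors' code.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam) draw by Knuth's product-of-uniforms method.

    Means above 30 are split into Poisson(30) pieces (Poisson additivity)
    so that exp(-lam) never underflows.
    """
    total = 0
    while lam > 30.0:
        total += poisson(30.0, rng)
        lam -= 30.0
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return total + k


def lognormal_poisson_params(mu, var, rho_y):
    """Means, standard deviations and correlation of (Z1, Z2) for target moments of (Y1, Y2)."""
    if any(v <= m for m, v in zip(mu, var)):
        raise ValueError("lognormal-Poisson needs Var(Y_i) > E(Y_i)")
    tau2 = [math.log(1.0 + (v - m) / m**2) for m, v in zip(mu, var)]
    means = [math.log(m) - t / 2.0 for m, t in zip(mu, tau2)]
    sds = [math.sqrt(t) for t in tau2]
    cov_x = rho_y * math.sqrt(var[0] * var[1])  # Cov(X1, X2) = Cov(Y1, Y2)
    r = math.log(1.0 + cov_x / (mu[0] * mu[1])) / (sds[0] * sds[1])
    if r > 1.0:
        raise ValueError("target correlation exceeds the lognormal-Poisson bound")
    return means, sds, r


def simulate_lognormal_poisson(mu, var, rho_y, n, seed=0):
    """Draw n pairs (Y1, Y2) from the lognormal-Poisson hierarchy."""
    rng = random.Random(seed)
    (m1, m2), (s1, s2), r = lognormal_poisson_params(mu, var, rho_y)
    pairs = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1 = m1 + s1 * e1  # correlated normals via a 2 x 2 Cholesky factor
        z2 = m2 + s2 * (r * e1 + math.sqrt(1.0 - r * r) * e2)
        # lognormal intensities, then conditionally independent Poisson counts
        pairs.append((poisson(math.exp(z1), rng), poisson(math.exp(z2), rng)))
    return pairs
```

For example, `simulate_lognormal_poisson((2.0, 4.0), (6.0, 12.0), 0.3, 20000)` returns 20000 pairs whose sample moments should sit near those targets; the `ValueError` branches flag targets this method cannot reach.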

Constraints on Moments of Y 1, Y 2 with Lognormal-Poisson Method

Upper Bound for Correlation–Lognormal Poisson
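Setting $\mathrm{corr}(Z_1, Z_2) = 1$ in the lognormal-Poisson moment matching gives an upper bound $\rho_Y \le \mu_1\mu_2\,(e^{\tau_1\tau_2} - 1)/(\sigma_1\sigma_2)$, with $\tau_i^2 = \ln(1 + (\sigma_i^2 - \mu_i)/\mu_i^2)$; the method also needs $\sigma_i^2 > \mu_i$. A small sketch of that calculation (the function name is hypothetical):

```python
import math


def lognormal_poisson_max_corr(mu1, var1, mu2, var2):
    """Largest corr(Y1, Y2) the lognormal-Poisson method can reach (corr(Z1, Z2) = 1)."""
    if var1 <= mu1 or var2 <= mu2:
        raise ValueError("lognormal-Poisson needs variance > mean for both Y's")
    t1 = math.log(1.0 + (var1 - mu1) / mu1**2)  # Var(Z1)
    t2 = math.log(1.0 + (var2 - mu2) / mu2**2)  # Var(Z2)
    return mu1 * mu2 * math.expm1(math.sqrt(t1 * t2)) / math.sqrt(var1 * var2)
```

With equal means and variances the bound simplifies to $1 - \mu/\sigma^2$, so nearly Poisson counts cannot be strongly correlated under this method; for example, $\mu = 2$, $\sigma^2 = 6$ gives a bound of $2/3$.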

Overlapping Sums Method for Generating $Y_1$ and $Y_2$
Generate independent, discrete RVs $X_1, X_2, X$. Let
$$Y_1 = X + X_1, \qquad Y_2 = X + X_2.$$
Holgate (1964): correlated Poissons.
We are not concerned with the exact distribution of $Y_1$ and $Y_2$, but we require them to be ecologically plausible.
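The simplest instance is the Holgate-style construction with all three components Poisson: the shared $X$ is the only source of covariance, so $\mathrm{Cov}(Y_1, Y_2) = \mathrm{Var}(X)$. A minimal sketch using only the standard library (function names are illustrative):

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam) draw by Knuth's product-of-uniforms method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def overlapping_poisson_pair(lam_x, lam_1, lam_2, n, seed=0):
    """Y1 = X + X1, Y2 = X + X2 with independent Poisson X, X1, X2.

    Each Y_i is Poisson(lam_x + lam_i), Cov(Y1, Y2) = lam_x, and
    corr(Y1, Y2) = lam_x / sqrt((lam_x + lam_1) * (lam_x + lam_2)).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = poisson(lam_x, rng)  # shared component
        pairs.append((x + poisson(lam_1, rng), x + poisson(lam_2, rng)))
    return pairs
```

With $\lambda_X = 1$, $\lambda_1 = 2$, $\lambda_2 = 3$ the pairs have means 3 and 4 and correlation $1/\sqrt{12} \approx 0.29$.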

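Obtaining the Right Moments
To get $E(Y_i) = \mu_i$, $\mathrm{Var}(Y_i) = \sigma_i^2$, and $\mathrm{corr}(Y_1, Y_2) = \rho_Y$, generate independent $X_1, X_2, X$ with
$$\sigma_X^2 = \rho_Y \sigma_1 \sigma_2, \qquad \sigma_{X_i}^2 = \sigma_i^2 - \sigma_X^2$$
and
$$\mu_X + \mu_{X_i} = \mu_i, \quad i = 1, 2.$$
The shared term $X$ is the only source of covariance, so $\mathrm{Cov}(Y_1, Y_2) = \sigma_X^2$.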
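These equations translate directly into code. In this sketch the shared mean $\mu_X$ is left as a free argument, since the two mean equations have three unknowns; the function name and the rejection of nonpositive means or negative variances are illustrative assumptions.

```python
import math


def component_moments(mu, var, rho_y, mu_x):
    """(mean, variance) of X, X1, X2 for Y1 = X + X1, Y2 = X + X2.

    mu_x is a free choice: two mean equations, three unknown means.
    """
    var_x = rho_y * math.sqrt(var[0] * var[1])  # Cov(Y1, Y2) = Var(X)
    moments = {
        "X": (mu_x, var_x),
        "X1": (mu[0] - mu_x, var[0] - var_x),
        "X2": (mu[1] - mu_x, var[1] - var_x),
    }
    for name, (m, v) in moments.items():
        if m <= 0 or v < 0:
            raise ValueError(f"{name} would need mean {m:.4g} and variance {v:.4g}")
    return moments
```

For targets $\mu = (2, 3)$, $\sigma^2 = (4, 9)$, $\rho_Y = 0.5$ and the choice $\mu_X = 0.5$, this gives $X$: (0.5, 3), $X_1$: (1.5, 1), $X_2$: (2.5, 6).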

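Choose distributions for the X's based on the relationship between variance and mean:
▪ If $\sigma_X^2 > \mu_X$, use $X \sim \text{Negative binomial}(\mu_X, \sigma_X^2)$.
▪ If $\sigma_X^2 = \mu_X$, use $X \sim \text{Poisson}(\mu_X)$.
▪ If $\sigma_X^2 = \mu_X(1 - \mu_X)$, use $X \sim \text{Bernoulli}(\mu_X)$.
▪ If $\mu_X(1 - \mu_X) < \sigma_X^2 < \mu_X$, use $X = B + P$, where $B \sim \text{Bernoulli}(p)$ and $P \sim \text{Poisson}(\lambda)$, with $p = \sqrt{\mu_X - \sigma_X^2}$ and $\lambda = \mu_X - p$ (from matching $E(X) = p + \lambda$ and $\mathrm{Var}(X) = p(1 - p) + \lambda$; this also needs $p \le 1$).
▪ If $\sigma_X^2 < \mu_X(1 - \mu_X)$, then $X$ cannot be simulated, by any method.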
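The rules above can be sketched as a function that returns a family name and a sampler. Assumptions: the negative binomial is drawn as a gamma-Poisson mixture, "equal" means equal within a small tolerance, and the $p \le 1$ check for Bernoulli + Poisson is an added guard; the function name is hypothetical.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam) draw by Knuth's method; means above 30 are split into pieces."""
    total = 0
    while lam > 30.0:
        total += poisson(30.0, rng)
        lam -= 30.0
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return total + k


def choose_count_distribution(mean, var, tol=1e-9):
    """Return (family name, sampler) for a count r.v. with this mean and variance."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    if var > mean + tol:
        # gamma-Poisson mixture: E = k*theta = mean, Var = mean + k*theta**2 = var
        k, theta = mean**2 / (var - mean), (var - mean) / mean
        return "negative binomial", lambda rng: poisson(rng.gammavariate(k, theta), rng)
    if abs(var - mean) <= tol:
        return "Poisson", lambda rng: poisson(mean, rng)
    floor = mean * (1.0 - mean)  # for mean < 1 no count r.v. has a smaller variance
    if var < floor - tol:
        raise ValueError("impossible: variance below mean * (1 - mean)")
    if abs(var - floor) <= tol:
        return "Bernoulli", lambda rng: int(rng.random() < mean)
    # Bernoulli(p) + Poisson(lam): p + lam = mean, p(1 - p) + lam = var
    p = math.sqrt(mean - var)
    if p > 1.0:
        raise ValueError("Bernoulli + Poisson needs mean - var <= 1")
    lam = mean - p
    return "Bernoulli + Poisson", lambda rng: int(rng.random() < p) + poisson(lam, rng)
```

For example, `choose_count_distribution(0.5, 0.4)` selects Bernoulli + Poisson with $p = \sqrt{0.1} \approx 0.32$, while `choose_count_distribution(0.5, 0.2)` raises, because $0.2 < 0.5 \times 0.5$.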

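Constraints on Moments of $Y_1, Y_2$ with Overlapping Sums Method
No constraints on the means of $Y_i$, but we require:
▪ $0 \le \rho_Y \le \min(\sigma_1, \sigma_2)/\max(\sigma_1, \sigma_2)$, so that the shared variance $\sigma_X^2 = \rho_Y \sigma_1 \sigma_2$ exceeds neither $\sigma_1^2$ nor $\sigma_2^2$
▪ an ecologically plausible relationship between $\mu_i$ and $\sigma_i^2$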

Upper Bound for Correlation–Overlapping Sums
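For overlapping sums, the shared component's variance $\rho_Y \sigma_1 \sigma_2$ cannot exceed either $\sigma_1^2$ or $\sigma_2^2$, which bounds $\rho_Y$ by $\min(\sigma_1, \sigma_2)/\max(\sigma_1, \sigma_2)$. This sketch (hypothetical function name) computes only that variance-based bound; it ignores further limits that the means place on the X's, such as an underdispersed $X_i$ turning out to be impossible, so the attainable correlation can be lower.

```python
import math


def overlapping_sums_max_corr(var1, var2):
    """Variance-only upper bound on corr(Y1, Y2) for Y_i = X + X_i.

    Var(X) = rho * sigma1 * sigma2 must not exceed min(var1, var2),
    so rho <= min(sigma1, sigma2) / max(sigma1, sigma2).
    """
    return min(var1, var2) / math.sqrt(var1 * var2)
```

Variances of 4 and 9 cap the correlation at $2/3$, and variances of 1 and 100 cap it at 0.1, which is why strong correlations between r.v.'s with very different variances are out of reach.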

Comparing Methods

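Step 1: Find variances and means of X's
$Y_1 = X + X_1$, $Y_2 = X + X_2$, where $X$, $X_1$, and $X_2$ are independent count random variables with
Variances: $\sigma_X^2 = \rho_Y \sigma_1 \sigma_2$, $\sigma_{X_1}^2 = \sigma_1^2 - \sigma_X^2$, $\sigma_{X_2}^2 = \sigma_2^2 - \sigma_X^2$
Means: $\mu_X + \mu_{X_1} = \mu_1$, $\mu_X + \mu_{X_2} = \mu_2$
A quick example: simulate $Y_1$ and $Y_2$ with target means $\mu_1, \mu_2$, variances $\sigma_1^2, \sigma_2^2$, and $\rho_Y = 0.2$. Two equations, three unknowns... Try $\mu_X$ with $\mu_X(1 - \mu_X) = \sigma_X^2$, so $X$ would be Bernoulli.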
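Making $X$ Bernoulli means solving $\mu_X(1 - \mu_X) = \sigma_X^2$ for $\mu_X$, which is possible only when $\sigma_X^2 \le 1/4$. The sketch below takes the smaller root, which keeps $\mu_X$ small and leaves more of $\mu_1$ and $\mu_2$ for $X_1$ and $X_2$; that root choice and the function name are assumptions. The example's $X \sim \text{Bernoulli}(0.0921)$ corresponds to $\sigma_X^2 = 0.0921 \times 0.9079 \approx 0.0836$.

```python
import math


def bernoulli_mean_for_variance(var_x):
    """Smaller root of mu * (1 - mu) = var_x, so Bernoulli(mu) has variance var_x."""
    if not 0.0 <= var_x <= 0.25:
        raise ValueError("a Bernoulli variance must lie in [0, 1/4]")
    return (1.0 - math.sqrt(1.0 - 4.0 * var_x)) / 2.0
```

Since $1 - 4\mu(1 - \mu) = (1 - 2\mu)^2$, the function recovers 0.0921 from $\sigma_X^2 = 0.0921 \times 0.9079$.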

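Step 2: Define distributions for X's
▪ $X \sim \text{Bernoulli}(0.0921)$, since $\sigma_X^2 = \mu_X(1 - \mu_X)$ by design
▪ $X_1 \sim$ Negative binomial with $\mu = \mu_1 - \mu_X$ and $\sigma^2 = \sigma_1^2 - \sigma_X^2$
▪ $X_2 = \text{Bernoulli}(p) + \text{Poisson}(\lambda)$ with $p = 0.05$ and $\lambda = \mu_{X_2} - p$
Step 3: Simulate $Y_1 = X + X_1$, $Y_2 = X + X_2$.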
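An end-to-end sketch of Steps 1 to 3 in Python. The targets used in the usage line, $\mu = (1.5, 0.6)$, $\sigma^2 = (3.0, 0.45)$, $\rho_Y = 0.2$, are hypothetical values chosen for illustration, not the example's own targets. With them, $X$ comes out Bernoulli, $X_1$ negative binomial and $X_2$ Bernoulli + Poisson, the same structure as the example. The gamma-Poisson negative binomial and the function names are assumptions.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam) draw; means above 30 are split into Poisson(30) pieces."""
    total = 0
    while lam > 30.0:
        total += poisson(30.0, rng)
        lam -= 30.0
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return total + k


def count_sampler(mean, var, tol=1e-12):
    """Negative binomial if var > mean; otherwise Bernoulli(p) + Poisson(lam).

    p = sqrt(mean - var) covers pure Poisson (p = 0) and pure Bernoulli
    (p = mean) as the two endpoints.
    """
    if mean <= 0:
        raise ValueError("component mean must be positive")
    if var > mean:
        k, theta = mean**2 / (var - mean), (var - mean) / mean  # gamma-Poisson mixture
        return lambda rng: poisson(rng.gammavariate(k, theta), rng)
    p = math.sqrt(mean - var)
    if var < mean * (1.0 - mean) - tol or p > 1.0:
        raise ValueError(f"no sampler for mean={mean:.4g}, var={var:.4g}")
    lam = max(mean - p, 0.0)
    return lambda rng: int(rng.random() < p) + poisson(lam, rng)


def overlapping_sums_pair(mu, var, rho_y, n, seed=0):
    # Step 1: Var(X) from the target covariance; E(X) chosen so X is Bernoulli
    var_x = rho_y * math.sqrt(var[0] * var[1])
    mu_x = (1.0 - math.sqrt(1.0 - 4.0 * var_x)) / 2.0  # needs var_x <= 1/4
    # Step 2: distributions for the independent components X, X1, X2
    draw_x = count_sampler(mu_x, var_x)
    draw_1 = count_sampler(mu[0] - mu_x, var[0] - var_x)
    draw_2 = count_sampler(mu[1] - mu_x, var[1] - var_x)
    # Step 3: Y1 = X + X1, Y2 = X + X2
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = draw_x(rng)
        pairs.append((x + draw_1(rng), x + draw_2(rng)))
    return pairs
```

Usage: `overlapping_sums_pair((1.5, 0.6), (3.0, 0.45), 0.2, 50000)`; the sample means, variances and correlation of the returned pairs should be close to the targets.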

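Generalizing to n > 2:
1. Park & Shin (1998) algorithm gives variances for X's: find an $n \times m$ matrix $T$ consisting of 0's and 1's and an $m$-vector $\boldsymbol{\sigma}_X^2$ such that $T\,\mathrm{diag}(\boldsymbol{\sigma}_X^2)\,T^\top = \mathrm{Cov}(\mathbf{Y})$ and $\sigma_i^2 \ge 0$ for all $i$.
2. Linear programming gives reasonable means for X's: find the $m$-vector $\boldsymbol{\mu}_X$ that solves $T\boldsymbol{\mu}_X = E(\mathbf{Y})$ subject to constraints: (i) $\mu_i > 0$ for all $i$; and (ii) when $\sigma_i^2$ is small, a bound on $\mu_i$ so that $X_i$ can still be simulated.
3. Generate independent X's with the appropriate distributions and multiply by $T$: $\mathbf{Y} = T\mathbf{X}$, where $\mathbf{X}$ is a vector of independent r.v.'s, and $T$ is a matrix of 0's and 1's.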
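Because the X's are independent, $\mathbf{Y} = T\mathbf{X}$ has $E(\mathbf{Y}) = T\boldsymbol{\mu}_X$ and $\mathrm{Cov}(\mathbf{Y}) = T\,\mathrm{diag}(\boldsymbol{\sigma}_X^2)\,T^\top$. The sketch below checks those identities in plain Python; the $3 \times 5$ layout of $T$ and all numbers are hypothetical.

```python
def combine(T, x):
    """Y = T x for a 0/1 matrix T (list of rows) and one draw x of the X's."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]


def implied_moments(T, mu_x, var_x):
    """E(Y) = T mu_x and Cov(Y) = T diag(var_x) T^T for independent X's."""
    mean_y = combine(T, mu_x)
    m = len(var_x)
    cov_y = [[sum(ti[k] * var_x[k] * tj[k] for k in range(m)) for tj in T] for ti in T]
    return mean_y, cov_y


# Hypothetical layout: column 1 shared by Y1 and Y2, column 2 shared by Y2 and Y3,
# columns 3 to 5 are individual components of Y1, Y2, Y3.
T = [
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
]
```

With $\boldsymbol{\sigma}_X^2 = (0.5, 0.4, 2.0, 1.0, 1.5)$, $Y_1$ and $Y_2$ share covariance 0.5, $Y_2$ and $Y_3$ share 0.4, and $Y_1$ and $Y_3$ are uncorrelated because no column of $T$ links them.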

Park & Shin (1998) algorithm gives variances of X's. E.g., suppose … for the common component of $Y_3$ and $Y_4$.
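Park & Shin's algorithm itself is not spelled out here, so the sketch below is a deliberately simpler, hypothetical stand-in that produces a valid $T$ and $\boldsymbol{\sigma}_X^2$: one shared component per positively covarying pair, with variance equal to that covariance, plus one individual component per $Y$ for the leftover variance. It needs every variance to cover the sum of that $Y$'s covariances, a stronger requirement than a decomposition whose components may be shared by more than two Y's.

```python
def pairwise_decomposition(cov):
    """Return (T, d) with T diag(d) T^T == cov, using only pairwise sharing.

    Hypothetical simple scheme, NOT Park & Shin's (1998) algorithm.
    """
    n = len(cov)
    columns, d = [], []
    for i in range(n):
        for j in range(i + 1, n):
            if cov[i][j] < 0:
                raise ValueError("negative covariance cannot come from shared sums")
            if cov[i][j] > 0:
                columns.append([1 if k in (i, j) else 0 for k in range(n)])
                d.append(cov[i][j])  # shared component of Y_i and Y_j
    for i in range(n):
        leftover = cov[i][i] - sum(cov[i][j] for j in range(n) if j != i)
        if leftover < 0:
            raise ValueError(f"Y{i + 1}: shared variances exceed its total variance")
        columns.append([1 if k == i else 0 for k in range(n)])
        d.append(leftover)  # individual component of Y_i
    T = [[col[r] for col in columns] for r in range(n)]  # n x m matrix of 0's and 1's
    return T, d
```

For three Y's that all have variance 1 and pairwise covariance 0.8, this scheme fails, yet a single component shared by all three (variance 0.8) plus individual components (variance 0.2 each) would work.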

Grub population density as a function of several covariates

[Figures: variance of residuals by quartile of fitted values (1st, 2nd, 3rd, 4th); correlation of residuals vs. lag distance (feet)]
Are the conditions for multiple regression met? No:
1. Non-normal response variable
2. Variance not constant
3. Observations not independent

Generalized linear model (Fisher 1935; Dempster 1971; Berk 1972; Nelder and Wedderburn 1972)
A. Accommodates response variables with distribution in exponential family (including normal, binomial, Poisson, gamma, exponential, chi-squared, etc.)
B. Allows for non-constant variance
GLM with quasi-likelihood estimation (Wedderburn 1974)
A. Accommodates response variables that are not in an exponential family (including negative binomial, unspecified distributions)
B. Requires only that the variance of the response variable be expressed as a function of the mean
GLM with quasi-likelihood, adapted for spatially dependent observations (Liang and Zeger 1986; McCullagh and Nelder 1989; Albert and McShane 1995; Gotway and Stroup 1997; Dalthorp 2004)
A. Accounts for spatial autocorrelation in the residuals
B. The statistical theory for the model is not well-developed

Example: Japanese beetle grub population density vs. soil organic matter
[Figures: grubs per soil sample vs. organic matter content (%); sample variances ($s^2$) vs. means; correlation vs. lag distance (feet)]
Target moments:
▪ Means (via GLM)
▪ Variances (via TPL, Taylor's power law)
▪ Correlations (via spherical model)
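One way to turn the three fitted models into target moments for the simulation is sketched below. Assumptions, none of which come from the slides: a log-link GLM with one covariate, Taylor's power law in the form $\sigma^2 = a\mu^b$, a spherical correlation model without nugget, and every parameter value in the usage line.

```python
import math


def spherical_corr(h, a):
    """Spherical correlation model with range a and no nugget (assumption)."""
    if h >= a:
        return 0.0
    s = h / a
    return 1.0 - 1.5 * s + 0.5 * s**3


def target_moments(covariate, coords, beta0, beta1, tpl_a, tpl_b, corr_range):
    """Hypothetical target moments for counts at sampling locations.

    Means: log-link GLM, mu = exp(beta0 + beta1 * covariate).
    Variances: Taylor's power law, var = tpl_a * mu**tpl_b.
    Correlations: spherical model of the distance between locations.
    """
    mu = [math.exp(beta0 + beta1 * x) for x in covariate]
    var = [tpl_a * m**tpl_b for m in mu]
    corr = [[spherical_corr(math.dist(p, q), corr_range) for q in coords] for p in coords]
    return mu, var, corr
```

Usage: `target_moments([0.0, 1.0], [(0.0, 0.0), (3.0, 4.0)], 0.0, math.log(2.0), 2.0, 1.5, 10.0)` gives means 1 and 2, variances 2 and about 5.66, and a correlation of 0.3125 at lag 5. The target covariances are then $\rho_{ij}\sigma_i\sigma_j$.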

The simulation: 1000 reps with n = 143
X's are independent, count-valued random variables
-- variances from Park & Shin's algorithm
-- means from linear programming
### PROBLEM ### No solution found! Choice between one of the following:
i. One Y mean off-target but no impossible X r.v.'s. Need: Y with $\mu$ = …; can only do: $\mu$ = …
ii. One impossible X r.v. We need: r.v. with $\mu$ = …, $\sigma^2$ = …; can do Bernoulli: $\mu$ = …, $\sigma^2$ = …
Consequences? $\mathrm{Var}(Y_{16})$ = … vs. target of …

Results for 1000 simulation runs: 3720 X's, consisting of:
-- Negative binomial: …
-- Bernoulli: …
-- Bernoulli + Poisson: …
-- Impossible: 1 (simulated $\sigma^2$ slightly larger than target)
[Figures: Means, simulated vs. target mean; Variances, simulated vs. target variance]

[Figure: Correlations, correlation vs. lag distance]

Example: Diamondback moth dispersal
[Figures: release point and traps; means; variance vs. mean; correlation vs. lag distance]

The simulation: 1000 reps with n = 114
X's are independent negative binomials
-- variances from Park & Shin's algorithm
-- means from linear programming
T is a matrix of zeros and ones that defines the common components of the Y's.
Results: [Figures: Means; Variances]

[Figure: Correlation, simulated vs. target, by lag distance. Circles are averages for 1000 sims.]

Example: Weed counts (Chenopodium polyspermum) vs. soil magnesium
Weed counts and soil [Mg] in random quadrats in a field...
[Figures: means; $s^2$ vs. means; variances]

[Figure: Correlation]
### Infeasible correlations ###
Highest possible correlation between $Y_i$ and $Y_j$ is $\min(\sigma_i, \sigma_j)/\max(\sigma_i, \sigma_j)$.
For 49 pairs of points in the weed data, the target $\rho_{i,j}$ is too high.

Summary
▪ Correlated count r.v.'s can be simulated by overlapping sums of independent negative binomials, Bernoullis, and Poissons.
▪ The simulated r.v.'s are very close to negative binomial where $\mu < \sigma^2$.
▪ Negative correlations, and strong positive correlations between r.v.'s with very different variances, are not attainable, but...
▪ The method can accommodate a wide variety of ecologically important scenarios that the hierarchical lognormal-Poisson model balks at, including:
-- underdispersed count r.v.'s
-- moderately strong correlations where $\mu_1$ … $\mu_2$ and $\sigma_1^2$ … $\sigma_2^2$