Simulate Multi-Season Single-Species Detection-Nondetection Data

The function simTOcc simulates multi-season single-species occurrence data for simulation studies, power assessments, or function testing. Data can optionally be simulated with a spatial Gaussian process in the occurrence portion of the model, and non-spatial random intercepts can be included in the detection or occurrence portions of the occupancy model.
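
As a rough illustration of the kind of data-generating process described above, the following base R sketch simulates a small non-spatial multi-season occupancy data set by hand. It is a simplified stand-in under stated assumptions (logit links, a standardized linear trend, one detection covariate, a constant number of replicates) and is not the internals of simTOcc. All object names are illustrative.

```r
set.seed(1)
J <- 50        # number of sites
n.time <- 5    # number of primary time periods
K <- 3         # replicate surveys per site and period (assumed constant here)

# Occurrence: intercept plus a linear trend (standardized time index, an assumption)
beta <- c(0.4, 0.5)
trend <- as.numeric(scale(1:n.time))
psi <- matrix(plogis(beta[1] + beta[2] * trend), J, n.time, byrow = TRUE)

# Latent occupancy state for each site and time period
z <- matrix(rbinom(J * n.time, 1, psi), J, n.time)

# Detection: intercept plus one survey-level covariate
alpha <- c(-1, 0.7)
x.p <- array(rnorm(J * n.time * K), dim = c(J, n.time, K))
p <- plogis(alpha[1] + alpha[2] * x.p)

# Observed data: detections are only possible where the site is occupied
# (as.vector(z) is recycled across the replicate dimension)
y <- array(rbinom(J * n.time * K, 1, p * as.vector(z)), dim = c(J, n.time, K))
```

The resulting z and y have the same layout as the z and y elements described under Value, which can make it easier to see how the occurrence and detection parameters enter the simulation.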

Usage

simTOcc(J.x, J.y, n.time, n.rep, n.rep.max, beta, alpha, sp.only = 0, trend = TRUE, 
        psi.RE = list(), p.RE = list(), sp = FALSE, svc.cols = 1, cov.model, 
        sigma.sq, phi, nu, ar1 = FALSE, rho, sigma.sq.t, x.positive = FALSE, 
        mis.spec.type = 'none', scale.param = 1, avail, grid, ...)

Arguments

J.x: a single numeric value indicating the number of sites to simulate detection-nondetection data along the horizontal axis. Total number of sites with simulated data is \(J.x \times J.y\).
J.y: a single numeric value indicating the number of sites to simulate detection-nondetection data along the vertical axis. Total number of sites with simulated data is \(J.x \times J.y\).
n.time: a single numeric value indicating the number of primary time periods (denoted T) over which sampling occurs.
n.rep: a numeric matrix indicating the number of replicates at each site during each primary time period. The matrix must have \(J = J.x \times J.y\) rows and T columns, where T is the number of primary time periods (e.g., years or seasons) over which sampling occurs.
n.rep.max: a single numeric value indicating the maximum number of replicate surveys. This is an optional argument, with its default value set to max(n.rep). This can be used to generate data sets with different types of missingness (e.g., simulate data across 20 days (replicate surveys) but sites are only sampled a maximum of ten times each).
beta: a numeric vector containing the intercept and regression coefficient parameters for the occupancy portion of the single-species occupancy model. Note that if trend = TRUE, the second value in the vector corresponds to the occurrence trend.
alpha: a numeric vector containing the intercept and regression coefficient parameters for the detection portion of the single-species occupancy model.
sp.only: a numeric vector specifying which occurrence covariates should only vary over space and not over time. The numbers in the vector correspond to the elements in the vector of regression coefficients (beta). By default, all simulated occurrence covariates are assumed to vary over both space and time.
trend: a logical value. If TRUE, a temporal trend is used to simulate the detection-nondetection data and the second element of beta is assumed to be the trend parameter. If FALSE, no trend is used to simulate the data and all elements of beta except the first (the intercept) correspond to covariate effects.
psi.RE: a list used to specify the unstructured random intercepts included in the occupancy portion of the model. The list must have two tags: levels and sigma.sq.psi. levels is a vector of length equal to the number of distinct random intercepts to include in the model and contains the number of levels there are in each intercept. sigma.sq.psi is a vector of length equal to the number of distinct random intercepts to include in the model and contains the variances for each random effect. An additional tag site.RE can be set to TRUE to simulate data with a site-specific non-spatial random effect on occurrence. If not specified, no random effects are included in the occupancy portion of the model.
p.RE: a list used to specify the unstructured random intercepts included in the detection portion of the model. The list must have two tags: levels and sigma.sq.p. levels is a vector of length equal to the number of distinct random intercepts to include in the model and contains the number of levels there are in each intercept. sigma.sq.p is a vector of length equal to the number of distinct random intercepts to include in the model and contains the variances for each random effect. If not specified, no random effects are included in the detection portion of the model.
sp: a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to FALSE.
svc.cols: an integer vector indicating the variables whose effects will be simulated as spatially-varying coefficients. The values give the positions of the corresponding effects in the occurrence design matrix (with 1 being the intercept).
cov.model: a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: "exponential", "matern", "spherical", and "gaussian".
sigma.sq: a numeric value indicating the spatial variance parameter. Ignored when sp = FALSE.
phi: a numeric value indicating the spatial decay parameter. Ignored when sp = FALSE.
nu: a numeric value indicating the spatial smoothness parameter. Only used when sp = TRUE and cov.model = "matern".
ar1: a logical value indicating whether to simulate a temporal random effect with an AR(1) process. By default, set to FALSE.
rho: a numeric value indicating the AR(1) temporal correlation parameter. Ignored when ar1 = FALSE.
sigma.sq.t: a numeric value indicating the AR(1) temporal variance parameter. Ignored when ar1 = FALSE.
x.positive: a logical value indicating whether the covariates should be simulated as random standard normal covariates (x.positive = FALSE) or restricted to non-negative values (x.positive = TRUE). If x.positive = TRUE, covariates are simulated from a standard normal distribution and then shifted upward by the magnitude of their minimum value so that all covariate values are non-negative.
mis.spec.type: a quoted keyword indicating the type of model mis-specification to use when simulating the data. These correspond to model mis-specification of the functional relationship between occupancy/detection probability and covariates. Valid keywords are: "none" (no model mis-specification, i.e., logit link), "scale" (scaled logistic link), "line" (linear link), and "probit" (probit link). Defaults to "none".
scale.param: a positive number between 0 and 1 that indicates the scale parameter for the occupancy portion of the model when mis.spec.type = 'scale'. When specified, scale.param corresponds to the scale parameter for the occupancy portion of the model, while the reciprocal of scale.param is used for the detection portion of the model.
avail: a site x primary time period x visit array indicating the availability probability of the species during each survey simulated at the given site/primary time period/visit combination. This can be used to assess impacts of non-constant availability across replicate surveys in simulation studies. Values should fall between 0 and 1. When not specified, availability is set to 1 for all surveys.
grid: an atomic vector used to specify the grid across which to simulate the latent spatial processes. This argument is used to simulate the underlying spatial processes at a different resolution than the coordinates (e.g., if coordinates are distributed across a grid).
...: currently no additional arguments
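
The spatial arguments (sp, cov.model, sigma.sq, phi) and temporal arguments (ar1, rho, sigma.sq.t) describe two latent random effects. The base R sketch below shows one way such effects could be drawn: a Gaussian process with an exponential covariance function over a grid on the unit square, and an AR(1) series. The parameterizations used here, sigma.sq * exp(-phi * d) for the spatial covariance and sigma.sq.t * rho^|t - s| for the temporal covariance, are assumptions chosen for illustration and are not confirmed here as the ones used inside simTOcc.

```r
set.seed(2)
# Site coordinates on a regular grid over the unit square
J.x <- 5
J.y <- 5
coords <- as.matrix(expand.grid(seq(0, 1, length.out = J.x),
                                seq(0, 1, length.out = J.y)))
J <- nrow(coords)

# Exponential covariance (assumed form): sigma.sq * exp(-phi * distance)
sigma.sq <- 2
phi <- 3 / 0.4
D <- as.matrix(dist(coords))
Sigma <- sigma.sq * exp(-phi * D)

# Draw spatial random effects w ~ N(0, Sigma) using the Cholesky factor
w <- as.vector(t(chol(Sigma)) %*% rnorm(J))

# AR(1) covariance (assumed form): sigma.sq.t * rho^|t - s|
n.time <- 10
rho <- 0.5
sigma.sq.t <- 0.8
Sigma.t <- sigma.sq.t * rho^abs(outer(1:n.time, 1:n.time, "-"))

# Draw temporal random effects eta ~ N(0, Sigma.t)
eta <- as.vector(t(chol(Sigma.t)) %*% rnorm(n.time))
```

Under this exponential form, the spatial correlation at distance d is exp(-phi * d), so phi = 3 / 0.4 corresponds to a correlation of roughly 0.05 at a distance of 0.4.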

Author

Jeffrey W. Doser doserjef@msu.edu

References

Stoudt, S., P. de Valpine, and W. Fithian. Non-parametric identifiability in species distribution and abundance models: why it matters and how to diagnose a lack of fit using simulation. Journal of Statistical Theory and Practice 17, 39 (2023). https://doi.org/10.1007/s42519-023-00336-5.

Value

A list comprised of:

X: a \(J \times T \times p.occ\) numeric array containing the design matrix for the occurrence portion of the occupancy model.
X.p: a four-dimensional numeric array with dimensions corresponding to sites, primary time periods, repeat visits, and number of detection regression coefficients. This is the design matrix used for the detection portion of the occupancy model.
coords: a \(J \times 2\) numeric matrix of coordinates of each occupancy site. Required for spatial models.
w: a \(J \times 1\) matrix of the spatial random effects. Only used to simulate data when sp = TRUE.
psi: a \(J \times T\) matrix of the occupancy probabilities for each site during each primary time period.
z: a \(J \times T\) matrix of the latent occupancy states at each site during each primary time period.
p: a \(J \times T \times \max(n.rep)\) array of the detection probabilities for each site, primary time period, and replicate combination. Site/time periods with fewer than max(n.rep) replicates will contain NA values.
y: a \(J \times T \times \max(n.rep)\) array of the raw detection-nondetection data for each site, primary time period, and replicate combination.
X.p.re: a four-dimensional numeric array containing the levels of any detection random effect included in the model. Only relevant when detection random effects are specified in p.RE.
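X.re: a numeric array with dimensions corresponding to sites, primary time periods, and number of occurrence random effects, containing the levels of any occurrence random effect included in the model. Only relevant when occurrence random effects are specified in psi.RE.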
alpha.star: a numeric vector that contains the simulated detection random effects for each given level of the random effects included in the detection model. Only relevant when detection random effects are included in the model.
beta.star: a numeric vector that contains the simulated occurrence random effects for each given level of the random effects included in the occurrence model. Only relevant when occurrence random effects are included in the model.
eta: a \(T \times 1\) matrix of the latent AR(1) random effects. Only included when ar1 = TRUE.
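
To show how arrays with the shapes listed above can be summarized, the sketch below builds a small synthetic stand-in for z and y (it does not call simTOcc) and compares the naive occupancy in each primary time period, i.e., the proportion of sites with at least one detection, with the true proportion of occupied sites. The object names and the constant occupancy and detection probabilities are illustrative assumptions.

```r
set.seed(3)
J <- 40        # number of sites
n.time <- 4    # number of primary time periods
K <- 3         # maximum number of replicates

# Synthetic latent states and detection-nondetection data
z <- matrix(rbinom(J * n.time, 1, 0.6), J, n.time)
y <- array(rbinom(J * n.time * K, 1, 0.4 * as.vector(z)), dim = c(J, n.time, K))

# Mimic unequal effort: the third replicate is missing at the first 10 sites
y[1:10, , 3] <- NA

# 1 if the species was detected at least once at a site in a time period
detected <- apply(y, c(1, 2), max, na.rm = TRUE)

naive.occ <- colMeans(detected)   # proportion of sites with >= 1 detection
true.occ <- colMeans(z)           # proportion of sites truly occupied
round(rbind(naive.occ, true.occ), 2)
```

Because detections can only occur at occupied sites, the naive proportion never exceeds the true one. Site and time period combinations with no surveys at all would need separate handling, since max(..., na.rm = TRUE) returns -Inf for an all-NA vector.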

Examples

J.x <- 10
J.y <- 10
J <- J.x * J.y
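# Number of time periods sampled at each site
n.time <- sample(10, J, replace = TRUE)
n.time.max <- max(n.time)
# Number of replicates for each site and time period
# (entries stay NA for time periods beyond n.time[j])
n.rep <- matrix(NA, J, n.time.max)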
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(1:4, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
# Fixed
beta <- c(0.4, 0.5, -0.9)
trend <- TRUE 
sp.only <- 0
psi.RE <- list(levels = c(10), 
               sigma.sq.psi = c(1))
# Detection ---------------------------
alpha <- c(-1, 0.7, -0.5)
p.RE <- list(levels = c(10), 
             sigma.sq.p = c(0.5))
# Spatial parameters ------------------
sp <- TRUE
cov.model <- "exponential"
sigma.sq <- 2
phi <- 3 / .4
# Smoothness parameter: only relevant for cov.model = "matern", not passed below
nu <- 1
# Temporal parameters -----------------
ar1 <- TRUE
rho <- 0.5
sigma.sq.t <- 0.8
# Get all the data
dat <- simTOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, 
               beta = beta, alpha = alpha, sp.only = sp.only, trend = trend, 
               psi.RE = psi.RE, p.RE = p.RE, 
               sp = sp, cov.model = cov.model, sigma.sq = sigma.sq, phi = phi, 
               ar1 = ar1, rho = rho, sigma.sq.t = sigma.sq.t)
str(dat)
#> List of 15
#>  $ X          : num [1:100, 1:10, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ X.p        : num [1:100, 1:10, 1:4, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ coords     : num [1:100, 1:2] 0 0.111 0.222 0.333 0.444 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:100] "1" "2" "3" "4" ...
#>   .. ..$ : NULL
#>  $ coords.full: num [1:100, 1:2] 0 0.111 0.222 0.333 0.444 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Var1" "Var2"
#>  $ psi        : num [1:100, 1:10] 0.534 0.8 0.994 0.98 0.714 ...
#>  $ z          : int [1:100, 1:10] 0 1 1 1 1 0 1 0 1 1 ...
#>  $ p          : num [1:100, 1:10, 1:4] NA 0.181 NA NA 0.648 ...
#>  $ y          : int [1:100, 1:10, 1:4] NA 1 NA NA 1 NA NA NA NA 0 ...
#>  $ w          : num [1:100, 1] -0.851 -0.242 -0.032 0.315 -0.488 ...
#>  $ w.grid     : num [1:100, 1] -0.851 -0.242 -0.032 0.315 -0.488 ...
#>  $ X.p.re     : int [1:100, 1:10, 1:4, 1] 8 3 10 10 3 7 4 5 8 9 ...
#>  $ X.re       : int [1:100, 1:10, 1] 3 10 2 9 10 2 3 4 10 2 ...
#>  $ alpha.star : num [1:10] 0.559 0.31 0.582 1.688 0.17 ...
#>  $ beta.star  : num [1:10] -0.168 2.455 0.242 1.57 0.518 ...
#>  $ eta        : num [1:10, 1] 1.75 2.12 1.78 1.66 1.03 ...