# Simulate Univariate Data for Testing GLMMs


The function `simAbund` simulates univariate data without imperfect detection for simulation studies, power assessments, or function testing related to GLMMs. Data can optionally be simulated with a spatial Gaussian process in the model. Non-spatial random effects can also be included in the model.

## Usage

```
simAbund(J.x, J.y, n.rep, n.rep.max, beta, kappa, tau.sq, mu.RE = list(),
         offset = 1, sp = FALSE, svc.cols = 1, cov.model, sigma.sq, phi, nu,
         family = 'Poisson', z, x.positive = FALSE, ...)
```

## Arguments

- J.x
a single numeric value indicating the number of sites to simulate count data along the horizontal axis. Total number of sites with simulated data is \(J.x \times J.y\).

- J.y
a single numeric value indicating the number of sites to simulate count data along the vertical axis. Total number of sites with simulated data is \(J.x \times J.y\).

- n.rep
a numeric vector of length \(J = J.x \times J.y\) indicating the number of replicate surveys at each of the \(J\) sites.

- n.rep.max
a single numeric value indicating the maximum number of replicate surveys. This is an optional argument, with its default value set to `max(n.rep)`. This can be used to generate data sets with different types of missingness (e.g., simulate data across 20 days (replicate surveys) but sites are only sampled a maximum of ten times each).

- beta
a numeric vector containing the intercept and regression coefficient parameters for the abundance model.

- kappa
a single numeric value containing the dispersion parameter for the abundance portion of the model. Only relevant when `family = 'NB'`.

- tau.sq
a single numeric value containing the residual variance parameter of the Gaussian distribution. Only relevant when `family = 'Gaussian'` or `family = 'zi-Gaussian'`.

- mu.RE
a list used to specify the non-spatial random intercepts included in the model. The list must have two tags: `levels` and `sigma.sq.mu`. `levels` is a vector of length equal to the number of distinct random intercepts to include in the model and contains the number of levels there are in each intercept. `sigma.sq.mu` is a vector of length equal to the number of distinct random intercepts to include in the model and contains the variances for each random effect. A third optional tag is `beta.indx`, which is a numeric vector with length equal to the number of distinct random intercepts. The values in `beta.indx` denote the intercept/covariate for which you wish to simulate a random intercept/slope. Numeric values correspond to the intercept/covariate in `beta`. If `mu.RE` is not specified, no random effects are included in the abundance portion of the model.

- sp
a logical value indicating whether to simulate a spatially-explicit model with a Gaussian process. By default set to `FALSE`.

- offset
either a single numeric value, a vector of length `J`, or a site by replicate matrix that contains the offset for each data point in the data set.

- svc.cols
a vector indicating the variables whose effects will be estimated as spatially-varying coefficients. `svc.cols` is an integer vector with values indicating the order of covariates specified in the model formula (with 1 being the intercept if specified).

- cov.model
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the abundance data. Supported covariance model keywords are: `"exponential"`, `"matern"`, `"spherical"`, and `"gaussian"`.

- sigma.sq
a numeric value indicating the spatial variance parameter. Ignored when `sp = FALSE`.

- phi
a numeric value indicating the spatial decay parameter. Ignored when `sp = FALSE`.

- nu
a numeric value indicating the spatial smoothness parameter. Only used when `sp = TRUE` and `cov.model = "matern"`.

- family
the distribution to use for the data. Currently supports `'NB'` (negative binomial), `'Poisson'`, `'Gaussian'`, and `'zi-Gaussian'`.

- z
a vector of length `J` containing the binary presence/absence portion of a zero-inflated Gaussian model. Only relevant when `family = 'zi-Gaussian'`.

- x.positive
a logical value indicating whether the simulated covariates should be simulated as random standard normal covariates (`x.positive = FALSE`) or restricted to positive values using a uniform distribution with lower bound 0 and upper bound 1 (`x.positive = TRUE`).

- ...
currently no additional arguments
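
To make the roles of `cov.model`, `sigma.sq`, and `phi` more concrete, the following is a minimal base R sketch of a single zero-mean Gaussian process draw under an exponential covariance, assuming the common parameterization \(\sigma^2 \exp(-\phi d)\) for distance \(d\). The helper `exp.cov`, the grid of coordinates on the unit square, and the parameter values are illustrative assumptions; this is not necessarily the code `simAbund` runs internally.

```r
# Hypothetical helper (not part of simAbund): exponential covariance matrix
# between all pairs of sites, assuming sigma.sq * exp(-phi * distance).
exp.cov <- function(coords, sigma.sq, phi) {
  D <- unname(as.matrix(dist(coords)))  # pairwise Euclidean distances
  sigma.sq * exp(-phi * D)
}

set.seed(1)
J.x <- 4
J.y <- 3
# Assumed layout: a regular J.x by J.y grid on the unit square
coords <- as.matrix(expand.grid(x = seq(0, 1, length.out = J.x),
                                y = seq(0, 1, length.out = J.y)))
Sigma <- exp.cov(coords, sigma.sq = 1.5, phi = 3)

# One draw of spatial random effects w ~ N(0, Sigma) via the Cholesky factor
w <- as.vector(t(chol(Sigma)) %*% rnorm(nrow(coords)))
```

Under this parameterization, larger values of `phi` make the correlation between sites decay faster with distance, while `sigma.sq` sets the variance of the spatial effect at each site.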

## Author

Jeffrey W. Doser doserjef@msu.edu

## Value

A list comprised of:

- X
a three-dimensional numeric design array of covariates with dimensions corresponding to sites, replicates, and number of covariates (including an intercept) for the model.

- coords
a \(J \times 2\) numeric matrix of coordinates of each site. Required for spatial models.

- w
a matrix of the spatial random effects. Only used to simulate data when `sp = TRUE`. If simulating data with spatially-varying coefficients, the number of columns equals the number of spatially-varying coefficients and each row corresponds to a site.

- mu
a `J x max(n.rep)` matrix of the expected abundance values for each site and replicate survey.

- y
a `J x max(n.rep)` matrix of the raw count data for each site and replicate combination.

- X.re
a numeric three-dimensional array containing the levels of any abundance random effect included in the model. Only relevant when abundance random effects are specified in `mu.RE`.

- beta.star
a numeric vector that contains the simulated abundance random effects for each given level of the random effects included in the abundance model. Only relevant when abundance random effects are included in the model.
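
As a quick way to see how the components listed above fit together, the sketch below simulates a small non-spatial Poisson data set and inspects the dimensions of the returned objects. It assumes `simAbund` is available from the `spAbundance` package; the grid size, number of replicates, and coefficient values are arbitrary choices made only for illustration.

```r
library(spAbundance)  # assumption: simAbund() is provided by this package

set.seed(10)
J.x <- 5
J.y <- 4
J <- J.x * J.y
# Same number of replicate surveys at every site
n.rep <- rep(3, J)
# Intercept plus one covariate effect
beta <- c(0.2, 0.5)

dat <- simAbund(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta,
                family = 'Poisson')

names(dat)       # components of the returned list
dim(dat$X)       # sites, replicates, covariates (including the intercept)
dim(dat$coords)  # one row of coordinates per site
dim(dat$y)       # sites by replicates
```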

## Examples

```
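set.seed(401)
# Number of sites along each axis of the grid
J.x <- 15
J.y <- 15
J <- J.x * J.y
# Number of replicate surveys at each site (between 1 and 3)
n.rep <- sample(3, J, replace = TRUE)
# Intercept and three regression coefficients
beta <- c(0, -1.5, 0.3, -0.8)
p.abund <- length(beta)
# One random intercept with 30 levels and variance 1.3
mu.RE <- list(levels = c(30), sigma.sq.mu = c(1.3))
# Negative binomial dispersion parameter
kappa <- 0.5
sp <- FALSE
family <- 'NB'
dat <- simAbund(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta,
                kappa = kappa, mu.RE = mu.RE, sp = sp, family = family)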
```
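
A spatial variant can be sketched along the same lines by setting `sp = TRUE` and supplying `cov.model`, `sigma.sq`, and `phi`, as described in the arguments. This example assumes `simAbund` comes from the `spAbundance` package, and the parameter values are arbitrary illustrative choices rather than recommended settings.

```r
library(spAbundance)  # assumption: simAbund() is provided by this package

set.seed(500)
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Between 2 and 4 replicate surveys per site
n.rep <- sample(2:4, J, replace = TRUE)
# Intercept plus one covariate effect
beta <- c(0.5, 0.3)

# Spatial intercept (svc.cols defaults to 1) with an exponential covariance
dat.sp <- simAbund(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta,
                   sp = TRUE, cov.model = 'exponential',
                   sigma.sq = 1.2, phi = 5, family = 'Poisson')

# Spatial random effects and site coordinates used to generate them
str(dat.sp$w)
head(dat.sp$coords)
```

Because `cov.model = 'exponential'` is used here, `nu` is not needed; it would only be required with `cov.model = "matern"`.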