Skip to contents

Introduction

This vignette presents new functionality in v0.6.0 to fit an integrated multi-species occupancy model (i.e., a multi-species occupancy model with multiple data sources) with the intMsPGOcc() function. This is a single-season version of the “integrated community occupancy model” developed by Doser et al. (2022). Often, multiple detection-nondetection data sources are available to study the occurrence and distribution of multiple species of interest. Such data can arise from citizen science checklists, point count surveys, camera traps, and autonomous acoustic recording units. A model that combines multiple multi-species data sources together in a single modeling framework has the potential to provide improved ecological inference as a result of increased sample sizes, increased spatial extent, and/or increased ability to account for sampling biases in individual data sources (Miller et al. 2019). Integrated multi-species occupancy models seek to combine multiple sources of multi-species detection-nondetection data in a single hierarchical modeling framework to provide improved inference on multiple species occurrence patterns. The data sources may more not be replicated, allowing for use of so-called “single-visit” data within the integrated modeilng approach. Further, data sources can be used in the modeling framework even if they do not all sample the entire species community. For a more thorough description of integrated multi-species occupancy models, see Doser et al. (2022).

Functionality for integrated multi-species occupancy models in spOccupancy is under active development, and the full suite of spOccupancy tools (e.g., cross-validation, posterior predictive checks) are not currently available for the intMsPGOcc() function, and currently only a single-season, non-spatial model is available. In this vignette, we will present a brief overview of how to fit integrated multi-species occupancy models using the current functionality in spOccupancy. We will update this vignette as we add in new functionality to fit spatial models, as well as perform cross-validation and do posterior predictive checks.

First we load spOccupancy.

Example data sets: Foliage-gleaning birds at Hubbard Brook and Bartlett Forests

As an exmaple data set, we will use data from point count surveys on twelve foliage-gleaning bird species in the Hubbard Brook Experimental Forest (HBEF) and the National Ecological Observatory Network (NEON) at Bartlett Forest. These are two forest patches in the White Mountain National Forest in New Hampshire, USA (see Figure 1 in Doser et al. (2022)]. Both data sources are provided as part of the spOccupancy package in the objects hbef2015 and neon2015.

In the Hubbard Brook data set, observers performed standard point count surveys at 373 sites over three replicates, each of 10 minutes in length and with a detection radius of 100m. In the data set provided here, we converted these data to detection-nondetection data. Some sites were not visited for all three replicates. The NEON data are collected at 80 point count sites in Bartlett Forest using a removal protocol with three time periods, resulting in replicated detection-nondetection data that can be used in an occupancy modeling framework (after reducing counts to binary detection indicators). In this case, the two data sources do not overlap spatially, but rather provide information on two areas within a larger forested region of interest (i.e., White Mountain National Forest).

data(hbef2015)
data(neon2015)
# Take a look at the two data sources
str(hbef2015)
List of 4
 $ y       : num [1:12, 1:373, 1:3] 0 0 0 0 0 1 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:12] "AMRE" "BAWW" "BHVI" "BLBW" ...
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:3] "1" "2" "3"
 $ occ.covs: num [1:373, 1] 475 494 546 587 588 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ day: num [1:373, 1:3] 156 156 156 156 156 156 156 156 156 156 ...
  ..$ tod: num [1:373, 1:3] 330 346 369 386 409 425 447 463 482 499 ...
 $ coords  : num [1:373, 1:2] 280000 280000 280000 280001 280000 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:2] "X" "Y"
str(neon2015)
List of 4
 $ y       : num [1:12, 1:80, 1:3] 0 0 0 0 0 0 1 0 0 0 ...
 $ occ.covs: num [1:80, 1] 390 425 443 382 441 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ day: num [1:80] 169 169 169 169 169 169 169 169 169 170 ...
  ..$ tod: int [1:80] 8 8 6 7 8 6 7 7 7 8 ...
 $ coords  : num [1:80, 1:2] 318472 318722 318972 318472 318722 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:80] "1" "2" "3" "4" ...
  .. ..$ : chr [1:2] "X" "Y"

The objects hbef2015 and neon2015 contain data formatted for fitting a multi-species occupancy model in spOccupancy. Briefly, the object y is a three-dimensional array of the detection-nondetection data, with dimensions corresponding to species, site, and replicate. occ.covs is a matrix or data frame with rows corresponding to sites and columns corresponding to the possible covariates for modeling occupancy probability. det.covs is a list where each element of the list consists of the possible covariates for modeling detection probability, where site-level covariates are specified as a one-dimensional vector while observation-level covariates are specified as a two-dimensional matrix with rows corresponding to sites and columns corresponding to replicates. Lastly, the coords object consists of the spatial coordinates for each site. These are only necessary for spatially-explicit models in spOccupancy (note there is no current implementation of spatially-explicit integrated multi-species occupancy models).

Next, let’s take a look at the total observations for each species in each data source.

# Species four-letter codes
sp.names <- dimnames(hbef2015$y)[[1]]
(sp.sums.hbef <- apply(hbef2015$y, 1, sum, na.rm = TRUE))
AMRE BAWW BHVI BLBW BLPW BTBW BTNW CAWA MAWA NAWA OVEN REVI 
   4   44   66  420   66  619  615   27   94   26  588  613 
# Assign species names to the NEON data.
dimnames(neon2015$y)[[1]] <- sp.names
(sp.sums.neon <- apply(neon2015$y, 1, sum, na.rm = TRUE))
AMRE BAWW BHVI BLBW BLPW BTBW BTNW CAWA MAWA NAWA OVEN REVI 
   8   15   12   23    0   22   45    2    3    0   62   60 

We see that BLPW and NAWA do not have any observations in the NEON data. Additionally, AMRE is only detected a total of 12 times between the two data sources. Here we will remove BLPW and NAWA from the NEON data set and will remove AMRE from both data sources, and thus will model a total of 11 species across the two data sources.

neon.obs.sp.indx <- which(sp.sums.neon > 0 & sp.names != 'AMRE')
hbef.obs.sp.indx <- which(sp.sums.hbef > 0 & sp.names != 'AMRE')
hbef2015$y <- hbef2015$y[hbef.obs.sp.indx, , ]
neon2015$y <- neon2015$y[neon.obs.sp.indx, , ]
str(hbef2015)
List of 4
 $ y       : num [1:11, 1:373, 1:3] 0 0 0 0 1 0 0 0 0 1 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:3] "1" "2" "3"
 $ occ.covs: num [1:373, 1] 475 494 546 587 588 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ day: num [1:373, 1:3] 156 156 156 156 156 156 156 156 156 156 ...
  ..$ tod: num [1:373, 1:3] 330 346 369 386 409 425 447 463 482 499 ...
 $ coords  : num [1:373, 1:2] 280000 280000 280000 280001 280000 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:2] "X" "Y"
str(neon2015)
List of 4
 $ y       : num [1:9, 1:80, 1:3] 0 0 0 0 1 0 0 1 1 0 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
  .. ..$ : NULL
  .. ..$ : NULL
 $ occ.covs: num [1:80, 1] 390 425 443 382 441 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ day: num [1:80] 169 169 169 169 169 169 169 169 169 170 ...
  ..$ tod: int [1:80] 8 8 6 7 8 6 7 7 7 8 ...
 $ coords  : num [1:80, 1:2] 318472 318722 318972 318472 318722 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:80] "1" "2" "3" "4" ...
  .. ..$ : chr [1:2] "X" "Y"

Notice that now the two data sources have a different number of species.

Basic model description

An integrated multi-species occupancy model takes the same form as a multi-species occupancy model, with separate sub-models to describe the ecological process (occupancy) from the observation process. The only difference is that now there are multiple observation sub-models for each data source included in the model. The ecological sub-model of an integrated multi-species occupancy model is identical to that of a regular multi-species occupancy model. Let \(z_{i, j}\) be the true presence (1) or absence (0) of a species \(i\) at site \(j\), with \(j = 1, \dots, J\) and \(i = 1, \dots, N\). We model the latent occurrence of each species according to

\[\begin{equation} \begin{split} &z_{i, j} \sim \text{Bernoulli}(\psi_{i, j}), \\ &\text{logit}(\psi_{i, j}) = \boldsymbol{x}^{\top}_{j}\boldsymbol{\beta}_i, \end{split} \end{equation}\]

where \(\psi_{i, j}\) is the probability of occurrence of species \(i\) at site \(j\), which is a function of site-specific covariates \(\boldsymbol{X}\) and a vector of species-specific regression coefficients (\(\boldsymbol{\beta}_i\)). The regression coefficients in multi-species occupancy models are envisioned as random effects arising from a common community level distribution. Specifically, we have,

\[\begin{equation} \boldsymbol{\beta}_i \sim \text{Normal}(\boldsymbol{\mu}_{\beta}, \boldsymbol{T}_{\beta}), \end{equation}\]

where \(\boldsymbol{\mu}_{\beta}\) is a vector of community level mean effects for each occurrence covariate effect (including the intercept) and \(\boldsymbol{T}_{\beta}\) is a diagonal matrix with diagonal elements \(\boldsymbol{\tau}^2_{\beta}\) that represent the variability of each occurrence covariate effect among species in the community.

We do not directly observe \(z_{i, j}\), but rather we observe an imperfect representation of the latent occurrence process. In integrated models, we have \(r = 1, \dots, R\) distinct sources of data that are all imperfect representations of a single, shared occurrence process. Let \(y_{r, i, a, k}\) be the observed detection (1) or nondetection (0) of species \(i\) in data set \(r\) at site \(a\) during replicate \(k\). Note that a data source may have data for all \(i = 1, \dots, N\) species in the community, or only a subset of the species (of course, each species modeled must have data from at least one data source). Because different data sources have different variables influencing the observation process, we envision a separate detection model for each data source that is defined conditional on a single, shared ecological process described above. More specifically, for data source \(r\) we have

\[\begin{equation} \begin{split} &y_{r, i, a, k} \sim \text{Bernoulli}(p_{r, i, a, k}z_{i, j[a]}), \\ &\text{logit}(p_{r, i, a, k}) = \textbf{v}^{\top}_{r, i, a, k}\boldsymbol{\alpha}_{r, i}, \end{split} \end{equation}\]

where \(p_{r,i, a, k}\) is the probability of detecting species \(i\) at site \(a\) during replicate \(k\) (given it is present at site \(a\)) for data source \(r\), which is a function of site, replicate, and data source specific covariates \(\textbf{V}_r\), and a vector of regression coefficients specific to each data source and species (\(\boldsymbol{\alpha}_{r, i}\)). \(z_{i, j[a]}\) is the true latent occurrence status of species \(i\) at point count location \(j\) that corresponds to the \(a\)th data set location in data set \(r\). Note that species-specific coefficients are modeled as random effects arising from a common distribution with community-level mean and variance parameters, as we saw previously with the occurrence coefficients.

We assign normal priors for the occurrence and data-set specific detection regression coefficients, and inverse-Gamma priors for the community-level variance parameters to complete the Bayesian specification of the model. Pólya-Gamma data augmentation is implemented in an analogous manner to that of other spOccupancy functions to yield an efficient implementation of multi-species integrated occupancy models.

Fitting integrated multi-species occupancy models with intMsPGOcc()

The function intMsPGOcc() fits multi-species integrated occupancy models in spOccupancy. The function follows the standard spOccupancy syntax, and more details on this syntax are available in the introductory vignette.

intMsPGOcc(occ.formula, det.formula, data, inits, priors, n.samples,
           n.omp.threads = 1, verbose = TRUE, n.report = 100, 
           n.burn = round(.10 * n.samples), n.thin = 1, n.chains = 1,
           k.fold, k.fold.threads = 1, k.fold.seed, k.fold.only = FALSE, ...)

The data argument contains the list of data elements necessary for fitting an integrated multi-species occupancy model. data should be a list comprised of the following objects: y (list of detection-nondetection data matrices for each data source), occ.covs (data frame or matrix of covariates for occurrence model), det.covs (a list of lists where each element of the list corresponds to the detection-nondetection data for the given data source), sites (a list where each element consists of the site indices for the given data source), and species (a list where each element consists of the species indices for the given data source).

The hbef2015 and neon2015 data sources are currently formatted for use in standard multi-species occupancy models. Perhaps the trickiest part of data integration is ensuring each location in each data source lines up with the correct geographical location where you want to determine the true presence/absence of each species of interest. In spOccupancy, most of this bookkeeping is done under the hood, but we will need to combine the two data sources together into a single list in which we are consistent about how the data sources are sorted. To accomplish this, we recommend first creating the occurrence covariates matrix for all data sources. Because our two data sources do not overlap spatially, this is relatively simple and here we can just use rbind().

occ.covs <- rbind(hbef2015$occ.covs, neon2015$occ.covs)
str(occ.covs)
 num [1:453, 1] 475 494 546 587 588 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "Elevation"

Notice the order in which we placed these covariates: all covariate values for HBEF come first, followed by all covariates for NEON. We need to ensure that we use this identical ordering for all objects in the data list. Next, we create the site indices stored in sites. sites should be a list with two elements (one for each data source), where each element consists of a vector that indicates the rows in occ.covs that correspond with the specific row of the detection-nondetection data for that data source. When the data sources stem from distinct points (like in our current case), this is again relatively straightforward as the indices simply correspond to how we ordered the points in occ.covs.

sites.indx <- list(hbef = 1:nrow(hbef2015$occ.covs),
                  neon = 1:nrow(neon2015$occ.covs) + nrow(hbef2015$occ.covs))
str(sites.indx)
List of 2
 $ hbef: int [1:373] 1 2 3 4 5 6 7 8 9 10 ...
 $ neon: int [1:80] 374 375 376 377 378 379 380 381 382 383 ...

Next we create the combined detection-nondetection data y. For integrated multi-species models in spOccupancy, y is a list of three-dimensional arrays, with each array having dimensions corresponding to species, site, and replicate. Again, we must ensure that we place the data sources in the correct order.

y.full <- list(hbef = hbef2015$y,
               neon = neon2015$y)
str(y.full)
List of 2
 $ hbef: num [1:11, 1:373, 1:3] 0 0 0 0 1 0 0 0 0 1 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:3] "1" "2" "3"
 $ neon: num [1:9, 1:80, 1:3] 0 0 0 0 1 0 0 1 1 0 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
  .. ..$ : NULL
  .. ..$ : NULL

The last thing we need to specify in data for an integrated multi-species occupancy model is the species list, which is a list of vectors where each vector is a set of codes that indicates the specific species in the corresponding data source. In other words, each element of the species list should consist of some code to indicate the species that each row of the corresponding array in the y data list corresponds to. This is necessary to link data sources that may not sample exactly the same species, as is our case here where the NEON data only sample 9 of the 11 species we are modeling.

sp.indx <- list(hbef = sp.names[hbef.obs.sp.indx], 
                neon = sp.names[neon.obs.sp.indx])
sp.indx
$hbef
 [1] "BAWW" "BHVI" "BLBW" "BLPW" "BTBW" "BTNW" "CAWA" "MAWA" "NAWA" "OVEN"
[11] "REVI"

$neon
[1] "BAWW" "BHVI" "BLBW" "BTBW" "BTNW" "CAWA" "MAWA" "OVEN" "REVI"

Lastly, we put the detection covariates for each data set in a single list, and then combine all the elements together in one data list that we will eventually supply to intMsPGOcc().

det.covs <- list(hbef = hbef2015$det.covs,
                 neon = neon2015$det.covs)
data.list <- list(y = y.full,
                 occ.covs = occ.covs,
                 det.covs = det.covs,
                 sites = sites.indx, 
                 species = sp.indx)
str(data.list)
List of 5
 $ y       :List of 2
  ..$ hbef: num [1:11, 1:373, 1:3] 0 0 0 0 1 0 0 0 0 1 ...
  .. ..- attr(*, "dimnames")=List of 3
  .. .. ..$ : chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
  .. .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:3] "1" "2" "3"
  ..$ neon: num [1:9, 1:80, 1:3] 0 0 0 0 1 0 0 1 1 0 ...
  .. ..- attr(*, "dimnames")=List of 3
  .. .. ..$ : chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
  .. .. ..$ : NULL
  .. .. ..$ : NULL
 $ occ.covs: num [1:453, 1] 475 494 546 587 588 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ hbef:List of 2
  .. ..$ day: num [1:373, 1:3] 156 156 156 156 156 156 156 156 156 156 ...
  .. ..$ tod: num [1:373, 1:3] 330 346 369 386 409 425 447 463 482 499 ...
  ..$ neon:List of 2
  .. ..$ day: num [1:80] 169 169 169 169 169 169 169 169 169 170 ...
  .. ..$ tod: int [1:80] 8 8 6 7 8 6 7 7 7 8 ...
 $ sites   :List of 2
  ..$ hbef: int [1:373] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ neon: int [1:80] 374 375 376 377 378 379 380 381 382 383 ...
 $ species :List of 2
  ..$ hbef: chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
  ..$ neon: chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...

We will now go ahead and run the integrated multi-species occupancy model. We will first fit a model with a linear and quadratic effect of elevation on occurrence, and effects of day (linear and quadratic) of the year and time of day (linear) on the detection process for each data source. Note that there is no requirement to have the same covariates on the detection models for each individual data source. The function intMsPGOcc() also allows the user to specify the priors and initial values, which follow the same format as other spOccupancy functions (see the introductory vignette for details), but here we will just use the default values, which we see below are printed to the screen. The last thing we need to do is specify the formulas for occurrence and detection (note we need a separate formula for each data source for the detection models in det.formula, which are supplied as a list). We also specify all the criteria of the MCMC sampler, where here we run three chains of the model for 20,000 iterations with a burn-in period of 12,000 and a thinning rate of 8, resulting in a total of 3000 posterior samples.

# Approx. run time: 5.5 minutes
out.1 <- intMsPGOcc(occ.formula = ~ scale(Elevation) + I(scale(Elevation)^2),
                    det.formula = list(hbef = ~ scale(day) + I(scale(day)^2) + scale(tod),
                                       neon = ~ scale(day) + I(scale(day)^2) + scale(tod)),
                    data = data.list,
                    n.samples = 20000,
                    n.omp.threads = 1,
                    verbose = TRUE,
                    n.report = 4000,
                    n.burn = 12000,
                    n.thin = 8,
                    n.chains = 3)
----------------------------------------
    Preparing to run the model
----------------------------------------
No prior specified for beta.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for alpha.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for tau.sq.beta.ig.
Setting prior shape to 0.1 and prior scale to 0.1
No prior specified for tau.sq.alpha.ig.
Setting prior shape and scale to 0.1
z is not specified in initial values.
Setting initial values based on observed data
beta.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
alpha.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10

tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10
beta is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
alpha is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
----------------------------------------
    Model description
----------------------------------------
Integrated Multispecies Occupancy Model with Polya-Gamma latent
variable fit with 453 sites and 11 species.

Integrating 2 occupancy data sets.

Samples per Chain: 20000 
Burn-in: 12000 
Thinning Rate: 8 
Number of Chains: 3 
Total Posterior Samples: 3000 

Source compiled with OpenMP support and model fit using 1 thread(s).

----------------------------------------
    Chain 1
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
    Chain 2
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
    Chain 3
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
summary(out.1)

Call:
intMsPGOcc(occ.formula = ~scale(Elevation) + I(scale(Elevation)^2), 
    det.formula = list(hbef = ~scale(day) + I(scale(day)^2) + 
        scale(tod), neon = ~scale(day) + I(scale(day)^2) + scale(tod)), 
    data = data.list, n.samples = 20000, n.omp.threads = 1, verbose = TRUE, 
    n.report = 4000, n.burn = 12000, n.thin = 8, n.chains = 3)

Samples per Chain: 20000
Burn-in: 12000
Thinning Rate: 8
Number of Chains: 3
Total Posterior Samples: 3000
Run Time (min): 8.455

----------------------------------------
    Community Level
----------------------------------------
Occurrence Means (logit scale): 
                         Mean     SD    2.5%     50%  97.5%   Rhat  ESS
(Intercept)            0.4669 0.8962 -1.3149  0.4527 2.2252 1.0019 2159
scale(Elevation)       0.1512 0.4908 -0.8409  0.1474 1.1169 1.0094  820
I(scale(Elevation)^2) -0.0166 0.2818 -0.5689 -0.0189 0.5666 1.0078 1132

Occurrence Variances (logit scale): 
                         Mean     SD   2.5%     50%   97.5%   Rhat ESS
(Intercept)           12.8002 8.4599 4.4448 10.5906 33.4562 1.0831 194
scale(Elevation)       2.5577 1.9165 0.7673  2.0855  7.2622 1.0079 578
I(scale(Elevation)^2)  0.7166 0.5300 0.2060  0.5836  2.0181 1.0439 703

-----------------------------
    Data source 1
-----------------------------
Detection Means (logit scale): 
                   Mean     SD    2.5%     50%  97.5%   Rhat  ESS
(Intercept)     -0.6214 0.4389 -1.4700 -0.6208 0.2819 1.0016 2043
scale(day)       0.0601 0.0898 -0.1165  0.0592 0.2413 1.0030 3000
I(scale(day)^2) -0.0149 0.0864 -0.1846 -0.0142 0.1562 1.0021 3000
scale(tod)      -0.0422 0.0781 -0.2070 -0.0404 0.1016 1.0021 3000

Detection Variances (logit scale): 
                  Mean     SD   2.5%    50%  97.5%   Rhat  ESS
(Intercept)     2.2070 1.3096 0.8180 1.9034 5.5305 1.0115 1919
scale(day)      0.0692 0.0429 0.0239 0.0583 0.1754 1.0021 3000
I(scale(day)^2) 0.0572 0.0391 0.0171 0.0468 0.1606 1.0077 3194
scale(tod)      0.0508 0.0325 0.0166 0.0427 0.1266 1.0068 3000

-----------------------------
    Data source 2
-----------------------------
Detection Means (logit scale): 
                   Mean     SD    2.5%     50%   97.5%   Rhat  ESS
(Intercept)     -1.4171 0.3051 -2.0181 -1.4183 -0.8185 1.0239 1207
scale(day)       0.0462 0.2032 -0.3654  0.0497  0.4309 1.0019 2771
I(scale(day)^2) -0.0609 0.1588 -0.3875 -0.0567  0.2397 1.0014 2768
scale(tod)       0.0094 0.1440 -0.2914  0.0099  0.2803 1.0037 2474

Detection Variances (logit scale): 
                  Mean     SD   2.5%    50%  97.5%   Rhat  ESS
(Intercept)     0.6035 0.4836 0.1452 0.4732 1.9034 1.0051 2790
scale(day)      0.2399 0.2461 0.0380 0.1643 0.8865 1.0025 2540
I(scale(day)^2) 0.1385 0.1210 0.0290 0.1017 0.4722 0.9999 2519
scale(tod)      0.0993 0.0895 0.0235 0.0755 0.3165 1.0054 2638


----------------------------------------
    Species Level
----------------------------------------
Occurrence (logit scale): 
                              Mean     SD    2.5%     50%   97.5%   Rhat  ESS
(Intercept)-BAWW            1.6700 1.6856 -0.5063  1.2806  5.8628 1.2170   56
(Intercept)-BHVI            3.3069 2.9127 -0.0625  2.5856 11.4881 1.1804   36
(Intercept)-BLBW            2.6258 0.6438  1.7163  2.5076  4.3076 1.0154  413
(Intercept)-BLPW           -5.1155 0.6616 -6.5368 -5.0654 -3.9653 1.0230  852
(Intercept)-BTBW            3.6746 0.4917  2.8694  3.6256  4.7597 1.0064 1160
(Intercept)-BTNW            2.1880 0.2975  1.6718  2.1636  2.8425 1.0009 2205
(Intercept)-CAWA           -2.2596 0.5007 -3.1769 -2.2831 -1.2405 1.0081  529
(Intercept)-MAWA           -1.9943 0.2651 -2.5233 -1.9943 -1.4799 1.0024 2514
(Intercept)-NAWA           -2.7535 0.5378 -3.7604 -2.7843 -1.5907 1.0301  622
(Intercept)-OVEN            2.5851 0.3251  2.0463  2.5580  3.3105 1.0112 1173
(Intercept)-REVI            3.5636 0.5175  2.7027  3.5163  4.7115 1.0012  504
scale(Elevation)-BAWW      -0.3373 1.0493 -3.2667 -0.1673  1.6943 1.0914   60
scale(Elevation)-BHVI       0.2786 1.0947 -2.0188  0.2491  2.5767 1.0218  199
scale(Elevation)-BLBW      -0.2494 0.2338 -0.7718 -0.2234  0.1181 1.0203  845
scale(Elevation)-BLPW       2.2944 0.6676  1.3177  2.1770  3.8195 1.0234  593
scale(Elevation)-BTBW      -0.1403 0.1606 -0.4633 -0.1357  0.1621 1.0122 2716
scale(Elevation)-BTNW       0.6639 0.2754  0.2274  0.6320  1.3155 1.0031 1203
scale(Elevation)-CAWA       1.1410 0.4572  0.4324  1.0745  2.2359 1.0091  338
scale(Elevation)-MAWA       1.1799 0.2457  0.7470  1.1634  1.7069 1.0085 1504
scale(Elevation)-NAWA       1.0355 0.4033  0.3621  1.0033  1.9310 1.0029 1345
scale(Elevation)-OVEN      -1.7627 0.3651 -2.5938 -1.7222 -1.1956 1.0170  798
scale(Elevation)-REVI      -2.1957 0.7204 -3.8552 -2.1192 -1.0245 1.0064  318
I(scale(Elevation)^2)-BAWW -0.7560 0.5016 -1.8446 -0.7253  0.3047 1.1276  145
I(scale(Elevation)^2)-BHVI  0.4691 0.7466 -0.8118  0.4205  2.2222 1.0132  263
I(scale(Elevation)^2)-BLBW -0.6242 0.2237 -1.1192 -0.6025 -0.2389 1.0084  839
I(scale(Elevation)^2)-BLPW  1.0046 0.3952  0.1974  1.0290  1.7206 1.0199 1245
I(scale(Elevation)^2)-BTBW -1.1231 0.1863 -1.5124 -1.1104 -0.7800 1.0058 1472
I(scale(Elevation)^2)-BTNW  0.0154 0.1951 -0.3273  0.0014  0.4422 1.0058 1904
I(scale(Elevation)^2)-CAWA  0.2948 0.3517 -0.3050  0.2668  1.1309 1.0320  464
I(scale(Elevation)^2)-MAWA  0.6846 0.2079  0.2952  0.6704  1.1242 1.0050 1850
I(scale(Elevation)^2)-NAWA  0.3766 0.2939 -0.1909  0.3688  0.9508 1.0061 1417
I(scale(Elevation)^2)-OVEN -0.4206 0.2230 -0.8276 -0.4301  0.0534 1.0026 1531
I(scale(Elevation)^2)-REVI -0.0627 0.3502 -0.6935 -0.0745  0.6736 1.0059  464

-----------------------------
    Data source 1
-----------------------------
Detection (logit scale): 
                        Mean     SD    2.5%     50%   97.5%   Rhat  ESS
(Intercept)-BAWW     -2.6212 0.4257 -3.3380 -2.6640 -1.7200 1.2671  111
(Intercept)-BHVI     -2.6951 0.2433 -3.1335 -2.7140 -2.1469 1.0038  151
(Intercept)-BLBW     -0.0440 0.1299 -0.2908 -0.0412  0.2109 1.0102 1332
(Intercept)-BLPW     -0.4635 0.2767 -1.0050 -0.4640  0.0724 1.0095 2981
(Intercept)-BTBW      0.6345 0.1115  0.4217  0.6336  0.8612 1.0020 3000
(Intercept)-BTNW      0.5772 0.1130  0.3555  0.5756  0.8092 1.0021 2539
(Intercept)-CAWA     -1.8297 0.5830 -2.8971 -1.8408 -0.7266 1.0395  276
(Intercept)-MAWA     -0.8576 0.2388 -1.3172 -0.8588 -0.3880 1.0140 1678
(Intercept)-NAWA     -1.5061 0.4925 -2.5112 -1.4999 -0.5910 1.0321  560
(Intercept)-OVEN      0.7935 0.1174  0.5696  0.7932  1.0184 1.0055 3028
(Intercept)-REVI      0.5444 0.1065  0.3387  0.5443  0.7532 1.0033 2243
scale(day)-BAWW       0.2099 0.1385 -0.0587  0.2087  0.4806 1.0065 2811
scale(day)-BHVI       0.2064 0.1209 -0.0210  0.2044  0.4499 1.0027 3000
scale(day)-BLBW      -0.2233 0.0682 -0.3541 -0.2231 -0.0886 1.0002 3000
scale(day)-BLPW       0.0685 0.1592 -0.2525  0.0695  0.3720 1.0027 2570
scale(day)-BTBW       0.2698 0.0683  0.1344  0.2701  0.4021 1.0030 3000
scale(day)-BTNW       0.1510 0.0669  0.0215  0.1509  0.2858 1.0053 3000
scale(day)-CAWA      -0.0094 0.1761 -0.3520 -0.0077  0.3295 1.0066 3000
scale(day)-MAWA       0.1185 0.1250 -0.1172  0.1154  0.3705 0.9996 3537
scale(day)-NAWA       0.0147 0.1737 -0.3278  0.0128  0.3494 1.0005 3000
scale(day)-OVEN      -0.0733 0.0728 -0.2171 -0.0734  0.0708 1.0015 3000
scale(day)-REVI      -0.0713 0.0690 -0.2071 -0.0709  0.0660 1.0059 3201
I(scale(day)^2)-BAWW -0.0323 0.1498 -0.3286 -0.0329  0.2627 1.0087 2838
I(scale(day)^2)-BHVI  0.0514 0.1297 -0.2025  0.0530  0.2997 1.0009 3000
I(scale(day)^2)-BLBW -0.1723 0.0797 -0.3255 -0.1735 -0.0144 1.0029 3000
I(scale(day)^2)-BLPW  0.2059 0.1921 -0.1509  0.2002  0.6255 1.0057 3000
I(scale(day)^2)-BTBW -0.0623 0.0782 -0.2154 -0.0627  0.0950 1.0046 3000
I(scale(day)^2)-BTNW -0.0604 0.0779 -0.2132 -0.0605  0.0933 1.0014 3000
I(scale(day)^2)-CAWA -0.0359 0.1861 -0.3987 -0.0390  0.3545 1.0004 3000
I(scale(day)^2)-MAWA  0.0188 0.1388 -0.2517  0.0193  0.2877 1.0091 3000
I(scale(day)^2)-NAWA -0.1288 0.1813 -0.4929 -0.1278  0.2192 1.0009 3000
I(scale(day)^2)-OVEN  0.0122 0.0855 -0.1493  0.0124  0.1788 1.0022 3000
I(scale(day)^2)-REVI  0.0395 0.0776 -0.1123  0.0397  0.1953 1.0072 3000
scale(tod)-BAWW      -0.1808 0.1369 -0.4607 -0.1787  0.0763 1.0069 3000
scale(tod)-BHVI      -0.0482 0.1110 -0.2638 -0.0495  0.1694 1.0017 3000
scale(tod)-BLBW       0.0573 0.0668 -0.0699  0.0553  0.1931 1.0039 2555
scale(tod)-BLPW       0.1089 0.1463 -0.1674  0.1059  0.3986 1.0065 3000
scale(tod)-BTBW      -0.0359 0.0673 -0.1706 -0.0369  0.0955 1.0002 3000
scale(tod)-BTNW       0.0365 0.0635 -0.0841  0.0370  0.1582 1.0018 3000
scale(tod)-CAWA      -0.2154 0.1619 -0.5590 -0.2114  0.0867 0.9996 3000
scale(tod)-MAWA       0.0226 0.1153 -0.2001  0.0231  0.2506 1.0044 3000
scale(tod)-NAWA      -0.0723 0.1600 -0.4028 -0.0700  0.2559 1.0048 3000
scale(tod)-OVEN      -0.0461 0.0743 -0.1947 -0.0455  0.0968 1.0005 3000
scale(tod)-REVI      -0.0797 0.0642 -0.2051 -0.0800  0.0532 1.0005 3000

-----------------------------
    Data source 2
-----------------------------
Detection (logit scale): 
                        Mean     SD    2.5%     50%   97.5%   Rhat  ESS
(Intercept)-BAWW     -1.5641 0.7135 -2.7018 -1.6510 -0.0688 1.2212  117
(Intercept)-BHVI     -2.1734 0.3760 -2.9180 -2.1684 -1.4087 1.0080  278
(Intercept)-BLBW     -2.0963 0.3224 -2.7735 -2.0846 -1.4971 1.0013 2494
(Intercept)-BTBW     -1.7899 0.2757 -2.3437 -1.7827 -1.2630 1.0024 3000
(Intercept)-BTNW     -1.0067 0.2445 -1.4822 -1.0113 -0.5166 1.0001 3000
(Intercept)-CAWA     -1.3541 0.6563 -2.6707 -1.3551 -0.0645 1.0027 2125
(Intercept)-MAWA     -1.3881 0.5658 -2.4678 -1.3813 -0.2236 1.0061 3377
(Intercept)-OVEN     -0.7609 0.2032 -1.1574 -0.7646 -0.3642 1.0001 3000
(Intercept)-REVI     -0.8593 0.2056 -1.2656 -0.8576 -0.4521 1.0030 2948
scale(day)-BAWW      -0.2955 0.3398 -1.0439 -0.2726  0.3144 1.0088 1649
scale(day)-BHVI       0.1869 0.2910 -0.3758  0.1745  0.7867 1.0017 2560
scale(day)-BLBW       0.5763 0.2723  0.0926  0.5566  1.1596 1.0022 2612
scale(day)-BTBW      -0.1400 0.2419 -0.6247 -0.1395  0.3250 1.0014 2963
scale(day)-BTNW       0.0803 0.1764 -0.2643  0.0820  0.4311 1.0006 2408
scale(day)-CAWA       0.1894 0.4262 -0.6027  0.1768  1.1219 1.0002 3000
scale(day)-MAWA      -0.3160 0.4563 -1.3062 -0.2854  0.4769 1.0003 2670
scale(day)-OVEN       0.1437 0.1712 -0.1829  0.1467  0.4845 1.0004 3000
scale(day)-REVI      -0.0279 0.1588 -0.3446 -0.0219  0.2724 1.0003 2798
I(scale(day)^2)-BAWW -0.0369 0.2514 -0.4865 -0.0513  0.5168 1.0295  932
I(scale(day)^2)-BHVI -0.4401 0.3101 -1.1585 -0.4092  0.0851 1.0105 2202
I(scale(day)^2)-BLBW  0.0247 0.2166 -0.4128  0.0299  0.4409 1.0043 3803
I(scale(day)^2)-BTBW -0.2148 0.2091 -0.6614 -0.2047  0.1743 1.0037 3000
I(scale(day)^2)-BTNW  0.1864 0.1641 -0.1305  0.1821  0.5199 1.0008 3000
I(scale(day)^2)-CAWA  0.0258 0.3305 -0.5929  0.0154  0.7230 1.0025 2894
I(scale(day)^2)-MAWA  0.0206 0.2713 -0.5144  0.0196  0.5708 1.0008 3000
I(scale(day)^2)-OVEN -0.0939 0.1648 -0.4279 -0.0870  0.2162 1.0011 3000
I(scale(day)^2)-REVI  0.0101 0.1515 -0.2922  0.0128  0.3010 1.0017 2581
scale(tod)-BAWW      -0.1181 0.2523 -0.6369 -0.1102  0.3497 1.0064 2819
scale(tod)-BHVI      -0.1421 0.2195 -0.5920 -0.1305  0.2532 1.0044 2340
scale(tod)-BLBW      -0.0114 0.1870 -0.3735 -0.0129  0.3622 1.0040 2998
scale(tod)-BTBW       0.1264 0.1998 -0.2499  0.1240  0.5260 1.0064 3000
scale(tod)-BTNW       0.0595 0.1662 -0.2596  0.0594  0.3970 1.0012 3000
scale(tod)-CAWA      -0.0136 0.2934 -0.6306 -0.0091  0.5481 1.0001 3000
scale(tod)-MAWA       0.0652 0.3053 -0.5425  0.0592  0.7172 1.0027 3000
scale(tod)-OVEN       0.0852 0.1442 -0.1868  0.0840  0.3710 1.0012 3000
scale(tod)-REVI       0.0662 0.1447 -0.2163  0.0657  0.3395 1.0005 3000

From the summary output, we can see that most of the model parameters have converged. We can compare this model to a simpler model that assumes only a linear effect of elevation, and then compare the two models using WAIC with the waicOcc() function.

# Approx. run time: 5.5 minutes
out.2 <- intMsPGOcc(occ.formula = ~ scale(Elevation),
                    det.formula = list(hbef = ~ scale(day) + I(scale(day)^2) + scale(tod),
                       neon = ~ scale(day) + I(scale(day)^2) + scale(tod)),
                    data = data.list,
                    n.samples = 20000,
                    n.omp.threads = 1,
                    verbose = TRUE,
                    n.report = 4000,
                    n.burn = 12000,
                    n.thin = 8,
                    n.chains = 3)
----------------------------------------
    Preparing to run the model
----------------------------------------
No prior specified for beta.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for alpha.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for tau.sq.beta.ig.
Setting prior shape to 0.1 and prior scale to 0.1
No prior specified for tau.sq.alpha.ig.
Setting prior shape and scale to 0.1
z is not specified in initial values.
Setting initial values based on observed data
beta.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
alpha.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10

tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10
beta is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
alpha is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
----------------------------------------
    Model description
----------------------------------------
Integrated Multispecies Occupancy Model with Polya-Gamma latent
variable fit with 453 sites and 11 species.

Integrating 2 occupancy data sets.

Samples per Chain: 20000 
Burn-in: 12000 
Thinning Rate: 8 
Number of Chains: 3 
Total Posterior Samples: 3000 

Source compiled with OpenMP support and model fit using 1 thread(s).

----------------------------------------
    Chain 1
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
    Chain 2
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
    Chain 3
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
# Model selection using WAIC
waicOcc(out.1)
       elpd          pD        WAIC 
-5120.86252    84.60809 10410.94123 
waicOcc(out.2)
      elpd         pD       WAIC 
-5153.8926    78.7532 10465.2915 

We see the model with both a linear and quadratic effect of elevation is preferred. We can also do prediction using the predict() function to generate species distribution maps across a region of interest (or generate estimates of community-level metrics), which follows exactly the format used for single data-source multi-species occupancy models.

As of v0.7.0, we can also calculate WAIC individually for each species using the by.sp argument

waicOcc(out.1, by.sp = TRUE)
        elpd       pD      WAIC
1  -228.4953 9.054605  475.0998
2  -290.3288 6.519948  593.6975
3  -767.1563 9.842579 1553.9978
4  -139.3608 4.992896  288.7075
5  -764.0241 9.359029 1546.7663
6  -842.9293 9.120359 1704.0993
7  -125.6628 7.595562  266.5167
8  -289.2963 6.687321  591.9673
9  -106.6829 5.202496  223.7707
10 -751.3159 7.583553 1517.7989
11 -815.6100 8.649747 1648.5195
waicOcc(out.2, by.sp = TRUE)
        elpd        pD      WAIC
1  -230.4410  8.444234  477.7704
2  -290.7486  6.177394  593.8519
3  -771.4304 10.783319 1564.4274
4  -141.0661  5.120823  292.3739
5  -778.8890  7.972345 1573.7226
6  -842.9124  8.312117 1702.4491
7  -126.5126  5.587553  264.2003
8  -296.3344  5.955685  604.5801
9  -107.5437  4.684388  224.4562
10 -752.5008  7.468169 1519.9379
11 -815.5137  8.247173 1647.5217

Active development

The functionality for integrated multi-species occupancy models in spOccupancy is in active development, and currently some functionality that is common to all other spOccupancy model fitting functions is not available for intMsPGOcc() (i.e., posterior predictive checks, cross-validation, random effects on detection probability). I am actively working on incorporating these extensions into the model. Additionally, there will of course be spatially-explicit versions of these models available at some point in the future as well, with options to accommodate species correlations using the factor modeling approach detailed in Doser, Finley, and Banerjee (2023).

References

Doser, Jeffrey W, Andrew O Finley, and Sudipto Banerjee. 2023. “Joint Species Distribution Models with Imperfect Detection for High-Dimensional Spatial Data.” Ecology 104 (9): e4137.

Doser, Jeffrey W, Wendy Leuenberger, T Scott Sillett, Michael T Hallworth, and Elise F Zipkin. 2022. “Integrated Community Occupancy Models: A Framework to Assess Occurrence and Biodiversity Dynamics Using Multiple Data Sources.” Methods in Ecology and Evolution.

Miller, David AW, Krishna Pacifici, Jamie S Sanderlin, and Brian J Reich. 2019. “The Recent Past and Promising Future for Data Integration Methods to Estimate Species’ Distributions.” Methods in Ecology and Evolution 10 (1): 22–37.