Practical guidelines and considerations when fitting SVC occupancy models

Introduction

In this article, I give a few thoughts on the interpretation, validation, and reliability of inferences made from spatially varying coefficient (SVC) occupancy models (or really just SVC models in general). SVC occupancy models are a flexible extension of spatial occupancy models that allow for not only the intercept to vary spatially, but also allow the regression coefficients themselves to vary spatially. As a result, SVC occupancy models have great potential for exploring spatial nonstationarity in species-environment relationships (e.g., the relationship between a species and its environment varies across the species’ range). Here I will not go in depth on SVC occupancy models and what they are, but rather will provide a brief overview before discussing some potential thoughts related to the reliability of inferences from SVC occupancy models and how we can improve our reliability and confidence in such inferences. For a much more complete discussion of SVC occupancy models I encourage you to check out two of our recent papers (Doser, Finley, et al. 2024; Doser, Kéry, et al. 2024). In Doser, Finley, et al. (2024), we provide full statistical details of the single-species and multi-species occupancy models available in spOccupancy, while in Doser, Kéry, et al. (2024) we present a variety of comparisons (using both simulated and empirical data sets) that highlight the utility of SVC occupancy models for testing and generating hypotheses related to spatial variation in species-environment relationships. A few other great recent examples and resources on SVCs in ecology include Pease, Pacifici, and Kays (2022), Sultaire et al. (2022), and Thorson et al. (2023).

Note that SVC occupancy models can also be used to quantify spatial variation in occupancy trends when fitting a multi-season occupancy model (e.g., Figure 3 in Doser, Kéry, et al. (2024)). Such approaches are also fairly commonly used for abundance-based models to look at spatial variation in population trends (Meehan, Michel, and Rue 2019; Smith et al. 2024). The concepts discussed below are primarily in the context of estimating spatial variation in species-environment relationships (or any variable that varies spatially), whose interpretation is more nuanced than interpretation of a spatially-varying trend. Interpretation of a spatially-varying trend is much more straightforward, particularly when there are no temporally-varying covariates in the model. The visualization of the spatially varying trend simply shows how occupancy probability has changed over time in different ways across a species range.

Quick overview of an SVC occupancy model

Let \(\boldsymbol{s}_j\) denote the spatial coordinates of site \(j\), where \(j = 1, \dots, J\). We define \(z(\boldsymbol{s}_j)\) as the true presence (1) or absence (0) at site \(j\) with spatial coordinates \(\boldsymbol{s}_j\). We model \(z(\boldsymbol{s}_j)\) as

\[\begin{equation}\label{z} z(\boldsymbol{s}_j) \sim \text{Bernoulli}(\psi(\boldsymbol{s}_j)), \end{equation}\]

where \(\psi(\boldsymbol{s}_j)\) is the occurrence probability at site \(j\). For simplicity, suppose we have a single covariate, \(x_1\), whose effect on occupancy we believe may vary across space. Our model for \(\psi(\boldsymbol{s}_j)\) is thus

\[\begin{equation}\label{psi} \text{logit}(\psi(\boldsymbol{s}_j)) = \beta_0 + \text{w}_0(\boldsymbol{s}_j) + \beta_1 \cdot x_1(\boldsymbol{s}_j) + \text{w}_1(\boldsymbol{s}_j) \cdot x_1(\boldsymbol{s}_j), \end{equation}\]

where \(\beta_0\) is an intercept, \(\text{w}_0(\boldsymbol{s}_j)\) is a spatial random effect to account for unmodeled spatial variation in occurrence probability, \(\beta_1\) is the non-spatial effect of the covariate \(x_1(\boldsymbol{s}_j)\), and \(\text{w}_1(\boldsymbol{s}_j)\) is the spatially varying effect of the covariate at each spatial location \(\boldsymbol{s}_j\). The spatially varying component of the covariate effect, \(\text{w}_1(\boldsymbol{s}_j)\) serves as a local adjustment of the species-environment relationship from the overall effect \(\beta_1\). We model SVCs in spOccupancy using Nearest Neighbor Gaussian Processes (Datta et al. 2016; Doser, Finley, et al. 2024).

The sub-model for detection probability in an SVC occupancy model in spOccupancy is identical to a normal occupancy model (i.e., there are no spatially-varying coefficients on detection). See the intro vignette for details.

Guidelines and suggestions for making inference with SVC occupancy models

SVC models are very flexible in their ability to estimate spatial variation in both the intercept and regression coefficients, which is what makes them useful for exploring nonstationarity in species-environment relationships. However, with such a flexible modeling approach, such models do present the potential for overinterpretation, and so we must be careful with the specific covariates we allow to have spatially varying effects and with our resulting interpretation of such effects. As with any sort of model, we can improve our confidence in the resulting interpretation and inferences if we carefully scrutinize the model in different ways (both formally using model comparison as well as informally using visualizations). Below I give a few thoughts and suggestions on how best to perform inference when using SVC models and prevent overinterpretation of any resulting spatial patterns.

Do not arbitarily assign SVCs to covariates. Use SVCs for covariates when you hypothesize the relationship may vary spatially

As with most species distribution models, occupancy models are correlative by nature. We are simply relating a set of covariates that we choose as researchers to the occupancy of our species of interest through a logistic regression framework (that accounts for imperfect detection). With any correlative model, we need to carefully consider the covariates that we include in the model. In most ecological studies, there are many potential covariates that could influence the response variable of interest, and many of those covariates are also correlated with one another. For example, suppose we have two covariates \(x_1\) and \(x_2\) that are highly correlated. In reality, \(x_1\) has a strong positive effect on the occupancy probability of some species of interest. If we, as the researchers, include \(x_2\) instead of \(x_1\) in an occupancy model, we will likely see a strong positive effect of \(x_2\) on occupancy, since \(x_2\) is correlated with \(x_1\). This of course does not mean that changes in \(x_2\) cause changes in occupancy probability. Rather, it just suggests that a change in \(x_2\) is correlated with a change in occupancy probability. It does not provide us with any information on causality.

While the previous paragraph is a fundamental concept in any sort of regression model, it becomes arguably more important to consider in a more flexible model like an SVC occupancy model where the effects of the covariates now vary across space. Because the relationship between the covariate and the response now varies across space, it allows us to make richer inferences on the species-environment relationship. Given this increased potential for inference and resulting ecological interpretation, we should think very carefully about the covariates we include in the model and which ones are allowed to vary spatially. I do not recommend randomly assigning SVCs to covariates without any explicit hyopthesis or expectation as to why the effect may vary spatially, particularly if you plan to make inference on the resulting SVCs (i.e., make a map of the SVC and interpret what it means ecologically). Instead, only use SVCs when testing an explicit hypothesis related to spatial variation in a species-environment relationship. For example, in Doser, Finley, et al. (2024) we used SVCs to explore spatial variation in the relationship between maximum temperature and occupancy of a community of grassland bird species across the US. We hypothesized this relationship may vary spatially as a result of differences in species-specific thermal tolerances where the effect may be larger along the edges of a given species’ range. This advice is nothing new when doing any sort of statistical inference in a species distribution model: if a key objective of an analysis is interpretation of covariate effects (i.e., inference rather than prediction), you should carefully choose the covariates you include in the model (based on explicit hypotheses) rather than throw in the entire kitchen sink.

Compare SVC models to simpler models that represent explicit hypotheses

Comparisons of SVC models to simpler parametric models that represent explicit hypotheses can provide substantial insights into the amount of support for different drivers of spatially varying species-environment relationships (Pease, Pacifici, and Kays 2022; Doser, Kéry, et al. 2024). In Pease, Pacifici, and Kays (2022), the authors compared SVC occupancy models to models that only allowed species-environment relationships to vary across ecoregions (i.e., a broader scale than the SVCs) as an explicit assessment of the scale at which the covariate effects may vary. In Doser, Kéry, et al. (2024), we outlined four simple functional forms of species-environment relationships (i.e., linear, quadratic, stratum, interaction) that may be useful in comparison to an SVC model to further understand the SVC. Figure 3 in Doser, Kéry, et al. (2024) showcases this concept in the context of assessing spatial variation in 20-year occupancy trends. A linear relationship assumes a constant effect across the spatial region. A stratum-specific relationship allows effects to vary across space, which will outperform an SVC model (according to WAIC or some sort of cross-validation) if the spatial variation in the effect is a result of differences across the strata used in the model. An interaction model allows effects to vary in relationship to another known covariate. If the true underlying reason for spatial variation in the effect is a result of the interaction, the interaction model will outperform the SVC model (according to WAIC or cross-validation). Explicit comparison of multiple models of varying complexity (e.g., linear model, interaction model, SVC model, SVC + interaction model) can provide insights into the amount of support there is for spatial variation in a species-environment relationship, and whether that variation is most explained by the interaction or by other spatially varying reasons.

Compare SVC models to a model with only a spatially varying intercept

SVC models use a Gaussian Process (in spOccupancy, an approximation of that process) to estimate the spatial variation in a species-environment relationship. Gaussian Processes are notoriously highly flexible, and so we might question whether the estimated SVCs are truly a result of spatial variation in the species-environment relationship, or if the spatial variability estimated in the SVC is simply the result of additional variables that are not included in the model. This is a completely valid concern, and something we should consider carefully. As discussed before, one of the ways we can protect against this is by only using SVCs for specific covariates for which we hypothesize there may be spatial variation in the effect. Further, whenever using SVCs, I strongly suggest performing explicit comparisons of SVC models that include both a spatially varying intercept and a spatially varying coefficient with a model that only includes a spatially varying intercept and a constant effect of the covariate. When fitting a model with a spatially varying intercept and constant covariate effects (i.e., a spatial occupancy model) the spatially varying intercept accounts for additional spatial variation in occupancy probability that is not accounted for by any of the covariates in the model (e.g., due to missing covariates, biotic processes). When fitting a SVC model with both a spatially varying intercept and spatially varying coefficient, you are now allowing both the intercept and covariate to vary spatially. If residual spatial variability in occupancy probability is only related to missing covariates not included in the model, then a spatial occupancy model should outperform the SVC model, as the additional flexibility the SVC model provides is not needed. This is exactly what we see in Figure 1 of Doser, Finley, et al. (2024), in which the SVC model only outperforms a spatially varying intercept model when (1) there is spatial variation in the effect of the covariate itself; and (2) the range of the spatial variation in the covariate effect is reasonable relative to the spatial extent of the study area. Of course this was shown with relatively simple simulated data. Additional simulation studies under differing characteristics of the covariates and the missing spatial variation could be useful to gain additional insights into how these models perform under differing scenarios.

One could imagine only fitting a model with a spatially varying coefficient and removing a spatially varying intercept from the model. I do not generally recommend doing this given the discussion above unless you have a very good reason to remove the intercept and understand the potential consequences it can have on inference. By fitting a model with only an SVC for a covariate of interest and no spatially varying intercept, this means any additional spatial variation in occupancy probability not explained by the covariates in the model may be ascribed to the SVC, which could lead to inappropriate conclusions regarding the species-environment relationship.

In summary, when exploring nonstationarity with SVC models, I recommend the following:

Fit a spatial occupancy model (i.e., a model with a spatially varying intercept)
Fit a spatially varying coefficient occupancy model that includes a spatially varying intercept and an SVC(s) for the covariate(s) of interest. Do not fit a model with only a spatially varying coefficient and not a spatially varying intercept.
Compare the two models using WAIC to assess whether inclusion of the SVC improved model fit. Out-of-sample assessments (e.g., some sort of hold-out validation) could also be useful, particularly when the analysis involves predictions.

Visualize the SVCs in multiple ways

In addition to formal model comparisons with spatial occupancy models and simpler ways to estimate spatial variation in a species-environment relationship, visualization of the resulting SVCs (and spatially varying intercept) can provide very useful insights. In particular, there are two specific visualizations that I think can be quite useful with SVCs when trying to understand what they mean:

A map of the SVC effects either of the actual data points themselves (e.g. Figure 3b in Finley and Banerjee (2020)) or predicted across a region of interest (e.g., Figure 5a, 6a in Doser, Finley, et al. (2024), Figure 2 in Sultaire et al. (2022)). Looking at the spatial patterning in the SVC and how it relates to any explicit hypotheses you may have had regarding the drivers of the spatial variability in the effect can be very insightful. Further, it can also help generate new hypotheses and insights on the factors related to spatial variability in the species-environment relationship.
A scatter plot showing the relationship between the covariate and the effect of the covariate at each site. In this plot, the x-axis would be the values of the covariate at each site, and the y-axis would be the estimated SVC values at each site. This is a really useful plot that can help reveal (1) the amount of variation in the effect across different values of the covariate; and (2) if the spatial variation in the effect is really more indicative of a specific parametric form. For example, if one were to fit an SVC for a covariate whose effect was truly of a negative quadratic form (e.g., it peaked at some value within the range of the covariate data points), then this plot would help identify that pattern (e.g., the points would show a quadratic pattern). One could probably imagine doing this formally via some statistical test. I haven’t done a thorough literature search to see if this is out there, but it could be something useful to eventually incoporate into spOccupancy.

Consider more informative priors for the spatial variance and/or spatial decay parameters

Each SVC included in an SVC occupancy model (including the spatially varying intercept) is controlled by two parameters: (1) a spatial variance parameter \(\sigma^2\) that controls the magnitude of variability in the effect across space; and (2) a spatial decay parameter \(\phi\) that controls the range of spatial autocorrelation, such that higher values of \(\phi\) correspond to more fine-scale variability and lower values of \(\phi\) correspond to more broad-scale variability (e.g., see Figure 1 in Doser and Stoudt (2024)). By default in spOccupancy, the priors on these two parameters are quite uninformative, and the same for all SVCs. Specifically, the default prior for \(\sigma^2\) is an inverse-Gamma prior with shape of 2 and scale of 1. This is a fairly uninformative prior that gives a prior mean of 1 for the spatial variability (a modest amount of variation) and an infinite variance, indicating that we do not have a large amount of prior information. The prior for the spatial decay parameter \(\phi\) is set to Uniform(\(\frac{3}{\text{MAX}}, \frac{3}{\text{MIN}}\)) where MAX and MIN correspond to the maximum and minimum intersite distances across all locations in the data set. When using an exponential correlation function (the default in spOccupancy), this corresponds to the effective spatial range (i.e., the distance at which correlation between two sites falls to 0.05) taking values anywhere from the mininum intersite distance (very fine scale) to the maximum intersite distance (very broad scale). Altogether, these priors are weakly informative, such that estimation of the parameters is largely driven by the data.

Specifying more informative priors may be particulary useful in SVC models in order to improve one’s confidence in the resulting inferences. For example, earlier in the document I discussed how when including both a spatially varying intercept and a spatially varying coefficient, the spatially varying intercept should account for extra spatial variation that results from missing covariates that influence occupancy probability but are not included in the model. In order to ensure this is the case, we could assign a more informative prior to the spatial variance parameters, such that we give (relative to the coefficient) more prior weight to larger spatial variability in the intercept and more weight to smaller spatial variability in the spatially varying coefficient (relative to the intercept). This would help ensure that any residual spatial variation is ascribed to the intercept as opposed to the SVC. This may be particularly useful when fitting SVC models with a modest number of spatial locations (e.g., a couple hundred), as in such cases there can be difficulty in separate estimating the two spatially varying processes. By giving less weight to the prior variance for the SVC, we are essentially protecting against making false conclusions regarding the effect of the covariate and how it varies across space. If our model reveals substantial spatial variation in the covariate effect when using a prior on the spatial variance that gives more weight to lower spatial variability, then this provides us with more support for there truly being spatial variability in the species-environment relationship.

Further, more informative priors on the spatial decay parameters can be particularly useful for providing more support for inferences regarding spatial variability in a species-environment relationship. Specifying a more informative prior that prevents very fine-scale variability in the covariate effect can be a particularly useful way to prevent potential overinterpretation of spatial variation in a species-environment relationship. For example, instead of setting the uniform prior to Uniform(\(\frac{3}{\text{MAX}}, \frac{3}{\text{MIN}}\)) (the default in spOccupancy) you could set the upper bound to something like \(\frac{3}{q_{0.25}}\), where \(q_{0.25}\) represents the 25% quantile of the intersite distance matrix. This prior sets the lower bound of the effective spatial range to \(q_{0.25}\), which effectively prevents very fine-scale variation in the effect. One could alternatively use a different quantile, or choose the prior based on some underlying ecological phenomenon of the study species (e.g., dispersal limitations). By restricting very fine-scale variation in the effect of the covariate, this could protect against spurious inferences made on very fine-scale patterns of nonstationarity.

Further, setting such a prior on the spatial decay parameter of any SVCs could be particularly useful in combination with a less informative prior (i.e., the default prior) on the decay parameter of the spatially varying intercept. If using this approach, this would ensure that very fine-scale spatial variability that is not explained by any covariates in the model is assigned to the spatially varying intercept, and not (potentially spuriously) assigned to the spatially varying coefficient.

References

Datta, Abhirup, Sudipto Banerjee, Andrew O Finley, and Alan E Gelfand. 2016. “Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.” Journal of the American Statistical Association 111 (514): 800–812.

Doser, Jeffrey W, Andrew O Finley, Sarah P Saunders, Marc Kéry, Aaron S Weed, and Elise F Zipkin. 2024. “Modeling Complex Species-Environment Relationships Through Spatially-Varying Coefficient Occupancy Models.” Journal of Agricultural, Biological, and Environmental Statistics.

Doser, Jeffrey W, Marc Kéry, Sarah P Saunders, Andrew O Finley, Brooke L Bateman, Joanna Grand, Shannon Reault, Aaron S Weed, and Elise F Zipkin. 2024. “Guidelines for the Use of Spatially Varying Coefficients in Species Distribution Models.” Global Ecology and Biogeography, e13814.

Doser, Jeffrey W, and Sara Stoudt. 2024. “‘Fractional replication’ in single-visit multi-season occupancy models: Impacts of spatiotemporal autocorrelation on identifiability.” Methods in Ecology and Evolution 15 (2): 358–72.

Finley, Andrew O, and Sudipto Banerjee. 2020. “Bayesian spatially varying coefficient models in the spBayes R package.” Environmental Modelling & Software 125: 104608.

Meehan, Timothy D, Nicole L Michel, and Håvard Rue. 2019. “Spatial Modeling of Audubon Christmas Bird Counts Reveals Fine-Scale Patterns and Drivers of Relative Abundance Trends.” Ecosphere 10 (4): e02707.

Pease, Brent S, Krishna Pacifici, and Roland Kays. 2022. “Exploring Spatial Nonstationarity for Four Mammal Species Reveals Regional Variation in Environmental Relationships.” Ecosphere 13 (8): e4166.

Smith, Adam C, Allison D. Binley, Lindsay Daly, Brandon PM Edwards, Danielle Ethier, Barbara Frei, David Iles, Timothy D Meehan, Nicole L Michel, and Paul A Smith. 2024. “Spatially Explicit Bayesian Hierarchical Models Improve Estimates of Avian Population Status and Trends.” Ornithological Applications 126 (1): duad056.

Sultaire, Sean M, John M Humphreys, Benjamin Zuckerberg, Jonathan N Pauli, and Gary J Roloff. 2022. “Spatial Variation in Bioclimatic Relationships for a Snow-Adapted Species Along a Discontinuous Southern Range Boundary.” Journal of Biogeography 49 (1): 66–78.

Thorson, James T, Cheryl L Barnes, Sarah T Friedman, Janelle L Morano, and Margaret C Siple. 2023. “Spatially Varying Coefficients Can Improve Parsimony and Descriptive Power for Species Distribution Models.” Ecography 2023 (5): e06510.

Jeffrey W. Doser

2024 (last update: March 14, 2024)