Function for Fitting Linear Mixed Models with Previous Model Estimates

Function for fitting a linear (mixed) model as a second-stage model where the response variable itself comes from a previous model fit and has uncertainty associated with it. The response variable is assumed to be a set of estimates from a previous model fit, where each value in the response variable has a posterior MCMC sample of estimates. This function is useful for doing "posthoc" analyses of model estimates (e.g., exploring how species traits relate to species-specific parameter estimates from a multi-species occupancy model). Such analyses are sometimes referred to as "two-stage" analyses.

Usage

postHocLM(formula, data, inits, priors, verbose = FALSE, 
          n.report = 100, n.samples, n.chains = 1, ...)

Arguments

formula: a symbolic description of the model to be fit for the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015).
data: a list containing data necessary for model fitting. Valid tags are y and covs. y is a matrix or data frame with first dimension equal to the number of posterior samples of each value in the response variable and the second dimension is equal to the number of values in the response variable. For example, if the response is species-specific covariate effect estimates from a multi-species occupancy model, the rows correspond to the posterior MCMC samples and the columns correspond to species. covs is a matrix or data frame containing the independent variables used in the model. Note the number of rows of covs should be equal to the number of columns in y.
inits: a list with each tag corresponding to a parameter name. Valid tags are beta, tau.sq, and sigma.sq. The value portion of each tag is the parameter's initial value. sigma.sq is only relevant when including random effects in the model. See priors description for definition of each parameter name. Additionally, the tag fix can be set to TRUE to fix the starting values across all chains. If fix is not specified (the default), starting values are varied randomly across chains.
priors: a list with each tag corresponding to a parameter name. Valid tags are beta.normal, tau.sq.ig, and sigma.sq.ig. Regression coefficients (beta) are assumed to follow a normal distribution. The hyperparameters of the normal distribution are passed as a list of length two with the first and second elements corresponding to the mean and variance of the normal distribution, which are each specified as vectors of length equal to the number of coefficients to be estimated or of length one if priors are the same for all coefficients. If not specified, prior means are set to 0 and prior variances set to 100. tau.sq is the residual variance, and is assumed to follow an inverse-Gamma distribution. The hyperparameters of the inverse-Gamma distribution are passed as a vector of length two with first and second elements corresponding to the shape and scale parameters, respectively. sigma.sq are the variances of any random intercepts included in the model, which similarly to tau.sq follow an inverse-Gamma distribution. The hyperparameters of the inverse-Gamma distribution are passed as a list of length two with first and second elements corresponding to the shape and scale parameters, respectively, which are each specified as vectors of length equal to the number of random intercepts or of length one if priors are the same for all random effect variances.
verbose: if TRUE, messages about data preparation, model specification, and progress of the sampler are printed to the screen. Otherwise, no messages are printed.
n.report: the interval to report MCMC progress.
n.samples: the number of posterior samples to collect in each chain. Note that by default, the same number of MCMC samples fit in the first stage model is assumed to be fit for the second stage model. If n.samples is specified, it must be a multiple of the number of samples fit in the first stage, otherwise an error will be reported.
n.chains: the number of chains to run in sequence.
...: currently no additional arguments

References

Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01 .

Author

Jeffrey W. Doser doserjef@msu.edu,

Value

An object of class postHocLM that is a list comprised of:

beta.samples: a coda object of posterior samples for the regression coefficients.
tau.sq.samples: a coda object of posterior samples for the residual variances.
y.hat.samples: a coda object of posterior samples of fitted values.
sigma.sq.samples: a coda object of posterior samples for the random effect variances if any random intercepts were included in the model.
beta.star.samples: a coda object of posterior samples for the random effects. Only included if random intercepts are specified in formula.
rhat: a list of Gelman-Rubin diagnostic values for some of the model parameters.
ESS: a list of effective sample sizes for some of the model parameters.
run.time: execution time reported using proc.time().
bayes.R2: a coda object of posterior samples of the Bayesian R-squared as a measure of model fit. Note that when random intercepts are included in the model, this is the conditional Bayesian R-squared, not the marginal Bayesian R-squared.

The return object will include additional objects used for subsequent summarization.

Examples

# Simulate Data -----------------------------------------------------------
set.seed(100)
N <- 100
beta <- c(0, 0.5, 1.2)
tau.sq <- 1 
p <- length(beta)
X <- matrix(1, nrow = N, ncol = p)
if (p > 1) {
  for (i in 2:p) {
    X[, i] <- rnorm(N)
  } # i
}
mu <- X[, 1] * beta[1] + X[, 2] * beta[2] + X[, 3] * beta[3]
y <- rnorm(N, mu, sqrt(tau.sq))
# Replicate y n.samples times and add a small amount of noise that corresponds
# to uncertainty from a first stage model.
n.samples <- 1000
y <- matrix(y, n.samples, N, byrow = TRUE)
y <- y + rnorm(length(y), 0, 0.25)

# Package data for use with postHocLM -------------------------------------
colnames(X) <- c('int', 'cov.1', 'cov.2')
data.list <- list(y = y, covs = X)
data <- data.list
inits <- list(beta = 0, tau.sq = 1)
priors <- list(beta.normal = list(mean = 0, var = 10000),
               tau.sq.ig = c(0.001, 0.001))

# Run the model -----------------------------------------------------------
out <- postHocLM(formula = ~ cov.1 + cov.2, 
                 inits = inits, 
                 data = data.list, 
                 priors = priors, 
                 verbose = FALSE, 
                 n.chains = 1)
summary(out)
#> 
#> Call:
#> postHocLM(formula = ~cov.1 + cov.2, data = data.list, inits = inits, 
#>     priors = priors, verbose = FALSE, n.chains = 1)
#> 
#> Samples per Chain: 1000
#> Number of Chains: 1
#> Total Posterior Samples: 1000
#> Run Time (min): 4e-04
#> 
#> ----------------------------------------
#> Fixed Effects: 
#> ----------------------------------------
#>                     Mean     SD    2.5%    50%  97.5% Rhat  ESS
#> (Intercept)       0.0142 0.1079 -0.2008 0.0150 0.2333   NA  976
#> cov.1             0.5709 0.1079  0.3610 0.5712 0.7820   NA 1000
#> cov.2             0.8703 0.1400  0.6010 0.8673 1.1417   NA 1000
#> Residual Variance 1.1126 0.1710  0.8248 1.1007 1.4673   NA  516
#> 
#> ----------------------------------------
#> Bayesian R2: 
#> ----------------------------------------
#>    Mean    SD   2.5%    50%  97.5%
#>  0.3949 0.068 0.2649 0.3986 0.5236