Get movement tracking datasets ready for modeling, fit the model using cmdstanr or rstan
Source:vignettes/prepare-data-for-stan.Rmd
prepare-data-for-stan.Rmd
library(MortalMove)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
Required data structure
You probably have two data frames: one with the GPS fixes with timestamps and one with the individual information (e.g., age, sex, fate, etc.). The first one should have the following columns:
-
animal_id
: unique identifier for each individual -
timestamp
: date-time of the GPS fix (converted tolubridate::ymd_hms()
format, or numeric format in seconds) -
x
: x-coordinate (in meters) -
y
: y-coordinate (in meters) - Other habitat covariates you want to include. Please note that if
you want to use categorical variables (e.g., 4 season), you should input
the variables as independent columns (e.g.,
season_spring
,season_summer
,season_autumn
, with winter as reference level), where each column is a binary variable indicating the presence of that season. All values should be numeric.
Here, I included prey availability (prey_avail
) and
hunter density (hunter
) as habitat covariates.
head(df)
#> x y animal_id timestamp prey_avail hunter
#> 1 992.94402 583.03060 1 1 448.6998 1.930454e-03
#> 2 253.09899 165.02589 1 2 585.7532 2.273980e-18
#> 3 49.53844 96.30111 1 3 471.9762 3.723363e-25
#> 4 686.32495 428.61309 1 4 401.6691 4.393693e-02
#> 5 786.92735 355.76157 1 5 524.8925 3.726653e-06
#> 6 353.60608 844.93354 1 6 477.7169 1.637377e-07
The second one should have the following columns: -
animal_id
: unique identifier for each individual -
fate
: fate of the individual (e.g., 0 for alive, 1 for
dead) - Other habitat covariates you want to include. Please note that
if you want to use categorical variables (e.g., sex), you should input
the variables as independent columns (e.g., `sex_male’, with 1 for male,
0 for female). All values should be numeric.
Here, I included age and sex as individual covariates.
head(indv)
#> age sex_m animal_id fate
#> 1 16 0 1 0
#> 2 20 0 2 0
#> 3 6 0 3 0
#> 4 11 0 4 0
#> 5 8 1 5 0
#> 6 7 0 6 0
Prepare data for modeling
First, we need to integrate these two tables into one data frame,
which will be used for modeling. The dplyr::left_join()
function does this for you. Further, we need to change column name
fate
to delta
.
df <- left_join(df, indv, by = "animal_id")
df <- df %>%
dplyr::rename(delta = fate)
head(df)
#> x y animal_id timestamp prey_avail hunter age sex_m
#> 1 992.94402 583.03060 1 1 448.6998 1.930454e-03 16 0
#> 2 253.09899 165.02589 1 2 585.7532 2.273980e-18 16 0
#> 3 49.53844 96.30111 1 3 471.9762 3.723363e-25 16 0
#> 4 686.32495 428.61309 1 4 401.6691 4.393693e-02 16 0
#> 5 786.92735 355.76157 1 5 524.8925 3.726653e-06 16 0
#> 6 353.60608 844.93354 1 6 477.7169 1.637377e-07 16 0
#> delta
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> 6 0
Next, we prepare the data for modeling. The
prepare_data_for_stan()
function does this for you. You can
specify which habitat covariates and individual covariates you want to
include in the model, as well as whether you want to include a cell
effect (i.e., a random effect for each grid cell). You can also specify
the grid bounds and resolution if you want to include a cell effect. The
grid bounds should be specified as a list with xmin
,
xmax
, ymin
, and ymax
values, and
the grid resolution should be specified in meters. Please note that the
grid bounds should cover the entire area where your data is collected,
and the difference between xmax
and xmin
,
ymax
and ymin
should be divisible by the grid
resolution.
df_stan_data <- prepare_stan_data(
df = df,
include_hab_cov = TRUE, include_ind_cov = TRUE,
hab_cov_names = c("hunter", "prey_avail"),
ind_cov_names = c("age", "sex_m"),
include_cell_effect = TRUE,
grid_bounds = list(xmin = 0, xmax = 1000, ymin = 0, ymax = 1000),
grid_res = 100
)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
names(df_stan_data)
#> [1] "n" "max_locs" "n_locs" "time_step"
#> [5] "delta" "n_knots" "knots_ce" "sigma"
#> [9] "rho" "cell_mat" "ind_cell_effect" "beta_prior"
#> [13] "llambda_prior" "alpha_prior" "num_hab_covs" "hab_cov"
#> [17] "num_indv_covs" "z"
Fit model using cmdstanr
It is generally faster to fit the model using cmdstanr
than rstan
, but requires a bit compilation before running.
You can install cmdstanr
from GitHub, and then set the
path to the CmdStan installation directory using
set_cmdstan_path()
. You can specify the number of chains,
iterations, and other parameters as needed.
library(cmdstanr)
set_cmdstan_path("YOUR_CMD_PATH")
model_file <- system.file("stan/mortality_model.stan", package = "MortalMove")
mod <- cmdstan_model(model_file, exe_file = system.file("stan/mortality_model.exe", package = "MortalMove"))
fit <- mod$sample(
data = df_stan_data,
chains = 2,
parallel_chains = 2,
iter_warmup = 50,
iter_sampling = 150,
seed = 123
)
Fit model using rstan
If you prefer to use rstan
, you can do so as well. It’s
much simpler to install, but
may take longer to fit than cmdstanr
, especially for
complicated models with spatial effects.
library(rstan)
model_file <- system.file("stan/mortality_model.stan", package = "MortalMove")
fit <- stan(
file = model_file,
data = df_stan_data,
chains = 2,
iter = 200,
warmup = 50,
seed = 123
)