bean reduces sampling bias in species occurrence data by
thinning it in environmental space rather than in
geographic space. The result is a cleaner training set for species
distribution models (SDM / ENM).
The protocol is:
prepare_bean().find_env_resolution(), which selects a kernel-density
bandwidth for each environmental variable.thin_env_nd()
(stochastic) or thin_env_center() (deterministic).fit_ellipsoid().predict() on
the fitted ellipsoid.data(origin_dat_prepared, package = "bean")
env_vars <- c("bio_1", "bio_4", "bio_12", "bio_15")
# 1. Pick an objective grid resolution from the data
res <- find_env_resolution(origin_dat_prepared, env_vars = env_vars)
res
#> --- Bean environmental grid resolution ---
#> Bandwidth selector: sheather-jones
#>
#> variable resolution
#> bio_1 0.056162684
#> bio_4 0.013438324
#> bio_12 0.004615848
#> bio_15 0.006501067
# 2. Thin in environmental space
thinned <- thin_env_nd(
data = origin_dat_prepared,
env_vars = env_vars,
grid_resolution = res$suggested_resolution,
seed = 1
)
thinned
#> --- Bean Stochastic Thinning Results ---
#>
#> Thinned 1024 original points to 78 points.
#> This represents a retention of 7.6% of the data.
#>
#> --------------------------------------The remaining vignettes walk through each step in detail.