3. Creating configurations (gen3sis input)

21.11.2023

The configuration object defines the core processes of the system modeled. By defining some run settings the configuration object is essential to define the model to be simulated and facilitates formalization of eco-evolutionary processes.

This vignette will guide you trough the config input object. It can be either an R object or a file, that can be easily shared and modified.

Here we will:

1. Create an empty config using the write_config_skeleton function.

2. Explain an example config using the config from introduction vignette.

3. Modify a config from within R.

library(gen3sis)

1. Create an empty config

In order to create an empty config, we use the function write_config_skeleton and define the output location. For that we have to define where the empty config will be stored.

# get data path
datapath <- system.file(file.path("extdata", "EmptyConfig"), package="gen3sis")
# set config_empty.R file path
config_file_path <- file.path(tempdir(), "config_empty.R")

And then create an empty config.

#writes out a config skeleton
write_config_skeleton(config_file_path)

This is how the config_empty.R looks like. We present them by main areas and each part is described briefly. For more information and a hand’s on example, see next session.

Content of config_empty.R

Metadata

# Version: 1.0
# Author:
# Date:
# Landscape:
# Publications:
# Description:

Settings

# set the random seed for the simulation.
random_seed = NA
# set the starting time step or leave NA to use the earliest/highest time-step.
start_time = NA
# set the end time step or leave as NA to use the latest/lowest time-step (0).
end_time = NA
# maximum total number of species in the simulation before it is aborted.
max_number_of_species = 25000
# maximum number of species within one cell before the simulation is aborted.
max_number_of_coexisting_species = 2500
# a list of traits to include with each species.
# a "dispersal" trait is implicitly added in any case.
trait_names = c("dispersal")
# ranges to scale the input environments with:
# not listed variable:         no scaling takes place
# listed, set to NA:           env. variable will be scaled from [min, max] to [0, 1]
# listed with a given range r: env. variable will be scaled from [r1, r2] to [0, 1]
environmental_ranges = list( )

Observer

end_of_timestep_observer = function(data, vars, config){
 # the list of all species can be found in data$all_species.
 # the current landscape can be found in data$landscape.
}

Initialization

# the initial abundance of a newly colonized cell, both during setup and later when 
# colonizing a cell during the dispersal.
initial_abundance = 1
# place species in the landscape:
create_ancestor_species <- function(landscape, config) {
stop("create the initial species here")
}

Core Processes

### Dispersal
# the maximum range to consider when calculating the distances from local distance inputs.
max_dispersal <- Inf
# returns n dispersal values.
get_dispersal_values <- function(n, species, landscape, config) {
 stop("calculate dispersal values here")
}
### Speciation
# threshold for genetic distance after which a speciation event takes place.
divergence_threshold = NULL
# factor by which the divergence is increased between geographically isolated population.
# can also be a matrix between the different population clusters.
get_divergence_factor <- function(species, cluster_indices, landscape, config) {
 stop("calculate divergence factor here")
}
### Evolution
# mutate the traits of a species and return the new traits matrix.
apply_evolution <- function(species, cluster_indices, landscape, config) {
 stop("mutate species traits here")
}
### Ecology
# called for every cell with all occurring species, this function calculates abundances and/or who survives for each sites
# returns a vector of abundances.
# set the abundance to 0 for every species supposed to die.
apply_ecology <- function(abundance, traits, environment, config) {
 stop("calculate species abundances and deaths here")
}

Note that the main core functions have stop warnings by default, this forces the configuration to be edited according to the rules and assumptions (model) that is to be created. You can do this from scratch or by modifying an existing config.

2. Explain example config

Here we will go bit by bit of the config used at the introduction vignette vignette. We will follow the structure presented above.

Content of config_.empty_southamerica.R

Metadata

The first section of the config file contains metadata. It informs the user when and by whom the file has been created, which landscape it was made for and informs about associated publications and implemented processes or exploration intentions of the specific configuration.

# Version: 1.0
# Author: Oskar Hagen
# Date: 26.10.2020
# Landscape: SouthAmerica
# Publications: R-package gen3sis
# Description: Example config used at the introduction vignette and similar to case study global configs in Hagen et al. 2020.
# O. Hagen, B. Flück, F. Fopp, J.S. Cabral, F. Hartig, M. Pontarp, T.F. Rangel, L. Pellissier. gen3sis: the general engine for eco-evolutionary simulations on the origins of biodiversity.

Settings

We start our simulation 40 Ma, which corresponds to time step 40 since 1 time-step = 1 myr. We end it a the latest available time-step at the landscape, thus start_time is set to 40 and end_time is set to NA. We limit the maximum total number of species alive in a simulation to 50000 with max_number_of_species and the maximum number of species within one site to 10000 with max_number_of_coexisting_species. Like so, simulations that could possible go off hand and generate too many species are stopped and properly flagged at the sgen3sis summary object. We define which traits we will consider in our simulation with traits_names, in our example, species only have a dispersal ability and a optimum temperature traits. For our model to work, we normalize the environmental values, according to the range informed at environmental_ranges.

random_seed = 6
start_time = 40
end_time = NA
max_number_of_species = 50000
max_number_of_coexisting_species = 10000
trait_names = c("temp",  "dispersal")
environmental_ranges = list("temp" = c(-45, 55), "area"=c(2361.5, 12923.4), 
   "arid"=c(1,0.5))

Observer

With the observer function, changes over time in any abiotic or biotic information of the virtual world can be recorded by defining the outputs that are saved at specified time steps, and results can be saved and plotted in real-time as the model runs. In our observer function we save the species and plot the richness. Like so, plots are printed as the simulation progresses and also saved inside the plot folder inside output.

end_of_timestep_observer = function(data, vars, config){
 save_species()
 plot_richness(data$all_species, data$landscape)
}

Combined plots can be generated inside the observer function as for example:

end_of_timestep_observer = function(data, vars, config){
   par(mfrow=c(2,3))
   plot_raster_single(data$landscape$environment[,"temp"], data$landscape, "temp", NA)
   plot_raster_single(data$landscape$environment[,"arid"], data$landscape, "arid", NA)
   plot_raster_single(data$landscape$environment[,"area"], data$landscape, "area", NA)
   plot_richness(data$all_species, data$landscape)
   plot_species_presence(data$all_species[[1]], data$landscape)
   plot(0,type='n',axes=FALSE,ann=FALSE)
   mtext("STATUS",1)
}

Initialization

The initialization function creates the ancestor species at the start of the simulation. Users can define the number of ancestor from a single to multiple species, their distribution within the paleolandscape and their trait values. In our example, the initial abundance of the 10 ancestor species, as well as the abundance of species on newly colonized sites is 1. We restrict the region of colonization to South America by limiting the range to a spatial extent of c(-95, -24, -68, 13). We then create ten different species, each of which is randomly inhabiting one of the sites of the continent. Thereby all of the sites have an equal probability of being inhabited by a species in the beginning of the simulation. However since the sites vary in their size, the habitation probability per area is not the same for all regions. If you want to correct for that you can consider using an equal area transformation of your coordinates or tweeking sampling functions to account for area correction. We set the optimum temperature trait temp of each population of the initial species to the environmental temperature. This means, that each population is adapted to the local conditions, as we will see later again at the ecology function. Dispersal is set to one, since it is not allowed to evolve, we use it’s maximum value. The function create_ancestor_species is expected to return a list of new species, i.e. the ancestor(s) one(s).

initial_abundance = 1
create_ancestor_species <- function(landscape, config) {
 range <- c(-95, -24, -68, 13)
 co <- landscape$coordinates
 selection <- co[, "x"] >= range[1] &
   co[, "x"] <= range[2] &
   co[, "y"] >= range[3] &
   co[, "y"] <= range[4]
 new_species <- list()
 for(i in 1:10){
   initial_cells <- rownames(co)[selection]
   initial_cells <- sample(initial_cells, 1)
   new_species[[i]] <- create_species(initial_cells, config)
   #set local adaptation to max optimal temp equals local temp
   new_species[[i]]$traits[ , "temp"] <- landscape$environment[initial_cells,"temp"]
   new_species[[i]]$traits[ , "dispersal"] <- 1 
   plot_species_presence(landscape, species=new_species[[i]])
 }
 return(new_species)
}

Core Processes

Dispersal

The dispersal function iterates over all species populations and determines the connectivity between sites and the colonization of new sites in the grid cell. In our example, species dispersal between time-steps is stochastic and similar to all species. It follows a Weibull distribution with shape 1.5 and scale 133. For different dispersal between species, the dispersal trait should be defined. Note that dispersal can be made deterministic by making the function return a fix value.

get_dispersal_values <- function(n, species, landscape, config) {
 values <- rweibull(n, shape = 1.5, scale = 133)
 return(values) 
}

The histogram of 100 draws from the dispersal function used above.

n <- 100
hist(rweibull(n, shape = 1.5, scale = 133), col="black")

Speciation

The speciation iterates over every species separately, registers populations’ geographic occupancy (species range), and determines when geographic isolation between population clusters is higher than a user-defined threshold, triggering a lineage splitting event of cladogenesis. The clustering of occupied sites is based on the species’ dispersal capacity and the landscape connection costs. Over time, disconnected clusters gradually accumulate incompatibility, analogous to genetic differentiation. When the divergence between clusters is above the speciation threshold, those clusters become two or more distinct species, and a divergence matrix reset follows. On the other hand, if geographic clusters come into secondary contact before the speciation occurs, they coalesce and incompatibilities are gradually reduced to zero. In our example, speciation takes place after 2 time-steps of isolation and the divergence increase is the same for all species as indicated by get_divergence_threshold. Since our landscape consists of 1 myr time-steps, these 2 time-steps correspond to a span of 2 myr.

divergence_threshold = 2 
get_divergence_factor <- function(species, cluster_indices, landscape, config) {
 return(1) 
 }

Evolution

In the evolution function, clustered populations (exchanging genes) had their trait homogenized and weighted by abundance, meaning that a trait of a population that is doing well in a site, as dictated by the ecology function, will contribute more to the average trait of a cluster. Later, populations mutate based on a normal distribution with standard deviation 0.001, possibly increasing or decreasing species optimum temperature.

# mutate the traits of a species and return the new traits matrix
apply_evolution <- function(species, cluster_indices, landscape, config) {
 trait_evolutionary_power <- 0.001
 traits <- species[["traits"]]
 cells <- rownames(traits)
 #homogenize trait based on abundance
 for(cluster_index in unique(cluster_indices)){
   cells_cluster <- cells[which(cluster_indices == cluster_index)]
   mean_abd <- mean(species$abundance[cells_cluster])
   weight_abd <- species$abundance[cells_cluster]/mean_abd
   traits[cells_cluster, "temp"] <- mean(traits[cells_cluster, "temp"]*weight_abd)
 }
 #mutations
 mutation_deltas <-rnorm(length(traits[, "temp"]), mean=0, sd=trait_evolutionary_power)
 traits[, "temp"] <- traits[, "temp"] + mutation_deltas
 return(traits)
}

Ecology

The ecology function determines the abundance or presence of the populations in occupied sites of each species. The function iterates over all occupied sites and updates the species population abundances or presences on the basis of local environmental values, updated co-occurrence patterns and species traits. The function takes as input the species abundance, species trait, species divergence and clusters, and the landscape values. In our example, we calculate abundances in a site based on how close the species is to the site temperature. We scale the values to avoid small numbers and apply a carrying capacity based on aridity and temperature corrected by the area of a site. If abundances are below 1, species are considered extinct, if total abundance in a site is above the carrying capacity, small abundances are removed progressively and randomly distributed across the present species until total abundance is smaller or equal to the carrying capacity.

apply_ecology <- function(abundance, traits, landscape, config) {
 abundance_scale = 10
 abundance_threshold = 1
 #abundance treashold
 survive <- abundance>=abundance_threshold
 abundance[!survive] <- 0
 abundance <- (( 1-abs( traits[, "temp"] - landscape[, "temp"]))*abundance_scale)*as.numeric(survive)
 #abundance thhreashold
 abundance[abundance<abundance_threshold] <- 0
 k <- ((landscape[,"area"]*(landscape[,"arid"]+0.1)*(landscape[,"temp"]+0.1))*abundance_scale^2)
 total_ab <- sum(abundance)
 subtract <- total_ab-k
 if (subtract > 0) {
   # print(paste("should:", k, "is:", total_ab, "DIFF:", round(subtract,0) ))
   while (total_ab>k){
     alive <- abundance>0
     loose <- sample(1:length(abundance[alive]),1)
     abundance[alive][loose] <- abundance[alive][loose]-1
     total_ab <- sum(abundance)
   }
   #set negative abundances to zero
   abundance[!alive] <- 0
 }
 return(abundance)
}

3. Modify a config

A configuration object can be modified by editing a file or the config object in R. Here we will load the config_southamerica.R as an R object and modify the initialization of species. Instead of having multiple species that occupy one site each, we will now initialize one species that occurs in all habitable sites and starts with abundance 10.

# get data path
datapath <- system.file(file.path("extdata", "SouthAmerica"), package="gen3sis")
# creates config object from config file
config_object <- create_input_config(file.path(datapath, "config/config_southamerica.R"))

# modify the initialization function
config_object$gen3sis$initialization$create_ancestor_species <- function(landscape, config) {
  range <- c(-95, -24, -68, 13)
  co <- landscape$coordinates
  selection <- co[, "x"] >= range[1] &
    co[, "x"] <= range[2] &
    co[, "y"] >= range[3] &
    co[, "y"] <= range[4]

  initial_cells <- rownames(co)[selection]
  new_species <- create_species(initial_cells, config)
  #set local adaptation to max optimal temp equals local temp
  new_species$traits[ , "temp"] <- landscape$environment[initial_cells,"temp"]
  new_species$traits[ , "dispersal"] <- 1
  new_species$abundance <- new_species$abundance*10
  return(list(new_species))
}

Additionally, we will now implement a simpler ecology function in our config_object. Abundances in a site are defined by how close the species population optimum temperature is to the site temperature. The abundance_threshold parameter defines the niche width cutoff. The higher the value, the narrower is the temperature range of the niche.

# modify the ecology function
config_object$gen3sis$ecology$apply_ecology <- function(abundance, traits, landscape, config, abundance_scale = 10, abundance_threshold = 8) {
  #abundance threshold
  abundance <- as.numeric(!abundance<abundance_threshold)
  abundance <- (( 1-abs( traits[, "temp"] - landscape[, "temp"]))*abundance_scale)*abundance
  #abundance threshold
  abundance[abundance<abundance_threshold] <- 0
  return(abundance)
}

Like so, functions and parameters can be conveniently changed and simulated in gen3sis. Remember that run_simulation works either with a path to a config file or config object, so we can run the old and the new modified config.

To compare the initial modified config, we will load the old config and set the old and the new config to run only for 3 time-steps and plot species ranges plot_ranges

config_object_old <- create_input_config(file.path(datapath, "config/config_southamerica.R"))

# define observer
config_object_old$gen3sis$general$end_of_timestep_observer <- function(data, vars, config){
  plot_ranges(data$all_species, data$landscape, disturb=0, max_sps=10)
}

config_object$gen3sis$general$end_of_timestep_observer <- config_object_old$gen3sis$general$end_of_timestep_observer 

# define only 3 time-steps
config_object_old$gen3sis$general$end_time <- 38
config_object$gen3sis$general$end_time <- 38

Before using the modified configs, test if it is valid. If the function verify_config returns TRUE, you are good to go.

verify_config(config_object_old)
#> [1] TRUE
verify_config(config_object)
#> [1] TRUE

Run the modified old config

sim_old <- run_simulation(config = config_object_old,
               landscape = file.path(datapath, "landscape"),
               output_directory=tempdir(),
               call_observer = 4,
               verbose=0) # no progress printed

And run the modified new config

sim_new <- run_simulation(config = config_object,
                          landscape = file.path(datapath, "landscape"),
                          verbose=0, # no progress printed
                          output_directory=tempdir())