In this vignette you will be introduced to gen3sis, an engine for simulating various spatial eco-evolutionary models. Here will create a virtual planet, biodiversify it and examine virtual species.
In particular, we will go through the following steps:
1. Set-up gen3sis package and check if we are set with all the necessary input data.
2. Run one simple example simulation
3. Visualize the outputs
4. Analyze the outputs
The first step for running a simulation consists of setting up the folder structure that should contain input data (i.e. config and landscape).
First of all, load the gen3sis package.
From now on, if you run into problems, refer to gen3sis the for the full help documentation.
Let’s check the version of the gen3sis package we are using
All gen3sis simulations need a landscape object which define the environment over which the simulation will take place and a configuration object which defines the mechanisms of speciation, dispersal, trait evolution and ecological interactions. Here we will use a published example1. We will consider the continent of South America with a time span from 65Ma to 0Ma (1 time-step = 1 myr) and a spatial resolution of 1°.
All the data we will use here (i.e. the landscape and the configuration object) is available in the simulation repository, so you need to download the data (i.e. landscape, configs and outputs) and set the correct datapath. For more information about these input data, please check the associated metadata.This is stored in the METADATA.txt file inside the landscape folder and, for the configuration, as a comment in the beginning of the config R script.
To access this dataset, define the path to the data contained inside the package.
The full landscape can be downloaded and stored locally for running the experiment. An experiment folder should look like this:
>EXPERIMENT_FOLDER |
---|
>landscape |
>distances_local |
>landscapes.rds |
>METADATA.txt |
>config |
>config.R |
The landscape is the virtual environment in which the processes of speciation, dispersal, evolution and ecology take place (e.g. temperature and aridity data that might be spatially and temporally dynamic). Landscape objects used in gen3sis are generated based on temporal sequences of landscapes in the form of raster files, which are summarized in the form of two classes.
The first landscape class contains: (i) the geographic coordinates of the landscape site, (ii) the corresponding information on which sites are generally suitable for the organisms modeled (e.g. land or ocean), and (iii) the environmental conditions relevant for the species’ ecology (e.g. temperature and aridity). The landscape may be simplified into a single geographic axis for theoretical experiments, or it may consider realistic configurations aimed at reproducing real local or global landscapes.
The second landscape class defines the permeability of the landscape for species movement between sites via dispersal. Connection cost between sites is computed for each time step from the gridded landscape data based on harvesine geographic distances modified by a user-defined cost function and stored as sparse distance matrices. Distance matrices, containing the connection costs, are provided at every time step as either: (i) a pre-computed full distance matrix, containing all habitable sites in the landscape (faster simulations but more storage required); or (ii) a local distance matrix, computed from neighboring site distances up to a user-defined range limit (slower simulations but less storage required).
NOTE the gen3sis landscape contains several files. The landscape.rds file contains the landscape changing over time with its environmental variables. The distances folder stores the information on how the landscape is connected (connections costs) which will interact with the disperal ability of the modeled organisms. To see how to create a landscape object, refer to design_landscape vignette and create_input_landscape vignette.
The configuration are the rules of our virtual planet, it is where we define all the ecological and evolutionary equations and parameters.
The configuration object includes information on the model initialization, the observer function and the biological functions, namely speciation, dispersal, evolution, and ecology. The possibility of the user to customize these functions confers the flexibility and generality of gen3sis. The initialization function creates the ancestor species at the start of the simulation. Users can define the number of species, their distribution within the landscape and their trait values. The observer function allows recording any abiotic or biotic information of the virtual world as it changes by defining the outputs that are saved at specified time-steps. The configuration further allows to define the speciation function, the dispersal function, the evolution function and the ecology function, which are defined in the section on core processes below. Altogether, those six functions are specified in the configuration object and interact with a set of core objects defined in the simulation engine. The configuration object further lists the ecological traits considered in the simulation, sets a random seed allowing simulation reproducibility, start and end times of the simulation, as well as simulation aborting rules including the limits for a total or local number of species.
In our example, we start our simulation with the 40\(^{th}\) time-step and end it a the latest available time-step of the landscape, i.e. 40 to 0Ma. We set ten different ancestor species, each colonizing one random cell in the landscape, with initial optimum temperature equal to the local temperature of the site. The dispersal of all species is the same and stochastic, following a Weibull distribution with shape 1.5 and scale 133. Speciation takes place after every second time-step (2 Myr) of isolation. Clustered populations (exchanging genes) had their trait homogenized and weighted by abundance, meaning that a temperature optimum of populations that are doing well in a site will contribute more to the average trait of a cluster. Temperature optimum evolves based on a normal distribution with standard deviation 0.001, possibly increasing or decreasing species optimum temperature. Abundances in a site are based on how close the species population optimum temperature is to the site temperature. We scale the values to avoid small numbers and apply a carrying capacity based on aridity and temperature corrected by the area. If abundances are below 1, species are considered extinct, if total abundance in a site (sum of all species in a site) is above the carrying capacity, small abundances are removed progressively and randomly distributed across the present species until total abundance is smaller or equal to the carrying capacity. For more details regarding the configuration used here and how to modify it, please refer to create_config vignette.
Now that we have set up the general folder structure, the input data and configuration file, we can run a simulation using the run_simulation function. This function will:
Read in the config and prepare the output directories, initial landscape and call create_ancestor_species from the user config to create the initial species for the simulation.
Start the main loop over the time-steps. For every time-step it loads the appropriate landscape, removes all sites that became uninhabitable in the new time-step, and calls the main steps of any iteration.
At the end of every time-step, if desired, the simulation saves the species, landscapes, species richness patterns, etc… by calling the end_of_timestep_observer from the user config. In this function the user can implement customized observer functions, for example calculating summary statistics or creating species patterns plots.
When the simulation reaches the end, a phylogeny, runtime_information, the world starting conditions and the used config are saved at the output folder. If the output folder is not specified, it is deduced from the location of the landscape object. The function will return a summary object. In this vignette we will save it as sim.
Additionally to the main functions, the package provides several convenience functions to generate input data, configuration files, and plots. Moreover, all functions are accessible to the observer functions, which requests the variables for calculation during the model runs. This allows to transform the output of the simulation model into a format that can be readily analyzed.
To launch a simulation you need to call the run_simulation function. We only store one intermediate step between the starting and end time-steps by setting “call_observer=1”, this means one time-step equally distributed between the starting and ending time-steps. By doing so, we will have a look at our virtual world at 40, 20 and 0 Ma.
# we set verbose to 0 to avoid a large console outputs of how the simulation is
# developing
sim <- run_simulation(config = file.path(datapath, "config/config_southamerica.R"),
landscape = file.path(datapath, "landscape"), output_directory = tempdir(), call_observer = 1,
verbose = 0)
After the simulation is finished, a plot is created inside the SouthAmerica folder to show the state of the initial world. They are automatically stored in a sub folder called output inside our SouthAmerica folder and inside another folder named after our configuration file (i.e. config). It should look like this:
>EXPERIMENT_FOLDER |
---|
>landscape |
>config |
>output |
After having run the simulation, we can load and plot the species richness at different time-steps. Here, we will load them at the time-steps 40, 20 and 0 corresponding to 40, 20 and 0 Ma (1 time-step = 1 myr).
timesteps <- c(40, 20, 0)
oldpar <- par(no.readonly = TRUE)
par(mfrow = c(1, 3))
for (i in timesteps) {
landscape_i <- readRDS(file.path(datapath, paste0("output/config_southamerica/landscapes/landscape_t_",
i, ".rds")))
species_i <- readRDS(file.path(datapath, paste0("output/config_southamerica/species/species_t_",
i, ".rds")))
plot_richness(species_i, landscape_i)
}
par(oldpar)
gen3sis also returns a summary object at the end of the simulation storing default summary statistics and important data over time. One of the default observer summaries is the species richness over time. NOTE You can also create your own observer function.
We can quickly visualize some main results using the plot_summary function. Here we stored the summary object (sgen3sis class) as sim.
There are a few visualization tools already included in the package, but you are free to explore and check the outputs with your favorite plotting functions and colors.
Now that we have run the simulation, we are ready to perform some analysis to investigate the model behavior. Because we set a limit to the total abundances lower in colder and in more arid environments, we expect that the number of species that can co-exist also decreases in those environments. We will perform a few statistical analyses to investigate those patterns based on the species richness pattern that emerged at the end of the simulation. To do so, we first load the landscape and combine it with the simulated species richness.
landscapes <- readRDS(file.path(datapath, "landscape", "landscapes.rds")) #get the input landscape
landscape_t0 <- as.data.frame(cbind(landscapes$temp[, 1:2], temp = landscapes$temp[,
3], arid = landscapes$arid[, 3], area = landscapes$area[, 3])) #get landscape at last time-step
landscape_t0 <- cbind(landscape_t0, rich = sim$summary$`richness-final`[, 3]) #add richness to the dataframe
landscape_t0 <- na.omit(landscape_t0)
Much is possible here, from looking at phylogenies to spatial correlations. For this introduction we will keep things simple and investigate the relationship between richness and environmental variables at the final time-step of the simulation. For this, we will fit two univariate models, where each explains the relationship between species richness and one of the environmental variables.
For this, we first fit a generalized linear model between richness and temperature.
glm.uni <- glm(rich ~ poly(temp, 2), data = landscape_t0, family = poisson)
cor(landscape_t0$temp, landscape_t0$rich)
#> [1] 0.588251
Second, we plot the response curve.
# prepare data with temperature and predicted richness from our model
data_plot <- data.frame(cbind(landscape_t0$temp, predict(glm.uni, type = "response")))
# sort data for plotting and ommit NA's
data_plot <- na.omit(data_plot[order(data_plot[, 1], decreasing = FALSE), ])
# get the number of observations
n <- paste0("observations (n = ", length(landscape_t0$rich), ")")
# plot model curve
plot(data_plot[, 1], data_plot[, 2], xlab = "Temperature [°C]", ylab = expression(paste(alpha,
" richness")), frame.plot = F, type = "l", col = "red", lwd = 2, xlim = c(min(landscape_t0$temp),
max(landscape_t0$temp)), ylim = c(min(landscape_t0$rich), max(landscape_t0$rich)))
# add observed points
points(landscape_t0$temp, landscape_t0$rich, col = rgb(0.5, 0.5, 0.5, alpha = 0.4),
pch = 16)
# add legend
legend(-20, 30, col = c(rgb(0.5, 0.5, 0.5, 0.4), "red"), legend = c(n, "model fit"),
pch = c(16, NA), lty = c(NA, 1), lwd = c(NA, 2), bty = "n")
By looking at the model fit of the generalized linear model we can see that the relationship between the two variables is rather hump-shaped, indicating that species richness is highest at intermediate temperature levels. First the higher richness found above 10° is consistent with the ecological rule that we imposed. At higher temperature, we have a higher energy and capacity facilitating coexist. This is consistent with our model expectations. Moreover, we can observe a large spread of the points at high temperature. This is likely because high temperature regions frequently have high aridity. Thus we will next investigate the association of aridity and species richness by fitting a generalized linear model between richness and aridity.
The number of species is negatively related to aridity. The explained deviance of this relationship is D2=0.239. This relationship emerge because we set a lower carrying capacity in sites with higher level of aridity. Hence, the models behaves as expected from the configuration function.
Finally, we fit a multivariate model for explaining species richness by taking into account the two predictors together.
richness_model <- glm(rich ~ poly(temp, 2) + poly(arid, 2), family = "quasipoisson",
data = landscape_t0)
summary(richness_model)$coefficients
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2.8953244 0.005433575 532.858090 0.000000e+00
#> poly(temp, 2)1 7.0303172 0.361808498 19.431045 1.063461e-75
#> poly(temp, 2)2 -4.4884180 0.370139268 -12.126295 1.916558e-32
#> poly(arid, 2)1 -4.7978907 0.281971880 -17.015494 8.032991e-60
#> poly(arid, 2)2 0.4552498 0.232571058 1.957465 5.046587e-02
Above, we report the explained deviance of the glm model on richness considering temperature and aridity. The explained deviance of this relationship is D2=0.426. This explained deviance will not be total since there are other effects including dispersal limitations, that are not considered in this statistical summary. It is important to consider that we have now only been analyzing the species richness at one time-step, namely the last one (0 Ma). The emerged patterns are more likely not only to be explained by the environmental conditions of that time but also dependent on historical conditions and multiple processes interactions tackled by gen3sis.
For more details on how to create a landscape and prepare it to use in gen3sis see design_landscape vignette and create_input_landscape vignette. For more details on how to create or modify a configuration object see create_config vignette. It’s now up to you to explore the “virtual” world.
O. Hagen, B. Flück, F. Fopp, J.S. Cabral, F. Hartig, M. Pontarp, T.F. Rangel, L. Pellissier. (2020). GENƎSIS: the GENeral Engine for Eco-Evolutionary SImulationS on the origins of biodiversity. (in review)↩︎