Summary
This document describes in detail the methods used to generate a virtual environmental driver with a given temporal autocorrelation, to be used as an input for a population model simulating synthetic pollen curves generated by virtual taxa with different life-traits (life-span and fecundity) and environmental niche features (niche position and breadth). We also describe how we generated a virtual sediment accumulation rate to aggregate the results of the population model to mimic taphonomic processes producing real pollen curves, and how we resampled virtual pollen data at different depth intervals. Finally, we present the code used to generate the 16 virtual taxa used for the analyses described in the paper.
IMPORTANT: An Rmarkdown version of this document can be found at: https://github.com/BlasBenito/EcologicalMemory.
#Generating a virtual environmental driver#
###Rationale###
To simulate virtual pollen curves with the population model described in section 2 a virtual driver representing changes in environmental conditions is required. This section explains how to generate such a driver as a time series with a given temporal autocorrelation simulating the temporal structure generally shown by observed environmental drivers.
###Generating a random walk###
The following steps are required to generate a random walk in R:
Generate a vector time with time values.
Re-sample with replacement the set of numbers (-1, 0, 1) as many times as values are in the time vector, to produce the vector moves (as in moves of a random walk).
Compute the cumulative sum of moves, which generates the random walk.
#sets a state for the generator of pseudo-random numbers
set.seed(1)
#defines the variable "time"
<- 1:10000
time #samples (-1, 0, 1) with replacement
<- sample(x=c(-1, 0, 1), size=length(time), replace=TRUE)
moves #computes the cumulative sum of moves
<- cumsum(moves) random.walk
Every time this code is executed with a different number in set.seed(), it generates a different random walk with a different temporal autocorrelation, but still, this simple method is not useful to generate a time series with a given temporal autocorrelation.
###Applying a convolution filter to generate a given temporal autocorrelation###
Applying a convolution filter to a time series allows to generate dependence among sets of consecutive cases. Below we show an example of how it works on a random sequence a composed by five numbers in the range [0, 1]. The operations to compute the filtered sequence are shown in Table 1.
#setting a fixed seed for the generator of pseudo-random numbers
set.seed(1)
#generating 5 random numbers in the range [0, 1]
<- runif(5)
a
#applying a convolution filter of length 2 and value 1
<- filter(a, filter=c(1,1), method="convolution", circular=TRUE) b
row | a | b | operation |
---|---|---|---|
1 | 0.27 | 0.64 | b1 = a1 x f2 + a2 x f1 |
2 | 0.37 | 0.94 | b2 = a2 x f2 + a3 x f1 |
3 | 0.57 | 1.48 | b3 = a3 x f2 + a4 x f1 |
4 | 0.91 | 1.11 | b4 = a4 x f2 + a5 x f1 |
5 | 0.20 | 0.47 | b5 = a5 x f2 + a6 x f1 |
The operation column in Table 1 shows how the convolution filter generates a dependence between values located within the length of the filter. This positional dependence creates a temporal autocorrelation pattern with a significant length equal to the length of the filter. The following piece of code demonstrates this by generating two versions of the same moves vector used before, one with a length of the significant temporal autocorrelation equal to 10 (defined in the same units of the time vector), and another with a length of 100. Results are shown in Figure 3.
.10 <- filter(moves, filter=rep(1, 10), circular=TRUE)
moves.100 <- filter(moves, filter=rep(1, 100), circular=TRUE) moves
A major limitation is that this method does not work well when the required length of the temporal autocorrelation is a large fraction of the overall length of the time series. The code below and Figure 4 show an example with a filter of length 5000 (note that the time variable has 10000 elements).
.5000 <- filter(moves, filter=rep(1, 5000), circular=TRUE) moves
All the ideas presented above are implemented in the functions named simulateDriver() and simulateDriverS, with these main arguments:
Figure 5 shows a driver generated with simulateDriverS with annual resolution in the range [0, 100], over a period of 10000 years, with a temporal autocorrelation length of 600 years.
<- simulateDriverS(
driver random.seeds=c(60, 120),
time=1:10000,
autocorrelation.lengths = 600,
output.min=c(0, 0),
output.max=c(100, 100),
driver.names=c("A", "B"),
filename=NULL)
#Simulating pollen curves from virtual taxa#
###Rationale###
The ability of plant populations to respond more or less quickly to environmental change is mediated by a set of species’ traits, such as niche optimum and breadth, growth rate, sexual maturity age, effective fecundity, and life span. Even though we have values for these traits for many plant species, pollen types are often clusters of species of the same genus or family rather than single species, making the assignment of trait values uncertain. For the purpose of this paper it is more practical to work with virtual pollen curves generated by virtual taxa with known relationships with the environment and traits. These virtual taxa are the result of a population model with different parametrizations.
###Assumptions###
The model follows these assumptions:
###Parameters###
We have designed a simple individual-based population model which input parameters are:
In order to explain the model dynamics in the most simplified way, the parameters in Table 1 are considered.
Parameter | Species 1 | |
---|---|---|
1 | 1 | 50 |
2 | 2 | 20 |
3 | 3 | 2 |
4 | 4 | 0.2 |
5 | 5 | 0 |
6 | 6 | 100 |
7 | 7 | 10000 |
8 | 8 | 1 |
10 | 10 | 50 |
11 | 11 | 10 |
###Ecological niche and environmental suitability###
The model assumes that normal distributions can be used as simple mathematical representations of ecological niches. Considering a virtual species and an environmental predictor, the abundance of the species along the range of the predictor values can be defined by a normal distribution with a mean \(μ\) (niche optimum or niche position), and standard deviation \(σ\) (niche breadth or tolerance).
The equation to compute the density of a normal distribution has the form:
Equation 1: \[f(x) = 1/(√(2 π) σ) e^{-((x - μ)^2/(2 σ^2))}\]
Where:
The following code shows a simple example on how dnorm() uses Equation 1 to compute the density of a normal function over a data range from a mean and a standard deviation:
<- dnorm(x=0:100, mean=50, sd=10) niche.A
We use the code above and a computation on the density of the driver to plot the ecological niche of the virtual taxa against the availability of driver values (Figure 6).
Environmental suitability for the given species over the study period is computed as follows:
A burn-in period with a length of ten generations of the virtual taxa is added to the environmental suitability computed from the species niche and the driver/drivers. The added segment starts at maximum environmental suitability, stays there for five generations, and decreases linearly for another five generations until meeting the first suitability value of the actual simulation time. The whole burn-in segment has a small amount of white noise added (Figure 7). The burn-in period lets the population model to warm-up and go beyond starting conditions before simulation time starts.
## Warning: Use of `plot.df4$Time` is discouraged. Use `Time` instead.
###Biomass growth###
Individuals age one year on each simulation step, and their biomass at any given age is defined by the following equation of logistic growth (Equation 2). Figure 8 shows different growth curves for different growth rates for a virtual taxon with a maximum age of 400 years.
Equation 2: \[f(x) = \frac{B}{1 + B} \times e^{(- \alpha \times t)}\]
Where:
###Population dynamics###
The model starts with a population of 100 individuals with random ages, in the range [1, maximum age], taken from a uniform distribution (all ages are equiprobable). For each environmental suitability value, including the burn-in period, the model performs the following operations:
The model returns a table with climatic suitability, pollen production, and population size (reproductive individuals only) per simulation year. Figure 10 shows the results of the population model when applied to the example virtual species.
Equation 3: \[P_{m} = 1 - sqrt(a/A)\]
Where:
Equation 4: \[P_{t} = \sum x_{it} \times max(S_{t}, B)\]
Where:
The code below shows the core of the simulatePopulation function. It is slightly simplified to improve readability, and only pollen counts are written as output. Note that age of individuals is represented as a proportion of the maximum age to facilitate operations throughout the code.
#parameters (1st line in dataframe "parameters")
<- parameters[1, "maximum.age"]
maximum.age <- parameters[1, "reproductive.age"]
reproductive.age <- parameters[1, "growth.rate"]
growth.rate <- parameters[1, "carrying.capacity"]
carrying.capacity <- parameters[1, "fecundity"]
fecundity
#reproductive age to proportion
<- reproductive.age / maximum.age
reproductive.age
#years scaled taking maximum.age as reference
<- 1/maximum.age
scaled.year
#vector to store pollen counts
<- vector()
pollen.count
#starting population
<- sample(seq(0, 1, by=scaled.year),
population 100,
replace=TRUE)
#iterating through suitability (once per year)
#------------------------------------
for(suitability.i in suitability){
#AGING -----------------------------------------------
<- population + scaled.year
population
#SENESCENCE ------------------------------------------
#1 is the maximum age of ages expressed as proportions
<- population[population < 1]
population
#LOCAL EXTINCTION AND RECOLONIZATION -----------------
if (length(population) == 0){
#local extinction, replaces population with a seedbank
<- rep(0, floor(100 * suitability.i))
population
#adds 0 to pollen.count
<- c(pollen.count, 0)
pollen.count
#jumps to next iteration
next
}
#PLANT GROWTH ---------------------------------------
#biomass of every individual
<- maximum.biomass /
biomass 1 + maximum.biomass *
(exp(- (growth.rate * suitability.i) *
* maximum.age)
(population
)
)
#SELF-THINNING --------------------------------------
#carrying capacity reached
while(sum(biomass) > carrying.capacity){
#removes a random individual based on risk curve
<- sample(
individual.to.remove x = length(population),
size = 1,
replace = TRUE,
prob = 1 - sqrt(population) #risk curve
)
#removing individuals from population and biomass
<- population[-individual.to.remove]
population <- biomass[-individual.to.remove]
biomass
#end of while
}
#REPRODUCTION --------------------------------------
#identifyies adult individuals
<- population > reproductive.age
adults
#seeds (vector of 0s)
#fractional biomass of adults * fecundity * suitability
<- rep(0, floor(sum((biomass[adults]/maximum.biomass) *
seeds * suitability.i))
fecundity)
#adding seeds to the population
<- c(population, seeds)
population
#POLLEN OUTPUT -------------------------------------
#biomass of adults multiplied by suitability
<- c(pollen.count,
pollen.count sum(biomass[adults]) * suitability.i)
#end of loop through suitability values }
The code below executes the simulation, and plots the outcome using the function plotSimulation.
#simulate population based on parameters
<- simulatePopulation(parameters=parameters[1, ],
simulation drivers=driver)
#plotting simulation output
plotSimulation(simulation.output=simulation,
burnin=FALSE,
panels=c("Driver A",
"Suitability",
"Population",
"Pollen"),
plot.title="",
text.size=12,
line.size=0.4)
The simulation outcomes can vary with the traits of the virtual species. Table 2 shows the parameters of two new taxa named Species 2 and Species 3. These species have a higher niche breadth than Species 1, and Species 3 has a pollen productivity depending more on biomass than suitability (parameter pollen.control higher than zero). The comparison of both simulations (Figure 11) along with Species 1 shows that different traits generate different pollen curves in our simulation.
Parameter | Species 1 | Species 2 | Species 3 |
---|---|---|---|
maximum.age | 50 | 50 | 50 |
reproductive.age | 20 | 20 | 20 |
fecundity | 2 | 4 | 6 |
growth.rate | 0.2 | 0.3 | 0.4 |
pollen.control | 0.0 | 0.0 | 0.5 |
maximum.biomass | 100 | 100 | 100 |
carrying.capacity | 10000 | 10000 | 10000 |
driver.A.weight | 1 | 1 | 1 |
niche.A.mean | 50 | 50 | 50 |
niche.A.sd | 10 | 15 | 20 |
#simulating species 2 and 3 of the dataframe parameters
.2 <- simulatePopulation(parameters=parameters,
simulationspecies=c(2,3),
drivers=driver)
#adding the results to the previous simulation
<- c(simulation, simulation.2)
simulation rm(simulation.2)
#plotting the comparison for the time interval between 4000 and 5000.-
compareSimulations(simulation.output=simulation,
species = "all",
columns = c("Suitability", "Pollen"),
time.zoom = c(5600, 6400))
###Testing the model limits###
We searched for the minimum values of the parameters required to keep a simulated population viable under fully suitable conditions. The taxa Test 1 and Test 2 shown below
Parameter | Test 1 | Test 2 |
---|---|---|
maximum.age | 4 | 3 |
reproductive.age | 1 | 1 |
fecundity | 0.55 | 0.50 |
growth.rate | 2 | 2 |
pollen.control | 0 | 0 |
maximum.biomass | 1 | 1 |
carrying.capacity | 30 | 30 |
driver.A.weight | 0.5 | 0.5 |
driver.B.weight | 0.5 | 0.5 |
niche.A.mean | 50 | 50 |
niche.A.sd | 10 | 10 |
niche.B.mean | 50 | 50 |
niche.B.sd | 10 | 10 |
.1 <- simulatePopulation(
simulation.testparameters=parameters.test,
driver.A=jitter(rep(50, 500), amount=4)
)
The test driver used had an average of 50 (optimum values according the environmental niche of both species) for 500 years, with random noise added through the jitter function. The model results (column Pollen only) for both virtual taxa are shown below.
compareSimulations(simulation.output=simulation.test.1,
columns="Pollen",
time.zoom = c(0, 200))
The outputs of the test taxa show how a minimal change in the parameters can lead to fully unstable results when the considered taxa are short lived. Considering such a result, values for life-traits (maximum.age, reproductive.age, fecundity, growth.rate, and maximum.biomass) of taxon Test 1 should be taken as safe lower limits for these traits.
A similar situation can happen with long lived species when the age of sexual maturity is close to the maximum age. The table below shows three new test species with long life-spans and increasing maturity ages. The driver is again centered in 50, with added white noise, and 2000 years length.
Parameter | Test 3 | Test 4 | Test 5 |
---|---|---|---|
maximum.age | 1000 | 1000 | 1000 |
reproductive.age | 100 | 500 | 900 |
fecundity | 0.5 | 0.5 | 0.5 |
growth.rate | 0.05 | 0.05 | 0.05 |
pollen.control | 0 | 0 | 0 |
maximum.biomass | 100 | 100 | 100 |
carrying.capacity | 10000 | 10000 | 10000 |
driver.A.weight | 0.5 | 0.5 | 0.5 |
driver.B.weight | 0.5 | 0.5 | 0.5 |
niche.A.mean | 50 | 50 | 50 |
niche.A.sd | 10 | 10 | 10 |
niche.B.mean | 50 | 50 | 50 |
niche.B.sd | 10 | 10 | 10 |
.2 <- simulatePopulation(
simulation.testparameters=parameters.test,
species=c(3:5),
driver.A=jitter(rep(50, 2000), amount=4)
)
compareSimulations(simulation.output=simulation.test.2,
columns="Pollen",
time.zoom = c(0, 800))
The figure shows how Test 3 yields a stable pollen productivity across time, while Test 4 and Test 5 show, respectively, a very low productivity due to scarcity of adults, a total inability to sustain stable populations. Considering these results, it is important to keep a careful balance between the parameters maximum.age and reproductive.age to obtain viable virtual populations.
#Simulating a virtual accumulation rate#
Sediments containing pollen grains accumulate at varying rates, generally measured in years per centimeter (y/cm). Accumulation rates found in real datasets are broadly between 10 and 70 y/cm, with a paulatine increase towards the present time. To simulate such an effect and aggregate the annual data produced by the simulation in a realistic manner we have written a function named simulateAccumulationRate that takes a random seed, a time-span, and a range of possible accumulation rate values, and generates a virtual accumulation rate curve. It does so by generating a random walk first, smoothing it through the application of a GAM model, and scaling it between given minimum and maximum accumulation rates (see Figure 12).
<- simulateAccumulationRate(seed=140,
accumulation.rate time=1:10000,
output.min=1,
output.max=50)
The output is a dataframe with three columns: time, accumulation.rate, and grouping (see Table 4).
time | accumulation.rate | grouping |
---|---|---|
1 | 30 | 1 |
2 | 30 | 1 |
3 | 30 | 1 |
4 | 30 | 1 |
5 | 30 | 1 |
6 | 29 | 1 |
7 | 29 | 1 |
8 | 29 | 1 |
9 | 29 | 1 |
10 | 29 | 1 |
11 | 29 | 1 |
12 | 29 | 1 |
13 | 29 | 1 |
14 | 29 | 1 |
15 | 29 | 1 |
16 | 29 | 1 |
17 | 29 | 1 |
18 | 29 | 1 |
19 | 29 | 1 |
20 | 29 | 1 |
Cases of the simulation data belonging to the same group according to the grouping column are aggregated together to simulate a centimeter of sedimented data. The data are aggregated by the function aggregateSimulation, that can additionally sample the resulting data at given depth intervals expressed in centimeters between consecutive samples (2, 6, and 10 cm in the code below).
<- aggregateSimulation(
simulation.aggregated simulation.output=simulation,
accumulation.rate=accumulation.rate,
sampling.intervals=c(2, 6, 10)
)
The function returns a matrix-like list with as many rows as simulations are available in simulation.output, a column containing the data of the original simulations, a column with the data aggregated every centimeter, and the sampling intervals requested by the user. The data are accessed individually by list subsetting, as shown below (see Figure 13), to allow easy analysis and visualization.
#Sampling virtual pollen curves at different depth intervals#
Applying a virtual accumulation rate to the data generated by the population model at given depth intervals simulates to a certain extent how pollen deposition and sampling work in reality, and the outcome of that is data-points separated by regular depth intervals, but not regular time intervals. Figure 14 shows that time intervals between consecutive samples produced by aggregateSimulation are not regular. However, analyzing ecological memory requires to organize the input data in regular time lags, and to do that the data need to have regular time intervals between consecutive cases.
Irregular time series can be interpolated into regular time series by using the loess function. This function fits a polynomial surface representing the relationship between two (or more) variables. The smoothness of this polynomial surface is modulated by the span parameter, and finding the right value for this parameter is critical to obtain an interpolation result as close as possible to the real data. The following code is able to find the value of span that maximizes the correlation between input and interpolated data for any given time series.
#getting example data sampled at 2cm intervals
= simulation.aggregated[[1, 3]]
simulated.data
#span values to be explored
=seq(20/nrow(simulated.data),
span.values5/nrow(simulated.data),
by=-0.0005)
#plotting the optimization process in real time
x11(height=12, width=24)
#iteration through candidate span values
for(span in span.values){
#function to interpolate the data
= loess(
interpolation.function ~ Time,
Pollen data=simulated.data,
span=span,
control=loess.control(surface="direct"))
#plot model fit
plot(simulated.data$Pollen, type="l", lwd=2)
lines(interpolation.function$fitted, col="red", lwd=2)
#if correlation equals 0.9985 loop stops
if(cor(interpolation.function$fitted,
$Pollen) >= 0.9985){break}
simulated.data
}
#gives time to look at result before closing the plot window
Sys.sleep(5)
The function toRegularTime (usage shown below), uses the code above to interpolate the data produced by aggregateSimulation into a given time interval, expressed in years, returning a list of the same dimensions of the input list.
<- toRegularTime(
simulation.interpolated x=simulation.aggregated,
time.column="Time",
interpolation.interval=10,
columns.to.interpolate=c("Pollen",
"Suitability",
"Driver.A")
)
Figure 15 shows the same data segment shown in Figure 13, but with samples re-interpolated into a regular time grid at 10 years intervals.