isocat
(Isotope Clustering and Assignment Tools)Contact Information: Caitlin J. Campbell (caitjcampbell@gmail.com)
The isocat
package provides multiple tools for creating
and quantitatively analyzing and summarizing probability-of-origin
surfaces generated from stable isotope analyses of animal tissue. This
vignette will walk users through a brief example for each function
included in this vignette.
isocat
has example isoscape data included for a small
extent of North America.
Example isoscape:
Load example isoscape standard deviation surface:
library(ggplot2, quietly = TRUE)
library(rasterVis, quietly = TRUE)
library(gridExtra, quietly = TRUE)
gglayers <- list(
geom_tile(aes(fill = value)),
coord_equal(),
theme_bw(),
scale_x_continuous(name = "Long", expand = c(0,0)),
scale_y_continuous(name = "Lat", expand = c(0,0))
)
lab1 <- list(
gglayers,
scale_fill_gradient(name = expression(paste(delta, "D (\u2030)")), low = 'grey10', high = 'grey90')
)
gridExtra::grid.arrange(
gplot(myiso) + lab1 + ggtitle("Example Isoscape"),
gplot(myiso_sd) + lab1 + ggtitle("Standard Deviation"),
ncol = 2
)
To create a probability-of-origin map, we first import or generate a dataframe containing:
Individual IDs
Individual-level isotope data, transformed with a transfer function to reflect environmental isotope (e.g., precipitation) values.
Standard deviation of isotope standard measurements, which are associated with machine accuracy.
It is also common to instead use a transfer function to transform
precipitation isoscapes to reflect expected tissue values. Either works;
just make sure the transfer function is fit appropriately to the right
predictor and response variables (and/or is a two-way regression, e.g.,
see smatr::sma
).
n <- 6 # Number of example rasters
set.seed(1)
df <- data.frame(
ID = LETTERS[1:n],
isotopeValue = sample(cellStats(myiso, "min"):cellStats(myiso, "max"), n, replace = TRUE),
SD_indv = rep(5, n),
stringsAsFactors = FALSE
)
kableExtra::kable(df)
ID | isotopeValue | SD_indv |
---|---|---|
A | -67.98399 | 5 |
B | -96.98399 | 5 |
C | -134.98399 | 5 |
D | -101.98399 | 5 |
E | -48.98399 | 5 |
F | -92.98399 | 5 |
The contents of these columns are passed to the function
isotopeAssignmentModel
as vectors, along with the object
names of isoscape and isoscape-SD objects. If parallel processing is
specified, the function creates and deploys clusters using the
doParallel
package. The output is a RasterStack with layers
named corresponding to the individual IDs.
assignmentModels <- isotopeAssignmentModel(
ID = df$ID,
isotopeValue = df$isotopeValue,
SD_indv = df$SD_indv,
precip_raster = myiso,
precip_SD_raster = myiso_sd,
nClusters = FALSE
)
# Plot.
ggProb <- list(
facet_wrap(~ variable),
scale_fill_gradient(name = "Probability\nOf Origin", low = 'darkblue', high = 'yellow')
)
gplot(assignmentModels) + gglayers + ggProb
To compare probability-of-origin surfaces, we apply Schoener’s D
metric. To simply compare two surfaces, we can apply
isocat
’s schoenersD
function, which determines
Schoener’s D-metric of similarity between two surfaces. The D-value
varies between 0 (completely dissimilar surfaces) and 1 (identical
surfaces).
# Calculate Schoener's D-metric of spatial similarity between
# two of the example probability surfaces.
schoenersD(assignmentModels[[1]], assignmentModels[[2]])
#> [1] 0.120559
To compare multiple surfaces to one another, isocat
includes a simmatrixMaker
function to create a similarity
matrix of the surfaces. The output is a symmetric matrix with row and
column names corresponding to the layer names of the surfaces to be
compared. The nClusters
specification, as in the
isotopeAssignmentModel
function, generates a number of
parallel processing clusters equal to the numeric value specified. If
csvSavePath
is included, a .csv file will also be written
to the path specified. For large RasterStacks, this function can be
quite processing-intensive and take some time.
mySimilarityMatrix <- simmatrixMaker(
assignmentModels,
nClusters = FALSE,
csvSavePath = FALSE
)
mySimilarityMatrix
#> A B C D E F
#> A 1.000000000 0.12055895 2.308768e-03 0.075836414 3.576003e-01 0.17339223
#> B 0.120558953 1.00000000 1.259615e-01 0.814850423 1.192683e-02 0.84960437
#> C 0.002308768 0.12596152 1.000000e+00 0.189881401 3.168732e-05 0.08466208
#> D 0.075836414 0.81485042 1.898814e-01 1.000000000 5.518849e-03 0.67133167
#> E 0.357600305 0.01192683 3.168732e-05 0.005518849 1.000000e+00 0.02168286
#> F 0.173392235 0.84960437 8.466208e-02 0.671331674 2.168286e-02 1.00000000
To cluster individuals by similar origins, isocat
relies
on the titular function of the package pvclust
. The input
to this function is the similarity matrix (here, “simmatrix”). Distance
measures and clustering methods are detailed in the pvclust
package, so for more information on methods discussed here, see:
The default distance measure built into this function is
correlational distance, and ‘average’ as a clustering method. The number
of bootstrap replications default to 1000, and nClusters specifies how
many clusters to initiate for parallel processing through
doParallel
. The output of this is an object of class
“pvclust”.
cS <- clusterSimmatrix(
simmatrix = mySimilarityMatrix,
dist_mthd = "correlation",
hclust_mthd = "average",
nBoot = 1000,
nClusters = FALSE,
r = seq(.7,1.4,by=.1)
)
#> Bootstrap (r = 0.67)... Done.
#> Bootstrap (r = 0.83)... Done.
#> Bootstrap (r = 1.0)... Done.
#> Bootstrap (r = 1.17)... Done.
#> Bootstrap (r = 1.33)... Done.
plot(cS)
If bootstrapped clustering is not desired, one could instead apply
the hclust
function from within the base stats
package:
Note that the output of the pvclust
analysis also
contains a nested object of class “hclust”.
To divide individuals into a discrete number of groups with common origin, one might apply one or more options. In this example, we cut the tree relating individual origins at a given height. Users might alternatively cut with respect to bootstrap support of given groups, into a certain number of groups, or into groups optimizing k-means or another metric of within-group similarity for D-values or isotope tissue values.
myheight <- 0.05
plot(as.dendrogram(cS$hclust), horiz = FALSE)
abline(h = myheight, col = "red", lwd = 2, lty = 2)
df$cluster <- dendextend::cutree(cS$hclust, h = myheight)
#> Registered S3 method overwritten by 'dendextend':
#> method from
#> text.pvclust pvclust
kableExtra::kable(df)
ID | isotopeValue | SD_indv | cluster |
---|---|---|---|
A | -67.98399 | 5 | 1 |
B | -96.98399 | 5 | 2 |
C | -134.98399 | 5 | 3 |
D | -101.98399 | 5 | 4 |
E | -48.98399 | 5 | 5 |
F | -92.98399 | 5 | 2 |
For each group of individuals of common origin, create an aggregate
surface of mean within-group probability of origin using the
meanAggregateClusterProbability
function. This function
returns a RasterStack corresponding to each cluster fed into it. If
specified as an integer, nClust
parameter interfaces with
raster::clusterR
to create and apply apply \(n\) multi-core clusters for faster
processing.
To visualize the general regions of a spatial range most associated
with each aggregate surface, isocat
includes a
projectSummaryMaxSurface
function. This function associates
each cell of a spatial extent with the identity of a discrete
RasterLayer the highest relative value at that location. The output is a
ratified raster that, in this context, shows the regions of highest
relative association with the aggregated origins of groups of
individuals. This summary surface is not appropriate for summary
statistics (which we recommend be applied to mean aggregate surfaces for
individuals sharing common origins, if not to the individual origin maps
themselves), but does serve as a visual summary of the relative regions
of probable origins of each group.
Cumulative Sum
Odds-Ratio
Quantile
Quantile-Simulation
The relative strengths and performance of these approaches are explored in Campbell et al.’s “Refining assessment of the geographic origins of animals inferred from stable isotope data” (in prep).
We will use an example probability surface in the following example. Let us also specify a sampling site at point \(i,j\), indicated with a red circle.
set.seed(42)
p <- isotopeAssignmentModel(
ID = "Example",
isotopeValue = sample(-125:-25, 1),
SD_indv = 5,
precip_raster = myiso,
precip_SD_raster = myiso_sd,
nClusters = FALSE
)[[1]]
# Example Point
pt <- data.frame(x = -100, y = 40)
ptDeets <- list(
geom_point(
data = pt,
col = "red", shape = 1, size = 2,
aes(x = x, y = y)
)
)
ex_plot <- gplot(p) + gglayers + ggProb + ptDeets
ex_hist <- data.frame(x = p[]) %>%
ggplot(.) +
geom_density(aes(x = x)) +
scale_y_continuous(expand = c(0,0)) +
scale_x_continuous(name = "Probability Value") +
theme_bw() +
geom_vline(aes(xintercept = extract(p, pt)), linetype = "dashed", col = "red")
gridExtra::grid.arrange(ex_plot, ex_hist, ncol = 2, widths = c(2,1))
cumsum_plot <- gplot(CumSumEx) +
gglayers + ptDeets +
scale_fill_gradient(
name = "Cumulative Sum\nProbability\nOf Origin", low = 'darkblue', high = 'yellow')
cumsum_hist <- data.frame(x = CumSumEx[]) %>%
ggplot(.) +
geom_density(aes(x = x)) +
scale_y_continuous(expand = c(0,0)) +
scale_x_continuous(name = "Cumulative Sum\nProbability Value") +
theme_bw() +
geom_vline(aes(xintercept = extract(CumSumEx, pt)), linetype = "dashed", col = "red")
gridExtra::grid.arrange( cumsum_plot, cumsum_hist, ncol = 2, widths = c(2,1) )
odds_plot <- gplot(OddsRatioEx) +
gglayers + ptDeets +
scale_fill_gradient(
name = "Odds-Ratio\nProbability\nOf Origin", low = 'darkblue', high = 'yellow')
odds_hist <- data.frame(x = OddsRatioEx[]) %>%
ggplot(.) +
geom_density(aes(x = x)) +
scale_y_continuous(expand = c(0,0)) +
scale_x_continuous(name = "Odds Ratio Value") +
theme_bw() +
geom_vline(aes(xintercept = extract(OddsRatioEx, pt)), linetype = "dashed", col = "red")
gridExtra::grid.arrange( odds_plot, odds_hist, ncol = 2, widths = c(2,1) )
quantile_plot <- gplot(QuantileEx) +
gglayers + ptDeets +
scale_fill_gradient(
name = "Quantile\nProbability\nOf Origin", low = 'darkblue', high = 'yellow')
quantile_hist <- data.frame(x = QuantileEx[]) %>%
ggplot(.) +
geom_density(aes(x = x)) +
scale_y_continuous(expand = c(0,0)) +
scale_x_continuous(name = "Quantile Value") +
theme_bw() +
geom_vline(aes(xintercept = extract(QuantileEx, pt)), linetype = "dashed", col = "red")
gridExtra::grid.arrange( quantile_plot, quantile_hist, ncol = 2, widths = c(2,1) )
Simulate a distribution fit to known-origin quantile values:
Create quantile-simulation surface:
QuantSimEx <- makeQuantileSimulationSurface(
probabilitySurface = p,
ValidationQuantiles = q,
rename = FALSE, rescale = TRUE
)
quantsim_plot <- gplot(QuantSimEx) +
gglayers + ptDeets +
scale_fill_gradient(
name = "Quantile-Simulation\nProbability\nOf Origin", low = 'darkblue', high = 'yellow')
quantsim_hist <- data.frame(x = QuantSimEx[]) %>%
ggplot(.) +
geom_density(aes(x = x)) +
scale_y_continuous(expand = c(0,0)) +
scale_x_continuous(name = "Quantile-Simulation Value") +
theme_bw() +
geom_vline(aes(xintercept = extract(QuantSimEx, pt)), linetype = "dashed", col = "red")
gridExtra::grid.arrange( quantsim_plot, quantsim_hist, ncol = 2, widths = c(2,1) )