Here’s a quick example to show how we could use the fishtree
package to conduct some phylogenetic community analyses. First, we load fishtree
and ensure that the other packages that we need are installed.
library(ape)
library(fishtree)
loadNamespace("rfishbase")
#> <environment: namespace:rfishbase>
loadNamespace("picante")
#> <environment: namespace:picante>
loadNamespace("geiger")
#> <environment: namespace:geiger>
Next we’ll start downloading some data from rfishbase
. We’ll be seeing if reef-associated ray-finned fish species are clustered or overdispersed in the Atlantic, Pacific, and Indian Oceans.
# Get reef-associated species from the `species` table
<- rfishbase::species(fields = c("Species", "DemersPelag"))
reef_species <- reef_species[reef_species$DemersPelag == "reef-associated", ]
reef_species
# Get native and endemic species from the Atlantic, Pacific, and Indian Oceans
<- rfishbase::ecosystem(species_list = reef_species$Species)
eco <- eco$Status %in% c("native", "endemic") & eco$EcosystemName %in% c("Atlantic Ocean", "Pacific Ocean", "Indian Ocean")
valid_idx <- eco[valid_idx, c("Species", "EcosystemName")]
eco
# Retrieve the phylogeny of only native reef species across all three oceans.
<- fishtree_phylogeny(species = eco$Species)
phy #> Warning: Requested 5813 but only found 3961 species.
#> * Ablabys binotatus
#> * Ablabys macracanthus
#> * Ablabys taenianotus
#> * Abudefduf conformis
#> * Abudefduf natalensis
#> * ...and 1847 others
We’ll have to clean up the data in a few ways before sending it to picante
for analysis. First, we’ll need to convert our species-by-site data frame into a presence-absence matrix. We’ll use base::table
for this, and use unclass
to convert the table
into a standard matrix
object.
<- unclass(table(eco))
sample_matrix dimnames(sample_matrix)$Species <- gsub(" ", "_", dimnames(sample_matrix)$Species, fixed = TRUE)
Next, we’ll use geiger::name.check
to ensure the tip labels of the phylogeny and the rows of the data matrix match each other.
<- geiger::name.check(phy, sample_matrix)
nc <- sample_matrix[!rownames(sample_matrix) %in% nc$data_not_tree, ] sample_matrix
Finally, we’ll generate the cophenetic matrix based on the phylogeny, and transpose the presence-absence matrix since picante
likes its columns to be species and its rows to be sites.
<- cophenetic(phy)
cophen <- t(sample_matrix) sample_matrix
We’ll run picante::ses.mpd
and picante::ses.mntd
with only 100 iterations, to speed up the analysis. For a real analysis you would likely increase this to 1000, and possibly test other null models if your datasets have e.g., abundance information.
::ses.mpd(sample_matrix, cophen, null.model = "taxa.labels", runs = 99)
picante#> ntaxa mpd.obs mpd.rand.mean mpd.rand.sd mpd.obs.rank
#> Atlantic Ocean 623 238.1682 231.3588 2.5213665 100
#> Indian Ocean 1213 233.1537 231.9486 1.3943958 76
#> Pacific Ocean 1497 231.6829 231.7546 0.9395066 47
#> mpd.obs.z mpd.obs.p runs
#> Atlantic Ocean 2.70068771 1.00 99
#> Indian Ocean 0.86421415 0.76 99
#> Pacific Ocean -0.07634417 0.47 99
::ses.mntd(sample_matrix, cophen, null.model = "taxa.labels", runs = 99)
picante#> ntaxa mntd.obs mntd.rand.mean mntd.rand.sd mntd.obs.rank
#> Atlantic Ocean 623 41.86956 48.22864 1.1668943 1
#> Indian Ocean 1213 34.99129 38.02963 0.7548866 1
#> Pacific Ocean 1497 34.08644 34.99204 0.5509990 7
#> mntd.obs.z mntd.obs.p runs
#> Atlantic Ocean -5.449580 0.01 99
#> Indian Ocean -4.024901 0.01 99
#> Pacific Ocean -1.643559 0.07 99
The Atlantic and Indian Oceans are overdispersed using the MPD metric, and all three oceans are clustered under the MNTD metric. MPD is thought to be more sensitive to patterns closer to the root of the tree, while MNTD is thought to more closely reflect patterns towards the tips of the phylogeny.
We can confirm these patterns visually by running the following code, which will plot the phylogeny and add colored dots (red, green, and blue) to indicate whether a tip is associated with a specific ocean basin.
plot(phy, show.tip.label = FALSE, no.margin = TRUE)
<- get("last_plot.phylo", .PlotPhyloEnv)
obj
<- t(sample_matrix)[phy$tip.label, ]
matr <- obj$xx[1:obj$Ntip]
xx <- obj$yy[1:obj$Ntip]
yy <- c("#1b9e77", "#d95f02", "#7570b3")
cols for (ii in 1:ncol(matr)) {
<- matr[, ii] == 1
present_idx points(xx[present_idx] + ii, yy[present_idx], col = cols[ii], cex = 0.1)
}