rtry: Preprocessing Plant Trait Data
rtry is an R package to support the application of plant trait data, providing easily applicable functions for the basic steps of data preprocessing, e.g. data import, data exploration, selection of columns and rows, excluding trait data according to different attributes, long- to wide-table transformation, data export, and geocoding. The rtry package is designed to support the preprocessing of data released from the TRY Plant Trait Database (https://www.try-db.org), but is also applicable to other trait data.
Getting rtry
There are two sources where users can download the rtry package and the relevant documentation.
CRAN
The rtry package is available on the CRAN repository. This is the recommended option to obtain the latest version of the package.
GitHub Repository
The TRY R project is an open-source project that can be found on the MPI-BGC-Functional-Biogeography GitHub repository: https://github.com/MPI-BGC-Functional-Biogeography/rtry.
Developers are also welcome to contribute to the package.
R 4.0.5 was used to develop and build the rtry package, and this is the minimum version required to use the package.
The latest version of R can be downloaded from CRAN, a network of FTP and web servers around the world that store the code and documentation of R: https://cran.r-project.org/
If you use RStudio, we also recommend using its latest version, which can be found at https://posit.co/download/rstudio-desktop/. The free and open-source version of RStudio Desktop is sufficient.
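Whether the running R session meets the 4.0.5 minimum can be checked before installing. The following is a minimal sketch using only base R; the variable names are illustrative.

```r
# Minimum R version stated above
min_version <- "4.0.5"

# getRversion() returns the version of the running R session as an object
# that can be compared directly against a version string
r_ok <- getRversion() >= min_version

if (r_ok) {
  message("R ", as.character(getRversion()), " meets the minimum requirement (", min_version, ").")
} else {
  message("R ", as.character(getRversion()), " is older than ", min_version, "; please update R.")
}
```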
Install the rtry package
The installation of the rtry package can be performed through the RStudio console.
First, install all the dependencies with the command:
install.packages(c("data.table", "dplyr", "tidyr", "jsonlite", "curl"))
Once the installation is completed, the message “The downloaded source packages are in <path>” should be displayed.
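If some of the dependencies may already be present, it can be convenient to check which ones are missing first. The sketch below assumes nothing beyond base R; the variable names are illustrative, and the actual install call is left commented out so that nothing is installed unintentionally.

```r
# Dependencies listed in the install command above
deps <- c("data.table", "dplyr", "tidyr", "jsonlite", "curl")

# requireNamespace() returns FALSE when a package is not installed;
# it loads the namespace if present but does not attach the package
is_installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!is_installed]

if (length(missing_deps) > 0) {
  message("Missing dependencies: ", paste(missing_deps, collapse = ", "))
  # Uncomment to install only the missing packages:
  # install.packages(missing_deps)
} else {
  message("All dependencies are already installed.")
}
```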
Next, install the rtry package with the command:
From CRAN:
install.packages("rtry")
Alternatively, if you downloaded the source package (.tar.gz) from the GitHub repository:
install.packages("<path_to_rtry.tar.gz>", repos = NULL, type = "source")
You may ignore the warning message “Rtools is required to build R packages but is not currently installed” if it appears.
Once the installation is completed, the rtry package needs to be loaded with the command library(rtry).
Inside the rtry package, we use a function naming convention where each function begins with the prefix rtry_ followed by a description of what the specific function does. The rtry package consists of the following functions:
rtry_import: Import data
rtry_explore: Explore data
rtry_bind_col: Bind data by columns
rtry_bind_row: Bind data by rows
rtry_join_left: Left join for two data frames
rtry_join_outer: Outer join for two data frames
rtry_select_col: Select columns
rtry_select_row: Select rows
rtry_select_anc: Select ancillary data in wide-table format
rtry_exclude: Exclude data
rtry_remove_col: Remove columns
rtry_remove_dup: Remove duplicates in data
rtry_trans_wider: Transform data from long- to wide-table
rtry_export: Export preprocessed data
rtry_geocoding: Perform geocoding
rtry_revgeocoding: Perform reverse geocoding
Once rtry is installed and loaded, type ? followed by the function name to view its documentation, e.g.:
?rtry_import
To view the R code underlying the function:
View(rtry_import)
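Because of the rtry_ naming convention, the functions of an installed copy of the package can be listed by filtering its exported names on that prefix. This is a small sketch that assumes rtry may or may not be installed; rtry_functions is an illustrative variable name.

```r
# Start with an empty result so the sketch also runs when rtry is not installed
rtry_functions <- character(0)

if (requireNamespace("rtry", quietly = TRUE)) {
  # getNamespaceExports() returns the names exported by a package
  exported <- getNamespaceExports("rtry")
  # Keep only the names that follow the rtry_ prefix convention
  rtry_functions <- sort(grep("^rtry_", exported, value = TRUE))
  print(rtry_functions)
} else {
  message("The rtry package is not installed.")
}
```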
Here we provide a brief example of how to use the rtry package to import a dataset released from TRY, explore the data and exclude trait records based on specific criteria.
The rtry_import function displays the number of columns and rows of the imported dataset and the column headers. Thus it provides the first step to explore the dataset. TRY releases data in a long-table format: one trait record or one ancillary data entry per row.
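To make the long-table layout concrete, here is a toy table built in base R. All values are invented for illustration, and the TraitID and the trait DataID are hypothetical; only the column names and DataID 413 (plant maturity) are taken from this document.

```r
# Toy long table: each row is either a trait record or an ancillary data entry
long_tbl <- data.frame(
  ObservationID = c(101, 101, 102, 102),
  TraitID       = c(1, NA, 1, NA),          # NA marks an ancillary data row
  DataID        = c(10, 413, 10, 413),      # 413 = plant maturity (ancillary)
  OrigValueStr  = c("12.5", "mature", "9.8", "juvenile"),
  stringsAsFactors = FALSE
)

# Trait records carry a TraitID; ancillary data rows do not
is_ancillary   <- is.na(long_tbl$TraitID)
trait_rows     <- long_tbl[!is_ancillary, ]
ancillary_rows <- long_tbl[is_ancillary, ]

# Count the rows of each kind
print(table(ifelse(is_ancillary, "ancillary", "trait")))
```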
In the second step, we explore the dataset for plant species, traits and ancillary data.
Finally, we use the ancillary data on plant maturity (DataID 413) to exclude trait records measured on juvenile plants or on plants of unknown maturity. For this, we use a feature of the TRY data structure: different trait records and ancillary data measured on the same entity (plant) are linked via the ObservationID. Then, we double-check that the data retained for further analyses contain only observations of adult and mature plants.
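The idea of excluding whole observations via the ObservationID can be illustrated in base R. This is only an analogy on invented toy data, not the internal implementation of rtry_exclude; the TraitID and the trait DataID are hypothetical.

```r
# Toy long table: three plants, each with one trait record and one maturity entry
toy_tbl <- data.frame(
  ObservationID = c(101, 101, 102, 102, 103, 103),
  TraitID       = c(1, NA, 1, NA, 1, NA),
  DataID        = c(10, 413, 10, 413, 10, 413),
  OrigValueStr  = c("12.5", "mature", "9.8", "juvenile", "11.1", "unknown"),
  stringsAsFactors = FALSE
)

# Step 1: find the rows that match the exclusion criteria
hit <- toy_tbl$DataID %in% 413 &
  toy_tbl$OrigValueStr %in% c("juvenile", "unknown")

# Step 2: collect the ObservationIDs of those rows
excluded_obs <- unique(toy_tbl$ObservationID[hit])

# Step 3: drop every row sharing one of those ObservationIDs, so the trait
# record measured on the same plant is removed together with the maturity entry
toy_filtered <- toy_tbl[!(toy_tbl$ObservationID %in% excluded_obs), ]

# Double-check: only "mature" should remain among the maturity entries
print(unique(toy_filtered$OrigValueStr[toy_filtered$DataID %in% 413]))
```

Filtering on the matching rows alone would remove only the maturity entries and leave the juvenile trait records in place, which is why the exclusion is based on the shared ObservationID.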
For a comprehensive introduction and detailed example, see the vignettes rtry-introduction and rtry-workflow-general.
# Load the rtry package
library(rtry)
# Import the sample dataset from TRY provided within rtry package
TRYdata1 <- rtry_import(system.file("testdata", "data_TRY_15160.txt", package = "rtry"))
# View the imported data
View(TRYdata1)
# Explore the imported data
# Group the input data based on AccSpeciesID, AccSpeciesName, DataID, DataName, TraitID and TraitName, and sort by TraitID
# Note: TraitID == "NA" means that the entry is ancillary data
TRYdata1_explore_anc <- rtry_explore(TRYdata1,
                                     AccSpeciesID, AccSpeciesName, DataID, DataName,
                                     TraitID, TraitName,
                                     sortBy = TraitID)
View(TRYdata1_explore_anc)
# Select the rows where DataID is 413, i.e. the data containing the plant development status
# Explore the unique values of the OrigValueStr within the selected data
tmp_unfiltered <- rtry_select_row(TRYdata1, DataID %in% 413)
tmp_unfiltered <- rtry_explore(tmp_unfiltered,
                               DataID, DataName, OriglName, OrigValueStr, OrigUnitStr,
                               StdValue, Comment,
                               sortBy = OrigValueStr)
View(tmp_unfiltered)
# Exclude (remove) observations of juvenile plants or unknown development state
# Criteria
# 1. DataID equals 413
# 2. OrigValueStr equals "juvenile" or "unknown"
TRYdata1_filtered <- rtry_exclude(TRYdata1,
                                  (DataID %in% 413) & (OrigValueStr %in% c("juvenile", "unknown")),
                                  baseOn = ObservationID)
View(TRYdata1_filtered)
# Double-check the filtered data to ensure the exclusion worked as expected
# Select the rows where DataID is 413
# Explore the unique values of the OrigValueStr within the selected data
tmp_filtered <- rtry_select_row(TRYdata1_filtered, DataID %in% 413)
tmp_filtered <- rtry_explore(tmp_filtered,
                             DataID, DataName, OriglName, OrigValueStr, OrigUnitStr,
                             StdValue, Comment,
                             sortBy = OrigValueStr)
View(tmp_filtered)
Additional vignettes, which provide a detailed introduction to rtry and example workflows for trait data preprocessing and for geocoding, are available:
Introduction to rtry (rtry-introduction)
The general workflow (rtry-workflow-general): uses the rtry package to preprocess the data exported from the TRY database, and covers the rtry_ functions from importing and exploring to binding multiple data, as well as selecting, excluding specific data and removing duplicates, and finally exporting the preprocessed data
Perform (reverse) geocoding (rtry-workflow-geocoding): uses the rtry package to perform geocoding and reverse geocoding on the TRY data with the functions rtry_geocoding and rtry_revgeocoding
To open a vignette from the R console:
vignette("<name_of_vignette>")
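The vignette names available in an installed copy can also be looked up programmatically. This is a minimal sketch, assuming rtry may or may not be installed; vignette_items is an illustrative variable name.

```r
# Start with an empty result so the sketch also runs when rtry is not installed
vignette_items <- character(0)

if (requireNamespace("rtry", quietly = TRUE)) {
  # Called with only `package`, vignette() lists the vignettes shipped with it;
  # the "Item" column holds the names accepted by vignette("<name_of_vignette>")
  available <- vignette(package = "rtry")
  vignette_items <- unname(available$results[, "Item"])
  print(vignette_items)
} else {
  message("The rtry package is not installed.")
}
```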
The rtry package is distributed under the CC BY 4.0 license, with a remark that the (reverse) geocoding functions provided within the package use Nominatim, which is developed with OpenStreetMap data. Although the API and the data provided are free to use for any purpose, including commercial use, note that they are governed by the Open Database License (ODbL).