The package zoolog includes functions and reference data to generate and manipulate log-ratios (also known as log size index (LSI) values) from measurements obtained on zooarchaeological material. Log ratios are used to compare the relative (rather than the absolute) dimensions of animals from archaeological contexts (Meadow 1999). Essentially, the method compares archaeological measurements to a standard, producing a value that indicates how much larger or smaller the archaeological specimen is compared to that standard. zoolog is also able to seamlessly integrate data and references with heterogeneous nomenclature, which is internally managed by a zoolog thesaurus.
The methods included in the package were first developed in the framework of the ERC-Starting Grant 716298 ZooMWest (PI S. Valenzuela-Lamas), and were first used in the paper (Trentacoste, Nieto-Espinet, and Valenzuela-Lamas 2018). They are based on the techniques proposed by G. G. Simpson (1941) and G. Simpson, Roe, and Lewontin (1960), which calculates log size index (LSI) values as: \[ \mbox{LSI} = \log_{10} x - \log_{10} x_{\text{ref}} = \log_{10}(x/x_\text{ref}), \] where \(x\) is the considered measure value and \(x_\text{ref}\) is the corresponding reference value. Observe that LSI is defined using logarithms with base 10.
Several different sets of standard reference values are included in the package. These standards include several published and widely used biometric datasets (e.g. S. J. Davis (1996), Albarella and Payne (2005)) as well as other less known standards. These references, as well as the data example provided with the package, are based on the measures and measure abbreviations defined in Von den Driesch (1976) and S. Davis (1992).
In general, zooarchaeological datasets are composed of skeletal remains representing many different anatomical body parts. In investigation of animal size, the analysis of measurements from a given anatomical element provides the best control for the variables affecting size and shape and, as such, it is the preferable option. Unfortunately, this approach is not always viable due to low sample sizes in some archaeological assemblages. This problem can be mitigated by calculating the LSI values for measurements with respect to a reference, which provides a means of aggregating biometric information from different body parts. The resulting log ratios can be compared and statistically analysed under reasonable conditions (Albarella 2002). However, length, width, and depth measurements of the anatomical elements still should not be directly aggregated for statistical analysis.
The package includes a zoolog thesaurus to facilitate its usage by research teams across the globe, and working in different languages and with different recording traditions. The thesaurus enables the zoolog package to recognises many different names for taxa and skeletal elements (e.g. “Bos taurus”, “cattle”, “BT”, “bovino”, “bota”). Consequently, there is no need to use a particular, standardised recording code for the names of different taxa or elements.
The package also includes a zoolog taxonomy to facilitate the management of cases recorded as identified at different taxonomic ranks (Species, Genus, Tribe) and their match with the corresponding references.
We are particularly grateful to Sabine Deschler-Erb and Barbara Stopp, from the University of Basel (Switzerland) for making the reference values of several specimens available through the ICAZ Roman Period Working Group, which have been included here with their permission. We also thank Francesca Slim and Dimitris Filioglou from the University of Groningen, Claudia Minniti from University of Salento, Sierra Harding and Nimrod Marom from the University of Haifa, Carly Ameen and Helene Benkert from the University of Exeter, and Mikolaj Lisowski from the University of York for providing additional reference sets. Allowen Evin (CNRS-ISEM Montpellier) saw potential pitfalls in the use of Davis’ references for sheep, which have been now solved.
The thesaurus has benefited from the contributions from Moussab Albesso, Canan Çakirlar, Jwana Chahoud, Jacopo De Grossi Mazzorin, Sabine Deschler-Erb, Dimitrios Filioglou, Armelle Gardeisen, Sierra Harding, Pilar Iborra, Michael MacKinnon, Nimrod Marom, Claudia Minniti, Francesca Slim, Barbara Stopp, and Emmanuelle Vila.
We are grateful to all of them for their contributions, comments, and help. In addition, users are encouraged to contribute to the thesaurus and other references so that zoolog can be expanded and adapted to any database.
You can install the released version of zoolog from CRAN with:
install.packages("zoolog")
And the development version from GitHub with:
install.packages("devtools")
::install_github("josempozo/zoolog@HEAD", build_vignettes = TRUE) devtools
The package zoolog includes several osteometrical references. Currently, the references include reference values for the main domesticates and their agriotypes (Bos, Ovis, Capra, Sus), red deer (Cervus elaphus), Persian fallow deer (Dama mesopotamica), mountain gazelle (Gazella gazella), donkey (Equus asinus), horse (Equus caballus), European rabbit (Oryctolagus cuniculus), and grey wolf (Canis lupus). These are drawn from a variety of publications and resources (see below). In addition, the user can consider other references, or the provided references can be extended and updated integrating newer research data. Submission of extended/improved references is encouraged. Please, contact the maintainer through the provided email address to make the new reference fully accessible within the package.
These references, as well as the data example provided with the package, are based on the measurements and measure abbreviations defined in Von den Driesch (1976) and S. Davis (1992). Please, note that Davis’ standard SD, equivalent to Von den Driesch’s DD, has been denoted as Davis.SD in order to resolve its incompatibility with Von den Driesch’s SD. This affects only Davis’ sheep reference and was introduced in Release 1.0.0.
The predefined reference sets included in zoolog are
provided in the named list reference
, currently comprising
the following 4 sets:
library(zoolog)
str(reference, max.level = 1)
#> List of 4
#> $ NietoDavisAlbarella:'data.frame': 133 obs. of 4 variables:
#> $ Basel :'data.frame': 176 obs. of 4 variables:
#> $ Combi :'data.frame': 635 obs. of 4 variables:
#> $ Groningen :'data.frame': 246 obs. of 4 variables:
The reference set reference$Combi
is the default
reference for computing the log ratios,
since it includes the most comprehensive reference for each species.
The package also includes a referencesDatabase
collecting the taxon-specific reference standards from all the
considered resources. Each reference set is composed of a different
combination of taxon-specific standards selected from this
referencesDatabase
. The selection is defined by the data
frame referenceSets
:
Bos taurus | Bos primigenius | Ovis aries | Ovis orientalis | Capra hircus | Capra aegagrus | Sus domesticus | Sus scrofa | Cervus elaphus | Dama mesopotamica | Gazella gazella | Equus asinus | Equus caballus | Oryctolagus cuniculus | Canis lupus | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NietoDavisAlbarella | Nieto | Davis | Albarella | ||||||||||||
Basel | Basel | Basel | Basel | Basel | Basel | ||||||||||
Combi | Nieto | Clutton | Clutton | Basel | Basel | Haifa | Haifa | Haifa | Johnstone | Nottingham | Russell | ||||
Groningen | Degerbol | Uerpmann | Uerpmann | Hongo |
A detailed description of the reference data, including its structure, properties, and considered resources can be found in the ReferencesDatabase help page.
A thesaurus set is defined in order to make the package compatible with the different recording conventions and languages used by authors of zooarchaeological datasets. This enables the function LogRatios to match values in the user’s dataset with the corresponding ones in the reference standard, regardless of differences in nomenclature or naming conventions, as long as both terms are included in the relevant thesaurus. The thesaurus also allows the user to standardize the nomenclature of the dataset if desired.
The user can also use other thesaurus sets or modify the provided one. In this latter case, we encourage the user to contact the maintainer at the provided email address so that the additions can be incorporated into the new versions of the package.
Currently, the zoolog thesaurus set includes four thesauri:
A taxonomy for the most typical zooarchaeological taxa has been introduced in the zoolog major release 1.0.0. The taxonomic hierarchy is structured up to the family level:
Species | Genus | Tribe | Subfamily | Family |
---|---|---|---|---|
Bos taurus | Bos | Bovini | Bovinae | Bovidae |
Bos primigenius | Bos | Bovini | Bovinae | Bovidae |
Ovis aries | Ovis | Caprini | Caprinae | Bovidae |
Ovis orientalis | Ovis | Caprini | Caprinae | Bovidae |
Capra hircus | Capra | Caprini | Caprinae | Bovidae |
Capra aegagrus | Capra | Caprini | Caprinae | Bovidae |
Gazella gazella | Gazella | Antilopini | Antilopinae | Bovidae |
Sus domesticus | Sus | Suini | Suinae | Suidae |
Sus scrofa | Sus | Suini | Suinae | Suidae |
Cervus elaphus | Cervus | Cervini | Cervinae | Cervidae |
Dama mesopotamica | Dama | Cervini | Cervinae | Cervidae |
Equus asinus | Equus | Equini | Equinae | Equidae |
Equus caballus | Equus | Equini | Equinae | Equidae |
Oryctolagus cuniculus | Oryctolagus | Leporidae | ||
Canis familiaris | Canis | Canini | Caninae | Canidae |
Canis lupus | Canis | Canini | Caninae | Canidae |
This taxonomy is intended to facilitate the management of cases recorded as identified at different taxonomic ranks and their match with the corresponding references.
The taxonomy is integrated in the function LogRatios, enabling it to automatically
match different species in data and reference that are under the same
genus. For instance, data of Bos taurus can be matched with
reference of Bos primigenius, since both are Bos. It
also enables the function LogRatios
to detect when a case
taxon has been only partially identified and recorded at a higher
taxonomic rank, such as tribe or family, and to suggest the user the set
of possible reference species.
Besides, it is complemented with a series of functions enabling the user to query for the subtaxonomy or the set of species under a queried taxon at any taxonomic rank.
The full list of functions is available under the zoolog help page. We list them here sorted by their prominence for a typical user, and grouped by functionality:
reference$Combi
is
used. The function includes the option ‘joinCategories’ allowing several
taxa (typically Ovis, Capra, and unknown
Ovis/Capra) to be considered together with the same reference
taxon.
reference$NietoDavisAlbarella
, log ratios for goats will
not be calculated unless ‘joinCategories’ is set to indicate that the
Ovis aries standard should also be applied to goats.
CondenseLogs
employs the following
by-default group and prioritization introduced in Trentacoste, Nieto-Espinet, and Valenzuela-Lamas
(2018): Length considers in order of priority
GL, GLl, GLm, and HTC.
Width considers in order of priority BT, Bd,
Bp, SD, Bfd, and Bfp. Depth
considers in order of priority Dd, DD, BG,
and Dp. This order maximises the robustness and reliability of
the measurements, as priority is given to the most abundant, more
replicable, and less age dependent measurements. But users can set their
own features with any group of measures and priorities. The method
“average” extracts the mean per group, ignoring the non-available log
ratios.
InCategory
checks for
equality assuming the equivalencies defined in the given thesaurus. It
is intended for the user to easily select a subset of data without
having to standardize the analysed dataset.
This includes two functions enabling the user to map data with heterogeneous nomenclature into a standard one as defined in a thesaurus:
StandardizeNomenclature
standardizes a character vector
according to a given thesaurus.StandardizeDataSet
standardizes column names and values
of a data frame according to a thesaurus set.This includes three functions enabling the user to query for some information on the subtaxonomy under a queried taxon at any taxonomic rank:
Subtaxonomy
provides the subtaxonomy dataframe
collecting the species (rows) included in the queried taxon, and the
taxonomic ranks (columns) up to its level.SubtaxonomySet
provides the set (character vector
without repetions) of taxa, at any taxonomic rank, under the queried
taxon.GetSpeciesIn
provides the set of species included in
the queried taxon.By default, the subtaxonomy information is extracted from the zoolog taxonomy.
referencesDataSet
or in any other provided by the user.
ReadThesaurus
and WriteThesaurus
) and a
thesaurus set (ReadThesaurusSet
and
WriteThesaurusSet
).
This includes functions to modify and check thesauri:
NewThesaurus
generates an empty thesarus.AddToThesaurus
adds new names and categories to an
existing thesaurus.RemoveRepeatedNames
cleans a thesarus from any repeated
names on any category.ThesaurusAmbiguity
checks if there are names included
in more than one category in a thesaurus.The following examples are designed to be read and run sequentially. They represent a possible pipeline, meaningful for the processing and analysis of a dataset. Only occasionally, a small diversion is included to illustrate some alternatives.
This example reads a dataset from a file in csv format and computes the log-ratios. Then, the cases with no available log-ratios are removed. Finally, the resulting dataset is saved in a file in csv format.
The first step is to set the local path to the folder where you have the dataset to be analysed (this is typically a comma-separated value (csv) file). Here the example dataset from Valenzuela-Lamas (2008) included in the package is used:
library(zoolog)
<- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
dataFile package = "zoolog")
= read.csv2(dataFile,
data quote = "\"", header = TRUE, na.strings = "",
encoding = "UTF-8")
::kable(head(data)[, -c(6:20,32:64)]) knitr
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 4918 | 10364 | bota | 1 fal | 54.0 | 31.3 | 30.6 | 28.1 | 26.3 | 27.5 | 20.0 | ||||
ALP | 4919 | 10364 | bota | 1 fal | 54.5 | 27.9 | 31.8 | 26.0 | 22.8 | 25.3 | 19.5 | ||||
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | |||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | |||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | ||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 |
To enhance the visibility, we have shown only the most relevant columns.
We now calculate the log-ratios using the function
LogRatios
. Only measurements that have an associated
standard will be included in this calculation. The log values will
appear as new columns with the prefix ‘log’ following the original
columns with the raw measurements:
<- LogRatios(data)
dataWithLog #> Warning: Reference for Sus scrofa used for cases of Sus domesticus.
#> Reference for Sus scrofa used for cases of Sus.
#> Reference for Oryctolagus cuniculus used for cases of Oryctolagus.
#> Reference for Canis lupus used for cases of Canis.
#> Reference for Ovis aries used for cases of Ovis.
#> Set useGenusIfUnambiguous to FALSE if this behaviour is not desired.
#> Warning: Data includes some cases recorded as
#> * Equus (which is a Genus)
#> for which the reference for Equus asinus or Equus caballus could be used.
#> * Caprini (which is a Tribe)
#> for which the reference for Ovis aries or Capra hircus could be used.
#> Set joinCategories as appropriate if you want to use any of them.
::kable(head(dataWithLog)[, -c(6:20,32:64)]) knitr
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 4918 | 10364 | bota | 1 fal | 54.0 | 31.3 | 30.6 | 28.1 | 26.3 | 27.5 | 20.0 | |||||||||||||||||||||||||||
ALP | 4919 | 10364 | bota | 1 fal | 54.5 | 27.9 | 31.8 | 26.0 | 22.8 | 25.3 | 19.5 | |||||||||||||||||||||||||||
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | ||||||||||||||||||||||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | ||||||||||||||||||||||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.4016796 | -0.2116296 | -0.1513050 | -0.0678788 | |||||||||||||||||||||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.1740822 | -0.0828273 |
Observe that LogRatios
has output two warnings above.
The first one informs the user that the data includes some cases of
recorded taxa that did not match any species in the reference, but was
matched by genus. In many cases this can be the behaviour expected by
the user. For instance, using the reference of Sus scrofa for
computing the log ratios of cases of Sus domesticus, can be
acceptable if no better reference is available or if both (sub)species
are being compared. The warning also includes cases recorded by genus, a
recording practice that is understandable when working only with one
species in the genus, such as Oryctolagus or Ovis
above. However, the warning also informs the user that this matching by
genus can be suppressed by setting the parameter
useGenusIfUnambiguous = FALSE
. This can be the case if the
user wants to exclude from the analysis the cases with not matching
species. Besides, this warning could make the user realize that the
wrong reference was set by mistake.
The second warning informs the user that the data includes some cases
recorded with a taxon not specifying a species or genus, but at a higher
taxonomical rank (for instance undecided ovis/capra (equivalent
to tribe Caprini)). It also informs of the cases recorded by
the genus but for which the reference includes more than one species (as
happens with Equus above). In addition, the user is remembered
of the option to use the parameter joinCategories
to
indicate which reference species should be used for them.
These relationships between taxonomical categories is possible thanks to the taxonomy included in the package. The user can include their own taxonomy if desired.
In the following examples, these warnings are suppressed unless some particular message is of interest for them.
If we observe the example dataset more carefully, we can see that the measures recorded for the astragali presents a deviation from the measure definitions in Von den Driesch (1976) and S. Davis (1992). To see this, we can select the cases where the element is an astragalus. The function InCategory allows us to select them with the help of the thesaurus without requiring to know the terms actually used:
<- InCategory(dataWithLog$Os, "astragalus", zoologThesaurus$element)
AScases ::kable(head(dataWithLog[AScases, -c(6:20,32:64)])) knitr
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7503 | ALP | 4410 | 8001 | bota | talus | 58.4 | 31.5 | 31.4 | -0.0927541 | ||||||||||||||||||||||||||||||
7504 | ALP | 1181 | 7011 | bota | talus | 57.7 | 35.7 | 32.8 | -0.0383964 | ||||||||||||||||||||||||||||||
7505 | ALP | 1180 | 7011 | bota | talus | 58.8 | |||||||||||||||||||||||||||||||||
7506 | ALP | 1136 | 7039 | bota | talus | 58.9 | 39.3 | 34.4 | 0.0033279 | ||||||||||||||||||||||||||||||
7507 | ALP | 1678 | 7092 | bota | talus | 59.4 | 39.8 | 34.9 | 0.0088185 | ||||||||||||||||||||||||||||||
7508 | TFC | 617 | 1061 | bota | talus | 56.6 | 34.0 | 32.3 | -0.0595857 |
For the involved taxa, according to the measure definitions,
astragali should have no GL measurement, but GLl.
However, in the example dataset the GLl measurements have been
recorded merged in the GL column. This is a data-entry
simplification that is used by some researchers. It is possible because
GLl is only relevant for the astragalus, while GL is
not applicable to it. Thus, there cannot be any ambiguity between both
measures since they can be identified by the bone element. However,
since the zoolog reference uses the proper measure name for each bone
element (GLl for the astragalus), the reference measure has not
been correctly identified. Consequently, the log ratio logGL
has NA
values and the column logGLl does not
exists.
The same effect happens for the measure GLpe, only relevant for the phalanges.
The optional parameter mergedMeasures facilitates the processing of this type of simplified datasets. For the example data, we can use
<- list(c("GL", "GLl", "GLpe"))
GLVariants <- LogRatios(data, mergedMeasures = GLVariants)
dataWithLog ::kable(head(dataWithLog[AScases, -c(6:20,32:64)])) knitr
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7503 | ALP | 4410 | 8001 | bota | talus | 58.4 | 31.5 | 31.4 | -0.0259788 | -0.0927541 | |||||||||||||||||||||||||||||
7504 | ALP | 1181 | 7011 | bota | talus | 57.7 | 35.7 | 32.8 | -0.0312159 | -0.0383964 | |||||||||||||||||||||||||||||
7505 | ALP | 1180 | 7011 | bota | talus | 58.8 | -0.0230144 | ||||||||||||||||||||||||||||||||
7506 | ALP | 1136 | 7039 | bota | talus | 58.9 | 39.3 | 34.4 | -0.0222764 | 0.0033279 | |||||||||||||||||||||||||||||
7507 | ALP | 1678 | 7092 | bota | talus | 59.4 | 39.8 | 34.9 | -0.0186052 | 0.0088185 | |||||||||||||||||||||||||||||
7508 | TFC | 617 | 1061 | bota | talus | 56.6 | 34.0 | 32.3 | -0.0395753 | -0.0595857 |
This option allows us to automatically select, for each bone element, the corresponding measure present in the reference. Observe that now the log ratios have been computed and assigned to the column logGL.
We could be interested in obtaining the log ratios of all caprines,
including Ovis aries, Capra hircus, and undetermined
Ovis/Capra, with respect to the reference for Ovis
aries. This can be set using the argument
joinCategories
.
<- list(ovar = SubtaxonomySet("caprine"))
caprineCategory <- LogRatios(data, joinCategories = caprineCategory, mergedMeasures = GLVariants)
dataWithLog ::kable(head(dataWithLog)[, -c(6:20,32:64)]) knitr
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 4918 | 10364 | bota | 1 fal | 54.0 | 31.3 | 30.6 | 28.1 | 26.3 | 27.5 | 20.0 | |||||||||||||||||||||||||||
ALP | 4919 | 10364 | bota | 1 fal | 54.5 | 27.9 | 31.8 | 26.0 | 22.8 | 25.3 | 19.5 | |||||||||||||||||||||||||||
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | |||||||||||||||||||||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | |||||||||||||||||||||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | ||||||||||||||||||||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 |
The category to join can be manually defined, but here we have
conveniently used the function Subtaxonomy
applied to the
tribe Caprini:
SubtaxonomySet("caprine")
#> [1] "Ovis aries" "Ovis orientalis" "Capra hircus" "Capra aegagrus"
#> [5] "Ovis" "Capra" "Caprini"
Note that this option does not remove the distinction in the data between the different species, it just indicates that for these taxa the log ratios must be computed from the same reference (“ovar”).
The cases without log-ratios can be removed to facilitate subsequent analyses:
=RemoveNACases(dataWithLog)
dataWithLogPruned::kable(head(dataWithLogPruned[, -c(6:20,32:64)])) knitr
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | |||||||||||||||||||||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | |||||||||||||||||||||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | ||||||||||||||||||||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | ||||||||||||||||||||||||||||
ALP | 4085 | 10253 | cahi | hum | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | ||||||||||||||||||||||||||||
TFC | 24 | 407 | ceel | mc | 262.7 | 41.3 | 30.8 | 25.0 | 21.2 | 41.1 | 27.1 | -0.0335411 |
You may want to write the resulting file in the working directory (you need to set it first):
write.csv2(dataWithLogPruned, "myDataWithLogValues.csv",
quote=FALSE, row.names=FALSE, na="",
fileEncoding="UTF-8")
After calculating log ratios using the LogRatios
function, many rows in the resultant dataframe (dataWithLog in the
example above) may contain multiple log values, i.e. you will have
several log values associated with a particular archaeological specimen.
When analysing log ratios, it is preferential to avoid
over-representation of bones with a greater number of measurements and
account for each specimen only once. The CondenseLogs
function extracts one length, one width, and one depth value from each
row and places these in new Length, Width, and Depth columns. The
‘priority’ method described in Trentacoste,
Nieto-Espinet, and Valenzuela-Lamas (2018) has been set as
default. In this case, the default option has been used:
<- CondenseLogs(dataWithLogPruned)
dataWithSummary ::kable(head(dataWithSummary)[, -c(6:20,32:64,72:86)]) knitr
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logH | Length | Width | Depth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | -0.1078605 | -0.0891198 | -0.0726593 | ||||||||
ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | -0.0999207 | -0.1242842 | -0.0762046 | ||||||||
ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | -0.0627731 | 0.0490352 | ||||||||||
ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | -0.0873803 | 0.0340867 | ||||||||||||||
ALP | 4085 | 10253 | cahi | hum | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | -0.0701972 | 0.0603162 | ||||||||||||||
TFC | 24 | 407 | ceel | mc | 262.7 | 41.3 | 30.8 | 25.0 | 21.2 | 41.1 | 27.1 | -0.0335411 | -0.0335411 |
Nevertheless, other options (e.g. average of all width log values for a given specimen) can be chosen if preferred.
The integration of the thesaurus functionality facilitates the use of
datasets with heterogeneous nomenclatures, without further
preprocessing. An extensive catalogue of names for equivalent categories
has been integrated in the provided thesaurus set
zoologThesaurus
. These equivalences are internally and
silently managed without requiring any action from the user. However, it
can be also interesting to explicitly standardize the data to make
figures legible to a wider audience. This is especially useful when
different nomenclature for the same concept is found in the same
dataset, for instance “sheep” and “ovis” for the same taxon or “hum” and
“HU” for the bone element.
If we standardize the studied data, we can see that
zoologThesaurus
will change “ovar” to “Ovis aries”, “hum”
to “humerus”, and “Especie” to “Taxon”, for instance.
<- StandardizeDataSet(dataWithSummary)
dataStandardized ::kable(head(dataStandardized)[, -c(6:20,32:64,72:86)]) knitr
Site | N.inv | UE | Taxon | Element | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLC | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logH | Length | Width | Depth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALP | 3453 | 10410 | Ovis aries | anterior first phalanx | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | -0.1078605 | -0.0891198 | -0.0726593 | ||||||||
ALP | 3455 | 10410 | Ovis aries | anterior first phalanx | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | -0.0999207 | -0.1242842 | -0.0762046 | ||||||||
ALP | 4245 | 7036 | Capra hircus | humerus | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | -0.0627731 | 0.0490352 | ||||||||||
ALP | 4674 | 10227 | Capra hircus | humerus | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | -0.0873803 | 0.0340867 | ||||||||||||||
ALP | 4085 | 10253 | Capra hircus | humerus | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | -0.0701972 | 0.0603162 | ||||||||||||||
TFC | 24 | 407 | Cervus elaphus | metacarpus | 262.7 | 41.3 | 30.8 | 25.0 | 21.2 | 41.1 | 27.1 | -0.0335411 | -0.0335411 |
We may be interested in selecting all caprine elements. This can be
done even without standardizing the data using the function
InCategory
:
<- subset(dataWithSummary, InCategory(Especie,
dataOC SubtaxonomySet("caprine"),
$taxon))
zoologThesaurus::kable(head(dataOC)[, -c(6:20,32:64)]) knitr
Site | N.inv | UE | Especie | Os | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLc | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH | Length | Width | Depth | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | ALP | 3453 | 10410 | ovar | 1fal ant | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | -0.1078605 | -0.0891198 | -0.0726593 | |||||||||||||||||||||||
2 | ALP | 3455 | 10410 | ovar | 1fal ant | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | -0.0999207 | -0.1242842 | -0.0762046 | |||||||||||||||||||||||
3 | ALP | 4245 | 7036 | cahi | hum | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | -0.0627731 | 0.0490352 | |||||||||||||||||||||||||
4 | ALP | 4674 | 10227 | cahi | hum | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | -0.0873803 | 0.0340867 | |||||||||||||||||||||||||||||
5 | ALP | 4085 | 10253 | cahi | hum | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | -0.0701972 | 0.0603162 | |||||||||||||||||||||||||||||
7 | ALP | 3524 | 10122 | ovar | 1fal ant | 27.7 | 9.2 | 11.5 | 7.5 | 9.4 | 8.1 | -0.0983500 | -0.1117591 | -0.1018666 | -0.1148333 | -0.1348773 | -0.0983500 | -0.1348773 | -0.1018666 |
Observe that no standardization is performed in the output subset. To
standardize the subset data, StandardizeDataSet
can be
applied either before or after the subsetting.
<- StandardizeDataSet(dataOC)
dataOCStandardized ::kable(head(dataOCStandardized)[, -c(6:20,32:64)]) knitr
Site | N.inv | UE | Taxon | Element | GL | Bp | Dp | SD | DD | Bd | Dd | BT | GLC | BFd | Dl | logGL | logBp | logDp | logSD | logBd | logDd | logBT | logGLc | logBFd | logDl | logGB | logSLC | logGLP | logBG | logLG | logDPA | logBPC | logLA | logLAR | logSH | logSB | logL | logH | Length | Width | Depth | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | ALP | 3453 | 10410 | Ovis aries | anterior first phalanx | 27.1 | 9.9 | 12.3 | 17.9 | 9.0 | 9.0 | -0.1078605 | -0.0799118 | -0.0726593 | 0.2629585 | -0.0891198 | -0.1078605 | -0.0891198 | -0.0726593 | |||||||||||||||||||||||
2 | ALP | 3455 | 10410 | Ovis aries | anterior first phalanx | 27.6 | 9.6 | 12.2 | 7.6 | 8.9 | 8.3 | -0.0999207 | -0.0932757 | -0.0762046 | -0.1090810 | -0.1242842 | -0.0999207 | -0.1242842 | -0.0762046 | |||||||||||||||||||||||
3 | ALP | 4245 | 7036 | Capra hircus | humerus | 128.3 | 12.9 | 27.4 | 26.6 | 23.6 | 0.5316550 | -0.0299183 | -0.0189191 | 0.0490352 | -0.0627731 | -0.0627731 | 0.0490352 | |||||||||||||||||||||||||
4 | ALP | 4674 | 10227 | Capra hircus | humerus | 26.0 | 25.7 | 22.3 | -0.0416963 | 0.0340867 | -0.0873803 | -0.0873803 | 0.0340867 | |||||||||||||||||||||||||||||
5 | ALP | 4085 | 10253 | Capra hircus | humerus | 27.9 | 27.3 | 23.2 | -0.0110654 | 0.0603162 | -0.0701972 | -0.0701972 | 0.0603162 | |||||||||||||||||||||||||||||
7 | ALP | 3524 | 10122 | Ovis aries | anterior first phalanx | 27.7 | 9.2 | 11.5 | 7.5 | 9.4 | 8.1 | -0.0983500 | -0.1117591 | -0.1018666 | -0.1148333 | -0.1348773 | -0.0983500 | -0.1348773 | -0.1018666 |
Observe also that the distinction between Ovis aries, Capra hircus, and Ovis/Capra has not been removed from the data.
If we were interested only in one summary measure, Width or Length, we could retain the cases including this measure:
<- RemoveNACases(dataOCStandardized, measureNames = "Width")
dataOCWithWidth <- RemoveNACases(dataOCStandardized, measureNames = "Length") dataOCWithLength
which gives respectively 237 (=nrow(dataOCWithWidth)
)
and 149 (=nrow(dataOCWithLength)
) cases.
Condensed log values can be visualised as histograms and box plots using ggplot (Wickham 2011). Here we will look at some examples of plotting values from caprines.
For the example plots we will use the package ggplot2.
We can now create a boxplot for the widths:
ggplot(dataOCStandardized, aes(x = Site, y = Width)) +
geom_boxplot(outlier.shape = NA, na.rm = TRUE) +
geom_jitter(width = 0.2, height = 0, alpha = 1/2, color = 4, na.rm = TRUE) +
theme_bw() +
ggtitle("Caprine widths") +
ylab("Width log-ratio") +
coord_flip()
And another boxplot for the lengths:
ggplot(dataOCStandardized, aes(x = Site, y = Length)) +
geom_boxplot(outlier.shape = NA, na.rm = TRUE) +
geom_jitter(width = 0.2, height = 0, alpha = 1/2, color = 4, na.rm = TRUE) +
theme_bw() +
ggtitle("Caprine lengths") +
ylab("Length log-ratio") +
coord_flip()
We may choose to plot the width data as a histogram:
ggplot(dataOCStandardized, aes(Width)) +
geom_histogram(bins = 30, na.rm = TRUE) +
ggtitle("Caprine widths") +
xlab("Width log-ratio") +
facet_grid(Site ~.) +
theme_bw() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank()) +
theme(plot.title = element_text(hjust = 0.5, size = 14),
axis.title.x = element_text(size = 10),
axis.title.y = element_text(size = 10),
axis.text = element_text(size = 10) ) +
scale_y_continuous(breaks = c(0, 10, 20, 30))
Here we reorder the factor levels of
dataOCStandardized$Taxon
to make the order of the boxplots
more intuitive.
<- unique(dataOCStandardized$Taxon)
levels0
levels0#> [1] "Ovis aries" "Capra hircus" "Caprini"
$Taxon <- factor(dataOCStandardized$Taxon,
dataOCStandardizedlevels = levels0[c(1,3,2)])
levels(dataOCStandardized$Taxon)
#> [1] "Ovis aries" "Caprini" "Capra hircus"
and assign specific colours for each category:
<- c("#A2A475", "#D8B70A", "#81A88D")
Ocolour ggplot(dataOCStandardized, aes(x=Site, y=Width)) +
geom_boxplot(aes(fill=Taxon),
notch = TRUE, alpha = 0, lwd = 0.377, outlier.alpha = 0,
width = 0.5, na.rm = TRUE,
position = position_dodge(0.75),
show.legend = FALSE) +
geom_point(aes(colour = Taxon, shape = Taxon),
alpha = 0.7, size = 0.8,
position = position_jitterdodge(jitter.width = 0.3),
na.rm = TRUE) +
scale_colour_manual(values=Ocolour) +
scale_shape_manual(values=c(15, 18, 16)) +
theme_bw(base_size = 8) +
ylab("LSI value") +
ggtitle("Sheep/goat LSI width values")
<- ggplot(dataOCStandardized, aes(Width, fill = Taxon)) +
TaxonSiteWidthHist geom_histogram(bins = 30, alpha = 0.5, position = "identity") +
ggtitle("Sheep/goat Widths") + facet_grid(Site ~ Taxon)
TaxonSiteWidthHist
<- ggplot(dataOCStandardized, aes(Width, fill = Taxon)) +
TaxonSiteWidthHist geom_histogram(bins = 30, alpha = 0.5, position = "identity") +
ggtitle("Sheep/goat Widths") + facet_grid(~Site)
TaxonSiteWidthHist
We may run a statistical test (here a Student t-test) to check whether the differences in log ratio length values between the sites “OLD” and “ALP” are statistically significant:
t.test(Length ~ Site, dataOCStandardized,
subset = Site %in% c("OLD", "ALP"),
na.action = "na.omit")
#>
#> Welch Two Sample t-test
#>
#> data: Length by Site
#> t = -12.23, df = 4.2233, p-value = 0.0001865
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -0.11262797 -0.07165242
#> sample estimates:
#> mean in group ALP mean in group OLD
#> -0.04589475 0.04624545
Similarly, for the differences in log ratio width values:
t.test(Width ~ Site, dataOCStandardized,
subset = Site %in% c("OLD", "ALP"),
na.action = "na.omit")
#>
#> Welch Two Sample t-test
#>
#> data: Width by Site
#> t = -3.0613, df = 8.8203, p-value = 0.01386
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -0.09749890 -0.01448767
#> sample estimates:
#> mean in group ALP mean in group OLD
#> -0.03922705 0.01676624
For testing all possible pairs of sites, the p-values must be adjusted for multiple comparisons:
library(stats)
pairwise.t.test(dataOCStandardized$Width, dataOCStandardized$Site,
pool.sd = FALSE)
#>
#> Pairwise comparisons using t tests with non-pooled SD
#>
#> data: dataOCStandardized$Width and dataOCStandardized$Site
#>
#> ALP OLD
#> OLD 0.042 -
#> TFC 0.450 0.051
#>
#> P value adjustment method: holm
We have invested a lot of time and effort in creating zoolog. Please cite the package if you publish an analysis or results obtained using zoolog. For example, “Log ratios calculation and analysis was done using the package zoolog (Pozo et al. 2022) in R 4.0.3 (R Core Team 2020).”
To get the details of the most recent version of the package, you can
use the R citation function: citation("zoolog")
.
When publishing it is also recommended that the references for
standards are properly cited, as well as any details on selecting
feature values. The details on the source of each of the reference
standards are given in the help page referencesDatabase. For instance,
if using the zoolog reference$NietoDavisAlbarella
and the
default selection method of features, a fair description may be:
“Published references for cattle (Nieto-Espinet
2018), sheep/goat (S. J. Davis
1996) and pigs (Albarella and Payne
2005) were used as standards. One length and one width log ratio
value from each specimen were included in the analysis, with values
selected following the default zoolog ‘priority’ method (Trentacoste, Nieto-Espinet, and Valenzuela-Lamas
2018): length values - GL, GLl, GLm, HTC; width values - BT, Bd,
Bp, SD, Bfd, Bfp.”