Gibbs Sampling for Archaeological Dates

A central function of the eratosthenes package is the gibbs_ad() function, which takes in information about relative sequences and absolute dating constraints, and then samples marginal probability densities for events from the full, joint conditional density, using a Gibbs sampler. This vignette provides details on the use of this function.

The function operates on a continuous timeline. Any calendrical scale is possible, but here it is conventional for CE/AD dates to be positive and BCE/BCE dates to be negative.

The Gibbs sampler is a common Markov Chain Monte Carlo (MCMC) technique, widely used in estimating posterior probabilities in Bayesian inference (a mainstay of calibrating and refining radiocarbon dates) as well as in computing marginal densities. For more information, see for example Geman and Geman (1984), Buck, Cavanagh, and Litton (1996), and Lunn et al. (2013).

Inputs

The core inputs for the gibbs_ad() function are the following:

relative sequence(s) of contexts
finds (optional), associated with a context and type/class (also optional)
absolute constraints (termini post/ante quem)

Relative Sequences of Contexts

Relative sequences of contexts must be in the form of a list, with each object in the list being a vector whose ordering of elements is in agreement with all other elements. See the vignette Aligning Relative Sequences for more information.

The following object contexts provides an example of a valid set of relative sequences:

x <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
y <- c("B", "D", "G", "H", "K")
z <- c("F", "K", "L", "M")
contexts <- list(x, y, z)

seq_check(contexts) # check if the sequences are in agreement
#> [1] TRUE

Finds

Data on finds (i.e., any elements which pertain to a given context) are optional. Each find must be in the form of a list with the following structure:

id: An id number or string of the find, such as an inventory number or bibliographic reference.
assoc: The context to which the find belongs, which must be contained in the relative sequences of contexts.
type: An optional vector or element denoting any types, subtypes, classes, etc., to which the find pertains. If not present, a NULL value must be given.

Each find must in turn be stored in a single list object:

f1 <- list(id = "find01", assoc = "D", type = c("type1", "form1"))
f2 <- list(id = "find02", assoc = "E", type = c("type1", "form2"))
f3 <- list(id = "find03", assoc = "G", type = c("type1", "form1"))
f4 <- list(id = "find04", assoc = "H", type = c("type2", "form1"))
f5 <- list(id = "find05", assoc = "I", type = "type2")
f6 <- list(id = "find06", assoc = "H", type = NULL)
artifacts <- list(f1, f2, f3, f4, f5, f6)

Missing information on types should be supplied with a NULL value.

Finds should have no absolute dating constrains on them. If they do, they should be specified as an absolute constraint.

Absolute Constraints

Absolute constraints are predicated on whether they provide a terminus post quem (t.p.q.) for a context or a terminus ante quem (t.a.q.) for a context. The information on these absolute dates is regarded as external or extrinsic information. For example, a radiocarbon date provides for information on when the sample died, not when its context was formed; a coin type may be known to have had a range of production dates, but the production date of that particular coin may be affected by the stratigraphic context in which it is found. Such constraints may take a variety of forms.

The formatting for a t.p.q. or a t.a.q is the same, as a list in which each constraint contains:

id: An id number or string of the find, such as an inventory number or bibliographic reference.
assoc: The context to which the find belongs, which must be contained in the relative sequences of contexts.
type: An optional vector or element denoting any types, subtypes, classes, etc., to which the find pertains. If not present, a NULL value must be given.
samples: A numeric vector or element containing potential dates of the t.p.q. or t.a.q., i.e., a sample of the probability density function which expresses when that constraint occurred. Common densities would include:
- A single numeric if the constraint is known precisely and certainly.
- Samples of $n$ size from a continuous uniform distribution, runif(n, a, b), if known between two bounds $a$ and $b$, without any more or less certainty about any one date.
- Samples of $n$ size from a bespoke probability density, such as a calibrated radiocarbon date.

Constraints must be contained in two separate list objects, one for t.p.q. and the other for t.a.q.:

# external
coin1 <- list(id = "coin1", assoc = "B", type = NULL, samples = runif(100,-320,-300))
coin2 <- list(id = "coin2", assoc = "G", type = NULL, samples = runif(100,37,41))
destr <- list(id = "destr", assoc = "J", type = NULL, samples = 79)

tpq_info <- list(coin1, coin2)
taq_info <- list(destr)

Additional Arguments

Additional arguments are necessary for the gibbs_ad() function:

samples: the number of Gibbs samples to take (i.e., the number of estimates of any one event). By default set at 10^5.
alpha: the constraint on the earliest possible date to sample. By default set at -5000.
omega: the constraint on the latest possible date to sample. By default set at 1950.
trim: takes a logical value as input, trim specifies whether to remove contexts from the result which lie earlier than the earliest given t.p.q. or later than the latest t.a.q., i.e., contexts whose estimation depends on alpha and omega. By default set at TRUE.
rule: the rule for how to estimate production dates for artifact types, which is described in the following section. By default set at "naive".

Rules for Estimating Production Dates

Since archaeologists are typically interested in dates of production and use as much as deposition, the gibbs_ad() function will return the marginal densities for both production and deposition (from which the estimation of a use date can then be derived).

Estimating the date of the production of a find or find-type however necessitates some assumption, since in principle the absence of evidence is not viewed as evidence of absence. Without stipulating a rule, the earliest production date of any artifact could reach back endlessly into time, since an artifact does not need to have been produced after the initial occupation of a site where it has been found.

Here, two basic rules have been included for determining production dates of finds:

"naive": The earliest potential threshold of a find-type occurs sometime before the first deposition of that type, and after the deposition of the next earliest context. A production date is then chosen uniformly at random between that threshold and the depositional date of that artifact.
"earliest": The earliest potential date of a find-type occurs sometime before the first deposition of that type, and after the deposition of the next earliest context. A production date is then chosen uniformly at random between those two dates.

The "earliest" option will constrain the date of production to the earliest possible instances, while the "naive" option (the default) will select any date between an earliest threshold and the depositional date of the particular find.

If no finds are included in the gibbs_ad() arguments, then only depositional dates for contexts, not production dates, are estimated.

Functionality

The gibbs_ad() function at its core uses a Gibbs sampler, drawing from the full joint conditional density in order to sample marginal densities for dates of deposition (of contexts and finds) and production (of finds).

First, samples are drawn from any t.p.q and t.a.q.. Then, for convenience, the Gibbs sampler proceeds in order of a sequence of contexts based on the merged ranking of all contexts (via synth_rank()). The sampler will identify all contexts and constraints prior and subsequent to any one context, and then will identify the largest prior date and smallest subsequent date, in between which it will uniformly sample a date. One can adjust the number of samples drawn with the samples argument of the function:

dates <- gibbs_ad(contexts, finds = artifacts, samples = 10^4, tpq = tpq_info, taq = taq_info)

Output

The output of the gibbs_ad() function will be a list of class marginals containing the marginal densities of the depositional dates of contexts and finds, if included; production dates are given for finds types, again, if included. Marginal densities are also given for each t.p.q. and each t.a.q., which expresses the probability of their dating given the conditions of the relative sequences of contexts (not independent of them).

$deposition contains the depositional dates of contexts included in the sequences input
$externals contains the dates of the absolute constraints taking the full joint conditional density into account
$production contains the dates of production of artifact types

str(dates)
#> List of 3
#>  $ deposition:List of 9
#>   ..$ B: num [1:10000] 19.3 -148.1 -186.4 -186.8 -290.5 ...
#>   ..$ C: num [1:10000] 45.3 -147.5 -184.7 -170.4 -235.2 ...
#>   ..$ D: num [1:10000] 46.7 -83.2 -168.7 -123.9 -130.5 ...
#>   ..$ E: num [1:10000] 49.7 18.4 -40 -32.5 -98.9 ...
#>   ..$ F: num [1:10000] 52.1 42.3 43.6 32.2 -36.2 ...
#>   ..$ G: num [1:10000] 71.4 65.4 65.2 64.5 66 ...
#>   ..$ H: num [1:10000] 76 66.2 65.5 69.3 66.4 ...
#>   ..$ I: num [1:10000] 76.4 67.1 69.4 70.1 70.9 ...
#>   ..$ J: num [1:10000] 76.8 74.8 73.3 71.6 71.8 ...
#>  $ externals :List of 3
#>   ..$ coin1: num [1:10000] -320 -306 -313 -303 -319 ...
#>   ..$ coin2: num [1:10000] 37 37 38.1 37.3 37.6 ...
#>   ..$ destr: num [1:10000] 79 79 79 79 79 79 79 79 79 79 ...
#>  $ production:List of 4
#>   ..$ type1: num [1:30000] 45.4 46.2 51.9 -135.5 -50.3 ...
#>   ..$ form1: num [1:30000] 46.37 51.18 45.84 -144.97 7.93 ...
#>   ..$ form2: num [1:10000] 48.6 -79.5 -108 -51.1 -123.4 ...
#>   ..$ type2: num [1:20000] 71.4 76 66 65.5 65.3 ...
#>  - attr(*, "class")= chr [1:2] "marginals" "list"