Gibbs Sampling for Archaeological Dates

library(eratosthenes)
require(Rcpp)
#> Loading required package: Rcpp
#> Warning: package 'Rcpp' was built under R version 4.4.1

A central function of the eratosthenes package is the gibbs_ad() function, which takes in information about relative sequences and absolute dating constraints, and then samples marginal probability densities for events from the full, joint conditional density, using a Gibbs sampler. This vignette provides details on the use of this function.

The function operates on a continuous timeline. Any calendrical scale is possible, but here it is conventional for CE/AD dates to be positive and BCE/BCE dates to be negative.

The Gibbs sampler is a common Markov Chain Monte Carlo (MCMC) technique, widely used in estimating posterior probabilities in Bayesian inference (a mainstay of calibrating and refining radiocarbon dates) as well as in computing marginal densities. For more information, see for example Geman and Geman (1984), Buck, Cavanagh, and Litton (1996), and Lunn et al. (2013).

Inputs

The core inputs for the gibbs_ad() function are the following:

Relative Sequences of Contexts

Relative sequences of contexts must be in the form of a list, with each object in the list being a vector whose ordering of elements is in agreement with all other elements. See the vignette Aligning Relative Sequences for more information.

The following object contexts provides an example of a valid set of relative sequences:

x <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
y <- c("B", "D", "G", "H", "K")
z <- c("F", "K", "L", "M")
contexts <- list(x, y, z)

seq_check(contexts) # check if the sequences are in agreement
#> [1] TRUE

Finds

Data on finds (i.e., any elements which pertain to a given context) are optional. Each find must be in the form of a list with the following structure:

Each find must in turn be stored in a single list object:

f1 <- list(id = "find01", assoc = "D", type = c("type1", "form1"))
f2 <- list(id = "find02", assoc = "E", type = c("type1", "form2"))
f3 <- list(id = "find03", assoc = "G", type = c("type1", "form1"))
f4 <- list(id = "find04", assoc = "H", type = c("type2", "form1"))
f5 <- list(id = "find05", assoc = "I", type = "type2")
f6 <- list(id = "find06", assoc = "H", type = NULL)
artifacts <- list(f1, f2, f3, f4, f5, f6)

Missing information on types should be supplied with a NULL value.

Finds should have no absolute dating constrains on them. If they do, they should be specified as an absolute constraint.

Absolute Constraints

Absolute constraints are predicated on whether they provide a terminus post quem (t.p.q.) for a context or a terminus ante quem (t.a.q.) for a context. The information on these absolute dates is regarded as external or extrinsic information. For example, a radiocarbon date provides for information on when the sample died, not when its context was formed; a coin type may be known to have had a range of production dates, but the production date of that particular coin may be affected by the stratigraphic context in which it is found. Such constraints may take a variety of forms.

The formatting for a t.p.q. or a t.a.q is the same, as a list in which each constraint contains:

Constraints must be contained in two separate list objects, one for t.p.q. and the other for t.a.q.:

# external
coin1 <- list(id = "coin1", assoc = "B", type = NULL, samples = runif(100,-320,-300))
coin2 <- list(id = "coin2", assoc = "G", type = NULL, samples = runif(100,37,41))
destr <- list(id = "destr", assoc = "J", type = NULL, samples = 79)

tpq_info <- list(coin1, coin2)
taq_info <- list(destr)

Additional Arguments

Additional arguments are necessary for the gibbs_ad() function:

Rules for Estimating Production Dates

Since archaeologists are typically interested in dates of production and use as much as deposition, the gibbs_ad() function will return the marginal densities for both production and deposition (from which the estimation of a use date can then be derived).

Estimating the date of the production of a find or find-type however necessitates some assumption, since in principle the absence of evidence is not viewed as evidence of absence. Without stipulating a rule, the earliest production date of any artifact could reach back endlessly into time, since an artifact does not need to have been produced after the initial occupation of a site where it has been found.

Here, two basic rules have been included for determining production dates of finds:

The "earliest" option will constrain the date of production to the earliest possible instances, while the "naive" option (the default) will select any date between an earliest threshold and the depositional date of the particular find.

If no finds are included in the gibbs_ad() arguments, then only depositional dates for contexts, not production dates, are estimated.

Functionality

The gibbs_ad() function at its core uses a Gibbs sampler, drawing from the full joint conditional density in order to sample marginal densities for dates of deposition (of contexts and finds) and production (of finds).

First, samples are drawn from any t.p.q and t.a.q.. Then, for convenience, the Gibbs sampler proceeds in order of a sequence of contexts based on the merged ranking of all contexts (via synth_rank()). The sampler will identify all contexts and constraints prior and subsequent to any one context, and then will identify the largest prior date and smallest subsequent date, in between which it will uniformly sample a date. One can adjust the number of samples drawn with the samples argument of the function:

dates <- gibbs_ad(contexts, finds = artifacts, samples = 10^4, tpq = tpq_info, taq = taq_info)

Output

The output of the gibbs_ad() function will be a list of class marginals containing the marginal densities of the depositional dates of contexts and finds, if included; production dates are given for finds types, again, if included. Marginal densities are also given for each t.p.q. and each t.a.q., which expresses the probability of their dating given the conditions of the relative sequences of contexts (not independent of them).

str(dates)
#> List of 3
#>  $ deposition:List of 9
#>   ..$ B: num [1:10000] 19.3 -148.1 -186.4 -186.8 -290.5 ...
#>   ..$ C: num [1:10000] 45.3 -147.5 -184.7 -170.4 -235.2 ...
#>   ..$ D: num [1:10000] 46.7 -83.2 -168.7 -123.9 -130.5 ...
#>   ..$ E: num [1:10000] 49.7 18.4 -40 -32.5 -98.9 ...
#>   ..$ F: num [1:10000] 52.1 42.3 43.6 32.2 -36.2 ...
#>   ..$ G: num [1:10000] 71.4 65.4 65.2 64.5 66 ...
#>   ..$ H: num [1:10000] 76 66.2 65.5 69.3 66.4 ...
#>   ..$ I: num [1:10000] 76.4 67.1 69.4 70.1 70.9 ...
#>   ..$ J: num [1:10000] 76.8 74.8 73.3 71.6 71.8 ...
#>  $ externals :List of 3
#>   ..$ coin1: num [1:10000] -320 -306 -313 -303 -319 ...
#>   ..$ coin2: num [1:10000] 37 37 38.1 37.3 37.6 ...
#>   ..$ destr: num [1:10000] 79 79 79 79 79 79 79 79 79 79 ...
#>  $ production:List of 4
#>   ..$ type1: num [1:30000] 45.4 46.2 51.9 -135.5 -50.3 ...
#>   ..$ form1: num [1:30000] 46.37 51.18 45.84 -144.97 7.93 ...
#>   ..$ form2: num [1:10000] 48.6 -79.5 -108 -51.1 -123.4 ...
#>   ..$ type2: num [1:20000] 71.4 76 66 65.5 65.3 ...
#>  - attr(*, "class")= chr [1:2] "marginals" "list"

References

Buck, C. E., W. G. Cavanagh, and C. D. Litton. 1996. Bayesian Approach to Interpreting Archaeological Data. Chichester: John Wiley & Sons.
Geman, S., and D. Geman. 1984. “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images.” IEEE Transactions on Pattern Analysis and Machine Intelligence 6: 721–41.
Lunn, D., C. Jackson, N. Best, A. Thomas, and D. Spiegelhalter. 2013. The BUGS Book: A Practical Introduction to Bayesian Analysis. Boca Raton, FL: CRC Press.