lakhesis: Consensus Seriation for Binary Data

The R package lakhesis provides a heuristic-critical platform for seriating binary data matrices through the exploration, selection, and consensus of partially seriated sequences.

In brief, seriation (sequencing, ordination) involves putting a set of things in an optimal order. In archaeology, seriation can be used to establish a chronological order of contexts and find-types on the basis of their similarity, i.e., on the assumption that things come into and go out of fashion with a peak moment of popularity. In ecology, the distribution of a species may occur according to a preferred environmental condition and diminish as that environment changes.

There are a number of R functions and packages (especially seriation and vegan) that provide means to seriate or ordinate matrices, especially for frequency or count data. While binary (presence/absence) data are often viewed as a reductive case of frequency data, they can also present their own challenges for seriation. Moreover, not all “incidence matrices” (the matrices of 0s and 1s that record the joint incidence or occurrence of each row-column pairing) will necessarily be well seriated. The selection of row and column elements in the input is accordingly an intrinsic part of the task of seriation.

In this respect, lakhesis seeks to complement existing methods in R by focusing on binary data. It uses correspondence analysis, a mainstay technique for seriation, whose results are then fit to a reference curve that represents “ideally” seriated data. Multiple seriations can be run on partial subsets of the initial incidence matrix, which are then recompiled into a single consensus seriation. Critical measures are also provided.
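To make the idea of seriating an incidence matrix by correspondence analysis concrete, the following is a minimal sketch in base R. It is not the lakhesis implementation: it only orders rows and columns by their scores on the first correspondence analysis axis, and it omits the fit to a reference curve and the consensus step. The toy matrix and all object names (ideal, im, row_score, col_score, seriated) are hypothetical.

```r
# Hypothetical toy data: an "ideally" seriated incidence matrix
# (rows = contexts, columns = find-types), with 1s along a diagonal band.
ideal <- matrix(c(1, 1, 0, 0, 0,
                  0, 1, 1, 0, 0,
                  0, 0, 1, 1, 0,
                  0, 0, 0, 1, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("ctx", 1:4),
                                paste0("type", LETTERS[1:5])))

# Shuffle rows and columns to mimic unordered input.
im <- ideal[c(3, 1, 4, 2), c(2, 5, 1, 4, 3)]

# Correspondence analysis via the SVD of the standardized residuals.
P  <- im / sum(im)          # correspondence matrix
r  <- rowSums(P)            # row masses
cm <- colSums(P)            # column masses
S  <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
dec <- svd(S)

# Standard coordinates on the first axis for rows and columns.
row_score <- dec$u[, 1] / sqrt(r)
col_score <- dec$v[, 1] / sqrt(cm)

# Reorder the incidence matrix by the first-axis scores.
seriated <- im[order(row_score), order(col_score)]
print(seriated)
```

The sign of a singular vector is arbitrary, so the recovered sequence may run in either direction. Seriation by itself establishes an order, not which end comes first.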

While command line functions can be run in R, the functionality of lakhesis is primarily achieved via the Lakhesis Calculator, a graphical platform in shiny that enables investigators to explore datasets for potential seriated sequences, select them, and then harmonize them into a single consensus seriation. The four panels in the calculator include the following:

The sidebar contains the following commands:

Installation

To obtain the current development version of lakhesis from GitHub, run the following in the R command line:

# Requires the devtools package: install.packages("devtools")
library(devtools)
install_github("scollinselliott/lakhesis")

Usage

To start the Lakhesis Calculator, execute the function LC():

library(lakhesis)
LC()

Note that when uploading a csv file for analysis inside the Lakhesis Calculator, the file should consist of just two columns without headers. If data are already in incidence matrix format, the im.long() function in lakhesis can be used to convert the incidence matrix into the necessary long format, which can then be exported using the write.table() function (see the documentation on im.long()).
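As an illustration of the two-column long format, the following sketch builds it by hand in base R and writes it without headers. It does not call im.long(), whose exact arguments and output should be checked in the package documentation. The column order (row element first, then column element), the comma separator, the toy matrix, and the file path are all assumptions.

```r
# Hypothetical toy incidence matrix (rows = contexts, columns = find-types).
im <- matrix(c(1, 1, 0,
               0, 1, 1,
               0, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("ctx1", "ctx2", "ctx3"),
                             c("typeA", "typeB", "typeC")))

# One line per recorded presence: the row element and the column element.
idx <- which(im == 1, arr.ind = TRUE)
long <- data.frame(row = rownames(im)[idx[, "row"]],
                   col = colnames(im)[idx[, "col"]])

# Write two columns with no header line and no row names.
# A temporary file is used here; substitute your own path.
path <- tempfile(fileext = ".csv")
write.table(long, file = path, sep = ",",
            row.names = FALSE, col.names = FALSE)
```

Each line of the resulting file records a single row-column pairing for which a presence (1) is recorded in the incidence matrix.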

Bibliography

Hahsler M, Hornik K, Buchta C (2008). “Getting Things in Order: An Introduction to the R Package seriation.” Journal of Statistical Software, 25, 1-34. doi:10.18637/jss.v025.i03.

Ihm P (2005). “A Contribution to the History of Seriation in Archaeology.” In Weihs C, Gaul W (eds.), Classification - The Ubiquitous Challenge, 307-16. Springer, Berlin.

Nenadic O, Greenacre MJ (2007). “Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package.” Journal of Statistical Software, 20, 1-13. doi:10.18637/jss.v020.i03.

ter Braak CJF, Looman CWN (1986). “Weighted Averaging, Logistic Regression and the Gaussian Response Model.” Vegetatio, 65, 3-11. doi:10.1007/BF00032121.