A dedicated Slack channel has been created for announcements and support, and to help build a community of practice around this open source package. You may request an invitation to join from jonathan.callahan@dri.edu.
Utility functions for discovering and managing metadata associated
with spatially unique "known locations". Applications include all
fields of environmental monitoring (e.g. air and water quality) where
data are collected at stationary sites.
This package is intended for use in data management activities associated with fixed locations in space. The motivating fields include air and water quality monitoring where fixed sensors report at regular time intervals.
When working with environmental monitoring time series, one of the
first things you have to do is create unique identifiers for each
individual time series. In an ideal world, each environmental time
series would have both a locationID and a deviceID that uniquely
identify the specific instrument making measurements and the physical
location where measurements are made. A unique timeseriesID could be
produced as locationID_deviceID. Metadata associated with each
timeseriesID would contain basic information needed for downstream
analysis including at least:
timeseriesID, locationID, deviceID, longitude, latitude, ...
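As a minimal sketch of this ideal case, the base R snippet below builds a timeseriesID from a locationID and a deviceID. All identifiers and coordinates are made up for illustration and do not come from any real dataset or from this package's functions.

```r
# Hypothetical metadata for three time series (all values are illustrative)
meta <- data.frame(
  locationID = c("loc_a", "loc_a", "loc_b"),
  deviceID = c("dev_01", "dev_02", "dev_01"),
  longitude = c(-122.30, -122.30, -120.50),
  latitude = c(47.60, 47.60, 46.90),
  stringsAsFactors = FALSE
)

# Combine the two identifiers into a timeseriesID of the form locationID_deviceID
meta$timeseriesID <- paste(meta$locationID, meta$deviceID, sep = "_")

# Every time series should end up with its own identifier
stopifnot(!anyDuplicated(meta$timeseriesID))
```

Note that the same device ("dev_01") appearing at two locations still yields two distinct time series, which is the point of including both identifiers.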
Unfortunately, we are rarely supplied with a truly unique and truly
spatial locationID. Instead we often use deviceID or an associated
non-spatial identifier as a stand-in for locationID.
We have seen a number of complications in practice.
A solution to all these problems is possible if we store spatial
metadata in simple tables in a standard directory. These tables will be
referred to as collections. Location lookups can be performed
with geodesic distance calculations where a longitude-latitude pair is
assigned to a pre-existing known location if it is within
distanceThreshold meters of that location. These lookups
will be extremely fast.
If no previously known location is found, a new known location metadata record can be created, which is relatively slow (seconds), and then added to the growing collection.
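The lookup-then-add workflow described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the function names (haversine_m, find_known_location, lookup_or_add) are hypothetical, the haversine formula on a spherical Earth stands in for a true geodesic calculation, and the sequential locationID is only a placeholder for whatever identifier scheme a real collection uses.

```r
# Great-circle (haversine) distance in meters between one point and
# vectors of longitudes and latitudes, assuming a spherical Earth.
haversine_m <- function(lon, lat, lons, lats, radius = 6371000) {
  to_rad <- pi / 180
  dlon <- (lons - lon) * to_rad
  dlat <- (lats - lat) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat * to_rad) * cos(lats * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Return the row index of the nearest known location within
# distanceThreshold meters, or NA if there is none.
find_known_location <- function(collection, lon, lat, distanceThreshold) {
  if (nrow(collection) == 0) return(NA_integer_)
  d <- haversine_m(lon, lat, collection$longitude, collection$latitude)
  nearest <- which.min(d)
  if (d[nearest] <= distanceThreshold) nearest else NA_integer_
}

# Look up a location and append a new record when no match is found.
lookup_or_add <- function(collection, lon, lat, distanceThreshold) {
  index <- find_known_location(collection, lon, lat, distanceThreshold)
  if (is.na(index)) {
    # A real record would carry more spatial metadata; this one is minimal.
    new_record <- data.frame(
      locationID = sprintf("loc_%03d", nrow(collection) + 1),
      longitude = lon,
      latitude = lat,
      stringsAsFactors = FALSE
    )
    collection <- rbind(collection, new_record)
    index <- nrow(collection)
  }
  list(collection = collection, locationID = collection$locationID[index])
}

# Start with an empty collection
collection <- data.frame(
  locationID = character(0), longitude = numeric(0), latitude = numeric(0),
  stringsAsFactors = FALSE
)

first <- lookup_or_add(collection, -122.3000, 47.6000, distanceThreshold = 100)
# A point a few meters away resolves to the same known location
second <- lookup_or_add(first$collection, -122.30002, 47.60001, distanceThreshold = 100)
# A point several kilometers away becomes a new known location
third <- lookup_or_add(second$collection, -122.2000, 47.6000, distanceThreshold = 100)
```

Because the second point falls within the 100 meter threshold of the first, it reuses the existing record. Small differences in reported coordinates therefore do not create spurious new locations.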
For collections of stationary environmental monitors that only number
in the thousands, the entire collection can be stored as either a .rda
or .csv file and will be under a megabyte in size, making it fast to
load. This small size also makes it possible to save multiple
collection files, each created with different locations and/or
different distance thresholds to address the needs of different
scientific studies.
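As a rough sketch of this storage approach, the base R snippet below saves a small collection as both .csv and .rda and reads each back. The collection contents are invented, and temporary files stand in for a standard directory; none of this reflects the package's own file naming or functions.

```r
# A small, hypothetical collection of known locations
collection <- data.frame(
  locationID = c("loc_001", "loc_002"),
  longitude = c(-122.30, -120.50),
  latitude = c(47.60, 46.90),
  stringsAsFactors = FALSE
)

# Temporary paths stand in for files in a standard directory
csv_path <- tempfile(fileext = ".csv")
rda_path <- tempfile(fileext = ".rda")

# Save the same collection in both formats
write.csv(collection, csv_path, row.names = FALSE)
save(collection, file = rda_path)

# Reload the .csv version
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Reload the .rda version into its own environment to avoid
# overwriting objects in the current workspace
rda_env <- new.env()
load(rda_path, envir = rda_env)
from_rda <- rda_env$collection
```

The .rda format preserves R column types exactly, while .csv is readable by other tools. Keeping both is cheap at this file size.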
Working in this manner solves the problems mentioned above and also provides further useful functionality:
.csv or .rda versions of well populated tables can be downloaded from
a URL and used locally, giving scientists and analysts working with
known locations instant access to location-specific spatial metadata
that otherwise requires special software and skills, large datasets and
many compute cycles to generate.
Development of this R package has been supported with funding from the following institutions:
Questions regarding further development of the package should be directed to jonathan.callahan@dri.edu.