Datasets generally need a data dictionary that records each variable, its
description, and a description of any variable options. This information is
crucial, particularly when a dataset will be shared for further analyses, but
constructing it can be time-consuming. The dataMeta package provides a
collection of functions that construct a data dictionary and append it to the
original dataset as an attribute, along with other information that other
software generally provides as metadata: the date and time of the last edit,
the user name, and a main description of the dataset. Finally, the dataset is
saved as an R dataset (.rds).
Suggestions are most welcome!
Install dataMeta in R from GitHub using devtools:
install.packages("devtools")
library(devtools)
install_github("dmrodz/dataMeta")
Install dataMeta in R from CRAN:
install.packages("dataMeta")
dataMeta workflow
Building a data dictionary with this package takes three basic steps, which are outlined in the figure below. First, a “linker” data frame is created, in which the user adds a description for each variable and assigns it a variable “type”, or key. These keys are explained in the vignette (see link above). The variable type depends on whether the user wants to list every available option for the variable or whether a range of its values would suffice. Second, the main data dictionary is built from the original dataset and the linker data frame. At this step the user can add descriptions for individual variable options as needed, or simply build a dictionary of variable names, their descriptions, and their options. Finally, the user can append the dictionary to the original dataset as an R attribute, along with the date on which the dictionary was created, the author name, and the general R attributes that data frames already carry. The new dataset, with its attributes, is then saved as an R dataset (.rds).
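The three steps can be sketched in base R to show what each one produces. This is an illustration of the idea only, not the package's own code or API: the toy dataset, the column names, the attribute names, the `describe_options` helper, and the 0/1 coding of the variable type are all assumptions made for this sketch. The vignette documents the package's actual functions and keys.

```r
# Toy dataset (hypothetical, for illustration only).
my_data <- data.frame(
  age = c(34, 51, 27, 45),
  sex = c("F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Step 1: the "linker" holds one row per variable, with a description and a type key.
# Assumed coding for this sketch: 0 = summarise as a range, 1 = list every option.
linker <- data.frame(
  var_name = names(my_data),
  var_desc = c("Age in years", "Sex recorded at enrolment"),
  var_type = c(0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return either all distinct options or a "min to max" range.
describe_options <- function(x, type) {
  if (type == 1) {
    sort(unique(as.character(x)))
  } else {
    paste(range(x, na.rm = TRUE), collapse = " to ")
  }
}

# Step 2: build the dictionary from the dataset and the linker.
# A variable of type 1 contributes one row per option; type 0 contributes one row.
dict_rows <- lapply(seq_len(nrow(linker)), function(i) {
  opts <- describe_options(my_data[[linker$var_name[i]]], linker$var_type[i])
  data.frame(
    variable_name = linker$var_name[i],
    variable_description = linker$var_desc[i],
    variable_options = opts,
    stringsAsFactors = FALSE
  )
})
dictionary <- do.call(rbind, dict_rows)

# Step 3: attach the dictionary and other metadata as attributes, then save as .rds.
attr(my_data, "dictionary") <- dictionary
attr(my_data, "main") <- "Toy enrolment data"
attr(my_data, "last_edit_date") <- Sys.time()
attr(my_data, "author") <- Sys.info()[["user"]]

out_file <- file.path(tempdir(), "my_data.rds")
saveRDS(my_data, out_file)

# The attributes travel with the data when the file is read back.
restored <- readRDS(out_file)
print(attr(restored, "dictionary"))
```

Because the dictionary is stored as an attribute of the data frame, anyone who loads the .rds file receives the data and its documentation together, with no separate codebook file to keep in sync.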