The purpose of cepumd is to make it easier to work with Consumer Expenditure Surveys (CE) Public-Use Microdata (PUMD) when calculating mean, weighted, annual expenditures (henceforth “mean expenditures”). The challenges cepumd seeks to address primarily involve pulling together the data needed for that calculation. Some of the overarching ideas underlying the package are as follows:
Use a Tidyverse framework for most operations and be (hopefully) generally Tidyverse friendly
Balance making the end user’s experience with CE PUMD easier against staying flexible enough to let that user perform any analysis they wish with the data
Help users calculate only mean expenditures on and of the consumer unit (CU), i.e., not income, assets, liabilities, or gifts
cepumd seeks to address challenges in three categories: data gathering/organization; managing data inconsistencies; and calculating weighted, annual metrics.
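The functions that support these tasks are:
ce_hg() and ce_uccs() for wrangling hierarchical grouping (HG) data
ce_prepdata() for merging and adjusting the data
ce_mean() for a mean expenditure, or ce_quantiles() for an expenditure quantile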
Install the production version with install.packages("cepumd")
You can install the development version of cepumd from GitHub, but you’ll first need the devtools package:
if (!"devtools" %in% installed.packages()[, "Package"]) {
  install.packages("devtools", dependencies = TRUE)
}
devtools::install_github("arcenis-r/cepumd")
The workhorse of cepumd is ce_prepdata(). It merges the household characteristics file (FMLI/FMLD) with the corresponding expenditure tabulation file (MTBI/EXPD) for a specified year, adjusts weights for months-in-scope and the number of collection quarters, and adjusts some cost values by their periodicity factor (some cost categories are represented as annual figures and others as quarterly). With the recent update it requires only the first 3 arguments to function: the year, the survey type, and one or more valid UCCs. If the other necessary objects are not provided, ce_prepdata() now creates them within the function.
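The merge-and-adjust idea described above can be sketched with toy data in base R. This is only an illustration, not the code inside ce_prepdata(): the data frames, every column name (cu_id, weight, ucc, cost, periodicity_factor), and the factor values are hypothetical, and treating households with no matching expenditure as zero-cost is an assumption made for the example.

```r
# Toy stand-ins for a household characteristics file and an expenditure file.
# All column names and values are hypothetical, not actual CE PUMD variables.
households <- data.frame(
  cu_id  = c(1, 2, 3),
  weight = c(1000, 1500, 500)
)

expenditures <- data.frame(
  cu_id              = c(1, 1, 2),
  ucc                = c("A", "B", "A"),
  cost               = c(20, 5, 10),
  periodicity_factor = c(4, 1, 4)  # hypothetical: 4 = quarterly, 1 = annual
)

# Left join: keep every household, even those with no matching expenditure
merged <- merge(households, expenditures, by = "cu_id", all.x = TRUE)

# Assumption: a household with no reported expenditure counts as zero cost
merged$cost[is.na(merged$cost)] <- 0
merged$periodicity_factor[is.na(merged$periodicity_factor)] <- 1

# Put every cost on the same (annual) basis
merged$adj_cost <- merged$cost * merged$periodicity_factor
```

The left join matters in a sketch like this because dropping households without a matching expenditure would remove their weight from any later weighted calculation.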
There are two functions for wrangling hierarchical grouping data into more usable formats:
ce_hg() pulls the requested type of HG file (Interview, Diary, or Integrated) for a specified year.
ce_uccs() filters the HG file for the specified expenditure category and returns either a data frame with only that section of the HG file or the Universal Classification Codes (UCCs) that make up that expenditure category.
There are two functions that the user can use to calculate CE summary statistics:
ce_mean() calculates a mean expenditure, standard error of the mean, coefficient of variation, and an aggregate expenditure.
ce_quantiles() calculates weighted expenditure quantiles. It is important to note that calculating medians for integrated expenditures is not recommended because the calculation involves using weights from both the Diary and Interview survey instruments.
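To make the summary statistics concrete, here is a minimal base R sketch of a weighted aggregate, a weighted mean, and one common definition of a weighted quantile. It is not the implementation used by ce_mean() or ce_quantiles(): the data are made up, weighted_quantile() is a hypothetical helper, and the standard error and coefficient of variation are left out.

```r
# Hypothetical data: one annualized expenditure and one weight per consumer unit
cost   <- c(0, 40, 85, 120)
weight <- c(500, 1500, 1000, 1000)

# Aggregate expenditure: weighted sum of costs
aggregate_exp <- sum(weight * cost)

# Weighted mean: aggregate divided by the total weight
mean_exp <- aggregate_exp / sum(weight)

# Weighted quantile (hypothetical helper): the smallest cost whose
# cumulative share of the total weight reaches `prob`
weighted_quantile <- function(x, w, prob) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= prob)[1]]
}

median_exp <- weighted_quantile(cost, weight, 0.5)
```

Other weighted-quantile definitions interpolate between neighboring values, so results from a sketch like this may differ slightly from those of a given package.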