labelr supports creation and use of multiple types of labels for data.frames and their columns. This is an ad hoc introduction to core and ancillary labelr functionalities and use cases.
labelr supports the following kinds of labels:
Frame labels - Each data.frame may be given a single “frame label”, which can be used to describe the data set’s key general features or characteristics (e.g., source, date produced or published, high-level contents).
Name labels - Each data.frame column (variable) may be given exactly one name label, which is an extended variable name or brief description of the variable. Name labels are equivalent to what Stata and SAS call “variable labels.”
Value labels - Specific values of a data.frame column (variable) can be labeled as well. The package supports three (3) kinds of value labels.
One-to-one labels - The canonical value-labeling use case entails mapping distinct values of a variable to distinct labels in a one-to-one fashion, so that each value label uniquely identifies a substantive value. For instance, an administrative data set might assign the integers 1-7 to seven distinct racial/ethnic groups, and value labels would be critical in mapping those numbers to socially substantive racial/ethnic category concepts (e.g., Which number corresponds to the category “Asian American?”).
Many-to-one labels - In an alternative use case, value labels may serve to distill or “bucket” distinct variable values in a way that deliberately “throws away” information for purposes of simplification. For example, one may wish to give the single label “Agree” to the responses “Very Strongly Agree,” “Strongly Agree,” and “Agree.” Or one may wish to differentiate self-identified “White” respondents from “People of Color,” applying the latter value label to all categories other than “White.”
Numerical range labels - Finally, one may wish to carve a numerical variable into qualitative bins, such as dichotomizing a variable or dividing it into quantiles. Numerical range labels support one-to-many assignment of a single value label to a range of numerical values for a given variable.
More specifically, labelr functions support the following actions:
Assigning variable value labels, name labels, and a frame label to data.frames and modifying those labels thereafter.
Generating and accessing simple look-up table-style data.frames to inform or remind you about a data.frame’s frame labels, its columns’ name labels, or the value labels that correspond to its unique values.
Swapping out variable (column) names for variable name labels and back again.
Replacing variables’ actual values with their corresponding value labels.
Augmenting a data.frame by adding columns of variable value labels that can exist alongside the original columns (variables) from which they were derived.
Engaging in base::subset()-like row-filtering, using value labels to guide the filtering but returning a subsetted data.frame in terms of the original variable values.
Tabulating value frequencies that can be expressed in terms of raw values or value labels – again, without explicitly modifying or converting the raw data.frame values.
Preserving and restoring a data.frame’s labels in the event that some unsupported R operation destroys them.
Applying a single value-labeling scheme to many variables at once (for example, assigning the same set of Likert-scale labels to all variables that share a common variable name character substring).
Note: To minimize dependencies and reduce unexpected behaviors, key labelr functions will coerce augmented/non-standard data.frames (e.g., tibbles, data.tables) to labeled data.frames of class labeled.data.frame. If you work with non-standard data.frames, the suggested workflow is to affix and use labelr labels before transforming the labeled.data.frame to one of these other non-standard data.frame classes, if at all. While some augmented data.frames and their functions may “play well” with labelr-style labels and functions, this is not guaranteed. Experiment as desired and at your own discretion.
We’ll start our exploration of core labelr functions with a fake “demographic” data.frame. First, though, let’s load the package labelr.
We’ll use make_demo_data() (included with labelr) to create the fictional data set.
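The code for these two steps did not survive in this copy, so here is a minimal sketch of the setup that the rest of the walkthrough relies on. The n = 1000 argument is an assumption based on the row names shown in later output, and the seed is arbitrary, so your values may differ from those printed in this document. The df_copy backup is the pre-labeling copy that is compared against df later on.

```r
library(labelr)

# Assumption: the seed is arbitrary; it will not necessarily reproduce
# the exact values printed elsewhere in this document.
set.seed(555)

# Create the fictional data set (n = 1000 is assumed from the row names
# that appear in later console output).
df <- make_demo_data(n = 1000)

# Keep an unlabeled backup for the before/after comparison made later.
df_copy <- df

head(df, 3)
```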
add_frame_lab()
We’ll start our labeling session by providing a fittingly fictional high-level description of this fictional data set. labelr calls this a FRAME label.
df <- add_frame_lab(df, frame.lab = "Demographic and reaction time test score
records collected by Royal Statistical Agency of
Fictionaslavica. Data fictionally collected in the year
1987. As published in A. Smithee (1988). Some Fictional Data
for Your Amusement. Mad Magazine, 10(1), 1-24.")
get_frame_lab(df)
### > data.frame
### > 1 df
### > frame.lab
### > 1 Demographic and reaction time test score records collected by Royal Statistical Agency of Fictionaslavica. Data fictionally collected in the year 1987. As published in A. Smithee (1988). Some Fictional Data for Your Amusement. Mad Magazine, 10(1), 1-24.
add_name_labs()
Now, let’s add (some fairly trivial) variable NAME labels.
df <- add_name_labs(df, name.labs = c(
"age" = "Age in years",
"raceth" = "Racial/ethnic identity group category",
"gender" = "Gender identity category",
"edu" = "Highest education level attained",
"x1" = "Space Invaders reaction time test scores",
"x2" = "Galaga reaction time test scores"
))
Even if we do nothing else with these name labels, we can access or manipulate a simple lookup table as needed.
get_name_labs(df)
### > var lab
### > 1 id id
### > 2 age Age in years
### > 3 gender Gender identity category
### > 4 raceth Racial/ethnic identity group category
### > 5 edu Highest education level attained
### > 6 x1 Space Invaders reaction time test scores
### > 7 x2 Galaga reaction time test scores
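Because the lookup table is an ordinary data.frame with the columns var and lab (as in the output above), it can be indexed like any other. A small sketch, using mtcars and two illustrative labels rather than the demo data:

```r
library(labelr)

mt <- add_name_labs(mtcars, name.labs = c(
  "mpg" = "Miles/(US) gallon",
  "cyl" = "Number of cylinders"
))

name_tbl <- get_name_labs(mt)

# Look up the name label for a single variable.
mpg_lab <- as.character(name_tbl$lab[name_tbl$var == "mpg"])
mpg_lab
```

The same pattern works for pulling several labels at once, for example to build plot titles or table headers.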
add_val_labs()
Now, let’s do some VALUE labeling. First, let’s use add_val_labs() to add one-to-one value labels for the variable “raceth”.
df <- add_val_labs(df, # data.frame with to-be-value-labeled column
vars = "raceth", # quoted variable name of to-be-labeled col
vals = c(1:7), # to-be-labeled values 1 through 7, inclusive
labs = c(
"White", "Black", "Hispanic", # ordered labels for vals 1-7
"Asian", "AIAN", "Multi", "Other"
),
max.unique.vals = 10 # max number of unique values permitted
)
add_val1()
Now let’s add value labels for the variable “gender.” Function add_val1 is a variant of add_val_labs that allows you to supply the variable name unquoted, provided you are value-labeling only one variable. (It’s not evident from the above, but add_val_labs supports labeling multiple variables at once.)
df <- add_val1(
data = df,
var = gender, # contrast this var argument to the vars argument demo'd above
vals = c(0, 1, 2, 3, 4), # the values to be labeled
labs = c("M", "F", "TR", "NB", "Diff-Term"), # the labels, applied in order, to the vals
max.unique.vals = 10
)
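As a brief aside on the multi-variable case mentioned above, the following sketch passes two column names to the vars argument of add_val_labs() so that both receive the same labeling scheme. The survey data.frame and its q1 and q2 columns are hypothetical and exist only for this illustration.

```r
library(labelr)

# Hypothetical survey items that share one three-point response scale.
survey <- data.frame(
  q1 = c(1L, 2L, 3L, 2L, 1L, 3L),
  q2 = c(3L, 3L, 1L, 2L, 2L, 1L)
)

survey <- add_val_labs(survey,
  vars = c("q1", "q2"), # both columns receive the same scheme
  vals = 1:3,
  labs = c("Disagree", "Neutral", "Agree")
)

survey_labs <- get_val_labs(survey)
survey_labs
```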
Once again, we can create a lookup table, this time for our labels-to-values mappings. Because we used add_val_labs() and add_val1(), each unique value of our value-labeled variables will (must) have one unique label (one-to-one mapping), and any unique values that were not explicitly assigned a label were given one automatically (the value itself, coerced to character as needed).
get_val_labs(df)
### > var vals labs
### > 1 gender 0 M
### > 2 gender 1 F
### > 3 gender 2 TR
### > 4 gender 3 NB
### > 5 gender 4 Diff-Term
### > 6 gender NA NA
### > 7 raceth 1 White
### > 8 raceth 2 Black
### > 9 raceth 3 Hispanic
### > 10 raceth 4 Asian
### > 11 raceth 5 AIAN
### > 12 raceth 6 Multi
### > 13 raceth 7 Other
### > 14 raceth NA NA
add_quant_labs()
Traditionally, value labels are intended for categorical variables, such as binary, nominal, or (integer) ordinal variables with limited numbers of distinct categories. Further, as just noted, value labels that are added using add_val_labs (or add_val1) are constrained to map one-to-one to distinct values: No two distinct values can share a value label or vice versa.
If you wish to relax these constraints and apply a label to a range of values of a numeric variable, such as labeling each value according to the quintile or decile to which it belongs, you can use add_quant_labs() (or add_quant1) to do so.
Here, we will use add_quant_labs with the partial argument set to TRUE to apply quintile range labels to all variables of df that have an “x” in their names (i.e., vars “x1” and “x2”). We demonstrate this capability further at the end of the separate “Special Topics” vignette.
df_temp <- add_quant_labs(
data = df,
vars = "x",
qtiles = 5,
partial = TRUE
)
get_val_labs(df_temp)
### > var vals labs
### > 1 gender 0 M
### > 2 gender 1 F
### > 3 gender 2 TR
### > 4 gender 3 NB
### > 5 gender 4 Diff-Term
### > 6 gender NA NA
### > 7 raceth 1 White
### > 8 raceth 2 Black
### > 9 raceth 3 Hispanic
### > 10 raceth 4 Asian
### > 11 raceth 5 AIAN
### > 12 raceth 6 Multi
### > 13 raceth 7 Other
### > 14 raceth NA NA
### > 15 x1 82.976 q020
### > 16 x1 95.238 q040
### > 17 x1 106.142 q060
### > 18 x1 117.524 q080
### > 19 x1 157.98 q100
### > 20 x1 NA NA
### > 21 x2 0.22404 q020
### > 22 x2 0.41608 q040
### > 23 x2 0.62034 q060
### > 24 x2 0.80538 q080
### > 25 x2 0.9992 q100
### > 26 x2 NA NA
For these variables, get_val_labs() shows the quantity values that define the requested quantile thresholds (in this case, quintiles), with all values at or below the given threshold (and above the previous threshold) receiving the corresponding label.
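To make the threshold logic concrete, here is a base R sketch of the same idea using quantile() and cut(). It is an analogy, not labelr’s internal code: the simulated vector x and the way the breaks are built are assumptions for illustration, and labelr may compute its quantiles differently.

```r
set.seed(1)
x <- rnorm(100, mean = 100, sd = 20) # hypothetical scores

# Upper thresholds for the five quintile bins.
thresholds <- quantile(x, probs = c(0.2, 0.4, 0.6, 0.8, 1))

# right = TRUE gives intervals of the form (previous, threshold], i.e.,
# "at or below this threshold and above the previous one".
x_labs <- cut(x,
  breaks = c(-Inf, thresholds),
  labels = c("q020", "q040", "q060", "q080", "q100"),
  right = TRUE
)

table(x_labs) # each quintile bin holds one fifth of the values
```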
Be careful with setting partial to TRUE like this: If your data set featured a column called “sex” or that featured the string “tax” or the suffix “max” in its name, add_quant_labs() would attempt to apply the value labeling scheme to that column as well!
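One way to preview what a partial match might catch is to test the substring against your column names before labeling. The sketch below uses base R’s grepl() as a stand-in for labelr’s matching, which is an assumption; the column names are hypothetical.

```r
# Hypothetical column names, including some accidental matches.
col_names <- c("id", "age", "x1", "x2", "sex", "tax_rate", "max_score")

# Columns whose names contain the substring "x".
matched <- col_names[grepl("x", col_names, fixed = TRUE)]
matched
```

Here, "sex", "tax_rate", and "max_score" are caught alongside the intended "x1" and "x2", which is exactly the situation to watch for.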
(One more side note: If you wish to apply quantile-based value labels to all numeric variables at once, you may wish to explore all_quant_labs().)
Moving on. We can use the same function to assign arbitrary, user-specified range labels. Here, we assign numerical range labels based on an arbitrary cutpoint that differentiates values of “x1” and “x2” that are at or below 100 from values that are at or below 150 (but greater than 100).
df_temp <- add_quant_labs(
data = df_temp,
vars = "x",
vals = c(100, 150),
partial = TRUE
)
### > Warning in add_quant_labs(data = df_temp, vars = "x", vals = c(100, 150), :
### >
### > Some of the supplied vals argument values are outside
### > the observed range of var --x2-- values
get_val_labs(df_temp)
### > var vals labs
### > 1 gender 0 M
### > 2 gender 1 F
### > 3 gender 2 TR
### > 4 gender 3 NB
### > 5 gender 4 Diff-Term
### > 6 gender NA NA
### > 7 raceth 1 White
### > 8 raceth 2 Black
### > 9 raceth 3 Hispanic
### > 10 raceth 4 Asian
### > 11 raceth 5 AIAN
### > 12 raceth 6 Multi
### > 13 raceth 7 Other
### > 14 raceth NA NA
### > 15 x1 100 <=100
### > 16 x1 150 <=150
### > 17 x1 NA NA
### > 18 x2 100 <=100
### > 19 x2 150 <=150
### > 20 x2 NA NA
Having demonstrated the basic functionality on our df_temp copy of df, let’s ignore that data.frame and return our focus to df. We’ll use add_quant1 to apply quintile range labeling to the variable “x1” only. Note that add_quant1 is like add_quant_labs, but accepts only a single variable, whose name can be supplied without quotes. The opposite trade-off holds for add_quant_labs: The relationship between these two functions mirrors the relationship between add_val_labs and add_val1.
df <- add_quant1(df, # data.frame
x1, # variable to value-label
qtiles = 5 # number of quantile bins (5 = quintiles) used to define the range labels
)
We’ll preserve the “x1” range labels going forward, keeping “x2” unlabeled.
add_m1_lab()
If you wish to apply a single label to multiple distinct values that are not necessarily part of a numerical range, this can be done through successive calls to add_m1_lab(). Here, the “m1” is shorthand for “many to one,” as in “many values get the same one value label.”
Note that each call to add_m1_lab() applies a single value label, so multiple calls are needed to apply multiple labels. Here, we illustrate this workflow, applying the label “Some College+” to values 3, 4, or 5 of the variable “edu”, then applying other distinct labels to values 1 and 2, respectively.
df <- add_m1_lab(df, "edu", vals = c(3:5), lab = "Some College+")
df <- add_m1_lab(df, "edu", vals = 1, lab = "Not HS Grad")
df <- add_m1_lab(df, "edu", vals = 2, lab = "HSG, No College")
get_val_labs(df)
### > var vals labs
### > 1 gender 0 M
### > 2 gender 1 F
### > 3 gender 2 TR
### > 4 gender 3 NB
### > 5 gender 4 Diff-Term
### > 6 gender NA NA
### > 7 raceth 1 White
### > 8 raceth 2 Black
### > 9 raceth 3 Hispanic
### > 10 raceth 4 Asian
### > 11 raceth 5 AIAN
### > 12 raceth 6 Multi
### > 13 raceth 7 Other
### > 14 raceth NA NA
### > 15 edu 1 Not HS Grad
### > 16 edu 2 HSG, No College
### > 17 edu 3 Some College+
### > 18 edu 4 Some College+
### > 19 edu 5 Some College+
### > 20 edu NA NA
### > 21 x1 82.976 q020
### > 22 x1 95.238 q040
### > 23 x1 106.142 q060
### > 24 x1 117.524 q080
### > 25 x1 157.98 q100
### > 26 x1 NA NA
As with the other value-adding functions, there is a variant of add_m1_lab that allows you to value-label a single variable whose name is unquoted. It is add1m1().
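A short sketch of add1m1() follows, using mtcars and its carb column rather than the demo data. The argument names (var, vals, lab) are assumed to mirror those of add_val1() and add_m1_lab() shown above; check the function’s help page if the call does not behave as expected.

```r
library(labelr)

# add1m1(): same idea as add_m1_lab(), but for one unquoted variable.
mt <- add1m1(mtcars, var = carb, vals = 1:3, lab = "<=3")
mt <- add1m1(mt, var = carb, vals = c(4, 6, 8), lab = ">=4")

carb_labs <- get_val_labs(mt)
carb_labs[carb_labs$var == "carb", ]
```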
All of this is nice, but have we really accomplished anything? A casual view of the data.frame raises some doubts:
head(df_copy, 3) # our pre-labeling copy of the data.frame
### > id age gender raceth edu x1 x2
### > T-1 1 59 1 4 5 120.25 0.5928
### > N-2 2 56 1 1 2 67.12 0.9116
### > D-3 3 54 1 6 3 79.28 0.6993
head(df, 3) # our latest, post-labeling version of same data.frame
### > id age gender raceth edu x1 x2
### > T-1 1 59 1 4 5 120.25 0.5928
### > N-2 2 56 1 1 2 67.12 0.9116
### > D-3 3 54 1 6 3 79.28 0.6993
These two data.frames still look identical.
Rest assured, labeling has introduced some unobtrusive but important features for us to use.
Now that our data.frame has labels, let’s demonstrate some ways that we can use them.
Base R includes the head() and tail() functions, which allow you to show the first n or last n rows of a data.frame. In addition, the “car” package offers a similar function called some(), which allows you to show a random n rows of a data.frame.
labelr provides versions of these functions that will display value labels in place of values, without actually altering the values in the underlying data.frame. Let’s demonstrate each of the three standard functions, followed by its labelr counterpart. Note that the unconventional rownames (e.g., “T-1,” “N-2”) are provided as an aid to help you visually locate a literal row that may appear across calls.
head(df, 5) # Base R function utils::head()
### > id age gender raceth edu x1 x2
### > T-1 1 59 1 4 5 120.25 0.5928
### > N-2 2 56 1 1 2 67.12 0.9116
### > D-3 3 54 1 6 3 79.28 0.6993
### > Q-4 4 46 1 5 4 99.59 0.2243
### > E-5 5 18 1 6 4 90.49 0.0099
headl(df, 5) # labelr function headl() (note the "l")
### > id age gender raceth edu x1 x2
### > T-1 1 59 F Asian Some College+ q100 0.5928
### > N-2 2 56 F White HSG, No College q020 0.9116
### > D-3 3 54 F Multi Some College+ q020 0.6993
### > Q-4 4 46 F AIAN Some College+ q060 0.2243
### > E-5 5 18 F Multi Some College+ q040 0.0099
tail(df, 5) # Base R function utils::tail()
### > id age gender raceth edu x1 x2
### > Z-996 996 63 0 1 4 92.36 0.0447
### > S-997 997 18 0 4 4 147.40 0.2252
### > K-998 998 45 0 5 2 106.87 0.1610
### > I-999 999 46 1 4 2 119.13 0.7666
### > H-1000 1000 68 0 6 5 70.38 0.5123
taill(df, 5) # labelr function taill() (note the extra "l")
### > id age gender raceth edu x1 x2
### > Z-996 996 63 M White Some College+ q040 0.0447
### > S-997 997 18 M Asian Some College+ q100 0.2252
### > K-998 998 45 M AIAN HSG, No College q080 0.1610
### > I-999 999 46 F Asian HSG, No College q100 0.7666
### > H-1000 1000 68 M Multi Some College+ q020 0.5123
set.seed(293)
car::some(df, 5) # car package function car::some()
### > id age gender raceth edu x1 x2
### > F-181 181 44 1 5 2 87.46 0.0965
### > K-248 248 30 1 2 3 129.62 0.4484
### > N-341 341 19 1 5 2 45.21 0.6074
### > F-457 457 58 1 5 4 124.84 0.9890
### > P-458 458 30 1 7 3 96.22 0.5607
set.seed(293)
somel(df, 5) # labelr function somel() (note the "l")
### > id age gender raceth edu x1 x2
### > F-181 181 44 F AIAN HSG, No College q040 0.0965
### > N-341 341 19 F AIAN HSG, No College q020 0.6074
### > P-458 458 30 F Other Some College+ q060 0.5607
### > F-457 457 58 F AIAN Some College+ q100 0.9890
### > K-248 248 30 F Black Some College+ q100 0.4484
Note that some() and somel() both return random rows, but they will not necessarily return the same random rows, even with the same random number seed.
use_val_labs() and uvl()
We can generalize this overlaying (aka “turning on” aka “swapping in”) of value labels to the entire data.frame. For example, we might do this temporarily, to visualize the labels in place of values.
use_val_labs(df)[1:20, ] # headl() is just a more compact shortcut for this
### > id age gender raceth edu x1 x2
### > T-1 1 59 F Asian Some College+ q100 0.5928
### > N-2 2 56 F White HSG, No College q020 0.9116
### > D-3 3 54 F Multi Some College+ q020 0.6993
### > Q-4 4 46 F AIAN Some College+ q060 0.2243
### > E-5 5 18 F Multi Some College+ q040 0.0099
### > K-6 6 45 M Black Some College+ q020 0.9250
### > Y-7 7 57 M White HSG, No College q060 0.9446
### > C-8 8 46 M Hispanic HSG, No College q080 0.4053
### > W-9 9 37 F Black Some College+ q020 0.3998
### > A-10 10 12 F Other HSG, No College q060 0.5857
### > A-11 11 46 M Other Some College+ q020 0.7027
### > S-12 12 28 M Hispanic Some College+ q020 0.6538
### > Z-13 13 15 F AIAN Some College+ q080 0.6267
### > H-14 14 39 F AIAN Some College+ q020 0.8989
### > A-15 15 18 F White Some College+ q100 0.2974
### > B-16 16 48 M Multi Some College+ q080 0.2212
### > H-17 17 39 M AIAN Some College+ q060 0.3127
### > F-18 18 52 M Hispanic Some College+ q060 0.4350
### > F-19 19 33 M Other Some College+ q100 0.2809
### > A-20 20 29 M White Some College+ q060 0.8188
Or we can wrap a call to this function around our data.frame and pass the result to other functions. Here is an illustration that passes a use_val_labs()-wrapped data.frame to the qsu() function of the collapse package. To save typing, we’ll use uvl(), a more compact alias for use_val_labs().
First we show the unwrapped call to collapse::qsu(), followed by an otherwise identical call that wraps the data.frame in uvl(). Focus your eyes on the leftmost column of the console outputs of the respective calls (i.e., the rownames of the object generated by collapse::qsu()).
# `collapse::qsu()`
# with labels "off" (i.e., using regular values of "raceth" as by var)
(by_demog_val <- collapse::qsu(df, cols = c("x2"), by = ~raceth))
### > N Mean SD Min Max
### > 1 156 0.5067 0.2696 0.0018 0.9966
### > 2 147 0.4922 0.2755 0.0041 0.9951
### > 3 144 0.4951 0.299 0.0172 0.9992
### > 4 127 0.5461 0.2873 0.006 0.9885
### > 5 155 0.5476 0.2995 0.0076 0.994
### > 6 140 0.5163 0.2798 0.0099 0.9915
### > 7 131 0.5132 0.2786 0.0014 0.9918
# with labels "on" (i.e., using labels, thanks to `uvl()`)
(by_demog_lab <- collapse::qsu(uvl(df), cols = c("x2"), by = ~raceth))
### > N Mean SD Min Max
### > AIAN 155 0.5476 0.2995 0.0076 0.994
### > Asian 127 0.5461 0.2873 0.006 0.9885
### > Black 147 0.4922 0.2755 0.0041 0.9951
### > Hispanic 144 0.4951 0.299 0.0172 0.9992
### > Multi 140 0.5163 0.2798 0.0099 0.9915
### > Other 131 0.5132 0.2786 0.0014 0.9918
### > White 156 0.5067 0.2696 0.0018 0.9966
This second call would achieve the same result if we used use_val_labs(), but uvl() is more compact for typing and printing purposes.
with_val_labs() and wvl()
labelr also offers an option to overlay (“swap out”) value labels using base::with()-like non-standard evaluation. This is helpful in a few specific cases.
with(df, table(gender, raceth)) # base::with()
### > raceth
### > gender 1 2 3 4 5 6 7
### > 0 82 65 62 61 61 67 60
### > 1 66 71 74 52 78 65 62
### > 2 3 3 5 8 7 5 3
### > 3 3 6 3 4 7 3 5
### > 4 2 2 0 2 2 0 1
with_val_labs(df, table(gender, raceth)) # labelr::with_val_labs()
### > raceth
### > gender AIAN Asian Black Hispanic Multi Other White
### > Diff-Term 2 2 2 0 0 1 2
### > F 78 52 71 74 65 62 66
### > M 61 61 65 62 67 60 82
### > NB 7 4 6 3 3 5 3
### > TR 7 8 3 5 5 3 3
wvl(df, table(gender, raceth)) # labelr::wvl is a more compact alias
### > raceth
### > gender AIAN Asian Black Hispanic Multi Other White
### > Diff-Term 2 2 2 0 0 1 2
### > F 78 52 71 74 65 62 66
### > M 61 61 65 62 67 60 82
### > NB 7 4 6 3 3 5 3
### > TR 7 8 3 5 5 3 3
In a little bit, we’ll see that we have some parallel options for overlaying (“turning on”) NAME labels.
add_lab_cols()
If all this wrapping and interactive toggling back and forth is making you dizzy, we could do something more permanent.
For example, we can assign the result of a use_val_labs() call to an object. The result will be a data.frame with the same names and dimensions as the one supplied, with value labels replacing values for all value-labeled variables (or for a subset of those variables, if you specify them). Those variables will be coerced to character (if they were not already). Since there is no simple “undo” facility for this action, it is safest to assign the result to a new object.
df_labd <- use_val_labs(df)
head(df_labd) # note, this is utils::head(), not labelr::headl()
### > id age gender raceth edu x1 x2
### > T-1 1 59 F Asian Some College+ q100 0.5928
### > N-2 2 56 F White HSG, No College q020 0.9116
### > D-3 3 54 F Multi Some College+ q020 0.6993
### > Q-4 4 46 F AIAN Some College+ q060 0.2243
### > E-5 5 18 F Multi Some College+ q040 0.0099
### > K-6 6 45 M Black Some College+ q020 0.9250
Perhaps better still, we do not need to choose between values and labels. We can use add_lab_cols() to preserve all existing variables (columns), including the value-labeled ones, while adding to our data.frame an additional labels-as-values column for each value-labeled column.
Easier done than said. Take a look:
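The original demonstration is not preserved in this copy, so the following is a self-contained sketch rather than the author’s exact code. It rebuilds a small labeled data.frame (the demo object and the seed are assumptions) and identifies the added columns by comparing names before and after, so it does not depend on any particular naming convention for the new columns.

```r
library(labelr)

set.seed(555) # arbitrary seed (an assumption)
demo <- make_demo_data(n = 1000)
demo <- add_val_labs(demo,
  vars = "raceth", vals = 1:7,
  labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other")
)

# Keep every original column and add a labels-as-values column for each
# value-labeled column.
demo_plus <- add_lab_cols(demo)

# Identify whichever columns were added.
new_cols <- setdiff(names(demo_plus), names(demo))
new_cols

head(demo_plus[c("raceth", new_cols)])
```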
flab()
We also can filter a value-labeled data.frame using value labels, returning a subsetted data.frame in terms of the original values. In other words, we can use the more semantically meaningful value labels to guide our subsetting, even as they remain “invisible” and “in the background” of the returned, filtered data.frame. Again, I find this “easier done than said.”
head(df)
### > id age gender raceth edu x1 x2
### > T-1 1 59 1 4 5 120.25 0.5928
### > N-2 2 56 1 1 2 67.12 0.9116
### > D-3 3 54 1 6 3 79.28 0.6993
### > Q-4 4 46 1 5 4 99.59 0.2243
### > E-5 5 18 1 6 4 90.49 0.0099
### > K-6 6 45 0 2 4 78.55 0.9250
df1 <- flab(df, raceth == "Asian" & gender == "F")
head(df1, 5) # returned df1 is in terms of values, just like df
### > id age gender raceth edu x1 x2
### > T-1 1 59 1 4 5 120.25 0.5928
### > D-40 40 60 1 4 4 78.12 0.9885
### > E-67 67 39 1 4 5 98.21 0.6244
### > I-73 73 36 1 4 2 98.42 0.2102
### > V-80 80 27 1 4 4 122.62 0.3137
headl(df1, 5) # note use of labelr::headl; labels are there
### > id age gender raceth edu x1 x2
### > T-1 1 59 F Asian Some College+ q100 0.5928
### > D-40 40 60 F Asian Some College+ q020 0.9885
### > E-67 67 39 F Asian Some College+ q060 0.6244
### > I-73 73 36 F Asian HSG, No College q060 0.2102
### > V-80 80 27 F Asian Some College+ q100 0.3137
We’ve used these two variables’ value labels to guide our filtering, without ever explicitly changing the contents of our columns from values to labels. For instance, note that we did NOT make an explicit call to use_val_labs() or add_lab_cols() before our call to flab(). So long as we are providing actually existing value labels that have been previously applied to the columns in question, flab() knows where to find them and how to use them.
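As a sanity check on this behavior, the sketch below filters the same rows two ways: once by label with flab() and once by the underlying value with ordinary base R indexing. The demo object and the seed are assumptions made so that the example is self-contained.

```r
library(labelr)

set.seed(555) # arbitrary seed (an assumption)
demo <- make_demo_data(n = 1000)
demo <- add_val_labs(demo,
  vars = "raceth", vals = 1:7,
  labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other")
)

# Filter by label...
asian_by_lab <- flab(demo, raceth == "Asian")

# ...and by the underlying value that the label stands for.
asian_by_val <- demo[which(demo$raceth == 4), ]

nrow(asian_by_lab) == nrow(asian_by_val)
unique(asian_by_lab$raceth) # still the original value, not the label
```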
slab()
As with base::subset(), we can also limit which columns we return. In this case, we filter on two value-labeled columns and return a data.frame consisting of only those columns.
df2 <- slab(df, raceth == "Black" & gender == "M", gender, raceth)
head(df2, 10)
### > gender raceth
### > K-6 0 2
### > F-22 0 2
### > E-30 0 2
### > O-46 0 2
### > Q-48 0 2
### > F-72 0 2
### > T-117 0 2
### > K-149 0 2
### > M-161 0 2
### > A-167 0 2
In the case of slab(), we simply list the desired columns – unquoted and comma-separated – after the filter.
Just as we used use_val_labs() to swap out values for value labels, we can use use_name_labs() to swap out variable names for variable NAME labels. Let’s illustrate this with the mtcars data.frame.
First we’ll construct a vector of named labels.
names_labs_vec <- c(
"mpg" = "Miles/(US) gallon",
"cyl" = "Number of cylinders",
"disp" = "Displacement (cu.in.)",
"hp" = "Gross horsepower",
"drat" = "Rear axle ratio",
"wt" = "Weight (1000 lbs)",
"qsec" = "1/4 mile time",
"vs" = "Engine (0 = V-shaped, 1 = straight)",
"am" = "Transmission (0 = automatic, 1 = manual)",
"gear" = "Number of forward gears",
"carb" = "Number of carburetors"
)
Now, we will apply them to mtcars and assign the resulting data.frame to a new data.frame called mt2.
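The call for this step is not preserved in this copy. One way to do it, sketched here under that caveat, is to pass the named vector directly to the name.labs argument of add_name_labs(); the author’s original call may have used different arguments. The vector is repeated so that the block runs on its own.

```r
library(labelr)

# Same named vector as defined above.
names_labs_vec <- c(
  "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders",
  "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower",
  "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)",
  "qsec" = "1/4 mile time",
  "vs" = "Engine (0 = V-shaped, 1 = straight)",
  "am" = "Transmission (0 = automatic, 1 = manual)",
  "gear" = "Number of forward gears", "carb" = "Number of carburetors"
)

# Apply the labels; column names themselves are unchanged at this point.
mt2 <- add_name_labs(mtcars, name.labs = names_labs_vec)

names(mt2)[1:3]
```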
Here is an alternative add_name_labs() syntax that would get us to the same end state:
mt2 <- add_name_labs(mtcars,
name.labs = c(
"mpg" = "Miles/(US) gallon",
"cyl" = "Number of cylinders",
"disp" = "Displacement (cu.in.)",
"hp" = "Gross horsepower",
"drat" = "Rear axle ratio",
"wt" = "Weight (1000 lbs)",
"qsec" = "1/4 mile time",
"vs" = "Engine (0 = V-shaped, 1 = straight)",
"am" = "Transmission (0 = automatic, 1 = manual)",
"gear" = "Number of forward gears",
"carb" = "Number of carburetors"
)
)
Now, let’s swap out names for NAME labels.
mt2 <- use_name_labs(mt2)
head(mt2[c(1, 2)])
### > Miles/(US) gallon Number of cylinders
### > Mazda RX4 21.0 6
### > Mazda RX4 Wag 21.0 6
### > Datsun 710 22.8 4
### > Hornet 4 Drive 21.4 6
### > Hornet Sportabout 18.7 8
### > Valiant 18.1 6
Yikes, the longer column names stretch things out quite a bit. Even so, if we wish to keep our name labels “on” and work with them as our new column names, one approach is to use get_name_labs to get a look-up table, then use copy-and-paste or RStudio auto-complete capabilities to “hand jam” these into subsequent calls.
For example:
lm(`Miles/(US) gallon` ~ `Number of cylinders`, data = mt2) # pasting in var names
### >
### > Call:
### > lm(formula = `Miles/(US) gallon` ~ `Number of cylinders`, data = mt2)
### >
### > Coefficients:
### > (Intercept) `Number of cylinders`
### > 37.885 -2.876
lm(mpg ~ cyl, data = use_var_names(mt2)) # same result if name labels are "off"
### >
### > Call:
### > lm(formula = mpg ~ cyl, data = use_var_names(mt2))
### >
### > Coefficients:
### > (Intercept) cyl
### > 37.885 -2.876
While this works, freehand typing or copy-and-paste is clunky and quickly becomes tedious. There are other less painful ways we can use these NAME labels, once we’ve swapped them in for our original column names using use_name_labs() (as in the above example). For instance, we can take advantage of commands that work over all columns of a data.frame and, hence, don’t require us to type individual column names. Here are a few illustrative examples.
sapply(mt2, median) # get the median for every name-labeled variable
### > Miles/(US) gallon
### > 19.200
### > Number of cylinders
### > 6.000
### > Displacement (cu.in.)
### > 196.300
### > Gross horsepower
### > 123.000
### > Rear axle ratio
### > 3.695
### > Weight (1000 lbs)
### > 3.325
### > 1/4 mile time
### > 17.710
### > Engine (0 = V-shaped, 1 = straight)
### > 0.000
### > Transmission (0 = automatic, 1 = manual)
### > 0.000
### > Number of forward gears
### > 4.000
### > Number of carburetors
### > 2.000
collapse::qsu(mt2) # use an external package for more informative descriptives
### > N Mean SD Min Max
### > Miles/(US) gallon 32 20.0906 6.0269 10.4 33.9
### > Number of cylinders 32 6.1875 1.7859 4 8
### > Displacement (cu.in.) 32 230.7219 123.9387 71.1 472
### > Gross horsepower 32 146.6875 68.5629 52 335
### > Rear axle ratio 32 3.5966 0.5347 2.76 4.93
### > Weight (1000 lbs) 32 3.2173 0.9785 1.513 5.424
### > 1/4 mile time 32 17.8487 1.7869 14.5 22.9
### > Engine (0 = V-shaped, 1 = straight) 32 0.4375 0.504 0 1
### > Transmission (0 = automatic, 1 = manual) 32 0.4063 0.499 0 1
### > Number of forward gears 32 3.6875 0.7378 3 5
### > Number of carburetors 32 2.8125 1.6152 1 8
Another approach is to use with_name_labs() (or its more compact alias wnl()), which will automatically display name labels in place of column names in fairly flexible ways.
with_name_labs() is an alternative to use_name_labs() that you can call on the regular, name-labeled data.frame. You should not call it on a data.frame after swapping in name labels with use_name_labs().
With that said, let’s revert back to our original column names, then we’ll verify that the name labels are still there in the background, then we’ll take with_name_labs() for a spin.
# invert our prior use_name_labs() call
mt2 <- use_var_names(mt2) # revert from name labels back to original colnames
head(mt2[c(1, 2)])
### > mpg cyl
### > Mazda RX4 21.0 6
### > Mazda RX4 Wag 21.0 6
### > Datsun 710 22.8 4
### > Hornet 4 Drive 21.4 6
### > Hornet Sportabout 18.7 8
### > Valiant 18.1 6
# first, show that mt2 now has original column names swapped back in
head(mt2)
### > mpg cyl disp hp drat wt qsec vs am gear carb
### > Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
### > Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
### > Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
### > Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
### > Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
### > Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# verify that the name labels are still present and available in the background
get_name_labs(mt2)
### > var lab
### > 1 mpg Miles/(US) gallon
### > 2 cyl Number of cylinders
### > 3 disp Displacement (cu.in.)
### > 4 hp Gross horsepower
### > 5 drat Rear axle ratio
### > 6 wt Weight (1000 lbs)
### > 7 qsec 1/4 mile time
### > 8 vs Engine (0 = V-shaped, 1 = straight)
### > 9 am Transmission (0 = automatic, 1 = manual)
### > 10 gear Number of forward gears
### > 11 carb Number of carburetors
Note that this sort of switching back and forth between your original column names and name labels (i.e., use_name_labs() and use_var_names()) assumes you are not otherwise modifying either set of names in the interim.
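The round trip can be checked directly. This sketch labels two mtcars columns (the labels are illustrative), turns the name labels on, turns them off again, and confirms that the original column names come back.

```r
library(labelr)

mt <- add_name_labs(mtcars, name.labs = c(
  "mpg" = "Miles/(US) gallon",
  "cyl" = "Number of cylinders"
))

mt_on <- use_name_labs(mt)     # name labels become the column names
mt_off <- use_var_names(mt_on) # original column names restored

names(mt_on)[1:2]
identical(names(mt_off), names(mtcars))
```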
Now, pay attention to the variable names in the console output of the following calls to with_name_labs(). You’ll be using the familiar column names in your function call expressions, but their corresponding name labels will appear in the console output.
# demo with_name_labs() (note that wnl() will achieve same result)
with_name_labs(mt2, t.test(mpg ~ am)) # wnl() is alias for with_name_labs()
### >
### > Welch Two Sample t-test
### >
### > data: Miles/(US) gallon by Transmission (0 = automatic, 1 = manual)
### > t = -3.7671, df = 18.332, p-value = 0.001374
### > alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
### > 95 percent confidence interval:
### > -11.280194 -3.209684
### > sample estimates:
### > mean in group 0 mean in group 1
### > 17.14737 24.39231
with_name_labs(mt2, lm(mpg ~ am))
### >
### > Call:
### > lm(formula = `Miles/(US) gallon` ~ `Transmission (0 = automatic, 1 = manual)`)
### >
### > Coefficients:
### > (Intercept)
### > 17.147
### > `Transmission (0 = automatic, 1 = manual)`
### > 7.245
wnl(mt2, summary(mt2)) # wnl() is alias for with_name_labs()
### > Miles/(US) gallon Number of cylinders Displacement (cu.in.) Gross horsepower
### > Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
### > 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
### > Median :19.20 Median :6.000 Median :196.3 Median :123.0
### > Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
### > 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
### > Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
### > Rear axle ratio Weight (1000 lbs) 1/4 mile time
### > Min. :2.760 Min. :1.513 Min. :14.50
### > 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89
### > Median :3.695 Median :3.325 Median :17.71
### > Mean :3.597 Mean :3.217 Mean :17.85
### > 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90
### > Max. :4.930 Max. :5.424 Max. :22.90
### > Engine (0 = V-shaped, 1 = straight) Transmission (0 = automatic, 1 = manual)
### > Min. :0.0000 Min. :0.0000
### > 1st Qu.:0.0000 1st Qu.:0.0000
### > Median :0.0000 Median :0.0000
### > Mean :0.4375 Mean :0.4062
### > 3rd Qu.:1.0000 3rd Qu.:1.0000
### > Max. :1.0000 Max. :1.0000
### > Number of forward gears Number of carburetors
### > Min. :3.000 Min. :1.000
### > 1st Qu.:3.000 1st Qu.:2.000
### > Median :4.000 Median :2.000
### > Mean :3.688 Mean :2.812
### > 3rd Qu.:4.000 3rd Qu.:4.000
### > Max. :5.000 Max. :8.000
wnl(mt2, xtabs(~gear)) # wnl() is alias for with_name_labs()
### > Number of forward gears
### > 3 4 5
### > 15 12 5
with(mt2, xtabs(~gear)) # compare this base::with() call to wnl() call above
### > gear
### > 3 4 5
### > 15 12 5
Keep in mind that with_name_labs() is intended for self-contained calls involving exploratory analysis activities – things like simple plots, descriptives, and models. The underlying function is based on simple regular expressions and will throw an error if you attempt to use it in contexts involving (1) exotic or non-standard operators, (2) multi-step workflows (e.g., pipes), OR (3) data management and cleaning commands. Still, as shown above, it plays well with a range of “workhorse” exploratory and descriptive commands.
This concludes our whirlwind tour of labelr functionalities. You’ve graduated.
Well, almost. Before you go, here is a list of aliases for common functions. Other than its name, each alias function is identical to (i.e., performs the same operations, returning the same result as) the parent function that it aliases. More concise and more cryptic, these alias functions will save you some typing at the console (and some characters in your scripts).
The available aliases are as follows:
add_val_labs alias is avl
get_val_labs alias is gvl
drop_val_labs alias is dvl
add_val1 alias is avl1
drop_val1 alias is dvl1
add_quant_labs alias is aql
all_quant_labs alias is allq
add_quant1 alias is aq1
add_m1_lab alias is am1l
use_val_labs alias is uvl
use_val_lab1 alias is uvl1
with_val_labs alias is wvl
add_lab_cols alias is alc
add_lab_col1 alias is alc1
add_lab_dummies alias is ald
add_lab_dumm1 alias is ald1
lab_int_to_factor alias is int2f
factor_to_lab_int alias is f2int
add_name_labs alias is anl
get_name_labs alias is gnl
drop_name_labs alias is dnl
use_name_labs alias is unl
use_var_names alias is uvn
with_name_labs alias is wnl
with_both_labs alias is wbl
add_frame_lab alias is afl
get_frame_lab alias is gfl
drop_frame_lab alias is dfl
axis_lab alias is alb
as_labeled_data_frame alias is aldf
as_base_data_frame alias is adf