Santoku 0.9.0 has a few changes.
On the command line, sometimes you’d like to quickly add labels to
your breaks. Now, you can do this simply by adding names to the
breaks vector:
library(santoku)
chop(1:5, c(1,3,5))
#> [1] [1, 3) [1, 3) [3, 5] [3, 5] [3, 5]
#> Levels: [1, 3) [3, 5]
chop(1:5, c(Low = 1, High = 3, 5))
#> [1] Low Low High High High
#> Levels: Low High
Break names override the labels argument, but you can still use it
for unnamed breaks:
ages <- sample(12:80, 20)
tab(ages,
  c("Under 16" = 0, 16, 25, 35, 45, 55, "65 and over" = 65),
  labels = lbl_discrete()
)
#>    Under 16       16—24       25—34       35—44       45—54       55—64
#>           1           1           2           3           3           4
#> 65 and over
#>           6
Names can also be used for labels in chop_quantiles() and
chop_proportions():
x <- rnorm(10)
chopped <- chop_quantiles(x,
  c("Lower tail" = 0, 0.025, "Upper tail" = 0.975)
)
data.frame(x, chopped)
#>          x       chopped
#> 1  -1.3889 [2.5%, 97.5%)
#> 2  -0.2788 [2.5%, 97.5%)
#> 3  -0.1333 [2.5%, 97.5%)
#> 4   0.6360 [2.5%, 97.5%)
#> 5  -0.2843 [2.5%, 97.5%)
#> 6  -2.6565    Lower tail
#> 7  -2.4405 [2.5%, 97.5%)
#> 8   1.3201    Upper tail
#> 9  -0.3066 [2.5%, 97.5%)
#> 10 -1.7813 [2.5%, 97.5%)
This feature is experimental for now.
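The example above covers chop_quantiles(). As a minimal sketch of the same idea with chop_proportions(), assuming santoku 0.9.0 or later is installed, named proportions should label the intervals they start. The data and the names "Low", "Mid" and "High" are invented for illustration.

```r
library(santoku)

# Illustrative data with range 0 to 10, so proportions 0.2 and 0.8
# of the range correspond to break points at 2 and 8.
x <- c(0, 1, 5, 9, 10)

# Each name labels the interval that starts at that proportion.
res <- chop_proportions(x, c(Low = 0, Mid = 0.2, High = 0.8))
print(res)
```

Since none of the values sits exactly on a break, the result does not depend on which side of each interval is closed.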
close_end works differently

The close_end parameter is used to right-close the last break. This
used to be applied before breaks were extended to cover items beyond
the explicitly given breaks. We think this was confusing for users.
So now, close_end is applied only after the breaks have been
extended, i.e. to the very last break.
For example, take explicit breaks at 2 and 3, with data extending up
to 4.
In 0.8.0, the central break [2, 3] was right-closed. (The extended
break (3, 4] was right-closed too, because extended breaks are always
closed at the “outer” end.)
In 0.9.0, close_end is applied to the final, extended break [3, 4],
not to the explicit break [2, 3).
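To make the new rule concrete, here is a minimal sketch, assuming santoku 0.9.0 or later is installed. The input 1:4 with explicit breaks 2:3 is chosen for illustration so that the data extends beyond the breaks at both ends.

```r
library(santoku)

# Explicit breaks at 2 and 3; the data 1:4 runs past both of them,
# so chop() extends the breaks to cover 1 and 4.
res <- chop(1:4, 2:3, close_end = TRUE)

# Under the 0.9.0 rule described above, only the final (extended)
# break is right-closed, so the levels should be
# [1, 2) [2, 3) [3, 4]
levels(res)
```

Running the same call under 0.8.0 would need the older package version, so it is not shown here.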
close_end is TRUE by default

We think that for exploratory work, users typically want to include
all the data between the lowest and highest break, inclusive. So,
close_end is now TRUE by default.
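In 0.8.0, the default was close_end = FALSE, so a value equal to the
highest break fell outside the last explicit interval. In 0.9.0, it
falls inside that interval.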
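As a small sketch of the new default, assuming santoku 0.9.0 or later, the code below chops 1:3 at breaks 1:3 with and without close_end. The input is chosen for illustration. Passing close_end = FALSE explicitly is used here as a stand-in for the old default, which may not reproduce 0.8.0 output exactly.

```r
library(santoku)

x <- 1:3

# New default: the highest break, 3, is included in the last interval.
default_res <- chop(x, 1:3)
as.character(default_res)
# expected: "[1, 2)" "[2, 3]" "[2, 3]"

# Opting out: the last explicit interval is left open on the right,
# so the value 3 can no longer fall inside it.
open_res <- chop(x, 1:3, close_end = FALSE)
as.character(open_res)
```

If you relied on the old behavior, setting close_end = FALSE explicitly is probably the simplest way to keep it.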
raw parameter for chop()

lbl_* functions have a raw parameter to use the raw interval
endpoints in labels, rather than e.g. percentiles or standard
deviations. We’ve moved this into the main chop() function. This
makes it easier to use:
chop_mean_sd(x)
#> [1] [-1 sd, 0 sd) [0 sd, 1 sd) [0 sd, 1 sd) [1 sd, 2 sd) [0 sd, 1 sd)
#> [6] [-2 sd, -1 sd) [-2 sd, -1 sd) [1 sd, 2 sd) [0 sd, 1 sd) [-1 sd, 0 sd)
#> Levels: [-2 sd, -1 sd) [-1 sd, 0 sd) [0 sd, 1 sd) [1 sd, 2 sd)
chop_mean_sd(x, raw = TRUE)
#> [1] [-2.03, -0.7314) [-0.7314, 0.5674) [-0.7314, 0.5674) [0.5674, 1.866)
#> [5] [-0.7314, 0.5674) [-3.329, -2.03) [-3.329, -2.03) [0.5674, 1.866)
#> [9] [-0.7314, 0.5674) [-2.03, -0.7314)
#> 4 Levels: [-3.329, -2.03) [-2.03, -0.7314) ... [0.5674, 1.866)
The raw parameter to lbl_* functions is deprecated.
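The same switch should work for the other chop_* functions. The sketch below assumes santoku 0.9.0 or later and that chop_quantiles() accepts raw in the same way as chop_mean_sd() does above. The data 0:100 is chosen for illustration.

```r
library(santoku)

x <- 0:100

# Default labels describe the breaks as percentiles.
pct_chopped <- chop_quantiles(x, c(0.25, 0.75))
levels(pct_chopped)

# raw = TRUE labels the same intervals by their numeric endpoints.
# This replaces the deprecated style of passing raw = TRUE to a
# lbl_* function in the labels argument.
raw_chopped <- chop_quantiles(x, c(0.25, 0.75), raw = TRUE)
levels(raw_chopped)
```

Both calls cut the data at the same points. Only the labels differ.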
The NEWS file lists other changes, including a new chop_fn()
function which creates breaks using an arbitrary function.
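As a rough sketch of how chop_fn() might be used, assuming santoku 0.9.0 or later: the function you pass takes the data and returns a numeric vector of breaks. Base R's pretty() is used as the break-generating function, and the data is invented for illustration. The comparison with chop(x, pretty(x)) reflects our understanding of chop_fn() as a convenience wrapper, which the package documentation should confirm.

```r
library(santoku)

x <- c(1, 2, 5, 8, 10)

# pretty() computes "round" break points covering the range of x;
# chop_fn() calls it on x and chops at the result.
by_fn <- chop_fn(x, pretty)

# The equivalent two-step version: compute the breaks yourself,
# then pass them to chop().
by_hand <- chop(x, pretty(x))

print(by_fn)
```

This keeps the break rule and the data together in one call, which is handy when the breaks depend on the data.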
We expect this to be the last release before 1.0, when we’ll stabilize the interface and move santoku from “experimental” to “stable”. So, if you have problems or suggestions regarding any of these changes, please file an issue.