
spicy brings polished, console-first data exploration to R for everyday analysis workflows.
spicy is an R package for the first phase of data analysis: inspecting variables, checking distributions, exploring associations, and moving quickly toward usable outputs. It helps you:
varlist() (and its
shortcut vl()), including names, labels, representative
values, classes, number of distinct non-missing values, number of valid
observations, and number of missing observations. Similar to the
“Variable View” in SPSS or the “Variables Manager” in Stata.freq(), and inspect
associations with cross_tab() and cramer_v(),
all with readable console output.mean_n(),
sum_n(), and count_n() with automatic handling
of missing data.table_apa() and export to multiple formats
(wide, long, tinytable,
flextable, excel, clipboard,
word).code_book(),
extending varlist() with searchable summaries and built-in
export options (Copy, CSV, Excel, PDF, Print) via
DT::datatable.copy_clipboard() for fast export to spreadsheets or text
editors.label_from_names(), especially useful for LimeSurvey CSV
exports where headers follow a “name [separator] label” pattern – any
string can be used as the separator (e.g., “.”, ” - “,”:“, etc.).labelled, factor,
Date, POSIXct, and other commonly used
variable types.The goal is simple: make early data exploration faster, clearer, and more pleasant in everyday analysis and data science workflows.
For the stable version, install from CRAN.
install.packages("spicy")You can install the development version of spicy from GitHub with:
# install.packages("pak")
pak::pak("amaltawfik/spicy")Here are some quick examples using built-in datasets:
library(spicy)
library(dplyr)
# Get a summary of all variables in the Viewer
varlist(iris)
# Get a summary of the variables that start with "d" in the Viewer
# Asterisks (*) in the title indicate that the data frame has been subsetted
vl(mtcars, starts_with("d"))
# Get a summary of all variables as a tibble
varlist(iris, tbl = TRUE)
#> # A tibble: 5 × 7
#> Variable Label Values Class N_distinct N_valid NAs
#> <chr> <chr> <chr> <chr> <int> <int> <int>
#> 1 Sepal.Length <NA> 4.3, 4.4, 4.5, ..., 7.9 nume… 35 150 0
#> 2 Sepal.Width <NA> 2, 2.2, 2.3, ..., 4.4 nume… 23 150 0
#> 3 Petal.Length <NA> 1, 1.1, 1.2, ..., 6.9 nume… 43 150 0
#> 4 Petal.Width <NA> 0.1, 0.2, 0.3, ..., 2.5 nume… 22 150 0
#> 5 Species <NA> setosa, versicolor, virgini… fact… 3 150 0
# Tabulate frequencies with sort alphabetically (Z-A)
freq(iris, Species, sort = "name-")
#> Frequency table: Species
#>
#> Category │ Values Freq. Percent
#> ──────────┼────────────────────────────
#> Valid │ virginica 50 33.3
#> │ versicolor 50 33.3
#> │ setosa 50 33.3
#> ──────────┼────────────────────────────
#> Total │ 150 100.0
#>
#> Class: factor
#> Data: iris
# Cross-tab with frequencies
cross_tab(mtcars, cyl, gear)
#> Crosstable: cyl x gear (N)
#>
#> Values │ 3 4 5 │ Total
#> ─────────────┼──────────────────────────┼────────────
#> 4 │ 1 8 2 │ 11
#> 6 │ 2 4 1 │ 7
#> 8 │ 12 0 2 │ 14
#> ─────────────┼──────────────────────────┼────────────
#> Total │ 15 12 5 │ 32
#>
#> Chi-2: 18.0 (df = 4), p = 0.001
#> Cramer's V: 0.53
#> Warning: 6 expected cells < 5 (66.7%). Minimum expected = 1.09. Consider `simulate_p = TRUE` or set globally via `options(spicy.simulate_p = TRUE)`.
# Cross-tab with column percentages
cross_tab(mtcars, cyl, gear, percent = "column")
#> Crosstable: cyl x gear (Column %)
#>
#> Values │ 3 4 5 │ Total
#> ─────────────┼────────────────────────────────────┼────────────
#> 4 │ 6.7 66.7 40.0 │ 34.4
#> 6 │ 13.3 33.3 20.0 │ 21.9
#> 8 │ 80.0 0.0 40.0 │ 43.8
#> ─────────────┼────────────────────────────────────┼────────────
#> Total │ 100.0 100.0 100.0 │ 100.0
#> N │ 15 12 5 │ 32
#>
#> Chi-2: 18.0 (df = 4), p = 0.001
#> Cramer's V: 0.53
#> Warning: 6 expected cells < 5 (66.7%). Minimum expected = 1.09. Consider `simulate_p = TRUE` or set globally via `options(spicy.simulate_p = TRUE)`.
# Cross-tab with row percentages
cross_tab(mtcars, cyl, gear, percent = "row")
#> Crosstable: cyl x gear (Row %)
#>
#> Values │ 3 4 5 │ Total N
#> ─────────────┼─────────────────────────────────┼─────────────────────
#> 4 │ 9.1 72.7 18.2 │ 100.0 11
#> 6 │ 28.6 57.1 14.3 │ 100.0 7
#> 8 │ 85.7 0.0 14.3 │ 100.0 14
#> ─────────────┼─────────────────────────────────┼─────────────────────
#> Total │ 46.9 37.5 15.6 │ 100.0 32
#>
#> Chi-2: 18.0 (df = 4), p = 0.001
#> Cramer's V: 0.53
#> Warning: 6 expected cells < 5 (66.7%). Minimum expected = 1.09. Consider `simulate_p = TRUE` or set globally via `options(spicy.simulate_p = TRUE)`.
# Cross-tab grouped by a single variable
cross_tab(mtcars, cyl, gear, by = am)
#> Crosstable: cyl x gear (N) | am = 0
#>
#> Values │ 3 4 5 │ Total
#> ─────────────┼─────────────────────────┼────────────
#> 4 │ 1 2 0 │ 3
#> 6 │ 2 2 0 │ 4
#> 8 │ 12 0 0 │ 12
#> ─────────────┼─────────────────────────┼────────────
#> Total │ 15 4 0 │ 19
#>
#> Chi-2: 9.0 (df = 2), p = 0.011
#> Cramer's V: 0.69
#> Warning: 5 expected cells < 5 (83.3%). 2 expected cells < 1. Minimum expected = 0.63. Consider `simulate_p = TRUE` or set globally via `options(spicy.simulate_p = TRUE)`.
#>
#> Crosstable: cyl x gear (N) | am = 1
#>
#> Values │ 3 4 5 │ Total
#> ─────────────┼────────────────────────┼────────────
#> 4 │ 0 6 2 │ 8
#> 6 │ 0 2 1 │ 3
#> 8 │ 0 0 2 │ 2
#> ─────────────┼────────────────────────┼────────────
#> Total │ 0 8 5 │ 13
#>
#> Chi-2: 3.8 (df = 2), p = 0.146
#> Cramer's V: 0.54
#> Warning: 6 expected cells < 5 (100%). 1 expected cell < 1. Minimum expected = 0.77. Consider `simulate_p = TRUE` or set globally via `options(spicy.simulate_p = TRUE)`.
# Cross-tab grouped by two variables
cross_tab(mtcars, cyl, gear, by = interaction(vs, am))
#> Crosstable: cyl x gear (N) | vs x am = 0.0
#>
#> Values │ 3 4 5 │ Total
#> ─────────────┼─────────────────────────┼────────────
#> 4 │ 0 0 0 │ 0
#> 6 │ 0 0 0 │ 0
#> 8 │ 12 0 0 │ 12
#> ─────────────┼─────────────────────────┼────────────
#> Total │ 12 0 0 │ 12
#>
#> Crosstable: cyl x gear (N) | vs x am = 1.0
#>
#> Values │ 3 4 5 │ Total
#> ─────────────┼────────────────────────┼────────────
#> 4 │ 1 2 0 │ 3
#> 6 │ 2 2 0 │ 4
#> 8 │ 0 0 0 │ 0
#> ─────────────┼────────────────────────┼────────────
#> Total │ 3 4 0 │ 7
#>
#> Chi-2: 0.2 (df = 1), p = 0.659
#> Cramer's V: 0.17
#> Warning: 4 expected cells < 5 (100%). Minimum expected = 1.29. Consider `simulate_p = TRUE` or set globally via `options(spicy.simulate_p = TRUE)`.
#>
#> Crosstable: cyl x gear (N) | vs x am = 0.1
#>
#> Values │ 3 4 5 │ Total
#> ─────────────┼────────────────────────┼────────────
#> 4 │ 0 0 1 │ 1
#> 6 │ 0 2 1 │ 3
#> 8 │ 0 0 2 │ 2
#> ─────────────┼────────────────────────┼────────────
#> Total │ 0 2 4 │ 6
#>
#> Chi-2: 3.0 (df = 2), p = 0.223
#> Cramer's V: 0.71
#> Warning: 6 expected cells < 5 (100%). 3 expected cells < 1. Minimum expected = 0.33. Consider `simulate_p = TRUE` or set globally via `options(spicy.simulate_p = TRUE)`.
#>
#> Crosstable: cyl x gear (N) | vs x am = 1.1
#>
#> Values │ 3 4 5 │ Total
#> ─────────────┼────────────────────────┼────────────
#> 4 │ 0 6 1 │ 7
#> 6 │ 0 0 0 │ 0
#> 8 │ 0 0 0 │ 0
#> ─────────────┼────────────────────────┼────────────
#> Total │ 0 6 1 │ 7
# Compute row-wise mean/sum (all values must be valid by default) or specific value
df <- data.frame(
var1 = c(10, NA, 30, 40, 50),
var2 = c(5, NA, 15, NA, 25),
var3 = c(NA, 30, 20, 50, 10)
)
df
#> var1 var2 var3
#> 1 10 5 NA
#> 2 NA NA 30
#> 3 30 15 20
#> 4 40 NA 50
#> 5 50 25 10
mean_n(df)
#> [1] NA NA 21.66667 NA 28.33333
sum_n(df)
#> [1] NA NA 65 NA 85
count_n(df, count = 10)
#> [1] 1 0 0 0 1
count_n(df, special = "NA")
#> [1] 1 2 0 1 0
df |> mutate(count30 = count_n(count = 30))
#> var1 var2 var3 count30
#> 1 10 5 NA 0
#> 2 NA NA 30 1
#> 3 30 15 20 1
#> 4 40 NA 50 0
#> 5 50 25 10 0
# Extract labels from column names like "varname. label"
# This format ("name. label") is the default in LimeSurvey CSV exports
# when using: Export results -> Export format: CSV -> Headings: Question code & question text.
# It uses ". " (dot + space) as the default separator between the question code and question text.
df <- tibble::tibble(
"age. Age of respondent" = c(25, 30),
"score. Total score. Manually computed." = c(12, 14)
)
out <- label_from_names(df)
# View assigned labels
labelled::var_label(out)
#> $age
#> [1] "Age of respondent"
#>
#> $score
#> [1] "Total score. Manually computed."Additional exported helpers:
# Association strength from a contingency table
cramer_v(table(mtcars$cyl, mtcars$gear))
# Interactive codebook (requires DT)
code_book(iris)
# APA-ready table builder
table_apa(
data = mtcars,
row_vars = c("vs", "am"),
group_var = "gear",
labels = c("Engine", "Transmission"),
output = "long",
style = "raw"
)
# Low-level ASCII builders used by print methods
tab <- cross_tab(mtcars, cyl, gear)
spicy_print_table(tab)
build_ascii_table(tab)
# Clipboard export helper (interactive session)
copy_clipboard(head(mtcars))All functions can be directly used in pipelines.
spicy?If you use spicy in a publication or teaching
material:
citation("spicy") for the exact BibTeX entry of
your installed version.spicy is: https://doi.org/10.5281/zenodo.15397865.You can also view the source citation file: https://github.com/amaltawfik/spicy/blob/main/inst/CITATION
This package is licensed under the MIT license. See LICENSE for details.