magentabook 0.1.0
- First release. UK HM Treasury Magenta Book policy-evaluation
primitives.
- Provenance is explicit: bundled rubrics carry honest source metadata
distinguishing direct quotations from researcher synthesis. ICC
reference values use a
value_source flag
("table_quote" vs "central_estimate").
- DOIs added to every
@references block where available.
Framework functions (mb_evaluation_plan,
mb_questions, mb_counterfactual,
mb_theory_of_change, etc.) cite the Magenta Book (2020)
chapters they correspond to.
inst/CITATION extended with footer pointing to the
underlying primary sources for the methods implemented (Sherman 1997,
Cohen 1988, Hussey-Hughes 2007, Hemming 2015, Hedges & Hedberg 2007,
Drummond 2015, Cameron & Miller 2015, Stuart 2010).
- Cross-validated against canonical reference implementations:
pwr for two-sample power, sample size, MDE, and
proportion power (within ~3 percentage points;
test-pwr-equivalence.R).
sandwich for mb_did_2x2 cluster-robust SEs
(CR1 / HC1 to within 1e-6;
test-sandwich-equivalence.R).
swCRTdesign for mb_stepped_wedge
(closed-form Hemming approximation tracks the exact Hussey-Hughes
variance to within roughly 0.5x to 2x for typical UK designs;
test-swcrt-equivalence.R). For decision-grade sample-size
work prefer swCRTdesign::swPwr.
BCEA for mb_icer and mb_ceac
(floating-point agreement; test-bcea-equivalence.R).
cobalt for mb_balance_table SMD on
balanced samples (within 1e-8;
test-cobalt-equivalence.R).
mb_stepped_wedge formula argument removed
in favour of the single Hemming/Woertman closed-form approximation,
which is documented as approximate (vs the exact Hussey-Hughes variance
computed by swCRTdesign). The earlier
formula = "hussey_hughes" branch was researcher-derived and
not externally verifiable; it has been removed before the package leaves
disk.
mb_balance_table() added for pre-treatment balance
checks (mean, SD, standardised mean difference, Welch t / chi-squared p,
imbalance flag at user-controlled threshold).
mb_stepped_wedge(formula = c("hemming", "hussey_hughes")):
choose between the Woertman/Hemming closed-form correction (default) and
the Hussey-Hughes (2007) closed form. Both assume balanced design,
complete data, no time-by-treatment interaction; for non-standard
designs use swCRTdesign or clusterPower.
quiet = FALSE argument added to
mb_did_2x2(), mb_its(), and
mb_event_study(). The print method now appends a one-line
reminder that the estimator is canonical and points to specialist
packages (fixest, did, sandwich)
for staggered adoption, autocorrelation, or production work. Set
quiet = TRUE to suppress.
cluster argument added to
mb_event_study(), mirroring mb_did_2x2():
cluster-robust SEs via CR1 with the Stata-style finite-sample correction
(G/(G-1)) * (N-1)/(N-K).
mb_power() @details now states the
normal-approximation assumption explicitly and points to
pwr::pwr.t.test for small N (where the noncentral-t form
differs by ~1-2 percentage points).
- 35 exported functions across 10 families: theory of change,
evaluation planning, power and design, Maryland Scientific Methods
Scale, Magenta Book confidence rating, lightweight estimators
(difference-in-differences, interrupted time series, event study),
cost-effectiveness analysis (CEA, ICER, CEAC, INB, QALY, DALY), realist
/ theory-based scaffolding, reporting, lookups.
- Bundled rubric and reference tables in
inst/extdata/
covering the five-level Maryland SMS rubric, the three-level magentabook
confidence rubric (synthesised across What Works Centre traditions),
reference intra-class correlation values across UK policy domains
(education, health, employment, local government, criminal justice,
housing) tagged with a value_source flag distinguishing
direct table quotations from researcher synthesis, and the canonical
Magenta Book evaluation question taxonomy. Vintage and provenance
metadata accessible via mb_data_versions().
- Provenance is explicit: see the README “Bundled rubrics: provenance”
section for what is verbatim from primary sources and what is
magentabook synthesis.
- Cross-validated against canonical reference implementations (when
installed): power and sample size vs
pwr, cluster-robust
SEs vs sandwich. See
tests/testthat/test-pwr-equivalence.R and
tests/testthat/test-sandwich-equivalence.R.
- Pure computation: no network calls, no API keys.
- Designed as the evaluation companion to the appraisal package
greenbook. See the vignette Cost-effectiveness with
magentabook and greenbook for an end-to-end worked example.