sgmean: Trimmed Mean Compatible with Statgraphics

Introduction

The sgmean package implements a trimmed mean that replicates the behavior of Statgraphics software. This method differs fundamentally from R's built-in mean(…, trim) in how it handles boundary values.

Origin of this Package

During statistical analysis using both R and Statgraphics, a systematic difference was observed between the trimmed mean results produced by the two programs. Despite an extensive review of the Statgraphics documentation, no explicit description of its trimmed mean algorithm was found.

Through systematic mathematical reverse engineering of Statgraphics output using trial and error with multiple datasets, the underlying proportional discount formula was identified and validated. This package makes that method transparent, reproducible, and available to the R community for the first time.

The Mathematical Difference

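Given a sorted vector of n values and a trim fraction, both methods start from k = trim × n. The key difference is how a fractional k is handled: R's built-in mean() truncates k to an integer and removes that many whole observations from each end, whereas sgmean applies a proportional discount, so its result varies continuously with the trim fraction.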
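The truncation on the base R side can be checked directly. The sketch below uses a hypothetical helper, base_trim_by_hand() (a name introduced here for illustration, not part of sgmean or base R), that drops floor(trim × n) whole observations from each end of the sorted vector and averages the rest. For trim fractions below 0.5 it should agree with mean(x, trim). It makes no attempt to reproduce the sgmean formula.

```r
# Hypothetical helper (illustration only): reproduce base R's trimmed mean
# by dropping floor(trim * n) whole observations from each end.
base_trim_by_hand <- function(x, trim) {
  stopifnot(trim >= 0, trim < 0.5)  # base R returns the median for trim >= 0.5
  n <- length(x)
  drop <- floor(trim * n)           # the fractional part of k is discarded
  mean(sort(x)[(drop + 1):(n - drop)])
}

x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 200)
for (trim in c(0.05, 0.10, 0.15)) {
  cat("trim =", trim,
      "| k =", trim * length(x),
      "| by hand:", base_trim_by_hand(x, trim),
      "| mean():", mean(x, trim = trim), "\n")
}
```

With k = 0.5 nothing is dropped, and with k = 1 or k = 1.5 exactly one observation is dropped per end, so the fractional part of k never influences the base R result.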

Practical Example

library(sgmean)
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 200)
result_r  <- mean(x, trim = 0.05)
result_sg <- sgmean(x, trim = 0.05)
cat("R base  (trim=0.05):", result_r, "\n")
#> R base  (trim=0.05): 65
cat("sgmean  (trim=0.05):", result_sg, "\n")
#> sgmean  (trim=0.05): 60.55556
cat("Difference:         ", abs(result_r - result_sg), "\n")
#> Difference:          4.444444
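
The sgmean value printed above is numerically consistent with discounting each extreme value by the fractional count k = 0.05 × 10 = 0.5 and dividing by n − 2k. This is only an observation about the printed number, offered as an assumption about what a proportional discount could look like in this case; it is not a formula confirmed by the package source or by Statgraphics documentation, and other formulas may match the same value.

```latex
\[
\frac{\sum_i x_i \;-\; k\,\bigl(x_{(1)} + x_{(n)}\bigr)}{n - 2k}
= \frac{650 - 0.5\,(10 + 200)}{10 - 2(0.5)}
= \frac{545}{9} \approx 60.55556
\]
```

Base R, by contrast, truncates k = 0.5 to 0, removes nothing, and returns 650 / 10 = 65.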

When Do Both Methods Agree?

x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 200)
cat("trim=0.10 | R base:", mean(x, trim=0.10), "| sgmean:", sgmean(x, trim=0.10), "\n")
#> trim=0.10 | R base: 55 | sgmean: 55
cat("trim=0.05 | R base:", mean(x, trim=0.05), "| sgmean:", sgmean(x, trim=0.05), "\n")
#> trim=0.05 | R base: 65 | sgmean: 60.55556
cat("trim=0.15 | R base:", mean(x, trim=0.15), "| sgmean:", sgmean(x, trim=0.15), "\n")
#> trim=0.15 | R base: 55 | sgmean: 47.85714
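
Part of this pattern follows from base R alone: because mean(x, trim) drops floor(trim × n) observations per end, its result can only change when trim × n crosses an integer. The sketch below tabulates this for the same x using only base R. The grid of trim values is an arbitrary choice for illustration, and sgmean is deliberately left out.

```r
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 200)
trims <- c(0, 0.05, 0.09, 0.10, 0.15, 0.19)

# For each trim fraction: the raw k, how many whole observations base R
# drops per end, and the resulting trimmed mean.
steps <- data.frame(
  trim    = trims,
  k       = trims * length(x),
  dropped = floor(trims * length(x)),
  r_mean  = vapply(trims, function(t) mean(x, trim = t), numeric(1))
)
print(steps)
```

The base R mean stays at 65 while no observation is dropped and moves to 55 once one observation per end is removed; trim values between those steps have no effect on it.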

Conclusion

The sgmean method provides a continuous and mathematically consistent trimmed mean for any trim fraction between 0 and 0.99. It replicates the behavior of Statgraphics software and avoids the discontinuities introduced by integer truncation in R's built-in mean().