This package provides one function, wmwm.test(), which performs the
two-sample hypothesis test proposed in (Zeng et al., 2024) for univariate
data when the data are not fully observed. The method is a theoretical
extension of the Wilcoxon-Mann-Whitney test to the presence of missing
data, and it controls the Type I error regardless of the values of the
missing data.
The function computes bounds on the Wilcoxon-Mann-Whitney test statistic and on its p-value in the presence of missing data. The p-value of the test proposed in (Zeng et al., 2024) is then returned as the maximum possible p-value of the Wilcoxon-Mann-Whitney test.
You can install the development version of wmwm from GitHub with:
# install.packages("devtools")
devtools::install_github("Yijin-Zeng/Wilcoxon-Mann-Whitney-Test-with-Missing-data")
This is a basic example which shows you how to perform the test with missing data:
library(wmwm)
#### Assume all samples are distinct.
X <- c(6.2, 3.5, NA, 7.6, 9.2)
Y <- c(0.2, 1.3, -0.5, -1.7)
## By default, when the sample sizes of both X and Y are smaller than 50,
## exact distribution will be used.
wmwm.test(X, Y, ties = FALSE, alternative = 'two.sided')
#> $p.value
#> [1] 0.1904762
#>
#> $bounds.statistic
#> [1] 16 20
#>
#> $bounds.pvalue
#> [1] 0.01587302 0.19047619
#>
#> $alternative
#> [1] "two.sided"
#>
#> $ties.method
#> [1] FALSE
#>
#> $description.bounds
#> [1] "bounds.pvalue is the bounds of the exact p-value"
#>
#> $data.name
#> [1] "X and Y"
## using normality approximation with continuity correction:
wmwm.test(X, Y, ties = FALSE, alternative = 'two.sided', exact = FALSE, correct = TRUE)
#> $p.value
#> [1] 0.1779096
#>
#> $bounds.statistic
#> [1] 16 20
#>
#> $bounds.pvalue
#> [1] 0.01996445 0.17790959
#>
#> $alternative
#> [1] "two.sided"
#>
#> $ties.method
#> [1] FALSE
#>
#> $description.bounds
#> [1] "bounds.pvalue is the bounds of the p-value obtained using normal approximation with continuity correction"
#>
#> $data.name
#> [1] "X and Y"
#### Assume samples can be tied.
X <- c(6, 9, NA, 7, 9)
Y <- c(0, 1, 0, -1)
## When the samples can be tied, normality approximation will be used.
## By default, lower.boundary = -Inf, upper.boundary = Inf.
wmwm.test(X, Y, ties = TRUE, alternative = 'two.sided')
#> Warning in boundsPValueWithTies(X, Y, alternative = alternative, lower.boundary
#> = lower.boundary, : cannot bound exact p-value with ties
#> $p.value
#> [1] 0.174277
#>
#> $bounds.statistic
#> [1] 16 20
#>
#> $bounds.pvalue
#> [1] 0.01745104 0.17427702
#>
#> $alternative
#> [1] "two.sided"
#>
#> $ties.method
#> [1] TRUE
#>
#> $description.bounds
#> [1] "bounds.pvalue is the bounds of the p-value obtained using normal approximation with continuity correction"
#>
#> $data.name
#> [1] "X and Y"
## specifying lower.boundary and upper.boundary:
wmwm.test(X, Y, ties = TRUE, alternative = 'two.sided', lower.boundary = -1, upper.boundary = 9)
#> Warning in boundsPValueWithTies(X, Y, alternative = alternative, lower.boundary
#> = lower.boundary, : cannot bound exact p-value with ties
#> $p.value
#> [1] 0.1383146
#>
#> $bounds.statistic
#> [1] 16.5 20.0
#>
#> $bounds.pvalue
#> [1] 0.01745104 0.13831461
#>
#> $alternative
#> [1] "two.sided"
#>
#> $ties.method
#> [1] TRUE
#>
#> $description.bounds
#> [1] "bounds.pvalue is the bounds of the p-value obtained using normal approximation with continuity correction"
#>
#> $data.name
#> [1] "X and Y"
The R function stats::wilcox.test() performs the two-sample
Wilcoxon-Mann-Whitney test when all samples are observed.
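As a rough cross-check of the examples above, the reported bounds can be reproduced with stats::wilcox.test() alone by replacing the single missing value with an extreme value in each direction. This is only an illustrative sketch, not the algorithm used by wmwm.test(): the imputed values -100 and 100 and all variable names are arbitrary choices made here, and extreme imputation happens to be enough for this small data set rather than being a general recipe.

```r
# Data from the first example: one value of X is missing.
X <- c(6.2, 3.5, NA, 7.6, 9.2)
Y <- c(0.2, 1.3, -0.5, -1.7)

# Hypothetical imputations: place the missing value below / above all data.
X_low  <- replace(X, is.na(X), -100)
X_high <- replace(X, is.na(X),  100)

# With no ties and small samples, wilcox.test() uses the exact distribution.
res_low  <- stats::wilcox.test(X_low,  Y, alternative = "two.sided")
res_high <- stats::wilcox.test(X_high, Y, alternative = "two.sided")

stat_bounds <- unname(c(res_low$statistic, res_high$statistic))
pval_bounds <- range(res_low$p.value, res_high$p.value)

print(stat_bounds)  # 16 20, as in $bounds.statistic
print(pval_bounds)  # approximately 0.0159 and 0.1905, as in $bounds.pvalue

# Tied example with lower.boundary = -1: imputing the boundary value ties
# with the smallest Y, and a tied pair counts as one half.
X_tied <- replace(c(6, 9, NA, 7, 9), 3, -1)
Y_tied <- c(0, 1, 0, -1)
res_tied <- stats::wilcox.test(X_tied, Y_tied, exact = FALSE, correct = TRUE)
print(unname(res_tied$statistic))  # 16.5, the lower end of $bounds.statistic
```

The larger of the two exact p-values, about 0.19, is the value that wmwm.test() reports in $p.value for the first example.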
Zeng, Y., Adams, N. M., and Bodenham, D. A. (2024). On two-sample testing for data with arbitrarily missing values. arXiv preprint arXiv:2403.15327.
Mann, H. B., and Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 50-60.