Significance testing is one of the more lively areas of statistics. In general, the idea is not to make too many mistakes in our conclusions. If this only applied to Type I errors, we could all relax, apply the most conservative tests of significance possible and restrict ourselves to the study of the glaringly obvious. R attends to the problem of significance testing in some ways, but sensibly avoids prescribing methods which may not be appropriate for particular analyses.
The built-in function for adjusting p-values for multiple comparisons is p.adjust(), but it's a bit awkward to integrate with functions like anova() that may produce a table with a number of probabilities.
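One way around the awkwardness is to pull the probability column out of the table, adjust it, and attach the result as a new column. The following is only a minimal sketch: the logistic model fitted to infert and the column name p.bonferroni are assumptions chosen for illustration, not part of the original analysis.

```r
# Illustrative model only: any fit that yields an anova table would do.
fit <- glm(case ~ education + induced + spontaneous,
           data = infert, family = binomial)

# Sequential analysis of deviance table, converted to a plain data frame.
out <- as.data.frame(anova(fit, test = "Chisq"))

# Locate the probability column by its "Pr(" prefix rather than hard-coding it.
pcol <- grep("^Pr\\(", names(out), value = TRUE)

# p.adjust() ignores NA entries (the NULL row) when counting the tests.
out$p.bonferroni <- p.adjust(out[[pcol]], method = "bonferroni")

print(out)
```

Finding the column by prefix keeps the same lines usable for an F test, where the column is named differently.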
Using the infert data set, we'll apply the Bonferroni correction to multiple tests of the prevalence of induced labor within groups defined by educational attainment. First let's go through the function group.prop.test(), which I found useful for repetitive testing of groups of Bernoulli trial (success/failure) data where the outcome of interest was which groups differed from the overall proportion, that is, which groups were better or worse than the average level of success by a fairly conservative test.
The usual checks of the input data are performed, then the overall proportion is calculated and the result list is set up, filled with blanks and zeros. For each group defined by the grouping vector by, a test of proportions is conducted, and the adjusted probability stored in the appropriate element of gptest. Notice that the formatting of the group names was performed after the calculation. Otherwise the comparison performed by subset would have failed. After the calculation, the results are printed out and the list of results is returned invisibly. By playing around with this data, you may discover that a simple test of the contingency table indicates that the groups do not come from the same population, but in fact none differ from the average prevalence of induced labor, at least by this test.
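The steps just described can be sketched roughly as follows. This is not the original group.prop.test(): the name group_prop_sketch, its arguments, the use of prop.test() against the overall proportion, and the data frame return value (in place of the list gptest) are all assumptions made for illustration. The call at the end assumes that the induced column of infert, coded here as any versus none, is the outcome meant in the text.

```r
# Hypothetical sketch, not the author's function.
group_prop_sketch <- function(x, by, method = "bonferroni") {
  # Basic input checks: equal lengths, drop missing values, 0/1 outcomes only.
  if (length(x) != length(by)) stop("x and by must have the same length")
  keep <- !is.na(x) & !is.na(by)
  x <- as.numeric(x[keep])
  by <- as.character(by[keep])
  if (!all(x %in% c(0, 1))) stop("x must be coded 0/1 or TRUE/FALSE")

  overall <- mean(x)          # overall proportion of successes
  groups <- unique(by)

  # Result table set up in advance, filled with zeros and NAs.
  res <- data.frame(group = groups, n = 0, successes = 0, prop = 0,
                    p.raw = NA_real_, stringsAsFactors = FALSE)

  for (i in seq_along(groups)) {
    xi <- x[by == groups[i]]  # compare against the unformatted group name
    res$n[i] <- length(xi)
    res$successes[i] <- sum(xi)
    res$prop[i] <- mean(xi)
    # One-sample test of the group proportion against the overall proportion.
    res$p.raw[i] <- prop.test(sum(xi), length(xi), p = overall)$p.value
  }

  res$p.adj <- p.adjust(res$p.raw, method = method)

  # Pad the group names only now, after all comparisons are done.
  res$group <- format(res$group)
  print(res, digits = 3)
  invisible(res)
}

res <- group_prop_sketch(infert$induced > 0, infert$education)
```

For small groups, binom.test() could be substituted for prop.test() to avoid relying on the chi-squared approximation.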
When I originally wrote the function, it simply printed out the critical (corrected) p-value at the top of the table, and all of the observed values were compared with that.
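For the Bonferroni correction the two presentations lead to the same decisions: comparing raw p-values with a corrected critical value is equivalent to comparing adjusted p-values with the nominal level. A small sketch, using made-up p-values and an assumed nominal level of 0.05 purely for illustration:

```r
alpha <- 0.05
p <- c(0.004, 0.020, 0.300)   # invented p-values, not from any analysis
m <- length(p)

# Original approach: one corrected critical value for every observed p-value.
critical <- alpha / m
by_threshold <- p < critical

# Alternative: adjust each p-value and compare with the nominal level.
by_adjusted <- p.adjust(p, method = "bonferroni") < alpha

print(data.frame(p, critical, by_threshold, by_adjusted))
```

The second form has the practical advantage that the adjusted values can be read directly against the familiar nominal level.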