When parasites invade paired structures of their host non-randomly, the resulting asymmetry may have both pathological and ecological significance. ASPI has been developed to facilitate the detection and visualization of asymmetric infections. ASPI can detect both consistent bias towards one side, and inconsistent bias in which the left side is favored in some hosts and the right in others. In this vignette ASPI is demonstrated on both observed and simulated parasite distributions. The first step is to load the aspi library:
library(aspi)
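Replicated G-tests of goodness-of-fit (Sokal and Rohlf 1995) provide a comprehensive analysis of deviations from bilateral symmetry. This procedure computes four different G-statistics: the pooled, heterogeneity and total G-statistics for the sample as a whole, and an individual G-statistic for each host.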
G-tests involve logarithmic transformation and so cannot be applied to counts of zero. Moreover, G-tests are not recommended when counts are below 10. For this situation, binomial exact tests are provided as an alternative, but are limited to testing two null hypotheses: that the pooled counts are symmetrical, and that the counts within each individual host are symmetrical.
Diplostomum spp. (Trematoda) are common parasites of freshwater fishes. Here we use two datasets describing the distribution of Diplostomum spp. metacercariae in the eyes of fifty ruffe, Gymnocephalus cernuus.
The first dataset comprises the numbers of metacercariae found in each eye, excluding the lens (i.e. the choroid, retina and humors). The dataset is formatted as a data.frame with two columns: the first column is the number of parasites found in the left eye and the second column is the number of parasites found in the right eye. Here are the data for the first ten hosts:
head(diplostomum_eyes_excl_lenses, 10)
## left right
## 1 86 99
## 2 131 199
## 3 195 133
## 4 128 167
## 5 192 211
## 6 212 278
## 7 451 474
## 8 323 312
## 9 394 305
## 10 222 270
The row.names are host IDs. To detect bilateral asymmetry we apply replicated G-tests of goodness-of-fit:
results <- g.test(diplostomum_eyes_excl_lenses)
First we inspect the pooled, heterogeneity and total G-statistics:
results$summary
## Test df G p
## 1 Pooled 1 8.189282 4.213860e-03
## 2 Heterogeneity 49 437.827514 7.448961e-64
## 3 Total 50 446.016796 5.819378e-65
The total G-test statistic is highly significant and so the null hypothesis of overall symmetry can be rejected. The pooled G-test (p-value = 0.004) suggests that the total number of parasites from each side differs from symmetry. Let’s calculate the total number of parasites found on each side:
# total number of parasites on left side
sum(diplostomum_eyes_excl_lenses$left)
## [1] 15026
# total number of parasites on right side
sum(diplostomum_eyes_excl_lenses$right)
## [1] 14534
# ratio of left to right
sum(diplostomum_eyes_excl_lenses$left) / sum(diplostomum_eyes_excl_lenses$right)
## [1] 1.033852
However, this slight left bias in the pooled data doesn’t necessarily mean that we would expect to find a left bias in most hosts. The p-value for the heterogeneity G-test is highly significant, revealing that the proportions of metacercariae in the left and right eyes vary from host to host. The individual G-tests identify which of the hosts have asymmetrical parasite distributions. Here are the results for the first ten hosts:
head(results$hosts, 10)
## Host Left Right G p BH Holm
## 1 1 86 99 0.9142668 0.3389848373 0.452402486 1.000000000
## 2 2 131 199 14.1130061 0.0001721493 0.001434578 0.007746721
## 3 3 195 133 11.7903193 0.0005953952 0.003307751 0.025006599
## 4 4 128 167 5.1710572 0.0229662351 0.055002553 0.711953287
## 5 5 192 211 0.8961138 0.3438258892 0.452402486 1.000000000
## 6 6 212 278 8.9168733 0.0028254783 0.008310230 0.096066262
## 7 7 451 474 0.5719508 0.4494847458 0.548152129 1.000000000
## 8 8 323 312 0.1905607 0.6624503507 0.690052449 1.000000000
## 9 9 394 305 11.3627210 0.0007493296 0.003746648 0.030722515
## 10 10 222 270 4.6903841 0.0303318672 0.068936062 0.879624148
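The seven columns of the above data.frame are: the host ID (Host); the numbers of parasites on the left (Left) and right (Right) sides; the individual G-statistic (G); the raw p-value (p); and the p-values adjusted by the procedures of Benjamini and Hochberg (BH) and Holm (Holm).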
The raw p-value has to be adjusted for multiplicity, because an individual G-test is applied to each host. If the null hypothesis of symmetry is true for all 50 hosts, at a significance level of 5% the probability of getting at least one Type I error (false rejection of the null hypothesis) is \(1-(1-0.05)^{50} = 0.923\); the expected number of significant results obtained purely by chance would be \(50 \times 0.05 = 2.5\). Holm’s procedure (Holm 1979) controls the familywise error rate (FWER), i.e. the probability of making at least one Type I error. However, guarding against even a single Type I error will be unnecessarily conservative for many studies, especially those surveying a large number of hosts. If a small number of false positives among the set of rejected null hypotheses can be tolerated, then it will be preferable to control the false discovery rate (FDR) using the procedure of Benjamini and Hochberg (1995).
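The arithmetic in this paragraph can be checked with a few lines of base R. This is only an illustrative sketch: the variable names are invented here, and the formula assumes that the 50 tests are independent.

```r
# Illustrative check of the multiplicity arithmetic (variable names are hypothetical)
alpha   <- 0.05  # per-test significance level
n_hosts <- 50    # number of individual G-tests

# Probability of at least one Type I error when every null hypothesis is true
# (assumes the tests are independent)
fwer <- 1 - (1 - alpha)^n_hosts

# Expected number of false rejections
expected_false_positives <- n_hosts * alpha

round(fwer, 3)
expected_false_positives
```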
At the conventional significance level of 5%, Holm’s procedure finds 11 of the 50 individual G-tests to be significant:
sum(results$hosts$Holm<0.05)
## [1] 11
Benjamini and Hochberg’s method provides more power to detect asymmetric infections, identifying a total of 19, although about one of these (5% of 19 is 0.95) is expected to be a false positive:
sum(results$hosts$BH<0.05)
## [1] 19
These asymmetric infections identified with the aid of Benjamini and Hochberg’s procedure can be classified according to whether they show left or right bias:
# Number of asymmetric infections showing left bias
sum(results$hosts$BH<0.05 & results$hosts$Left>results$hosts$Right)
## [1] 11
# Number of asymmetric infections showing right bias
sum(results$hosts$BH<0.05 & results$hosts$Right>results$hosts$Left)
## [1] 8
We conclude that Diplostomum spp. infections of the eyes (excluding the lenses) show bilateral asymmetry in a large proportion of the sampled ruffe. Moreover, the bias is inconsistent, with parasites favoring the left eye in some hosts and the right in others.
For visualization, the distribution of parasites between the two eyes can be expressed as a binary-log-transformed (log2) ratio. These ratios can be plotted as a histogram:
plotHistogram(diplostomum_eyes_excl_lenses, nBreaks=20, main="Histogram")
Alternatively, these ratios can be combined with corresponding p-values from individual G-tests in a volcano plot:
plotVolcano(diplostomum_eyes_excl_lenses, test="G", pAdj="BH", sigThresh=0.1,
            main="Volcano Plot")
The dashed horizontal line in the volcano plot represents the chosen p-value threshold. Parasite distributions deviating significantly from symmetry are shown as red squares, whereas those not differing significantly from a 1:1 ratio are represented by blue circles.
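The ratio shown on these plots can also be computed by hand in base R. The sketch below uses the first ten hosts printed earlier and assumes the ratio is log2(left / right); that convention is an assumption made here for illustration, and the plotting functions may use the reciprocal.

```r
# First ten hosts, copied from the data printed earlier in this vignette
eyes <- data.frame(
  left  = c(86, 131, 195, 128, 192, 212, 451, 323, 394, 222),
  right = c(99, 199, 133, 167, 211, 278, 474, 312, 305, 270)
)

# Assumed convention: log2(left / right); aspi may plot the reciprocal
log2_ratio <- log2(eyes$left / eyes$right)

# 0 means perfect symmetry, +1 twice as many parasites on the left,
# -1 twice as many on the right
round(log2_ratio, 2)
```

A log scale is convenient here because a 2:1 and a 1:2 ratio sit at equal distances either side of zero.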
The second dataset comprises observations on the distribution of Diplostomum spp. metacercariae in the lenses of the eyes of the ruffe. Here are the data for the first ten hosts:
head(diplostomum_lenses, 10)
## left right
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
## 7 0 0
## 8 0 0
## 9 6 5
## 10 0 2
Note that the first eight hosts are not infected. The total number of ruffe with Diplostomum spp. infection of the lens is:
sum(diplostomum_lenses$left>0 | diplostomum_lenses$right>0)
## [1] 31
G-tests involve logarithmic transformation and so cannot be applied to counts of zero, as found in this dataset. Furthermore, G-tests are not recommended when counts are below ten. The g.test function will ignore uninfected hosts in this dataset, but will raise an error because some hosts have an infection in only one eye. In this situation, binomial exact tests should be used instead:
results <- eb.test(diplostomum_lenses)
# pooled test p-value
results$pooled
## [1] 0.8439127
# results for the first ten infected hosts
head(results$hosts, 10)
## Host Left Right p BH Holm
## 9 9 6 5 1.0000000 1.0000000 1.000000
## 10 10 0 2 0.5000000 1.0000000 1.000000
## 28 28 0 3 0.2500000 0.9687500 1.000000
## 46 46 11 8 0.6476059 1.0000000 1.000000
## 15 15 0 1 1.0000000 1.0000000 1.000000
## 16 16 0 7 0.0156250 0.4843750 0.484375
## 17 17 3 2 1.0000000 1.0000000 1.000000
## 18 18 0 5 0.0625000 0.5535714 1.000000
## 19 19 4 7 0.5488281 1.0000000 1.000000
## 20 20 8 2 0.1093750 0.5535714 1.000000
# smallest FDR adjusted p-value for an individual host
min(results$hosts$BH)
## [1] 0.484375
No evidence of asymmetry in lens infections was found by either the pooled test or in the tests of the individual host infections. Note that the eb.test function restricted analysis to the 31 infected hosts:
length(results$hosts[,1])
## [1] 31
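The individual p-values reported above agree with a two-sided exact binomial test of a 1:1 ratio, which base R provides as binom.test. The sketch below recomputes the raw p-values for four of the hosts shown; whether eb.test calls binom.test internally is an assumption, and the data.frame here is built by hand purely for illustration.

```r
# Counts for four infected hosts, copied from the eb.test output above
lenses <- data.frame(
  host  = c(9, 10, 16, 20),
  left  = c(6, 0, 0, 8),
  right = c(5, 2, 7, 2)
)

# Two-sided exact binomial test of a 1:1 ratio, one test per host
lenses$p <- mapply(
  function(l, r) binom.test(c(l, r), p = 0.5)$p.value,
  lenses$left, lenses$right
)

lenses
```

Unlike the G-test, this calculation involves no logarithms, so a count of zero in one eye causes no difficulty.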
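Here are further examples of the results returned from replicated G-tests of goodness-of-fit under different scenarios. Simulated datasets have been generated for parasitic infections showing: symmetry; a left bias with heterogeneous proportions; a left bias with homogeneous proportions; and asymmetry with inconsistent bias.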
g.test(simulated_symmetrical_infection)
## $summary
## Test df G p
## 1 Pooled 1 0.1463309 0.7020666
## 2 Heterogeneity 9 8.7916316 0.4567267
## 3 Total 10 8.9379625 0.5380011
##
## $hosts
## Host Left Right G p BH Holm
## 1 1 86 89 0.051431091 0.8205915 0.9117684 1
## 2 2 96 112 1.231985889 0.2670212 0.5699383 1
## 3 3 92 88 0.088896206 0.7655851 0.9117684 1
## 4 4 112 104 0.296364074 0.5861708 0.8373869 1
## 5 5 90 109 1.816836608 0.1776903 0.5699383 1
## 6 6 117 96 2.073789831 0.1498488 0.5699383 1
## 7 7 98 97 0.005128228 0.9429110 0.9429110 1
## 8 8 106 91 1.143238156 0.2849691 0.5699383 1
## 9 9 98 108 0.485627728 0.4858842 0.8098070 1
## 10 10 84 102 1.744664663 0.1865489 0.5699383 1
In this simulated dataset there are similar numbers of parasites on each side. Pooled, heterogeneity and total G-test statistics are not significant at \(\alpha = 0.05\). Individual G-tests show that parasite distributions do not differ significantly from symmetry in any of the ten hosts.
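To show where the summary statistics come from, the following base R sketch recomputes them from the counts printed above. It assumes the standard formulation without a continuity correction, in which the total G is the sum of the individual G-statistics and the heterogeneity G is the total minus the pooled G; the helper g_stat is hypothetical and is not part of aspi.

```r
# Counts copied from the simulated_symmetrical_infection output above
sim <- data.frame(
  left  = c(86, 96, 92, 112, 90, 117, 98, 106, 98, 84),
  right = c(89, 112, 88, 104, 109, 96, 97, 91, 108, 102)
)

# G for a 1:1 expectation: 2 * sum(observed * log(observed / expected))
# (hypothetical helper; no continuity correction; counts must be non-zero)
g_stat <- function(left, right) {
  expected <- (left + right) / 2
  2 * (left * log(left / expected) + right * log(right / expected))
}

g_individual    <- g_stat(sim$left, sim$right)            # one per host, df = 1 each
g_total         <- sum(g_individual)                      # df = number of hosts
g_pooled        <- g_stat(sum(sim$left), sum(sim$right))  # df = 1
g_heterogeneity <- g_total - g_pooled                     # df = number of hosts - 1

summary_by_hand <- data.frame(
  Test = c("Pooled", "Heterogeneity", "Total"),
  df   = c(1, nrow(sim) - 1, nrow(sim)),
  G    = c(g_pooled, g_heterogeneity, g_total)
)

# Upper-tail probabilities from the chi-squared distribution
summary_by_hand$p <- pchisq(summary_by_hand$G, summary_by_hand$df, lower.tail = FALSE)
summary_by_hand
```

Under these assumptions the values agree with the g.test summary shown above.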
g.test(simulated_left_bias_heterogeneous_proportions)
## $summary
## Test df G p
## 1 Pooled 1 382.12807 4.282032e-85
## 2 Heterogeneity 9 33.71135 1.003471e-04
## 3 Total 10 415.83942 3.993780e-83
##
## $hosts
## Host Left Right G p BH Holm
## 1 1 165 53 60.38527 7.799434e-15 3.899717e-14 7.019491e-14
## 2 2 235 126 33.43061 7.385238e-09 1.055034e-08 2.954095e-08
## 3 3 203 128 17.14244 3.467862e-05 3.467862e-05 3.467862e-05
## 4 4 219 101 44.55652 2.471191e-11 6.177978e-11 1.729834e-10
## 5 5 246 144 26.98969 2.045435e-07 2.556794e-07 6.136306e-07
## 6 6 233 110 45.10548 1.867023e-11 6.177978e-11 1.493619e-10
## 7 7 189 52 82.73454 9.384707e-20 9.384707e-19 9.384707e-19
## 8 8 193 84 44.07347 3.162780e-11 6.325559e-11 1.897668e-10
## 9 9 170 97 20.21521 6.920056e-06 7.688951e-06 1.384011e-05
## 10 10 199 91 41.20618 1.369887e-10 2.283146e-10 6.849437e-10
In this example there are more parasites on the left than the right in every host. Furthermore, the proportion of parasites on the left and right sides varies between hosts. For example, in host 1 there are about three times as many parasites on the left as on the right, whereas in host 3 the ratio is approximately 1.6:1.
Pooled, heterogeneity and total G-test statistics are all highly significant (p<0.001).
Individual G-tests demonstrate a highly significant (FDR corrected p-value < 0.0001) difference between the numbers of parasites found on the left and right sides in all 10 hosts.
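The spread of left:right ratios in this dataset can be checked directly from the counts in the output above. This is a small base R sketch with hand-entered vectors, not a call into aspi.

```r
# Counts copied from the simulated_left_bias_heterogeneous_proportions output above
left  <- c(165, 235, 203, 219, 246, 233, 189, 193, 170, 199)
right <- c(53, 126, 128, 101, 144, 110, 52, 84, 97, 91)

# Left:right ratio for each host
ratio <- left / right
round(ratio, 2)

# Smallest and largest ratio across the ten hosts
range(ratio)
```

Every ratio exceeds 1, but the ratios differ considerably between hosts, which is the pattern the significant heterogeneity G-test reflects.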
g.test(simulated_left_bias_homogeneous_proportions)
## $summary
## Test df G p
## 1 Pooled 1 234.461658 6.344087e-53
## 2 Heterogeneity 9 4.050156 9.080804e-01
## 3 Total 10 238.511814 1.406742e-45
##
## $hosts
## Host Left Right G p BH Holm
## 1 1 183 84 37.59896 8.689057e-10 8.689057e-09 8.689057e-09
## 2 2 175 86 30.96600 2.625882e-08 8.752939e-08 2.100705e-07
## 3 3 133 68 21.40251 3.722836e-06 5.318337e-06 1.553538e-05
## 4 4 83 36 19.07854 1.254475e-05 1.568093e-05 3.763424e-05
## 5 5 100 42 24.39720 7.838253e-07 1.567651e-06 4.773409e-06
## 6 6 69 34 12.13336 4.952778e-04 4.952778e-04 4.952778e-04
## 7 7 111 49 24.66557 6.819156e-07 1.567651e-06 4.773409e-06
## 8 8 163 77 31.51256 1.981540e-08 8.752939e-08 1.783386e-07
## 9 9 138 81 15.00783 1.070662e-04 1.189624e-04 2.141323e-04
## 10 10 157 85 21.74927 3.107077e-06 5.178461e-06 1.553538e-05
Like the previous dataset, this example shows a left bias. However, in this example the ratio of the number of parasites on the left side to the number on the right is approximately 2:1 in all hosts.
The pooled and total G-test statistics are highly significant. However, the heterogeneity G-test statistic is not significant at \(\alpha = 0.05\).
All individual G-tests are significant (FDR corrected p-value < 0.001), demonstrating that the left bias occurs in all 10 hosts.
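The non-significant heterogeneity result can be read straight off the summary: heterogeneity is the part of the total G that is left once the pooled component is removed. The sketch below redoes that subtraction in base R using the printed values, and assumes the p-value is the upper tail of a chi-squared distribution with 9 degrees of freedom.

```r
# Summary values copied from the g.test output above
g_pooled <- 234.461658  # df = 1
g_total  <- 238.511814  # df = 10

# Heterogeneity is the total minus the pooled component
g_heterogeneity  <- g_total - g_pooled
df_heterogeneity <- 10 - 1

# Upper-tail chi-squared probability (assumed reference distribution)
p_heterogeneity <- pchisq(g_heterogeneity, df_heterogeneity, lower.tail = FALSE)

g_heterogeneity
p_heterogeneity
```

Almost all of the total G is accounted for by the pooled test, which is what a consistent bias of similar strength in every host would be expected to produce.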
g.test(simulated_asymmetry_inconsistent_bias)
## $summary
## Test df G p
## 1 Pooled 1 0.0784249 7.794434e-01
## 2 Heterogeneity 9 104.2800180 2.137462e-18
## 3 Total 10 104.3584429 7.287066e-18
##
## $hosts
## Host Left Right G p BH Holm
## 1 1 105 117 0.64896489 4.204830e-01 5.256038e-01 1.000000e+00
## 2 2 200 142 9.88395833 1.667259e-03 5.557530e-03 1.333807e-02
## 3 3 74 195 56.42992255 5.823735e-14 5.823735e-13 5.823735e-13
## 4 4 172 182 0.28252346 5.950520e-01 6.611689e-01 1.000000e+00
## 5 5 88 84 0.09303164 7.603579e-01 7.603579e-01 1.000000e+00
## 6 6 194 140 8.76897735 3.063972e-03 7.659930e-03 2.144780e-02
## 7 7 199 166 2.98763967 8.390238e-02 1.398373e-01 4.195119e-01
## 8 8 75 106 5.33565903 2.089344e-02 4.178688e-02 1.253606e-01
## 9 9 157 187 2.61960554 1.055507e-01 1.507867e-01 4.222027e-01
## 10 10 178 108 17.30816045 3.178191e-05 1.589095e-04 2.860372e-04
In this example some hosts have many more parasites on the left than right, whereas others have more on the right than left.
The heterogeneity and total G-test statistics are both highly significant, but the pooled G-test is not significant at \(\alpha = 0.05\).
If we choose an FDR adjusted p-value of 0.05 as our significance threshold, we find five of ten hosts have asymmetric distributions of parasites. Of these five, three show a left bias and two a right bias. This is an example of inconsistent bias.
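The adjusted p-values and the left/right classification for this dataset can be reproduced from the raw p-values with base R's p.adjust. This is a sketch using values copied from the output above; that g.test uses p.adjust internally is an assumption, although the BH and Holm columns it prints agree with these results.

```r
# Counts and raw p-values copied from the g.test output above
hosts <- data.frame(
  Left  = c(105, 200, 74, 172, 88, 194, 199, 75, 157, 178),
  Right = c(117, 142, 195, 182, 84, 140, 166, 106, 187, 108),
  p     = c(4.204830e-01, 1.667259e-03, 5.823735e-14, 5.950520e-01, 7.603579e-01,
            3.063972e-03, 8.390238e-02, 2.089344e-02, 1.055507e-01, 3.178191e-05)
)

# Multiplicity adjustments from base R
hosts$BH   <- p.adjust(hosts$p, method = "BH")
hosts$Holm <- p.adjust(hosts$p, method = "holm")

# Classify the hosts that are significant at an FDR threshold of 0.05
significant <- hosts$BH < 0.05
n_left  <- sum(significant & hosts$Left > hosts$Right)
n_right <- sum(significant & hosts$Right > hosts$Left)

c(significant = sum(significant), left = n_left, right = n_right)
```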
Benjamini, Y., and Y. Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society B 57: 289–300.
Holm, S. 1979. “A Simple Sequentially Rejective Multiple Test Procedure.” Scandinavian Journal of Statistics 6: 65–70.
Sokal, R.R., and F.J. Rohlf. 1995. Biometry. 3rd ed. New York: W.H. Freeman and Company.