ASPI - Analysis of Symmetry of Parasitic Infections

Introduction

When parasites invade paired structures of their host non-randomly, the resulting asymmetry may have both pathological and ecological significance. ASPI has been developed to facilitate the detection and visualization of asymmetric infections. ASPI can detect both consistent bias towards one side, and inconsistent bias in which the left side is favored in some hosts and the right in others. In this vignette ASPI is demonstrated on both observed and simulated parasite distributions. The first step is to load the aspi library:

library(aspi)

Detection of deviation from symmetry

Replicated G-tests of goodness-of-fit

Replicated G-tests of goodness-of-fit (Sokal and Rohlf 1995) provide a comprehensive analysis of deviations from bilateral symmetry. This procedure computes four different G-statistics:

Total G - tests the null hypothesis that, overall, the parasite distributions in all individual hosts do not depart from symmetry.
Pooled G - evaluates the null hypothesis that the ratio of the total number of parasites from each side does not differ from symmetry.
Heterogeneity G - examines the null hypothesis that the left:right ratios are the same in all individual hosts.
Individual G - used to identify which individual hosts have asymmetrical parasite distributions.
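The relationship between these four statistics can be sketched in base R without aspi. This is a minimal illustration, assuming the standard replicated G-test against a 1:1 expectation with no continuity correction. The helper g_stat and the three-host counts are made up for the example and are not part of the package.

```r
# Hypothetical helper (not an aspi function): G-statistic for left/right counts
# tested against a 1:1 expectation. Both counts must be greater than zero.
g_stat <- function(left, right) {
  expected <- (left + right) / 2
  2 * (left * log(left / expected) + right * log(right / expected))
}

# Made-up counts for three hosts
left    <- c(30, 45, 12)
right   <- c(20, 25, 28)
n_hosts <- length(left)

individual_g    <- g_stat(left, right)            # one value per host, df = 1 each
total_g         <- sum(individual_g)              # df = n_hosts
pooled_g        <- g_stat(sum(left), sum(right))  # df = 1
heterogeneity_g <- total_g - pooled_g             # df = n_hosts - 1

summary_table <- data.frame(
  Test = c("Pooled", "Heterogeneity", "Total"),
  df   = c(1, n_hosts - 1, n_hosts),
  G    = c(pooled_g, heterogeneity_g, total_g)
)
# Each G-statistic is referred to a chi-squared distribution with its own df
summary_table$p <- pchisq(summary_table$G, df = summary_table$df, lower.tail = FALSE)
print(summary_table)
```

The total statistic is the sum of the pooled and heterogeneity statistics, and the degrees of freedom add up in the same way (1 + (n - 1) = n).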

Binomial exact test

G-tests involve logarithmic transformation and so cannot be applied to counts of zero. Moreover, G-tests are not recommended when counts are below 10. For this situation, binomial exact tests are provided as an alternative, but are limited to testing two null hypotheses:

the ratio of the total number of parasites on each side doesn’t differ from 1:1 (equivalent to the pooled G-test);
the distribution of parasites in an individual host is symmetrical.
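Both points can be checked in base R. The sketch below uses made-up counts for a single host (0 parasites on the left, 7 on the right) and binom.test from the stats package. Whether aspi calls binom.test internally is not stated here and is only an assumption.

```r
# Made-up counts for one host
left     <- 0
right    <- 7
expected <- (left + right) / 2

# The zero count makes the G-statistic undefined: 0 * log(0) evaluates to NaN in R
g <- 2 * (left * log(left / expected) + right * log(right / expected))
print(is.nan(g))

# A two-sided exact binomial test of a 1:1 ratio copes with the zero count
exact <- binom.test(c(left, right), p = 0.5)
print(exact$p.value)  # equals 2 * 0.5^7 = 0.015625
```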

Trematode infections in fish eyes

Diplostomum spp. (Trematoda) are common parasites of freshwater fishes. Here we use two datasets describing the distribution of Diplostomum spp. metacercariae in the eyes of fifty ruffe, Gymnocephalus cernuus.

Infections of the eye excluding lens

The first dataset comprises the numbers of metacercariae found in each eye, excluding the lens (i.e. in the choroid, retina and humors). The dataset is formatted as a data.frame with two columns: the first column is the number of parasites found in the left eye and the second column is the number of parasites found in the right eye. Here are the data for the first ten hosts:

head(diplostomum_eyes_excl_lenses, 10)

##    left right
## 1    86    99
## 2   131   199
## 3   195   133
## 4   128   167
## 5   192   211
## 6   212   278
## 7   451   474
## 8   323   312
## 9   394   305
## 10  222   270

The row.names are host IDs. To detect bilateral asymmetry we apply replicated G-tests of goodness-of-fit:

results <- g.test(diplostomum_eyes_excl_lenses)

First we inspect the pooled, heterogeneity and total G-statistics:

results$summary

##            Test df          G            p
## 1        Pooled  1   8.189282 4.213860e-03
## 2 Heterogeneity 49 437.827514 7.448961e-64
## 3         Total 50 446.016796 5.819378e-65

The total G-test statistic is highly significant and so the null hypothesis of overall symmetry can be rejected. The pooled G-test (p-value = 0.004) suggests that the total number of parasites from each side differs from symmetry. Let’s calculate the total number of parasites found on each side:

#total number of parasites on left side
sum(diplostomum_eyes_excl_lenses$left)

## [1] 15026

#total number of parasites on right side
sum(diplostomum_eyes_excl_lenses$right)

## [1] 14534

#ratio
sum(diplostomum_eyes_excl_lenses$left) / sum(diplostomum_eyes_excl_lenses$right)

## [1] 1.033852

However, this slight left bias in the pooled data doesn’t necessarily mean that we would expect to find a left bias in most hosts. The p-value for the heterogeneity G-test is highly significant, revealing that the proportions of metacercariae in the left and right eyes vary from host to host. The individual G-tests identify which of the hosts have asymmetrical parasite distributions. Here are the results for the first ten hosts:

head(results$hosts, 10)

##    Host Left Right          G            p          BH        Holm
## 1     1   86    99  0.9142668 0.3389848373 0.452402486 1.000000000
## 2     2  131   199 14.1130061 0.0001721493 0.001434578 0.007746721
## 3     3  195   133 11.7903193 0.0005953952 0.003307751 0.025006599
## 4     4  128   167  5.1710572 0.0229662351 0.055002553 0.711953287
## 5     5  192   211  0.8961138 0.3438258892 0.452402486 1.000000000
## 6     6  212   278  8.9168733 0.0028254783 0.008310230 0.096066262
## 7     7  451   474  0.5719508 0.4494847458 0.548152129 1.000000000
## 8     8  323   312  0.1905607 0.6624503507 0.690052449 1.000000000
## 9     9  394   305 11.3627210 0.0007493296 0.003746648 0.030722515
## 10   10  222   270  4.6903841 0.0303318672 0.068936062 0.879624148

The seven columns of the above data.frame are:

Host - host identifier
Left - count of parasites on left side
Right - count of parasites on right side
G - Individual G-statistic
p - p-value
BH - p-value adjusted for multiplicity using procedure of Benjamini and Hochberg (1995)
Holm - p-value adjusted for multiplicity using the procedure of Holm (1979)
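As a cross-check, the G and p columns for these ten hosts can be recomputed in base R from the counts alone. This sketch assumes the individual statistic is the uncorrected G against a 1:1 expectation with one degree of freedom; g_stat is a hypothetical helper, not an aspi function. The BH and Holm columns are not reproduced here because those adjustments depend on the p-values of all 50 hosts.

```r
# Counts for the first ten hosts, copied from the output above
left  <- c(86, 131, 195, 128, 192, 212, 451, 323, 394, 222)
right <- c(99, 199, 133, 167, 211, 278, 474, 312, 305, 270)

# Hypothetical helper: G-statistic against a 1:1 expectation (counts must be > 0)
g_stat <- function(left, right) {
  expected <- (left + right) / 2
  2 * (left * log(left / expected) + right * log(right / expected))
}

G <- g_stat(left, right)
p <- pchisq(G, df = 1, lower.tail = FALSE)  # one degree of freedom per host

print(data.frame(Host = 1:10, Left = left, Right = right, G = G, p = p))
```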

The raw p-values have to be adjusted for multiplicity, because an individual G-test is applied to each host. If the null hypothesis of symmetry is true for all 50 hosts, at a significance level of 5% the probability of getting at least one Type I error (false rejection of the null hypothesis) is \(1-(1-0.05)^{50} = 0.923\); the expected number of significant results obtained purely by chance would be \(50 \times 0.05 = 2.5\). Holm’s procedure controls the familywise error rate (FWER), i.e. the probability of making at least one Type I error. However, guarding against even a single Type I error will be unnecessarily conservative for many studies, especially those surveying a large number of hosts. If a small number of false positives among the set of rejected null hypotheses can be tolerated, then it will be preferable to control the false discovery rate (FDR) using the procedure of Benjamini and Hochberg (1995).
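The arithmetic above and the two adjustments can be reproduced with base R. The raw p-values below are invented for illustration. p.adjust is the standard stats function, and it is an assumption of this sketch that the Holm and BH columns correspond to its "holm" and "BH" methods.

```r
alpha   <- 0.05
n_tests <- 50

fwer_uncorrected <- 1 - (1 - alpha)^n_tests  # chance of at least one false rejection
expected_false   <- n_tests * alpha          # expected number of false rejections
print(round(fwer_uncorrected, 3))
print(expected_false)

# Made-up raw p-values for five hosts
p_raw <- c(0.001, 0.012, 0.020, 0.040, 0.300)

adjusted <- data.frame(
  p    = p_raw,
  BH   = p.adjust(p_raw, method = "BH"),    # controls the false discovery rate
  Holm = p.adjust(p_raw, method = "holm")   # controls the familywise error rate
)
print(adjusted)
```

Holm-adjusted p-values are never smaller than the corresponding BH-adjusted values, which is why fewer hosts pass a given threshold under Holm’s procedure.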

At the conventional significance level of 5%, Holm’s procedure finds 11 of the 50 individual G-tests to be significant:

sum(results$hosts$Holm<0.05)

## [1] 11

Benjamini and Hochberg’s method provides more power to detect asymmetric infections, identifying a total of 19, although about one of these (5% of 19 is 0.95) is expected to be a false positive:

sum(results$hosts$BH<0.05)

## [1] 19

These asymmetric infections identified with the aid of Benjamini and Hochberg’s procedure can be classified according to whether they show left or right bias:

# Number of asymmetric infections showing left bias
sum(results$hosts$BH<0.05 & results$hosts$Left>results$hosts$Right)

## [1] 11

# Number of asymmetric infections showing right bias
sum(results$hosts$BH<0.05 & results$hosts$Right>results$hosts$Left)

## [1] 8

We conclude that Diplostomum spp. infections of the eyes (excluding the lenses) show bilateral asymmetry in a large proportion of the sampled ruffe. Moreover, the bias is inconsistent, with parasites favoring the left eye in some hosts and the right in others.

For visualization the proportion of parasites in each eye can be expressed as a binary log transformed ratio. These ratios can be plotted as a histogram:

plotHistogram(diplostomum_eyes_excl_lenses, nBreaks=20, main="Histogram")

Alternatively, these ratios can be combined with corresponding p-values from individual G-tests in a volcano plot:

plotVolcano(diplostomum_eyes_excl_lenses, test="G", pAdj="BH", sigThresh=0.1,
main="Volcano Plot")

The dashed horizontal line in the volcano plot represents the chosen p-value threshold. Parasite distributions deviating significantly from symmetry are shown as red squares, whereas those not differing significantly from a 1:1 ratio are represented by blue circles.
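The quantities behind both plots are easy to compute by hand. The sketch below uses the first ten hosts and assumes that the ratio is log2(left/right), so that positive values indicate left bias, and that the vertical axis of the volcano plot is -log10 of the adjusted p-value, as is conventional for volcano plots. The exact conventions used by plotHistogram and plotVolcano are not stated here.

```r
# Counts and BH-adjusted p-values for the first ten hosts, copied from the output above
left  <- c(86, 131, 195, 128, 192, 212, 451, 323, 394, 222)
right <- c(99, 199, 133, 167, 211, 278, 474, 312, 305, 270)
bh    <- c(0.452402486, 0.001434578, 0.003307751, 0.055002553, 0.452402486,
           0.008310230, 0.548152129, 0.690052449, 0.003746648, 0.068936062)

log2_ratio  <- log2(left / right)  # assumed orientation: > 0 means left bias
neg_log10_p <- -log10(bh)          # assumed vertical axis of the volcano plot
significant <- bh < 0.1            # same threshold as sigThresh above

print(data.frame(Host = 1:10, log2_ratio, neg_log10_p, significant))
```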

Infections of the lens of the eye

The second dataset comprises observations on the distribution of Diplostomum spp. metacercariae in the lenses of the eyes of the ruffe. Here are the data for the first ten hosts:

head(diplostomum_lenses, 10)

##    left right
## 1     0     0
## 2     0     0
## 3     0     0
## 4     0     0
## 5     0     0
## 6     0     0
## 7     0     0
## 8     0     0
## 9     6     5
## 10    0     2

Note that the first eight hosts are not infected. The total number of ruffe with Diplostomum spp. infection of the lens is:

sum(diplostomum_lenses$left>0 | diplostomum_lenses$right>0)

## [1] 31

G-tests involve logarithmic transformation and so cannot be applied to counts of zero, as found in this dataset. Furthermore, G-tests are not recommended when counts are below ten. The g.test function will ignore uninfected hosts in this dataset, but will raise an error because some hosts have an infection in only one eye. In this situation, the binomial exact test should be used instead:

results <- eb.test(diplostomum_lenses)
# pooled test p-value
results$pooled

## [1] 0.8439127

# results for the first ten infected hosts
head(results$hosts, 10)

##    Host Left Right         p        BH     Holm
## 9     9    6     5 1.0000000 1.0000000 1.000000
## 10   10    0     2 0.5000000 1.0000000 1.000000
## 28   28    0     3 0.2500000 0.9687500 1.000000
## 46   46   11     8 0.6476059 1.0000000 1.000000
## 15   15    0     1 1.0000000 1.0000000 1.000000
## 16   16    0     7 0.0156250 0.4843750 0.484375
## 17   17    3     2 1.0000000 1.0000000 1.000000
## 18   18    0     5 0.0625000 0.5535714 1.000000
## 19   19    4     7 0.5488281 1.0000000 1.000000
## 20   20    8     2 0.1093750 0.5535714 1.000000

# smallest FDR adjusted p-value for an individual host
min(results$hosts$BH)

## [1] 0.484375

No evidence of asymmetry in lens infections was found by either the pooled test or the tests of the individual hosts. Note that the eb.test function restricted analysis to the 31 infected hosts:

length(results$hosts[,1])

## [1] 31

Further examples

Here are further examples of the results returned from replicated G-tests of goodness-of-fit under different scenarios. Simulated datasets have been generated for parasitic infections showing:

symmetry
left bias with the left:right ratio varying between hosts
left bias with the left:right ratio similar in all hosts
asymmetry with inconsistent bias; left bias in some hosts and right in others
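The vignette does not say how these datasets were generated. One plausible way to produce data of the same shape, shown purely as an assumption, is to draw each host's left-side count from a binomial distribution whose probability encodes the scenario. The function simulate_hosts, the host totals and the probabilities below are all invented for this sketch.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: left counts are binomial draws, right counts are the remainder
simulate_hosts <- function(totals, prob_left) {
  left <- rbinom(length(totals), size = totals, prob = prob_left)
  data.frame(left = left, right = totals - left)
}

n_hosts <- 10
totals  <- rep(200, n_hosts)  # made-up number of parasites per host

sims <- list(
  symmetry          = simulate_hosts(totals, rep(0.5, n_hosts)),
  left_varying      = simulate_hosts(totals, runif(n_hosts, 0.55, 0.80)),
  left_similar      = simulate_hosts(totals, rep(2 / 3, n_hosts)),
  inconsistent_bias = simulate_hosts(
    totals, sample(c(0.35, 0.50, 0.65), n_hosts, replace = TRUE)
  )
)
str(sims)
```

Each element has the same two-column left/right layout as the datasets used earlier, so it should be usable as input to g.test in the same way.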

Symmetry

g.test(simulated_symmetrical_infection)

## $summary
##            Test df         G         p
## 1        Pooled  1 0.1463309 0.7020666
## 2 Heterogeneity  9 8.7916316 0.4567267
## 3         Total 10 8.9379625 0.5380011
## 
## $hosts
##    Host Left Right           G         p        BH Holm
## 1     1   86    89 0.051431091 0.8205915 0.9117684    1
## 2     2   96   112 1.231985889 0.2670212 0.5699383    1
## 3     3   92    88 0.088896206 0.7655851 0.9117684    1
## 4     4  112   104 0.296364074 0.5861708 0.8373869    1
## 5     5   90   109 1.816836608 0.1776903 0.5699383    1
## 6     6  117    96 2.073789831 0.1498488 0.5699383    1
## 7     7   98    97 0.005128228 0.9429110 0.9429110    1
## 8     8  106    91 1.143238156 0.2849691 0.5699383    1
## 9     9   98   108 0.485627728 0.4858842 0.8098070    1
## 10   10   84   102 1.744664663 0.1865489 0.5699383    1

In this simulated dataset there are similar numbers of parasites on each side. Pooled, heterogeneity and total G-test statistics are not significant at \(\alpha = 0.05\). Individual G-tests show that parasite distributions do not differ from symmetry in any of the ten hosts.

Left bias with the left:right ratio varying between hosts

g.test(simulated_left_bias_heterogeneous_proportions)

## $summary
##            Test df         G            p
## 1        Pooled  1 382.12807 4.282032e-85
## 2 Heterogeneity  9  33.71135 1.003471e-04
## 3         Total 10 415.83942 3.993780e-83
## 
## $hosts
##    Host Left Right        G            p           BH         Holm
## 1     1  165    53 60.38527 7.799434e-15 3.899717e-14 7.019491e-14
## 2     2  235   126 33.43061 7.385238e-09 1.055034e-08 2.954095e-08
## 3     3  203   128 17.14244 3.467862e-05 3.467862e-05 3.467862e-05
## 4     4  219   101 44.55652 2.471191e-11 6.177978e-11 1.729834e-10
## 5     5  246   144 26.98969 2.045435e-07 2.556794e-07 6.136306e-07
## 6     6  233   110 45.10548 1.867023e-11 6.177978e-11 1.493619e-10
## 7     7  189    52 82.73454 9.384707e-20 9.384707e-19 9.384707e-19
## 8     8  193    84 44.07347 3.162780e-11 6.325559e-11 1.897668e-10
## 9     9  170    97 20.21521 6.920056e-06 7.688951e-06 1.384011e-05
## 10   10  199    91 41.20618 1.369887e-10 2.283146e-10 6.849437e-10

In this example there are more parasites on the left than the right in every host. Furthermore, the proportion of parasites on the left and right sides varies between hosts. For example, in host 1 there are about three times as many parasites on the left as on the right, whereas in host 3 the ratio is approximately 1.6:1.

Pooled, heterogeneity and total G-test statistics are all highly significant (p<0.001).

A significant total G-test indicates that overall the parasite distributions deviate from symmetry in some way.
A significant pooled G-test shows that the sum of the counts of the parasites from the left side of hosts differs from the sum of the counts of the parasites from the right side of hosts.
A significant heterogeneity G-test reveals that the proportion of parasites found on the left and right sides varies from host to host.

Individual G-tests demonstrate a highly significant (FDR corrected p-value < 0.0001) difference between the numbers of parasites found on the left and right sides in all 10 hosts.

Left bias with the left:right ratio similar in all hosts

g.test(simulated_left_bias_homogeneous_proportions)

## $summary
##            Test df          G            p
## 1        Pooled  1 234.461658 6.344087e-53
## 2 Heterogeneity  9   4.050156 9.080804e-01
## 3         Total 10 238.511814 1.406742e-45
## 
## $hosts
##    Host Left Right        G            p           BH         Holm
## 1     1  183    84 37.59896 8.689057e-10 8.689057e-09 8.689057e-09
## 2     2  175    86 30.96600 2.625882e-08 8.752939e-08 2.100705e-07
## 3     3  133    68 21.40251 3.722836e-06 5.318337e-06 1.553538e-05
## 4     4   83    36 19.07854 1.254475e-05 1.568093e-05 3.763424e-05
## 5     5  100    42 24.39720 7.838253e-07 1.567651e-06 4.773409e-06
## 6     6   69    34 12.13336 4.952778e-04 4.952778e-04 4.952778e-04
## 7     7  111    49 24.66557 6.819156e-07 1.567651e-06 4.773409e-06
## 8     8  163    77 31.51256 1.981540e-08 8.752939e-08 1.783386e-07
## 9     9  138    81 15.00783 1.070662e-04 1.189624e-04 2.141323e-04
## 10   10  157    85 21.74927 3.107077e-06 5.178461e-06 1.553538e-05

Like the previous dataset, this example shows a left bias. However, in this example the ratio of the number of parasites on the left side to the number on the right is approximately 2:1 in all hosts.

The pooled and total G-test statistics are highly significant. However, the heterogeneity G-test statistic is not significant at \(\alpha = 0.05\).

A significant total G-test indicates that overall the parasite distributions deviate from symmetry in some way.
A significant pooled G-test shows that the sum of the counts of the parasites from the left side of hosts differs from the sum of the counts of the parasites from the right side of hosts.
A non-significant heterogeneity G-test suggests that the proportion of parasites found on the left and right sides does not vary between hosts.

All individual G-tests are significant (FDR corrected p-value < 0.001), demonstrating that the left bias occurs in all 10 hosts.

Asymmetry with inconsistent bias

g.test(simulated_asymmetry_inconsistent_bias)

## $summary
##            Test df           G            p
## 1        Pooled  1   0.0784249 7.794434e-01
## 2 Heterogeneity  9 104.2800180 2.137462e-18
## 3         Total 10 104.3584429 7.287066e-18
## 
## $hosts
##    Host Left Right           G            p           BH         Holm
## 1     1  105   117  0.64896489 4.204830e-01 5.256038e-01 1.000000e+00
## 2     2  200   142  9.88395833 1.667259e-03 5.557530e-03 1.333807e-02
## 3     3   74   195 56.42992255 5.823735e-14 5.823735e-13 5.823735e-13
## 4     4  172   182  0.28252346 5.950520e-01 6.611689e-01 1.000000e+00
## 5     5   88    84  0.09303164 7.603579e-01 7.603579e-01 1.000000e+00
## 6     6  194   140  8.76897735 3.063972e-03 7.659930e-03 2.144780e-02
## 7     7  199   166  2.98763967 8.390238e-02 1.398373e-01 4.195119e-01
## 8     8   75   106  5.33565903 2.089344e-02 4.178688e-02 1.253606e-01
## 9     9  157   187  2.61960554 1.055507e-01 1.507867e-01 4.222027e-01
## 10   10  178   108 17.30816045 3.178191e-05 1.589095e-04 2.860372e-04

In this example some hosts have many more parasites on the left than right, whereas others have more on the right than left.

The heterogeneity and total G-test statistics are both highly significant, but the pooled G-test is not significant at \(\alpha = 0.05\).

A significant total G-test statistic indicates that overall the parasite distributions deviate from symmetry in some way.
A non-significant pooled G-test shows there is no evidence of an overall bias towards one side when the counts from all hosts are summed.
A significant heterogeneity G-test reveals that the proportion of parasites on the left and right sides varies from host to host.

If we choose an FDR adjusted p-value of 0.05 as our significance threshold, we find five of ten hosts have asymmetric distributions of parasites. Of these five, three show a left bias and two a right bias. This is an example of inconsistent bias.
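This classification can be reproduced from the counts in the table with base R. The sketch assumes an uncorrected G against a 1:1 expectation and that the BH column corresponds to p.adjust with method "BH"; g_stat is a hypothetical helper, not an aspi function.

```r
# Counts for the ten simulated hosts, copied from the output above
left  <- c(105, 200,  74, 172, 88, 194, 199,  75, 157, 178)
right <- c(117, 142, 195, 182, 84, 140, 166, 106, 187, 108)

# Hypothetical helper: G-statistic against a 1:1 expectation (counts must be > 0)
g_stat <- function(left, right) {
  expected <- (left + right) / 2
  2 * (left * log(left / expected) + right * log(right / expected))
}

p  <- pchisq(g_stat(left, right), df = 1, lower.tail = FALSE)
bh <- p.adjust(p, method = "BH")  # FDR adjustment across the ten hosts

asymmetric <- bh < 0.05
n_left     <- sum(asymmetric & left > right)
n_right    <- sum(asymmetric & right > left)
print(c(asymmetric = sum(asymmetric), left_bias = n_left, right_bias = n_right))
```

With these counts the sketch gives five asymmetric hosts, three with left bias and two with right bias, in agreement with the text.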

References

Benjamini, Y., and Y. Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society B 57: 289–300.

Holm, S. 1979. “A Simple Sequentially Rejective Multiple Test Procedure.” Scandinavian Journal of Statistics 6: 65–70.

Sokal, R.R., and F.J. Rohlf. 1995. Biometry. 3rd ed. New York: W.H. Freeman and Company.