recapr
Abundance Estimates and Standard Errors
Simulation via Random Draws and Plotting Discrete Distributions
Hypothesis Testing and Power Calculation
Tests of Consistency for the Estimator
Partially and Completely Stratified Estimators
In a simple two-event mark-recapture experiment, point estimates of abundance can be calculated according to the Chapman, Petersen, or Bailey estimators using NChapman()
, NPetersen()
, or NBailey()
, given the sample sizes in the first and second events, and the number of individuals recaptured in the second event. The Petersen estimator is the Maximum Likelihood Estimator (MLE), but the Chapman estimator generally performs better and is often recommended. Both assume sampling without replacement in the second sampling event. By contrast, the Bailey estimator assumes sampling with replacement in the second event.
The argument n1
denotes the sample size in the first event, with n2
denoting the sample size in the second event and m2
denoting the number of marked (recaptured) individuals in the second event.
library(recapr)
NChapman(n1=100, n2=150, m2=20) # Abundance estimate
## [1] 725.2381
vChapman(n1=100, n2=150, m2=20) # estimated variance
## [1] 16348.22
seChapman(n1=100, n2=150, m2=20) # standard error
## [1] 127.8601
Confidence intervals can be generated for the Chapman, Petersen, or Bailey estimators using ciChapman()
, ciPetersen()
, or ciBailey()
. A point estimate is returned as well as two confidence intervals, one using a Wald-type Normal-distribution assumption, and the other calculated by means of bootstrapping capture histories. The bootstrapping interval is likely to have more appropriate bounds than the Normal interval, and has demonstrated better coverage in testing via simulation.
ciChapman(n1=100, n2=150, m2=20)
## $Nhat
## [1] 725.2381
##
## $ciNorm
## [1] 474.6368 975.8394
##
## $ciBoot
## [1] 524.8966 1172.1538
Vectors of random draws can be generated for the Chapman, Petersen, or Bailey estimators using rChapman()
, rPetersen()
, or rBailey()
from an assumed value of total abundance (N
). This may be useful for simulation. Sample sizes n1
and n2
may be specified, but capture probabilities p1
and/or p2
may be used instead. If so, sample size(s) will first be drawn from a binomial distribution for each random draw before drawing from the abundance estimator. This will result in a greater degree of dispersion, but may be appropriate in some cases.
A plotting function is also provided for vectors of discrete (non-continuous) data. Abundance estimates are calculated from count data, with the result having a non-integer but also non-continuous support. It is possible that plotdiscdensity()
may be more appropriate for plotting a discrete (non-continuous) density than a kernel density plot or histogram, as the discontinuity is made explicit.
<- rChapman(length=10000, N=1500, n1=100, n2=120)
draws plotdiscdensity(draws)
Approximate p-values can be returned using pChapman()
, pPetersen()
, or pBailey()
, which use many random draws to simulate a null distribution. The null hypothesis abundance is specified in the nullN
argument, along with sample sizes n1
and n2
. The observed abundance estimate can be specified using estN
, or else the number of recaptures can be used directly, as m2
. The alternative hypothesis can be specified using the alternative
argument, as alternative="less"
, "greater"
, or "2-sided"
.
In the example given below, the null-hypothesis abundance is 500, with 100 individuals observed in the first and second events, with 28 recaptures in the second event.
<- pChapman(nullN=500, n1=100, n2=100, m2=28, alternative="less")
output output
## $estN
## [1] 350.7586
##
## $pval
## [1] 0.02068
plotdiscdensity(rChapman(length=100000, N=500, n1=100, n2=100)) # null distribution
abline(v=500, lwd=2, lty=2) # Null hypothesis abundance plotted as a dashed line
abline(v=output$estN, lwd=2, col=2) # Observed (estimated) abundance plotted as a red line
Power calculation can be done with powChapman()
, powPetersen()
, or powBailey()
, which use random draws from an assumed true (alternative) distribution, given the sample sizes of both events. The nullN
argument specifies the abundance used by the null hypothesis, and the trueN
argument specifies the assumed true abundance used for the power calculation. The n1
and n2
arguments give the sample sizes, and alternative
gives the direction of the alternative hypothesis (defaults to "less"
), with alpha
specifying the level of the test to use.
In the example given below, the power is calculated for a one-tailed test of a null abundance of 500, assuming a true abundance of 400. The test powers are then calculated and plotted for assumed abundances from 250 to 450. If the true abundance is 325, a one-tailed test of \(H_0: N \geq 500\) will have a power of roughly 90%.
powChapman(nullN=500, trueN=400, n1=100, n2=100, nsim=1000)
## [1] 0.333
<- seq(from=250, to=450, by=25)
Ntotry <- sapply(Ntotry, function(x)
power powChapman(nullN=500, trueN=x, n1=100, n2=100, nsim=1000))
plot(Ntotry, power)
Given a best-guess at the true abundance and possibly the sample size in one sampling event, a recommendation for the sample size(s) can be calculated from the desired confidence and relative accuracy with the methods of Robson & Regier (1964) using n2RR()
. Desired estimate confidence and accuracy (elsewhere termed “precision”) of 95% and 10%, respectively, is analogous to estimation “such that the estimated abundance will be within 10% of the true value 95% of the time”.
Recommendations for sample size are provided for two scenarios: if the size of the other sample is as specified, and if the two sample sizes are equal, which is the most efficient for sampling in terms of total sample size (\(n_1+n_2\)). The example below gives the full output for all allowed values of confidence and relative accuracy.
n2RR(N=1000, n1=100)
## $conf_0.99
## Rel acc n2 from specified n1 n if n1=n2
## 1 0.50 305 181
## 2 0.25 550 270
## 3 0.20 639 308
## 4 0.15 746 364
## 5 0.10 863 455
## 6 0.05 961 622
## 7 0.01 999 891
##
## $conf_0.95
## Rel acc n2 from specified n1 n if n1=n2
## 1 0.50 180 136
## 2 0.25 387 210
## 3 0.20 484 244
## 4 0.15 617 298
## 5 0.10 780 385
## 6 0.05 933 555
## 7 0.01 998 862
##
## $conf_0.9
## Rel acc n2 from specified n1 n if n1=n2
## 1 0.50 118 109
## 2 0.25 293 177
## 3 0.20 387 210
## 4 0.15 525 260
## 5 0.10 711 344
## 6 0.05 908 511
## 7 0.01 996 839
##
## $conf_0.85
## Rel acc n2 from specified n1 n if n1=n2
## 1 0.50 81 90
## 2 0.25 233 155
## 3 0.20 320 186
## 4 0.15 455 234
## 5 0.10 652 314
## 6 0.05 882 477
## 7 0.01 995 820
##
## $conf_0.8
## Rel acc n2 from specified n1 n if n1=n2
## 1 0.50 57 76
## 2 0.25 189 139
## 3 0.20 268 168
## 4 0.15 395 213
## 5 0.10 596 289
## 6 0.05 856 448
## 7 0.01 994 803
##
## $conf_0.75
## Rel acc n2 from specified n1 n if n1=n2
## 1 0.50 41 65
## 2 0.25 155 125
## 3 0.20 225 153
## 4 0.15 343 195
## 5 0.10 543 267
## 6 0.05 827 422
## 7 0.01 992 785
Output can be simplified by providing values of confidence and relative accuracy in the conf
and acc
arguments.
n2RR(N=1000, n1=100, conf=c(0.9,0.95), acc=c(0.15,0.1,0.05))
## $conf_0.9
## Rel acc n2 from specified n1 n if n1=n2
## 1 0.15 525 260
## 2 0.10 711 344
## 3 0.05 908 511
##
## $conf_0.95
## Rel acc n2 from specified n1 n if n1=n2
## 1 0.15 617 298
## 2 0.10 780 385
## 3 0.05 933 555
An alternative approach using simulation is provided with plotn2sim()
, in which the relative accuracy is simulated for a range of sample size values for the second event, at different levels of confidence. The values to plot for n2 can be set by n2range
and n2step
, giving the range endpoints and step size, respectively.
plotn2sim(N=1000, n1=100)
Simulation is also used in plotn1n2simmatrix()
, in which the relative accuracy is calculated for a range of sample size values for both sampling events for a given level of confidence, in this case 95%.
plotn1n2simmatrix(N=1000)
For a simple Petersen-type estimator of abundance to be consistent, one of three conditions must be true: either there must be complete mixing in the time between events, or the probability of capture must be equal for all individuals in the first event, or the probability of capture must be equal for all individuals in the second event. This is typically investigated by means of a set of \(\chi^2\) tests for each condition, with failure to reject the null hypothesis of any of the tests indicating that a simple Petersen-type estimator is reasonable to use. If the null hypotheses are rejected in all tests, a completely stratified or partially stratified (Darroch-type) estimator should be used.
The consistencytest()
function is provided for the instance that sampling in both events is stratified in such a way that movement between strata may occur, such as spatial or temporal stratification. The stratification used in events 1 and 2 do not need to be the same. Arguments n1
and n2
accept vectors of sample sizes in each event by strata. Recaptures of marked individuals can be specified in one of two ways: either as vectors of strata membership in m2strata1
and m2strata2
, or as a matrix in stratamat
. Arguments m2strata1
and m2strata2
accept vectors of the first- and second-event stratum membership for each recaptured individual, with only entries of 1, 2, 3, ...
accepted. Alternatively, stratamat
accepts a matrix in which each cell represents the number of recaptures for each combination of event 1 and event 2 strata, with rows corresponding to event 1 strata and columns corresponding to event 2 strata.
In the example below, there were two strata in event 1 (with sample sizes of 284 and 199) and three strata in event 2, and 30 individuals were marked in stratum 1 of event 1 and were recaptured in stratum 1 of event 2. These tests yield strong evidence of incomplete mixing in the time between events and of unequal marked to unmarked ratios among recapture strata, but also no evidence against the probability of re-sighting a released animal being independent of its stratum of origin. Therefore, use of a simple Petersen-type estimator may be considered justified if closure is assumed.
<- matrix(c(30,15,1,0,22,15), nrow=2, ncol=3, byrow=TRUE)
mat consistencytest(n1=c(284,199), n2=c(347,3616,1489), stratamat=mat)
## MIXING TEST
## H0: Movement probabilities from stratum i to stratum j are the same among sections (all theta_ij = theta_j)
##
## 1 2 3 not recaptured
## 1 30 15 1 238
## 2 0 22 15 162
##
## X-squared: 44.43179 df: 3 p-val: 1.221827e-09
##
## EQUAL PROPORTIONS TEST (SPAS terminology)
## H0: Marked to unmarked ratio among recapture strata is constant (Sum_i a_i*theta_ij/U_j = k)
##
## 1 2 3
## recaptured 30 37 16
## unmarked 317 3579 1473
##
## X-squared: 125.4408 df: 2 p-val: 5.766008e-28
##
## COMPLETE MIXING TEST (SPAS terminology)
## H0: Probability of re-sighting a released animal is independent of its stratum of origin (Sum_i theta_ij*p_j = d)
##
## 1 2
## recaptured 46 37
## not recaptured 238 162
##
## X-squared: 0.3185941 df: 1 p-val: 0.5724538
##
Since the decision to use a simple Petersen-type estimator is based on a failure to reject, the power of the tests used must be considered. The powconsistencytest()
function uses simulation as well as Cohen’s non-central \(\chi^2\) method to calculate the power of the consistency tests, given anticipated values of sample sizes by strata and assumed movement probabilities.
The pmat
argument is a matrix of assumed movement probabilities between strata, with rows corresponding to first-event strata and columns corresponding to second-event strata, with an additional column corresponding to the probability of NOT being recaptured in the second event. Values are conditioned on row, that is, by first-event strata. For example, a pmat
with a first row equal to (0.05, 0.1, 0.15, 0.7)
would imply a 5% chance that individuals captured in the first-event strata 1 will be recaptured in second-event strata 1, and a 70% chance that individuals captured in the first-event strata 1 will not be recaptured in event 2.
Because of the row-wise scaling, specifying a row equal to (0.05,0.1, 0.15, 0.7)
would be equivalent to that row having values (1, 2, 3, 14)
.
<- matrix(c(1,2,3,10,3,2,1,10), nrow=2, ncol=4, byrow=TRUE)
mat powconsistencytest(n1=c(100,200), n2=c(100,100,100), pmat=mat)
## MIXING TEST
## H0: Movement probabilities from stratum i to stratum j are the same among sections (all theta_ij = theta_j)
## Sample size (first event): 300 Significance level: 0.05
##
## Null hypothesis cross-classification probabilities:
## 1 2 3 Not recap
## 1 0.0486 0.0417 0.0347 0.2083
## 2 0.0972 0.0833 0.0694 0.4167
##
## Alternative hypothesis cross-classification probabilities:
## 1 2 3 Not recap
## 1 0.0208 0.0417 0.0625 0.2083
## 2 0.1250 0.0833 0.0417 0.4167
##
## Null hypothesis expected counts:
## 1 2 3 Not recap
## 1 14.58 12.5 10.42 62.5
## 2 29.17 25.0 20.83 125.0
##
## Alternative hypothesis expected counts:
## 1 2 3 Not recap
## 1 6.25 12.5 18.75 62.5
## 2 37.50 25.0 12.50 125.0
##
## Power: 0.9496757 Power (from simulation): 0.9541
##
##
## EQUAL PROPORTIONS TEST
## H0: Equal probability of capture among n1 strata (all p_i equal)
## Sample size (second event): 300 Significance level: 0.05
##
## Null hypothesis capture probabilities:
## 1 2 3
## Recaptured 0.1250 0.1250 0.1250
## Unmarked 0.2083 0.2083 0.2083
##
## Alternative hypothesis capture probabilities:
## 1 2 3
## Recaptured 0.1458 0.1250 0.1042
## Unmarked 0.1875 0.2083 0.2292
##
## Null hypothesis expected counts:
## 1 2 3
## Recaptured 37.5 37.5 37.5
## Unmarked 62.5 62.5 62.5
##
## Alternative hypothesis expected counts:
## 1 2 3
## Recaptured 43.75 37.5 31.25
## Unmarked 56.25 62.5 68.75
##
## Power: 0.3532643 Power (from simulation): 0.3515
##
##
## COMPLETE MIXING TEST
## H0: Equal probability of recapture among n2 strata (all p_j equal)
## Sample size (first event): 300 Significance level: 0.05
##
## Null hypothesis capture probabilities:
## 1 2
## Marked 0.1250 0.2500
## Unmarked 0.2083 0.4167
##
## Alternative hypothesis recapture probabilities:
## 1 2
## Marked 0.1250 0.2500
## Unmarked 0.2083 0.4167
##
## Null hypothesis expected counts:
## 1 2
## Marked 37.5 75
## Unmarked 62.5 125
##
## Alternative hypothesis expected counts:
## 1 2
## Marked 37.5 75
## Unmarked 62.5 125
##
## Power: 0.05 Power (from simulation): 0.0526
##
##
In the event that there is no movement observed between strata, or the population is stratified in such a way that there can be no movement between strata (such as by age or sex), a stratified estimator may be needed. The strattest()
function provides a means of conducting the typical \(\chi^2\) tests to investigate the necessity of a stratified estimator.
Since strata membership remains the same in both events, usage is simpler, with n1
, n2
, and m2
accepting vectors of first- and second-event sample sizes and recapture numbers by strata. In the example below, there is strong evidence of unequal capture probabilities in the first event but not in the second, and very strong evidence of differences in respective stratum capture probabilities between the first and second events.
strattest(n1=c(100,100), n2=c(50,200), m2=c(20,15))
##
## Equal proportions test
## H0: Equal probability of capture among n1 strata (all p_i equal)
##
## 1 2
## recaptured 20 15
## unmarked 30 185
##
## X-squared: 32.44394 df: 1 p-val: 1.226811e-08
##
##
## Complete mixing test
## H0: Equal probability of recapture among n2 strata (all p_j equal)
##
## 1 2
## recaptured 20 15
## not recaptured 80 85
##
## X-squared: 0.5541126 df: 1 p-val: 0.4566422
##
##
## First vs. second event test
## H0: Probabilities of capture are the same between first and second events (all p_i equal to respective p_j)
##
## 1 2
## first event 100 100
## second event 50 200
##
## X-squared: 43.66013 df: 1 p-val: 3.906508e-11
##
Similarly, the powstrattest()
function provides power estimates for the tests used in strattest()
, given assumed values of total abundance by strata and anticipated values of sample sizes by strata, using simulation as well as Cohen’s non-central \(\chi^2\) method.
powstrattest(N=c(1000,2000), n1=c(100,200), n2=c(200,200))
## $Test1
## First-event capture probabilities
##
## Detecting capture probabilities for each strata: 0.1 0.1
## As compared to null values: 0.1 0.1
## sample size: 400 and significance level: 0.05
##
## power: 0.05
## power (from simulation): 0.04948
##
## $Test2
## Second-event capture probabilities
##
## Detecting capture probabilities for each strata: 0.2 0.1
## As compared to null values: 0.1333333 0.1333333
## sample size: 300 and significance level: 0.05
##
## power: 0.6707468
## power (from simulation): 0.65945
##
## $Test3
## First-event vs. second-event capture probabilities
##
## Detecting capture probabilities for each strata: 0.6666667 0.5
## As compared to null values: 0.5714286 0.5714286
## sample size: 700 and significance level: 0.05
##
## power: 0.9928497
## power (from simulation): 0.99563
If sampling is stratified in both sampling events and some movement exists between strata but mixing is incomplete in the time between events and tests indicate unequal capture probabilities in both events, a partially stratified (Darroch-type) estimator should be used. The NDarroch()
function provides a means for doing this, and argument usage is the same as that in consistencytest()
.
The function returns abundance estimates and standard errors for each stratum in both sampling events, as well as an overall abundance estimate and standard error. Implementation is currently using Darroch’s method for calculations, and will only accept non-singular input matrices.
<- matrix(c(59,30,1,45,280,38,0,42,25), nrow=3, ncol=3, byrow=TRUE)
mat NDarroch(n1=c(484,1468,399), n2=c(847,6616,2489), stratamat=mat)
## $Nhat_strata1
## [1] 3584.554 16769.123 33699.792
##
## $se_Nhat_strata1
## [1] 1914.125 6902.991 8979.480
##
## $Nhat_strata2
## [1] 6360.409 18069.681 29623.380
##
## $se_Nhat_strata2
## [1] 848.4842 4331.6190 7791.7721
##
## $Nhat
## [1] 54053.47
##
## $se_Nhat
## [1] 4890.999
If no movement can occur between strata, a completely stratified estimator can be used. Functions Nstrat()
, vstrat()
, sestrat()
, and cistrat()
, are provided to compute abundance estimates, estimated variance, standard error, and confidence intervals. In all cases, n1
, n2
, and m2
accept vectors of first- and second-event sample sizes and recapture numbers by strata, and the additional estimator
accepts the type of Petersen-type estimator to use: either "Chapman"
, "Petersen"
, or "Bailey"
. Additionally, rstrat()
generates random draws from given vectors of N
, n1
, and n2
, or with vectors of capture probabilities p1
and/or p2
.
Nstrat(n1=c(100,200), n2=c(100,500), m2=c(10,10))
## [1] 10080
vstrat(n1=c(100,200), n2=c(100,500), m2=c(10,10))
## [1] 6513699
sestrat(n1=c(100,200), n2=c(100,500), m2=c(10,10))
## [1] 2552.195
cistrat(n1=c(100,200), n2=c(100,500), m2=c(10,10))
## $Nstrat
## [1] 10080
##
## $Nhat_by_strat
## [1] 926.3636 9153.6364
##
## $ciNorm_strat
## [1] 5077.79 15082.21
##
## $ciBoot_strat
## [1] 6649.363 20866.843
<- rstrat(length=10000, N=c(500,1000), n1=c(100,200), n2=c(100,500))
draws plotdiscdensity(draws)
<- rstrat(length=100000, N=c(5000,10000), n1=c(500,200), n2=c(500,200))
draws plotdiscdensity(draws)