The experiment consisted of daily sampling, but condensed to weeks. Returning adult salmon were tagged and released at the upstream trap. They were recaptured during carcass surveys
A stratified-Petersen estimator (Darroch, 1961; Plante et al. 1996) is found for the unpooled data and some pooling of the rows/columns.
This will load the SPAS fitting functions and any associated packages needed by SPAS.
The data should be stored as an \(s+1\) by \(t+1\) (before pooling) matrix. The \(s \times t\) upper left matrix is the number of animals released in row stratum \(i\) and recovered in column stratum \(j\). Row \(s+1\) contains the total number of UNMARKED animals recovered in column stratum \(j\). Column \(t+1\) contains the number of animals marked in each row stratum but not recovered in any column stratum. The \(rawdata[s+1, t+1]\) is not used and can be set to 0 or NA. The sum of the entries in each of the first \(s\) rows is then the number of animals marked in each row stratum. The sum of the entries in each of the first \(t\) columns is then the number of animals captured (marked and unmarked) in each column stratum. The row/column names of the matrix may be set to identify the entries in the output.
Here is the raw data for the Harrison River. Notice that very small number of releases and recoveries in week 6.
harrison.2011.chinook.F.csv <- textConnection("
4 , 2 , 1 , 1 , 0 , 0 , 130
12 , 7 , 14 , 1 , 3 , 0 , 330
7 , 11 , 41 , 9 , 1 , 1 , 790
1 , 13 , 40 , 12 , 9 , 1 , 667
0 , 1 , 8 , 8 , 3 , 0 , 309
0 , 0 , 0 , 0 , 0 , 1 , 65
744 , 1187 , 2136 , 951 , 608 , 127 , 0")
har.data <- as.matrix(read.csv(harrison.2011.chinook.F.csv, header=FALSE))
har.data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
A total of 138 fish were tagged and released in week 1. Of these fish, 4 were recovered down stream in column stratum 1; 2 were recovered in column stratum 2; and 130 were never seen again. A total of 744 UNMARKED fish were recovered in column stratum 1.
You can add row and column names to the matrix which will used when the data are printed.
The Stratified-Petersen model is fit to the above data object. The very sparse last row will make fitting the model to the original data difficult with a very flat likelihood surface and so we relax the convergence criteria:
The row.pool.in
and col.pool.in
inform the
function on which rows or columns to pool before the analysis proceeds.
Both parameters use a vector of codes (length \(s\) for row pooing and length \(t\) for column pooling) where rows/columns
are pooled that share the same value.
For example. row.pool.in=c(1,2,3,4,5,6)
would imply that
no rows are pooled, while
row.pool.in=c('a','a','a','b','b','b')
would imply that the
first three rows and last three rows are pooled. The entries in the
vector can be numeric or character; however, using character entries
implies that the final pooled matrix is displayed in the order of the
character entries. I find that using entries such as '123'
to represent pooling rows 1, 2, and 3 to be easiest to use.
The SPAS system only fits models were the number of rows after pooling is less than or equal to the number of columns after pooling.
The results of the model fit is a LARGE list. But the function
SPAS.print.model
produces a nice report
SPAS.print.model(mod..1)
#> Model Name: No restrictions
#> Date of Fit: 2024-01-25 12:19
#> Version of OPEN SPAS used : SPAS-R 2023-03-31
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> Row pooling setup : 1 2 3 4 5 6
#> Col pooling setup : 1 2 3 4 5 6
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 70135 (SE 4503 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool5 pool6 V7
#> pool1 4 2 1 1 0 0 130
#> pool2 12 7 14 1 3 0 330
#> pool3 7 11 41 9 1 1 790
#> pool4 1 13 40 12 9 1 667
#> pool5 0 1 8 8 3 0 309
#> pool6 0 0 0 0 0 1 65
#> 744 1187 2136 951 608 127 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 4121.288
#> Condition number of XX' after logical pooling 4121.288
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 47243.45 ; np: 48 ; AICc: -94390.9
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool5 pool6 psi cap.prob exp factor
#> pool1 3.7 2.6 0.8 0.9 0.0 0 130 0.005 191.2
#> pool2 12.0 7.0 14.0 1.0 3.0 0 330 1.000 0.0
#> pool3 7.0 11.0 41.0 9.0 1.0 1 790 1.000 0.0
#> pool4 1.0 13.8 37.5 11.8 10.9 1 667 0.021 47.6
#> pool5 0.0 1.0 7.7 7.9 3.3 0 309 0.037 26.3
#> pool6 0.0 0.0 0.0 0.0 0.0 1 65 0.012 79.4
#> est unmarked 744.0 1186.0 2139.0 951.0 606.0 127 0 NA NA
#> Pop Est
#> pool1 26523
#> pool2 367
#> pool3 860
#> pool4 36104
#> pool5 8998
#> pool6 5307
#> est unmarked 78159
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool5 pool6 psi cap.prob exp factor
#> pool1 1.6 1.3 0.8 1.0 0.0 0 11.4 0.002 83.1
#> pool2 3.5 2.6 3.7 1.0 1.7 0 18.2 0.000 0.0
#> pool3 2.6 3.3 6.4 3.0 1.0 1 28.1 0.000 0.0
#> pool4 1.0 3.6 5.9 3.5 2.1 1 25.8 0.007 16.7
#> pool5 0.0 1.0 2.7 2.9 2.0 0 17.6 0.072 53.7
#> pool6 0.0 0.0 0.0 0.0 0.0 1 8.1 0.015 94.7
#> est unmarked NA NA NA NA NA NA 0.0 NA NA
#> Pop Est
#> pool1 11462
#> pool2 0
#> pool3 0
#> pool4 12411
#> pool5 17661
#> pool6 6253
#> est unmarked 14676
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 0.843434
#> Chisquare gof df : 0
#> Chisquare gof p : NA
The original data, the data after pooling, estimates and their standard errors are shown. Here the stratified-Petersen estimate of the total number of smolts passing the at the first sampling station is 78,159 with a standard error of 14,676.
Each entry in the output is available in fitted model object. You will need to explore the structure
cat("Names of objects at highest level\n")
#> Names of objects at highest level
names(mod..1)
#> [1] "version" "date" "input" "kappa"
#> [5] "kappa.after.lp" "est" "vcv" "se"
#> [9] "fit.setup" "conditional" "model.info" "gof"
cat("\n\nNames of estimates (both beta and real)\n")
#>
#>
#> Names of estimates (both beta and real)
names(mod..1$est)
#> [1] "N.Chapman" "beta" "real"
cat("\n\nNames of real estimates\n")
#>
#>
#> Names of real estimates
names(mod..1$est$real)
#> [1] "psi" "theta" "cap" "all.expanded" "N"
#> [6] "N.stratum" "exp.factor"
The N
entries refer to the population size; the
N.stratum
entries refer to the individual stratum
population sizes; the cap
entries refer to the estimated
probability of capture in each row stratum; the exp.factor
entries refer to (1-cap)/cap
, or the expansion factor for
each row; the psi
entries refer to the number of animals
tagged but never seen again (the right most column in the input data);
and the theta
entries refer to the expected number of
animals that were tagged in row stratum \(i\) and recovered in column stratum \(j\) (after pooling).
As noted by Darroch (1961), the stratified-Petersen will fail if the matrix of movements is close to singular. This often happens if two rows are proportional to each other. In this case, there is no unique MLE for the probability of capture in the two rows, and they should be pooled. A detailed discussion of pooling is found in Schwarz and Taylor (1998).
Let us now pool the first two rows and the last two rows, but leave rows 3 and 4 alone. Similar, let us pool the last two columns.
The code is
mod..2 <- SPAS.fit.model(har.data, model.id="Pooling some rows",
row.pool.in=c("12","12","3","4","56","56"),
col.pool.in=c(1,2,3,4,56,56))
Notice how we specify the pooling for rows and columns and the choice of entries for the two corresponding vectors. I used character entries for the row pooling to ensure that the pooled matrix is sorted properly (see below); the numeric entries for the columns is ok as is.
The results of the model fit is
SPAS.print.model(mod..2)
#> Model Name: Pooling some rows
#> Date of Fit: 2024-01-25 12:19
#> Version of OPEN SPAS used : SPAS-R 2023-03-31
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> Row pooling setup : 12 12 3 4 56 56
#> Col pooling setup : 1 2 3 4 56 56
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 70135 (SE 4503 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool56 V7
#> pool12 16 9 15 2 3 460
#> pool3 7 11 41 9 2 790
#> pool4 1 13 40 12 10 667
#> pool56 0 1 8 8 4 374
#> 744 1187 2136 951 735 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 231.4653
#> Condition number of XX' after logical pooling 231.4653
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 48053.08 ; np: 28 ; AICc: -96050.16
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool12 14.0 12.6 13.1 2.1 3.3 460 0.019 51.3 26405
#> pool3 7.0 11.0 41.0 9.0 2.0 790 1.000 0.0 860
#> pool4 0.9 15.5 36.9 12.2 10.5 667 0.034 28.8 22132
#> pool56 0.0 1.5 6.8 8.3 4.4 374 0.016 59.9 24042
#> est unmarked 746.0 1180.0 2142.0 951.0 734.0 0 NA NA 73440
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool12 4.0 3.2 3.3 1.4 1.7 21.4 0.006 15.7 7953
#> pool3 2.6 3.3 6.4 3.0 1.4 28.1 0.000 0.0 0
#> pool4 0.9 4.3 6.8 3.5 3.0 25.8 0.019 17.0 12643
#> pool56 0.0 1.6 2.3 2.7 1.7 19.3 0.011 41.5 16388
#> est unmarked NA NA NA NA NA 0.0 NA NA 10424
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 2.764046
#> Chisquare gof df : 1
#> Chisquare gof p : 0.09640417
Here the stratified-Petersen estimate of the total number of smolts passing the at the first sampling station is 73,440 with a standard error of 73,440. which is a slight reduction from the unpooled estimates.
You can physically pool to a single row (and multiple columns) or a single row and single column (which is equivalent to the pooled Petersen estimator). The code and output follow:
mod..3 <- SPAS.fit.model(har.data, model.id="A single row",
row.pool.in=rep(1, nrow(har.data)-1),
col.pool.in=c(1,2,3,4,56,56))
#> Using nlminb to find conditional MLE
#> outer mgc: 5743.898
#> outer mgc: 5161.972
#> outer mgc: 3336.904
#> outer mgc: 6455.214
#> outer mgc: 1528.345
#> outer mgc: 191.7022
#> outer mgc: 5.098792
#> outer mgc: 0.004918951
#> outer mgc: 9.123227e-09
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod..3)
#> Model Name: A single row
#> Date of Fit: 2024-01-25 12:19
#> Version of OPEN SPAS used : SPAS-R 2023-03-31
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> Row pooling setup : 1 1 1 1 1 1
#> Col pooling setup : 1 2 3 4 56 56
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 70135 (SE 4503 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool56 V7
#> 1 24 34 104 31 19 2291
#> 744 1187 2136 951 735 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 1
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 51374.83 ; np: 7 ; AICc: -102735.7
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> 1 27.3 43.4 79.6 34.9 26.8 2291 0.036 27.1 70426
#> est unmarked 741.0 1178.0 2160.0 947.0 727.0 0 NA NA 70426
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> 1 2.1 3.2 5.6 2.6 2.1 47.9 0.002 1.9 4545
#> est unmarked NA NA NA NA NA 0.0 NA NA 4545
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 13.07308
#> Chisquare gof df : 4
#> Chisquare gof p : 0.01092418
mod..4 <- SPAS.fit.model(har.data, model.id="Pooled Peteren",
row.pool.in=rep(1, nrow(har.data)-1),
col.pool.in=rep(1, ncol(har.data)-1))
#> Using nlminb to find conditional MLE
#> outer mgc: 5745.184
#> outer mgc: 4409.677
#> outer mgc: 1566.567
#> outer mgc: 868.5048
#> outer mgc: 336.9617
#> outer mgc: 84.23975
#> outer mgc: 10.91824
#> outer mgc: 0.2630679
#> outer mgc: 0.0001629491
#> outer mgc: 6.139089e-11
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod..4)
#> Model Name: Pooled Peteren
#> Date of Fit: 2024-01-25 12:19
#> Version of OPEN SPAS used : SPAS-R 2023-03-31
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> Row pooling setup : 1 1 1 1 1 1
#> Col pooling setup : 1 1 1 1 1 1
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 70135 (SE 4503 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> 1 V7
#> 1 212 2291
#> 5753 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 1
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 60410.94 ; np: 3 ; AICc: -120815.9
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> 1 psi cap.prob exp factor Pop Est
#> 1 212 2291 0.036 27.1 70426
#> est unmarked 5753 0 NA NA 70426
#>
#> SE of above estimates
#> 1 psi cap.prob exp factor Pop Est
#> 1 14.6 47.9 0.002 1.9 4545
#> est unmarked NA 0.0 NA NA 4545
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 1.852615e-23
#> Chisquare gof df : 0
#> Chisquare gof p : NA
Both models have an estimated abundance of 70,426 with a standard error of 4,545. which is a slight reduction from the unpooled estimates.
Logical pooling is specified the same way as physical pooling except you set the additional argument raw.physical.pool=FALSE.
The code and output are:
mod..5 <- SPAS.fit.model(har.data, model.id="Logical Pooling some rows",
row.pool.in=c("12","12","3","4","56","56"),
row.physical.pool=FALSE,
col.pool.in=c(1,2,3,4,56,56))
#> Using nlminb to find conditional MLE
#> outer mgc: 1732.283
#> outer mgc: 1844.584
#> outer mgc: 1774.949
#> outer mgc: 1054.331
#> outer mgc: 539.3424
#> outer mgc: 91.25263
#> outer mgc: 15.50252
#> outer mgc: 37.78282
#> outer mgc: 8.150885
#> outer mgc: 18.33387
#> outer mgc: 4.894627
#> outer mgc: 20.8335
#> outer mgc: 3.215089
#> outer mgc: 0.8917743
#> outer mgc: 3.374994
#> outer mgc: 0.8508261
#> outer mgc: 3.445172
#> outer mgc: 1.130339
#> outer mgc: 4.519949
#> outer mgc: 5.20765
#> outer mgc: 4.784239
#> outer mgc: 15.37811
#> outer mgc: 1.680505
#> outer mgc: 3.122837
#> outer mgc: 4.996219
#> outer mgc: 1.380738
#> outer mgc: 2.75818
#> outer mgc: 1.274076
#> outer mgc: 1.068575
#> outer mgc: 1.08059
#> outer mgc: 0.503993
#> outer mgc: 0.3235054
#> outer mgc: 0.1586303
#> outer mgc: 0.07574203
#> outer mgc: 0.03208431
#> outer mgc: 0.01271587
#> outer mgc: 0.004643863
#> outer mgc: 0.00172434
#> outer mgc: 0.0006375201
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod..5)
#> Model Name: Logical Pooling some rows
#> Date of Fit: 2024-01-25 12:19
#> Version of OPEN SPAS used : SPAS-R 2023-03-31
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> Row pooling setup : 12 12 3 4 56 56
#> Col pooling setup : 1 2 3 4 56 56
#> Physical pooling : FALSE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 70135 (SE 4503 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool56 V7
#> pool.12 4 2 1 1 0 130
#> pool.12 12 7 14 1 3 330
#> pool.3 7 11 41 9 2 790
#> pool.4 1 13 40 12 10 667
#> pool.56 0 1 8 8 3 309
#> pool.56 0 0 0 0 1 65
#> 744 1187 2136 951 735 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 2.279344e+17
#> Condition number of XX' after logical pooling 231.4653
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 47585.39 ; np: 40 ; AICc: -95090.77
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool.12 3.5 2.8 0.9 1.0 0.0 130 0.019 51.3 7216
#> pool.12 10.5 9.8 12.2 1.0 3.3 330 0.019 51.3 19189
#> pool.3 7.0 11.0 41.0 9.0 2.0 790 1.000 0.0 860
#> pool.4 0.9 15.5 36.9 12.2 10.5 667 0.034 28.8 22132
#> pool.56 0.0 1.5 6.8 8.3 3.3 309 0.016 59.9 20025
#> pool.56 0.0 0.0 0.0 0.0 1.1 65 0.016 59.9 4017
#> est unmarked 746.0 1180.0 2142.0 951.0 734.0 0 NA NA 73440
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool.12 1.8 1.9 0.9 1.0 0.0 11.4 0.006 15.7 2173
#> pool.12 3.4 3.0 3.2 1.0 1.7 18.2 0.006 15.7 5780
#> pool.3 2.6 3.3 6.4 3.0 1.4 28.1 0.000 0.0 0
#> pool.4 0.9 4.3 6.8 3.5 3.0 25.8 0.019 17.0 12643
#> pool.56 0.0 1.6 2.3 2.7 1.6 17.6 0.011 41.5 13649
#> pool.56 0.0 0.0 0.0 0.0 1.0 8.1 0.011 41.5 2738
#> est unmarked NA NA NA NA NA 0.0 NA NA 10424
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 2.764045
#> Chisquare gof df : 1
#> Chisquare gof p : 0.09640423
You can also logically pool to a single row (and multiple columns) or a single row and single column (which is equivalent to the pooled Petersen estimator).
Logical pooling is specified the same way as physical pooling except you set the additional argument raw.physical.pool=FALSE.
The code and output follow:
mod..6 <- SPAS.fit.model(har.data, model.id="A single row - Logical Pool",
row.pool.in=rep(1, nrow(har.data)-1), row.physical.pool=FALSE,
col.pool.in=c(1,2,3,4,56,56))
#> Using nlminb to find conditional MLE
#> outer mgc: 4875.671
#> outer mgc: 3448.625
#> outer mgc: 225.0952
#> outer mgc: 134.1817
#> outer mgc: 21.21392
#> outer mgc: 6.613975
#> outer mgc: 0.9444662
#> outer mgc: 2.197753
#> outer mgc: 0.7886682
#> outer mgc: 0.3014886
#> outer mgc: 0.1124325
#> outer mgc: 0.04158366
#> outer mgc: 0.01532865
#> outer mgc: 0.005643317
#> outer mgc: 0.002076634
#> outer mgc: 0.0008804818
#> outer mgc: 1.938314e-05
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod..6)
#> Model Name: A single row - Logical Pool
#> Date of Fit: 2024-01-25 12:19
#> Version of OPEN SPAS used : SPAS-R 2023-03-31
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> Row pooling setup : 1 1 1 1 1 1
#> Col pooling setup : 1 2 3 4 56 56
#> Physical pooling : FALSE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 70135 (SE 4503 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool56 V7
#> pool.1 4 2 1 1 0 130
#> pool.1 12 7 14 1 3 330
#> pool.1 7 11 41 9 2 790
#> pool.1 1 13 40 12 10 667
#> pool.1 0 1 8 8 3 309
#> pool.1 0 0 0 0 1 65
#> 744 1187 2136 951 735 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 2.279344e+17
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 47580.39 ; np: 37 ; AICc: -95086.79
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool.1 4.5 2.6 0.8 1.1 0.0 130 0.036 27.1 3883
#> pool.1 13.6 8.9 10.7 1.1 4.2 330 0.036 27.1 10326
#> pool.1 8.0 14.0 31.4 10.1 2.8 790 0.036 27.1 24198
#> pool.1 1.1 16.6 30.6 13.5 14.1 667 0.036 27.1 20906
#> pool.1 0.0 1.3 6.1 9.0 4.2 309 0.036 27.1 9257
#> pool.1 0.0 0.0 0.0 0.0 1.4 65 0.036 27.1 1857
#> est unmarked 741.0 1178.0 2160.0 947.0 727.0 0 NA NA 70426
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool.1 2.1 1.8 0.8 1.1 0.0 11.4 0.002 1.9 262
#> pool.1 3.0 3.1 2.8 1.1 2.3 18.2 0.002 1.9 696
#> pool.1 2.6 3.6 4.4 2.9 1.9 28.1 0.002 1.9 1632
#> pool.1 1.1 3.8 4.4 3.2 3.3 25.8 0.002 1.9 1410
#> pool.1 0.0 1.3 2.1 2.8 2.3 17.6 0.002 1.9 624
#> pool.1 0.0 0.0 0.0 0.0 1.4 8.1 0.002 1.9 125
#> est unmarked NA NA NA NA NA 0.0 NA NA 4545
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 13.07307
#> Chisquare gof df : 4
#> Chisquare gof p : 0.01092422
mod..7 <- SPAS.fit.model(har.data, model.id="Pooled Peteren - Logical Pool",
row.pool.in=rep(1, nrow(har.data)-1), row.physical.pool=FALSE,
col.pool.in=rep(1, ncol(har.data)-1))
#> Using nlminb to find conditional MLE
#> outer mgc: 5726.985
#> outer mgc: 4246.546
#> outer mgc: 1692.891
#> outer mgc: 600.5738
#> outer mgc: 871.0579
#> outer mgc: 162.3597
#> outer mgc: 264.4475
#> outer mgc: 31.01478
#> outer mgc: 35.76828
#> outer mgc: 0.9684972
#> outer mgc: 0.02789573
#> outer mgc: 9.564619e-07
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod..7)
#> Model Name: Pooled Peteren - Logical Pool
#> Date of Fit: 2024-01-25 12:19
#> Version of OPEN SPAS used : SPAS-R 2023-03-31
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> Row pooling setup : 1 1 1 1 1 1
#> Col pooling setup : 1 1 1 1 1 1
#> Physical pooling : FALSE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 70135 (SE 4503 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> 1 V7
#> pool.1 8 130
#> pool.1 37 330
#> pool.1 70 790
#> pool.1 76 667
#> pool.1 20 309
#> pool.1 1 65
#> 5753 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is Inf
#> Condition number of XX' after logical pooling 1
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 56584.83 ; np: 13 ; AICc: -113143.7
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> 1 psi cap.prob exp factor Pop Est
#> pool.1 8 130 0.036 27.1 3883
#> pool.1 37 330 0.036 27.1 10326
#> pool.1 70 790 0.036 27.1 24198
#> pool.1 76 667 0.036 27.1 20906
#> pool.1 20 309 0.036 27.1 9257
#> pool.1 1 65 0.036 27.1 1857
#> est unmarked 5753 0 NA NA 70426
#>
#> SE of above estimates
#> 1 psi cap.prob exp factor Pop Est
#> pool.1 2.8 11.4 0.002 1.9 262
#> pool.1 6.1 18.2 0.002 1.9 696
#> pool.1 8.4 28.1 0.002 1.9 1632
#> pool.1 8.7 25.8 0.002 1.9 1410
#> pool.1 4.5 17.6 0.002 1.9 624
#> pool.1 1.0 8.1 0.002 1.9 125
#> est unmarked NA 0.0 NA NA 4545
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 1.478714e-15
#> Chisquare gof df : 0
#> Chisquare gof p : NA
Both models have an estimated abundance of 73,440 with a standard error of 10,424. which is a slight reduction from the unpooled estimates.
Note that the estimates from physical and logical pooling are the same.
Looking at the results from the first 6 models, it appears that rows 3 and 4 are also approximately proportional and should be pooled.
The code and output are:
mod..8 <- SPAS.fit.model(har.data, model.id="Logical Pooling pairs rows",
row.pool.in=c("12","12","34","34","56","56"),
row.physical.pool=FALSE,
col.pool.in=c(1,2,3,4,56,56))
#> Using nlminb to find conditional MLE
#> outer mgc: 3232.035
#> outer mgc: 3083.611
#> outer mgc: 555.5226
#> outer mgc: 393.5743
#> outer mgc: 22.8327
#> outer mgc: 3.85415
#> outer mgc: 4.796064
#> outer mgc: 2.399755
#> outer mgc: 11.41882
#> outer mgc: 2.744014
#> outer mgc: 11.30605
#> outer mgc: 2.712976
#> outer mgc: 10.17069
#> outer mgc: 2.538887
#> outer mgc: 8.868373
#> outer mgc: 7.722014
#> outer mgc: 25.48587
#> outer mgc: 8.481816
#> outer mgc: 17.30982
#> outer mgc: 9.301379
#> outer mgc: 8.80078
#> outer mgc: 33.60389
#> outer mgc: 15.27487
#> outer mgc: 14.17405
#> outer mgc: 14.6973
#> outer mgc: 17.58537
#> outer mgc: 14.67702
#> outer mgc: 13.22858
#> outer mgc: 9.882845
#> outer mgc: 7.90472
#> outer mgc: 5.242273
#> outer mgc: 4.351066
#> outer mgc: 2.923214
#> outer mgc: 2.714339
#> outer mgc: 2.684718
#> outer mgc: 1.637894
#> outer mgc: 1.199361
#> outer mgc: 0.6203809
#> outer mgc: 0.3507094
#> outer mgc: 0.1111873
#> outer mgc: 0.03253222
#> outer mgc: 0.01306298
#> outer mgc: 0.006819012
#> outer mgc: 0.00357779
#> outer mgc: 0.001875019
#> outer mgc: 0.0009945233
#> outer mgc: 0.000737506
#> outer mgc: 0.0001519297
#> Convergence codes from nlminb 1 singular convergence (7)
#> Finding conditional estimate of N
SPAS.print.model(mod..8)
#> Model Name: Logical Pooling pairs rows
#> Date of Fit: 2024-01-25 12:19
#> Version of OPEN SPAS used : SPAS-R 2023-03-31
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> Row pooling setup : 12 12 34 34 56 56
#> Col pooling setup : 1 2 3 4 56 56
#> Physical pooling : FALSE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 70135 (SE 4503 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool56 V7
#> pool.12 4 2 1 1 0 130
#> pool.12 12 7 14 1 3 330
#> pool.34 7 11 41 9 2 790
#> pool.34 1 13 40 12 10 667
#> pool.56 0 1 8 8 3 309
#> pool.56 0 0 0 0 1 65
#> 744 1187 2136 951 735 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 2.279344e+17
#> Condition number of XX' after logical pooling 260.3574
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 47584.73 ; np: 39 ; AICc: -95091.46
#>
#> Code/Message from optimization is: 1 singular convergence (7)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool.12 3.1 2.9 0.9 0.9 0.0 130 0.018 54.9 7718
#> pool.12 9.4 10.3 13.2 0.9 3.2 330 0.018 54.9 20525
#> pool.34 6.7 11.5 40.7 8.9 2.0 790 0.115 7.7 7452
#> pool.34 1.0 13.6 39.7 11.8 10.1 667 0.115 7.7 6438
#> pool.56 0.0 2.5 7.2 6.7 3.4 309 0.010 103.0 34203
#> pool.56 0.0 0.0 0.0 0.0 1.1 65 0.010 103.0 6861
#> est unmarked 748.0 1180.0 2138.0 953.0 734.0 0 NA NA 83198
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool56 psi cap.prob exp factor Pop Est
#> pool.12 1.7 2.0 0.9 0.9 0.0 11.4 0.006 18.7 2579
#> pool.12 3.2 3.4 3.6 0.9 1.8 18.2 0.006 18.7 6858
#> pool.34 2.6 3.6 6.4 3.0 1.4 28.1 0.122 9.2 7887
#> pool.34 1.0 3.9 6.3 3.4 3.2 25.8 0.122 9.2 6814
#> pool.56 0.0 2.6 2.6 2.6 1.6 17.6 0.006 60.7 19983
#> pool.56 0.0 0.0 0.0 0.0 1.1 8.1 0.006 60.7 4009
#> est unmarked NA NA NA NA NA 0.0 NA NA 13337
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 3.834003
#> Chisquare gof df : 2
#> Chisquare gof p : 0.1470472
We extract the results from the models and make a summary report:
model.list <- mget( ls()[grepl("^mod\\.\\.",ls())])
names(model.list)
#> [1] "mod..1" "mod..2" "mod..3" "mod..4" "mod..5" "mod..6" "mod..7" "mod..8"
report <- plyr::ldply(model.list, function(x){
#browser()
data.frame(#version=x$version,
date = as.Date(x$date),
model.id = x$model.info$model.id,
s.a.pool =-1+nrow(x$fit.setup$pooldata),
t.p.pool =-1+ncol(x$fit.setup$pooldata),
logL.cond = x$model.info$logL.cond,
np = x$model.info$np,
AICc = x$model.info$AICc,
gof.chisq = round(x$gof$chisq,1),
gof.df = x$gof$chisq.df,
gof.p = round(x$gof$chisq.p,3),
kappa.after.lp = round(x$kappa.after.lp),
Nhat = round(x$est$real$N),
Nhat.se = round(x$se $real$N))
})
report
#> .id date model.id s.a.pool t.p.pool logL.cond
#> 1 mod..1 2024-01-25 No restrictions 6 6 47243.45
#> 2 mod..2 2024-01-25 Pooling some rows 4 5 48053.08
#> 3 mod..3 2024-01-25 A single row 1 5 51374.83
#> 4 mod..4 2024-01-25 Pooled Peteren 1 1 60410.94
#> 5 mod..5 2024-01-25 Logical Pooling some rows 6 5 47585.39
#> 6 mod..6 2024-01-25 A single row - Logical Pool 6 5 47580.39
#> 7 mod..7 2024-01-25 Pooled Peteren - Logical Pool 6 1 56584.83
#> 8 mod..8 2024-01-25 Logical Pooling pairs rows 6 5 47584.73
#> np AICc gof.chisq gof.df gof.p kappa.after.lp Nhat Nhat.se
#> 1 48 -94390.90 0.8 0 NA 4121 78159 14676
#> 2 28 -96050.16 2.8 1 0.096 231 73440 10424
#> 3 7 -102735.66 13.1 4 0.011 1 70426 4545
#> 4 3 -120815.88 0.0 0 NA 1 70426 4545
#> 5 40 -95090.77 2.8 1 0.096 231 73440 10424
#> 6 37 -95086.79 13.1 4 0.011 1 70426 4545
#> 7 13 -113143.67 0.0 0 NA 1 70426 4545
#> 8 39 -95091.46 3.8 2 0.147 260 83198 13337
Unfortunately, because pooling actually changes the data (internally), it is not possible to do likelihood ratio testing or to use AICc to compare different physical poolings.
Different logical poolings can be compared without problems. In this case, models 1, 5, 6 and 8 can be compared because they involve no pooling, or logical pooling of the rows. Model 8 has the (arithmetically) smallest AIC but is not much preferred over model 6 because the difference in AIC is small.
Both of these model have adequate goodness of fit. The kappa.after.lp for both models is adequate.
In this case, all of methods that lead to a single row give the same estimated population size and standard error and all are equivalent to the Pooled Petersen estimator. The Chapman modification to the Petersen estimator is also given in the output and results are similar.
Different column pooling will give the same fit and so cannot be compared. Refer to the vignette on why column poolings cannot be compared for more details.
Darroch, J. N. (1961). The two-sample capture-recapture census when tagging and sampling are stratified. Biometrika, 48, 241–260. https://www.jstor.org/stable/2332748
Plante, N., L.-P Rivest, and G. Tremblay. (1988). Stratified Capture-Recapture Estimation of the Size of a Closed Population. Biometrics 54, 47-60. https://www.jstor.org/stable/2533994
Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296. https://doi.org/10.1139/f97-238