If you have yet not read the “Start Here” vignette, please do so by running
vignette("start-here", "spsurvey")
spsurvey’s analysis functions are used to analyze data in a variety
of contexts. We focus mainly on using the NLA_PNW
analysis
data to introduce some of these analysis functions. The
NLA_PNW
data contains response variables measured at
several lakes in the Pacific Northwest Region (PNW) of the United
States. There are several variables in NLA_PNW
you will use
throughout this vignette:
SITE_ID
: a unique site identifierWEIGHT
: the design weightsURBAN
: urban land categories (Urban
and
Non-Urban
)STATE
: state name (California
,
Oregon
, and Washington
)BMMI
: benthic macroinvertebrate multi-metric indexBMMI_COND
: benthic macroinvertebrate multi-metric index
condition categories (Poor
and Good
)PHOS_COND
: phosphorous condition categories
(Poor
and Good
)NITR_COND
: nitrogen condition categories
(Poor
Fair
, and Good
)Before proceeding, we load spsurvey by running
library(spsurvey)
Categorical variables are analyzed in spsurvey using the
cat_analysis()
function. The cat_analysis()
function estimates the proportion of observations and the total units
(i.e. extent) that belong to each level of the categorical variable
(total units refer to the total number (point resources), total line
length (linear network), or total area (areal network)). Several useful
pieces of information are returned by cat_analysis()
,
including estimates, standard errors, margins of error, and confidence
intervals. The analysis results contain columns with a .P
and .U
suffixes. The .P
suffix corresponds to
estimates of proportions for each category, while the .U
suffix corresponds to estimates of total units (i.e. extent) for each
category.
To estimate the proportion of total lakes in each nitrogen condition category and the total number of lakes in each nitrogen condition category, run
cat_ests <- cat_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars = "NITR_COND",
weight = "WEIGHT"
)
cat_ests
#> Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1 All_Sites All Sites NITR_COND Fair 24 23.69392 6.194024
#> 2 All_Sites All Sites NITR_COND Good 38 51.35111 7.430172
#> 3 All_Sites All Sites NITR_COND Poor 34 24.95496 5.919180
#> 4 All_Sites All Sites NITR_COND Total 96 100.00000 0.000000
#> MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1 12.14006 11.55386 35.83399 2530.428 693.5593 1359.351
#> 2 14.56287 36.78824 65.91398 5484.120 1223.3708 2397.763
#> 3 11.60138 13.35359 36.55634 2665.103 658.0966 1289.846
#> 4 0.00000 100.00000 100.00000 10679.652 1416.2707 2775.840
#> LCB95Pct.U UCB95Pct.U
#> 1 1171.077 3889.780
#> 2 3086.357 7881.883
#> 3 1375.258 3954.949
#> 4 7903.812 13455.491
The estimate of the proportion of lakes in Good
condition is 51.35% with a 95% confidence interval of (36.8%, 65.9%),
while the estimate of the total number of lakes in Good
condition is 5484 lakes with a 95% confidence interval of (3086, 7882).
In each case, the estimated standard error and margin of error is given.
The confidence level can be changed using the conf
argument. If more than one categorical variable is of interest, then
vars
can be a vector of variables and separate analyses are
performed for each variable.
Sometimes the goal is to estimate proportions and totals separately
for different subsets of the population – these subsets are called
subpopulations. To estimate the proportion of total lakes and in each
nitrogen condition category the total number of lakes in each nitrogen
condition category separately for California
,
Oregon
, and Washington
lakes, run
cat_ests_sp <- cat_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars = "NITR_COND",
weight = "WEIGHT",
subpop = "STATE"
)
cat_ests_sp
#> Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1 STATE California NITR_COND Fair 6 8.239162 4.712515
#> 2 STATE California NITR_COND Good 8 73.722638 13.359879
#> 3 STATE California NITR_COND Poor 5 18.038200 11.909638
#> 4 STATE California NITR_COND Total 19 100.000000 0.000000
#> 5 STATE Oregon NITR_COND Fair 8 27.152211 9.763817
#> 6 STATE Oregon NITR_COND Good 26 59.670307 9.717093
#> 7 STATE Oregon NITR_COND Poor 13 13.177483 4.901892
#> 8 STATE Oregon NITR_COND Total 47 100.000000 0.000000
#> 9 STATE Washington NITR_COND Fair 10 30.396139 11.938582
#> 10 STATE Washington NITR_COND Good 4 22.711979 11.878964
#> 11 STATE Washington NITR_COND Poor 16 46.891882 13.041148
#> 12 STATE Washington NITR_COND Total 30 100.000000 0.000000
#> MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1 9.236359 0.000000 17.47552 208.4605 87.63415 171.7598
#> 2 26.184881 47.537757 99.90752 1865.2692 940.12797 1842.6170
#> 3 23.342462 0.000000 41.38066 456.3876 299.29104 586.5997
#> 4 0.000000 100.000000 100.00000 2530.1172 978.65831 1918.1350
#> 5 19.136729 8.015482 46.28894 1298.8470 526.66732 1032.2490
#> 6 19.045152 40.625155 78.71546 2854.3752 674.02641 1321.0675
#> 7 9.607532 3.569950 22.78501 630.3551 198.49966 389.0522
#> 8 0.000000 100.000000 100.00000 4783.5773 706.53216 1384.7776
#> 9 23.399190 6.996949 53.79533 1023.1210 437.69351 857.8635
#> 10 23.282341 0.000000 45.99432 764.4755 453.48899 888.8221
#> 11 25.560181 21.331701 72.45206 1578.3606 556.63044 1090.9756
#> 12 0.000000 100.000000 100.00000 3365.9571 741.89397 1454.0855
#> LCB95Pct.U UCB95Pct.U
#> 1 36.70069 380.2202
#> 2 22.65222 3707.8861
#> 3 0.00000 1042.9873
#> 4 611.98220 4448.2523
#> 5 266.59801 2331.0960
#> 6 1533.30774 4175.4427
#> 7 241.30288 1019.4072
#> 8 3398.79970 6168.3549
#> 9 165.25751 1880.9845
#> 10 0.00000 1653.2976
#> 11 487.38500 2669.3362
#> 12 1911.87165 4820.0426
If more than one type of subpopulation is of interest, then
subpop
can be a vector of subpopulation variables and
separate analyses are performed for each subpopulation. If both
vars
and subpops
are vectors, separate
analyses are performed for each variable and subpopulation
combination.
Analysis results for all sites (ignoring subpopulations) can be
presented alongside the subpopulation analysis results using the
All_Sites
argument:
cat_ests_sp <- cat_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars = "NITR_COND",
weight = "WEIGHT",
subpop = "STATE",
All_Sites = TRUE
)
cat_ests_sp
#> Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1 STATE California NITR_COND Fair 6 8.239162 4.712515
#> 2 STATE California NITR_COND Good 8 73.722638 13.359879
#> 3 STATE California NITR_COND Poor 5 18.038200 11.909638
#> 4 STATE California NITR_COND Total 19 100.000000 0.000000
#> 5 STATE Oregon NITR_COND Fair 8 27.152211 9.763817
#> 6 STATE Oregon NITR_COND Good 26 59.670307 9.717093
#> 7 STATE Oregon NITR_COND Poor 13 13.177483 4.901892
#> 8 STATE Oregon NITR_COND Total 47 100.000000 0.000000
#> 9 STATE Washington NITR_COND Fair 10 30.396139 11.938582
#> 10 STATE Washington NITR_COND Good 4 22.711979 11.878964
#> 11 STATE Washington NITR_COND Poor 16 46.891882 13.041148
#> 12 STATE Washington NITR_COND Total 30 100.000000 0.000000
#> 13 All_Sites All Sites NITR_COND Fair 24 23.693923 6.194024
#> 14 All_Sites All Sites NITR_COND Good 38 51.351112 7.430172
#> 15 All_Sites All Sites NITR_COND Poor 34 24.954965 5.919180
#> 16 All_Sites All Sites NITR_COND Total 96 100.000000 0.000000
#> MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1 9.236359 0.000000 17.47552 208.4605 87.63415 171.7598
#> 2 26.184881 47.537757 99.90752 1865.2692 940.12797 1842.6170
#> 3 23.342462 0.000000 41.38066 456.3876 299.29104 586.5997
#> 4 0.000000 100.000000 100.00000 2530.1172 978.65831 1918.1350
#> 5 19.136729 8.015482 46.28894 1298.8470 526.66732 1032.2490
#> 6 19.045152 40.625155 78.71546 2854.3752 674.02641 1321.0675
#> 7 9.607532 3.569950 22.78501 630.3551 198.49966 389.0522
#> 8 0.000000 100.000000 100.00000 4783.5773 706.53216 1384.7776
#> 9 23.399190 6.996949 53.79533 1023.1210 437.69351 857.8635
#> 10 23.282341 0.000000 45.99432 764.4755 453.48899 888.8221
#> 11 25.560181 21.331701 72.45206 1578.3606 556.63044 1090.9756
#> 12 0.000000 100.000000 100.00000 3365.9571 741.89397 1454.0855
#> 13 12.140064 11.553860 35.83399 2530.4285 693.55933 1359.3513
#> 14 14.562870 36.788242 65.91398 5484.1199 1223.37078 2397.7627
#> 15 11.601379 13.353585 36.55634 2665.1033 658.09658 1289.8456
#> 16 0.000000 100.000000 100.00000 10679.6517 1416.27075 2775.8397
#> LCB95Pct.U UCB95Pct.U
#> 1 36.70069 380.2202
#> 2 22.65222 3707.8861
#> 3 0.00000 1042.9873
#> 4 611.98220 4448.2523
#> 5 266.59801 2331.0960
#> 6 1533.30774 4175.4427
#> 7 241.30288 1019.4072
#> 8 3398.79970 6168.3549
#> 9 165.25751 1880.9845
#> 10 0.00000 1653.2976
#> 11 487.38500 2669.3362
#> 12 1911.87165 4820.0426
#> 13 1171.07716 3889.7798
#> 14 3086.35722 7881.8826
#> 15 1375.25770 3954.9489
#> 16 7903.81199 13455.4913
To estimate the proportion of total lakes in each nitrogen condition
category and the total number of lakes in each nitrogen condition
category stratified by URBAN
category (whether the lake is
classified as Urban
or Non-Urban
), run
strat_cat_ests <- cat_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars = "NITR_COND",
weight = "WEIGHT",
stratumID = "URBAN"
)
strat_cat_ests
#> Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1 All_Sites All Sites NITR_COND Fair 24 23.69392 6.027083
#> 2 All_Sites All Sites NITR_COND Good 38 51.35111 7.472377
#> 3 All_Sites All Sites NITR_COND Poor 34 24.95496 5.882487
#> 4 All_Sites All Sites NITR_COND Total 96 100.00000 0.000000
#> MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1 11.81287 11.88106 35.50679 2530.428 683.8837 1340.387
#> 2 14.64559 36.70552 65.99670 5484.120 1229.9485 2410.655
#> 3 11.52946 13.42550 36.48443 2665.103 653.6357 1281.102
#> 4 0.00000 100.00000 100.00000 10679.652 1440.4796 2823.288
#> LCB95Pct.U UCB95Pct.U
#> 1 1190.041 3870.816
#> 2 3073.465 7894.775
#> 3 1384.001 3946.206
#> 4 7856.363 13502.940
To then compute these estimates separately for
California
, Oregon
, and
Washington
, run
strat_cat_ests_sp <- cat_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars = "NITR_COND",
weight = "WEIGHT",
stratumID = "URBAN",
subpop = "STATE"
)
strat_cat_ests_sp
#> Type Subpopulation Indicator Category nResp Estimate.P StdError.P
#> 1 STATE California NITR_COND Fair 6 8.239162 5.481100
#> 2 STATE California NITR_COND Good 8 73.722638 13.653267
#> 3 STATE California NITR_COND Poor 5 18.038200 12.512366
#> 4 STATE California NITR_COND Total 19 100.000000 0.000000
#> 5 STATE Oregon NITR_COND Fair 8 27.152211 9.592356
#> 6 STATE Oregon NITR_COND Good 26 59.670307 9.845022
#> 7 STATE Oregon NITR_COND Poor 13 13.177483 5.325784
#> 8 STATE Oregon NITR_COND Total 47 100.000000 0.000000
#> 9 STATE Washington NITR_COND Fair 10 30.396139 11.383474
#> 10 STATE Washington NITR_COND Good 4 22.711979 9.039351
#> 11 STATE Washington NITR_COND Poor 16 46.891882 12.109215
#> 12 STATE Washington NITR_COND Total 30 100.000000 0.000000
#> MarginofError.P LCB95Pct.P UCB95Pct.P Estimate.U StdError.U MarginofError.U
#> 1 10.74276 0.000000 18.98192 208.4605 87.91849 172.3171
#> 2 26.75991 46.962726 100.00000 1865.2692 953.91452 1869.6381
#> 3 24.52379 0.000000 42.56199 456.3876 306.37270 600.4795
#> 4 0.00000 100.000000 100.00000 2530.1172 994.94678 1950.0599
#> 5 18.80067 8.351538 45.95288 1298.8470 531.02949 1040.7987
#> 6 19.29589 40.374417 78.96620 2854.3752 685.86488 1344.2705
#> 7 10.43835 2.739137 23.61583 630.3551 177.79364 348.4691
#> 8 0.00000 100.000000 100.00000 4783.5773 737.53853 1445.5490
#> 9 22.31120 8.084941 52.70734 1023.1210 435.45383 853.4738
#> 10 17.71680 4.995177 40.42878 764.4755 433.94100 850.5087
#> 11 23.73363 23.158256 70.62551 1578.3606 554.91047 1087.6045
#> 12 0.00000 100.000000 100.00000 3365.9571 748.29764 1466.6364
#> LCB95Pct.U UCB95Pct.U
#> 1 36.1434 380.7775
#> 2 0.0000 3734.9073
#> 3 0.0000 1056.8671
#> 4 580.0574 4480.1771
#> 5 258.0483 2339.6457
#> 6 1510.1048 4198.6457
#> 7 281.8859 978.8242
#> 8 3338.0283 6229.1262
#> 9 169.6472 1876.5948
#> 10 0.0000 1614.9842
#> 11 490.7561 2665.9652
#> 12 1899.3207 4832.5935
Continuous variables are analyzed in spsurvey using the
cont_analysis()
function. The cont_analysis()
function estimates cumulative distribution functions (CDFs),
percentiles, and means of continuous variables. By default, all these
quantities are estimated (though this can be changed using the
statsitics
argument to cont_analysis()
). For
the quantities requiring estimation, several useful pieces of
information are returned by cont_analysis()
, including
estimates, standard errors, margins of error, and confidence intervals.
The .P
suffix corresponds to estimates of proportions for
each variable, while the .U
suffix corresponds to estimates
of total units (i.e. extent) for each variable.
To estimate the cumulative distribution function (CDF), certain
percentiles, and means of BMMI
, run
cont_ests <- cont_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars = "BMMI",
weight = "WEIGHT"
)
To view the analysis results for the mean estimates, run
cont_ests$Mean
#> Type Subpopulation Indicator nResp Estimate StdError MarginofError
#> 1 All_Sites All Sites BMMI 96 56.50929 1.782278 3.4932
#> LCB95Pct UCB95Pct
#> 1 53.01609 60.00249
Similarly, the CDF and select percentile estimates can be viewed (the output is omitted here) by running
cont_ests$CDF
cont_ests$Pct
To visualize the CDF estimates, run
plot(cont_ests$CDF)
The solid line indicates the CDF estimates, and the dashed lines
indicate lower and upper 95% confidence interval bounds for the CDF
estimates. cdf_plot()
can equivalently be used in place of
plot()
(cdf_plot()
is currently maintained for
backwards compatibility with previous spsurvey versions).
To estimate the CDF, certain percentiles, and means of
BMMI
separately for each state, run
cont_ests_sp <- cont_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars = "BMMI",
weight = "WEIGHT",
subpop = "STATE"
)
To view the analysis results for the mean estimates, run
cont_ests_sp$Mean
#> Type Subpopulation Indicator nResp Estimate StdError MarginofError LCB95Pct
#> 1 STATE California BMMI 19 50.48964 4.049094 7.936079 42.55357
#> 2 STATE Oregon BMMI 47 61.29675 2.581032 5.058730 56.23802
#> 3 STATE Washington BMMI 30 54.23036 3.143924 6.161977 48.06838
#> UCB95Pct
#> 1 58.42572
#> 2 66.35548
#> 3 60.39234
To estimate the CDF, certain percentiles, and means of
BMMI
for a design stratified by URBAN
category, run
strat_cont_ests <- cont_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars = "BMMI",
weight = "WEIGHT",
stratumID = "URBAN"
)
To view the analysis results for the mean estimates, run
strat_cont_ests$Mean
#> Type Subpopulation Indicator nResp Estimate StdError MarginofError
#> 1 All_Sites All Sites BMMI 96 56.50929 1.795959 3.520015
#> LCB95Pct UCB95Pct
#> 1 52.98928 60.02931
To then compute these estimates separately for each state, run
strat_cont_ests_sp <- cont_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars = "BMMI",
weight = "WEIGHT",
stratumID = "URBAN",
subpop = "STATE",
)
To view the analysis results for the mean estimates, run
strat_cont_ests_sp$Mean
#> Type Subpopulation Indicator nResp Estimate StdError MarginofError LCB95Pct
#> 1 STATE California BMMI 19 50.48964 4.110046 8.055543 42.43410
#> 2 STATE Oregon BMMI 47 61.29675 1.630357 3.195441 58.10131
#> 3 STATE Washington BMMI 30 54.23036 2.930517 5.743708 48.48665
#> UCB95Pct
#> 1 58.54519
#> 2 64.49219
#> 3 59.97407
Alongside the cat_analysis()
and
cont_analysis()
functions, spsurvey offers functions for
estimating attributable risk, relative risk, risk difference, change,
and trend. Attributable risk analysis, relative risk analysis, and risk
difference analysis quantify the attributable risk, relative risk, and
risk difference, respectively, of environmental resources being in poor
condition after exposure to a stressor. Attributable risk analysis is
performed using the attrisk_analysis()
function, relative
risk analysis is performed using the relrisk_analysis()
function, and risk difference analysis is performed using the
diffrisk_analysis()
function. Change and trend analysis
capture the behavior of environmental resources between two samples,
while trend analysis generalizes this approach to include more than two
samples. Often, change and trend analyses are performed to study an
environmental resource through time. Change analysis is performed using
the change_analysis()
function, and trend analysis is
performed using the trend_analysis()
function. The
attrisk_analysis()
, relrisk_analysis()
,
diffrisk_analysis()
, change_analysis()
and
trend_analysis()
functions all share very similar syntax
with the cat_analysis()
and cont_analysis()
functions.
The attributable risk is defined as \[1 - \frac{P(Response = Poor | Stressor = Good)}{P(Response = Poor)},\] where \(P(\cdot)\) is a probability and \(P(\cdot | \cdot)\) is a conditional probability. The attributable risk measures the proportion of the response variable in poor condition that could be eliminated if the stressor was always in good condition. To estimate the attributable risk of benthic macroinvertebrates with a phosphorous condition stressor, run
attrisk_ests <- attrisk_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars_response = "BMMI_COND",
vars_stressor = "PHOS_COND",
weight = "WEIGHT"
)
attrisk_ests
#> Type Subpopulation Response Stressor nResp Estimate StdError_log
#> 1 All_Sites All Sites BMMI_COND PHOS_COND 96 0.6201042 0.624808
#> MarginofError_log LCB95Pct UCB95Pct WeightTotal Count_RespPoor_StressPoor
#> 1 1.224601 -0.292713 0.8883582 10679.65 5
#> Count_RespPoor_StressGood Count_RespGood_StressPoor Count_RespGood_StressGood
#> 1 7 18 66
#> Prop_RespPoor_StressPoor Prop_RespPoor_StressGood Prop_RespGood_StressPoor
#> 1 0.03971418 0.01738181 0.158931
#> Prop_RespGood_StressGood
#> 1 0.783973
The relative risk is defined as \[\frac{P(Response = Poor | Stressor = Poor)}{P(Response = Poor | Stressor = Good)},\] which measures the risk of the response variable being in poor condition relative to the stressor’s condition. To estimate the relative risk of benthic macroinvertebrates being in poor condition with a phosphorous condition category stressor, run
relrisk_ests <- relrisk_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars_response = "BMMI_COND",
vars_stressor = "PHOS_COND",
weight = "WEIGHT"
)
relrisk_ests
#> Type Subpopulation Response Stressor nResp Estimate Estimate_num
#> 1 All_Sites All Sites BMMI_COND PHOS_COND 96 9.217166 0.1999252
#> Estimate_denom StdError_log MarginofError_log LCB95Pct UCB95Pct WeightTotal
#> 1 0.02169053 0.8753855 1.715724 1.657555 51.25389 10679.65
#> Count_RespPoor_StressPoor Count_RespPoor_StressGood Count_RespGood_StressPoor
#> 1 5 7 18
#> Count_RespGood_StressGood Prop_RespPoor_StressPoor Prop_RespPoor_StressGood
#> 1 66 0.03971418 0.01738181
#> Prop_RespGood_StressPoor Prop_RespGood_StressGood
#> 1 0.158931 0.783973
The risk difference is defined as \[P(Response = Poor | Stressor = Poor) - P(Response = Poor | Stressor = Good),\] which measures the risk of the response variable being in poor condition differenced by the stressor’s condition. To estimate the risk difference of benthic macroinvertebrates being in poor condition with a phosphorous condition category stressor, run
diffrisk_ests <- diffrisk_analysis(
NLA_PNW,
siteID = "SITE_ID",
vars_response = "BMMI_COND",
vars_stressor = "PHOS_COND",
weight = "WEIGHT"
)
diffrisk_ests
#> Type Subpopulation Response Stressor nResp Estimate
#> 1 All_Sites All Sites BMMI_COND PHOS_COND 96 0.1782347
#> Estimate_StressPoor Estimate_StressGood StdError MarginofError LCB95Pct
#> 1 0.1999252 0.02169053 0.1139557 0.223349 -0.0451143
#> UCB95Pct WeightTotal Count_RespPoor_StressPoor Count_RespPoor_StressGood
#> 1 0.4015837 10679.65 5 7
#> Count_RespGood_StressPoor Count_RespGood_StressGood Prop_RespPoor_StressPoor
#> 1 18 66 0.03971418
#> Prop_RespPoor_StressGood Prop_RespGood_StressPoor Prop_RespGood_StressGood
#> 1 0.01738181 0.158931 0.783973
By default, the levels of the variables in vars_response
and vars_stressor
are assumed to equal "Poor"
(event occurs) or "Good"
(event does not occur). If those
default levels do not match the levels of the variables in
vars_response
and vars_stressor
, the levels of
vars_response
and vars_stressor
must be
explicitly stated using the response_levels
and
stressor_levels
arguments, respectively. Similar to
cat_analysis()
and cont_analysis()
from the
previous sections, subpopulations and stratification are incorporated
using subpops
and stratumID
, respectively. For
more on attributable and relative risk in an environmental resource
context, see Van Sickle and Paulsen (2008).
To demonstrate change analysis, we use the NRSA_EPA7
data. There are several variables in NRSA_EPA7
you will use
next:
SITE_ID
: a unique site identifierWEIGHT
: the survey design weightsNITR_COND
: nitrogen condition category
(Good
, Fair
, and Poor
)BMMI
: benthic macroinvertebrate multi-metric indexYEAR
: probability sample (survey) yearTo estimate the change between samples (time points) for
BMMI
(a continuous variable) and NITR_COND
(a
categorical variable), run
change_ests <- change_analysis(
NRSA_EPA7,
siteID = "SITE_ID",
vars_cont = "BMMI",
vars_cat = "NITR_COND",
surveyID = "YEAR",
weight = "WEIGHT"
)
The surveyID
argument is the variable in the data
distinguishing the different samples (YEAR
in the previous
example).
To view the analysis results for NITR_COND
(the
categorical variable), run
change_ests$catsum
#> Survey_1 Survey_2 Type Subpopulation Indicator Category DiffEst.P
#> 1 2008-09 2013-14 All_Sites All Sites NITR_COND Fair -1.8867976
#> 2 2008-09 2013-14 All_Sites All Sites NITR_COND Good -2.9648182
#> 3 2008-09 2013-14 All_Sites All Sites NITR_COND Not Assessed -0.2447633
#> 4 2008-09 2013-14 All_Sites All Sites NITR_COND Poor 5.0963791
#> StdError.P MarginofError.P LCB95Pct.P UCB95Pct.P DiffEst.U StdError.U
#> 1 3.5366139 6.9316358 -8.8184334 5.0448382 -2919.8839 4990.4714
#> 2 4.7633367 9.3359683 -12.3007865 6.3711502 -4597.9365 6772.4175
#> 3 0.2072643 0.4062305 -0.6509938 0.1614672 -363.1305 306.1656
#> 4 5.8020788 11.3718655 -6.2754864 16.4682446 6598.4548 19777.9218
#> MarginofError.U LCB95Pct.U UCB95Pct.U nResp_1 Estimate.P_1 StdError.P_1
#> 1 9781.1441 -12701.028 6861.260 37 11.2929433 2.7579622
#> 2 13273.6943 -17871.631 8675.758 22 18.5076549 3.6712147
#> 3 600.0735 -963.204 236.943 1 0.2447633 0.2072643
#> 4 38764.0144 -32165.560 45362.469 119 69.9546385 4.3636869
#> MarginofError.P_1 LCB95Pct.P_1 UCB95Pct.P_1 Estimate.U_1 StdError.U_1
#> 1 5.4055067 5.887437 16.6984500 16754.1954 3998.9767
#> 2 7.1954486 11.312206 25.7031034 27457.9317 5388.5407
#> 3 0.4062305 0.000000 0.6509938 363.1305 306.1656
#> 4 8.5526691 61.401969 78.5073076 103784.6070 12266.4461
#> MarginofError.U_1 LCB95Pct.U_1 UCB95Pct.U_1 nResp_2 Estimate.P_2 StdError.P_2
#> 1 7837.8504 8916.345 24592.046 28 9.406146 2.213884
#> 2 10561.3456 16896.586 38019.277 34 15.542837 3.035055
#> 3 600.0735 0.000 963.204 0 0.000000 0.000000
#> 4 24041.7926 79742.814 127826.400 112 75.051018 3.823919
#> MarginofError.P_2 LCB95Pct.P_2 UCB95Pct.P_2 Estimate.U_2 StdError.U_2
#> 1 4.339133 5.067013 13.74528 13834.31 2985.463
#> 2 5.948599 9.594238 21.49144 22860.00 4102.349
#> 3 0.000000 0.000000 0.00000 0.00 0.000
#> 4 7.494743 67.556274 82.54576 110383.06 15514.525
#> MarginofError.U_2 LCB95Pct.U_2 UCB95Pct.U_2
#> 1 5851.400 7982.911 19685.71
#> 2 8040.456 14819.539 30900.45
#> 3 0.000 0.000 0.00
#> 4 30407.911 79975.151 140790.97
Estimates are provided for the difference between the two samples and
for each of the two individual samples (the _1
and
_2
suffixes).
To view the analysis results for BMMI
(the continuous
variable), run
change_ests$contsum_mean
#> Survey_1 Survey_2 Type Subpopulation Indicator DiffEst StdError
#> 1 2008-09 2013-14 All_Sites All Sites BMMI 3.971559 2.561155
#> MarginofError LCB95Pct UCB95Pct nResp_1 Estimate_1 StdError_1
#> 1 5.019771 -1.048211 8.99133 179 25.88274 1.468744
#> MarginofError_1 LCB95Pct_1 UCB95Pct_1 nResp_2 Estimate_2 StdError_2
#> 1 2.878686 23.00405 28.76142 174 29.8543 2.098166
#> MarginofError_2 LCB95Pct_2 UCB95Pct_2
#> 1 4.112331 25.74196 33.96663
Though we don’t show an example here, the median can be estimated
using the test
argument in
change_anlaysis()
.
Trend analysis generalizes change analysis to more than two samples
(usually time points). Though we omit an example here, the arguments to
trend_analysis()
are very similar to the arguments for
change_analysis()
. One difference is that
trend_analysis()
contains arguments that specify which
statistical model to apply to the estimates from each sample.
The default variance estimator in spsurvey is the local neighborhood
variance estimator (Stevens and Olsen, 2003). The local neighborhood
variance estimator incorporates the spatial locations of design sites
into the variance estimation process. Because the local neighborhood
variance estimator incorporates spatial locations, the resulting
variance estimate tends to be smaller than variance estimates from
approaches ignoring spatial locations. The local neighborhood variance
estimator requires x-coordinates and y-coordinates of the design sites.
When the analysis data is an sf
object, the coordinates
from the data’s geometry are used. When the analysis data is just a data
frame, x-coordinates and y-coordinates (from an appropriate coordinate
reference system) must be provided using the xcoord
and
ycoord
arguments, respectively, of the analysis function
being used.
Several additional variance estimation options are available in
spsurvey’s analysis functions through the vartype
and
jointprob
arguments.
Sickle, J. V., & Paulsen, S. G. (2008). Assessing the attributable risks, relative risks, and regional extents of aquatic stressors. Journal of the North American Benthological Society, 27(4), 920-931.
Stevens Jr, D. L., & Olsen, A. R. (2003). Variance estimation for spatially balanced samples of environmental resources. Environmetrics, 14(6), 593-610.