The accelerometry package has functions for processing time series accelerometer data from research studies. It was primarily developed for uniaxial minute-to-minute count data. It is also compatible with triaxial data, and could potentially also be used with epochs other than 1 minute.
If you have “raw data,” meaning your device recorded actual acceleration data at 30 Hz or some other sampling frequency, this package probably won’t be very helpful. Then again, you can usually convert raw data to counts after data collection, e.g. using ActiLife software for ActiGraph devices.
C++ is used for most of the functions, through the Rcpp package (Eddelbuettel and François 2011), so they should be pretty fast. For example, the nhanesaccel package, which relies on accelerometry’s functions to process data from the 2003-2006 National Health and Nutrition Examination Survey (NHANES), can process that dataset about 90 times faster than a SAS script provided by the National Cancer Institute (Van Domelen and Pittard 2014).
The package is available to download from either CRAN or GitHub. To install from CRAN:
install.packages("accelerometry")
And to install from GitHub (devtools package must be installed):
library("devtools")
install_github("vandomed/accelerometry")
Converting time series accelerometer data to meaningful physical activity variables typically requires 3 steps:
In using the accelerometry package, you can either call separate functions to implement these steps (e.g. nonwear for 1, bouts and intensities for 2, and personvars for 3) or call an umbrella function that does it all internally (e.g. process_uni or process_tri). The composite functions should be sufficient for most users.
I’ll use data from the first 5 subjects in NHANES 2003-2004 (included with the package) to illustrate the various functions. Here’s what the dataset looks like:
head(unidata)
#> seqn paxday paxinten
#> [1,] 21005 1 0
#> [2,] 21005 1 0
#> [3,] 21005 1 0
#> [4,] 21005 1 0
#> [5,] 21005 1 0
#> [6,] 21005 1 0
dim(unidata)
#> [1] 50400 3
There are 3 variables – seqn
is the subject ID, paxday
is the day of the week (1 = Sunday, …, 7 = Saturday), and paxinten
is the count value recorded for each minute – and 50,400 observations (1 obs/min x 1,440 min/day x 7 days/subject x 5 subjects).
Here’s what 7 days of data looks like, for the 3rd subject in the dataset, SEQN = 21007. You can see the 7 separate clouds of data corresponding to each day of the week.
seqn <- unidata[, "seqn"]
counts <- unidata[, "paxinten"]
counts.21007 <- counts[seqn == 21007]
plot(counts.21007, main = "Counts for Subject 21007")
weartime
There are two distinct approaches for identifying non-wear time:
Approach 2 is seriously flawed, in my opinion, because it is very possible for somebody wearing the device to record 0 counts for several hours consecutively. But this is the only option for studies like NHANES, where journals were not part of the protocol.
The weartime function is for implementing the signal-based approach of removing non-wear time. The default behavior is to classify as non-wear any interval 60 minutes or longer with 0 counts. To do that for our subject of interest:
wear1 <- weartime(counts.21007)
If you have reason to prefer a 90- rather than 60-minute window size, and perhaps want to allow for up 2 nonzero count values as long as they are below 100, you could use:
wear2 <- weartime(counts.21007,
window = 90,
tol = 2,
tol_upper = 99)
Let’s see how these two compare (multiplying 0/1 vectors by 5000 and 5100 just to make it easier to visualize):
par(mfrow = c(1, 2))
plot(counts.21007, main = "Subject 21007 (whole week)")
points(wear1 * 5000, type = "l", col = "blue")
points(wear2 * 5100, type = "l", col = "red")
plot(counts.21007, xlim = c(4321, 5760), main = "Subject 21007 (day 4)")
points(wear1 * 5000, type = "l", col = "blue")
points(wear2 * 5100, type = "l", col = "red")
It looks like the main difference here is an extra non-wear period for the 60-minute method on the 4th day of monitoring. Personally, I wouldn’t go higher than 60 minutes. Even at 60, you’ll have no chance of removing short non-wear periods, e.g. when subjects are showering or swimming for exercise. That’s all going to be classified as sedentary time, unfortunately.
bouts
A “bout” is a sustained period of activity of some intensity for some time. Bouted MVPA (moderate-to-vigorous physical activity) is a popular bout variable because physical activity guidelines are often based on MVPA accumulated in bouts of \(\ge\) 10 minutes (Troiano et al. 2008). Bouted sedentary time can also be interesting, given recent enthusiasm for studying sedentary behaviors.
Here is how you would use bouts to identify bouted MVPA for our subject of interest. Note that 2020 is a common cutpoint used to classify MVPA (Troiano et al. 2008).
mvpa <- bouts(counts = counts.21007,
bout_length = 10,
thresh_lower = 2020)
sum(mvpa)
#> [1] 0
Zero bouted MVPA for the entire week - not good! That’s actually not uncommon in NHANES. And we’re using a strict definition of a bout, not allowing for any minutes to dip below 2020 counts during a 10-minute window. We could be more liberal and define bouted MVPA as any 10-minute interval with counts \(\ge\) 2020, allowing for 1-2 minutes with counts < 2020, as long as they are still \(\ge\) 100. To implement that algorithm:
mvpa <- bouts(counts = counts.21007,
bout_length = 10,
thresh_lower = 2020,
tol = 2,
tol_lower = 100)
sum(mvpa)
#> [1] 10075
With this more forgiving algorithm, the subject is considered to have 21 rather than 0 minutes of bouted MVPA over the course of the week.
intensities
This is a simple function that calculates the number of minutes spent in 5 different intensity levels, and the number of counts accumulated from each intensity level. Default behavior is to categorize counts as follows:
It also lumps intensities 2-3, 4-5, and 2-5 together, for a total of 8 categories. Applied to our subject:
(intensity.data <- intensities(counts = counts.21007[wear1 == 1]))
#> [1] 3379 2069 790 247 12 2859 259 3118
#> [9] 62933 702722 953350 717948 91473 1656072 809421 2465493
I subsetted just the valid wear time for this subject, to avoid classifying non-wear time as sedentary time. The first 8 numbers are time spent in the various intensities (e.g. 3379 minutes of sedentary time), and the next 8 are counts accumulated from those intensities (e.g. 62933 counts accumulated during sedentary time).
sedbreaks
There is some evidence in the epidemiological literature that “sedentary breaks” are beneficial (Stamatakis et al. 2018), i.e. that breaking up prolonged periods of sedentary time can lead to improved health. To count the number of sedentary breaks over the 7-day period for our subject of interest:
breakcount <- sedbreaks(counts = counts.21007,
weartime = wear1)
sum(breakcount)
#> [1] 729
This subject had 729 sedentary breaks over the 7-day period, or a little more than 100 a day. The definition of a sedentary break here is having a minute with counts \(\ge\) 100 after a minute of counts < 100. Note that we had to give sedbreaks the wear time vector; this prevents the first minute after each non-wear periods being classified as a sedentary break.
process_uni
With uniaxial accelerometer data, you can use process_uni to calculate physical activity variables for each subject. To calculate a few basic activity variables for our subject of interest, you can run:
(averages.21007 <- process_uni(counts.21007))
#> id valid_days valid_wk_days valid_we_days include valid_min counts
#> [1,] 1 7 5 2 1 928.1429 361203.7
#> cpm steps
#> [1,] 402.281 NaN
It looks like all 7 days were valid for analysis, meaning they had at least 10 hours of wear time. The subject averaged about 928 minutes (15.5 hours) of wear time per day. The variables counts
and cpm
(counts per minute of wear time) are probably the two most popular measures of total physical activity volume; this subject averaged about 361,000 counts per day, and 402.3 counts per minute of wear time.
All we specified was the vector of counts, so the function used default values for everything, which basically means the data-processing parameters that I personally prefer. But if you look at the help file (?process_uni
), you’ll see a long list of parameters that you can adjust if you like.
Oftentimes researchers want more detailed variables than just counts and cpm. You can request a lot more variables with the brevity
input. If we set it to 2 rather than 1:
averages.21007 <- process_uni(counts = counts.21007,
brevity = 2)
colnames(averages.21007)
#> [1] "id" "valid_days" "valid_wk_days"
#> [4] "valid_we_days" "include" "valid_min"
#> [7] "counts" "cpm" "steps"
#> [10] "sed_min" "light_min" "life_min"
#> [13] "mod_min" "vig_min" "lightlife_min"
#> [16] "mvpa_min" "active_min" "sed_percent"
#> [19] "light_percent" "life_percent" "mod_percent"
#> [22] "vig_percent" "lightlife_percent" "mvpa_percent"
#> [25] "active_percent" "sed_counts" "light_counts"
#> [28] "life_counts" "mod_counts" "vig_counts"
#> [31] "lightlife_counts" "mvpa_counts" "active_counts"
#> [34] "sed_bouted_10min" "sed_bouted_30min" "sed_bouted_60min"
#> [37] "sed_breaks" "max_1min_counts" "max_50min_counts"
#> [40] "max_10min_counts" "max_30min_counts" "num_mvpa_bouts"
#> [43] "num_vig_bouts" "mvpa_bouted" "vig_bouted"
#> [46] "guideline_min"
Now we get a lot more variables, with various measures of intensity, sedentary behaviors, and bouted activity. Let’s look at a few of the variables:
averages.21007[, c("sed_percent", "sed_bouted_60min", "max_1min_counts")]
#> sed_percent sed_bouted_60min max_1min_counts
#> 0.5035413 52.4285714 7173.2857143
The subject on average spent 50.4% of wear time sedentary, 570.7 minutes per day in sedentary bouts \(\ge\) 1 hour, and achieved a max 5-minute count average of 3371 counts/min.
If you set brevity = 3
, you also get hourly count averages, which can be interesting to plot for individual subjects and/or across subgroups.
If you want a daily summary rather than averages across all valid days, you can set return_form = "daily"
.
(daily.21007 <- process_uni(counts = counts.21007,
return_form = "daily"))
#> id day valid_day valid_min counts cpm steps
#> [1,] 1 1 1 737 438907 595.5319 NA
#> [2,] 1 2 1 953 400517 420.2697 NA
#> [3,] 1 3 1 941 316556 336.4038 NA
#> [4,] 1 4 1 826 386072 467.3995 NA
#> [5,] 1 5 1 1098 293143 266.9791 NA
#> [6,] 1 6 1 1284 437497 340.7298 NA
#> [7,] 1 7 1 658 255734 388.6535 NA
process_tri
This function is very similar to process_uni, but works on triaxial rather than uniaxial accelerometer data. To illustrate, we can use the tridata
dataset included with the package:
head(tridata)
#> vert ap ml
#> [1,] 0 0 0
#> [2,] 0 0 0
#> [3,] 0 0 0
#> [4,] 0 0 0
#> [5,] 0 0 0
#> [6,] 0 0 0
This is not real data and doesn’t closely resemble real data; I generated it as 0’s from midnight to 8 am and then multivariate normal for the rest of the day. The 3 columns are intended to represent counts in 3 axes: vertical, anteroposterior (AP), and mediolateral (ML).
Let’s see what happens when we run process_tri:
(daily.tri <- process_tri(tridata))
#> id day valid_day valid_min counts_vert counts_ap counts_ml counts_sum
#> [1,] 1 1 1 959 1544623 1529602 1522205 4596430
#> [2,] 1 2 1 959 1590602 1588969 1572505 4752076
#> [3,] 1 3 1 960 1524408 1514676 1598054 4637138
#> [4,] 1 4 1 960 1522137 1510912 1555942 4588991
#> [5,] 1 5 1 960 1534686 1547806 1522350 4604842
#> [6,] 1 6 1 959 1528161 1503414 1535805 4567380
#> [7,] 1 7 1 960 1556011 1538843 1569277 4664131
#> counts_vm cpm_vert cpm_ap cpm_ml cpm_sum cpm_vm steps
#> [1,] 3153402 1610.660 1594.997 1587.284 4792.941 3288.219 NA
#> [2,] 3247601 1658.605 1656.902 1639.734 4955.241 3386.445 NA
#> [3,] 3169986 1587.925 1577.787 1664.640 4830.352 3302.068 NA
#> [4,] 3149399 1585.559 1573.867 1620.773 4780.199 3280.624 NA
#> [5,] 3179197 1598.631 1612.298 1585.781 4796.710 3311.663 NA
#> [6,] 3137262 1593.494 1567.689 1601.465 4762.649 3271.389 NA
#> [7,] 3210668 1620.845 1602.961 1634.664 4858.470 3344.446 NA
Results are similar to what process_uni produced, but instead of a single counts
and cpm
variable for each day, we have 5: one for each axis of the accelerometer, plus one for the triaxial sum and one for the triaxial vector magnitude.
I won’t give a full tutorial here, but to illustrate adjusting data-processing parameters, suppose we wanted to define non-wear time as 45 minutes of 0 counts in all 3 axes, as opposed to 60 minutes of 0 counts in the vertical axis only. You would run:
(daily.tri <- process_tri(counts = tridata,
nonwear_axis = "sum",
nonwear_window = 45))
#> id day valid_day valid_min counts_vert counts_ap counts_ml counts_sum
#> [1,] 1 1 1 960 1544623 1531284 1522205 4598112
#> [2,] 1 2 1 960 1590602 1588969 1574682 4754253
#> [3,] 1 3 1 960 1524408 1514676 1598054 4637138
#> [4,] 1 4 1 960 1522137 1510912 1555942 4588991
#> [5,] 1 5 1 960 1534686 1547806 1522350 4604842
#> [6,] 1 6 1 960 1528161 1506124 1539040 4573325
#> [7,] 1 7 1 960 1556011 1538843 1569277 4664131
#> counts_vm cpm_vert cpm_ap cpm_ml cpm_sum cpm_vm steps
#> [1,] 3155084 1608.982 1595.088 1585.630 4789.700 3286.546 NA
#> [2,] 3249778 1656.877 1655.176 1640.294 4952.347 3385.185 NA
#> [3,] 3169986 1587.925 1577.787 1664.640 4830.352 3302.068 NA
#> [4,] 3149399 1585.559 1573.867 1620.773 4780.199 3280.624 NA
#> [5,] 3179197 1598.631 1612.298 1585.781 4796.710 3311.663 NA
#> [6,] 3141482 1591.834 1568.879 1603.167 4763.880 3272.378 NA
#> [7,] 3210668 1620.845 1602.961 1634.664 4858.470 3344.446 NA
Eddelbuettel, Dirk. 2013. Seamless R and C++ Integration with Rcpp. New York: Springer. doi:10.1007/978-1-4614-6868-4.
Eddelbuettel, Dirk, and James Joseph Balamuta. 2017. “Extending R with C++: A Brief Introduction to Rcpp.” PeerJ Preprints 5 (August): e3188v1. doi:10.7287/peerj.preprints.3188v1.
Eddelbuettel, Dirk, and Romain François. 2011. “Rcpp: Seamless R and C++ Integration.” Journal of Statistical Software 40 (8): 1–18. doi:10.18637/jss.v040.i08.
Stamatakis, Emmanuel, Ulf Ekelund, Ding Ding, Mark Hamer, Adrian E Bauman, and I-Min Lee. 2018. “Is the Time Right for Quantitative Public Health Guidelines on Sitting? A Narrative Review of Sedentary Behaviour Research Paradigms and Findings.” Br J Sports Med. BMJ Publishing Group Ltd; British Association of Sport; Exercise Medicine, bjsports–2018.
Troiano, Richard P, David Berrigan, Kevin W Dodd, Louise C Masse, Timothy Tilert, Margaret McDowell, and others. 2008. “Physical Activity in the United States Measured by Accelerometer.” Medicine and Science in Sports and Exercise 40 (1). WILLIAMS & WILKINS: 181.
Van Domelen, Dane, and W. Stephen Pittard. 2014. “Flexible R Functions for Processing Accelerometer Data, with Emphasis on Nhanes 2003-2006.” The R Journal 6 (2). https://journal.r-project.org/archive/2014/RJ-2014-024/RJ-2014-024.pdf.