The REDI package implements the Robust Exponential Decreasing Index (REDI). It represents a measure of cumulated workload, is robust to missing data and provides control of the decreasing influence of the workload over time.
REDI provides various functions to format data, compute REDI and visualise results in a simple and convenient way.
Issa Moussa, Arthur Leroy et al. (2019): Robust Exponential Decreasing Index (REDI): adaptive and robust method for computing cumulated workload. BMJ Open Sport & Exercise Medicine, https://bmjopensem.bmj.com/content/bmjosem/5/1/e000573.full.pdf.
You can install the development version of REDI from GitHub with:
#install.packages("devtools")
::install_github("Grenouil/REDI") devtools
Here is a basic example on how to simulate a dataset with the adequate format, compute REDI values over time and display results.
library(REDI)
set.seed(42)
## Generate a synthetic dataset with the correct format
= simu_db()
data
data#> # A tibble: 366 x 2
#> Input Output
#> <date> <dbl>
#> 1 2022-01-01 54.3
#> 2 2022-01-02 NA
#> 3 2022-01-03 51.1
#> 4 2022-01-04 NA
#> 5 2022-01-05 NA
#> 6 2022-01-06 49.7
#> 7 2022-01-07 54.8
#> 8 2022-01-08 NA
#> 9 2022-01-09 NA
#> 10 2022-01-10 NA
#> # i 356 more rows
## Compute REDI over all observations for different Lambda values and display results
= redi(data) res
res#> # A tibble: 1,098 x 4
#> Input Output REDI Lambda
#> <date> <dbl> <dbl> <dbl>
#> 1 2023-01-01 NA 49.2 0.05
#> 2 2022-12-31 49.5 49.2 0.05
#> 3 2022-12-30 NA 49.2 0.05
#> 4 2022-12-29 NA 49.2 0.05
#> 5 2022-12-28 NA 49.2 0.05
#> 6 2022-12-27 47.3 49.2 0.05
#> 7 2022-12-26 45.0 49.3 0.05
#> 8 2022-12-25 49.1 49.7 0.05
#> 9 2022-12-24 52.6 49.7 0.05
#> 10 2022-12-23 NA 49.5 0.05
#> # i 1,088 more rows
For a advanced use of the package, here is a quick step-by-step guide.
In order to test different features of the package, the
simu_db()
function is provided as a handy way to generate
synthetic data with the correct format for subsequent REDI
computations.
set.seed(42)
## Generate a synthetic dataset, containing dates (Inputs) from '2022-01-01' to '2023-01-01' and observations (Outputs) following a Gaussian distribution (mean = 50, var = 10), with 50% of missing values.
<- simu_db(start_date = '2022-01-01',
simu_data end_date = '2023-01-01',
by = 'day',
output_distrib = 'Gaussian',
ratio_missing = 0.5,
mean = 50,
var = 10)
simu_data#> # A tibble: 366 x 2
#> Input Output
#> <date> <dbl>
#> 1 2022-01-01 54.3
#> 2 2022-01-02 NA
#> 3 2022-01-03 51.1
#> 4 2022-01-04 NA
#> 5 2022-01-05 NA
#> 6 2022-01-06 49.7
#> 7 2022-01-07 54.8
#> 8 2022-01-08 NA
#> 9 2022-01-09 NA
#> 10 2022-01-10 NA
#> # i 356 more rows
As displayed above, any dataset processed in REDI should provide 2
columns: one corresponding to Input
values (e.g.
time) and another to Output
values (e.g.
workload).
format_data()
However, a real-life dataset will generally not have the correct
format to compute REDI directy. Therefore, the
format_data()
function is designed to help with this
process by identifying the columns corresponding to the Input (e.g the
date) and the Output (e.g. the workload) variables. The Input column
should be defined with a correct Date
type. The function
will automatically identify missing values between each observations,
considering the by
argument as the time increment (for
instance with ‘day’, the default, each day between two observed dates is
considered missing). Finally, Output values for duplicated Input values
can be summarised according to the summarise_duplicate
argument.
## Create a dummy real-life dataset
<- data.frame(
raw_db 'Var1' = 1:100,
'Var2' = rnorm(n = 100, mean = 50, sd = 10),
'Var3' = c(
as.Date("2022/1/1"),
seq(from = as.Date("2022/1/3"), by = "day", length.out = 99)
)
)
head(raw_db)
#> Var1 Var2 Var3
#> 1 1 58.84844 2022-01-01
#> 2 2 51.63431 2022-01-03
#> 3 3 41.39630 2022-01-04
#> 4 4 59.74555 2022-01-05
#> 5 5 43.81340 2022-01-06
#> 6 6 44.72518 2022-01-07
## Convert the dataset to the correct format (adding the missing data point on 2022-01-02)
<- format_data(
db data = raw_db,
input = 'Var3',
output = 'Var2',
by = 'day',
format = '%Y%m%d'
)
db#> # A tibble: 101 x 2
#> Input Output
#> <date> <dbl>
#> 1 2022-01-01 58.8
#> 2 2022-01-02 NA
#> 3 2022-01-03 51.6
#> 4 2022-01-04 41.4
#> 5 2022-01-05 59.7
#> 6 2022-01-06 43.8
#> 7 2022-01-07 44.7
#> 8 2022-01-08 32.4
#> 9 2022-01-09 45.0
#> 10 2022-01-10 46.8
#> # i 91 more rows
compute_redi()
To compute a single REDI value, simply use the
compute_redi()
function. It will correspond to the REDI
value for the most recent observed Input in the dataset, using all data
from the past. Feel free to adapt the \(\lambda\) coefficient, controlling the
exponential decay of weights over time, depending on the context.
## Compute REDI at the current date (2022-04-11 in this example)
compute_redi(db, coef = 0.05)
#> [1] 49.85942
max(db$Input)
#> [1] "2022-04-11"
loop_redi()
To sequentially compute REDI for all Input
values in the
dataset with speed-up vectorised operations, one can use the
loop_redi()
function.
## Apply loop_redi() to tcompute REDI for all dates in the dataset
<- loop_redi(data = db, coef = 0.5)
db_redi
db_redi#> # A tibble: 101 x 4
#> Input Output REDI Lambda
#> <date> <dbl> <dbl> <dbl>
#> 1 2022-04-11 50.1 51.4 0.5
#> 2 2022-04-10 63.9 52.3 0.5
#> 3 2022-04-09 40.9 44.9 0.5
#> 4 2022-04-08 48.3 47.4 0.5
#> 5 2022-04-07 48.7 46.8 0.5
#> 6 2022-04-06 37.9 45.6 0.5
#> 7 2022-04-05 54.4 50.6 0.5
#> 8 2022-04-04 36.9 48.2 0.5
#> 9 2022-04-03 55.0 55.5 0.5
#> 10 2022-04-02 63.0 55.8 0.5
#> # i 91 more rows
plot_redi()
The plot_redi()
function is proposed to display the
results, and provides several options to personalise the graphs.
## Display results as time series of REDI values
plot_redi(redi = db_redi,
x_axis = 'Date',
y_axis = 'Workload',
plot_data = TRUE)
#> Warning: Removed 1 rows containing missing values (`geom_point()`).
One can customise graphs by:
x_axis
and y_axis
;plot_data
to FALSE
.redi()
As presented in the nutshell example, all the previous steps
(formatting, computations, plotting) are wrapped into the
redi()
function. In addition to the arguments of the
previous functions, it is also possible to provide a vector to the
coef
argument to display results for different \(\lambda\) values.
## Apply redi() on db and provide a vector of coefficients.
<- redi(data = db, coef = c(0.1, 0.2, 0.3), plot = TRUE) db_full_redi
db_full_redi#> # A tibble: 303 x 4
#> Input Output REDI Lambda
#> <date> <dbl> <dbl> <dbl>
#> 1 2022-04-11 50.1 50.2 0.1
#> 2 2022-04-10 63.9 50.2 0.1
#> 3 2022-04-09 40.9 48.7 0.1
#> 4 2022-04-08 48.3 49.5 0.1
#> 5 2022-04-07 48.7 49.7 0.1
#> 6 2022-04-06 37.9 49.8 0.1
#> 7 2022-04-05 54.4 51.0 0.1
#> 8 2022-04-04 36.9 50.7 0.1
#> 9 2022-04-03 55.0 52.1 0.1
#> 10 2022-04-02 63.0 51.8 0.1
#> # i 293 more rows