Introduction

There are some rules of thumb that I follow when using the TreatmentPatterns package. These rules tend to work well in most situations, across databases and datasets.

TLDR

  • minPostCombinationDuration <= minEraDuration.
  • combinationWindow >= minEraDuration.
  • Small cohorts should not be considered, as their subjects are spread too thinly across pathways.
  • Pathways with a low count should not be considered.
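The first two rules can be turned into a quick sanity check before running an analysis. The sketch below uses a hypothetical helper, `check_settings()`, which is not part of TreatmentPatterns; it only compares the three numbers and returns one logical value per rule.

```r
# Hypothetical helper, not part of the TreatmentPatterns package:
# returns TRUE for each rule of thumb the settings satisfy.
check_settings <- function(minEraDuration, combinationWindow, minPostCombinationDuration) {
  c(
    minPostCombinationDuration_le_minEraDuration = minPostCombinationDuration <= minEraDuration,
    combinationWindow_ge_minEraDuration = combinationWindow >= minEraDuration
  )
}

# Both rules hold
check_settings(minEraDuration = 5, combinationWindow = 5, minPostCombinationDuration = 5)

# minPostCombinationDuration is larger than minEraDuration: the first rule fails
check_settings(minEraDuration = 5, combinationWindow = 5, minPostCombinationDuration = 10)
```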

Cohorts

When creating cohorts, it is important to keep in mind that the subjects will be divided across pathways. Let's assume we have 10000 subjects in a fictitious cohort, and 5 event cohorts.

If we only consider mono therapies and pathways with as many steps as there are event cohorts, the total number of potential pathways equals \(n^n\), where \(n\) is the number of event cohorts. Even if we do not allow any re-occurring treatments, it still equals \(n!\).

For our 5 event cohorts, this equals:

5^5
## [1] 3125
factorial(5)
## [1] 120
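To see what this means for the counts, spread the 10000 subjects of the fictitious cohort evenly over those pathways. An even spread is the best case; real data is rarely spread evenly, so many pathways will likely hold even fewer subjects.

```r
n_subjects <- 10000
n_events <- 5

# Average number of subjects per pathway, assuming an even spread
n_subjects / n_events^n_events    # treatments may re-occur
## [1] 3.2
n_subjects / factorial(n_events)  # no re-occurring treatments
## [1] 83.33333
```

With only a handful of subjects per pathway, individual pathway counts are easily driven by chance, which is the reasoning behind the last two rules of thumb.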

Settings

The minEraDuration, combinationWindow, and minPostCombinationDuration settings have significant effects on how the treatment pathways are built. Consider the following example:

library(dplyr)

cohort_table <- tribble(
  ~cohort_definition_id, ~subject_id, ~cohort_start_date,    ~cohort_end_date,
  1,                     1,           as.Date("2020-01-01"), as.Date("2021-01-01"),
  2,                     1,           as.Date("2020-01-01"), as.Date("2020-01-20"),
  3,                     1,           as.Date("2020-01-22"), as.Date("2020-02-28"),
  4,                     1,           as.Date("2020-02-20"), as.Date("2020-03-03")
)

cohort_table
## # A tibble: 4 × 4
##   cohort_definition_id subject_id cohort_start_date cohort_end_date
##                  <dbl>      <dbl> <date>            <date>         
## 1                    1          1 2020-01-01        2021-01-01     
## 2                    2          1 2020-01-01        2020-01-20     
## 3                    3          1 2020-01-22        2020-02-28     
## 4                    4          1 2020-02-20        2020-03-03

Assume that the target cohort has cohort_definition_id 1 and that the remaining cohorts are event cohorts.

cohort_table <- cohort_table %>%
  mutate(duration = as.numeric(cohort_end_date - cohort_start_date))

cohort_table
## # A tibble: 4 × 5
##   cohort_definition_id subject_id cohort_start_date cohort_end_date duration
##                  <dbl>      <dbl> <date>            <date>             <dbl>
## 1                    1          1 2020-01-01        2021-01-01           366
## 2                    2          1 2020-01-01        2020-01-20            19
## 3                    3          1 2020-01-22        2020-02-28            37
## 4                    4          1 2020-02-20        2020-03-03            12

As you can see, the durations of the treatments (event cohorts 2, 3 and 4) are 19, 37 and 12 days. Cohort 3 also overlaps with cohort 4 for 8 days.

We can compute the overlap as follows:

cohort_table <- cohort_table %>%
  # Filter out target cohort
  filter(cohort_definition_id != 1) %>%
  mutate(overlap = case_when(
    # No next treatment (lead() returns NA): set the overlap to 0
    is.na(lead(cohort_end_date)) ~ 0,
    # Otherwise: cohort_end_date - next cohort_start_date
    # Positive values are overlaps, negative values are gaps,
    # e.g. 2020-02-28 - 2020-02-20 = 8
    .default = as.numeric(cohort_end_date - lead(cohort_start_date))))

cohort_table
## # A tibble: 3 × 6
##   cohort_definition_id subject_id cohort_start_date cohort_end_date duration
##                  <dbl>      <dbl> <date>            <date>             <dbl>
## 1                    2          1 2020-01-01        2020-01-20            19
## 2                    3          1 2020-01-22        2020-02-28            37
## 3                    4          1 2020-02-20        2020-03-03            12
## # ℹ 1 more variable: overlap <dbl>
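The printed tibble hides the overlap column, and the code above relies on the rows being in chronological order for a single subject. A slightly more defensive sketch, assuming the same column names, sorts by subject and start date and groups by subject_id so that lead() never looks at another subject's treatment. It rebuilds the three event cohorts so it runs on its own.

```r
library(dplyr)

events <- tribble(
  ~cohort_definition_id, ~subject_id, ~cohort_start_date,    ~cohort_end_date,
  2,                     1,           as.Date("2020-01-01"), as.Date("2020-01-20"),
  3,                     1,           as.Date("2020-01-22"), as.Date("2020-02-28"),
  4,                     1,           as.Date("2020-02-20"), as.Date("2020-03-03")
)

events <- events %>%
  # lead() must look at the next treatment of the same subject
  arrange(subject_id, cohort_start_date) %>%
  group_by(subject_id) %>%
  mutate(overlap = case_when(
    # Last treatment of a subject: nothing to overlap with
    is.na(lead(cohort_start_date)) ~ 0,
    # Positive: days of overlap; negative: days of gap
    .default = as.numeric(cohort_end_date - lead(cohort_start_date))
  )) %>%
  ungroup()

events %>% select(cohort_definition_id, overlap)
```

For subject 1 this yields overlaps of -2, 8 and 0 days for cohorts 2, 3 and 4.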

The overlap between treatments 2 and 3 is -2, so rather than an overlap there is a 2-day gap between these treatments. Treatments 3 and 4 overlap for 8 days. There is no treatment after treatment 4, so its overlap is 0. Let's assume our minEraDuration = 5. All three treatments last at least 5 days (19, 37 and 12 days), so none of them are removed at this stage.

We can draw it out like so:

2:   -------------------
3:                        -------------------------------------
4:                                                     ------------

If we set our combinationWindow = 5, the 8-day overlap between cohorts 3 and 4 is at least as long as the window, so the combination treatment 3+4 is created. This would leave us with the following treatments:

2:   -------------------
3:                        -----------------------------
3+4:                                                   --------
4:                                                             ----

Treatment 3 now lasts 29 days, treatment 4 lasts 4 days, and combination treatment 3+4 lasts 8 days. If minPostCombinationDuration is not set properly, we can filter out either too many or too few treatments.
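The split can be reproduced with a small sketch. It assumes, as in this example, that the second era starts inside the first and ends after it; it illustrates the arithmetic only and is not how TreatmentPatterns implements combinations internally.

```r
library(dplyr)

combinationWindow <- 5

# Eras of event cohorts 3 and 4
start_3 <- as.Date("2020-01-22"); end_3 <- as.Date("2020-02-28")
start_4 <- as.Date("2020-02-20"); end_4 <- as.Date("2020-03-03")

overlap <- as.numeric(end_3 - start_4)  # 8 days

split_eras <- if (overlap >= combinationWindow) {
  # Overlap is long enough: 3 alone, then 3+4, then 4 alone
  tibble(
    treatment  = c("3", "3+4", "4"),
    start_date = c(start_3, start_4, end_3),
    end_date   = c(start_4, end_3, end_4)
  )
} else {
  # Overlap too short: keep both eras as they are
  tibble(
    treatment  = c("3", "4"),
    start_date = c(start_3, start_4),
    end_date   = c(end_3, end_4)
  )
}

split_eras <- split_eras %>% mutate(duration = as.numeric(end_date - start_date))
split_eras
```

The resulting durations are 29, 8 and 4 days, matching the drawing above.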

If we set minPostCombinationDuration = 10, we would lose treatment 4 (4 days) and combination treatment 3+4 (8 days). This would leave us with the following pathway:

2:   -------------------
3:                        -----------------------------

Pathway: 2-3
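Treating each era as a row with its duration, this filter amounts to keeping rows that last at least minPostCombinationDuration days and pasting the remaining treatments together. This is a simplified sketch: the durations are the ones worked out above, and using `>=` as the cut-off is an assumption.

```r
library(dplyr)

# Eras after the combination step, in chronological order
eras <- tribble(
  ~treatment, ~duration,
  "2",        19,
  "3",        29,
  "3+4",       8,
  "4",         4
)

minPostCombinationDuration <- 10

pathway <- eras %>%
  filter(duration >= minPostCombinationDuration) %>%
  pull(treatment) %>%
  paste(collapse = "-")

pathway
## [1] "2-3"
```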

As a rule of thumb, setting minPostCombinationDuration <= minEraDuration seems to yield reasonable results. With minPostCombinationDuration = 5, only treatment 4 (4 days) is removed, which leaves us with the following pathway:

2:   -------------------
3:                        -----------------------------
3+4:                                                   --------

Pathway: 2-3-3+4
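To see how sensitive the pathway is to this setting, the same simplified filter can be run for a few values of minPostCombinationDuration. `build_pathway()` is a hypothetical helper for this illustration, and the era durations are the ones from the example above.

```r
library(dplyr)

eras <- tribble(
  ~treatment, ~duration,
  "2",        19,
  "3",        29,
  "3+4",       8,
  "4",         4
)

# Hypothetical helper: keep eras lasting at least minPostCombinationDuration
# days and paste the remaining treatments into a pathway
build_pathway <- function(eras, minPostCombinationDuration) {
  eras %>%
    filter(duration >= minPostCombinationDuration) %>%
    pull(treatment) %>%
    paste(collapse = "-")
}

thresholds <- c(4, 5, 8, 10)
pathways <- sapply(thresholds, build_pathway, eras = eras)
names(pathways) <- thresholds
pathways
```

Values of 8 or lower keep the 8-day combination treatment and values above 8 drop it, while values of 4 or lower also keep the 4-day remainder of treatment 4, which is shorter than the minEraDuration of 5.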