In this vignette we will introduce how to create a drug users
cohorts. A cohort is a set of people that satisfy a certain inclusion
criteria during a certain time frame. The cohort object is defined in :
vignette("cdm_reference", package = "omopgenerics")
.
The function generateDrugUtilisationCohortSet
is used to
generate cohorts of drug users based on the drug_exposure
table and a conceptSet.
These cohorts can be subsetted to the exposures of interest applying the different inclusion criteria:
Require that entries are in a certain date range
requireDrugInDateRange()
.
Subset to the first entry
requireIsFirstDrugEntry()
.
Require a certain time in observation before the entries
requireObservationBeforeDrug()
.
Require a certain time before exposure
requirePriorDrugWashout()
.
cdm_reference
objectThe first thing that we need is a cdm_reference
object
to our OMOP CDM instance. You can learn how to create cdm references
using CDMConnector here:
vignette("a04_DBI_connection_examples", package = "CDMConnector")
.
The DrugUtilisation packages contains some mock data that can be useful to test the package:
library(DrugUtilisation)
cdm <- mockDrugUtilisation(numberIndividuals = 100, seed = 1)
cdm
#>
#> ── # OMOP CDM reference (duckdb) of DUS MOCK ───────────────────────────────────
#> • omop tables: person, observation_period, concept, concept_ancestor,
#> drug_strength, concept_relationship, drug_exposure, condition_occurrence,
#> observation, visit_occurrence
#> • cohort tables: cohort1, cohort2
#> • achilles tables: -
#> • other tables: -
To create a basic drug users cohort we need two things:
There are three possible forms of a conceptSet:
codelist
object, see
vignette("codelists", package = "omopgenerics")
conceptSet <- list(acetaminophen = c(1, 2, 3)) |> omopgenerics::newCodelist()
conceptSet
#>
#> ── 1 codelist ──────────────────────────────────────────────────────────────────
#>
#> - acetaminophen (3 codes)
conceptSet$acetaminophen
#> [1] 1 2 3
conceptSetExpression
object, see
vignette("codelists", package = "omopgenerics")
conceptSet <- list(acetaminophen = dplyr::tibble(
concept_id = 1125315,
excluded = FALSE,
descendants = TRUE,
mapped = FALSE
)) |>
omopgenerics::newConceptSetExpression()
conceptSet
#>
#> ── 1 conceptSetExpression ──────────────────────────────────────────────────────
#>
#> - acetaminophen (1 concept criteria)
conceptSet$acetaminophen
#> # A tibble: 1 × 4
#> concept_id excluded descendants mapped
#> <dbl> <lgl> <lgl> <lgl>
#> 1 1125315 FALSE TRUE FALSE
The package CodelistGenerator can be very useful to create conceptSet.
For example we can create a conceptSet based in an ingredient with
getDrugIngredientCodes()
:
codes <- getDrugIngredientCodes(cdm = cdm, name = "acetaminophen")
codes[["161_acetaminophen"]]
#> [1] 1125315 1125360 2905077 43135274
We could also use the function codesFromConceptSet()
to
read a concept set from a json file:
The gapEra
parameter is used to join exposures into
episodes, let’s say for example we have an individual with 4 drug
exposures that we are interested in. The first two overlap each other,
then there is a gap of 29 days and two consecutive exposures:
If we would create the episode with gapEra = 0, we would have 3 resultant episodes, the first two that overlap would be joined in a single episode, but then the other two would be independent:
#> # A tibble: 3 × 4
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 1 2020-01-01 2020-02-15
#> 2 1 1 2020-03-15 2020-04-19
#> 3 1 1 2020-04-20 2020-05-15
If, instead we would use a gapEra = 1, we would have 2 resultant episodes, the first two that overlap would be joined in a single episode (as before), now the two consecutive exposures would be joined in a single episode:
#> # A tibble: 2 × 4
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 1 2020-01-01 2020-02-15
#> 2 1 1 2020-03-15 2020-05-15
The result would be the same for any value between 1 and 28 (gapEra \(\in\) [1, 28]).
Whereas, if we would use a gapEra = 29 all the records would be collapsed into a single episode:
#> # A tibble: 1 × 4
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 1 2020-01-01 2020-05-15
We will then create now a cohort with all the drug users of acetaminophen with a gapEra of 30 days.
codes <- getDrugIngredientCodes(cdm = cdm, name = "acetaminophen")
names(codes) <- "acetaminophen"
cdm <- generateDrugUtilisationCohortSet(cdm = cdm, name = "acetaminophen_cohort", conceptSet = codes, gapEra = 30)
cdm
#>
#> ── # OMOP CDM reference (duckdb) of DUS MOCK ───────────────────────────────────
#> • omop tables: person, observation_period, concept, concept_ancestor,
#> drug_strength, concept_relationship, drug_exposure, condition_occurrence,
#> observation, visit_occurrence
#> • cohort tables: cohort1, cohort2, acetaminophen_cohort
#> • achilles tables: -
#> • other tables: -
NOTE that the name
argument is used to create the new
table in the cdm object. For database backends this is the name of the
table that will be created.
We can compare what we see with what we would expect; if we look at the individual with more records we can see how all of them are joined into a single exposure as the records overlap each other:
cdm$drug_exposure |>
dplyr::filter(drug_concept_id %in% !!codes$acetaminophen & person_id == 69)
#> # Source: SQL [4 x 7]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.1/:memory:]
#> drug_exposure_id person_id drug_concept_id drug_exposure_start_date
#> <int> <int> <dbl> <date>
#> 1 201 69 2905077 2002-01-29
#> 2 203 69 2905077 2001-04-23
#> 3 205 69 2905077 2001-03-04
#> 4 204 69 43135274 2003-10-16
#> # ℹ 3 more variables: drug_exposure_end_date <date>,
#> # drug_type_concept_id <dbl>, quantity <dbl>
cdm$acetaminophen_cohort |>
dplyr::filter(subject_id == 69)
#> # Source: SQL [1 x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 69 2001-03-04 2004-06-02
In this case gapEra did not have a big impact as we can see in the attrition:
attrition(cdm$acetaminophen_cohort)
#> # A tibble: 2 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 71 62 1 Initial qualify…
#> 2 1 70 62 2 join exposures …
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
We can see this particular case of this individual:
cdm$drug_exposure |>
dplyr::filter(drug_concept_id %in% !!codes$acetaminophen & person_id == 50)
#> # Source: SQL [2 x 7]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.1/:memory:]
#> drug_exposure_id person_id drug_concept_id drug_exposure_start_date
#> <int> <int> <dbl> <date>
#> 1 143 50 1125360 2017-04-14
#> 2 144 50 43135274 2017-04-01
#> # ℹ 3 more variables: drug_exposure_end_date <date>,
#> # drug_type_concept_id <dbl>, quantity <dbl>
In this case we have 3 exposures separated by 3 days, so if we use the 30 days gap both exposures are joined into a single episode, whereas if we would use a gapEra smaller than 3 we would consider them as different episodes.
cdm$acetaminophen_cohort |>
dplyr::filter(subject_id == 50)
#> # Source: SQL [1 x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 50 2017-04-01 2017-04-23
We can access the other cohort attributes using the adequate functions. In settings we can see that the gapEra used is recorded or with cohortCodelist we can see which was the codelist used to create the cohort.
settings(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 3
#> cohort_definition_id cohort_name gap_era
#> <int> <chr> <chr>
#> 1 1 acetaminophen 30
cohortCount(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <int> <int>
#> 1 1 70 62
cohortCodelist(cdm$acetaminophen_cohort, cohortId = 1)
#>
#> ── 1 codelist ──────────────────────────────────────────────────────────────────
#>
#> - acetaminophen (4 codes)
Once we have created our base cohort using a conceptSet and a gapEra we can apply different restrictions:
requirePriorDrugWashout()
requireIsFirstDrugEntry()
requireObservationBeforeDrug()
requireDrugInDateRange()
requirePriorDrugWashout()
To require that the cohort entries (drug episodes) are incident we
would usually define a time (days
) where the individual is
not exposed to the drug. This can be achieved using
requirePriorDrugWashout()
function. In this example we
would restrict to individuals with 365 days of no exposure:
The result will be a cohort with the individuals that fulfill the criteria:
cdm$acetaminophen_cohort
#> # Source: table<main.acetaminophen_cohort> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 35 2005-11-10 2006-09-09
#> 2 1 23 2003-10-29 2003-11-27
#> 3 1 73 2011-04-02 2011-06-19
#> 4 1 44 2017-01-01 2018-03-30
#> 5 1 25 2020-01-04 2020-02-05
#> 6 1 99 2015-02-03 2020-11-01
#> 7 1 77 1966-09-15 1966-12-27
#> 8 1 57 2020-04-17 2020-11-19
#> 9 1 87 2020-10-02 2021-01-02
#> 10 1 45 2017-07-03 2018-07-06
#> # ℹ more rows
This would also get recorded in the attrition, counts and settings.
In the settings a new column with the specified parameter used:
settings(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 4
#> cohort_definition_id cohort_name gap_era prior_use_washout
#> <int> <chr> <chr> <chr>
#> 1 1 acetaminophen 30 365
The counts will be updated:
cohortCount(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <int> <int>
#> 1 1 66 62
And the attrition will have a new line:
attrition(cdm$acetaminophen_cohort)
#> # A tibble: 3 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 71 62 1 Initial qualify…
#> 2 1 70 62 2 join exposures …
#> 3 1 66 62 3 require prior u…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
The name
argument can be used to put the result into a
different table in our cdm (by default the function updates the current
cohort table). Whereas the cohortId
argument is used to
apply this criteria to only a restricted set of cohorts (by default the
same criteria is applied to all the cohort records). To show this in an
example we will create two cohorts (metformin and simvastatin) inside a
table named my_cohort
and then apply the inclusion criteria
to only one of them (simvastatin) and save the result to a table named:
my_new_cohort
codes <- getDrugIngredientCodes(cdm = cdm, name = c("metformin", "simvastatin"))
cdm <- generateDrugUtilisationCohortSet(cdm = cdm, name = "my_cohort", conceptSet = codes, gapEra = 30)
cdm
#>
#> ── # OMOP CDM reference (duckdb) of DUS MOCK ───────────────────────────────────
#> • omop tables: person, observation_period, concept, concept_ancestor,
#> drug_strength, concept_relationship, drug_exposure, condition_occurrence,
#> observation, visit_occurrence
#> • cohort tables: cohort1, cohort2, acetaminophen_cohort, my_cohort
#> • achilles tables: -
#> • other tables: -
settings(cdm$my_cohort)
#> # A tibble: 2 × 3
#> cohort_definition_id cohort_name gap_era
#> <int> <chr> <chr>
#> 1 1 6809_metformin 30
#> 2 2 36567_simvastatin 30
cdm$my_new_cohort <- cdm$my_cohort |>
requirePriorDrugWashout(days = 365, cohortId = 2, name = "my_new_cohort")
cdm
#>
#> ── # OMOP CDM reference (duckdb) of DUS MOCK ───────────────────────────────────
#> • omop tables: person, observation_period, concept, concept_ancestor,
#> drug_strength, concept_relationship, drug_exposure, condition_occurrence,
#> observation, visit_occurrence
#> • cohort tables: cohort1, cohort2, acetaminophen_cohort, my_cohort,
#> my_new_cohort
#> • achilles tables: -
#> • other tables: -
attrition(cdm$my_new_cohort)
#> # A tibble: 5 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 55 48 1 Initial qualify…
#> 2 1 52 48 2 join exposures …
#> 3 2 53 48 1 Initial qualify…
#> 4 2 51 48 2 join exposures …
#> 5 2 51 48 3 require prior u…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
requireIsFirstDrugEntry()
To require that the cohort entry (drug episodes) is the first one of
the available ones we can use the requireIsFirstDrugEntry()
function. See example:
The result will be a cohort with the individuals that fulfill the criteria:
cdm$acetaminophen_cohort
#> # Source: table<main.acetaminophen_cohort> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 45 2017-07-03 2018-07-06
#> 2 1 95 2001-06-13 2010-01-28
#> 3 1 26 2015-07-22 2020-07-18
#> 4 1 51 2017-09-04 2017-09-17
#> 5 1 58 1995-08-29 2002-06-12
#> 6 1 62 2005-01-24 2006-12-05
#> 7 1 73 2011-04-02 2011-06-19
#> 8 1 8 2019-09-14 2020-12-31
#> 9 1 23 2003-10-29 2003-11-27
#> 10 1 44 2017-01-01 2018-03-30
#> # ℹ more rows
This would also get recorded in the attrition, counts and settings on top of the already exiting ones.
In the settings a new column with the specified parameter used:
settings(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 5
#> cohort_definition_id cohort_name gap_era prior_use_washout limit
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 acetaminophen 30 365 first_entry
The counts will be updated:
cohortCount(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <int> <int>
#> 1 1 62 62
And the attrition will have a new line:
attrition(cdm$acetaminophen_cohort)
#> # A tibble: 4 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 71 62 1 Initial qualify…
#> 2 1 70 62 2 join exposures …
#> 3 1 66 62 3 require prior u…
#> 4 1 62 62 4 require is the …
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
requireObservationBeforeDrug()
To require that a cohort entry (drug episodes) has a certain time of
prior observation we can use the
requireObservationBeforeDrug()
function. See example:
The result will be a cohort with the individuals that fulfill the criteria:
cdm$acetaminophen_cohort
#> # Source: table<main.acetaminophen_cohort> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 45 2017-07-03 2018-07-06
#> 2 1 95 2001-06-13 2010-01-28
#> 3 1 26 2015-07-22 2020-07-18
#> 4 1 51 2017-09-04 2017-09-17
#> 5 1 58 1995-08-29 2002-06-12
#> 6 1 62 2005-01-24 2006-12-05
#> 7 1 73 2011-04-02 2011-06-19
#> 8 1 8 2019-09-14 2020-12-31
#> 9 1 23 2003-10-29 2003-11-27
#> 10 1 44 2017-01-01 2018-03-30
#> # ℹ more rows
This would also get recorded in the attrition, counts and settings on top of the already exiting ones.
In the settings a new column with the specified parameter used:
settings(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 6
#> cohort_definition_id cohort_name gap_era prior_use_washout limit
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 acetaminophen 30 365 first_entry
#> # ℹ 1 more variable: prior_drug_observation <chr>
The counts will be updated:
cohortCount(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <int> <int>
#> 1 1 33 33
And the attrition will have a new line:
attrition(cdm$acetaminophen_cohort)
#> # A tibble: 5 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 71 62 1 Initial qualify…
#> 2 1 70 62 2 join exposures …
#> 3 1 66 62 3 require prior u…
#> 4 1 62 62 4 require is the …
#> 5 1 33 33 5 require prior o…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
requireDrugInDateRange()
To require that a cohort entry (drug episodes) has a certain date
within an specific range we can use the
requireDrugInDateRange()
function. In general you would
like to apply this restriction to the incident date (cohort_start_date),
but the function is flexible and you can use it to restrict to any other
date. See example:
cdm$acetaminophen_cohort <- cdm$acetaminophen_cohort |>
requireDrugInDateRange(
indexDate = "cohort_start_date",
dateRange = as.Date(c("2000-01-01", "2020-12-31"))
)
The result will be a cohort with the individuals that fulfill the criteria:
cdm$acetaminophen_cohort
#> # Source: table<main.acetaminophen_cohort> [?? x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 45 2017-07-03 2018-07-06
#> 2 1 95 2001-06-13 2010-01-28
#> 3 1 26 2015-07-22 2020-07-18
#> 4 1 51 2017-09-04 2017-09-17
#> 5 1 62 2005-01-24 2006-12-05
#> 6 1 73 2011-04-02 2011-06-19
#> 7 1 8 2019-09-14 2020-12-31
#> 8 1 23 2003-10-29 2003-11-27
#> 9 1 44 2017-01-01 2018-03-30
#> 10 1 57 2020-04-17 2020-11-19
#> # ℹ more rows
This would also get recorded in the attrition, counts and settings on top of the already exiting ones.
In the settings a new column with the specified parameter used:
settings(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 6
#> cohort_definition_id cohort_name gap_era prior_use_washout limit
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 acetaminophen 30 365 first_entry
#> # ℹ 1 more variable: prior_drug_observation <chr>
The counts will be updated:
cohortCount(cdm$acetaminophen_cohort)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <int> <int>
#> 1 1 24 24
And the attrition will have a new line:
attrition(cdm$acetaminophen_cohort)
#> # A tibble: 6 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 71 62 1 Initial qualify…
#> 2 1 70 62 2 join exposures …
#> 3 1 66 62 3 require prior u…
#> 4 1 62 62 4 require is the …
#> 5 1 33 33 5 require prior o…
#> 6 1 24 24 6 require cohort_…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
If you just want to restrict on the lower or upper bound you can just leave the other element as NA and then no condition will be applied, see for example:
cdm$my_new_cohort <- cdm$my_new_cohort |>
requireDrugInDateRange(dateRange = as.Date(c(NA, "2010-12-31")))
attrition(cdm$my_new_cohort)
#> # A tibble: 7 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 55 48 1 Initial qualify…
#> 2 1 52 48 2 join exposures …
#> 3 1 23 21 3 require cohort_…
#> 4 2 53 48 1 Initial qualify…
#> 5 2 51 48 2 join exposures …
#> 6 2 51 48 3 require prior u…
#> 7 2 18 18 4 require cohort_…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
It is very important to know that the different restrictions are not commutable operations and that different order can lead to different results. Let’s see the following example where we have an individual with 4 cohort entries:
In this case we will see the result of combining in different ways 4 inclusion criteria:
requireIsFirstDrugEntry()
requirePriorDrugWashout(days = 365)
requireObservationBeforeDrug(days = 365)
requireDrugInDateRange(dateRange = as.Date(c("2011-01-01", "2012-12-31)))
If we would apply the initially the first requirement and then the washout one we would end with only the first record:
Whereas if we would apply initially the washout criteria and then the first one the resulting exposure would be the fourth one:
If we would apply the initially the first requirement and then the minObs one we would end with no record in the cohort:
Whereas if we would apply initially the minObs criteria and then the first one there would be an exposure selected, the second one:
If we would apply the initially the first requirement and then the 2011-2012 one we would end with no record in the cohort:
Whereas if we would apply initially the 2011-2012 criteria and then the first one there would be an exposure selected, the second one:
If we would apply the initially the washout requirement and then the minObs one we would end with only the last record selected:
Whereas if we would apply initially the minObs criteria and then the washout one the second and the fourth exposures are the ones that would be selected:
If we would apply initially the washout requirement and then the 2011-2012 one no record would be selected:
Whereas if we would apply initially the 2011-2012 criteria and then the washout one the second record would be included:
Finally requireObservationBeforeDrug
and
requireDrugInDateRange
will always be commutable operations
so the other of this two will always be the same.
Having all this into account the recommended order to apply criteria would be:
Require a prior drug washout or require first drug entry (particular case).
Require a prior observation before the drug episode.
Require the drugs to be in a certain date range.
Although this is the recommended order, your study design may have a different required specification, for example you may be interested on the first exposure that fulfills some criteria. Thus making applying the require first drug entry at the end.