| Title: | Poisson Super Learner |
| Version: | 0.1.1 |
| Description: | Provides tools for fitting piece-wise constant hazard models for survival and competing risks data, including ensemble hazard estimation via the Super Learner framework. The package supports estimation of survival functions and absolute risk predictions from fitted cause-specific hazard models. For the Super Learner framework see van der Laan, Polley and Hubbard (2007) <doi:10.2202/1544-6115.1309>. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| LinkingTo: | Rcpp |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | data.table, sampling, riskRegression |
| Imports: | Rcpp, methods, lava, Matrix, glmnet, mgcv |
| Suggests: | knitr, rmarkdown, survival, prodlim, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | yes |
| Packaged: | 2026-04-01 07:01:29 UTC; pwt887 |
| Author: | Gabriele Pittarello [aut, cre], Helene Rytgaard [aut], Thomas Gerds [aut] |
| Maintainer: | Gabriele Pittarello <gabriele.pittarello@sund.ku.dk> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-04 10:00:02 UTC |
poissonsuperlearner: Poisson Super Learner
Description
Provides tools for fitting piece-wise constant hazard models for survival and competing risks data, including ensemble hazard estimation via the Super Learner framework. The package supports estimation of survival functions and absolute risk predictions from fitted cause-specific hazard models. For the Super Learner framework see van der Laan, Polley and Hubbard (2007) doi:10.2202/1544-6115.1309.
Author(s)
Maintainer: Gabriele Pittarello gabriele.pittarello@sund.ku.dk
Authors:
Helene Rytgaard hely@sund.ku.dk
Thomas Gerds tag@biostat.ku.dk
GAM learner via mgcv::bam
Description
Learner_gam is a Reference Class implementing the learner interface
used by Superlearner() and fit_learner().
Arguments
covariates |
|
cross_validation |
|
Details
User-facing API: users should only initialize the learner and pass it
to Superlearner() / fit_learner(). The methods private_fit() and
private_predictor() are part of the internal learner interface and are
not meant to be called directly by users.
Wrapper role: this class wraps mgcv::bam in a piecewise-constant hazard
workflow. The package-specific contribution is to provide a convenient
interface for the long-format Poisson likelihood with offsets for time at risk,
and optional node terms encoding the baseline hazard, while forwarding standard
mgcv::bam arguments supplied via ....
Model
Let 0=t_0 < t_1 < \cdots < t_m denote time knots and
define interval indicators I_k(t)=1\{t\in(t_k,t_{k+1}]\}.
The piecewise-constant hazard model with an additive predictor is
\lambda(t \mid x) = \sum_{k=0}^{m} I_k(t)\,\exp\{\eta(x) + \gamma_k\}.
The additive predictor \eta(x) is constructed from covariates
(smooth terms such as s(age) and/or linear terms) and estimated by mgcv.
Fields
covariates (character): Terms used to build the additive predictor (may include s() terms).
cross_validation (logical): Workflow flag; see Details.
intercept (logical): Whether to include an intercept.
add_nodes (logical): If TRUE, include interval ("node") effects encoding the baseline hazard.
formula (character): Formula string passed to mgcv::bam.
learner (function): Backend fitter (mgcv::bam).
fit_arguments (list): Additional arguments forwarded to mgcv::bam.
Methods (internal learner interface)
initialize(...): Construct and configure the learner. This is the only method users should call.
private_fit(data, ...): Internal. Fits a Poisson GAM with offset log(tij) on long-format data.
private_predictor(model, newdata, ...): Internal. Predicts hazards on the response scale.
Examples
lrn <- Learner_gam(covariates = c("s(age)", "value_LDL"))
Penalized Poisson learner via glmnet
Description
Learner_glmnet is a Reference Class implementing the learner interface
used by Superlearner() and fit_learner().
Details
User-facing API: users are expected to initialize the learner (i.e.,
call Learner_glmnet(...)) and pass the resulting object to
Superlearner() or fit_learner(). The remaining methods documented below
(e.g., private_fit(), private_predictor()) are part of the internal learner
interface and are not meant to be called directly by users.
Wrapper role: this class is a user-friendly wrapper around the existing
glmnet implementation. The package-specific contribution is to provide a
piecewise-constant hazard workflow: create the long-format Poisson data with
offsets for time at risk, add the interval ("node") structure for the baseline
hazard when requested, and forward standard glmnet arguments supplied at
initialization to the backend fitter.
Model
Let 0=t_0 < t_1 < \cdots < t_m denote time knots and
define interval indicators I_k(t)=1\{t\in(t_k,t_{k+1}]\}.
The piecewise-constant hazard model is
\lambda(t \mid x) = \sum_{k=0}^{m} I_k(t)\,\lambda_k(x),
\qquad \lambda_k(x) = \exp(\beta^\top x + \gamma_k).
Penalization is applied to the regression coefficients through the glmnet
elastic-net penalty. If you want node (baseline) terms to be unpenalized, use
penalty.factor via ... (and set it consistently with how your design matrix
encodes nodes).
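The role of penalty.factor can be illustrated by calling glmnet directly on a hand-built design matrix. This is only a sketch: the toy data, the column order (node dummies first, covariates last) and the value s = 0.05 are assumptions made for illustration, and the design matrix built internally by Learner_glmnet may be laid out differently.

```r
# Sketch: leave node (baseline) columns unpenalized via glmnet's penalty.factor.
# Toy data and column layout are assumptions, not the package's internals.
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  n <- 400
  interval <- factor(sample(1:3, n, replace = TRUE))  # node label of each long-format row
  x <- matrix(rnorm(n * 2), n, 2, dimnames = list(NULL, c("x1", "x2")))
  tij <- runif(n, 0.2, 1)                             # time at risk in the row
  rate <- exp(-1 + 0.3 * (interval == "2") + 0.5 * x[, "x1"])
  y <- rpois(n, tij * rate)                           # toy event counts

  nodes_mat <- model.matrix(~ interval)[, -1, drop = FALSE]  # two node dummies
  design <- cbind(nodes_mat, x)
  # A penalty factor of 0 means the column is never shrunk.
  pf <- c(rep(0, ncol(nodes_mat)), rep(1, ncol(x)))

  fit <- glmnet::glmnet(design, y, family = "poisson", offset = log(tij),
                        alpha = 1, penalty.factor = pf)
  b <- as.matrix(coef(fit, s = 0.05))                 # intercept + 4 coefficients
  print(b)
}
```

The vector pf must have one entry per column of the design matrix, in the same order, which is why it has to be set consistently with how the nodes are encoded.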
Fields
covariates (character): Names of covariate columns used in the model.
cross_validation (logical): If TRUE, chooses lambda by glmnet::cv.glmnet.
intercept (logical): Whether to include an intercept in the backend fit.
add_nodes (logical): If TRUE, include interval ("node") effects encoding the baseline hazard.
lambda (numeric): If cross_validation = FALSE, the lambda used in the final fit.
formula (character): Formula string used to create the design matrix in long format.
learner (function): Backend fitter (glmnet::glmnet or glmnet::cv.glmnet).
fit_arguments (list): Additional arguments forwarded to the backend fitter.
Methods (internal learner interface)
initialize(...): Construct and configure the learner. This is the only method users should call.
private_fit(data, ...): Internal. Fits a Poisson model with offset log(tij) on long-format data.
private_predictor(model, newdata, ...): Internal. Predicts hazards on the response scale for long-format newdata.
Examples
lrn <- Learner_glmnet(covariates = c("age", "sex"), alpha = 1, cross_validation = TRUE)
HAL learner for piecewise Poisson hazards
Description
Learner_hal is a Reference Class implementing the learner interface
used by Superlearner() and fit_learner().
Details
User-facing API: users should only initialize the learner and pass it
to Superlearner() / fit_learner(). The methods private_fit() and
private_predictor() (and any basis-construction helpers) are part of the
internal learner interface and are not meant to be called directly by users.
Wrapper role: this class provides a piecewise-constant hazard wrapper around
a HAL-style indicator-basis construction, estimated by L1-penalized Poisson
regression using a glmnet backend. The package-specific contribution is to
(i) construct the long-format Poisson representation with offsets for time at
risk, (ii) generate indicator bases compatible with piecewise hazards, and
(iii) forward backend fitting arguments supplied via ....
Model
Let 0=t_0 < t_1 < \cdots < t_m denote time knots and
define interval indicators I_k(t)=1\{t\in(t_k,t_{k+1}]\}.
The HAL piecewise-constant hazard model is
\lambda(t \mid x) = \sum_{k=0}^{m} I_k(t)\,\exp\{f(t,x)\},
where f(t,x) is approximated by a finite linear combination of
indicator basis functions.
Two-covariate illustration
Let x=(x_1,x_2) be two covariates and let
t_0 < t_1 < \cdots < t_R be time grid points used to
create step functions in time. Choose covariate cutpoints
c_{1,1},\ldots,c_{1,K_1} for x_1 and
c_{2,1},\ldots,c_{2,K_2} for x_2.
Define indicator bases:
B_r(t) = 1\{t_r \le t\}
B_{1,p}(x) = 1\{c_{1,p} \le x_1\}
B_{2,q}(x) = 1\{c_{2,q} \le x_2\}
A main-effects HAL approximation on the log-hazard scale can be written as:
f_\beta(t,x) = \beta_0
+ \sum_{r=1}^R \beta_r B_r(t)
+ \sum_{r=1}^R\sum_{p=1}^{K_1} \beta_{r,1,p} B_r(t) B_{1,p}(x)
+ \sum_{r=1}^R\sum_{q=1}^{K_2} \beta_{r,2,q} B_r(t) B_{2,q}(x).
If max_degree >= 2, the learner additionally includes interaction bases such as
\sum_{r=1}^R\sum_{p=1}^{K_1}\sum_{q=1}^{K_2}
\beta_{r,12,pq} B_r(t) B_{1,p}(x) B_{2,q}(x).
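The indicator bases in this illustration can be built with a few lines of base R. The sketch below uses made-up times, covariate values and cutpoints, and the helper names indicator_basis and pairwise_products are hypothetical; it shows the structure of the basis expansion, not the learner's internal basis-construction code.

```r
# Indicator basis 1{cut <= value} for every (value, cut) pair.
indicator_basis <- function(values, cuts) outer(values, cuts, ">=") * 1

# Row-wise products of every column of A with every column of B.
pairwise_products <- function(A, B) {
  do.call(cbind, lapply(seq_len(ncol(A)), function(r) A[, r] * B))
}

t_obs <- c(0.5, 1.2, 2.5)   # toy times
x1 <- c(0.2, 1.5, 3.0)      # toy covariate 1
x2 <- c(10, 20, 30)         # toy covariate 2

Bt <- indicator_basis(t_obs, c(1, 2))        # B_r(t),     R   = 2
B1 <- indicator_basis(x1, c(1, 2))           # B_{1,p}(x), K_1 = 2
B2 <- indicator_basis(x2, c(15, 25, 35))     # B_{2,q}(x), K_2 = 3

# Time bases plus time-by-covariate products (intercept not included here).
main_effects <- cbind(Bt, pairwise_products(Bt, B1), pairwise_products(Bt, B2))
# Additional three-way products B_r(t) B_{1,p}(x) B_{2,q}(x).
interactions <- pairwise_products(Bt, pairwise_products(B1, B2))

dim(main_effects)
dim(interactions)
```

The column counts follow the sums in the formulas: R + R K_1 + R K_2 columns for the first expansion and R K_1 K_2 for the interaction block, which is why L1 penalization is needed to keep the fit sparse.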
How reference class parameters map to the model
covariates: Covariate columns used to build covariate indicator bases.
num_knots: Controls the number of cutpoints per covariate used for indicator bases.
max_degree: Maximum interaction order included in the basis expansion.
add_nodes: If TRUE, includes interval ("node") structure for the baseline hazard.
intercept: Whether the backend penalized regression includes an intercept term.
cross_validation: If TRUE, selects the penalty level using glmnet::cv.glmnet.
fit_arguments: Additional arguments forwarded to the glmnet backend (e.g. nfolds).
Fields
covariates (character): Names of covariate columns used in the basis.
cross_validation (logical): Whether to use cv.glmnet to select the penalty.
intercept (logical): Backend intercept flag.
add_nodes (logical): Whether node (time-interval) effects are included.
max_degree (integer): Maximum interaction order.
num_knots (numeric): Knots used for basis construction.
lambda_opt (numeric): Selected penalty level when using cross-validation.
fit_arguments (list): Extra backend arguments forwarded to glmnet.
Methods (internal learner interface)
initialize(...): Construct and configure the learner. This is the only method users should call.
private_fit(data, ...): Internal. Builds bases and fits the penalized Poisson model with offset log(tij).
private_predictor(model, newdata, ...): Internal. Evaluates the fitted approximation and returns hazards on the response scale.
Examples
lrn <- Learner_hal(covariates = c("age", "sex"), max_degree = 2L, num_knots = c(10L, 5L))
Fit a Poisson Super Learner ensemble
Description
Fits an ensemble of cause-specific piecewise-constant hazard models using a long-format Poisson representation and combines them through a meta-learner (stacking).
Usage
Superlearner(
data,
id = "id",
status = "status",
event_time = NULL,
learners,
number_of_nodes = NULL,
nodes = NULL,
variable_transformation = NULL,
nfold = 3,
...
)
Arguments
data |
|
id |
|
status |
|
event_time |
|
learners |
|
number_of_nodes |
|
nodes |
|
variable_transformation |
Optional transformation specification passed to
|
nfold |
|
... |
Additional arguments currently ignored. |
Details
Internally, the function:
builds a time grid (nodes) and converts the subject-level data to a long Poisson format;
fits each base learner once on the full long data for each cause;
removes learners that already fail on the full data;
uses nfold cross-validation to obtain out-of-sample base-learner predictions (Z1, Z2, ...) for stacking;
removes learners whose cross-validated prediction column is entirely missing for at least one cause;
fits a cause-specific meta-learner on the retained stacked predictions.
If all learners fail on the full data, the function stops with an error.
If only one learner remains after the full-data screening step or after the
cross-validation screening step, no meta-learner is fit. In that case,
metalearner is NULL, each superlearner[[k]]$meta_learner_fit is NULL,
and prediction is based directly on the stored fitted base learner.
Numeric learner positions always refer to the learners actually retained in the
fitted object.
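The cross-validation and stacking steps can be sketched generically in base R. Everything below is an assumption made for illustration: the toy data, the two Poisson glm base learners, and in particular the meta-learner, which here is a plain Poisson glm on the log of the out-of-fold hazards rather than the meta-learner the package actually uses.

```r
set.seed(42)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
latent <- rexp(n, rate = exp(-1 + 0.5 * d$x1))  # latent event time
window <- runif(n, 0.5, 2)                      # length of the at-risk window
d$tij <- pmin(latent, window)                   # observed time at risk
d$event <- as.integer(latent <= window)         # 1 if the event falls in the window

# Two toy base learners; both use the offset log(tij).
base_formulas <- list(
  Z1 = event ~ x1 + offset(log(tij)),
  Z2 = event ~ x1 + x2 + offset(log(tij))
)

nfold <- 3
fold <- sample(rep(seq_len(nfold), length.out = n))
Z <- matrix(NA_real_, n, length(base_formulas),
            dimnames = list(NULL, names(base_formulas)))

for (v in seq_len(nfold)) {
  train <- d[fold != v, ]
  test <- d[fold == v, ]
  for (j in seq_along(base_formulas)) {
    fit_j <- glm(base_formulas[[j]], family = poisson, data = train)
    # Expected count divided by time at risk gives the out-of-fold hazard.
    Z[fold == v, j] <- predict(fit_j, newdata = test, type = "response") / test$tij
  }
}

# Meta-learner: regress the outcome on the cross-validated predictions.
stacked <- cbind(d, Z)
meta <- glm(event ~ log(Z1) + log(Z2) + offset(log(tij)),
            family = poisson, data = stacked)
coef(meta)
```

Each row's Z1 and Z2 come from a model that never saw that row, which is what makes the meta-learner coefficients usable as stacking weights.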
Value
An object of class poisson_superlearner, stored as a named list
with the following components:
learners:
the retained base learner objects.
metalearner:
the meta-learner object used for stacking. If no stacking is performed because
only one learner remains, metalearner is NULL.
superlearner:
a list of length data_info$n_crisks, one entry per cause. For cause k,
superlearner[[k]] is a list with two elements:
- learners_fit: the fitted base learner object or objects for cause k. If more than one learner is retained, this is a list with one fitted object per retained learner. If only one learner remains, this is the single fitted learner object itself.
- meta_learner_fit: the fitted cause-specific meta-learner for cause k. If no stacking is performed, this is NULL.
cross_validation_deviance:
a data.table with columns learner and deviance, giving the mean
cross-validated Poisson deviance for each retained base learner. This
component is present when cross-validated model comparison is available.
data_info:
a list of bookkeeping information used for prediction and interpretation,
containing:
- id: identifier column name used.
- status: status column name used.
- event_time: event-time column name used.
- nodes: numeric vector of node cut points used for the piecewise grid.
- nfold: number of folds used for stacking.
- maximum_followup: maximum observed follow-up time.
- n_crisks: number of event types detected.
- learners_labels: character vector of retained learner labels.
- variable_transformation: the transformation specification passed in variable_transformation, or NULL.
Examples
data <- simulateStenoT1(50, competing_risks = TRUE)
learners <- list(
glm = Learner_glmnet(
covariates = c("sex", "value_LDL"),
lambda = 0,
cross_validation = FALSE
),
ridge = Learner_glmnet(
covariates = c("sex", "value_LDL"),
alpha = 0,
lambda = 0.01,
cross_validation = FALSE
)
)
fit <- Superlearner(
data = data,
id = "id",
status = "status_cvd",
event_time = "time_cvd",
learners = learners,
number_of_nodes = 3,
nfold = 2
)
Extract coefficients from a fitted base learner
Description
Convenience method to extract (cause-specific) model coefficients from a fitted
base_learner returned by fit_learner().
Usage
## S3 method for class 'base_learner'
coef(object, cause = NULL, ...)
Arguments
object |
|
cause |
|
... |
Passed to the underlying |
Details
For competing risks, fit_learner() fits one model per cause, stored in
object$learner_fit[[k]] for k = 1, 2, ..., K. This method simply dispatches
to the underlying model’s coef() method for each fitted object.
Learner-dependent output. The returned coefficient object depends on the
base learner used (e.g. a numeric vector, a sparse matrix, a list, etc.).
This method does not post-process or rename coefficients; it returns the output
of coef(object$learner_fit[[k]], ...) unchanged.
Value
If cause is a single integer, returns the coefficient object produced by
coef() for that cause-specific fitted model.
If cause = NULL, returns a list of length object$data_info$n_crisks,
where element [[k]] contains coefficients for cause k.
If no fitted model is present (object$learner_fit is NULL), signals a message
and returns invisible(object).
Examples
d <- simulateStenoT1(50, competing_risks = TRUE)
lrn <- Learner_glmnet(covariates = c("age", "value_LDL"),
lambda = 0, cross_validation = FALSE)
bl <- fit_learner(d, learner = lrn, id = "id",
status = "status_cvd", event_time = "time_cvd",
number_of_nodes = 4)
# coefficients for cause 1
coef(bl, cause = 1)
# coefficients for all causes (list)
coef(bl)
Extract stacking (meta-learner) coefficients from a fitted Poisson Super Learner
Description
Extracts the meta-learner coefficients (stacking weights) from a fitted
poisson_superlearner object returned by Superlearner().
Usage
## S3 method for class 'poisson_superlearner'
coef(object, cause = NULL, model = "sl", ...)
Arguments
object |
|
cause |
|
model |
Scalar model selector. Default is
|
... |
Passed to the underlying |
Details
For each cause k, the ensemble stores a fitted meta-learner in
object$superlearner[[k]]$meta_learner_fit. This method dispatches to the
underlying coef() method for that fitted meta-learner.
What coefficients represent. These coefficients correspond to the meta-learner
regression of the outcome on the cross-validated base-learner predictions
(Z1, Z2, ...). Under the default meta-learner, they are the stacking
weights (on the scale defined by the meta-learner).
Learner-dependent output. The returned coefficient object depends on the
meta-learner implementation (by default a glmnet fit, often returning a sparse
matrix). This method does not rename Z* terms or post-process coefficients; it
returns the output of coef(object$superlearner[[k]]$meta_learner_fit, ...)
unchanged.
Single-learner special case. If the ensemble was fit with only one base learner,
no meta-learner is fit and meta_learner_fit is NULL. In that case, coef()
for the poisson_superlearner does not have meta-learner coefficients to return.
Value
If cause is a single integer, returns the coefficient object produced by
coef() for the cause-specific fitted meta-learner.
If cause = NULL, returns a list of length object$data_info$n_crisks,
where element [[k]] contains meta-learner coefficients for cause k.
If no fitted ensemble is present (object$superlearner is NULL), signals a message
and returns invisible(object).
Examples
d <- simulateStenoT1(50, competing_risks = TRUE)
learners <- list(
glm = Learner_glmnet(covariates = c("age", "value_LDL"), lambda = 0, cross_validation = FALSE),
gam = Learner_gam(covariates = c("age", "value_LDL"))
)
fit <- Superlearner(d, id="id", status="status_cvd", event_time="time_cvd",
learners=learners, number_of_nodes=4, nfold=2)
# meta-learner coefficients (cause 1)
coef(fit, cause = 1)
# meta-learner coefficients for all causes (list)
coef(fit)
Fit a single base learner
Description
Pre-processes subject-level time-to-event data into a long Poisson format on a piecewise-constant time grid, then fits one initialized learner object. For competing risks, a separate model is fit for each event type (cause) using the standard cause-specific Poisson likelihood on the long data.
Usage
fit_learner(
data,
learner,
id = "id",
stratified_k_fold = FALSE,
status = "status",
event_time = NULL,
number_of_nodes = NULL,
nodes = NULL,
variable_transformation = NULL,
...
)
Arguments
data |
|
learner |
Reference-class learner object (e.g. from |
id |
|
stratified_k_fold |
|
status |
|
event_time |
|
number_of_nodes |
|
nodes |
|
variable_transformation |
|
... |
Additional arguments currently ignored. |
Value
An object of class base_learner, i.e. a named list with:
- model
The learner object that was fit (the input learner), stored for later prediction. This contains the learner specification (e.g., covariates, tuning parameters).
- learner_fit
A list of fitted model objects, one per cause. Its length equals data_info$n_crisks. The list is created by splitting the internally pre-processed long data by cause indicator k and calling model$private_fit() on each split.
Names typically correspond to the cause labels "1", "2", ..., "K".
Each element is learner-dependent: e.g. for Learner_glmnet it may be a "glmnet" (often wrapped, e.g. "fishnet") fit; for other learners it will be whatever $private_fit() returns.
Each fitted object is trained on long Poisson data representing the piecewise-constant hazard for that cause across the node intervals.
- data_info
A list of bookkeeping information needed for prediction and interpretation:
  - id
  Identifier column name used.
  - status
  Status column name used.
  - event_time
  Event/censoring time column name used.
  - nodes
  Numeric vector of node cut points used for the piecewise grid (includes 0 and is sorted). These are the interval boundaries used in the long Poisson representation.
  - maximum_followup
  max(data[[event_time]]).
  - n_crisks
  Number of event types (causes) detected. If censoring is present (0 in status), then n_crisks = #unique(status) - 1; otherwise n_crisks = #unique(status).
  - variable_transformation
  The transformation specification passed in variable_transformation (or NULL).
Examples
d <- simulateStenoT1(50, competing_risks = TRUE)
lrn <- Learner_glmnet(covariates = c("age", "value_LDL"),
lambda = 0,
cross_validation = FALSE)
bl <- fit_learner(d,
learner = lrn,
id = "id",
status = "status_cvd",
event_time = "time_cvd",
number_of_nodes = 4)
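The long Poisson format that fit_learner() builds internally can be illustrated with a small base-R sketch. The helper to_long_poisson is hypothetical and handles a single cause only; the simulated data, the node grid and the use of glm in place of a learner object are likewise assumptions, so this shows the idea rather than the package's pre-processing code.

```r
# Hypothetical helper: one row per subject and interval entered.
# Assumes time > 0 and nodes sorted with nodes[1] == 0.
to_long_poisson <- function(time, status, x, nodes) {
  rows <- lapply(seq_along(time), function(i) {
    lower <- nodes[nodes < time[i]]                       # interval starts reached
    upper <- pmin(c(nodes[-1], Inf)[seq_along(lower)], time[i])
    data.frame(
      id = i,
      interval = seq_along(lower),
      tij = upper - lower,                                # time at risk in the interval
      event = as.integer(status[i] == 1 & upper == time[i]),
      x = x[i]
    )
  })
  do.call(rbind, rows)
}

set.seed(1)
n <- 200
x <- rnorm(n)
latent <- rexp(n, rate = 0.2 * exp(0.5 * x))
cens <- runif(n, 0, 10)
obs <- pmin(latent, cens)
status <- as.integer(latent <= cens)

long <- to_long_poisson(obs, status, x, nodes = c(0, 1, 2, 4))

# Piecewise-constant hazard fit: interval effects play the role of the nodes.
fit <- glm(event ~ factor(interval) + x + offset(log(tij)),
           family = poisson, data = long)
coef(fit)
```

Within each subject the tij values add up to the observed follow-up time and at most one row carries the event, so the Poisson likelihood on the long data matches the piecewise-constant hazard likelihood up to a constant.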
Absolute risk (cumulative incidence) for a cause under piecewise-constant hazards
Description
Computes, per row, the cumulative incidence function at the end of each interval,
grouped by id. The number of causes is inferred from the number of columns in haz.
Usage
pch_absolute_risk(id, dt, haz, cause_idx, one_based = TRUE, na_is_zero = FALSE)
Arguments
id |
Integer vector. Sorted by |
dt |
Numeric vector of interval lengths. |
haz |
Numeric matrix (n x C) of cause-specific hazards per interval. Columns correspond to causes 1..C. |
cause_idx |
Integer. Index of the cause of interest (1-based by default). |
one_based |
Logical. If |
na_is_zero |
Logical. If |
Value
Numeric vector of cumulative incidence values at the end of each interval.
Examples
id <- c(1L, 1L, 2L, 2L)
dt <- c(1, 1, 1, 1)
haz <- rbind(
c(0.10, 0.05),
c(0.20, 0.10),
c(0.05, 0.02),
c(0.10, 0.03)
)
pch_absolute_risk(id = id, dt = dt, haz = haz, cause_idx = 1)
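For reference, the cumulative incidence under piecewise-constant cause-specific hazards has a standard closed form: on an interval with total hazard Λ and length dt, the increment for cause j is S(start) × (λ_j / Λ) × (1 − exp(−Λ dt)). The base-R helper below, cif_exact_sketch, is a hypothetical re-implementation of that formula for cross-checking; whether pch_absolute_risk() uses exactly this expression is an assumption.

```r
# Hypothetical helper; rows must be sorted by id, then time.
cif_exact_sketch <- function(id, dt, haz, cause_idx) {
  total <- rowSums(haz)                              # all-cause hazard per interval
  cumhaz <- ave(total * dt, id, FUN = cumsum)        # cumulative hazard within id
  surv_start <- exp(-(cumhaz - total * dt))          # survival at interval start
  frac <- ifelse(total > 0, haz[, cause_idx] / total, 0)
  incr <- surv_start * frac * (1 - exp(-total * dt)) # closed-form increment
  ave(incr, id, FUN = cumsum)                        # accumulate within id
}

id <- c(1L, 1L, 2L, 2L)
dt <- c(1, 1, 1, 1)
haz <- rbind(c(0.10, 0.05), c(0.20, 0.10), c(0.05, 0.02), c(0.10, 0.03))
cif_exact_sketch(id, dt, haz, cause_idx = 1)
```

A useful sanity check is that the cumulative incidences over all causes plus the survival probability sum to one at the end of every interval.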
Absolute risk (Euler approximation) for a cause under piecewise-constant hazards
Description
Computes the cumulative incidence function using the first-order Euler (discrete) approximation:
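F_j(t) \approx \sum_{k:\, t_k \le t} S(t_{k-1})\, \lambda_{j,k}\, \Delta t_k,
where S(t_{k-1}) is the survival probability at the start of interval k, \lambda_{j,k} is the cause-j hazard on that interval, and \Delta t_k = t_k - t_{k-1} is its length.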
Grouped by id, this returns the cumulative incidence at the end of each interval.
Usage
pch_absolute_risk_euler(
id,
dt,
haz,
cause_idx,
one_based = TRUE,
na_is_zero = FALSE
)
Arguments
id |
Integer vector. Sorted by |
dt |
Numeric vector of interval lengths. |
haz |
Numeric matrix (n x C) of cause-specific hazards per interval. |
cause_idx |
Integer. Index of the cause of interest (1-based by default). |
one_based |
Logical. If |
na_is_zero |
Logical. If |
Value
Numeric vector of cumulative incidence values (Euler approximation) at the end of each interval.
Examples
id <- c(1L, 1L, 2L, 2L)
dt <- c(1, 1, 1, 1)
haz <- rbind(
c(0.10, 0.05),
c(0.20, 0.10),
c(0.05, 0.02),
c(0.10, 0.03)
)
pch_absolute_risk_euler(id = id, dt = dt, haz = haz, cause_idx = 1)
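The Euler sum in the Description can be written out in base R as follows. The helper cif_euler_sketch is hypothetical, and it assumes that the survival at the start of each interval is the exact exponential survival; the package routine may compute that factor differently.

```r
# Hypothetical helper; rows must be sorted by id, then time.
cif_euler_sketch <- function(id, dt, haz, cause_idx) {
  total <- rowSums(haz)                         # all-cause hazard per interval
  cumhaz <- ave(total * dt, id, FUN = cumsum)   # cumulative hazard within id
  surv_start <- exp(-(cumhaz - total * dt))     # S(t_{k-1}); assumed exact
  # First-order increment: S(t_{k-1}) * lambda_{j,k} * dt_k, accumulated per id.
  ave(surv_start * haz[, cause_idx] * dt, id, FUN = cumsum)
}

id <- c(1L, 1L, 2L, 2L)
dt <- c(1, 1, 1, 1)
haz <- rbind(c(0.10, 0.05), c(0.20, 0.10), c(0.05, 0.02), c(0.10, 0.03))
cif_euler_sketch(id, dt, haz, cause_idx = 1)
```

The approximation is accurate when the total hazard times the interval length is small, so a finer time grid brings it closer to the exact cumulative incidence.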
Piecewise-constant hazards survival function
Description
Computes survival at the end of each interval for competing risks with piecewise constant hazards.
Usage
pch_survival(id, dt, haz, na_is_zero = FALSE)
Arguments
id |
Integer vector of subject IDs, sorted by id then time. |
dt |
Numeric vector of interval lengths. |
haz |
Numeric matrix (n x C) of cause-specific hazards. |
na_is_zero |
Logical. If TRUE, treat NA hazards as zero. |
Value
Numeric vector of survival probabilities at the end of each interval.
Examples
id <- c(1L, 1L, 2L, 2L)
dt <- c(1, 1, 1, 1)
haz <- rbind(
c(0.10, 0.05),
c(0.20, 0.10),
c(0.05, 0.02),
c(0.10, 0.03)
)
pch_survival(id = id, dt = dt, haz = haz)
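With piecewise-constant hazards, survival at the end of interval k is the exponential of minus the cumulative all-cause hazard, S(t_k) = exp(−Σ_{l ≤ k} Σ_j λ_{j,l} Δt_l). The one-line base-R helper below, surv_sketch, is a hypothetical cross-check of that formula; it ignores NA handling and is not the package's compiled routine.

```r
# Hypothetical helper; rows must be sorted by id, then time.
surv_sketch <- function(id, dt, haz) {
  # Sum hazards over causes, accumulate hazard * length within each id.
  exp(-ave(rowSums(haz) * dt, id, FUN = cumsum))
}

id <- c(1L, 1L, 2L, 2L)
dt <- c(1, 1, 1, 1)
haz <- rbind(c(0.10, 0.05), c(0.20, 0.10), c(0.05, 0.02), c(0.10, 0.03))
surv_sketch(id, dt, haz)
```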
Predict hazards, survival and absolute risk from a fitted base learner
Description
Computes cause-specific piecewise-constant hazards (pwch_k), the corresponding
survival function, and absolute risk for a given cause, at user-supplied
prediction horizons times, using a fitted base_learner object (single learner;
no stacking).
Usage
## S3 method for class 'base_learner'
predict(object, newdata, times, cause = 1, ...)
Arguments
object |
|
newdata |
|
times |
|
cause |
|
... |
Additional arguments (currently ignored). |
Details
Internally, newdata is expanded to a Cartesian product with times, converted to
long Poisson format on object$data_info$nodes, and the fitted learner for each
cause in object$learner_fit is used to predict the cause-specific hazards.
Survival and absolute risk are then computed from the predicted hazards.
Special case times = 0: when 0 is included in times, the returned rows
have survival_function = 1, absolute_risk = 0, and all pwch_k = 0 at time 0.
Identifiers in the output: if newdata contains the id column, it is carried
into the output. If newdata does not contain an id column, an internal id is
created for computation, but it is not guaranteed to appear in the returned table
unless it was present in newdata.
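The expansion step can be pictured with a small base-R sketch. The column names, the node grid and the use of merge() and findInterval() are assumptions for illustration; a fitted object carries its own nodes in object$data_info$nodes and the package performs this step internally.

```r
newdata <- data.frame(id = 1:2, age = c(40, 55))  # toy prediction rows
times <- c(0, 2, 5)                               # requested horizons
nodes <- c(0, 1, 3)                               # hypothetical time grid

# Cartesian product: every row of newdata paired with every horizon.
grid <- merge(newdata, data.frame(time = times), by = NULL)
grid <- grid[order(grid$id, grid$time), ]

# Index of the interval (t_k, t_{k+1}] containing each horizon; 0 means
# the horizon lies at or before the first node.
grid$interval <- findInterval(grid$time, nodes, left.open = TRUE)
grid
```

A horizon of 0 falls in no interval, which is consistent with the special case above: no hazard has accumulated yet, so survival is 1 and absolute risk is 0.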
Value
A data.table with one row per (row in newdata, time in times) and columns:
- (original columns)
All columns from newdata (excluding ignored event columns).
- time column
A column with name object$data_info$event_time holding the requested horizon.
- pwch_1, pwch_2, ...
Predicted cause-specific piecewise hazards at the horizon.
- survival_function
Predicted survival probability at the horizon.
- absolute_risk
Predicted cumulative incidence (absolute risk) for cause at the horizon.
Examples
d <- simulateStenoT1(120, competing_risks = TRUE)
lrn <- Learner_glmnet(covariates = c("age", "value_LDL"), lambda = 0, cross_validation = FALSE)
bl <- fit_learner(d, learner = lrn, id="id", status="status_cvd", event_time="time_cvd",
number_of_nodes=8)
p <- predict(bl, newdata = d[1:5], times = c(0, 2, 5), cause = 1)
head(p)
Predict hazards, survival and absolute risk from a fitted Poisson Super Learner
Description
Computes cause-specific piecewise-constant hazards (pwch_k), the corresponding
survival function, and absolute risk for a given cause, at user-supplied
prediction horizons times, for each row in newdata.
Usage
## S3 method for class 'poisson_superlearner'
predict(object, newdata, times, cause = 1, model = "sl", ...)
Arguments
object |
|
newdata |
|
times |
|
cause |
|
model |
Scalar model selector. Default is
Numeric positions refer to the learners actually stored in the fitted object. |
... |
Additional arguments (currently ignored). |
Details
Internally, newdata is expanded to a Cartesian product with the requested
times, converted to long Poisson format on object$data_info$nodes, and hazards
are predicted either from the stacked super learner (model = "sl") or from one
selected fitted base learner. Survival and absolute risk are then computed from
the predicted hazards.
Special case times = 0: when 0 is included in times, the returned rows
have survival_function = 1, absolute_risk = 0, and all pwch_k = 0 at time 0.
Identifiers in the output: if newdata contains the id column, it is carried
into the output. If newdata does not contain an id column, an internal id is
created for computation, but it is not guaranteed to appear in the returned table
unless it was present in newdata.
Value
A data.table with one row per (row in newdata, time in times) and columns:
- (original columns)
All columns from newdata (excluding ignored event columns).
- time column
A column with name object$data_info$event_time holding the requested horizon.
- pwch_1, pwch_2, ...
Predicted cause-specific piecewise hazards at the horizon.
- survival_function
Predicted survival probability at the horizon.
- absolute_risk
Predicted cumulative incidence (absolute risk) for cause at the horizon.
Examples
d <- simulateStenoT1(30, competing_risks = TRUE)
learners <- list(
lasso = Learner_glmnet(
covariates = "sex",
alpha = 1,
lambda = 0.01,
cross_validation = FALSE
),
ridge = Learner_glmnet(
covariates = c("sex", "value_LDL"),
alpha = 0,
lambda = 0.01,
cross_validation = FALSE
)
)
fit <- Superlearner(
data = d,
id = "id",
status = "status_cvd",
event_time = "time_cvd",
learners = learners,
number_of_nodes = 3,
nfold = 2
)
p <- predict(fit, newdata = d[1:3], times = c(0, 2), cause = 1)
p[, .(id, time_cvd, absolute_risk)]
Absolute-risk matrix predictions for a fitted base learner
Description
S3 method compatible with riskRegression::predictRisk for a fitted base learner, returning one column
per requested time.
Usage
## S3 method for class 'base_learner'
predictRisk(object, newdata, times, cause = 1, ...)
Arguments
object |
|
newdata |
|
times |
|
cause |
|
... |
Unused. |
Value
numeric matrix with nrow(newdata) rows and length(times) columns.
Examples
d <- simulateStenoT1(30, competing_risks = TRUE)
lrn <- Learner_glmnet(
covariates = c("sex", "value_LDL"),
lambda = 0.01,
cross_validation = FALSE
)
bl <- fit_learner(
d,
learner = lrn,
id = "id",
status = "status_cvd",
event_time = "time_cvd",
number_of_nodes = 3
)
if (requireNamespace("riskRegression", quietly = TRUE)) {
riskRegression::predictRisk(bl, newdata = d[1:3], times = c(1, 3), cause = 1)
}
Absolute-risk matrix predictions for a fitted Poisson Super Learner
Description
S3 method compatible with riskRegression::predictRisk returning one column
per requested time.
Usage
## S3 method for class 'poisson_superlearner'
predictRisk(object, newdata, times, cause = 1, model = "sl", ...)
Arguments
object |
|
newdata |
|
times |
|
cause |
|
model |
Scalar model selector. Default is |
... |
Unused. |
Value
numeric matrix with nrow(newdata) rows and length(times) columns.
Examples
d <- simulateStenoT1(30, competing_risks = TRUE)
learners <- list(
lasso = Learner_glmnet(
covariates = "sex",
alpha = 1,
lambda = 0.01,
cross_validation = FALSE
),
ridge = Learner_glmnet(
covariates = c("sex", "value_LDL"),
alpha = 0,
lambda = 0.01,
cross_validation = FALSE
)
)
fit <- Superlearner(
data = d,
id = "id",
status = "status_cvd",
event_time = "time_cvd",
learners = learners,
number_of_nodes = 3,
nfold = 2
)
if (requireNamespace("riskRegression", quietly = TRUE)) {
riskRegression::predictRisk(fit, newdata = d[1:3], times = c(1, 3), cause = 1)
}
Print method for base_learner
Description
Prints a compact description of the fitted base learner, including the learner type, the time-grid used, and (optionally) the fitted model object for a given cause.
Usage
## S3 method for class 'base_learner'
print(x, cause = 1, ...)
Arguments
x |
|
cause |
|
... |
Passed to the underlying fitted object |
Value
Invisibly returns x.
Examples
d <- simulateStenoT1(30, competing_risks = TRUE)
lrn <- Learner_glmnet(
covariates = c("sex", "value_LDL"),
lambda = 0.01,
cross_validation = FALSE
)
bl <- fit_learner(
d,
learner = lrn,
id = "id",
status = "status_cvd",
event_time = "time_cvd",
number_of_nodes = 3
)
print(bl, cause = NULL)
Print method for poisson_superlearner
Description
Prints a compact description of the fitted Poisson Super Learner, including the number of base learners, the meta-learner, the time-grid used, and competing-risk structure. Optionally prints the fitted meta-learner for a given cause.
Usage
## S3 method for class 'poisson_superlearner'
print(x, cause = 1, model = "sl", ...)
Arguments
x |
|
cause |
|
model |
Scalar model selector. Default is
|
... |
Passed to the underlying fitted meta-learner |
Value
Invisibly returns x.
Examples
d <- simulateStenoT1(30, competing_risks = TRUE)
learners <- list(
lasso = Learner_glmnet(
covariates = "sex",
alpha = 1,
lambda = 0.01,
cross_validation = FALSE
),
ridge = Learner_glmnet(
covariates = c("sex", "value_LDL"),
alpha = 0,
lambda = 0.01,
cross_validation = FALSE
)
)
fit <- Superlearner(
data = d,
id = "id",
status = "status_cvd",
event_time = "time_cvd",
learners = learners,
number_of_nodes = 3,
nfold = 2
)
print(fit, cause = NULL)
Simulate time-to-event data for hypothetical type-1 diabetes patients
Description
Simulates synthetic data inspired by the Steno Type-1 risk engine.
Usage
simulateStenoT1(
n,
coefficient_age = 0.05,
coefficient_LDL = 0.1,
value_diabetis = 0.02,
keep = NULL,
scenario = c("alpha", "beta"),
competing_risks = FALSE
)
Arguments
n |
|
coefficient_age |
|
coefficient_LDL |
|
value_diabetis |
|
keep |
|
scenario |
|
competing_risks |
|
Details
Generates baseline covariates and event times for CVD and censoring, with an optional competing-risks setting, for examples, benchmarks and tests.
The simulator uses a structural equation model (via lava::lvm) to generate
realistic correlations between covariates. Event times are then generated from
cause-specific Weibull proportional hazards models, where the linear predictor
depends on the simulated covariates (and scenario).
The following baseline covariates are generated (column name, type, interpretation):
- sex
factor. Binary sex indicator (generated Bernoulli, then stored as factor).
- age
numeric. Age at baseline (years).
- diabetes_duration
numeric. Duration of diabetes at baseline (years).
- value_SBP
numeric. Systolic blood pressure (SBP).
- value_LDL
numeric. LDL cholesterol.
- value_HBA1C
numeric. HbA1c.
- value_Albuminuria
factor with levels Normal, Micro, Macro. Albuminuria category.
- eGFR
numeric. Estimated glomerular filtration rate, constructed from latent age-dependent log2 eGFR components (higher values indicate better kidney function).
- value_Smoking
factor. Smoking indicator (generated from a logistic model, then stored as factor).
- value_Motion
factor. Physical activity indicator (generated from a logistic model, then stored as factor).
Event time variables are generated from latent Weibull PH models:
time.event.1 (CVD), time.event.0 (censoring), and in scenario "alpha" also
time.event.2 (death without prior CVD). These latent variables are used to
construct the observed outcome variables returned by the function (see below).
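How latent Weibull proportional hazards times turn into observed outcomes can be sketched with inverse-transform sampling in base R. The helper rweibull_ph, the single covariate and all scale, shape and coefficient values below are arbitrary assumptions; the simulator itself relies on lava::lvm with its own parameters.

```r
set.seed(2026)
n <- 1000
age <- rnorm(n, 45, 10)   # toy covariate

# Hypothetical helper: draw from a Weibull PH model with cumulative hazard
# H(t | lp) = scale * t^shape * exp(lp), by solving H(T) = -log(U).
rweibull_ph <- function(lp, scale, shape) {
  (-log(runif(length(lp))) / (scale * exp(lp)))^(1 / shape)
}

t_cvd   <- rweibull_ph(0.05 * (age - 45), scale = 0.02, shape = 1.2)  # latent CVD time
t_death <- rweibull_ph(0.03 * (age - 45), scale = 0.01, shape = 1.1)  # latent death time
t_cens  <- rweibull_ph(rep(0, n),         scale = 0.05, shape = 1.0)  # latent censoring time

# Observed outcome: earliest latent time and the label of what came first.
time_obs <- pmin(t_cvd, t_death, t_cens)
status_obs <- ifelse(time_obs == t_cens, 0L, ifelse(time_obs == t_cvd, 1L, 2L))
table(status_obs)
```

Dropping t_death from the minimum gives the setting without competing risks, where the status takes only the values 0 and 1.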
Value
A data.table with at least the following columns:
- id
integer. Subject identifier (1, ..., n).
- time_cvd
numeric. Observed follow-up time (minimum of event and censoring times; also includes competing risk time if competing_risks = TRUE in scenario "alpha").
- status_cvd
integer. Observed event status: 0 = censored, 1 = CVD, and if competing_risks = TRUE in scenario "alpha", 2 = death without prior CVD.
- time
numeric. Alias of time_cvd (kept for convenience).
- event
integer. Alias of status_cvd (kept for convenience).
- uncensored_time_cvd
numeric. Event time ignoring censoring (minimum of event causes only).
- uncensored_status_cvd
integer. Event cause ignoring censoring. In scenario "alpha" this is 1 (CVD) or 2 (death without CVD); in scenario "beta" this is always 1.
- uncensored_time
numeric. Alias of uncensored_time_cvd.
- uncensored_event
integer. Alias of uncensored_status_cvd.
In addition, the returned table contains all baseline covariates listed in
Details. Internal latent variables used only for simulation are removed
before returning (e.g., log2 eGFR components and, in scenario "beta", the
hinge-squared features).
Author(s)
Thomas A. Gerds tag@biostat.ku.dk
Examples
simulateStenoT1(n = 20, scenario = "alpha", competing_risks = TRUE)
Summarize a fitted base learner object
Description
Dispatches to the underlying fitted model’s summary() method for the selected
cause, or returns a list of summaries for all causes.
Usage
## S3 method for class 'base_learner'
summary(object, cause = 1, ...)
Arguments
object |
|
cause |
|
... |
Passed to the underlying |
Value
If cause is a single integer, returns the underlying model summary for
that cause. If cause = NULL, returns a list of summaries (one per cause).
Examples
d <- simulateStenoT1(30, competing_risks = TRUE)
lrn <- Learner_glmnet(
covariates = c("sex", "value_LDL"),
lambda = 0.01,
cross_validation = FALSE
)
bl <- fit_learner(
d,
learner = lrn,
id = "id",
status = "status_cvd",
event_time = "time_cvd",
number_of_nodes = 3
)
out <- summary(bl, cause = 1)
Summarize a fitted Poisson Super Learner object
Description
Prints:
a compact description of the fitted ensemble,
cross-validated deviances for base learners (when available),
cause-specific meta-learner coefficients (stacking weights).
Usage
## S3 method for class 'poisson_superlearner'
summary(object, cause = NULL, model = "sl", ...)
Arguments
object |
|
cause |
|
model |
Scalar model selector. Default is
|
... |
Passed to the underlying |
Value
Invisibly returns a list with elements:
- cross_validation_deviance
data.table (or NULL).
- meta_coefficients
List of length n_crisks with cause-specific coefficient objects (or NULL).
Examples
d <- simulateStenoT1(30, competing_risks = TRUE)
learners <- list(
lasso = Learner_glmnet(
covariates = "sex",
alpha = 1,
lambda = 0.01,
cross_validation = FALSE
),
ridge = Learner_glmnet(
covariates = c("sex", "value_LDL"),
alpha = 0,
lambda = 0.01,
cross_validation = FALSE
)
)
fit <- Superlearner(
data = d,
id = "id",
status = "status_cvd",
event_time = "time_cvd",
learners = learners,
number_of_nodes = 3,
nfold = 2
)
s <- summary(fit, cause = 1)
names(s)