Implements a Bagging (Bootstrap Aggregating) ensemble for prognostic models. It trains multiple base models on bootstrapped samples of the training data and aggregates their predictions.

Usage

bagging_pro(
  data,
  base_model_name,
  n_estimators = 10,
  subset_fraction = 0.632,
  tune_base_model = FALSE,
  time_unit = "day",
  years_to_evaluate = c(1, 3, 5),
  seed = 456
)

Arguments

data

A data frame for training. The first column must be the sample ID, the second column the event status (0/1), the third column the time (in the unit given by time_unit), and all subsequent columns the features.
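
As an illustration of the expected layout, the following sketch builds a small synthetic data frame in base R. The column names (sample_id, status, time, gene_a, gene_b) and the simulated values are hypothetical; only the column order is taken from the description above.

```r
# Hypothetical toy data in the column order described above:
# ID, event status (0/1), time, then feature columns.
set.seed(1)
n <- 6
toy_data <- data.frame(
  sample_id = paste0("S", seq_len(n)),         # column 1: sample ID
  status    = rbinom(n, size = 1, prob = 0.5), # column 2: event status (0/1)
  time      = round(runif(n, 30, 1500)),       # column 3: time, here in days
  gene_a    = rnorm(n),                        # columns 4+: features
  gene_b    = rnorm(n)
)

# Quick structural checks before passing a data frame to bagging_pro()
stopifnot(
  ncol(toy_data) >= 4,
  all(toy_data[[2]] %in% c(0, 1)),
  is.numeric(toy_data[[3]])
)
```

Checks like these catch a misplaced status or time column before any model is trained.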

base_model_name

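A character string, the name of the base prognostic model to use (e.g., "lasso_pro", "rsf_pro"). The model must already be registered in the modeling system; the example below calls initialize_modeling_system_pro() before fitting.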

n_estimators

An integer, the number of base models to train.

subset_fraction

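A numeric value between 0 and 1, the fraction of training samples drawn for each base model's bootstrap sample. The default, 0.632, is approximately 1 - 1/e, the expected share of distinct observations in a full-size bootstrap sample.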
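
To make the roles of n_estimators and subset_fraction concrete, here is a minimal base-R sketch of bootstrap aggregating. It is not the package's implementation: the linear model fitted with lm() stands in for a prognostic base model, and both sampling with replacement and averaging of predictions are assumptions about how resampling and aggregation might be done.

```r
# Generic bagging sketch (assumption: NOT the internals of bagging_pro()).
set.seed(456)
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- 2 * toy$x + rnorm(n)

n_estimators    <- 10
subset_fraction <- 0.632
n_draw <- floor(subset_fraction * n)  # rows drawn per base model

# Fit one stand-in base model per bootstrap sample
models <- lapply(seq_len(n_estimators), function(i) {
  idx <- sample.int(n, size = n_draw, replace = TRUE)  # assumed: with replacement
  lm(y ~ x, data = toy[idx, , drop = FALSE])
})

# Aggregate: average the per-model predictions for every sample
pred_matrix   <- sapply(models, predict, newdata = toy)  # n rows, n_estimators columns
ensemble_pred <- rowMeans(pred_matrix)
```

Averaging across resampled fits reduces the variance of any single base model, which is the usual motivation for bagging.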

tune_base_model

Logical, whether to enable tuning for each base model.

time_unit

A character string, the unit of time in the third column of data.

years_to_evaluate

A numeric vector of specific years at which to calculate time-dependent AUROC for evaluation.
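
Because years_to_evaluate is given in years while the time column uses time_unit, the evaluation times have to be expressed on the data's time scale. The sketch below shows one way such a conversion could look. The helper years_to_time(), the unit names other than "day", and the factors 365.25 and 12 are assumptions for illustration, not the package's documented behaviour.

```r
# Hypothetical helper: convert evaluation years to the data's time unit.
years_to_time <- function(years, time_unit = c("day", "month", "year")) {
  time_unit <- match.arg(time_unit)
  per_year <- c(day = 365.25, month = 12, year = 1)  # assumed conversion factors
  years * per_year[[time_unit]]
}

# Evaluation horizons for data recorded in days
eval_times <- years_to_time(c(1, 3, 5), "day")
```

Horizons beyond the longest observed follow-up cannot be evaluated meaningfully, so it is worth comparing the converted values against the range of the time column.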

seed

An integer, the random seed used for reproducibility.

Value

A list with the elements model_object, sample_score, and evaluation_metrics.

Examples

# \donttest{
# NOTE: This example requires the 'train_pro' dataset.
if (requireNamespace("E2E", quietly = TRUE) &&
    "train_pro" %in% utils::data(package = "E2E")$results[, "Item"]) {
  data(train_pro, package = "E2E")
  initialize_modeling_system_pro()

  bagging_lasso_results <- bagging_pro(
    data = train_pro,
    base_model_name = "lasso_pro",
    n_estimators = 3, # Small number for example speed
    subset_fraction = 0.8,
    years_to_evaluate = c(1, 3)
  )
  print_model_summary_pro("Bagging (Lasso)", bagging_lasso_results)
}
#> Prognosis modeling system already initialized.
#> Running Bagging model: Bagging_pro (base: lasso_pro)
#> Warning: from glmnet C++ code (error code -30001); Numerical error at 1th lambda value; solutions for larger values of lambda returned
#> Warning: an empty model has been returned; probably a convergence issue
#> Warning: Cannot perform KM analysis due to constant, all NA, or non-varying scores.
#> 
#> --- Bagging (Lasso) Prognosis Model (on Training Data) Metrics ---
#> Ensemble Type: Bagging (Base: lasso_pro, Estimators: 3)
#> C-index: 0.7215
#> Time-dependent AUROC (years 1, 3): 0.5403, 0.6605
#> Average Time-dependent AUROC: 0.6004
#> KM Group HR (High vs Low): 3.0837 (p-value: 5.352e-08, Cutoff: 0.2388)
#> --------------------------------------------------
# }