
Implements a Bagging (Bootstrap Aggregating) ensemble for diagnostic models. It trains multiple base models on bootstrapped samples of the training data and aggregates their predictions by averaging probabilities.
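The idea can be sketched in a few lines of base R. This is a minimal illustration only, not the internals of bagging_dia(): the toy data, the use of glm() as the base learner, and all variable names are assumptions made for the example.

```r
# Minimal bagging sketch in base R (illustrative; not bagging_dia() internals).
set.seed(456)

# Toy data (assumed): binary outcome driven by two numeric features.
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(1.5 * toy$x1 - toy$x2))

n_estimators <- 5
subset_fraction <- 0.632
n_draw <- floor(subset_fraction * n)

# Fit one logistic regression per bootstrap sample (rows drawn with replacement).
models <- lapply(seq_len(n_estimators), function(i) {
  idx <- sample.int(n, size = n_draw, replace = TRUE)
  glm(y ~ x1 + x2, data = toy[idx, ], family = binomial())
})

# Each column holds one model's predicted probabilities for all rows.
prob_matrix <- sapply(models, function(m) {
  predict(m, newdata = toy, type = "response")
})

# Ensemble score: average the per-model probabilities row by row.
ensemble_prob <- rowMeans(prob_matrix)
head(round(ensemble_prob, 3))
```

Averaging probabilities rather than hard votes keeps a continuous score per sample, which is what a threshold can later be applied to.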

Usage

bagging_dia(
  data,
  base_model_name,
  n_estimators = 50,
  subset_fraction = 0.632,
  tune_base_model = FALSE,
  threshold_strategy = "default",
  specific_threshold_value = 0.5,
  positive_label_value = 1,
  negative_label_value = 0,
  new_positive_label = "Positive",
  new_negative_label = "Negative",
  seed = 456
)

Arguments

data

A data frame where the first column is the sample ID, the second is the outcome label, and subsequent columns are features.

base_model_name

A character string, the name of the base diagnostic model to use (e.g., "rf", "lasso"). This model must be registered.

n_estimators

An integer, the number of base models to train.

subset_fraction

A numeric value between 0 and 1, the fraction of the training samples drawn for each base model's bootstrap sample.

tune_base_model

Logical, whether to enable tuning for each base model.

threshold_strategy

A character string (e.g., "f1", "youden", "default") or a numeric value between 0 and 1 that determines the classification threshold used to evaluate the ensemble.

specific_threshold_value

A numeric value between 0 and 1. Only used if threshold_strategy is "numeric".

positive_label_value

A numeric or character value in the raw data representing the positive class.

negative_label_value

A numeric or character value in the raw data representing the negative class.

new_positive_label

A character string, the desired factor level name for the positive class (e.g., "Positive").

new_negative_label

A character string, the desired factor level name for the negative class (e.g., "Negative").

seed

An integer, the random seed used for reproducibility.
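To make the threshold and label arguments more concrete, here is a small sketch of how a "youden"-style threshold could be chosen from ensemble scores and then mapped to new class labels. It is an assumption-laden illustration: youden_threshold() is a hypothetical helper, the scores and labels are made up, and the package may compute its threshold differently.

```r
# Illustrative threshold selection by Youden's J (not the package's own code).
# youden_threshold() is a hypothetical helper written for this example.
youden_threshold <- function(scores, labels) {
  # labels: 1 = positive, 0 = negative; scores: predicted probabilities.
  candidates <- sort(unique(scores))
  j_values <- vapply(candidates, function(t) {
    predicted_pos <- scores >= t
    sensitivity <- sum(predicted_pos & labels == 1) / sum(labels == 1)
    specificity <- sum(!predicted_pos & labels == 0) / sum(labels == 0)
    sensitivity + specificity - 1
  }, numeric(1))
  # Return the candidate cut-off with the largest J statistic.
  candidates[which.max(j_values)]
}

# Made-up scores and raw 0/1 labels for nine samples.
scores <- c(0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95)
labels <- c(0,    0,    0,    1,    0,    1,    1,    1,    1)

best <- youden_threshold(scores, labels)
best  # 0.7 for this toy input

# Map thresholded scores to the desired factor level names.
predicted <- factor(
  ifelse(scores >= best, "Case", "Control"),
  levels = c("Control", "Case")
)
table(predicted)
```

In this toy input the cut-off 0.7 gives sensitivity 0.8 and specificity 1, so J = 0.8, which beats every other candidate.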

Value

A list containing the model_object, sample_score, and evaluation_metrics.

Examples

# \donttest{
# This example assumes your package includes a dataset named 'train_dia'.
# If not, create a toy data frame first.
if (exists("train_dia")) {
  initialize_modeling_system_dia()

  bagging_rf_results <- bagging_dia(
    data = train_dia,
    base_model_name = "rf",
    n_estimators = 5, # Reduced for a quick example
    threshold_strategy = "youden",
    positive_label_value = 1,
    negative_label_value = 0,
    new_positive_label = "Case",
    new_negative_label = "Control"
  )
  print_model_summary_dia("Bagging (RF)", bagging_rf_results)
}
#> Diagnostic modeling system already initialized.
#> Running Bagging model: Bagging_dia (base: rf)
#> 
#> --- Bagging (RF) Model (on Training Data) Metrics ---
#> Ensemble Type: Bagging (Base: rf, Estimators: 5)
#> 
#> AUROC: 0.9999 (95% CI: 0.9997 - 1.0000)
#> AUPRC: 1.0000
#> Accuracy: 0.9942
#> F1: 0.9968
#> Precision: 1.0000
#> Recall: 0.9936
#> Specificity: 1.0000
#> --------------------------------------------------
# }
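If 'train_dia' is not available, a toy data frame in the layout described under the data argument (sample ID first, outcome label second, features after) could be built as follows. The object name toy_dia and all column names are hypothetical, and the commented-out call only reuses arguments listed under Usage.

```r
# Toy data in the expected layout: ID, outcome, then feature columns.
# All names below are hypothetical and chosen for illustration.
set.seed(456)
n <- 100
toy_dia <- data.frame(
  sample_id = sprintf("S%03d", seq_len(n)),  # column 1: sample ID
  outcome   = rbinom(n, size = 1, prob = 0.5),  # column 2: 0/1 label
  feature_a = rnorm(n),
  feature_b = rnorm(n),
  feature_c = runif(n)
)

# Shift one feature by the outcome so the classes are partly separable.
toy_dia$feature_a <- toy_dia$feature_a + toy_dia$outcome

str(toy_dia)

# With the package loaded and the modeling system initialized, the toy data
# could then be passed in place of train_dia (not run here):
# bagging_dia(data = toy_dia, base_model_name = "rf", n_estimators = 5)
```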