Applies a previously trained model (or ensemble) to a new, unseen dataset to generate predicted probabilities.

Usage

apply_dia(
  trained_model_object,
  new_data,
  label_col_name = NULL,
  pos_class,
  neg_class
)

Arguments

trained_model_object

A trained model object, as returned by models_dia, bagging_dia, stacking_dia, voting_dia, or imbalance_dia.

new_data

A data frame containing the new data for prediction. The first column must be the sample ID; all subsequent columns are treated as features.
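
As a rough illustration of the layout described above, the sketch below builds a small data frame and runs a few basic checks before prediction. The column names (sample, gene_a, gene_b) and the checks themselves are assumptions made for this example; they are not necessarily what apply_dia verifies internally.

```r
# Hypothetical new data: first column is the sample ID, the rest are features.
new_data <- data.frame(
  sample = c("S1", "S2", "S3"),
  gene_a = c(1.2, 0.4, 2.1),
  gene_b = c(0.3, 1.8, 0.9),
  stringsAsFactors = FALSE
)

# Basic layout checks (assumed for illustration, not part of apply_dia).
stopifnot(is.data.frame(new_data), ncol(new_data) >= 2)

# anyDuplicated() returns 0 when every ID is distinct.
id_is_unique <- !anyDuplicated(new_data[[1]])

# Everything after the first column is treated as a feature.
feature_names <- names(new_data)[-1]
```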

label_col_name

A character string, the name of the column containing the class labels in the new data. This is optional and only used to include true labels in the output; it is not used for prediction.

pos_class

A character string, the label for the positive class (must match the label used during training).

neg_class

A character string, the label for the negative class (must match the label used during training).

Value

A data frame with three columns: sample (the sample ID), label (the original numeric label from the new data, or NA if not provided), and score (the predicted probability of the positive class).
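
Because score is a probability rather than a class call, a common follow-up step is to apply a cutoff. The sketch below uses a hand-built data frame shaped like the return value; the sample IDs, scores, the 0.5 threshold, and the predicted_class column are all assumptions for illustration and are not produced by apply_dia.

```r
# Hypothetical predictions shaped like the data frame described above.
predictions <- data.frame(
  sample = c("S1", "S2", "S3"),
  label  = c(NA, NA, NA),
  score  = c(0.98, 0.70, 0.12),
  stringsAsFactors = FALSE
)

# Assumed cutoff; pick one suited to your application.
threshold <- 0.5

# Assign the positive class label when the score reaches the cutoff.
predictions$predicted_class <- ifelse(
  predictions$score >= threshold, "Case", "Control"
)

print(predictions)
```

The class labels used here should match the pos_class and neg_class values supplied when applying the model.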

Examples

# \donttest{
# 1. Assume 'train_dia' and 'test_dia' are loaded from your package
# data(train_dia)
# data(test_dia) # test_dia has the same structure, possibly without the label column
initialize_modeling_system_dia()
#> Diagnostic modeling system initialized and default models registered.

# 2. Train a model
train_results <- models_dia(
  data = train_dia, model = "lasso",
  new_positive_label = "Case", new_negative_label = "Control"
)
#> Running model: lasso
#> Loading required package: ggplot2
#> Loading required package: lattice
trained_lasso_model <- train_results$lasso$model_object

# 3. Apply the trained model to new data
new_predictions <- apply_dia(
  trained_model_object = trained_lasso_model,
  new_data = test_dia,
  label_col_name = "Disease_Status", # Optional; if the column is absent, 'label' is NA
  pos_class = "Case",
  neg_class = "Control"
)
#> Applying model to new data...
#> Warning: Label column 'Disease_Status' not found in new data. 'label' column in results will be NA.
utils::head(new_predictions)
#>                         sample label     score
#> 1 TCGA-A2-A25D-01A-12R-A16F-07    NA 0.9756574
#> 2 TCGA-AC-A23C-01A-12R-A169-07    NA 0.9967656
#> 3 TCGA-AR-A5QP-01A-11R-A28M-07    NA 0.9645596
#> 4 TCGA-AC-A8OQ-01A-11R-A41B-07    NA 0.9625082
#> 5 TCGA-AC-A2FM-11B-32R-A19W-07    NA 0.7016649
#> 6 TCGA-C8-A1HE-01A-11R-A13Q-07    NA 0.9876338
# }