E2E Package Parameter Reference Guide
This guide provides comprehensive parameter documentation for all E2E functions.
Built-in Datasets
E2E includes example datasets for both diagnostic and prognostic modeling.
Built-in Models
Diagnostic Models (12 algorithms)
- rf: Random Forest
- xb: XGBoost
- svm: Support Vector Machine
- mlp: Multi-Layer Perceptron
- lasso: L1-regularized Logistic Regression
- en: Elastic Net
- ridge: L2-regularized Logistic Regression
- lda: Linear Discriminant Analysis
- qda: Quadratic Discriminant Analysis
- nb: Naive Bayes
- dt: Decision Tree
- gbm: Gradient Boosting Machine
Diagnostic Modeling Functions
models_dia()
Trains base classification models for diagnostic tasks. Parameters:
- data (required): Data frame with sample names (column 1), outcomes 0/1 (column 2), features (columns 3+)
- model (required): Character vector of model names, or “all_dia” for all models
- tune: Logical (default FALSE). Whether to perform hyperparameter tuning
- threshold_choices: Threshold selection method
  - “default” (default): Fixed 0.5 threshold
  - “f1”: Optimize F1 score
  - “youden”: Optimize Youden index
  - Numeric value (0-1): Custom threshold
- seed: Integer (default 123). Random seed for reproducibility
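To make the threshold_choices options concrete, here is a minimal sketch of how a classification threshold could be chosen from predicted probabilities. It is written in Python as a language-neutral illustration and does not call E2E. The helper names (confusion_counts, choose_threshold) are hypothetical, and searching over the observed probabilities as candidate thresholds is an assumption, not E2E's documented behavior.

```python
def confusion_counts(y_true, probs, threshold):
    """Return (tp, fp, tn, fn), predicting 1 when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, tn, fn):
    """F1 = 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def youden_index(tp, fp, tn, fn):
    """Youden's J = sensitivity + specificity - 1."""
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens + spec - 1.0


def choose_threshold(y_true, probs, choice="default"):
    """Hypothetical helper mirroring the four documented options."""
    # A number in [0, 1] is used directly as a custom threshold.
    if isinstance(choice, (int, float)) and not isinstance(choice, bool):
        if not 0 <= choice <= 1:
            raise ValueError("numeric threshold must lie in [0, 1]")
        return float(choice)
    if choice == "default":
        return 0.5
    scorers = {"f1": f1_score, "youden": youden_index}
    if choice not in scorers:
        raise ValueError(f"unknown threshold choice: {choice!r}")
    score = scorers[choice]
    # Assumption: candidate thresholds are the distinct observed probabilities.
    candidates = sorted(set(probs))
    return max(candidates,
               key=lambda t: score(*confusion_counts(y_true, probs, t)))


# Toy data: 4 positives, 6 negatives.
y_true = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
probs = [0.05, 0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]
youden_cut = choose_threshold(y_true, probs, "youden")
```

With ties, this sketch keeps the lowest tied threshold; a real implementation may break ties differently.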
bagging_dia()
Bootstrap aggregating ensemble method. Parameters:
- data (required): Training data frame
- base_model_name (required): Base model name (e.g., “xb”, “rf”)
- n_estimators: Integer (default 50). Number of base models
- subset_fraction: Numeric (default 0.632). Bootstrap sampling fraction
- tune_base_model: Logical (default FALSE). Tune base models
- threshold_choices: Same as models_dia()
- seed: Integer (default 123). Random seed
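The roles of n_estimators, subset_fraction and seed can be illustrated with a small bagging sketch. This is Python for illustration only, not E2E code. It assumes each subset is drawn with replacement and has round(subset_fraction * n) rows, and it uses a toy one-feature learner (fit_stub, a hypothetical stand-in) in place of a real base model such as “rf” or “xb”.

```python
import math
import random


def fit_stub(rows):
    """Toy base learner: sigmoid centred between the two class means.

    rows is a list of (feature_value, label) pairs. This stand-in only
    exists so the bagging loop has something to fit.
    """
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:
        # Degenerate subset with a single class: predict that class.
        constant = 1.0 if pos else 0.0
        return lambda x: constant
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    midpoint = (mean_pos + mean_neg) / 2
    sign = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda x: 1.0 / (1.0 + math.exp(-sign * (x - midpoint)))


def bagging_predict(rows, new_x, n_estimators=50, subset_fraction=0.632,
                    seed=123):
    """Fit n_estimators models on bootstrap subsets and average their output."""
    rng = random.Random(seed)  # a fixed seed makes the subsets reproducible
    # Assumption: subset size is a fraction of the data, drawn with replacement.
    size = max(1, round(subset_fraction * len(rows)))
    models = []
    for _ in range(n_estimators):
        subset = [rng.choice(rows) for _ in range(size)]
        models.append(fit_stub(subset))
    return [sum(m(x) for m in models) / len(models) for x in new_x]


rows = [(0.1, 0), (0.4, 0), (0.5, 0), (0.9, 0),
        (1.2, 1), (1.5, 1), (1.9, 1), (2.3, 1)]
averaged = bagging_predict(rows, [0.0, 3.0])
```

Averaging probabilities over many resampled fits is what reduces variance relative to a single model; the final class label would then come from the chosen threshold.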
voting_dia()
Voting ensemble combining multiple models. Parameters:
- results_all_models (required): Output from models_dia()
- data (required): Training data
- type: Voting type
  - “soft” (default): Weighted probability averaging
  - “hard”: Majority voting
- weight_metric: String (default “AUROC”). Metric for soft voting weights
- top: Integer (default 5). Number of top models to use
- threshold_choices: Same as models_dia()
- seed: Integer (default 123). Random seed
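The difference between the two voting types can be sketched as follows. This is an illustrative Python example, not E2E's implementation. It assumes that soft-voting weights are proportional to each model's metric value and that a hard vote needs a strict majority; the function names are hypothetical.

```python
def top_models(scores, top=5):
    """Names of the `top` models ranked by their metric (e.g. AUROC)."""
    return sorted(scores, key=scores.get, reverse=True)[:top]


def soft_vote(probs_by_model, scores, top=5):
    """Metric-weighted average of predicted probabilities."""
    chosen = top_models(scores, top)
    total = sum(scores[m] for m in chosen)
    n = len(probs_by_model[chosen[0]])
    return [
        sum(scores[m] * probs_by_model[m][i] for m in chosen) / total
        for i in range(n)
    ]


def hard_vote(probs_by_model, scores, top=5, threshold=0.5):
    """Majority vote over each model's thresholded prediction."""
    chosen = top_models(scores, top)
    n = len(probs_by_model[chosen[0]])
    labels = []
    for i in range(n):
        votes = sum(1 for m in chosen if probs_by_model[m][i] >= threshold)
        # Assumption: a strict majority is needed to predict class 1.
        labels.append(1 if 2 * votes > len(chosen) else 0)
    return labels


scores = {"rf": 0.9, "xb": 0.8, "nb": 0.6}
probs_by_model = {
    "rf": [0.8, 0.2],
    "xb": [0.6, 0.4],
    "nb": [0.1, 0.9],
}
soft = soft_vote(probs_by_model, scores, top=2)  # uses rf and xb only
hard = hard_vote(probs_by_model, scores, top=3)  # uses all three models
```

Soft voting keeps the probability scale, so a threshold rule can still be applied afterwards; hard voting discards it and returns labels directly.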
stacking_dia()
Stacking ensemble with meta-model. Parameters:
- results_all_models (required): Output from models_dia()
- data (required): Training data
- meta_model_name (required): Meta-model name (e.g., “lasso”, “gbm”)
- top: Integer (default 5). Number of top base models
- tune_meta: Logical (default FALSE). Tune meta-model
- threshold_choices: Same as models_dia()
- seed: Integer (default 123). Random seed
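As a rough picture of what stacking does, the sketch below selects the top base models, uses their predicted probabilities as meta-features, and fits a small meta-model on them. It is illustrative Python, not E2E code. The plain (unregularized) logistic regression trained by gradient descent is a stand-in for a real meta-model such as “lasso”, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def meta_features(probs_by_model, scores, top=5):
    """Pick the top models by score; one meta-feature column per model."""
    chosen = sorted(scores, key=scores.get, reverse=True)[:top]
    n = len(probs_by_model[chosen[0]])
    matrix = [[probs_by_model[m][i] for m in chosen] for i in range(n)]
    return chosen, matrix


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Stand-in meta-model: logistic regression via batch gradient descent."""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / len(X)
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
    return w, b


def predict(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# Toy base-model outputs for four samples (labels 1, 1, 0, 0).
probs_by_model = {
    "rf": [0.9, 0.8, 0.2, 0.1],
    "xb": [0.7, 0.6, 0.4, 0.3],
    "nb": [0.5, 0.5, 0.5, 0.5],
}
scores = {"rf": 0.90, "xb": 0.85, "nb": 0.50}
y = [1, 1, 0, 0]

chosen, X_meta = meta_features(probs_by_model, scores, top=2)
w, b = fit_logistic(X_meta, y)
stacked = predict(X_meta, w, b)
```

Stacking implementations commonly build the meta-features from out-of-fold predictions to limit leakage; this sketch skips that step for brevity, and whether E2E does so is not stated in this guide.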
imbalance_dia()
Handles imbalanced datasets using an EasyEnsemble-like algorithm. Parameters:
- data (required): Imbalanced training data
- base_model_name (required): Base model for balanced subsets
- n_estimators: Integer (default 10). Number of balanced subsets
- tune_base_model: Logical (default FALSE). Tune base models
- threshold_choices: Same as models_dia()
- seed: Integer (default 123). Random seed
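The idea behind an EasyEnsemble-style approach is to train one base model per balanced subset, where each subset keeps all minority-class samples and a random draw of the majority class. The sketch below shows only the subset construction. It is illustrative Python, not E2E's implementation; balanced_subsets is a hypothetical helper, and sampling the majority class without replacement down to the minority size is an assumption.

```python
import random


def balanced_subsets(labels, n_estimators=10, seed=123):
    """Return n_estimators lists of row indices with equal class counts."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_estimators):
        # Assumption: undersample the majority class without replacement.
        sampled = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + sampled))
    return subsets


# 3 positives versus 12 negatives.
labels = [1] * 3 + [0] * 12
subsets = balanced_subsets(labels, n_estimators=10, seed=123)
```

Each index list would then be used to fit one base model, and the resulting predictions combined, for example by averaging probabilities.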
Prognostic Modeling Functions
models_pro()
Trains base survival models. Parameters:
- data (required): Data frame with sample ID, survival status, time, features
- model (required): Model names or “all_pro” for all models
- tune: Logical (default FALSE). Hyperparameter tuning
- time_unit: String (default “day”). Time unit (“day”, “month”, “year”)
- years_to_evaluate: Numeric vector (default c(1,3,5)). Time points for time-dependent AUROC
- seed: Integer (default 789). Random seed
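Since years_to_evaluate is given in years while the time column may be in days or months, the two presumably have to be put on a common scale before time-dependent AUROC is computed. The Python sketch below shows one way that conversion could look. It is an illustration only: the conversion factors (365.25 days and 12 months per year) and the helper name evaluation_times are assumptions, not values taken from E2E.

```python
# Assumed conversion factors; E2E's actual constants are not stated here.
UNITS_PER_YEAR = {"day": 365.25, "month": 12.0, "year": 1.0}


def evaluation_times(years_to_evaluate=(1, 3, 5), time_unit="day"):
    """Express evaluation horizons (in years) in the data's time unit."""
    if time_unit not in UNITS_PER_YEAR:
        raise ValueError(f"time_unit must be one of {sorted(UNITS_PER_YEAR)}")
    factor = UNITS_PER_YEAR[time_unit]
    return [years * factor for years in years_to_evaluate]


horizons_in_months = evaluation_times((1, 3, 5), time_unit="month")
```

Under this reading, passing the wrong time_unit would shift every evaluation horizon, so it is worth checking that it matches the time column of the data.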
stacking_pro()
Stacking ensemble for survival analysis. Parameters:
- results_all_models (required): Output from models_pro()
- data (required): Training data
- meta_model_name (required): Meta-model name
- top: Integer (default 3). Number of top base models
- tune_meta: Logical (default FALSE). Tune meta-model
- time_unit: String (default “day”). Time unit
- years_to_evaluate: Numeric vector (default c(1,3,5)). Evaluation time points
- seed: Integer (default 789). Random seed
bagging_pro()
Bootstrap aggregating for survival analysis. Parameters:
- data (required): Training data
- base_model_name (required): Base model name
- n_estimators: Integer (default 10). Number of base models
- subset_fraction: Numeric (default 0.632). Bootstrap sampling fraction
- tune_base_model: Logical (default FALSE). Tune base models
- time_unit: String (default “day”). Time unit
- years_to_evaluate: Numeric vector (default c(1,3,5)). Evaluation time points
- seed: Integer (default 456). Random seed
Visualization Functions
figure_dia()
Creates diagnostic model evaluation plots. Parameters:
- type (required): Plot type
  - “roc”: ROC curve
  - “prc”: Precision-recall curve
  - “matrix”: Confusion matrix
- data (required): Model results object
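For readers who want to see what the “roc” plot type represents, the following Python sketch computes the points of an ROC curve and its area from labels and predicted probabilities. It is a generic illustration and does not reproduce figure_dia(); the function names are hypothetical, and it assumes both classes are present in the labels.

```python
def roc_points(y_true, probs):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for y, p in zip(y_true, probs) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, probs) if p >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


y_true = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, probs)
area = auc(curve)
```

A precision-recall curve is built the same way, except that each threshold yields a (recall, precision) pair instead of (FPR, TPR).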