E2E Package Parameter Reference Guide
This guide provides comprehensive parameter documentation for all E2E functions.
Built-in Datasets
E2E includes example datasets for both diagnostic and prognostic modeling.
Built-in Models
Diagnostic Models (12 algorithms)
- rf: Random Forest
- xb: XGBoost
- svm: Support Vector Machine
- mlp: Multi-Layer Perceptron
- lasso: L1-regularized Logistic Regression
- en: Elastic Net
- ridge: L2-regularized Logistic Regression
- lda: Linear Discriminant Analysis
- qda: Quadratic Discriminant Analysis
- nb: Naive Bayes
- dt: Decision Tree
- gbm: Gradient Boosting Machine
Prognostic Models (9 algorithms)
- lasso_pro: Lasso Cox Regression
- en_pro: Elastic Net Cox Regression
- ridge_pro: Ridge Cox Regression
- stepcox_pro: Stepwise Cox Regression
- gbm_pro: Gradient Boosting Machine for Cox
- rsf_pro: Random Survival Forest
- xgb_pro: XGBoost for prognosis
- pc_pro: Supervised Principal Components (SuperPC)
- pls_pro: Partial Least Squares Cox Regression
Integrated Pipeline Functions
int_dia()
Comprehensive diagnostic modeling pipeline executing single models, bagging, stacking, and voting ensembles.
Parameters:
- ... (required): Data frames for analysis. First = training dataset; subsequent = test datasets. Structure: column 1 = sample ID, column 2 = outcome (0/1), columns 3+ = features
- tune: Logical (default TRUE). Enable hyperparameter tuning for base models
- n_estimators: Integer (default 10). Number of bootstrap samples for bagging
- seed: Integer (default 123). Random seed for reproducibility
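The column layout described above can be sketched in base R. This is only an illustration: make_dia_data() and check_dia_layout() are hypothetical helpers, not part of E2E, and the toy features are random noise.

```r
set.seed(123)

# Hypothetical helper: toy data frame in the documented layout
# (column 1 = sample ID, column 2 = 0/1 outcome, columns 3+ = features).
make_dia_data <- function(n, prefix) {
  data.frame(
    ID = paste0(prefix, seq_len(n)),
    outcome = rbinom(n, size = 1, prob = 0.5),
    feat1 = rnorm(n),
    feat2 = rnorm(n),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: TRUE when a data frame follows that layout.
check_dia_layout <- function(df) {
  is.data.frame(df) &&
    ncol(df) >= 3 &&
    all(df[[2]] %in% c(0, 1))
}

train_df <- make_dia_data(60, "train_")
test_df  <- make_dia_data(30, "test_")
stopifnot(check_dia_layout(train_df), check_dia_layout(test_df))

# With E2E attached, the first data frame is the training set and the
# remaining ones are test sets, for example:
# results <- int_dia(train_df, test_df, tune = FALSE, n_estimators = 10, seed = 123)
```

Checking the layout before calling the pipeline is a cheap way to catch a misplaced outcome column early.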
int_pro()
Comprehensive prognostic modeling pipeline for survival analysis.
Parameters:
- ... (required): Data frames. First = training; others = test sets. Structure: column 1 = ID, column 2 = survival status (0/1), column 3 = survival time, columns 4+ = features
- tune: Logical (default TRUE). Enable tuning
- n_estimators: Integer (default 10). Bagging iterations
- seed: Integer (default 123). Random seed
- time_unit: String (default “day”). Time unit: “day”, “month”, or “year”
- years_to_evaluate: Numeric vector (default c(1,3,5)). Years for time-dependent AUROC
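A minimal sketch of the survival layout and of how time_unit relates to years_to_evaluate, in base R. The helper years_to_units() is hypothetical, and its conversion factors (365.25 days or 12 months per year) are assumptions; the source does not state which factors E2E uses.

```r
set.seed(123)
n <- 50

# Toy survival data in the documented layout:
# column 1 = ID, column 2 = status (0/1), column 3 = time, columns 4+ = features.
surv_df <- data.frame(
  ID = paste0("S", seq_len(n)),
  status = rbinom(n, size = 1, prob = 0.6),
  time = round(runif(n, min = 30, max = 2500)),  # follow-up recorded in days
  feat1 = rnorm(n),
  feat2 = rnorm(n)
)

# Hypothetical helper: express evaluation years in the data's time unit.
# The factors below are assumptions made for this sketch.
years_to_units <- function(years, time_unit = c("day", "month", "year")) {
  time_unit <- match.arg(time_unit)
  per_year <- switch(time_unit, day = 365.25, month = 12, year = 1)
  years * per_year
}

eval_days   <- years_to_units(c(1, 3, 5), "day")
eval_months <- years_to_units(c(1, 3, 5), "month")

# With E2E attached, a matching call might look like:
# results <- int_pro(surv_df, tune = FALSE, time_unit = "day", years_to_evaluate = c(1, 3, 5))
```

The point of the sketch is that time_unit should describe the unit of column 3, so that the evaluation years land on the same scale as the recorded survival times.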
plot_integrated_results()
Visualizes integrated modeling results with heatmap and summary plots.
Parameters:
- results_obj (required): Output from int_dia(), int_imbalance(), or int_pro()
- metric_name: Character string (default “AUROC”). Metric name for plot labels (e.g., “AUROC”, “C-index”)
Returns: ggplot object (invisibly)
Diagnostic Modeling Functions
models_dia()
Trains base classification models for diagnostic tasks.
Parameters:
- data (required): Data frame with sample names (column 1), outcomes 0/1 (column 2), features (columns 3+)
- model (required): Character vector of model names or “all_dia” for all models
- tune: Logical (default FALSE). Whether to perform hyperparameter tuning
- threshold_choices: Threshold selection method
  - “default” (default): Fixed 0.5 threshold
  - “f1”: Optimize F1 score
  - “youden”: Optimize Youden index
  - Numeric value (0-1): Custom threshold
- seed: Integer (default 123). Random seed for reproducibility
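The threshold_choices options can be illustrated in base R. This is a sketch under stated assumptions, not E2E's implementation: pick_threshold() is a hypothetical helper, and using the observed probabilities as candidate cut-offs is an assumption.

```r
# Hypothetical helper: choose a classification threshold.
pick_threshold <- function(prob, truth, method = c("default", "f1", "youden")) {
  if (is.numeric(method)) return(method)  # custom threshold is used as given
  method <- match.arg(method)
  if (method == "default") return(0.5)    # fixed 0.5 threshold

  candidates <- sort(unique(prob))        # assumption: scan observed probabilities
  score <- vapply(candidates, function(th) {
    pred <- as.integer(prob >= th)
    tp <- sum(pred == 1 & truth == 1)
    fp <- sum(pred == 1 & truth == 0)
    fn <- sum(pred == 0 & truth == 1)
    tn <- sum(pred == 0 & truth == 0)
    if (method == "f1") {
      if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
    } else {
      tp / (tp + fn) + tn / (tn + fp) - 1  # Youden: sensitivity + specificity - 1
    }
  }, numeric(1))
  candidates[which.max(score)]
}

prob  <- c(0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90)
truth <- c(0,    0,    0,    1,    1,    0,    1,    1)

th_youden <- pick_threshold(prob, truth, "youden")
th_f1     <- pick_threshold(prob, truth, "f1")
```

On this toy input both criteria select the same cut-off; on real data they often differ, because F1 ignores true negatives while the Youden index weighs sensitivity and specificity equally.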
bagging_dia()
Bootstrap aggregating ensemble method.
Parameters:
- data (required): Training data frame
- base_model_name (required): Base model name (e.g., “xb”, “rf”)
- n_estimators: Integer (default 50). Number of base models
- subset_fraction: Numeric (default 0.632). Bootstrap sampling fraction
- tune_base_model: Logical (default FALSE). Tune base models
- threshold_choices: Same as models_dia()
- seed: Integer (default 123). Random seed
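How n_estimators and subset_fraction interact can be sketched in base R. The example below is an assumption-laden illustration: bag_predict() is hypothetical, logistic regression stands in for the base model, and drawing the subset with replacement is an assumption about what "bootstrap sampling fraction" means.

```r
set.seed(123)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(1.5 * toy$x1 - toy$x2))

# Hypothetical sketch: bagging with a logistic-regression stand-in base model.
bag_predict <- function(train, newdata, n_estimators = 50, subset_fraction = 0.632) {
  m <- floor(subset_fraction * nrow(train))  # rows drawn for each base model
  probs <- vapply(seq_len(n_estimators), function(i) {
    idx <- sample(nrow(train), size = m, replace = TRUE)  # assumption: with replacement
    fit <- glm(y ~ x1 + x2, data = train[idx, ], family = binomial())
    predict(fit, newdata = newdata, type = "response")
  }, numeric(nrow(newdata)))
  rowMeans(probs)  # average predicted probabilities across base models
}

bagged_prob <- bag_predict(toy, toy, n_estimators = 10)
```

Each base model sees a different resample, so averaging their probabilities tends to reduce variance relative to a single fit.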
voting_dia()
Voting ensemble combining multiple models.
Parameters:
- results_all_models (required): Output from models_dia()
- data (required): Training data
- type: Voting type
  - “soft” (default): Weighted probability averaging
  - “hard”: Majority voting
- weight_metric: String (default “AUROC”). Metric for soft voting weights
- top: Integer (default 5). Number of top models to use
- threshold_choices: Same as models_dia()
- seed: Integer (default 123). Random seed
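The difference between the two voting types can be shown with a small base R sketch. The probabilities and AUROC values are made up for illustration, and normalizing the weights to sum to 1 is an assumption rather than documented E2E behavior.

```r
# Predicted probabilities from three hypothetical models (columns)
# for four samples (rows).
probs <- cbind(
  rf    = c(0.90, 0.40, 0.20, 0.95),
  lasso = c(0.80, 0.60, 0.30, 0.45),
  svm   = c(0.70, 0.30, 0.60, 0.40)
)
auroc <- c(rf = 0.90, lasso = 0.80, svm = 0.70)  # made-up weights

# Soft voting: average the probabilities, weighted by each model's metric.
weights   <- auroc / sum(auroc)                  # assumption: weights sum to 1
soft_prob <- as.vector(probs %*% weights)
soft_vote <- as.integer(soft_prob >= 0.5)

# Hard voting: each model casts a 0/1 vote; the majority wins.
hard_vote <- as.integer(rowSums(probs >= 0.5) > ncol(probs) / 2)
```

The fourth sample shows why the choice matters: one confident model pulls the soft average above 0.5, while the hard vote follows the two models that fall just below the threshold.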
stacking_dia()
Stacking ensemble with meta-model.
Parameters:
- results_all_models (required): Output from models_dia()
- data (required): Training data
- meta_model_name (required): Meta-model name (e.g., “lasso”, “gbm”)
- top: Integer (default 5). Number of top base models
- tune_meta: Logical (default FALSE). Tune meta-model
- threshold_choices: Same as models_dia()
- seed: Integer (default 123). Random seed
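The idea behind stacking, where base-model predictions become inputs to a meta-model, can be sketched in base R. Logistic regression stands in for both the base models and the meta-model here; this is an assumption for illustration and says nothing about how E2E builds its meta-features.

```r
set.seed(123)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(toy$x1 + toy$x2))

# Two stand-in base models, each using a single feature.
base1 <- glm(y ~ x1, data = toy, family = binomial())
base2 <- glm(y ~ x2, data = toy, family = binomial())

# Base-model predicted probabilities become the meta-model's features.
meta_features <- data.frame(
  p1 = predict(base1, type = "response"),
  p2 = predict(base2, type = "response"),
  y  = toy$y
)

# Stand-in meta-model that learns how to combine the base predictions.
meta_model   <- glm(y ~ p1 + p2, data = meta_features, family = binomial())
stacked_prob <- predict(meta_model, type = "response")
```

For brevity the sketch reuses in-sample predictions as meta-features; out-of-fold predictions are the usual way to avoid leaking the outcome into the meta-model.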
imbalance_dia()
Handles imbalanced datasets using an EasyEnsemble-like algorithm.
Parameters:
- data (required): Imbalanced training data
- base_model_name (required): Base model for balanced subsets
- n_estimators: Integer (default 10). Number of balanced subsets
- tune_base_model: Logical (default FALSE). Tune base models
- threshold_choices: Same as models_dia()
- seed: Integer (default 123). Random seed
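What "balanced subsets" means in an EasyEnsemble-style scheme can be sketched in base R. make_balanced_subsets() is a hypothetical helper; keeping every minority sample and undersampling the majority without replacement follows the general EasyEnsemble idea and is an assumption about E2E's exact procedure.

```r
set.seed(123)

# Toy imbalanced outcome: 20 positives, 180 negatives.
y <- c(rep(1, 20), rep(0, 180))

# Hypothetical helper: row indices for n_estimators balanced subsets.
make_balanced_subsets <- function(y, n_estimators = 10) {
  minority_label <- as.integer(names(which.min(table(y))))
  minority <- which(y == minority_label)
  majority <- which(y != minority_label)
  lapply(seq_len(n_estimators), function(i) {
    # Keep all minority rows and draw an equal number of majority rows.
    c(minority, sample(majority, size = length(minority)))
  })
}

subsets <- make_balanced_subsets(y, n_estimators = 10)

# One base model would then be trained per subset and the predictions combined.
subset_sizes <- vapply(subsets, length, integer(1))
positive_share <- vapply(subsets, function(idx) mean(y[idx]), numeric(1))
```

Each subset is balanced on its own, and because the majority draws differ across subsets, the ensemble as a whole still uses much of the majority class.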
Prognostic Modeling Functions
models_pro()
Trains base survival models.
Parameters:
- data (required): Data frame with sample ID, survival status, time, features
- model (required): Model names or “all_pro” for all models
- tune: Logical (default FALSE). Hyperparameter tuning
- time_unit: String (default “day”). Time unit (“day”, “month”, “year”)
- years_to_evaluate: Numeric vector (default c(1,3,5)). Time points for time-dependent AUROC
- seed: Integer (default 789). Random seed
stacking_pro()
Stacking ensemble for survival analysis.
Parameters:
- results_all_models (required): Output from models_pro()
- data (required): Training data
- meta_model_name (required): Meta-model name
- top: Integer (default 3). Number of top base models
- tune_meta: Logical (default FALSE). Tune meta-model
- time_unit: String (default “day”). Time unit
- years_to_evaluate: Numeric vector (default c(1,3,5)). Evaluation time points
- seed: Integer (default 789). Random seed
bagging_pro()
Bootstrap aggregating for survival analysis.
Parameters:
- data (required): Training data
- base_model_name (required): Base model name
- n_estimators: Integer (default 10). Number of base models
- subset_fraction: Numeric (default 0.632). Bootstrap sampling fraction
- tune_base_model: Logical (default FALSE). Tune base models
- time_unit: String (default “day”). Time unit
- years_to_evaluate: Numeric vector (default c(1,3,5)). Evaluation time points
- seed: Integer (default 456). Random seed
Visualization Functions
figure_dia()
Creates diagnostic model evaluation plots.
Parameters:
- type (required): Plot type
  - “roc”: ROC curve
  - “prc”: Precision-recall curve
  - “matrix”: Confusion matrix
- data (required): Model results object
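The quantities behind the three plot types can be computed by hand in base R. This sketch uses made-up predictions and does not reflect the structure of the E2E results object or how figure_dia() draws its plots.

```r
prob  <- c(0.90, 0.80, 0.70, 0.60, 0.40, 0.35, 0.20, 0.10)
truth <- c(1,    1,    0,    1,    1,    0,    0,    0)

# "matrix": confusion matrix at a 0.5 threshold.
pred <- as.integer(prob >= 0.5)
conf_mat <- table(
  Predicted = factor(pred, levels = c(0, 1)),
  Actual    = factor(truth, levels = c(0, 1))
)

# "roc": true and false positive rates at every observed cut-off.
cutoffs <- sort(unique(prob), decreasing = TRUE)
tpr <- vapply(cutoffs, function(th) mean(prob[truth == 1] >= th), numeric(1))
fpr <- vapply(cutoffs, function(th) mean(prob[truth == 0] >= th), numeric(1))

# "prc": precision at the same cut-offs; recall is identical to tpr.
precision <- vapply(cutoffs, function(th) mean(truth[prob >= th] == 1), numeric(1))
recall <- tpr
```

Plotting fpr against tpr gives the ROC curve, and plotting recall against precision gives the precision-recall curve.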