The bootstrap infrastructure is designed to be extensible, allowing you to add custom statistical measurements to the bootstrap analysis process (i.e., repeated experiments with replacement) to estimate confidence intervals, standard deviations, and more.
To do this, you simply write a function that computes your desired measurement(s).
Function Signature
Your measurement function must use the following signature:
Lazy_Iterator *iterator: Used to fetch data for measurement calculation. This iterator is efficient, performing randomization on the fly without allocating memory for each bootstrap experiment.
int thread_num: The thread number for parallel execution, used by the iterator.
Measurement_Params *function_params: Optional pointer to function parameters (can be nullptr). Use this to pass custom parameters to your function.
Using the Iterator
Call fetch_next(thread_num, y, pred, w) to iterate through the data. It returns true until the end of input is reached.
If you are using your function outside of bootstrap.cpp, use fetch_next_external instead (the only difference is a minor optimization in fetch_next).
For more complex cases (e.g., multi-label outcomes), you can pass ret_preds_order to fetch_next when you have multiple predictions.
Argument meanings:
y: Label/outcome
pred: Prediction/score
w: Weight (-1 if not used). If your measurement does not support weights, check that w == -1 and throw an error if weights are present.
function_params:
If your function uses parameters, check that function_params is not nullptr (otherwise use defaults), and cast it to your parameter type. Measurement_Params is a simple class you can extend (e.g., ROC_Params).
map<string,float>calc_roc_measures_with_inc(Lazy_Iterator*iterator,intthread_num,Measurement_Params*function_params){ROC_Paramsdefault_params;ROC_Params*params=&default_params;if(function_params!=NULL)params=(ROC_Params*)function_params;// Use params as neededfloatmax_diff_in_wp=params->max_diff_working_point;// ...}
Function Output
Your function should return a map<string, float>, where each key is the name of a measurement and the value is the result. This allows you to compute multiple measurements in a single function.
The infrastructure will automatically append suffixes to each measurement: _Mean, _Std, _CI.Lower.95, _CI.Upper.95, _Obs. See: Bootstrap legend
#include<MedStat/MedStat/MedBootstrap.h>// Example custom measurement functionmap<string,float>MY_CUSTOMIZED_MEASUREMENT_FUNCTION(Lazy_Iterator*iterator,intthread_num,Measurement_Params*function_params){// ...}// Example custom parameter classclassCustom_Measurement_Parameters:publicMeasurement_Params{public:intmin_bin_size=0;floatcases_weight=1;floatcontrols_weight=1;};// Helper function to generate matrix for filtering (needed if generating cohorts and filtering with bootstrap)voidget_mat(conststring&json_model,conststring&rep_path,MedSamples&samples,MedModel&mdl){if(!json_model.empty())mdl.init_from_json_file(json_model);elseMLOG("No json for bootstrap - can only use Time-Window,Age,Gender filters\n");boolneed_age=true,need_gender=true;for(FeatureGenerator*generator:mdl.generators){if(generator->generator_type==FTR_GEN_AGE)need_age=false;if(generator->generator_type==FTR_GEN_GENDER)need_gender=false;}if(need_age)mdl.add_age();if(need_gender)mdl.add_gender();vector<int>pids_to_take;samples.get_ids(pids_to_take);MedPidRepositoryrep;mdl.load_repository(rep_path,pids_to_take,rep,true);if(mdl.learn(rep,&samples,MedModelStage::MED_MDL_LEARN_REP_PROCESSORS,MedModelStage::MED_MDL_APPLY_FTR_PROCESSORS)<0)MTHROW_AND_ERR("Error creating features for filtering\n");}intmain(intargc,char*argv[]){// Set bootstrap parameters (see MedBootstrap::init for all options)stringbt_params="loopcnt=500;sample_per_pid=1";MedBootstrapResultbt;bt.bootstrap_params.init_from_string(bt_params);// Custom measurement parametersCustom_Measurement_Parameterscustom_args;stringjson_model;// Optional: for generating features for cohort filteringstringcohorts_file;// Path to cohort definitions. If not specified, all samples are used as a single cohort.stringrep_path;// Path to repository (needed if filtering by more than Time-Window)MedSamplespreds;// Your MedSamples with predictionspreds.read_from_file("YOUR_SAMPLES_PATH");if(!cohorts_file.empty())bt.bootstrap_params.parse_cohort_file(cohorts_file);// Set measurement functions (default is calc_roc_measures_with_inc)vector<pair<MeasurementFunctions,Measurement_Params*>>&cl_m=bt.bootstrap_params.measurements_with_params;cl_m.clear();cl_m.push_back(pair<MeasurementFunctions,Measurement_Params*>(MY_CUSTOMIZED_MEASUREMENT_FUNCTION,&custom_args));// Create matrix for json and repositoryMedModelmdl;get_mat(json_model,rep_path,smps_preds,mdl);MedFeatures&bt_features_matrix=mdl.features;// Prepare features for bootstrap analysisvector<float>preds_v,labelsOrig;vector<int>pidsOrig,preds_order;map<string,vector<float>>bt_data;bt.bootstrap_params.prepare_bootstrap(bt_features_matrix,preds_v,labelsOrig,pidsOrig,bt_data,preds_order);// Run bootstrap analysisbt.bootstrap(preds,bt_data);// Access results: bt.bootstrap_results is a map of maps (cohort name -> measurement name -> value)for(auto&it_cohort:bt.bootstrap_results){conststring&cohort_name=it_cohort.first;for(auto&it_measurement:it_cohort.second){conststring&measurement_name=it_measurement.first;floatvalue=it_measurement.second;fprintf(stdout,"%s %s %f\n",cohort_name.c_str(),measurement_name.c_str(),value);}}}
You can append multiple measurement functions; the bootstrap process will apply all of them.