Extending bootstrap
The main idea of the bootstrap infrastructure is that it can be extended to compute any statistical measurement inside the bootstrap analysis process (repeated experiments, sampling with replacement), in order to assess its confidence interval, standard deviation, etc. To do that, you need to write a function that calculates the measurement(s).
Function input - you must keep this signature; it cannot be changed.
- Lazy_Iterator *iterator - the iterator that lets you "fetch" the data for calculating the measurements. (It is a lazy iterator so that it stays efficient: it does not allocate memory for each bootstrap experiment, and performs the randomization on the fly.)
- int thread_num - the thread number, for parallelism. It is passed on to the "Lazy_Iterator".
- Measurement_Params *function_params - optional pointer to the function parameters (null if none were given). This is how you pass parameters to your function.
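A minimal skeleton of a function with this input signature might look as follows. The function name `my_measurement` and the key `MY_MEASUREMENT` are hypothetical, the two classes are only forward-declared as stand-ins for the infrastructure's types, and the return type follows the description under "Function output".

```cpp
#include <map>
#include <string>

// Forward declarations standing in for the infrastructure's own types.
class Lazy_Iterator;
class Measurement_Params;

// Hypothetical measurement function with the fixed input signature.
std::map<std::string, float> my_measurement(Lazy_Iterator *iterator, int thread_num,
                                            Measurement_Params *function_params) {
    // A real function would read rows from `iterator` using `thread_num`
    // and consult `function_params` (which may be null).
    (void)iterator;
    (void)thread_num;
    (void)function_params;

    std::map<std::string, float> res;
    res["MY_MEASUREMENT"] = 0.0f; // placeholder value
    return res;
}
```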
How to use the iterator?
Call "fetch_next" with thread_num, y, pred and w. It returns true as long as the end of the input has not been reached.
If you want to use it outside of "bootstrap.cpp", you need to call "fetch_next_external" instead. "fetch_next" is an optimized version that does not exist outside of bootstrap.cpp, so you cannot refer to it from other files (the optimization gives only a slight speedup).
You can also pass "ret_preds_order" to fetch_next when you have multiple predictions and your measurement function is more complicated (for example, with multi-label outcomes).
Argument meanings:
y - the label/outcome
pred - the prediction/score
w - the weight. If there are no weights, it is "-1". If your measurement does not support weights, check that "w == -1" and throw an error if you receive weights.
function_params - if the function has parameters, check whether this pointer is null. If it is null, use default parameters; otherwise cast the object to your parameter class.
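The fetch loop and the weight check described above can be sketched as follows. This is an illustration only: the `Lazy_Iterator` and `Measurement_Params` classes in the block are simplified, hypothetical stand-ins (the real ones resample lazily and are thread-aware), the reference-based shape of `fetch_next_external` is assumed, and `calc_mean_pred` and the key `MEAN_PRED` are made-up names.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins so the sketch compiles on its own.
class Measurement_Params {
public:
    virtual ~Measurement_Params() = default;
};

class Lazy_Iterator {
public:
    Lazy_Iterator(std::vector<float> y, std::vector<float> pred, std::vector<float> w = {})
        : y_(std::move(y)), pred_(std::move(pred)), w_(std::move(w)) {}

    // Assumed shape: fills y, pred, w and returns false once the input is exhausted.
    bool fetch_next_external(int /*thread_num*/, float &y, float &pred, float &w) {
        if (pos_ >= y_.size()) return false;
        y = y_[pos_];
        pred = pred_[pos_];
        w = w_.empty() ? -1.0f : w_[pos_]; // -1 means "no weight"
        ++pos_;
        return true;
    }

private:
    std::vector<float> y_, pred_, w_;
    std::size_t pos_ = 0;
};

// Mean prediction over all rows; this measurement deliberately rejects weights.
std::map<std::string, float> calc_mean_pred(Lazy_Iterator *iterator, int thread_num,
                                            Measurement_Params *function_params) {
    (void)function_params; // this measurement takes no parameters
    float y, pred, w;
    double sum = 0.0;
    std::size_t n = 0;
    while (iterator->fetch_next_external(thread_num, y, pred, w)) {
        if (w != -1.0f)
            throw std::invalid_argument("calc_mean_pred does not support weights");
        sum += pred;
        ++n;
    }
    std::map<std::string, float> res;
    res["MEAN_PRED"] = n > 0 ? static_cast<float>(sum / n) : 0.0f;
    return res;
}
```

Throwing on the first weighted row keeps an unsupported configuration from silently producing an unweighted result.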
Measurement_Params is a simple class that you extend when you want to specify parameters. For an example, see the class "ROC_Params" and how it is used in "calc_roc_measures_with_inc".
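The pattern of extending the parameter class, falling back to defaults on null, and casting can be sketched as follows. This is not the code of "calc_roc_measures_with_inc": `Threshold_Params`, `calc_flagged_rate` and the key `FLAGGED_RATE` are hypothetical, the two base classes are simplified stand-ins, and `dynamic_cast` is a choice made here that assumes the base class is polymorphic.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins so the sketch compiles on its own.
class Measurement_Params {
public:
    virtual ~Measurement_Params() = default;
};

class Lazy_Iterator {
public:
    Lazy_Iterator(std::vector<float> y, std::vector<float> pred, std::vector<float> w = {})
        : y_(std::move(y)), pred_(std::move(pred)), w_(std::move(w)) {}

    // Assumed shape: fills y, pred, w and returns false once the input is exhausted.
    bool fetch_next_external(int /*thread_num*/, float &y, float &pred, float &w) {
        if (pos_ >= y_.size()) return false;
        y = y_[pos_];
        pred = pred_[pos_];
        w = w_.empty() ? -1.0f : w_[pos_]; // -1 means "no weight"
        ++pos_;
        return true;
    }

private:
    std::vector<float> y_, pred_, w_;
    std::size_t pos_ = 0;
};

// Hypothetical parameter class for the measurement below.
class Threshold_Params : public Measurement_Params {
public:
    float threshold = 0.5f; // default used when no parameters are given
};

// (Weighted) fraction of rows whose prediction is at or above the threshold.
std::map<std::string, float> calc_flagged_rate(Lazy_Iterator *iterator, int thread_num,
                                               Measurement_Params *function_params) {
    Threshold_Params defaults;
    const Threshold_Params *params = &defaults; // null pointer -> default parameters
    if (function_params != nullptr) {
        params = dynamic_cast<const Threshold_Params *>(function_params);
        if (params == nullptr)
            throw std::invalid_argument("calc_flagged_rate expects Threshold_Params");
    }

    double flagged = 0.0, total = 0.0;
    float y, pred, w;
    while (iterator->fetch_next_external(thread_num, y, pred, w)) {
        const double weight = (w != -1.0f) ? w : 1.0; // unweighted rows count as 1
        total += weight;
        if (pred >= params->threshold) flagged += weight;
    }

    std::map<std::string, float> res;
    res["FLAGGED_RATE"] = total > 0.0 ? static_cast<float>(flagged / total) : 0.0f;
    return res;
}
```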
Function output
The function returns a "map" from string to float. The key is the name of the measurement and the float is its value. This way you can calculate multiple measurements in a single function and give each one a name. When the function is used, the infrastructure appends a suffix to each measurement name: "_Mean", "_Std", "_CI.Lower.95", "_CI.Upper.95", "_Obs", as described on this page: Bootstrap legend
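To make the suffixes concrete, here is one way the per-repetition values of a single measurement could be summarized into suffixed entries. This is only an illustration, not the infrastructure's code: the function `summarize` is hypothetical, and the sample standard deviation, the nearest-rank percentile interval (2.5% and 97.5%), and "_Obs" as the value on the original, non-resampled data are all assumptions. The Bootstrap legend page is the reference for the actual definitions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: turns the values of one measurement across bootstrap
// repetitions into suffixed summary entries.
std::map<std::string, float> summarize(const std::string &name, std::vector<float> reps,
                                       float observed) {
    std::map<std::string, float> out;
    out[name + "_Obs"] = observed; // assumed: value computed on the original data
    if (reps.empty()) return out;

    std::sort(reps.begin(), reps.end());
    const std::size_t n = reps.size();

    double sum = 0.0;
    for (float v : reps) sum += v;
    const double mean = sum / n;

    double sq = 0.0;
    for (float v : reps) sq += (v - mean) * (v - mean);
    const double sd = n > 1 ? std::sqrt(sq / (n - 1)) : 0.0; // sample std (assumption)

    // Nearest-rank percentile on the sorted repetitions (assumption).
    auto at_quantile = [&](double q) {
        return reps[static_cast<std::size_t>(std::lround(q * (n - 1)))];
    };

    out[name + "_Mean"] = static_cast<float>(mean);
    out[name + "_Std"] = static_cast<float>(sd);
    out[name + "_CI.Lower.95"] = at_quantile(0.025);
    out[name + "_CI.Upper.95"] = at_quantile(0.975);
    return out;
}
```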
A simple example is a function that counts how many cases and how many controls exist.
This function iterates through the data, counts how many rows of each outcome it sees, and stores the counts in the "cnt" variable. It also supports weights.
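A minimal sketch of such a counting function is given below. The `Lazy_Iterator` and `Measurement_Params` classes are simplified, hypothetical stand-ins so that the block compiles on its own; the name `calc_counts` and the keys `NPOS` and `NNEG` are illustrative, and outcomes are assumed to be coded 1 for cases and 0 for controls.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins so the sketch compiles on its own.
class Measurement_Params {
public:
    virtual ~Measurement_Params() = default;
};

class Lazy_Iterator {
public:
    Lazy_Iterator(std::vector<float> y, std::vector<float> pred, std::vector<float> w = {})
        : y_(std::move(y)), pred_(std::move(pred)), w_(std::move(w)) {}

    // Assumed shape: fills y, pred, w and returns false once the input is exhausted.
    bool fetch_next_external(int /*thread_num*/, float &y, float &pred, float &w) {
        if (pos_ >= y_.size()) return false;
        y = y_[pos_];
        pred = pred_[pos_];
        w = w_.empty() ? -1.0f : w_[pos_]; // -1 means "no weight"
        ++pos_;
        return true;
    }

private:
    std::vector<float> y_, pred_, w_;
    std::size_t pos_ = 0;
};

// Counts cases (y == 1) and controls (y == 0), weighted when weights are present.
std::map<std::string, float> calc_counts(Lazy_Iterator *iterator, int thread_num,
                                         Measurement_Params *function_params) {
    (void)function_params; // no parameters needed

    std::map<float, double> cnt; // outcome value -> (weighted) count
    float y, pred, w;
    while (iterator->fetch_next_external(thread_num, y, pred, w))
        cnt[y] += (w != -1.0f) ? w : 1.0; // an unweighted row counts as 1

    std::map<std::string, float> res;
    res["NPOS"] = static_cast<float>(cnt[1.0f]);
    res["NNEG"] = static_cast<float>(cnt[0.0f]);
    return res;
}
```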
How to use the custom function?
When executing bootstrap, you can append multiple measurement functions, and the bootstrap process is applied to all of them.
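The idea of appending several measurement functions and applying them all can be sketched as follows. This is not the infrastructure's registration API: `MeasurementFunction`, `Registered` and `run_all` are hypothetical names, the classes are simplified stand-ins, and the sketch performs a single pass over the data per function instead of real bootstrap resampling.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins so the sketch compiles on its own.
class Measurement_Params {
public:
    virtual ~Measurement_Params() = default;
};

class Lazy_Iterator {
public:
    Lazy_Iterator(std::vector<float> y, std::vector<float> pred)
        : y_(std::move(y)), pred_(std::move(pred)) {}

    // Assumed shape: fills y, pred, w and returns false once the input is exhausted.
    bool fetch_next_external(int /*thread_num*/, float &y, float &pred, float &w) {
        if (pos_ >= y_.size()) return false;
        y = y_[pos_];
        pred = pred_[pos_];
        w = -1.0f; // this stand-in carries no weights
        ++pos_;
        return true;
    }

private:
    std::vector<float> y_, pred_;
    std::size_t pos_ = 0;
};

using MeasurementFunction = std::map<std::string, float> (*)(Lazy_Iterator *, int,
                                                             Measurement_Params *);

// One appended measurement: the function plus its optional parameters.
struct Registered {
    MeasurementFunction fn;
    Measurement_Params *params; // may be null
};

std::map<std::string, float> count_rows(Lazy_Iterator *it, int thread_num, Measurement_Params *) {
    float y, pred, w;
    float n = 0.0f;
    while (it->fetch_next_external(thread_num, y, pred, w)) n += 1.0f;
    return {{"N_ROWS", n}};
}

std::map<std::string, float> mean_outcome(Lazy_Iterator *it, int thread_num, Measurement_Params *) {
    float y, pred, w;
    double sum = 0.0, n = 0.0;
    while (it->fetch_next_external(thread_num, y, pred, w)) {
        sum += y;
        n += 1.0;
    }
    return {{"MEAN_Y", n > 0.0 ? static_cast<float>(sum / n) : 0.0f}};
}

// Applies every appended function to the same data and merges the named results.
std::map<std::string, float> run_all(const std::vector<Registered> &measurements,
                                     const std::vector<float> &y,
                                     const std::vector<float> &pred) {
    std::map<std::string, float> merged;
    for (const Registered &m : measurements) {
        Lazy_Iterator it(y, pred); // each function consumes its own pass over the data
        std::map<std::string, float> part = m.fn(&it, 0, m.params);
        merged.insert(part.begin(), part.end());
    }
    return merged;
}
```

Because results are merged by name, each function should use measurement names that do not collide with those of the other appended functions.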