Medial Code Documentation
MedBootstrap Class Reference

Bootstrap wrapper for Medial Infrastructure objects that simplifies parameter handling and the input/output process.

#include <MedBootstrap.h>

Inheritance diagram for MedBootstrap:
SerializableObject

Public Member Functions

void parse_cohort_line (const string &line)
 Parses a single cohort definition line.
 
void get_cohort_from_arg (const string &single_cohort)
 A function which reads a single cohort definition from the command line and parses it.
 
void parse_cohort_file (const string &cohorts_path)
 Reads a cohorts file and stores the definitions in filter_cohort.
 
 MedBootstrap ()
 Default constructor.
 
int init (map< string, string > &map)
 Initializes from a parsed "parameter_name=value;..." string; each parameter_name matches the name of a class field.
 
void clean_feature_name_prefix (map< string, vector< float > > &features)
 Removes the leading "FTR_" prefix from feature names in MedFeatures created by the infrastructure pipeline.
 
void prepare_bootstrap (const MedFeatures &features, vector< float > &preds, vector< float > &y, vector< int > &pids, map< string, vector< float > > &final_additional_info, vector< int > &preds_order, unordered_map< int, vector< int > > *splits_inds=NULL)
 Prepares the vectors required for bootstrap from MedFeatures &features.
 
void prepare_bootstrap (MedSamples &samples, map< string, vector< float > > &additional_info, vector< float > &preds, vector< float > &y, vector< int > &pids, vector< int > &preds_order, unordered_map< int, vector< int > > *splits_inds=NULL)
 Prepares the vectors required for bootstrap from samples and additional_info.
 
map< string, map< string, float > > bootstrap (const MedFeatures &features, map< int, map< string, map< string, float > > > *results_per_split=NULL, with_registry_args *registry_args=NULL)
 Runs the bootstrapping process on all cohorts and measurements.
 
map< string, map< string, float > > bootstrap (MedSamples &samples, map< string, vector< float > > &additional_info, map< int, map< string, map< string, float > > > *results_per_split=NULL, with_registry_args *registry_args=NULL)
 Runs the bootstrapping process on all cohorts and measurements.
 
map< string, map< string, float > > bootstrap (MedSamples &samples, const string &rep_path, map< int, map< string, map< string, float > > > *results_per_split=NULL, with_registry_args *registry_args=NULL)
 Runs the bootstrapping process on all cohorts and measurements.
 
map< string, map< string, float > > bootstrap (MedSamples &samples, MedPidRepository &rep, map< int, map< string, map< string, float > > > *results_per_split=NULL, with_registry_args *registry_args=NULL)
 Runs the bootstrapping process on all cohorts and measurements.
 
void apply_censor (const unordered_map< int, int > &pid_censor_dates, MedSamples &samples)
 Censors samples in samples using the per-patient censor dates in pid_censor_dates.
 
void apply_censor (const unordered_map< int, int > &pid_censor_dates, MedFeatures &features)
 Censors samples in features using the per-patient censor dates in pid_censor_dates.
 
void apply_censor (const vector< int > &pids, const vector< int > &censor_dates, MedFeatures &features)
 Censors samples in features using the parallel pids and censor_dates vectors.
 
void apply_censor (const vector< int > &pids, const vector< int > &censor_dates, MedSamples &samples)
 Censors samples in samples using the parallel pids and censor_dates vectors.
 
void change_sample_autosim (MedSamples &samples, int min_time, int max_time, MedSamples &new_samples)
 Converts the samples to auto-simulations by taking the maximal score in the time window for each pid.
 
void change_sample_autosim (MedFeatures &features, int min_time, int max_time, MedFeatures &new_features)
 Converts the features to auto-simulations by taking the maximal score in the time window for each pid.
 
MeasurmentFunctionType measurement_function_name_to_type (const string &measurement_function_name)
 Converts a measurement function name to its type.
 
- Public Member Functions inherited from SerializableObject
virtual int version () const
 Relevant for serializations.
 
virtual string my_class_name () const
 For better handling of serializations, every class inheriting from SerializableObject should implement this method.
 
virtual void serialized_fields_name (vector< string > &field_names) const
 The names of the serialized fields.
 
virtual void * new_polymorphic (string derived_name)
 For polymorphic classes that need to serialize/deserialize a pointer to a derived class: given the derived type name (as in my_type), returns a new instance of that derived class.
 
virtual void pre_serialization ()
 
virtual void post_deserialization ()
 
virtual size_t get_size ()
 Gets bytes sizes for serializations.
 
virtual size_t serialize (unsigned char *blob)
 Serializes the object into blob memory; returns the number of bytes written.
 
virtual size_t deserialize (unsigned char *blob)
 Deserializes blob into the object; returns the number of bytes read.
 
size_t serialize_vec (vector< unsigned char > &blob)
 
size_t deserialize_vec (vector< unsigned char > &blob)
 
virtual size_t serialize (vector< unsigned char > &blob)
 
virtual size_t deserialize (vector< unsigned char > &blob)
 
virtual int read_from_file (const string &fname)
 Reads and deserializes the model.
 
virtual int write_to_file (const string &fname)
 Serializes the model and writes it to a file.
 
virtual int read_from_file_unsafe (const string &fname)
 Reads and deserializes the model without checking the version number (unsafe read).
 
int init_from_string (string init_string)
 Init from string.
 
int init_params_from_file (string init_file)
 
int init_param_from_file (string file_str, string &param)
 
int update_from_string (const string &init_string)
 
virtual int update (map< string, string > &map)
 Virtual to update object from parsed fields.
 
virtual string object_json () const
 

Static Public Member Functions

static void filter_bootstrap_cohort (MedFeatures &features, const string &bt_cohort)
 Applies a bootstrap cohort filter to a given feature matrix.
 
static void filter_bootstrap_cohort (MedPidRepository &bt_repository, MedModel &bt_filters, MedSamples &curr_samples, const string &bt_cohort)
 Applies a bootstrap cohort filter to given samples.
 
static void filter_bootstrap_cohort (const string &rep, const string &bt_json, MedSamples &curr_samples, const string &bt_cohort)
 Applies a bootstrap cohort filter to given samples.
 

Data Fields

ROC_Params roc_Params
 Controls the ROC parameters: sensitivity, specificity, ...
 
Regression_Params regression_params
 Parameters for regression.
 
Multiclass_Params multiclass_params
 Controls the multi-class parameters: top n, ...
 
map< string, vector< Filter_Param > > filter_cohort
 The cohort definitions: maps each cohort name to the parameter ranges to intersect.
 
map< string, FilterCohortFunc > additional_cohorts
 Not serializable! Additional cohorts defined by a function.
 
float sample_ratio
 The ratio of patients sampled out of all patients in each bootstrap iteration.
 
int sample_per_pid
 How many samples to take for each patient; 0 means no sampling (all samples of the patient are taken).
 
bool sample_patient_label
 If true, patient+label is treated as the "id" for sampling.
 
int sample_seed
 The random seed; if 0, random_device is used.
 
int loopCnt
 The number of bootstrap iterations.
 
bool is_binary_outcome
 Used only for validating the bootstrap input.
 
bool use_time_control_as_case
 If true, controls are subject to the same time-window condition as cases.
 
bool simTimeWindow
 Time-window simulation (in cohorts with Time-Window filtering): instead of censoring cases outside the time range, treat them as controls.
 
float censor_time_factor
 
bool sort_preds_in_multicategory
 
size_t num_categories
 Number of categories.
 
vector< pair< MeasurementFunctions, Measurement_Params * > > measurements_with_params
 Not serializable! The measurements with their parameters.
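The sampling fields above (sample_ratio, sample_seed, loopCnt) describe a patient-level bootstrap. The sketch below shows one way a single resample could be drawn. It assumes patients are drawn with replacement and that sample_ratio scales the number of drawn patients; it is not the MedBootstrap implementation, and `make_generator` and `draw_patient_sample` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical helper: sample_seed == 0 means "seed from random_device",
// mirroring the sample_seed field description.
std::mt19937 make_generator(int sample_seed) {
    if (sample_seed == 0)
        return std::mt19937(std::random_device{}());
    return std::mt19937(static_cast<unsigned int>(sample_seed));
}

// Hypothetical sketch: draw one bootstrap resample of patient ids, with
// replacement, of size round(sample_ratio * number_of_patients).
std::vector<int> draw_patient_sample(const std::vector<int> &unique_pids,
                                     float sample_ratio, std::mt19937 &gen) {
    size_t n_draw = static_cast<size_t>(std::lround(sample_ratio * unique_pids.size()));
    std::vector<int> drawn;
    if (unique_pids.empty())
        return drawn;
    drawn.reserve(n_draw);
    std::uniform_int_distribution<size_t> pick(0, unique_pids.size() - 1);
    for (size_t i = 0; i < n_draw; ++i)
        drawn.push_back(unique_pids[pick(gen)]); // a pid may appear more than once
    return drawn;
}
```

Calling draw_patient_sample loopCnt times with the same generator yields loopCnt different resamples, which are reproducible for a fixed non-zero seed.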
 

Static Public Attributes

static unordered_map< string, MeasurmentFunctionType > measurement_function_name_map
 

Detailed Description

Bootstrap wrapper for Medial Infrastructure objects that simplifies parameter handling and the input/output process.


For more control and a lower-level interface, refer to bootstrap.h.

Constructor & Destructor Documentation

◆ MedBootstrap()

MedBootstrap::MedBootstrap ( )

Default constructor.

See ROC_Params for the defaults. The cohorts consist of a single cohort called "All" with no filtering.

Member Function Documentation

◆ apply_censor() [1/4]

void MedBootstrap::apply_censor ( const unordered_map< int, int > &  pid_censor_dates,
MedFeatures &  features 
)

Censors samples in features using the per-patient censor dates in pid_censor_dates.

pid_censor_dates maps each pid to a censor date; the pid's samples dated after that date are filtered out.

Returns
Updates features: changes the outcomeDate of controls to the censor date.
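The censoring rule above can be modeled on a simplified record. This is a sketch under stated assumptions: `SimpleSample` and `censor_samples` are hypothetical, samples dated after the patient's censor date are dropped, a control's outcome date later than the censor date is pulled back to it, and patients missing from the map are left unchanged. Whether cases are also modified is not stated in this reference.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified sample record (not the MedSample layout).
struct SimpleSample {
    int pid;
    int time;          // prediction date, e.g. YYYYMMDD
    float outcome;     // 0 = control, non-zero = case
    int outcome_time;  // outcome (or follow-up end) date
};

// Sketch: censor samples by a pid -> censor-date map.
std::vector<SimpleSample> censor_samples(const std::unordered_map<int, int> &pid_censor_dates,
                                         const std::vector<SimpleSample> &samples) {
    std::vector<SimpleSample> kept;
    for (const SimpleSample &s : samples) {
        auto it = pid_censor_dates.find(s.pid);
        if (it == pid_censor_dates.end()) {
            kept.push_back(s); // no censor date for this pid (assumption: keep as is)
            continue;
        }
        int censor_date = it->second;
        if (s.time > censor_date)
            continue; // sample dated after the censor date: filtered out
        SimpleSample c = s;
        if (c.outcome == 0 && c.outcome_time > censor_date)
            c.outcome_time = censor_date; // control follow-up ends at the censor date
        kept.push_back(c);
    }
    return kept;
}
```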

◆ apply_censor() [2/4]

void MedBootstrap::apply_censor ( const unordered_map< int, int > &  pid_censor_dates,
MedSamples &  samples 
)

Censors samples in samples using the per-patient censor dates in pid_censor_dates.

pid_censor_dates maps each pid to a censor date; the pid's samples dated after that date are filtered out.

Returns
Updates samples: changes the outcomeDate of controls to the censor date.

◆ apply_censor() [3/4]

void MedBootstrap::apply_censor ( const vector< int > &  pids,
const vector< int > &  censor_dates,
MedFeatures &  features 
)

Censors samples in features using the parallel pids and censor_dates vectors.

pids and censor_dates have the same size. For each pid and the corresponding date in censor_dates, the pid's samples after that date are filtered out.

Returns
Updates features: changes the outcomeDate of controls to the censor date.
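The vector-based overloads carry the same information as the map-based ones. As a sketch, the hypothetical helper below converts the parallel (pids, censor_dates) vectors into a pid to censor-date map, enforcing the same-size requirement. Keeping the earliest date when a pid repeats is an assumption made for illustration only.

```cpp
#include <cassert>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical helper: parallel vectors -> pid -> censor-date map.
std::unordered_map<int, int> to_censor_map(const std::vector<int> &pids,
                                           const std::vector<int> &censor_dates) {
    if (pids.size() != censor_dates.size())
        throw std::invalid_argument("pids and censor_dates must have the same size");
    std::unordered_map<int, int> censor_map;
    for (size_t i = 0; i < pids.size(); ++i) {
        auto res = censor_map.emplace(pids[i], censor_dates[i]);
        // Repeated pid: keep the earliest censor date (assumption).
        if (!res.second && censor_dates[i] < res.first->second)
            res.first->second = censor_dates[i];
    }
    return censor_map;
}
```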

◆ apply_censor() [4/4]

void MedBootstrap::apply_censor ( const vector< int > &  pids,
const vector< int > &  censor_dates,
MedSamples &  samples 
)

Censors samples in samples using the parallel pids and censor_dates vectors.

pids and censor_dates have the same size. For each pid and the corresponding date in censor_dates, the pid's samples after that date are filtered out.

Returns
Updates samples: changes the outcomeDate of controls to the censor date.

◆ bootstrap() [1/4]

map< string, map< string, float > > MedBootstrap::bootstrap ( const MedFeatures &  features,
map< int, map< string, map< string, float > > > *  results_per_split = NULL,
with_registry_args *  registry_args = NULL 
)

Runs the bootstrapping process on all cohorts and measurements.

MedFeatures must also contain the information used by the cohort definitions. For example, if a cohort is defined with Age:40,80, MedFeatures should contain an Age feature.

Returns
The bootstrap results as a map from cohort_name to all of that cohort's measurements (a map from each measurement name to its value). If results_per_split is not NULL and a mapping from each split value to its corresponding sample indexes is available, results are also returned for each split; the highest level of that map is the split value.
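To illustrate the shape of the returned map, the sketch below flattens it into tab-separated cohort, measurement and value lines. It relies only on the map type documented above; `results_to_tsv` is a hypothetical helper, and the measurement names used in the usage are placeholders, not the names MedBootstrap actually emits.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Same shape as the bootstrap() return value: cohort -> measurement -> value.
using BootstrapResults = std::map<std::string, std::map<std::string, float>>;

// Hypothetical helper: one "cohort<TAB>measurement<TAB>value" line per entry.
std::string results_to_tsv(const BootstrapResults &results) {
    std::ostringstream out;
    for (const auto &cohort : results)
        for (const auto &measure : cohort.second)
            out << cohort.first << '\t' << measure.first << '\t' << measure.second << '\n';
    return out.str();
}
```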

◆ bootstrap() [2/4]

map< string, map< string, float > > MedBootstrap::bootstrap ( MedSamples &  samples,
const string &  rep_path,
map< int, map< string, map< string, float > > > *  results_per_split = NULL,
with_registry_args *  registry_args = NULL 
)

Runs the bootstrapping process on all cohorts and measurements.

The input is samples and rep_path, the path to a repository from which the Age and Gender signals are added for building the cohort definitions. This is a convenience overload.

Returns
The bootstrap results as a map from cohort_name to all of that cohort's measurements (a map from each measurement name to its value). If results_per_split is not NULL and a mapping from each split value to its corresponding sample indexes is available, results are also returned for each split; the highest level of that map is the split value.

◆ bootstrap() [3/4]

map< string, map< string, float > > MedBootstrap::bootstrap ( MedSamples &  samples,
map< string, vector< float > > &  additional_info,
map< int, map< string, map< string, float > > > *  results_per_split = NULL,
with_registry_args *  registry_args = NULL 
)

Runs the bootstrapping process on all cohorts and measurements.

The input is samples and additional_info. additional_info provides the values used for filtering and building the cohorts, for example Age 40-80 and males.

Returns
The bootstrap results as a map from cohort_name to all of that cohort's measurements (a map from each measurement name to its value). If results_per_split is not NULL and a mapping from each split value to its corresponding sample indexes is available, results are also returned for each split; the highest level of that map is the split value.

◆ bootstrap() [4/4]

map< string, map< string, float > > MedBootstrap::bootstrap ( MedSamples &  samples,
MedPidRepository &  rep,
map< int, map< string, map< string, float > > > *  results_per_split = NULL,
with_registry_args *  registry_args = NULL 
)

Runs the bootstrapping process on all cohorts and measurements.

The input is samples and rep, a repository from which the Age and Gender signals are added for building the cohort definitions. This is a convenience overload.

Returns
The bootstrap results as a map from cohort_name to all of that cohort's measurements (a map from each measurement name to its value). If results_per_split is not NULL and a mapping from each split value to its corresponding sample indexes is available, results are also returned for each split; the highest level of that map is the split value.

◆ change_sample_autosim() [1/2]

void MedBootstrap::change_sample_autosim ( MedFeatures &  features,
int  min_time,
int  max_time,
MedFeatures &  new_features 
)

Converts the features to auto-simulations by taking the maximal score in the time window for each pid.

Returns
Updates new_features from features.

◆ change_sample_autosim() [2/2]

void MedBootstrap::change_sample_autosim ( MedSamples &  samples,
int  min_time,
int  max_time,
MedSamples &  new_samples 
)

Converts the samples to auto-simulations by taking the maximal score in the time window for each pid.

Returns
Updates new_samples from samples.
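The auto-simulation idea, keeping the highest score per patient inside a time window, can be sketched as follows. `ScoredSample` and `autosim_max_score` are hypothetical, and so is the `window_time` field: it stands for whatever time axis min_time and max_time are measured on (for example, days from prediction to outcome), which this reference does not specify.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified record (not the MedSamples layout).
struct ScoredSample {
    int pid;
    int window_time;  // position on the window's time axis (assumption)
    float pred;
    float outcome;
};

// Sketch: per pid, keep only the highest-scoring sample whose window_time
// lies in [min_time, max_time]; pids with no sample in the window are dropped.
std::vector<ScoredSample> autosim_max_score(const std::vector<ScoredSample> &samples,
                                            int min_time, int max_time) {
    std::map<int, ScoredSample> best; // ordered by pid for deterministic output
    for (const ScoredSample &s : samples) {
        if (s.window_time < min_time || s.window_time > max_time)
            continue;
        auto it = best.find(s.pid);
        if (it == best.end() || s.pred > it->second.pred)
            best[s.pid] = s;
    }
    std::vector<ScoredSample> result;
    for (const auto &kv : best)
        result.push_back(kv.second);
    return result;
}
```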

◆ filter_bootstrap_cohort() [1/3]

void MedBootstrap::filter_bootstrap_cohort ( const string &  rep,
const string &  bt_json,
MedSamples &  curr_samples,
const string &  bt_cohort 
)
static

Applies a bootstrap cohort filter to given samples.

Parameters
rep - repository path
bt_json - the JSON model used to generate the matrix for filtering the bootstrap cohort; Age and Gender are added automatically
curr_samples - the samples to filter
bt_cohort - a single-line cohort definition (MULTI is not supported) without the cohort name: only the filter definition, with no tabs in the string
Returns
curr_samples, filtered by the cohort definition

◆ filter_bootstrap_cohort() [2/3]

void MedBootstrap::filter_bootstrap_cohort ( MedFeatures &  features,
const string &  bt_cohort 
)
static

Applies a bootstrap cohort filter to a given feature matrix.

Parameters
features - the feature matrix
bt_cohort - a single-line cohort definition (MULTI is not supported) without the cohort name: only the filter definition, with no tabs in the string
Returns
features, with the rows outside the cohort definition filtered out
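A single-line cohort definition such as "Age:40,80;Gender:1,1" selects the rows that fall inside every range, inclusive. The sketch below models that on a column map; `RangeFilter`, `parse_cohort_def` and `rows_in_cohort` are hypothetical, since the fields of Filter_Param and the internals of MedFeatures are not listed in this reference.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical range type standing in for Filter_Param.
struct RangeFilter {
    std::string name;
    float min_val;
    float max_val;
};

// Parses "NAME:MIN,MAX;NAME:MIN,MAX;..." into ranges.
std::vector<RangeFilter> parse_cohort_def(const std::string &def) {
    std::vector<RangeFilter> filters;
    std::stringstream ss(def);
    std::string token;
    while (std::getline(ss, token, ';')) {
        if (token.empty()) continue;
        size_t colon = token.find(':');
        if (colon == std::string::npos)
            throw std::invalid_argument("missing ':' in filter: " + token);
        size_t comma = token.find(',', colon);
        if (comma == std::string::npos)
            throw std::invalid_argument("missing ',' in filter: " + token);
        filters.push_back({token.substr(0, colon),
                           std::stof(token.substr(colon + 1, comma - colon - 1)),
                           std::stof(token.substr(comma + 1))});
    }
    return filters;
}

// Returns the rows satisfying every range (inclusive "and" of all filters).
std::vector<size_t> rows_in_cohort(const std::map<std::string, std::vector<float>> &columns,
                                   size_t n_rows, const std::vector<RangeFilter> &filters) {
    std::vector<size_t> rows;
    for (size_t i = 0; i < n_rows; ++i) {
        bool keep = true;
        for (const RangeFilter &f : filters) {
            float v = columns.at(f.name)[i];
            if (v < f.min_val || v > f.max_val) { keep = false; break; }
        }
        if (keep) rows.push_back(i);
    }
    return rows;
}
```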

◆ filter_bootstrap_cohort() [3/3]

void MedBootstrap::filter_bootstrap_cohort ( MedPidRepository &  bt_repository,
MedModel &  bt_filters,
MedSamples &  curr_samples,
const string &  bt_cohort 
)
static

Applies a bootstrap cohort filter to given samples.

Parameters
bt_repository - a repository initialized for applying the bt_filters model to generate the matrix
bt_filters - the model used to generate the matrix for filtering the bootstrap cohort
curr_samples - the samples to filter
bt_cohort - a single-line cohort definition (MULTI is not supported) without the cohort name: only the filter definition, with no tabs in the string
Returns
curr_samples, filtered by the cohort definition

◆ get_cohort_from_arg()

void MedBootstrap::get_cohort_from_arg ( const string &  single_cohort)

Reads a single cohort definition from the command line and parses it.


See parse_cohort_file for the full specification.

◆ init()

int MedBootstrap::init ( map< string, string > &  map)
virtual

Initializes from a parsed "parameter_name=value;..." string; each parameter_name matches the name of a class field.

The filter_cohort value is a path to a cohorts file, and roc_Params is the init string for ROC_Params.

if (param_name == "sample_ratio") {
    sample_ratio = stof(param_value);
    if (sample_ratio > 1.0 || sample_ratio < 0)
        MTHROW_AND_ERR("sample_ratio should be between 0-1, got %2.3f\n", sample_ratio);
}
else if (param_name == "sample_per_pid")
    sample_per_pid = stoi(param_value);
else if (param_name == "loopcnt")
    loopCnt = stoi(param_value);
else if (param_name == "sample_seed")
    sample_seed = stoi(param_value);
else if (param_name == "sample_patient_label")
    sample_patient_label = stoi(param_value) > 0;
else if (param_name == "roc_params")
    roc_Params = ROC_Params(param_value);
else if (param_name == "filter_cohort") {
    filter_cohort.clear();
    parse_cohort_file(param_value);
}
else if (param_name == "simtimewindow")
    simTimeWindow = stoi(param_value) > 0;
else if (param_name == "censor_time_factor")
    censor_time_factor = stof(param_value);
else if (param_name == "is_binary_outcome")
    is_binary_outcome = stoi(param_value) > 0;
else if (param_name == "use_time_control_as_case")
    use_time_control_as_case = stoi(param_value) > 0;
else if (param_name == "sort_preds_in_multicategory")
    sort_preds_in_multicategory = stoi(param_value) > 0;
else if (param_name == "measurement") {
    // Parse and create the (function, params) tuples; the value must be passed inside {}.
    // Format: measure_name|param_init_string;...
    vector<string> tokens;
    boost::split(tokens, param_value, boost::is_any_of(";"));
    for (size_t i = 0; i < tokens.size(); ++i)
    {
        vector<string> parts;
        boost::split(parts, tokens[i], boost::is_any_of("|"));
        if (parts.size() != 2)
            MTHROW_AND_ERR("Error MedBootstrap::init - expecting 2 tokens by | delimiter\n");
        MeasurmentFunctionType measure_type = measurement_function_name_to_type(parts[0]);
        MeasurementFunctions func = NULL;
        Measurement_Params *pr_m = NULL;
        string init_m_params = parts[1];
        switch (measure_type)
        {
        case MeasurmentFunctionType::calc_npos_nneg:
            func = calc_npos_nneg;
            break;
        case MeasurmentFunctionType::calc_only_auc:
            func = calc_only_auc;
            break;
        case MeasurmentFunctionType::calc_roc_measures_with_inc:
            func = calc_roc_measures_with_inc;
            pr_m = &roc_Params;
            break;
        case MeasurmentFunctionType::calc_multi_class:
            func = calc_multi_class;
            break;
        case MeasurmentFunctionType::calc_kandel_tau:
            func = calc_kandel_tau;
            break;
        case MeasurmentFunctionType::calc_harrell_c_statistic:
            func = calc_harrell_c_statistic;
            break;
        case MeasurmentFunctionType::calc_regression:
            func = calc_regression;
            break;
        default:
            MTHROW_AND_ERR("Error MedBootstrap::init unsupported measure %d\n", (int)measure_type);
        }
        measurements_with_params.push_back(pair<MeasurementFunctions, Measurement_Params*>(func, pr_m));
    }
}

Reimplemented from SerializableObject.
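The excerpt's comment says the measurement value must be passed inside {}, because it contains ";" characters that would otherwise be read as parameter separators. The sketch below shows a brace-aware split of a "name=value;..." string under that assumption. `split_init_string` is hypothetical and is not the parser used by SerializableObject::init_from_string.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: split "name=value;name=value", treating ';' inside
// {...} as part of the value. Outer braces around a value are stripped.
std::map<std::string, std::string> split_init_string(const std::string &init) {
    std::map<std::string, std::string> params;
    std::string token;
    int depth = 0;
    auto flush = [&]() {
        if (token.empty()) return;
        size_t eq = token.find('=');
        if (eq == std::string::npos)
            throw std::invalid_argument("missing '=' in: " + token);
        std::string value = token.substr(eq + 1);
        if (value.size() >= 2 && value.front() == '{' && value.back() == '}')
            value = value.substr(1, value.size() - 2);
        params[token.substr(0, eq)] = value;
        token.clear();
    };
    for (char ch : init) {
        if (ch == '{') ++depth;
        if (ch == '}' && --depth < 0)
            throw std::invalid_argument("unbalanced braces");
        if (ch == ';' && depth == 0)
            flush(); // top-level separator
        else
            token += ch;
    }
    if (depth != 0)
        throw std::invalid_argument("unbalanced braces");
    flush();
    return params;
}
```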

◆ measurement_function_name_to_type()

MeasurmentFunctionType MedBootstrap::measurement_function_name_to_type ( const string &  measurement_function_name)

convert measurement function name to type

Returns
MeasurmentFunctionType
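The name-to-type conversion pairs naturally with the "measure_name|param_init_string;..." format handled in init(). The sketch below is illustrative only: `MeasureKind` is a local stand-in whose values mirror the names in measurement_function_name_map, `name_to_kind` and `parse_measurements` are hypothetical, and throwing on an unknown name is an assumption because the real function's behavior in that case is not documented here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Local stand-in for MeasurmentFunctionType (illustrative only).
enum class MeasureKind { calc_npos_nneg, calc_only_auc, calc_roc_measures_with_inc,
                         calc_multi_class, calc_kandel_tau, calc_harrell_c_statistic,
                         calc_regression };

MeasureKind name_to_kind(const std::string &name) {
    static const std::unordered_map<std::string, MeasureKind> table = {
        {"calc_npos_nneg", MeasureKind::calc_npos_nneg},
        {"calc_only_auc", MeasureKind::calc_only_auc},
        {"calc_roc_measures_with_inc", MeasureKind::calc_roc_measures_with_inc},
        {"calc_multi_class", MeasureKind::calc_multi_class},
        {"calc_kandel_tau", MeasureKind::calc_kandel_tau},
        {"calc_harrell_c_statistic", MeasureKind::calc_harrell_c_statistic},
        {"calc_regression", MeasureKind::calc_regression}};
    auto it = table.find(name);
    if (it == table.end())
        throw std::invalid_argument("unknown measurement function: " + name);
    return it->second;
}

// Splits "name|params;name|params" requiring exactly one '|' per entry,
// as the init() excerpt requires two '|'-separated parts.
std::vector<std::pair<MeasureKind, std::string>> parse_measurements(const std::string &spec) {
    std::vector<std::pair<MeasureKind, std::string>> out;
    size_t start = 0;
    while (start <= spec.size()) {
        size_t end = spec.find(';', start);
        if (end == std::string::npos) end = spec.size();
        std::string entry = spec.substr(start, end - start);
        size_t bar = entry.find('|');
        if (bar == std::string::npos || entry.find('|', bar + 1) != std::string::npos)
            throw std::invalid_argument("expecting 2 tokens by | delimiter: " + entry);
        out.emplace_back(name_to_kind(entry.substr(0, bar)), entry.substr(bar + 1));
        start = end + 1;
    }
    return out;
}
```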

◆ parse_cohort_file()

void MedBootstrap::parse_cohort_file ( const string &  cohorts_path)

Reads a cohorts file and stores it in filter_cohort.

Each line of the file takes one of two forms:

  1. COHORT_NAME[TAB]PARAMETERS_DEF - COHORT_NAME is a string naming the cohort.
    PARAMETERS_DEF has the format "PARAMETER_NAME:MIN_RANGE,MAX_RANGE;...", and the
    PARAMETER_NAME:MIN_RANGE,MAX_RANGE pattern may repeat, separated by ";". The cohort
    is the intersection ("and" condition) of all the parameter ranges.
    There is a single tab between the name and the definition.
    Example line:
    1 year back & age 40-80[TAB]Time-Window:0,365;Age:40,80
    creates a cohort called "1 year back & age 40-80" that keeps the records
    with (Time-Window>=0 and Time-Window<=365) and (Age>=40 and Age<=80).
  2. MULTI[TAB]PARAMETERS_DEF[TAB]...PARAMETERS_DEF[TAB] - a line starting with the
    MULTI keyword creates all cartesian-product combinations of the TAB-separated
    fields. Each field uses the option 1 syntax, but its ";"-separated entries are
    alternatives rather than an intersection.
    Example line:
    MULTI[TAB]Time-Window:0,30;Time-Window:30,180[TAB]Age:40,60;Age:60,80;Age:40,80[TAB]Gender:1,1;Gender:2,2
    creates 2*3*2=12 cohorts, one for each combination of the Time-Window, Age and Gender options.
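The MULTI expansion described in option 2 is a cartesian product over the TAB-separated fields. The sketch below generates the single-line definitions for every combination; `split` and `expand_multi` are hypothetical helpers, and how MedBootstrap names the generated cohorts is not described in this reference, so only the definitions are produced.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits s on a single-character delimiter, skipping empty fields.
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim))
        if (!item.empty()) parts.push_back(item);
    return parts;
}

// Sketch of MULTI expansion: each TAB-separated field lists ';'-separated
// alternatives; every combination becomes one "A;B;C" definition.
std::vector<std::string> expand_multi(const std::string &line) {
    std::vector<std::string> fields = split(line, '\t');
    std::vector<std::string> combos = {""};
    for (size_t f = 1; f < fields.size(); ++f) { // fields[0] is "MULTI"
        std::vector<std::string> next;
        for (const std::string &prefix : combos)
            for (const std::string &option : split(fields[f], ';'))
                next.push_back(prefix.empty() ? option : prefix + ";" + option);
        combos.swap(next);
    }
    return combos;
}
```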

◆ parse_cohort_line()

void MedBootstrap::parse_cohort_line ( const string &  line)

Parses a single cohort line.

See parse_cohort_file for the full specification.
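For a non-MULTI line (option 1 of parse_cohort_file), the first step is separating the cohort name from its definition at the single TAB. The sketch below does only that; `split_cohort_line` is a hypothetical helper, and rejecting MULTI lines here simply reflects that they need cartesian expansion instead.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Sketch: split "COHORT_NAME<TAB>PARAMETERS_DEF" into (name, definition).
std::pair<std::string, std::string> split_cohort_line(const std::string &line) {
    if (line.rfind("MULTI\t", 0) == 0)
        throw std::invalid_argument("MULTI lines are expanded, not split");
    size_t tab = line.find('\t');
    if (tab == std::string::npos || line.find('\t', tab + 1) != std::string::npos)
        throw std::invalid_argument("expected exactly one TAB in a cohort line");
    return {line.substr(0, tab), line.substr(tab + 1)};
}
```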

◆ prepare_bootstrap() [1/2]

void MedBootstrap::prepare_bootstrap ( const MedFeatures &  features,
vector< float > &  preds,
vector< float > &  y,
vector< int > &  pids,
map< string, vector< float > > &  final_additional_info,
vector< int > &  preds_order,
unordered_map< int, vector< int > > *  splits_inds = NULL 
)

Prepares the vectors required for bootstrap from MedFeatures &features.

Returns
Updates preds, y, pids and final_additional_info with the information from MedFeatures. If splits_inds is provided (not NULL), it is filled with a mapping from each split value to the indexes in the samples vector that correspond to that split.

◆ prepare_bootstrap() [2/2]

void MedBootstrap::prepare_bootstrap ( MedSamples &  samples,
map< string, vector< float > > &  additional_info,
vector< float > &  preds,
vector< float > &  y,
vector< int > &  pids,
vector< int > &  preds_order,
unordered_map< int, vector< int > > *  splits_inds = NULL 
)

Prepares the vectors required for bootstrap from samples and additional_info.

Returns
Updates preds, y and pids with the information from samples and additional_info. If splits_inds is provided (not NULL), it is filled with a mapping from each split value to the indexes in the samples vector that correspond to that split.
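The splits_inds output is a grouping of sample indexes by split value. The sketch below builds a map of the same type from a hypothetical per-sample vector of split values; `build_split_index` is an illustrative helper, not part of MedBootstrap.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Sketch: split value -> indexes of the samples that belong to that split,
// the same type as the splits_inds argument.
std::unordered_map<int, std::vector<int>> build_split_index(const std::vector<int> &sample_splits) {
    std::unordered_map<int, std::vector<int>> splits_inds;
    for (int i = 0; i < static_cast<int>(sample_splits.size()); ++i)
        splits_inds[sample_splits[i]].push_back(i); // indexes stay in sample order
    return splits_inds;
}
```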

Field Documentation

◆ measurement_function_name_map

unordered_map< string, MeasurmentFunctionType > MedBootstrap::measurement_function_name_map
static
Initial value:
= {
{ "calc_npos_nneg",MeasurmentFunctionType::calc_npos_nneg },
{ "calc_only_auc",MeasurmentFunctionType::calc_only_auc },
{ "calc_roc_measures_with_inc",MeasurmentFunctionType::calc_roc_measures_with_inc },
{ "calc_multi_class",MeasurmentFunctionType::calc_multi_class },
{ "calc_kandel_tau", MeasurmentFunctionType::calc_kandel_tau },
{ "calc_harrell_c_statistic", MeasurmentFunctionType::calc_harrell_c_statistic },
{ "calc_regression", MeasurmentFunctionType::calc_regression }
}

The documentation for this class was generated from the following files: