Medial Code Documentation
FeatureGenerator.h File Reference

FeatureGenerator : creating features from raw signals. More...

#include <InfraMed/InfraMed/InfraMed.h>
#include <Logger/Logger/Logger.h>
#include <MedProcessTools/MedProcessTools/RepProcess.h>
#include <MedProcessTools/MedProcessTools/MedFeatures.h>
#include <SerializableObject/SerializableObject/SerializableObject.h>
#include <MedProcessTools/MedProcessTools/MedModelExceptions.h>
#include <MedTime/MedTime/MedTime.h>
#include <MedAlgo/MedAlgo/MedAlgo.h>
#include <MedAlgo/MedAlgo/MedLM.h>
#include <cfloat>
#include <boost/regex.hpp>

Data Structures

class  FeatureGenerator
 
class  BasicFeatGenerator
 A basic statistics generator for calculating simple statistics over a time window. More...
 
class  AgeGenerator
 Age Generator. More...
 
class  SingletonGenerator
 Singleton. More...
 
class  GenderGenerator
 Gender. More...
 
struct  BinnedLmEstimatesParams
 BinnedLinearModels : parameters. More...
 
class  BinnedLmEstimates
 BinnedLinearModels : Apply a set of linear models to generate features. More...
 
class  RangeFeatGenerator
 RangeFeatGenerator : Generate features for a time-range signal with a value (for example, a drug). More...
 
class  ModelFeatGenerator
 Use a model to generate predictions to be used as features. More...
 
class  TimeFeatGenerator
 
class  AttrFeatGenerator
 Attribute Feature Generator: creating features from samples attributes. More...
 
class  CategoryDependencyGenerator
 Creates multiple features based on categorical values and statistical dependency strength by age and gender groups. More...
 

Macros

#define DEFAULT_FEAT_GNRTR_NTHREADS   8
 

Enumerations

enum  FeatureGeneratorTypes {
  FTR_GEN_NOT_SET , FTR_GEN_BASIC , FTR_GEN_AGE , FTR_GEN_SINGLETON ,
  FTR_GEN_GENDER , FTR_GEN_BINNED_LM , FTR_GEN_SMOKING , FTR_GEN_KP_SMOKING ,
  FTR_GEN_UNIFIED_SMOKING , FTR_GEN_RANGE , FTR_GEN_DRG_INTAKE , FTR_GEN_ALCOHOL ,
  FTR_GEN_MODEL , FTR_GEN_TIME , FTR_GEN_ATTR , FTR_GEN_CATEGORY_DEPEND ,
  FTR_GEN_EMBEDDING , FTR_GEN_EXTRACT_TBL , FTR_GEN_ELIXHAUSER , FTR_GEN_DIABETES_FINDER ,
  FTR_GEN_LAST
}
 
enum  BasicFeatureTypes {
  FTR_LAST_VALUE = 0 , FTR_FIRST_VALUE = 1 , FTR_LAST2_VALUE = 2 , FTR_AVG_VALUE = 3 ,
  FTR_MAX_VALUE = 4 , FTR_MIN_VALUE = 5 , FTR_STD_VALUE = 6 , FTR_LAST_DELTA_VALUE = 7 ,
  FTR_LAST_DAYS = 8 , FTR_LAST2_DAYS = 9 , FTR_SLOPE_VALUE = 10 , FTR_WIN_DELTA_VALUE = 11 ,
  FTR_CATEGORY_SET = 12 , FTR_CATEGORY_SET_COUNT = 13 , FTR_CATEGORY_SET_SUM = 14 , FTR_NSAMPLES = 15 ,
  FTR_EXISTS = 16 , FTR_CATEGORY_SET_FIRST = 17 , FTR_MAX_DIFF = 18 , FTR_FIRST_DAYS = 19 ,
  FTR_RANGE_WIDTH = 20 , FTR_CATEGORY_SET_FIRST_TIME = 21 , FTR_SUM_VALUE =22 , FTR_LAST_NTH_VALUE = 23 ,
  FTR_CATEGORY_SET_LAST_NTH = 24 , FTR_TIME_SINCE_LAST_CHANGE = 25 , FTR_LAST
}
 
enum  TimeRangeTypes { TIME_RANGE_CURRENT = 0 , TIME_RANGE_BEFORE = 1 , TIME_RANGE_LAST }
 
enum  BinnedLMSamplingStrategy { BINNED_LM_TAKE_ALL = 0 , BINNED_LM_STOP_AT_FIRST = 1 , BINNED_LM_STOP_AT_LAST = 2 , BINNED_LM_LAST }
 BinnedLinearModels : which time-points to take.
 
enum  RangeFeatureTypes {
  FTR_RANGE_CURRENT = 0 , FTR_RANGE_LATEST = 1 , FTR_RANGE_MAX = 2 , FTR_RANGE_MIN = 3 ,
  FTR_RANGE_EVER = 4 , FTR_RANGE_TIME_DIFF = 5 , FTR_RANGE_RECURRENCE_COUNT = 6 , FTR_RANGE_TIME_COVERED = 7 ,
  FTR_RANGE_LAST_NTH_TIME_LENGTH = 8 , FTR_RANGE_TIME_DIFF_START = 9 , FTR_RANGE_TIME_INSIDE = 10 , FTR_RANGE_LAST
}
 
enum  TimeFeatTypes {
  FTR_TIME_YEAR = 0 , FTR_TIME_MONTH = 1 , FTR_TIME_DAY_IN_MONTH = 2 , FTR_TIME_DAY_IN_WEEK = 3 ,
  FTR_TIME_HOUR = 4 , FTR_TIME_MINUTE = 5 , FTR_TIME_DATE = 6 , FTR_TIME_LAST
}
 Time Feature Generator: creating sample-time features (e.g. differentiate between times of day, season of year, days of the week, etc.). More...
 
enum class  category_stat_test { chi_square = 1 , mcnemar = 2 }
 

Functions

FeatureGeneratorTypes ftr_generator_name_to_type (const string &generator_name)
 
void get_window_in_sig_time (int _win_from, int _win_to, int _time_unit_win, int _time_unit_sig, int _win_time, int &_min_time, int &_max_time, bool boundOutcomeTime=false, int outcome_time=-1)
 Takes a [-_win_to, -_win_from] window in window time units and returns the corresponding [_min_time, _max_time] window in signal time units, relative to _win_time. boundOutcomeTime applies to windows that look into the future: when set, the time window is limited so that it ends at outcome_time.
 
TimeRangeTypes time_range_name_to_type (const string &name)
 Conversion between time-range type and name.
 
string time_range_type_to_name (TimeRangeTypes type)
 
void get_updated_time_window (UniversalSigVec &time_range_usv, TimeRangeTypes type, int time_unit_range_sig, int time_unit_win, int time_unit_sig, int time, int win_from, int &updated_win_from, int win_to, int &updated_win_to, bool delta_win, int d_win_from, int &updated_d_win_from, int d_win_to, int &updated_d_win_to)
 
void get_updated_time_window (TimeRangeTypes type, int range_from, int range_to, int time, int _win_from, int _win_to, int &updated_win_from, int &updated_win_to)
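
The window arithmetic described for get_window_in_sig_time can be illustrated with a minimal sketch. This is not the library implementation: the helper name window_in_signal_time and the TimeWindow struct are hypothetical, and the sketch assumes the window offsets and the signal times share one time unit, so the _time_unit_win to _time_unit_sig conversion is left out. Clipping the upper edge at the outcome time is an assumption about how boundOutcomeTime behaves.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: an absolute [min_time, max_time] window.
struct TimeWindow {
    int min_time;
    int max_time;
};

// Sketch only (not the real get_window_in_sig_time): assumes window offsets and
// signal times use the same unit, so no unit conversion is performed.
TimeWindow window_in_signal_time(int win_from, int win_to, int win_time,
                                 bool bound_outcome_time = false,
                                 int outcome_time = -1) {
    assert(win_from <= win_to);  // offsets count time before win_time
    // The window [-win_to, -win_from] relative to win_time, in absolute time.
    TimeWindow w{win_time - win_to, win_time - win_from};
    // Assumed behaviour: a window reaching into the future stops at outcome_time.
    if (bound_outcome_time && outcome_time >= 0) {
        w.max_time = std::min(w.max_time, outcome_time);
    }
    return w;
}
```

With win_from = 0, win_to = 365 and win_time = 1000, the sketch returns the window [635, 1000]. A negative win_from describes a window that extends past win_time, which is where the outcome bound becomes relevant.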
 

Detailed Description

FeatureGenerator : creating features from raw signals.

Enumeration Type Documentation

◆ BasicFeatureTypes

Enumerator
FTR_LAST_VALUE 

"last" - Last Value in Window

FTR_FIRST_VALUE 

"first" - First Value in Window

FTR_LAST2_VALUE 

"last2" - One before last value in Window

FTR_AVG_VALUE 

"avg" - Mean value in Window

FTR_MAX_VALUE 

"max" - Max value in Window

FTR_MIN_VALUE 

"min" - Min value in Window

FTR_STD_VALUE

"std" - Standard deviation of values in Window

FTR_LAST_DELTA_VALUE

"last_delta" - Last delta: last value minus previous-to-last value

FTR_LAST_DAYS

"last_time" - time difference from prediction time to the last time the signal has a value in the given range of values

FTR_LAST2_DAYS

"last2_time" - time difference from prediction time to the one-before-last time the signal has a value in the given range of values

FTR_SLOPE_VALUE 

"slope" - calculating the slope over the points in the window

FTR_WIN_DELTA_VALUE

"win_delta" - difference in value between two time windows (only if both exist, otherwise missing_value): value in [win_from, win_to] minus value in [d_win_from, d_win_to]

FTR_CATEGORY_SET

"category_set" - boolean 0/1: whether the signal has a value in the given lut (the lut is initialized from "sets", each of which can be a specific single definition or the name of a set definition; the lookup is hierarchical)

FTR_CATEGORY_SET_COUNT

"category_set_count" - counts the number of appearances of sets in the time window

FTR_CATEGORY_SET_SUM

"category_set_sum" - sums the values of appearances of sets in the time window

FTR_NSAMPLES

"nsamples" - counts the number of times the signal appears in the time window

FTR_EXISTS

"exists" - boolean 0/1: whether the signal appears in the time window

FTR_CATEGORY_SET_FIRST

"category_set_first" - boolean 0/1: whether the signal appears in the time window and never appeared before the window

FTR_MAX_DIFF

"max_diff" - maximum diff in window

FTR_FIRST_DAYS

"first_time" - time difference from prediction time to first time with signal

FTR_RANGE_WIDTH

"range_width" - maximal value minus minimal value in a given time window

FTR_CATEGORY_SET_FIRST_TIME 

"category_set_first_time" - first time of category set found in the time window

FTR_SUM_VALUE 

"sum" - sum of values in window

FTR_LAST_NTH_VALUE

"last_nth" : (also set the N_th parameter) gets the last N_th value in the window; 0 is last, 1 is last2, etc.

FTR_CATEGORY_SET_LAST_NTH

"category_set_last_nth" : (also set the N_th parameter) checks whether the last N_th value in the window is in the given set

FTR_TIME_SINCE_LAST_CHANGE

"time_since_last_change" : goes over a states signal and takes the time since the value last changed

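To make a few of the enumerators above concrete, here is a minimal sketch of how "first", "last", "avg", "max", "min" and "nsamples" could be computed over one time window. It does not reproduce BasicFeatGenerator: the Sample and WindowStats types, the function basic_window_stats and the sentinel kMissingValue are all hypothetical, and the input is assumed to be sorted by time in ascending order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical types for illustration; not part of FeatureGenerator.h.
struct Sample {
    int time;
    float value;
};

// Assumed sentinel for "no value in window"; the real missing value may differ.
constexpr float kMissingValue = -9999.0f;

struct WindowStats {
    float first, last, avg, max, min;
    int nsamples;
};

// Computes simple statistics over samples with min_time <= time <= max_time.
// Assumes `samples` is sorted by time, ascending.
WindowStats basic_window_stats(const std::vector<Sample>& samples,
                               int min_time, int max_time) {
    WindowStats s{kMissingValue, kMissingValue, kMissingValue,
                  kMissingValue, kMissingValue, 0};
    double sum = 0.0;
    for (const Sample& x : samples) {
        if (x.time < min_time || x.time > max_time) continue;
        if (s.nsamples == 0) {
            // First sample inside the window seeds first/max/min.
            s.first = s.max = s.min = x.value;
        }
        s.last = x.value;  // overwritten until the final in-window sample
        s.max = std::max(s.max, x.value);
        s.min = std::min(s.min, x.value);
        sum += x.value;
        ++s.nsamples;
    }
    if (s.nsamples > 0) s.avg = static_cast<float>(sum / s.nsamples);
    return s;
}
```

An empty window leaves every statistic at the sentinel and nsamples at 0, which mirrors the "otherwise missing_value" wording used for "win_delta".
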
◆ FeatureGeneratorTypes

Enumerator
FTR_GEN_BASIC

"basic" - creates basic statistics on time windows - BasicFeatGenerator

FTR_GEN_AGE 

"age" - creating age feature - AgeGenerator

FTR_GEN_SINGLETON

"singleton" - takes the value of a time-less signal - SingletonGenerator

FTR_GEN_GENDER

"gender" - creating gender feature - GenderGenerator (special case of singleton)

FTR_GEN_BINNED_LM

"binnedLm" or "binnedLM" - creating linear models for estimating a feature at time points - BinnedLmEstimates

FTR_GEN_SMOKING 

"smoking" - creating smoking feature - SmokingGenerator

FTR_GEN_KP_SMOKING 

"kp_smoking" - creating smoking feature - KpSmokingGenerator

FTR_GEN_UNIFIED_SMOKING 

"unified_smoking" - creating smoking feature - UnifiedSmokingGenerator

FTR_GEN_RANGE 

"range" - creating RangeFeatGenerator

FTR_GEN_DRG_INTAKE 

"drugIntake" - creating drugs feature coverage of prescription time - DrugIntakeGenerator

FTR_GEN_ALCOHOL 

"alcohol" - creating alcohol feature - AlcoholGenerator

FTR_GEN_MODEL 

"model" - creating ModelFeatGenerator

FTR_GEN_TIME 

"time" - creating sample-time features (e.g. differentiate between times of day, season of year, days of the week, etc.). Creates TimeFeatGenerator

FTR_GEN_ATTR 

"attr" - creating features from samples attributes. Creates AttrFeatGenerator

FTR_GEN_CATEGORY_DEPEND

"category_depend" - creates features from categorical signals that have statistical strength in the samples - CategoryDependencyGenerator

FTR_GEN_EMBEDDING 

"embedding" - allows applying a pre trained embedding model to incorporate features into matrix. Creates EmbeddingGenerator

FTR_GEN_EXTRACT_TBL 

"extract_tbl" - extract values from table with keys and rules to join with each patient. Creates FeatureGenExtractTable

FTR_GEN_ELIXHAUSER 

Calculate Current Elixhauser given latest DRG and Diagnosis information. Creates ElixhauserGenerator.

FTR_GEN_DIABETES_FINDER 

"diabetes_finder" - Diabetes Finder feature. Creates DiabetesFinderGenerator

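The quoted names above are the strings that select each generator type. A minimal sketch of such a name-to-type lookup follows. It is not the actual ftr_generator_name_to_type: the function name lookup_generator_type is hypothetical, the enum is repeated locally only to keep the example self-contained, and returning FTR_GEN_NOT_SET for an unknown name is an assumption (the real function may report an error instead). FTR_GEN_ELIXHAUSER is omitted because no name string is documented for it.

```cpp
#include <string>
#include <unordered_map>

// Local copy of the enum from this page, so the sketch compiles on its own.
enum FeatureGeneratorTypes {
    FTR_GEN_NOT_SET, FTR_GEN_BASIC, FTR_GEN_AGE, FTR_GEN_SINGLETON,
    FTR_GEN_GENDER, FTR_GEN_BINNED_LM, FTR_GEN_SMOKING, FTR_GEN_KP_SMOKING,
    FTR_GEN_UNIFIED_SMOKING, FTR_GEN_RANGE, FTR_GEN_DRG_INTAKE, FTR_GEN_ALCOHOL,
    FTR_GEN_MODEL, FTR_GEN_TIME, FTR_GEN_ATTR, FTR_GEN_CATEGORY_DEPEND,
    FTR_GEN_EMBEDDING, FTR_GEN_EXTRACT_TBL, FTR_GEN_ELIXHAUSER,
    FTR_GEN_DIABETES_FINDER, FTR_GEN_LAST
};

// Hypothetical lookup built from the names documented above.
// Assumption: unknown names map to FTR_GEN_NOT_SET.
FeatureGeneratorTypes lookup_generator_type(const std::string& name) {
    static const std::unordered_map<std::string, FeatureGeneratorTypes> table = {
        {"basic", FTR_GEN_BASIC},
        {"age", FTR_GEN_AGE},
        {"singleton", FTR_GEN_SINGLETON},
        {"gender", FTR_GEN_GENDER},
        {"binnedLm", FTR_GEN_BINNED_LM},
        {"binnedLM", FTR_GEN_BINNED_LM},  // both spellings are documented
        {"smoking", FTR_GEN_SMOKING},
        {"kp_smoking", FTR_GEN_KP_SMOKING},
        {"unified_smoking", FTR_GEN_UNIFIED_SMOKING},
        {"range", FTR_GEN_RANGE},
        {"drugIntake", FTR_GEN_DRG_INTAKE},
        {"alcohol", FTR_GEN_ALCOHOL},
        {"model", FTR_GEN_MODEL},
        {"time", FTR_GEN_TIME},
        {"attr", FTR_GEN_ATTR},
        {"category_depend", FTR_GEN_CATEGORY_DEPEND},
        {"embedding", FTR_GEN_EMBEDDING},
        {"extract_tbl", FTR_GEN_EXTRACT_TBL},
        {"diabetes_finder", FTR_GEN_DIABETES_FINDER},
    };
    auto it = table.find(name);
    return it == table.end() ? FTR_GEN_NOT_SET : it->second;
}
```

The lookup is case-sensitive, which is why "binnedLm" and "binnedLM" each need their own entry.
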
◆ RangeFeatureTypes

Enumerator
FTR_RANGE_CURRENT

"current" - finds the value of the time range signal that intersects win_from: the signal start_time is before this time point and the signal end_time is after it

FTR_RANGE_LATEST

"latest" - finds the last value of the time range signal among the ranges that intersect the defined time window

FTR_RANGE_MAX

"max" - finds the maximal value of the time range signal among the ranges that intersect the defined time window

FTR_RANGE_MIN

"min" - finds the minimal value of the time range signal among the ranges that intersect the defined time window

FTR_RANGE_EVER

"ever" - boolean 0/1: whether there is an intersection between the signal time window and the defined time window with a specific lut value. Uses set.

FTR_RANGE_TIME_DIFF

"time_diff" - returns the time difference for an intersection between the signal time window and the defined time window with a specific lut value. Uses set. If check_first is true, the first intersection is used. If check_first is false, the last intersection is used and the result is the prediction time minus the end time of the last intersecting signal time window. If the last intersecting time range has no match to the sets value and check_first is false, returns the -win_to value; otherwise returns missing value.

FTR_RANGE_RECURRENCE_COUNT

"recurrence_count" - counts the number of times an event occurs shortly after a previous event, where the time signal range intersects the defined time window. The previous event does not need to intersect the time window.

FTR_RANGE_TIME_COVERED

"time_covered" : given a time window, sums up all the time in ranges that intersect the time window

FTR_RANGE_LAST_NTH_TIME_LENGTH 

"last_nth_time_len" : gives the length (in win_time_unit) of the last_n range in the window. If in middle of range, cuts to current time

FTR_RANGE_TIME_INSIDE

"time_inside" : checks whether the prediction time point is currently INSIDE a range; if not, returns 0; if it is, returns how long since the start of that range.

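Two of the range features above, "time_covered" and "time_inside", reduce to interval arithmetic and can be sketched briefly. This is an illustration under stated assumptions, not RangeFeatGenerator itself: the TimeRange struct and both function names are hypothetical, the ranges are assumed not to overlap each other, all times share one unit, and "time_covered" is assumed to count only the part of each range that falls inside the window.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical representation of one time-range signal entry.
struct TimeRange {
    int start;
    int end;
};

// Sketch of "time_covered": total overlap between the ranges and the window
// [min_time, max_time]. Assumes ranges do not overlap each other.
int time_covered(const std::vector<TimeRange>& ranges, int min_time, int max_time) {
    int total = 0;
    for (const TimeRange& r : ranges) {
        const int lo = std::max(r.start, min_time);
        const int hi = std::min(r.end, max_time);
        if (hi > lo) total += hi - lo;  // only the part inside the window counts
    }
    return total;
}

// Sketch of "time_inside": 0 when pred_time is outside every range,
// otherwise the time elapsed since the start of the containing range.
int time_inside(const std::vector<TimeRange>& ranges, int pred_time) {
    for (const TimeRange& r : ranges) {
        if (r.start <= pred_time && pred_time <= r.end) {
            return pred_time - r.start;
        }
    }
    return 0;
}
```

Note that in this sketch a prediction time exactly at the start of a range also yields 0, so the value alone does not distinguish "outside" from "just started".
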
◆ TimeFeatTypes

Time Feature Generator: creating sample-time features (e.g. differentiate between times of day, season of year, days of the week, etc.)

Enumerator
FTR_TIME_YEAR 

Year (as is)

FTR_TIME_MONTH 

Month of year (0-11)

FTR_TIME_DAY_IN_MONTH 

Day of the month (0-30)

FTR_TIME_DAY_IN_WEEK 

Day of the week (0-6)

FTR_TIME_HOUR 

Hour of the day (0-23)

FTR_TIME_MINUTE 

Minute of the hour (0-59)

FTR_TIME_DATE 

Complete date (as is)
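
The zero-based ranges listed above line up closely with the fields of the standard std::tm struct, which the following sketch uses to show what each enumerator could return. It is only an illustration: the function time_feature is hypothetical, the enum is repeated locally to keep the example self-contained, and encoding the complete date as YYYYMMDD is an assumption about what "as is" means.

```cpp
#include <ctime>

// Local copy of the enum from this page, so the sketch compiles on its own.
enum TimeFeatTypes {
    FTR_TIME_YEAR = 0, FTR_TIME_MONTH = 1, FTR_TIME_DAY_IN_MONTH = 2,
    FTR_TIME_DAY_IN_WEEK = 3, FTR_TIME_HOUR = 4, FTR_TIME_MINUTE = 5,
    FTR_TIME_DATE = 6, FTR_TIME_LAST
};

// Hypothetical helper: derives one time feature from a broken-down time.
int time_feature(const std::tm& t, TimeFeatTypes type) {
    switch (type) {
    case FTR_TIME_YEAR:         return t.tm_year + 1900;  // tm_year counts from 1900
    case FTR_TIME_MONTH:        return t.tm_mon;          // already 0-11
    case FTR_TIME_DAY_IN_MONTH: return t.tm_mday - 1;     // tm_mday is 1-31, shift to 0-30
    case FTR_TIME_DAY_IN_WEEK:  return t.tm_wday;         // 0-6
    case FTR_TIME_HOUR:         return t.tm_hour;         // 0-23
    case FTR_TIME_MINUTE:       return t.tm_min;          // 0-59
    case FTR_TIME_DATE:
        // Assumed YYYYMMDD encoding of the complete date.
        return (t.tm_year + 1900) * 10000 + (t.tm_mon + 1) * 100 + t.tm_mday;
    default:
        return -1;  // FTR_TIME_LAST or an unknown value
    }
}
```

Only the day of the month needs an adjustment here, since std::tm stores it as 1-31 while the documented range is 0-30.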

◆ TimeRangeTypes

Enumerator
TIME_RANGE_CURRENT 

"current" - consider only the current time-range

TIME_RANGE_BEFORE 

"before" - consider anything before the current time-range