How to Write a Feature Generator
A Feature Generator is a processing unit that takes raw input signals directly from a data repository or EMR. Its process has two main stages:
- It runs all relevant rep processors to pre-process the input signals, preparing the data before it is used to generate new features. The infrastructure performs this step; the generator does not call it itself.
- It calls the generate function, which receives this pre-processed, patient-specific data and produces the final output.
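The two stages can be pictured with a small, self-contained sketch. Everything in it is a hypothetical stand-in rather than MedModel code: PatientData, RepProcessorFn, make_negative_value_cleaner, generate_last_value, run_for_patient, the "Glucose" signal name, and the -9999 missing-value marker are all assumptions made for illustration. It only shows the ordering: pre-processing runs first, and the generate step sees the cleaned data.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one patient's data: signal name -> recorded values.
using PatientData = std::map<std::string, std::vector<float>>;

// Hypothetical stand-in for a rep processor: modifies a patient's data in place.
using RepProcessorFn = std::function<void(PatientData&)>;

// Example rep processor (an assumption): removes negative readings from one signal.
RepProcessorFn make_negative_value_cleaner(const std::string& signal) {
    return [signal](PatientData& data) {
        auto it = data.find(signal);
        if (it == data.end()) return;  // signal absent: nothing to clean
        std::vector<float> kept;
        for (float v : it->second)
            if (v >= 0.0f) kept.push_back(v);
        it->second = kept;
    };
}

// Stage 2: the generate step sees only pre-processed data.
// Returns the last recorded value, or `missing` when the signal has no values.
float generate_last_value(const PatientData& data, const std::string& signal,
                          float missing = -9999.0f) {
    auto it = data.find(signal);
    if (it == data.end() || it->second.empty()) return missing;
    return it->second.back();
}

// What the infrastructure does for one patient: stage 1, then stage 2.
// `data` is taken by value so the caller's copy stays untouched.
float run_for_patient(PatientData data,
                      const std::vector<RepProcessorFn>& processors,
                      const std::string& signal) {
    for (const auto& process : processors) process(data);  // stage 1: pre-process
    return generate_last_value(data, signal);               // stage 2: generate
}
```

Because the cleaning happens before generation, the generator itself does not need to contain any outlier handling in this sketch.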
Feature Generators in MedModel follow a specific sequence of method calls. Here’s the typical lifecycle:
- Constructor - Initializes the Feature Generator object.
- init_defaults() - Sets default values for the generator. Set generator_type here so that it holds the generator's type.
- Initialization
  - During learning: init(map<string, string>& mapper) parses parameters from a key-value map (using SerializableObject::init_from_string). Make sure it also fills req_signals with the input signals the feature generator requires, and sets the tags variable.
  - During application: arguments are loaded from disk. Parameters stored via ADD_SERIALIZATION_FUNCS are restored automatically.
- fit_for_repository(MedPidRepository) - Adapts the generator to the repository, e.g., modifies its logic if certain signals are missing.
- Signal Requirements and Setup
  - get_required_signal_ids() - Returns the list of required signal IDs for learning or applying the generator.
  - set_required_signal_ids(MedDictionarySections) - Stores required signal IDs using dictionary sections.
  - set_signal_ids(MedSignals) - Stores required signal IDs using signal objects.
  - init_tables(MedDictionarySections) - Initializes tables and stores needed signal IDs using dictionary sections.
  - set_names - Stores the output names of the feature generator. Override this in your generator.
- Feature Filtering
  - filter_features() - Determines whether this generator is still needed (e.g., after feature selection). Returns true if the generator should be kept. By default, it checks the names variable set by set_names and keeps the generator if at least one of its output names is needed in the pipeline.
- Signal Names
  - get_required_signal_names() - Returns all signal names needed to run this generator.
- Learning Phase
  - learn() - Performs learning logic (called only during training).
- Preparation
  - prepare() - Prepares features and attributes, and allocates space.
- Output Initialization
  - get_p_data() - Initializes the address for the generator’s output (useful for parallelism).
- Feature Generation
  - generate() - Generates the feature for each sample. By this point, the infrastructure has already executed all relevant rep processors for the input signals the feature generator uses.
- Summary
  - make_summary() - Summarizes results after generation (e.g., collects statistics across all data).
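As a rough illustration of the lifecycle above, the following self-contained sketch records the order of calls during learning and shows what parameter parsing and the default filtering rule might look like. It does not use the real base class: LastValueSketch, run_learning_sequence, the win_days parameter, the trace member, and the output-name format are hypothetical, and the method signatures are simplified assumptions rather than MedModel's actual ones.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified generator. Member names mirror the text
// (req_signals, names, init_defaults, ...); signatures are assumptions.
struct LastValueSketch {
    std::string signal;                    // parameter: which signal to read
    int win_days;                          // parameter: look-back window in days
    std::vector<std::string> req_signals;  // required input signals
    std::vector<std::string> names;        // output feature names
    std::vector<std::string> trace;        // records the call order for this demo

    LastValueSketch() { trace.push_back("constructor"); init_defaults(); }

    void init_defaults() {
        trace.push_back("init_defaults");
        signal = "Glucose";
        win_days = 365;
    }

    // Parses key-value parameters; returns -1 on an unknown key or a bad number.
    int init(std::map<std::string, std::string>& mapper) {
        trace.push_back("init");
        for (const auto& kv : mapper) {
            if (kv.first == "signal") {
                signal = kv.second;
            } else if (kv.first == "win_days") {
                try { win_days = std::stoi(kv.second); }
                catch (const std::exception&) { return -1; }
            } else {
                return -1;
            }
        }
        req_signals.assign(1, signal);  // declare the required input signal
        return 0;
    }

    void set_names() {
        trace.push_back("set_names");
        names.assign(1, signal + ".last.win_" + std::to_string(win_days));
    }

    // Default-style filtering: keep the generator if any output name is needed.
    bool filter_features(const std::set<std::string>& needed) {
        trace.push_back("filter_features");
        for (const auto& n : names)
            if (needed.count(n)) return true;
        return false;
    }

    void learn()        { trace.push_back("learn"); }
    void prepare()      { trace.push_back("prepare"); }
    void generate()     { trace.push_back("generate"); }
    void make_summary() { trace.push_back("make_summary"); }
};

// Drives the learning-time sequence in the order listed above; returns the trace.
std::vector<std::string> run_learning_sequence(std::map<std::string, std::string> params) {
    LastValueSketch g;
    if (g.init(params) != 0) return g.trace;
    g.set_names();
    std::set<std::string> needed(g.names.begin(), g.names.end());
    if (!g.filter_features(needed)) return g.trace;
    g.learn();
    g.prepare();
    g.generate();
    g.make_summary();
    return g.trace;
}
```

Calling run_learning_sequence with a map such as signal=Hemoglobin, win_days=180 returns the call names in lifecycle order, which can be handy when checking that a new generator's hooks fire in the order you expect.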
Steps to Implement a Feature Generator
- Create Class Files
  - Make a new .h header and .cpp source file for your feature generator class. Include "FeatureGenerator.h" in your header.
- Set Default Values
  - Implement init_defaults() or set defaults in the constructor.
- Parameter Initialization
  - Override init(map<string, string>& mapper) to parse external parameters.
- Serialization
  - Add MEDSERIALIZE_SUPPORT($CLASS_NAME) at the end of your header file (replace $CLASS_NAME with your class name).
  - Add ADD_CLASS_NAME($CLASS_NAME) in the public section of your class.
  - Use ADD_SERIALIZATION_FUNCS to specify which parameters should be saved after learning. Exclude temporary or repository-specific variables.
- Signal and Table Setup - Implement or override (if needed):
  - set_names - Sets the feature generator's output feature names.
  - get_required_signal_ids() and get_required_signal_names() - Only if needed. The default is to use req_signals.
  - set_required_signal_ids(MedDictionarySections) - Only if needed. The default is to use req_signals.
  - set_signal_ids(MedSignals) - Only if more setup is needed.
  - init_tables(MedDictionarySections)
  - get_required_signal_categories - If the feature generator uses categorical signals, this must list all the "required" categorical values the generator uses.
- Feature Filtering
  - Override filter_features() (if needed) if your generator should be skipped under certain conditions (e.g., after feature selection). The default is to use names to decide whether the feature generator is needed.
- Learning and Preparation
  - Implement learn() for training logic (if needed).
  - Implement prepare() to allocate resources and set up attributes.
- Feature Generation
  - Implement generate() to produce the feature for each sample.
  - Implement get_p_data() if your generator supports parallel output.
- Summary
  - Implement make_summary() to collect and report statistics after feature generation.
- Register Your Feature Generator in the header file, FeatureGenerator.h
  - Register a new type in FeatureGeneratorTypes before FTR_GEN_LAST.
  - In the documentation comment, specify the name in FeatureGeneratorTypes for Doxygen reference.
- Register Your Feature Generator in the cpp file, FeatureGenerator.cpp
  - Add your type conversion to ftr_generator_name_to_type.
  - Add your class to FeatureGenerator::new_polymorphic.
  - Add your class to FeatureGenerator::make_processor(FeatureGeneratorTypes generator_type).
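The two registration steps follow a common pattern: an enum with a trailing sentinel, a name-to-type lookup, and a factory. The miniature below is a sketch of that pattern only. SketchGeneratorTypes, SKETCH_GEN_*, SketchGenerator, BasicSketch, MyNewSketch, sketch_name_to_type, and make_sketch_generator are hypothetical names chosen to avoid implying anything about the real declarations in FeatureGenerator.h and FeatureGenerator.cpp, which will differ in detail.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical miniature of the type enum. A new type is added before the
// sentinel, and its comment carries the external name.
enum SketchGeneratorTypes {
    SKETCH_GEN_BASIC = 0,  // "basic"
    SKETCH_GEN_MY_NEW,     // "my_new" : the newly registered type
    SKETCH_GEN_LAST        // sentinel: must stay last
};

// Hypothetical minimal base class; each generator records its own type.
struct SketchGenerator {
    SketchGeneratorTypes generator_type;
    explicit SketchGenerator(SketchGeneratorTypes t) : generator_type(t) {}
    virtual ~SketchGenerator() {}
    virtual std::string class_name() const = 0;
};

struct BasicSketch : SketchGenerator {
    BasicSketch() : SketchGenerator(SKETCH_GEN_BASIC) {}
    std::string class_name() const override { return "BasicSketch"; }
};

struct MyNewSketch : SketchGenerator {
    MyNewSketch() : SketchGenerator(SKETCH_GEN_MY_NEW) {}
    std::string class_name() const override { return "MyNewSketch"; }
};

// Name-to-type conversion: one line per registered generator.
// Unknown names map to the sentinel.
SketchGeneratorTypes sketch_name_to_type(const std::string& name) {
    if (name == "basic") return SKETCH_GEN_BASIC;
    if (name == "my_new") return SKETCH_GEN_MY_NEW;
    return SKETCH_GEN_LAST;
}

// Factory: one case per registered generator. Returns an empty pointer
// for the sentinel or any unregistered type.
std::unique_ptr<SketchGenerator> make_sketch_generator(SketchGeneratorTypes type) {
    switch (type) {
        case SKETCH_GEN_BASIC:
            return std::unique_ptr<SketchGenerator>(new BasicSketch());
        case SKETCH_GEN_MY_NEW:
            return std::unique_ptr<SketchGenerator>(new MyNewSketch());
        default:
            return std::unique_ptr<SketchGenerator>();
    }
}
```

Forgetting any one of the three places (enum, name lookup, factory) shows up in this sketch as an empty pointer or a sentinel return, which is a cheap thing to assert on in a unit test for a newly added generator.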
Tip:
Follow the structure and naming conventions used in existing feature generators for consistency and easier maintenance.