How to Write a Feature Generator
A Feature Generator is a processing unit that takes raw input signals directly from a data repository or EMR. Its process has two main stages:
- It runs all relevant rep processors to pre-process the input signals, which prepares the data before new features are generated. The infrastructure triggers this step.
- It calls the generate function, which receives this pre-processed, patient-specific data and produces the final output.
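The two stages can be pictured with a small self-contained sketch. Everything in it (PatientData, RepProcessor, drop_negative, generate_last_value, run_pipeline) is a hypothetical stand-in invented for illustration, not a MedModel type or function. It only shows the ordering: rep processors run first, generation second.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for one patient's signal data (not a MedModel type).
struct PatientData {
    std::vector<double> values;
};

// Hypothetical stand-in for a rep processor: cleans the data in place.
using RepProcessor = std::function<void(PatientData&)>;

// Example rep processor: drops negative values, treated here as invalid readings.
inline void drop_negative(PatientData& p) {
    std::vector<double> kept;
    for (double v : p.values)
        if (v >= 0) kept.push_back(v);
    p.values = kept;
}

// Stage 2: the generator only ever sees pre-processed data.
// -1.0 marks "no value available" in this sketch.
inline double generate_last_value(const PatientData& p) {
    return p.values.empty() ? -1.0 : p.values.back();
}

// What the infrastructure does per patient: stage 1, then stage 2.
// The patient data is taken by value so the caller's copy stays untouched.
inline double run_pipeline(PatientData p, const std::vector<RepProcessor>& processors) {
    for (const auto& rp : processors) rp(p);  // stage 1: pre-process
    return generate_last_value(p);            // stage 2: generate
}
```

In this sketch the generator never calls the processors itself. That mirrors the description above, where pre-processing is the infrastructure's job and generate only consumes the result.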
Feature Generators in MedModel follow a specific sequence of method calls. Here’s the typical lifecycle:
- Constructor - Initializes the Feature Generator object.
- init_defaults() - Sets default values for the generator. Update generator_type here so that it holds the generator type.
- Initialization
  - During learning: init(map<string, string>& mapper) parses parameters from a key-value map (using SerializableObject::init_from_string). Make sure to set req_signals to the input signals the feature generator requires, and set the tags variable.
  - During application: Arguments are loaded from disk. Parameters stored via ADD_SERIALIZATION_FUNCS are restored automatically.
- fit_for_repository(MedPidRepository) - Adapts the generator to the repository, e.g., modifies logic if certain signals are missing.
- Signal Requirements and Setup
  - get_required_signal_ids() - Returns the list of required signal IDs for learning or applying the generator.
  - set_required_signal_ids(MedDictionarySections) - Stores required signal IDs using dictionary sections.
  - set_signal_ids(MedSignals) - Stores required signal IDs using signal objects.
  - init_tables(MedDictionarySections) - Initializes tables and stores needed signal IDs using dictionary sections.
  - set_names - Stores the output names of the feature generator. Override this method.
- Feature Filtering
  - filter_features() - Determines if this generator is needed (e.g., after feature selection). Returns true if the generator should be kept. By default it uses the names variable set by set_names and keeps the generator if at least one of its output names is needed in the pipeline.
- Signal Names
  - get_required_signal_names() - Returns all signal names needed to run this generator.
- Learning Phase
  - learn() - Performs learning logic (called only during training).
- Preparation
  - prepare() - Prepares features and attributes, and allocates space.
- Output Initialization
  - get_p_data() - Initializes the address for the generator’s output (useful for parallelism).
- Feature Generation
  - generate() - Generates the feature for each sample. By this point the infrastructure has already executed all relevant rep processors for the input signals the feature generator uses.
- Summary
  - make_summary() - Summarizes results after generation (e.g., collects statistics across all data).
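As a rough illustration of the lifecycle, the sketch below drives a mock object through the main steps in the order the list presents them and records each call. MockGenerator, run_lifecycle, restore() and the feature name are hypothetical and simplified. The real MedModel methods take arguments and are invoked by the infrastructure, and the exact call order there may differ. The filter_features body is only one plausible reading of the default rule (keep the generator if any of its output names is still needed).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical mock, NOT the MedModel FeatureGenerator class: it only
// records which lifecycle steps were invoked, and in what order.
struct MockGenerator {
    std::vector<std::string> calls;  // log of lifecycle steps
    std::vector<std::string> names;  // output feature names, filled by set_names()

    void init_defaults() { calls.push_back("init_defaults"); }
    void init()          { calls.push_back("init"); }     // learning: parse parameters
    void restore()       { calls.push_back("restore"); }  // application: stand-in for loading saved parameters
    void set_names() {
        calls.push_back("set_names");
        names = {"FTR_mock.last_value"};  // hypothetical feature name
    }
    // Assumed default rule: keep the generator if at least one of its
    // output names is still wanted by the pipeline.
    bool filter_features(const std::set<std::string>& wanted) {
        calls.push_back("filter_features");
        for (const auto& n : names)
            if (wanted.count(n)) return true;
        return false;
    }
    void learn()        { calls.push_back("learn"); }
    void prepare()      { calls.push_back("prepare"); }
    void generate()     { calls.push_back("generate"); }
    void make_summary() { calls.push_back("make_summary"); }
};

// Drives the mock through the lifecycle. `training` selects between the
// learning path (init + learn) and the application path (restore, no learn).
// Returns the recorded call order.
inline std::vector<std::string> run_lifecycle(bool training,
                                              const std::set<std::string>& wanted) {
    MockGenerator g;
    g.init_defaults();
    if (training) g.init(); else g.restore();
    g.set_names();
    if (!g.filter_features(wanted)) return g.calls;  // generator dropped, nothing else runs
    if (training) g.learn();
    g.prepare();
    g.generate();
    g.make_summary();
    return g.calls;
}
```

Calling run_lifecycle(true, ...) records init and learn, while run_lifecycle(false, ...) records restore instead and skips learn. If none of the generator's names are wanted, the log ends at filter_features.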
Steps to Implement a Feature Generator
- Create Class Files
  - Make a new .h header and .cpp source file for your feature generator class. Include "FeatureGenerator.h" in your header.
- Set Default Values
  - Implement init_defaults() or set defaults in the constructor.
- Parameter Initialization
  - Override init(map<string, string>& mapper) to parse external parameters.
- Serialization
  - Add MEDSERIALIZE_SUPPORT($CLASS_NAME) at the end of your header file (replace $CLASS_NAME with your class name).
  - Add ADD_CLASS_NAME($CLASS_NAME) in the public section of your class.
  - Use ADD_SERIALIZATION_FUNCS to specify which parameters should be saved after learning. Exclude temporary or repository-specific variables.
- Signal and Table Setup - Implement or override (if needed):
  - set_names - Updates the feature generator's output feature names.
  - get_required_signal_ids() and get_required_signal_names() - only if needed. The default is to use req_signals.
  - set_required_signal_ids(MedDictionarySections) - only if needed. The default is to use req_signals.
  - set_signal_ids(MedSignals) - only if needed to do more setup.
  - init_tables(MedDictionarySections)
  - get_required_signal_categories - if the feature generator uses categorical signals, this needs to list all "required" categorical values the feature generator uses.
- Feature Filtering
  - Override filter_features() (if needed) when your generator should be skipped under certain conditions (e.g., after feature selection). The default is to use names to identify if the feature generator is needed.
- Learning and Preparation
  - Implement learn() for training logic (if needed).
  - Implement prepare() to allocate resources and set up attributes.
- Feature Generation
  - Implement generate() to produce the feature for each sample.
  - Implement get_p_data() if your generator supports parallel output.
- Summary
  - Implement make_summary() to collect and report statistics after feature generation.
- Register Your Feature Generator in the header file FeatureGenerator.h
  - Register a new type in FeatureGeneratorTypes before FTR_GEN_LAST.
  - In the documentation comment, specify the name in FeatureGeneratorTypes for Doxygen reference.
- Register Your Feature Generator in the cpp file FeatureGenerator.cpp
  - Add your type conversion to ftr_generator_name_to_type.
  - Add your class to FeatureGenerator::new_polymorphic.
  - Add your class to FeatureGenerator::make_processor(FeatureGeneratorTypes generator_type).
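The registration steps follow a common enum-plus-factory pattern, which the miniature below sketches in isolation. The enum values, string names, classes and the functions name_to_type and make_generator are all hypothetical and do not mirror the contents of FeatureGenerator.h or FeatureGenerator.cpp. They only illustrate the roles that FeatureGeneratorTypes, FTR_GEN_LAST, ftr_generator_name_to_type and the factory play. The new_polymorphic step is not covered here.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical miniature of the registration pattern.
enum GeneratorType {
    GEN_BASIC = 0,  ///< "basic" - creates BasicGen
    GEN_MY_NEW,     ///< "my_new" - creates MyNewGen (the newly registered type)
    GEN_LAST        // sentinel: new entries are added above this line
};

struct Generator {
    virtual ~Generator() = default;
    virtual std::string my_class_name() const = 0;
};
struct BasicGen : Generator {
    std::string my_class_name() const override { return "BasicGen"; }
};
struct MyNewGen : Generator {
    std::string my_class_name() const override { return "MyNewGen"; }
};

// Name-to-type conversion: maps the string used in configuration to the enum.
// Unknown names map to the sentinel in this sketch.
inline GeneratorType name_to_type(const std::string& name) {
    if (name == "basic")  return GEN_BASIC;
    if (name == "my_new") return GEN_MY_NEW;
    return GEN_LAST;
}

// Factory keyed by type: each registered type constructs its own class.
inline std::unique_ptr<Generator> make_generator(GeneratorType t) {
    switch (t) {
        case GEN_BASIC:  return std::unique_ptr<Generator>(new BasicGen());
        case GEN_MY_NEW: return std::unique_ptr<Generator>(new MyNewGen());
        default: throw std::invalid_argument("unknown generator type");
    }
}
```

Keeping the sentinel last means existing enum values do not shift when a new type is added. The trailing `///<` comments are the kind of per-value documentation Doxygen picks up.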
Tip:
Follow the structure and naming conventions used in existing feature generators for consistency and easier maintenance.