How to Write a Feature Processor
Feature Processors are components that operate on the feature matrix produced by the Feature Generator. They take a matrix of features as input, process it (e.g., normalization, feature selection, PCA), and output a transformed feature matrix.
Feature Processors in MedModel follow a defined sequence of method calls. Here's the typical lifecycle:
- Constructor: Initializes the Feature Processor object.
- `init_defaults()`: Sets default values for the processor. Be sure to update `processor_type` to reflect the processor type.
- Initialization:
  - During learning: Implement `init(map<string, string>& mapper)` to parse parameters from a key-value map (using `SerializableObject::init_from_string`). If your processor affects a single feature, you may want to use `feature_name` to specify the output feature.
  - During application: Arguments are loaded from disk. Parameters stored via `ADD_SERIALIZATION_FUNCS` are restored automatically.
- Repository Setup: Calls `set_feature_name` to configure the processor using repository information.
- Feature Filtering: Methods like `update_req_features_vec`, `are_features_affected`, and `filter` determine whether this Feature Processor is needed for prediction. If the processor does not affect any required features, it is skipped. By default, `filter` uses `feature_name` to check whether the processor is necessary.
- `select_learn_matrix`: Usually not required. In special cases, you may want to create a copy of the original feature matrix and store it under a different name for use by other processors in the pipeline.
- Learning Phase: `learn()` implements any learning logic needed during training.
- Feature Processing: `apply()` applies the processor logic to the feature matrix.
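The order of calls above can be sketched with a self-contained stand-in. This is only an illustration under stated assumptions: it does not include the MedModel headers, and `ToyMatrix`, `ToyNormalizer`, and `run_toy_lifecycle` are hypothetical names. The real `FeatureProcessor` base class, its matrix type, and its method signatures may differ; the sketch only mirrors the sequence of defaults, parameter parsing, filtering, learning, and applying.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a feature matrix: feature name -> one value per sample.
using ToyMatrix = std::map<std::string, std::vector<double>>;

// Illustrative z-score normalizer that follows the lifecycle order.
// It is NOT derived from the real MedModel FeatureProcessor class.
class ToyNormalizer {
public:
    std::string feature_name;  // the single feature this processor affects
    double mean = 0.0;         // learned state (the kind of value you would serialize)
    double stddev = 1.0;       // learned state (the kind of value you would serialize)

    ToyNormalizer() { init_defaults(); }

    // Set defaults before any external parameters are parsed.
    void init_defaults() {
        feature_name.clear();
        mean = 0.0;
        stddev = 1.0;
    }

    // Parse parameters from a key-value map; returns -1 on an unknown key.
    int init(std::map<std::string, std::string>& mapper) {
        for (const auto& kv : mapper) {
            if (kv.first == "feature_name") feature_name = kv.second;
            else return -1;
        }
        return 0;
    }

    // Report whether this processor touches any of the required features.
    bool filter(const std::unordered_set<std::string>& required) const {
        return required.count(feature_name) > 0;
    }

    // Learn mean and (population) standard deviation from the training matrix.
    int learn(const ToyMatrix& features) {
        auto it = features.find(feature_name);
        if (it == features.end() || it->second.empty()) return -1;
        const std::vector<double>& v = it->second;
        double sum = 0.0;
        for (double x : v) sum += x;
        mean = sum / static_cast<double>(v.size());
        double sq = 0.0;
        for (double x : v) sq += (x - mean) * (x - mean);
        stddev = std::sqrt(sq / static_cast<double>(v.size()));
        if (stddev == 0.0) stddev = 1.0;  // avoid division by zero for constant features
        return 0;
    }

    // Transform the matrix in place using the learned statistics.
    int apply(ToyMatrix& features) const {
        auto it = features.find(feature_name);
        if (it == features.end()) return -1;
        for (double& x : it->second) x = (x - mean) / stddev;
        return 0;
    }
};

// Runs the whole sequence once and returns the transformed matrix.
ToyMatrix run_toy_lifecycle() {
    ToyMatrix m = {{"age", {20.0, 40.0, 60.0}}, {"bmi", {21.0, 25.0, 30.0}}};
    ToyNormalizer p;  // constructor calls init_defaults()
    std::map<std::string, std::string> params = {{"feature_name", "age"}};
    int rc = p.init(params);
    assert(rc == 0);
    (void)rc;
    std::unordered_set<std::string> required = {"age"};
    if (p.filter(required)) {  // the processor is skipped if "age" is not required
        p.learn(m);
        p.apply(m);
    }
    return m;
}
```

Calling `run_toy_lifecycle()` normalizes only the `age` column and leaves `bmi` untouched, which is the behavior the filtering step is meant to guarantee for features the processor does not affect.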
Steps to Implement a Feature Processor
- Create Class Files: Create a new `.h` header and `.cpp` source file for your feature processor class. Include `"FeatureProcess.h"` in your header.
- Set Default Values: Implement `init_defaults()` or set defaults in the constructor.
- Parameter Initialization: Override `init(map<string, string>& mapper)` to parse external parameters.
- Serialization:
  - Add `MEDSERIALIZE_SUPPORT($CLASS_NAME)` at the end of your header file (replace `$CLASS_NAME` with your class name).
  - Add `ADD_CLASS_NAME($CLASS_NAME)` in the public section of your class.
  - Use `ADD_SERIALIZATION_FUNCS` to specify which parameters should be saved after learning. Do not include temporary or repository-specific variables.
- Custom Setup (if needed): Implement or override:
  - `filter` (update the logic if your processor affects a specific set of features)
- Learning: Implement `learn()` for any required training logic.
- Apply: Implement `apply()` to process the features.
- Register Your Feature Processor in the Header (`FeatureProcess.h`):
  - Add a new type to `FeatureProcessorTypes` before `FTR_PROCESS_LAST`. In the documentation comment, specify the name in `FeatureProcessorTypes` for Doxygen reference.
- Register Your Feature Processor in the Source (`FeatureProcess.cpp`):
  - Add your type conversion to `feature_processor_name_to_type`.
  - Add your class to `FeatureProcessor::new_polymorphic`.
  - Add your class to `FeatureProcessor::make_processor(FeatureProcessorTypes processor_type)`.
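The two registration steps follow a common enum-plus-factory pattern, which the sketch below illustrates in isolation. All names in it are hypothetical (`ToyProcessorTypes`, `TOY_PROCESS_MY_SCALER`, `ExistingProcessor`, `MyScaler`, the `toy_*` functions, and the strings `"existing"` and `"my_scaler"`). It is not the content of `FeatureProcess.h` or `FeatureProcess.cpp`, whose actual entries, macros, and signatures may differ.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the idea of FeatureProcessorTypes.
enum ToyProcessorTypes {
    TOY_PROCESS_EXISTING = 0,  ///< "existing" - a processor that was already registered
    TOY_PROCESS_MY_SCALER,     ///< "my_scaler" - the new entry, added before the sentinel
    TOY_PROCESS_LAST           // sentinel: must stay last
};

// Hypothetical base class standing in for FeatureProcessor.
struct ToyProcessor {
    ToyProcessorTypes processor_type = TOY_PROCESS_LAST;
    virtual ~ToyProcessor() = default;
    virtual std::string my_class_name() const = 0;
};

struct ExistingProcessor : ToyProcessor {
    ExistingProcessor() { processor_type = TOY_PROCESS_EXISTING; }
    std::string my_class_name() const override { return "ExistingProcessor"; }
};

// The newly added processor; its constructor sets processor_type.
struct MyScaler : ToyProcessor {
    MyScaler() { processor_type = TOY_PROCESS_MY_SCALER; }
    std::string my_class_name() const override { return "MyScaler"; }
};

// Name-to-type conversion: one added line per new processor.
ToyProcessorTypes toy_name_to_type(const std::string& name) {
    if (name == "existing") return TOY_PROCESS_EXISTING;
    if (name == "my_scaler") return TOY_PROCESS_MY_SCALER;
    throw std::invalid_argument("unknown processor name: " + name);
}

// Factory by type: one added case per new processor.
std::unique_ptr<ToyProcessor> toy_make_processor(ToyProcessorTypes type) {
    switch (type) {
        case TOY_PROCESS_EXISTING:  return std::make_unique<ExistingProcessor>();
        case TOY_PROCESS_MY_SCALER: return std::make_unique<MyScaler>();
        default: throw std::invalid_argument("unsupported processor type");
    }
}

// Factory by class name, as needed when restoring a serialized object.
// Returns nullptr when the class name is not registered.
std::unique_ptr<ToyProcessor> toy_new_polymorphic(const std::string& class_name) {
    if (class_name == "ExistingProcessor") return std::make_unique<ExistingProcessor>();
    if (class_name == "MyScaler") return std::make_unique<MyScaler>();
    return nullptr;
}

// Resolves a configuration name to a constructed processor and returns its class name.
std::string toy_resolve(const std::string& name) {
    ToyProcessorTypes type = toy_name_to_type(name);
    assert(type < TOY_PROCESS_LAST);  // every valid type sits before the sentinel
    return toy_make_processor(type)->my_class_name();
}
```

In this sketch, forgetting any one of the three additions shows up quickly: an unregistered name or type throws, and an unregistered class name yields a null pointer when an object is restored by name.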
Tip: Follow the structure and naming conventions of existing feature processors for consistency and easier maintenance.