How to Write a PostProcessor
A PostProcessor is a component that takes the feature matrix and prediction results, then applies additional post-processing steps. It is executed after the MedPredictor stage in the pipeline.
PostProcessors in MedModel follow a defined sequence of method calls. Here's the typical lifecycle:
- Constructor - Initializes the PostProcessor object.
- init_defaults() - Sets default values for the processor. Make sure to update processor_type to indicate the processor type.
- Initialization
  - During learning:
    - Implement init(map<string, string>& mapper) to parse parameters from a key-value map (using SerializableObject::init_from_string).
    - If your PostProcessor should operate on a specific subset of training samples, set either use_p or use_split:
      - use_split: Uses the "split" stored in MedSamples. All splits (based on patient ID) except the selected one are passed to the full model pipeline; the selected split is reserved for training this PostProcessor.
      - use_p: A value between 0 and 1 that determines the proportion of randomly selected patient IDs passed to the PostProcessor. The remainder is processed by the main MedModel pipeline.
    - This mechanism also supports multiple PostProcessors, each working on a different subset of the data.
  - During application: Arguments are loaded from disk. Parameters stored via ADD_SERIALIZATION_FUNCS are restored automatically.
- Pipeline Integration - init_post_processor(): Initializes the PostProcessor using the complete MedModel pipeline, allowing for any necessary adaptations before execution.
- Learning Phase - Learn(): Implements any learning logic required during training.
- Application Phase - Apply(): Applies the post-processing logic to the data.
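The call order described in this lifecycle can be sketched with a self-contained toy class. This is a minimal illustration under stated assumptions, not the MedModel API: ToyPostProcessor, ToyMinMaxScaler, and run_lifecycle are hypothetical names, and Learn() and Apply() here take a plain vector of predictions instead of MedModel's own types. The min-max rescaling is only an example of state that is learned once and reused at apply time.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the real base class (NOT the MedModel API).
struct ToyPostProcessor {
    virtual ~ToyPostProcessor() = default;
    virtual void init_defaults() {}
    virtual int init(std::map<std::string, std::string>& mapper) = 0;
    virtual void Learn(const std::vector<float>& preds) = 0;
    virtual void Apply(std::vector<float>& preds) const = 0;
};

// Toy processor: rescales predictions to [0, 1] using the range seen in Learn().
class ToyMinMaxScaler : public ToyPostProcessor {
public:
    float use_p = 0.0f;          // parsed from the key-value map; unused in this toy
    float lo = 0.0f, hi = 1.0f;  // learned state (what a real class would serialize)

    ToyMinMaxScaler() { init_defaults(); }

    void init_defaults() override { use_p = 0.0f; lo = 0.0f; hi = 1.0f; }

    // Parse parameters from a key-value map; reject unknown keys.
    int init(std::map<std::string, std::string>& mapper) override {
        for (const auto& kv : mapper) {
            if (kv.first == "use_p") use_p = std::stof(kv.second);
            else return -1;
        }
        return 0;
    }

    // Learn the prediction range from the training predictions.
    void Learn(const std::vector<float>& preds) override {
        if (preds.empty()) return;
        auto mm = std::minmax_element(preds.begin(), preds.end());
        lo = *mm.first;
        hi = *mm.second;
    }

    // Rescale in place, clamping values outside the learned range.
    void Apply(std::vector<float>& preds) const override {
        const float range = hi - lo;
        for (float& p : preds) {
            const float scaled = range > 0.0f ? (p - lo) / range : 0.0f;
            p = std::min(1.0f, std::max(0.0f, scaled));
        }
    }
};

// Runs the steps in the documented order: construct, init, Learn, Apply.
std::vector<float> run_lifecycle(const std::vector<float>& train_preds,
                                 std::vector<float> new_preds) {
    ToyMinMaxScaler pp;  // constructor calls init_defaults()
    std::map<std::string, std::string> args{{"use_p", "0.2"}};
    const int rc = pp.init(args);
    assert(rc == 0);
    (void)rc;
    pp.Learn(train_preds);
    pp.Apply(new_preds);
    return new_preds;
}
```

With training predictions {2, 4, 6}, the learned range is [2, 6], so applying the processor to {2, 5, 10} yields {0, 0.75, 1}.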
Steps to Implement a PostProcessor
- Create Class Files - Create a new .h header and .cpp source file for your PostProcessor class. Include PostProcessor.h in your header.
- Set Default Values - Implement init_defaults() or set defaults in the constructor.
- Parameter Initialization - Override init(map<string, string>& mapper) to parse external parameters.
- Serialization
  - Add MEDSERIALIZE_SUPPORT($CLASS_NAME) at the end of your header file (replace $CLASS_NAME).
  - Add ADD_CLASS_NAME($CLASS_NAME) in the public section of your class.
  - Use ADD_SERIALIZATION_FUNCS to specify which parameters should be saved after learning. Exclude temporary or repository-specific variables.
- Pipeline Adaptation (if needed) - Implement init_post_processor() if your PostProcessor needs to adapt based on the full MedModel pipeline.
- Define Dependencies and Outputs - Implement get_input_fields() and get_output_fields() to specify the inputs and outputs of your PostProcessor.
  - For features, prefix the name with "feature:".
  - For predictions, use "prediction:X" (where X is the prediction index, usually 0).
  - For other sample effects, use "attr:", "str_attr:", or "json:" as appropriate.
  The MedModel pipeline uses this information to determine if the PostProcessor is required.
- Learning - Implement Learn() for any required training logic.
- Apply - Implement Apply() to perform the post-processing.
- Register Your PostProcessor in the Header (PostProcessor.h) - Add a new type to PostProcessorTypes before FTR_POSTPROCESS_LAST. In the documentation comment, specify the name in PostProcessorTypes for Doxygen reference.
- Register Your PostProcessor in the Source (PostProcessor.cpp)
  - Add your type conversion to post_processor_name_to_type
  - Add your class to PostProcessor::new_polymorphic
  - Add your class to PostProcessor::make_processor(MedPredictorTypes model_type)
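The field declarations and the two registration steps can be sketched in a self-contained form. Everything prefixed with Toy or toy_ below is a hypothetical stand-in that only mirrors the pattern (an enum with a trailing sentinel, a name-to-type conversion, and a factory); the real enum, function signatures, and macros live in PostProcessor.h and PostProcessor.cpp and may differ. The feature name "Age" and the type name "my_processor" are likewise assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy mirror of the type enum: new entries are added before the LAST sentinel.
enum ToyPostProcessorTypes {
    TOY_CALIBRATOR = 0,    ///< "calibrator"
    TOY_MY_PROCESSOR,      ///< "my_processor" (the newly registered type)
    TOY_POSTPROCESS_LAST   // sentinel; must remain the final entry
};

// Hypothetical base class (NOT the MedModel API).
struct ToyPostProcessor {
    ToyPostProcessorTypes processor_type = TOY_POSTPROCESS_LAST;
    virtual ~ToyPostProcessor() = default;
    virtual std::vector<std::string> get_input_fields() const { return {}; }
    virtual std::vector<std::string> get_output_fields() const { return {}; }
};

struct ToyCalibrator : ToyPostProcessor {
    ToyCalibrator() { processor_type = TOY_CALIBRATOR; }
};

struct ToyMyProcessor : ToyPostProcessor {
    ToyMyProcessor() { processor_type = TOY_MY_PROCESSOR; }

    // Reads one feature and prediction 0, using the documented prefixes.
    std::vector<std::string> get_input_fields() const override {
        return {"feature:Age", "prediction:0"};
    }
    // Writes prediction 0 back.
    std::vector<std::string> get_output_fields() const override {
        return {"prediction:0"};
    }
};

// Name-to-type conversion; unknown names map to the sentinel.
ToyPostProcessorTypes toy_name_to_type(const std::string& name) {
    if (name == "calibrator") return TOY_CALIBRATOR;
    if (name == "my_processor") return TOY_MY_PROCESSOR;
    return TOY_POSTPROCESS_LAST;
}

// Factory; returns nullptr for the sentinel or any unregistered type.
std::unique_ptr<ToyPostProcessor> toy_make_processor(ToyPostProcessorTypes type) {
    assert(type <= TOY_POSTPROCESS_LAST);
    switch (type) {
        case TOY_CALIBRATOR:
            return std::unique_ptr<ToyPostProcessor>(new ToyCalibrator());
        case TOY_MY_PROCESSOR:
            return std::unique_ptr<ToyPostProcessor>(new ToyMyProcessor());
        default:
            return nullptr;
    }
}
```

Forgetting either the name conversion or the factory case shows up here as a null pointer from toy_make_processor(toy_name_to_type("my_processor")), which is why the new type has to be added in each of the listed places.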
Tip: Follow the structure and conventions of existing PostProcessors for consistency and easier integration into the MedModel framework.