How to Write a RepProcessor
RepProcessors in MedModel follow a defined lifecycle. Below is the typical sequence of method calls and their purpose:
-
Constructor - Initializes the RepProcessor object.
-
init_defaults() - Sets default values for the processor.
-
Initialization - During learning:
init(map<string, string>& mapper)
parses arguments from a key-value map. - During application:
Arguments were loaded from disk (based onADD_SERIALIZATION_FUNCS
). -
fit_for_repository(MedPidRepository) - Adapts the processor to the repository (e.g., creates virtual signals if needed). - During learning:
get_required_signal_names()
Identifies which signals are needed for processing.filter()
Determines if the processor should be applied, based on whether it affects any required signals. The list of affected signals is stored in theaff_signals
variable. You can override thefilter
logic if needed.
-
Virtual Signal Management -
add_virtual_signals()
Lists virtual signals to generate and their types. -register_virtual_section_name_id()
Registers categorical virtual signals in the dictionary. -
Signal ID Setup -
set_affected_signal_ids(MedDictionarySections)
Defines output signal IDs. -set_required_signal_ids(MedDictionarySections)
Defines input signal IDs. -set_signal_ids(MedSignals)
Sets input/output signal settings (often overlaps with above). -
Final Initialization -
init_tables(MedDictionarySections, MedSignals)
Finalizes processor settings using repository data. -
Attribute Initialization -
init_attributes()
Sets up additional processor attributes in MedSamples. For example store fields to document outlier cleaning -
Signal Requirement
get_required_signal_names()
(May be called again) Ensures all required signals are fetched.
-
Application
conditional_apply(PidDynamicRec, MedIdSamples)
Applies processor logic to patient data in memory. UsesPidDynamicRec
which is editable in-memory repository for a single patient. It also protects us from changing data for other patients.
-
Summary
make_summary()
Generates a summary after processing (e.g., outlier percentages).
Useful for parallel execution and feature generation.
Steps to Implement a RepProcessor:
- Create a new
.h
file for your class and a corresponding.cpp
file that includes the header. In the header, include"RepProcess.h"
. - Set up default values in
init_defaults()
or the constructor. For example, setprocessor_type
usingRepProcessorTypes
(optional). - Override
init(map<string, string>& mapper)
to parse external parameters. - Set up serialization:
- Add
MEDSERIALIZE_SUPPORT($CLASS_NAME)
at the end of the.h
file (replace$CLASS_NAME
). - AddADD_CLASS_NAME($CLASS_NAME)
in the public section of your class. - UseADD_SERIALIZATION_FUNCS
to specify only the parameters that need to be stored on disk after learning. Do not include temporary or repository-specific variables. - Configure key variables for pipeline integration:
- Assign
virtual_signals_generic
afterinit
if your processor creates virtual signals. - Setreq_signals
to define required/input signals. This helps manage dependencies and ensures prerequisite processors run first. You can set this ininit_tables
or afterinit
. - Setaff_signals
to specify output/affected signals, aiding pipeline dependency tracking. This can also be set ininit_tables
or afterinit
. - Override necessary functions as needed:
-
register_virtual_section_name_id
(for virtual categorical signals) -init_tables
(for initializing temporary variables using the repository - both in learn\apply) -set_required_signal_ids
,set_affected_signal_ids
(for custom signal ID logic; usually, usingaff_signals
andreq_signals
is sufficient) -fit_for_repository
(for repository-specific adjustments, e.g., virtual signal checks) (optional). -_learn
(override only if learning logic is needed; default is empty) -_apply
(main logic for applying the processor) -print
(optional for debugging) - Any other required virtual functions - Register your new RepProcessor in
RepProcess.h
: - Add a new type toRepProcessorTypes
beforeREP_PROCESS_LAST
. In the documentation comment, specify the name inrep_processor_name_to_type
for Doxygen reference. - Register your new RepProcessor in
RepProcess.cpp
: - Add your class torep_processor_name_to_type
- Add your class toRepProcessor::new_polymorphic
- Add your class toRepProcessor::make_processor(RepProcessorTypes processor_type)