How to Write a RepProcessor
RepProcessors in MedModel follow a defined lifecycle. Below is the typical sequence of method calls and their purpose:
- Constructor - Initializes the RepProcessor object.
- `init_defaults()` - Sets default values for the processor.
- Initialization
    - During learning: `init(map<string, string>& mapper)` parses arguments from a key-value map.
    - During application: arguments are loaded from disk (based on `ADD_SERIALIZATION_FUNCS`).
- `fit_for_repository(MedPidRepository)` - Adapts the processor to the repository (e.g., creates virtual signals if needed).
- During learning:
    - `get_required_signal_names()` - Identifies which signals are needed for processing.
    - `filter()` - Determines whether the processor should be applied, based on whether it affects any required signals. The list of affected signals is stored in the `aff_signals` variable. You can override the `filter` logic if needed.
- Virtual Signal Management
    - `add_virtual_signals()` - Lists the virtual signals to generate and their types.
    - `register_virtual_section_name_id()` - Registers categorical virtual signals in the dictionary.
- Signal ID Setup
    - `set_affected_signal_ids(MedDictionarySections)` - Defines output signal IDs.
    - `set_required_signal_ids(MedDictionarySections)` - Defines input signal IDs.
    - `set_signal_ids(MedSignals)` - Sets input/output signal settings (often overlaps with the two above).
- Final Initialization
    - `init_tables(MedDictionarySections, MedSignals)` - Finalizes processor settings using repository data.
- Attribute Initialization
    - `init_attributes()` - Sets up additional processor attributes in MedSamples, for example fields that document outlier cleaning.
- Signal Requirement
    - `get_required_signal_names()` - (May be called again) Ensures all required signals are fetched.
- Application
    - `conditional_apply(PidDynamicRec, MedIdSamples)` - Applies the processor logic to patient data in memory. It uses `PidDynamicRec`, an editable in-memory repository for a single patient, which also protects the data of other patients from being changed.
- Summary
    - `make_summary()` - Generates a summary after processing (e.g., outlier percentages). Useful for parallel execution and feature generation.
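
The call order in the lifecycle list can be illustrated with a small stand-in. This is a minimal sketch under stated assumptions, not MedModel code: `LifecycleSketch`, `run_learning_lifecycle`, `affects_any`, and the `signal` parameter name are hypothetical, the method signatures are simplified (the real methods take repository, dictionary, and signal objects), and each method only records that it was called. The `affects_any` helper stands in for the `filter()` decision described above.

```cpp
// Stand-in sketch only: these are NOT the real MedModel classes or signatures.
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified processor whose methods only record that they were called.
struct LifecycleSketch {
    std::vector<std::string> calls;     // call order, for inspection
    std::set<std::string> req_signals;  // input signals
    std::set<std::string> aff_signals;  // output (affected) signals

    void note(const std::string& name) { calls.push_back(name); }

    void init_defaults() { note("init_defaults"); }
    int init(std::map<std::string, std::string>& mapper) {
        note("init");
        // Hypothetical parameter "signal": one signal that is read and modified.
        auto it = mapper.find("signal");
        if (it == mapper.end()) return -1;
        req_signals = {it->second};
        aff_signals = {it->second};
        return 0;
    }
    void fit_for_repository() { note("fit_for_repository"); }
    void get_required_signal_names() { note("get_required_signal_names"); }
    // Stand-in for the filter() decision: true when an affected signal is needed.
    bool affects_any(const std::set<std::string>& needed) {
        note("filter");
        for (const auto& s : aff_signals)
            if (needed.count(s)) return true;
        return false;
    }
    void add_virtual_signals() { note("add_virtual_signals"); }
    void register_virtual_section_name_id() { note("register_virtual_section_name_id"); }
    void set_affected_signal_ids() { note("set_affected_signal_ids"); }
    void set_required_signal_ids() { note("set_required_signal_ids"); }
    void set_signal_ids() { note("set_signal_ids"); }
    void init_tables() { note("init_tables"); }
    void init_attributes() { note("init_attributes"); }
    void conditional_apply() { note("conditional_apply"); }
    void make_summary() { note("make_summary"); }
};

// Walks the learning-time sequence from the list above and returns the call log.
std::vector<std::string> run_learning_lifecycle(
        LifecycleSketch& p,
        std::map<std::string, std::string>& args,
        const std::set<std::string>& needed) {
    p.init_defaults();
    if (p.init(args) != 0) return p.calls;       // bad arguments: stop early
    p.fit_for_repository();
    p.get_required_signal_names();
    if (!p.affects_any(needed)) return p.calls;  // processor not needed: skip it
    p.add_virtual_signals();
    p.register_virtual_section_name_id();
    p.set_affected_signal_ids();
    p.set_required_signal_ids();
    p.set_signal_ids();
    p.init_tables();
    p.init_attributes();
    p.get_required_signal_names();               // may be called again
    p.conditional_apply();
    p.make_summary();
    return p.calls;
}

// Usage check: a processor that affects a needed signal runs to the summary step.
void lifecycle_demo() {
    std::map<std::string, std::string> args = {{"signal", "Hemoglobin"}};
    std::set<std::string> needed = {"Hemoglobin"};
    LifecycleSketch p;
    std::vector<std::string> calls = run_learning_lifecycle(p, args, needed);
    assert(calls.front() == "init_defaults");
    assert(calls.back() == "make_summary");
}
```

Calling `lifecycle_demo()` from a `main()` exercises the full sequence. With a `needed` set that does not contain the affected signal, the sketch stops right after the filter step, which mirrors the idea that a processor is skipped when it affects no required signal.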
Steps to Implement a RepProcessor:
- Create a new `.h` file for your class and a corresponding `.cpp` file that includes the header. In the header, include `"RepProcess.h"`.
- Set up default values in `init_defaults()` or the constructor. For example, set `processor_type` using `RepProcessorTypes` (optional).
- Override `init(map<string, string>& mapper)` to parse external parameters.
- Set up serialization:
    - Add `MEDSERIALIZE_SUPPORT($CLASS_NAME)` at the end of the `.h` file (replace `$CLASS_NAME`).
    - Add `ADD_CLASS_NAME($CLASS_NAME)` in the public section of your class.
    - Use `ADD_SERIALIZATION_FUNCS` to specify only the parameters that need to be stored on disk after learning. Do not include temporary or repository-specific variables.
- Configure key variables for pipeline integration:
    - Assign `virtual_signals_generic` after `init` if your processor creates virtual signals.
    - Set `req_signals` to define required/input signals. This helps manage dependencies and ensures prerequisite processors run first. You can set this in `init_tables` or after `init`.
    - Set `aff_signals` to specify output/affected signals, aiding pipeline dependency tracking. This can also be set in `init_tables` or after `init`.
- Override necessary functions as needed:
    - `register_virtual_section_name_id` (for virtual categorical signals)
    - `init_tables` (for initializing temporary variables using the repository, in both learn and apply)
    - `set_required_signal_ids`, `set_affected_signal_ids` (for custom signal ID logic; usually, using `aff_signals` and `req_signals` is sufficient)
    - `fit_for_repository` (optional; for repository-specific adjustments, e.g., virtual signal checks)
    - `_learn` (override only if learning logic is needed; default is empty)
    - `_apply` (main logic for applying the processor)
    - `print` (optional, for debugging)
    - Any other required virtual functions
- Register your new RepProcessor in `RepProcess.h`:
    - Add a new type to `RepProcessorTypes` before `REP_PROCESS_LAST`. In the documentation comment, specify the name in `rep_processor_name_to_type` for Doxygen reference.
- Register your new RepProcessor in `RepProcess.cpp`:
    - Add your class to `rep_processor_name_to_type`
    - Add your class to `RepProcessor::new_polymorphic`
    - Add your class to `RepProcessor::make_processor(RepProcessorTypes processor_type)`
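
The shape of these steps can be sketched without the real headers. The block below is an illustration under stated assumptions, not the MedModel implementation: `SketchProcessor`, `ClipProcessor`, `SketchProcessorTypes`, `name_to_type`, `apply_value`, and the `signal` and `max_value` parameters are all hypothetical stand-ins for `RepProcessor`, a new processor class, `RepProcessorTypes`, `rep_processor_name_to_type`, `_apply`, and real init arguments. The serialization macros appear only as comments because they require the actual library.

```cpp
// Stand-in sketch only: these are NOT the real MedModel classes or signatures.
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Stand-in for RepProcessorTypes: new entries go before the LAST marker.
enum SketchProcessorTypes {
    SKETCH_PROCESS_BASIC = 0,
    SKETCH_PROCESS_CLIP,   // hypothetical new type, registered under the name "clip"
    SKETCH_PROCESS_LAST
};

// Stand-in for the RepProcessor base class.
struct SketchProcessor {
    SketchProcessorTypes processor_type = SKETCH_PROCESS_BASIC;
    std::vector<std::string> req_signals;  // required/input signals
    std::vector<std::string> aff_signals;  // affected/output signals
    virtual ~SketchProcessor() = default;
    virtual void init_defaults() {}
    virtual int init(std::map<std::string, std::string>& /*mapper*/) { return 0; }
    // Hypothetical stand-in for _apply: transforms a single value.
    virtual double apply_value(double v) const { return v; }
};

// Hypothetical new processor: caps the values of one signal at max_value.
struct ClipProcessor : SketchProcessor {
    std::string signal_name;
    double max_value = 100.0;

    ClipProcessor() { init_defaults(); }
    void init_defaults() override {
        processor_type = SKETCH_PROCESS_CLIP;
        max_value = 100.0;
    }
    // Parses external parameters from the key-value map.
    int init(std::map<std::string, std::string>& mapper) override {
        for (const auto& kv : mapper) {
            if (kv.first == "signal") signal_name = kv.second;
            else if (kv.first == "max_value") max_value = std::stod(kv.second);
            else return -1;  // unknown parameter
        }
        req_signals = {signal_name};  // set after init, as the steps suggest
        aff_signals = {signal_name};
        return 0;
    }
    double apply_value(double v) const override {
        return v > max_value ? max_value : v;
    }
    // In the real class, ADD_CLASS_NAME and ADD_SERIALIZATION_FUNCS would go here,
    // listing only signal_name and max_value (the state kept after learning).
};
// In the real header, MEDSERIALIZE_SUPPORT(ClipProcessor) would follow the class.

// Stand-in for rep_processor_name_to_type.
SketchProcessorTypes name_to_type(const std::string& name) {
    static const std::map<std::string, SketchProcessorTypes> names = {
        {"basic", SKETCH_PROCESS_BASIC},
        {"clip", SKETCH_PROCESS_CLIP},
    };
    auto it = names.find(name);
    return it == names.end() ? SKETCH_PROCESS_LAST : it->second;
}

// Stand-in for the make_processor factory keyed by type.
std::unique_ptr<SketchProcessor> make_processor(SketchProcessorTypes type) {
    switch (type) {
        case SKETCH_PROCESS_BASIC: return std::make_unique<SketchProcessor>();
        case SKETCH_PROCESS_CLIP:  return std::make_unique<ClipProcessor>();
        default:                   return nullptr;  // unregistered type
    }
}

// Usage check: create by name, configure from a map, apply to two values.
void registration_demo() {
    std::unique_ptr<SketchProcessor> p = make_processor(name_to_type("clip"));
    assert(p != nullptr);
    std::map<std::string, std::string> args = {
        {"signal", "Hemoglobin"}, {"max_value", "20"}};
    assert(p->init(args) == 0);
    assert(p->apply_value(25.0) == 20.0);
    assert(p->apply_value(10.0) == 10.0);
}
```

The sketch keeps the name lookup and the factory separate, as the registration steps do: a processor that appears in only one of the two would either be impossible to name in a configuration or impossible to construct.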