How to Write a Feature Generator

A Feature Generator is a processing unit that produces features from input signals read from a data repository or EMR. Its processing has two main stages:

All relevant rep processors are run on the input signals to pre-process them before they are used to generate features. The infrastructure performs this step, not the generator itself.
The generate function is then called with this pre-processed, patient-specific data and produces the final output.
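
The split between the two stages can be illustrated with a small, self-contained C++ sketch. Everything in it is hypothetical and simplified (SignalPoint, Signal, run_two_stages, drop_outliers and last_value are not MedModel names): the infrastructure side applies each rep processor to a patient's signal, and only then hands the cleaned data to a generate-style function that computes one value at a prediction time.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, simplified types for illustration only (not the MedModel API).
struct SignalPoint {
    int time;
    float value;
};
using Signal = std::vector<SignalPoint>;                      // one patient's signal, sorted by time
using RepProcessorFn = std::function<void(Signal&)>;          // stage 1: edits the signal in place
using GenerateFn = std::function<float(const Signal&, int)>;  // stage 2: signal + time -> feature value

// Infrastructure side: run every relevant rep processor first, then call generate.
// The signal is taken by value so the caller's data stays untouched.
float run_two_stages(Signal sig, const std::vector<RepProcessorFn>& rep_processors,
                     const GenerateFn& generate, int prediction_time) {
    assert(generate && "stage 2 needs a generate function");
    for (const auto& processor : rep_processors) processor(sig);  // stage 1: pre-processing
    return generate(sig, prediction_time);                        // stage 2: feature generation
}

// Hypothetical rep processor: drops implausible values (a simple outlier cleaner).
void drop_outliers(Signal& sig) {
    Signal kept;
    for (const auto& p : sig)
        if (p.value > 0.0f && p.value < 1000.0f) kept.push_back(p);
    sig.swap(kept);
}

// Hypothetical generate step: last value at or before the prediction time, or -1 if none.
float last_value(const Signal& sig, int prediction_time) {
    float result = -1.0f;
    for (const auto& p : sig)
        if (p.time <= prediction_time) result = p.value;  // relies on time-sorted points
    return result;
}
```

With the signal {(1, 5), (2, 2000), (3, 7), (5, 9)} and drop_outliers as the only rep processor, run_two_stages(..., last_value, 4) returns 7: the implausible 2000 is removed in stage 1, so the generate step never sees it.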

Feature Generators in MedModel follow a specific sequence of method calls. Here’s the typical lifecycle:

Constructor - Initializes the Feature Generator object.
init_defaults() - Sets default values for the generator. Set generator_type here to hold your generator's type.
Initialization
- During learning: init(map<string, string>& mapper) parses parameters from a key-value map (see SerializableObject::init_from_string). Make sure it fills req_signals with the generator's required input signals and sets the tags variable.
- During application: arguments are loaded from disk, and parameters stored via ADD_SERIALIZATION_FUNCS are restored automatically.
fit_for_repository(MedPidRepository) - Adapts the generator to the repository, e.g., modifies logic if certain signals are missing.
Signal Requirements and Setup
- get_required_signal_ids()
  Returns the list of required signal IDs for learning or applying the generator.
- set_required_signal_ids(MedDictionarySections)
  Stores the required signal IDs using dictionary sections.
- set_signal_ids(MedSignals)
  Stores the required signal IDs using signal objects.
- init_tables(MedDictionarySections)
  Initializes tables and stores the needed signal IDs using dictionary sections.
- set_names
  Stores the output names of the feature generator. Override this in your generator.
Feature Filtering
- filter_features()
  Determines whether this generator is still needed (e.g., after feature selection) and returns true if it should be kept. By default it uses the names variable filled by set_names and keeps the generator if at least one of its output names is needed in the pipeline.
Signal Names
- get_required_signal_names()
  Returns all signal names needed to run this generator.
Learning Phase
- learn()
  Performs the learning logic (called only during training).
Preparation
- prepare()
  Prepares features and attributes, and allocates space.
Output Initialization
- get_p_data()
  Initializes the address for the generator’s output (useful for parallelism).
Feature Generation
- generate()
  Generates the feature for each sample. By the time generate() is called, the infrastructure has already executed all relevant rep processors on the input signals the generator uses.
Summary
- make_summary()
  Summarizes results after generation (e.g., collects statistics across all data).
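
To make the ordering concrete, here is a hypothetical C++ sketch that walks a subset of the lifecycle above, in order, for a toy generator. TracingGenerator and run_lifecycle are invented for illustration, every method takes simplified arguments (the real ones take repositories, dictionary sections and samples), and the real infrastructure may order or skip steps differently. The sketch reflects two rules stated above: init() runs only during learning (at application time parameters are restored from disk), and learn() runs only during training.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy generator: each lifecycle method records its name so the
// call order can be inspected. Signatures are simplified, not MedModel's.
struct TracingGenerator {
    std::vector<std::string> calls;        // lifecycle steps, in call order
    std::vector<std::string> req_signals;  // required input signals
    std::vector<std::string> names;        // output feature names
    std::vector<float> output;             // one feature value per sample

    TracingGenerator() { init_defaults(); }
    void init_defaults() { calls.push_back("init_defaults"); }
    void init(const std::map<std::string, std::string>& mapper) {
        calls.push_back("init");
        auto it = mapper.find("signal");
        if (it != mapper.end()) req_signals = {it->second};
    }
    void fit_for_repository() { calls.push_back("fit_for_repository"); }
    void set_required_signal_ids() { calls.push_back("set_required_signal_ids"); }
    void init_tables() { calls.push_back("init_tables"); }
    void set_names() { calls.push_back("set_names"); names = {"FTR_toy"}; }
    bool filter_features() { calls.push_back("filter_features"); return !names.empty(); }
    void learn() { calls.push_back("learn"); }
    void prepare(int n_samples) { calls.push_back("prepare"); output.assign(n_samples, 0.0f); }
    float* get_p_data() { calls.push_back("get_p_data"); return output.data(); }
    void generate(float* p_data, int i) { p_data[i] = static_cast<float>(i); }  // untraced: runs per sample
    void make_summary() { calls.push_back("make_summary"); }
};

// Walks the lifecycle in the order listed above.
void run_lifecycle(TracingGenerator& g, const std::map<std::string, std::string>& params,
                   int n_samples, bool training) {
    if (training) g.init(params);      // at application time, parameters come from disk instead
    g.fit_for_repository();
    g.set_required_signal_ids();
    g.init_tables();
    g.set_names();
    if (!g.filter_features()) return;  // generator not needed: skip the remaining steps
    if (training) g.learn();           // learning logic runs only during training
    g.prepare(n_samples);
    float* p_data = g.get_p_data();    // output address for this generator
    assert(p_data != nullptr || n_samples == 0);
    for (int i = 0; i < n_samples; ++i) g.generate(p_data, i);
    g.make_summary();
}
```

After run_lifecycle(g, params, 3, true), g.calls lists the traced steps in order and g.output holds one value per sample; with training set to false, neither init nor learn appears in g.calls.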

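The default filter_features() rule described in the lifecycle (keep the generator if any name set by set_names is still needed) can be sketched as a free function. keep_generator and its arguments are hypothetical; the real method is a member of the generator, and its signature is not shown in this guide.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical free-function version of the default rule: keep the generator if
// at least one of its output names is still needed by the pipeline.
bool keep_generator(const std::vector<std::string>& names,
                    const std::unordered_set<std::string>& needed_features) {
    for (const auto& name : names)
        if (needed_features.count(name) > 0) return true;  // an output is still used
    return false;  // no output survived (e.g., all dropped by feature selection)
}
```

For example, a generator whose names are {FTR_a, FTR_b} is kept when feature selection retains FTR_b, and dropped when it retains neither.
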
Steps to Implement a Feature Generator

Create Class Files - Make a new .h header and .cpp source file for your feature generator class. Include "FeatureGenerator.h" in your header.
Set Default Values - Implement init_defaults() or set defaults in the constructor.
Parameter Initialization - Override init(map<string, string>& mapper) to parse external parameters.
Serialization
- Add MEDSERIALIZE_SUPPORT($CLASS_NAME) at the end of your header file (replace $CLASS_NAME with your class name).
- Add ADD_CLASS_NAME($CLASS_NAME) in the public section of your class.
- Use ADD_SERIALIZATION_FUNCS to specify which parameters should be saved after learning. Exclude temporary or repository-specific variables.
Signal and Table Setup - Implement or override (if needed):
- set_names - fill in the generator's output feature names.
- get_required_signal_ids() and get_required_signal_names() - only if needed; the default implementation uses req_signals.
- set_required_signal_ids(MedDictionarySections) - only if needed; the default implementation uses req_signals.
- set_signal_ids(MedSignals) - only if additional setup is needed.
- init_tables(MedDictionarySections)
- get_required_signal_categories - if the generator uses categorical signals, list every "required" categorical value the generator uses.
Feature Filtering - Override filter_features() only if your generator needs custom logic for being skipped (e.g., after feature selection). The default implementation uses names to decide whether the generator is needed.
Learning and Preparation
- Implement learn() for training logic (if needed).
- Implement prepare() to allocate resources and set up attributes.
Feature Generation
- Implement generate() to produce the feature for each sample.
- Implement get_p_data() if your generator supports parallel output.
Summary - Implement make_summary() to collect and report statistics after feature generation.
Register Your Feature Generator in the Header File (FeatureGenerator.h)
- Add a new type to FeatureGeneratorTypes, before FTR_GEN_LAST.
- In that entry's documentation comment, specify the generator's name so it appears in the Doxygen reference for FeatureGeneratorTypes.
Register Your Feature Generator in the Source File (FeatureGenerator.cpp)
- Add your name-to-type conversion to ftr_generator_name_to_type.
- Add your class to FeatureGenerator::new_polymorphic.
- Add your class to FeatureGenerator::make_processor(FeatureGeneratorTypes generator_type).
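
The implementation steps above can be sketched in a single self-contained C++ file. SimpleGeneratorBase is a hypothetical, heavily simplified stand-in for the real FeatureGenerator base class (the real signatures take repositories, samples and dictionary sections), and CountInWindowGenerator with its signal, win_from, win_to and tags parameters is an invented example. The serialization macros from the Serialization step are omitted because they require the MedModel headers.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the real FeatureGenerator base class.
// ADD_CLASS_NAME, ADD_SERIALIZATION_FUNCS and MEDSERIALIZE_SUPPORT are omitted.
struct SimpleGeneratorBase {
    int generator_type = -1;                // hypothetical: real code uses FeatureGeneratorTypes
    std::vector<std::string> req_signals;   // required input signals
    std::vector<std::string> tags;          // tags attached to the generated features
    std::vector<std::string> names;         // output feature names, filled by set_names()

    virtual ~SimpleGeneratorBase() = default;
    virtual void init_defaults() = 0;
    virtual int init(std::map<std::string, std::string>& mapper) = 0;
    virtual void set_names() = 0;
    // Simplified generate(): one patient's (time, value) points -> one feature value.
    virtual float generate(const std::vector<std::pair<int, float>>& signal, int prediction_time) = 0;
};

// Hypothetical generator: counts the values of one signal inside a look-back window.
class CountInWindowGenerator : public SimpleGeneratorBase {
public:
    std::string signal_name;
    int win_from = 0;   // window start, in time units before the prediction time
    int win_to = 365;   // window end, in time units before the prediction time

    CountInWindowGenerator() { init_defaults(); }

    void init_defaults() override {
        generator_type = 1;  // hypothetical id standing in for the new enum entry
        signal_name.clear();
        win_from = 0;
        win_to = 365;
    }

    // Parses key-value parameters; unknown keys are rejected (a choice of this sketch).
    int init(std::map<std::string, std::string>& mapper) override {
        for (const auto& [key, value] : mapper) {
            if (key == "signal") signal_name = value;
            else if (key == "win_from") win_from = std::stoi(value);
            else if (key == "win_to") win_to = std::stoi(value);
            else if (key == "tags") tags = {value};
            else throw std::invalid_argument("unknown parameter: " + key);
        }
        req_signals = {signal_name};  // declare the required input signal
        return 0;
    }

    void set_names() override {
        names = {"FTR_" + signal_name + "_count_" + std::to_string(win_from) + "_" +
                 std::to_string(win_to)};
    }

    // Counts points whose distance from the prediction time lies in [win_from, win_to].
    float generate(const std::vector<std::pair<int, float>>& signal, int prediction_time) override {
        assert(win_from <= win_to && "window must not be inverted");
        int count = 0;
        for (const auto& point : signal) {
            int gap = prediction_time - point.first;  // how far back the point is
            if (gap >= win_from && gap <= win_to) ++count;
        }
        return static_cast<float>(count);
    }
};
```

Calling init() with signal=Hemoglobin, win_from=0 and win_to=30 fills req_signals with Hemoglobin; after set_names(), names holds FTR_Hemoglobin_count_0_30, and generate() counts the measurements taken 0 to 30 time units before the prediction time. Rejecting unknown keys is a design choice of this sketch that surfaces configuration typos early.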

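The two registration steps follow a common enum-plus-factory pattern. The hypothetical sketch below mirrors it with invented names (DemoGeneratorTypes, DEMO_GEN_LAST, demo_name_to_type, demo_make_generator): the new enum entry goes before the sentinel, a name-to-type lookup plays the role of ftr_generator_name_to_type, and a switch plays the role of FeatureGenerator::make_processor. Returning the sentinel for unknown names is a choice made for this sketch only.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical types enum: new entries go before the sentinel.
enum DemoGeneratorTypes {
    DEMO_GEN_AGE = 0,       ///< "age": hypothetical existing generator
    DEMO_GEN_COUNT_IN_WIN,  ///< "count_in_win": the newly added generator
    DEMO_GEN_LAST           // sentinel: keep it last
};

struct DemoGenerator {
    virtual ~DemoGenerator() = default;
    virtual std::string name() const = 0;
};
struct DemoAgeGenerator : DemoGenerator {
    std::string name() const override { return "age"; }
};
struct DemoCountInWinGenerator : DemoGenerator {
    std::string name() const override { return "count_in_win"; }
};

// String name -> enum value (the role of ftr_generator_name_to_type).
DemoGeneratorTypes demo_name_to_type(const std::string& name) {
    static const std::unordered_map<std::string, DemoGeneratorTypes> table = {
        {"age", DEMO_GEN_AGE},
        {"count_in_win", DEMO_GEN_COUNT_IN_WIN},
    };
    auto it = table.find(name);
    return it == table.end() ? DEMO_GEN_LAST : it->second;  // sentinel means "unknown" here
}

// Enum value -> new object (the role of make_processor).
std::unique_ptr<DemoGenerator> demo_make_generator(DemoGeneratorTypes type) {
    switch (type) {
        case DEMO_GEN_AGE: return std::make_unique<DemoAgeGenerator>();
        case DEMO_GEN_COUNT_IN_WIN: return std::make_unique<DemoCountInWinGenerator>();
        default: return nullptr;  // unregistered type
    }
}
```

Forgetting either the lookup entry or the switch case leaves the new generator unreachable by name, which is why both registration steps are needed.
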
Tip:
Follow the structure and naming conventions used in existing feature generators for consistency and easier maintenance.