Using the Flow App
Overview
The Flow App is a versatile tool with multiple switches, each designed to perform a specific action. Below are its key functionalities:
- Load New Repository: Converts raw ETL output files into an efficient, binary, and indexed format compatible with the AlgoMedical library framework.
- Train a model.
- Apply a model to generate predictions.
- Extract feature matrices from the model pipeline.
- Print specific patient data or signal distributions.
- Feature Importance with Shapley Values Analysis.
- Prepare Samples and Get Incidences Using Flow.
- Fit MedModel to Repository: Adjusts an existing model to fit a new repository. For instance, if a non-critical signal is missing, the "fit" operation generates a virtual empty signal to bypass errors, ensuring compatibility. The suggested changes can later be reviewed and validated or corrected. More information inside
GitHub Repository: Flow App Code
The Flow App is also compiled as part of AllTools. Refer to the Setup Instructions.
Flow App Options
General Switches
- --help: Displays the full help menu.
- --help_: Searches the help menu and displays only the relevant sections matching the search term.
- --rep: Specifies the path to the repository.
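As a rough sketch of how the help switches might be used from a shell: the snippet below assumes the binary is invoked as `Flow` (overridable through a hypothetical `FLOW_BIN` variable) and that `--help_` takes its search term as the next argument. Neither detail is stated in this section. The `flow` wrapper is a hypothetical helper; when the binary is not on the PATH it only prints the command it would have run.

```shell
# Assumption: the binary is called "Flow"; set FLOW_BIN if it differs.
FLOW_BIN="${FLOW_BIN:-Flow}"

# Hypothetical wrapper: runs Flow when available, otherwise prints the command.
flow() {
    if command -v "$FLOW_BIN" >/dev/null 2>&1; then
        "$FLOW_BIN" "$@"
    else
        echo "[dry-run] $FLOW_BIN $*"
    fi
}

flow --help            # full help menu
flow --help_ train     # assumption: the search term follows --help_
```

Searching the help menu this way is a convenient route to the exact switch names for a given mode before assembling a longer command.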
Creating Repositories
To create a repository using a convert configuration file, use the --convert_conf option:
To create a by-pid transposed version of a repository, use the following command. This allows faster access to specific patient IDs (pids) and reduces memory consumption significantly:
For more details on creating repositories, convert configuration files, and required inputs, refer to Load a New Repository.
Printing PIDs and Signals
- Print all records for all signals for a specific pid using the default API:
- Print all records for all signals for a specific pid using the by-pid API (faster but requires a by-pid repository):
- Print all records for a specific signal and pid using the default API:
- Print all records for a specific signal and pid using the by-pid API (faster but requires a by-pid repository):
- Print general statistics for a signal, such as sample counts, gender distribution, average samples per person, and more. This works only for SDateVal type signals and repositories containing GENDER and BYEAR signals:
Training a Model
To train a model, you need the following inputs:
- REPOSITORY_PATH: Path to the data repository.
- PATH_TO_TRAIN_SAMPLES: Path to MedSamples, a TSV file defining labels for each patient and point in time.
- PATH_TO_JSON_WITH_MODEL_INSTRUCTIONS: Path to the JSON file defining the model architecture. See Model JSON Format.
- PATH_TO_OUTPUT_TO_STORE_MODEL: Path to save the trained model.
Example command:
The --train_test mode switch performs cross-validation, but it is deprecated. Use the Optimizer instead.
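Because training depends on four paths, a small pre-flight check can catch a typo before a long run starts. The function below is a hypothetical helper, not part of Flow: it only tests that the three inputs exist and that the directory for the output model is writable, taking the four paths in the order of the input list above.

```shell
# Hypothetical helper, not part of Flow: validates the four training paths.
check_train_inputs() {
    status=0
    # $1 = repository, $2 = MedSamples file, $3 = model JSON
    for input in "$1" "$2" "$3"; do
        if [ ! -e "$input" ]; then
            echo "missing input: $input" >&2
            status=1
        fi
    done
    # $4 = output model path; its parent directory must exist and be writable
    out_dir=$(dirname "$4")
    if [ ! -d "$out_dir" ] || [ ! -w "$out_dir" ]; then
        echo "cannot write model into: $out_dir" >&2
        status=1
    fi
    return "$status"
}
```

It could be called as `check_train_inputs "$REPOSITORY_PATH" "$PATH_TO_TRAIN_SAMPLES" "$PATH_TO_JSON_WITH_MODEL_INSTRUCTIONS" "$PATH_TO_OUTPUT_TO_STORE_MODEL" || exit 1` ahead of the training command.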
Predicting/Applying a Model
To apply a model, you need the following inputs:
- REPOSITORY_PATH: Path to the data repository.
- PATH_TO_TRAIN_SAMPLES: Path to MedSamples, defining requested prediction times for each patient. The outcome column is not used during testing.
- PATH_TO_TRAINED_MODEL_BINARY_FILE: Path to the stored model.
- OUTPUT_PATH_TO_STORE_SAMPLES: Path to save the predictions. The output will include a pred_0 column in the MedSamples file for each requested prediction date.
Example command:
Pre-processors can be added to the beginning of the model pipeline to manipulate raw signals before they are fed into the model. This allows you to perform operations that don't require training or storage in the model itself, such as simulating the removal or limitation of a specific signal. For more details, see Using Pre Processors.
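After a prediction run, one quick way to confirm that scores were written is to look for the pred_0 column in the output file. The helper below is a hypothetical sketch, not part of Flow. It assumes the output is tab-separated, as MedSamples files are described above, and that its first line is a header row naming the columns; the header assumption is not stated in this section.

```shell
# Hypothetical helper: succeeds if the header row of a tab-separated
# samples file contains a column named exactly pred_0.
has_pred_column() {
    # Strip any carriage return, split the header into one name per line,
    # then look for an exact whole-line match.
    head -n 1 "$1" | tr -d '\r' | tr '\t' '\n' | grep -qx 'pred_0'
}
```

A call such as `has_pred_column "$OUTPUT_PATH_TO_STORE_SAMPLES" || echo "no pred_0 column found" >&2` would flag an output file that lacks predictions.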
Creating a Feature Matrix for Samples
To create a feature matrix, use the same inputs as for predicting/applying a model. The output will be a CSV file containing the feature matrix:
To inspect the training matrix directly from the model JSON, use the following command. The inputs are the same as for model training, but the output is a matrix instead of a model:
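Separately from the Flow commands themselves, a written matrix file can be sanity-checked with standard tools. The hypothetical helper below prints the number of data rows and the number of columns, assuming the CSV starts with a header row and contains no quoted commas; neither assumption is confirmed by this section.

```shell
# Hypothetical helper: prints "<data rows> <columns>" for a simple CSV file.
matrix_shape() {
    awk -F',' '
        NR == 1 { cols = NF }            # column count taken from the header row
        END {
            if (NR == 0) print "0 0"     # empty file
            else print NR - 1, cols      # exclude the header from the row count
        }
    ' "$1"
}
```

An unexpectedly small row or column count is usually a cheaper signal of a misconfigured run than opening the full matrix.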
Printing Trained Model Information
To inspect the model pipeline:
To write more detailed information about the model, add --print_json_format 1 --f_output $OUTPUT_JSON and set OUTPUT_JSON to the output path. The output is textual, but it is not strictly valid JSON.