Finalizing the Load Process
This step completes the ETL pipeline and prepares the repository for use.
Recap of earlier steps
- Prepare signals: Run `prepare_final_signals` for each data type (see the previous step)
- Handle client dictionaries (if needed): Use `prepare_dicts` for categorical signals
Now we will finalize the preparation and generate all configuration files needed for loading by using a third function: `finish_prepare_load`.
finish_prepare_load
Finalizes preparation and loads your data into the repository.
Parameters:
- `WORK_DIR` - path to the working directory (string)
- `REPOSITORY_OUTPUT_DIR` - destination folder for the repository (string)
- `REPO_NAME` - name of the repository (string)
Full Workflow Example
Here's a complete example combining all steps:
Final Script to Create the Data Repository
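A minimal sketch of the three steps might look like the following. The module import, the DataFrame column layout, and all paths and names are assumptions for illustration, not part of the documented API:

```python
import pandas as pd

# Assumed import -- use whatever module your installation provides:
# from etl import prepare_final_signals, prepare_dicts, finish_prepare_load

WORK_DIR = "/data/work"                # working directory for intermediates
REPOSITORY_OUTPUT_DIR = "/data/repo"   # destination folder for the repository
REPO_NAME = "my_repo"                  # repository name (prefix)

# Step 1: prepare each data type (column names here are illustrative).
labs = pd.DataFrame({
    "pid": [1, 1, 2],
    "signal": ["Hemoglobin", "Hemoglobin", "Hemoglobin"],
    "value": [13.2, 12.8, 14.1],
})
# prepare_final_signals(labs, WORK_DIR, "labs", 0, "n")  # batch_size=0, override='n'

# Step 2: mapping dictionaries for categorical signals, if any.
# prepare_dicts(WORK_DIR, "Smoking_Status", def_dict, set_dict)

# Step 3: finalize the preparation and generate the load configuration.
# finish_prepare_load(WORK_DIR, REPOSITORY_OUTPUT_DIR, REPO_NAME)
```

After step 3, the load script is written under WORK_DIR/rep_configs, as described below.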
Look for an on-screen message providing the full path to the script that runs Flow and generates the repository: WORK_DIR/rep_configs/load_with_flow.sh. Run this script and confirm that it completes successfully, with a success message at the end.
Function Reference
1. prepare_final_signals
Processes and tests each data type. Handles batching if needed.
- Arguments:
  - `data_fetcher` or `DataFrame`: Source of your data
  - `workdir`: Working directory for outputs
  - `signal_type`: Name/type of the signal (used for classification)
  - `batch_size`: Batch size (0 = no batching)
  - `override`: `'y'` to overwrite, `'n'` to skip completed signals
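A call following the argument order above might look like this sketch; the DataFrame schema is an assumption, since the expected column layout is not specified here:

```python
import pandas as pd

# Illustrative input: one numeric lab signal per row (column names assumed).
df = pd.DataFrame({
    "pid": [101, 101, 102],
    "signal": ["Glucose", "Glucose", "Glucose"],
    "value": [95.0, 102.5, 88.0],
})

# Hypothetical call, positional arguments in the order listed above:
# prepare_final_signals(df, "/data/work", "labs", 0, "n")
# batch_size=0 disables batching; override='n' skips already-completed signals.
```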
2. prepare_dicts
Creates mapping dictionaries for categorical signals.
- Arguments:
  - `workdir`: Working directory
  - `signal`: Signal name
  - `def_dict`: DataFrame with internal codes and descriptions (optional)
  - `set_dict`: DataFrame mapping client codes to known ontology
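The two dictionary DataFrames could be built as in this sketch; the column names are assumptions chosen to match the descriptions above:

```python
import pandas as pd

# def_dict: internal codes and their human-readable descriptions.
def_dict = pd.DataFrame({
    "code": ["SM0", "SM1", "SM2"],
    "description": ["Never smoker", "Former smoker", "Current smoker"],
})

# set_dict: client-specific codes mapped to the known ontology codes above.
set_dict = pd.DataFrame({
    "client_code": ["NEVER", "EX", "CURR"],
    "code": ["SM0", "SM1", "SM2"],
})

# Hypothetical call for a categorical signal:
# prepare_dicts("/data/work", "Smoking_Status", def_dict, set_dict)
```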
3. finish_prepare_load
Finalizes preparation, generates signals, and loads the repository.
- Arguments:
  - `workdir`: Working directory
  - `dest_folder`: Destination for the repository
  - `dest_rep`: Repository name (prefix)
  - `to_remove` (optional): List of signals to skip
  - `load_only` (optional): List of signals to load only
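The optional arguments are useful while iterating on a partially working pipeline. In this sketch the paths and signal names are placeholders; only the argument names come from the list above:

```python
# Hypothetical call parameters, keyed by the argument names listed above.
args = dict(
    workdir="/data/work",
    dest_folder="/data/repo",
    dest_rep="my_repo",
    to_remove=["Unreliable_Signal"],  # optional: skip these signals entirely
)
# finish_prepare_load(**args)

# Alternatively, rebuild only a subset while debugging a single signal:
# finish_prepare_load("/data/work", "/data/repo", "my_repo",
#                     load_only=["Hemoglobin"])
```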
Extending and Testing
For guidance on extending the process and adding automated tests, see Test Extension.
Validating the ETL Outputs and Tests
Follow this guide.