Finalizing the Load Process
This step completes the ETL pipeline and prepares the repository for use.
Recap of earlier steps
- Prepare signals: Run prepare_final_signals for each data type (see previous step)
- Handle client dictionaries (if needed): Use prepare_dicts for categorical signals
Now we will finalize the preparation and generate all configuration files needed for loading by using a third function: finish_prepare_load.
finish_prepare_load
Finalizes preparation and loads your data into the repository.
Parameters:
- WORK_DIR: path to the working directory (string)
- REPOSITORY_OUTPUT_DIR: destination folder for the repository (string)
- REPO_NAME: name of the repository (string)
Full Workflow Example
Here's a complete example combining all steps:
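A minimal sketch of the call order, assuming the argument names listed in the Function Reference. The three functions here are local stand-ins that only record their calls (the real ones come from the ETL library), the default values in their signatures are assumptions, and every path, signal type, and repository name is a placeholder.

```python
# Stand-ins for the real ETL functions: they only record the call order.
# Default values below are assumptions made for this sketch.
calls = []

def prepare_final_signals(data, workdir, signal_type, batch_size=0, override="n"):
    calls.append(("prepare_final_signals", signal_type))

def prepare_dicts(workdir, signal, def_dict=None, set_dict=None):
    calls.append(("prepare_dicts", signal))

def finish_prepare_load(workdir, dest_folder, dest_rep, to_remove=None, load_only=None):
    calls.append(("finish_prepare_load", dest_rep))

# Placeholder values (assumptions, not real paths or names).
WORK_DIR = "/tmp/etl_work"
REPOSITORY_OUTPUT_DIR = "/tmp/etl_repo"
REPO_NAME = "my_repo"

# Hypothetical data sources, one per data type (empty lists stand in for
# a data fetcher or DataFrame).
sources = {"demographic": [], "labs": [], "diagnosis": []}

# 1. Prepare signals for each data type.
for signal_type, data in sources.items():
    prepare_final_signals(data, WORK_DIR, signal_type, batch_size=0, override="n")

# 2. Build dictionaries for categorical signals (only if needed).
prepare_dicts(WORK_DIR, "DIAGNOSIS", def_dict=None, set_dict=None)

# 3. Finalize the preparation and load the repository.
finish_prepare_load(WORK_DIR, REPOSITORY_OUTPUT_DIR, REPO_NAME)

for name, target in calls:
    print(name, target)
```

The point of the sketch is the ordering: every data type goes through prepare_final_signals first, dictionaries are handled next, and finish_prepare_load runs once at the end.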
Function Reference
1. prepare_final_signals
Processes and tests each data type. Handles batching if needed.
- Arguments:
  - data_fetcher or DataFrame: Source of your data
  - workdir: Working directory for outputs
  - signal_type: Name/type of the signal (used for classification)
  - batch_size: Batch size (0 = no batching)
  - override: 'y' to overwrite, 'n' to skip completed signals
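The batch_size and override semantics can be illustrated with a small sketch. This is not the library's implementation; iter_batches and should_process are hypothetical helpers written only to show what "0 = no batching" and 'y'/'n' mean in practice.

```python
def iter_batches(rows, batch_size):
    """Yield rows in chunks; batch_size == 0 means a single chunk with everything."""
    if batch_size < 0:
        raise ValueError("batch_size must be 0 or a positive integer")
    rows = list(rows)
    if batch_size == 0:
        yield rows
        return
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def should_process(signal_type, completed, override):
    """'y' always reprocesses; 'n' skips signals already marked as completed."""
    if override not in ("y", "n"):
        raise ValueError("override must be 'y' or 'n'")
    return override == "y" or signal_type not in completed

rows = range(7)
print(list(iter_batches(rows, 0)))  # one batch holding all 7 rows
print(list(iter_batches(rows, 3)))  # batches of 3, 3, and 1 rows

completed = {"demographic"}
print(should_process("demographic", completed, "n"))  # already done, so skipped
print(should_process("demographic", completed, "y"))  # overwrite requested
```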
2. prepare_dicts
Creates mapping dictionaries for categorical signals.
- Arguments:
  - workdir: Working directory
  - signal: Signal name
  - def_dict: DataFrame with internal codes and descriptions (optional)
  - set_dict: DataFrame mapping client codes to known ontology
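As a rough illustration of the two DataFrames, the sketch below builds a def_dict and a set_dict with pandas. The column names, the smoking codes, and the ontology codes are all made-up placeholders; the layout that prepare_dicts actually expects may differ, so check it against the library before reuse.

```python
import pandas as pd

# Hypothetical layout: internal codes with human-readable descriptions.
def_dict = pd.DataFrame({
    "code": ["SMK_0", "SMK_1", "SMK_2"],
    "description": ["Never smoked", "Former smoker", "Current smoker"],
})

# Hypothetical layout: client codes mapped to codes of a known ontology.
# The ontology codes are placeholders, not real identifiers.
set_dict = pd.DataFrame({
    "client_code": ["SMK_1", "SMK_2"],
    "ontology_code": ["ONTOLOGY_A", "ONTOLOGY_B"],
})

# Sanity check before handing the frames on: every mapped client code
# should also be defined in def_dict.
undefined = set(set_dict["client_code"]) - set(def_dict["code"])
print("undefined client codes:", sorted(undefined))

# The frames would then be passed along the lines of:
# prepare_dicts(workdir, signal, def_dict, set_dict)
```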
3. finish_prepare_load
Finalizes preparation, generates signals, and loads the repository.
- Arguments:
  - workdir: Working directory
  - dest_folder: Destination for the repository
  - dest_rep: Repository name (prefix)
  - to_remove (optional): List of signals to skip
  - load_only (optional): List of signals to load only
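One way to picture how to_remove and load_only narrow the set of loaded signals is the sketch below. select_signals is a hypothetical helper, not part of the library, and the behaviour when both lists are given (load_only applied first, then to_remove) is an assumption.

```python
def select_signals(all_signals, to_remove=None, load_only=None):
    """Return the signals that would be loaded, preserving input order."""
    skip = set(to_remove or [])
    only = set(load_only or [])
    selected = []
    for name in all_signals:
        if only and name not in only:
            continue  # load_only given: ignore everything not listed
        if name in skip:
            continue  # to_remove given: drop listed signals
        selected.append(name)
    return selected

# Placeholder signal names for illustration.
signals = ["BDATE", "GENDER", "Hemoglobin", "DIAGNOSIS"]

print(select_signals(signals))                                   # all four signals
print(select_signals(signals, to_remove=["DIAGNOSIS"]))          # all but DIAGNOSIS
print(select_signals(signals, load_only=["BDATE", "GENDER"]))    # only the two listed
```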
Extending and Testing
For guidance on extending the process and adding automated tests, see Test Extension.