ETL Process – Dynamic Testing of Signals
You can define both global tests (applied across all ETL processes) and local tests (specific to a given ETL process).
A local test overrides a global test when both have the same name in corresponding paths.
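One way to picture the override rule is a merge in which local tests are read after global ones, so a local test replaces a global test of the same name. The sketch below uses a hypothetical helper, collect_tests, and is not the framework's own code. It assumes that "same name" means the same test file name inside the same group subdirectory.

```python
from pathlib import Path


def collect_tests(global_dir: str, local_dir: str, group: str) -> dict[str, Path]:
    """Map test-file name -> path for one group subdirectory.

    Hypothetical helper: global tests are read first, then local tests
    with the same file name replace them.
    """
    tests: dict[str, Path] = {}
    for root in (global_dir, local_dir):  # order matters: local wins
        group_dir = Path(root) / group
        if group_dir.is_dir():
            for path in sorted(group_dir.glob("*.py")):
                tests[path.name] = path
    return tests
```

Because the dictionary is keyed by file name, the last directory read (local) decides which file a name points to. Global tests with no local counterpart are kept unchanged.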
Test Locations
- Global tests: $MR_ROOT/Tools/RepoLoadUtils/common/ETL_Infra/tests
- Local tests: $CODE_DIR/tests
The code is executed from $MR_ROOT/Tools/RepoLoadUtils/common/ETL_Infra, so relative paths (for example, to config files or dictionaries) are resolved against that directory.
Test Organization
- Each test directory (global or local) contains subdirectories, each holding a group of tests.
- Each subdirectory name corresponds to either:
  - a signal name, or
  - a group of signals (e.g., "labs", "cbc").
- The tests in a subdirectory run only on the signals that match its name.
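The matching rule can be sketched as a name comparison between each test subdirectory and the signal plus the groups it belongs to. This page does not say how group membership is defined, so the signal_groups mapping, the helper dirs_for_signal, and the signal name Hemoglobin are all hypothetical.

```python
def dirs_for_signal(
    signal: str, signal_groups: dict[str, set[str]], test_dirs: list[str]
) -> list[str]:
    """Return the test subdirectories that apply to `signal`.

    A subdirectory applies when its name equals the signal name or one of
    the groups the signal belongs to. `signal_groups` is a hypothetical
    lookup standing in for however the framework defines groups.
    """
    names = {signal} | signal_groups.get(signal, set())
    return [d for d in test_dirs if d in names]


# Hypothetical example: Hemoglobin belongs to both the "labs" and "cbc" groups.
groups = {"Hemoglobin": {"labs", "cbc"}}
print(dirs_for_signal("Hemoglobin", groups, ["labs", "cbc", "BMI"]))
```

A signal with no matching subdirectory gets an empty list, which mirrors the rule that only matching signals are tested.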
Test Function Format
Each test file must define a function named Test with the following arguments and return value.
Arguments:
- df: Input dataframe containing the signal to test
- si: Signal information object, including:
  - si.t_ch: Array of time channel types (i = int, f = float, etc.)
  - si.v_ch: Array of value channel types
- codedir: Path to the ETL code (useful for accessing the config folder)
- workdir: Working directory for storing outputs
Return value:
- True if the test passes
- False if the test fails
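A minimal test file might look like the sketch below. It assumes Test receives its arguments positionally in the order listed above. It also assumes the dataframe has a pid column plus one time_<i> and value_<i> column per channel, which matches the column names used by the example test on this page. The report file name missing_columns.txt is hypothetical; codedir is accepted but unused here.

```python
import os

import pandas as pd


def Test(df: pd.DataFrame, si, codedir: str, workdir: str) -> bool:
    # Assumption: one column per channel, named time_<i> and value_<i>.
    expected = ["pid"]
    expected += [f"time_{i}" for i in range(len(si.t_ch))]
    expected += [f"value_{i}" for i in range(len(si.v_ch))]
    missing = [c for c in expected if c not in df.columns]
    if missing:
        # Write the missing column names to workdir so the failure can be inspected.
        with open(os.path.join(workdir, "missing_columns.txt"), "w") as fh:
            fh.write("\n".join(missing))
        return False
    return True
```

Deriving the expected columns from si.t_ch and si.v_ch keeps the same test usable for signals with different numbers of channels.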
Example Test
Path: $MR_ROOT/Tools/RepoLoadUtils/common/tests/labs/test_non_nulls.py
This test verifies that no more than 1% of the values in pid, time_0, and value_0 are null for every signal in the labs group.
You can copy it into your local tests directory (for example, $CODE_DIR/tests/labs) and adjust the threshold as needed.
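The contents of test_non_nulls.py are not reproduced on this page. The following is a hypothetical reimplementation of the check it describes, with the 1% threshold pulled out into a constant so that a local copy can change it. Reporting failures with print is an assumption about how a test might surface details.

```python
import pandas as pd

# Maximum allowed fraction of null values per column; change this in a local copy.
MAX_NULL_FRACTION = 0.01
CHECKED_COLUMNS = ["pid", "time_0", "value_0"]


def Test(df: pd.DataFrame, si, codedir: str, workdir: str) -> bool:
    passed = True
    for col in CHECKED_COLUMNS:
        if col not in df.columns:
            print(f"column {col} is missing")
            passed = False
            continue
        # isna().mean() is the fraction of rows where the column is null.
        null_fraction = df[col].isna().mean() if len(df) else 0.0
        if null_fraction > MAX_NULL_FRACTION:
            print(f"{col}: {null_fraction:.2%} nulls exceeds {MAX_NULL_FRACTION:.0%}")
            passed = False
    return passed
```

The loop checks every column before returning, so a single run reports all offending columns rather than only the first one.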
Plotting Graphs
To generate HTML plots, use the plot_graph function. It accepts either:
- A dataframe with two columns, or
- A dictionary {name: dataframe}, to plot multiple series.
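Both input shapes can be built with plain pandas. This sketch produces a two-column dataframe of record counts per year, and a dictionary of such dataframes with one entry per signal. It assumes time_0 stores dates as YYYYMMDD integers, which this page does not specify. The helper names are hypothetical, and the plot_graph call itself is omitted because its parameters are not documented here.

```python
import pandas as pd


def yearly_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Two-column dataframe (year, count) of records per year.

    Assumes time_0 holds dates as YYYYMMDD integers (hypothetical format).
    """
    years = df["time_0"] // 10000
    counts = years.value_counts().sort_index()
    return pd.DataFrame({"year": counts.index, "count": counts.values})


def yearly_counts_by_signal(frames: dict[str, pd.DataFrame]) -> dict[str, pd.DataFrame]:
    """{name: dataframe} mapping with one series per signal, for multi-series plots."""
    return {name: yearly_counts(df) for name, df in frames.items()}
```

The first function yields the single-dataframe shape and the second yields the dictionary shape.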
Running Tests on Signals
You can run or rerun tests with:
- The --signal option accepts multiple signals as a comma-separated list.