Reviewing ETL loading results
Purpose
This page explains how to inspect and interpret the ETL outputs after running the ETL process. It tells you where to find logs and visualizations, what the most important checks are, and how to spot common issues such as unit mismatches or unexpected value distributions.
Where outputs are stored
All ETL outputs are written under the repository's WORK_DIR.
Key folders to review:
- signal_processings_log/ - per-signal processing logs and diagnostics.
- outputs/ - summary test logs (named tests.*log) produced by the validation checks, plus a directory for each signal and its distribution values.
How-to: quick checklist
- Open outputs/tests.*log (for example outputs/tests.labs.log) to read the automated validation results.
- If the test log reports anomalies, inspect the per-signal directory under outputs/$SIGNAL/ for plots/visualizations and batch-level reports to help confirm unit consistency and distribution shapes.
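The first step of the checklist can be scripted. Below is a minimal sketch that scans the summary test logs for the "There are issues with" marker described in the Mechanism section; the directory layout and log naming follow this page, while the function name and return shape are illustrative, not part of the pipeline.

```python
import glob

def flagged_signals(outputs_dir="outputs"):
    """Collect lines that flag issues in the summary test logs.

    Scans outputs/tests.*log (the naming used by the validation checks)
    and keeps every line containing the "There are issues with" marker.
    Returns {log_path: [flagged lines]} for logs with at least one hit.
    """
    flagged = {}
    for path in glob.glob(f"{outputs_dir}/tests.*log"):
        with open(path) as fh:
            hits = [line.rstrip() for line in fh if "There are issues with" in line]
        if hits:
            flagged[path] = hits
    return flagged
```

A signal that appears in the result still needs manual review; the script only narrows down which per-signal directories to open.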
Example: reading a test log
Here is a representative excerpt from outputs/tests.labs.log (Hemoglobin checks):
Mechanism: what this output tells you
- KLD (Kullback-Leibler divergence) per source indicates how similar that source's value distribution is to the overall signal distribution. Small KLD (<< 1) means similar.
- The sections labelled "There are issues with ..." flag quantile-level discrepancies between the current dataset and a reference distribution. These point to potential unit mismatches, data-entry issues or population differences.
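To make the KLD numbers concrete, here is a small self-contained sketch of a per-source divergence check over binned value histograms. This is not the pipeline's implementation; the function and the histogram representation are assumptions for illustration.

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two histograms.

    p: binned value distribution of a single source (counts or probabilities)
    q: overall binned distribution of the signal across all sources
    Both are normalized here; eps guards against empty bins.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# A source whose histogram matches the overall distribution gives
# KLD ~ 0, the "similar" case described above (KLD << 1); a source
# concentrated in different bins gives a large KLD.
```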
Columns in the discrepancy tables
- q: quantile being compared (for example 0.001, 0.5, 0.999)
- value_0: quantile value in the current dataset
- reference: quantile value in the reference dataset
- ratio1: value_0 / reference
- ratio2: 1 / ratio1
- ratio: max(ratio1, ratio2), used to highlight large deviations
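The column definitions above translate directly into code. The sketch below reproduces one row of the discrepancy table from a quantile value and its reference; the function name and dict layout are illustrative.

```python
def discrepancy_row(q, value_0, reference):
    """Compute the ratio columns of the discrepancy table for one quantile.

    ratio1 = value_0 / reference
    ratio2 = 1 / ratio1
    ratio  = max(ratio1, ratio2), so a large value flags a big
             deviation in either direction.
    """
    ratio1 = value_0 / reference
    ratio2 = 1.0 / ratio1
    return {"q": q, "value_0": value_0, "reference": reference,
            "ratio1": ratio1, "ratio2": ratio2,
            "ratio": max(ratio1, ratio2)}
```

Note that because ratio takes the maximum of the two directions, it is always >= 1, which makes rows easy to sort by severity.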
Interpreting large ratios
Large ratios (for example ~10) often indicate unit mismatches. A common case for Hemoglobin is g/L vs g/dL (multiply g/dL by 10 to get g/L). If you see a consistent factor across quantiles, consider converting units or normalizing the source before further processing.
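A quick way to confirm the "consistent factor across quantiles" pattern is to check whether the value_0/reference ratio is roughly constant. The sketch below does that; the function, its tolerance, and the median-based check are assumptions for illustration, not part of the ETL code.

```python
def consistent_unit_factor(rows, tol=0.15):
    """Detect a shared value_0/reference factor across quantiles.

    A roughly constant factor (for example ~10 for Hemoglobin
    reported in g/L vs g/dL) is the signature of a unit mismatch,
    as opposed to a shape difference that varies by quantile.

    rows: list of (value_0, reference) pairs, one per quantile.
    Returns the shared factor, or None if the ratios disagree
    by more than tol relative to their median.
    """
    ratios = [v / r for v, r in rows]
    mid = sorted(ratios)[len(ratios) // 2]  # median ratio
    if all(abs(x / mid - 1.0) <= tol for x in ratios):
        return mid
    return None
```

If the returned factor is ~10 for Hemoglobin, converting the source (g/dL values multiplied by 10, per the rule above) and re-running the signal should make the discrepancy sections disappear.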
[!IMPORTANT] If there are mismatches in the input, loading does not fail by default. Warnings will appear in the logs. It is your responsibility to review and correct data issues where needed.
Deep-dive: log locations and visual aids
- Signal-specific test log: ETL/outputs/test.$SIGNAL.log
- Signal processing log (runtime messages, dropped lines): ETL/signal_processings_log/$SIGNAL.log
- Batch-level charts and aggregated reports: ETL/signal_processings_log/$SIGNAL/batches/ and ETL/signal_processings_log/$SIGNAL
Practical checks and recommended workflow
- Scan outputs/tests.*log for flagged issues.
- For flagged signals, open signal_processings_log/$SIGNAL/ and inspect the per-signal log for dropped records and warnings.
- Use the charts in ETL/outputs/$SIGNAL to confirm whether a discrepancy comes from unit differences, data entry errors, or genuine population shifts.
- If a unit mismatch is found, apply a unit conversion and re-run the pipeline for that signal.
Visual examples
Example 1: monthly vs yearly distribution (Hemoglobin)
Monthly charts can look noisy when most data come from a single year. In that case, check both monthly and yearly views to avoid false alarms.
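The monthly-vs-yearly comparison amounts to aggregating the same records at two granularities. A minimal stdlib sketch, assuming dates arrive as "YYYY-MM-DD" strings (the function name and input format are illustrative):

```python
from collections import Counter

def monthly_and_yearly_counts(dates):
    """Aggregate record counts per month and per year.

    When most records fall in a single year, the yearly view stays
    stable while the monthly view can look noisy, so compare both
    before flagging a distribution problem.
    """
    monthly = Counter(d[:7] for d in dates)  # "YYYY-MM" prefix
    yearly = Counter(d[:4] for d in dates)   # "YYYY" prefix
    return monthly, yearly
```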

Example 2: distribution shape check
On the right is a smooth (expected) distribution. On the left are unexplained "vibrations"; they may indicate data-quality issues or batch artifacts. Discuss with the dataset owner to confirm.
