Test 09: Coverage
Purpose
Verify that the model correctly identifies and flags high-risk groups, ensuring coverage of critical populations as defined by custom rules.
Required Inputs
From configs/env.sh:
- WORK_DIR: Output directory for results
- MODEL_PATH: Path to the model
- REPOSITORY_PATH: Path to the data repository
- TEST_SAMPLES: Path to the test samples
configs/coverage_groups.py: Python file with pandas rules that define high-risk groups
How to Run
From your TestKit folder, execute:
What This Test Does
- Uses custom rules (from coverage_groups.py) to define high-risk cohorts in your data
- Checks how well the model flags these groups compared to random selection
- Calculates coverage metrics and lift for each group at multiple score cutoffs
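The metrics in the last step can be sketched in a few lines of plain Python. This is a minimal illustration, not the TestKit implementation: the function name `coverage_at_cutoff`, its arguments, and the toy scores are all hypothetical, and the cutoff is expressed here as a top-scoring fraction of the population (ties are broken by input order).

```python
def coverage_at_cutoff(scores, in_group, top_fraction):
    """Flag the top-scoring fraction and report coverage, precision and lift."""
    n = len(scores)
    group_size = sum(1 for g in in_group if g)
    if n == 0 or group_size == 0:
        raise ValueError("need a non-empty population and a non-empty group")

    # Number of samples flagged at this cutoff (at least one)
    n_flagged = max(1, round(n * top_fraction))

    # Rank samples by descending score and count group members among the flagged
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in order[:n_flagged] if in_group[i])

    precision = hits / n_flagged      # share of flagged samples that are in the group
    prevalence = group_size / n       # share of the whole population in the group
    return {
        "flagged": n_flagged,
        "hits": hits,
        "coverage": hits / group_size,    # share of the group that was flagged
        "precision": precision,
        "lift": precision / prevalence,   # 1.0 would match random selection
    }


# Toy data (made up for illustration): 10 samples, 3 of them in the group
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
in_group = [True, True, False, True, False, False, False, False, False, False]

for frac in (0.2, 0.5):
    print(frac, coverage_at_cutoff(scores, in_group, frac))
```

Random selection would give a precision equal to the group's prevalence, so dividing precision by prevalence shows how far the model improves on random selection.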
Output Location
- Main log: ${WORK_DIR}/09.test_coverage.log
Example: Defining a Risk Group
Suppose we want to flag undiagnosed CKD patients with low eGFR (<65):
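The exact structure that coverage_groups.py expects is not shown in this section, so the following is only a sketch of what a pandas rule for this cohort might look like. The function name `low_egfr_undiagnosed` and the column names `eGFR` and `ckd_diagnosed` are assumptions made for illustration, and the four patients are made-up toy values.

```python
import pandas as pd


def low_egfr_undiagnosed(df: pd.DataFrame) -> pd.Series:
    """Boolean mask: eGFR below 65 and no existing CKD diagnosis.

    Column names are hypothetical; adapt them to your own data.
    """
    return (df["eGFR"] < 65) & (~df["ckd_diagnosed"])


# Toy data for illustration only
patients = pd.DataFrame({
    "eGFR": [58.0, 72.5, 61.0, 90.0],
    "ckd_diagnosed": [False, False, True, False],
})

mask = low_egfr_undiagnosed(patients)
print(patients[mask])  # only the first patient satisfies both conditions
```

Writing the rule as a boolean mask keeps it composable: further conditions (age bands, lab values) can be added with `&` and `|` without changing how the cohort is counted.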
Example Output
How to Interpret Results
- Review the lift values for each cutoff: a lift of 1 is what random selection would give, so a high lift means the model flags high-risk patients much more often than random selection would
- Check the percentage of the cohort covered at each cutoff
- Use these metrics to validate that the model is useful for identifying key populations
Interpreting the example output: the cohort is patients with eGFR below 65 (19,426 patients, or 3.2% of the total population of 605,636). At the 1% population cutoff (score≥0.51042), the model flags 6,057 patients. Within this flagged group:
- 2,246 patients (or 11.6% of the total eGFR<65 group) are correctly identified.
- The "lift" is 11.6, meaning the flagged patients are 11.6 times more likely to have eGFR<65 than a randomly selected patient.
- The Precision (or probability of having eGFR<65 in the flagged group) is 37.1% (2,246/6,057). While most flagged patients have eGFR≥65, over a third have eGFR<65.
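The figures above can be reproduced directly from the four counts in the example. The sketch below only redoes that arithmetic; the variable names are chosen here for readability and do not come from the test's output.

```python
population = 605_636   # total patients
cohort = 19_426        # patients with eGFR < 65
flagged = 6_057        # patients flagged at the 1% cutoff
hits = 2_246           # flagged patients who are in the cohort

prevalence = cohort / population   # share of the population in the cohort
coverage = hits / cohort           # share of the cohort that was flagged
precision = hits / flagged         # share of flagged patients in the cohort
lift = precision / prevalence      # improvement over random selection

print(f"prevalence: {prevalence:.1%}")  # 3.2%
print(f"coverage:   {coverage:.1%}")    # 11.6%
print(f"precision:  {precision:.1%}")   # 37.1%
print(f"lift:       {lift:.1f}")        # 11.6
```

The coverage percentage and the lift share the value 11.6 here because the cutoff flags 1% of the population: lift equals coverage divided by the flagged fraction, so 11.6% / 1% gives 11.6.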