Test 12: Fairness
Purpose
Assess model fairness by comparing performance (e.g., sensitivity) across sensitive groups at the same score cutoff. The goal is to ensure similar probability of being flagged by the model when you are a case, regardless of group (e.g., gender, ethnicity).
Required Inputs
From configs/env.sh:
WORK_DIR: Output directory for resultsMODEL_PATH: Path to the modelREPOSITORY_PATH: Path to the data repositoryTEST_SAMPLES: Path to the test samplesBT_JSON_FAIRNESS: Bootstrap JSON for filtering cohortsFAIRNESS_BT_PREFIX: Bootstrap cohort definition for fairness testingconfig/fairness_groups.cfg: Defines the groups to compare- Each line: two or more group definitions (bootstrap cohort filter format, separated by
|), tab-delimited with display names (also separated by|) - Example:
- Each line: two or more group definitions (bootstrap cohort filter format, separated by
How to Run
From your TestKit folder, execute:
What This Test Does
- Uses cohort definitions from
fairness_groups.cfgto compare model performance across groups - Calculates AUC, sensitivity and specificity, at common cutoffs (5%, 10%)
- Performs statistical tests (chi-square) to assess significance of differences
- If unfairness is detected, applies matching (e.g., by age) and re-evaluates fairness
Output Location
$WORK_DIR/fairness/fairness_report.tsv: Summary table comparing sensitivity, specificity, and AUC at 5% and 10% cutoffsfairness_report.*: Statistical chi-square results for sensitivity differences (one file for each cutoff)Graph_fairness: Plots sensitivity vs. score threshold for each group (with confidence intervals)graph_matched: Plots after matching (e.g., by age) if fairness is not met
How to Interpret Results
- Look for low chi-square values and similar sensitivity across groups in
fairness_report.tsv - Use
Graph_fairnessto visually compare sensitivity curves - If needed, review matched results in
graph_matchedto see if fairness improves