Model Validation Checklist
Data Distribution and Performance
- Analyze sample distribution over time: count controls and cases by year and month.
- Perform bootstrapping on the validation set, and preferably also on a set from a later time window; for the later set, also assess performance on the same patients (a bootstrap sketch follows this list).
- Evaluate performance (AUC and other metrics) across years, months, and time windows.
- Assess results by age group, sex, and key comorbidities (e.g., diabetes, COPD, CVD).
- Check minimal membership period and presence/absence of key lab tests, if relevant.
- Assess calibration on the same samples used for bootstrapping.
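The bootstrap item above can be scripted directly against a scored validation file. Below is a minimal sketch, assuming a CSV with `y_true`, `y_score`, and `year` columns; the file and column names are illustrative and not part of any standard pipeline.

```python
# Minimal sketch: bootstrapped AUC with a 95% CI, overall and per calendar year.
# The file and column names (y_true, y_score, year) are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample with replacement
        if y_true[idx].min() == y_true[idx].max():        # skip resamples with a single class
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.median(aucs), np.percentile(aucs, 2.5), np.percentile(aucs, 97.5)

df = pd.read_csv("validation_scores.csv")                 # assumed scored validation set
print("overall:", bootstrap_auc(df.y_true, df.y_score))
for year, grp in df.groupby("year"):                      # per-year performance
    print(year, bootstrap_auc(grp.y_true, grp.y_score))
```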
Model Analysis
- Conduct ButWhy analysis:
  - Examine global feature importance, with and without grouping signals.
  - Analyze contributions of individual features: for important features, report mean score, outcome, and Shapley value for each value bin.
- Evaluate coverage and lift for risk groups at various PR cutoffs. For example, determine the prevalence of COPD patients with hospital admissions and the proportion of them captured at the top x, y, z PR cutoffs (a coverage/lift sketch follows this list).
- Summarize the feature matrix: report the mean and CI/STD for each feature to identify outliers or unreasonable values (this can be done on large test/train matrices).
- Compare matrices across years:
  - Analyze score distributions over multiple years.
  - Build a propensity model to differentiate between years and identify features whose distributions have changed (a year-propensity sketch follows this list).
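For the coverage-and-lift item, a small helper like the one below can be used. The `copd_with_admission` flag and file name are illustrative assumptions, and the cutoffs stand in for the x, y, z values chosen for the project.

```python
# Minimal sketch: coverage and lift of a clinical risk group at top-k% score cutoffs.
# Column and file names (y_score, copd_with_admission) are illustrative assumptions.
import numpy as np
import pandas as pd

def coverage_and_lift(df, group_col, cutoffs=(1, 5, 10)):
    df = df.sort_values("y_score", ascending=False)
    overall_rate = df[group_col].mean()                   # prevalence of the risk group
    rows = []
    for pct in cutoffs:
        top = df.head(int(np.ceil(len(df) * pct / 100)))  # top pct% of patients by score
        rows.append({
            "top_pct": pct,
            "coverage": top[group_col].sum() / df[group_col].sum(),  # share of group captured
            "lift": top[group_col].mean() / overall_rate,            # enrichment vs. prevalence
        })
    return pd.DataFrame(rows)

df = pd.read_csv("validation_scores.csv")
print(coverage_and_lift(df, "copd_with_admission"))
```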
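The year-propensity idea can be sketched as follows: train a classifier to separate two yearly matrices and read off which features drive the separation. The file names and the use of a scikit-learn gradient-boosting model are assumptions, and the matrices are assumed to contain only numeric feature columns.

```python
# Minimal sketch: a propensity model that tries to tell two matrix years apart.
# An AUC near 0.5 means the years look alike; high-importance features indicate drift.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

mat_a = pd.read_csv("matrix_2018.csv")     # illustrative yearly matrices,
mat_b = pd.read_csv("matrix_2021.csv")     # numeric feature columns only
X = pd.concat([mat_a, mat_b], ignore_index=True)
y = [0] * len(mat_a) + [1] * len(mat_b)

clf = GradientBoostingClassifier()
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
print("year-separability AUC:", auc)

clf.fit(X, y)
drift = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(drift.head(10))                      # features that differ most between years
```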
Fairness and Bias
- Assess fairness and bias:
  - Without matching: compare across sex, age groups, insurance, race, and socio-demographic factors (an unmatched-comparison sketch follows this list).
  - With matching: control for important clinical or explanatory features.
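A minimal sketch of the unmatched comparison, assuming a scored file with `y_true`, `y_score`, and a demographic column such as `sex` (names are illustrative); a matched comparison would additionally require pairing patients on the clinical covariates of interest.

```python
# Minimal sketch: unmatched per-group comparison of AUC, flag rate, and prevalence.
# Column names (y_true, y_score, sex) are illustrative assumptions.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("validation_scores.csv")
threshold = df.y_score.quantile(0.95)      # e.g. flag the top 5% of scores

for group, grp in df.groupby("sex"):
    auc = roc_auc_score(grp.y_true, grp.y_score)
    flag_rate = (grp.y_score >= threshold).mean()
    print(f"{group}: AUC={auc:.3f}  flag_rate={flag_rate:.3f}  prevalence={grp.y_true.mean():.3f}")
```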
External and Baseline Validation
- Validate externally on different datasets.
- Compare to a simple baseline model: assess not only performance but also which patients are flagged (as sketched below), and use ButWhy analysis to understand differences between the flagged populations.
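One way to compare flagged populations, sketched below under assumed column names (`pid`, `model_score`, `baseline_score`): flag the same share of patients with both models and inspect the disagreement sets, which are natural inputs for a ButWhy comparison.

```python
# Minimal sketch: overlap of patients flagged by the model vs. a simple baseline
# at the same flagging budget. File and column names are illustrative assumptions.
import pandas as pd

df = pd.read_csv("scores_model_and_baseline.csv")   # pid, model_score, baseline_score
k = int(0.05 * len(df))                              # same top-5% budget for both models

model_top = set(df.nlargest(k, "model_score").pid)
base_top = set(df.nlargest(k, "baseline_score").pid)

print("flag overlap:", len(model_top & base_top) / k)
print("flagged by model only:", len(model_top - base_top))
print("flagged by baseline only:", len(base_top - model_top))
```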
Sensitivity and Robustness
- Perform sensitivity analysis (a perturbation sketch follows this list):
  - Add noise to lab values.
  - Shift dates.
  - Remove lab values to simulate missing data.
- Ensure the model applies cleaning procedures to all signals.
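The lab-perturbation items can be scripted as below. The model is assumed to be a fitted estimator with a scikit-learn-style `predict_proba`, and the lab column and file names are illustrative.

```python
# Minimal sketch: sensitivity analysis by adding noise to lab values and
# blanking a fraction of them before re-scoring. Names are illustrative assumptions.
import joblib
import numpy as np
import pandas as pd

def perturb(X, lab_cols, noise_frac=0.05, drop_frac=0.10, seed=0):
    rng = np.random.default_rng(seed)
    Xp = X.copy()
    for col in lab_cols:
        Xp[col] = Xp[col] + rng.normal(0.0, noise_frac * Xp[col].std(), len(Xp))  # additive noise
        Xp.loc[rng.random(len(Xp)) < drop_frac, col] = np.nan                     # simulate missingness
    return Xp

model = joblib.load("model.pkl")                   # assumed: fitted sklearn-style estimator
X = pd.read_csv("test_matrix.csv")
lab_cols = ["hemoglobin", "glucose"]               # illustrative lab features

base = model.predict_proba(X)[:, 1]
pert = model.predict_proba(perturb(X, lab_cols))[:, 1]
print("mean absolute score shift:", np.abs(base - pert).mean())
```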
Applying to New Datasets Without Labels
- Compare the test matrix to the training repository matrix: check feature moments using TestModelExternal or train a propensity model (a moment-comparison sketch follows this list).
- Also compare score distributions, both raw and after matching on key factors.
- Run ButWhy importance analysis on the test set and compare with the training repository.
- Report statistics on outliers detected by cleaning procedures.
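When no equivalent of TestModelExternal is at hand, the moment comparison can be approximated with a few lines of pandas; the file names and the standardized-mean-difference view are assumptions.

```python
# Minimal sketch: compare feature moments between the training-repository matrix
# and an unlabeled test matrix; large standardized mean differences suggest drift.
# File names are illustrative assumptions.
import numpy as np
import pandas as pd

train = pd.read_csv("train_matrix.csv")
test = pd.read_csv("test_matrix.csv")

report = pd.DataFrame({
    "train_mean": train.mean(numeric_only=True),
    "test_mean": test.mean(numeric_only=True),
    "train_std": train.std(numeric_only=True),
})
report["smd"] = (report.test_mean - report.train_mean) / report.train_std.replace(0, np.nan)
print(report.reindex(report.smd.abs().sort_values(ascending=False).index).head(15))
```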
Test Kit for Model Validation
For models in development, external validation with labels, or a silent run, see the tools in this repository: https://github.com/Medial-EarlySign/MR_Tools (for example, under MR_Tools/AutoValidation).