
Model Validation Checklist

Data Distribution and Performance

  • Analyze sample distribution over time: count controls and cases by year and month.
  • Perform bootstrapped evaluation on the validation set (and, preferably, on a future time period). For the future period, also assess performance on the same patients, to separate changes in the population from changes in model behavior.
    • Evaluate performance (AUC and other metrics) across years, months, and time windows (see the bootstrapping sketch after this list).
    • Assess results by age group, sex, and key comorbidities (e.g., diabetes, COPD, CVD).
    • Check minimal membership period and presence/absence of key lab tests, if relevant.
  • Assess calibration on the same samples used for bootstrapping (see the calibration sketch after this list).
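
A minimal sketch of the bootstrapped evaluation above, assuming a pandas DataFrame with illustrative column names (score, label, year, sex, age_group); substitute your own columns and metrics:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def bootstrap_auc(df, n_boot=500, seed=42):
    """Return mean AUC and a 95% percentile CI over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_boot):
        sample = df.sample(n=len(df), replace=True, random_state=rng.integers(1 << 31))
        if sample["label"].nunique() < 2:
            continue  # AUC is undefined on single-class resamples
        aucs.append(roc_auc_score(sample["label"], sample["score"]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    return np.mean(aucs), (lo, hi)

df = pd.read_csv("validation_scores.csv")  # hypothetical file
print("overall:", bootstrap_auc(df))
# Repeat within each time window and subgroup.
for col in ["year", "sex", "age_group"]:
    for value, grp in df.groupby(col):
        mean_auc, ci = bootstrap_auc(grp)
        print(f"{col}={value}: AUC={mean_auc:.3f} CI={ci}")
```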
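For the calibration check, a sketch using scikit-learn's calibration_curve on the same DataFrame (again, score and label are assumed column names; score should be a predicted probability):

```python
from sklearn.calibration import calibration_curve

# Reuses `df` from the bootstrapping sketch above.
frac_pos, mean_pred = calibration_curve(
    df["label"], df["score"], n_bins=10, strategy="quantile"
)
for mp, fp in zip(mean_pred, frac_pos):
    # ratio near 1.0 in every bin indicates good calibration
    print(f"predicted={mp:.3f}  observed={fp:.3f}  ratio={fp / max(mp, 1e-9):.2f}")
```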

Model Analysis

  • Conduct ButWhy analysis:
    • Examine global feature importance, with and without grouping signals.
    • Analyze contributions of individual features: for important features, report the mean score, outcome rate, and Shapley value for each value bin.
  • Evaluate coverage and lift for risk groups at various percentile-rank (PR) score cutoffs. For example, determine the prevalence of COPD patients with hospital admissions and the proportion of them captured at the top x, y, z PR cutoffs (see the coverage/lift sketch after this list).
  • Print the feature matrix: report the mean and CI/STD for each feature to identify outliers or unreasonable values (this can be done on the large test/train matrices).
  • Compare matrices across years:
    • Analyze score distributions over multiple years.
    • Build a propensity model to differentiate between years and identify drifting features (see the propensity sketch after this list).
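
A sketch of the coverage/lift evaluation, interpreting a PR cutoff as the top percentile of the score distribution; `df`, `has_copd`, and `admitted` are illustrative stand-ins for your cohort and risk-group definitions:

```python
import numpy as np

def coverage_and_lift(df, risk_mask, top_pcts=(1, 5, 10)):
    """Cutoffs come from the full population; coverage is measured on the risk group."""
    base_rate = risk_mask.mean()
    print(f"risk-group prevalence: {base_rate:.2%}")
    for pct in top_pcts:
        cutoff = np.percentile(df["score"], 100 - pct)
        flagged = df["score"] >= cutoff
        capture = (flagged & risk_mask).sum() / risk_mask.sum()
        # lift = P(risk | flagged) / P(risk)
        lift = (flagged & risk_mask).mean() / (flagged.mean() * base_rate)
        print(f"top {pct}%: capture={capture:.2%} lift={lift:.2f}")

risk = (df["has_copd"] == 1) & (df["admitted"] == 1)  # illustrative definition
coverage_and_lift(df, risk)
```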
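And a sketch of the year-propensity comparison, assuming two feature matrices X_2019 and X_2021 as pandas DataFrames with identical columns; an AUC near 0.5 means the years are hard to tell apart:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X = pd.concat([X_2019, X_2021], ignore_index=True).fillna(0)
y = np.r_[np.zeros(len(X_2019)), np.ones(len(X_2021))]

clf = GradientBoostingClassifier()
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
print(f"year-discrimination AUC: {auc:.3f}")

# The most important features are the ones that changed most between years.
clf.fit(X, y)
drift = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(drift.head(10))
```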

Fairness and Bias

  • Assess fairness and bias:
    • Without matching: compare across sex, age groups, insurance, race, and socio-demographic factors.
    • With matching: control for important clinical or explanatory features (see the fairness sketch after this list).
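
A sketch of the unmatched comparison, reusing `df` from the earlier sketches; the group columns are assumptions about your data. A crude matched variant stratifies on a key feature before comparing:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

cutoff = np.percentile(df["score"], 95)  # e.g., a top-5% PR threshold

# Unmatched: compare discrimination and flag rates across groups.
for col in ["sex", "age_group", "race", "insurance"]:
    for value, grp in df.groupby(col):
        if grp["label"].nunique() < 2:
            continue  # AUC undefined for single-class groups
        auc = roc_auc_score(grp["label"], grp["score"])
        flag_rate = (grp["score"] >= cutoff).mean()
        print(f"{col}={value}: AUC={auc:.3f} flag_rate={flag_rate:.2%} n={len(grp)}")

# Matched (crude): compare sexes within each age stratum.
for age, stratum in df.groupby("age_group"):
    by_sex = {s: roc_auc_score(g["label"], g["score"])
              for s, g in stratum.groupby("sex") if g["label"].nunique() > 1}
    print(age, by_sex)
```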

External and Baseline Validation

  • Validate externally on different datasets.
  • Compare to a simple baseline model: assess not only performance but also which patients are flagged. Use ButWhy analysis to understand differences between the flagged populations (see the overlap sketch after this list).
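
A sketch of the flagged-population comparison; `pid`, `score`, and `base_score` (a simple baseline such as age plus a few labs) are assumed columns:

```python
top_k = int(0.05 * len(df))  # flag the top 5% by each scorer

model_flagged = set(df.nlargest(top_k, "score")["pid"])
base_flagged = set(df.nlargest(top_k, "base_score")["pid"])

overlap = len(model_flagged & base_flagged) / top_k
print(f"overlap of flagged patients: {overlap:.1%}")

# Patients flagged only by the full model are the ones to inspect with ButWhy.
only_model = df[df["pid"].isin(model_flagged - base_flagged)]
```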

Sensitivity and Robustness

  • Perform sensitivity analysis (see the sketch after this list):
    • Add noise to lab values.
    • Shift dates.
    • Remove lab values to simulate missing data.
  • Ensure the model applies cleaning procedures to all signals.
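
A sketch of the noise and missingness perturbations; `model`, `X`, `y`, and the lab_ column-name convention are assumptions about your pipeline. Date shifts usually require regenerating the feature matrix from the repository, so they are only noted here:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
lab_cols = [c for c in X.columns if c.startswith("lab_")]  # hypothetical naming

def auc_of(X_pert):
    return roc_auc_score(y, model.predict_proba(X_pert)[:, 1])

# Add 5% relative noise to lab values.
X_noise = X.copy()
X_noise[lab_cols] *= rng.normal(1.0, 0.05, size=X_noise[lab_cols].shape)

# Remove 20% of lab values at random to simulate missing data.
X_miss = X.copy()
X_miss[lab_cols] = X_miss[lab_cols].mask(rng.random(X_miss[lab_cols].shape) < 0.2)

for name, Xp in [("baseline", X), ("lab noise", X_noise), ("20% labs missing", X_miss)]:
    print(f"{name}: AUC={auc_of(Xp):.3f}")
```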

Applying to New Datasets Without Labels

  • Compare the test matrix to the training repository matrix: check feature moments using TestModelExternal or train a propensity model (see the moment-comparison sketch after this list).
    • Also compare score distributions, both raw and after matching on key factors.
  • Run ButWhy importance analysis on the test set and compare with the training repository.
  • Report statistics on outliers detected by cleaning procedures.
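
A sketch of the moment comparison as a fallback when TestModelExternal is unavailable; X_train and X_test are the two feature matrices, and the standardized mean difference serves as a simple per-feature drift score (the propensity-model alternative works as in the year-comparison sketch above):

```python
import pandas as pd

train_stats = X_train.describe().T[["mean", "std"]]
test_stats = X_test.describe().T[["mean", "std"]]
cmp = train_stats.join(test_stats, lsuffix="_train", rsuffix="_test")

# Standardized mean difference per feature; large values flag drifting features.
cmp["smd"] = (cmp["mean_train"] - cmp["mean_test"]).abs() / cmp["std_train"].replace(0, 1)
print(cmp.sort_values("smd", ascending=False).head(15))
```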

Test Kit for Model Validation

For models in development, external validation with labels, or silent runs, see the tools in this repository: https://github.com/Medial-EarlySign/MR_Tools (for example, under MR_Tools/AutoValidation).