Model Evaluation
Please refer to our suggested Checklist for model evaluation. Common evaluation tools and workflows used in MES:
- bootstrap_app - bootstrap-based performance analysis with cohort and subgroup filtering, e.g. checking performance in sub-cohorts such as a specific time window or age range (see the bootstrap sketch after this list).
- Feature importance and post-processing - see Flow post-processors (a minimal post-processor sketch follows this list).
- Explainability - add model explainers as post-processors. See the Explainers Guide and our patent US20240161005A1 for the MES-specific approach. Because standard Shapley values struggle with high-dimensional, correlated medical data, we developed a specialized extension. The extension was validated by our clinicians in a blinded study against other explainability techniques and was a key component of our award-winning submission to the CMS AI Health Outcomes Challenge. The results are published; some of the process is documented in the Research tab of this wiki. A non-proprietary explainer sketch follows this list.
- Covariate-shift / simulation tools - Simulator (an illustrative covariate-shift check follows this list).
- Automated checks - AutoTest, a pipeline of tests derived from the Model Checklist (example checks follow this list).
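
A minimal sketch of the kind of sub-cohort analysis bootstrap_app performs. The bootstrap_app API itself is not shown in this section, so the function below (`bootstrap_auc`, its `n_boot` parameter, and the boolean `mask` argument) is an illustrative assumption using NumPy arrays and scikit-learn's AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, mask=None, n_boot=1000, seed=0):
    """Bootstrap a 95% CI for AUC, optionally restricted to a sub-cohort mask."""
    rng = np.random.default_rng(seed)
    if mask is not None:  # e.g. an age range or a prediction-time window
        y_true, y_score = y_true[mask], y_score[mask]
    n, stats = len(y_true), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample patients with replacement
        if len(np.unique(y_true[idx])) < 2:  # skip single-class resamples
            continue
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return float(np.mean(stats)), (float(lo), float(hi))
```

For example, `bootstrap_auc(y, p, mask=(age >= 65))` estimates performance with a confidence interval in the 65-and-over sub-cohort.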
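The Flow post-processor interface is documented elsewhere, so the class below assumes a hypothetical `run(model, X, y)` hook; it shows permutation importance as one feature-importance step that could run at the post-processing stage:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

class PermutationImportance:
    """Hypothetical Flow post-processor: permutation feature importance.

    The real post-processor interface may differ; this assumes a run() hook
    that receives the fitted model and a held-out evaluation set.
    """

    def __init__(self, feature_names, n_repeats=5, seed=0):
        self.feature_names = feature_names
        self.n_repeats = n_repeats
        self.rng = np.random.default_rng(seed)

    def run(self, model, X, y):
        # Baseline score on the untouched evaluation set.
        base = roc_auc_score(y, model.predict_proba(X)[:, 1])
        importances = {}
        for j, name in enumerate(self.feature_names):
            drops = []
            for _ in range(self.n_repeats):
                Xp = X.copy()
                self.rng.shuffle(Xp[:, j])  # break the feature/label link
                drops.append(base - roc_auc_score(y, model.predict_proba(Xp)[:, 1]))
            importances[name] = float(np.mean(drops))
        return importances
```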
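The patented MES explainer (US20240161005A1) is not reproduced here. As a placeholder, the sketch below attaches an off-the-shelf SHAP KernelExplainer in the same hypothetical post-processor shape as above, purely to show where an explainer plugs in; it is not the MES-specific Shapley extension:

```python
import shap

class ShapExplainerStep:
    """Placeholder explainer post-processor using off-the-shelf SHAP.

    This is NOT the MES-specific Shapley extension; it only illustrates
    attaching an explainer at the post-processing stage.
    """

    def __init__(self, background):
        self.background = background  # reference sample the explainer integrates over

    def run(self, model, X, y=None):
        f = lambda data: model.predict_proba(data)[:, 1]
        explainer = shap.KernelExplainer(f, self.background)
        return explainer.shap_values(X)  # per-feature attribution for each patient
```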
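The Simulator API is likewise not shown in this section; the function below is an illustrative covariate-shift stress test that perturbs one feature and reports the resulting AUC change:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def shifted_auc(model, X, y, col, delta):
    """Illustrative covariate-shift check (not the Simulator API): add a
    systematic offset `delta` to feature `col` and re-score the model."""
    base = roc_auc_score(y, model.predict_proba(X)[:, 1])
    Xs = X.copy()
    Xs[:, col] += delta  # e.g. drift in a lab value's calibration
    shifted = roc_auc_score(y, model.predict_proba(Xs)[:, 1])
    return base, shifted, shifted - base
```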
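AutoTest itself is only named in this section, so the checks below are hypothetical examples of the kind of tests a checklist-derived pipeline might contain, written as plain assert-style functions:

```python
import numpy as np

def check_no_missing_scores(y_score):
    # Checklist item: every patient receives a score.
    assert not np.isnan(y_score).any(), "model produced NaN scores"

def check_score_range(y_score):
    # Checklist item: probabilities stay in [0, 1].
    assert ((y_score >= 0) & (y_score <= 1)).all(), "scores outside [0, 1]"

def check_subgroup_gap(auc_by_group, max_gap=0.05):
    # Checklist item: performance does not diverge too far across subgroups.
    gap = max(auc_by_group.values()) - min(auc_by_group.values())
    assert gap <= max_gap, f"subgroup AUC gap {gap:.3f} exceeds {max_gap}"
```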