ButWhy experiment results
Experiment models
NWP_Flu
The model has 22 features, most of them binary (Drugs and Diagnosis category sets). The non-categorical features are: Age, Smoking, SpO2, Resp_Rate, Flu.nsamples, Complications.nsamples, Membership. In this run we added the Shapley Gibbs explainer (22 features is manageable for Shapley Gibbs; see the appendix for more details). Scores range from 1 (worst) to 5 (best). The table below is the score histogram over 18 flu examples. We also report the mean of the square roots of the 1-5 scores, which weights improvements on low scores more heavily than on high ones (see the sketch after the table); it makes no big difference here.
| Explainer_name | 1 | 2 | 3 | 4 | 5 | Mean_Score | Mean_of_Sqrt_Score |
|---|---|---|---|---|---|---|---|
| Tree_with_cov | 0 | 2 | 3 | 8 | 5 | 3.888889 | 1.955828857 |
| Tree | 0 | 2 | 3 | 10 | 3 | 3.777778 | 1.929599082 |
| SHAP_Gibbs_LightGBM | 0 | 1 | 7 | 8 | 2 | 3.611111 | 1.889483621 |
| missing_shap | 0 | 1 | 9 | 4 | 4 | 3.611111 | 1.885941263 |
| LIME_GAN | 1 | 4 | 7 | 4 | 2 | 3.111111 | 1.736296992 |
| SHAP_GAN | 2 | 4 | 6 | 6 | 0 | 2.888889 | 1.669397727 |
| knn | 0 | 7 | 6 | 5 | 0 | 2.888889 | 1.682877766 |
| knn_with_th | 8 | 2 | 5 | 2 | 1 | 2.222222 | 1.42915273 |
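A minimal sketch of how the two summary columns are derived from a score histogram (plain Python; the function name is illustrative):

```python
# Compute Mean_Score and Mean_of_Sqrt_Score from a 1-5 score histogram.
import math

def summarize(hist):
    """hist maps score (1-5) -> count of samples that received it."""
    n = sum(hist.values())
    mean_score = sum(s * c for s, c in hist.items()) / n
    # sqrt compresses the high end, so gains on low scores weigh relatively more
    mean_sqrt = sum(math.sqrt(s) * c for s, c in hist.items()) / n
    return mean_score, mean_sqrt

# Tree_with_cov row from the flu table: counts for scores 1..5
print(summarize({1: 0, 2: 2, 3: 3, 4: 8, 5: 5}))  # -> (3.888..., 1.955...)
```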
Summary: in this simple case of 22 features, Tree_with_cov performs best, followed by the regular Tree. Not far behind are SHAP_Gibbs_LightGBM and missing_shap, which perform similarly. References to experiment results:
- compare_blinded.tsv - the blinded experiment; for each sample the explainers' outputs are randomly shuffled. Also in xlsx format: compare_blinded.xlsx
- map.ids.tsv - the order of each explainer
- summary.tsv - results for each sample, with explainers aligned (unblinded), produced by joining map.ids.tsv with compare_blinded.tsv (sketched below)
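A minimal unblinding sketch, assuming pandas and assuming column names like sample_id, slot, score, and explainer_name (the real TSVs may use different headers):

```python
# Sketch of unblinding: join the per-sample explainer order (map.ids.tsv)
# back onto the blinded ratings (compare_blinded.tsv) to get aligned results.
import pandas as pd

blinded = pd.read_csv("compare_blinded.tsv", sep="\t")   # sample_id, slot, score
mapping = pd.read_csv("map.ids.tsv", sep="\t")           # sample_id, slot, explainer_name

summary = blinded.merge(mapping, on=["sample_id", "slot"])
summary.to_csv("summary.tsv", sep="\t", index=False)

# Per-explainer score histogram, as in the tables above:
print(summary.pivot_table(index="explainer_name", columns="score",
                          values="sample_id", aggfunc="count", fill_value=0))
```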
CRC
| Explainer_name | 1 | 2 | 3 | 4 | 5 | No_score | Mean_Score | Mean_of_Sqrt_Score |
|---|---|---|---|---|---|---|---|---|
| Tree_with_cov | 0 | 1 | 13 | 21 | 3 | 1 | 3.684211 | 1.911555 |
| Tree | 0 | 3 | 16 | 17 | 2 | 1 | 3.473684 | 1.853358 |
| LIME_GAN | 0 | 15 | 11 | 12 | 0 | 1 | 2.921053 | 1.691204 |
| SHAP_GAN | 0 | 14 | 16 | 6 | 2 | 1 | 2.894737 | 1.683788 |
| missing_shap | 4 | 18 | 7 | 8 | 1 | 1 | 2.578947 | 1.574112 |
| knn | 6 | 17 | 9 | 6 | 0 | 1 | 2.394737 | 1.516581 |
| knn_with_th | 6 | 18 | 10 | 4 | 0 | 1 | 2.315789 | 1.494115 |
References to experiment results:
- compare_blinded.tsv - the blinded experiment; for each sample the explainers' outputs are randomly shuffled. Also in xlsx format: compare_blinded_CRC.xlsx
- map.ids.tsv - the order of each explainer
- summary.sum.tsv - results for each sample, with explainers aligned (unblinded), produced by joining map.ids.tsv with compare_blinded.tsv
Pre2D
| Explainer_name | 1 | 2 | 3 | 4 | 5 | No_score | Mean_Score | Mean_of_Sqrt_Score |
|---|---|---|---|---|---|---|---|---|
| SHAP_GAN | 0 | 4 | 43 | 86 | 5 | 2 | 3.666667 | 1.908082456 |
| LIME_GAN | 0 | 3 | 49 | 78 | 7 | 3 | 3.649635 | 1.903398585 |
| Tree | 0 | 5 | 48 | 82 | 3 | 2 | 3.601449 | 1.890708047 |
| Tree_with_cov | 0 | 10 | 63 | 63 | 2 | 2 | 3.413043 | 1.838648351 |
| missing_shap | 1 | 26 | 76 | 34 | 0 | 3 | 3.043796 | 1.732886234 |
| knn | 3 | 43 | 61 | 29 | 2 | 2 | 2.884058 | 1.680713177 |
| knn_with_th | 22 | 36 | 56 | 22 | 2 | 2 | 2.608696 | 1.582454126 |
References to experiment results:
- compare_blinded.tsv - the blinded experiment; for each sample the explainers' outputs are randomly shuffled. Also in xlsx format:
- map.ids.tsv - the order of each explainer
- summary.tsv - results for each sample, with explainers aligned (unblinded), produced by joining map.ids.tsv with compare_blinded.tsv
Conclusions
Summary table of all experiments (columns labeled 1 hold the mean score, columns labeled 0.5 the mean of the sqrt scores; L1 and L0.5 are the unweighted means across the three experiments, checked in the snippet after the table):
| Method | Flu 1 | Flu 0.5 | CRC 1 | CRC 0.5 | Diabetes 1 | Diabetes 0.5 | L1 | L0.5 |
|---|---|---|---|---|---|---|---|---|
| Tree_with_cov | 3.888889 | 1.955828857 | 3.684211 | 1.912 | 3.413043 | 1.8386484 | 3.662048 | 1.902011 |
| Tree | 3.777778 | 1.929599082 | 3.473684 | 1.853 | 3.601449 | 1.890708 | 3.617637 | 1.891222 |
| SHAP_Gibbs_LightGBM | 3.611111 | 1.889483621 | | | | | 3.611111 | 1.889484 |
| LIME_GAN | 3.111111 | 1.736296992 | 2.921053 | 1.691 | 3.649635 | 1.9033986 | 3.227266 | 1.776967 |
| SHAP_GAN | 2.888889 | 1.669397727 | 2.894737 | 1.684 | 3.666667 | 1.9080825 | 3.150098 | 1.753756 |
| missing_shap | 3.611111 | 1.885941263 | 2.578947 | 1.574 | 3.043796 | 1.7328862 | 3.077951 | 1.73098 |
| knn | 2.888889 | 1.682877766 | 2.394737 | 1.517 | 2.884058 | 1.6807132 | 2.722561 | 1.626724 |
| knn_with_th | 2.222222 | 1.42915273 | 2.315789 | 1.494 | 2.608696 | 1.5824541 | 2.382236 | 1.501907 |
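A quick arithmetic check of the L1 column, assuming it is the unweighted mean across the three experiments, shown for the Tree_with_cov row:

```python
# L1 for Tree_with_cov: unweighted mean of the three per-experiment mean scores.
flu, crc, diabetes = 3.888889, 3.684211, 3.413043
print((flu + crc + diabetes) / 3)  # -> 3.662048, matching the L1 column
```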
- The Tree algorithm generally works best when the predictor is tree-based; the covariance fix also improves it slightly.
- LIME and SHAP are pretty similar. LIME is slightly better and faster, so it is preferable over SHAP. Both are model-agnostic, but harder to train. Gibbs might improve the results (while being much slower) and might be useful when applied to a modest number of features or feature groups.
- missing_shap is a very simple and fast model (also model-agnostic). It performs well on some problems, but has training parameters that are important to tune. In a previous Pre2D experiment it did much better (this run used different training parameters that made it worse). After re-running with better parameters it was even better than the Shapley/LIME methods. However, two bugs were found: a parameter setting disabled the grouping (it cannot run with grouping and "group_by_sum=1"; the grouping mechanism inside missing_shap should be used instead), and missing_shap trained with wrong weights when using groups. It needs to be run again.
- KNN should be used without a threshold. So far it has not proven itself enough to be used.
- If we use tree predictors without groups, the Shapley values should do the job without the covariance fix; the Shapley value is the unique attribution that preserves fairness (see the formula below).
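For reference, the uniqueness claim refers to the classical Shapley value: for a value function $v$ over the feature set $N$, it is the only attribution satisfying efficiency, symmetry, dummy, and additivity.

```latex
% Shapley value of feature i: the average marginal contribution of i
% over all subsets S of the remaining features.
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
            \frac{|S|!\,(|N|-|S|-1)!}{|N|!}
            \bigl( v(S \cup \{i\}) - v(S) \bigr)
```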
Appendix - Gibbs in Flu NWP
Gibbs shows a separation of 0.6 between the generated samples and the real ones (when using a random mask with probability 0.5 for each feature), and a separation of 0.719 when generating all features. This is high-quality matrix generation; for comparison, the GAN shows a separation of 0.99 when generating all features and 0.74 when choosing random masks. See the test gibbs script.
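The notes do not define "separation"; a common reading (assumed here) is the AUC of a discriminator trained to distinguish real rows from generated ones, where 0.5 means indistinguishable and 1.0 means trivially separable. A self-contained sketch on synthetic data:

```python
# Sketch of a "separation" check between real and generated samples.
# ASSUMPTION: separation = AUC of a discriminator telling real rows from
# generated ones (0.5 = indistinguishable, 1.0 = trivially separable).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def separation(real: np.ndarray, generated: np.ndarray, seed: int = 0) -> float:
    X = np.vstack([real, generated])
    y = np.r_[np.ones(len(real)), np.zeros(len(generated))]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# Random-mask variant: keep each feature of a real row with probability 0.5
# and replace the rest (crudely approximated here by values from another row,
# standing in for Gibbs-generated values).
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 22))
resampled = real[rng.permutation(len(real))]
mask = rng.random(real.shape) < 0.5
generated = np.where(mask, real, resampled)
print(f"separation: {separation(real, generated):.3f}")
```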

