Test_07 - feature importance

Overview

Tests the model's feature importance in a different way: it measures how each signal affects model performance when that signal is not "sent" to the model. Each signal is evaluated only within the group of samples where the signal exists. Otherwise, signals that are important but very rare would receive low importance, which is not necessarily what we want.
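The idea above can be illustrated with a minimal Python sketch. It is not the test's actual implementation: the toy model, the signal names (`common`, `rare`), the output field names, and the choice to "not send" a signal by replacing it with a missing value (`None`) are all assumptions made for illustration. For each signal, the sketch keeps only the samples where the signal is present, computes AUC with and without the signal, and reports the drop.

```python
from typing import Callable, Dict, List, Optional

Sample = Dict[str, Optional[float]]  # None marks a missing signal (assumption)


def auc(labels: List[int], scores: List[float]) -> Optional[float]:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def signal_importance(samples: List[Sample], labels: List[int],
                      predict: Callable[[Sample], float],
                      signals: List[str]) -> List[dict]:
    """AUC drop per signal, measured only on samples where the signal exists."""
    rows = []
    for sig in signals:
        idx = [i for i, s in enumerate(samples) if s.get(sig) is not None]
        cohort = [samples[i] for i in idx]
        y = [labels[i] for i in idx]
        base = auc(y, [predict(s) for s in cohort])
        # "Not sending" the signal is modeled here as setting it to missing.
        ablated = auc(y, [predict({**s, sig: None}) for s in cohort])
        if base is None or ablated is None:
            continue
        rows.append({"signal": sig, "n_present": len(idx),
                     "auc_drop": base - ablated})
    # Assumed ordering: most important signal first.
    rows.sort(key=lambda r: r["auc_drop"], reverse=True)
    return rows


def toy_model(sample: Sample) -> float:
    """Hypothetical model: missing signals contribute nothing to the score."""
    return (sample["common"] or 0.0) + 2.0 * (sample["rare"] or 0.0)


samples = [
    {"common": 0.9, "rare": None}, {"common": 0.2, "rare": None},
    {"common": 0.7, "rare": None}, {"common": 0.4, "rare": None},
    {"common": 0.5, "rare": 1.0}, {"common": 0.6, "rare": 0.0},
]
labels = [1, 0, 1, 0, 1, 0]

rows = signal_importance(samples, labels, toy_model, ["common", "rare"])
for row in rows:
    print(row["signal"], row["n_present"], round(row["auc_drop"], 3))
```

In this toy data the `rare` signal is present in only 2 of 6 samples, yet it ranks first because the AUC within its own group collapses when it is removed. Measured over all 6 samples instead, the same removal would lower AUC by only about 0.11, which is the effect the per-group evaluation is meant to avoid.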

Input

  • WORK_DIR - output work directory
  • MODEL_PATH - path to the model
  • REPOSITORY_PATH - repository path
  • BT_JSON_FAIRNESS - JSON file for bootstrap analysis
  • FAIRNESS_BT_PREFIX - bootstrap cohort definition on which the feature importance analysis focuses

Depends:

Test_06 - predictions file results

Output

${WORK_DIR}/ButWhy/feature_importance.sorted_final.tsv - file with signals sorted by importance. For each signal, it reports how many of the samples have this signal (its features are not missing values) and how important the signal is (the impact on AUC if you don't pass it to the model).