Skip to content

Test 7 - Compare Important Feature

Overview One more test exploring the differences between the current dataset, and the dataset that was used to train the model, from the important features aspect (and see also Test 2).   Parameters (see env.sh) All parameters are included in env.sh and described in External Silent Run. In particular, this test uses:

  • CMP_FEATURE_RES=... list of important features for the model, with resolution 

What is actually done? Several test and graphs per important feature.   Test Results Review In the log file 07.feature_analysis.log, for every feature, we get something like the following (but note that we need to check the the similarity measure (KLD). : In features_stat.tsv, we get similar results, without separation male/female and without similarity measures, but with missing value rate in the test. Last, we have graphical histogram (with the defined resolution above) in the directory features_graphs.   How to read the results? This test repeats part of Test 2, in a slightly difference way, with separation male/female. Thus, when we see differences in a specific feature in Test 2, we can use the output here to further investigate the gaps.