
TestModelExternal

TestModelExternal is a tool for comparing repositories or sample sets when applying a model to them. It builds a propensity model that tries to distinguish between the two data sources, which exposes differences between their feature matrices and makes them straightforward to compare. The main goal is to detect complex, multivariate differences that simple feature-by-feature comparisons can miss.

You can use this tool to:

  • Compare feature matrices from different repositories to check model transferability, for example to:
    • Detect bugs in data handling, eligibility rules, or client data extraction
    • Estimate expected model performance in a new repository, even without labels, using the propensity model
  • Compare samples within the same repository, for example, to analyze data from different years and identify feature differences.

TestModelExternal is part of the MR_Tools repository and can be compiled under AllTools.

Mode 1: Compare When Both Repositories Are Available

Required arguments:

  • model_path: Path to the binary MedModel to test (required in all modes)
  • rep_test: Repository for testing and comparison. In the propensity model, data from this repository is labeled as 1.
  • samples_test: Path to MedSamples for the test repository
  • output: Directory for output files
  • rep_trained: Path to the trained model's repository (or reference repository)
  • samples_train: Path to MedSamples from the training repository (ensure the same method/eligibility rules are used in both datasets)
  • predictor_type, predictor_args: Parameters for the propensity model to distinguish between repositories
  • calibration_init_str: Calibration arguments for the propensity model's post-processor

Optional arguments:

  • smaller_model_feat_size: If > 0, builds an additional, smaller propensity model using only that many top-ranked features
  • additional_importance_to_rank: Path to a SHAP report (from "Flow --shap_val_request") to rank differences combined with feature importance
  • features_subset_file: File to filter features from the MedModel
  • fix_train_res: If > 0, sets feature resolution in training to match the test set
  • sub_sample_train: Integer to limit the maximum number of training samples (0 = no subsampling)
  • sub_sample_test: Integer to limit the maximum number of test samples (0 = no subsampling)
  • train_ratio: Train/test split ratio (test set is used to report propensity model performance)
  • bt_params: Bootstrap parameters for the propensity model
  • binning_shap_params: Parameters for SHAP report analysis on the propensity model
  • group_shap_params: Grouping arguments for SHAP analysis
  • shap_auc_threshold: If AUC is below this value, SHAP analysis is skipped to save time
  • print_mat: If > 0, prints the propensity matrix (rows from the training samples are labeled 0, rows from the test samples are labeled 1)
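
As an illustration, a Mode 1 run might look like the sketch below. The flag syntax (assumed here to be --name value), the paths, and the predictor/calibration strings are placeholders rather than verified defaults; substitute values that match your setup.

    # Hypothetical Mode 1 invocation: both repositories are reachable from this machine.
    # Flag syntax, paths, and the predictor/calibration strings are illustrative only.
    TestModelExternal \
        --model_path       /path/to/model.bin \
        --rep_trained      /path/to/train_repository/repository \
        --samples_train    /path/to/train.samples \
        --rep_test         /path/to/test_repository/repository \
        --samples_test     /path/to/test.samples \
        --output           /path/to/output_dir \
        --predictor_type   lightgbm \
        --predictor_args   "<predictor-args>" \
        --calibration_init_str "<calibration-args>" \
        --sub_sample_train 50000 \
        --sub_sample_test  50000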

Mode 2: Compare When Repositories Are Not on the Same Machine

In this mode, leave rep_trained and/or samples_train empty.

Required arguments:

  • model_path: Path to the binary MedModel to test (required in all modes)
  • rep_test: Repository for testing and comparison (labeled as 1 in the propensity model)
  • samples_test: Path to MedSamples for the test repository
  • output: Directory for output files
  • strata_json_model: JSON file for creating strata and collecting statistics
  • strata_settings: Strata settings for collecting statistics

When comparing to a different repository on another machine, also provide either:

  • train_matrix_csv: A CSV matrix from the reference to compare with
  • strata_train_features_moments: Path to a file of reference (training-side) feature statistics to compare against. This file is created on the training repository's machine and is controlled by strata_json_model and strata_settings

The train_matrix_csv can be created on the reference machine by generating a feature matrix; this is the preferred option when possible.
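
For example, when only a training-side feature matrix has been exported from the reference machine, a Mode 2 run might look like the sketch below. The flag syntax, paths, and settings values are assumptions, not verified defaults.

    # Hypothetical Mode 2 invocation: the training repository sits on another machine,
    # so rep_trained/samples_train are omitted and an exported training matrix is used instead.
    # Flag syntax, paths, and settings values are illustrative only.
    TestModelExternal \
        --model_path        /path/to/model.bin \
        --rep_test          /path/to/test_repository/repository \
        --samples_test      /path/to/test.samples \
        --output            /path/to/output_dir \
        --strata_json_model /path/to/strata.json \
        --strata_settings   "<strata-settings>" \
        --train_matrix_csv  /path/to/train_feature_matrix.csv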

Mode 3: Compare Different Samples Within the Same Repository

Provide different samples_train and samples_test paths, and pass the same repository path for both rep_trained and rep_test to indicate that the same repository is used.
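
For instance, to compare two sample sets drawn from different years of the same repository, a Mode 3 run might look like the sketch below. As before, the flag syntax and paths are illustrative assumptions, and the propensity-model arguments follow the Mode 1 sketch.

    # Hypothetical Mode 3 invocation: the same repository is passed twice, with two
    # different sample sets (e.g., from different years).
    # Flag syntax, paths, and predictor/calibration strings are illustrative only.
    TestModelExternal \
        --model_path       /path/to/model.bin \
        --rep_trained      /path/to/repository/repository \
        --samples_train    /path/to/samples_year_A.samples \
        --rep_test         /path/to/repository/repository \
        --samples_test     /path/to/samples_year_B.samples \
        --output           /path/to/output_dir \
        --predictor_type   lightgbm \
        --predictor_args   "<predictor-args>" \
        --calibration_init_str "<calibration-args>"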

Example Output

The tool creates a propensity model and generates a SHAP report for this model. It also produces a compare_rep.txt file, which compares feature averages and standard deviations.

You can use the resulting propensity model to assess expected performance when controlling for changes in your variables of interest.

See example usages: