This guide covers two primary methods for training a model: using the Python API for a code-driven approach or using command-line tools.
Step 1: Define the Model Architecture
Before training, you must define the model's architecture in a JSON file. This file specifies the entire pipeline, including feature generators, processors, and the algorithm to be used. This is conceptually similar to defining layers and stacking them in a deep learning framework.
importmed# --- 1. Configuration ---rep_path='/path/to/your/repository_file'model_json_path='/path/to/your/model_architecture.json'samples_path='/path/to/your/training_samples.tsv'output_model_path='trained_model.bin'# --- 2. Initialize Model and Fit to Repository ---print("Initializing model and fitting to the repository...")# Load the model architecture from the JSON filemodel=med.Model()model.init_from_json_file(model_json_path)# Before loading all data, initialize a repository to analyze its structure# This helps the model determine which signals are "virtual" (i.e., can be# generated from other signals, like BMI from Height and Weight) versus# which need to be fetched directly.rep=med.PidRepository()rep.init(rep_path)model.fit_for_repository(rep)# Get the list of signals that must be fetched from the repositoryrequired_signals=model.get_required_signal_names()# --- 3. Load Data ---print("Loading training samples and repository data...")# Load the training samples (patient IDs, dates, and labels)samples=med.Samples()samples.read_from_file(samples_path)# Alternatively, load from a pandas DataFrame:# samples.from_df(your_dataframe)# Get the unique patient IDs required for trainingpatient_ids=samples.get_ids()# Load the actual data for the required signals and patientsrep=med.PidRepository()rep.read_all(rep_path,patient_ids,required_signals)# --- 4. Train the Model ---print("Starting model training...")model.learn(rep,samples)print("Training complete.")# --- 5. Save the Trained Model ---model.write_to_file(output_model_path)print(f"Model saved to {output_model_path}")# --- Optional: Post-Training Steps ---# Access the generated feature matrix as a pandas DataFramefeature_matrix_df=model.features.to_df()# Apply the trained model to a new set of samples (e.g., a test set)# test_samples = med.Samples()# test_samples.read_from_file('path/to/test_samples.tsv')# predictions = model.apply(rep, test_samples)
Loading a Trained Model
You can load a previously trained binary model file directly without needing the original JSON or data.
Optimizer: For grid-search style hyperparameter tuning with built-in regularization to improve generalization. For more advanced tuning, consider using external libraries like Optuna.