Accessing Repository Data
This guide explains two methods for accessing data from the repository: using the Python API for programmatic access or using MES Tools for a user-interface-based approach.
Method 1: Using the Python API
This method is ideal for programmatic data access and analysis within a Python environment.
Prerequisites
Before you begin, ensure you have:
- Installed the Python API for MES Infrastructure.
- Loaded a Data Repository by following the ETL Tutorial.
Basic Data Retrieval
To get started, import the med library and initialize the PidRepository with the path to your loaded repository file.
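A minimal sketch of this step, assuming the Python API is importable as `med`. The `read_all` method, its argument order, the repository path, and the signal name `Hemoglobin` are assumptions made for illustration; check them against the API reference before relying on them.

```python
def load_signal(repo_path, signal_name):
    """Load one signal for all patients and return it as a DataFrame."""
    # Imported inside the function so the helper can be defined even where
    # the MES Python API is not installed.
    import med

    rep = med.PidRepository()
    # Assumption: read_all(path, patient_ids, signal_names), where an empty
    # patient list means "all patients".
    rep.read_all(repo_path, [], [signal_name])
    return rep.get_sig(signal_name)


# Example call (hypothetical path and signal name):
# hemoglobin = load_signal("/path/to/my.repository", "Hemoglobin")
# print(hemoglobin.head())
```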
For the complete list of classes and methods, see the full API reference (Sphinx).
Filtering by Patients and Signals
You can optimize data loading by initializing the repository with a specific list of patient IDs and signals. This is particularly useful for large datasets.
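Restricted loading might look like the sketch below. It assumes the same hypothetical `read_all(path, patient_ids, signal_names)` signature; the patient IDs and signal names in the example call are placeholders.

```python
def load_subset(repo_path, patient_ids, signal_names):
    """Load only the requested patients and signals; return {signal: DataFrame}."""
    import med  # MES Python API (see Prerequisites)

    rep = med.PidRepository()
    # Assumption: passing non-empty lists restricts loading to exactly
    # these patients and signals, which keeps memory use down.
    rep.read_all(repo_path, list(patient_ids), list(signal_names))
    return {name: rep.get_sig(name) for name in signal_names}


# Example call (hypothetical path, IDs, and signal names):
# frames = load_subset("/path/to/my.repository", [1001, 1002], ["GENDER", "DIAGNOSIS"])
```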
Working with Categorical Signals
Categorical signals, like 'DIAGNOSIS', can be handled in two ways.
1. Translated (String) Values
By default, get_sig returns a DataFrame with human-readable string values. While convenient, this can be memory-intensive.
2. Untranslated (Numeric) Codes
For more efficient memory usage and advanced querying, you can retrieve the raw numeric codes for each category.
This returns a DataFrame with integer codes, which is more memory-efficient but requires an extra step for querying based on categories.
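The two retrieval modes can be sketched as a single helper. The `translate` keyword is an assumption about how `get_sig` switches between string values and numeric codes; `rep` stands for an already initialized `PidRepository`.

```python
def load_diagnosis(rep, translate=True):
    """Return 'DIAGNOSIS' as strings (translate=True) or as integer codes."""
    # Assumption: get_sig accepts a `translate` keyword that switches
    # between translated strings and raw numeric codes.
    return rep.get_sig("DIAGNOSIS", translate=translate)


# Example calls, given an initialized repository `rep`:
# readable = load_diagnosis(rep)                 # human-readable strings
# codes = load_diagnosis(rep, translate=False)   # raw integer codes
```

If both variants are loaded, pandas' `DataFrame.memory_usage(deep=True)` is one way to compare their memory footprints.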
Efficiently Querying Categorical Data
To efficiently query categorical signals by their meaning (e.g., finding all diagnoses related to a specific disease group), use lookup tables (LUTs).
A lookup table is an array where each index corresponds to a numeric category code. By marking relevant indices, you can perform very fast filtering.
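The idea can be illustrated without the repository at all. The following toy sketch uses plain Python lists and made-up codes; it shows only the mechanism and is not the API's implementation.

```python
def build_lut(num_codes, selected_codes):
    """Return a list with 1 at each selected code index and 0 elsewhere."""
    lut = [0] * num_codes
    for code in selected_codes:
        lut[code] = 1
    return lut


def filter_records(records, lut):
    """Keep (patient_id, code) records whose code is marked in the table."""
    return [rec for rec in records if lut[rec[1]]]


# Toy data: codes 0..5, where codes 2 and 4 belong to the group of interest.
lut = build_lut(6, [2, 4])
records = [(101, 0), (101, 2), (102, 4), (103, 5)]
print(filter_records(records, lut))  # [(101, 2), (102, 4)]
```

Each record costs one index lookup, regardless of how many codes belong to the group.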
Example: Filtering for Respiratory Diseases
Let's filter the 'DIAGNOSIS' signal for all codes corresponding to respiratory diseases, defined by the ICD-10 range J00-J99.
Step 1: Create a Lookup Table
First, get the dictionary section for the 'DIAGNOSIS' signal. Then, create a lookup table for the desired code range.
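Step 1 might look like the sketch below. It assumes the repository exposes its dictionary as `rep.dict` with a `section_id` method, and that the code range is written as `ICD10_CODE:J00-J99`; both are assumptions, and the exact set name depends on the dictionaries loaded into your repository.

```python
def build_code_lut(rep, signal="DIAGNOSIS", code_sets=("ICD10_CODE:J00-J99",)):
    """Build a lookup table marking every code of `signal` in the given sets."""
    # Assumption: section_id returns the dictionary section for the signal.
    section_id = rep.dict.section_id(signal)
    # prep_sets_lookup_table takes a list of codes; the set-name format
    # used in the default above is an assumption.
    return rep.dict.prep_sets_lookup_table(section_id, list(code_sets))


# Example call, given an initialized repository `rep`:
# lut = build_code_lut(rep)
```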
The lut now contains 1 at every index that corresponds to a J00-J99 code and 0 elsewhere. Because the table is built from the dictionary rather than by simple string matching, it can also capture more complex relationships, such as mapping NDC drug codes to ATC codes.
[!NOTE] prep_sets_lookup_table accepts a list of codes and builds a single lookup table that matches any of them (a logical OR across all codes).
Step 2: Apply the Lookup Table
Now, use the lookup table to filter your DataFrame of diagnosis codes.
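A hedged sketch of this step follows. The column name `val0` for the numeric code is an assumption, as is the requirement that the DataFrame holds untranslated integer codes; the helper imports nothing and relies only on row selection through `.loc` with a boolean list.

```python
def filter_by_lut(diagnosis, lut, code_column="val0"):
    """Return the rows of `diagnosis` whose code is marked in `lut`."""
    # Look up each row's integer code in the table; nonzero means "keep".
    keep = [lut[int(code)] != 0 for code in diagnosis[code_column]]
    # .loc with a boolean list selects the matching rows.
    return diagnosis.loc[keep]


# Example call, given untranslated codes and a lookup table:
# filtered_diagnosis = filter_by_lut(diagnosis_codes, lut)
```

When the lookup table is a NumPy array, the same mask can be computed in one vectorized step as `lut[diagnosis[code_column].to_numpy()] != 0`, which is where most of the speed benefit comes from on large DataFrames.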
filtered_diagnosis now contains only the diagnosis records that fall within the J00-J99 range.
Method 2: Using MES Tools and UI
If you prefer the command line or a graphical interface, you can use MES Tools.
Prerequisites
First, complete the MES Tools Setup.
Examples
- View or Export Data with Flow: See the guide on how to use Flow to view signals and export data.
- Inspect a Single Patient: Use the Repository Viewers UI to open and explore the data for a single patient.