MedDictionary
The MedDictionary class provides methods for mapping integer values to strings. This is essential because MedRepository does not support free text and relies on numerical codes for efficiency (e.g., for drug names or Read codes). The class translates these codes back to their textual descriptions.
Additionally, MedDictionary supports defining sets of values and assigning them names, which is useful for organizing hierarchical classifications (such as ICD codes, drug categories, or cancer types). Efficient membership testing for these sets is also provided.
To handle multiple dictionaries with potentially overlapping numerical codes, MedDictionary uses a sections mechanism.
Dictionary File Format
When loading a repository, dictionary files are read with these rules:
- Ignore empty lines and lines starting with # (comments).
- All other lines are tab-delimited.
- Section definition: SECTION <comma-separated list of section names>
  Each section can have multiple names. Typically, a section contains all relevant signal names.
- Value definition: DEF <numerical int value> <string>
  Multiple names can be assigned to the same value. Each set must have a DEF line for its name. Each int value must be unique.
- Set membership: SET <set name> <member name>
  Sets can include other sets, but avoid cyclic definitions.
Example dictionary file:
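As an illustration only, the sketch below embeds a small dictionary file as a string and parses it according to the rules above. The sample content (section names, numerical codes, and the names attached to them) is hypothetical, and ParsedDictionary, split, and parse_dictionary_text are helper names made up for this example; they are not part of the MedDictionary API and do not reflect how the library stores its data.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy in-memory result of parsing one dictionary file (not the library's layout).
struct ParsedDictionary {
    std::vector<std::string> section_names;                       // from the SECTION line
    std::map<std::string, int> name_to_id;                        // from DEF lines
    std::map<int, std::vector<std::string>> id_to_names;          // all names per value
    std::map<std::string, std::vector<std::string>> set_members;  // from SET lines
};

// Split a line on a single delimiter character.
static std::vector<std::string> split(const std::string &line, char delim) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, delim)) fields.push_back(field);
    return fields;
}

// Parse dictionary text: skip blanks and comments, then handle SECTION, DEF and SET.
ParsedDictionary parse_dictionary_text(const std::string &text) {
    ParsedDictionary d;
    std::stringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // empty line or comment
        std::vector<std::string> f = split(line, '\t');
        if (f[0] == "SECTION" && f.size() >= 2) {
            d.section_names = split(f[1], ',');
        } else if (f[0] == "DEF" && f.size() >= 3) {
            int id = std::stoi(f[1]);
            d.name_to_id[f[2]] = id;
            d.id_to_names[id].push_back(f[2]);
        } else if (f[0] == "SET" && f.size() >= 3) {
            d.set_members[f[1]].push_back(f[2]);
        }
    }
    return d;
}

// Hypothetical sample file: fields are separated by tabs, codes are invented.
const std::string kSampleDictionary =
    "# hypothetical drug dictionary\n"
    "SECTION\tDrug,Drug_purchase\n"
    "\n"
    "DEF\t1000\tATC_A10\n"
    "DEF\t1000\tDrugs used in diabetes\n"
    "DEF\t1001\tATC_A10A\n"
    "DEF\t1002\tATC_A10B\n"
    "SET\tATC_A10\tATC_A10A\n"
    "SET\tATC_A10\tATC_A10B\n";
```

In this sample, value 1000 carries two names, and the set ATC_A10 has its own DEF line before any SET line refers to it, as the rules require.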
MedDictionary vs. MedDictionarySections
- MedDictionary: Handles a single dictionary with one namespace for numerical values.
- MedDictionarySections: Manages multiple dictionaries, each in its own section, with APIs for section management.
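The distinction can be pictured with a toy model, sketched below under stated assumptions: ToyDictionary, ToySections, add_section, and make_example_sections are hypothetical names for this illustration, the codes and section names are invented, and returning -1 for an unknown section is a convention of the sketch rather than documented library behavior. The point is only that the same numerical code can mean different things in different sections.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy single-namespace dictionary: one value maps to one name here for brevity.
struct ToyDictionary {
    std::map<int, std::string> id_to_name;
};

// Toy multi-section container: several section names may share one dictionary.
struct ToySections {
    std::map<std::string, int> section_index;  // section name -> index into dicts
    std::vector<ToyDictionary> dicts;

    // Register a new dictionary under one or more section names; returns its index.
    int add_section(const std::vector<std::string> &names) {
        int idx = static_cast<int>(dicts.size());
        dicts.emplace_back();
        for (const std::string &n : names) section_index[n] = idx;
        return idx;
    }

    // Returns the section index, or -1 if the name is unknown (toy convention).
    int section_id(const std::string &name) const {
        auto it = section_index.find(name);
        return it == section_index.end() ? -1 : it->second;
    }
};

// Build two sections that both use code 1000 for unrelated concepts.
ToySections make_example_sections() {
    ToySections s;
    int drugs = s.add_section({"Drug", "Drug_purchase"});
    int diag = s.add_section({"Diagnosis"});
    s.dicts[drugs].id_to_name[1000] = "ATC_A10";   // hypothetical code
    s.dicts[diag].id_to_name[1000] = "ICD9_250";   // same code, different meaning
    return s;
}
```

Looking up 1000 through the "Drug" section and through the "Diagnosis" section yields different names, which is the overlap the sections mechanism is meant to resolve.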
Initializing Dictionaries
Dictionaries are automatically initialized when a repository is loaded, using the dictionary files specified in the repository config.
To initialize manually, call read(vector<string> &input_dictionary_files) on either a MedDictionary or a MedDictionarySections object.
Dictionaries are typically used within a MedRepository, but can be used independently.
Key Methods
MedDictionary:
- int id(const string &name): Get id from name.
- string name(int id): Get name from id (returns the last name if multiple exist).
- map<int, vector<string>> Id2Names: Maps id to all its names.
- int is_in_set(int member_id, int set_id): Check if member id is in set.
- int is_in_set(const string& member, const string& set_name): Same as above, using names.
- int prep_sets_lookup_table(const vector<string> &set_names, vector<char> &lut): Creates a fast lookup table for set membership.
- int prep_sets_indexed_lookup_table(const vector<string> &set_names, vector<unsigned char> &lut): Similar, but records the serial number of the set for each member.
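The idea behind a set-membership lookup table can be sketched as follows. This is a toy reimplementation, not the library's code: ToySetDictionary, mark, build_lookup_table, and make_example_sets are hypothetical names, the ids and drug groupings are invented, and several behaviors are assumptions of the sketch (ids are small non-negative integers, the set's own id is marked along with its members, and -1 is returned for an unknown set name). Nested sets are flattened once, so later membership checks reduce to a single array read.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy dictionary holding names and direct set membership only.
struct ToySetDictionary {
    std::map<std::string, int> name_to_id;
    std::map<int, std::vector<int>> set_to_members;  // direct members, not flattened
    int max_id = 0;

    // Equivalent of a DEF line.
    void def(int id, const std::string &name) {
        name_to_id[name] = id;
        if (id > max_id) max_id = id;
    }
    // Equivalent of a SET line; both names must already be defined.
    void add_member(const std::string &set_name, const std::string &member) {
        set_to_members[name_to_id.at(set_name)].push_back(name_to_id.at(member));
    }
};

// Mark an id and everything nested beneath it.
static void mark(const ToySetDictionary &d, int id, std::vector<char> &lut) {
    if (lut[id]) return;  // already visited; also stops on accidental cycles
    lut[id] = 1;
    auto it = d.set_to_members.find(id);
    if (it == d.set_to_members.end()) return;  // plain value, not a set
    for (int member : it->second) mark(d, member, lut);
}

// Fill lut so that lut[id] != 0 exactly when id belongs to one of the named sets.
// Returns 0 on success, -1 if a set name is unknown (toy convention).
int build_lookup_table(const ToySetDictionary &d,
                       const std::vector<std::string> &set_names,
                       std::vector<char> &lut) {
    lut.assign(d.max_id + 1, 0);
    for (const std::string &name : set_names) {
        auto it = d.name_to_id.find(name);
        if (it == d.name_to_id.end()) return -1;
        mark(d, it->second, lut);
    }
    return 0;
}

// Hypothetical hierarchy: Diabetes_drugs contains Insulins (itself a set) and Metformin.
ToySetDictionary make_example_sets() {
    ToySetDictionary d;
    d.def(1, "Diabetes_drugs");
    d.def(2, "Insulins");
    d.def(3, "Insulin_glargine");
    d.def(4, "Metformin");
    d.def(5, "Aspirin");
    d.add_member("Diabetes_drugs", "Insulins");
    d.add_member("Diabetes_drugs", "Metformin");
    d.add_member("Insulins", "Insulin_glargine");
    return d;
}
```

Once the table is built, testing a value costs one indexed read (lut[id]), which is why this approach pays off when the same sets are checked against many records.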
MedDictionarySections:
- int section_id(const string &name): Get section id from name.
- vector<MedDictionary> dicts: Dictionaries indexed by section id; each element supports all MedDictionary methods.