PidDynamicRec

The PidDynamicRec class provides advanced access to signal data, allowing both reading and modification, as well as maintaining multiple "versions" of a signal vector. This is particularly useful when implementing RepProcessors in the MedProcessTools classes, where signals may need to be cleaned, altered, or removed before feature creation.

Why are multiple versions necessary?

When a RepProcessor modifies a signal at timepoint T, data from future timepoints (t > T) might inadvertently influence the change, leading to data leakage during model training and testing. For example, if a prediction is tested at T_test (where T_test ≥ T) and the modification drew on data from t > T_test, future information contaminates the test.

The complexity increases when testing at multiple timepoints for a patient (e.g., T_test1 and T_test2, with T_test2 > T_test1). The allowed "horizon" for each test is:

  • For T_test1: t ≤ T_test1
  • For T_test2: t ≤ T_test2

Values generated by RepProcessors for t ≤ T_test1 may differ between these horizons, requiring snapshots ("versions") of the data as seen up to each timepoint. Efficient management of these versions is essential for performance and memory usage.

PidDynamicRec addresses these challenges by enabling efficient versioning.
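
The horizon effect described above can be illustrated with a small, self-contained sketch. It does not use MedProcessTools: `Sample` and `clean_up_to` are hypothetical names, and the cleaning rule (drop values above 1.5 times the mean of the visible samples) is an assumption chosen only to show that the same early sample can be kept under one horizon and dropped under another.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy type, not part of MedProcessTools.
struct Sample { int time; float val; };

// Restrict the signal to samples with time <= horizon, then drop every
// visible value above `factor` times the mean of the visible values.
std::vector<Sample> clean_up_to(const std::vector<Sample>& sig, int horizon,
                                float factor = 1.5f) {
    assert(factor > 0.0f);
    std::vector<Sample> visible;
    float sum = 0.0f;
    for (const Sample& s : sig) {
        if (s.time <= horizon) {
            visible.push_back(s);
            sum += s.val;
        }
    }
    if (visible.empty()) return visible;

    const float mean = sum / static_cast<float>(visible.size());
    std::vector<Sample> cleaned;
    for (const Sample& s : visible) {
        if (s.val <= factor * mean) cleaned.push_back(s);
    }
    return cleaned;
}
```

With samples (1, 100), (2, 200), (3, 50), (4, 50), (5, 50), the sample at time 2 survives cleaning for a horizon of 2 but is dropped for a horizon of 5, because the later low values pull the mean down. A test at T_test1 = 2 must therefore see the first result, not the second, which is why a separate version per horizon is needed.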

Version Numbering

  • Version 0: The original data from the repository.
  • Versions 1+: Created and managed using PidDynamicRec tools, typically corresponding to different prediction timepoints.

Initializing a PidDynamicRec

PidDynamicRec inherits from PidRec and is initialized similarly, except that you also specify the number of versions to maintain (usually matching the number of prediction timepoints). Pass it as the n_versions argument of init_from_rep(), or set it with the set_n_versions() method:

int init_from_rep(MedRepository *rep, int pid, vector<int> &sids_to_use, int n_versions);

Reading from a PidDynamicRec

To read data, specify both the signal and the version:

void *get(int sid, int version, int &len);
void *get(string &sig_name, int version, int &len);
void *uget(int sid, int version, UniversalSigVec &_usv);
void *uget(const string &sig_name, int version, UniversalSigVec &_usv);

Each PidRec maintains a usv object for thread safety and efficiency.

Modifying Versions in PidDynamicRec

The core purpose of PidDynamicRec is to create and manage versions that differ from the original. Initially, all versions reference version 0.

Key methods for version management:

// Set a version to new data
int set_version_data(int sid, int version, void *datap, int len);

// Copy original data into a new version
int set_version_off_orig(int sid, int version);

// Point one version to another's data
int point_version_to(int sid, int v_src, int v_dst);

// Remove an element from a version
int remove(int sid, int version, int idx);

// Remove from one version and place in another
int remove(int sid, int v_in, int idx, int v_out);

// Change an element in a version
int change(int sid, int version, int idx, void *new_elem);

// Change in one version and copy to another
int change(int sid, int v_in, int idx, void *new_elem, int v_out);

// Batch update: changes and removals
int update(int sid, int v_in, vector<pair<int, void *>>& changes, vector<int>& removes);

// Batch update with value channel
int update(int sid, int v_in, int val_channel, vector<pair<int, float>>& changes, vector<int>& removes);

Example: Cleaning Glucose Signal

MedRepository rep;
int pid;
// ... load repository and set pid ...

PidDynamicRec pdr;
int sid = rep.sigs.sid("Glucose");
vector<int> sids = { sid };
pdr.init_from_rep(&rep, pid, sids, 1); // Initialize with one extra version

UniversalSigVec usv;
pdr.uget(sid, 0, usv); // Get original data

// Option 1: Create a new vector and set as version 1
vector<SDataVal> new_glu;
for (int i = 0; i < usv.len; i++) {
    if (usv.Val(i, 0) > 0) {
        SDataVal sdv;
        sdv.date = usv.Time(i, 0);
        sdv.val = (float)((int)usv.Val(i, 0));
        new_glu.push_back(sdv);
    }
}
// data() is safe even when new_glu is empty, unlike &new_glu[0]
pdr.set_version_data(sid, 1, new_glu.data(), (int)new_glu.size());

// Option 2: Batch update
vector<pair<int, float>> changes;
vector<int> removes;
for (int i = 0; i < usv.len; i++) {
    if (usv.Val(i, 0) > 0) {
        changes.push_back({i, (float)((int)usv.Val(i, 0))});
    } else {
        removes.push_back(i);
    }
}
pdr.update(sid, 1, 0, changes, removes);

// Read modified version
pdr.uget(sid, 1, usv);
// usv now contains the cleaned data

Efficiency: Version Pointing and Iteration

Often, multiple versions are identical (e.g., when only considering data up to a given timepoint). To optimize, versions can "point" to the same data, only splitting when changes are needed.

Mechanisms:

  1. Version Pointing: A version can reference another's data, splitting only when modifications occur.
  2. Iterators: Use iterators to process blocks of versions sharing the same data.
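
The "point, then split on change" idea can be sketched with a toy copy-on-write container. This is not the PidDynamicRec implementation: `VersionedSignal` and its members are hypothetical, the buffers are plain `std::vector<float>`, and `n_versions` here counts version 0 as well. It is a single-threaded illustration only.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy model of version pointing; not the real PidDynamicRec.
class VersionedSignal {
public:
    // All versions start out pointing at the same (original) buffer.
    VersionedSignal(std::vector<float> orig, int n_versions)
        : data_(n_versions,
                std::make_shared<std::vector<float>>(std::move(orig))) {}

    // True when both versions reference the same underlying buffer.
    bool same(int v1, int v2) const { return data_.at(v1) == data_.at(v2); }

    // Make v_dst reference v_src's buffer (no copy).
    void point_to(int v_dst, int v_src) { data_.at(v_dst) = data_.at(v_src); }

    // Change one element; copy the buffer first if other versions share it.
    void change(int version, int idx, float val) {
        assert(version > 0);  // version 0 is the untouched original
        std::shared_ptr<std::vector<float>>& buf = data_.at(version);
        if (buf.use_count() > 1) {
            buf = std::make_shared<std::vector<float>>(*buf);  // split
        }
        buf->at(idx) = val;
    }

    const std::vector<float>& get(int version) const { return *data_.at(version); }

private:
    std::vector<std::shared_ptr<std::vector<float>>> data_;
};
```

In this sketch, changing an element of version 2 copies the buffer once and leaves versions 0 and 1 sharing the original, so memory is only spent on versions that actually diverge.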

API Examples:

class PidDynamicRec : public PidRec {
    // Point one version to another
    int point_version_to(int sid, int v_src, int v_dst);

    // Check if two versions share data
    int versions_are_the_same(int sid, int v1, int v2);
    int versions_are_the_same(set<int> sids, int v1, int v2);
    // ...
};

// Iterator for blocks of identical versions
class differentVersionsIterator : public versionIterator {
    int jVersion;
    int init();
    int next();
    bool done() { return iVersion < 0; }
    inline int block_first() { return jVersion + 1; }
    inline int block_last() { return iVersion; }
};

Iterating Over Versions Example:

_apply(PidDynamicRec& rec, vector<int>& time_points, ... ) {
    differentVersionsIterator vit(rec, reqSignalIds);
    for (int iver = vit.init(); !vit.done(); iver = vit.next()) {
        // Process versions from vit.block_first() to vit.block_last()
        // All versions in this block point to the same data, optimizing performance
    }
}
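
To show what a "block" of identical versions looks like, here is a toy function that groups consecutive versions by the buffer they point to. It is only a sketch under assumptions: `version_blocks` and the `buffer_id` representation are hypothetical, and the choice to walk from the highest version down is inferred from the `done()` condition (`iVersion < 0`) and the `block_first()` / `block_last()` definitions above, not from the library source.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// buffer_id[v] identifies the buffer that version v points to
// (hypothetical toy representation, not the library's internal state).
// Returns (first, last) pairs of consecutive versions sharing one buffer,
// ordered from the highest version block down to the lowest.
std::vector<std::pair<int, int>> version_blocks(const std::vector<int>& buffer_id) {
    std::vector<std::pair<int, int>> blocks;
    int i = static_cast<int>(buffer_id.size()) - 1;  // last version of the block
    while (i >= 0) {
        int j = i - 1;
        // Walk down while the versions still share the block's buffer.
        while (j >= 0 && buffer_id[j] == buffer_id[i]) --j;
        assert(j < i);  // each block covers at least one version
        blocks.push_back({j + 1, i});  // analogous to block_first() / block_last()
        i = j;
    }
    return blocks;
}
```

For buffer ids {0, 0, 1, 1, 1, 2} this yields the blocks (5, 5), (2, 4), and (0, 1), so per-block work runs three times instead of six.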