The PidDynamicRec class provides advanced access to signal data, allowing both reading and modification, as well as maintaining multiple "versions" of a signal vector. This is particularly useful when implementing RepProcessors in the MedProcessTools classes, where signals may need to be cleaned, altered, or removed before feature creation.
Why are multiple versions necessary?
When a RepProcessor modifies a signal at timepoint T, data from future timepoints (t > T) might inadvertently influence the change, leading to data leakage during model training and testing. For example, if a prediction is tested at T_test (where T_test ≥ T) and the modification used data from t > T_test, future information contaminates the test.
The complexity increases when testing at multiple timepoints for a patient (e.g., T_test1 and T_test2, with T_test2 > T_test1). The allowed "horizon" for each test is:
For T_test1: t ≤ T_test1
For T_test2: t ≤ T_test2
Values generated by RepProcessors for t ≤ T_test1 may differ between these horizons, requiring snapshots ("versions") of the data as seen up to each timepoint. Efficient management of these versions is essential for performance and memory usage.
PidDynamicRec addresses these challenges by enabling efficient versioning.
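To make the horizon effect concrete, here is a minimal self-contained sketch. It does not use the library: Sample and clean_up_to are hypothetical names, and the cleaning rule (drop values above twice the mean of the visible samples) is only an assumed stand-in for whatever a real RepProcessor does. It shows that the cleaned values for t ≤ T_test1 can depend on which horizon was used.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sample type for this sketch (not the library's SDataVal).
struct Sample {
    int time;
    float val;
};

// Toy cleaner: keep only samples with time <= horizon, then drop every value
// greater than twice the mean of those visible samples.
std::vector<Sample> clean_up_to(const std::vector<Sample>& sig, int horizon) {
    std::vector<Sample> visible;
    float sum = 0.0f;
    for (const Sample& s : sig) {
        if (s.time <= horizon) {
            visible.push_back(s);
            sum += s.val;
        }
    }

    std::vector<Sample> out;
    if (visible.empty()) return out;

    const float limit = 2.0f * sum / static_cast<float>(visible.size());
    for (const Sample& s : visible) {
        if (s.val <= limit) out.push_back(s);
    }
    return out;
}
```

With the samples (1, 10), (2, 40), (3, 10), (4, 10), a horizon of 2 keeps the value 40 at t = 2, while a horizon of 4 drops it because the later samples lower the mean. A prediction at T_test1 = 2 must see the first result, which is why each prediction timepoint needs its own version.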
Version Numbering
Version 0: The original data from the repository.
Versions 1+: Created and managed using PidDynamicRec tools, typically corresponding to different prediction timepoints.
Initializing a PidDynamicRec
PidDynamicRec inherits from PidRec and is initialized in the same way, with one addition: the number of versions to maintain, which usually matches the number of prediction timepoints. Set it with the set_n_versions() method.
The following methods create and modify versions:

    // Set a version to new data
    int set_version_data(int sid, int version, void *datap, int len);

    // Copy original data into a new version
    int set_version_off_orig(int sid, int version);

    // Point one version to another's data
    int point_version_to(int sid, int v_src, int v_dst);

    // Remove an element from a version
    int remove(int sid, int version, int idx);

    // Remove from one version and place in another
    int remove(int sid, int v_in, int idx, int v_out);

    // Change an element in a version
    int change(int sid, int version, int idx, void *new_elem);

    // Change in one version and copy to another
    int change(int sid, int v_in, int idx, void *new_elem, int v_out);

    // Batch update: changes and removals
    int update(int sid, int v_in, vector<pair<int, void *>> &changes, vector<int> &removes);

    // Batch update with value channel
    int update(int sid, int v_in, int val_channel, vector<pair<int, float>> &changes, vector<int> &removes);
    MedRepository rep;
    int pid;
    // ... load repository and set pid ...

    PidDynamicRec pdr;
    int sid = rep.sigs.sid("Glucose");
    vector<int> sids = {sid};
    pdr.init_from_rep(&rep, pid, sids, 1);  // Initialize with one extra version

    UniversalSigVec usv;
    pdr.uget(sid, 0, usv);  // Get original data (version 0)

    // Options 1 and 2 are alternatives: both write version 1 and both use
    // indices taken from the original vector, so apply only one of them.

    // Option 1: Create a new vector and set it as version 1
    vector<SDataVal> new_glu;
    for (int i = 0; i < usv.len; i++) {
        if (usv.Val(i, 0) > 0) {
            SDataVal sdv;
            sdv.date = usv.Time(i, 0);
            sdv.val = (float)((int)usv.Val(i, 0));  // truncate to a whole number
            new_glu.push_back(sdv);
        }
    }
    // data() is safe even when the vector is empty, unlike &new_glu[0]
    pdr.set_version_data(sid, 1, new_glu.data(), (int)new_glu.size());

    // Option 2: Batch update
    vector<pair<int, float>> changes;
    vector<int> removes;
    for (int i = 0; i < usv.len; i++) {
        if (usv.Val(i, 0) > 0) {
            changes.push_back({i, (float)((int)usv.Val(i, 0))});
        } else {
            removes.push_back(i);
        }
    }
    pdr.update(sid, 1, 0, changes, removes);

    // Read modified version
    pdr.uget(sid, 1, usv);  // usv now contains the cleaned data
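The batch form in Option 2 collects changes and removals against the same index space, the positions in the vector as it was read. The sketch below is a toy model of that idea, not the library's update() implementation: Sample and apply_update are hypothetical names, and applying all changes before dropping the removed positions is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sample type for this sketch (not the library's SDataVal).
struct Sample {
    int time;
    float val;
};

// Toy batch update: every index in `changes` and `removes` refers to a
// position in the input vector, before any element is removed.
std::vector<Sample> apply_update(const std::vector<Sample>& in,
                                 const std::vector<std::pair<int, float>>& changes,
                                 const std::vector<int>& removes) {
    std::vector<Sample> staged = in;           // work on a copy, keep input intact
    std::vector<bool> drop(in.size(), false);  // marks positions to remove

    // at() throws std::out_of_range on a bad index instead of corrupting memory.
    for (const auto& c : changes) staged.at(c.first).val = c.second;
    for (int idx : removes) drop.at(idx) = true;

    std::vector<Sample> out;
    for (std::size_t i = 0; i < staged.size(); ++i) {
        if (!drop[i]) out.push_back(staged[i]);
    }
    return out;
}
```

Because the indices are resolved before anything is removed, the caller can build both lists in a single loop over the original vector, as the example above does.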
Efficiency: Version Pointing and Iteration
Often, multiple versions are identical (e.g., when only considering data up to a given timepoint). To optimize, versions can "point" to the same data, only splitting when changes are needed.
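The sharing-then-splitting behaviour described here is a copy-on-write scheme. The following is a minimal sketch of that general technique using std::shared_ptr. It is not the PidDynamicRec implementation: ToyVersions and its methods are hypothetical, and the choice to start every version pointing at version 0 is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical toy container holding version 0 plus n_versions extra versions.
class ToyVersions {
public:
    ToyVersions(std::vector<float> original, int n_versions)
        : data_(static_cast<std::size_t>(n_versions) + 1) {
        auto orig = std::make_shared<std::vector<float>>(std::move(original));
        for (auto& d : data_) d = orig;  // assumption: all versions start shared
    }

    // Make v_dst share the buffer currently used by v_src.
    void point_version_to(int v_src, int v_dst) {
        data_.at(v_dst) = data_.at(v_src);
    }

    // Two versions are "the same" when they share one buffer.
    bool same(int v1, int v2) const { return data_.at(v1) == data_.at(v2); }

    // Copy-on-write: split off a private copy only if the buffer is shared.
    void change(int version, int idx, float value) {
        if (version == 0) throw std::invalid_argument("version 0 is read-only");
        auto& p = data_.at(version);
        if (p.use_count() > 1) p = std::make_shared<std::vector<float>>(*p);
        p->at(idx) = value;
    }

    float get(int version, int idx) const { return data_.at(version)->at(idx); }

private:
    std::vector<std::shared_ptr<std::vector<float>>> data_;
};
```

In this sketch, memory grows only with the number of versions that actually differ, not with the number of prediction timepoints.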
Mechanisms:
1. Version Pointing: A version can reference another's data, splitting only when modifications occur.
2. Iterators: Use iterators to process blocks of versions sharing the same data.
    class PidDynamicRec : public PidRec {
        // Point one version to another
        int point_version_to(int sid, int v_src, int v_dst);

        // Check if two versions share data
        int versions_are_the_same(int sid, int v1, int v2);
        int versions_are_the_same(set<int> sids, int v1, int v2);
        // ...
    };

    // Iterator for blocks of identical versions
    class differentVersionsIterator : public versionIterator {
        int jVersion;
        int init();
        int next();
        bool done() { return iVersion < 0; }
        inline int block_first() { return jVersion + 1; }
        inline int block_last() { return iVersion; }
    };
    _apply(PidDynamicRec &rec, vector<int> &time_points, ...) {
        differentVersionsIterator vit(rec, reqSignalIds);
        for (int iver = vit.init(); !vit.done(); iver = vit.next()) {
            // Process versions from vit.block_first() to vit.block_last()
            // All versions in this block point to the same data, optimizing performance
        }
    }
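The grouping of versions into blocks can be sketched independently of the library. In the toy function below, data_id[v] is a hypothetical label for the buffer that version v points to, and Block and version_blocks are illustrative names. Walking from the highest version down is an assumption, inferred from done() testing iVersion < 0 and from block_first() returning jVersion + 1. The real iterator may differ in detail.

```cpp
#include <cassert>
#include <vector>

// Inclusive range of consecutive versions that share one buffer.
struct Block {
    int first;
    int last;
};

// Group consecutive versions with equal data_id, scanning from the last
// version towards version 0.
std::vector<Block> version_blocks(const std::vector<int>& data_id) {
    std::vector<Block> blocks;
    int i = static_cast<int>(data_id.size()) - 1;  // plays the role of iVersion
    while (i >= 0) {
        int j = i - 1;                             // plays the role of jVersion
        while (j >= 0 && data_id[j] == data_id[i]) --j;
        blocks.push_back({j + 1, i});              // block_first, block_last
        i = j;                                     // continue below this block
    }
    return blocks;
}
```

For data_id = {0, 0, 1, 1, 1, 2} this yields the blocks [5, 5], [2, 4] and [0, 1], so a processor would run three times instead of six.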