MedCohort
MedCohort is a data structure with helpers for working with a cohort: a list of individuals with (dated) outcomes and follow-up times.
MedCohort contains a vector of basic records (CohortRec), each representing a single period for a specific id together with the corresponding outcome information. A MedCohort can be sampled to generate MedSamples files according to SamplingParams using one of two functions:
- int create_sampling_file(SamplingParams &s_params, string out_sample_file) : Generate samples within cohort times that fit the SamplingParams criteria and windows. Sample dates are selected randomly for each window of s_params.jump_days in the legal period.
- int create_sampling_file_sticked(SamplingParams &s_params, string out_sample_file) : Generate samples within cohort times that fit the SamplingParams criteria and windows. Sample dates are those with the required signals for each window of s_params.jump_days in the legal period (if any exist).

A MedCohort can also be used to estimate the age- and gender-dependent incidence rate. Estimation is done according to IncidenceParams using the following function:
- int create_incidence_file(IncidenceParams &i_params, string out_file) : Generate an incidence file from cohort + incidence-params. Check all patient-years within cohort that fit IncidenceParams and count positive outcomes within the incidence_years_window.

IncidenceParams initialization:
Parameter Name | Description | Default Value |
---|---|---|
incidence_years_window | how many years ahead do we consider an outcome? | 1 |
rep | Repository configuration file | None |
from_year | first year to consider in calculating incidence | 2007 |
to_year | last year to consider in calculating incidence | 2013 |
gender_mask | mask for gender specification (rightmost bit on for male, second for female) | 0x3 |
train_mask | mask for TRAIN-value specification (three rightmost bits for TRAIN = 1,2,3) | 0x7 |
from_age | minimal age to consider | 30 |
to_age | maximal age to consider | 90 |
age_bin | binning of ages | 5 |
min_samples_in_bin | minimal required samples to estimate incidence per bin | 20 |
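
The way these parameters interact can be illustrated with a minimal standalone sketch. It is not the MedCohort implementation: the PatientYear struct, the passes_mask and incidence_by_age_bin helpers, and the gender coding (1 = male, 2 = female) are assumptions introduced here for illustration. The sketch filters patient-years by age range and masks, groups them into bins of age_bin years, and reports the fraction of positive outcomes only for bins with at least min_samples_in_bin records.

```cpp
#include <map>
#include <vector>

// Hypothetical flat record: one patient-year already extracted from the cohort.
struct PatientYear {
    int age;          // age in that year
    int gender;       // assumed coding: 1 = male, 2 = female
    int train;        // assumed TRAIN value: 1, 2 or 3
    bool has_outcome; // positive outcome within incidence_years_window
};

// Bit (value - 1) of the mask selects the given value: mask 0x3 accepts
// values 1 and 2, mask 0x1 accepts value 1 only.
inline bool passes_mask(int value, int mask) {
    return value >= 1 && value <= 31 && ((mask >> (value - 1)) & 1) != 0;
}

// Returns the incidence per age bin, keyed by the lower edge of the bin.
// Bins with fewer than min_samples_in_bin patient-years are omitted.
// Default arguments mirror the defaults in the table above.
inline std::map<int, double> incidence_by_age_bin(
    const std::vector<PatientYear>& rows,
    int from_age = 30, int to_age = 90, int age_bin = 5,
    int gender_mask = 0x3, int train_mask = 0x7, int min_samples_in_bin = 20) {
    std::map<int, int> total, positive;
    for (const PatientYear& r : rows) {
        if (r.age < from_age || r.age > to_age) continue;
        if (!passes_mask(r.gender, gender_mask)) continue;
        if (!passes_mask(r.train, train_mask)) continue;
        int bin = from_age + ((r.age - from_age) / age_bin) * age_bin;
        ++total[bin];
        if (r.has_outcome) ++positive[bin];
    }
    std::map<int, double> rate;
    for (const auto& kv : total) {
        if (kv.second >= min_samples_in_bin)
            rate[kv.first] = static_cast<double>(positive[kv.first]) / kv.second;
    }
    return rate;
}
```

With the defaults, a 42-year-old falls into the bin starting at 40, and a bin that collects only a handful of patient-years produces no estimate at all.
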
SamplingParams initialization:
Parameter Name | Description | Default Value |
---|---|---|
is_continous | continuous sampling mode vs. stick-to-signal mode (0 = stick) | 1 |
stick_to, stick_to_sigs | comma separated list of signals required at sampling times | None |
take_all | in 'stick' mode - take all samples with the required signals within each sampling period | 0 |
take_closest | in 'stick' mode - take the sample with the required signals that is closest to each target sampling date. If neither take_all nor take_closest is given, a random sample with the required signals within each sampling period is selected | 0 |
rep | Repository configuration file | None |
min_age | minimum age for sampling | 0 |
max_age | maximum age for sampling | 200 |
gender_mask | mask for gender specification (rightmost bit on for male, second for female) | 0x3 |
train_mask | mask for TRAIN-value specification (three rightmost bits for TRAIN = 1,2,3) | 0x7 |
min_year | first year for sampling | 1900 |
max_year | last year for sampling | 2100 |
jump_days | days to jump between sampling periods | 180 |
min_days, min_days_from_outcome | minimal number of days before outcome for sampling | 30 |
min_case, min_case_years | minimal number of years before outcome for cases | 0 |
max_case, max_case_years | maximal number of years before outcome for cases | 1 |
min_control, min_control_years | minimal number of years before outcome for controls | 0 |
max_control, max_control_years | maximal number of years before outcome for controls | 10 |
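
The continuous sampling mode and the case time window can be sketched in a few lines of standalone C++. This is an illustration under stated assumptions, not the library code: sample_days_continuous and case_sample_in_window are hypothetical helpers, dates are represented as plain integer day counts, a year is approximated as 365 days, and the reading of min_days together with min_case_years and max_case_years as a joint constraint on the distance to the outcome is an assumption.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical helper: pick one random day in each consecutive window of
// jump_days inside the legal period [first_day, last_day] (inclusive).
inline std::vector<int> sample_days_continuous(int first_day, int last_day,
                                               int jump_days, std::mt19937& rng) {
    std::vector<int> days;
    if (jump_days <= 0) return days;
    for (int start = first_day; start <= last_day; start += jump_days) {
        // The last window is truncated at the end of the legal period.
        int end = std::min(start + jump_days - 1, last_day);
        std::uniform_int_distribution<int> pick(start, end);
        days.push_back(pick(rng));
    }
    return days;
}

// Assumed reading of the case window: a sample taken days_before_outcome days
// before a positive outcome is kept when it is at least min_days before the
// outcome and between min_case_years and max_case_years before it.
inline bool case_sample_in_window(int days_before_outcome, int min_days = 30,
                                  double min_case_years = 0.0,
                                  double max_case_years = 1.0) {
    const double days_per_year = 365.0; // simplification for this sketch
    return days_before_outcome >= min_days &&
           days_before_outcome >= min_case_years * days_per_year &&
           days_before_outcome <= max_case_years * days_per_year;
}
```

Under these assumptions and the default values, a legal period of 400 days with jump_days = 180 yields three sample days (the last from a shortened window), and a case sample 100 days before the outcome is kept while samples 10 or 400 days before it are dropped.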