Create a time-varying dataset

time_varying(
  x,
  specs,
  exposure,
  ...,
  grid.only = FALSE,
  time_units = c("days", "seconds"),
  id = "pat_id",
  sort = NA,
  n_cores = parallelly::availableCores(omit = 1)
)

check_tv_data(x, time_units, id, sort)

check_tv_exposure(x, expected_ids, time_units, id, ..., check_overlap = TRUE)

check_tv_specs(specs, expected_features = NULL)

Arguments

x

A data.frame with four columns: <id>, "feature", "datetime", "value"

specs

A data.frame with five columns: "feature", "use_for_grid", "lookback_start", "lookback_end", "aggregation". See details below.

exposure

A data.frame with (at least) three columns: <id>, "exposure_start", "exposure_stop".

...

Other arguments. Currently used only to pass check_overlap on to check_tv_exposure().

grid.only

Should only the grid be computed and returned? Useful only for debugging.

time_units

Time units to use: "days" or "seconds".

id

The name of the id column in x and exposure. Default is "pat_id".

sort

Logical, indicating whether to sort the data before performing the analysis. By default (NA), sorting is done only when useful (that is, when x$datetime is a POSIXct and time_units == "days"). When x$datetime is a Date, a message is issued to remind the user that the input must already be sorted in descending datetime order to get the right answer; specify sort = TRUE or sort = FALSE to silence it.

n_cores

Number of cores to use. The default, parallelly::availableCores(omit = 1), honors the SLURM_CPUS_PER_TASK variable when Slurm is being used. Set n_cores = 1 for no parallelization.

expected_ids

A vector of expected ids based on the data.

check_overlap

Should overlap be checked among exposure rows? A potentially costly operation, so you can opt out of it if you're really sure.

expected_features

A vector of expected features based on the data.

Value

A data.frame, with one row per grid value and one column per feature specification (plus grid columns).

Details

By default, specs uses every feature for grid creation and sets lookback_start = 0; a message is issued in both cases. Currently supported aggregation functions include counting ("count" or "n"), last value carried forward ("last value" or "lvcf"), any/none ("any" or "binary"), time since ("time since" or "ts"), "min", "max", "mean", and the special "event" (for which look backs are ignored).
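As an illustration of the specs layout described above, here is a minimal sketch built with base R only. The feature names are borrowed from the example output on this page; the column types (logical use_for_grid, numeric look backs) and the specific look back values are assumptions made for illustration, not values taken from tv_example.

```r
# Hypothetical specs table: one row per feature/aggregation pair.
# Column names come from the Arguments section; the values are assumed.
specs <- data.frame(
  feature        = c("albumin", "albumin", "neurosurgery", "death"),
  use_for_grid   = c(TRUE, TRUE, TRUE, TRUE),   # assumed to be logical
  lookback_start = c(0, 0, 0, 0),               # in time_units (assumed numeric)
  lookback_end   = c(NA, NA, 30, NA),           # NA is allowed for look backs
  aggregation    = c("lvcf", "ts", "count", "event"),
  stringsAsFactors = FALSE
)

# Aggregation names listed in the Details section
supported <- c("count", "n", "last value", "lvcf", "any", "binary",
               "time since", "ts", "min", "max", "mean", "event")

# Simple sanity checks on the sketch (not a substitute for check_tv_specs())
all_supported <- all(specs$aggregation %in% supported)
n_columns <- ncol(specs)
```

A table like this would be passed as the specs argument of time_varying(); check_tv_specs() remains the authoritative validator.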

The look back window begins at row_start - lookback_end and ends at row_start - lookback_start. Passing NA to either look back changes the corresponding window boundary to exposure_start.
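The window rule above can be sketched in a few lines of base R. The helper lookback_window() is hypothetical and is not part of the package; it only restates the arithmetic for a single grid row with time_units = "days", and it makes no claim about whether the package treats the boundaries as inclusive or exclusive.

```r
# Hypothetical helper (not exported by the package): look back window
# for one grid row, following the rule stated in the Details section.
lookback_window <- function(row_start, exposure_start,
                            lookback_start, lookback_end) {
  # An NA look back moves the corresponding boundary to exposure_start
  window_begin <- if (is.na(lookback_end)) exposure_start else row_start - lookback_end
  window_end <- if (is.na(lookback_start)) exposure_start else row_start - lookback_start
  list(begin = window_begin, end = window_end)
}

row_start <- as.Date("2022-02-06")
exposure_start <- as.Date("2022-02-01")

# Look back from 3 days before row_start up to row_start itself
w1 <- lookback_window(row_start, exposure_start,
                      lookback_start = 0, lookback_end = 3)

# lookback_end = NA: the window begins at exposure_start instead
w2 <- lookback_window(row_start, exposure_start,
                      lookback_start = 0, lookback_end = NA)
```

Here w1 spans 2022-02-03 to 2022-02-06, and w2 spans 2022-02-01 to 2022-02-06.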

Examples

  data(tv_example)
  time_varying(tv_example$data, tv_example$specs, tv_example$exposure,
               time_units = "days", id = "mcn")
#> x$datetime is a Date; as such, be sure that your data is sorted in descending datetime order, so that `lvcf` picks the most recent row correctly (it picks the first row it finds).
#> 
#> To silence this message, please specify `sort=TRUE` or `sort=FALSE`. Defaulting to `sort=FALSE`.
#>   mcn exposure_start exposure_stop  row_start   row_stop albumin_lvcf
#> 1   1     2022-02-01    2022-02-08 2022-02-01 2022-02-06           15
#> 2   1     2022-02-01    2022-02-08 2022-02-06 2022-02-07           NA
#> 3   1     2022-02-01    2022-02-08 2022-02-07 2022-02-08           NA
#> 4   2     2022-02-01    2022-02-08 2022-02-01 2022-02-02           NA
#> 5   2     2022-02-01    2022-02-08 2022-02-02 2022-02-03           12
#> 6   2     2022-02-01    2022-02-08 2022-02-03 2022-02-08           12
#>   albumin_ts neurosurgery_count neuro note_any neuro appointment_count
#> 1         31                  1              0                       0
#> 2         NA                  1              1                       0
#> 3         NA                  1              1                       1
#> 4         NA                  0              0                       0
#> 5          0                  0              0                       0
#> 6          1                  0              0                       0
#>   neuro appointment_ts death_event
#> 1                   NA           0
#> 2                   NA           0
#> 3                    0           0
#> 4                   NA           0
#> 5                   NA           1
#> 6                   NA           0
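The message in the output above asks for input that is already in descending datetime order when x$datetime is a Date. A minimal sketch of pre-sorting such input with base R follows; the toy data frame is invented for illustration, and grouping by id before sorting is an assumption (the message only requires descending datetime order).

```r
# Invented toy input with the four columns expected for x
x <- data.frame(
  pat_id   = c(1, 1, 2, 2),
  feature  = c("albumin", "albumin", "albumin", "albumin"),
  datetime = as.Date(c("2022-01-01", "2022-02-01", "2022-02-02", "2022-01-15")),
  value    = c(15, 14, 12, 13)
)

# Sort by id, then by datetime in descending order within each id,
# so that the most recent row for each id comes first
x_sorted <- x[order(x$pat_id, -as.numeric(x$datetime)), ]
```

With the input sorted this way, sort = FALSE can be passed explicitly to time_varying() to silence the message.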