tv.Rd
Create a time-varying dataset
time_varying(
x,
specs,
exposure,
...,
grid.only = FALSE,
time_units = c("days", "seconds"),
id = "pat_id",
sort = NA,
n_cores = parallelly::availableCores(omit = 1)
)
check_tv_data(x, time_units, id, sort)
check_tv_exposure(x, expected_ids, time_units, id, ..., check_overlap = TRUE)
check_tv_specs(specs, expected_features = NULL)
A data.frame with four columns: <id>, "feature", "datetime", "value"
A data.frame with five columns: "feature", "use_for_grid", "lookback_start", "lookback_end", "aggregation". See details below.
A data.frame with (at least) three columns: <id>, "exposure_start", "exposure_stop"
Other arguments. Currently just passes check_overlap.
Should only the grid be computed and returned? Useful only for debugging.
Which time units should be used: "days" or "seconds".
The name of the id column to use. Default is "pat_id".
Logical, indicating whether to sort the data before performing the analysis. By default (NA), sorting is only done when useful (that is, when x$datetime is a POSIXct and time_units == "days").
A warning is issued when x$datetime is a Date to make the user aware that the input ought to be sorted to get the right answer.
Number of cores to use. If Slurm is being used, it checks the SLURM_CPUS_PER_TASK variable. Else it defaults to 1, for no parallelization.
A vector of expected ids based on the data.
Should overlap be checked among exposure rows? A potentially costly operation, so you can opt out of it if you're really sure.
A vector of expected features based on the data.
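The note on sort above says that Date input should already be sorted. A minimal sketch of pre-sorting, assuming the descending datetime order mentioned in the example message and assuming that grouping by id first is acceptable. The data values here are made up for illustration.

```r
# Illustrative input in the four-column layout described for x
# (values are invented, not taken from tv_example).
x <- data.frame(
  pat_id   = c(1, 1, 2),
  feature  = c("albumin", "albumin", "albumin"),
  datetime = as.Date(c("2022-01-01", "2022-01-15", "2022-02-02")),
  value    = c(14, 15, 12),
  stringsAsFactors = FALSE
)

# Sort by id, then by datetime with the most recent row first.
x_sorted <- x[order(x$pat_id, -as.numeric(x$datetime)), ]
```

With the data sorted this way, sort = FALSE can be passed explicitly to silence the message.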
A data.frame, with one row per grid value and one column per feature specification (plus grid columns).
The defaults for specs are to use everything for the grid creation, and to set lookback_start=0, with a message in both cases.
Currently supported aggregation functions include counting ("count" or "n"), last value carried forward ("last value" or "lvcf"),
any/none ("any" or "binary"), time since ("time since" or "ts"), min/max/mean, and the special "event" (for which look backs are ignored).
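A specs table using the columns and aggregation names listed above might look like the following sketch. The feature names and look-back values are illustrative assumptions and are not the contents of tv_example$specs.

```r
# Hypothetical specs table: one row per feature/aggregation pair.
specs <- data.frame(
  feature        = c("albumin", "albumin", "neurosurgery", "death"),
  use_for_grid   = c(TRUE, TRUE, TRUE, TRUE),
  lookback_start = c(0, 0, 0, 0),
  lookback_end   = c(30, 30, NA, NA),  # NA: boundary falls back to exposure_start
  aggregation    = c("lvcf", "ts", "count", "event"),
  stringsAsFactors = FALSE
)
```

The same feature can appear on several rows when more than one aggregation is wanted for it.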
The look back window begins at row_start - lookback_end and ends at row_start - lookback_start. Passing NA to either look back changes the corresponding window boundary to exposure_start.
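The window arithmetic described above can be sketched as follows. lookback_window is a hypothetical helper written for illustration, not a function from the package. It assumes time_units = "days" and says nothing about whether the boundaries are inclusive.

```r
# Hypothetical helper: returns the look-back window for one grid row.
lookback_window <- function(row_start, exposure_start,
                            lookback_start, lookback_end) {
  # NA in either look back moves that boundary to exposure_start.
  begin <- if (is.na(lookback_end)) exposure_start else row_start - lookback_end
  end   <- if (is.na(lookback_start)) exposure_start else row_start - lookback_start
  list(begin = begin, end = end)
}

row_start      <- as.Date("2022-02-06")
exposure_start <- as.Date("2022-02-01")

# Window covering the 3 days before row_start.
w1 <- lookback_window(row_start, exposure_start,
                      lookback_start = 0, lookback_end = 3)

# NA for lookback_end: the window begins at exposure_start instead.
w2 <- lookback_window(row_start, exposure_start,
                      lookback_start = 0, lookback_end = NA)
```

Here w1 runs from 2022-02-03 to 2022-02-06, and w2 runs from 2022-02-01 to 2022-02-06.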
data(tv_example)
time_varying(tv_example$data, tv_example$specs, tv_example$exposure,
time_units = "days", id = "mcn")
#> x$datetime is a Date; as such, be sure that your data is sorted in descending datetime order, so that `lvcf` picks the most recent row correctly (it picks the first row it finds).
#>
#> To silence this message, please specify `sort=TRUE` or `sort=FALSE`. Defaulting to `sort=FALSE`.
#> mcn exposure_start exposure_stop row_start row_stop albumin_lvcf
#> 1 1 2022-02-01 2022-02-08 2022-02-01 2022-02-06 15
#> 2 1 2022-02-01 2022-02-08 2022-02-06 2022-02-07 NA
#> 3 1 2022-02-01 2022-02-08 2022-02-07 2022-02-08 NA
#> 4 2 2022-02-01 2022-02-08 2022-02-01 2022-02-02 NA
#> 5 2 2022-02-01 2022-02-08 2022-02-02 2022-02-03 12
#> 6 2 2022-02-01 2022-02-08 2022-02-03 2022-02-08 12
#> albumin_ts neurosurgery_count neuro note_any neuro appointment_count
#> 1 31 1 0 0
#> 2 NA 1 1 0
#> 3 NA 1 1 1
#> 4 NA 0 0 0
#> 5 0 0 0 0
#> 6 1 0 0 0
#> neuro appointment_ts death_event
#> 1 NA 0
#> 2 NA 0
#> 3 0 0
#> 4 NA 0
#> 5 NA 1
#> 6 NA 0