tv.Rmd
What does tv::time_varying() do?
Given data X, specs, and exposures:
For every patient or row e in the exposures:
  Let X_e = the data X filtered to the current patient
  Construct the “grid”:
    Let f = the features from specs with use_for_grid=TRUE
    Let grid_times = the unique datetimes in X_e with features in f and datetime between e$exposure_start and e$exposure_stop
    The grid is now a one-row-per-break dataset, with the first break at e$exposure_start and the last break before e$exposure_stop
  For each grid period g in the grid:
    For each row s in specs:
      Let xx = X_e filtered to X_e$feature == s$feature and X_e$datetime in the interval from g$row_start - s$lookback_end to g$row_start - s$lookback_start
      Perform the aggregation s$aggregation on xx
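To make the innermost step concrete, here is a minimal base-R sketch of what one (grid row, spec row) aggregation might look like. The helper aggregate_one() is hypothetical and not part of tv. It assumes lookbacks are in seconds and that both ends of the lookback window are inclusive, and it only covers the “lvcf” and “ts” aggregations.

```r
# Hypothetical helper for illustration only; it is not part of the tv package.
# Assumes `x` has columns feature, datetime (POSIXct) and value, and that
# lookbacks are given in seconds.
aggregate_one <- function(x, feature, row_start, lookback_start, lookback_end,
                          aggregation) {
  t  <- as.numeric(x$datetime)  # seconds since the epoch
  rs <- as.numeric(row_start)
  # Assumption: both ends of the lookback window are inclusive.
  keep <- x$feature == feature &
    t >= rs - lookback_end &
    t <= rs - lookback_start
  xx <- x[keep, , drop = FALSE]
  if (nrow(xx) == 0) return(NA_real_)
  # Most recent observation inside the window
  last <- xx[which.max(as.numeric(xx$datetime)), ]
  switch(
    aggregation,
    lvcf = last$value,                      # last value carried forward
    ts   = rs - as.numeric(last$datetime),  # seconds since that observation
    stop("aggregation not covered by this sketch: ", aggregation)
  )
}

x <- data.frame(
  feature  = "lactate",
  datetime = as.POSIXct(c("2021-12-31 23:00:00", "2022-01-01 03:41:00"), tz = "UTC"),
  value    = c(9, 10)
)
row_start <- as.POSIXct("2022-01-01 00:00:00", tz = "UTC")

aggregate_one(x, "lactate", row_start, 0, Inf, "lvcf")  # 9
aggregate_one(x, "lactate", row_start, 0, Inf, "ts")    # 3600
```

Only the lactate value from 23:00 the previous day falls inside the window for a row starting at midnight, so the sketch returns 9 and 3600 seconds.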
Does tv really loop over every single feature for every row in the grid independently?
Yes. This is a lot of looping, and a good reason to use more than one core (see the n_cores argument).
Does tv require any exposure history to get the counts or time-since right?
No. Each row is considered independently.
Usually the current grid row start time.
As of version 1.7.0 you can; use tv::time_varying(grid.only = TRUE).
Why is tv on prospective patients really slow?
You have probably misunderstood the exposure dataset. For a prospective patient you really only want the current exposure in the grid, with no other breakpoints, so set the exposure start to (e.g.) the current time and the exposure end to (e.g.) the current time plus one second.
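As a rough illustration, a one-row exposure of this kind could be built as follows. This is a sketch in base R; the column names pat_id, exposure_start and exposure_stop are the ones used in the example exposure data in this document, and the patient id is made up.

```r
# One exposure row per prospective patient: a one-second window starting now,
# so the grid contains a single row and no other breakpoints.
now <- Sys.time()
exposure <- data.frame(
  pat_id         = 1,        # made-up patient id
  exposure_start = now,
  exposure_stop  = now + 1   # adding a number to a POSIXct adds seconds
)
exposure
```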
Another example of how to use time-varying. Let’s say we want to break every 6 hours. Just add that as a feature. Here we give it a count with infinite look back, to count the 6-hour period we’re in. We also want to include the endpoint, which we encode with aggregation “event”.
library(tv)
library(tibble)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
data <- tribble(
  ~ pat_id, ~ feature,        ~ datetime,            ~ value,
  1,        "lactate",        "2021-12-31 23:00:00", 9,
  1,        "lactate",        "2022-01-01 03:41:00", 10,
  1,        "lactate",        "2022-01-01 07:00:00", 11,
  1,        "blood pressure", "2022-01-01 02:00:00", 120,
  1,        "blood pressure", "2022-01-01 04:00:00", 115,
  1,        "blood pressure", "2022-01-01 06:00:00", 118,
  1,        "6-hour",         "2022-01-01 00:00:00", NA_real_,
  1,        "6-hour",         "2022-01-01 06:00:00", NA_real_,
  1,        "6-hour",         "2022-01-01 12:00:00", NA_real_,
  1,        "6-hour",         "2022-01-01 18:00:00", NA_real_,
  1,        "event",          "2022-01-01 08:00:00", NA_real_,
  1,        "event",          "2022-01-01 13:00:00", NA_real_
) %>%
  mutate(datetime = as_datetime(datetime))
specs <- tribble(
  ~ feature,        ~ use_for_grid, ~ lookback_start, ~ lookback_end, ~ aggregation,
  "lactate",        TRUE,           0,                Inf,            "ts",
  "lactate",        TRUE,           0,                Inf,            "lvcf",
  "blood pressure", FALSE,          0,                7200,           "ts",   # two hours
  "blood pressure", FALSE,          0,                7200,           "lvcf", # two hours
  "6-hour",         TRUE,           0,                Inf,            "n",
  "event",          TRUE,           0,                0,              "event"
)
exposure <- tibble(
  pat_id = 1,
  encounter = 1:2, # optional id
  exposure_start = as_datetime(c("2022-01-01 00:00:00", "2022-01-01 08:00:00")),
  exposure_stop = as_datetime(c("2022-01-01 08:00:00", "2022-01-01 13:00:00")),
)
time_varying(data, specs, exposure = exposure, time_units = "seconds", n_cores = 1) %>%
  arrange(pat_id, row_start)
#> pat_id encounter exposure_start exposure_stop row_start
#> 1 1 1 2022-01-01 00:00:00 2022-01-01 08:00:00 2022-01-01 00:00:00
#> 2 1 1 2022-01-01 00:00:00 2022-01-01 08:00:00 2022-01-01 03:41:00
#> 3 1 1 2022-01-01 00:00:00 2022-01-01 08:00:00 2022-01-01 06:00:00
#> 4 1 1 2022-01-01 00:00:00 2022-01-01 08:00:00 2022-01-01 07:00:00
#> 5 1 2 2022-01-01 08:00:00 2022-01-01 13:00:00 2022-01-01 08:00:00
#> 6 1 2 2022-01-01 08:00:00 2022-01-01 13:00:00 2022-01-01 12:00:00
#> row_stop lactate_ts lactate_lvcf blood pressure_ts
#> 1 2022-01-01 03:41:00 3600 9 NA
#> 2 2022-01-01 06:00:00 0 10 6060
#> 3 2022-01-01 07:00:00 8340 10 0
#> 4 2022-01-01 08:00:00 0 11 3600
#> 5 2022-01-01 12:00:00 3600 11 7200
#> 6 2022-01-01 13:00:00 18000 11 NA
#> blood pressure_lvcf 6-hour_count event_event
#> 1 NA 1 0
#> 2 120 1 0
#> 3 118 2 0
#> 4 118 2 1
#> 5 118 2 0
#> 6 NA 3 1
Note that the lactate lab from 2021 does not contribute to a new row because it is not inside the exposure window. Note also that the look back is ignored for aggregation of type “event”.
time_varying() function
Run tv twice and merge the results: once for static variables and once for dynamic variables.
If there are multiple encounters per person, you can just add another row to the exposure data and tag it with another id column, which gets carried forward by time_varying().
Simply pass the special value NA in the “lookback_end” column.
Simply pass the special value NA in the “lookback_start” column. You’ll probably also want something like “lookback_end = Inf”.
Set use_for_grid=FALSE for everything except an “hourly” feature. Then calculate every hour that the patient is at risk; set “feature” to “hourly”, set “datetime” to the time stamp, and set “value” to hour(datetime).
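One way to build those “hourly” rows is sketched below in base R. The at-risk window and patient id are made up for illustration, and the window is assumed to start on the hour. The column names match the example data in this document.

```r
# Hypothetical at-risk window for one patient; replace with your own times.
at_risk_start <- as.POSIXct("2022-01-01 00:00:00", tz = "UTC")
at_risk_stop  <- as.POSIXct("2022-01-01 05:30:00", tz = "UTC")

# One time stamp per hour inside the at-risk window
hours <- seq(at_risk_start, at_risk_stop, by = "1 hour")

hourly <- data.frame(
  pat_id   = 1,
  feature  = "hourly",
  datetime = hours,
  # base-R equivalent of lubridate::hour(datetime)
  value    = as.numeric(format(hours, "%H"))
)
hourly
```

These rows can then be appended to the rest of the long-format data before calling time_varying().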
Start with a dataset of start time and end time. IMPORTANT: be sure no intervals overlap; if they do, the next section would be better suited for you. Otherwise, the pseudocode below should do the trick:
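data %>%
  # one row per interval boundary: a start row and an end row
  tidyr::pivot_longer(c(start_time, end_time), names_to = "which_time", values_to = "datetime") %>%
  dplyr::mutate(
    # 1 when the interval starts, 0 when it ends
    value = +(which_time == "start_time")
  )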
Then make the specs a “lvcf” with infinite look back.
Start with a dataset of start time and end time. The pseudocode below should do the trick:
data %>%
  # one row per interval boundary: a start row and an end row
  tidyr::pivot_longer(c(start_time, end_time), names_to = "which_time", values_to = "datetime") %>%
  dplyr::arrange(pat_id, datetime) %>%
  dplyr::group_by(pat_id) %>%
  dplyr::mutate(
    # running count of open intervals: +1 at each start, -1 at each end
    value = cumsum(ifelse(which_time == "start_time", 1, -1))
  ) %>%
  dplyr::ungroup()
Then make the specs a “lvcf” with infinite look back.
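For a concrete picture of the overlapping case, here is a small worked sketch in base R with two made-up overlapping intervals. The feature name “n_active” is hypothetical, and setting use_for_grid = TRUE in the spec row is an assumption (it makes the grid break whenever the count changes).

```r
# Two made-up intervals for one patient that overlap between 02:00 and 04:00
intervals <- data.frame(
  pat_id     = c(1, 1),
  start_time = as.POSIXct(c("2022-01-01 00:00:00", "2022-01-01 02:00:00"), tz = "UTC"),
  end_time   = as.POSIXct(c("2022-01-01 04:00:00", "2022-01-01 06:00:00"), tz = "UTC")
)

# One row per boundary: +1 when an interval starts, -1 when it ends
long <- rbind(
  data.frame(pat_id = intervals$pat_id, datetime = intervals$start_time, step = 1),
  data.frame(pat_id = intervals$pat_id, datetime = intervals$end_time, step = -1)
)
long <- long[order(long$pat_id, long$datetime), ]

# Running count of open intervals within each patient
long$value   <- ave(long$step, long$pat_id, FUN = cumsum)
long$feature <- "n_active"  # hypothetical feature name
long$value                  # 1 2 1 0

# Spec row: last value carried forward with infinite look back
specs <- data.frame(
  feature        = "n_active",
  use_for_grid   = TRUE,    # assumption: break the grid when the count changes
  lookback_start = 0,
  lookback_end   = Inf,
  aggregation    = "lvcf"
)
```

The carried-forward value at any grid row is then the number of intervals open at that time.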