What exactly does tv::time_varying() do?

Given data X, specs, and exposures

  • For every row e in the exposures (typically one per patient or per encounter):

    • Let X = the rows of the data X that belong to the patient of e

    • construct the “grid”:

      • Let f = the features from specs with use_for_grid=TRUE

      • Let grid_times = e$exposure_start plus the unique datetimes in X with a feature in f and a datetime in [e$exposure_start, e$exposure_stop)

    • The grid is now a one-row-per-break dataset: each row runs from one break (row_start) to the next (row_stop), so the first row starts at e$exposure_start and the last row ends at e$exposure_stop

    • for grid period g in grid:

      • for row s in specs:

        • Let xx = the rows of X with X$feature == s$feature and X$datetime in the closed interval [g$row_start - s$lookback_end, g$row_start - s$lookback_start]

        • perform the aggregation s$aggregation on xx
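The steps above can be sketched in base R for a single exposure row. This is a minimal illustration under assumptions, not tv's own code. The helpers build_grid(), lookback_rows(), agg_ts(), and agg_lvcf() are hypothetical. The look-back window is taken as closed at both ends. "ts" and "lvcf" are read as "seconds since the last value" and "last value carried forward". All three choices match the example output later in this document, and the data are a subset of that example.

```r
# Hypothetical sketch of the algorithm for ONE exposure row (base R only).
# Not tv's implementation; column names follow the example in this document.
utc <- function(s) as.POSIXct(s, tz = "UTC")

# Grid: breaks at exposure_start and at every grid-feature time in [start, stop)
build_grid <- function(x, grid_features, exposure_start, exposure_stop) {
  in_window <- x$feature %in% grid_features &
    x$datetime >= exposure_start & x$datetime < exposure_stop
  breaks <- sort(unique(c(exposure_start, x$datetime[in_window])))
  # Each row runs from one break to the next; the last row ends at exposure_stop
  data.frame(row_start = breaks, row_stop = c(breaks[-1], exposure_stop))
}

# Look-back filter: one feature's rows in [row_start - lookback_end, row_start - lookback_start]
lookback_rows <- function(x, feature, row_start, lookback_start, lookback_end) {
  keep <- x$feature == feature &
    x$datetime >= row_start - lookback_end &
    x$datetime <= row_start - lookback_start
  x[keep, , drop = FALSE]
}

# Assumed aggregations (seconds; NA when the window is empty)
agg_ts <- function(xx, row_start) {
  if (nrow(xx) == 0) NA_real_ else as.numeric(row_start) - as.numeric(max(xx$datetime))
}
agg_lvcf <- function(xx) {
  if (nrow(xx) == 0) NA_real_ else xx$value[which.max(xx$datetime)]
}

x <- data.frame(
  feature  = c(rep("lactate", 3), rep("blood pressure", 3), "6-hour", "6-hour", "event"),
  datetime = utc(c("2021-12-31 23:00:00", "2022-01-01 03:41:00", "2022-01-01 07:00:00",
                   "2022-01-01 02:00:00", "2022-01-01 04:00:00", "2022-01-01 06:00:00",
                   "2022-01-01 00:00:00", "2022-01-01 06:00:00", "2022-01-01 08:00:00")),
  value    = c(9, 10, 11, 120, 115, 118, NA, NA, NA)
)

# Encounter 1 of the example; blood pressure is not a grid feature
grid <- build_grid(x, c("lactate", "6-hour", "event"),
                   utc("2022-01-01 00:00:00"), utc("2022-01-01 08:00:00"))

# Blood pressure with a two-hour (7200 s) look back, at the second grid row (03:41)
bp <- lookback_rows(x, "blood pressure", grid$row_start[2], 0, 7200)
agg_ts(bp, grid$row_start[2])  # 6060
agg_lvcf(bp)                   # 120
```

The grid comes out as 00:00, 03:41, 06:00, 07:00 (ending at 08:00), and the blood pressure values match the second row of the example output.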

FAQ

1. Does tv really loop over every single feature for every row in the grid independently?

Yes. That is a lot of looping, and a good reason to use more than one core (the n_cores argument).

2. Does tv require any exposure history to get the counts or time-since right?

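No. Each grid row is computed independently, straight from the data that fall inside its look-back window; nothing is carried over from earlier rows.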

3. The look back in the specs is relative to what point in time?

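The start time of the current grid row (row_start). The exceptions are aggregations of type “event”, which ignore the look back, and the special NA look-back values described below, which refer to the exposure start instead.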

4. Can I get the grid returned to me so I can see what it’s doing?

As of version 1.7.0 you can; use tv::time_varying(grid.only = TRUE).

5. Why is my tv on prospective patients really slow?

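Probably because of how the exposure dataset is set up. For prospective patients you usually want only the current time point in the grid, with no other breakpoints, so set the exposure start to (e.g.) the current time and the exposure end to (e.g.) the current time plus one second. Because grid breaks come only from time stamps inside the exposure window, such a short window (almost always) gives a single grid row per patient.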
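A minimal sketch of such an exposure table in base R. The patient ids and the fixed "now" are made up for illustration; in practice you would use something like Sys.time(). The column names follow the example below.

```r
# Hypothetical prospective exposure table: a one-second window at "now" for each patient.
now <- as.POSIXct("2022-01-01 12:00:00", tz = "UTC")  # in practice, e.g. Sys.time()

exposure <- data.frame(
  pat_id         = c(1, 2, 3),   # made-up patient ids
  exposure_start = now,
  exposure_stop  = now + 1       # adding a number to a POSIXct adds seconds
)
exposure
```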

An example

Here is an example of how to use time_varying(). Let’s say we want to break every 6 hours. Just add that as a feature: here we give it a count (aggregation “n”) with infinite look back, which counts how many 6-hour marks have passed and so tells us which 6-hour period we’re in. We also want to include the endpoint, which we encode with aggregation “event”.

library(tv)
library(tibble)
library(dplyr)
library(lubridate)
data <- tribble(
  ~ pat_id, ~ feature, ~ datetime, ~ value,
  1, "lactate", "2021-12-31 23:00:00", 9,
  1, "lactate", "2022-01-01 03:41:00", 10,
  1, "lactate", "2022-01-01 07:00:00", 11,
  1, "blood pressure", "2022-01-01 02:00:00", 120,
  1, "blood pressure", "2022-01-01 04:00:00", 115,
  1, "blood pressure", "2022-01-01 06:00:00", 118,
  1, "6-hour", "2022-01-01 00:00:00", NA_real_,
  1, "6-hour", "2022-01-01 06:00:00", NA_real_,
  1, "6-hour", "2022-01-01 12:00:00", NA_real_,
  1, "6-hour", "2022-01-01 18:00:00", NA_real_,
  1, "event", "2022-01-01 08:00:00", NA_real_,
  1, "event", "2022-01-01 13:00:00", NA_real_
) %>%
  mutate(datetime = as_datetime(datetime))

specs <- tribble(
  ~ feature, ~ use_for_grid, ~ lookback_start, ~ lookback_end, ~ aggregation,
  "lactate", TRUE, 0, Inf, "ts",
  "lactate", TRUE, 0, Inf, "lvcf",
  "blood pressure", FALSE, 0, 7200, "ts", # two hours
  "blood pressure", FALSE, 0, 7200, "lvcf", # two hours
  "6-hour", TRUE, 0, Inf, "n",
  "event", TRUE, 0, 0, "event"
)

exposure <- tibble(
  pat_id = 1,
  encounter = 1:2, # optional id
  exposure_start = as_datetime(c("2022-01-01 00:00:00", "2022-01-01 08:00:00")),
  exposure_stop = as_datetime(c("2022-01-01 08:00:00", "2022-01-01 13:00:00"))
)

time_varying(data, specs, exposure = exposure, time_units = "seconds", n_cores = 1) %>%
  arrange(pat_id, row_start)
#>   pat_id encounter      exposure_start       exposure_stop           row_start
#> 1      1         1 2022-01-01 00:00:00 2022-01-01 08:00:00 2022-01-01 00:00:00
#> 2      1         1 2022-01-01 00:00:00 2022-01-01 08:00:00 2022-01-01 03:41:00
#> 3      1         1 2022-01-01 00:00:00 2022-01-01 08:00:00 2022-01-01 06:00:00
#> 4      1         1 2022-01-01 00:00:00 2022-01-01 08:00:00 2022-01-01 07:00:00
#> 5      1         2 2022-01-01 08:00:00 2022-01-01 13:00:00 2022-01-01 08:00:00
#> 6      1         2 2022-01-01 08:00:00 2022-01-01 13:00:00 2022-01-01 12:00:00
#>              row_stop lactate_ts lactate_lvcf blood pressure_ts
#> 1 2022-01-01 03:41:00       3600            9                NA
#> 2 2022-01-01 06:00:00          0           10              6060
#> 3 2022-01-01 07:00:00       8340           10                 0
#> 4 2022-01-01 08:00:00          0           11              3600
#> 5 2022-01-01 12:00:00       3600           11              7200
#> 6 2022-01-01 13:00:00      18000           11                NA
#>   blood pressure_lvcf 6-hour_count event_event
#> 1                  NA            1           0
#> 2                 120            1           0
#> 3                 118            2           0
#> 4                 118            2           1
#> 5                 118            2           0
#> 6                  NA            3           1

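Note that the lactate lab from 2021 does not create a new row, because it is not inside the exposure window; thanks to its infinite look back, it still feeds lactate_ts and lactate_lvcf in the first row. Note also that the look back is ignored for aggregation of type “event”: the event is flagged on the row that ends at the event time (rows 4 and 6 above).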

Overclocking the time_varying() function

What if a variable doesn’t change over the exposure?

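Run time_varying() twice and merge the results: once for the static variables and once for the dynamic ones. Joining on the patient (and exposure) id means the static variables need not be computed for every grid row.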
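A minimal base-R sketch of the merge step. It assumes the static run yields one row per exposure, keyed by pat_id and encounter as in the example below. merge_static() is a hypothetical helper. The two input tables are toy stand-ins for time_varying() output, and "age" is a made-up static feature.

```r
# Hypothetical helper: attach per-exposure static features to the time-varying rows.
merge_static <- function(dynamic, static, by = c("pat_id", "encounter")) {
  # One static row per exposure, so the join cannot duplicate grid rows
  stopifnot(!anyDuplicated(static[by]))
  out <- merge(dynamic, static, by = by, all.x = TRUE)
  # Restore the grid order: by id columns, then row_start
  keys <- c(unname(as.list(out[by])), list(out$row_start))
  out <- out[do.call(order, keys), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy stand-ins for the two runs
dynamic <- data.frame(
  pat_id = 1, encounter = c(1, 1, 2),
  row_start = as.POSIXct(c("2022-01-01 00:00:00", "2022-01-01 03:41:00",
                           "2022-01-01 08:00:00"), tz = "UTC"),
  lactate_lvcf = c(9, 10, 11)
)
static <- data.frame(pat_id = 1, encounter = c(1, 2), age = c(70, 70))  # hypothetical

merged <- merge_static(dynamic, static)
merged
```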

Multiple exposures per person

If there are multiple encounters per person, add one row per encounter to the exposure data and tag each row with an id column (like encounter in the example above); time_varying() carries that column through to the output.

Only look back to the beginning of the exposure instead of a fixed look back

Simply pass the special value NA in the “lookback_end” column.
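For instance, a specs row for "the last lactate since the exposure started" might look like this, written with data.frame() instead of tribble(). Using NA_real_ rather than a bare NA keeps the column numeric. That is a precaution on my part; the need for it is not confirmed.

```r
# Hypothetical specs row: last lactate value since the start of the current exposure.
specs_since_start <- data.frame(
  feature        = "lactate",
  use_for_grid   = TRUE,
  lookback_start = 0,
  lookback_end   = NA_real_,  # special value: look back only to the exposure start
  aggregation    = "lvcf"
)
specs_since_start
```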

Only look at things from before the exposure instead of a fixed look back

Simply pass the special value NA in the “lookback_start” column. You’ll probably also want something like “lookback_end = Inf”.
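The two special values can be pictured as moving one edge of the look-back window to the exposure start. NA in lookback_end moves the far edge there, and NA in lookback_start moves the near edge there. lookback_window() below is a hypothetical helper illustrating that reading of these two sections, not tv's code. It says nothing about whether the edges are inclusive.

```r
# Hypothetical helper: the look-back window for one grid row, in absolute time.
# A non-NA value is an offset (in seconds) back from row_start; NA means "the exposure start".
lookback_window <- function(row_start, exposure_start, lookback_start, lookback_end) {
  list(
    lower = if (is.na(lookback_end))   exposure_start else row_start - lookback_end,
    upper = if (is.na(lookback_start)) exposure_start else row_start - lookback_start
  )
}

utc <- function(s) as.POSIXct(s, tz = "UTC")
row_start      <- utc("2022-01-01 06:00:00")
exposure_start <- utc("2022-01-01 00:00:00")

# lookback_end = NA: only data since the exposure start
since_start  <- lookback_window(row_start, exposure_start, 0, NA)
# lookback_start = NA with lookback_end = Inf: only data up to the exposure start
before_start <- lookback_window(row_start, exposure_start, NA, Inf)
```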

Only break every hour

Set use_for_grid=FALSE for everything except an “hourly” feature. Then add one data row for every hour that the patient is at risk: set “feature” to “hourly”, “datetime” to that hour’s time stamp, and “value” to hour(datetime).
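A minimal base-R sketch of building those hourly rows from an exposure table. make_hourly() is hypothetical. It assumes UTC time stamps, as in the example data, and uses format(..., "%H") in place of lubridate::hour(). It keeps only whole-hour marks inside [exposure_start, exposure_stop), since those are the only times that can become grid breaks.

```r
# Hypothetical helper: one "hourly" data row per whole hour inside each exposure window.
make_hourly <- function(exposure) {
  rows <- lapply(seq_len(nrow(exposure)), function(i) {
    start_s <- as.numeric(exposure$exposure_start[i])  # seconds since 1970-01-01 UTC
    stop_s  <- as.numeric(exposure$exposure_stop[i])
    # Whole-hour marks in [exposure_start, exposure_stop)
    first_hour <- ceiling(start_s / 3600)
    n_hours    <- max(ceiling(stop_s / 3600) - first_hour, 0)
    marks      <- (first_hour + seq_len(n_hours) - 1) * 3600
    datetime   <- as.POSIXct(marks, origin = "1970-01-01", tz = "UTC")
    data.frame(
      pat_id   = rep(exposure$pat_id[i], n_hours),
      feature  = rep("hourly", n_hours),
      datetime = datetime,
      value    = as.integer(format(datetime, "%H", tz = "UTC"))  # hour of day
    )
  })
  do.call(rbind, rows)
}

# The exposure windows from the example above
exposure <- data.frame(
  pat_id         = 1,
  encounter      = 1:2,
  exposure_start = as.POSIXct(c("2022-01-01 00:00:00", "2022-01-01 08:00:00"), tz = "UTC"),
  exposure_stop  = as.POSIXct(c("2022-01-01 08:00:00", "2022-01-01 13:00:00"), tz = "UTC")
)
hourly <- make_hourly(exposure)  # 8 + 5 = 13 rows, value 0 to 12
```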

Create an indicator for if a situation applies

Start with a dataset that has one row per interval, holding pat_id, start_time, and end_time. IMPORTANT: be sure no intervals overlap; if they do, the next section is better suited to you. Otherwise, the sketch below should do the trick:

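library(dplyr)  # for %>%

data %>% 
  tidyr::pivot_longer(c(start_time, end_time), names_to = "which_time", values_to = "datetime") %>% 
  dplyr::mutate(
    feature = "situation",                 # hypothetical feature name; use your own
    value = +(which_time == "start_time")  # 1 when the situation starts, 0 when it ends
  ) %>% 
  dplyr::select(pat_id, feature, datetime, value)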

Then give that feature an “lvcf” spec with infinite look back.

Create a count of how many times a situation applies

Start with the same kind of dataset of start and end times; here the intervals may overlap. The sketch below should do the trick:

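library(dplyr)  # for %>%

data %>% 
  tidyr::pivot_longer(c(start_time, end_time), names_to = "which_time", values_to = "datetime") %>% 
  dplyr::mutate(change = ifelse(which_time == "start_time", 1, -1)) %>% 
  # net change per time stamp, so simultaneous starts and stops yield a single row
  dplyr::group_by(pat_id, datetime) %>% 
  dplyr::summarise(change = sum(change), .groups = "drop") %>% 
  dplyr::arrange(pat_id, datetime) %>% 
  dplyr::group_by(pat_id) %>% 
  dplyr::mutate(
    feature = "situation",  # hypothetical feature name; use your own
    value = cumsum(change)  # number of intervals active from this time on
  ) %>% 
  dplyr::ungroup()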

Then give that feature an “lvcf” spec with infinite look back.
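For either recipe, the spec row could look like the minimal sketch below. The feature name "situation" is hypothetical; use whatever you put in the feature column. use_for_grid = TRUE is an assumption: it makes every change in the situation start a new grid row, whereas FALSE would only pick up the current value at breaks created by other features.

```r
# Hypothetical spec row: last value carried forward with infinite look back.
situation_spec <- data.frame(
  feature        = "situation",  # hypothetical feature name
  use_for_grid   = TRUE,         # assumption: every change starts a new grid row
  lookback_start = 0,
  lookback_end   = Inf,          # infinite look back
  aggregation    = "lvcf"
)
situation_spec
```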