tidier package provides ‘Apache Spark’ style window
aggregation for R dataframes and remote dbplyr tbls via ‘mutate’ in
‘dplyr’
flavour.
Create a new column with average temp over last seven days in the same month.
set.seed(101)
airquality |>
  # create date column
  dplyr::mutate(date_col = lubridate::make_date(1973, Month, Day)) |>
  # create gaps by removing some days
  dplyr::slice_sample(prop = 0.8) |> 
  # compute mean temperature over last seven days in the same month
  tidier::mutate(avg_temp_over_last_week = mean(Temp, na.rm = TRUE),
                 .order_by = Day,
                 .by       = Month,
                 .frame    = c(lubridate::days(7), # 7 days before current row
                               lubridate::days(-1) # do not include current row
                               ),
                 .index    = date_col
                 )
#> # A tibble: 122 × 8
#>    Month Ozone Solar.R  Wind  Temp   Day date_col   avg_temp_over_last_week
#>    <int> <int>   <int> <dbl> <int> <int> <date>                       <dbl>
#>  1     6    NA     286   8.6    78     1 1973-06-01                   NaN  
#>  2     6    NA     242  16.1    67     3 1973-06-03                    78  
#>  3     6    NA     186   9.2    84     4 1973-06-04                    72.5
#>  4     6    NA     264  14.3    79     6 1973-06-06                    76.3
#>  5     6    29     127   9.7    82     7 1973-06-07                    77  
#>  6     6    NA     273   6.9    87     8 1973-06-08                    78  
#>  7     6    NA     259  10.9    93    11 1973-06-11                    83  
#>  8     6    NA     250   9.2    92    12 1973-06-12                    85.2
#>  9     6    23     148   8      82    13 1973-06-13                    86.6
#> 10     6    NA     332  13.8    80    14 1973-06-14                    87.2
#> # ℹ 112 more rowsmutate supports
.by (group by),.order_by (order by),.frame (endpoints of window frame),.index (identify index column like date column, in df
version only),.complete (whether to compute over incomplete window,
in df version only).mutate automatically uses a future backend (via furrr, in df
version only).This implementation is inspired by Apache Spark’s windowSpec
class with rangeBetween
and rowsBetween.
dbplyr
implements this via dbplyr::win_over
enabling sparklyr
users to write window computations. Also see, dbplyr::window_order/dbplyr::window_frame.
tidier’s mutate wraps this functionality via
uniform syntax across dataframes and remote tbls.
tidypyspark
python package implements mutate style window computation
API for pyspark.
remotes::install_github("talegari/tidier")install.packages("tidier")tidier package is deeply indebted to three amazing
packages and people behind it.
Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A
Grammar of Data Manipulation_. R package version 1.1.0,
<https://CRAN.R-project.org/package=dplyr>.
Vaughan D (2021). _slider: Sliding Window Functions_. R package
version 0.2.2, <https://CRAN.R-project.org/package=slider>.
Wickham H, Girlich M, Ruiz E (2023). _dbplyr: A 'dplyr' Back End
  for Databases_. R package version 2.3.2,
  <https://CRAN.R-project.org/package=dbplyr>.