Fully support dd::...() syntax (#795).
Threshold for prudence = "thrifty" is reduced to
1000 cells when the data comes from a remote data source.
Support named arguments for dd::...()
functions.
%in% to avoid performance problems in duckdb v1.4.0.
_R_CHECK_THINGS_IN_OTHER_DIRS_=true.
This release improves compatibility with dbplyr and DuckDB. See
vignette("duckdb") for details.
Pass functions prefixed with dd$ directly to DuckDB,
e.g., dd$ROW() will be translated as DuckDB’s
ROW() function (#658).
New as_tbl() to convert to a dbplyr tbl object
(#634, #685).
Register Ark methods for Positron’s “Variables” pane (@DavisVaughan, #661,
#678). DuckDB tibbles are no longer displayed as data frames in the
“Variables” pane due to a limitation in Positron. Use
collect() to convert them to data frames if you rely on the
viewer functionality.
Translate n_distinct() as a macro with support for
na.rm = TRUE (@joakimlinde, #572, #655).
Translate coalesce().
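A minimal sketch of the n_distinct() and coalesce() translations, assuming a duckplyr version that includes them. The toy column x is made up for illustration:

```r
library(duckplyr)

# Toy data (illustrative): the value 1 appears twice and one value is missing
df <- duckdb_tibble(x = c(1, NA, 1, 3))

# n_distinct() with and without na.rm
counts <- df %>%
  summarise(
    n_all = n_distinct(x),                      # NA counts as a distinct value
    n_non_missing = n_distinct(x, na.rm = TRUE) # NA is ignored
  ) %>%
  collect()

# coalesce() replaces missing values with a default
filled <- df %>%
  mutate(y = coalesce(x, 0)) %>%
  collect()

counts
filled$y
```

With na.rm = TRUE the missing value is ignored, matching dplyr's semantics for n_distinct().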
compute() has no fallback; failures are
reported to the client (#637).
Implement slice_head() (#640).
Set functions like union() no longer trigger
materialization (#654, #692).
Joins no longer materialize the input data when the package is
used with methods_overwrite() or
library(duckplyr) (#641).
Correct formatting for controlled fallbacks with
Sys.setenv(DUCKPLYR_FALLBACK_INFO = TRUE).
Bump duckdb and pillar dependencies.
Use roxyglobals from CRAN rather than GitHub (@andreranza, #659).
Bring tools and patch up to date (@joakimlinde, #647).
Internal rel_to_df() needs prudence
argument (#644).
Fix sync scripts and add reproducible code (#639).
Check loadability of extensions in test (#636).
Document slice_head() as supported.
Add Posit’s ROR ID (#592).
Add vignette("duckdb") (#690).
Add experimental badge.
Verbose conflict_prefer() (#667, #684).
Typos + clarification edits to “large” vignette (@mine-cetinkaya-rundel, #665).
grep() or sub() on CRAN.
Check if extensions can be loaded before running examples and vignettes (#620).
Show source of error if data frame cannot be converted to duck frame (#614).
Correct formatting for controlled fallbacks with
Sys.setenv(DUCKPLYR_FALLBACK_INFO = TRUE).
Require duckdb >= 1.2.0 (#619).
Break this version with duckdb 2.0.0 (#623).
Separate ?compute_parquet and
?compute_csv (#610, #622).
Italicize book title in README (@wibeasley, #607).
Fix typo in filter(.by = ...) error message (@maelle, #611).
Fix link in documentation (#600, #601).
Improved support for handling large data from files and S3:
ingestion with read_parquet_duckdb() and others, and
materialization with as_duckdb_tibble(),
compute.duckplyr_df() and compute_file(). See
vignette("large") for details.
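A minimal sketch of a round trip through Parquet, assuming a writable temporary directory. compute_parquet() and read_parquet_duckdb() are named in this release; the file path and the sales columns are hypothetical:

```r
library(duckplyr)

# Illustrative output location; any writable path works
path <- tempfile(fileext = ".parquet")

# Hypothetical toy data
sales <- duckdb_tibble(
  region = c("a", "b", "a"),
  amount = c(1, 2, 3)
)

# Materialize the frame to a Parquet file instead of to memory
compute_parquet(sales, path)

# Read the file back as a duck frame and aggregate it
totals <- read_parquet_duckdb(path) %>%
  summarise(total = sum(amount), .by = region) %>%
  arrange(region) %>%
  collect()

totals
```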
Control automatic materialization of duckplyr frames with the new
prudence argument to as_duckdb_tibble(),
duckdb_tibble(), compute.duckplyr_df() and
compute_file(). See vignette("prudence") for
details.
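A minimal sketch of the prudence argument, assuming a duckplyr version that accepts prudence = "stingy" (the other levels being "lavish" and "thrifty"). The data frame is a toy example:

```r
library(duckplyr)

# A "stingy" frame does not materialize automatically
stingy <- as_duckdb_tibble(data.frame(x = 1:3), prudence = "stingy")

# Verbs still build up a lazy DuckDB query...
doubled <- stingy %>%
  mutate(y = x * 2)

# ...but the result has to be requested explicitly
result <- collect(doubled)
result
```

With "stingy", the explicit collect() call is the intended way to obtain the data; vignette("prudence") describes the exact rules for each level.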
read_csv_duckdb() and others, deprecating
duckplyr_df_from_csv() and df_from_csv()
(#210, #396, #459).
read_sql_duckdb() (experimental) to run SQL queries
against the default DuckDB connection and return the result as a
duckplyr frame (duckdb/duckdb-r#32, #397).
db_exec() to execute configuration queries against
the default duckdb connection (#39, #165, #227, #404, #459).
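A minimal sketch of db_exec() and read_sql_duckdb() against the default connection. The SET threads statement is only an example configuration query, and read_sql_duckdb() is marked experimental, so its interface may change:

```r
library(duckplyr)

# Configuration statement against the default connection (setting chosen for illustration)
db_exec("SET threads TO 1")

# SQL query against the same default connection, returned as a duckplyr frame
answer <- read_sql_duckdb("SELECT 40 + 2 AS answer")

collect(answer)
```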
duckdb_tibble() (#382, #457).
as_duckdb_tibble(), replaces
as_duckplyr_tibble() and as_duckplyr_df()
(#383, #457) and supports dbplyr connections to a duckdb database (#86,
#211, #226).
compute_parquet() and compute_csv(),
implement compute.duckplyr_df() (#409, #430).
fallback_config() to create a configuration file for
the settings that do not affect behavior (#216, #426).
is_duckdb_tibble(), deprecates
is_duckplyr_df() (#391, #392).
last_rel() to retrieve the last relation object used
in materialization (#209, #375).
Add "prudent_duckplyr_df" class that stops automatic
materialization and requires collect() (#381,
#390).
Partial support for across() in
mutate() and summarise() (#296, #306, #318,
@lionel-, @DavisVaughan).
Implement na.rm handling for sum(),
min(), max(), any() and
all(), with fallback for window functions (#205,
#566).
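A minimal sketch of na.rm handling in an aggregation, assuming toy data with one missing value:

```r
library(duckplyr)

# Toy data with one missing value
df <- duckdb_tibble(x = c(1, NA, 3))

stats <- df %>%
  summarise(
    total_na = sum(x),              # a missing value propagates, as in base R
    total = sum(x, na.rm = TRUE),   # missing value dropped
    lo = min(x, na.rm = TRUE),
    hi = max(x, na.rm = TRUE)
  ) %>%
  collect()

stats
```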
Add support for sub() and gsub() (@toppyy, #420).
Handle dplyr::desc() (#550).
Avoid forwarding is.na() to is.nan() to
support non-numeric data, avoid checking roundtrip for timestamp data
(#482).
Correctly handle missing values in
if_else().
Limit number of items that can be handled with %in%
(#319).
duckdb_tibble() checks if columns can be represented
in DuckDB (#537).
Fall back to dplyr when passing the multiple argument to joins
(#323).
Improve fallback error message by explicitly materializing (#432, #456).
Point to the native CSV reader if encountering data frames read with readr (#127, #469).
Improve as_duckdb_tibble() error message for invalid
x (@maelle, #339).
Depend on dplyr instead of reexporting all generics (#405). Nothing changes for users in scripts. When using duckplyr in a package, you now also need to import dplyr.
Fallback logging is now on by default and can be disabled via configuration (#422).
The default DuckDB connection is now based on a file, the
location defaults to a subdirectory of tempdir() and can be
controlled with the DUCKPLYR_TEMP_DIR environment variable
(#439, #448, #561).
collect() returns a tibble (#438, #447).
explain() returns the input, invisibly
(#331).
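A small sketch of the new explain() return value, assuming toy data. The plan is printed and the input comes back unchanged, which should make explain() usable in the middle of a pipeline:

```r
library(duckplyr)

# Toy data
df <- duckdb_tibble(x = 1:3)
filtered <- df %>% filter(x > 1L)

# Prints the DuckDB query plan and returns its input invisibly
same <- explain(filtered)
```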
Compute ptype only for join columns in a safe way without materialization, not for the entire data frame (#289).
Internal expr_scrub() (used for telemetry) can
handle function-definitions (@toppyy, #268, #271).
Harden telemetry code against invalid arguments (#321).
New articles: vignette("large"),
vignette("prudence"), vignette("fallback"),
vignette("limits"), vignette("developers"),
vignette("telemetry") (#207, #504).
New flights_df() used instead of
palmerpenguins::penguins (#408).
Move to the tidyverse GitHub organization, new repository URL https://github.com/tidyverse/duckplyr/ (#225).
Avoid base pipe in examples for compatibility with R 4.0.0 (#463, #466).
Comparison expressions are translated in a way that allows them to be pushed down to Parquet (@toppyy, #270).
Printing a duckplyr frame no longer materializes (#255, #378).
Prefer vctrs::new_data_frame() over
tibble() (#500).
df_from_file() and related functions support multiple
files (#194, #195), show a clear error message for non-string
path arguments (#182), and create a tibble by default (#177).
as_duckplyr_tibble() to convert a data frame to a duckplyr tibble (#177).
?df_from_file shows how to read multiple files (#181, #186) and how to
specify CSV column types (#140, #189), and is shown correctly in the
reference index (#173, #190).
as.integer(), NA and %in% (#83, #154, #148, #155, #159, #160).
library(duckplyr) calls methods_overwrite() (#164).
grepl().
intersect(), setdiff(), symdiff(), union(), and union_all() (#169).
NA and those used in an expression (#157).
head(-1) forwards to the default implementation (#131, #156).
left_join() and other join functions call auto_copy().
row_number() returns integer.
is.na(NaN) is TRUE.
summarise(count = n(), count = n()) creates only one column named count.
?df_from_file (@andreranza, #133, #134).
vec_ptype() does not materialize (#149).
expect_identical() to capture differences between doubles and integers.
df_to_parquet() to write to Parquet, new convenience functions
df_from_csv(), duckdb_df_from_csv(), df_from_parquet() and
duckdb_df_from_parquet() (#87, #89, #96, #128).
summarise() (#72, #106).
summarise() no longer restores subclass.
log10() and log().
fallback_sitrep() and related functionality for collecting telemetry
data (#102, #107, #110, #111, #115). No data is collected by default;
only a message is displayed once per session and then every eight
hours. Opt in or opt out by setting environment variables.
group_by() and other methods to collect fallback information (#94, #104, #105).
suppressWarnings() as the identity function.
cli::cli_abort() over stop() or rlang::abort() (#114).
.data$a and .env$a.
integer, numeric, logical, Date, POSIXct, and difftime for now.
DUCKPLYR_METHODS_OVERWRITE is set to TRUE, loading duckplyr
automatically calls methods_overwrite().
log() and log10().
methods_overwrite() and methods_restore() show a message.
grepl(x = NA) gives correct results.
auto_copy() for non-data-frame input.
distinct() now preserves order in corner cases (#77, #78).
log(0) and log(-1) (#75, #76).
mutate() that are actually representable in duckdb (#73).
ifelse(), support if_else() (#79).
dplyr_reconstruct() method (#48).
meta_replay().
arrange() in case of ties.
slice_sample(), not sample_n() or sample_frac() (#74).
IS NOT DISTINCT FROM for faster execution (duckdb/duckdb-r#41, #68).
summarise() keeps "duckplyr_df" class (#63, #64).
Fix compatibility with duckdb >= 0.9.1.
Skip tests that give different output on dev tidyselect.
Import utils::globalVariables().
Small README improvements (@maelle, #34, #57).
Fix 301 redirect in README.
Improve documentation.
Work around a problem with dplyr_reconstruct() in R
4.3.
Rename duckdb_from_file() to
df_from_file().
Unexport private duckdb_rel_from_df(),
rel_from_df(), wrap_df() and
wrap_integer().
Reexport %>% and tibble().
R CMD check.
relexpr_window() for now.
Initial version, exporting:
- new_relational() to construct objects of class "relational"
- Generics rel_aggregate(), rel_distinct(), rel_filter(), rel_join(),
  rel_limit(), rel_names(), rel_order(), rel_project(),
  rel_set_diff(), rel_set_intersect(), rel_set_symdiff(), rel_to_df(),
  rel_union_all()
- new_relexpr() to construct objects of class "relational_relexpr"
- Expression builders relexpr_constant(), relexpr_function(),
  relexpr_reference(), relexpr_set_alias(), relexpr_window()