Description
Hello,
When reading a table that is split across multiple parquet files, duckdb supports wildcards like this:
SELECT *
FROM 'test/*.parquet';
This can be achieved with tbl() by wrapping the single-quoted path in a double-quoted string, like this:
tbl(con, "'test/*.parquet'")
But it doesn't work with tbl_file, because tbl_file checks whether the file exists with file.exists(), which does not expand wildcards.
Source
The common approach to wildcard expansion in base R would be to expand the path with Sys.glob() before passing it to file.exists(), like this:
file.exists(Sys.glob(path))
This yields a logical vector with one value for each path found after expanding the wildcard (an empty vector if nothing matches).
I believe that combining this with any() would update your check while keeping its intent, since any() returns FALSE on an empty vector:
if (!any(file.exists(Sys.glob(path)))) {
stop("File '", path, "' not found", call. = FALSE)
}
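To illustrate how such a check would behave, here is a minimal base R sketch. The helper name `path_has_match` and the temporary directory layout are hypothetical and only for demonstration; this is not the package's actual code.

```r
# Hypothetical helper (not part of duckdb): TRUE if `path`, which may
# contain wildcards, resolves to at least one existing file.
path_has_match <- function(path) {
  matches <- Sys.glob(path)   # expand wildcards; character(0) if no match
  any(file.exists(matches))   # any(logical(0)) is FALSE
}

# Build a throwaway "folder-like table" with two empty files.
dir <- tempfile("parquet_dir")
dir.create(dir)
file.create(file.path(dir, c("part-1.parquet", "part-2.parquet")))

path_has_match(file.path(dir, "*.parquet"))       # wildcard with matches
path_has_match(file.path(dir, "part-1.parquet"))  # plain path still works
path_has_match(file.path(dir, "*.csv"))           # wildcard without matches
```

One caveat: Sys.glob() follows the platform's glob rules, which may not cover every pattern duckdb accepts, so some paths that duckdb can read might still be rejected by this check.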
Would you agree to change the behaviour of tbl_file to allow wildcards?
I can submit a PR for it.
Motivation: I like what you are doing with tbl_file and tbl_function as safer tbl() alternatives, but I currently cannot recommend tbl_file at my company because we have a lot of folder-like tables.
Thank you for your time and have a great day.