tbl_file() doesn't support wildcards #1592

@rplsmn

Hello,

When reading a table that is split across multiple Parquet files, DuckDB supports wildcards like this:

SELECT *
FROM 'test/*.parquet';

source

This can be achieved with tbl() by nesting single quotes inside the R string, like this:

tbl(con, "'test/*.parquet'")

But it doesn't work with tbl_file(), because tbl_file() checks whether the file exists with file.exists(), which does not expand wildcards.
Source

The usual base R approach to wildcard expansion is to expand the path with Sys.glob() first and pass the result to file.exists(), like this:

file.exists(Sys.glob(path))

This yields one logical value for each path found after expanding the wildcard, or an empty vector if nothing matches.
I believe that combining this with any() would update your check while keeping its intent:

if (!any(file.exists(Sys.glob(path)))) {
  stop("File '", path, "' not found", call. = FALSE)
}
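
To make the intended behaviour concrete, here is a minimal sketch of a wildcard-aware existence check. The helper name glob_exists() is hypothetical and is not part of duckdb; the sketch only assumes base R, expanding the pattern with Sys.glob() first and then testing the result with file.exists().

```r
# Hypothetical helper, not part of the duckdb package: returns TRUE when
# `path` (optionally containing wildcards) matches at least one existing file.
glob_exists <- function(path) {
  # Sys.glob() expands wildcards and returns character(0) when nothing matches.
  matches <- Sys.glob(path)
  # any(logical(0)) is FALSE, so an unmatched pattern is reported as missing.
  any(file.exists(matches))
}

# Self-contained demonstration in a temporary directory.
dir <- file.path(tempdir(), "glob_demo")
dir.create(dir, showWarnings = FALSE)
file.create(file.path(dir, c("part1.parquet", "part2.parquet")))

glob_exists(file.path(dir, "*.parquet"))       # TRUE: the wildcard matches two files
glob_exists(file.path(dir, "*.csv"))           # FALSE: nothing matches
glob_exists(file.path(dir, "part1.parquet"))   # TRUE: plain paths still work
```

A path without wildcards behaves as before, so a check along these lines should not change the result for existing single-file calls.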

Would you agree to change the behaviour of tbl_file() to allow wildcards?
I can submit a PR for it.

Motivation: I like what you are doing with tbl_file() and tbl_function() as safer alternatives to tbl(), but I currently cannot recommend tbl_file() at my company because we have a lot of folder-like tables.

Thank you for your time and have a great day.
