Skip to content

duckdb_read_csv not working when reading CSV #1511

@beausoleilmo

Description

@beausoleilmo

I posted a thread here https://stackoverflow.com/questions/79763382/duckdb-read-csv-not-working-when-there-are-double-quotes-example-in-csv/79763392#79763392 and think there might be an issue while reading CSVs.

I was trying to read a CSV with duckdb_read_csv() with this :

con <- dbConnect(duckdb())
gbif_csv = duckdb_read_csv(conn = con,
                           name = "test",
                           files ='inat_test.csv',
                           delim = ",",
                           header = TRUE)

and got this. :

Error in `duckdb_result()`:
  ! rapi_execute: Failed to run query
Error: Invalid Input Error: CSV Error on Line: 3467
Original Line: 4028745676,"50c9509d-22c7-4a22-a47d-8c48425ef4a7","https://www.inaturalist.org/observations/120076657","Plantae","Tracheophyta","Magnoliopsida","Apiales","Araliaceae","Aralia","Aralia nudicaulis",NA,"SPECIES","Aralia nudicaulis L.","Aralia nudicaulis",NA,"CA",NA,"Québec","PRESENT",NA,"28eb1a3f-1c15-4a95-931a-4af90ecb574d",45.44365,-74.255022,49,NA,NA,NA,NA,NA,"2022-06-03T13:07:53",3,6,2022,3037021,3037021,"HUMAN_OBSERVATION","iNaturalist","Observations","120076657",NA,"Henry ""Nick"" Robertson",2022-11-22 13:59:38,"CC_BY_NC_4_0","katenormand","katenormand",NA,NA,2025-09-08 07:19:47.44,"StillImage","COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_ID_NOT_FOUND"
Value with unterminated quote found.

Possible fixes:
  * Disable the parser's strict mode (strict_mode=false) to allow reading rows that do not comply with the CSV standard.
* Enable ignore errors (ignore_errors=true) to skip this row
* Set quote to empty or to a different value (e.g., quote='')

  file = a_big_CSV.csv
  delimiter = , (Set By User)
  quote = " (Set By User)
  escape = \0 (Auto-Detected)
  new_line = \n (Auto-Detected)
  header = true (Set By User)
  skip_rows = 0 (Auto-Detected)
  comment = \0 (Auto-Detected)
  strict_mode = true (Auto-Detected)
  date_format =  (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding = 0
  sample_size = 20480
  ignore_errors = false
  all_varchar = 0
The Column types set by the user do not match the ones found by the sniffer. 
Column at position: 0 Set type: DOUBLE Sniffed type: BIGINT
Column at position: 30 Set type: INTEGER Sniffed type: BIGINT
Column at position: 31 Set type: INTEGER Sniffed type: BIGINT
Column at position: 32 Set type: INTEGER Sniffed type: BIGINT
Column at position: 33 Set type: INTEGER Sniffed type: BIGINT
Column at position: 34 Set type: INTEGER Sniffed type: BIGINT
Column at position: 38 Set type: INTEGER Sniffed type: BIGINT
Column at position: 47 Set type: VARCHAR Sniffed type: TIMESTAMP

Run `rlang::last_trace()` to see where the error occurred.
So clearly, it seems that the double double quote is causing a problem. How could I read the CSV with duckdb_read_csv()?

However, if I use duckDB directly, it works :

cmd = 'CREATE VIEW myNiceTable AS SELECT * FROM read_csv('path_to_csv')'
DBI::dbExecute(conn = con, statement = cmd) 

So when scanning a file, duckdb_read_csv() might fail if it finds different value types?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions