-
Notifications
You must be signed in to change notification settings - Fork 46
Open
Description
I posted a thread here https://stackoverflow.com/questions/79763382/duckdb-read-csv-not-working-when-there-are-double-quotes-example-in-csv/79763392#79763392 and think there might be an issue while reading CSVs.
I was trying to read a CSV with duckdb_read_csv()
with this :
con <- dbConnect(duckdb())
gbif_csv = duckdb_read_csv(conn = con,
name = "test",
files ='inat_test.csv',
delim = ",",
header = TRUE)
and got this. :
Error in `duckdb_result()`:
! rapi_execute: Failed to run query
Error: Invalid Input Error: CSV Error on Line: 3467
Original Line: 4028745676,"50c9509d-22c7-4a22-a47d-8c48425ef4a7","https://www.inaturalist.org/observations/120076657","Plantae","Tracheophyta","Magnoliopsida","Apiales","Araliaceae","Aralia","Aralia nudicaulis",NA,"SPECIES","Aralia nudicaulis L.","Aralia nudicaulis",NA,"CA",NA,"Québec","PRESENT",NA,"28eb1a3f-1c15-4a95-931a-4af90ecb574d",45.44365,-74.255022,49,NA,NA,NA,NA,NA,"2022-06-03T13:07:53",3,6,2022,3037021,3037021,"HUMAN_OBSERVATION","iNaturalist","Observations","120076657",NA,"Henry ""Nick"" Robertson",2022-11-22 13:59:38,"CC_BY_NC_4_0","katenormand","katenormand",NA,NA,2025-09-08 07:19:47.44,"StillImage","COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_ID_NOT_FOUND"
Value with unterminated quote found.
Possible fixes:
* Disable the parser's strict mode (strict_mode=false) to allow reading rows that do not comply with the CSV standard.
* Enable ignore errors (ignore_errors=true) to skip this row
* Set quote to empty or to a different value (e.g., quote='')
file = a_big_CSV.csv
delimiter = , (Set By User)
quote = " (Set By User)
escape = \0 (Auto-Detected)
new_line = \n (Auto-Detected)
header = true (Set By User)
skip_rows = 0 (Auto-Detected)
comment = \0 (Auto-Detected)
strict_mode = true (Auto-Detected)
date_format = (Auto-Detected)
timestamp_format = (Auto-Detected)
null_padding = 0
sample_size = 20480
ignore_errors = false
all_varchar = 0
The Column types set by the user do not match the ones found by the sniffer.
Column at position: 0 Set type: DOUBLE Sniffed type: BIGINT
Column at position: 30 Set type: INTEGER Sniffed type: BIGINT
Column at position: 31 Set type: INTEGER Sniffed type: BIGINT
Column at position: 32 Set type: INTEGER Sniffed type: BIGINT
Column at position: 33 Set type: INTEGER Sniffed type: BIGINT
Column at position: 34 Set type: INTEGER Sniffed type: BIGINT
Column at position: 38 Set type: INTEGER Sniffed type: BIGINT
Column at position: 47 Set type: VARCHAR Sniffed type: TIMESTAMP
Run `rlang::last_trace()` to see where the error occurred.
So clearly, it seems that the double double quote is causing a problem. How could I read the CSV with duckdb_read_csv()?
However, if I use duckDB directly, it works :
cmd = 'CREATE VIEW myNiceTable AS SELECT * FROM read_csv('path_to_csv')'
DBI::dbExecute(conn = con, statement = cmd)
So when scanning a file, duckdb_read_csv()
might fail if it finds different value types?
Metadata
Metadata
Assignees
Labels
No labels