Files w/ quoted values that contain commas throw an exception #38

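**Describe the bug**
The file contains a quoted number with embedded commas, e.g. "2,126,000,000". This throws off the index alignment between the types extracted for the header and those extracted for the data rows, and inference fails. Partial traceback: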

  File "/home/greg/prj/sdspop/ingest/workflows/schema-on-read/venv/lib/python3.10/site-packages/csv_schema_inference/csv_schema_inference.py", line 397, in run_inference
    schemas_result = prl.parallel(records = lines,obj=dtype, d_schema = self.__schema)
  File "/home/greg/prj/sdspop/ingest/workflows/schema-on-read/venv/lib/python3.10/site-packages/csv_schema_inference/csv_schema_inference.py", line 165, in parallel
    return [p.get() for p in results]
  File "/home/greg/prj/sdspop/ingest/workflows/schema-on-read/venv/lib/python3.10/site-packages/csv_schema_inference/csv_schema_inference.py", line 165, in <listcomp>
    return [p.get() for p in results]
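
To illustrate the misalignment, here is a minimal sketch using only the standard library and the first data row from the sample file. I have not confirmed that the library splits lines this way internally; the point is only that a plain split on the separator yields more fields for the data row than for the header, while a quote-aware parser yields the same count for both.

```python
import csv

header = '"id","country","year","sex","age","suicides_no","population","country-year","HDI for year"," gdp_for_year","gdp_per_capita","generation"'
row = '0,"Albania",1987,"male","15-24 years",21,312900,"Albania1987",,"2,156,624,900",796,"Generation X"'

# Naive split: every comma counts as a separator, including those inside quotes.
naive_header = header.split(",")
naive_row = row.split(",")

# Quote-aware parse: commas inside double quotes stay part of the field.
parsed_header = next(csv.reader([header]))
parsed_row = next(csv.reader([row]))

print(len(naive_header), len(naive_row))    # 12 15
print(len(parsed_header), len(parsed_row))  # 12 12
print(parsed_row[9])                        # 2,156,624,900
```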

**To Reproduce**
Steps to reproduce the behavior:
1. Save the following sample as `suicide_data.csv`:
"id","country","year","sex","age","suicides_no","population","country-year","HDI for year"," gdp_for_year","gdp_per_capita","generation"
0,"Albania",1987,"male","15-24 years",21,312900,"Albania1987",,"2,156,624,900",796,"Generation X"
1,"Albania",1987,"male","35-54 years",16,308000,"Albania1987",,"2,156,624,900",796,"Silent"
2,"Albania",1987,"female","15-24 years",14,289700,"Albania1987",,"2,156,624,900",796,"Generation X"
3,"Albania",1987,"male","75+ years",1,21800,"Albania1987",,"2,156,624,900",796,"G.I. Generation"
4,"Albania",1987,"male","25-34 years",9,274300,"Albania1987",,"2,156,624,900",796,"Boomers"
5,"Albania",1987,"female","75+ years",1,35600,"Albania1987",,"2,156,624,900",796,"G.I. Generation"

2. Run the following script (adjust `pathfile` to where the sample was saved):
from multiprocessing import freeze_support, Process
from csv_schema_inference import csv_schema_inference

def main():
    # If a column is inferred as INTEGER but FLOAT values also appear in the results,
    # report the column as FLOAT.
    conditions = {"INTEGER": "FLOAT"}
    pathfile = "/home/greg/prj/sdspop/ingest/workflows/schema-on-read/suicide_data.csv"

    csv_infer = csv_schema_inference.CsvSchemaInference(portion=0.9, max_length=100, batch_size=200000, acc=0.8, seed=2, header=True, sep=",", conditions=conditions)
    approx_schema = csv_infer.run_inference(pathfile)
    csv_infer.pretty(approx_schema)

if __name__ == '__main__':
    freeze_support()
    Process(target=main).start()
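
As a possible workaround until this is fixed, the file can be pre-processed before inference. The sketch below is an assumption on my part, not something the library provides: `strip_thousands` and `normalize_csv` are hypothetical helper names, and it only handles values that look like comma-grouped integers. It reads the file with the standard `csv` module and writes a copy in which such values no longer contain commas.

```python
import csv
import re

# Matches values such as "2,156,624,900": digit groups separated by commas.
THOUSANDS = re.compile(r"\d{1,3}(,\d{3})+")


def strip_thousands(value: str) -> str:
    """Remove thousands separators from comma-grouped integers; leave other values alone."""
    return value.replace(",", "") if THOUSANDS.fullmatch(value) else value


def normalize_csv(src_path: str, dst_path: str) -> None:
    """Copy a CSV file, stripping thousands separators from every field (hypothetical helper)."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for record in csv.reader(src):
            writer.writerow([strip_thousands(field) for field in record])
```

Calling `normalize_csv("suicide_data.csv", "suicide_data_clean.csv")` and passing the cleaned file to `run_inference` should avoid the problem for this sample. Text fields that legitimately contain commas would still be written quoted, so this does not help for those.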

**Expected behavior**
Inference should complete and print a schema in the usual format, e.g.:
0
	name
		Username; Identifier;One-time password;Recovery code;First name;Last name;Department;Location
	type
		STRING
	nullable
		False
....

**Desktop (please complete the following information):**
 - OS: Ubuntu 22.04 and Python 3.10.12
 
