DataFrameCsvReader should ignore spaces after commas in CSV files #116

Open
olekscode opened this issue Aug 2, 2019 · 2 comments

Comments

@olekscode
Member

The following line of a CSV file:

first_name, last_name, order_date, amount

will be parsed as:

#('first_name' ' last_name' ' order_date' ' amount')

(with each string except the first one starting with a space).

Spaces after commas in CSV files should be ignored.

@olekscode olekscode added the bug label Aug 2, 2019
@AtharvaKhare
Contributor

Shouldn't the user just use ", " (comma space) as the separator?

@olekscode
Member Author

olekscode commented Jul 26, 2021

I think the result should be trimmed on both sides, because sometimes even the data inside the same file is inconsistent.

In most cases, CSV files separate values with commas and TSV files separate them with tabs.
But then some people add extra spaces:

Oleks, 25, true

And others don't:

Oleks,25,true
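To illustrate why a fixed ", " separator does not cover both styles, here is a small sketch. It uses Python and plain string splitting purely for illustration; it is not DataFrameCsvReader's API or parsing logic, and the `rows` and `normalized` names are made up for this example.

```python
# Illustration only: plain string splitting, not DataFrameCsvReader's logic.
rows = ["Oleks, 25, true", "Oleks,25,true"]

for separator in (",", ", "):
    for row in rows:
        # "," leaves leading spaces in the first row;
        # ", " fails to split the second row at all.
        print(repr(separator), row.split(separator))

# Splitting on the bare comma and trimming each value treats both rows alike.
normalized = [[value.strip() for value in row.split(",")] for row in rows]
print(normalized)
```

Neither separator alone handles both rows, while splitting on the comma and trimming afterwards gives the same three values for each.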

It can be even more troublesome when tabs are used, because an extra space next to a tab can be invisible.

Users will then run into all kinds of problems, for example when ' 25' cannot be parsed as a number because of the leading space.

So I think it's better to trim whitespace characters on the left and right of each value when reading from a CSV
(but only when the value is not quoted! If a file contains something like "Oleks", " 25 ", then maybe clients want to keep those spaces inside the quotes).
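The proposed rule (trim unquoted values, keep whitespace inside quotes) could be sketched roughly as follows. This is a Python sketch for illustration only, not the DataFrameCsvReader implementation; `split_csv_line` is a hypothetical helper, and it assumes double quotes as the quote character, a doubled quote as the escape, and a single line with no embedded newlines.

```python
def split_csv_line(line, separator=","):
    """Split one CSV line, trimming whitespace around unquoted values only."""
    fields = []
    buffer = []         # characters collected for the current value
    in_quotes = False   # True while reading inside "..."
    was_quoted = False  # True once the current value has opened a quote

    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"' and line[i + 1:i + 2] == '"':
                buffer.append('"')  # doubled quote is an escaped quote
                i += 1
            elif ch == '"':
                in_quotes = False
            else:
                buffer.append(ch)
        elif ch == '"' and not "".join(buffer).strip():
            buffer.clear()  # drop whitespace in front of the opening quote
            in_quotes = True
            was_quoted = True
        elif ch == separator:
            text = "".join(buffer)
            fields.append(text if was_quoted else text.strip())
            buffer.clear()
            was_quoted = False
        elif was_quoted and ch.isspace():
            pass  # ignore whitespace between a closing quote and the separator
        else:
            buffer.append(ch)
        i += 1

    text = "".join(buffer)
    fields.append(text if was_quoted else text.strip())
    return fields


print(split_csv_line("first_name, last_name, order_date, amount"))
# ['first_name', 'last_name', 'order_date', 'amount']
print(split_csv_line('"Oleks", " 25 "'))
# ['Oleks', ' 25 ']
```

The header line from the original report comes out without leading spaces, while the quoted " 25 " keeps its spaces, so clients who need that whitespace can still get it by quoting.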
