If all entries in a column are numeric, then whitespace validation will not find errors in any entries in that column. If a single entry in a column is non-numeric, then whitespace validation will work on all entries in that column. For example:
import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import LeadingWhitespaceValidation, TrailingWhitespaceValidation
schema = Schema([
    Column('col1', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('col2', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()])
])
test_data = pd.read_csv(StringIO('''col1,col2
1,3
4,p
2 ,3
3, 9
1 ,3
6,2
'''))
errors = schema.validate(test_data)
for error in errors:
    print(error)
The best way to help me with this would be to add a test for LeadingWhitespaceValidation or TrailingWhitespaceValidation in test/test_validation.py that currently fails for this example. Then I can very quickly write a fix for it.
I've had a look into this, and it's not exactly a bug in PandasSchema. The problem is that pd.read_csv does some automatic type conversion: because col1 consists entirely of integers, it is converted into an integer series, and the whitespace is lost before any validation runs.
If you make sure that everything is parsed as a string, by setting the dtype manually, the validations will work as expected:
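As a minimal sketch of that fix, the snippet below reads the same CSV twice, once with pandas' default type inference and once with `dtype=str`, and uses only pandas to show that the padding survives in the second case. The variable names (`csv_text`, `inferred`, `as_text`, `trailing`) are illustrative and not part of the original report.

```python
import pandas as pd
from io import StringIO

csv_text = '''col1,col2
1,3
4,p
2 ,3
3, 9
1 ,3
6,2
'''

# Default parsing: pandas infers a type per column, so a column that
# looks entirely numeric is converted and its padding is discarded.
inferred = pd.read_csv(StringIO(csv_text))

# dtype=str keeps every cell as the raw text found in the file.
as_text = pd.read_csv(StringIO(csv_text), dtype=str)

print(inferred.dtypes)
print(as_text['col1'].tolist())

# Rows of col1 that still carry trailing whitespace after a string parse.
trailing = as_text[as_text['col1'].str.endswith(' ')]
print(trailing)
```

Passing `as_text` to `schema.validate` in place of `test_data` should then give the whitespace validations actual strings to inspect in both columns.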